# Capstone Project - Battle of the Neighborhoods

# Introduction

The purpose of this project is to determine which inner city suburbs of Melbourne, Australia have the highest-rated coffee shops and cafes. The information may be useful for those who are considering moving to the city from interstate or abroad.

# Data

To solve this problem, 15 suburbs were chosen to represent 15 of the inner city post codes. Some post codes have more than one suburb; however, for the purposes of this project, only one suburb per post code is used. The data for the suburbs was obtained from the following link:
https://en.wikipedia.org/wiki/List_of_Melbourne_suburbs#City_of_Melbourne 

There was no need to scrape this data as the data set was small enough to be copied and pasted into the notebook.

Information for coffee shops and cafes (venues) was obtained using the Foursquare API.

In [1]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import json
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from geopy.geocoders import Nominatim
!pip -q install folium
import folium
from folium.features import DivIcon
import requests

Let's begin by visualising a map of Melbourne, Australia

In [2]:
# Getting coordinates of Melbourne to create the map
address = 'Melbourne, VIC'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Melbourne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Melbourne are -37.8142176, 144.9631608.


In [3]:
map_mel = folium.Map(location=[latitude, longitude], zoom_start=12, width='70%', height = '70%')
map_mel

Let's create a dataframe with inner Melbourne suburbs and their coordinates

In [4]:
column_list = ['Suburb', 'Latitude', 'Longitude']
mel_data = pd.DataFrame(columns = column_list)
mel_data

Unnamed: 0,Suburb,Latitude,Longitude


In [5]:
wiki_data = ['Carlton',
        'Carlton North',
        'Docklands',
        'East Melbourne',
        'Flemington', 
        'Kensington',
        'Melbourne CBD',
        'St Kilda', 
        'North Melbourne', 
        'Parkville',
        'Port Melbourne', 
        'Southbank', 
        'South Wharf',
        'South Yarra', 
        'West Melbourne'
            ]

Now to fill the dataframe with the coordinates for each suburb

In [6]:
geolocator = Nominatim(user_agent="ny_explorer")  # create the geocoder once, outside the loop
rows = []
for suburb in wiki_data:
    location = geolocator.geocode(suburb + ', VIC')
    rows.append({'Suburb': suburb,
                 'Latitude': location.latitude,
                 'Longitude': location.longitude})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
mel_data = pd.DataFrame(rows, columns=column_list)
mel_data

Unnamed: 0,Suburb,Latitude,Longitude
0,Carlton,-37.800423,144.968434
1,Carlton North,-37.784559,144.972855
2,Docklands,-37.817542,144.939492
3,East Melbourne,-37.812498,144.985885
4,Flemington,-37.786759,144.919367
5,Kensington,-37.793938,144.930565
6,Melbourne CBD,-37.814182,144.959801
7,St Kilda,-37.863826,144.981637
8,North Melbourne,-37.807609,144.942351
9,Parkville,-37.787115,144.951553
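
One thing the loop above does not handle is a failed lookup: geopy's `geocode` returns `None` when it finds no match, and reading `location.latitude` would then raise an `AttributeError`. A minimal sketch of a more defensive version follows. The helper name `geocode_suburbs` and the stand-in `fake_geocode` are hypothetical; in the notebook you would pass `geolocator.geocode` instead.

```python
from collections import namedtuple

# Stand-in for geopy's Location object; only the two attributes used here.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_suburbs(suburbs, geocode, state="VIC"):
    """Return (rows, missing): one dict per geocoded suburb, plus the names that failed."""
    rows, missing = [], []
    for suburb in suburbs:
        location = geocode("{}, {}".format(suburb, state))
        if location is None:  # no match found for this query
            missing.append(suburb)
            continue
        rows.append({"Suburb": suburb,
                     "Latitude": location.latitude,
                     "Longitude": location.longitude})
    return rows, missing


# Hypothetical offline geocoder so the sketch runs without network access.
def fake_geocode(query):
    known = {"Carlton, VIC": Location(-37.800423, 144.968434)}
    return known.get(query)


rows, missing = geocode_suburbs(["Carlton", "Nowhereville"], fake_geocode)
print(rows)
print(missing)
```

The `rows` list can be turned into the table with `pd.DataFrame(rows, columns=column_list)`, and `missing` shows which suburbs need a different search string.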


Now I'd like to see where these inner city suburbs are on the map.

In [7]:
# add markers for the suburbs to the map
for lat, lng, suburb in zip(mel_data['Latitude'], mel_data['Longitude'], mel_data['Suburb']):
    label = '{}'.format(suburb)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=25,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mel)  
    
map_mel

# Methodology


In this section, we will get the details of the venues in each suburb:
1. Getting a list of all venues in each suburb
2. Getting the rating of each venue
3. Getting the average rating of each suburb
4. Displaying the results on the map

Now let's start getting details of the coffee shops and cafes in these suburbs using the Foursquare API

In [8]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


First, I'll get the details for the first suburb in our mel_data dataframe: Carlton

In [9]:
categoryId = '4bf58dd8d48988d16d941735,4bf58dd8d48988d1e0931735' # cafe, coffee shop
radius = 800 # define radius

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    mel_data.Latitude.iloc[0], 
    mel_data.Longitude.iloc[0], 
    categoryId,
    radius 
)
url

'https://api.foursquare.com/v2/venues/search?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=-37.8004228,144.9684343&categoryId=4bf58dd8d48988d16d941735,4bf58dd8d48988d1e0931735&radius=800'
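
Formatting the query string by hand works, but it is easy to mistype, and displaying the resulting URL exposes the credentials. As a sketch, the same URL can be assembled from a dictionary with the standard library's `urllib.parse.urlencode`. The function name `build_search_url` and the placeholder credentials are assumptions made for illustration; the parameter names are the ones already used in the URL above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"
CAFE_CATEGORIES = "4bf58dd8d48988d16d941735,4bf58dd8d48988d1e0931735"  # cafe, coffee shop


def build_search_url(lat, lng, client_id, client_secret, version,
                     category_id=CAFE_CATEGORIES, radius=800):
    """Build a venue-search URL for one coordinate pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "categoryId": category_id,
        "radius": radius,
    }
    # safe="," keeps the commas in `ll` and `categoryId` readable
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe=",")


# Placeholder credentials, not real ones.
url = build_search_url(-37.8004228, 144.9684343, "MY_ID", "MY_SECRET", "20180605")
print(url.replace("MY_SECRET", "***"))  # mask the secret before displaying
```

In the notebook, the same function could build the whole list in one line, for example `[build_search_url(lat, lng, CLIENT_ID, CLIENT_SECRET, VERSION) for lat, lng in zip(mel_data['Latitude'], mel_data['Longitude'])]`.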

In [10]:
results = requests.get(url).json()
venues = results['response']['venues']
carlton = json_normalize(venues)
print('There are {} venues returned in Carlton within an 800m radius'.format(carlton.shape[0]))
carlton.head()

There are 30 venues returned in Carlton within an 800m radius


Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",False,5b146651c58ed7002cba5c78,306 Lygon St,AU,Carlton,Australia,,137,"[306 Lygon St, Carlton VIC 3053, Australia]","[{'label': 'display', 'lat': -37.799755, 'lng'...",-37.799755,144.967123,,3053,VIC,St. Charly,v-1591605363,
1,"[{'id': '4bf58dd8d48988d1d0941735', 'name': 'D...",False,5d46cb143350fe000875f6c9,346 Lygon St,AU,Carlton,Australia,,179,"[346 Lygon St, Carlton VIC 3053, Australia]","[{'label': 'display', 'lat': -37.799052, 'lng'...",-37.799052,144.967354,,3053,VIC,Lukumades,v-1591605363,
2,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",False,50f6173be4b0cdce511debaa,176 Faraday St,AU,Carlton,Australia,,198,"[176 Faraday St, Carlton VIC 3053, Australia]","[{'label': 'display', 'lat': -37.798637, 'lng'...",-37.798637,144.96846,Carlton,3053,VIC,Market Lane Coffee,v-1591605363,
3,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",False,4b0b8439f964a520053223e3,295 Drummond St,AU,Carlton,Australia,at Faraday St,163,"[295 Drummond St (at Faraday St), Carlton VIC ...","[{'label': 'display', 'lat': -37.798954, 'lng'...",-37.798954,144.96849,,3053,VIC,D.O.C. Pizza & Mozzarella Bar,v-1591605363,
4,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",False,4b2cc0bff964a52006c924e3,380 Lygon St,AU,Melbourne,Australia,Goldhar Pl.,263,"[380 Lygon St (Goldhar Pl.), Melbourne VIC 305...","[{'label': 'display', 'lat': -37.7981053541505...",-37.798105,144.967827,Lygon Street Italian Precinct,3053,VIC,Brunetti,v-1591605363,


So we have 30 venues in Carlton under the 'cafe' or 'coffee shop' categories. Let's see these on our map 

In [11]:
# add markers to map
for lat, lng, name in zip(carlton['location.lat'], carlton['location.lng'], carlton['name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mel)
    
    
map_mel

Now let's fill in the map with the rest of the venues for each suburb

In [12]:
# Create a list for all foursquare urls
url_list = []

for lat, lng in zip(mel_data['Latitude'], mel_data['Longitude']):
    categoryId = '4bf58dd8d48988d16d941735,4bf58dd8d48988d1e0931735' # cafe, coffee shop
    Latitude = '{}'.format(lat)
    Longitude = '{}'.format(lng)
    radius = 800 # define radius

    url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        Latitude, 
        Longitude, 
        categoryId,
        radius 
        )
    url_list.append(url)

In [13]:
for url in url_list:
    results = requests.get(url).json()
    venues = results['response']['venues']
    venues = json_normalize(venues)
    for lat, lng, name in zip(venues['location.lat'], venues['location.lng'], venues['name']):
        label = '{}'.format(name)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=1,
            popup=label,
            color= 'red',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map_mel)
map_mel

This map doesn't give us the greatest of visualisations, but we'll continue for now.

I want to get some counts for all venues in each suburb

In [14]:
venue_count = []
for url in url_list:
        results = requests.get(url).json()
        venues = results['response']['venues']
        venues = json_normalize(venues)
        venue_count.append(venues.shape[0])
        
venue_count = np.array(venue_count, dtype=np.int32)
print('Here are the individual counts: ',venue_count)
print('Here is the total number of venues we\'re working with:' ,venue_count.sum())

Here are the individual counts:  [30 30 30 30 16 29 30 28 30 16 11 30 30 30  1]
Here is the total number of venues we're working with: 371


It looks like the API returns at most 30 venues per request, so the counts for the busiest suburbs are probably capped. Some suburbs also have a very low count, which appears to be because the coordinates returned for them are poorly situated (Parkville's, for example, sit right on top of a zoo). Again, we'll continue.

## I'd like to try and get the average ratings for all venues in each suburb.

I'll need to generate new URLs to access the rating information, as it is not included in the search results shown below

In [15]:
carlton.head(2)

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",False,5b146651c58ed7002cba5c78,306 Lygon St,AU,Carlton,Australia,,137,"[306 Lygon St, Carlton VIC 3053, Australia]","[{'label': 'display', 'lat': -37.799755, 'lng'...",-37.799755,144.967123,,3053,VIC,St. Charly,v-1591605363,
1,"[{'id': '4bf58dd8d48988d1d0941735', 'name': 'D...",False,5d46cb143350fe000875f6c9,346 Lygon St,AU,Carlton,Australia,,179,"[346 Lygon St, Carlton VIC 3053, Australia]","[{'label': 'display', 'lat': -37.799052, 'lng'...",-37.799052,144.967354,,3053,VIC,Lukumades,v-1591605363,


In [16]:
# Generate url to access rating details of specific venues
venue_id = '50f6173be4b0cdce511debaa' # ID of Market Lane Coffee
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
url

'https://api.foursquare.com/v2/venues/50f6173be4b0cdce511debaa?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605'

In [17]:
# Accessing rating
venue_details = requests.get(url).json()
venue_details['response']['venue']['rating']

KeyError: 'venue'
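
A `KeyError: 'venue'` like the one above is raised whenever the response dictionary has no `venue` entry, which can happen if the request itself failed. A minimal sketch of a lookup that returns `None` instead of raising is shown here. The helper name `extract_rating` and the sample dictionaries are hypothetical; the nesting (`response`, then `venue`, then `rating`) is taken from the cell above.

```python
def extract_rating(venue_details):
    """Return the venue's rating, or None if any level of the nesting is missing."""
    response = venue_details.get("response") or {}
    venue = response.get("venue") or {}
    return venue.get("rating")


# Hypothetical responses illustrating the three cases.
with_rating = {"response": {"venue": {"name": "Example Cafe", "rating": 8.1}}}
without_rating = {"response": {"venue": {"name": "Example Cafe"}}}
failed_request = {"meta": {"code": 429}, "response": {}}

print(extract_rating(with_rating))     # 8.1
print(extract_rating(without_rating))  # None
print(extract_rating(failed_request))  # None
```

Returning `None` keeps "no rating" distinct from a real score, so the caller can decide whether to skip the venue or report it.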

Now that we know how to request a rating, I'll try to get the average rating for the whole suburb of Carlton

In [None]:
venue_rating = []
for venue_id in carlton.id: # iterating through every venue id in the dataframe to generate the url
    try:
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
        venue_details = requests.get(url).json()
        venue_rating.append(venue_details['response']['venue']['rating'])
    except KeyError: # Any venues without a rating raise a KeyError, and so are skipped instead
        pass
if venue_rating: # avoid dividing by zero when no ratings were retrieved
    venue_rating_average = sum(venue_rating) / len(venue_rating) # Gets the average of all ratings retrieved
    print('The average rating for coffee shops and cafes in Carlton is: ',venue_rating_average)
else:
    print('No ratings were retrieved for Carlton')

Now I want to get the averages for all suburbs. Unfortunately I'm still no Python expert and couldn't figure out how to automate this properly, so I re-ran the code below 15 times, changing the index in mel_data.Latitude/Longitude each time to get the average rating for each suburb (rounded to 2 decimals), and manually added the results to the variables you'll see further below.

In [None]:
# categoryId = '4bf58dd8d48988d16d941735,4bf58dd8d48988d1e0931735' # cafe, coffee shop
# radius = 800 # define radius

# url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}'.format(
#     CLIENT_ID, 
#     CLIENT_SECRET, 
#     VERSION, 
#     mel_data.Latitude.iloc[14], 
#     mel_data.Longitude.iloc[14], 
#     categoryId,
#     radius 
#         )
# results = requests.get(url).json()
# venues = results['response']['venues']
# suburb = json_normalize(venues)
# venue_rating = []
# for i in suburb.id:
#     try:
#         venue_id = i 
#         url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
#         venue_details = requests.get(url).json()
#         venue_details = venue_details['response']['venue']['rating']
    
#         venue_rating.append(venue_details)
#     except: # Any venues without a rating return an error, and so are skipped
#         pass
# venue_rating_average = sum(venue_rating) / len(venue_rating)
# suburb_average = venue_rating_average
# print(str(round(suburb_average, 2)))
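
The fifteen manual runs described above could be replaced by one loop over the suburbs. The sketch below shows the idea with the two API calls passed in as functions, so the logic can be run without network access. The names `fetch_venue_ids` and `fetch_rating` and the toy data are hypothetical stand-ins for the `requests.get(...)` calls in the cells above.

```python
def average_rating(ratings):
    """Mean of the ratings, or None when there are none to average."""
    return round(sum(ratings) / len(ratings), 2) if ratings else None


def suburb_averages(suburbs, fetch_venue_ids, fetch_rating):
    """Map each suburb name to the average rating of its rated venues.

    suburbs: iterable of (name, latitude, longitude)
    fetch_venue_ids(lat, lng) -> list of venue ids
    fetch_rating(venue_id) -> float, or None if the venue has no rating
    """
    averages = {}
    for name, lat, lng in suburbs:
        ratings = [fetch_rating(venue_id) for venue_id in fetch_venue_ids(lat, lng)]
        rated = [r for r in ratings if r is not None]  # skip unrated venues
        averages[name] = average_rating(rated)
    return averages


# Toy data standing in for the Foursquare responses.
toy_venues = {(-37.80, 144.97): ["a", "b", "c"], (-37.81, 144.94): ["d"]}
toy_ratings = {"a": 8.0, "b": 7.5, "c": None, "d": None}

result = suburb_averages(
    [("Carlton", -37.80, 144.97), ("West Melbourne", -37.81, 144.94)],
    fetch_venue_ids=lambda lat, lng: toy_venues[(lat, lng)],
    fetch_rating=toy_ratings.get,
)
print(result)
```

In the notebook, `suburbs` could be `zip(mel_data['Suburb'], mel_data['Latitude'], mel_data['Longitude'])`, and the resulting dictionary could be attached with `mel_data['Suburb'].map(result)`.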

There were no venues in West Melbourne that had any ratings (understandably so, because the coordinates for that suburb are located directly over a port).

In [None]:
Carlton_average = 7.87
Carlton_North_average = 7.57
Docklands_average = 7.38
East_Melbourne_average = 7.77
Flemington_average = 7.4
Kensington_average = 7.33
Melbourne_cbd_average = 7.66
St_Kilda_average = 7.27
North_Melbourne_average = 7.66
Parkville_average = 7.2
Port_Melbourne_average = 6.9
Southbank_average = 7.19
South_Wharf_average = 7.28
South_Yarra_average = 7.66
West_Melbourne_average = 1 # Placeholder as there were no venues in this suburb with a rating

In [None]:
all_averages = [Carlton_average,
Carlton_North_average,
Docklands_average,
East_Melbourne_average,
Flemington_average,
Kensington_average,
Melbourne_cbd_average,
St_Kilda_average,
North_Melbourne_average,
Parkville_average,
Port_Melbourne_average,
Southbank_average,
South_Wharf_average,
South_Yarra_average,
West_Melbourne_average]

In [None]:
all_averages
all_averages_df = pd.DataFrame(all_averages)
all_averages_df

Now that I finally have all the averages, I'm going to add them to our mel_data dataframe as a new column

In [None]:
mel_data['Suburb Averages'] = all_averages
mel_data

Now let's see these averages on the map as text

In [None]:
for lat, lng, average in zip(mel_data['Latitude'], mel_data['Longitude'], mel_data['Suburb Averages']):
    folium.map.Marker(
    [lat, lng],
    icon=DivIcon(
        icon_size=(150,36),
        icon_anchor=(0,0),
        html='<div style="font-size: 10pt">{}</div>'.format(average),
        )
    ).add_to(map_mel)
map_mel

Now let's see the dataframe in descending order of averages

In [None]:
mel_data.sort_values(['Suburb Averages'], ascending = False, inplace = True)
mel_data.to_csv('Results.csv')  # save the sorted results to disk
mel_data

Here we can see that Carlton has the highest average rating for coffee shops and cafes. West Melbourne was given a placeholder of 1 because it did not contain any venues with a rating.
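
Using `1` as a placeholder works for sorting, but it can be mistaken for a real rating and would distort any further statistics. One alternative, sketched here in plain Python with three of the averages from above, is to record a missing value as `None` and push it to the end of the ranking explicitly. The variable names are illustrative only.

```python
averages = {
    "Carlton": 7.87,
    "Port Melbourne": 6.9,
    "West Melbourne": None,  # no rated venues, so no average
}

# Sort by rating, highest first; suburbs without a rating go last.
ranking = sorted(
    averages.items(),
    key=lambda item: (item[1] is None, -(item[1] or 0)),
)

for suburb, rating in ranking:
    print(suburb, "n/a" if rating is None else rating)
```

With pandas, a similar effect can be had by storing `float('nan')` in the column and calling `sort_values` with `na_position='last'`.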

But what other insights can we extract from this data? What machine learning models can we apply to get a better understanding? What will it achieve?

# Results and Discussion

We now know that Carlton (7.87) has the highest average rating for coffee shops and cafes among the inner city suburbs of Melbourne, followed by East Melbourne (7.77). Melbourne CBD, North Melbourne and South Yarra share third place at 7.66. I would recommend that a newcomer to Melbourne choose from among these top-rated suburbs.

# Conclusion

There is a lot that could be done to improve these results. First, geospatial data could have been used to give a more accurate representation of the suburbs. In future versions of this notebook I would consider using a choropleth map to better visualise the results. I would also improve the location coordinates of the suburbs so that they sit in more urban areas, and not over a zoo or port where there are few venues to analyse.