The purpose of this notebook is to add the latitude and longitude coordinates of each neighborhood to the dataframe created in the Segmenting and Clustering Neighborhoods in Toronto_part1.ipynb notebook, so that the Foursquare location data can be used.

#### Importing required libraries


In [2]:
import pandas as pd
import requests   
import lxml
from bs4 import BeautifulSoup       


#### Scraping the Wikipedia page and converting its data into a pandas dataframe with three columns: Postalcode, Borough, Neighborhood

In [3]:
raw_data = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(raw_data,'lxml')
table=soup.find('table', attrs={'class':'wikitable sortable'})
df = pd.DataFrame(columns = ['Postalcode','Borough','Neighborhood'])
rows = table.find_all('tr')
for row in rows:
    data=[]
    for td in row.find_all('td'):
        data.append(td.text.strip())
    if len(data)==3:
        df.loc[len(df)] = data


#### Only the cells that have an assigned borough are kept; cells whose borough is Not assigned are ignored

In [4]:
df = df[df.Borough != "Not assigned"].copy()  # copy so later column assignments do not act on a view

#### If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough

In [5]:
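# use the borough of the same row when the neighborhood is missing or "Not assigned"
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'].isnull() | (df['Neighborhood'] == 'Not assigned'), df['Borough'])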
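
As a small illustration of the rule above, the following sketch applies it to a three-row frame. The rows are invented for the example and are not taken from the Wikipedia table. It assumes that a missing neighborhood shows up either as the literal string "Not assigned" or as a null value.

```python
import pandas as pd

# Toy data invented for illustration; not rows from the scraped table
toy = pd.DataFrame({
    'Postalcode': ['M1X', 'M2X', 'M3X'],
    'Borough': ["Queen's Park", 'North York', 'Scarborough'],
    'Neighborhood': ['Not assigned', 'Parkwoods', None],
})

# Rows whose neighborhood is null or explicitly "Not assigned"
unassigned = toy['Neighborhood'].isnull() | (toy['Neighborhood'] == 'Not assigned')

# Copy the borough of the same row into the neighborhood, for those rows only
toy.loc[unassigned, 'Neighborhood'] = toy.loc[unassigned, 'Borough']

print(toy)
```

Working with a boolean mask keeps the replacement row-wise: each affected row receives its own borough rather than the whole Borough column.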

#### Reading the csv file that includes the geographical coordinates of each postal code into a new data frame

In [6]:
coordinates_df=pd.read_csv('http://cocl.us/Geospatial_data')

We rename the "Postal Code" column to "Postalcode" so that coordinates_df can be merged with the df dataframe on that column

In [8]:
coordinates_df.rename(columns={'Postal Code':'Postalcode'},inplace=True)
new_df = pd.merge(df, coordinates_df, on='Postalcode')

In [9]:
new_df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
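
`pd.merge` performs an inner join by default, so a postal code that is missing from either frame is dropped without any message. The sketch below, on toy frames, shows one possible way to check for such rows with `how='left'` and `indicator=True`. The postal code `M9Z` is made up for the example, and the frame names `left`, `coords` and `checked` are not part of the notebook.

```python
import pandas as pd

# Toy frames for illustration; 'M9Z' is a made-up postal code with no coordinates
left = pd.DataFrame({'Postalcode': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Etobicoke']})
coords = pd.DataFrame({'Postalcode': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# A left merge keeps every row of `left`; indicator=True adds a `_merge`
# column recording whether a match was found in `coords`
checked = pd.merge(left, coords, on='Postalcode', how='left', indicator=True)

# Postal codes that found no coordinates
missing = checked.loc[checked['_merge'] == 'left_only', 'Postalcode'].tolist()
print(missing)
```

If `missing` comes back empty on the real data, the inner join used in the notebook loses nothing.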


### Exploring and clustering the neighborhoods in Toronto. Working with North York Borough

First, the necessary libraries are imported.

In [12]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# library for transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library



Next, an instance of the geocoder is defined with the user_agent NorthYork, and the coordinates of the North York borough are printed.

In [14]:
address = 'North York, Toronto, ON'

geolocator = Nominatim(user_agent="NorthYork")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of North York, Toronto, ON are 43.7543263, -79.44911696639593.


### Define Foursquare Credentials and Version

In [15]:
CLIENT_ID = '2BT23JO4VR1V04OTBSNAW51L05YP54X22GWNDJXUYFQN1HHO' # your Foursquare ID
CLIENT_SECRET = 'P5WEQHLO3NEWMJ1B3FVEI0AFN4Q5TW4KP5SKPST12DAVIKEV' # your Foursquare Secret
VERSION = '20200515'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 2BT23JO4VR1V04OTBSNAW51L05YP54X22GWNDJXUYFQN1HHO
CLIENT_SECRET:P5WEQHLO3NEWMJ1B3FVEI0AFN4Q5TW4KP5SKPST12DAVIKEV
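
Credentials written directly into a notebook are visible to anyone who can read it. One common alternative, sketched here under assumptions, is to read them from environment variables. The helper name `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical choices for this example, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        # Fail with a readable message instead of a bare KeyError
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Demonstration with an explicit mapping, so the sketch runs without real secrets
demo_id, demo_secret = load_foursquare_credentials({
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
})
print('CLIENT_ID loaded:', bool(demo_id))
```

In the notebook this would be called as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` after setting both variables in the shell that starts Jupyter.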


#### Searching for Italian venues within 4 km of North York

In [18]:
search_query = 'Italian'
radius = 4000
url_italian = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url_italian

'https://api.foursquare.com/v2/venues/search?client_id=2BT23JO4VR1V04OTBSNAW51L05YP54X22GWNDJXUYFQN1HHO&client_secret=P5WEQHLO3NEWMJ1B3FVEI0AFN4Q5TW4KP5SKPST12DAVIKEV&ll=43.7543263,-79.44911696639593&v=20200515&query=Italian&radius=4000&limit=30'
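
Building the URL with `str.format` works for a one-word query such as `Italian`, but it does not escape spaces or other special characters. A minimal sketch of an alternative, using `urllib.parse.urlencode` from the standard library, is given here. The helper name `build_search_url` is hypothetical, and the credentials in the demonstration call are placeholders.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the venues/search URL with percent-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


# Placeholder credentials; a two-word query shows the escaping
example_url = build_search_url('ID', 'SECRET', 43.7543263, -79.44911696639593,
                               '20200515', 'Italian Eatery', 4000, 30)
print(example_url)
```

`requests.get` also accepts a `params` dictionary and performs the same encoding, so the dictionary could be passed to it directly instead of building the string by hand.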

#### Send the GET Request and examine the results

In [23]:
results = requests.get(url_italian).json()
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # column is named 'venue.categories' in some response formats
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Jamie's Italian,Italian Restaurant,3401 Dufferin St,CA,Toronto,Canada,Allen Road and 401,3093,"[3401 Dufferin St (Allen Road and 401), Toront...","[{'label': 'display', 'lat': 43.72668644119483...",43.726686,-79.453133,M6A 2T9,ON,566344c9498eedf4e11af0fa
1,Saggio Italian Eatery & Espresdo Bar,Italian Restaurant,,CA,Toronto,Canada,,2265,"[Toronto ON, Canada]","[{'label': 'display', 'lat': 43.75834033292955...",43.75834,-79.476741,,ON,4de3e16efa7651589f21395e
2,Italian canadian savings and credit union,Building,Dufferin,CA,Toronto,Canada,,5131,"[Dufferin, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.70835683159475...",43.708357,-79.453877,,ON,52430d7611d2b3a076a88132
3,Dora's Italian,Italian Restaurant,,CA,Toronto,Canada,,2595,"[Toronto ON, Canada]","[{'label': 'display', 'lat': 43.76812, 'lng': ...",43.76812,-79.475147,,ON,4cfe726a084f54811f969009
4,Cumpari's Italian Eatery,Italian Restaurant,3610 dufferin street,CA,Toronto,Canada,Dufferin/Wilson,2620,"[3610 dufferin street (Dufferin/Wilson), Toron...","[{'label': 'display', 'lat': 43.73211827821453...",43.732118,-79.459921,M3K 1N7,ON,52b5da0e498e96708cf1c974
5,La Paloma Italian Gelateria & Cafe,Ice Cream Shop,,CA,,Canada,,3144,[Canada],"[{'label': 'display', 'lat': 43.72617409960949...",43.726174,-79.452317,,,4fca400ae4b0ba2d58c1a97f
6,San Genaro Italian Eatery,Italian Restaurant,3500 Dufferin St,CA,Toronto,Canada,,3821,"[3500 Dufferin St, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.72034199204557...",43.720342,-79.455828,,ON,51830da6498eafb14b40f22c
7,Italiana Food Tech,Office,,CA,Toronto,Canada,,3323,"[Toronto ON M2M 2H8, Canada]","[{'label': 'display', 'lat': 43.778259, 'lng':...",43.778259,-79.473842,M2M 2H8,ON,5afc6bea829b0c002c8ee4e8
8,Sandra's Italian Kitchen,,"2899 Steeles Ave W., Unit 12",CA,Toronto,Canada,Petrolia Rd.,4583,"[2899 Steeles Ave W., Unit 12 (Petrolia Rd.), ...","[{'label': 'display', 'lat': 43.78199532390047...",43.781995,-79.491345,M3J 3B2,ON,4d8b6fed7139b1f7b6bbdfd4
9,Paisano's,Italian Restaurant,116 Willowdale Ave,CA,Toronto,Canada,Don Mills,4116,"[116 Willowdale Ave (Don Mills), Toronto ON M3...","[{'label': 'display', 'lat': 43.76451778181641...",43.764518,-79.399898,M3B 1Y6,ON,4b69ad98f964a520f3ac2be3
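
The `distance` column appears to hold the distance in metres from the search point, and a few rows in the table above report a value larger than the requested radius of 4000. The sketch below recomputes the great-circle distance with the haversine formula and keeps only venues that fall inside the radius. The function names and the three-venue list are assumptions for the example; the coordinates are copied from the table, and the Earth radius is the usual mean value of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_radius(venues, centre_lat, centre_lng, radius_m):
    """Return the venues whose recomputed distance is at most radius_m."""
    return [v for v in venues
            if haversine_m(centre_lat, centre_lng, v['lat'], v['lng']) <= radius_m]


# Search centre and three venues copied from the output above
centre = (43.7543263, -79.44911696639593)
sample = [
    {'name': "Jamie's Italian", 'lat': 43.726686, 'lng': -79.453133},
    {'name': 'Italian canadian savings and credit union', 'lat': 43.708357, 'lng': -79.453877},
    {'name': "Paisano's", 'lat': 43.764518, 'lng': -79.399898},
]

nearby = within_radius(sample, centre[0], centre[1], 4000)
print([v['name'] for v in nearby])
```

On the real dataframe the same filter could be written more directly as `dataframe_filtered[dataframe_filtered['distance'] <= radius]`, assuming the `distance` column is trusted.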


Now we can show the venues on the map

In [24]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on North York

# add the returned venues as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map