<h1> Capstone Project - Find the Neighbourhood <h1/>

<h3> <i> Introduction/Business Problem </i> </h3>

<p> This project is for a prospective client who will need to find the best location/neighbourhood to open a new Yoga studio that specializes on Power Yoga and Yin-Yang Yoga.</p>

<p>There are several key factors to research, before selecting a particular location/neighbourhood. Such as, proximity of yoga studios and gyms around, types of restaurants nearby and strength/age/gender of the population. </p>

<p> Conducting a research is very important - as it will provide the optimum neightbourhood(s) to successfully run a new yoga studio. </p> <br>

<h3> <i> Data section </i> </h3>

<p> This project focuses on finding the best location/neighbourhood in the city of North York </p>
<p> As offered, the above research will be attained by leveraging Foursquare location data and perhaps other datasets might be approached as well. </p>
<p> Mainly, Foursquare location data will be used to search for surrounding yoga studios, gyms, restaurants and other venues. These are all key factors in deciding the optimum neighbourhood(s). </p>

<h6> Examples </h6>

<p>1. For each neighbourhood, a count of nearby yoga studios, gyms, etc will be calculated to compare. For example:
search_query = 'Yoga' in the url. </p>
<p>2. Next the filtered neighbourhoods will be mapped using Folium to get a visual representation.</p>
<p>3. if possible to get the data, the population will be compared to determine the optimum neighbourhood.</p>
<p> </p>


<br><h4> <i>Data Analysis </i> </h4> <br>


<br><h5><i> ------------------------------------------------------Installing neccessary modules  ------------------------------------------------------- </i></h5>

In [2]:
pip install beautifulsoup4

In [4]:
pip install lxml

In [8]:
!conda install -c conda-forge geopy --yes 

In [7]:
import numpy as np
import pandas as pd
import requests
import bs4
import lxml
import random 
from math import pi
from math import sqrt
from geopy.geocoders import Nominatim 
from IPython.display import Image 
from IPython.core.display import HTML 
from pandas.io.json import json_normalize
import folium 
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt 
from sklearn.cluster import KMeans 
from sklearn.datasets.samples_generator import make_blobs 
%matplotlib inline
from geopy.distance import geodesic
from sklearn.preprocessing import StandardScaler


<br><h5><i> -------------------------------------------------------------Geospatial Data  -------------------------------------------------------------- </i></h5>

In [10]:
!wget -O 'Geospatial_data.csv' http://cocl.us/Geospatial_data # Latitude and Longitude for each neighbourhood

In [11]:
geo_df = pd.read_csv('Geospatial_data.csv')
geo_df.sort_values(by='Postal Code', ascending=True, inplace=True)
geo_df = geo_df.rename(columns={"Postal Code": "Postal_Code"})
#geo_df.head()


<br><h5><i> -----------------------------------------------------------Neighbourhood Data  ----------------------------------------------------------- </i></h5>

In [12]:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url,'lxml')
My_table = soup.find('table',{'class':'wikitable sortable'})


#Converting HTML code into a list object
output_rows = []
for table_row in My_table.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)

    
#Converting list object to DataFrame object
df = pd.DataFrame(output_rows)
df = df.drop(df.index[0]) #---> because first row was empty
df.columns = ['PostalCode', 'Borough', 'Neighbourhood'] # Adding column names to the table


#Below loop removes last 2 characters ("/n") from the Neighbourhood column which came during earlier conversion. 
neighbourhood = df['Neighbourhood'].values
new_neighbourhood = [] #new column with last two characters removed
for n in neighbourhood:
    new_neigh = n[:-1]
    new_neighbourhood.append(new_neigh)
    
#replacing neighbourhood column with new_neighbourhood column    
df.drop(columns = 'Neighbourhood', inplace = True)
df['Neighbourhood'] = new_neighbourhood # replacing old column with the new column
df = df[df.Borough != 'Not assigned'] # removing boroughs that are "Not Assigned"
df = df.groupby(['PostalCode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()

#Checking if the order of postal codes are same in both df anf Geo_df tables
check = []
p_codes_df = df['PostalCode'].values # from main Neighbourhood table
p_codes_geo_df =geo_df['Postal_Code'].values # from geo table
rows = np.arange(103)
for r in rows:
    if p_codes_df[r]==p_codes_geo_df[r]: # checking if the postal codes in same row of both tables match
        check.append('True')
    else:
        check.append('False')

False in check #if False, then all are in same order!

False

In [13]:
df['Latitude'] = geo_df['Latitude'] # added the latitude column
df['Longitude'] = geo_df['Longitude'] # added the longitude column
#df.head()

north_york_df = df[df.Borough == 'North York'] # creating a dataframe with only North York neighbourhoods
north_york_df.reset_index(drop=True, inplace=True)
#north_york_df


<br><h5><i> ------------------------------------------------------------Foursquare Data  ------------------------------------------------------------ </i></h5>

In [14]:
CLIENT_ID = 'UFGXTXBXX4HICDVMBCNFCB241NQPLMSPYGLNENOAXQE0JEKE'
CLIENT_SECRET = 'PMV4NWYWBMZNZGG4SXRAZGLZKJEL1O0UBS110CP131DNA1NW'
VERSION = '20180604'

In [15]:
address = 'North York, Ontario, Canada'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
ny_latitude = location.latitude
ny_longitude = location.longitude

<h6>Calculate the distance of each neighbourhood from the center of North York...............</h6>

In [16]:
NorthYorkArea = 176.98*1000*1000
Radius = round(sqrt((NorthYorkArea/pi)))

Distance = []
NorthYork = (ny_latitude, ny_longitude)
Rows = north_york_df.shape[0]

for R in np.arange(Rows):
    neigh = (north_york_df.iloc[R,3], north_york_df.iloc[R,4])
    d = geodesic(NorthYork, neigh).m
    #print(d)
    Distance.append(d)

north_york_df['Distance'] = Distance
north_york_df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance
0,M2H,North York,Hillcrest Village,43.803762,-79.363452,8816.425669
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,8684.649208
2,M2K,North York,Bayview Village,43.786947,-79.385975,6243.564234
3,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,6002.284234
4,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,5058.190506



<br><h5><i> ------------------------------------------------------Search for Yoga places around  ------------------------------------------------------ </i></h5>

In [17]:
Farthest_neighbourhood = max(north_york_df['Distance'].values)
#Adding 10km to the Radius because, average people travel for 15mins (~10km)
radius = (Farthest_neighbourhood +10000) 
LIMIT = 100
search_query = 'Yoga'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, ny_latitude, ny_longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
venues = results['response']['venues'] # assign relevant part of JSON to venues
dataframe = json_normalize(venues) # tranform venues into a dataframe

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]
yoga_df = dataframe_filtered

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

# generate map centred around the Conrad Hotel
venues_map = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=11) 

# add a red circle marker to represent North York
folium.features.CircleMarker(
    [ny_latitude, ny_longitude],
    radius=10,
    color='red',
    popup='Conrad Hotel',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6,
    zoom = 20
).add_to(venues_map)

# add the Yoga studios as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

#Adding the neighbourhoods of North York
for lat, lng, label in zip(north_york_df.Latitude, north_york_df.Longitude, north_york_df.Neighbourhood):
    folium.features.CircleMarker(
        [lat, lng],
        radius=10,
        color='orange',
        popup=label,
        fill = True,
        fill_color='orange',
        fill_opacity=0.6
    ).add_to(venues_map)
    
# Most populated are M2N and M2J ----> from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

venues_map


<br><h5><i> --------------------------------------------------------Search for Gyms around  -------------------------------------------------------- </i></h5>

In [18]:
search_query = 'Gym'
radius = (Farthest_neighbourhood +10000)
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, ny_latitude, ny_longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()

# assign relevant part of JSON to venues
gym = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(gym)
#dataframe.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
df_gym = dataframe.loc[:, filtered_columns]
Gyms = df_gym

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_gym['categories'] = df_gym.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_gym.columns = [column.split('.')[-1] for column in df_gym.columns]

# add a red circle marker to represent city of Toronto
for lat, lng, label in zip(df_gym.lat, df_gym.lng, df_gym.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=4,
        color='green',
        popup=label,
        fill = True,
        fill_color='green',
        fill_opacity=0.7
    ).add_to(venues_map)

# display map
venues_map


<br><h5><i> --------------------------------------------------------Cleaning Yoga locations data  ------------------------------------------------------ </i></h5>

In [19]:
#yoga_df.columns
yoga_df = yoga_df[['name','location.lat','location.lng', 'location.distance','location.postalCode']]
new_columns = ['Name','Latitude', 'Longitude', 'Distance','PostalCode']
yoga_df.columns = new_columns

Names = ['The Basement Yoga & Fitness Company Inc', 'Aum Centre Yoga', 'Ahimsa Yoga Centre', 
         'Downward Dog Yoga Centre-Beaches', 'Yoga Tree Vaughan', 'Yoga Tree Bay and Dundas',
         'Jaya Yoga Centre', 'Iyengar Yoga School Of Toronto', 'Ashtanga Yoga Centre',
         'Moksha Yoga Thornhill', 'Moksha Yoga Uptown', 'Yoga Heritage Healing Arts', 'Breathe Yoga Studio',
         'Yoga Therapy Toronto','Yin Yang Yoga Loft']
PC = ['M3H 2S1','L4K 4K5','M5S 1X5','M6J1G1','L4K 5B4','M5G 1C4','M2N 6A3','M4N 2M7','M4N 2L3','L3T 1P2',
      'M4T 1Z6','M5R 3G9','M6P 1Y8','M6G 1L4','L4J 1Y9']

Rows = yoga_df.shape[0]

for R in np.arange(Rows):
    yoga_name = yoga_df.iloc[R,0]
    i = 0
    for n in Names:
        if n == yoga_name:
            yoga_df.iloc[R,4] = PC[i]
            break
        i = i+1
        
yoga_df.dropna(inplace=True)
yoga_df['PostalCode'] = yoga_df['PostalCode'].str.lstrip()
yoga_df['PostalCode'] = yoga_df['PostalCode'].str[:3]
yoga_df['PostalCode'] = yoga_df['PostalCode'].str.upper()
yoga_df.head()     

Unnamed: 0,Name,Latitude,Longitude,Distance,PostalCode
0,The Basement Yoga & Fitness Company Inc,43.7556,-79.440141,735,M3H
1,Hot Yoga Wellness International,43.785537,-79.47795,4176,L4K
2,Sivananda Yoga Centre,43.662754,-79.402951,10849,M5S
3,The Yoga Sanctuary,43.661499,-79.383636,11599,M5G
4,Oxygen Yoga & Fitness,43.638176,-79.41799,13170,M6K



<br><h5><i> --------------------------------------------------------Cleaning Gym locations data  ------------------------------------------------------ </i></h5>

In [20]:
#Gyms_df.columns
Gyms = Gyms[['name','lat','lng', 'distance','postalCode']]
new_columns = ['Name','Latitude', 'Longitude', 'Distance','PostalCode']
Gyms.columns = new_columns

Names = ['The Gym at the Shangri-La','Concorde Park Condo Gym','Action Reaction Gym']
PC = ['M5H 0A3','M3C 3M8','M2H 2C9']

Rows = Gyms.shape[0]

for R in np.arange(Rows):
    Gym_name = Gyms.iloc[R,0]
    i = 0
    for n in Names:
        if n == Gym_name:
            Gyms.iloc[R,4] = PC[i]
            break
        i = i+1
        
Gyms.dropna(inplace=True)
Gyms['PostalCode'] = Gyms['PostalCode'].str.lstrip()
Gyms['PostalCode'] = Gyms['PostalCode'].str[:3]
Gyms['PostalCode'] = Gyms['PostalCode'].str.upper()
Gyms.head()    

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-ve

Unnamed: 0,Name,Latitude,Longitude,Distance,PostalCode
0,Spectrum I Gym,43.756378,-79.408334,3287,M2N
3,The Gym at the Shangri-La,43.648774,-79.386517,12784,M5H
5,M1 Gym,43.751125,-79.463981,1247,M3K
7,Gym At 25 Montgomery,43.709061,-79.39987,6409,M4R
8,Scarborough Gym-Elites Club,43.764712,-79.16649,22752,M1E



<br><h5><i> --------------------------------------Count Yoga locations within 10Km of each neighbourhood--------------------------------------</i></h5>

In [22]:
Rows = north_york_df.shape[0]
rows = yoga_df.shape[0]
Counts = []

for R in np.arange(Rows):
    c=0
    n_location = (north_york_df.iloc[R,3], north_york_df.iloc[R,4])
    for r in np.arange(rows):
        y_location = (yoga_df.iloc[r,1], yoga_df.iloc[r,2])
        d = geodesic(n_location, y_location).m
        if d <= 8000:
            c += 1
    Counts.append(c)
    
north_york_df['Yoga'] = Counts
north_york_df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  from ipykernel import kernelapp as app


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance,Yoga
0,M2H,North York,Hillcrest Village,43.803762,-79.363452,8816.425669,9
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,8684.649208,11
2,M2K,North York,Bayview Village,43.786947,-79.385975,6243.564234,15
3,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,6002.284234,14
4,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,5058.190506,15



<br><h5><i> --------------------------------------Count Gym locations within 10Km of each neighbourhood--------------------------------------</i></h5>

In [26]:
Rows = north_york_df.shape[0]
rows = Gyms.shape[0]
Counts = []

for R in np.arange(Rows):
    c=0
    n_location = (north_york_df.iloc[R,3], north_york_df.iloc[R,4])
    for r in np.arange(rows):
        g_location = (Gyms.iloc[r,1], Gyms.iloc[r,2])
        d = geodesic(n_location, y_location).m
        if d <= 8000:
            c += 1
    Counts.append(c)
    
north_york_df['Gym'] = Counts
north_york_df.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  from ipykernel import kernelapp as app


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance,Yoga,Gym
0,M2H,North York,Hillcrest Village,43.803762,-79.363452,8816.425669,9,26
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,8684.649208,11,26
2,M2K,North York,Bayview Village,43.786947,-79.385975,6243.564234,15,26
3,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,6002.284234,14,0
4,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,5058.190506,15,26



<br><h5><i> -----------------------------------------------------------K-Means Clustering----------------------------------------------------------</i></h5>

In [27]:
X = north_york_df.values[:,6:]
X = np.nan_to_num(X)
Clus_dataSet = StandardScaler().fit_transform(X)
Clus_dataSet



array([[-0.5596376 ,  1.94935887],
       [-0.19165671,  1.94935887],
       [ 0.54430506,  1.94935887],
       [ 0.36031462, -0.51298918],
       [ 0.54430506,  1.94935887],
       [ 0.7282955 , -0.51298918],
       [ 0.54430506, -0.51298918],
       [ 0.36031462,  1.94935887],
       [-0.74362804, -0.51298918],
       [ 0.36031462, -0.51298918],
       [ 0.54430506, -0.51298918],
       [ 0.54430506, -0.51298918],
       [-0.00766627, -0.51298918],
       [ 0.54430506, -0.51298918],
       [-1.66358026, -0.51298918],
       [-0.92761848, -0.51298918],
       [-1.29559937, -0.51298918],
       [-0.37564715, -0.51298918],
       [ 1.46425728, -0.51298918],
       [ 1.09627639, -0.51298918],
       [ 2.0162286 , -0.51298918],
       [-0.00766627, -0.51298918],
       [-1.8475707 , -0.51298918],
       [-2.03156114, -0.51298918]])

In [41]:
clusterNum = 3
k_means = KMeans(init = "k-means++", n_clusters = clusterNum, n_init = 12)
k_means.fit(X)
labels = k_means.labels_
print(labels)

[1 1 1 0 1 0 0 1 2 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2]


In [42]:
north_york_df["Clus_km"] = labels
north_york_df.head(5)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance,Yoga,Gym,Clus_km
0,M2H,North York,Hillcrest Village,43.803762,-79.363452,8816.425669,9,26,1
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,8684.649208,11,26,1
2,M2K,North York,Bayview Village,43.786947,-79.385975,6243.564234,15,26,1
3,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,6002.284234,14,0,0
4,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,5058.190506,15,26,1


In [43]:
k_0 = north_york_df[north_york_df.Clus_km == 0]
k_1 = north_york_df[north_york_df.Clus_km == 1]
k_2 = north_york_df[north_york_df.Clus_km == 2]
k_3 = north_york_df[north_york_df.Clus_km == 3]

In [44]:
#k_0
for lat, lng, label in zip(k_0.Latitude, k_0.Longitude, k_0.PostalCode):
    folium.features.CircleMarker(
        [lat, lng],
        radius=7,
        color='purple',
        fill = True,
        fill_color='purple',
        fill_opacity=0.6
    ).add_to(venues_map)

#k_1
for lat, lng, label in zip(k_1.Latitude, k_1.Longitude, k_1.PostalCode):
    folium.features.CircleMarker(
        [lat, lng],
        radius=7,
        color='yellow',
        fill = True,
        fill_color='yellow',
        fill_opacity=0.6
    ).add_to(venues_map)
    
#k_2
for lat, lng, label in zip(k_2.Latitude, k_2.Longitude, k_2.PostalCode):
    folium.features.CircleMarker(
        [lat, lng],
        radius=7,
        color='white',
        fill = True,
        fill_color='white',
        fill_opacity=0.6
    ).add_to(venues_map)
    
#k_3
for lat, lng, label in zip(k_3.Latitude, k_3.Longitude, k_3.PostalCode):
    folium.features.CircleMarker(
        [lat, lng],
        radius=7,
        color='black',
        fill = True,
        fill_color='black',
        fill_opacity=0.6
    ).add_to(venues_map)
    
venues_map

In [52]:
k_0

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance,Yoga,Gym,Clus_km
3,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,6002.284234,14,0,0
5,M2N,North York,Willowdale South,43.77012,-79.408493,3712.229646,16,0,0
6,M2P,North York,York Mills West,43.752758,-79.400049,3955.623352,15,0,0
9,M3B,North York,Don Mills North,43.745906,-79.352188,7862.717756,14,0,0
10,M3C,North York,"Flemingdon Park, Don Mills South",43.7259,-79.340923,9270.243571,15,0,0
11,M3H,North York,"Bathurst Manor, Downsview North, Wilson Heights",43.754328,-79.442259,552.291837,15,0,0
12,M3J,North York,"Northwood Park, York University",43.76798,-79.487262,3425.910212,12,0,0
13,M3K,North York,"CFB Toronto, Downsview East",43.737473,-79.464763,2257.113302,15,0,0
18,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,3326.05133,20,0,0
19,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,4173.452665,18,0,0


In [53]:
k_1

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance,Yoga,Gym,Clus_km
0,M2H,North York,Hillcrest Village,43.803762,-79.363452,8816.425669,9,26,1
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,8684.649208,11,26,1
2,M2K,North York,Bayview Village,43.786947,-79.385975,6243.564234,15,26,1
4,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,5058.190506,15,26,1
7,M2R,North York,Willowdale West,43.782736,-79.442259,3204.510713,14,26,1


In [54]:
k_2

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance,Yoga,Gym,Clus_km
8,M3A,North York,Parkwoods,43.753259,-79.329656,9621.732584,8,0,2
14,M3L,North York,Downsview West,43.739015,-79.506944,4958.708536,3,0,2
15,M3M,North York,Downsview Central,43.728496,-79.495697,4723.924685,7,0,2
16,M3N,North York,Downsview Northwest,43.761631,-79.520999,5845.423121,5,0,2
17,M4A,North York,Victoria Village,43.725882,-79.315572,11212.410701,10,0,2
22,M9L,North York,Humber Summit,43.756303,-79.565963,9412.79138,2,0,2
23,M9M,North York,"Emery, Humberlea",43.724766,-79.532242,7458.360296,1,0,2



<br><h5><i> -----------------------------------------------------------Further analysis----------------------------------------------------------</i></h5>

In [57]:
k = k_2.iloc[-2:,:] #narrowing down from chosen third cluster
k

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Distance,Yoga,Gym,Clus_km
22,M9L,North York,Humber Summit,43.756303,-79.565963,9412.79138,2,0,2
23,M9M,North York,"Emery, Humberlea",43.724766,-79.532242,7458.360296,1,0,2


In [None]:
#Adding to the venue_map
for lat, lng, label in zip(k.Latitude, k.Longitude, k.PostalCode):
    folium.features.CircleMarker(
        [lat, lng],
        radius=7,
        color='black',
        fill = True,
        fill_color='black',
        fill_opacity=0.6
    ).add_to(venues_map)
    
venues_map


<br><h5><i> -----------------------------------------------------------Evaluation----------------------------------------------------------</i></h5>

In [64]:
#M9L
Rows = north_york_df.shape[0]
rows = yoga_df.shape[0]

Yoga_place = []
yoga_distance =[]

for R in np.arange(Rows):
    n_location = (north_york_df.iloc[R,3], north_york_df.iloc[R,4])
    if north_york_df.iloc[R,0] == 'M9L':
        for r in np.arange(rows):
            y_location = (yoga_df.iloc[r,1], yoga_df.iloc[r,2])
            d = geodesic(n_location, y_location).m
            if d <= 8000:
                name = yoga_df.iloc[r,0]
                Yoga_place.append(name)
                yoga_distance.append(d)

Yoga_place, yoga_distance

(['Hot Yoga Wellness International', 'Yoga Bodies'],
 [7795.283251872631, 7568.605503371539])

In [65]:
#M9M
Rows = north_york_df.shape[0]
rows = yoga_df.shape[0]

Yoga_place = []
yoga_distance =[]

for R in np.arange(Rows):
    n_location = (north_york_df.iloc[R,3], north_york_df.iloc[R,4])
    if north_york_df.iloc[R,0] == 'M9M':
        for r in np.arange(rows):
            y_location = (yoga_df.iloc[r,1], yoga_df.iloc[r,2])
            d = geodesic(n_location, y_location).m
            if d <= 8000:
                name = yoga_df.iloc[r,0]
                Yoga_place.append(name)
                yoga_distance.append(d)

Yoga_place, yoga_distance

(['Yoga Tree'], [6948.201567360444])