## The Capstone Project - Battle of the Neighborhoods


### Migrating to Quebec: a decision based on the right choice of a school

#### Introduction

##### Background

A few years ago my wife and I started the process of emigrating to Quebec. At the time we had two young children, and one of our biggest concerns was choosing a good neighborhood. Access to a nearby school, a hospital, markets and bus routes was a very important factor in the final decision.

According to the 2016 census, about 70,000 Colombians live in Canada, of whom 25,000 are in the province of Quebec and approximately 10% reside in the Quebec census metropolitan area (CMA).

Although Latin American emigration to Canada was historically driven by social reasons, many prospective emigrants today have strong academic profiles and are able to choose good neighborhoods to live in.

An important factor to consider is the effort that Canada is making in its 2020-2022 immigration plan to increase the number of immigrants while maintaining a balance between that number and the adaptability and productivity of the newcomers.

This indicates that many families will continue to face the same need to choose a neighborhood carefully.

##### Problem

What is the best neighborhood to live in for the many people who start this long process of emigration? The choice is more complicated when the family also has school-age children. Obtaining a list of schools is not a difficult task, but other important places must be considered. For this analysis, we will select schools that have cafes, restaurants and supermarkets nearby.

##### Audience

Those interested in this project are parents who are already in the process of emigrating to Quebec with their family and have not yet chosen a neighborhood to settle in.

### Data Section

#### Data acquisition
The data set consists of a list of schools in Quebec City (cégeps, colleges, universities and secondary schools) along with their coordinates. To search for nearby points of interest we will use Foursquare.

The list of schools includes the name, the type of school, and its coordinates (latitude and longitude).

For the venue search we will use a radius of 1000 meters and limit ourselves to 100 venues per school.


#### Import libraries

In [1]:
# import the libraries
import pandas as pd
import numpy as np

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

import wikipedia as wp

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

from ipywidgets import interact, fixed

import seaborn as sns



#### Obtain Quebec Coordinates

In [2]:
address = 'quebec, quebec, ca'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Quebec are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Quebec are 46.8259601, -71.2352226.


#### Load Quebec School Data

In [56]:
df_QuebecPoints = pd.read_excel("PointofInterest.xlsx")

#### Create a map of Quebec using latitude and longitude values

In [57]:
# create map of Quebec using latitude and longitude values
map_quebec = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, school in zip(df_QuebecPoints['latitude'], df_QuebecPoints['longitude'], df_QuebecPoints['point']):
    label = '{}'.format(school)    
    label = folium.Popup(label, parse_html=True)
        
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_quebec)  
    
map_quebec

In [58]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (placeholder; never publish the real value)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (placeholder; never publish the real value)
VERSION = '20200505' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET
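
Credentials printed in a notebook end up in every exported copy of it. As a minimal sketch of an alternative, the keys can be read from environment variables and masked before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names below are hypothetical; use whatever names suit your setup.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep only the first few characters of a secret for display."""
    return secret[:visible] + "*" * (len(secret) - visible)
```

With this in place, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` could replace the hard-coded assignments, and `print(mask(CLIENT_ID))` shows enough to confirm that a key was loaded without revealing it.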


#### Set search parameters

In [59]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius

#### Explore the first school in our dataset.

In [60]:
df_QuebecPoints.loc[0, 'point']

'Cégep Limoilou'

In [61]:
school_latitude  = df_QuebecPoints.loc[0, 'latitude'] 
school_longitude = df_QuebecPoints.loc[0, 'longitude'] 

school_name = df_QuebecPoints.loc[0, 'point'] 

print('Latitude and longitude values of {} are {}, {}.'.format(school_name, 
                                                               school_latitude, 
                                                               school_longitude))

Latitude and longitude values of Cégep Limoilou are 46.83017655, -71.22696367322604.


In [62]:
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    school_latitude, 
    school_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20200505&ll=46.83017655,-71.22696367322604&radius=1000&limit=100'

In [63]:
results = requests.get(url).json()
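
The request URL above is assembled with string formatting. A minimal sketch of the same query built from a dictionary of parameters is shown here, using only the standard library so that values are percent-encoded automatically. The helper name `build_explore_url` is hypothetical; the endpoint and parameter names are the ones already used in this notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL from named parameters instead of one long format string."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues to return
    }
    # urlencode percent-encodes special characters (the comma in "ll" becomes %2C)
    return EXPLORE_ENDPOINT + "?" + urlencode(params)
```

Keeping the parameters in a dictionary also makes it harder to forget one of them, such as `radius`, when the call is wrapped in a function later.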

In [64]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [65]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | La Souche | Brewery | 46.829038 | -71.225458 |
| 1 | La Planque | Restaurant | 46.826714 | -71.230191 |
| 2 | Nektar Caféologue | Café | 46.826568 | -71.230025 |
| 3 | Bal du Lézard | Bar | 46.826802 | -71.230251 |
| 4 | Fournée Bio (La) | Bakery | 46.828384 | -71.231527 |
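
To check that a returned venue really lies within the search radius, the great-circle distance between the school and the venue can be computed from the coordinates printed above. This is a minimal sketch using the haversine formula; the function name `haversine_m` and the mean Earth radius constant are assumptions added for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# coordinates taken from the outputs above
school = (46.83017655, -71.22696367322604)  # Cégep Limoilou
la_souche = (46.829038, -71.225458)         # first venue in the table

distance = haversine_m(school[0], school[1], la_souche[0], la_souche[1])
print("La Souche is about {:.0f} m from Cégep Limoilou".format(distance))
```

Applying the same function to every row of `nearby_venues` would give a distance column that can be compared against the chosen radius.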


#### Explore all schools in our dataset

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['school', 
                  'school Latitude', 
                  'school Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
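
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]`, which would raise an `IndexError` for any venue returned without a category. The sketch below shows one way to extract the same seven fields defensively. The helper `venue_rows` and the mock items are hypothetical and only mimic the response structure that the notebook already relies on.

```python
def venue_rows(school, lat, lng, items):
    """Turn Foursquare 'items' into row tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            school, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# mock data shaped like the response items used in this notebook
mock_items = [
    {"venue": {"name": "La Souche",
               "location": {"lat": 46.829038, "lng": -71.225458},
               "categories": [{"name": "Brewery"}]}},
    {"venue": {"name": "Uncategorized venue (made up)",
               "location": {"lat": 46.83, "lng": -71.22},
               "categories": []}},
]

rows = venue_rows("Cégep Limoilou", 46.83017655, -71.22696367322604, mock_items)
```

Rows with a `None` category can then be dropped or kept explicitly, rather than stopping the whole loop on the first incomplete venue.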

In [67]:
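# NOTE: radius is not passed here, so getNearbyVenues falls back to its
# default of 500 m rather than the 1000 m set above; pass radius=radius
# to search the full 1000 m.
quebec_venues = getNearbyVenues(names      = df_QuebecPoints['point'],
                                latitudes  = df_QuebecPoints['latitude'],
                                longitudes = df_QuebecPoints['longitude'])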


Cégep Limoilou
Cégep de Sainte-Foy
Cégep Garneau
Collège régional Champlain St. Lawrence
Collège Mérici
Collège Bart
Collège O'Sullivan de Québec
Collège CDI
Aviron Québec College Technique
Université Laval
Université du Quebec 
École nationale d'administration publique
Institut national de la recherche scientifique
TÉLUQ o Télé-Université
Secondary School la courvilloise
Ecole secondaire la seigneurieu
Ecole secondaire francois-bourrin
Ecole des Sentiers
Ecole le Sommet
Ecole secondaire l'Odissée
Ecole secondaire Roger-Comtois
Neufchatel High School
La Camaradiere High School
School Secondary of the Cité
Ecole secondaire Quebec High School
College Francois-de-laval
Ecole Jean-de-Brebeuf
de Rochebelle High School
Middle School des Compagnons


In [68]:
quebec_venues.head()

|   | school | school Latitude | school Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Cégep Limoilou | 46.830177 | -71.226964 | La Souche | 46.829038 | -71.225458 | Brewery |
| 1 | Cégep Limoilou | 46.830177 | -71.226964 | La Planque | 46.826714 | -71.230191 | Restaurant |
| 2 | Cégep Limoilou | 46.830177 | -71.226964 | Nektar Caféologue | 46.826568 | -71.230025 | Café |
| 3 | Cégep Limoilou | 46.830177 | -71.226964 | Bal du Lézard | 46.826802 | -71.230251 | Bar |
| 4 | Cégep Limoilou | 46.830177 | -71.226964 | Fournée Bio (La) | 46.828384 | -71.231527 | Bakery |


In [69]:
quebec_venues.groupby('school').count()

| school | school Latitude | school Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Aviron Québec College Technique | 32 | 32 | 32 | 32 | 32 | 32 |
| College Francois-de-laval | 71 | 71 | 71 | 71 | 71 | 71 |
| Collège CDI | 53 | 53 | 53 | 53 | 53 | 53 |
| Collège Mérici | 4 | 4 | 4 | 4 | 4 | 4 |
| Collège O'Sullivan de Québec | 53 | 53 | 53 | 53 | 53 | 53 |
| Collège régional Champlain St. Lawrence | 28 | 28 | 28 | 28 | 28 | 28 |
| Cégep Garneau | 6 | 6 | 6 | 6 | 6 | 6 |
| Cégep Limoilou | 22 | 22 | 22 | 22 | 22 | 22 |
| Cégep de Sainte-Foy | 26 | 26 | 26 | 26 | 26 | 26 |
| Ecole Jean-de-Brebeuf | 10 | 10 | 10 | 10 | 10 | 10 |


In [70]:
# one hot encoding
quebec_onehot = pd.get_dummies(quebec_venues[['Venue Category']], prefix="", prefix_sep="")

# add the school column back to the dataframe
quebec_onehot['school'] = quebec_venues['school'] 

# move the school column to the first position
fixed_columns = [quebec_onehot.columns[-1]] + list(quebec_onehot.columns[:-1])
quebec_onehot = quebec_onehot[fixed_columns]

quebec_onehot.head()

Unnamed: 0,school,Adult Boutique,Art Gallery,Arts & Entertainment,Asian Restaurant,Auto Garage,Bakery,Bank,Bar,Bed & Breakfast,...,Swiss Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant
0,Cégep Limoilou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Cégep Limoilou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Cégep Limoilou,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Cégep Limoilou,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,Cégep Limoilou,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Violin plot of 3 selected venue categories for each school

In [71]:
def ViolintPlot(venue1, venue2, venue3):
    
    fig = plt.figure(figsize=(60,30))
    sns.set(font_scale=1.1)

    ax = plt.subplot(4,1,1)
    sns.violinplot(x="school", y=venue1, data=quebec_onehot, cut=0);
    plt.xlabel("")

    ax = plt.subplot(4,1,2)
    sns.violinplot(x="school", y=venue2, data=quebec_onehot, cut=0);
    plt.xlabel("")

    plt.subplot(4,1,3)
    sns.violinplot(x="school", y=venue3, data=quebec_onehot, cut=0);
    


    ax.text(-1.0, 3.1, 'Violin Plot for {}, {} and {} categories for each school'.format(venue1, venue2, venue3), fontsize=50)
    plt.savefig ("ViolinPlot_VenCat.png", dpi=240)
    plt.show()

In [72]:
interact(ViolintPlot, venue1=quebec_venues["Venue Category"].unique(), venue2=quebec_venues["Venue Category"].unique(), venue3=quebec_venues["Venue Category"].unique());

interactive(children=(Dropdown(description='venue1', options=('Brewery', 'Restaurant', 'Café', 'Bar', 'Bakery'…

### Schools with a Coffee Shop, Restaurant and Park within 1000 meters

- Collège O'Sullivan de Québec
- Collège CDI
- Aviron Québec College Technique	
- Institut national de la recherche scientifique
- TÉLUQ o Télé-Université
- College Francois-de-laval	

#### Detailed analysis of the selected schools.

In [73]:
df_QuebecSel = df_QuebecPoints.copy() # copy so that the renaming below does not modify df_QuebecPoints

In [74]:
# shorten the names of the selected schools
df_QuebecSel.replace(to_replace={
    "Collège O'Sullivan de Québec": "Collège_OSullivan",
    "Collège CDI": "Collège_CDI",
    "Aviron Québec College Technique": "Aviron_College_Technique",
    "Institut national de la recherche scientifique": "Institut_national_recherche_scientifique",
    "TÉLUQ o Télé-Université": "TÉLUQ",
    "College Francois-de-laval": "College_Francois-de-laval",
}, inplace=True)

In [75]:
selected_schools = ["Collège_OSullivan", "Collège_CDI", "Aviron_College_Technique",
                    "Institut_national_recherche_scientifique", "TÉLUQ", "College_Francois-de-laval"]
df_QuebecSel = df_QuebecSel[df_QuebecSel['point'].isin(selected_schools)]

In [76]:
df_QuebecSel.head(10)

|   | point | category | latitude | longitude |
|---|---|---|---|---|
| 6 | Collège_OSullivan | cégep privado especializado | 46.811454 | -71.217082 |
| 7 | Collège_CDI | cégep privado especializado | 46.811484 | -71.214568 |
| 8 | Aviron_College_Technique | cégep privado especializado | 46.812938 | -71.226343 |
| 12 | Institut_national_recherche_scientifique | University | 46.812737 | -71.224343 |
| 13 | TÉLUQ | University | 46.813351 | -71.222514 |
| 25 | College_Francois-de-laval | Secondary School | 46.815323 | -71.206413 |


In [77]:
# create map of Quebec using latitude and longitude values
map_sel= folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, school in zip(df_QuebecSel['latitude'], df_QuebecSel['longitude'], df_QuebecSel['point']):
    label = '{}'.format(school)    
    label = folium.Popup(label, parse_html=True)
        
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sel)  
    
map_sel

#### Now we search within about 2000 meters of the selected schools

In [78]:
# search within 2 km of the selected schools
LIMIT = 200 # limit of number of venues returned by Foursquare API

radius = 2000 # define radius

In [79]:
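# NOTE: radius is not passed here either, so the search still uses the
# function default of 500 m rather than the 2000 m set above (the venue
# counts below match the earlier run); pass radius=radius to widen it.
quebec_venues = getNearbyVenues(names      = df_QuebecSel['point'],
                                latitudes  = df_QuebecSel['latitude'],
                                longitudes = df_QuebecSel['longitude'])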

Collège_OSullivan
Collège_CDI
Aviron_College_Technique
Institut_national_recherche_scientifique
TÉLUQ
College_Francois-de-laval


In [80]:
quebec_venues.groupby('school').count()

| school | school Latitude | school Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Aviron_College_Technique | 32 | 32 | 32 | 32 | 32 | 32 |
| College_Francois-de-laval | 71 | 71 | 71 | 71 | 71 | 71 |
| Collège_CDI | 53 | 53 | 53 | 53 | 53 | 53 |
| Collège_OSullivan | 53 | 53 | 53 | 53 | 53 | 53 |
| Institut_national_recherche_scientifique | 46 | 46 | 46 | 46 | 46 | 46 |
| TÉLUQ | 43 | 43 | 43 | 43 | 43 | 43 |


In [81]:
# one hot encoding
quebec_onehot = pd.get_dummies(quebec_venues[['Venue Category']], prefix="", prefix_sep="")

# add the school column back to the dataframe
quebec_onehot['school'] = quebec_venues['school'] 

# move the school column to the first position
fixed_columns = [quebec_onehot.columns[-1]] + list(quebec_onehot.columns[:-1])
quebec_onehot = quebec_onehot[fixed_columns]

quebec_onehot.head()

Unnamed: 0,school,Art Gallery,Asian Restaurant,Bakery,Bar,Bed & Breakfast,Bistro,Bookstore,Breakfast Spot,Brewery,...,Sandwich Place,Scandinavian Restaurant,Snack Place,Steakhouse,Sushi Restaurant,Swiss Restaurant,Tea Room,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant
0,Collège_OSullivan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Collège_OSullivan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Collège_OSullivan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Collège_OSullivan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Collège_OSullivan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [82]:
quebec_grouped = quebec_onehot.groupby('school').mean().reset_index()
quebec_grouped

Unnamed: 0,school,Art Gallery,Asian Restaurant,Bakery,Bar,Bed & Breakfast,Bistro,Bookstore,Breakfast Spot,Brewery,...,Sandwich Place,Scandinavian Restaurant,Snack Place,Steakhouse,Sushi Restaurant,Swiss Restaurant,Tea Room,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant
0,Aviron_College_Technique,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,...,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.03125,0.0
1,College_Francois-de-laval,0.014085,0.0,0.014085,0.042254,0.0,0.014085,0.014085,0.0,0.0,...,0.014085,0.014085,0.014085,0.014085,0.0,0.0,0.0,0.0,0.0,0.0
2,Collège_CDI,0.0,0.018868,0.018868,0.037736,0.0,0.0,0.018868,0.0,0.0,...,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868
3,Collège_OSullivan,0.0,0.0,0.0,0.056604,0.018868,0.018868,0.0,0.018868,0.0,...,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0
4,Institut_national_recherche_scientifique,0.0,0.0,0.043478,0.021739,0.0,0.021739,0.0,0.0,0.065217,...,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.021739,0.0
5,TÉLUQ,0.0,0.0,0.046512,0.046512,0.0,0.023256,0.0,0.0,0.046512,...,0.023256,0.0,0.0,0.0,0.0,0.023256,0.023256,0.023256,0.023256,0.0


In [83]:
quebec_grouped.shape

(6, 77)
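
Taking the mean of one-hot columns per school is the same as computing the share of each category among that school's venues. For example, the value 0.0625 for Bakery at Aviron_College_Technique corresponds to 2 of its 32 venues. The sketch below reproduces the idea with the standard library on made-up data; the school and venue values are illustrative only.

```python
from collections import Counter, defaultdict

# made-up (school, venue category) pairs, one per venue
venue_pairs = [
    ("School A", "Café"), ("School A", "Café"),
    ("School A", "Bakery"), ("School A", "Park"),
    ("School B", "Hotel"), ("School B", "Café"),
]


def category_frequencies(pairs):
    """Return {school: {category: share of that school's venues}}."""
    counts = defaultdict(Counter)
    for school, category in pairs:
        counts[school][category] += 1
    return {
        school: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for school, counter in counts.items()
    }


freqs = category_frequencies(venue_pairs)
```

Here `freqs["School A"]["Café"]` is 0.5 because 2 of the 4 venues near School A are cafés, which is the value the grouped one-hot mean would give for that cell.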

#### We look for the top 5 categories around each school

In [84]:
num_top_venues = 5

for hood in quebec_grouped['school']:
    print("----"+hood+"----")
    temp = quebec_grouped[quebec_grouped['school'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aviron_College_Technique----
        venue  freq
0   Gastropub  0.12
1     Brewery  0.06
2      Bakery  0.06
3  Restaurant  0.06
4        Café  0.06


----College_Francois-de-laval----
               venue  freq
0              Hotel  0.14
1  French Restaurant  0.13
2              Plaza  0.06
3         Restaurant  0.06
4               Café  0.04


----Collège_CDI----
               venue  freq
0         Restaurant  0.08
1              Hotel  0.06
2  French Restaurant  0.06
3      Grocery Store  0.06
4              Plaza  0.04


----Collège_OSullivan----
               venue  freq
0              Hotel  0.09
1        Coffee Shop  0.06
2                Bar  0.06
3  French Restaurant  0.06
4       Concert Hall  0.06


----Institut_national_recherche_scientifique----
               venue  freq
0          Gastropub  0.11
1         Restaurant  0.07
2               Café  0.07
3            Brewery  0.07
4  French Restaurant  0.07


----TÉLUQ----
               venue  freq
0         Restauran

In [85]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [86]:
num_top_venues = 15

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['school']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['school'] = quebec_grouped['school']

for ind in np.arange(quebec_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(quebec_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,school,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Aviron_College_Technique,Gastropub,Brewery,Coffee Shop,Bakery,Café,Restaurant,Pizza Place,Library,Cocktail Bar,Deli / Bodega,Other Nightlife,Park,Diner,Cambodian Restaurant,Fast Food Restaurant
1,College_Francois-de-laval,Hotel,French Restaurant,Restaurant,Plaza,Café,Bar,Neighborhood,Park,Historic Site,Gastropub,Pizza Place,Bistro,Bakery,History Museum,Greek Restaurant
2,Collège_CDI,Restaurant,Grocery Store,French Restaurant,Hotel,Concert Hall,Bar,Coffee Shop,Park,Plaza,Historic Site,Café,Italian Restaurant,Ice Cream Shop,Capitol Building,History Museum
3,Collège_OSullivan,Hotel,French Restaurant,Concert Hall,Coffee Shop,Bar,Restaurant,Gym,Plaza,Gastropub,Pub,Park,Sandwich Place,Café,Convenience Store,Hostel
4,Institut_national_recherche_scientifique,Gastropub,Restaurant,French Restaurant,Brewery,Pub,Café,Coffee Shop,Bakery,Grocery Store,Bar,Toy / Game Store,Japanese Restaurant,Library,Cocktail Bar,Deli / Bodega
5,TÉLUQ,French Restaurant,Coffee Shop,Café,Pub,Gastropub,Restaurant,Bar,Brewery,Bakery,Gym,Sandwich Place,Ice Cream Shop,Theater,Library,Cocktail Bar
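
The column labels above come from a three-element suffix list with a fallback to "th", which is correct for the 15 columns used here but would produce "21th" for larger values. A small helper that handles any positive integer is sketched below; the function name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# same column layout as in the cell above
columns = ["school"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 16)]
```

The resulting `columns` list matches the header of the table above, from `1st Most Common Venue` to `15th Most Common Venue`.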


#### Clustering the schools into 3 expected groups

In [87]:
# set number of clusters
kclusters = 3

quebec_grouped_clustering = quebec_grouped.drop(columns='school') # keep only the numeric frequency columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(quebec_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 2, 2, 0, 0])
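
The label array is in the same row order as `quebec_grouped`, so each label can be paired with its school. The sketch below does this with the label values printed above and the school order from the grouped table; the helper `group_by_label` is hypothetical.

```python
from collections import defaultdict

# row order of quebec_grouped, and the labels printed above
schools = [
    "Aviron_College_Technique",
    "College_Francois-de-laval",
    "Collège_CDI",
    "Collège_OSullivan",
    "Institut_national_recherche_scientifique",
    "TÉLUQ",
]
labels = [0, 1, 2, 2, 0, 0]


def group_by_label(names, cluster_labels):
    """Return {cluster label: [names assigned to that cluster]}."""
    clusters = defaultdict(list)
    for name, label in zip(names, cluster_labels):
        clusters[int(label)].append(name)
    return dict(clusters)


clusters = group_by_label(schools, labels)
for label in sorted(clusters):
    print(label, clusters[label])
```

Note that k-means label numbers are arbitrary identifiers, so only the grouping itself carries meaning, not the order of the numbers.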

In [88]:
neighborhoods_venues_sorted.head(10)

Unnamed: 0,school,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Aviron_College_Technique,Gastropub,Brewery,Coffee Shop,Bakery,Café,Restaurant,Pizza Place,Library,Cocktail Bar,Deli / Bodega,Other Nightlife,Park,Diner,Cambodian Restaurant,Fast Food Restaurant
1,College_Francois-de-laval,Hotel,French Restaurant,Restaurant,Plaza,Café,Bar,Neighborhood,Park,Historic Site,Gastropub,Pizza Place,Bistro,Bakery,History Museum,Greek Restaurant
2,Collège_CDI,Restaurant,Grocery Store,French Restaurant,Hotel,Concert Hall,Bar,Coffee Shop,Park,Plaza,Historic Site,Café,Italian Restaurant,Ice Cream Shop,Capitol Building,History Museum
3,Collège_OSullivan,Hotel,French Restaurant,Concert Hall,Coffee Shop,Bar,Restaurant,Gym,Plaza,Gastropub,Pub,Park,Sandwich Place,Café,Convenience Store,Hostel
4,Institut_national_recherche_scientifique,Gastropub,Restaurant,French Restaurant,Brewery,Pub,Café,Coffee Shop,Bakery,Grocery Store,Bar,Toy / Game Store,Japanese Restaurant,Library,Cocktail Bar,Deli / Bodega
5,TÉLUQ,French Restaurant,Coffee Shop,Café,Pub,Gastropub,Restaurant,Bar,Brewery,Bakery,Gym,Sandwich Place,Ice Cream Shop,Theater,Library,Cocktail Bar


In [89]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

quebec_merged = df_QuebecSel

# join the sorted venue table to the selected schools to add the cluster label and top venues for each school
quebec_merged = quebec_merged.join(neighborhoods_venues_sorted.set_index('school'), on='point')

quebec_merged.head(10) # check the last columns!

Unnamed: 0,point,category,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
6,Collège_OSullivan,cégep privado especializado,46.811454,-71.217082,2,Hotel,French Restaurant,Concert Hall,Coffee Shop,Bar,Restaurant,Gym,Plaza,Gastropub,Pub,Park,Sandwich Place,Café,Convenience Store,Hostel
7,Collège_CDI,cégep privado especializado,46.811484,-71.214568,2,Restaurant,Grocery Store,French Restaurant,Hotel,Concert Hall,Bar,Coffee Shop,Park,Plaza,Historic Site,Café,Italian Restaurant,Ice Cream Shop,Capitol Building,History Museum
8,Aviron_College_Technique,cégep privado especializado,46.812938,-71.226343,0,Gastropub,Brewery,Coffee Shop,Bakery,Café,Restaurant,Pizza Place,Library,Cocktail Bar,Deli / Bodega,Other Nightlife,Park,Diner,Cambodian Restaurant,Fast Food Restaurant
12,Institut_national_recherche_scientifique,University,46.812737,-71.224343,0,Gastropub,Restaurant,French Restaurant,Brewery,Pub,Café,Coffee Shop,Bakery,Grocery Store,Bar,Toy / Game Store,Japanese Restaurant,Library,Cocktail Bar,Deli / Bodega
13,TÉLUQ,University,46.813351,-71.222514,0,French Restaurant,Coffee Shop,Café,Pub,Gastropub,Restaurant,Bar,Brewery,Bakery,Gym,Sandwich Place,Ice Cream Shop,Theater,Library,Cocktail Bar
25,College_Francois-de-laval,Secondary School,46.815323,-71.206413,1,Hotel,French Restaurant,Restaurant,Plaza,Café,Bar,Neighborhood,Park,Historic Site,Gastropub,Pizza Place,Bistro,Bakery,History Museum,Greek Restaurant


#### Create the map with the school in clusters

In [95]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(quebec_merged['latitude'], quebec_merged['longitude'], quebec_merged['point'], quebec_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Detailed look at the clusters

The first cluster (cluster 0) is the liveliest and suits us best. It has parks, libraries, cafes, many gastropubs and restaurants, including French restaurants. We believe it is a very interesting area of Quebec. It is also near a boulevard, which makes travel around the city easier.

In [45]:
quebec_merged.loc[quebec_merged['Cluster Labels'] == 0, quebec_merged.columns[[1] + list(range(5, quebec_merged.shape[1]))]]

|   | category | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | cégep privado especializado | Gastropub | Bakery | Restaurant | Café | Park | Brewery | Diner | Other Nightlife | Coffee Shop | Library |
| 12 | University | Gastropub | Pub | French Restaurant | Restaurant | Brewery | Café | Grocery Store | Coffee Shop | Bakery | Cocktail Bar |
| 13 | University | Restaurant | Gastropub | French Restaurant | Café | Pub | Brewery | Bar | Coffee Shop | Bakery | Ice Cream Shop |


This group (cluster 1) feels like a more tourist-oriented area. Its most common venue is Hotel, and we are currently not interested in moving to a tourist neighborhood.

In [46]:
quebec_merged.loc[quebec_merged['Cluster Labels'] == 1, quebec_merged.columns[[1] + list(range(5, quebec_merged.shape[1]))]]

|   | category | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | Secondary School | Hotel | French Restaurant | Restaurant | Plaza | Neighborhood | Park | Bar | Café | Pizza Place | Historic Site |


The same applies to the last cluster (cluster 2): we found many hotels nearby.

In [47]:
quebec_merged.loc[quebec_merged['Cluster Labels'] == 2, quebec_merged.columns[[1] + list(range(5, quebec_merged.shape[1]))]]

|   | category | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | cégep privado especializado | Café | Hotel | Italian Restaurant | Coffee Shop | Restaurant | Gym | Boutique | Ramen Restaurant | Hostel | Hookah Bar |
| 7 | cégep privado especializado | Restaurant | Grocery Store | Hotel | French Restaurant | Concert Hall | Park | Plaza | Coffee Shop | Bar | Swiss Restaurant |


Finally, the neighborhood we will study for our emigration to Quebec is the one around the schools in cluster 0.