# Battle of the Neighborhoods
## Venue recommendation by Subway Station in Montreal

### Table of Contents

##### Setup:
    A. Problem
    B. Background
    C. Data
##### Report:
    1. Introduction
    2. Data
    3. Methodology
    4. Results
    5. Discussion
    6. Conclusion
##### Appendix:
    a. Code

## **Setup**

### **A. Problem:**

Every summer, the influx of tourists for the festival season creates and incredible gridlock in the streets of Montreal. Much to the dismay of locals, very few people use the amazing public transport system to get from place to place, opting instead to drive around. I believe this is due to the lack of awareness of all the amazing restaurants, bars, theatres, and other points of interest that can easily be reached by subway in a matter of minutes.

### **B. Background:**

As a Montreal local, I have seen it every summer. Droves of tourists clutter the streets of the city, as they try to drive a few blocks in rush-hour traffic. It always boggles the mind that they would opt to spend 20 minutes in downtown traffic, instead of jumping on the subway for a few stops and getting to their destination in relative tranquility (the odd tipsy university student notwithstanding). Figuring it’s probably due to their lack of knowledge, I have decided to help them out by clustering and comparing the top venues near each of our subway stations. 

### **C. Data:**

We will be working mainly with two datasets for this project. 

First and foremost, we need geolocation coordinates for Montreal’s 68 subway (or Metro) stations. The source for these coordinates is the City of Montreal’s Open Data Portal (http://donnees.ville.montreal.qc.ca/dataset). More specifically we will be using their data set on “STM Bus and Subway lines (http://donnees.ville.montreal.qc.ca/dataset/stm-traces-des-lignes-de-bus-et-de-metro). Now unfortunately, they only make it available as a large .SHP Shapefile, so we are gong to have to do a lot of cleanup to make it workable.

Our second dataset will be the venues information queried from Foursquare using the geolocation coordinates obtained above. Instead of focusing on quantity (i.e. concentration of venues in a location), we will be focusing on quality (i.e. what are the top venues in a location). We are, after all, trying to convince tourists to use our world-class public transport system instead of contributing to the summer gridlock – and what better way than to guide them to the best Metro stations, with the best venues?


## **Report**

### **1. Introduction**

### **1. Data**

### **1. Methodology**

### **1. Results**

### **1. Discussion**

### **1. Conclusion**

## **Appendix**

### **a. Code**

#### Start by importing the libraries we need for this project.

In [1]:
import pandas as pd
import geopandas as gpd
!pip install matplotlib -U
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import folium
from folium import plugins
import seaborn as sns
!pip install descartes -U
import descartes
import re
import requests

Requirement already up-to-date: matplotlib in c:\users\alexi\anaconda3\lib\site-packages (3.2.1)



Bad key "text.kerning_factor" on line 4 in
C:\Users\alexi\anaconda3\lib\site-packages\matplotlib\mpl-data\stylelib\_classic_test_patch.mplstyle.
You probably need to get an updated matplotlibrc file from
https://github.com/matplotlib/matplotlib/blob/v3.1.3/matplotlibrc.template
or from the matplotlib source distribution


Requirement already up-to-date: descartes in c:\users\alexi\anaconda3\lib\site-packages (1.1.0)


#### Because we're working with a Shapefile, we are using a geoPandas geoDataFrame. We will eventually revert back to a Pandas DataFrame for convenience's sake.

In [2]:
df = gpd.read_file('stm_arrets_sig.shp')
df.head()

Unnamed: 0,stop_id,stop_code,stop_name,stop_url,wheelchair,route_id,loc_type,service_id,geometry
0,43-01,10118,Station Angrignon,,2,,2,20M,POINT (296677.562 5034048.338)
1,43,10118,Station Angrignon,http://www.stm.info/fr/infos/reseaux/metro/ang...,2,1.0,0,20M,POINT (296733.669 5034064.602)
2,42-01,10120,Station Monk - Édicule Nord,,2,,2,20M,POINT (297515.753 5034601.626)
3,42-02,10120,Station Monk - Édicule Sud,,2,,2,20M,POINT (297496.004 5034568.310)
4,42,10120,Station Monk,http://www.stm.info/fr/infos/reseaux/metro/monk,2,1.0,0,20M,POINT (297506.817 5034585.078)


#### Let's do a little bit of cleanup. Having had a look-through the file, I noticed we could drop all of the rows with "None" for route_id as they are all duplicates. We're also going to drop some useless columns.

In [3]:
df.replace(r'None', np.nan, regex=True, inplace = True)
df.dropna(axis = 0, how = "any", inplace = True)
df.drop(['stop_id', 'stop_code', 'wheelchair', 'loc_type', 'service_id'], axis = 1, inplace = True)

#### Every subway station, and bus stop has an associated URL (it's used to look up subway and bus schedules). Let's create a new dataframe containing only those entries where the URL contains the word "metro" (as this indicates this is a subway station).

In [4]:
df_metro = df[df['stop_url'].str.contains('.*metro.*')]

#### Let's re-project the geometry data to a coordinate system we are more familiar with (and something that folium will work with without complaining too much).

In [5]:
df_metro = df_metro.to_crs(epsg='4326')

#### Now let's convert the geoDataFrame to a regular DataFrame, and cast the "geometry" column to string so that we may parse it with regex. We have to do this because we started with a geoDataFrame created from a shapefile.

In [6]:
df_metro = pd.DataFrame(df_metro)
df_metro['geometry'] = df_metro['geometry'].astype('str')
print(df_metro.dtypes)
df_metro.head()

stop_name    object
stop_url     object
route_id     object
geometry     object
dtype: object


Unnamed: 0,stop_name,stop_url,route_id,geometry
1,Station Angrignon,http://www.stm.info/fr/infos/reseaux/metro/ang...,1,POINT (-73.60311799999998 45.44646599999288)
4,Station Monk,http://www.stm.info/fr/infos/reseaux/metro/monk,1,POINT (-73.593242 45.45115799999289)
6,Station Jolicoeur,http://www.stm.info/fr/infos/reseaux/metro/jol...,1,POINT (-73.58169099999999 45.45700999999288)
9,Station Verdun,http://www.stm.info/fr/infos/reseaux/metro/verdun,1,POINT (-73.57202099999999 45.45944099999288)
12,Station De l'Église,http://www.stm.info/fr/infos/reseaux/metro/de-...,1,POINT (-73.56707400000001 45.46189399999288)


Now, let's parse the clunky database, and extract all of the useful information into a new dataframe. We will take this opportunity to clean up the geolocation coordinates and make something more usable.

In [7]:
df_metro_geo = pd.DataFrame()
df_metro_geo['stop'] = ''
df_metro_geo['lat'] = ''
df_metro_geo['lon'] = ''

In [8]:
for name, geometry in zip('df_metro.stop_name', 'df_metro.geometry'):
    df_metro_geo.stop = df_metro.stop_name
    df_metro_geo.lon = df_metro.geometry.str.extract(pat = r"(-[0-9][0-9].[0-9]*)")
    df_metro_geo.lat = df_metro.geometry.str.extract(pat = r"[0-9]\s([0-9][0-9].[0-9]*)")

In [9]:
df_metro_geo.head()

Unnamed: 0,stop,lat,lon
1,Station Angrignon,45.44646599999288,-73.60311799999998
4,Station Monk,45.45115799999289,-73.593242
6,Station Jolicoeur,45.45700999999288,-73.58169099999999
9,Station Verdun,45.45944099999288,-73.57202099999999
12,Station De l'Église,45.46189399999288,-73.567074


#### Finally, let's draw a map of Montreal, and use our newly cleaned up geolocation coordinates to mark all of the subway stations.

In [10]:
mtl_map = folium.Map(location = [45.52, -73.62], zoom_start = 12, tiles = 'stamenterrain')

for row in df_metro_geo.itertuples():
    mtl_map.add_child(folium.CircleMarker(location = [row.lat, row.lon],
                                         radius = 5,
                                         fill = True,
                                         fill_color = 'red',
                                         fill_opacity = 0.7,
                                         popup = row.stop))

mtl_map

#### Ok, now let's make some queries to Foursquare using our subway stations coordinates. Couple of things to note - we're going to limit the number of venues to 10 per location, and have Foursquare return only the 5 most popular venues (passing "sortByPopularity = 1, and topPicks).

In [11]:
import config as cfg

CLIENT_ID = cfg.client_id
CLIENT_SECRET = cfg.client_secret
VERSION = "20200426"

In [52]:
radius = 1000
LIMIT = 10

venues = []

for stop, lat, lon in zip(df_metro_geo['stop'], df_metro_geo['lat'], df_metro_geo['lon']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&sortByPopularity=1&section=topPicks".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lon,
        radius, 
        LIMIT)

    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            stop,
            lat, 
            lon, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

#### Dump the returned venues into a Pandas DataFrame, & add some column headers. Note that some locations have less than 5 venues. That's normal as these are periphery stations where there's little other than the subway station and a park.

In [53]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['stop', 'Lat', 'Lon', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

venues_df.head()

Unnamed: 0,stop,Lat,Lon,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Station Angrignon,45.44646599999288,-73.60311799999998,Carrefour Angrignon,45.44795,-73.615155,Shopping Mall
1,Station Angrignon,45.44646599999288,-73.60311799999998,Parc Angrignon,45.443001,-73.603334,Park
2,Station Angrignon,45.44646599999288,-73.60311799999998,allô! mon coco,45.448993,-73.609534,Breakfast Spot
3,Station Angrignon,45.44646599999288,-73.60311799999998,Dilallo Burger,45.450364,-73.598175,Deli / Bodega
4,Station Angrignon,45.44646599999288,-73.60311799999998,Sports Experts,45.44628,-73.614556,Sporting Goods Shop


In [54]:
print(venues_df['VenueCategory'].unique())

['Shopping Mall' 'Park' 'Breakfast Spot' 'Deli / Bodega'
 'Sporting Goods Shop' 'Discount Store' 'Furniture / Home Store'
 'Smoke Shop' 'Liquor Store' 'Clothing Store' 'Comedy Club'
 'Italian Restaurant' 'Indian Restaurant' 'Restaurant' 'Automotive Shop'
 'Food & Drink Shop' 'Bagel Shop' 'Pizza Place' 'Beer Store'
 'Grocery Store' 'Trail' 'Cheese Shop' 'Café' 'Convenience Store'
 'Hockey Arena' 'Bar' 'Market' 'Canal' 'Diner' 'Malay Restaurant'
 'Event Space' 'Bakery' 'Gym' 'Office' 'Movie Theater' 'Department Store'
 'Gourmet Shop' 'American Restaurant' 'Supermarket' 'Japanese Restaurant'
 'Burger Joint' 'Museum' 'Bookstore' 'Church' 'Art Museum'
 'Monument / Landmark' 'Skating Rink' 'Plaza' 'Performing Arts Venue'
 'Concert Hall' 'Hotel' 'Indie Movie Theater' 'Record Shop' 'Hostel'
 'Poutine Place' 'Pub' 'Gay Bar' 'Coffee Shop' 'Gastropub' 'Hot Dog Joint'
 'Fish Market' 'Health Food Store' 'Recreation Center' 'Farmers Market'
 'Rock Climbing Spot' 'Fast Food Restaurant' 'French Restau

In [104]:
venues_df = venues_df[~venues_df['VenueCategory'].isin(['Playground', 'Grocery Store', 'Convenience Store', 'Supermarket', 'Office', 'Electronics Store', 'College Gym', 'Automotive Shop', 'Auto Dealership', 'Building', 'Coworking Space', 'Pharmacy', 'College Cafeteria', 'Metro Station', 'Clothing Store', 'Furniture / Home Store', 'Department Store', 'Gym / Fitness Center', 'Residential Building (Apartment / Condo)', 'Discount Store'])]

In [105]:
print(venues_df['VenueCategory'].unique())

['Shopping Mall' 'Park' 'Breakfast Spot' 'Deli / Bodega'
 'Sporting Goods Shop' 'Smoke Shop' 'Liquor Store' 'Comedy Club'
 'Italian Restaurant' 'Indian Restaurant' 'Restaurant' 'Food & Drink Shop'
 'Bagel Shop' 'Pizza Place' 'Beer Store' 'Trail' 'Cheese Shop' 'Café'
 'Hockey Arena' 'Bar' 'Market' 'Canal' 'Diner' 'Malay Restaurant'
 'Event Space' 'Bakery' 'Gym' 'Movie Theater' 'Gourmet Shop'
 'American Restaurant' 'Japanese Restaurant' 'Burger Joint' 'Museum'
 'Bookstore' 'Church' 'Art Museum' 'Monument / Landmark' 'Skating Rink'
 'Plaza' 'Performing Arts Venue' 'Concert Hall' 'Hotel'
 'Indie Movie Theater' 'Record Shop' 'Hostel' 'Poutine Place' 'Pub'
 'Gay Bar' 'Coffee Shop' 'Gastropub' 'Hot Dog Joint' 'Fish Market'
 'Health Food Store' 'Recreation Center' 'Farmers Market'
 'Rock Climbing Spot' 'Fast Food Restaurant' 'French Restaurant'
 'Sandwich Place' 'Vegetarian / Vegan Restaurant' 'Portuguese Restaurant'
 'Modern European Restaurant' 'Irish Pub' 'Athletics & Sports' 'Garden'
 'Spo

In [106]:
venues_df.head()

Unnamed: 0,stop,Lat,Lon,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Station Angrignon,45.44646599999288,-73.60311799999998,Carrefour Angrignon,45.44795,-73.615155,Shopping Mall
1,Station Angrignon,45.44646599999288,-73.60311799999998,Parc Angrignon,45.443001,-73.603334,Park
2,Station Angrignon,45.44646599999288,-73.60311799999998,allô! mon coco,45.448993,-73.609534,Breakfast Spot
3,Station Angrignon,45.44646599999288,-73.60311799999998,Dilallo Burger,45.450364,-73.598175,Deli / Bodega
4,Station Angrignon,45.44646599999288,-73.60311799999998,Sports Experts,45.44628,-73.614556,Sporting Goods Shop


#### Get dummy variables.

In [107]:
venues_oh = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

venues_oh.head()

Unnamed: 0,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,...,Sports Club,Sri Lankan Restaurant,Sushi Restaurant,Theme Park Ride / Attraction,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [108]:
venues_oh.insert(0, 'stop', venues_df['stop'])

central_venues = venues_oh.groupby(["stop"]).mean().reset_index()

In [109]:
central_venues.head()

Unnamed: 0,stop,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,...,Sports Club,Sri Lankan Restaurant,Sushi Restaurant,Theme Park Ride / Attraction,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Yoga Studio
0,Station Acadie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,...,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.2,0.0
1,Station Angrignon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Station Assomption,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Station Atwater,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Station Beaubien,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [110]:
areaColumns = ['stop']
freqColumns = []
for ind in np.arange(5):
    freqColumns.append('Top {}'.format(ind+1))

columns = areaColumns+freqColumns

top5_venues = pd.DataFrame(columns=columns)
top5_venues['stop'] = central_venues['stop']

top5_venues.head()

Unnamed: 0,stop,Top 1,Top 2,Top 3,Top 4,Top 5
0,Station Acadie,,,,,
1,Station Angrignon,,,,,
2,Station Assomption,,,,,
3,Station Atwater,,,,,
4,Station Beaubien,,,,,


#### Get top 5 venues for each location.

In [111]:
for ind in np.arange(central_venues.shape[0]):
    row_categories = central_venues.iloc[ind, :].iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    top5_venues.iloc[ind, 1:] = row_categories_sorted.index.values[0:5]

In [112]:
top5_venues.head()

Unnamed: 0,stop,Top 1,Top 2,Top 3,Top 4,Top 5
0,Station Acadie,Vietnamese Restaurant,Greek Restaurant,Indian Restaurant,Toy / Game Store,Sandwich Place
1,Station Angrignon,Breakfast Spot,Sporting Goods Shop,Deli / Bodega,Park,Liquor Store
2,Station Assomption,Pizza Place,Sports Club,Planetarium,Movie Theater,Skating Rink
3,Station Atwater,Pizza Place,Gourmet Shop,Bagel Shop,Museum,Japanese Restaurant
4,Station Beaubien,Beer Store,Pizza Place,Poutine Place,Bakery,Farmers Market


#### Import and run k-means clustering on the venues.

In [113]:
from sklearn.cluster import KMeans

In [114]:
cluster_df = central_venues.drop(["stop"], axis = 1)

kmeans = KMeans(n_clusters=3).fit(cluster_df)

In [115]:
cluster_df = df_metro_geo.copy()
cluster_df = pd.merge(cluster_df, central_venues, on=['stop'], how='inner')
cluster_df.head()

Unnamed: 0,stop,lat,lon,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bagel Shop,...,Sports Club,Sri Lankan Restaurant,Sushi Restaurant,Theme Park Ride / Attraction,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Yoga Studio
0,Station Angrignon,45.44646599999288,-73.60311799999998,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Station Monk,45.45115799999289,-73.593242,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Station Jolicoeur,45.45700999999288,-73.58169099999999,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Station Verdun,45.45944099999288,-73.57202099999999,0.0,0.0,0.0,0.0,0.0,0.0,0.125,...,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0
4,Station De l'Église,45.46189399999288,-73.567074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0


#### Add in the cluster labels at the begining of the dataframe.

In [116]:
cluster_df["Cluster_labels"] = kmeans.labels_
cluster_df = cluster_df.join(top5_venues.set_index("stop"), on="stop")
cluster_df.head()

Unnamed: 0,stop,lat,lon,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bagel Shop,...,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Yoga Studio,Cluster_labels,Top 1,Top 2,Top 3,Top 4,Top 5
0,Station Angrignon,45.44646599999288,-73.60311799999998,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,2,Breakfast Spot,Sporting Goods Shop,Deli / Bodega,Park,Liquor Store
1,Station Monk,45.45115799999289,-73.593242,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,2,Comedy Club,Italian Restaurant,Park,Breakfast Spot,Deli / Bodega
2,Station Jolicoeur,45.45700999999288,-73.58169099999999,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,...,0.0,0.0,0.0,0.0,2,Park,Restaurant,Pizza Place,Food & Drink Shop,Indian Restaurant
3,Station Verdun,45.45944099999288,-73.57202099999999,0.0,0.0,0.0,0.0,0.0,0.0,0.125,...,0.0,0.0,0.0,0.0,1,Bagel Shop,Park,Restaurant,Trail,Beer Store
4,Station De l'Église,45.46189399999288,-73.567074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0,Cheese Shop,Indian Restaurant,Restaurant,Beer Store,Café


In [117]:
cluster_df.sort_values(["Cluster_labels"], inplace=True)

mid = cluster_df['Cluster_labels']
cluster_df.drop(labels=['Cluster_labels'], axis=1,inplace = True)
cluster_df.insert(0, 'Cluster_labels', mid)
cluster_df = cluster_df[['Cluster_labels', 'stop', 'lat', 'lon', 'Top 1', 'Top 2', 'Top 3', 'Top 4', 'Top 5']]

cluster_df = cluster_df.reset_index(drop=True)
print(cluster_df.shape)
cluster_df.head()

(72, 9)


Unnamed: 0,Cluster_labels,stop,lat,lon,Top 1,Top 2,Top 3,Top 4,Top 5
0,0,Station Snowdon 5,45.48540299999285,-73.62775699999999,Sri Lankan Restaurant,Art Gallery,Filipino Restaurant,Park,Gym
1,0,Station Du Collège,45.50925799999283,-73.67479599999999,Café,Flower Shop,Italian Restaurant,Hot Dog Joint,Skating Rink
2,0,Station De la Savane,45.50005099999282,-73.66153799999998,Restaurant,Market,Indian Restaurant,Fast Food Restaurant,Middle Eastern Restaurant
3,0,Station Namur,45.49464299999285,-73.65282799999997,Restaurant,Coffee Shop,Market,Caribbean Restaurant,Fast Food Restaurant
4,0,Station Plamondon,45.49464399999284,-73.63825999999999,Food & Drink Shop,Gym,Chinese Restaurant,Caribbean Restaurant,Gas Station


#### Let's define some colors for these clusters.

In [118]:
color_list = cluster_df["Cluster_labels"]
color_df = pd.DataFrame(color_list)
color_df.rename(columns = {'Cluster_labels':'colors'}, inplace = True)

In [119]:
color_df["colors"] = color_df["colors"].replace(0, 'yellow')
color_df["colors"] = color_df["colors"].replace(1, 'red')
color_df["colors"] = color_df["colors"].replace(2, 'blue')
color_df["colors"] = color_df["colors"].replace(3, 'green')
color_df["colors"] = color_df["colors"].replace(4, 'purple')
cluster_df.insert(0, 'colors', color_df)

#### And now, let's have a look at a map of these clusters.

In [120]:
mtl_venue_map = folium.Map(location = [45.52, -73.62], zoom_start = 12, tiles = 'stamenterrain')

for row in cluster_df.itertuples():
    mtl_venue_map.add_child(folium.CircleMarker(location = [row.lat, row.lon],
                                  color = row.colors,
                                  fill = True,
                                  fill_color = row.colors,
                                  fill_opacity = 0.5,
                                  popup = [("stop:", row.stop), ("Cluster:", row.Cluster_labels)]))

mtl_venue_map

In [121]:
cluster_df.loc[cluster_df['Cluster_labels'] == 0, cluster_df.columns[[1] + [2] + list(range(5, cluster_df.shape[1]))]]

Unnamed: 0,Cluster_labels,stop,Top 1,Top 2,Top 3,Top 4,Top 5
0,0,Station Snowdon 5,Sri Lankan Restaurant,Art Gallery,Filipino Restaurant,Park,Gym
1,0,Station Du Collège,Café,Flower Shop,Italian Restaurant,Hot Dog Joint,Skating Rink
2,0,Station De la Savane,Restaurant,Market,Indian Restaurant,Fast Food Restaurant,Middle Eastern Restaurant
3,0,Station Namur,Restaurant,Coffee Shop,Market,Caribbean Restaurant,Fast Food Restaurant
4,0,Station Plamondon,Food & Drink Shop,Gym,Chinese Restaurant,Caribbean Restaurant,Gas Station
5,0,Station D'Iberville,Park,Peruvian Restaurant,Vietnamese Restaurant,Coffee Shop,Liquor Store
6,0,Station Vendôme,Bakery,BBQ Joint,Breakfast Spot,Skating Rink,Sushi Restaurant
7,0,Station Place-d'Armes,Concert Hall,Plaza,Historic Site,Church,Restaurant
8,0,Station Mont-Royal,Park,Bakery,Pastry Shop,Portuguese Restaurant,Poutine Place
9,0,Station Laurier,Bagel Shop,Bakery,Pastry Shop,Park,Portuguese Restaurant


In [122]:
cluster_df.loc[cluster_df['Cluster_labels'] == 1, cluster_df.columns[[1] + [2] + list(range(5, cluster_df.shape[1]))]]

Unnamed: 0,Cluster_labels,stop,Top 1,Top 2,Top 3,Top 4,Top 5
30,1,Station Lucien-L'Allier,Hockey Arena,Monument / Landmark,Church,Movie Theater,Restaurant
31,1,Station Guy-Concordia,Movie Theater,Hockey Arena,Gym,Church,Monument / Landmark
32,1,Station Beaubien,Beer Store,Pizza Place,Poutine Place,Bakery,Farmers Market
33,1,Station Atwater,Pizza Place,Gourmet Shop,Bagel Shop,Museum,Japanese Restaurant
34,1,Station Place-des-Arts,Concert Hall,Hotel,Performing Arts Venue,Bookstore,Movie Theater
35,1,Station Lionel-Groulx,American Restaurant,Park,Gourmet Shop,Canal,Movie Theater
36,1,Station Champ-de-Mars,Plaza,Concert Hall,Historic Site,Performing Arts Venue,Harbor / Marina
37,1,Station Montmorency,Hockey Arena,Concert Hall,Shopping Mall,Bookstore,Coffee Shop
38,1,Station Bonaventure,Hockey Arena,Art Museum,Gym,Church,Plaza
39,1,Station Jarry,Breakfast Spot,Park,Portuguese Restaurant,Bakery,Fast Food Restaurant


In [123]:
cluster_df.loc[cluster_df['Cluster_labels'] == 2, cluster_df.columns[[1] + [2] + list(range(5, cluster_df.shape[1]))]]

Unnamed: 0,Cluster_labels,stop,Top 1,Top 2,Top 3,Top 4,Top 5
48,2,Station LongueuilUniversité-de-Sherbrooke,Theme Park Ride / Attraction,Harbor / Marina,Diner,Sushi Restaurant,Middle Eastern Restaurant
49,2,Station De Castelnau,Vietnamese Restaurant,Bakery,Park,Farmers Market,Café
50,2,Station Jean-Drapeau,French Restaurant,Historic Site,Japanese Restaurant,Park,Casino
51,2,Station Monk,Comedy Club,Italian Restaurant,Park,Breakfast Spot,Deli / Bodega
52,2,Station Cartier,Park,Hockey Arena,Sushi Restaurant,Fast Food Restaurant,Coffee Shop
53,2,Station Jolicoeur,Park,Restaurant,Pizza Place,Food & Drink Shop,Indian Restaurant
54,2,Station Assomption,Pizza Place,Sports Club,Planetarium,Movie Theater,Skating Rink
55,2,Station McGill,Hockey Arena,Shopping Mall,Gym,Movie Theater,Performing Arts Venue
56,2,Station Pie-IX,Performing Arts Venue,Park,Athletics & Sports,Bakery,Farmers Market
57,2,Station Côte-Vertu,Italian Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Café,Hot Dog Joint
