# Introduction/Business Problem

This project takes one specific borough, Hay Riad in Rabat, Morocco, and asks: can apartment rental prices there be explained by its similarity to prestigious areas of Paris, France?
If Hay Riad turns out to resemble an upmarket arrondissement, its rents would be easier to explain. To find out, we cluster Paris's arrondissements together with Hay Riad, using venue data from Foursquare, and see which areas of Paris Hay Riad most resembles.

First, we import the libraries we will work with.

In [98]:
import pandas as pd
import numpy as np
import requests
import json

from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium

In [4]:
paris_geojson = requests.get("https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/paris.geojson").json()

We need to turn the `.geojson` data into a pandas `DataFrame` object.

## Data
So far, the data consists of the arrondissement polygons, which we need to put in the right format.  
We use the centre of each polygon (the mean of its vertices) as the area's latitude and longitude.

In [64]:
import statistics

def select(obj, idx):
    # yield the idx-th coordinate of every point in a ring
    for o in obj:
        yield o[idx]

rows = []
for feature in paris_geojson['features']:
    code = feature['properties']['cartodb_id']
    borough = feature['properties']['name']
    # first ring of the feature's geometry; GeoJSON positions are [longitude, latitude]
    equilibrium = feature["geometry"]["coordinates"][0][0]
    longitude = statistics.mean(list(select(equilibrium, 0)))
    latitude = statistics.mean(list(select(equilibrium, 1)))

    value = {'Postal Code': 75000 + code, 'Borough': borough, 'Latitude': latitude, 'Longitude': longitude}
    print(json.dumps(value, indent=2))
    rows.append(value)

# Hay Riad (Rabat) is added by hand as the test location
rows.append({'Postal Code': 10000, 'Borough': 'Hay Riad', 'Latitude': 33.958752, 'Longitude': -6.871090})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
dfParis = pd.DataFrame(rows, columns=['Postal Code', 'Borough', 'Latitude', 'Longitude'])

{
  "Postal Code": 75002,
  "Borough": "Bourse",
  "Latitude": 48.869408125,
  "Longitude": 2.3399715
}
{
  "Postal Code": 75003,
  "Borough": "Temple",
  "Latitude": 48.86219675,
  "Longitude": 2.3603217499999998
}
{
  "Postal Code": 75005,
  "Borough": "Panth\u00e9on",
  "Latitude": 48.84591771428572,
  "Longitude": 2.3516000714285714
}
{
  "Postal Code": 75006,
  "Borough": "Luxembourg",
  "Latitude": 48.85151675,
  "Longitude": 2.3316077500000003
}
{
  "Postal Code": 75007,
  "Borough": "Palais-Bourbon",
  "Latitude": 48.853572176470585,
  "Longitude": 2.3168273529411763
}
{
  "Postal Code": 75008,
  "Borough": "\u00c9lys\u00e9e",
  "Latitude": 48.87250172413793,
  "Longitude": 2.3101774827586206
}
{
  "Postal Code": 75009,
  "Borough": "Op\u00e9ra",
  "Latitude": 48.87732752380953,
  "Longitude": 2.3366013333333333
}
{
  "Postal Code": 75010,
  "Borough": "Enclos-St-Laurent",
  "Latitude": 48.87698195454546,
  "Longitude": 2.3612876363636364
}
{
  "Postal Code": 75011,
  "Borough"

## Methodology
So far, we have assigned a position to each arrondissement of Paris: the barycenter of the polygon served in the GeoJSON file. Since each area is small, taking the mean of the latitudes and the mean of the longitudes independently is an acceptable approximation.  
The map further below shows the result. Rabat's Hay Riad is included as well, with its own postal code.
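
To get a feel for how rough the vertex-mean shortcut can be, the sketch below compares it with the area-weighted centroid given by the shoelace formula. The polygon is a made-up example and the function names are illustrative, not part of the notebook. Both functions treat longitude and latitude as flat x/y coordinates, which is only reasonable over small areas.

```python
def vertex_mean(ring):
    """Mean of the ring's vertices, as used for the arrondissements above."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    # pair every vertex with the next one, wrapping around to close the ring
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical ring: a unit square with one edge sampled more densely
ring = [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0), (0.75, 0.0),
        (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(vertex_mean(ring))       # pulled towards the densely sampled edge
print(polygon_centroid(ring))  # centre of the square
```

The two results differ only when vertices are unevenly spread along the boundary, so for compact districts the shortcut is likely close enough for placing a 500 m search radius.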

In [65]:
dfParis

Unnamed: 0,Postal Code,Borough,Latitude,Longitude
0,75002,Bourse,48.869408,2.339971
1,75003,Temple,48.862197,2.360322
2,75005,Panthéon,48.845918,2.3516
3,75006,Luxembourg,48.851517,2.331608
4,75007,Palais-Bourbon,48.853572,2.316827
5,75008,Élysée,48.872502,2.310177
6,75009,Opéra,48.877328,2.336601
7,75010,Enclos-St-Laurent,48.876982,2.361288
8,75011,Popincourt,48.858493,2.381793
9,75013,Gobelins,48.827353,2.353338


In [66]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(dfParis['Latitude'], dfParis['Longitude'], dfParis['Borough'], dfParis['Postal Code']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  
    
map_paris

## Data available
The data available so far is the location of each centre of interest in and around Paris. In all, we have 20 reference locations and 1 test location (Hay Riad), which will be mapped to the clusters produced by the experiment. Next, we query Foursquare for venues around these locations.

In [67]:
neighborhood_latitude = dfParis.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = dfParis.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = dfParis.loc[0, 'Borough'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bourse are 48.869408125, 2.3399715.


In [68]:
# build the query for the first borough; the explore endpoint is called twice, 50 results at a time
latlong = str(neighborhood_latitude) + ',' + str(neighborhood_longitude)

params= {
    'client_id': input('Client ID?'),
    'client_secret': input('Client SECRET?'),
    'v': '20180323',
    'll': latlong,
    'radius': 500,
    'offset': 0,
    'limit': 50
}

url = 'https://api.foursquare.com/v2/venues/explore'  # HTTPS, so the credentials are not sent in clear text

part1 = requests.get(url, params=params)

params['offset']=50

part2 = requests.get(url, params=params)

Client ID?<redacted>
Client SECRET?<redacted>
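
The two requests above fetch the first 100 results by hand. If more pages are wanted, the offsets can be generated rather than written out. The sketch below only builds the parameter dictionaries and makes no network call. `paged_params` is a hypothetical helper, and the total of 198 is the `totalResults` value reported in the response to the first request.

```python
def paged_params(base_params, total, page_size=50):
    """Yield one copy of base_params per page, with 'offset' and 'limit' set."""
    for offset in range(0, total, page_size):
        params = dict(base_params)  # copy, so the caller's dict is not mutated
        params['offset'] = offset
        params['limit'] = page_size
        yield params


# Credentials are left out on purpose; they would be added at call time
base = {'v': '20180323', 'll': '48.869408125,2.3399715', 'radius': 500}

pages = list(paged_params(base, total=198))
print([p['offset'] for p in pages])  # → [0, 50, 100, 150]
```

Each dictionary could then be passed as `params` to `requests.get`, as in the cell above. Whether the API actually serves every offset is not checked here.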


In [69]:
part1.json()

{'meta': {'code': 200, 'requestId': '5caf63acdd579719753fea89'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Vivienne',
  'headerFullLocation': 'Vivienne, Paris',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 198,
  'suggestedBounds': {'ne': {'lat': 48.8739081295, 'lng': 2.3467999525805046},
   'sw': {'lat': 48.8649081205, 'lng': 2.333143047419495}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bc324a874a9a5936c18d4f6',
       'name': 'Le Moderne',
       'contact': {},
       'location': {'address': '40 rue Notre Dame des Victoires',
        'lat': 48.868856,
        'lng': 2.342142,
        'labeledLatLngs': [{'label': 'display',
          'lat': 48.868

Here we add a helper function to parse the data, so that every time we call the Foursquare API it helps populate our dataset.

In [70]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [74]:
results = part1.json()

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Le Moderne,French Restaurant,48.868856,2.342142
1,A. Noste,Tapas Restaurant,48.869122,2.339138
2,Coinstot Vino,Wine Bar,48.870646,2.341688
3,Les Athlètes,French Restaurant,48.869516,2.339537
4,Workshop Issé,Gourmet Shop,48.868895,2.337066


In [75]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

50 venues were returned by Foursquare.


### Another helper function
Here, the objective is to run the Foursquare `explore` endpoint over every location in the data set.

In [None]:
CLIENT_ID = input('Client ID?')
CLIENT_SECRET = input('Client SECRET?')
VERSION = '20180323'
LIMIT=100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
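
One fragile spot in `getNearbyVenues` is the lookup `v['venue']['categories'][0]['name']`, which raises an `IndexError` for a venue whose category list is empty. A minimal sketch of a more tolerant lookup follows. `first_category_name` is a hypothetical helper, and the two sample items are made up to mimic the shape of the `items` entries in the response.

```python
def first_category_name(item, default=None):
    """Return the name of the venue's first category, or default if it has none."""
    categories = item.get('venue', {}).get('categories') or []
    return categories[0]['name'] if categories else default


# Made-up items shaped like entries of response['groups'][0]['items']
items = [
    {'venue': {'name': 'Venue A', 'categories': [{'name': 'Wine Bar'}]}},
    {'venue': {'name': 'Venue B', 'categories': []}},
]

print([first_category_name(i, default='Unknown') for i in items])
# → ['Wine Bar', 'Unknown']
```

Using a placeholder such as `'Unknown'` keeps the venue in the data set; dropping such venues instead would be an equally defensible choice.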

In [None]:
del CLIENT_ID
del CLIENT_SECRET

### Results

Below are the data sets built from the results of the Foursquare queries.  
They are ready for wrangling: we set aside the venue categories, then summarise each neighborhood by its top 10 venue categories by frequency.

In [77]:
LIMIT = 100

paris_venues = getNearbyVenues(names=dfParis['Borough'],
                                   latitudes=dfParis['Latitude'],
                                   longitudes=dfParis['Longitude']
                                  )

Bourse
Temple
Panthéon
Luxembourg
Palais-Bourbon
Élysée
Opéra
Enclos-St-Laurent
Popincourt
Gobelins
Observatoire
Vaugirard
Passy
Batignolles-Monceau
Butte-Montmartre
Buttes-Chaumont
Louvre
Hôtel-de-Ville
Reuilly
Ménilmontant
Hay Riad


We observe that Hay Riad was handled as well, with 26 venues explored.
The question that remains is:
* Which district of Paris does the Hay Riad output resemble?

In [82]:
paris_venues[paris_venues['Neighborhood'] == 'Hay Riad']#.shape

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1441,Hay Riad,33.958752,-6.87109,Grillade Adil,33.957243,-6.872676,Snack Place
1442,Hay Riad,33.958752,-6.87109,Yoka Sushi,33.957164,-6.872755,Sushi Restaurant
1443,Hay Riad,33.958752,-6.87109,La Grillardière,33.958091,-6.872575,Sandwich Place
1444,Hay Riad,33.958752,-6.87109,Starbucks,33.956639,-6.867852,Coffee Shop
1445,Hay Riad,33.958752,-6.87109,Mahaj Ryad,33.960149,-6.867783,Plaza
1446,Hay Riad,33.958752,-6.87109,Gotham Burger,33.96106,-6.866507,American Restaurant
1447,Hay Riad,33.958752,-6.87109,Twin Apple,33.956526,-6.867844,Café
1448,Hay Riad,33.958752,-6.87109,Sushi Box,33.95989,-6.867686,Sushi Restaurant
1449,Hay Riad,33.958752,-6.87109,NAGA,33.959313,-6.872653,Thai Restaurant
1450,Hay Riad,33.958752,-6.87109,Paul,33.960088,-6.868079,Bakery


In [83]:
paris_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Batignolles-Monceau,59,59,59,59,59,59
Bourse,100,100,100,100,100,100
Butte-Montmartre,58,58,58,58,58,58
Buttes-Chaumont,44,44,44,44,44,44
Enclos-St-Laurent,88,88,88,88,88,88
Gobelins,100,100,100,100,100,100
Hay Riad,26,26,26,26,26,26
Hôtel-de-Ville,100,100,100,100,100,100
Louvre,100,100,100,100,100,100
Luxembourg,100,100,100,100,100,100


In [84]:
print('There are {} unique categories.'.format(len(paris_venues['Venue Category'].unique())))

There are 199 unique categories.


## Analyze each neighbourhood

In [85]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,Bourse,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Bourse,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bourse,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
3,Bourse,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bourse,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [87]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,Batignolles-Monceau,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0
1,Bourse,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0
2,Butte-Montmartre,0.0,0.0,0.0,0.0,0.0,0.017241,0.034483,0.0,0.0,...,0.0,0.034483,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0
3,Buttes-Chaumont,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0
4,Enclos-St-Laurent,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.011364,0.0,0.0,0.0,0.011364,0.022727,0.0,0.0,0.0
5,Gobelins,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0
6,Hay Riad,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hôtel-de-Ville,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,...,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0
8,Louvre,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,...,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0
9,Luxembourg,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.03,0.0,0.0
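
The effect of one-hot encoding followed by a per-neighborhood mean is easier to see on a tiny table. The sketch below uses made-up venues for two placeholder neighborhoods, `A` and `B`. Each resulting row is the share of each category among that neighborhood's venues, so every row sums to 1.

```python
import pandas as pd

# Made-up venues for two placeholder neighborhoods
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Bakery', 'Hotel', 'Bakery', 'Bakery'],
})

# one column per category, one row per venue
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# the mean of 0/1 indicators is the relative frequency of each category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Because the values are shares rather than counts, a neighborhood with 26 venues and one with 100 venues end up on the same scale, although the shares of the smaller one are noisier.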


### Let's display Top 5 venues for each neighbourhood

In [88]:
num_top_venues = 5

for hood in paris_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = paris_grouped[paris_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Batignolles-Monceau----
                venue  freq
0   French Restaurant  0.19
1               Hotel  0.15
2  Italian Restaurant  0.10
3              Bakery  0.08
4                Café  0.05


----Bourse----
                 venue  freq
0    French Restaurant  0.15
1  Japanese Restaurant  0.08
2               Bistro  0.05
3                Hotel  0.05
4             Wine Bar  0.03


----Butte-Montmartre----
                           venue  freq
0                            Bar  0.17
1              French Restaurant  0.14
2                    Coffee Shop  0.05
3  Vegetarian / Vegan Restaurant  0.03
4              Convenience Store  0.03


----Buttes-Chaumont----
                venue  freq
0   French Restaurant  0.11
1                 Bar  0.11
2          Restaurant  0.07
3             Brewery  0.05
4  Seafood Restaurant  0.05


----Enclos-St-Laurent----
               venue  freq
0  French Restaurant  0.17
1        Coffee Shop  0.06
2              Hotel  0.06
3                Bar  

Now let's create the new dataframe and display the top 10 venues for each neighborhood.
The helper function `return_most_common_venues` simplifies the job for us.

In [90]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [116]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Batignolles-Monceau,French Restaurant,Hotel,Italian Restaurant,Bakery,Café,Bistro,Plaza,Bar,Japanese Restaurant,Supermarket
1,Bourse,French Restaurant,Japanese Restaurant,Hotel,Bistro,Bar,Wine Bar,Italian Restaurant,Salad Place,Plaza,Furniture / Home Store
2,Butte-Montmartre,Bar,French Restaurant,Coffee Shop,Pizza Place,Vegetarian / Vegan Restaurant,Bakery,Fast Food Restaurant,Art Gallery,Convenience Store,Bistro
3,Buttes-Chaumont,French Restaurant,Bar,Restaurant,Italian Restaurant,Seafood Restaurant,Japanese Restaurant,Supermarket,Brewery,Metro Station,Bus Stop
4,Enclos-St-Laurent,French Restaurant,Coffee Shop,Hotel,Bar,Thai Restaurant,Indian Restaurant,Japanese Restaurant,Italian Restaurant,Pizza Place,Bistro
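
Several categories often share the same frequency, and `sort_values` does not guarantee the order of tied entries with its default sort algorithm. That may be why Bistro and Hotel swap places for Bourse between the top-5 printout and the table above. A minimal sketch of a ranking with explicit tie-breaking (descending frequency, then alphabetical) follows. `top_categories` is a hypothetical helper and the frequencies are made up.

```python
def top_categories(freqs, n):
    """Return the n most frequent categories; ties are broken alphabetically."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Made-up frequencies for one neighborhood
freqs = {'Hotel': 0.05, 'French Restaurant': 0.15, 'Bistro': 0.05, 'Wine Bar': 0.03}

print(top_categories(freqs, 3))
# → ['French Restaurant', 'Bistro', 'Hotel']
```

For a row of `paris_grouped`, the mapping could be obtained with something like `row.iloc[1:].to_dict()`, assuming the first entry is the neighborhood name as in the helper above.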


#### Cluster neighbourhoods

In [117]:
# set number of clusters
kclusters = 5

paris_grouped_clustering = paris_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 2, 2, 2, 2, 4, 2, 2, 2], dtype=int32)

In [118]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_merged = paris_data = dfParis

# join the sorted venues (with their cluster labels) onto dfParis, which holds latitude/longitude for each neighborhood
paris_merged = paris_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

paris_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75002,Bourse,48.869408,2.339971,2,French Restaurant,Japanese Restaurant,Hotel,Bistro,Bar,Wine Bar,Italian Restaurant,Salad Place,Plaza,Furniture / Home Store
1,75003,Temple,48.862197,2.360322,2,French Restaurant,Art Gallery,Boutique,Coffee Shop,Bistro,Bakery,Café,Burger Joint,Restaurant,Sandwich Place
2,75005,Panthéon,48.845918,2.3516,2,French Restaurant,Bar,Bakery,Plaza,Wine Bar,Italian Restaurant,Hotel,Pub,Coffee Shop,Vietnamese Restaurant
3,75006,Luxembourg,48.851517,2.331608,2,French Restaurant,Hotel,Italian Restaurant,Café,Plaza,Chocolate Shop,Bakery,Tea Room,Pastry Shop,Dessert Shop
4,75007,Palais-Bourbon,48.853572,2.316827,3,French Restaurant,History Museum,Café,Garden,Historic Site,Plaza,Hotel,Italian Restaurant,Japanese Restaurant,Park


Let's visualize the clusters

In [120]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_merged['Latitude'], paris_merged['Longitude'], paris_merged['Borough'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Discussions

### Zooming out and checking the results for Rabat's Hay Riad

Rabat's Hay Riad ended up alone in its own cluster, much like 75012 (Reuilly).
The other arrondissements mostly fall into one of two clusters, examined below. The 7th and 16th arrondissements (Palais-Bourbon and Passy) form a cluster of their own, which makes sense, since they are among the top tourist areas of Paris. Otherwise, Paris is largely differentiated along an east-west axis.
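
Because Hay Riad lands in a cluster of its own, the cluster labels alone do not say which arrondissement it is closest to. One possible follow-up, not performed in this notebook, is to rank the Paris rows of `paris_grouped` by their distance to the Hay Riad row. The sketch below shows the idea on made-up frequency vectors for placeholder areas. `rank_by_distance` is a hypothetical helper, and Euclidean distance is just one reasonable choice.

```python
import math


def rank_by_distance(target, others):
    """Sort the names in `others` by Euclidean distance to `target`."""
    def dist(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(target, vec)))
    return sorted(((name, dist(vec)) for name, vec in others.items()),
                  key=lambda pair: pair[1])


# Made-up category frequencies per area (three categories, each row sums to 1)
test_area = [0.5, 0.4, 0.1]
reference_areas = {
    'Area X': [0.1, 0.6, 0.3],
    'Area Y': [0.4, 0.5, 0.1],
    'Area Z': [0.0, 0.2, 0.8],
}

for name, d in rank_by_distance(test_area, reference_areas):
    print(f'{name}: {d:.3f}')
```

In this toy data `Area Y` comes out nearest. On the real table the ranking would give a graded answer to the similarity question even when k-means isolates the test location.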

In the following section, we display the members of each cluster to see what common elements set it apart from the others.

In [121]:
#Cluster index 0: generic, +zoo area
paris_merged.loc[paris_merged['Cluster Labels'] == 0, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Reuilly,Zoo,Pastry Shop,Cafeteria,Athletics & Sports,Monument / Landmark,Creperie,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [122]:
#Cluster index 1: hotels
paris_merged.loc[paris_merged['Cluster Labels'] == 1, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Élysée,Hotel,French Restaurant,Italian Restaurant,Bakery,Café,Cosmetics Shop,Salad Place,Japanese Restaurant,Art Gallery,Steakhouse
10,Observatoire,Hotel,Bistro,Italian Restaurant,Sushi Restaurant,Bar,French Restaurant,Supermarket,Middle Eastern Restaurant,Bike Rental / Bike Share,Convenience Store
11,Vaugirard,French Restaurant,Hotel,Italian Restaurant,Bakery,Restaurant,Indian Restaurant,Lebanese Restaurant,Bike Rental / Bike Share,Japanese Restaurant,Coffee Shop
13,Batignolles-Monceau,French Restaurant,Hotel,Italian Restaurant,Bakery,Café,Bistro,Plaza,Bar,Japanese Restaurant,Supermarket


In [128]:
#Cluster index 2: restaurants (french)
paris_merged.loc[paris_merged['Cluster Labels'] == 2, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bourse,French Restaurant,Japanese Restaurant,Hotel,Bistro,Bar,Wine Bar,Italian Restaurant,Salad Place,Plaza,Furniture / Home Store
1,Temple,French Restaurant,Art Gallery,Boutique,Coffee Shop,Bistro,Bakery,Café,Burger Joint,Restaurant,Sandwich Place
2,Panthéon,French Restaurant,Bar,Bakery,Plaza,Wine Bar,Italian Restaurant,Hotel,Pub,Coffee Shop,Vietnamese Restaurant
3,Luxembourg,French Restaurant,Hotel,Italian Restaurant,Café,Plaza,Chocolate Shop,Bakery,Tea Room,Pastry Shop,Dessert Shop
6,Opéra,French Restaurant,Hotel,Bistro,Cocktail Bar,Italian Restaurant,Bar,Japanese Restaurant,Café,Lounge,Theater
7,Enclos-St-Laurent,French Restaurant,Coffee Shop,Hotel,Bar,Thai Restaurant,Indian Restaurant,Japanese Restaurant,Italian Restaurant,Pizza Place,Bistro
8,Popincourt,French Restaurant,Bar,Restaurant,Bistro,Pastry Shop,Diner,Cocktail Bar,Italian Restaurant,Japanese Restaurant,Wine Bar
9,Gobelins,French Restaurant,Vietnamese Restaurant,Bar,Thai Restaurant,Bakery,Hotel,Bistro,Japanese Restaurant,Italian Restaurant,Diner
14,Butte-Montmartre,Bar,French Restaurant,Coffee Shop,Pizza Place,Vegetarian / Vegan Restaurant,Bakery,Fast Food Restaurant,Art Gallery,Convenience Store,Bistro
15,Buttes-Chaumont,French Restaurant,Bar,Restaurant,Italian Restaurant,Seafood Restaurant,Japanese Restaurant,Supermarket,Brewery,Metro Station,Bus Stop


In [127]:
#Cluster index 3: Gardens
paris_merged.loc[paris_merged['Cluster Labels'] == 3, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Palais-Bourbon,French Restaurant,History Museum,Café,Garden,Historic Site,Plaza,Hotel,Italian Restaurant,Japanese Restaurant,Park
12,Passy,French Restaurant,Plaza,Italian Restaurant,Garden,Tea Room,Brasserie,Soccer Stadium,Bistro,Bike Rental / Bike Share,Bus Stop


In [126]:
#Cluster index 4: food food food
paris_merged.loc[paris_merged['Cluster Labels'] == 4, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Hay Riad,Café,Ice Cream Shop,Snack Place,Sandwich Place,Italian Restaurant,Sushi Restaurant,American Restaurant,Pizza Place,Coffee Shop,Japanese Restaurant
