<a href="https://colab.research.google.com/github/brainchild-vc/Coursera_Capstone/blob/master/Capstone_Project_Paris.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Paris: clustering neighborhoods by Clothing store category (Week 1)

#### Part 1 - description of the problem and a discussion of the background

Paris is one of the fashion capitals of the world. Many fashion retailers that want to open a boutique in Paris will want to know which location would be best for their specific fashion business and category.

Therefore the goal of this analysis will be to:
- cluster neighborhoods by category of fashion retailers

This analysis will show clusters of specific retail categories. Retailers will be able to see which area is likely to suit a store of a given clothing type, e.g. Women's Clothing, Shoes etc.

#### Part 2 - description of the data and how it will be used to solve the problem

For the goal described above, we will use the following data sources:

1. Foursquare Places API: through Foursquare we will get the names, locations and retail categories of all clothing stores in Paris
2. https://en.wikipedia.org/wiki/Arrondissements_of_Paris : from this page we will extract the names, population density and any other information about the arrondissements of Paris that might be helpful for our analysis

# Paris: clustering neighborhoods by Clothing store category (Week 2)

#### Part 1a - first we create a dataframe from the Wikipedia page on the arrondissements of Paris

In [0]:
import pandas as pd
import numpy as np

In [0]:
# read Wikipedia HTML

import requests
website_url = requests.get('https://en.wikipedia.org/wiki/Arrondissements_of_Paris').text

from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')
# print(soup.prettify())

In [95]:
# read table and assign to dataframe
postal_codes = soup.find('table',{'class':'wikitable sortable'})
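from io import StringIO  # newer pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(postal_codes)))[0]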

df.head()

Unnamed: 0,"Arrondissement (R for Right Bank, L for Left Bank)",Name,Area (km2),Population(March 1999 census),Population(July 2005 estimate),Density (2005)(inhabitants per km2),Peak of population,Mayor
0,1st (Ier) R,Louvre,1.826 km2 (0.705 sq mi),16888,17700,9693,before 1861,Jean-François Legaret (LR)
1,2nd (IIe) R,Bourse,0.992 km2 (0.383 sq mi),19585,20700,20867,before 1861,Jacques Boutault (EELV)
2,3rd (IIIe) R,Temple,1.171 km2 (0.452 sq mi),34248,35100,29974,before 1861,Pierre Aidenbaum (PS)
3,4th (IVe) R,Hôtel-de-Ville,1.601 km2 (0.618 sq mi),30675,28600,17864,before 1861,Ariel Weil (PS)
4,5th (Ve) L,Panthéon,2.541 km2 (0.981 sq mi),58849,60600,23849,1911,Florence Berthout (LR)


# Now we clean the table
1) We need to fill in the zip code for each arrondissement. Since the number of the arrondissement is indicated by the last two digits in most Parisian postal codes (75001 up to 75020), we can just fill in the zip code

2) We will drop columns 0, 6 and 7 since we don't need them for our analysis
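
As a rough illustration of step 1, the mapping from arrondissement number to postal code can be written as a small helper. The function name `arrondissement_to_postal_code` is hypothetical and not part of this notebook. It only covers the standard 750xx codes; the 16th arrondissement also uses 75116, which this sketch ignores.

```python
def arrondissement_to_postal_code(number: int) -> int:
    """Return the standard postal code for a Paris arrondissement (1-20)."""
    if not 1 <= number <= 20:
        raise ValueError("Paris has arrondissements 1 through 20")
    return 75000 + number


# One code per arrondissement, in the same order as the Wikipedia table
postal_codes = [arrondissement_to_postal_code(n) for n in range(1, 21)]
print(postal_codes[0], postal_codes[-1])
```

This produces the same sequence as `range(75001, 75001 + len(df))`, provided the table lists the arrondissements in order from 1 to 20.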

In [96]:
# clean table
# The number of the arrondissement is indicated by the last two digits in most Parisian postal codes (75001 up to 75020)
df['postal_code'] = range(75001, 75001+len(df))

# now we drop all unnecessary columns
df = df.drop(df.columns[[0, 6, 7]], axis = 1)

# and we will rename one of the columns
df = df.rename(columns={'Name': 'Neighborhood'})
df


Unnamed: 0,Neighborhood,Area (km2),Population(March 1999 census),Population(July 2005 estimate),Density (2005)(inhabitants per km2),postal_code
0,Louvre,1.826 km2 (0.705 sq mi),16888,17700,9693,75001
1,Bourse,0.992 km2 (0.383 sq mi),19585,20700,20867,75002
2,Temple,1.171 km2 (0.452 sq mi),34248,35100,29974,75003
3,Hôtel-de-Ville,1.601 km2 (0.618 sq mi),30675,28600,17864,75004
4,Panthéon,2.541 km2 (0.981 sq mi),58849,60600,23849,75005
5,Luxembourg,2.154 km2 (0.832 sq mi),44919,45200,20984,75006
6,Palais-Bourbon,4.088 km2 (1.578 sq mi),56985,55400,13552,75007
7,Élysée,3.881 km2 (1.498 sq mi),39314,38700,9972,75008
8,Opéra,2.179 km2 (0.841 sq mi),55838,58500,26847,75009
9,Entrepôt,2.892 km2 (1.117 sq mi),89612,88800,30705,75010
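
The `Area (km2)` column still holds strings such as `1.826 km2 (0.705 sq mi)`. If a numeric area were needed later, one possible approach is sketched below. The helper `parse_area_km2` is a hypothetical name, and the regular expression simply takes the first `km2` figure in the cell; for cells that contain two figures, which one is appropriate is a choice the analysis would have to make.

```python
import re


def parse_area_km2(text: str) -> float:
    """Extract the first 'N km2' figure from a Wikipedia area cell."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*km2", text)
    if match is None:
        raise ValueError(f"no km2 value found in {text!r}")
    return float(match.group(1))


print(parse_area_km2("1.826 km2 (0.705 sq mi)"))
```

In the notebook this could be applied with something like `df['Area (km2)'].map(parse_area_km2)`.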


# Part 1b - now we add latitude/longitude to our dataframe using pgeocode, which converts our postal codes to geographic coordinates

In [97]:
!pip install pgeocode
import pgeocode



In [98]:
postal = df["postal_code"].astype(str).values.tolist()
nomi = pgeocode.Nominatim('fr')
latlon = nomi.query_postal_code(postal)
latlon = latlon[['postal_code', 'latitude', 'longitude']].copy() # keep only the columns we need
latlon["postal_code"] = latlon["postal_code"].astype(int)
latlon

Unnamed: 0,postal_code,latitude,longitude
0,75001,48.8592,2.34525
1,75002,48.8655,2.3457
2,75003,48.8637,2.35515
3,75004,48.8601,2.34975
4,75005,48.8448,2.34795
5,75006,48.8493,2.3394
6,75007,48.8565,2.3349
7,75008,48.8763,2.33355
8,75009,48.8718,2.34435
9,75010,48.8709,2.35245


# now we merge the list of geo locations with our list of arrondissements:

In [99]:
df = pd.merge(df, latlon, how='left', on='postal_code')
df

Unnamed: 0,Neighborhood,Area (km2),Population(March 1999 census),Population(July 2005 estimate),Density (2005)(inhabitants per km2),postal_code,latitude,longitude
0,Louvre,1.826 km2 (0.705 sq mi),16888,17700,9693,75001,48.8592,2.34525
1,Bourse,0.992 km2 (0.383 sq mi),19585,20700,20867,75002,48.8655,2.3457
2,Temple,1.171 km2 (0.452 sq mi),34248,35100,29974,75003,48.8637,2.35515
3,Hôtel-de-Ville,1.601 km2 (0.618 sq mi),30675,28600,17864,75004,48.8601,2.34975
4,Panthéon,2.541 km2 (0.981 sq mi),58849,60600,23849,75005,48.8448,2.34795
5,Luxembourg,2.154 km2 (0.832 sq mi),44919,45200,20984,75006,48.8493,2.3394
6,Palais-Bourbon,4.088 km2 (1.578 sq mi),56985,55400,13552,75007,48.8565,2.3349
7,Élysée,3.881 km2 (1.498 sq mi),39314,38700,9972,75008,48.8763,2.33355
8,Opéra,2.179 km2 (0.841 sq mi),55838,58500,26847,75009,48.8718,2.34435
9,Entrepôt,2.892 km2 (1.117 sq mi),89612,88800,30705,75010,48.8709,2.35245


# Part 1c - Cluster analysis of Paris arrondissements

In [0]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

## Getting coordinates of Paris

In [101]:
from geopy.geocoders import Nominatim

address = 'Paris, France'

geolocator = Nominatim(user_agent="fr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


## Let's visualise all neighborhoods on a map

In [102]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=13)
# zip = [[df['latitude']], [df['longitude']], [df['Name']]]

# add markers to map 
for lat, lng, label in zip(df['latitude'], df['longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_paris)
    
map_paris

In [0]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# Here we will define the venue category to be only clothing stores. This will ensure that Foursquare API will only return venues that match this category:

In [0]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
category = "4bf58dd8d48988d103951735" # we only want Clothing Stores

# for a list of all categories and their IDs, please see here: https://developer.foursquare.com/docs/build-with-foursquare/categories
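
To make the role of each request parameter easier to see, here is a minimal sketch that assembles the same kind of explore URL from a dictionary instead of one long format string. The helper `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, limit, category_id):
    """Assemble the explore request URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",       # search centre as "lat,lng"
        "radius": radius,           # search radius in metres
        "limit": limit,             # maximum number of venues returned
        "categoryId": category_id,  # restrict results to one category
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Placeholder credentials; the coordinates are those of the 1st arrondissement
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        48.8592, 2.34525, 500, 100,
                        "4bf58dd8d48988d103951735")
print(url)
```

`urlencode` percent-encodes the values, so the comma in `ll` appears as `%2C` in the final URL.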

## We create a function to get name, location and category of all clothing stores in Paris

In [0]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            category)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
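
The function above indexes straight into `["response"]['groups'][0]['items']`, which raises an error if a request returns no groups. A more defensive way to flatten one response might look like the sketch below. The helper `extract_venues` and the sample payloads are hypothetical; the payloads only mimic the nesting that the function above relies on and are not real API replies.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into rows; return [] if nothing usable."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        # Fall back to a None category if the venue has no categories listed
        categories = venue.get("categories") or [{"name": None}]
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
        ))
    return rows


# Hypothetical payload with the same nesting as the one parsed above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Zara",
               "location": {"lat": 48.859719, "lng": 2.343541},
               "categories": [{"name": "Clothing Store"}]}}]}]}}

print(extract_venues(sample, "Louvre", 48.8592, 2.34525))
print(extract_venues({"response": {}}, "Louvre", 48.8592, 2.34525))
```

With this shape, a neighborhood without results simply contributes no rows instead of stopping the whole loop.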

## now we create a DataFrame containing the above data

In [107]:
paris_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

Louvre
Bourse
Temple
Hôtel-de-Ville
Panthéon
Luxembourg
Palais-Bourbon
Élysée
Opéra
Entrepôt
Popincourt
Reuilly
Gobelins
Observatoire
Vaugirard
Passy
Batignolles-Monceau
Butte-Montmartre
Buttes-Chaumont
Ménilmontant


## Let's check the size of the resulting dataframe

In [108]:
print(paris_venues.shape)
paris_venues.head()

(614, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Louvre,48.8592,2.34525,Zara,48.859719,2.343541,Clothing Store
1,Louvre,48.8592,2.34525,L'Exception Concept Store,48.861212,2.346815,Clothing Store
2,Louvre,48.8592,2.34525,Pull&Bear,48.859618,2.344772,Clothing Store
3,Louvre,48.8592,2.34525,GAP,48.858751,2.347513,Clothing Store
4,Louvre,48.8592,2.34525,Mango,48.857864,2.350234,Women's Store
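
As a sanity check on the 500 m search radius, we can estimate how far a returned venue lies from the arrondissement's reference point. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the helper `haversine_m` is a hypothetical name, and this is an independent approximation rather than the way Foursquare itself measures distance. The coordinates are taken from the rows shown above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


louvre = (48.8592, 2.34525)
venues = {"Zara": (48.859719, 2.343541), "Mango": (48.857864, 2.350234)}
for venue, (lat, lng) in venues.items():
    print(venue, round(haversine_m(*louvre, lat, lng)), "m")
```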


## And now we check how many venues were returned for each neighborhood

In [109]:
paris_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Batignolles-Monceau,22,22,22,22,22,22
Bourse,53,53,53,53,53,53
Butte-Montmartre,2,2,2,2,2,2
Buttes-Chaumont,4,4,4,4,4,4
Entrepôt,13,13,13,13,13,13
Gobelins,32,32,32,32,32,32
Hôtel-de-Ville,71,71,71,71,71,71
Louvre,96,96,96,96,96,96
Luxembourg,40,40,40,40,40,40
Ménilmontant,6,6,6,6,6,6


## How many unique clothing store categories can be curated from all the returned venues?

In [110]:
print('There are {} unique categories.'.format(len(paris_venues['Venue Category'].unique())))

There are 21 unique categories.


# Analyze each neighborhood

In [111]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store,Furniture / Home Store,Jewelry Store,Kids Store,Lingerie Store,Luggage Store,Men's Store,Miscellaneous Shop,Optical Shop,Paper / Office Supplies Store,Perfume Shop,Shoe Store,Sporting Goods Shop,Tailor Shop,Women's Store
0,Louvre,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Louvre,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Louvre,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Louvre,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Louvre,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


## examine the new dataframe size

In [112]:
paris_onehot.shape

(614, 22)

## Next, let's group rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category

In [113]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store,Furniture / Home Store,Jewelry Store,Kids Store,Lingerie Store,Luggage Store,Men's Store,Miscellaneous Shop,Optical Shop,Paper / Office Supplies Store,Perfume Shop,Shoe Store,Sporting Goods Shop,Tailor Shop,Women's Store
0,Batignolles-Monceau,0.0,0.0,0.090909,0.0,0.136364,0.272727,0.0,0.0,0.0,0.045455,0.045455,0.0,0.090909,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.227273
1,Bourse,0.018868,0.0,0.0,0.0,0.132075,0.320755,0.0,0.0,0.0,0.075472,0.0,0.0,0.113208,0.0,0.018868,0.0,0.0,0.113208,0.0,0.018868,0.188679
2,Butte-Montmartre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0
3,Buttes-Chaumont,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,Entrepôt,0.076923,0.0,0.0,0.0,0.153846,0.307692,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.307692
5,Gobelins,0.0,0.0,0.0,0.03125,0.0625,0.25,0.0,0.0,0.0,0.15625,0.09375,0.0,0.15625,0.0,0.03125,0.0,0.0,0.1875,0.0,0.0,0.03125
6,Hôtel-de-Ville,0.014085,0.014085,0.0,0.0,0.140845,0.352113,0.0,0.0,0.014085,0.042254,0.028169,0.0,0.126761,0.0,0.014085,0.0,0.0,0.183099,0.0,0.0,0.070423
7,Louvre,0.03125,0.010417,0.0,0.0,0.072917,0.3125,0.0,0.0,0.0,0.0625,0.072917,0.0,0.09375,0.0,0.010417,0.0,0.0,0.21875,0.010417,0.010417,0.09375
8,Luxembourg,0.025,0.0,0.0,0.025,0.05,0.325,0.0,0.0,0.0,0.025,0.025,0.0,0.225,0.0,0.025,0.0,0.025,0.125,0.0,0.0,0.125
9,Ménilmontant,0.0,0.0,0.0,0.0,0.0,0.833333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0


## Let's confirm the new size

In [114]:
paris_grouped.shape

(20, 22)
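
To see why the mean of a one-hot column equals a category's share of venues, here is a library-free sketch that counts categories per neighborhood and divides by the total. It uses only the first five rows of `paris_venues` shown earlier, so the resulting numbers are illustrative and differ from the full Louvre row in the table above.

```python
from collections import Counter, defaultdict

# (neighborhood, category) pairs: the first five rows of paris_venues
rows = [
    ("Louvre", "Clothing Store"),
    ("Louvre", "Clothing Store"),
    ("Louvre", "Clothing Store"),
    ("Louvre", "Clothing Store"),
    ("Louvre", "Women's Store"),
]

# Count how often each category occurs in each neighborhood
counts = defaultdict(Counter)
for neighborhood, category in rows:
    counts[neighborhood][category] += 1

# Divide by the number of venues to get the share per category
frequencies = {
    hood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for hood, c in counts.items()
}
print(frequencies)
```

Each neighborhood's frequencies sum to 1, which is also true of every row in `paris_grouped`.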

## Let's print each neighborhood along with the top 5 most common venues

In [115]:
num_top_venues = 5

for hood in paris_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = paris_grouped[paris_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Batignolles-Monceau----
                 venue  freq
0       Clothing Store  0.27
1        Women's Store  0.23
2             Boutique  0.14
3  Arts & Crafts Store  0.09
4          Men's Store  0.09


----Bourse----
            venue  freq
0  Clothing Store  0.32
1   Women's Store  0.19
2        Boutique  0.13
3      Shoe Store  0.11
4     Men's Store  0.11


----Butte-Montmartre----
               venue  freq
0       Optical Shop   0.5
1         Kids Store   0.5
2  Accessories Store   0.0
3      Luggage Store   0.0
4        Tailor Shop   0.0


----Buttes-Chaumont----
                           venue  freq
0                 Clothing Store  0.50
1  Paper / Office Supplies Store  0.25
2                     Kids Store  0.25
3              Accessories Store  0.00
4                    Men's Store  0.00


----Entrepôt----
               venue  freq
0      Women's Store  0.31
1     Clothing Store  0.31
2           Boutique  0.15
3        Men's Store  0.15
4  Accessories Store  0.08


----G

## Let's put that into a *pandas* dataframe with top 10 venues for each neighborhood

In [0]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [119]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Batignolles-Monceau,Clothing Store,Women's Store,Boutique,Men's Store,Arts & Crafts Store,Kids Store,Tailor Shop,Lingerie Store,Optical Shop,Miscellaneous Shop
1,Bourse,Clothing Store,Women's Store,Boutique,Men's Store,Shoe Store,Kids Store,Tailor Shop,Accessories Store,Optical Shop,Miscellaneous Shop
2,Butte-Montmartre,Kids Store,Optical Shop,Women's Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store,Furniture / Home Store
3,Buttes-Chaumont,Clothing Store,Kids Store,Paper / Office Supplies Store,Women's Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Department Store,Furniture / Home Store
4,Entrepôt,Women's Store,Clothing Store,Men's Store,Boutique,Accessories Store,Paper / Office Supplies Store,Optical Shop,Miscellaneous Shop,Perfume Shop,Luggage Store
5,Gobelins,Clothing Store,Shoe Store,Men's Store,Kids Store,Lingerie Store,Boutique,Baby Store,Women's Store,Optical Shop,Miscellaneous Shop
6,Hôtel-de-Ville,Clothing Store,Shoe Store,Boutique,Men's Store,Women's Store,Kids Store,Lingerie Store,Optical Shop,Jewelry Store,Adult Boutique
7,Louvre,Clothing Store,Shoe Store,Women's Store,Men's Store,Boutique,Lingerie Store,Kids Store,Accessories Store,Optical Shop,Tailor Shop
8,Luxembourg,Clothing Store,Men's Store,Women's Store,Shoe Store,Boutique,Baby Store,Kids Store,Lingerie Store,Optical Shop,Perfume Shop
9,Ménilmontant,Clothing Store,Shoe Store,Women's Store,Kids Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Department Store,Furniture / Home Store
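
Note that the table above always lists ten categories, even for neighborhoods with only a few venues; Butte-Montmartre, for example, has just two categories with a non-zero frequency, so the remaining eight entries are fillers with frequency 0. One possible way to keep only categories that actually occur is sketched below. The helper `top_categories` is hypothetical, and the dictionary holds a shortened version of the Butte-Montmartre row.

```python
def top_categories(freqs, n=10):
    """Return up to n category names with non-zero frequency, most common first."""
    present = [(cat, f) for cat, f in freqs.items() if f > 0]
    # Sort by descending frequency; ties are broken alphabetically
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:n]]


butte_montmartre = {"Accessories Store": 0.0, "Kids Store": 0.5,
                    "Optical Shop": 0.5, "Women's Store": 0.0}
print(top_categories(butte_montmartre))
```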


# 4. Cluster neighborhoods into 5 clusters

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [120]:
# set number of clusters
kclusters = 5

paris_grouped_clustering = paris_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 2, 0, 0, 0, 0, 0, 0, 3], dtype=int32)
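
Before interpreting the clusters, it can help to check how many neighborhoods fall into each one, since very small clusters are easy to over-read. The sketch below counts only the ten labels printed above, so it is a partial picture; in the notebook the full list could be obtained with `kmeans.labels_.tolist()`.

```python
from collections import Counter

# First ten labels as printed above; the full kmeans.labels_ has 20 entries
labels = [0, 0, 2, 0, 0, 0, 0, 0, 0, 3]

sizes = Counter(labels)
for cluster, size in sorted(sizes.items()):
    print(f"cluster {cluster}: {size} neighborhood(s)")
```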

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [121]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_merged = df

# merge the cluster labels and top venues into df to add latitude/longitude for each neighborhood
paris_merged = paris_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

paris_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Area (km2),Population(March 1999 census),Population(July 2005 estimate),Density (2005)(inhabitants per km2),postal_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Louvre,1.826 km2 (0.705 sq mi),16888,17700,9693,75001,48.8592,2.34525,0,Clothing Store,Shoe Store,Women's Store,Men's Store,Boutique,Lingerie Store,Kids Store,Accessories Store,Optical Shop,Tailor Shop
1,Bourse,0.992 km2 (0.383 sq mi),19585,20700,20867,75002,48.8655,2.3457,0,Clothing Store,Women's Store,Boutique,Men's Store,Shoe Store,Kids Store,Tailor Shop,Accessories Store,Optical Shop,Miscellaneous Shop
2,Temple,1.171 km2 (0.452 sq mi),34248,35100,29974,75003,48.8637,2.35515,0,Clothing Store,Men's Store,Shoe Store,Women's Store,Boutique,Furniture / Home Store,Optical Shop,Kids Store,Jewelry Store,Adult Boutique
3,Hôtel-de-Ville,1.601 km2 (0.618 sq mi),30675,28600,17864,75004,48.8601,2.34975,0,Clothing Store,Shoe Store,Boutique,Men's Store,Women's Store,Kids Store,Lingerie Store,Optical Shop,Jewelry Store,Adult Boutique
4,Panthéon,2.541 km2 (0.981 sq mi),58849,60600,23849,75005,48.8448,2.34795,1,Shoe Store,Men's Store,Women's Store,Jewelry Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store


## Finally, let's visualize the resulting clusters

In [122]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_merged['latitude'], paris_merged['longitude'], paris_merged['Neighborhood'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# 5. Examine clusters

## Cluster 1: this cluster would generally be the best choice for fashion retailers to open a store in Paris

In [128]:
paris_merged.loc[paris_merged['Cluster Labels'] == 0, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Area (km2),postal_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1.826 km2 (0.705 sq mi),75001,48.8592,2.34525,0,Clothing Store,Shoe Store,Women's Store,Men's Store,Boutique,Lingerie Store,Kids Store,Accessories Store,Optical Shop,Tailor Shop
1,0.992 km2 (0.383 sq mi),75002,48.8655,2.3457,0,Clothing Store,Women's Store,Boutique,Men's Store,Shoe Store,Kids Store,Tailor Shop,Accessories Store,Optical Shop,Miscellaneous Shop
2,1.171 km2 (0.452 sq mi),75003,48.8637,2.35515,0,Clothing Store,Men's Store,Shoe Store,Women's Store,Boutique,Furniture / Home Store,Optical Shop,Kids Store,Jewelry Store,Adult Boutique
3,1.601 km2 (0.618 sq mi),75004,48.8601,2.34975,0,Clothing Store,Shoe Store,Boutique,Men's Store,Women's Store,Kids Store,Lingerie Store,Optical Shop,Jewelry Store,Adult Boutique
5,2.154 km2 (0.832 sq mi),75006,48.8493,2.3394,0,Clothing Store,Men's Store,Women's Store,Shoe Store,Boutique,Baby Store,Kids Store,Lingerie Store,Optical Shop,Perfume Shop
6,4.088 km2 (1.578 sq mi),75007,48.8565,2.3349,0,Clothing Store,Boutique,Women's Store,Men's Store,Accessories Store,Kids Store,Shoe Store,Baby Store,Paper / Office Supplies Store,Optical Shop
7,3.881 km2 (1.498 sq mi),75008,48.8763,2.33355,0,Clothing Store,Women's Store,Boutique,Accessories Store,Shoe Store,Kids Store,Men's Store,Lingerie Store,Arts & Crafts Store,Sporting Goods Shop
8,2.179 km2 (0.841 sq mi),75009,48.8718,2.34435,0,Boutique,Shoe Store,Men's Store,Women's Store,Clothing Store,Optical Shop,Miscellaneous Shop,Tailor Shop,Furniture / Home Store,Adult Boutique
9,2.892 km2 (1.117 sq mi),75010,48.8709,2.35245,0,Women's Store,Clothing Store,Men's Store,Boutique,Accessories Store,Paper / Office Supplies Store,Optical Shop,Miscellaneous Shop,Perfume Shop,Luggage Store
10,3.666 km2 (1.415 sq mi),75011,48.8574,2.36415,0,Clothing Store,Boutique,Women's Store,Shoe Store,Men's Store,Kids Store,Baby Store,Lingerie Store,Sporting Goods Shop,Miscellaneous Shop


## Cluster 2: this cluster would be great for opening a shoe store

In [130]:
paris_merged.loc[paris_merged['Cluster Labels'] == 1, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Area (km2),postal_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,2.541 km2 (0.981 sq mi),75005,48.8448,2.34795,1,Shoe Store,Men's Store,Women's Store,Jewelry Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store
15,16.305 km2 (6.295 sq mi)³7.846 km2 (3.029 sq mi)4,75016,48.8637,2.31285,1,Kids Store,Shoe Store,Women's Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store,Furniture / Home Store


## Cluster 3: we can see that this cluster would be great to open clothing stores for kids

In [131]:
paris_merged.loc[paris_merged['Cluster Labels'] == 2, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Area (km2),postal_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,6.005 km2 (2.319 sq mi),75018,48.8925,2.3466,2,Kids Store,Optical Shop,Women's Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store,Furniture / Home Store


## Cluster 4

In [132]:
paris_merged.loc[paris_merged['Cluster Labels'] == 3, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Area (km2),postal_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,5.984 km2 (2.310 sq mi),75020,48.8646,2.3736,3,Clothing Store,Shoe Store,Women's Store,Kids Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Department Store,Furniture / Home Store


## Cluster 5

In [133]:
paris_merged.loc[paris_merged['Cluster Labels'] == 4, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Area (km2),postal_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,5.621 km2 (2.170 sq mi),75014,48.8331,2.3376,4,Women's Store,Accessories Store,Sporting Goods Shop,Jewelry Store,Adult Boutique,Arts & Crafts Store,Baby Store,Boutique,Clothing Store,Department Store
