<a href="https://colab.research.google.com/github/TharindaDilshan/Coursera_Capstone/blob/main/Capstone%20Project/Paris_Tour_Guide.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Paris Tour Guide**

## Exploring the City of Light


---


### Introduction
Paris, often called the City of Light, is a popular destination for all types of tourists. The city attracts millions of visitors every year and features grand monuments such as the Arc de Triomphe and the Eiffel Tower. It has a romantic charm and offers a wide range of activities to try.

### Business Problem
The aim of this project is to help tourists explore Paris based on the experiences its neighborhoods have to offer. The project could later be generalized so that tourists can get similar details about other destinations.

### Data Description
The neighborhoods and boroughs of Paris are derived from postal code data; venues are then retrieved for each neighborhood.

Data source: https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e

The above source contains data related to all the neighborhoods in France. For the purpose of this project, only neighborhoods in Paris will be considered for now.

The source returns a JSON file whose records contain, among others, the following fields:

* postal_code : Postal code
* nom_comm : Name of the commune (for Paris, the arrondissement), used here as the neighborhood
* nom_dept : Name of the department, used here as the borough (town)
* geo_point_2d : Latitude and longitude pair of the neighborhood
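
As a rough illustration of where these four fields sit inside one record, the sketch below pulls them out of a single hand-written record. The values mirror the first row of the dataset printed later in this notebook, trimmed to the relevant keys; the helper name `extract_fields` is made up for this example.

```python
# One record, trimmed to the keys used in this project
# (values copied from the first row printed later in the notebook).
record = {
    "datasetid": "correspondances-code-insee-code-postal",
    "fields": {
        "postal_code": "91370",
        "nom_comm": "VERRIERES-LE-BUISSON",
        "nom_dept": "ESSONNE",
        "geo_point_2d": [48.750443119964764, 2.251712972144151],
    },
}

WANTED = ("postal_code", "nom_comm", "nom_dept", "geo_point_2d")

def extract_fields(rec):
    """Hypothetical helper: keep only the wanted keys of a record's `fields` dict."""
    fields = rec["fields"]
    return {key: fields[key] for key in WANTED}

row = extract_fields(record)
latitude, longitude = row["geo_point_2d"]  # stored as [latitude, longitude]
print(row["nom_comm"], latitude, longitude)
```

The nested `fields` dictionary is the part that the preprocessing step later expands into dataframe columns.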

### Foursquare API Usage
The Foursquare API is used to retrieve venue data for each neighborhood. For each neighborhood, venues and tourist attractions within a given radius (500 m by default in the implementation below) are identified through the API.

The final dataframe built from the Foursquare responses contains the following columns:

* Neighbourhood
* Neighbourhood latitude and longitude
* Name of the venue
* Venue category
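
To make the shape of that dataframe concrete, here is a minimal sketch that flattens one response into rows. The sample payload is invented and only reproduces the nesting this notebook reads (`response` → `groups[0]` → `items` → `venue`); `flatten_venues` is a hypothetical helper, not part of any Foursquare client.

```python
# Invented sample shaped like the part of the response this notebook reads.
sample = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Farine & O", "categories": [{"name": "Bakery"}]}},
                    {"venue": {"name": "Place Saint-Georges", "categories": [{"name": "Plaza"}]}},
                ]
            }
        ]
    }
}

def flatten_venues(neighbourhood, lat, lng, payload):
    """Hypothetical helper: one (neighbourhood, lat, lng, venue, category) tuple per venue."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue["categories"]
        # Tolerate venues that come back without any category.
        category = categories[0]["name"] if categories else None
        rows.append((neighbourhood, lat, lng, venue["name"], category))
    return rows

rows = flatten_venues("PARIS-9E-ARRONDISSEMENT", 48.876896, 2.33746, sample)
for r in rows:
    print(r)
```

Each tuple becomes one row, so a neighborhood with many venues contributes many rows that all share the same neighborhood name and coordinates.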

### Implementation


Importing Python Libraries

In [13]:
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
import folium
from sklearn.cluster import KMeans

In [17]:
!pip install geocoder


Read the France data using pandas (we are only interested in the Paris data)

In [4]:
!wget -q -O 'france.json' https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e

france_data = pd.read_json('france.json')
france_data.head()

Unnamed: 0,datasetid,recordid,fields,geometry,record_timestamp
0,correspondances-code-insee-code-postal,2bf36b38314b6c39dfbcd09225f97fa532b1fc45,"{'code_comm': '645', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.2517129721...",2016-09-21T00:29:06.175+02:00
1,correspondances-code-insee-code-postal,7ee82e74e059b443df18bb79fc5a19b1f05e5a88,"{'code_comm': '133', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [3.0529405055...",2016-09-21T00:29:06.175+02:00
2,correspondances-code-insee-code-postal,e2cd3186f07286705ed482a10b6aebd9de633c81,"{'code_comm': '378', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.1971816504...",2016-09-21T00:29:06.175+02:00
3,correspondances-code-insee-code-postal,868bf03527a1d0a9defe5cf4e6fa0a730d725699,"{'code_comm': '243', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [2.7097808131...",2016-09-21T00:29:06.175+02:00
4,correspondances-code-insee-code-postal,1bbcee92101fdb50f5f5fceb052681f2421ff961,"{'code_comm': '414', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [3.2582355268...",2016-09-21T00:29:06.175+02:00


Create a dataframe by preprocessing data

In [5]:
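# Each entry of the `fields` column is a dict; expand the dicts into columns.
# DataFrame.append was removed in pandas 2.0, so the frame is built in one step.
# Note: column order follows the dict keys and may differ from the output below.
france_dataframe = pd.DataFrame(france_data['fields'].tolist())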

france_dataframe.head()

Unnamed: 0,code_arr,code_cant,code_comm,code_dept,code_reg,geo_point_2d,geo_shape,id_geofla,insee_com,nom_comm,nom_dept,nom_region,population,postal_code,statut,superficie,z_moyen
0,3,3,645,91,11,"[48.750443119964764, 2.251712972144151]","{'type': 'Polygon', 'coordinates': [[[2.238024...",16275,91645,VERRIERES-LE-BUISSON,ESSONNE,ILE-DE-FRANCE,15.5,91370,Commune simple,999.0,121.0
1,3,20,133,77,11,"[48.41256065214989, 3.052940505560729]","{'type': 'Polygon', 'coordinates': [[[3.076046...",31428,77133,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,ILE-DE-FRANCE,0.2,77126,Commune simple,1082.0,88.0
2,1,9,378,91,11,"[48.52726809075556, 2.19718165044305]","{'type': 'Polygon', 'coordinates': [[[2.203466...",30975,91378,MAUCHAMPS,ESSONNE,ILE-DE-FRANCE,0.3,91730,Commune simple,313.0,150.0
3,5,14,243,77,11,"[48.87307018579678, 2.7097808131278462]","{'type': 'Polygon', 'coordinates': [[[2.727542...",17000,77243,LAGNY-SUR-MARNE,SEINE-ET-MARNE,ILE-DE-FRANCE,20.2,77400,Chef-lieu canton,579.0,71.0
4,3,25,414,77,11,"[48.62891464105825, 3.2582355268439223]","{'type': 'Polygon', 'coordinates': [[[3.294591...",34949,77414,SAINT-HILLIERS,SEINE-ET-MARNE,ILE-DE-FRANCE,0.4,77160,Commune simple,1907.0,158.0


Select only the columns mentioned in the Data Description

In [6]:
df = france_dataframe[['postal_code','nom_comm','nom_dept','geo_point_2d']]

Filter Paris data from the France dataframe

In [8]:
df = df[df['nom_dept'].str.contains('PARIS')].reset_index(drop=True)
df.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,"[48.87689616237872, 2.337460241388529]"
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,"[48.86790337886785, 2.344107166658533]"
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,"[48.85941549762748, 2.378741060237548]"
3,75008,PARIS-8E-ARRONDISSEMENT,PARIS,"[48.87252726662346, 2.312582560420059]"
4,75013,PARIS-13E-ARRONDISSEMENT,PARIS,"[48.82871768452136, 2.362468228516128]"


Modify the Paris dataframe to split latitude and longitude into two columns

In [11]:
lat_lng = df['geo_point_2d'].astype('str')

# Process latitudes
lat = lat_lng.apply(lambda x: x.split(',')[0])
lat = lat.apply(lambda x: x.lstrip('['))

df_lat  = pd.DataFrame(lat.astype(float))
df_lat.columns=['Latitude']

# Process longitudes
lng = lat_lng.apply(lambda x: x.split(',')[1])
lng = lng.apply(lambda x: x.rstrip(']'))

df_lng = pd.DataFrame(lng.astype(float))
df_lng.columns=['Longitude']

# Combine columns
df = pd.concat([df.drop('geo_point_2d', axis=1), df_lat, df_lng], axis=1)
df.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,Latitude,Longitude
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,48.876896,2.33746
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,48.867903,2.344107
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,48.859415,2.378741
3,75008,PARIS-8E-ARRONDISSEMENT,PARIS,48.872527,2.312583
4,75013,PARIS-13E-ARRONDISSEMENT,PARIS,48.828718,2.362468


Visualize the neighborhoods of Paris

In [28]:
import geocoder

# paris = geocode(address='Paris, France, FR')[0]
paris = geocoder.arcgis('Paris, France, FR')
paris_lat = paris.json['lat']
paris_lng = paris.json['lng']

paris_map = folium.Map(location=[paris_lat, paris_lng], zoom_start=12)

# Add one marker per neighborhood to the map
for latitude, longitude, borough, town in zip(df['Latitude'], df['Longitude'], df['nom_comm'], df['nom_dept']):
    label = '{}, {}'.format(town, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_opacity=0.8
        ).add_to(paris_map)

# Display the map (named `paris_map` to avoid shadowing the Python builtin `map`)
paris_map

Configure the Foursquare API 

In [29]:
import os
# Credentials are read from environment variables (placeholder names) instead of being hard-coded
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20190101'

Implement a function to fetch nearby venues from the Foursquare API

In [30]:
LIMIT = 100

def fetchNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` metres of each neighborhood."""
    url = 'https://api.foursquare.com/v2/venues/explore'

    venues = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # Query parameters (requests handles the URL encoding)
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # GET request; fail early on HTTP errors
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # Append one tuple per venue to the venue list
        for v in results:
            categories = v['venue']['categories']
            venues.append((
                name,
                lat,
                lng,
                v['venue']['name'],
                # Some venues may have no category; avoid an IndexError
                categories[0]['name'] if categories else None))

    # Create dataframe
    nearby_venues = pd.DataFrame(venues, columns=['Neighbourhood',
                  'Latitude',
                  'Longitude',
                  'Venue',
                  'Venue Category'])

    return nearby_venues

Fetch nearby venues in each neighborhood of Paris

In [31]:
nearby_venues = fetchNearbyVenues(df['nom_comm'], df['Latitude'], df['Longitude'])
nearby_venues.head()

Unnamed: 0,Neighbourhood,Latitude,Longitude,Venue,Venue Category
0,PARIS-9E-ARRONDISSEMENT,48.876896,2.33746,Farine & O,Bakery
1,PARIS-9E-ARRONDISSEMENT,48.876896,2.33746,RAP,Gourmet Shop
2,PARIS-9E-ARRONDISSEMENT,48.876896,2.33746,Place Saint-Georges,Plaza
3,PARIS-9E-ARRONDISSEMENT,48.876896,2.33746,Le Bouclier de Bacchus,Wine Bar
4,PARIS-9E-ARRONDISSEMENT,48.876896,2.33746,La Compagnie du Café,Café


Explore the nearby venues

In [32]:
nearby_venues.shape

(1269, 5)

In [33]:
nearby_venues.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Latitude,Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Afghan Restaurant,PARIS-11E-ARRONDISSEMENT,48.859415,2.378741,Afghanistan
African Restaurant,PARIS-9E-ARRONDISSEMENT,48.876896,2.361113,Wally Le Saharien
American Restaurant,PARIS-19E-ARRONDISSEMENT,48.892735,2.384694,Harper's
Antique Shop,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,Hôtel des Ventes Drouot
Argentinian Restaurant,PARIS-3E-ARRONDISSEMENT,48.863054,2.359361,Anahi
...,...,...,...,...
Wine Bar,PARIS-9E-ARRONDISSEMENT,48.892735,2.400820,Vingt Vins d'Art
Wine Shop,PARIS-3E-ARRONDISSEMENT,48.886869,2.400820,Trois Fois Vin
Women's Store,PARIS-2E-ARRONDISSEMENT,48.867903,2.344107,L'Appartement Sézane
Zoo,PARIS-12E-ARRONDISSEMENT,48.835156,2.419807,Parc zoologique de Paris


Encode venue categories for analysis

In [34]:
encoded = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")
encoded.head()

Unnamed: 0,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auvergne Restaurant,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Cambodian Restaurant,Canal,Candy Store,...,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soba Restaurant,South American Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
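
As a small illustration of what the encoding step produces, the sketch below builds the same kind of 0/1 table in plain Python for the categories of the first five venues listed earlier. Sorting the column names is a choice made here for readability; the variable names are illustrative only.

```python
# Venue categories of the first five rows of the venue table above.
categories = ["Bakery", "Gourmet Shop", "Plaza", "Wine Bar", "Café"]

# One column per distinct category (sorted here for readability).
columns = sorted(set(categories))

# One row per venue: 1 in the column of its own category, 0 elsewhere.
one_hot = [[1 if cat == col else 0 for col in columns] for cat in categories]

print(columns)
for row in one_hot:
    print(row)
```

Every row contains exactly one 1, which is why averaging these columns per neighborhood later yields category shares.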


Add the neighborhood column to the encoded dataframe

In [35]:
encoded['Neighbourhood'] = nearby_venues['Neighbourhood'] 

encoded_updates = [encoded.columns[-1]] + list(encoded.columns[:-1])
encoded = encoded[encoded_updates]

encoded.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auvergne Restaurant,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Cambodian Restaurant,Canal,...,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soba Restaurant,South American Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Calculate the mean frequency of each venue category in each neighborhood

In [36]:
df_paris = encoded.groupby('Neighbourhood').mean().reset_index()
df_paris.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auvergne Restaurant,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basque Restaurant,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boat or Ferry,Bookstore,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Cambodian Restaurant,Canal,...,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soba Restaurant,South American Restaurant,Southwestern French Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,PARIS-10E-ARRONDISSEMENT,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.07,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0
1,PARIS-11E-ARRONDISSEMENT,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.046512,0.0,0.0,0.0,0.0,0.046512,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.069767,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.046512,0.046512,0.0,0.0,0.0,0.0
2,PARIS-12E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2
3,PARIS-13E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.20339,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.101695,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.220339,0.0,0.0,0.0,0.0,0.0
4,PARIS-14E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
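
The mean of a 0/1 column within a group is simply the share of that group's venues falling in the category; a value of 0.2, as in the PARIS-12E row, means one venue in five. The sketch below computes the same shares in plain Python on invented data (neighborhoods "A" and "B" are placeholders).

```python
from collections import Counter, defaultdict

# Invented (neighbourhood, category) pairs standing in for the venue table.
venues = [
    ("A", "Bakery"), ("A", "Bakery"), ("A", "Café"), ("A", "Plaza"),
    ("B", "Zoo"), ("B", "Bistro"),
]

# Count categories separately for each neighbourhood.
counts = defaultdict(Counter)
for neighbourhood, category in venues:
    counts[neighbourhood][category] += 1

all_categories = sorted({category for _, category in venues})

# Share of each category among a neighbourhood's venues (0.0 when absent).
frequencies = {
    n: {c: counter[c] / sum(counter.values()) for c in all_categories}
    for n, counter in counts.items()
}
print(frequencies["A"])
print(frequencies["B"])
```

Because each row sums to 1, neighborhoods with very different venue counts become directly comparable.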


Analyze the top venue categories
For each neighborhood, venue categories are sorted from most to least common.

In [39]:
venue_count = 10

# Dataframe columns: 'Neighbourhood' followed by 'Venue 1' ... 'Venue N'
columns = ['Neighbourhood'] + ['Venue {}'.format(index + 1) for index in range(venue_count)]

def getTopVenues(row, venue_count):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:venue_count]

# create a new dataframe for Paris
top_venues_in_neighborhood = pd.DataFrame(columns=columns)
top_venues_in_neighborhood['Neighbourhood'] = df_paris['Neighbourhood']

for index in np.arange(df_paris.shape[0]):
    top_venues_in_neighborhood.iloc[index, 1:] = getTopVenues(df_paris.iloc[index, :], venue_count)

top_venues_in_neighborhood.head()

Unnamed: 0,Neighbourhood,Venue 1,Venue 2,Venue 3,Venue 4,Venue 5,Venue 6,Venue 7,Venue 8,Venue 9,Venue 10
0,PARIS-10E-ARRONDISSEMENT,French Restaurant,Bistro,Café,Hotel,Coffee Shop,Japanese Restaurant,Indian Restaurant,Pizza Place,Asian Restaurant,Bar
1,PARIS-11E-ARRONDISSEMENT,Café,Restaurant,Asian Restaurant,Pastry Shop,Italian Restaurant,Wine Bar,Vietnamese Restaurant,French Restaurant,Bakery,Sandwich Place
2,PARIS-12E-ARRONDISSEMENT,Zoo Exhibit,Bistro,Monument / Landmark,Supermarket,Zoo,Argentinian Restaurant,Frozen Yogurt Shop,Fountain,Food Court,Food & Drink Shop
3,PARIS-13E-ARRONDISSEMENT,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Juice Bar,Park,Coffee Shop,Creperie,Plaza
4,PARIS-14E-ARRONDISSEMENT,French Restaurant,Hotel,Bakery,Plaza,Supermarket,Tea Room,Pizza Place,Italian Restaurant,Japanese Restaurant,Brasserie
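
The ranking itself amounts to sorting one row of frequencies in descending order and keeping the first few names. A minimal sketch, using a handful of values from the PARIS-13E row above plus one zero-frequency category; `top_categories` is a hypothetical helper written for this example.

```python
def top_categories(freqs, n):
    """Hypothetical helper: the n category names with the highest frequency."""
    # sorted() is stable, so tied categories keep the dict's insertion order.
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return ranked[:n]

# A few frequencies from the PARIS-13E row, plus one category that is absent there.
freqs = {
    "Asian Restaurant": 0.20339,
    "Bakery": 0.016949,
    "Thai Restaurant": 0.101695,
    "Vietnamese Restaurant": 0.220339,
    "Zoo": 0.0,
}

print(top_categories(freqs, 3))
print(top_categories(freqs, 5))
```

Asking for more slots than there are non-zero categories fills the remainder with zero-frequency names. This happens in the table above: PARIS-12E lists Argentinian Restaurant as Venue 6 although its frequency in that row is 0.0, so the lower ranks of sparse neighborhoods should be read with care.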


### Cluster Analysis using K-means

Cluster Paris into 5 clusters

In [41]:
k = 5

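# Keep only the numeric frequency columns (the positional `axis` argument was removed in pandas 2.0)
clusters = df_paris.drop(columns='Neighbourhood')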

paris_k = KMeans(n_clusters=k, random_state=0).fit(clusters)
print(paris_k)
print("Cluster labels: ", paris_k.labels_)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)
Cluster labels:  [0 0 4 2 1 0 3 1 0 0 0 0 0 0 0 0 0 1 1 0]
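
For intuition about what the clustering step does, here is a bare-bones sketch of the basic K-means loop (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation: as the printed parameters show, `KMeans` uses `k-means++` initialisation and several restarts, whereas this sketch starts from centroids supplied by hand. The points are invented two-dimensional frequency vectors.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on lists of floats; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved: converged
        centroids = new_centroids
    return labels, centroids

# Invented frequency vectors: two restaurant-heavy and two park-heavy neighbourhoods.
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centers = kmeans(points, centroids=[[1.0, 0.0], [0.0, 1.0]])
print(labels)
print(centers)
```

Neighborhoods whose category shares look alike end up with the same label, which is the grouping used in the cluster tables of this notebook.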


Insert cluster label into dataframe to construct the complete dataframe

In [42]:
top_venues_in_neighborhood.insert(0, 'Cluster Labels', paris_k.labels_ +1)

paris_data = df
paris_data = paris_data.join(top_venues_in_neighborhood.set_index('Neighbourhood'), on='nom_comm')

paris_data.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,Latitude,Longitude,Cluster Labels,Venue 1,Venue 2,Venue 3,Venue 4,Venue 5,Venue 6,Venue 7,Venue 8,Venue 9,Venue 10
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,48.876896,2.33746,1,French Restaurant,Hotel,Bistro,Japanese Restaurant,Restaurant,Wine Bar,Cocktail Bar,Lounge,Bakery,Gym / Fitness Center
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,48.867903,2.344107,1,French Restaurant,Cocktail Bar,Wine Bar,Bakery,Salad Place,Plaza,Coffee Shop,Hotel,Creperie,Italian Restaurant
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,48.859415,2.378741,1,Café,Restaurant,Asian Restaurant,Pastry Shop,Italian Restaurant,Wine Bar,Vietnamese Restaurant,French Restaurant,Bakery,Sandwich Place
3,75008,PARIS-8E-ARRONDISSEMENT,PARIS,48.872527,2.312583,2,French Restaurant,Hotel,Spa,Corsican Restaurant,Art Gallery,Cocktail Bar,Theater,Plaza,Resort,Park
4,75013,PARIS-13E-ARRONDISSEMENT,PARIS,48.828718,2.362468,3,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Juice Bar,Park,Coffee Shop,Creperie,Plaza


Visualize the clustered neighborhoods

In [43]:
# Get rid of NaN values
paris_data = paris_data.dropna(subset=['Cluster Labels'])

paris_cluster_map = folium.Map(location=[paris_lat, paris_lng], zoom_start=12)

# Set colors: one rainbow color per cluster
k_colors = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in k_colors]

# Add markers
for lat, lon, poi, cluster in zip(paris_data['Latitude'], paris_data['Longitude'], paris_data['nom_comm'], paris_data['Cluster Labels']):
    # 'Cluster Labels' is already 1-based, so no offset is added in the popup text
    label = folium.Popup('Cluster ' + str(int(cluster)) + ' ' + str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.8
        ).add_to(paris_cluster_map)
        
paris_cluster_map

#### Cluster Description

Cluster 1

In [44]:
paris_data.loc[paris_data['Cluster Labels'] == 1, paris_data.columns[[1] + list(range(5, paris_data.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,Venue 1,Venue 2,Venue 3,Venue 4,Venue 5,Venue 6,Venue 7,Venue 8,Venue 9,Venue 10
0,PARIS-9E-ARRONDISSEMENT,1,French Restaurant,Hotel,Bistro,Japanese Restaurant,Restaurant,Wine Bar,Cocktail Bar,Lounge,Bakery,Gym / Fitness Center
1,PARIS-2E-ARRONDISSEMENT,1,French Restaurant,Cocktail Bar,Wine Bar,Bakery,Salad Place,Plaza,Coffee Shop,Hotel,Creperie,Italian Restaurant
2,PARIS-11E-ARRONDISSEMENT,1,Café,Restaurant,Asian Restaurant,Pastry Shop,Italian Restaurant,Wine Bar,Vietnamese Restaurant,French Restaurant,Bakery,Sandwich Place
6,PARIS-3E-ARRONDISSEMENT,1,French Restaurant,Japanese Restaurant,Coffee Shop,Art Gallery,Gourmet Shop,Cocktail Bar,Bakery,Wine Bar,Italian Restaurant,Sandwich Place
7,PARIS-6E-ARRONDISSEMENT,1,French Restaurant,Chocolate Shop,Bakery,Plaza,Pastry Shop,Restaurant,Fountain,Theater,Italian Restaurant,Garden
8,PARIS-4E-ARRONDISSEMENT,1,French Restaurant,Clothing Store,Ice Cream Shop,Pastry Shop,Hotel,Park,Wine Bar,Gay Bar,Italian Restaurant,Pedestrian Plaza
9,PARIS-10E-ARRONDISSEMENT,1,French Restaurant,Bistro,Café,Hotel,Coffee Shop,Japanese Restaurant,Indian Restaurant,Pizza Place,Asian Restaurant,Bar
11,PARIS-5E-ARRONDISSEMENT,1,French Restaurant,Hotel,Italian Restaurant,Plaza,Bakery,Pub,Coffee Shop,Café,Bar,Historic Site
12,PARIS-19E-ARRONDISSEMENT,1,French Restaurant,Bar,Supermarket,Hotel,Sushi Restaurant,Beer Bar,Brewery,Seafood Restaurant,Bakery,Bistro
13,PARIS-20E-ARRONDISSEMENT,1,Plaza,Japanese Restaurant,Bakery,Bistro,French Restaurant,Café,Bar,Hotel,Italian Restaurant,Laundromat


Cluster 2

In [45]:
paris_data.loc[paris_data['Cluster Labels'] == 2, paris_data.columns[[1] + list(range(5, paris_data.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,Venue 1,Venue 2,Venue 3,Venue 4,Venue 5,Venue 6,Venue 7,Venue 8,Venue 9,Venue 10
3,PARIS-8E-ARRONDISSEMENT,2,French Restaurant,Hotel,Spa,Corsican Restaurant,Art Gallery,Cocktail Bar,Theater,Plaza,Resort,Park
14,PARIS-7E-ARRONDISSEMENT,2,Hotel,French Restaurant,Italian Restaurant,Café,History Museum,Bistro,Cocktail Bar,Art Museum,Plaza,Coffee Shop
16,PARIS-17E-ARRONDISSEMENT,2,French Restaurant,Hotel,Italian Restaurant,Japanese Restaurant,Bakery,Café,Plaza,Bistro,Restaurant,Breakfast Spot
19,PARIS-14E-ARRONDISSEMENT,2,French Restaurant,Hotel,Bakery,Plaza,Supermarket,Tea Room,Pizza Place,Italian Restaurant,Japanese Restaurant,Brasserie


Cluster 3

In [46]:
paris_data.loc[paris_data['Cluster Labels'] == 3, paris_data.columns[[1] + list(range(5, paris_data.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,Venue 1,Venue 2,Venue 3,Venue 4,Venue 5,Venue 6,Venue 7,Venue 8,Venue 9,Venue 10
4,PARIS-13E-ARRONDISSEMENT,3,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Juice Bar,Park,Coffee Shop,Creperie,Plaza


Cluster 4

In [47]:
paris_data.loc[paris_data['Cluster Labels'] == 4, paris_data.columns[[1] + list(range(5, paris_data.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,Venue 1,Venue 2,Venue 3,Venue 4,Venue 5,Venue 6,Venue 7,Venue 8,Venue 9,Venue 10
10,PARIS-16E-ARRONDISSEMENT,4,Plaza,Lake,French Restaurant,Park,Pool,Boat or Ferry,Art Museum,Bus Station,Bus Stop,Food


Cluster 5

In [48]:
paris_data.loc[paris_data['Cluster Labels'] == 5, paris_data.columns[[1] + list(range(5, paris_data.shape[1]))]]

Unnamed: 0,nom_comm,Cluster Labels,Venue 1,Venue 2,Venue 3,Venue 4,Venue 5,Venue 6,Venue 7,Venue 8,Venue 9,Venue 10
5,PARIS-12E-ARRONDISSEMENT,5,Zoo Exhibit,Bistro,Monument / Landmark,Supermarket,Zoo,Argentinian Restaurant,Frozen Yogurt Shop,Fountain,Food Court,Food & Drink Shop


Each table above lists the neighborhoods that belong to one cluster, together with their venue categories in descending order of frequency.

### Conclusion

The aim of this study was to explore the neighborhoods of Paris and show tourists what each has to offer. For every neighborhood, it lists the 10 most common venue categories, from popular landmarks to cafés, and groups similar neighborhoods into clusters.

Using these rankings, tourists can pick the neighborhoods that best match what they want to experience in the City of Light.