# Capstone Project - The Battle of Neighborhoods

## Where to live in Paris?

![alt text](paris-cityscape-overview-guide.jpg)

## I. Introduction

Paris is a vibrant and complex city. To someone who has not lived there for many years, it can look impenetrable. Which neighborhoods are great for a coffee? Which are famous for their markets? Where are the best bars? Faced with these questions, tourists and new residents generally turn to a guidebook or to websites such as Yelp or Tripadvisor. My experience with these platforms has often been disappointing because the website (or the guide) is not tailored to my tastes and preferences. For this capstone project, I would like to offer an alternative based on data mining and clustering.

Data on the amenities of Paris neighborhoods (bars, cafés, museums, bakeries, etc.) can easily be collected and processed to generate a map of Paris. Based on the user's preferences, we can then direct them towards a specific neighborhood. This approach can be seen as a refinement of traditional tourism websites, with the addition of a data-driven customization layer. A data-driven map of Paris is also helpful for people who are moving to the city and deciding where to live. The French capital is extremely expensive. Yes, [the Marais](https://en.wikipedia.org/wiki/The_Marais) is great, but maybe you would prefer living in the much cheaper 19th or 20th arrondissement? To answer this question, we need data and a robust clustering methodology.
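
To make the idea of preference-based recommendation concrete, here is a minimal sketch. It assumes each area is summarized by the share of its venues in each category and scores areas with a weighted sum of those shares. The area names, categories, and numbers are made up for illustration, and `rank_areas` is a hypothetical helper rather than the method used later in this project.

```python
# Toy amenity profiles: share of venues per category in each area.
# All names and numbers below are invented for illustration only.
profiles = {
    "Area A": {"bar": 0.5, "cafe": 0.3, "museum": 0.2},
    "Area B": {"bar": 0.1, "cafe": 0.3, "museum": 0.6},
    "Area C": {"bar": 0.2, "cafe": 0.6, "museum": 0.2},
}


def rank_areas(profiles, preferences):
    """Rank areas by the weighted sum of their amenity shares (best first)."""
    scores = {
        area: sum(preferences.get(cat, 0.0) * share for cat, share in shares.items())
        for area, shares in profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# A user who mostly cares about museums, and a little about cafés
ranking = rank_areas(profiles, {"museum": 1.0, "cafe": 0.5})
print(ranking)
```

A clustering step would replace the hand-written profiles with groups learned from real venue data, but the final matching of a user to a neighborhood could stay this simple.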

## II. Data

To create a data-driven map of Paris' neighborhoods, I will use data from the [Foursquare API](https://developer.foursquare.com/). Foursquare defines itself as "a location technology platform dedicated to improving how people move through the real world".

In practice, people use the Foursquare [platform](https://foursquare.com/bestbarsuk) or app to find places. They can then rate a place and add photos and/or a description. The Foursquare API allows us to retrieve the data created by users.
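
As a rough sketch of what retrieving this data could look like, the snippet below only builds the URL of a venue search around a point, without sending it. It assumes the legacy v2 `venues/explore` endpoint and its usual query parameters (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`). These may have changed, so the current Foursquare documentation should be checked first. `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: legacy v2 "explore" endpoint; verify against the current Foursquare docs
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the URL of a venue search around the point (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                     # API version date (assumed value)
        "ll": "{},{}".format(lat, lng),   # latitude,longitude
        "radius": radius,                 # search radius in meters
        "limit": limit,                   # maximum number of venues returned
    }
    return EXPLORE_URL + "?" + urlencode(params)


# Placeholder credentials: replace with your own before sending the request
url = build_explore_url(48.8566969, 2.3514616, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()` to obtain the venues as JSON.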

I will also use data from Wikipedia to get information on Paris. This [page](https://en.wikipedia.org/wiki/Arrondissements_of_Paris) lists the names of the Paris districts ("arrondissements"), as well as some basic information such as area and population.

## III. Preview of Paris

As a proof of concept, below I generate a map of Paris using **folium**.

In [35]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
import time

print('Libraries imported.')

Libraries imported.


In [36]:
address = 'Paris, France'

geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


In [37]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=12)

map_paris

Let's try to add "arrondissements" to the map above.

In [38]:
import pandas as pd
import numpy as np

link = "https://en.wikipedia.org/wiki/Arrondissements_of_Paris"

tables = pd.read_html(link, header=0)

# read_html creates a list of dataframes, one per table on the page.
# The arrondissement table is the third one (index 2):
df = tables[2]

# rename the first column:
df.rename(columns = {"Arrondissement (R for Right Bank, L for Left Bank)" : "Arrondissement"}, inplace=True)
df.head()

| | Arrondissement | Name | Area (km2) | Population(March 1999 census) | Population(July 2005 estimate) | Density (2005)(inhabitants per km2) | Peak of population | Mayor |
|---|---|---|---|---|---|---|---|---|
| 0 | 1st (Ier) R | Louvre | 1.826 km2 (0.705 sq mi) | 16888 | 17700 | 9693 | before 1861 | Jean-François Legaret (LR) |
| 1 | 2nd (IIe) R | Bourse | 0.992 km2 (0.383 sq mi) | 19585 | 20700 | 20867 | before 1861 | Jacques Boutault (EELV) |
| 2 | 3rd (IIIe) R | Temple | 1.171 km2 (0.452 sq mi) | 34248 | 35100 | 29974 | before 1861 | Pierre Aidenbaum (PS) |
| 3 | 4th (IVe) R | Hôtel-de-Ville | 1.601 km2 (0.618 sq mi) | 30675 | 28600 | 17864 | before 1861 | Ariel Weil (PS) |
| 4 | 5th (Ve) L | Panthéon | 2.541 km2 (0.981 sq mi) | 58849 | 60600 | 23849 | 1911 | Florence Berthout (LR) |
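
The `Arrondissement` and `Area (km2)` columns are stored as text. If numeric values are needed later, they could be extracted along the following lines. This is only a sketch that assumes the cell formats shown in the preview above. `parse_arrondissement` and `parse_area_km2` are hypothetical helpers, not part of pandas.

```python
import re


def parse_arrondissement(label):
    """Split a label such as '1st (Ier) R' into (number, bank), or return None."""
    match = re.match(r"\s*(\d+)\D.*\b([RL])\s*$", label)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def parse_area_km2(text):
    """Extract the leading value in km2 from '1.826 km2 (0.705 sq mi)', or return None."""
    match = re.match(r"\s*([\d.]+)\s*km", text)
    return float(match.group(1)) if match else None


print(parse_arrondissement("1st (Ier) R"))
print(parse_area_km2("1.826 km2 (0.705 sq mi)"))
```

On the dataframe, the helpers could be applied with, for example, `df["Area (km2)"].apply(parse_area_km2)`.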


Let's geocode the dataframe above:

In [39]:
# List to store values
Postcode = []
Borough = []
Neighbourhood = []

# Options
# Normal sleeping time
sleeping_time = 0.1
# Sleeping time in case of error
sleeping_time_error = 1.0
# Max number of attempts (sometimes, we don't get a value)
max_nb_attempts = 2

In [40]:
geolocator = Nominatim(user_agent="Coursera_Capstone")

In [41]:
# Initialization (floats, so that coordinates can be stored without a dtype change)
df["Latitude"] = 0.0
df["Longitude"] = 0.0

for index, row in df.iterrows():
    # Build the query string from the arrondissement name
    l = row["Name"] + ", Paris France"
    print(l)
    # Loop until success or until the max number of attempts is reached
    current_attempt = 0
    while True:
        current_attempt += 1
        # Reset the result so that a failed call cannot reuse the previous location
        location = None
        try:
            location = geolocator.geocode(l)
        except Exception:
            print("Error with geolocator.geocode(l)")
        if location is not None:
            print((location.latitude, location.longitude))
            df.loc[index, "Latitude"] = location.latitude
            df.loc[index, "Longitude"] = location.longitude
            # Sleep for some time to prevent us being blocked, then exit the while loop
            time.sleep(sleeping_time)
            break
        # Exit if reached the max number of attempts
        if current_attempt == max_nb_attempts:
            print("Max number of attempts reached. Setting lat and lon to 0")
            df.loc[index, "Latitude"] = 0
            df.loc[index, "Longitude"] = 0
            break
        # Otherwise, sleep a little bit longer and try again
        print("Sleeping for a while and trying again")
        time.sleep(sleeping_time_error)

Louvre, Paris France
(48.8611473, 2.33802768704666)
Bourse, Paris France
(48.8686296, 2.3414739)
Temple, Paris France
(48.8665004, 2.360708)
Hôtel-de-Ville, Paris France
(48.856426299999995, 2.3525275780116073)
Panthéon, Paris France
(48.84619085, 2.346078521905153)
Luxembourg, Paris France
(48.8493919, 2.3322597335758593)
Palais-Bourbon, Paris France
(48.86159615, 2.3179092733655935)
Élysée, Paris France
(48.8466437, 2.3698297)
Opéra, Paris France
(48.8706446, 2.33233)
Entrepôt, Paris France
(48.876008049999996, 2.360445036409199)
Popincourt, Paris France
(48.860071149999996, 2.3781433620616133)
Reuilly, Paris France
(48.83520025, 2.445135854630304)
Gobelins, Paris France
(48.82985295, 2.3630259501036406)
Observatoire, Paris France
(48.8295667, 2.3239624642685364)
Vaugirard, Paris France
(48.841430200000005, 2.296164933019397)
Passy, Paris France
(48.8575047, 2.2809828)
Batignolles-Monceau, Paris France
(48.8813119, 2.3157496)
Butte-Montmartre, Paris France
(48.89212585, 2.34817754532
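
The retry logic used above can also be isolated in a small function, which makes it easier to test without calling the geocoding service. The sketch below is one possible way to do it. `geocode_with_retry` is a hypothetical helper and `flaky_geocode` is a stand-in for `geolocator.geocode` that fails once and then succeeds.

```python
import time


def geocode_with_retry(geocode, query, max_attempts=2, wait=1.0, sleep=time.sleep):
    """Call geocode(query) until it returns a result or attempts run out.

    Returns the result, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = geocode(query)
        except Exception:
            result = None
        if result is not None:
            return result
        # Wait before the next attempt, but not after the last one
        if attempt < max_attempts:
            sleep(wait)
    return None


# Stand-in geocoder: raises on the first call, returns coordinates afterwards
calls = []


def flaky_geocode(query):
    calls.append(query)
    if len(calls) == 1:
        raise RuntimeError("temporary failure")
    return (48.8611473, 2.33802768704666)


result = geocode_with_retry(flaky_geocode, "Louvre, Paris France", wait=0.0)
print(result)
```

Returning `None` instead of writing 0 into the dataframe leaves it to the caller to decide how to mark a failed lookup. geopy also provides a `RateLimiter` wrapper in `geopy.extra.rate_limiter` that covers a similar need.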

In [42]:
df.head()

| | Arrondissement | Name | Area (km2) | Population(March 1999 census) | Population(July 2005 estimate) | Density (2005)(inhabitants per km2) | Peak of population | Mayor | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1st (Ier) R | Louvre | 1.826 km2 (0.705 sq mi) | 16888 | 17700 | 9693 | before 1861 | Jean-François Legaret (LR) | 48.861147 | 2.338028 |
| 1 | 2nd (IIe) R | Bourse | 0.992 km2 (0.383 sq mi) | 19585 | 20700 | 20867 | before 1861 | Jacques Boutault (EELV) | 48.86863 | 2.341474 |
| 2 | 3rd (IIIe) R | Temple | 1.171 km2 (0.452 sq mi) | 34248 | 35100 | 29974 | before 1861 | Pierre Aidenbaum (PS) | 48.8665 | 2.360708 |
| 3 | 4th (IVe) R | Hôtel-de-Ville | 1.601 km2 (0.618 sq mi) | 30675 | 28600 | 17864 | before 1861 | Ariel Weil (PS) | 48.856426 | 2.352528 |
| 4 | 5th (Ve) L | Panthéon | 2.541 km2 (0.981 sq mi) | 58849 | 60600 | 23849 | 1911 | Florence Berthout (LR) | 48.846191 | 2.346079 |
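
Because failed lookups are stored as (0, 0), and a geocoder can occasionally return a point far from the intended place, it may be worth checking the coordinates before plotting them. The sketch below flags points that fall outside a rough bounding box around Paris. The box limits are approximate values chosen for illustration, and `is_plausible` is a hypothetical helper.

```python
# Assumption: rough bounding box around Paris, approximate values for illustration
PARIS_BOUNDS = {"lat": (48.80, 48.91), "lon": (2.22, 2.48)}


def is_plausible(lat, lon, bounds=PARIS_BOUNDS):
    """Return True if (lat, lon) falls inside the bounding box."""
    lat_min, lat_max = bounds["lat"]
    lon_min, lon_max = bounds["lon"]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


points = {
    "Louvre": (48.861147, 2.338028),  # taken from the preview above
    "Failed lookup": (0, 0),          # sentinel written when geocoding fails
}
suspicious = [name for name, (lat, lon) in points.items() if not is_plausible(lat, lon)]
print(suspicious)
```

A bounding box only catches gross errors. A point placed in the wrong arrondissement but still inside Paris would need a comparison against the district boundaries.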


In [44]:
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for (lat, lng, borough, arr) in zip(df['Latitude'], df['Longitude'], df['Name'], df['Arrondissement']):
    label = '{},{}'.format(borough, arr)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  
    
map_paris

![alt text](map_Paris_2.png)