## Capstone Project - The Battle of Neighborhoods (Week 1)

### Introduction

#### Description of the problem and a discussion of the background

Paris is the most densely populated capital city in Europe and the fourth most densely populated capital in the world (21,498 inhabitants/km², about 55,700/sq mi).
As a consequence, its real estate prices are among the highest in the world (8th in the world, at US$14,017.63 per square meter).

Relatively few Parisians own a car, because of high taxes and parking fees. A car rental agency can therefore be a lucrative business, since many residents need a car only occasionally.

For this project, let's put ourselves in the shoes of an entrepreneur looking to open a car rental agency. To choose the location, we have to find a balance between real estate prices and the presence of competitors. Being close to a major railway station would be an asset as well.




Before starting the analysis, let's import all the libraries we need:

```python
# Install the third-party packages that may be missing (Jupyter shell commands)
!pip install geocoder
!pip install geopy
!pip install folium

import json

import numpy as np
import pandas as pd
import requests
import geocoder  # import geocoder
import folium
from geopy.geocoders import Nominatim
# import k-means from clustering stage
from sklearn.cluster import KMeans
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
```



## Table of contents
[1. Data Description](#Data_Description)

[2. Methodology](#Methodology)

<a id='Data_Description'></a>

#### 1. Data Description
To solve this problem, we will use the following data:

* An Excel file (Arrondissements_Paris.xlsx) downloaded from the "Open platform for French public data", containing the list of all the Paris districts with their coordinates [1]

* The Foursquare API, to get the car rental agencies in each district [2]

* A CSV file, also from the "Open platform for French public data", with the real estate prices of all French cities, from which I extracted the prices of the 20 Paris districts only [3]
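The extraction of the 20 Paris districts from the nationwide price file can be sketched as follows. The column names `code_insee` and `price_per_m2` are hypothetical, since the actual CSV headers are not shown here; the sketch assumes each row carries the commune's INSEE code and that the arrondissements use the codes 75101 to 75120 (consistent with the INSEE numbers 75117, 75120, 75109… in the district file). The Paris prices reuse values from the district table below; the last row is a placeholder for a commune outside Paris.

```python
import pandas as pd

# Hypothetical stand-in for the nationwide CSV: column names are assumptions,
# the Paris prices come from the district table of this notebook
prices_france = pd.DataFrame({
    'code_insee': [75117, 75120, 75109, 75118, 75103, 13055],
    # last row: placeholder commune outside Paris, no real price given
    'price_per_m2': [10210, 8560, 10730, 9360, 12260, float('nan')],
})

# Keep only the 20 Paris arrondissements (INSEE codes 75101 to 75120, inclusive)
paris_prices = prices_france[prices_france['code_insee'].between(75101, 75120)]

# INSEE code 751xx corresponds to postal code 750xx (e.g. 75117 -> 75017)
paris_prices = paris_prices.assign(PostalCode=paris_prices['code_insee'] - 100)
print(paris_prices)
```

Filtering on a code range rather than on names avoids problems with accents and spelling variants of the district names.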


<a id='Methodology'></a>

#### 2. Methodology

##### Loading and exploring the dataset

Let's import the file Arrondissements_Paris.xlsx, which contains the list of the Paris districts with their coordinates:

```python
paris_neighbourhood_df = pd.read_excel('Arrondissements_Paris.xlsx')
paris_neighbourhood_df.head()
```


| | PostalCode | Neighbourhood_Number | Numéro d’arrondissement INSEE | Nom de l’arrondissement | Neighbourhood_Name | N_SQ_CO | Surface | Périmètre | Latitude | Longitude | average housing price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75017 | 17 | 75117 | 17ème Ardt | Batignolles-Monceau | 750001537 | 5668835.0 | 10775.579516 | 48.883669 | 2.303638 | 10210 |
| 1 | 75020 | 20 | 75120 | 20ème Ardt | Ménilmontant | 750001537 | 5983446.0 | 10704.940486 | 48.865439 | 2.400913 | 8560 |
| 2 | 75009 | 9 | 75109 | 9ème Ardt | Opéra | 750001537 | 2178303.0 | 6471.58829 | 48.877164 | 2.337458 | 10730 |
| 3 | 75018 | 18 | 75118 | 18ème Ardt | Buttes-Montmartre | 750001537 | 5996051.0 | 9916.464176 | 48.892569 | 2.348161 | 9360 |
| 4 | 75003 | 3 | 75103 | 3ème Ardt | Temple | 750001537 | 1170883.0 | 4519.263648 | 48.862872 | 2.360001 | 12260 |


We will keep only the following columns: PostalCode, Neighbourhood_Number, Neighbourhood_Name, the coordinates (Latitude and Longitude) and the average housing price.

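```python
# .copy() makes the selection an independent DataFrame that can be modified safely later
paris_neighbourhood_df = paris_neighbourhood_df[['PostalCode', 'Neighbourhood_Number', 'Neighbourhood_Name',
                                                 'Latitude', 'Longitude', 'average housing price']].copy()
paris_neighbourhood_df.head()
```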

| | PostalCode | Neighbourhood_Number | Neighbourhood_Name | Latitude | Longitude | average housing price |
|---|---|---|---|---|---|---|
| 0 | 75017 | 17 | Batignolles-Monceau | 48.883669 | 2.303638 | 10210 |
| 1 | 75020 | 20 | Ménilmontant | 48.865439 | 2.400913 | 8560 |
| 2 | 75009 | 9 | Opéra | 48.877164 | 2.337458 | 10730 |
| 3 | 75018 | 18 | Buttes-Montmartre | 48.892569 | 2.348161 | 9360 |
| 4 | 75003 | 3 | Temple | 48.862872 | 2.360001 | 12260 |


##### Using the geopy library to get the latitude and longitude of Paris

```python
address = 'Paris, Île-de-France, France'

# Nominatim (OpenStreetMap) requires a user agent identifying the application
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Paris are 48.8566969, 2.3514616.
```


##### Paris neighbourhoods visualization

```python
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one circle marker per district, with the district name as popup
for lat, lng, neighborhood in zip(paris_neighbourhood_df['Latitude'],
                                  paris_neighbourhood_df['Longitude'],
                                  paris_neighbourhood_df['Neighbourhood_Name']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_paris)

map_paris
```

Next, we are going to use the Foursquare API to find all the car rental agencies in the 20 districts.
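For reference, here is a sketch of the query that one venues/search request sends for a single district, built offline with the standard library so that no network access or credentials are needed. The helper `build_search_url` and the placeholder credentials are hypothetical; the parameter names, API version and category ID are the ones used by the getNearbyVenues function of this notebook, and the coordinates are those of the Louvre district.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, lat, lng, version='20180605',
                     category_id='4bf58dd8d48988d1ef941735', radius=1000):
    """Return the venues/search URL for one point (hypothetical helper, no request is sent)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),   # "latitude,longitude" of the search center
        'v': version,                     # API version date
        'categoryId': category_id,        # only venues of this category
        'radius': radius,                 # search radius in meters
        'intent': 'browse',               # search the whole circle, not just the closest match
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)

# Louvre district (75001) with placeholder credentials
url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', 48.862563, 2.336443)
print(url)
```

`urlencode` percent-encodes the comma in `ll` as `%2C`, which is also what `requests` does when it is given a `params` dictionary.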

##### Define Foursquare Credentials and Version

```python
import os

# Read the credentials from environment variables (names chosen for this notebook)
# instead of hard-coding and printing them in a shared notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version
```


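Let's reuse the getNearbyVenues function from the lab "Segmenting and Clustering Neighborhoods in New York City", adapted here to call the venues/search endpoint with a category ID and a search radius around each district's coordinates: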

```python
def getNearbyVenues(PostalCode, names, latitudes, longitudes,
                    categoryid_search="4bf58dd8d48988d1ef941735", radius=1000):
    """Search the venues of one Foursquare category within `radius` meters of each district center."""
    venues_list = []
    for pc, name, lat, lng in zip(PostalCode, names, latitudes, longitudes):
        print(name)

        # query parameters of the venues/search endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'll': '{},{}'.format(lat, lng),
            'v': VERSION,
            'categoryId': categoryid_search,
            'radius': radius,
            'intent': 'browse',
        }

        # make the GET request and fail loudly on HTTP errors
        response = requests.get('https://api.foursquare.com/v2/venues/search', params=params)
        response.raise_for_status()
        results = response.json()['response']['venues']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            pc,
            name,
            lat,
            lng,
            v['name'],
            v['location']['lat'],
            v['location']['lng']) for v in results])

    # flatten the per-district lists into one row per venue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal_code',
                             'Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude']

    return nearby_venues
```
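To see what getNearbyVenues keeps from each response, here is the same extraction and flattening applied offline to a hand-written dictionary shaped like the `response` → `venues` part that the function reads. The venue names and coordinates are made-up placeholders; only the keys (`name`, `location.lat`, `location.lng`) and the column names come from the function.

```python
import pandas as pd

# Hand-written stand-in for requests.get(...).json() (placeholder values, not real venues)
fake_json = {'response': {'venues': [
    {'name': 'Agency A', 'location': {'lat': 48.8630, 'lng': 2.3370}},
    {'name': 'Agency B', 'location': {'lat': 48.8610, 'lng': 2.3390}},
]}}

# One list of tuples per district, as in getNearbyVenues
venues_list = []
for pc, name, lat, lng in [(75001, 'Louvre', 48.862563, 2.336443)]:
    results = fake_json['response']['venues']
    venues_list.append([(pc, name, lat, lng, v['name'], v['location']['lat'], v['location']['lng'])
                        for v in results])

# Flatten the list of lists into one row per venue
rows = [item for venue_list in venues_list for item in venue_list]
nearby_venues = pd.DataFrame(rows, columns=['Postal_code', 'Neighborhood', 'Neighborhood Latitude',
                                            'Neighborhood Longitude', 'Venue', 'Venue Latitude',
                                            'Venue Longitude'])
print(nearby_venues)
```

Each row repeats the district's postal code and center, which is what later allows counting agencies per district with a simple `groupby`.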

Let's run the above function to create a data frame containing all the car rental agencies of Paris

```python
paris_car_rental = getNearbyVenues(PostalCode=paris_neighbourhood_df['PostalCode'],
                                   names=paris_neighbourhood_df['Neighbourhood_Name'],
                                   latitudes=paris_neighbourhood_df['Latitude'],
                                   longitudes=paris_neighbourhood_df['Longitude'])
```

```
Batignolles-Monceau
Ménilmontant
Opéra
Buttes-Montmartre
Temple
Palais-Bourbon
Popincourt
Vaugirard
Gobelins
Panthéon
Élysée
Reuilly
Passy
Luxembourg
Louvre
Bourse
Buttes-Chaumont
Entrepôt
Hôtel-de-Ville
Observatoire
```
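Since every district is searched within the same 1,000 m radius around its center, it is worth checking whether the search circles of neighbouring districts overlap: if two centers are less than 2 × 1,000 m apart, an agency located between them can be returned for both districts. A minimal sketch with the haversine formula, assuming a spherical Earth of mean radius 6,371 km and using the Louvre and Bourse centers from the district table (the helper `haversine_m` is hypothetical):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

radius = 1000  # default search radius of getNearbyVenues, in meters
d = haversine_m(48.862563, 2.336443, 48.868279, 2.342803)  # Louvre center -> Bourse center
print('Distance: {:.0f} m, search circles overlap: {}'.format(d, d < 2 * radius))
```

The two centers are only about 790 m apart, so the same agency may well appear under several districts; deduplicating on venue name and coordinates would be one way to check how much this affects the counts.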


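Autolib' stations are self-service car-sharing points rather than car rental agencies, so let's remove them, check the size of the resulting dataframe and count the car rental agencies per district: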

```python
# Autolib' stations are self-service cars, not car rental agencies, so remove them
# (the name appears with both a typographic ’ and a straight ' apostrophe)
autolib_names = ["Autolib’ Station", "Autolib' Station"]
paris_car_rental = paris_car_rental[~paris_car_rental['Venue'].isin(autolib_names)]

print(paris_car_rental.shape)

# let's count the number of car agencies per district
car_rental_agencies = paris_car_rental.groupby(['Postal_code'])['Venue'].count().to_frame()
car_rental_agencies.columns = ['Number_of_car_agencies']
car_rental_agencies
```

```
(229, 7)
```


| Postal_code | Number_of_car_agencies |
|---|---|
| 75001 | 17 |
| 75002 | 11 |
| 75003 | 11 |
| 75004 | 13 |
| 75005 | 12 |
| 75006 | 13 |
| 75007 | 10 |
| 75008 | 14 |
| 75009 | 11 |
| 75010 | 18 |
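The Autolib' station name comes with two different apostrophes (the typographic ’ and the straight '), so the filter has to cover both spellings. An alternative sketch normalizes the apostrophe before comparing, so a single condition is enough; the venue names below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical venues: two spellings of the Autolib' station plus two rental agencies
venues = pd.DataFrame({'Postal_code': [75001, 75001, 75002, 75002],
                       'Venue': ["Autolib’ Station", "Agency A", "Autolib' Station", "Agency B"]})

# Replace the typographic apostrophe with a straight one, then filter once
normalized = venues['Venue'].str.replace('’', "'", regex=False)
venues = venues[normalized != "Autolib' Station"]

# Count the remaining agencies per district, as in the notebook
counts = venues.groupby('Postal_code')['Venue'].count()
print(counts)
```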


Merge the number of car agencies per district with the data frame paris_neighbourhood_df

```python
# Join on the postal code rather than relying on row order;
# districts where no agency was found get a count of 0
paris_neighbourhood_df = (paris_neighbourhood_df
    .merge(car_rental_agencies.reset_index(), left_on='PostalCode', right_on='Postal_code', how='left')
    .drop(columns='Postal_code')
    .rename(columns={'Number_of_car_agencies': 'Car_rental_agencies'})
    .sort_values(by='PostalCode')
    .reset_index(drop=True))
paris_neighbourhood_df['Car_rental_agencies'] = paris_neighbourhood_df['Car_rental_agencies'].fillna(0).astype(int)
paris_neighbourhood_df
```

| | PostalCode | Neighbourhood_Number | Neighbourhood_Name | Latitude | Longitude | average housing price | Car_rental_agencies |
|---|---|---|---|---|---|---|---|
| 0 | 75001 | 1 | Louvre | 48.862563 | 2.336443 | 12840 | 17 |
| 1 | 75002 | 2 | Bourse | 48.868279 | 2.342803 | 11250 | 11 |
| 2 | 75003 | 3 | Temple | 48.862872 | 2.360001 | 12260 | 11 |
| 3 | 75004 | 4 | Hôtel-de-Ville | 48.854341 | 2.35763 | 12790 | 13 |
| 4 | 75005 | 5 | Panthéon | 48.844443 | 2.350715 | 12140 | 12 |
| 5 | 75006 | 6 | Luxembourg | 48.84913 | 2.332898 | 14180 | 13 |
| 6 | 75007 | 7 | Palais-Bourbon | 48.856174 | 2.312188 | 13230 | 10 |
| 7 | 75008 | 8 | Élysée | 48.872721 | 2.312554 | 11240 | 14 |
| 8 | 75009 | 9 | Opéra | 48.877164 | 2.337458 | 10730 | 11 |
| 9 | 75010 | 10 | Entrepôt | 48.87613 | 2.360728 | 9730 | 18 |
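Assigning a column taken from another DataFrame aligns rows on the index, not on position. Because car_rental_agencies is indexed by Postal_code while paris_neighbourhood_df has a default 0, 1, 2… index, a direct assignment fills the new column with NaN; joining on the postal code avoids depending on row order. A small illustration with three districts from the table above:

```python
import pandas as pd

districts = pd.DataFrame({'PostalCode': [75001, 75002, 75003],
                          'Neighbourhood_Name': ['Louvre', 'Bourse', 'Temple']})
counts = pd.DataFrame({'Number_of_car_agencies': [17, 11, 11]},
                      index=pd.Index([75001, 75002, 75003], name='Postal_code'))

# Direct assignment aligns on the index: labels 0, 1, 2 never match 75001..75003, so all NaN
aligned = districts.copy()
aligned['Car_rental_agencies'] = counts['Number_of_car_agencies']
print(aligned)

# Joining on the postal code matches each district with its own count
merged = districts.merge(counts.reset_index(), left_on='PostalCode', right_on='Postal_code', how='left')
print(merged)
```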


##### Cluster Neighborhoods


We will use the unsupervised K-means algorithm to cluster the districts.

```python
# work on an independent copy so the original venues dataframe stays untouched
paris_car_rental_clustering = paris_car_rental.copy()
```
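A minimal sketch of this step, assuming the districts are clustered on two features, the average housing price and the number of car rental agencies. The feature choice, the standardization and k = 3 are assumptions for illustration, not decisions made in this notebook. It uses the ten districts shown in the merged table above; in the notebook the same code would run on paris_neighbourhood_df.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Ten districts from the merged table (postal code, price per m², number of agencies)
districts = pd.DataFrame({
    'PostalCode': [75001, 75002, 75003, 75004, 75005, 75006, 75007, 75008, 75009, 75010],
    'average housing price': [12840, 11250, 12260, 12790, 12140, 14180, 13230, 11240, 10730, 9730],
    'Car_rental_agencies': [17, 11, 11, 13, 12, 13, 10, 14, 11, 18],
})

# Put both features on the same scale so that prices (~10^4) do not dominate counts (~10)
features = StandardScaler().fit_transform(districts[['average housing price', 'Car_rental_agencies']])

# k = 3 is an arbitrary choice; n_init and random_state make the run repeatable
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
districts['Cluster'] = kmeans.labels_

print(districts.sort_values('Cluster'))
```

Without scaling, the Euclidean distance used by K-means would be driven almost entirely by the price column, and the number of competitors would barely influence the clusters.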

### References

* [1] [Paris districts coordinates](https://www.data.gouv.fr/fr/datasets/arrondissements-1/) 

* [2] [Foursquare API](https://developer.foursquare.com/)

* [3] [Average housing prices in France](https://www.data.gouv.fr/fr/datasets/prix-moyen-au-m2-des-ventes-de-maisons-et-dappartements-par-commune-en-2017/)