# <center> <span style="color:Blue"> **CAPSTONE PROJECT - The Battle of the Neighborhoods** </span> </center>

This notebook will be used for the capstone project. This is the final project of the "**IBM Data Science Professional Certificate**", offered in partnership with Coursera.

# Table of Contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction <a name="introduction"></a>

Our goal here is to provide insight into the different neighborhoods of Bordeaux Metropole that will help decision-makers choose the right place to invest.   
Choosing the right business location depends on several aspects: the type of business, the target customers, the population density, the competition, the price per square meter of the premises…    
Choosing the right place to live also depends on several aspects: the age of the buyer, the family structure (single, couple, kids…), their hobbies, their place of work, amenities, transport services, price per square meter, type of housing…    
Finally, investors will mainly be interested in the borrowers' capacity to pay off their loans, but also in the potential price trends of the neighborhood, in order to secure their investment.    

In order to achieve our goal of showing relationships between neighborhoods, a descriptive approach will be conducted. We will group neighborhoods into clusters based on the following information: 
* Real estate price 
* Most common type of real estate properties (apartments, houses…)
* Principal venues of the neighborhoods 


In [1]:
import pandas as pd 
import numpy as np
# requests is used to call the FOURSQUARE API
import requests
import json
# json_normalize is exposed at the top level since pandas 1.0
from pandas import json_normalize
import folium

# Data <a name="data"></a>

First we need to import all the data necessary to conduct our project. Since we want to create clusters that help decide which activities are most likely to succeed in a given neighborhood, we need to capture information such as the location of the neighborhoods, their main venues, the real estate prices, <span style="color:red">the population density</span> and the structure of the real estate market. To achieve this, we will use several databases and APIs: 

1. We import the postal code of Bordeaux Metropole 
2. We add the coordinates
3. We obtain real estate information from the DVF API 
4. We enrich our data with FOURSQUARE venues
5. We classify the neighborhoods
6. <span style="color:red">We add population density (INSEE)</span>

Our first objective is to obtain the coordinates of the different neighborhoods of Bordeaux Metropole.   
No single dataset provides both the coordinates of the neighborhoods and the town to which they are attached.    
For this reason, we first found a dataset listing all the neighborhoods of Bordeaux Metropole in a csv format.   The dataset is available here: 
https://www.data.gouv.fr/fr/datasets/quartiers-des-communes-sur-le-territoire-de-bordeaux-metropole/

We will transform it into a dataframe and then enrich it with the names of the attached towns by using a RESTful API provided by the city of Bordeaux. 


| data type  | column name | content |
| ------------ | ------------- | ----------- |
|int  | GID | Primary key|
|Geo Point |  Geo point | Geopoints|
|Geo Shape |  Geo points  | geoshape of neighborhood|
|int  | GEOM_ERR | error code geopoints |
|String | NOM | neighborhood name|
|String | INSEE |Code INSEE of town |
|Boolean | VALIDE | validate neighborhood|
|Boolean | QUARPOLI | political neighborhood|
|Date | CDATE | creation date|
|Date | MDATE | modification date|

In [2]:
# we load the data  
with open('project_data/se_quart_s.json',encoding="UTF-8") as json_data:
    bordeaux_data = json.load(json_data)

Let's take a quick look at the data  

In [3]:
# we are interested in the name of the neighborhood,
# the insee code (key to find town names)
# and the geo_point_2d (used for FOURSQUARE)
bordeaux_data['features'][0]['properties']

{'nom': 'Toctoucau',
 'insee': '33318',
 'mdate': '2016/12/02 10:36:56+01',
 'valide': '1',
 'quarpoli': '1',
 'geo_point_2d': [44.766930057, -0.733236943925],
 'cdate': '2016/03/31 19:45:28+02',
 'gid': 135}

In [4]:
# the geometry part can be used to draw the boundary of the neighborhood
bordeaux_data['features'][0]['geometry']

{'type': 'Polygon',
 'coordinates': [[[-0.7236434, 44.7806362],
   [-0.7263344, 44.7792995],
   [-0.7291259, 44.7779141],
   [-0.7307535, 44.7770885],
   [-0.7332663, 44.7758391],
   [-0.7338081, 44.7755676],
   [-0.7338882, 44.7755114],
   [-0.7339438, 44.7755508],
   [-0.7339448, 44.7755444],
   [-0.7339489, 44.7755483],
   [-0.7343875, 44.7753319],
   [-0.7403993, 44.7703313],
   [-0.7408029, 44.77],
   [-0.7441674, 44.7672699],
   [-0.7512356, 44.7615481],
   [-0.7517709, 44.761106],
   [-0.7522809, 44.7606862],
   [-0.752645, 44.7604251],
   [-0.7587161, 44.7555004],
   [-0.7531676, 44.7516568],
   [-0.7506216, 44.7498913],
   [-0.7501806, 44.7495772],
   [-0.7501583, 44.7495867],
   [-0.7461762, 44.7518437],
   [-0.744993, 44.7525111],
   [-0.7432294, 44.7535073],
   [-0.7415752, 44.7544483],
   [-0.7398473, 44.7554301],
   [-0.7383658, 44.7562666],
   [-0.7369807, 44.7570635],
   [-0.7358568, 44.7577103],
   [-0.7346792, 44.7583909],
   [-0.733496, 44.7590618],
   [-0.7334125, 4
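
One detail worth noting when working with this geometry: GeoJSON lists each point as `[longitude, latitude]`, whereas `geo_point_2d` in the properties is `[latitude, longitude]`, as the two outputs above show. The sketch below is only an illustration and not part of the original notebook. It computes a bounding box for a polygon while keeping track of that order. The helper name `polygon_bounds` is an assumption, and the sample ring is made of the first three points of the output above, closed by repeating the first point.

```python
def polygon_bounds(geometry):
    """Return (min_lat, min_lon, max_lat, max_lon) for a GeoJSON Polygon."""
    # The first ring of a Polygon is its exterior boundary.
    outer_ring = geometry['coordinates'][0]
    # GeoJSON points are [longitude, latitude].
    lons = [point[0] for point in outer_ring]
    lats = [point[1] for point in outer_ring]
    return min(lats), min(lons), max(lats), max(lons)


# Excerpt of the polygon shown above (first three points, ring closed by hand).
sample_geometry = {
    'type': 'Polygon',
    'coordinates': [[[-0.7236434, 44.7806362],
                     [-0.7263344, 44.7792995],
                     [-0.7291259, 44.7779141],
                     [-0.7236434, 44.7806362]]],
}

bounds = polygon_bounds(sample_geometry)
print(bounds)
```

In the notebook, the same helper could be called on `bordeaux_data['features'][0]['geometry']`, for example to pick a sensible map extent for one neighborhood.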

In [5]:
# we create a data frame with the columns we are interested in :
#columns 
columns=['Neighborhood','INSEE','Latitude','Longitude','Geometry']
bordeaux_neighborhoods=pd.DataFrame(columns=columns)
bordeaux_neighborhoods

Unnamed: 0,Neighborhood,INSEE,Latitude,Longitude,Geometry


In [6]:
# loop over the JSON features and collect one row per neighborhood
# (DataFrame.append was removed in pandas 2.0, so rows are gathered in a list first)
rows = []
for data in bordeaux_data['features']:
    properties = data['properties']
    # geo_point_2d is stored as [latitude, longitude]
    neighborhood_lat, neighborhood_lon = properties['geo_point_2d']
    rows.append({'Neighborhood': properties['nom'],
                 'INSEE': properties['insee'],
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon,
                 'Geometry': data['geometry']})

bordeaux_neighborhoods = pd.DataFrame(rows, columns=columns)

As this file does not contain all the towns of Bordeaux Metropole, we found another dataset at `"https://opendata.bordeaux-metropole.fr/explore/dataset/fv_commu_s/table/"` that lists every town of the metropolitan area, but without the information about neighborhoods. 
We will add these towns and copy the town name into the neighborhood column.  

In [7]:
# we load the data containing all towns of bordeaux Metropole
with open('project_data/communes.json', encoding="UTF-8") as json_data_communes:
    bordeaux_data_comm = json.load(json_data_communes)

In [9]:
# add the towns that are missing from the neighborhood file,
# using the town name as the neighborhood name
known_insee = set(bordeaux_neighborhoods['INSEE'])
town_rows = []
for data in bordeaux_data_comm:
    fields = data['fields']
    insee = fields['code_commune']
    if insee not in known_insee:
        known_insee.add(insee)
        town_rows.append({'Neighborhood': fields['commune'],
                          'INSEE': insee,
                          'Latitude': fields['geo_point_2d'][0],
                          'Longitude': fields['geo_point_2d'][1],
                          'Geometry': fields['geo_shape']})

# DataFrame.append was removed in pandas 2.0, so the new rows are concatenated
bordeaux_neighborhoods = pd.concat([bordeaux_neighborhoods, pd.DataFrame(town_rows)],
                                   ignore_index=True)
                                                           

In [10]:
bordeaux_neighborhoods.head()

Unnamed: 0,Neighborhood,INSEE,Latitude,Longitude,Geometry
0,Toctoucau,33318,44.76693,-0.733237,"{'type': 'Polygon', 'coordinates': [[[-0.72364..."
1,3M-Bourgailh,33318,44.806158,-0.677987,"{'type': 'Polygon', 'coordinates': [[[-0.67910..."
2,Saige,33318,44.786531,-0.635364,"{'type': 'Polygon', 'coordinates': [[[-0.62840..."
3,Casino,33318,44.807801,-0.628748,"{'type': 'Polygon', 'coordinates': [[[-0.62803..."
4,Arlac,33281,44.822855,-0.626316,"{'type': 'Polygon', 'coordinates': [[[-0.62901..."


Now that we have the coordinates of each neighborhood, we just need to add the town to which each one is attached. 
To do so, we will use an API provided by the Opendatasoft public data portal.

`url= 'https://public.opendatasoft.com/api/records/1.0/search/?dataset=correspondance-code-insee-code-postal&facet=insee_com&facet=nom_dept&facet=nom_region&facet=statut&refine.insee_com=33063'`

In [11]:
for index, CODEINSEE in enumerate(bordeaux_neighborhoods['INSEE']): 
    url= f'https://public.opendatasoft.com/api/records/1.0/search/?dataset=correspondance-code-insee-code-postal&facet=insee_com&facet=nom_dept&facet=nom_region&facet=statut&refine.insee_com={CODEINSEE}'
    results=requests.get(url).json()
    bordeaux_neighborhoods.at[index,'postal_code']=results['records'][0]['fields']['postal_code']
    bordeaux_neighborhoods.at[index,'town']=results['records'][0]['fields']['nom_comm']
bordeaux_neighborhoods

Unnamed: 0,Neighborhood,INSEE,Latitude,Longitude,Geometry,postal_code,town
0,Toctoucau,33318,44.766930,-0.733237,"{'type': 'Polygon', 'coordinates': [[[-0.72364...",33600,PESSAC
1,3M-Bourgailh,33318,44.806158,-0.677987,"{'type': 'Polygon', 'coordinates': [[[-0.67910...",33600,PESSAC
2,Saige,33318,44.786531,-0.635364,"{'type': 'Polygon', 'coordinates': [[[-0.62840...",33600,PESSAC
3,Casino,33318,44.807801,-0.628748,"{'type': 'Polygon', 'coordinates': [[[-0.62803...",33600,PESSAC
4,Arlac,33281,44.822855,-0.626316,"{'type': 'Polygon', 'coordinates': [[[-0.62901...",33700,MERIGNAC
...,...,...,...,...,...,...,...
60,Saint-Jean-d'Illac,33422,44.821373,-0.735295,"{'type': 'LineString', 'coordinates': [[-0.729...",33127,SAINT-JEAN-D'ILLAC
61,Bruges,33075,44.888994,-0.578345,"{'type': 'LineString', 'coordinates': [[-0.576...",33520,BRUGES
62,Cubzac-les-Ponts,33143,44.972850,-0.470020,"{'type': 'LineString', 'coordinates': [[-0.495...",33240,CUBZAC-LES-PONTS
63,Saint-Aubin-de-Médoc,33376,44.948365,-0.753345,"{'type': 'LineString', 'coordinates': [[-0.697...",33160,SAINT-AUBIN-DE-MEDOC
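
The loop above sends one request per row, although many neighborhoods share the same INSEE code (PESSAC alone appears many times). A possible refinement, sketched here under assumptions, is to query each distinct code once and reuse the answer. The names `lookup_towns` and `fake_fetch` are hypothetical. `fake_fetch` stands in for the HTTP call so that the sketch runs offline, and its two entries are copied from the output above.

```python
def lookup_towns(insee_codes, fetch_fields):
    """Return {insee_code: (postal_code, town)}.

    fetch_fields is called only once per distinct INSEE code.
    """
    towns = {}
    for code in insee_codes:
        if code not in towns:
            fields = fetch_fields(code)
            towns[code] = (fields['postal_code'], fields['nom_comm'])
    return towns


# Offline stand-in for the API call; it records which codes were requested.
calls = []


def fake_fetch(code):
    calls.append(code)
    sample = {'33318': {'postal_code': '33600', 'nom_comm': 'PESSAC'},
              '33281': {'postal_code': '33700', 'nom_comm': 'MERIGNAC'}}
    return sample[code]


towns = lookup_towns(['33318', '33318', '33318', '33281'], fake_fetch)
print(towns)
print(len(calls))
```

In the notebook, `fetch_fields` would wrap the `requests.get(url).json()` call from the loop above and return `results['records'][0]['fields']`. The resulting dictionary could then be mapped onto the `INSEE` column to fill `postal_code` and `town`.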


In [12]:
bordeaux_neighborhoods.to_csv('bordeaux_neighborhoods-enrichi.csv',encoding="UTF-8")

In [13]:
bordeaux_neighborhoods=pd.read_csv('bordeaux_neighborhoods-enrichi.csv',index_col=0,encoding="UTF-8")
bordeaux_neighborhoods

Unnamed: 0,Neighborhood,INSEE,Latitude,Longitude,Geometry,postal_code,town
0,Toctoucau,33318,44.766930,-0.733237,"{'type': 'Polygon', 'coordinates': [[[-0.72364...",33600,PESSAC
1,3M-Bourgailh,33318,44.806158,-0.677987,"{'type': 'Polygon', 'coordinates': [[[-0.67910...",33600,PESSAC
2,Saige,33318,44.786531,-0.635364,"{'type': 'Polygon', 'coordinates': [[[-0.62840...",33600,PESSAC
3,Casino,33318,44.807801,-0.628748,"{'type': 'Polygon', 'coordinates': [[[-0.62803...",33600,PESSAC
4,Arlac,33281,44.822855,-0.626316,"{'type': 'Polygon', 'coordinates': [[[-0.62901...",33700,MERIGNAC
...,...,...,...,...,...,...,...
60,Saint-Jean-d'Illac,33422,44.821373,-0.735295,"{'type': 'LineString', 'coordinates': [[-0.729...",33127,SAINT-JEAN-D'ILLAC
61,Bruges,33075,44.888994,-0.578345,"{'type': 'LineString', 'coordinates': [[-0.576...",33520,BRUGES
62,Cubzac-les-Ponts,33143,44.972850,-0.470020,"{'type': 'LineString', 'coordinates': [[-0.495...",33240,CUBZAC-LES-PONTS
63,Saint-Aubin-de-Médoc,33376,44.948365,-0.753345,"{'type': 'LineString', 'coordinates': [[-0.697...",33160,SAINT-AUBIN-DE-MEDOC
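
A side effect of this round trip is worth keeping in mind: `to_csv` writes each `Geometry` dictionary as its text representation, so after `read_csv` the column holds strings rather than dictionaries. The sketch below shows one way to convert such a cell back, assuming the text is a plain Python literal. The helper name `parse_geometry` is an assumption, and the sample string is shortened from the first row above.

```python
import ast


def parse_geometry(cell):
    """Turn a Geometry cell read back from CSV into a dictionary."""
    if isinstance(cell, dict):
        # Already a dictionary (dataframe not yet written to CSV).
        return cell
    # The CSV stores the Python repr (single quotes), which json.loads
    # would reject; ast.literal_eval safely evaluates Python literals.
    return ast.literal_eval(cell)


csv_cell = ("{'type': 'Polygon', 'coordinates': "
            "[[[-0.7236434, 44.7806362], [-0.7263344, 44.7792995]]]}")
geometry = parse_geometry(csv_cell)
print(geometry['type'])
print(geometry['coordinates'][0][0])
```

Applied to the whole column, this would read `bordeaux_neighborhoods['Geometry'] = bordeaux_neighborhoods['Geometry'].apply(parse_geometry)`.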


In [14]:
import geocoder # import geocoder
g2 = geocoder.geocodefarm('Bordeaux, Gironde,France')
latlong= g2.latlng
latitude= latlong[0]
longitude = latlong[1]
print(f'latitude {latitude} & longitude {longitude}')

latitude 44.8367004394734 & longitude -0.58107000589395


In [15]:
for lat,long,bo,nei in zip(bordeaux_neighborhoods['Latitude'],bordeaux_neighborhoods['Longitude'],bordeaux_neighborhoods['town'],bordeaux_neighborhoods['Neighborhood']):
    print(bo,'\n', nei)

PESSAC 
 Toctoucau
PESSAC 
 3M-Bourgailh
PESSAC 
 Saige
PESSAC 
 Casino
MERIGNAC 
 Arlac
PESSAC 
 Verthamon
CENON 
 Gambetta-Mairie-Lissandre
PESSAC 
 Cap de Bos
MERIGNAC 
 Bourran
MERIGNAC 
 Capeyron
BORDEAUX 
 Saint Augustin - Tauzin - Alphonse Dupeux
BORDEAUX 
 La Bastide
CENON 
 Palmer-Gravières-Cavailles
PESSAC 
 Magonty
PESSAC 
 Chiquet-Fontaudin
PESSAC 
 Le Monteil
MERIGNAC 
 La Glacière
BORDEAUX 
 Bordeaux Maritime
BORDEAUX 
 Bordeaux Sud
MERIGNAC 
 Les Eyquems
PESSAC 
 Noès
PESSAC 
 Arago-La Chataigneraie
PESSAC 
 Le Vallon-Les Echoppes
MERIGNAC 
 Le Burck
PESSAC 
 Sardine
MERIGNAC 
 Beutre
PESSAC 
 La Paillère-Compostelle
BORDEAUX 
 Nansouty - Saint Genès
MERIGNAC 
 Centre ville
BORDEAUX 
 Chartrons - Grand Parc - Jardin Public
PESSAC 
 Brivazac-Candau
PESSAC 
 France Alouette
BORDEAUX 
 Centre ville
BORDEAUX 
 Caudéran
MERIGNAC 
 Chemin Long
MERIGNAC 
 Beaudésert
CENON 
 Plaisance-Loret-Maregue
PESSAC 
 Le Bourg
ARTIGUES-PRES-BORDEAUX 
 Artigues-près-Bordeaux
LE BOUSCAT 
 Le

In [16]:
# the encoding arguments were dropped: they are not folium options
neighborhoods = folium.Map(location=[latitude, longitude], zoom_start=11)
for lat,long,bo,nei in zip(bordeaux_neighborhoods['Latitude'],bordeaux_neighborhoods['Longitude'],bordeaux_neighborhoods['town'],bordeaux_neighborhoods['Neighborhood']):
    # popup text: town name followed by neighborhood name
    label = folium.Popup(f'{bo}, {nei}', parse_html=True)
    folium.CircleMarker(location=[lat, long],
                        popup=label,
                        radius=5,
                        fill=True,
                        fill_color='#3388ff',
                        fill_opacity=1).add_to(neighborhoods)
neighborhoods

In [None]:
test= pd.read_json('project_data/se_cpost_s.geojson')

In [None]:
test['features'][0]['geometry']

In [None]:
test=pd.read_csv('project_data/valeur_fonciere_gironde.csv')

In [None]:
len(test)

In [None]:
url= 'https://api.cquest.org/dvf?code_postal=33000'

In [None]:
results=requests.get(url).json() 

In [None]:
results.keys()

In [None]:
results['nb_resultats']

Here again, let's take a look at the result returned by the API.
The information we will keep is located in the `results['resultats']` section. 
In detail, we want to keep: 
* Value of the property: `['valeur_fonciere']`
* Kind of property: `['type_local']`
* Area of the property: `['surface_relle_bati']`
* Number of rooms: `['nombre_pieces_principales']`
* Latitude: `['lat']` (for example `44.830855`)
* Longitude: `['lon']`
* Area of the parcel: `['surface_terrain']`
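
The selection described above can be sketched as follows. This is a minimal illustration, not the notebook's own code. The helper name `keep_fields` is an assumption, and the sample record is invented for the example (only the key names and the latitude come from the list above); it is not real API output.

```python
KEPT_FIELDS = ['valeur_fonciere', 'type_local', 'surface_relle_bati',
               'nombre_pieces_principales', 'lat', 'lon', 'surface_terrain']


def keep_fields(records, fields=KEPT_FIELDS):
    """Keep only the listed keys of each record; missing keys become None."""
    return [{name: record.get(name) for name in fields} for record in records]


# Hypothetical record, invented for illustration only.
sample_records = [{'valeur_fonciere': 250000.0,
                   'type_local': 'Appartement',
                   'surface_relle_bati': 60,
                   'nombre_pieces_principales': 3,
                   'lat': 44.830855,
                   'lon': -0.57,
                   'code_postal': '33000'}]

rows = keep_fields(sample_records)
print(rows[0])
```

In the notebook, `pd.DataFrame(keep_fields(results['resultats']))` would give one row per transaction with exactly these columns. Using `record.get` means that a record without a `surface_terrain` key still produces a row, with an empty value in that column.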


In [None]:
results['resultats'][0]