# <center> <span style="color:Blue"> **CAPSTONE PROJECT - The Battle of the Neighborhoods** </span> </center>

This notebook will be used for the capstone project, the final project of the "**IBM Data Science Professional Certificate**" offered in partnership with Coursera.

# Table of Contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction <a name="introduction"></a>

Our goal is to provide insight into the different neighborhoods of Bordeaux Metropole, to help decision-makers choose the right place to invest.   
Choosing the right business location depends on several factors: the type of business, the target customers, the population density, the competition, the price per square meter of the premises, and so on.    
Choosing the right place to live also depends on several factors: the age of the buyer, the family structure (single, couple, children…), their hobbies, their workplace, amenities, transport services, the price per square meter, the type of housing, and so on.    
Finally, investors are mainly interested in the borrowers' capacity to pay off their loans, but they also care about the likely price trends of the neighborhood in order to secure their investment.    

To show the relationships between neighborhoods, we will take a descriptive approach. We will group neighborhoods into clusters based on the following information: 
* Real estate price 
* Most common type of real estate properties (apartments, houses…)
* Principal venues of the neighborhoods 
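
The clustering idea in the list above can be sketched with a tiny k-means routine. This is only an illustration under stated assumptions: the text does not name a clustering algorithm, so k-means is an assumption here, and the feature values are made-up placeholders (scaled price, share of apartments, share of food venues), not real Bordeaux data.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Minimal k-means: returns (labels, centroids) for a list of numeric tuples."""
    # Naive initialisation: the first k points serve as starting centroids
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical, already-scaled features per neighborhood (placeholders, not real data):
# (price level, share of apartments, share of food venues)
features = {
    "A": (0.9, 0.8, 0.7),
    "B": (0.2, 0.1, 0.2),
    "C": (0.8, 0.9, 0.6),
    "D": (0.1, 0.2, 0.1),
}
labels, centroids = kmeans(list(features.values()), k=2)
clusters = dict(zip(features, labels))
print(clusters)
```

Features on very different scales (euros per square meter versus venue frequencies) would need to be normalized first, otherwise the price column dominates the distance.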


In [151]:
import pandas as pd 
import numpy as np
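# requests is used to call the REST APIs (FOURSQUARE, DVF, INSEE lookup)
import requests
import json
# pandas.io.json.json_normalize is deprecated since pandas 1.0; use the top-level function
from pandas import json_normalize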

# Data <a name="data"></a>

First, we need to import all the data required for the project. Because we want to create clusters that help decide which activities are most likely to succeed in a given neighborhood, we need to capture the location of each neighborhood, its main venues, the real estate prices, <span style="color:red">the population density</span> and the structure of the real estate market. To achieve this, we will use several databases and APIs: 

1. We import the postal codes of Bordeaux Metropole 
2. We add the coordinates
3. We obtain real estate information from the DVF API 
4. We enrich our data with FOURSQUARE venues
5. We classify the neighborhoods
6. <span style="color:red">We add population density (INSEE)</span>

Our first objective is to obtain the coordinates of the different neighborhoods of Bordeaux Metropole.   
No single dataset provides both the coordinates of the neighborhoods and the town to which they are attached.    
For this reason, we first located a dataset listing all the neighborhoods of Bordeaux Metropole in CSV format.    
We will load it into a dataframe and then enrich it with the names of the attached towns using a RESTful API provided by the city of Bordeaux. 


| Data type | Column name | Content |
| ------------ | ------------- | ----------- |
| int | GID | Primary key |
| Geo Point | Geo point | Geopoint of the neighborhood |
| Geo Shape | Geo points | Geoshape of the neighborhood |
| int | GEOM_ERR | Geopoint error code |
| String | NOM | Neighborhood name |
| String | INSEE | INSEE code of the town |
| Boolean | VALIDE | Validated neighborhood |
| Boolean | QUARPOLI | Political neighborhood |
| Date | CDATE | Creation date |
| Date | MDATE | Modification date |
In [194]:
# we load the data  
with open('project_data/se_quart_s.json') as json_data:
    bordeaux_data = json.load(json_data)

Let's take a quick look at the data  

In [195]:
# We are interested in the name of the neighborhood,
# the INSEE code (the key used to find town names)
# and geo_point_2d (the coordinates used for FOURSQUARE queries)
bordeaux_data['features'][0]['properties']

{'nom': 'Toctoucau',
 'insee': '33318',
 'mdate': '2016/12/02 10:36:56+01',
 'valide': '1',
 'quarpoli': '1',
 'geo_point_2d': [44.766930057, -0.733236943925],
 'cdate': '2016/03/31 19:45:28+02',
 'gid': 135}

In [196]:
# The geometry part gives the boundary of the neighborhood
bordeaux_data['features'][0]['geometry']

{'type': 'Polygon',
 'coordinates': [[[-0.7236434, 44.7806362],
   [-0.7263344, 44.7792995],
   [-0.7291259, 44.7779141],
   [-0.7307535, 44.7770885],
   [-0.7332663, 44.7758391],
   [-0.7338081, 44.7755676],
   [-0.7338882, 44.7755114],
   [-0.7339438, 44.7755508],
   [-0.7339448, 44.7755444],
   [-0.7339489, 44.7755483],
   [-0.7343875, 44.7753319],
   [-0.7403993, 44.7703313],
   [-0.7408029, 44.77],
   [-0.7441674, 44.7672699],
   [-0.7512356, 44.7615481],
   [-0.7517709, 44.761106],
   [-0.7522809, 44.7606862],
   [-0.752645, 44.7604251],
   [-0.7587161, 44.7555004],
   [-0.7531676, 44.7516568],
   [-0.7506216, 44.7498913],
   [-0.7501806, 44.7495772],
   [-0.7501583, 44.7495867],
   [-0.7461762, 44.7518437],
   [-0.744993, 44.7525111],
   [-0.7432294, 44.7535073],
   [-0.7415752, 44.7544483],
   [-0.7398473, 44.7554301],
   [-0.7383658, 44.7562666],
   [-0.7369807, 44.7570635],
   [-0.7358568, 44.7577103],
   [-0.7346792, 44.7583909],
   [-0.733496, 44.7590618],
   [-0.7334125, 4
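
Note the coordinate order in the two outputs above: `geo_point_2d` is `[latitude, longitude]`, whereas the GeoJSON polygon stores `[longitude, latitude]` pairs. The sketch below shows one way the polygon could be used to check whether a point (a venue, for example) falls inside a neighborhood, using a plain ray-casting test. The function names are hypothetical, and the code assumes a `Polygon` geometry with two-value coordinate pairs as shown above.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring of [lon, lat] pairs."""
    inside = False
    # Walk over consecutive vertex pairs, closing the ring back to its first vertex
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def point_in_polygon(lat, lon, geometry):
    """geometry is a GeoJSON Polygon dict: first ring is the outline, the rest are holes."""
    outer, *holes = geometry["coordinates"]
    if not point_in_ring(lon, lat, outer):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in holes)

# Toy square (not real data) to show the call; arguments follow the geo_point_2d order
square = {"type": "Polygon",
          "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}
print(point_in_polygon(1, 1, square))
print(point_in_polygon(3, 1, square))
```

With the notebook's data, the call would look like `point_in_polygon(*feature['properties']['geo_point_2d'], feature['geometry'])` for a `feature` taken from `bordeaux_data['features']`.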

In [197]:
# Create an empty dataframe with the columns we are interested in
columns=['Neighborhood','INSEE','Latitude','Longitude','Geometry']
bordeaux_neighborhoods=pd.DataFrame(columns=columns)
bordeaux_neighborhoods

Unnamed: 0,Neighborhood,INSEE,Latitude,Longitude,Geometry


In [198]:
# Loop over the JSON features and collect one row per neighborhood.
# DataFrame.append was removed in pandas 2.0, so rows are gathered in a list first.
rows = []
for data in bordeaux_data['features']:
    properties = data['properties']
    # geo_point_2d is stored as [latitude, longitude]
    neighborhood_lat, neighborhood_lon = properties['geo_point_2d']
    rows.append({'Neighborhood': properties['nom'],
                 'INSEE': properties['insee'],
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon,
                 'Geometry': data['geometry']})

bordeaux_neighborhoods = pd.DataFrame(rows, columns=columns)

In [199]:
bordeaux_neighborhoods.head()

Unnamed: 0,Neighborhood,INSEE,Latitude,Longitude,Geometry
0,Toctoucau,33318,44.76693,-0.733237,"{'type': 'Polygon', 'coordinates': [[[-0.72364..."
1,3M-Bourgailh,33318,44.806158,-0.677987,"{'type': 'Polygon', 'coordinates': [[[-0.67910..."
2,Saige,33318,44.786531,-0.635364,"{'type': 'Polygon', 'coordinates': [[[-0.62840..."
3,Casino,33318,44.807801,-0.628748,"{'type': 'Polygon', 'coordinates': [[[-0.62803..."
4,Arlac,33281,44.822855,-0.626316,"{'type': 'Polygon', 'coordinates': [[[-0.62901..."


Now that we have the coordinates of each neighborhood, we just need to add the town to which each one is attached. 
To do so, we will use an API provided by the Opendatasoft public data portal.

`url= 'https://public.opendatasoft.com/api/records/1.0/search/?dataset=correspondance-code-insee-code-postal&facet=insee_com&facet=nom_dept&facet=nom_region&facet=statut&refine.insee_com=33063'`

In [200]:
# Query the INSEE / postal code dataset once per neighborhood and keep
# the postal code and town name of the first matching record
for index, CODEINSEE in enumerate(bordeaux_neighborhoods['INSEE']): 
    url= f'https://public.opendatasoft.com/api/records/1.0/search/?dataset=correspondance-code-insee-code-postal&facet=insee_com&facet=nom_dept&facet=nom_region&facet=statut&refine.insee_com={CODEINSEE}'
    results=requests.get(url).json()
    fields=results['records'][0]['fields']
    bordeaux_neighborhoods.at[index,'postal_code']=fields['postal_code']
    bordeaux_neighborhoods.at[index,'town']=fields['nom_comm']
bordeaux_neighborhoods

Unnamed: 0,Neighborhood,INSEE,Latitude,Longitude,Geometry,postal_code,town
0,Toctoucau,33318,44.76693,-0.733237,"{'type': 'Polygon', 'coordinates': [[[-0.72364...",33600,PESSAC
1,3M-Bourgailh,33318,44.806158,-0.677987,"{'type': 'Polygon', 'coordinates': [[[-0.67910...",33600,PESSAC
2,Saige,33318,44.786531,-0.635364,"{'type': 'Polygon', 'coordinates': [[[-0.62840...",33600,PESSAC
3,Casino,33318,44.807801,-0.628748,"{'type': 'Polygon', 'coordinates': [[[-0.62803...",33600,PESSAC
4,Arlac,33281,44.822855,-0.626316,"{'type': 'Polygon', 'coordinates': [[[-0.62901...",33700,MERIGNAC
5,Verthamon,33318,44.816601,-0.613376,"{'type': 'Polygon', 'coordinates': [[[-0.60980...",33600,PESSAC
6,Gambetta-Mairie-Lissandre,33119,44.852746,-0.534511,"{'type': 'Polygon', 'coordinates': [[[-0.52933...",33150,CENON
7,Cap de Bos,33318,44.790098,-0.686331,"{'type': 'Polygon', 'coordinates': [[[-0.68063...",33600,PESSAC
8,Bourran,33281,44.844425,-0.634709,"{'type': 'Polygon', 'coordinates': [[[-0.63940...",33700,MERIGNAC
9,Capeyron,33281,44.85371,-0.649825,"{'type': 'Polygon', 'coordinates': [[[-0.66025...",33700,MERIGNAC


In [201]:
bordeaux_neighborhoods.to_csv('bordeaux_neighborhoods-enrichi.csv')

In [6]:
test= pd.read_json('project_data/se_cpost_s.geojson')

In [21]:
test['features'][0]['geometry']

{'type': 'Polygon',
 'coordinates': [[[-0.5031743, 44.8480085],
   [-0.5031962999999999, 44.8480594],
   [-0.5035961, 44.8490448],
   [-0.5037096999999999, 44.8493247],
   [-0.5039673, 44.8499738],
   [-0.5044141, 44.8510549],
   [-0.5045772, 44.8514547],
   [-0.5047682, 44.8519421],
   [-0.5049087999999999, 44.8523433],
   [-0.5050084, 44.8526707],
   [-0.5051045, 44.8530569],
   [-0.5051508, 44.853309],
   [-0.5051901, 44.8534974],
   [-0.5052365, 44.8538657],
   [-0.5052629, 44.8541288],
   [-0.5052785, 44.8544128],
   [-0.5052850999999999, 44.8545785],
   [-0.5052885, 44.8546947],
   [-0.505286, 44.8548256],
   [-0.5052810999999999, 44.8549964],
   [-0.5052772999999999, 44.8551916],
   [-0.5052662, 44.8553575],
   [-0.5052529, 44.8555461],
   [-0.5052207, 44.8557967],
   [-0.5051973, 44.855972],
   [-0.5051662, 44.8561252],
   [-0.5047299, 44.8580269],
   [-0.5044069, 44.859452],
   [-0.5043489, 44.8597077],
   [-0.5042706, 44.8600579],
   [-0.5042114999999999, 44.8603572],
   [-0.

In [17]:
test=pd.read_csv('project_data/valeur_fonciere_gironde.csv')



In [18]:
len(test)

100380

In [4]:
url= 'https://api.cquest.org/dvf?code_postal=33000'

In [5]:
results=requests.get(url).json() 

In [11]:
results.keys()

dict_keys(['source', 'derniere_maj', 'licence', 'nb_resultats', 'resultats'])

In [15]:
results['nb_resultats']

27673
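
Each entry of `results['resultats']` carries the sale price (`valeur_fonciere`) and the built surface (`surface_relle_bati`, the API's own spelling), which is enough to derive a price per square meter, the real estate feature mentioned in the introduction. The following is a minimal sketch with hypothetical helper names, assuming both fields are numeric or `None`; the sample values are taken from the first record returned for postal code 33000.

```python
from statistics import median

def price_per_m2(record):
    """Price per square meter for one DVF record, or None if a value is missing."""
    price = record.get("valeur_fonciere")
    surface = record.get("surface_relle_bati")
    if not price or not surface:
        return None  # missing or zero values cannot be used
    return price / surface

def median_price_per_m2(records, type_local=None):
    """Median price per square meter, optionally restricted to one property type."""
    values = []
    for record in records:
        if type_local is not None and record.get("type_local") != type_local:
            continue
        value = price_per_m2(record)
        if value is not None:
            values.append(value)
    return median(values) if values else None

# Values from the first record of the API response for postal code 33000
sample = {"valeur_fonciere": 160000, "surface_relle_bati": 26, "type_local": "Appartement"}
print(round(price_per_m2(sample), 2))  # 6153.85
```

On the full response this would be called as `median_price_per_m2(results['resultats'], type_local='Appartement')`. A median is used rather than a mean because a single transaction may cover several lots, which can produce extreme per-record values.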

In [14]:
results['resultats'][0]

{'code_service_ch': None,
 'reference_document': None,
 'articles_1': None,
 'articles_2': None,
 'articles_3': None,
 'articles_4': None,
 'articles_5': None,
 'numero_disposition': '000001',
 'date_mutation': '2015-12-07',
 'nature_mutation': 'Vente',
 'valeur_fonciere': 160000,
 'numero_voie': '5',
 'suffixe_numero': None,
 'type_voie': 'RUE',
 'code_voie': '3120',
 'voie': 'ELIE GINTRAC',
 'code_postal': '33000',
 'commune': 'BORDEAUX',
 'code_departement': '33',
 'code_commune': '33063',
 'prefixe_section': None,
 'section': 'DC',
 'numero_plan': '33063000DC0006',
 'numero_volume': None,
 'lot_1': '3',
 'surface_lot_1': None,
 'lot_2': None,
 'surface_lot_2': None,
 'lot_3': None,
 'surface_lot_3': None,
 'lot_4': None,
 'surface_lot_4': None,
 'lot_5': None,
 'surface_lot_51': None,
 'nombre_lots': '1',
 'code_type_local': '2',
 'type_local': 'Appartement',
 'identifiant_local': None,
 'surface_relle_bati': 26,
 'nombre_pieces_principales': 1,
 'nature_culture': None,
 'nature_cu