# Capstone Project: Battle of the Neighborhoods

## Housing Market in Paris

Paris is one of the most expensive cities in the world for real estate. Finding an apartment there can take months of searching, appointments and visits, and in most cases the apartment still goes to someone else. However, with interest rates as low as 1% on a 25-year loan, and with property values rising sharply year after year, home ownership has become appealing to most Parisians as soon as they enter the job market.

With demand for real estate in Paris increasing and offers rare and expensive, it is reasonable to assume that home buyers need guidance in this important decision. In this project, we use machine learning techniques to cluster neighborhoods based on real estate prices, and make recommendations based on the venues in the surrounding area, to help buyers make the decision best suited to them.
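
To make the clustering idea concrete, here is a minimal sketch that groups neighborhoods by a single price feature. It assumes plain k-means (Lloyd's algorithm) on one dimension; the text does not say which clustering algorithm the project uses, and the zone names and prices below are invented purely for illustration.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values into k >= 2 groups with plain k-means."""
    ordered = sorted(values)
    # Spread the initial centroids evenly across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical median prices per square metre (EUR); NOT taken from the dataset.
prices = {"zone_a": 9000, "zone_b": 9500, "zone_c": 12000,
          "zone_d": 12500, "zone_e": 14000}

labels, centroids = kmeans_1d(list(prices.values()), k=2)
clusters = dict(zip(prices, labels))
print(clusters)
```

In practice the feature would be computed from the transaction data rather than typed in by hand, and a library implementation with several features would likely replace this hand-written loop.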

## Data description

To solve the problem at hand, we will use data downloaded from the French government's open data website (https://www.data.gouv.fr/fr/datasets/demandes-de-valeurs-foncieres/). This database gives us the addresses of properties, their values, their types, their surface areas and their number of rooms. We will also use the Foursquare API to explore the neighborhoods and recommend locations according to the venues and amenities nearby, using maps for visualization.

Another important data source for this study is the list of metro stations around Paris, which helps us determine how accessible a given apartment or house is. This information is available on the website of the city's transport operator, RATP (https://www.ratp.fr/).
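
One simple way to quantify accessibility is the straight-line distance from a property to its nearest station. The sketch below uses the haversine formula for that purpose. Treating great-circle distance as a proxy for accessibility is an assumption made here, not a method stated in the text; the property location is hypothetical, while the two station coordinates come from the RATP file.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_station(lat, lon, stations):
    """Return (name, distance_km) of the closest station in an iterable of (name, lat, lon)."""
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon)) for name, s_lat, s_lon in stations),
        key=lambda pair: pair[1],
    )

# Two stations with coordinates as they appear in the RATP export.
stations = [
    ("La Fourche", 48.8874368425, 2.32572522202),
    ("SUZANNE BUISSON", 48.9135017218, 2.48226363192),
]

# Hypothetical property location, chosen only for illustration.
name, distance_km = nearest_station(48.8850, 2.3300, stations)
print(name, round(distance_km, 2))
```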

By combining data on the properties, their features, their locations and their surroundings, we should be able to give home buyers enough insight to make informed decisions.

## Data

Let's import the data below and look at the first few rows of each dataset.

In [5]:
# Real estate transactions (Demandes de valeurs foncieres)
!wget 'https://www.data.gouv.fr/fr/datasets/r/3004168d-bec4-44d9-a781-ef16f41856a2' -O housingdata.csv
# Geographic positions of RATP stations
!wget https://dataratp2.opendatasoft.com/explore/dataset/positions-geographiques-des-stations-du-reseau-ratp/download/\?format\=csv\&timezone\=Europe/Berlin\&lang\=fr\&use_labels_for_header\=true\&csv_separator\=%3B -O ratp_stations.csv

--2020-03-01 00:03:28--  https://www.data.gouv.fr/fr/datasets/r/3004168d-bec4-44d9-a781-ef16f41856a2
Resolving www.data.gouv.fr (www.data.gouv.fr)... 37.59.183.73
Connecting to www.data.gouv.fr (www.data.gouv.fr)|37.59.183.73|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://static.data.gouv.fr/resources/demandes-de-valeurs-foncieres/20191030-122930/valeursfoncieres-2019.txt [following]
--2020-03-01 00:03:28--  https://static.data.gouv.fr/resources/demandes-de-valeurs-foncieres/20191030-122930/valeursfoncieres-2019.txt
Resolving static.data.gouv.fr (static.data.gouv.fr)... 37.59.183.73
Connecting to static.data.gouv.fr (static.data.gouv.fr)|37.59.183.73|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 135748668 (129M) [text/plain]
Saving to: ‘housingdata.csv’


2020-03-01 00:03:31 (48.2 MB/s) - ‘housingdata.csv’ saved [135748668/135748668]

--2020-03-01 00:03:32--  https://dataratp2.opendatasoft.com/explore/dataset/positions

In [8]:
import pandas as pd

# The download is a pipe-delimited text file, despite the .csv name we gave it
df = pd.read_csv('housingdata.csv', sep='|')
df.head()

Unnamed: 0,Code service CH,Reference document,1 Articles CGI,2 Articles CGI,3 Articles CGI,4 Articles CGI,5 Articles CGI,No disposition,Date mutation,Nature mutation,...,Surface Carrez du 5eme lot,Nombre de lots,Code type local,Type local,Identifiant local,Surface reelle bati,Nombre pieces principales,Nature culture,Nature culture speciale,Surface terrain
0,,,,,,,,1,11/01/2019,Vente,...,,1,3.0,Dépendance,,0.0,0.0,,,
1,,,,,,,,1,11/01/2019,Vente,...,,2,2.0,Appartement,,67.0,3.0,,,
2,,,,,,,,1,08/02/2019,Vente,...,,0,1.0,Maison,,118.0,4.0,AG,PARC,913.0
3,,,,,,,,1,08/02/2019,Vente,...,,0,1.0,Maison,,118.0,4.0,S,,1000.0
4,,,,,,,,1,04/04/2019,Vente,...,,0,1.0,Maison,,60.0,3.0,S,,96.0
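
The preview above mixes dwellings with outbuildings (`Dépendance`), which have no living area and would distort any per-property statistic. A minimal sketch of one possible cleaning step follows. It rebuilds the five preview rows by hand, using only columns visible above, and keeps apartments and houses with a positive built surface. The choice of which types to keep is an assumption, not a step stated in the text.

```python
import pandas as pd

# The five rows shown in the preview, restricted to three visible columns.
sample = pd.DataFrame({
    "Type local": ["Dépendance", "Appartement", "Maison", "Maison", "Maison"],
    "Surface reelle bati": [0.0, 67.0, 118.0, 118.0, 60.0],
    "Nombre pieces principales": [0.0, 3.0, 4.0, 4.0, 3.0],
})

# Assumption: only apartments and houses are of interest to home buyers.
is_dwelling = sample["Type local"].isin(["Appartement", "Maison"])
has_surface = sample["Surface reelle bati"] > 0

dwellings = sample[is_dwelling & has_surface].reset_index(drop=True)
print(dwellings)
```

The same boolean mask can be applied to the full `df` once it is loaded. A price per square metre would additionally need the sale value column, which is hidden behind the `...` in the preview.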


In [9]:
# The RATP export was requested with a semicolon separator (csv_separator=%3B)
ratp = pd.read_csv('ratp_stations.csv', sep=";")
ratp.head()

Unnamed: 0,ID,Name,Description,Coordinates
0,3677668,LYCEE EMILY BRONTE,RUE GABRIEL - 77258,"48.839179854,2.63877127865"
1,3677692,LES QUATRE PAVES,ROND-POINT DES QUATRE PAVES - 77337,"48.8516342568,2.62146597068"
2,6869589,SUZANNE BUISSON,AVENUE HENRI VARAGNAT - 93010,"48.9135017218,2.48226363192"
3,1656,La Fourche,Avenue de Clichy - 75117,"48.8874368425,2.32572522202"
4,3677694,CHOCOLATERIE,BOULEVARD PIERRE CARLE - 77337,"48.8560221346,2.62439108407"
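
The `Coordinates` column stores latitude and longitude together as one string, so it has to be split before any distance computation or mapping. The sketch below does this on two rows copied from the preview. It also extracts the five-digit code at the end of `Description`; reading that code as a commune code, with values starting with `75` meaning Paris proper, is an assumption based on the rows shown and should be checked against the full file.

```python
import pandas as pd

# Two rows copied from the preview above.
ratp_sample = pd.DataFrame({
    "Name": ["La Fourche", "SUZANNE BUISSON"],
    "Description": ["Avenue de Clichy - 75117", "AVENUE HENRI VARAGNAT - 93010"],
    "Coordinates": ["48.8874368425,2.32572522202", "48.9135017218,2.48226363192"],
})

# Split "lat,lon" into two numeric columns.
coords = ratp_sample["Coordinates"].str.split(",", expand=True).astype(float)
ratp_sample["Latitude"] = coords[0]
ratp_sample["Longitude"] = coords[1]

# Assumption: the trailing five digits are a commune code, and 75xxx means Paris.
ratp_sample["Commune code"] = ratp_sample["Description"].str.extract(r"(\d{5})$", expand=False)
paris_only = ratp_sample[ratp_sample["Commune code"].str.startswith("75", na=False)]
print(paris_only[["Name", "Latitude", "Longitude"]])
```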
