# **The Battle of Neighborhoods Capstone - Final Project**

<p align="center">
  <img width="460" height="300" src="https://foter.com/photos/395/bedroom-in-hotel.jpg">
</p>

Photo on Foter.com

## **Adding neighborhood information when choosing a hotel or Airbnb location**

This notebook provides an example of how to use the [Foursquare API](https://foursquare.com/). 

It contains:

1. A detailed description of the business problem and a discussion of the background
2. A description of the data, its different sources, and how it will be used to solve the problem
    

## 1 Introduction and Discussion of the Business Objective and Problem


### The usual search for a hotel or Airbnb

#### What is the common point between the Airbnb results for Paris below:

<p align="center">
  <img width="460" height="300" src="https://github.com/Aymen-lng/Coursera_Capstone/blob/master/Capture5.PNG?raw=true">
</p>

#### and the Google results below for "Hotel Paris"?

<p align="center">
  <img width="460" height="300" src="https://github.com/Aymen-lng/Coursera_Capstone/blob/master/Capture4.PNG?raw=true">
</p>

You get it? Yes, right: the results are entirely oriented toward services and prices.  
What if you do not know anything about the different districts of Paris? Like New York, London, or Berlin, Paris is a big, multicultural city whose lifestyle varies depending on which district you are in. You might want an area with plenty of pubs and wine bars close to your place, so you don't pay extra for a cab home every night. Or maybe a calm area with lots of green spaces and coffee shops, to enjoy breakfast with a nice view.  
So how would you choose? I guess you do what I do, one of two things:  

* Either you research the city's districts in advance and then target them when booking your trip, taking the risk that no hotel or Airbnb is available there.  
* Or you first search for the best place based on prices and services, then do additional research on the districts where the best results are.  

So the main problem is the lack of information about the different areas of a city when looking for a hotel or Airbnb, which forces you to run several separate searches just to find a place to stay. 

### And what if...?

What if you could use a tool that groups the districts of a city by the kinds of venues they contain, finds the one that best matches the mindset of your trip, and shows it on a map together with all the Airbnb listings and hotels located there?  
Let's keep Paris as the example. The city is made up of 20 districts called "arrondissements", each with its own character. Some are full of cafés and restaurants, others of bars, others of parks and museums. If it's your first trip to Paris, you can feel totally lost.  
The question is: is it possible to use a machine learning method to train a model that clusters these districts based on the categories of the venues in each district (bar, pub, museum, park, etc.) and then, once you define what kind of trip you want, proposes the best district and shows it on a map of Paris with all hotels and Airbnb listings geolocated?
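To make the last part of that question concrete, here is a minimal sketch of the "propose the best district" step only. It is not the clustering itself: it assumes each district has already been summarized as a share of venues per category, and it scores districts against a set of trip preferences with a plain weighted sum. The function name `rank_districts`, the district labels, and all the numbers are hypothetical and only there for illustration.

```python
from typing import Dict, List, Tuple


def rank_districts(
    profiles: Dict[str, Dict[str, float]],
    preferences: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each district as the weighted sum of its venue-category shares."""
    scores = []
    for district, shares in profiles.items():
        score = sum(
            weight * shares.get(category, 0.0)
            for category, weight in preferences.items()
        )
        scores.append((district, score))
    # Highest score first: the best match for the requested kind of trip.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up venue-category shares per district (NOT real Paris data).
profiles = {
    "District A": {"Bar": 0.40, "Park": 0.05, "Museum": 0.10},
    "District B": {"Bar": 0.10, "Park": 0.30, "Coffee Shop": 0.25},
    "District C": {"Bar": 0.20, "Museum": 0.35, "Park": 0.10},
}

# Hypothetical traveler profile: mostly nightlife, a little culture.
nightlife_trip = {"Bar": 1.0, "Museum": 0.2}

ranking = rank_districts(profiles, nightlife_trip)
print([district for district, _ in ranking])
```

A weighted sum is only one possible way to match a trip to a district; with clusters available, the same idea could be applied to cluster centers instead of individual districts.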

## 2 Data 

### 2.1 Data sources

To build this tool we will need the following information (the links point to the sources):  

+ Paris district locations. The 20 districts, also known as "arrondissements", are the units we will cluster. We need their exact coordinates in order to map them. [Open Data Paris](https://opendata.paris.fr/explore/dataset/arrondissements/table/?dataChart)  
+ Venue categories and hotel locations. Using this API, we will get the venues in each neighborhood: [Foursquare API](https://foursquare.com/)
+ Airbnb locations: [Open Data Paris](https://data.opendatasoft.com/explore/dataset/airbnb-averages%40public/export/?disjunctive.room_type&sort=date&refine.location=France,+Paris)

### 2.2 Data approach 

With the Paris district data, we will be able to locate and identify the districts on a map of Paris.  
Using Foursquare data, we will be able to analyze the different venue categories grouped by district.  
Finally, with Foursquare again and the Airbnb open data, we will be able to identify the places to stay in each district.
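As a rough illustration of "venue categories grouped by district", the sketch below counts categories per district and turns the counts into shares. It uses only the standard library rather than pandas, and it assumes the venues are available as simple `(district, category)` pairs; the helper name `category_shares` and the sample rows are made up for the example and are not real Foursquare results.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def category_shares(
    venues: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Return, for each district, the fraction of its venues in each category."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for district, category in venues:
        counts[district][category] += 1

    shares = {}
    for district, counter in counts.items():
        total = sum(counter.values())
        shares[district] = {cat: n / total for cat, n in counter.items()}
    return shares


# Made-up sample rows (district, venue category), for illustration only.
sample_venues = [
    ("5ème Ardt", "Bar"),
    ("5ème Ardt", "Bar"),
    ("5ème Ardt", "Museum"),
    ("5ème Ardt", "Park"),
    ("6ème Ardt", "Park"),
    ("6ème Ardt", "Coffee Shop"),
]

shares = category_shares(sample_venues)
for district, profile in shares.items():
    print(district, profile)
```

Working with shares rather than raw counts keeps districts comparable even when the API returns many more venues for one district than for another.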

## 3 Methodology

### Step 1: Libraries import

In [4]:
!pip install beautifulsoup4
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files
import pandas as pd

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import; the pandas.io.json path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library
from bs4 import BeautifulSoup

# Import k-means from clustering stage
from sklearn.cluster import KMeans



print('Libraries imported.')

Libraries imported.
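Since the install commands above run on every execution even when everything is already present, one option is to check first which modules are missing. The sketch below is only a suggestion: the helper name `missing_packages` is hypothetical, and the list of module names simply mirrors the imports used in this notebook.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Import names used in this notebook (bs4 is the import name of beautifulsoup4).
required = ["numpy", "pandas", "requests", "geopy", "folium", "bs4", "sklearn"]

missing = missing_packages(required)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required libraries are available.")
```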


### Step 2: Data Analysis

#### Data cleaning and preparation

In [51]:
# Paris districts data was downloaded from 'https://opendata.paris.fr/explore/dataset/arrondissements/table/?dataChart'
# then uploaded to a GitHub repository to make it available for this project.
# The format used is CSV

paris = pd.read_csv('https://raw.githubusercontent.com/Aymen-lng/Coursera_Capstone/master/Arrondissements%20(1).csv')
paris


Unnamed: 0,N_SQ_AR,C_AR,C_ARINSEE,L_AR,L_AROFF,N_SQ_CO,SURFACE,PERIMETRE,Geometry X Y
0,750000005,5,75105,5ème Ardt,Panthéon,750001537,2539375.0,6239.195396,"48.8444431505, 2.35071460958"
1,750000006,6,75106,6ème Ardt,Luxembourg,750001537,2153096.0,6483.686786,"48.8491303586, 2.33289799905"
2,750000012,12,75112,12ème Ardt,Reuilly,750001537,16314780.0,24089.6663,"48.8349743815, 2.42132490078"
3,750000009,9,75109,9ème Ardt,Opéra,750001537,2178303.0,6471.58829,"48.8771635173, 2.33745754348"
4,750000019,19,75119,19ème Ardt,Buttes-Chaumont,750001537,6792651.0,11253.18248,"48.8870759966, 2.38482096015"
5,750000015,15,75115,15ème Ardt,Vaugirard,750001537,8494994.0,13678.79831,"48.8400853759, 2.29282582242"
6,750000002,2,75102,2ème Ardt,Bourse,750001537,991153.7,4554.10436,"48.8682792225, 2.34280254689"
7,750000017,17,75117,17ème Ardt,Batignolles-Monceau,750001537,5668835.0,10775.57952,"48.887326522, 2.30677699057"
8,750000003,3,75103,3ème Ardt,Temple,750001537,1170883.0,4519.263648,"48.86287238, 2.3600009859"
9,750000011,11,75111,11ème Ardt,Popincourt,750001537,3665442.0,8282.011886,"48.8590592213, 2.3800583082"


In [52]:
# Let's separate the last column, "Geometry X Y", into two columns
paris = paris.join(paris['Geometry X Y'].str.split(',', n=1, expand=True))
paris.shape
paris

Unnamed: 0,N_SQ_AR,C_AR,C_ARINSEE,L_AR,L_AROFF,N_SQ_CO,SURFACE,PERIMETRE,Geometry X Y,0,1
0,750000005,5,75105,5ème Ardt,Panthéon,750001537,2539375.0,6239.195396,"48.8444431505, 2.35071460958",48.8444431505,2.35071460958
1,750000006,6,75106,6ème Ardt,Luxembourg,750001537,2153096.0,6483.686786,"48.8491303586, 2.33289799905",48.8491303586,2.33289799905
2,750000012,12,75112,12ème Ardt,Reuilly,750001537,16314780.0,24089.6663,"48.8349743815, 2.42132490078",48.8349743815,2.42132490078
3,750000009,9,75109,9ème Ardt,Opéra,750001537,2178303.0,6471.58829,"48.8771635173, 2.33745754348",48.8771635173,2.33745754348
4,750000019,19,75119,19ème Ardt,Buttes-Chaumont,750001537,6792651.0,11253.18248,"48.8870759966, 2.38482096015",48.8870759966,2.38482096015
5,750000015,15,75115,15ème Ardt,Vaugirard,750001537,8494994.0,13678.79831,"48.8400853759, 2.29282582242",48.8400853759,2.29282582242
6,750000002,2,75102,2ème Ardt,Bourse,750001537,991153.7,4554.10436,"48.8682792225, 2.34280254689",48.8682792225,2.34280254689
7,750000017,17,75117,17ème Ardt,Batignolles-Monceau,750001537,5668835.0,10775.57952,"48.887326522, 2.30677699057",48.887326522,2.30677699057
8,750000003,3,75103,3ème Ardt,Temple,750001537,1170883.0,4519.263648,"48.86287238, 2.3600009859",48.86287238,2.3600009859
9,750000011,11,75111,11ème Ardt,Popincourt,750001537,3665442.0,8282.011886,"48.8590592213, 2.3800583082",48.8590592213,2.3800583082
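Note that, despite the "X Y" in the column name, the values shown above are ordered latitude first, then longitude. As a small sketch of the same split with a sanity check added, the plain-Python helper below parses one such string and rejects values outside valid coordinate ranges. The name `parse_lat_lon` is hypothetical and not part of the original notebook.

```python
from typing import Tuple


def parse_lat_lon(text: str) -> Tuple[float, float]:
    """Parse a 'latitude, longitude' string into two floats, with range checks."""
    lat_text, lon_text = text.split(",", 1)
    # float() tolerates the leading space left after the comma.
    lat, lon = float(lat_text), float(lon_text)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"Coordinates out of range: {text!r}")
    return lat, lon


# Value copied from the first row of the table above.
print(parse_lat_lon("48.8444431505, 2.35071460958"))
```

With pandas, a helper like this could be applied row by row, for example with `paris['Geometry X Y'].apply(parse_lat_lon)`, although the vectorized `str.split` used above is usually the simpler choice.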


In [53]:
# Now we will rename some columns in order to clarify the dataset

# Neighborhood : name of the Arrondissement (e.g. Panthéon)
# Arrondissement_Num : the Arrondissement number, used to identify it
# French_Name : the descriptive French label for each Arrondissement (e.g. 5ème Ardt)

paris.rename(columns={'L_AROFF': 'Neighborhood', 'C_AR': 'Arrondissement_Num', 'L_AR': 'French_Name', 0: 'Latitude', 1: 'Longitude'}, inplace=True)

# Clean up the dataset to remove unnecessary columns.
# Some of the columns are for mapping software - not required here.

paris.drop(['N_SQ_AR','C_ARINSEE','N_SQ_CO','SURFACE', 'PERIMETRE', 'Geometry X Y' ], axis=1, inplace=True)
paris['Latitude']= paris['Latitude'].astype(float)
paris['Longitude']=paris['Longitude'].astype(float)
paris


Unnamed: 0,Arrondissement_Num,French_Name,Neighborhood,Latitude,Longitude
0,5,5ème Ardt,Panthéon,48.844443,2.350715
1,6,6ème Ardt,Luxembourg,48.84913,2.332898
2,12,12ème Ardt,Reuilly,48.834974,2.421325
3,9,9ème Ardt,Opéra,48.877164,2.337458
4,19,19ème Ardt,Buttes-Chaumont,48.887076,2.384821
5,15,15ème Ardt,Vaugirard,48.840085,2.292826
6,2,2ème Ardt,Bourse,48.868279,2.342803
7,17,17ème Ardt,Batignolles-Monceau,48.887327,2.306777
8,3,3ème Ardt,Temple,48.862872,2.360001
9,11,11ème Ardt,Popincourt,48.859059,2.380058


#### Locating Paris and mapping the districts

In [54]:
# Retrieve the Latitude and Longitude for Paris
from geopy.geocoders import Nominatim 

address = 'Paris'

# Define the user_agent as Paris_explorer
geolocator = Nominatim(user_agent="Paris_explorer")

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Paris France are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris France are 48.8566969, 2.3514616.
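One caveat with the cell above: geopy's `geocode` returns `None` when nothing matches the address, in which case `location.latitude` fails with an `AttributeError`. The sketch below wraps the lookup so that this case produces a clearer error. The wrapper name `resolve_coordinates` is hypothetical, and the geocoder is replaced by a small stand-in function so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Tuple


def resolve_coordinates(
    geocode: Callable[[str], Optional[object]],
    address: str,
) -> Tuple[float, float]:
    """Return (latitude, longitude) for an address, or raise if nothing is found."""
    location = geocode(address)
    if location is None:
        raise LookupError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude


def fake_geocode(address: str):
    """Stand-in for geolocator.geocode, returning the value printed above for Paris."""
    if address == "Paris":
        return SimpleNamespace(latitude=48.8566969, longitude=2.3514616)
    return None


latitude, longitude = resolve_coordinates(fake_geocode, "Paris")
print("The geographical coordinates of Paris are {}, {}.".format(latitude, longitude))
```

In the notebook itself, the real `geolocator.geocode` would be passed in place of `fake_geocode`.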


In [60]:
# create map of Paris using the above latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=12)


# add markers to map
for lat, lng, label in zip(paris['Latitude'], paris['Longitude'], paris['French_Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=25,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_paris)  
    
map_paris

#### Use the Foursquare API to explore the Arrondissements of Paris (Neighborhoods)

In [61]:
CLIENT_ID = 'XCCDCVVKRAGVZY2NAN0QTF4JJPL0P3WEZU3GYEX34Z5O0SAJ' # your Foursquare ID
CLIENT_SECRET = 'LCXXO41WIGRZOJAA0FJDM1UE132ALJHBQF15QUELLHSPB3PQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XCCDCVVKRAGVZY2NAN0QTF4JJPL0P3WEZU3GYEX34Z5O0SAJ
CLIENT_SECRET:LCXXO41WIGRZOJAA0FJDM1UE132ALJHBQF15QUELLHSPB3PQ
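Printing the credentials in full, as above, stores them in the notebook output. A possible alternative, sketched here under assumptions, is to read them from environment variables and only display a masked version. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helper names, and the demo values are all hypothetical.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the Foursquare ID and secret from environment variables (assumed names)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret: str, visible: int = 4) -> str:
    """Keep the first few characters and replace the rest with asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


# Demo mapping with placeholder values, used instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}

client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```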


In [62]:
paris.loc[0, 'French_Name']

'5ème Ardt'

In [64]:
neighborhood_latitude = paris.loc[0, 'Latitude'] # Neighborhood latitude value
neighborhood_longitude = paris.loc[0, 'Longitude'] # Neighborhood longitude value

neighborhood_name = paris.loc[0, 'French_Name'] # Neighborhood name

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # displays the URL

'https://api.foursquare.com/v2/venues/explore?&client_id=XCCDCVVKRAGVZY2NAN0QTF4JJPL0P3WEZU3GYEX34Z5O0SAJ&client_secret=LCXXO41WIGRZOJAA0FJDM1UE132ALJHBQF15QUELLHSPB3PQ&v=20180605&ll=48.8444431505,2.35071460958&radius=500&limit=100'
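The same URL can also be assembled from a dictionary of parameters, which avoids keeping a long format string in sync with its arguments. This is a minimal sketch using the standard library's `urlencode`; the function name `build_explore_url` is hypothetical, the credentials are placeholders, and the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude of the search center
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues returned
    }
    # urlencode percent-encodes special characters, such as the comma in "ll".
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
    48.8444431505, 2.35071460958, 500, 100,
)
print(example_url)

# Decoding the query string gives back the original values.
query = parse_qs(urlsplit(example_url).query)
print(query["ll"], query["radius"], query["limit"])
```

With `requests`, passing the dictionary directly as `requests.get(EXPLORE_ENDPOINT, params=params)` should give an equivalent request without building the string by hand.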

In [65]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ebf12db69babe001b78f3da'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Panthéon',
  'headerFullLocation': 'Panthéon, Paris',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 89,
  'suggestedBounds': {'ne': {'lat': 48.848943155, 'lng': 2.357539657530384},
   'sw': {'lat': 48.839943145999996, 'lng': 2.3438895616296156}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bca09f70687ef3ba719dbcc',
       'name': "Au P'tit Grec",
       'location': {'address': '68 rue Mouffetard',
        'lat': 48.842858,
        'lng': 2.349721,
        'labeledLatLngs': [{'label': 'display',
          'lat': 48.842858,
          'lng': 2.349721