# Capstone Project - The Battle of Neighborhoods (Week 1)

## Data Description

Foursquare holds a large amount of neighborhood data for many cities. We intend to explore the city of Paris over a radius of 60 km, retrieve all of its trending restaurants, and evaluate their ratings and foot traffic (venue statistics). We then intend to segment the neighborhoods based on this data and analyze the results. We will recommend to Mrs. Suzanne the neighborhood with the highest restaurant foot traffic and, probably, the highest density of restaurants. We will adopt the following methodology:
- Data import from Foursquare
- Data cleansing
- Data transformation
- Data analysis
- Result presentation
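The recommendation criterion above (highest foot traffic, then restaurant density) can be sketched as a simple aggregation. This is a minimal illustration, not the project's final method: it assumes each venue has already been reduced to a record with a neighborhood key and its `totalCheckins` statistic, uses the postal code (arrondissement) as a stand-in for the neighborhood because the `neighborhood` field is empty in the sample results below, and treats the restaurant count per neighborhood as a rough proxy for density. The function name and the check-in numbers are hypothetical.

```python
from collections import defaultdict


def rank_neighborhoods(venues):
    """Rank neighborhoods by total check-ins, breaking ties by restaurant count.

    `venues` is a list of dicts with (assumed) keys "neighborhood" and "totalCheckins".
    """
    totals = defaultdict(lambda: {"restaurants": 0, "checkins": 0})
    for venue in venues:
        agg = totals[venue["neighborhood"]]
        agg["restaurants"] += 1  # restaurant count, used as a density proxy
        agg["checkins"] += venue["totalCheckins"]  # foot-traffic proxy
    # Highest foot traffic first; restaurant count only breaks ties
    return sorted(
        totals.items(),
        key=lambda item: (item[1]["checkins"], item[1]["restaurants"]),
        reverse=True,
    )


# Hypothetical example data: postal codes as neighborhood keys, invented check-in counts
sample = [
    {"neighborhood": "75001", "totalCheckins": 1200},
    {"neighborhood": "75001", "totalCheckins": 800},
    {"neighborhood": "75007", "totalCheckins": 2500},
    {"neighborhood": "75009", "totalCheckins": 300},
    {"neighborhood": "75009", "totalCheckins": 400},
]

for postal_code, agg in rank_neighborhoods(sample):
    print(postal_code, agg["checkins"], agg["restaurants"])
```

In this toy data the neighborhood with the most foot traffic (75007) is not the one with the most restaurants, which is why the two criteria may have to be weighed against each other rather than assumed to coincide.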


### Foursquare Data presentation

Foursquare provides a number of variables for each venue, including the following:
- id: the ID of the venue
- name: the name of the venue
- location: the location of the venue, a composite variable that contains:
    * address
    * crossStreet
    * lat
    * lng
    * cc
    * city
    * state
    * country
- categories: the categories of the venue, a list of composite objects, each containing:
    * id
    * name
    * pluralName
    * shortName
    * icon
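Because `location` and `categories` arrive as nested JSON, they have to be flattened before they fit in a table. As a rough sketch of what that step does, the hypothetical helper below turns nested dictionaries into dotted column names (`location.lat`, `location.city`, ...), matching the column names `json_normalize` produces in the notebook below. Lists such as `categories` are left as-is, which is why the category name still has to be extracted separately. The sample venue is trimmed from the API response shown below.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys; lists are kept unchanged."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into composite variables
        else:
            flat[name] = value
    return flat


# Venue trimmed from the Foursquare search response for "Restaurant" near central Paris
venue = {
    "id": "4e078807e4cdefcff6dce4f6",
    "name": "Restaurant 58 Tour Eiffel",
    "location": {
        "address": "Tour Eiffel",
        "city": "Paris",
        "lat": 48.858365824021554,
        "lng": 2.294248938560486,
        "postalCode": "75007",
    },
    "categories": [{"name": "French Restaurant", "shortName": "French", "primary": True}],
}

flat = flatten(venue)
print(sorted(flat))
```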

For each venue, we will retrieve its statistics, which contain the following fields:
- totalCheckins: total number of check-ins at the venue during the time period.
- newCheckins: number of new visitors to the venue during the time period.
- uniqueVisitors: number of unique visitors to the venue during the time period.
- sharing: a subobject with fields `twitter` and `facebook` giving the number of check-ins at the venue pushed to Twitter and Facebook.
- genderBreakdown: a subobject with fields `female` and `male` giving the number of check-ins at the venue by women and men.
- ageBreakdown: an array of subobjects with fields `age` and `checkins` giving the number of check-ins at the venue by people in different age ranges.
- hourBreakdown: a 24-element array of subobjects with fields `hour` and `checkins` giving the number of check-ins at the venue during each hour of the day.
- visitCountHistogram: a list of subobjects with fields `checkins` and `users` indicating how many users checked in a given number of times.
- topVisitors: an array of the top 10 users by check-in count during the time period. Array elements are objects with fields `user` and `checkins` (the number of check-ins by that user). Does not include users who have opted out of sharing their check-ins with venue managers in their settings.
- recentVisitors: an array of the 10 most recent users. Array elements are objects with fields `user` and `lastCheckin` (the timestamp of the user's last check-in). This field is present only if `startAt` is specified and `endAt` is not (i.e., the period ends now). Does not include users who have opted out of sharing their check-ins with venue managers in their settings.
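As an illustration of how these fields could feed the foot-traffic analysis, the sketch below derives a few indicators from a single stats object: check-ins per unique visitor (a rough repeat-visit signal), the peak hour from `hourBreakdown`, and the female share from `genderBreakdown`. The input is a hypothetical dictionary shaped like the field list above; the request itself is not shown here. As far as we understand Foursquare's v2 documentation, the venue stats endpoint was restricted to venue managers, so whether these statistics can be obtained for arbitrary restaurants should be checked before relying on them.

```python
def summarize_stats(stats):
    """Derive simple foot-traffic indicators from a venue stats dict (field names as listed above)."""
    total = stats["totalCheckins"]
    unique = stats["uniqueVisitors"]
    # Hour with the most check-ins (first one wins on ties)
    peak = max(stats["hourBreakdown"], key=lambda entry: entry["checkins"])
    gender = stats["genderBreakdown"]
    gendered = gender["female"] + gender["male"]
    return {
        "totalCheckins": total,
        # > 1 means visitors tend to come back more than once
        "checkinsPerVisitor": total / unique if unique else 0.0,
        "peakHour": peak["hour"],
        "femaleShare": gender["female"] / gendered if gendered else None,
    }


# Hypothetical stats object: busiest around lunch and dinner, 300 check-ins in total
hourly = {12: 70, 13: 50, 19: 60, 20: 80, 21: 40}
stats = {
    "totalCheckins": 300,
    "uniqueVisitors": 200,
    "genderBreakdown": {"female": 120, "male": 180},
    "hourBreakdown": [{"hour": h, "checkins": hourly.get(h, 0)} for h in range(24)],
}

print(summarize_stats(stats))
```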


### Foursquare Data example

In [1]:
```python
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # module to convert an address into latitude and longitude values
import requests  # library to handle HTTP requests
import pandas as pd  # library for data analysis
import numpy as np  # library to handle data in a vectorized manner
import random  # library for random number generation

# libraries for displaying images
from IPython.display import Image, HTML

# function to transform JSON into a pandas DataFrame
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions

!conda install -c conda-forge folium=0.5.0 --yes
import folium  # plotting library

print('Folium installed')
print('Libraries imported.')
```


In [2]:
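```python
import os

# Foursquare credentials: read from environment variables (variable names chosen for
# illustration) instead of hard-coding secrets in the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')  # your Foursquare secret
VERSION = '20180604'  # API version date
LIMIT = 30  # maximum number of venues to return
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET))  # never print the secret itself
```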

```
Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET: <redacted>
```


In [3]:
```python
latitude = 48.866667  # approximate coordinates of central Paris
longitude = 2.333333
search_query = 'Restaurant'
radius = 6000  # search radius in meters (6 km)
print(search_query + ' .... OK!')
print(latitude, longitude)
```

```
Restaurant .... OK!
48.866667 2.333333
```
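The `radius` parameter is expressed in meters, so 6000 corresponds to 6 km around the chosen point. To check which venues actually fall inside a given radius, one can compute the great-circle distance from the search center with the haversine formula. The sketch below assumes a mean Earth radius of 6,371 km and a hypothetical helper name; for the Eiffel Tower restaurant it gives roughly 3 km, consistent with the `distance` of 3007 m that Foursquare reports in the response below.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumption)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


center = (48.866667, 2.333333)  # search center used in the notebook
eiffel = (48.858365824021554, 2.294248938560486)  # Restaurant 58 Tour Eiffel

d = haversine_m(*center, *eiffel)
print(round(d), d <= 6000)  # distance in meters, and whether it lies within the 6 km radius
```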


In [4]:
```python
# build the Foursquare venue search request from the parameters defined above
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url
```

```
'https://api.foursquare.com/v2/venues/search?client_id=<redacted>&client_secret=<redacted>&ll=48.866667,2.333333&v=20180604&query=Restaurant&radius=6000&limit=30'
```
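Building the URL with `str.format` works for this query, but it does not escape the values, so a query containing `&` (for example `Fish & Chips`) would be split into two parameters. A safer sketch, using only the standard library, lets `urllib.parse.urlencode` do the escaping; passing the same dictionary to `requests.get(SEARCH_URL, params=params)` would have the same effect. The function name and defaults are our own choices for illustration, and the credentials are placeholders.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, query,
                     radius=6000, limit=30, version="20180604"):
    """Return a Foursquare v2 venue-search URL with properly escaped parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,  # meters
        "limit": limit,
    }
    return SEARCH_URL + "?" + urlencode(params)


url = build_search_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                       48.866667, 2.333333, "Fish & Chips")
print(url)
```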

In [5]:
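```python
response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors such as invalid credentials
results = response.json()
results
```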

```
{'meta': {'code': 200, 'requestId': '5c4ea3fa1ed2193b44537d55'},
 'response': {'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/french_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d10c941735',
      'name': 'French Restaurant',
      'pluralName': 'French Restaurants',
      'primary': True,
      'shortName': 'French'}],
    'hasPerk': False,
    'id': '4e078807e4cdefcff6dce4f6',
    'location': {'address': 'Tour Eiffel',
     'cc': 'FR',
     'city': 'Paris',
     'country': 'France',
     'crossStreet': '1er étage',
     'distance': 3007,
     'formattedAddress': ['Tour Eiffel (1er étage)', '75007 Paris', 'France'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 48.858365824021554,
       'lng': 2.294248938560486}],
     'lat': 48.858365824021554,
     'lng': 2.294248938560486,
     'postalCode': '75007',
     'state': 'Île-de-France'},
    'name': 'Restaurant 58 Tour Eiffel',
    'referralId': 'v-1548657658'},
   {'c
```

In [6]:
```python
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()
```

| | categories | hasPerk | id | location.address | location.cc | location.city | location.country | location.crossStreet | location.distance | location.formattedAddress | location.labeledLatLngs | location.lat | location.lng | location.neighborhood | location.postalCode | location.state | name | referralId | venuePage.id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [{'name': 'French Restaurant', 'shortName': 'F... | False | 4e078807e4cdefcff6dce4f6 | Tour Eiffel | FR | Paris | France | 1er étage | 3007 | [Tour Eiffel (1er étage), 75007 Paris, France] | [{'lat': 48.858365824021554, 'label': 'display... | 48.858366 | 2.294249 | | 75007.0 | Île-de-France | Restaurant 58 Tour Eiffel | v-1548657658 | |
| 1 | [{'name': 'French Restaurant', 'shortName': 'F... | False | 4cbb05054352a1cd0d4396f5 | Musée d'Orsay | FR | Paris | France | 1 rue de la Légion d'Honneur | 916 | [Musée d'Orsay (1 rue de la Légion d'Honneur),... | [{'lat': 48.86029986248957, 'label': 'display'... | 48.8603 | 2.325392 | | 75007.0 | Île-de-France | Le Restaurant du Musée d'Orsay | v-1548657658 | |
| 2 | [{'name': 'French Restaurant', 'shortName': 'F... | False | 4adcda13f964a520e53621e3 | 228 rue de Rivoli | FR | Paris | France | | 420 | [228 rue de Rivoli, 75001 Paris, France] | [{'lat': 48.86516, 'label': 'display', 'lng': ... | 48.86516 | 2.32807 | | 75001.0 | Île-de-France | Restaurant Le Meurice Alain Ducasse | v-1548657658 | |
| 3 | [{'name': 'Cafeteria', 'shortName': 'Cafeteria... | False | 4e57797d7d8bf55c172826d9 | | FR | Paris | France | | 142 | [75002 Paris, France] | [{'lat': 48.86791550563465, 'label': 'display'... | 48.867916 | 2.333772 | | 75002.0 | Île-de-France | Restaurant d'Entreprise | v-1548657658 | |
| 4 | [{'name': 'French Restaurant', 'shortName': 'F... | False | 52cf091111d2d9bb920d6122 | Hôtel Costes | FR | Paris | France | 239 rue Saint-Honoré | 398 | [Hôtel Costes (239 rue Saint-Honoré), Paris, F... | [{'lat': 48.866697894503744, 'label': 'display... | 48.866698 | 2.327894 | | | Île-de-France | Restaurant Costes Saint-Honoré | v-1548657658 | |


In [7]:
```python
# keep only the venue name, its categories, every location-related column and the id
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # explicit copy avoids SettingWithCopyWarning

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # e.g. results from the "explore" endpoint use "venue."-prefixed columns
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# keep only the name of the first category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only the last term (e.g. 'location.lat' -> 'lat')
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered
```

| | name | categories | address | cc | city | country | crossStreet | distance | formattedAddress | labeledLatLngs | lat | lng | neighborhood | postalCode | state | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Restaurant 58 Tour Eiffel | French Restaurant | Tour Eiffel | FR | Paris | France | 1er étage | 3007 | [Tour Eiffel (1er étage), 75007 Paris, France] | [{'lat': 48.858365824021554, 'label': 'display... | 48.858366 | 2.294249 | | 75007.0 | Île-de-France | 4e078807e4cdefcff6dce4f6 |
| 1 | Le Restaurant du Musée d'Orsay | French Restaurant | Musée d'Orsay | FR | Paris | France | 1 rue de la Légion d'Honneur | 916 | [Musée d'Orsay (1 rue de la Légion d'Honneur),... | [{'lat': 48.86029986248957, 'label': 'display'... | 48.8603 | 2.325392 | | 75007.0 | Île-de-France | 4cbb05054352a1cd0d4396f5 |
| 2 | Restaurant Le Meurice Alain Ducasse | French Restaurant | 228 rue de Rivoli | FR | Paris | France | | 420 | [228 rue de Rivoli, 75001 Paris, France] | [{'lat': 48.86516, 'label': 'display', 'lng': ... | 48.86516 | 2.32807 | | 75001.0 | Île-de-France | 4adcda13f964a520e53621e3 |
| 3 | Restaurant d'Entreprise | Cafeteria | | FR | Paris | France | | 142 | [75002 Paris, France] | [{'lat': 48.86791550563465, 'label': 'display'... | 48.867916 | 2.333772 | | 75002.0 | Île-de-France | 4e57797d7d8bf55c172826d9 |
| 4 | Restaurant Costes Saint-Honoré | French Restaurant | Hôtel Costes | FR | Paris | France | 239 rue Saint-Honoré | 398 | [Hôtel Costes (239 rue Saint-Honoré), Paris, F... | [{'lat': 48.866697894503744, 'label': 'display... | 48.866698 | 2.327894 | | | Île-de-France | 52cf091111d2d9bb920d6122 |
| 5 | Restaurant Mon Paris ! | French Restaurant | 6 rue Édouard VII | FR | Paris | France | | 555 | [6 rue Édouard VII, 75009 Paris, France] | [{'lat': 48.870959, 'label': 'display', 'lng':... | 48.870959 | 2.329453 | | 75009.0 | Île-de-France | 5640db04cd10809c53939c69 |
| 6 | Restaurant Coréen | Korean Restaurant | Rue Sainte-Anne | FR | Paris | France | | 112 | [Rue Sainte-Anne, Paris, France] | [{'lat': 48.867528, 'label': 'display', 'lng':... | 48.867528 | 2.332523 | | | Île-de-France | 4bb7a0dd3db7b7133f49209a |
| 7 | Restaurant Le Mona Lisa | French Restaurant | 47 rue Berger | FR | Paris | France | | 838 | [47 rue Berger, 75001 Paris, France] | [{'lat': 48.862198091597314, 'label': 'display... | 48.862198 | 2.342542 | | 75001.0 | Île-de-France | 59ab059c86bc49021b313ca9 |
| 8 | Restaurant Sichuan | Szechuan Restaurant | 17 rue Le Peletier | FR | Paris | France | | 775 | [17 rue Le Peletier, 75009 Paris, France] | [{'lat': 48.872789, 'label': 'display', 'lng':... | 48.872789 | 2.338398 | | 75009.0 | Île-de-France | 5a22aef7c8b2fb5c6278c5c2 |
| 9 | Le Grand Amour Restaurant | Restaurant | 18 rue de la Fidélité | FR | Paris | France | | 1899 | [18 rue de la Fidélité, 75010 Paris, France] | [{'lat': 48.87456598277865, 'label': 'display'... | 48.874566 | 2.356321 | | 75010.0 | Île-de-France | 56507fbd498eec7fabc67121 |
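`get_category_type` takes the first entry of `categories`. The raw response shown earlier also carries a `primary` flag on each category, so a slightly more defensive variant could prefer the category marked primary and return `None` when the value is missing (after `json_normalize`, a missing list may appear as a float `NaN`). This is a hypothetical alternative, not what the notebook uses; it could be applied column-wise, e.g. `dataframe['categories'].apply(primary_category)`.

```python
def primary_category(categories):
    """Return the name of the category flagged primary, else the first one, else None."""
    if not isinstance(categories, list) or not categories:
        return None  # missing value (e.g. NaN) or empty list
    for category in categories:
        if category.get("primary"):
            return category["name"]
    return categories[0]["name"]  # no primary flag: fall back to the first category


print(primary_category([{"name": "French Restaurant", "shortName": "French", "primary": True}]))
print(primary_category([{"name": "Cafeteria"}]))
print(primary_category(float("nan")))
```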


Let's visualize the restaurants that are nearby.

In [None]:
```python
dataframe_filtered.name
```
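The cell above only lists the venue names. A minimal way to actually plot them is one circle marker per restaurant on a map centred on the search point, using `folium.Map` and `folium.CircleMarker`. To keep the sketch self-contained, a few rows are copied from the filtered table instead of reading `dataframe_filtered`, the helper names are hypothetical, and the map is only built if `folium` can be imported; styling arguments may vary slightly between folium versions.

```python
try:
    import folium
except ImportError:  # the sketch still runs without folium; only the map is skipped
    folium = None

CENTER = (48.866667, 2.333333)  # search center used in the notebook

# A few rows copied from dataframe_filtered: (name, lat, lng)
restaurants = [
    ("Restaurant 58 Tour Eiffel", 48.858366, 2.294249),
    ("Restaurant Le Meurice Alain Ducasse", 48.86516, 2.32807),
    ("Restaurant Coréen", 48.867528, 2.332523),
    ("Restaurant Le Mona Lisa", 48.862198, 2.342542),
]


def marker_specs(rows):
    """Turn (name, lat, lng) rows into keyword arguments for circle markers."""
    return [{"location": [lat, lng], "popup": name} for name, lat, lng in rows]


def build_map(rows, center=CENTER, zoom_start=13):
    """Return a folium map with one circle marker per restaurant, or None without folium."""
    if folium is None:
        return None
    venues_map = folium.Map(location=list(center), zoom_start=zoom_start)
    for spec in marker_specs(rows):
        folium.CircleMarker(
            location=spec["location"],
            radius=5,
            popup=spec["popup"],
            color="blue",
            fill=True,
            fill_opacity=0.6,
        ).add_to(venues_map)
    return venues_map


venues_map = build_map(restaurants)
venues_map  # in a notebook, the last expression renders the map
```

With the real data, the rows can be produced directly from the DataFrame, e.g. `build_map(zip(dataframe_filtered['name'], dataframe_filtered['lat'], dataframe_filtered['lng']))`.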