# Capstone Project

# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project, we will find an optimal location for a specific type of commercial establishment. Let's say we were hired to find one or more potential locations to open a **Hotel** in **Paris**.

We will search for areas with **as few existing Hotels as possible** that are **as close to the city center as possible**.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing Hotels in the area
* minimum distance to Hotels in the area, if any
* distance of area from city center

We decided to use a regularly spaced grid of locations, centered on the city center, to define our areas.

The following data sources will be needed to extract/generate the required information:
* to get the coordinates of the center of Paris we will use the **geopy** package; from those coordinates we will generate the coordinates of all the other areas, spread evenly around Paris
* the number of Hotels in every area will be obtained using the **Foursquare API**

Before anything else, we import most of the dependencies used throughout this project.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Notre-Dame sits right at the center of Paris, so we will use its coordinates as the reference point for creating all the areas.

In [2]:
address = '6 Parvis Notre-Dame - Pl. Jean-Paul II, 75004 Paris, France'

geolocator = Nominatim(user_agent="project_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris are 48.85293705, 2.3500501225000026.


We create the following functions to explore Paris as a hexagonal grid of cells.

In [3]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

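def lonlat_to_xy(lon, lat):
    # project WGS84 longitude/latitude to Cartesian UTM coordinates, in meters
    # (zone 33 is the value used for all outputs below; Paris itself lies in UTM zone 31)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # inverse of lonlat_to_xy: Cartesian UTM coordinates back to longitude/latitude
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]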

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
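
The projection functions hard-code `zone=33`. As a side check, the standard UTM zone for a given longitude follows from the 6-degree zone width. The sketch below is only an illustration: `utm_zone_for_longitude` is a hypothetical helper that is not part of the notebook, and it ignores the irregular zones around Norway and Svalbard.

```python
import math

def utm_zone_for_longitude(lon_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees.

    Hypothetical helper for illustration; it ignores the irregular zones
    around Norway and Svalbard.
    """
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude 180 would otherwise map to zone 61

# Longitude of the city center as printed by the geocoding cell
paris_zone = utm_zone_for_longitude(2.3500501225000026)
print(paris_zone)
```

For the longitude used in this project the helper returns zone 31, while the notebook keeps zone 33. UTM distortion grows with distance from a zone's central meridian, so the X/Y values and distances shown in the outputs should be read with that in mind.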


Now let's create a grid of candidate areas, equally spaced, centered on the city center and within ~8 km of Notre-Dame. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

In [4]:
paris_center = [latitude, longitude]
paris_center

[48.85293705, 2.3500501225000026]

The code below was reused and modified to fit our purposes.

In [46]:
paris_center_x, paris_center_y = lonlat_to_xy(paris_center[1], paris_center[0]) # City center in Cartesian coordinates

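circle_diameter = 600 # distance between neighboring area centers, in meters
hexagon_with = 8000 # distance from the city center to the edge of the grid, in meters
n_circles = int(((hexagon_with*2)/circle_diameter)) + 1 # number of circles per row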
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = paris_center_x - hexagon_with
x_step = circle_diameter
y_min = paris_center_y - hexagon_with*k
y_step = circle_diameter * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []


In [47]:
for i in range(0, int(n_circles/k)):
    y = y_min + i * y_step
    x_offset = (circle_diameter/2) if i%2==0 else 0
    for j in range(0, n_circles):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(paris_center_x, paris_center_y, x, y)
        if (distance_from_center <= (hexagon_with+1)):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
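
The loop relies on a property of hexagonal packing: rows are `circle_diameter * sqrt(3)/2` apart and every other row is shifted by half a diameter, so each center ends up exactly one diameter away from its nearest neighbors. A minimal, self-contained sketch of that idea follows. It uses a local coordinate system centered at (0, 0) and a symmetric row range, both of which are assumptions that differ from the cell above, and `hex_grid` is a hypothetical name.

```python
import math

def hex_grid(center_x, center_y, spacing, max_radius):
    """Return (x, y) points of a hexagonal grid within max_radius of the center."""
    k = math.sqrt(3) / 2                      # row height as a fraction of the spacing
    n_rows = int(2 * max_radius / (spacing * k)) + 1
    n_cols = int(2 * max_radius / spacing) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * spacing * k
        x_offset = spacing / 2 if i % 2 else 0.0   # shift every other row
        for j in range(-n_cols, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= max_radius:
                points.append((x, y))
    return points

points = hex_grid(0.0, 0.0, spacing=600, max_radius=2000)

# Distance from the central point to its closest neighbor
nearest = min(math.hypot(x, y) for x, y in points if (x, y) != (0.0, 0.0))
print(len(points), round(nearest, 6))
```

The nearest-neighbor distance equals the chosen spacing, which is why circles of radius 300 m centered on these points touch without leaving large gaps.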



In [48]:
print(len(latitudes), 'candidate neighborhood centers generated.')

630 candidate neighborhood centers generated.


In [49]:
map_paris = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(location=[latitude, longitude], popup='Notre-Dame').add_to(map_paris)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_paris)
map_paris

Now, let's find an address for each of the areas we generated and put all this information in a DataFrame.

In [51]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    input_coords = f'{lat}, {lon}'
    location = geolocator.reverse(input_coords)
    addresses.append(location.address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . [progress dots continue, one per location; output truncated]
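
Reverse geocoding issues one request per grid point, and the public Nominatim service asks clients to stay at or below one request per second. A hedged sketch of a simple throttle follows. `throttled_map` and `fake_reverse` are hypothetical names, and the stand-in function replaces the real `geolocator.reverse` call so the example runs offline.

```python
import time

def throttled_map(func, items, min_interval=1.0):
    """Call func on each item, keeping calls at least min_interval seconds apart."""
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.append(func(item))
    return results

def fake_reverse(coords):
    # Stand-in for geolocator.reverse(coords).address; returns a dummy string.
    return 'address for {}'.format(coords)

start = time.monotonic()
demo_addresses = throttled_map(
    fake_reverse,
    ['48.85, 2.35', '48.86, 2.36', '48.87, 2.37'],
    min_interval=0.05,  # short interval for the demo; use 1.0 against the real service
)
elapsed = time.monotonic() - start
print(demo_addresses)
```

With 630 grid points and a one-second interval, the full reverse-geocoding pass would take a little over ten minutes, which is one reason to persist the result to disk afterwards.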

In [53]:
addresses[350:380]

["Le Bistrot de Paris, 33, Rue de Lille, Quartier Saint-Thomas-d'Aquin, Paris 7e Arrondissement, Paris, Île-de-France, France métropolitaine, 75007, France",
 "Jardin de l'Infante, Ascenseur C, Quartier Saint-Germain-l'Auxerrois, Paris 1er Arrondissement, Paris, Île-de-France, France métropolitaine, 75001, France",
 'Restaurant Saudade, Rue des Bourdonnais, Quartier des Halles, Quartier Les Halles, Paris 1er Arrondissement, Paris, Île-de-France, France métropolitaine, 75001, France',
 "Rue Geoffroy l'Angevin (Fondation Galeries Lafayette), Rue Beaubourg, Beaubourg, Quartier Saint-Merri, Paris 4e Arrondissement, Paris, Île-de-France, France métropolitaine, France",
 '21, Rue Charlot, Quartier des Enfants-Rouges, Paris 3e Arrondissement, Paris, Île-de-France, France métropolitaine, 75003, France',
 '7, Passage Saint-Pierre Amelot, Quartier Saint-Ambroise, Paris 11e Arrondissement, Paris, Île-de-France, France métropolitaine, 75011, France',
 '12, Passage Saint-Ambroise, Quartier Saint-Am

In [55]:
len(addresses)

630

In [56]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348
1,"61, Avenue de la Division Leclerc, Cité-Jardin...",48.787825,2.327076,-429809.748445,5481822.0,7510.659092
2,"36, Allée Eugène Belgrand, Cachan, Arrondissem...",48.788716,2.335046,-429209.748445,5481822.0,7300.0
3,"Rue de la Concorde, Cachan, Arrondissement de ...",48.789606,2.343016,-428609.748445,5481822.0,7133.722731
4,"7, Allée Sonia Delaunay, Villejuif, Arrondisse...",48.790495,2.350986,-428009.748445,5481822.0,7014.983963
5,"Centre Hospitalier Paul Guiraud, Avenue de la ...",48.791384,2.358957,-427409.748445,5481822.0,6946.221995
6,"Temps des Délices, 85, Rue Jean Jaurès, Villej...",48.792273,2.366928,-426809.748445,5481822.0,6928.924881
7,"95, Rue du Génie, Coteau - Malassis, Vitry-sur...",48.79316,2.374899,-426209.748445,5481822.0,6963.476143
8,"Collège Lakanal, 11, Rue Lakanal, Coteau - Mal...",48.794048,2.382871,-425609.748445,5481822.0,7049.113419
9,"55, Rue Charles Infroit, Le Fort, Vitry-sur-Se...",48.794935,2.390843,-425009.748445,5481822.0,7184.010022


Let's now persist this data to a local file.

In [57]:
df_locations.to_pickle('./locations.pkl')   

In [60]:
df_locations = pd.read_pickle('./locations.pkl')

In [61]:
df_locations.head(2)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348
1,"61, Avenue de la Division Leclerc, Cité-Jardin...",48.787825,2.327076,-429809.748445,5481822.0,7510.659092


## Foursquare

Once our dataset is built, we're ready to explore each area using Foursquare. We will pass our credentials, query each area by its coordinates, and retrieve all the venues located in each area, from which we will then extract the hotels.

In [63]:
CLIENT_ID = 'GKSVXMYQIKGQPZITFL52DOFLR0IYIB0IYVETAHC0F1HFZNVV' # your Foursquare ID
CLIENT_SECRET = 'A2ZBBMH5MAC1HIBTKX2FO1QZSTCILJ3HC0XP5QFALQ0GLS5A' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GKSVXMYQIKGQPZITFL52DOFLR0IYIB0IYVETAHC0F1HFZNVV
CLIENT_SECRET:A2ZBBMH5MAC1HIBTKX2FO1QZSTCILJ3HC0XP5QFALQ0GLS5A
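
Hard-coding and printing the client secret makes it easy to leak when the notebook is shared. One possible alternative is sketched below, under the assumption that the credentials are exported as environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables (the variable names are assumptions)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demo with a fake environment so the sketch runs without real credentials.
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'EXAMPLE_ID_123456',
    'FOURSQUARE_CLIENT_SECRET': 'EXAMPLE_SECRET_123456',
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```

In the notebook itself, calling `load_foursquare_credentials()` without arguments would read from the real environment.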


Before we retrieve the venues for all of our areas, let's take a look at the API response for a single area: we will take the coordinates of the city center and call the API.

In [65]:
# build the request URL for the city center
LIMIT = 100
radius = circle_diameter/2
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=GKSVXMYQIKGQPZITFL52DOFLR0IYIB0IYVETAHC0F1HFZNVV&client_secret=A2ZBBMH5MAC1HIBTKX2FO1QZSTCILJ3HC0XP5QFALQ0GLS5A&ll=48.85293705,2.3500501225000026&v=20180605&radius=300.0&limit=100'
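
As an alternative to assembling the query string with `str.format`, the same URL can be built from a dictionary of parameters, which keeps each value next to its name and handles escaping. This is a sketch only: `build_explore_url` is a hypothetical helper, and the credentials passed in are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Build the venues/explore URL used above from a dict of query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value readable instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

demo_url = build_explore_url('DEMO_ID', 'DEMO_SECRET',
                             48.85293705, 2.3500501225000026,
                             '20180605', 300.0, 100)
print(demo_url)

# Parse the query string back to confirm the parameters survived the round trip
demo_query = parse_qs(urlsplit(demo_url).query)
print(demo_query['ll'])
```

The `requests` library used in this notebook also accepts such a dictionary through the `params` argument of `requests.get`, which avoids building the string by hand.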

In [66]:
results = requests.get(url).json()
#results

In [67]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows coming from pd.json_normalize use the flattened 'venue.categories' key
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
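
To make the behavior of `get_category_type` concrete, here is a small usage sketch. The function is reproduced so the block is self-contained, plain dictionaries stand in for DataFrame rows, and the sample rows are made up.

```python
def get_category_type(row):
    # Same logic as the notebook function, reproduced so this sketch runs on its own
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Made-up rows for illustration; a plain dict stands in for a DataFrame row.
row_flat = {'venue.name': 'Example venue',
            'venue.categories': [{'name': 'Hotel'}, {'name': 'Bar'}]}
row_plain = {'categories': [{'name': 'Plaza'}]}
row_empty = {'venue.categories': []}

print(get_category_type(row_flat))   # first category of a flattened row
print(get_category_type(row_plain))  # row that uses the plain 'categories' key
print(get_category_type(row_empty))  # no category available
```

Only the first category of each venue is kept, so a venue tagged with several categories is reduced to its primary one.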

In [68]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Parvis Notre-Dame — Place Jean-Paul II,Plaza,48.853407,2.348456
1,Tours de la Cathédrale Notre-Dame de Paris,Scenic Lookout,48.85323,2.349207
2,Shakespeare & Company,Bookstore,48.852568,2.347096
3,Au Vieux Paris d'Arcole,French Restaurant,48.854196,2.350312
4,Sola,Japanese Restaurant,48.851569,2.348391
5,Comme chai Toi,French Restaurant,48.851749,2.349319
6,Square Jean XXIII,Park,48.852499,2.351375
7,A. Lacroix Pâtissier & Glacier,Pastry Shop,48.851714,2.349406
8,Le Petit Châtelet,French Restaurant,48.852637,2.346919
9,Sourire Tapas Françaises,Tapas Restaurant,48.851167,2.347728


We can see that we got 15 venues, which is not even close to the maximum of 100 that a single call can return. Since this is the city center, we can assume that no other area will come much closer to that limit, so there is little risk of missing any Hotel when calling the Foursquare API.

In [80]:
def getNearbyVenues(names, latitudes, longitudes, X, Y, dist_center, radius=circle_diameter/2):
    
    venues_list=[]
    index = 0
    for name, lat, lng, x, y, dist in zip(names, latitudes, longitudes, X, Y, dist_center):
        print('{}){}'.format(index, name) )
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            x, 
            y, 
            dist,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
        index = index + 1

    # all areas have been processed
    print("DONE")

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'X',
                  'Y',
                  'Distance from center',           
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    
    return(nearby_venues)
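
One fragile spot in `getNearbyVenues` is `v['venue']['categories'][0]['name']`, which raises an `IndexError` for a venue that has no category. A defensive variant of the extraction step is sketched below. `extract_venue_rows` is a hypothetical helper, and the sample items are hand-written to mimic the shape of `results['response']['groups'][0]['items']`.

```python
def extract_venue_rows(items):
    """Turn raw 'items' entries into (name, lat, lng, category) tuples.

    Venues without any category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample shaped like the API items (all values are made up).
sample_items = [
    {'venue': {'name': 'Example Hotel',
               'location': {'lat': 48.85, 'lng': 2.35},
               'categories': [{'name': 'Hotel'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 48.86, 'lng': 2.36},
               'categories': []}},
]
venue_rows = extract_venue_rows(sample_items)
for row in venue_rows:
    print(row)
```

Rows with a `None` category can then be dropped or kept explicitly, rather than aborting a long run of API calls partway through.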

In [81]:
paris_locs_areas = getNearbyVenues(names=df_locations['Address'], 
                                latitudes=df_locations['Latitude'], 
                                longitudes=df_locations['Longitude'],
                                X=df_locations['X'],
                                Y=df_locations['Y'],
                                dist_center=df_locations['Distance from center']
                                  )

0)3, Rue du Président Roosevelt, Bourg-la-Reine, Antony, Hauts-de-Seine, Île-de-France, France métropolitaine, 92340, France
1)61, Avenue de la Division Leclerc, Cité-Jardins, Cachan, Arrondissement de L'Haÿ-les-Roses, Val-de-Marne, Île-de-France, France métropolitaine, 94230, France
2)36, Allée Eugène Belgrand, Cachan, Arrondissement de L'Haÿ-les-Roses, Val-de-Marne, Île-de-France, France métropolitaine, 94230, France
3)Rue de la Concorde, Cachan, Arrondissement de L'Haÿ-les-Roses, Val-de-Marne, Île-de-France, France métropolitaine, 94230, France
4)7, Allée Sonia Delaunay, Villejuif, Arrondissement de L'Haÿ-les-Roses, Val-de-Marne, Île-de-France, France métropolitaine, 94800, France
5)Centre Hospitalier Paul Guiraud, Avenue de la République, Villejuif, Arrondissement de L'Haÿ-les-Roses, Val-de-Marne, Île-de-France, France métropolitaine, 94800, France
6)Temps des Délices, 85, Rue Jean Jaurès, Villejuif, Arrondissement de L'Haÿ-les-Roses, Val-de-Marne, Île-de-France, France métropolita

In [84]:
paris_locs_areas.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Fou Lo,48.785442,2.317927,Asian Restaurant
1,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Domino's Pizza,48.787782,2.318917,Pizza Place
2,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Sanitaire Installation Moderne,48.786469,2.318365,Other Repair Shop
3,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Arrêt Place de la Résistance - Charles de Gaul...,48.786512,2.317919,Bus Stop
4,"61, Avenue de la Division Leclerc, Cité-Jardin...",48.787825,2.327076,-429809.748445,5481822.0,7510.659092,Piscine Intercommunale de Cachan,48.786616,2.327954,Pool
5,"61, Avenue de la Division Leclerc, Cité-Jardin...",48.787825,2.327076,-429809.748445,5481822.0,7510.659092,Stade Léo Lagrange,48.785574,2.326525,Athletics & Sports
6,"61, Avenue de la Division Leclerc, Cité-Jardin...",48.787825,2.327076,-429809.748445,5481822.0,7510.659092,Autolib' Station,48.785661,2.32878,Rental Car Location
7,"7, Allée Sonia Delaunay, Villejuif, Arrondisse...",48.790495,2.350986,-428009.748445,5481822.0,7014.983963,La Fabrik,48.791494,2.353263,Non-Profit
8,"7, Allée Sonia Delaunay, Villejuif, Arrondisse...",48.790495,2.350986,-428009.748445,5481822.0,7014.983963,Place Pablo Picasso,48.79013,2.350865,Plaza
9,"7, Allée Sonia Delaunay, Villejuif, Arrondisse...",48.790495,2.350986,-428009.748445,5481822.0,7014.983963,Aire de jeux des hautes bruyeres,48.792107,2.352126,Playground


In [87]:
paris_locs_areas.to_pickle('./paris_venues.pkl')   

In [111]:
paris_locs_areas = pd.read_pickle('./paris_venues.pkl')

In [112]:
paris_locs_areas.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Fou Lo,48.785442,2.317927,Asian Restaurant
1,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Domino's Pizza,48.787782,2.318917,Pizza Place
2,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Sanitaire Installation Moderne,48.786469,2.318365,Other Repair Shop
3,"3, Rue du Président Roosevelt, Bourg-la-Reine,...",48.786934,2.319107,-430409.748445,5481822.0,7762.087348,Arrêt Place de la Résistance - Charles de Gaul...,48.786512,2.317919,Bus Stop
4,"61, Avenue de la Division Leclerc, Cité-Jardin...",48.787825,2.327076,-429809.748445,5481822.0,7510.659092,Piscine Intercommunale de Cachan,48.786616,2.327954,Pool


In [113]:
paris_locs_areas.shape

(8823, 10)

Before we explore the venue categories, we will remove all venues that are restaurants, stores, or shops. After that, we will look for venues that are similar to a Hotel but are not categorized as one.

In [114]:
paris_locs_areas = paris_locs_areas.drop( paris_locs_areas[paris_locs_areas['Venue Category'].str.contains('restaurant', case=False)].index    )

In [115]:
paris_locs_areas.shape

(5821, 10)

In [116]:
paris_locs_areas = paris_locs_areas.drop( paris_locs_areas[paris_locs_areas['Venue Category'].str.contains('store', case=False)].index    )

In [117]:
paris_locs_areas.shape

(5363, 10)

In [119]:
paris_locs_areas = paris_locs_areas.drop( paris_locs_areas[paris_locs_areas['Venue Category'].str.contains('shop', case=False)].index    )

In [120]:
paris_locs_areas.shape

(4748, 10)

We just got rid of a little over 4K rows (from 8,823 down to 4,748).
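
The three successive drops share one pattern: a case-insensitive substring match on the category name. A minimal sketch of doing the same with a single regular expression follows, using a plain list of category names copied from the outputs in this notebook; `keep_category` is a hypothetical helper.

```python
import re

# One pattern covering the three substrings removed above
EXCLUDED = re.compile(r'restaurant|store|shop', flags=re.IGNORECASE)

def keep_category(category):
    """Return True when the category is not a restaurant, store, or shop."""
    return EXCLUDED.search(category) is None

# Sample categories taken from the tables and category list in this notebook.
sample_categories = ['Asian Restaurant', 'Pizza Place', 'Other Repair Shop', 'Bus Stop',
                     'Bookstore', 'Pastry Shop', 'Hotel', 'Plaza']
kept = [c for c in sample_categories if keep_category(c)]
print(kept)
```

Note that the substring `store` also removes `Bookstore`. In pandas, the same alternation pattern can be passed to `Series.str.contains` with `case=False`, since that method treats its argument as a regular expression by default.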

In [123]:
pd.unique(paris_locs_areas['Venue Category'])

array(['Pizza Place', 'Bus Stop', 'Pool', 'Athletics & Sports',
       'Rental Car Location', 'Non-Profit', 'Plaza', 'Playground',
       'Funeral Home', 'Concert Hall', 'Bakery', 'Supermarket',
       'Speakeasy', 'Home Service', 'Bus Station', 'Gastropub',
       'Science Museum', 'Theater', 'Art Gallery', 'Gym', 'Pharmacy',
       'Boarding House', 'Trail', 'Bank', 'Hotel', 'Gym / Fitness Center',
       'Tennis Court', 'Construction & Landscaping', 'Park',
       'Metro Station', 'Performing Arts Venue', 'Bed & Breakfast',
       'Harbor / Marina', 'Train Station', 'Café', 'Movie Theater',
       'Stadium', 'Farmers Market', 'Motel', 'Bistro', 'Climbing Gym',
       'Event Space', 'Arcade', 'Boat or Ferry', 'Flea Market',
       'Gas Station', 'Brasserie', 'Comedy Club', 'Creperie',
       'Sports Club', 'Skate Park', 'Track Stadium', 'Basketball Stadium',
       'Organic Grocery', 'Bar', 'Dive Bar', 'General Entertainment',
       'Tram Station', 'Food Court', 'Bike Rental / Bike 

We will keep all rows that fall into one of the following categories from the list above:
Boarding House, Hotel, Bed & Breakfast, Hostel, Hotel Bar

In [137]:
paris_hotels = paris_locs_areas[ (paris_locs_areas['Venue Category'].str.contains('hotel', case=False)) |
                      (paris_locs_areas['Venue Category'].str.contains('boarding house', case=False)) |
                     (paris_locs_areas['Venue Category'].str.contains('bed & breakfast', case=False)) |
                     (paris_locs_areas['Venue Category'].str.contains('hostel', case=False)) |
                     (paris_locs_areas['Venue Category'].str.contains('hotel bar', case=False)) 
                                                                                                        ]
paris_hotels.reset_index(inplace=True)
paris_hotels.head()

Unnamed: 0,index,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Venue,Venue Latitude,Venue Longitude,Venue Category
0,32,"2, Rue d'Estienne d'Orves, Cité-Jardins, Cacha...",48.79194,2.321924,-430109.748445,5482342.0,7163.099888,Séjours & Affaires,48.793935,2.32045,Boarding House
1,42,"Rue François Delage, Cachan, Arrondissement de...",48.793721,2.337864,-428909.748445,5482342.0,6713.419397,Comfort Hotel,48.795742,2.335694,Hotel
2,60,"3, Rue Jules Ferry, Gare - Jean Jaurès, Vitry-...",48.800825,2.401641,-424109.748445,5482342.0,6993.568474,"La Maison Bacana, Paris",48.800733,2.400948,Bed & Breakfast
3,74,"Promenade des Vallons de la Bièvre, La Coulée ...",48.793377,2.292861,-432209.748445,5482861.0,7922.752047,La Roseraie - Hôtel Restaurant,48.795232,2.291929,Hotel
4,95,"6 bis, Impasse Guyton de Morveau, Arcueil, Arr...",48.798727,2.340683,-428609.748445,5482861.0,6129.437168,Hotel Stars Arcueil,48.797979,2.34397,Hotel
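
The filter keeps any category whose name contains one of a few lodging keywords. The sketch below expresses that rule as a small predicate over plain strings; `LODGING_KEYWORDS` and `is_lodging` are hypothetical names, and the sample list mixes categories that appear in this notebook.

```python
# 'hotel bar' is not listed separately because it already contains 'hotel'
LODGING_KEYWORDS = ('hotel', 'boarding house', 'bed & breakfast', 'hostel')

def is_lodging(category):
    """Return True when the category name contains one of the lodging keywords."""
    lowered = category.lower()
    return any(keyword in lowered for keyword in LODGING_KEYWORDS)

sample = ['Hotel', 'Hotel Bar', 'Boarding House', 'Bed & Breakfast',
          'Hostel', 'Motel', 'Plaza']
lodging = [c for c in sample if is_lodging(c)]
print(lodging)
```

With these keywords `Motel` is not matched, which mirrors the filter used in the notebook. Whether motels should count as competing accommodation is a modeling choice left open here.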


In [138]:
paris_hotels = paris_hotels.drop('index', axis=1)
paris_hotels.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"2, Rue d'Estienne d'Orves, Cité-Jardins, Cacha...",48.79194,2.321924,-430109.748445,5482342.0,7163.099888,Séjours & Affaires,48.793935,2.32045,Boarding House
1,"Rue François Delage, Cachan, Arrondissement de...",48.793721,2.337864,-428909.748445,5482342.0,6713.419397,Comfort Hotel,48.795742,2.335694,Hotel
2,"3, Rue Jules Ferry, Gare - Jean Jaurès, Vitry-...",48.800825,2.401641,-424109.748445,5482342.0,6993.568474,"La Maison Bacana, Paris",48.800733,2.400948,Bed & Breakfast
3,"Promenade des Vallons de la Bièvre, La Coulée ...",48.793377,2.292861,-432209.748445,5482861.0,7922.752047,La Roseraie - Hôtel Restaurant,48.795232,2.291929,Hotel
4,"6 bis, Impasse Guyton de Morveau, Arcueil, Arr...",48.798727,2.340683,-428609.748445,5482861.0,6129.437168,Hotel Stars Arcueil,48.797979,2.34397,Hotel


In [140]:
paris_hotels.shape

(743, 10)

In [141]:
df_hotels_only = paris_hotels[['Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']]

In [142]:
df_hotels_only.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Séjours & Affaires,48.793935,2.32045,Boarding House
1,Comfort Hotel,48.795742,2.335694,Hotel
2,"La Maison Bacana, Paris",48.800733,2.400948,Bed & Breakfast
3,La Roseraie - Hôtel Restaurant,48.795232,2.291929,Hotel
4,Hotel Stars Arcueil,48.797979,2.34397,Hotel


In [143]:
def get_X_Y_values(Latitudes, Longitudes):
    Xs = []
    Ys = []
    
    for lat, lon in zip(Latitudes, Longitudes):
        xx, yy = lonlat_to_xy(lon=lon, lat=lat)
        Xs.append(xx)
        Ys.append(yy)
        
    return Xs, Ys

In [144]:
Xs_venues, Ys_venues = get_X_Y_values(df_hotels_only['Venue Latitude'], df_hotels_only['Venue Longitude'])

In [145]:
len(Xs_venues)

743

In [146]:
df_hotels_only = df_hotels_only.assign(X=Xs_venues, Y=Ys_venues)

In [148]:
df_hotels_only.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,X,Y
0,Séjours & Affaires,48.793935,2.32045,Boarding House,-430180.205119,5482581.0
1,Comfort Hotel,48.795742,2.335694,Hotel,-429030.705448,5482592.0
2,"La Maison Bacana, Paris",48.800733,2.400948,Bed & Breakfast,-424162.17948,5482340.0
3,La Roseraie - Hôtel Restaurant,48.795232,2.291929,Hotel,-432243.148458,5483078.0
4,Hotel Stars Arcueil,48.797979,2.34397,Hotel,-428383.190997,5482738.0


In [150]:
paris_hotel_group = paris_hotels[['Neighborhood', 'Neighborhood Latitude', 
    'Neighborhood Longitude', 'X', 'Y', 'Distance from center', 'Venue']].groupby(['Neighborhood', 'Neighborhood Latitude',
                                                                          'Neighborhood Longitude', 'X', 'Y', 'Distance from center']).agg(['count'])

In [153]:
paris_hotel_group.reset_index(inplace=True)
paris_hotel_group.head(10)

Unnamed: 0_level_0,index,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Venue
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,count
0,0,"1, Rue Le Brun, Quartier de la Salpêtrière, Pa...",48.837883,2.355272,-426809.748445,5487018.0,1734.935157,3
1,1,"1, Square Paul Blanchet, Quartier du Bel-Air, ...",48.834087,2.405473,-423209.748445,5485979.0,4622.769733,3
2,2,"1, Villa Edgar Quinet, Garibaldi, Saint-Ouen-s...",48.905847,2.333739,-427109.748445,5494813.0,6065.476074,1
3,3,"1, Villa Marie Vassilieff, Quartier Necker, Pa...",48.843441,2.321023,-429209.748445,5488058.0,2402.08243,2
4,4,"10, Passage Turquetil, Quartier Sainte-Marguer...",48.851447,2.39284,-423809.748445,5488058.0,3176.476035,2
5,5,"10, Rue Gaston Paymal, Clichy, Arrondissement ...",48.903169,2.309776,-428909.748445,5494813.0,6383.572667,2
6,6,"10, Rue Jean Giraudoux, Quartier de Chaillot, ...",48.869015,2.298032,-430409.748445,5491175.0,4257.933771,17
7,7,"10, Rue Lecuirot, Plaisance, Quartier de Plais...",48.829314,2.320544,-429509.748445,5486499.0,3439.476704,5
8,8,"10, Rue d'Arcueil, Quartier Ferry-Buffalo, Mon...",48.811073,2.325221,-429509.748445,5484420.0,5050.74252,1
9,9,"10, Rue de l'Abbaye, Quartier de Saint-Germain...",48.854345,2.334641,-428009.748445,5489097.0,1153.256259,4


In [154]:
paris_hotel_group.shape

(253, 8)
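
Because the grouping starts from `paris_hotels`, the result only contains areas with at least one hotel: 253 of the 630 candidate areas. Since the goal is to find areas with as few hotels as possible, the areas with zero hotels matter too. A minimal sketch, with made-up area labels and a hypothetical helper `hotels_per_area`, of how the per-area count could keep them:

```python
from collections import Counter

def hotels_per_area(all_areas, hotel_areas):
    """Count hotels per area, including areas with no hotel at all."""
    counts = Counter(hotel_areas)
    return {area: counts.get(area, 0) for area in all_areas}

# Made-up labels: every candidate area, and the area of each hotel found.
all_areas = ['area A', 'area B', 'area C', 'area D']
hotel_areas = ['area A', 'area A', 'area C']

area_counts = hotels_per_area(all_areas, hotel_areas)
print(area_counts)
```

With DataFrames, a comparable result could be obtained by left-joining the grouped counts onto `df_locations` and filling the missing counts with 0.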

In [155]:
def get_minimum_distance_to_hotel(X1, Y1, X2, Y2):
   
    minimum_distances = []
    
    for x1, y1  in zip(X1, Y1):
        min_dist = float('inf') # larger than any real distance
        for x2, y2 in zip(X2, Y2):
            dist = calc_xy_distance(x1, y1, x2, y2)
            if dist < min_dist:
                min_dist = dist
        minimum_distances.append(min_dist)
        
    return minimum_distances
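
The nested loop computes, for each area center, the smallest Euclidean distance to any hotel. The same computation can be written more compactly with `min` and `math.hypot`, as in the sketch below. `min_distance_to_points` is a hypothetical helper, and the coordinates are made-up Cartesian values in meters.

```python
import math

def min_distance_to_points(x, y, points):
    """Return the smallest Euclidean distance from (x, y) to any point, or inf if there are none."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

# Made-up Cartesian coordinates in meters.
hotels_xy = [(0.0, 0.0), (300.0, 400.0), (1000.0, 0.0)]
areas_xy = [(0.0, 0.0), (600.0, 800.0), (1000.0, 100.0)]

closest = [min_distance_to_points(x, y, hotels_xy) for x, y in areas_xy]
print(closest)
```

Both versions compare every area against every hotel. At this scale (a few hundred areas against 743 hotels) that is cheap, so the simpler formulation is mostly a readability choice.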

In [156]:
distances_to_hotels = get_minimum_distance_to_hotel(paris_hotel_group['X'], paris_hotel_group['Y'], Xs_venues, Ys_venues)

In [158]:
distances_to_hotels[0:11]

[24.147417355783208,
 111.54049212719345,
 147.26284335541084,
 171.04647130099636,
 98.23582500252003,
 225.1908588101217,
 28.518158246452,
 84.54645169371065,
 238.43632316600238,
 96.15518330974203,
 247.9733069649845]

In [159]:
paris_hotel_group['Closest Hotel'] = distances_to_hotels

In [161]:
paris_hotel_group.head()

Unnamed: 0_level_0,index,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Venue,Closest Hotel
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,count,Unnamed: 9_level_1
0,0,"1, Rue Le Brun, Quartier de la Salpêtrière, Pa...",48.837883,2.355272,-426809.748445,5487018.0,1734.935157,3,24.147417
1,1,"1, Square Paul Blanchet, Quartier du Bel-Air, ...",48.834087,2.405473,-423209.748445,5485979.0,4622.769733,3,111.540492
2,2,"1, Villa Edgar Quinet, Garibaldi, Saint-Ouen-s...",48.905847,2.333739,-427109.748445,5494813.0,6065.476074,1,147.262843
3,3,"1, Villa Marie Vassilieff, Quartier Necker, Pa...",48.843441,2.321023,-429209.748445,5488058.0,2402.08243,2,171.046471
4,4,"10, Passage Turquetil, Quartier Sainte-Marguer...",48.851447,2.39284,-423809.748445,5488058.0,3176.476035,2,98.235825


In [162]:
paris_hotel_group.to_pickle('./paris_hotels_groups.pkl')  

In [164]:
paris_hotel_group = pd.read_pickle('./paris_hotels_groups.pkl')

In [173]:
paris_hotel_group.head()

Unnamed: 0_level_0,index,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Venue,Closest Hotel
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,count,Unnamed: 9_level_1
0,0,"1, Rue Le Brun, Quartier de la Salpêtrière, Pa...",48.837883,2.355272,-426809.748445,5487018.0,1734.935157,3,24.147417
1,1,"1, Square Paul Blanchet, Quartier du Bel-Air, ...",48.834087,2.405473,-423209.748445,5485979.0,4622.769733,3,111.540492
2,2,"1, Villa Edgar Quinet, Garibaldi, Saint-Ouen-s...",48.905847,2.333739,-427109.748445,5494813.0,6065.476074,1,147.262843
3,3,"1, Villa Marie Vassilieff, Quartier Necker, Pa...",48.843441,2.321023,-429209.748445,5488058.0,2402.08243,2,171.046471
4,4,"10, Passage Turquetil, Quartier Sainte-Marguer...",48.851447,2.39284,-423809.748445,5488058.0,3176.476035,2,98.235825


In [174]:
paris_hotel_group.columns = ['index', 'Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'X', 'Y', 'Distance from center', 'Hotels nearby', 'Closest Hotel' ]
paris_hotel_group.head()

Unnamed: 0,index,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Hotels nearby,Closest Hotel
0,0,"1, Rue Le Brun, Quartier de la Salpêtrière, Pa...",48.837883,2.355272,-426809.748445,5487018.0,1734.935157,3,24.147417
1,1,"1, Square Paul Blanchet, Quartier du Bel-Air, ...",48.834087,2.405473,-423209.748445,5485979.0,4622.769733,3,111.540492
2,2,"1, Villa Edgar Quinet, Garibaldi, Saint-Ouen-s...",48.905847,2.333739,-427109.748445,5494813.0,6065.476074,1,147.262843
3,3,"1, Villa Marie Vassilieff, Quartier Necker, Pa...",48.843441,2.321023,-429209.748445,5488058.0,2402.08243,2,171.046471
4,4,"10, Passage Turquetil, Quartier Sainte-Marguer...",48.851447,2.39284,-423809.748445,5488058.0,3176.476035,2,98.235825


In [175]:
paris_hotel_group.drop('index', axis=1, inplace=True)

In [176]:
pd.unique(paris_hotel_group['Hotels nearby'])

array([ 3,  1,  2, 17,  5,  4,  7,  8, 12, 11,  6, 10,  9, 23, 14, 16],
      dtype=int64)

Our conditions for picking a candidate area are:
* There must be no more than 6 hotels in the area
* It must be at most 5 km from the city center
* The closest hotel must be at most 100 m away (the filter below keeps rows with `Closest Hotel <= 100`)
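
The three conditions translate into a boolean mask on the DataFrame. The sketch below applies them to a small made-up table (the rows and the helper name `filter_candidates` are illustrative, not taken from the Foursquare data), using the same column names and `<=` comparisons as the notebook's filter cell:

```python
import pandas as pd

def filter_candidates(df, max_hotels=6, max_center_dist=5000, max_closest_hotel=100):
    """Keep rows that satisfy all three thresholds (distances in metres)."""
    mask = (
        (df['Hotels nearby'] <= max_hotels)
        & (df['Distance from center'] <= max_center_dist)
        & (df['Closest Hotel'] <= max_closest_hotel)
    )
    # Boolean indexing returns a new DataFrame; renumber its rows from 0
    return df[mask].reset_index(drop=True)

# Hypothetical rows, for illustration only
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D', 'E'],
    'Hotels nearby': [3, 17, 2, 5, 4],
    'Distance from center': [1700.0, 4200.0, 6000.0, 3400.0, 1000.0],
    'Closest Hotel': [24.0, 28.0, 90.0, 150.0, 54.0],
})

candidates = filter_candidates(toy)
print(candidates['Neighborhood'].tolist())
```

Keeping the thresholds as parameters makes it easy to try stricter or looser limits without editing the mask itself.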

In [177]:
paris_hotel_group.shape

(253, 8)

In [182]:
# .copy() gives an independent DataFrame, so later in-place edits do not trigger SettingWithCopyWarning
paris_hotels_filter = paris_hotel_group[(paris_hotel_group['Hotels nearby'] <= 6) &
                                        (paris_hotel_group['Distance from center'] <= 5000) &
                                        (paris_hotel_group['Closest Hotel'] <= 100)].copy()
paris_hotels_filter.reset_index(drop=True, inplace=True)
paris_hotels_filter.shape

(27, 8)

In [183]:
paris_hotels_filter.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,X,Y,Distance from center,Hotels nearby,Closest Hotel
0,"1, Rue Le Brun, Quartier de la Salpêtrière, Pa...",48.837883,2.355272,-426809.748445,5487018.0,1734.935157,3,24.147417
1,"10, Passage Turquetil, Quartier Sainte-Marguer...",48.851447,2.39284,-423809.748445,5488058.0,3176.476035,2,98.235825
2,"10, Rue Lecuirot, Plaisance, Quartier de Plais...",48.829314,2.320544,-429509.748445,5486499.0,3439.476704,5,84.546452
3,"10, Rue de l'Abbaye, Quartier de Saint-Germain...",48.854345,2.334641,-428009.748445,5489097.0,1153.256259,4,96.155183
4,"12, Rue Beautreillis, Quartier de l'Arsenal, P...",48.852899,2.36374,-425909.748445,5488577.0,1014.889157,4,54.132195


In [192]:
latitude, longitude =  paris_center[0], paris_center[1]
map_paris_hotels = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(location=[latitude, longitude], popup='Notre-Dame').add_to(map_paris_hotels)
#folium.TileLayer('cartodbpositron').add_to(map_paris_hotels)
latitudes = paris_hotels_filter['Neighborhood Latitude'].to_list()
longitudes = paris_hotels_filter['Neighborhood Longitude'].to_list()
# Draw a 300 m circle around each candidate area
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='red', fill=False).add_to(map_paris_hotels)
map_paris_hotels

In [193]:
from sklearn.cluster import KMeans

number_of_clusters = 15

good_xys = paris_hotels_filter[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]
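
If `xy_to_lonlat` is built on `pyproj.transform`, note that this function is deprecated since pyproj 2.6.1 in favor of the `Transformer` class. A minimal sketch of an equivalent conversion follows, assuming the projected coordinates are UTM zone 33N on WGS84 (EPSG:32633), which is consistent with the negative X values in the tables above but is not confirmed in this section. The function names are hypothetical.

```python
from pyproj import Transformer

# Assumption: X/Y are UTM zone 33N (EPSG:32633); lon/lat are WGS84 (EPSG:4326).
# always_xy=True fixes the axis order to (lon, lat) / (x, y).
_to_xy = Transformer.from_crs("EPSG:4326", "EPSG:32633", always_xy=True)
_to_lonlat = Transformer.from_crs("EPSG:32633", "EPSG:4326", always_xy=True)

def lonlat_to_xy_transformer(lon, lat):
    """Project WGS84 longitude/latitude to UTM x/y in metres."""
    return _to_xy.transform(lon, lat)

def xy_to_lonlat_transformer(x, y):
    """Convert UTM x/y in metres back to WGS84 longitude/latitude."""
    return _to_lonlat.transform(x, y)

# Round trip for a point in central Paris
x, y = lonlat_to_xy_transformer(2.3522, 48.8566)
lon, lat = xy_to_lonlat_transformer(x, y)
print(x, y, lon, lat)
```

Creating the two `Transformer` objects once and reusing them avoids rebuilding the projection on every call.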


In [195]:
good_latitudes =  paris_hotels_filter['Neighborhood Latitude'].values
good_longitudes =  paris_hotels_filter['Neighborhood Longitude'].values
good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

In [197]:
from folium.plugins import HeatMap

map_paris_hotels = folium.Map(location=[latitude, longitude], zoom_start=13)
#folium.TileLayer('cartodbpositron').add_to(map_paris_hotels)
# Heat map of the candidate areas that passed the filter
HeatMap(good_locations, radius=25).add_to(map_paris_hotels)
# 2.5 km circle around the city center, for scale
folium.Circle(location=[latitude, longitude], radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_paris_hotels)
folium.Marker(location=[latitude, longitude], popup='Notre-Dame').add_to(map_paris_hotels)
# Cluster centers are unpacked as (lon, lat), so swap the order for folium
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_paris_hotels)
# Individual candidate areas
for lat, lon in zip(latitudes, longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_paris_hotels)
map_paris_hotels