***Rome or Venice***

The business problem we faced was where to open a new restaurant in Italy. We hesitated between two tourist destinations: Rome and Venice. This is a common problem for anyone who wants to open a restaurant; the main question is "Where should the restaurant be located?"
In this project I will try to answer that question.

First of all, here is some brief information about Rome and Venice (from Wikipedia).


*Rome*

Rome is the capital city of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region. With 2,868,782 residents in 1,285 km2 (496.1 sq mi), it is also the country's most populated comune. It is the fourth-most populous city in the European Union by population within city limits. It is the centre of the Metropolitan City of Rome, which has a population of 4.3 million residents. Rome is located in the central-western portion of the Italian Peninsula, within Lazio (Latium), along the shores of the Tiber. The Vatican City (the smallest country in the world) is an independent country inside the city boundaries of Rome, the only existing example of a country within a city: for this reason Rome has been often defined as capital of two states.

*Venice*

Venice is a city in northeastern Italy and the capital of the Veneto region. It is situated across a group of 118 small islands that are separated by canals and linked by over 400 bridges. The islands are located in the shallow Venetian Lagoon, an enclosed bay that lies between the mouths of the Po and the Piave rivers (more exactly between the Brenta and the Sile). Parts of Venice are renowned for the beauty of their settings, their architecture, and artwork. The lagoon and a part of the city are listed as a UNESCO World Heritage Site.

In this project I will use geolocation data about these two cities.
The goal of the project is to decide which city to choose for opening the restaurant, based on the Foursquare venues data.

For business density calculation I will apply Density-based spatial clustering of applications with noise (DBSCAN). It is a data clustering algorithm that is commonly used in data mining and machine learning.

Based on a set of points, DBSCAN groups together points that are close to each other based on a distance measurement (usually Euclidean distance) and a minimum number of points. It also marks as outliers the points that are in low-density regions.
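
To make the behaviour described above concrete, here is a minimal sketch using scikit-learn's `DBSCAN` on a handful of made-up 2-D points. The coordinates and the `eps` and `min_samples` values are illustrative assumptions, not values from this project. Four points sit close together and one lies far away; the far point should receive the label `-1`, which is how scikit-learn marks noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (hypothetical): four nearby points and one distant point
toy_points = np.array([
    [0.0, 0.0],
    [0.0, 0.1],
    [0.1, 0.0],
    [0.1, 0.1],
    [5.0, 5.0],
])

# eps is the neighbourhood radius; min_samples is the minimum number of
# points (including the point itself) needed to form a dense region
toy_model = DBSCAN(eps=0.2, min_samples=3).fit(toy_points)
toy_labels = toy_model.labels_

# One cluster index per point; -1 marks a point in a low-density region
print(toy_labels)
```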

The main information which I need is:

- The information about latitude and longitude of Rome and Venice:
*Rome* - Latitude and longitude coordinates are: 41.902782, 12.496366
*Venice* - Latitude and longitude coordinates are: 45.444958, 12.328463

- From https://foursquare.com/developers/apps I will extract data about Rome and Venice (query food). 
The scope of the data is Location, Venue, Latitude, Longitude, Category.
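
The five fields listed above map onto a nested JSON response. The sketch below shows how one item could be flattened into a row. The `sample_items` list is a hand-written stand-in that only imitates the nesting used later in this notebook (a `venue` object holding `name`, `location` and `categories`), its values are copied from the Rome output further down rather than fetched live, and `flatten_items` is a hypothetical helper name.

```python
# Hand-written stand-in for response['groups'][0]['items'] (assumed shape)
sample_items = [
    {'venue': {'name': 'Piazza della Repubblica',
               'location': {'lat': 41.902422, 'lng': 12.496367},
               'categories': [{'name': 'Plaza'}]}},
    {'venue': {'name': 'The St. Regis Rome',
               'location': {'lat': 41.904072, 'lng': 12.494873},
               'categories': [{'name': 'Hotel'}]}},
]


def flatten_items(city, items):
    """Turn nested venue items into
    (Location, Venue, Latitude, Longitude, Category) rows."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append((
            city,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            venue['categories'][0]['name'],  # first listed category only
        ))
    return rows


sample_rows = flatten_items('Rome', sample_items)
for row in sample_rows:
    print(row)
```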

The main steps which I will follow are:
1. To import the necessary libraries;
2. To get the location of the city;
3. To get the dataframe of the place using the Foursquare API;
4. To create a cluster map;
5. To get the location of Rome and Venice and make a map to display the two cities;
6. To create the sample matrix from the location of each venue;
7. To apply the DBSCAN algorithm to cluster the data;
8. To compare the results from the DBSCAN algorithm;
9. To draw the conclusion.


In [1]:
# First step is to import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import os,sys
import urllib
import requests 
import json
from urllib.request import urlopen
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN
!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
import math

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  52.45 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  26.20 MB/s
vincent-0.4.4- 100% |################################| Time: 0:00:00  30.03 MB/s
folium-0.5.0-p 100% |################################| Time: 0:00:00  37.29 MB/s


In [2]:
# Get the location of the city
def getlocation(address):
    # Recent geopy releases require an identifying user_agent;
    # the name used here is an arbitrary placeholder
    geolocator = Nominatim(user_agent='rome_or_venice')
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinate of ',address,'are {}, {}.'.format(latitude, longitude))
    return latitude,longitude

In [3]:
# Get the dataframe of the place using the Foursquare API
def getNearbyVenues(name, latitudes, longitudes, radius=5000):
    venues_list=[]

    # create the API request URL
    # (QUERY is not part of this URL, so venues of every category are returned)
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        latitudes, 
        longitudes, 
        radius, 
        500)

    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']

    # collect the fields of interest for each nearby venue
    venues_list.append([(
        name, 
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location','Venue','Latitude','Longitude','Category']
    return nearby_venues

In [4]:
# Create cluster map
def clusterMap(kclusters,dfs):
    # build one colour per cluster
    x = np.arange(kclusters)
    ys = [i+x+(i*x)**2 for i in range(kclusters)]
    colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]
    # add markers to the global map_clusters, which must exist before the call
    for lat, lon, poi, cluster in zip(dfs['Latitude'], dfs['Longitude'], dfs['Venue'], dfs['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.7).add_to(map_clusters)

In [5]:
# Get the epsilon value of the DBSCAN algorithm
def epsilon(data, MinPts):
    m, n = np.shape(data)
    xMax = np.max(data, 0)
    xMin = np.min(data, 0)
    eps = ((np.prod(xMax - xMin) * MinPts * math.gamma(0.5 * n + 1)) / (m * math.sqrt(math.pi ** n))) ** (1.0 / n)
    return eps
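
One way to read the formula in `epsilon`: if the `m` points were spread uniformly over their bounding box of volume V, a ball of radius eps in n dimensions, whose volume is π^(n/2) · eps^n / Γ(n/2 + 1), would be expected to contain about `MinPts` points. Solving that condition for eps gives the expression in the code. The sketch below re-implements this reading in plain Python on a made-up 10 × 10 grid over the unit square. The grid and the helper name `epsilon_uniform` are assumptions for illustration; for this grid the result should reduce to sqrt(5 / (100 · π)).

```python
import math


def epsilon_uniform(points, min_pts):
    """Radius of the n-ball expected to hold min_pts points
    if the points were uniformly spread over their bounding box."""
    m = len(points)       # number of points
    n = len(points[0])    # number of dimensions
    # Volume of the axis-aligned bounding box
    box_volume = 1.0
    for d in range(n):
        column = [p[d] for p in points]
        box_volume *= max(column) - min(column)
    # Solve (m / V) * pi^(n/2) * eps^n / gamma(n/2 + 1) = min_pts for eps
    return (box_volume * min_pts * math.gamma(0.5 * n + 1)
            / (m * math.pi ** (0.5 * n))) ** (1.0 / n)


# Hypothetical data: 100 grid points covering the unit square
grid = [(i / 9, j / 9) for i in range(10) for j in range(10)]
eps_grid = epsilon_uniform(grid, 5)
print(round(eps_grid, 4))
```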


In [6]:
# Color the place in Map
def mapMarkers(map_name,dfs_data):  # add markers to map
    for lat, lng, label in zip(dfs_data['Latitude'], dfs_data['Longitude'], dfs_data['Venue']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7).add_to(map_name)

In [7]:
# Get the location of Rome and Venice
# Rome - Latitude and longitude coordinates are: 41.902782, 12.496366
# Venice - Latitude and longitude coordinates are: 45.444958, 12.328463
address_ro = 'Rome,IT'
address_ve = 'Venice,IT'
#location_ro = getlocation(address_ro)
latitude_ro = 41.902782
longitude_ro = 12.496366
#location_ve = getlocation(address_ve)
latitude_ve = 45.444958
longitude_ve = 12.328463
print('The geographical coordinate of ',address_ro,'are {} {}.'.format(latitude_ro,longitude_ro))
print('The geographical coordinate of ',address_ve,'are {} {}.'.format(latitude_ve,longitude_ve))


The geographical coordinate of  Rome,IT are 41.902782 12.496366.
The geographical coordinate of  Venice,IT are 45.444958 12.328463.


In [8]:
map_all = folium.Map(location=[(latitude_ro+latitude_ve)/2, (longitude_ro+longitude_ve)/2], tiles='Stamen Terrain',zoom_start=5)

folium.Marker(location=[latitude_ro, longitude_ro], popup='Rome').add_to(map_all)
folium.CircleMarker(location=[latitude_ro, longitude_ro], radius=10,
popup='Rome', color='#3186cc',fill_color='#3186cc').add_to(map_all)

folium.Marker(location=[latitude_ve, longitude_ve], popup='Venice').add_to(map_all)
folium.CircleMarker(location=[latitude_ve, longitude_ve], radius=10,
popup='Venice', color='#3186cc',fill_color='#3186cc').add_to(map_all)
map_all

In [9]:
map_all.save('map_all.html')

In [10]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
QUERY = 'food'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


In [11]:
dfs_ro = getNearbyVenues('Rome',latitude_ro,longitude_ro)
dfs_ro.head()

Unnamed: 0,Location,Venue,Latitude,Longitude,Category
0,Rome,Piazza della Repubblica,41.902422,12.496367,Plaza
1,Rome,The St. Regis Rome,41.904072,12.494873,Hotel
2,Rome,Palazzo Massimo Alle Terme - Museo Nazionale R...,41.901696,12.497788,History Museum
3,Rome,Hotel Artemide,41.900747,12.493785,Hotel
4,Rome,Teatro dell'Opera di Roma,41.900439,12.495875,Opera House


In [12]:
# Create map using latitude and longitude values
map_ro = folium.Map(location=[latitude_ro, longitude_ro], zoom_start=13)
mapMarkers(map_ro,dfs_ro)
map_ro

In [13]:
map_ro.save('map_ro.html')


In [14]:
X_dfs_ro=dfs_ro.drop(['Venue','Location','Category'],axis=1)
X_dfs_ro.head()

Unnamed: 0,Latitude,Longitude
0,41.902422,12.496367
1,41.904072,12.494873
2,41.901696,12.497788
3,41.900747,12.493785
4,41.900439,12.495875


In [15]:
# Apply the DBSCAN algorithm
eps_temp = epsilon(X_dfs_ro,5)
ydbscan_ro = DBSCAN(eps=eps_temp,min_samples=5).fit(X_dfs_ro)
dfs_ro['Cluster Labels'] = ydbscan_ro.labels_
ydbscan_ro
ydbscan_ro.labels_

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0, -1,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0, -1,
       -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0, -1,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0, -1, -1,
        0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0, -1, -1, -1,  0])
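
The clustering above treats latitude and longitude as plain Euclidean coordinates measured in degrees. As a side check, the sketch below uses the haversine formula to estimate how many kilometres one degree of longitude spans at each city's latitude. The helper `haversine_km` and the mean Earth radius of 6371 km are assumptions introduced for this illustration. Because the figure differs between the two cities, the same `eps` in degrees corresponds to somewhat different ground distances in Rome and in Venice.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# City centre coordinates used in this notebook
rome = (41.902782, 12.496366)
venice = (45.444958, 12.328463)

# Ground length of one degree of longitude at each latitude
km_per_deg_lon_rome = haversine_km(rome[0], rome[1], rome[0], rome[1] + 1.0)
km_per_deg_lon_venice = haversine_km(venice[0], venice[1], venice[0], venice[1] + 1.0)

# Distance between the two city centres
rome_to_venice_km = haversine_km(rome[0], rome[1], venice[0], venice[1])

print(round(km_per_deg_lon_rome, 1), round(km_per_deg_lon_venice, 1))
print(round(rome_to_venice_km))
```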

In [16]:
dfs_ro.groupby('Cluster Labels').size().sort_values()

Cluster Labels
-1    14
 0    86
dtype: int64

In [17]:
Rome_density = (86/100)
Rome_density

0.86
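
The value 0.86 above is typed in from the cluster counts. A small helper can derive the same share directly from the label array, which avoids copying numbers by hand. `clustered_share` is a hypothetical name, and the label list below is rebuilt from the counts shown above (86 points in cluster 0, 14 noise points) rather than taken from a live run.

```python
def clustered_share(labels):
    """Fraction of points that DBSCAN assigned to any cluster (label != -1)."""
    labels = list(labels)
    clustered = sum(1 for label in labels if label != -1)
    return clustered / len(labels)


# Labels rebuilt from the counts in the output above
rome_labels = [0] * 86 + [-1] * 14
rome_share = clustered_share(rome_labels)
print(rome_share)
```

With the notebook's own objects, the equivalent call would be `clustered_share(ydbscan_ro.labels_)`.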

In [18]:
map_clusters = folium.Map(location=[latitude_ro, longitude_ro], zoom_start=13)
# set color scheme for the clusters
clusterMap(2,dfs_ro)
map_clusters

In [19]:
map_clusters.save('map_ro_density.html')


In [20]:
dfs_ve = getNearbyVenues('Venice',latitude_ve,longitude_ve)
dfs_ve.head()

Unnamed: 0,Location,Venue,Latitude,Longitude,Category
0,Venice,Timon,45.445523,12.328327,Wine Bar
1,Venice,Anice Stellato,45.44628,12.328538,Italian Restaurant
2,Venice,Il Paradiso Perduto,45.445023,12.330404,Italian Restaurant
3,Venice,Torrefazione Cannaregio srl,45.443726,12.326718,Coffee Shop
4,Venice,Hotel Canal Grande,45.441035,12.324256,Hotel


In [21]:
# Create map of Venice using latitude and longitude values
map_ve = folium.Map(location=[latitude_ve, longitude_ve], zoom_start=13)
mapMarkers(map_ve,dfs_ve)
map_ve

In [22]:
map_ve.save('map_ve.html')

In [23]:
X_dfs_ve=dfs_ve.drop(['Venue','Location','Category'],axis=1)
X_dfs_ve.head()

Unnamed: 0,Latitude,Longitude
0,45.445523,12.328327
1,45.44628,12.328538
2,45.445023,12.330404
3,45.443726,12.326718
4,45.441035,12.324256


In [24]:
eps_temp = epsilon(X_dfs_ve,5)
ydbscan_ve = DBSCAN(eps=eps_temp,min_samples=5).fit(X_dfs_ve)
dfs_ve['Cluster Labels'] = ydbscan_ve.labels_
ydbscan_ve
ydbscan_ve.labels_

array([ 0,  0,  0,  0, -1, -1,  1,  1, -1,  1,  0,  0,  1,  1,  0,  1,  1,
       -1,  1,  1,  1,  1,  1,  1,  0,  1,  1,  1,  0,  1,  1,  1,  1,  1,
        1, -1,  0,  0,  1,  1,  1,  0,  1,  1,  1,  2,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
       -1,  2,  1,  1,  2,  1,  1,  1, -1,  2,  1,  1,  1,  2, -1, -1,  1,
        1, -1,  1,  1, -1, -1,  1, -1, -1,  1,  2,  2, -1,  2,  1])

In [25]:
dfs_ve.groupby('Cluster Labels').size().sort_values()

Cluster Labels
 2     8
 0    12
-1    15
 1    65
dtype: int64

In [26]:
Venice_density = ((100-15)/100)
Venice_density

0.85

In [27]:
map_clusters = folium.Map(location=[latitude_ve, longitude_ve], zoom_start=13)
# set color scheme for the clusters
clusterMap(4,dfs_ve)
map_clusters

In [28]:
map_clusters.save('map_ve_density.html')


***Conclusion***

The results of the analysis show that, after applying the DBSCAN clustering algorithm to the data for Rome and Venice, we obtained almost the same density (0.86 for Rome and 0.85 for Venice).
I recommend opening the restaurant in Rome, because the small difference between the two densities is in favor of Rome.
We need more information to make a better decision: with enough data we could combine these results with information about the consumption index, population, and number of tourists.
This appears to be one of the fastest ways to get preliminary information about positioning public places (restaurants, hotels, pubs, shops, etc.) on the basis of the Foursquare venues data.
