# Introduction

 - The main goal of this project is to find the most suitable place to start a business in the area under study; the kind of business is also clarified by the results of the analysis.
 - Anyone interested in starting a business in or around this area, or in a market analysis that can anticipate new activities likely to hurt an existing business there, might find this project useful.

# Data

The main source of data is a local database, as not much information is available about this particular place. 
After gathering all the important features, the Foursquare API will handle the exploration to find relevant venues and trends.
Data will be merged with the main features (area in km², population density) and shown on a folium map.
In the end, the municipalities will be grouped into clusters, so that the customer can choose among a small number of candidate areas.

In [96]:
from bs4 import BeautifulSoup
import urllib.request


# Download the page listing the municipalities of the province of Syracuse
with urllib.request.urlopen("https://www.comuniecitta.it/sicilia-19/provincia-di-siracusa-89/comuni") as url:
    s = url.read()

# Use BeautifulSoup to extract the tables
soup = BeautifulSoup(s, 'html.parser')
tAll = soup.find_all('table')

# First stage: download the table
# Each iteration overwrites th and td, so the cells of the last table on the page are kept
for table in tAll:
    # 1.1 Extract every header
    th = table.find_all("th")

    # 1.2 Extract every item
    td = table.find_all("td")

# Second stage: save filtered data into RAM

# 2.1 Save headers as columns
columns = [header.text for header in th]

# 2.2 Save each cell in the list of its column: the cells are stored row by row,
# so column x holds every len(columns)-th cell starting from position x
rows = [[] for _ in columns]
for x in range(len(rows)):
    for y in range(x, len(td), len(columns)):
        rows[x].append(td[y].text)

print(columns)
print(rows)






        

['Comune', 'Superficie', 'Popolazione']
[[' Siracusa', 'Augusta', 'Avola', 'Buccheri', 'Buscemi', 'Canicattini Bagni', 'Carlentini', 'Cassaro', 'Ferla', 'Floridia', 'Francofonte', 'Lentini', 'Melilli', 'Noto', 'Pachino', 'Palazzolo Acreide', 'Portopalo di Capo Passero', 'Priolo Gargallo', 'Rosolini', 'Solarino', 'Sortino'], ['204,10 km2', '109,30 km2', '74,30 km2', '57,40 km2', '51,60 km2', '15,10 km2', '158,00 km2', '19,40 km2', '24,80 km2', '26,20 km2', '74,00 km2', '215,80 km2', '136,10 km2', '551,10 km2', '50,50 km2', '86,30 km2', '14,90 km2', '57,60 km2', '76,20 km2', '13,00 km2', '93,20 km2'], ['123.850', '34.539', '31.827', '2.148', '1.147', '7.355', '17.587', '819', '2.599', '23.050', '12.392', '24.017', '13.304', '24.047', '21.990', '9.061', '3.818', '12.148', '21.798', '7.820', '8.955']]
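
The second stage above relies on the table cells arriving as one flat, row-by-row list. A minimal sketch of the same column split using slicing is shown here; `split_into_columns` is a hypothetical helper, not part of the notebook, and the sample cells are two rows copied from the output above.

```python
def split_into_columns(cells, n_columns):
    """Split a flat, row-by-row list of cells into one list per column."""
    return [cells[i::n_columns] for i in range(n_columns)]


# Two rows in the same layout as the scraped table (name, area, population)
cells = ['Buccheri', '57,40 km2', '2.148', 'Buscemi', '51,60 km2', '1.147']
print(split_into_columns(cells, 3))
```

Slicing with a step keeps the number of columns as a parameter instead of hard-coding it in the loop.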


In [112]:
# Third stage: convert lists into pandas dataframe
import pandas as pd

syracuse_province = pd.DataFrame({columns[0]:rows[0],columns[1]:rows[1],columns[2]:rows[2]})
syracuse_province


Unnamed: 0,Comune,Superficie,Popolazione
0,Siracusa,"204,10 km2",123.85
1,Augusta,"109,30 km2",34.539
2,Avola,"74,30 km2",31.827
3,Buccheri,"57,40 km2",2.148
4,Buscemi,"51,60 km2",1.147
5,Canicattini Bagni,"15,10 km2",7.355
6,Carlentini,"158,00 km2",17.587
7,Cassaro,"19,40 km2",819.0
8,Ferla,"24,80 km2",2.599
9,Floridia,"26,20 km2",23.05
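
The Superficie and Popolazione values scraped above are text in Italian number format (comma as decimal mark, dot as thousands separator), so they need converting before the area or the population density can be used numerically. A small sketch of that conversion follows; the helper names are hypothetical and not part of the notebook.

```python
def parse_area_km2(text):
    """Convert an Italian-formatted area such as '204,10 km2' to a float."""
    number = text.replace("km2", "").strip()
    return float(number.replace(".", "").replace(",", "."))


def parse_population(text):
    """Convert an Italian-formatted count such as '123.850' to an int."""
    return int(text.strip().replace(".", ""))


def population_density(area_text, population_text):
    """Inhabitants per square kilometre from the two raw strings."""
    return parse_population(population_text) / parse_area_km2(area_text)


print(parse_area_km2("204,10 km2"), parse_population("123.850"))
print(population_density("204,10 km2", "123.850"))
```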


## Get coordinates using the geopy package

In [117]:
# Fourth stage: find GPS coordinates for each city

# 4.1 Initialization

# 4.1.1 Initialize geocoder
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="syracuse_agent")

# 4.1.2 Initialize new columns in dataframe
syracuse_province['Latitude'] = ""
syracuse_province['Longitude'] = ""

# 4.2 Localize and record position for each city in the dataframe
for index, row in syracuse_province.iterrows():
    location = geolocator.geocode("{}, Siracusa".format(row['Comune']))
    # iterrows() may yield copies, so write through .at to update the dataframe itself
    if location is not None:
        syracuse_province.at[index, 'Latitude'] = location.latitude
        syracuse_province.at[index, 'Longitude'] = location.longitude

syracuse_province

Unnamed: 0,Comune,Superficie,Popolazione,Latitude,Longitude
0,Siracusa,"204,10 km2",123.85,37.0646,15.2907
1,Augusta,"109,30 km2",34.539,37.2369,15.2197
2,Avola,"74,30 km2",31.827,36.9095,15.135
3,Buccheri,"57,40 km2",2.148,37.1265,14.8504
4,Buscemi,"51,60 km2",1.147,37.0855,14.8842
5,Canicattini Bagni,"15,10 km2",7.355,37.0349,15.0623
6,Carlentini,"158,00 km2",17.587,37.2759,15.015
7,Cassaro,"19,40 km2",819.0,37.1069,14.9476
8,Ferla,"24,80 km2",2.599,37.1186,14.9411
9,Floridia,"26,20 km2",23.05,37.0869,15.1519
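
Geocoding can fail for individual names, in which case geopy's `geocode` returns `None`. The sketch below shows a loop that tolerates this, written against any callable with the same shape as `geolocator.geocode` so it can be tried without network access. `geocode_all` and `fake_geocode` are hypothetical names, and the fake only knows one city.

```python
from types import SimpleNamespace


def geocode_all(names, geocode, suffix=", Siracusa"):
    """Return {name: (latitude, longitude)}; (None, None) when the lookup fails."""
    coordinates = {}
    for name in names:
        location = geocode(name.strip() + suffix)
        if location is None:
            coordinates[name] = (None, None)
        else:
            coordinates[name] = (location.latitude, location.longitude)
    return coordinates


def fake_geocode(query):
    # Stand-in for geolocator.geocode: knows a single city, returns None otherwise
    known = {"Avola, Siracusa": SimpleNamespace(latitude=36.9095, longitude=15.135)}
    return known.get(query)


result = geocode_all(["Avola", "Nowhere"], fake_geocode)
print(result)
```

Passing the geocoder in as an argument also makes it easy to swap in a rate-limited or cached lookup later.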


# Methodology
- Explore and cluster the neighborhoods in Syracuse.
- Find all venues with the Foursquare API.
- Explore each neighborhood individually.
- Cluster the neighborhoods to find optimum spots to open a given type of business.

## Explore and cluster the neighborhoods in Syracuse

In [131]:
latitude = syracuse_province["Latitude"][0]
longitude = syracuse_province["Longitude"][0]
print('The geographical coordinates of Syracuse are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Syracuse are 37.0646139, 15.2907196.


### Create a map of Syracuse

In [136]:
import folium # map rendering library

# create map of Syracuse using latitude and longitude values
map_syracuse = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per municipality to the map
for lat, lng, comune in zip(syracuse_province['Latitude'], syracuse_province['Longitude'], syracuse_province['Comune']):
    label = '{}'.format(comune)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_syracuse)  
    
map_syracuse

### Log into Foursquare and explore Syracuse's venues

In [137]:
# Foursquare credentials: replace the placeholders with your own values and keep them out of shared notebooks
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [141]:
import requests

# Fifth stage: setup a function to retrieve a DataFrame containing nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL, honoring the radius (in meters) and limit arguments
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
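
The list comprehension in `getNearbyVenues` assumes every venue carries at least one category. The sketch below does the same extraction as a standalone function that skips venues without one. `extract_venues` is a hypothetical helper, and `sample` is a hand-written payload that only mimics the fields read by the function above; it is not a real API response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten an explore-style payload into one tuple per categorized venue."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to analyze, so skip this venue
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
        ))
    return rows


# Hand-written payload with one categorized venue and one without categories
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Nabila",
               "location": {"lat": 37.158643, "lng": 15.029019},
               "categories": [{"name": "Restaurant"}]}},
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 37.0, "lng": 15.0},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Sortino", 37.156854, 15.027716)
print(rows)
```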

In [142]:
syracuse_venues = getNearbyVenues(names=syracuse_province['Comune'],
                                   latitudes=syracuse_province['Latitude'],
                                   longitudes=syracuse_province['Longitude']
                                  )

 Siracusa
Augusta
Avola
Buccheri
Buscemi
Canicattini Bagni
Carlentini
Cassaro
Ferla
Floridia
Francofonte
Lentini
Melilli
Noto
Pachino
Palazzolo Acreide
Portopalo di Capo Passero
Priolo Gargallo
Rosolini
Solarino
Sortino


In [146]:
print('There are {} unique categories.'.format(len(syracuse_venues['Venue Category'].unique())))
syracuse_venues

There are 63 unique categories.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Siracusa,37.064614,15.290720,La Voglia Matta,37.065380,15.288889,Ice Cream Shop
1,Siracusa,37.064614,15.290720,Piano B Casual Food,37.064811,15.288139,Italian Restaurant
2,Siracusa,37.064614,15.290720,Caseificio Borderi,37.065452,15.293635,Deli / Bodega
3,Siracusa,37.064614,15.290720,Ponte Umbertino,37.064959,15.290439,Historic Site
4,Siracusa,37.064614,15.290720,Il Leggendario Panino Da Antonio & Daniele,37.065794,15.290896,Food Truck
...,...,...,...,...,...,...,...
155,Sortino,37.156854,15.027716,I QUATTRO CANTI,37.158434,15.027846,Plaza
156,Sortino,37.156854,15.027716,Ristorante Pizzoleria I Quattro Canti,37.158275,15.028132,Pizza Place
157,Sortino,37.156854,15.027716,Nabila,37.158643,15.029019,Restaurant
158,Sortino,37.156854,15.027716,Piccole Gioie,37.159001,15.027898,Gift Shop


### Analyze each neighborhood

In [147]:
# Sixth stage: One Hot Encoding

# one hot encoding
syracuse_onehot = pd.get_dummies(syracuse_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
syracuse_onehot['Neighborhood'] = syracuse_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [syracuse_onehot.columns[-1]] + list(syracuse_onehot.columns[:-1])
syracuse_onehot = syracuse_onehot[fixed_columns]

syracuse_onehot.head()

# Group neighborhoods
syracuse_grouped = syracuse_onehot.groupby('Neighborhood').mean().reset_index()
syracuse_grouped

Unnamed: 0,Neighborhood,Art Museum,Bakery,Bar,Beach,Bistro,Breakfast Spot,Brewery,Burger Joint,Cafeteria,...,Pub,Restaurant,Sandwich Place,Sculpture Garden,Seafood Restaurant,Snack Place,Spanish Restaurant,Tennis Court,Theater,Trattoria/Osteria
0,Siracusa,0.0,0.016667,0.0,0.0,0.033333,0.016667,0.016667,0.0,0.016667,...,0.0,0.05,0.016667,0.0,0.083333,0.0,0.016667,0.0,0.0,0.016667
1,Augusta,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0
2,Avola,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Buccheri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0
4,Buscemi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Canicattini Bagni,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Cassaro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ferla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Floridia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
9,Francofonte,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
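
Taking the mean of the one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. A plain-Python sketch of that equivalence follows, using an illustrative list of six venue categories rather than the real Foursquare results; `category_frequencies` is a hypothetical helper.

```python
from collections import Counter


def category_frequencies(categories):
    """Return each category's share of the venues in one neighborhood."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# Illustrative neighborhood with six venues, each in a different category
venues = ["Beach", "Ice Cream Shop", "Seafood Restaurant",
          "Sculpture Garden", "Harbor / Marina", "Pizza Place"]
freq = category_frequencies(venues)
print(round(freq["Beach"], 6))
```

With six venues in six different categories every share is 1/6, which is the kind of value that appears as 0.166667 in the grouped table.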


In [168]:
# Print most common venues
num_top_venues =20

for hood in syracuse_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = syracuse_grouped[syracuse_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
   

---- Siracusa----
                       venue  freq
0         Italian Restaurant  0.13
1                      Hotel  0.10
2         Seafood Restaurant  0.08
3                 Restaurant  0.05
4   Mediterranean Restaurant  0.05
5                     Bistro  0.03
6             Ice Cream Shop  0.03
7              Historic Site  0.03
8                       Café  0.03
9          Trattoria/Osteria  0.02
10        Miscellaneous Shop  0.02
11               Pizza Place  0.02
12                Food Court  0.02
13                Food Truck  0.02
14                  Fountain  0.02
15              Gourmet Shop  0.02
16                    Bakery  0.02
17     General Entertainment  0.02
18               Fish Market  0.02
19           Harbor / Marina  0.02


----Augusta----
                       venue  freq
0             Ice Cream Shop  0.17
1                      Beach  0.17
2            Harbor / Marina  0.17
3         Seafood Restaurant  0.17
4           Sculpture Garden  0.17
5                Pi

In [169]:
# Sort in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [173]:
# Seventh stage: reformat DataFrame according to observations
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the third use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = syracuse_grouped['Neighborhood']

for ind in np.arange(syracuse_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(syracuse_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Siracusa,Italian Restaurant,Hotel,Seafood Restaurant,Restaurant,Mediterranean Restaurant,Ice Cream Shop,Café,Bistro,Historic Site,Creperie
1,Augusta,Ice Cream Shop,Beach,Seafood Restaurant,Sculpture Garden,Harbor / Marina,Pizza Place,Trattoria/Osteria,Farmers Market,Design Studio,Dessert Shop
2,Avola,Café,Gastropub,Plaza,Farmers Market,Design Studio,Dessert Shop,Diner,Electronics Store,Trattoria/Osteria,Deli / Bodega
3,Buccheri,Theater,Italian Restaurant,Pizza Place,Trattoria/Osteria,Farmers Market,Design Studio,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant
4,Buscemi,Flower Shop,Construction & Landscaping,Trattoria/Osteria,Cupcake Shop,Food Truck,Food Court,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant


## Cluster neighborhoods

In [176]:
#Eighth stage: Cluster neighborhoods

from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

syracuse_grouped_clustering = syracuse_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(syracuse_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 3, 0, 2, 0, 1, 4, 3, 0])
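
A quick way to see how unevenly k-means has split the data is to count the labels. The sketch below uses only the ten labels printed above, so the counts cover the first ten rows and not the whole province.

```python
from collections import Counter

# The ten labels shown in the output above (first ten rows only)
first_labels = [0, 0, 3, 0, 2, 0, 1, 4, 3, 0]

sizes = Counter(first_labels)
for label, size in sizes.most_common():
    print("cluster {}: {} neighborhoods".format(label, size))
```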

In [188]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

syracuse_merged = syracuse_province

# join neighborhoods_venues_sorted onto syracuse_province to add the cluster label and top venues for each municipality
syracuse_merged = syracuse_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Comune')

syracuse_merged.head()

Unnamed: 0,Comune,Superficie,Popolazione,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Siracusa,"204,10 km2",123.85,37.0646,15.2907,0.0,Italian Restaurant,Hotel,Seafood Restaurant,Restaurant,Mediterranean Restaurant,Ice Cream Shop,Café,Bistro,Historic Site,Creperie
1,Augusta,"109,30 km2",34.539,37.2369,15.2197,0.0,Ice Cream Shop,Beach,Seafood Restaurant,Sculpture Garden,Harbor / Marina,Pizza Place,Trattoria/Osteria,Farmers Market,Design Studio,Dessert Shop
2,Avola,"74,30 km2",31.827,36.9095,15.135,3.0,Café,Gastropub,Plaza,Farmers Market,Design Studio,Dessert Shop,Diner,Electronics Store,Trattoria/Osteria,Deli / Bodega
3,Buccheri,"57,40 km2",2.148,37.1265,14.8504,0.0,Theater,Italian Restaurant,Pizza Place,Trattoria/Osteria,Farmers Market,Design Studio,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant
4,Buscemi,"51,60 km2",1.147,37.0855,14.8842,2.0,Flower Shop,Construction & Landscaping,Trattoria/Osteria,Cupcake Shop,Food Truck,Food Court,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant


# Results

- Our model shows that the whole area can be divided into 5 clusters, with the red cluster (label 0) being by far the largest.
    - This red cluster also includes Syracuse, making it the most suitable area for business after the green cluster (label 3).

In [200]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(syracuse_merged['Latitude'], syracuse_merged['Longitude'], syracuse_merged['Comune'], syracuse_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    try:
        color = rainbow[int(cluster)-1]
    except ValueError:
        # rows without a cluster label (NaN) fall back to the last color
        color = rainbow[-1]
    folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(map_clusters)

map_clusters
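
The marker color is picked with `rainbow[int(cluster)-1]`, so label 0 maps to index -1, the last color of the palette, and rows without a label fall back to that same color. The sketch below isolates that lookup; `cluster_color` is a hypothetical helper and the palette entries are placeholders, not the colors matplotlib would generate.

```python
import math


def cluster_color(cluster, palette):
    """Pick the palette entry for a cluster label; missing labels use the last entry."""
    if cluster is None or (isinstance(cluster, float) and math.isnan(cluster)):
        return palette[-1]
    return palette[int(cluster) - 1]


# Placeholder palette with one entry per cluster
palette = ["c0", "c1", "c2", "c3", "c4"]
for label in [0.0, 1.0, 3.0, float("nan")]:
    print(label, cluster_color(label, palette))
```

Because of the `-1` offset, unlabeled rows and cluster 0 share one color on the map, which is worth keeping in mind when reading cluster sizes off the markers.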

## Examination of most profitable activities

In [258]:
red_cluster = syracuse_merged.loc[syracuse_merged['Cluster Labels'] == 0]

red_cluster

Unnamed: 0,Comune,Superficie,Popolazione,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Siracusa,"204,10 km2",123.85,37.0646,15.2907,0.0,Italian Restaurant,Hotel,Seafood Restaurant,Restaurant,Mediterranean Restaurant,Ice Cream Shop,Café,Bistro,Historic Site,Creperie
1,Augusta,"109,30 km2",34.539,37.2369,15.2197,0.0,Ice Cream Shop,Beach,Seafood Restaurant,Sculpture Garden,Harbor / Marina,Pizza Place,Trattoria/Osteria,Farmers Market,Design Studio,Dessert Shop
3,Buccheri,"57,40 km2",2.148,37.1265,14.8504,0.0,Theater,Italian Restaurant,Pizza Place,Trattoria/Osteria,Farmers Market,Design Studio,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant
5,Canicattini Bagni,"15,10 km2",7.355,37.0349,15.0623,0.0,Art Museum,Plaza,Electronics Store,Restaurant,Food,Flower Shop,Fish Market,Fast Food Restaurant,Creperie,Farmers Market
10,Francofonte,"74,00 km2",12.392,37.2266,14.8819,0.0,Plaza,Food Court,Bar,Pub,Trattoria/Osteria,Design Studio,Dessert Shop,Diner,Electronics Store,Farmers Market
11,Lentini,"215,80 km2",24.017,37.2847,14.9988,0.0,Health & Beauty Service,Plaza,Bakery,Cupcake Shop,Design Studio,Dessert Shop,Diner,Electronics Store,Farmers Market,Trattoria/Osteria
12,Melilli,"136,10 km2",13.304,37.1783,15.1266,0.0,Jewelry Store,Bar,Construction & Landscaping,Fast Food Restaurant,Design Studio,Dessert Shop,Diner,Electronics Store,Farmers Market,Trattoria/Osteria
16,Portopalo di Capo Passero,"14,90 km2",3.818,36.6838,15.1333,0.0,Café,Food,Snack Place,Seafood Restaurant,Beach,Hotel,Restaurant,Harbor / Marina,Outdoors & Recreation,Trattoria/Osteria
18,Rosolini,"76,20 km2",21.798,36.8205,14.9526,0.0,Pub,Breakfast Spot,Burger Joint,Pizza Place,Fast Food Restaurant,Design Studio,Dessert Shop,Diner,Electronics Store,Farmers Market
20,Sortino,"93,20 km2",8.955,37.1569,15.0277,0.0,Pizza Place,Plaza,Gift Shop,Restaurant,Trattoria/Osteria,Deli / Bodega,Design Studio,Dessert Shop,Diner,Electronics Store


In [259]:
green_cluster = syracuse_merged.loc[syracuse_merged['Cluster Labels'] == 3]
green_cluster

Unnamed: 0,Comune,Superficie,Popolazione,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Avola,"74,30 km2",31.827,36.9095,15.135,3.0,Café,Gastropub,Plaza,Farmers Market,Design Studio,Dessert Shop,Diner,Electronics Store,Trattoria/Osteria,Deli / Bodega
9,Floridia,"26,20 km2",23.05,37.0869,15.1519,3.0,Kids Store,Snack Place,Café,Gastropub,Bar,Deli / Bodega,Food Truck,Food Court,Food & Drink Shop,Food
13,Noto,"551,10 km2",24.047,36.8909,15.0706,3.0,Café,Trattoria/Osteria,Italian Restaurant,Bistro,Restaurant,Plaza,Coffee Shop,Dessert Shop,Diner,Food & Drink Shop
14,Pachino,"50,50 km2",21.99,36.7152,15.0915,3.0,Indie Movie Theater,Clothing Store,Café,Plaza,Trattoria/Osteria,Electronics Store,Dessert Shop,Diner,Farmers Market,Deli / Bodega
15,Palazzolo Acreide,"86,30 km2",9.061,37.0619,14.9039,3.0,Italian Restaurant,Dessert Shop,Cocktail Bar,Café,Plaza,Trattoria/Osteria,Electronics Store,Diner,Farmers Market,Deli / Bodega
17,Priolo Gargallo,"57,60 km2",12.148,37.1579,15.186,3.0,Burger Joint,Tennis Court,Café,Hardware Store,Pizza Place,Fast Food Restaurant,Dessert Shop,Diner,Electronics Store,Farmers Market
19,Solarino,"13,00 km2",7.82,37.1012,15.119,3.0,Construction & Landscaping,Plaza,Design Studio,Café,Pizza Place,Food Truck,Breakfast Spot,Brewery,Food Court,Food & Drink Shop


# Discussion and Conclusion

So, let's say we want to open a café using just the data we have. Let's look at the best places in both clusters. Further analysis is out of scope for this project.

In [285]:
business_target = "Café"
clusters = [red_cluster, green_cluster]

# Collect every city whose most common venue matches the target business
suitable_places = []
for cluster in clusters:
    for index, row in cluster.iterrows():
        if row["1st Most Common Venue"] == business_target:
            suitable_places.append([
                row["Comune"],
                row["Cluster Labels"],
                row["Popolazione"],
                row["Latitude"],
                row["Longitude"],
            ])

# Population strings use "." as the thousands separator (e.g. "31.827")
def population_as_int(place):
    return int(place[2].replace(".", ""))

print("Places to open a {} business are: ".format(business_target))
for place in suitable_places:
    print("\tCity of {}, cluster number: {}, population number: {}".format(place[0], place[1], place[2]))

# Pick the candidate with the largest population
best_place = max(suitable_places, key=population_as_int)
print("\n\nBest place chosen by population number is: {} ".format(best_place[0]))


Places to open a Café business are: 
	City of Portopalo di Capo Passero, cluster number: 0.0, population number: 3.818
	City of Avola, cluster number: 3.0, population number: 31.827
	City of Noto, cluster number: 3.0, population number: 24.047


Best place chosen by population number is: Avola 
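
As a complementary view, the three candidates can also be ranked by population density, one of the features named in the Data section. The figures below are the area and population values from the tables above, typed in as plain numbers; `rank_by_density` is a hypothetical helper and this ranking is a sketch, not part of the original analysis.

```python
def rank_by_density(candidates):
    """Sort (name, population, area_km2) tuples by inhabitants per km2, densest first."""
    return sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)


# Population and area (km2) of the three candidate cities, taken from the tables above
candidates = [
    ("Portopalo di Capo Passero", 3818, 14.90),
    ("Avola", 31827, 74.30),
    ("Noto", 24047, 551.10),
]

for name, population, area in rank_by_density(candidates):
    print("{}: {:.1f} inhabitants per km2".format(name, population / area))
```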
