# Final Assignment - Applied data science capstone

### Background and business plan

In the training part of the course I explored how to segment the neighbourhoods of large, multicultural overseas cities such as New York and Toronto. In this final assignment I want to explore and segment the city of Bari (Italy), a city near my hometown. It is not among the best-known Italian cities, but it is a distinctive centre, famous mostly for its food tradition. I take the role of an investor who wants to open a new restaurant offering international cuisine, e.g. Mexican or Thai food (neither of which is widespread in southern Italy), and I want to know which neighbourhoods would be the best to invest in.
I am starting from scratch, and I doubt that Foursquare will have much information to work with, but I will do my best to offer the best options I can.

### Data collection

To carry out the work I decided to use the neighbourhood data available at https://en.wikipedia.org/wiki/Bari#Quarters.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium
import folium

print('Libraries imported.')

Libraries imported.


In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/Bari#Quarters')[2]
df.head()

Unnamed: 0,Municipi,Quarters,Former Circoscrizioni
0,1,"Murat, San Nicola, Libertà, Madonnella, Japigi...","V, VII, IX"
1,2,"Poggiofranco, Picone, Carrassi, San Pasquale, ...","III, VI"
2,3,"San Paolo, Stanic, Marconi, San Girolamo, Fesc...","II, VIII"
3,4,"Carbonara, Ceglie, Loseto",IV
4,5,"Palese, Santo Spirito, Catino, San Pio",I


In [3]:
#Split each quarter (previously grouped by "Municipi")
df1 = df['Quarters'].str.split(',').explode()
df1.reset_index()

Unnamed: 0,index,Quarters
0,0,Murat
1,0,San Nicola
2,0,Libertà
3,0,Madonnella
4,0,Japigia
5,0,Torre a mare
6,1,Poggiofranco
7,1,Picone
8,1,Carrassi
9,1,San Pasquale
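
The split-and-explode step can also be written so that the municipio number stays next to each quarter. The following is only a minimal sketch on a toy table: the `toy` and `long` names and the two sample rows are illustrative assumptions, not the scraped Wikipedia data.

```python
import pandas as pd

# Toy stand-in for the scraped table (two illustrative rows only).
toy = pd.DataFrame({
    "Municipi": [1, 2],
    "Quarters": ["Murat, San Nicola", "Picone, Carrassi"],
})

# Split the comma-separated string into a list, then give each list item its own row.
long = toy.assign(Quarters=toy["Quarters"].str.split(",")).explode("Quarters")
long = long.reset_index(drop=True)
print(long)
```

The leading spaces left over from the split are still present at this point, which is why a cleanup step is needed afterwards.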


In [4]:
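#Remove the last character ( ')' ) in row 16
df1 = df1.to_frame()
# Assign through a single .iloc call: chained indexing may write to a temporary copy
df1.iloc[16, df1.columns.get_loc('Quarters')] = df1['Quarters'].iloc[16][:-1]
df1.reset_index()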

Unnamed: 0,index,Quarters
0,0,Murat
1,0,San Nicola
2,0,Libertà
3,0,Madonnella
4,0,Japigia
5,0,Torre a mare
6,1,Poggiofranco
7,1,Picone
8,1,Carrassi
9,1,San Pasquale


In [5]:
#Remove whitespace at the beginning of each quarter
df1['Quarters'] = df1['Quarters'].str.lstrip()
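
As a possible alternative to fixing one row by position, both cleanup steps could be done with vectorised string methods, so the code does not depend on where the stray parenthesis sits. This is a sketch under assumptions: the `raw` series and its values are made up for illustration.

```python
import pandas as pd

# Toy values mimicking the raw split output: leading spaces and one stray ')'.
raw = pd.Series(["Murat", " San Nicola", " Some Quarter)"], name="Quarters")

# Trim surrounding whitespace, then drop any trailing ')' characters.
clean = raw.str.strip().str.rstrip(")")
print(clean.tolist())
```

Note that `rstrip(")")` would also remove a parenthesis that legitimately closes a name, so it only suits data where no quarter name ends with one.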

In [6]:
nc = ['Bari,'] * df1.shape[0]

In [7]:
df1['city'] = nc

In [9]:
df1['location'] = df1['city'] + df1['Quarters']
df1

Unnamed: 0,Quarters,city,location
0,Murat,"Bari,","Bari,Murat"
0,San Nicola,"Bari,","Bari,San Nicola"
0,Libertà,"Bari,","Bari,Libertà"
0,Madonnella,"Bari,","Bari,Madonnella"
0,Japigia,"Bari,","Bari,Japigia"
0,Torre a mare,"Bari,","Bari,Torre a mare"
1,Poggiofranco,"Bari,","Bari,Poggiofranco"
1,Picone,"Bari,","Bari,Picone"
1,Carrassi,"Bari,","Bari,Carrassi"
1,San Pasquale,"Bari,","Bari,San Pasquale"


In [10]:
df1['city'] = ['Bari'] * df1.shape[0]
df1.head()

Unnamed: 0,Quarters,city,location
0,Murat,Bari,"Bari,Murat"
0,San Nicola,Bari,"Bari,San Nicola"
0,Libertà,Bari,"Bari,Libertà"
0,Madonnella,Bari,"Bari,Madonnella"
0,Japigia,Bari,"Bari,Japigia"
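
The `city` and `location` columns built over the last few cells can probably be produced in one pass, since pandas broadcasts a scalar to every row. A minimal sketch, assuming a toy `quarters` frame with two illustrative rows:

```python
import pandas as pd

# Toy frame with two illustrative quarters.
quarters = pd.DataFrame({"Quarters": ["Murat", "San Nicola"]})

# A scalar is broadcast to every row, so no helper list is needed.
quarters["city"] = "Bari"
# Same "city,quarter" format as the location column above.
quarters["location"] = quarters["city"] + "," + quarters["Quarters"]
print(quarters)
```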


In [11]:
#Add coordinates for each quarter
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
geolocator = Nominatim(user_agent="Bari_quar")

df1['Dist_Coord']= df1['location'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df1[['Latitude', 'Longitude']] = df1['Dist_Coord'].apply(pd.Series)

df1.drop(['Dist_Coord'], axis=1, inplace=True)
df1

Unnamed: 0,Quarters,city,location,Latitude,Longitude
0,Murat,Bari,"Bari,Murat",41.123301,16.870601
0,San Nicola,Bari,"Bari,San Nicola",41.128054,16.869299
0,Libertà,Bari,"Bari,Libertà",41.123126,16.856248
0,Madonnella,Bari,"Bari,Madonnella",41.120257,16.884253
0,Japigia,Bari,"Bari,Japigia",41.113869,16.896594
0,Torre a mare,Bari,"Bari,Torre a mare",41.087498,17.000413
1,Poggiofranco,Bari,"Bari,Poggiofranco",41.090048,16.857068
1,Picone,Bari,"Bari,Picone",41.070221,16.860594
1,Carrassi,Bari,"Bari,Carrassi",41.089591,16.873487
1,San Pasquale,Bari,"Bari,San Pasquale",41.113456,16.876145
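
The geocoding cell assumes that every query finds a match. geopy's `geocode` returns `None` when nothing is found, and the chained `lambda` would then raise an `AttributeError`. The sketch below shows one way to guard against that. The `geocode_column` helper, the `FakeLocation` class and the `fake_geocode` function are hypothetical names introduced here so the example runs offline; they are not part of geopy or of the original notebook.

```python
import pandas as pd

def geocode_column(queries, geocode):
    """Geocode each query, using NaN coordinates when nothing is found."""
    rows = []
    for query in queries:
        hit = geocode(query)  # a geopy geocoder returns None on no match
        if hit is None:
            rows.append((float("nan"), float("nan")))
        else:
            rows.append((hit.latitude, hit.longitude))
    return pd.DataFrame(rows, columns=["Latitude", "Longitude"], index=queries.index)

# Hypothetical offline stand-in for a geocoding result, for illustration only.
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

# Hypothetical offline stand-in for geolocator.geocode.
def fake_geocode(query):
    known = {"Bari,Murat": FakeLocation(41.123301, 16.870601)}
    return known.get(query)

queries = pd.Series(["Bari,Murat", "Bari,Nowhere"])
coords = geocode_column(queries, fake_geocode)
print(coords)
```

In the notebook itself, `geolocator.geocode` would take the place of `fake_geocode`. Wrapping it in `RateLimiter` from `geopy.extra.rate_limiter` (for example with `min_delay_seconds=1`) is a common way to space out requests to Nominatim.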


In [12]:
#Coordinates of the city of Bari
address = 'Bari, Italy'

geolocator = Nominatim(user_agent="Bari_quar")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bari are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bari are 41.1257843, 16.8620293.
