#### Libraries

In [1]:
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
import folium

#### Data
Each row is a car park, identified by its ID. The columns contain (data types are reported in parentheses):

1. **ID** (int64): ID associated with the car park
2. **Area** (object): The area of the city in which the car park is located
3. **Address** (object): The address of the car park
4. **Zone** (int64): The zone of the city in which the car park is located, numbered from 1 to 9
5. **City** (object): The name of the city
6. **ParkingPlaces** (int64): The number of parking places
7. **Service** (object): The types of vehicles that can be parked
8. **Longitude** (float64): The longitude
9. **Latitude** (float64): The latitude
10. **Location** (object): (Latitude, Longitude)

In [2]:
df=pd.read_csv("ds79_car_sh.csv", sep=';') #read the csv

#Translate the names of the columns
df.columns=["ID", "Area", "Address", "Zone", "City", "ParkingPlaces", 
                        "Service", "Longitude", "Latitude", "Location"]
df.head()

Unnamed: 0,ID,Area,Address,Zone,City,ParkingPlaces,Service,Longitude,Latitude,Location
0,1001,Loreto/Mercadante,Via Mercadante davanti uscita M1 Argentina,3,MILANO,8,QUADRICICLI,9.214544,45.483351,"(45.4833510000071, 9.214544)"
1,1002,Pta Venezia,P.zza Oberdan uscita M1 ex casello (fronte Bas...,1,MILANO,8,QUADRICICLI,9.204847,45.473929,"(45.4739290000071, 9.204847)"
2,1003,Pagano,Via del Burchiello fronte parcheggio,7,MILANO,6,QUADRICICLI,9.160451,45.467845,"(45.4678450000071, 9.160451)"
3,1004,Cadorna,P.le Cadorna fronte civico 1/3,1,MILANO,8,QUADRICICLI,9.177717,45.466924,"(45.4669240000071, 9.177717)"
4,1005,Centrale Stazione,P.zza Duca d'Aosta davanti civico 16,2,MILANO,8,QUADRICICLI,9.205185,45.483864,"(45.4838640000071, 9.205185)"


#### Dropping unneeded columns

In [3]:
df.drop(labels=["Area", "City", "Service", "Location"], axis=1, inplace=True)
df.head()

print("The total number of parking areas is: {}".format(df.shape[0]))

The total number of parking areas is: 113


So, in the city of Milan there are 113 parking areas. The location of these areas will be studied throughout this notebook.

#### Data exploration
The analysis begins by computing the total number of parking places and the average number of parking places per parking area.

In [4]:
print("The total number of parking places is: {}".format(df["ParkingPlaces"].sum()))
print("The average number of parking places per parking area is: {}".format(round(df["ParkingPlaces"].sum()/df.shape[0],2)))

The total number of parking places is: 444
The average number of parking places per parking area is: 3.93


These numbers should be compared with the average number of vehicles entering the city of Milan every day (around 1 million according to the mayor of Milan). Even this simple comparison points to a clear need to increase the number of parking areas and the size of the existing ones.
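To make the comparison concrete, the ratio between incoming vehicles and parking places can be computed directly. This is only a rough sketch: the vehicle count is the approximate figure quoted above, and the totals are the ones printed by the previous cell.

```python
# Rough comparison between daily incoming vehicles and car-sharing parking places.
# The vehicle count is the approximate figure quoted in the text, not measured data.
daily_vehicles = 1_000_000
total_places = 444   # total printed by the previous cell
total_areas = 113    # number of parking areas

vehicles_per_place = daily_vehicles / total_places
vehicles_per_area = daily_vehicles / total_areas

print("Vehicles per parking place: {:.0f}".format(vehicles_per_place))
print("Vehicles per parking area: {:.0f}".format(vehicles_per_area))
```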

#### Location of the parking areas
Now it is time to analyze the position of the parking areas.

In [5]:
milan_map=folium.Map(location=[45.464664, 9.188540], zoom_start=12)

PAreas=folium.map.FeatureGroup()

for lat,lon in zip(df["Latitude"], df["Longitude"]):
    
    PAreas.add_child(
    folium.vector_layers.CircleMarker([lat,lon], radius=5, color="red", fill_color="red")  
    )
    
milan_map.add_child(PAreas)
milan_map

Let's now analyze the density of parking places in the city. Milan is divided into nine zones, numbered from 1 to 9. It is important to check whether the parking places are equally distributed across the city or whether some zones have more parking places than others.  
The first step is grouping the parking places according to the zone they belong to.

In [6]:
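df_zones=df[["Zone", "ParkingPlaces"]]

# Sum the parking places per zone; [1:] skips the first group (row 0),
# so that only zones 1 to 9 remain in the output
park_zones=df_zones.groupby("Zone",as_index=False).sum()[1:]
park_zones.head(9)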

Unnamed: 0,Zone,ParkingPlaces
1,1,117
2,2,26
3,3,57
4,4,33
5,5,30
6,6,44
7,7,31
8,8,26
9,9,80


Zone 1 alone has more than a fourth of the total parking places (117 out of 444), while zones 2 and 8 have fewer than 30 each. A graphical representation is useful to get an idea of how the parking places are distributed across the city. 
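The share of each zone can be checked with a few lines. The sketch below rebuilds the table from the totals printed above (the name `park_zones_example` is only illustrative) and adds a column with the fraction of all parking places found in each zone.

```python
import pandas as pd

# Zone totals copied from the table above
park_zones_example = pd.DataFrame({
    "Zone": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "ParkingPlaces": [117, 26, 57, 33, 30, 44, 31, 26, 80],
})

total = park_zones_example["ParkingPlaces"].sum()
park_zones_example["Share"] = (park_zones_example["ParkingPlaces"] / total).round(3)

# Zones with fewer than 30 parking places
small_zones = park_zones_example.loc[park_zones_example["ParkingPlaces"] < 30, "Zone"].tolist()

print(park_zones_example.sort_values("Share", ascending=False))
print("Zones with fewer than 30 places:", small_zones)
```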

In [8]:
milan_geo=r"zones.geojson" #geojson file

milan_map = folium.Map(location=[45.464664, 9.188540], zoom_start=12)

folium.Choropleth(
    geo_data=milan_geo,
    data=park_zones,
    columns=['Zone', 'ParkingPlaces'],
    key_on='feature.properties.ZONADEC',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.5,
    legend_name='Number of parking places per zone',
    reset=True
).add_to(milan_map)
milan_map



It is clear that the parking places are not uniformly distributed across the city. More than a fourth of them are located in the city centre. The remaining ones are mostly distributed among three areas (north, north-east and south-west). This information might be valuable when planning the position of new parking areas.

#### Exploring the neighborhoods

The parking areas should be placed in specific positions. One should expect to find restaurants, shops, and hotels nearby, for example. The Foursquare database makes it possible to retrieve popular places located around the parking areas. To do so, the function getNearbyVenues is employed.

In [8]:

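def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues within `radius` meters of each location.

    Relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT,
    which are defined in the next cell.
    """
    venues_list=[]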
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']


    return nearby_venues
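The function above assumes that every request succeeds and that every venue has at least one category. The sketch below shows one possible way to separate the two concerns, building the query parameters and parsing the response. Both helper names (`build_explore_params`, `extract_venues`) and the sample payload are hypothetical and only mirror the nesting used above; they are not part of the Foursquare API.

```python
def build_explore_params(lat, lng, client_id, client_secret,
                         version="20180605", radius=500, limit=10):
    """Return the query parameters used by the explore URL above as a dict."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }


def extract_venues(payload):
    """Turn one explore-style JSON payload into (name, lat, lng, category) rows.

    Venues without any category are labelled 'Unknown' instead of raising.
    """
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0]["items"] if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": "Unknown"}]
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


# Hypothetical payload with the same nesting as the one parsed above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Flying Tiger",
               "location": {"lat": 45.484686, "lng": 9.215347},
               "categories": [{"name": "Gift Shop"}]}},
    {"venue": {"name": "No Category Venue",
               "location": {"lat": 45.48, "lng": 9.21},
               "categories": []}},
]}]}}

params = build_explore_params(45.483351, 9.214544, "ID", "SECRET")
print(params["ll"])
print(extract_venues(sample))
```

With `requests`, such a dictionary can be passed as `requests.get(url, params=params)`, which avoids formatting the query string by hand.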

In [10]:
LIMIT=10
radius=500

CLIENT_ID = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # Foursquare ID
CLIENT_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

parking_venues = getNearbyVenues(names=df['ID'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   radius=radius  # pass the radius defined above explicitly
                                  )

In [11]:
print("Total number of venues retrieved by Foursquare: ",parking_venues.shape[0])
parking_venues.head()

Total number of venues retrieved by Foursquare:  1119


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1001,45.483351,9.214544,Flying Tiger,45.484686,9.215347,Gift Shop
1,1001,45.483351,9.214544,The Small,45.483622,9.215102,Italian Restaurant
2,1001,45.483351,9.214544,Vino Al Vino,45.481489,9.214537,Wine Bar
3,1001,45.483351,9.214544,Pizzeria Spontini,45.482009,9.213112,Pizza Place
4,1001,45.483351,9.214544,Hotel San Biagio,45.48412,9.21766,Hotel


Now it is possible to count the number of venues retrieved for each parking area.

In [12]:
parking_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,10,10,10,10,10,10
2,10,10,10,10,10,10
3,10,10,10,10,10,10
4,10,10,10,10,10,10
5,10,10,10,10,10,10
...,...,...,...,...,...,...
2006,10,10,10,10,10,10
2007,10,10,10,10,10,10
2008,10,10,10,10,10,10
2013,10,10,10,10,10,10


One-hot encoding is employed to obtain the frequency of each venue type.

In [13]:
# one hot encoding (dtype=int keeps 0/1 values instead of booleans on recent pandas versions)
parking_onehot = pd.get_dummies(parking_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
parking_onehot['Neighborhood'] = parking_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [parking_onehot.columns[-1]] + list(parking_onehot.columns[:-1])
parking_onehot = parking_onehot[fixed_columns]

print(parking_onehot.shape)
parking_onehot.head()

(1119, 177)


Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,Airport Lounge,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Theater,Tram Station,Trattoria/Osteria,Turkish Restaurant,Tuscan Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery
0,1001,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1001,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1001,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,1001,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1001,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [14]:
parking_grouped = parking_onehot.groupby('Neighborhood').mean().reset_index()

print(parking_grouped.shape)
parking_grouped.head()

(113, 177)


Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,Airport Lounge,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Theater,Tram Station,Trattoria/Osteria,Turkish Restaurant,Tuscan Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery
0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
1,2,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.1,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
2,3,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.1,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
4,5,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
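To see why the mean of the one-hot columns gives a frequency, a toy example helps. The sketch below uses a made-up table with two parking areas (the names `toy_venues`, `toy_onehot` and `toy_grouped` are illustrative only): since each column contains only 0 and 1, its mean within a group is the fraction of venues of that category.

```python
import pandas as pd

# Toy venue table: two parking areas, category names taken from the output above
toy_venues = pd.DataFrame({
    "Neighborhood": [1001, 1001, 1001, 1001, 1002, 1002],
    "Venue Category": ["Hotel", "Wine Bar", "Hotel", "Pizza Place",
                       "Pizza Place", "Pizza Place"],
})

# One column per category, 1 where the venue belongs to it
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of a 0/1 column is the fraction of venues in that category
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```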


The 10 most common venue categories are identified for each parking area.

In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [17]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = parking_grouped['Neighborhood']

for ind in np.arange(parking_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(parking_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Plaza,Wine Bar,Electronics Store,Hotel,Ice Cream Shop,Pedestrian Plaza,Cosmetics Shop,Bistro,Bar,Winery
1,2,Plaza,Art Gallery,Lounge,Bar,Ice Cream Shop,Art Museum,Scenic Lookout,American Restaurant,Wine Bar,Hotel
2,3,Lounge,Art Gallery,Korean Restaurant,Café,Sushi Restaurant,Plaza,Art Museum,Japanese Restaurant,Hotel,American Restaurant
3,4,Café,Ice Cream Shop,Hostel,Dessert Shop,Plaza,Pub,Pizza Place,Wine Bar,Argentinian Restaurant,Department Store
4,5,Plaza,Chocolate Shop,Art Gallery,Sandwich Place,Fountain,Dessert Shop,Castle,Gift Shop,Ice Cream Shop,Concert Hall
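The same ranking can be written more compactly with `nlargest`. The sketch below is an alternative formulation on a made-up frequency table, not the code used to produce the table above; `top_categories` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical frequency table with three categories
toy_freq = pd.DataFrame({
    "Neighborhood": [1, 2],
    "Hotel": [0.5, 0.0],
    "Pizza Place": [0.2, 0.7],
    "Wine Bar": [0.3, 0.3],
})

def top_categories(freq_table, n):
    """Return, for each neighborhood, the list of its n most frequent categories."""
    freq = freq_table.set_index("Neighborhood")
    return freq.apply(lambda row: row.nlargest(n).index.tolist(), axis=1)

toy_top = top_categories(toy_freq, 2)
print(toy_top)
```

Since at most 10 venues are retrieved per parking area, many categories share the same frequency (for example 0.1). The order among tied categories then depends on the sorting method rather than on the data, so the lower ranks should be read with some caution.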


#### Cluster analysis
The goal now is to group the parking areas into clusters. This step is important because it allows one to identify where people using car-sharing systems go and what they do. The resulting division provides information about how the parking areas are distributed.

In [18]:
# set number of clusters:
kclusters = 5

parking_grouped_clustering = parking_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(parking_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 4, 1, 4, 1, 3, 1, 1, 1], dtype=int32)
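The number of clusters is fixed to 5 here. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The sketch below illustrates the idea on synthetic data with three well-separated groups; it is not run on the parking data, and the array `X` is only a stand-in for the frequency table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the frequency table: three tight, well-separated groups
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Silhouette score for each candidate number of clusters (higher is better)
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette:", best_k)
```

On the real data, `X` would be replaced by the table passed to `KMeans` above; the scores are likely to be much less clear-cut than in this toy case.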

Add the cluster labels to the parking areas.

In [19]:
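# kmeans.labels_ follows the row order of parking_grouped (sorted by ID),
# which differs from the row order of df, so the labels are joined on the ID
df_parking_venues=neighborhoods_venues_sorted.copy()
df_parking_venues["Labels"]=kmeans.labels_

df_parking = df.merge(df_parking_venues[["Neighborhood", "Labels"]],
                      left_on="ID", right_on="Neighborhood").drop(columns="Neighborhood")

df_parking.head()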

#### Analysis of the clusters
The first step is to check if there is a spatial relationship between different clusters. The parking areas are plotted onto the Milan map. Different colors represent different cluster labels.

In [21]:
milan_cluster=folium.Map(location=[45.464664, 9.188540], zoom_start=11)
color_list=["red", "yellow", "green", "blue", "orange"]

locat=folium.map.FeatureGroup()

for lat, lon, cluster_label in zip(df_parking["Latitude"], df_parking["Longitude"], df_parking["Labels"]):
    
    locat.add_child(
    folium.vector_layers.CircleMarker([lat,lon], radius=5, color=color_list[cluster_label], fill_color=color_list[cluster_label])  
    )
    
milan_cluster.add_child(locat)
milan_cluster

The clusters seem to be homogeneously distributed across the city. Let's now check the venues associated with each cluster.

In [22]:
interests=["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue", "4th Most Common Venue", "5th Most Common Venue"]
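Besides reading the tables of top venues, a cluster can also be summarized by averaging the category frequencies of its members. The sketch below shows the idea on a made-up table (`toy_area_freq` and `toy_labels` are hypothetical); on the real data the frequency table and `kmeans.labels_` would play these roles.

```python
import pandas as pd

# Hypothetical frequency table (one row per parking area) and cluster labels
toy_area_freq = pd.DataFrame({
    "Pizza Place": [0.4, 0.5, 0.0, 0.1],
    "Plaza": [0.0, 0.1, 0.3, 0.4],
    "Hotel": [0.1, 0.0, 0.2, 0.2],
})
toy_labels = [0, 0, 1, 1]

# Average frequency of every category inside each cluster
cluster_profile = toy_area_freq.groupby(toy_labels).mean()

# Most frequent category per cluster
dominant = cluster_profile.idxmax(axis=1)
print(cluster_profile)
print(dominant)
```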

#### Cluster 1: Pizza places
The first cluster is characterized by parking areas that are close to pizza places. Restaurants and shops are also common venues.

In [23]:
df0=df_parking_venues.loc[df_parking_venues["Labels"]==0]
print(df0.shape)
df0[interests]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
18,Hotel,Pizza Place,Ice Cream Shop,Beer Bar,Café
20,Pizza Place,Bakery,Café,Restaurant,Seafood Restaurant
23,Pizza Place,Ice Cream Shop,Comic Shop,Seafood Restaurant,Italian Restaurant
24,Pizza Place,Art Museum,Candy Store,Pub,Bookstore
31,Pizza Place,Italian Restaurant,Seafood Restaurant,Smoke Shop,Street Art
36,Pizza Place,Department Store,Bookstore,Movie Theater,Café
49,Pizza Place,Ice Cream Shop,Hotel,Diner,Pub
50,Pizza Place,Sporting Goods Shop,Japanese Restaurant,Restaurant,Italian Restaurant
53,Pizza Place,Italian Restaurant,Chinese Restaurant,Cultural Center,Café
60,Pizza Place,Persian Restaurant,Trattoria/Osteria,Ice Cream Shop,Sports Bar


#### Cluster 2: Free time
The second cluster is characterized by parking areas that are close to stores, cafes, dessert shops, and bars. The "Free time" label seems to be the most accurate one.

In [24]:
df1=df_parking_venues.loc[df_parking_venues["Labels"]==1]
print(df1.shape)
df1[interests]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Café,Ice Cream Shop,Hostel,Dessert Shop,Plaza
5,Kitchen Supply Store,Museum,Ice Cream Shop,Art Gallery,Castle
7,Ice Cream Shop,Art Gallery,Italian Restaurant,Wine Bar,Sandwich Place
8,Winery,Sushi Restaurant,Gastropub,Monument / Landmark,Park
9,Art Gallery,Park,Monument / Landmark,Café,Hotel
13,Wine Bar,Pizza Place,Japanese Restaurant,Pub,Café
14,Pizza Place,Italian Restaurant,Museum,Café,New American Restaurant
26,Ice Cream Shop,Restaurant,Gym,Speakeasy,Italian Restaurant
30,Dessert Shop,Pizza Place,Hotel,Performing Arts Venue,Ice Cream Shop
35,Dessert Shop,Hostel,Italian Restaurant,Plaza,Brazilian Restaurant


#### Cluster 3: Tram stations
The third cluster is characterized by parking areas that are close to tram stations.

In [25]:
df2=df_parking_venues.loc[df_parking_venues["Labels"]==2]
print(df2.shape)
df2[interests]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
43,Tram Station,Italian Restaurant,Cocktail Bar,Supermarket,Ice Cream Shop
62,Tram Station,Trattoria/Osteria,Supermarket,Shopping Mall,Italian Restaurant
73,Tram Station,Cafeteria,Pool,Gym,Winery
94,Tram Station,Pizza Place,Restaurant,Hotel,Gym


#### Cluster 4: Restaurants
The fourth cluster is characterized by parking areas close to restaurants.

In [26]:
df3=df_parking_venues.loc[df_parking_venues["Labels"]==3]
print(df3.shape)
df3[interests]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,Italian Restaurant,Bakery,Accessories Store,Paper / Office Supplies Store,Gift Shop
10,Italian Restaurant,Design Studio,Church,Hotel,Ice Cream Shop
11,Seafood Restaurant,Filipino Restaurant,Italian Restaurant,Park,Café
12,Seafood Restaurant,Italian Restaurant,Japanese Restaurant,Gastropub,Ice Cream Shop
15,Hotel,Restaurant,Monument / Landmark,Pizza Place,Plaza
16,Cocktail Bar,Hotel,Italian Restaurant,Wine Bar,Hotel Bar
17,Hotel,Italian Restaurant,Hostel,Chinese Restaurant,Japanese Restaurant
21,Italian Restaurant,Pizza Place,Wine Shop,Gym Pool,Comedy Club
22,Italian Restaurant,Pizza Place,Art Gallery,Burger Joint,Dessert Shop
27,Italian Restaurant,Pizza Place,Hotel,Food Truck,Trattoria/Osteria


#### Cluster 5: Tourism
The last cluster is characterized by parking areas close to plazas, art galleries, hotels, and similar venues. Those places are very popular among tourists, so this label seems representative enough.

In [27]:
df4=df_parking_venues.loc[df_parking_venues["Labels"]==4]
print(df4.shape)
df4[interests]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Plaza,Wine Bar,Electronics Store,Hotel,Ice Cream Shop
1,Plaza,Art Gallery,Lounge,Bar,Ice Cream Shop
2,Lounge,Art Gallery,Korean Restaurant,Café,Sushi Restaurant
4,Plaza,Chocolate Shop,Art Gallery,Sandwich Place,Fountain
19,Chinese Restaurant,Szechuan Restaurant,Hotel,Plaza,Italian Restaurant
25,Plaza,Bubble Tea Shop,Fast Food Restaurant,Hotel,Gym
28,Bakery,Brewery,Arts & Crafts Store,Latin American Restaurant,Korean Restaurant
40,Plaza,Snack Place,Monument / Landmark,Cocktail Bar,Gym / Fitness Center
83,Plaza,Bakery,Accessories Store,Italian Restaurant,Paper / Office Supplies Store
88,Coffee Shop,Wine Bar,Electronics Store,Bistro,Ice Cream Shop


#### Discussion
The analysis performed above gives an indication of the types of clients that use car-sharing systems. In most cases they seem to use them during their spare time. One point must be mentioned: the cluster associated with public transportation is the smallest one. Moreover, key places like hospitals, schools and train stations appear only rarely. This fact could represent a problem for the development of car-sharing systems. If someone cannot get a car when they get off the train, they will be forced to use their private car to reach the desired destination. Furthermore, subway (metro) stations are not frequent in the neighborhoods of the parking areas.  
One may wonder if train and subway stations are located in areas where there are only a few parking places. The treni_milano.csv file contains the coordinates of all the city stations.

In [39]:
df_trains=pd.read_csv("treni_milano.csv", sep=';')
df_trains.drop(labels=["Comune", "Provincia", "Regione", "Nome", 
                       "Anno inserimento", "Data e ora inserimento", 
                       "Identificatore in OpenStreetMap"],axis=1, inplace=True)

df_trains.columns=["Longitude", "Latitude"]
print("Number of stations: ",df_trains.shape[0])
df_trains.head()

Number of stations:  110


Unnamed: 0,Longitude,Latitude
0,9.17398,45.513554
1,9.168252,45.521682
2,9.168226,45.521673
3,9.150969,45.473679
4,9.136484,45.461504


Now it is possible to visualize the positions of the stations on the choropleth map.

In [43]:
milan_geo=r"zones.geojson" #geojson file

milan_map = folium.Map(location=[45.464664, 9.188540], zoom_start=12)

folium.Choropleth(
    geo_data=milan_geo,
    data=park_zones,
    columns=['Zone', 'ParkingPlaces'],
    key_on='feature.properties.ZONADEC',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.5,
    legend_name='Number of parking places per zone',
    reset=True
).add_to(milan_map)

stations=folium.map.FeatureGroup()

for lat,lon in zip(df_trains["Latitude"], df_trains["Longitude"]):
    
    stations.add_child(
    folium.vector_layers.CircleMarker([lat,lon], radius=5, color="blue", fill_color="red")  
    )
    
milan_map.add_child(stations)

milan_map

In many zones characterized by a low density of parking places there are numerous stations. These zones should be the targets for the construction of new parking areas. This plot helps explain the absence of stations near the parking areas.
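The visual impression can be complemented with a number: the distance from each station to its closest parking area. The sketch below is one possible way to compute it with the haversine formula, assuming a spherical Earth with a mean radius of 6371 km. The helper names `haversine_km` and `nearest_parking_km` are hypothetical, and the coordinates are the first two rows of the station and parking tables shown earlier.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, NumPy broadcasting allowed."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def nearest_parking_km(st_lat, st_lon, pk_lat, pk_lon):
    """For every station, distance in km to the closest parking area."""
    st_lat = np.asarray(st_lat, dtype=float)[:, None]
    st_lon = np.asarray(st_lon, dtype=float)[:, None]
    pk_lat = np.asarray(pk_lat, dtype=float)[None, :]
    pk_lon = np.asarray(pk_lon, dtype=float)[None, :]
    # Distance matrix of shape (stations, parking areas), reduced along the parking axis
    return haversine_km(st_lat, st_lon, pk_lat, pk_lon).min(axis=1)

# First two stations and first two parking areas from the tables above
stations_lat, stations_lon = [45.513554, 45.521682], [9.17398, 9.168252]
parking_lat, parking_lon = [45.483351, 45.473929], [9.214544, 9.204847]

distances = nearest_parking_km(stations_lat, stations_lon, parking_lat, parking_lon)
print(distances)
```

On the full data, the same function could be called with the `Latitude` and `Longitude` columns of `df_trains` and `df`; stations with a large distance would then point to zones where new parking areas might be most useful.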