# The Battle of Neighborhoods

This is the final peer-reviewed assignment of the Applied Data Science Capstone course.

### 1. Create a data frame of Helsinki neighborhoods

Import the needed libraries.

In [1]:
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd

Load the page containing the Helsinki borough names. Then use the BeautifulSoup library to scrape the page and wrangle the data so that the borough names can be read into a data frame.

In [2]:
# load the wikipage of Helsinki subdivisions
page=requests.get("https://en.wikipedia.org/wiki/Subdivisions_of_Helsinki")
soup=BeautifulSoup(page.content,'html.parser')

In [3]:
# Find the borough names within the HTML code
wikidata=soup.find('div', class_='div-col columns column-width') 
tag_elements=wikidata.find_all('li')

In [4]:
# Read the borough IDs and names into a data frame
# Include only the Finnish name, i.e. leave out the Swedish name of each borough
df_hki=pd.DataFrame(columns=['Borough ID','Borough'])
n=int(len(tag_elements))
for i in range(n):
    borough_temp=tag_elements[i].get_text().split()
    df_hki.loc[i,'Borough ID']=borough_temp[0]
    df_hki.loc[i,'Borough']=borough_temp[1]
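
The row-by-row loop above can also be written as one pass that collects the rows first and builds the data frame once. This is a minimal sketch, assuming each list item starts with the borough ID followed by the Finnish name. The function name `parse_boroughs` and the sample strings are made up for illustration and are not scraped from the page.

```python
import pandas as pd

def parse_boroughs(item_texts):
    """Split each list-item text into (ID, Finnish name), keeping the first two tokens."""
    rows = []
    for text in item_texts:
        tokens = text.split()
        if len(tokens) < 2:
            continue  # skip items that lack either an ID or a name
        rows.append({'Borough ID': tokens[0], 'Borough': tokens[1]})
    return pd.DataFrame(rows, columns=['Borough ID', 'Borough'])

# Hypothetical sample texts in the assumed "<ID> <Finnish name> (<Swedish name>)" layout
sample_texts = ['01 Kruununhaka (Kronohagen)', '02 Kluuvi (Gloet)', '(unnumbered) Aluemeri']
df_sample = parse_boroughs(sample_texts)

# Keep only rows with a numeric ID, then convert the ID column to integers
df_sample = df_sample[df_sample['Borough ID'].str.isdigit()].copy()
df_sample['Borough ID'] = df_sample['Borough ID'].astype(int)
print(df_sample)
```

Filtering on `str.isdigit()` removes the unnumbered entry without needing to know its exact label.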

In [5]:
# Leave out the sea areas: the last, unnumbered entry (outer sea) and 'Ulkosaaret'.
df_hki=df_hki[df_hki['Borough ID']!='(unnumbered)']
df_hki=df_hki[df_hki['Borough']!='Ulkosaaret']

In [6]:
#Convert Borough ID's to integers
df_hki['Borough ID']=df_hki['Borough ID'].astype("int")

In [7]:
# Leave out the sub-boroughs (IDs of 100 or more).
df_hki=df_hki[df_hki['Borough ID']<100].reset_index(drop=True)
df_hki

Unnamed: 0,Borough ID,Borough
0,1,Kruununhaka
1,2,Kluuvi
2,3,Kaartinkaupunki
3,4,Kamppi
4,5,Punavuori
5,6,Eira
6,7,Ullanlinna
7,8,Katajanokka
8,9,Kaivopuisto
9,10,Sörnäinen


### 2. Add the coordinates to the boroughs

The coordinates of Helsinki boroughs can be found on this page: https://latitude.to/articles-by-country/fi/finland/page/16. 
They were collected from there into an Excel file: Helsinki_Boroughs_coordinates.xlsx.

Read in the Excel file containing the latitude and longitude values for the Helsinki boroughs.

In [8]:
df_geo=pd.read_excel("Helsinki_Boroughs_coordinates.xlsx")
df_geo.head()

Unnamed: 0,Borough ID,Borough,Latitude,Longitude
0,1,Kruununhaka,60.169999,24.95383
1,2,Kluuvi,60.17248,24.94064
2,3,Kaartinkaupunki,60.1652,24.94897
3,4,Kamppi,60.16746,24.93107
4,5,Punavuori,60.156999,24.936163


Join the earlier data frame of boroughs with this new data frame of location coordinates. 
The join uses the columns the two data frames share, Borough ID and Borough.

In [10]:
df_hki=pd.merge(df_hki,df_geo,on=['Borough ID','Borough'])
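
An inner join silently drops boroughs that have no matching row in the coordinates file. The sketch below shows one way to check for that with the `how`, `indicator` and `validate` parameters of `pd.merge`. The two toy frames stand in for `df_hki` and `df_geo`, and their values are illustrative only.

```python
import pandas as pd

# Toy frames standing in for df_hki and df_geo (values are illustrative only)
left = pd.DataFrame({'Borough ID': [1, 2, 3], 'Borough': ['A', 'B', 'C']})
right = pd.DataFrame({'Borough ID': [1, 2], 'Borough': ['A', 'B'],
                      'Latitude': [60.1, 60.2], 'Longitude': [24.9, 25.0]})

# how='left' keeps every borough; indicator=True adds a '_merge' column;
# validate='one_to_one' raises if a key is duplicated on either side
checked = pd.merge(left, right, on=['Borough ID', 'Borough'],
                   how='left', indicator=True, validate='one_to_one')

# Boroughs for which no coordinates were found
missing = checked.loc[checked['_merge'] == 'left_only', 'Borough'].tolist()
print(missing)
```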

In [11]:
#Drop the Borough ID column
df_hki.drop('Borough ID',axis=1,inplace=True)
df_hki

Unnamed: 0,Borough,Latitude,Longitude
0,Kruununhaka,60.169999,24.95383
1,Kluuvi,60.17248,24.94064
2,Kaartinkaupunki,60.1652,24.94897
3,Kamppi,60.16746,24.93107
4,Punavuori,60.156999,24.936163
5,Eira,60.15517,24.93819
6,Ullanlinna,60.155399,24.94273
7,Katajanokka,60.16624,24.96816
8,Kaivopuisto,60.15901,24.96147
9,Sörnäinen,60.183333,24.966663


### 3. Visualize the Helsinki neighborhoods on the map

Import the needed libraries

In [13]:
import folium
from geopy.geocoders import Nominatim

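Use the geopy library to get the latitude and longitude values of Helsinki.

In [14]:
address='Helsinki,FI'
# recent geopy versions require a user_agent; the name used here is arbitrary
geolocator=Nominatim(user_agent='helsinki_explorer')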
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The geographical coordinates of Helsinki are {}, {}. '.format(latitude,longitude))

The geographical coordinates of Helsinki are 60.1713198, 24.9414566. 


Visualize Helsinki and its neighborhoods on the map.

In [15]:
# create map of Helsinki using latitude and longitude values
map_helsinki = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(df_hki['Latitude'], df_hki['Longitude'], df_hki['Borough']):     
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_helsinki)  
    
map_helsinki

### 4. Explore the neighborhoods in Helsinki

Use the Foursquare API to explore the neighborhoods and segment them.

# The cell that defines CLIENT_ID, CLIENT_SECRET and VERSION is removed because the Foursquare API credentials are secret.

Copy the getNearbyVenues function from the New York lab.

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
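
The function above indexes `['response']['groups'][0]['items']` and `['categories'][0]` directly, so it raises an error if a response has no groups or a venue has no category. The sketch below shows a more defensive way to flatten one response, assuming the same JSON layout that getNearbyVenues reads. The helper name `flatten_venues` and the sample payload are hypothetical and do not come from the Foursquare API.

```python
def flatten_venues(name, lat, lng, payload):
    """Return one tuple per venue; venues without a category get None."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hypothetical payload mimicking the structure read by getNearbyVenues
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe X', 'location': {'lat': 60.17, 'lng': 24.95},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unnamed spot', 'location': {'lat': 60.18, 'lng': 24.96},
               'categories': []}},
]}]}}

rows = flatten_venues('Kruununhaka', 60.169999, 24.95383, sample_payload)
empty = flatten_venues('Nowhere', 0.0, 0.0, {'response': {}})
print(rows)
print(empty)
```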

Create a new data frame called helsinki_venues using the getNearbyVenues function. 
This retrieves at most 100 venues within a radius of 500 meters of each neighborhood in Helsinki.

In [18]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API
radius = 500 # search radius in meters
helsinki_venues = getNearbyVenues(names=df_hki['Borough'],
                                   latitudes=df_hki['Latitude'],
                                   longitudes=df_hki['Longitude'],
                                   radius=radius
                                  )

Kruununhaka
Kluuvi
Kaartinkaupunki
Kamppi
Punavuori
Eira
Ullanlinna
Katajanokka
Kaivopuisto
Sörnäinen
Kallio
Alppiharju
Etu-Töölö
Taka-Töölö
Meilahti
Ruskeasuo
Pasila
Laakso
Mustikkamaa-Korkeasaari
Länsisatama
Hermanni
Vallila
Toukola
Kumpula
Käpylä
Koskela
Vanhakaupunki
Oulunkylä
Haaga
Munkkiniemi
Lauttasaari
Konala
Kaarela
Pakila
Tuomarinkylä
Viikki
Pukinmäki
Malmi
Tapaninkylä
Suutarila
Suurmetsä
Kulosaari
Herttoniemi
Tammisalo
Vartiokylä
Pitäjänmäki
Mellunkylä
Vartiosaari
Laajasalo
Villinki
Santahamina
Suomenlinna
Vuosaari
Östersundom
Salmenkallio
Talosaari
Karhusaari
Ultuna


Let's check the size of the resulting data frame.

In [19]:
print(helsinki_venues.shape)
helsinki_venues.head()

(1249, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kruununhaka,60.169999,24.95383,Trillby & Chadwick,60.168398,24.953651,Speakeasy
1,Kruununhaka,60.169999,24.95383,Senaatintori,60.169377,24.952033,Plaza
2,Kruununhaka,60.169999,24.95383,Tuomiokirkon portaat,60.169909,24.951915,Scenic Lookout
3,Kruununhaka,60.169999,24.95383,Helsingin kaupunginmuseo / Helsinki City Museu...,60.168996,24.954028,History Museum
4,Kruununhaka,60.169999,24.95383,El Fant,60.16868,24.953713,Café


Let's check how many venues were returned for each neighborhood.

In [20]:
helsinki_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alppiharju,24,24,24,24,24,24
Eira,29,29,29,29,29,29
Etu-Töölö,30,30,30,30,30,30
Haaga,16,16,16,16,16,16
Hermanni,23,23,23,23,23,23
Herttoniemi,41,41,41,41,41,41
Kaarela,5,5,5,5,5,5
Kaartinkaupunki,100,100,100,100,100,100
Kaivopuisto,14,14,14,14,14,14
Kallio,39,39,39,39,39,39


Let's also check how many unique categories there are in the results.

In [21]:
print('There are {} unique categories.'.format(len(helsinki_venues['Venue Category'].unique())))

There are 243 unique categories.


Create a new data frame where each category has its own column, using one-hot encoding.

In [22]:
# one hot encoding
helsinki_onehot = pd.get_dummies(helsinki_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
helsinki_onehot['Neighborhood'] = helsinki_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [helsinki_onehot.columns[-1]] + list(helsinki_onehot.columns[:-1])
helsinki_onehot = helsinki_onehot[fixed_columns]
helsinki_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Yoga Studio,Zoo
0,Kruununhaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Kruununhaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Kruununhaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Kruununhaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Kruununhaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
#Let's check the shape of dataframe
helsinki_onehot.shape

(1249, 244)

Next, group the rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [24]:
helsinki_grouped = helsinki_onehot.groupby('Neighborhood').mean().reset_index()
helsinki_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Yoga Studio,Zoo
0,Alppiharju,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Eira,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0
2,Etu-Töölö,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0
3,Haaga,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Hermanni,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,...,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0
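
To see why the mean of one-hot columns is a frequency, here is a small worked example on toy data (names and values are illustrative only). Each row of the result holds the share of a neighborhood's venues in each category, so every row sums to 1.

```python
import pandas as pd

# Toy data: neighborhood A has 4 venues, neighborhood B has 2
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bar', 'Park', 'Park'],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# The mean of 0/1 indicators is the share of venues in that category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

For neighborhood A, two of four venues are cafés, so its Café frequency is 0.5.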


In [25]:
#Let's check the shape of dataframe
helsinki_grouped.shape

(55, 244)

Copy the function return_most_common_venues() from the New York lab. It sorts the venue categories in descending order of frequency and is used in the next step.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)   
    return row_categories_sorted.index.values[0:num_top_venues]
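
Because the function always returns `num_top_venues` names, a neighborhood with only a few observed categories may have the rest of its list filled with categories whose frequency is zero. The variant below is a sketch of one way to avoid that by dropping zero-frequency categories first. The name `top_nonzero_categories` and the toy row are hypothetical.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Like return_most_common_venues, but ignores categories with zero frequency."""
    freqs = row.iloc[1:].astype(float)  # skip the leading 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return freqs.index[:num_top_venues].tolist()

# Toy row: only two of the four categories actually occur
toy_row = pd.Series({'Neighborhood': 'A', 'Bus Stop': 0.75, 'Café': 0.0,
                     'Park': 0.25, 'Zoo': 0.0})
top = top_nonzero_categories(toy_row, 3)
print(top)
```

The result can be shorter than `num_top_venues`, so a fixed-width table would need padding, for example with empty strings.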

Now let's create a new dataframe and display the top 10 venues for each neighborhood.

In [27]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = helsinki_grouped['Neighborhood']

for ind in np.arange(helsinki_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(helsinki_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alppiharju,Theme Park Ride / Attraction,Park,Sushi Restaurant,Italian Restaurant,History Museum,Lounge,Gym,Greek Restaurant,Event Space,Dance Studio
1,Eira,Scandinavian Restaurant,Café,Playground,Bakery,Park,Boat or Ferry,French Restaurant,Bistro,Sushi Restaurant,Boutique
2,Etu-Töölö,Café,Scandinavian Restaurant,Park,Pub,Soccer Field,Mini Golf,Russian Restaurant,Supermarket,Gym,College Gym
3,Haaga,Bus Stop,Flower Shop,Indian Restaurant,Plaza,Garden,Recreation Center,Platform,Cafeteria,Café,Bakery
4,Hermanni,Pizza Place,Bus Stop,Bar,Park,Flea Market,Yoga Studio,Convenience Store,Coffee Shop,Restaurant,Recycling Facility
5,Herttoniemi,Bus Stop,Gym / Fitness Center,Supermarket,Chinese Restaurant,Accessories Store,Sushi Restaurant,Gastropub,Metro Station,Garden Center,Discount Store
6,Kaarela,Garden,Trail,Playground,Bus Stop,Shopping Mall,Flea Market,Film Studio,Fish & Chips Shop,Fish Market,Food
7,Kaartinkaupunki,Scandinavian Restaurant,Café,Hotel,Restaurant,Coffee Shop,Bar,Cocktail Bar,Mediterranean Restaurant,Furniture / Home Store,Park
8,Kaivopuisto,Park,Coffee Shop,Ice Cream Shop,Restaurant,Scandinavian Restaurant,History Museum,Grocery Store,Pier,Nightclub,Recreation Center
9,Kallio,Theme Park Ride / Attraction,Bar,Thai Restaurant,Park,Event Space,Vegetarian / Vegan Restaurant,Theater,Café,Moroccan Restaurant,Mountain
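
The column names above are built with a try/except around the list of suffixes, which works for ranks 1 to 10. A small helper makes the rule explicit and also handles ranks such as 11th or 21st. This is a sketch; the helper name `ordinal` is not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th always take 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns)
```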


### 5. Cluster the neighborhoods in Helsinki

Next, cluster the neighborhoods into 5 clusters with k-means clustering.

In [29]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5
helsinki_grouped_clustering = helsinki_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(helsinki_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 2, 2, 2, 2, 0, 0, 0], dtype=int32)
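
The notebook fixes the number of clusters at 5 without comparing alternatives. One common way to compare candidate values of k is the silhouette score from scikit-learn. The sketch below runs on synthetic data with three well-separated groups, not on the Helsinki data, so it only illustrates the procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of 20 points each (illustrative only)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for several values of k and record the silhouette score of each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('Best k by silhouette score:', best_k)
```

On the real data, `X` would be `helsinki_grouped_clustering`. A higher score means tighter, better-separated clusters.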

In [30]:
kmeans.labels_

array([0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 2, 0, 2, 1, 2,
       2, 0, 0, 0, 0, 0, 0, 2, 1, 2, 2, 2, 0, 2, 1, 0, 0, 2, 2, 0, 0, 4,
       0, 0, 2, 0, 0, 0, 2, 2, 1, 2, 3], dtype=int32)
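
Counting the labels shows how unevenly the neighborhoods are distributed over the clusters. The sketch below uses the label array printed above, copied by hand.

```python
import numpy as np

# Cluster labels copied from the kmeans.labels_ output above
labels = np.array([0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 2, 0, 2, 1, 2,
                   2, 0, 0, 0, 0, 0, 0, 2, 1, 2, 2, 2, 0, 2, 1, 0, 0, 2, 2, 0, 0, 4,
                   0, 0, 2, 0, 0, 0, 2, 2, 1, 2, 3])

# bincount returns the number of occurrences of each label 0..max
cluster_sizes = np.bincount(labels)
for cluster, size in enumerate(cluster_sizes):
    print('Cluster {}: {} neighborhoods'.format(cluster, size))
```

This gives 28, 6, 19, 1 and 1 neighborhoods for clusters 0 to 4, so clusters 3 and 4 each consist of a single neighborhood.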

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [32]:
df_hki.rename(columns={'Borough':'Neighborhood'},inplace=True)
# merge df_hki with neighborhoods_venues data to add latitude/longitude for each neighborhood
helsinki_merged=pd.merge(neighborhoods_venues_sorted, df_hki,how='left',on='Neighborhood')
# add clustering labels
helsinki_merged['Cluster Labels'] = kmeans.labels_
helsinki_merged.head() 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Latitude,Longitude,Cluster Labels
0,Alppiharju,Theme Park Ride / Attraction,Park,Sushi Restaurant,Italian Restaurant,History Museum,Lounge,Gym,Greek Restaurant,Event Space,Dance Studio,60.19187,24.94261,0
1,Eira,Scandinavian Restaurant,Café,Playground,Bakery,Park,Boat or Ferry,French Restaurant,Bistro,Sushi Restaurant,Boutique,60.15517,24.93819,0
2,Etu-Töölö,Café,Scandinavian Restaurant,Park,Pub,Soccer Field,Mini Golf,Russian Restaurant,Supermarket,Gym,College Gym,60.17382,24.91853,0
3,Haaga,Bus Stop,Flower Shop,Indian Restaurant,Plaza,Garden,Recreation Center,Platform,Cafeteria,Café,Bakery,60.22186,24.89619,2
4,Hermanni,Pizza Place,Bus Stop,Bar,Park,Flea Market,Yoga Studio,Convenience Store,Coffee Shop,Restaurant,Recycling Facility,60.1951,24.96678,2


In [33]:
helsinki_merged.shape

(55, 14)

Let's visualize the resulting clusters.

In [34]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(helsinki_merged['Latitude'], helsinki_merged['Longitude'], helsinki_merged['Neighborhood'], helsinki_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)      
map_clusters

Next, each cluster is examined to determine the venue categories that distinguish it.

### Cluster 0

In [35]:
helsinki_merged.loc[helsinki_merged['Cluster Labels'] == 0, helsinki_merged.columns[[0] + list(range(1, 11))+[13]]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Alppiharju,Theme Park Ride / Attraction,Park,Sushi Restaurant,Italian Restaurant,History Museum,Lounge,Gym,Greek Restaurant,Event Space,Dance Studio,0
1,Eira,Scandinavian Restaurant,Café,Playground,Bakery,Park,Boat or Ferry,French Restaurant,Bistro,Sushi Restaurant,Boutique,0
2,Etu-Töölö,Café,Scandinavian Restaurant,Park,Pub,Soccer Field,Mini Golf,Russian Restaurant,Supermarket,Gym,College Gym,0
7,Kaartinkaupunki,Scandinavian Restaurant,Café,Hotel,Restaurant,Coffee Shop,Bar,Cocktail Bar,Mediterranean Restaurant,Furniture / Home Store,Park,0
8,Kaivopuisto,Park,Coffee Shop,Ice Cream Shop,Restaurant,Scandinavian Restaurant,History Museum,Grocery Store,Pier,Nightclub,Recreation Center,0
9,Kallio,Theme Park Ride / Attraction,Bar,Thai Restaurant,Park,Event Space,Vegetarian / Vegan Restaurant,Theater,Café,Moroccan Restaurant,Mountain,0
10,Kamppi,Scandinavian Restaurant,Café,Japanese Restaurant,Chinese Restaurant,Asian Restaurant,Sushi Restaurant,Wine Bar,Yoga Studio,Vietnamese Restaurant,Salon / Barbershop,0
12,Katajanokka,Park,Scandinavian Restaurant,Hotel,Tram Station,Bar,Russian Restaurant,Boat or Ferry,Szechuan Restaurant,Gourmet Shop,Coffee Shop,0
13,Kluuvi,Clothing Store,Café,Restaurant,Coffee Shop,Gym / Fitness Center,Burger Joint,Art Museum,Cosmetics Shop,Theater,Plaza,0
14,Konala,Fast Food Restaurant,Taxi Stand,Chinese Restaurant,Dog Run,Supermarket,Pizza Place,Convenience Store,Park,Cafeteria,Fish & Chips Shop,0


### Cluster 1

In [36]:
helsinki_merged.loc[helsinki_merged['Cluster Labels'] == 1, helsinki_merged.columns[[0] + list(range(1, 11))+[13]]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
11,Karhusaari,Bus Stop,Lounge,Zoo,Filipino Restaurant,Furniture / Home Store,French Restaurant,Fountain,Food Truck,Food Court,Food & Drink Shop,1
15,Koskela,Bus Stop,Park,Kebab Restaurant,Café,Grocery Store,Pizza Place,Go Kart Track,Food & Drink Shop,Filipino Restaurant,Food,1
20,Laajasalo,Bus Stop,Flower Shop,Park,Zoo,Garden,French Restaurant,Fountain,Food Truck,Food Court,Food & Drink Shop,1
30,Pakila,Bus Stop,Zoo,Garden,French Restaurant,Fountain,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,1
36,Salmenkallio,Bus Stop,Zoo,Garden,French Restaurant,Fountain,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,1
52,Viikki,Bus Stop,Lake,Trail,Farm,Grocery Store,Zoo,Fish & Chips Shop,Fish Market,Flea Market,Food,1


### Cluster 2

In [37]:
helsinki_merged.loc[helsinki_merged['Cluster Labels'] == 2, helsinki_merged.columns[[0] + list(range(1, 11))+[13]]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
3,Haaga,Bus Stop,Flower Shop,Indian Restaurant,Plaza,Garden,Recreation Center,Platform,Cafeteria,Café,Bakery,2
4,Hermanni,Pizza Place,Bus Stop,Bar,Park,Flea Market,Yoga Studio,Convenience Store,Coffee Shop,Restaurant,Recycling Facility,2
5,Herttoniemi,Bus Stop,Gym / Fitness Center,Supermarket,Chinese Restaurant,Accessories Store,Sushi Restaurant,Gastropub,Metro Station,Garden Center,Discount Store,2
6,Kaarela,Garden,Trail,Playground,Bus Stop,Shopping Mall,Flea Market,Film Studio,Fish & Chips Shop,Fish Market,Food,2
17,Kulosaari,Bus Stop,Park,Grocery Store,Scandinavian Restaurant,Harbor / Marina,Badminton Court,Taxi Stand,Chinese Restaurant,Flea Market,Flower Shop,2
19,Käpylä,Bus Stop,Grocery Store,Scandinavian Restaurant,Pub,Thai Restaurant,Garden,Pizza Place,Flower Shop,Gym Pool,Café,2
21,Laakso,Bus Stop,Thai Restaurant,Italian Restaurant,Beer Bar,Coffee Shop,Sandwich Place,Café,Cafeteria,Stables,Supermarket,2
22,Lauttasaari,Pizza Place,Cosmetics Shop,Beer Bar,Dive Bar,Coffee Shop,Restaurant,Chinese Restaurant,Scenic Lookout,Flower Shop,Shopping Mall,2
29,Oulunkylä,Bus Stop,Pizza Place,Outdoor Sculpture,Grocery Store,Park,Irish Pub,Chinese Restaurant,Food & Drink Shop,Food Court,Food,2
31,Pasila,Bar,Bus Stop,Sports Club,Platform,Dance Studio,Restaurant,Film Studio,Cafeteria,Sports Bar,Grocery Store,2


### Cluster 3

In [38]:
helsinki_merged.loc[helsinki_merged['Cluster Labels'] == 3, helsinki_merged.columns[[0] + list(range(1, 11))+[13]]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
54,Östersundom,Pizza Place,Market,Hunting Supply,Flea Market,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Zoo,3


### Cluster 4

In [39]:
helsinki_merged.loc[helsinki_merged['Cluster Labels'] == 4, helsinki_merged.columns[[0] + list(range(1, 11))+[13]]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
43,Talosaari,Stables,Zoo,Furniture / Home Store,French Restaurant,Fountain,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,4


Based on the venues in the different clusters, the clusters can be named or categorized as:
* Cluster 0: Center - Cafés and Restaurants
* Cluster 1: Middle neighborhoods - Zoos
* Cluster 2: Middle neighborhoods - Gardens/parks
* Cluster 3: Outer neighborhoods - Markets
* Cluster 4: Outer neighborhoods - Stables 

### 6. Analyse what would be the best area to open a Scandinavian restaurant

Cluster 0 covers the Helsinki center, so it likely has the largest number of restaurant visitors, both tourists and business people.
The cluster 0 data supports this: restaurants are among the most common venues in many of its boroughs.
Since the goal is to find the best area for a new Scandinavian restaurant, the next step is to find the boroughs in cluster 0 where Scandinavian restaurants are not yet among the most common venues.

In [69]:
helsinki_cluster0=helsinki_merged.loc[helsinki_merged['Cluster Labels'] == 0, helsinki_merged.columns[[0] + list(range(1, 11))]]
helsinki_cluster0.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alppiharju,Theme Park Ride / Attraction,Park,Sushi Restaurant,Italian Restaurant,History Museum,Lounge,Gym,Greek Restaurant,Event Space,Dance Studio
1,Eira,Scandinavian Restaurant,Café,Playground,Bakery,Park,Boat or Ferry,French Restaurant,Bistro,Sushi Restaurant,Boutique
2,Etu-Töölö,Café,Scandinavian Restaurant,Park,Pub,Soccer Field,Mini Golf,Russian Restaurant,Supermarket,Gym,College Gym
7,Kaartinkaupunki,Scandinavian Restaurant,Café,Hotel,Restaurant,Coffee Shop,Bar,Cocktail Bar,Mediterranean Restaurant,Furniture / Home Store,Park
8,Kaivopuisto,Park,Coffee Shop,Ice Cream Shop,Restaurant,Scandinavian Restaurant,History Museum,Grocery Store,Pier,Nightclub,Recreation Center


In [70]:
helsinki_cluster0.set_index('Neighborhood',inplace=True)
helsinki_cluster0=helsinki_cluster0.transpose()
helsinki_cluster0.head()

Neighborhood,Alppiharju,Eira,Etu-Töölö,Kaartinkaupunki,Kaivopuisto,Kallio,Kamppi,Katajanokka,Kluuvi,Konala,...,Punavuori,Santahamina,Suomenlinna,Sörnäinen,Taka-Töölö,Tammisalo,Tapaninkylä,Tuomarinkylä,Ullanlinna,Vallila
1st Most Common Venue,Theme Park Ride / Attraction,Scandinavian Restaurant,Café,Scandinavian Restaurant,Park,Theme Park Ride / Attraction,Scandinavian Restaurant,Park,Clothing Store,Fast Food Restaurant,...,Scandinavian Restaurant,Museum,Scenic Lookout,Thai Restaurant,Café,Park,Gourmet Shop,Stables,Park,Pizza Place
2nd Most Common Venue,Park,Café,Scandinavian Restaurant,Café,Coffee Shop,Bar,Café,Scandinavian Restaurant,Café,Taxi Stand,...,Bakery,Cafeteria,Café,Event Space,Sushi Restaurant,Playground,Café,Restaurant,Ice Cream Shop,Bar
3rd Most Common Venue,Sushi Restaurant,Playground,Park,Hotel,Ice Cream Shop,Thai Restaurant,Japanese Restaurant,Hotel,Restaurant,Chinese Restaurant,...,Park,Gym,Event Space,Park,Coffee Shop,Harbor / Marina,Climbing Gym,Brewery,Pizza Place,Cafeteria
4th Most Common Venue,Italian Restaurant,Bakery,Pub,Restaurant,Restaurant,Park,Chinese Restaurant,Tram Station,Coffee Shop,Dog Run,...,Yoga Studio,Lounge,Monument / Landmark,Recreation Center,Italian Restaurant,Canal,Recreation Center,Farm,Scandinavian Restaurant,Sushi Restaurant
5th Most Common Venue,History Museum,Park,Soccer Field,Coffee Shop,Scandinavian Restaurant,Event Space,Asian Restaurant,Bar,Gym / Fitness Center,Supermarket,...,Coffee Shop,Filipino Restaurant,Island,Bar,Pizza Place,Zoo,Bus Stop,Dog Run,Boat or Ferry,Coffee Shop


In [71]:
# Count, for each borough in cluster 0, whether 'Scandinavian Restaurant' appears among its ten most common venue categories.
scandinavian_condition=helsinki_cluster0.loc[:, 'Alppiharju':'Vallila']=='Scandinavian Restaurant'
scandinavian_condition.sum()

Neighborhood
Alppiharju                 0
Eira                       1
Etu-Töölö                  1
Kaartinkaupunki            1
Kaivopuisto                1
Kallio                     0
Kamppi                     1
Katajanokka                1
Kluuvi                     0
Konala                     0
Kruununhaka                1
Kumpula                    0
Länsisatama                0
Malmi                      0
Meilahti                   1
Mellunkylä                 0
Munkkiniemi                0
Mustikkamaa-Korkeasaari    0
Punavuori                  1
Santahamina                0
Suomenlinna                0
Sörnäinen                  0
Taka-Töölö                 0
Tammisalo                  0
Tapaninkylä                0
Tuomarinkylä               0
Ullanlinna                 1
Vallila                    0
dtype: int64
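
The count above records whether Scandinavian Restaurant appears among the ten most common categories of a borough, not how many such restaurants there are. An alternative is to read the share of Scandinavian restaurants directly from the frequency table. The sketch below assumes a frequency table shaped like `helsinki_grouped` and a label table shaped like `helsinki_merged`; the toy frames and their values are illustrative only.

```python
import pandas as pd

# Toy stand-ins for helsinki_grouped and helsinki_merged (values are illustrative only)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Café': [0.5, 0.2, 0.0],
    'Scandinavian Restaurant': [0.10, 0.00, 0.25],
})
toy_labels = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                           'Cluster Labels': [0, 0, 2]})

# Attach the cluster label to each neighborhood, then keep cluster 0 only
merged = toy_grouped.merge(toy_labels, on='Neighborhood')
in_cluster0 = merged[merged['Cluster Labels'] == 0]

# Share of venues that are Scandinavian restaurants, lowest first
share = in_cluster0.set_index('Neighborhood')['Scandinavian Restaurant'].sort_values()
print(share)
```

Boroughs at the top of this list have the lowest share of Scandinavian restaurants among their venues.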

In many cluster 0 boroughs, Scandinavian Restaurant does not appear among the ten most common venue categories.
To attract both tourists and business people, it is best to select a borough in the core center area. 
The boroughs that meet this criterion are Kallio, Kluuvi and Länsisatama. Kallio is home mostly to young people and also attracts tourists, but tourists there typically look for something more unique. Länsisatama is next to the cruise ships, so tourists usually head from there to the center rather than stay. That leaves Kluuvi as the most attractive place to open a new Scandinavian restaurant.