<a href="https://colab.research.google.com/github/damicofj/whereShouldIMove/blob/main/WhereShouldIMove.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Where Should I Move?
Final project for Data Science Professional Certificate from IBM

This project was inspired by the political and economic situation of my home country, Argentina. 
I have been thinking about moving to Europe for a while, since I am half Italian and an EU citizen.
This project gave me the tools and data to analyze which countries are the best options for me, and then to find the best city and neighborhoods within the chosen country.
 

# WhereShouldIMove
In this project I will decide **which is the best city of EUROPE to live in**, for *European Union citizens* (I currently live in a city of Argentina, and yes, I want to move). 
This will be accomplished by combining data from different sources, obtained through web scraping.

I will take into consideration the following factors:

*   Quality of Life
*   Purchasing Power  
*   Safety 
*   Health Care 
*   Cost of Living 
*   Property Price to Income Ratio 
*   Traffic Commute Time  
*   Pollution  
*   Climate 

With all this information I will make several plots of the different kinds of data to better understand the options.

After finding the country with the **HIGHEST SCORE**, I will find the **BEST NEIGHBOURHOODS** to live in.

To find the best neighbourhood I will use the **FourSquare API** to find:

*   Overall Location
*   Green spaces
*   Train Stations
*   Distance from city center
*   Number of amenities

Finally I will plot the **NEIGHBOURHOOD LOCATION IN A MAP** of the chosen city.


References:

https://www.numbeo.com/cost-of-living/rankings_by_country.jsp?title=2021&region=150&displayColumn=-1

https://www.numbeo.com/quality-of-life/rankings_by_country.jsp

https://www.numbeo.com/crime/rankings_by_country.jsp

https://www.numbeo.com/traffic/rankings_by_country.jsp

https://www.numbeo.com/pollution/rankings_by_country.jsp

https://www.numbeo.com/health-care/rankings_by_country.jsp

https://towardsdatascience.com/neighbourhood-segmentation-and-clustering-using-foursquare-api-c43c113e89fb


In [None]:
# required packages
# installs (the leading "!" runs a shell command in Colab/Jupyter)
!pip install geocoder
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed

# math and data handling
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import math
from decimal import *

# machine learning
from sklearn.cluster import KMeans # k-means, used in the clustering stage

# scraping
from bs4 import BeautifulSoup # parses the downloaded HTML
import requests  # downloads web pages
import io
import json # library to handle JSON files

# maps
import folium # map rendering library
from folium.features import DivIcon
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import geocoder

# plotting
import plotly.express as px
import plotly.graph_objects as go

# file upload in Colab; usage: uploaded = files.upload()
from google.colab import files


In [None]:
# Importing list of European Union countries
# fetching the page with requests and parsing it with BeautifulSoup
url = "https://www.gov.uk/eu-eea"
data  = requests.get(url).text
soup = BeautifulSoup(data,"html5lib")

# the country list is a sentence inside a paragraph (<p> tag), not a table;
# index 6 matched the page layout when this was written and may need updating
p = soup.find_all('p')
EU_countries = p[6].text.replace('.','').replace(' and ', ', ').split(', ')
# Switzerland is not an EU member; it is added here as an extra candidate
EU_countries.append('Switzerland')
EU_countries

['Austria',
 'Belgium',
 'Bulgaria',
 'Croatia',
 'Republic of Cyprus',
 'Czech Republic',
 'Denmark',
 'Estonia',
 'Finland',
 'France',
 'Germany',
 'Greece',
 'Hungary',
 'Ireland',
 'Italy',
 'Latvia',
 'Lithuania',
 'Luxembourg',
 'Malta',
 'Netherlands',
 'Poland',
 'Portugal',
 'Romania',
 'Slovakia',
 'Slovenia',
 'Spain',
 'Sweden',
 'Switzerland']
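
The splitting step above can be checked without network access. A minimal sketch, assuming the list arrives as a single sentence such as "Austria, Belgium and Croatia."; `parse_country_sentence` is a hypothetical helper name and the sample sentence is illustrative, not the text of the gov.uk page.

```python
def parse_country_sentence(sentence):
    """Split a sentence such as 'Austria, Belgium and Croatia.' into names."""
    cleaned = sentence.strip().rstrip('.')
    # treat the final ' and ' like any other separator
    parts = cleaned.replace(' and ', ', ').split(', ')
    return [part.strip() for part in parts if part.strip()]


sample = "Austria, Belgium, Bulgaria and Croatia."  # illustrative sentence
countries = parse_country_sentence(sample)
print(countries)
```

Keeping the parsing in a function makes it easier to notice when the page layout changes and the selected paragraph no longer contains the list.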

In [None]:
# Overall "Quality of life" score
# By country (World)
# https://www.numbeo.com/quality-of-life/rankings_by_country.jsp
# By country (Europe)
# https://www.numbeo.com/quality-of-life/rankings_by_country.jsp?title=2021&region=150
#######################################################################################

In [None]:
# Quality of life rankings (Europe): one row per country with all the sub-indices
url = "https://www.numbeo.com/quality-of-life/rankings_by_country.jsp?title=2021&region=150"
data  = requests.get(url).text
soup = BeautifulSoup(data,"html5lib")

# the rankings are in the second <table> element of the page
table = soup.find_all('table') # in html a table is represented by the tag <table>
# despite its name, this variable holds the quality-of-life rankings
cost_of_living_table = table[1]

# converting the table rows to a dataframe
cl_columns = ["Index", 
              "Country", 
              "Quality of Life Index",
              "Purchasing Power Index", 
              "Safety Index",
              "Health Care Index", 
              "Cost of Living Index",
              "Property Price to Income Ratio", 
              "Traffic Commute Time Index", 
              "Pollution Index", 
              "Climate Index"]

rows = []
for row in cost_of_living_table.find_all("tr"):
    col = row.find_all("td")
    if col:
        # strip newlines from every cell of the row
        values = [cell.text.replace("\n","") for cell in col[:11]]
        # rank and country stay as text, the nine indices become floats
        rows.append(values[:2] + [float(v) for v in values[2:]])

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
cl = pd.DataFrame(rows, columns=cl_columns)

# dropping non-EU countries
cl = cl[cl['Country'].isin(EU_countries)]

# ordering

del cl['Index']

cl = cl.sort_values(by='Quality of Life Index', ascending=False)

# sorting alphabetically (disabled)

# cl = cl.sort_values(by='Country', ascending=True)
cl = cl.reset_index(drop=True)

all_countries = cl

all_countries

Unnamed: 0,Country,Quality of Life Index,Purchasing Power Index,Safety Index,Health Care Index,Cost of Living Index,Property Price to Income Ratio,Traffic Commute Time Index,Pollution Index,Climate Index
0,Switzerland,190.82,110.96,78.65,74.47,131.75,8.42,28.73,20.09,80.05
1,Denmark,190.01,94.73,73.28,79.96,91.67,6.66,28.69,20.4,81.8
2,Netherlands,183.31,83.89,72.78,75.76,78.64,7.35,27.81,25.28,87.11
3,Finland,182.79,89.05,72.99,76.4,77.46,8.64,28.96,11.86,56.64
4,Austria,182.37,78.23,74.77,78.4,75.49,10.4,25.68,19.2,77.79
5,Germany,176.76,93.72,64.58,73.77,70.62,9.12,31.36,27.48,82.97
6,Estonia,173.56,61.22,76.62,72.83,56.45,9.11,24.72,19.01,64.28
7,Sweden,171.4,90.55,52.8,68.8,79.17,8.56,29.89,18.44,74.92
8,Slovenia,168.2,56.14,78.21,65.28,59.38,10.89,26.79,22.65,77.56
9,Spain,164.48,62.68,66.87,78.8,59.09,9.59,29.38,39.62,93.18
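
The scraping loop reads each cell by position, so a column added or moved on the page would silently shift the values. A minimal sketch, assuming the header texts are available as a list, of pairing cells with header names and failing loudly on a mismatch; `row_to_record` is a hypothetical helper, and the sample values are taken from the Switzerland row above.

```python
def row_to_record(headers, cells, text_columns=("Country",)):
    """Pair header names with cell texts; convert non-text columns to float."""
    if len(headers) != len(cells):
        raise ValueError(
            "row has {} cells, expected {}".format(len(cells), len(headers)))
    record = {}
    for name, text in zip(headers, cells):
        value = text.strip()  # also removes the newlines around cell text
        record[name] = value if name in text_columns else float(value)
    return record


headers = ["Country", "Quality of Life Index", "Safety Index"]
cells = ["\nSwitzerland\n", "190.82", "78.65"]
record = row_to_record(headers, cells)
print(record)
```

A list of such records can be passed directly to `pd.DataFrame` to build the table in one step.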


In [None]:
# Bar chart of the Quality of Life Index per country

fig = px.bar(all_countries, title="Quality of Life", x="Country", y="Quality of Life Index", color='Quality of Life Index')
fig.show()

In [None]:
# We will work with the top 7
cl = cl.head(7)

choosen_7 = cl['Country'].to_list()
choosen_7

['Switzerland',
 'Denmark',
 'Netherlands',
 'Finland',
 'Austria',
 'Germany',
 'Estonia']

In [None]:
# scrape coordinates of countries

# Country coordinates from Google's public dataset documentation page
url = "https://developers.google.com/public-data/docs/canonical/countries_csv"
data  = requests.get(url).text
soup = BeautifulSoup(data,"html5lib")

# the coordinates are in the first <table> element of the page
table = soup.find_all('table') # in html a table is represented by the tag <table>
countries_coordinates = table[0]


# converting the table rows to a dataframe
rows = []
for row in countries_coordinates.find_all("tr"):
    col = row.find_all("td")
    if col:
        # cells are: country code, latitude, longitude, name (all kept as text for now)
        rows.append([cell.text.replace("\n","") for cell in col[:4]])

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
lats = pd.DataFrame(rows, columns=["Country code", 
                                   "Latitude",
                                   "Longitude", 
                                   "Name"])

# dropping non-EU countries
lats = lats[lats['Name'].isin(EU_countries)]

del lats['Country code']

# ordering
lats = lats.sort_values(by='Name',ascending = True) 
lats = lats.reset_index(drop=True)

lats

Unnamed: 0,Latitude,Longitude,Name
0,47.516231,14.550072,Austria
1,50.503887,4.469936,Belgium
2,42.733883,25.48583,Bulgaria
3,45.1,15.2,Croatia
4,49.817492,15.472962,Czech Republic
5,56.26392,9.501785,Denmark
6,58.595272,25.013607,Estonia
7,61.92411,25.748151,Finland
8,46.227638,2.213749,France
9,51.165691,10.451526,Germany


In [None]:
# keeping only the top 7 countries from the previous section
lats = lats[lats['Name'].isin(choosen_7)]

lats

Unnamed: 0,Latitude,Longitude,Name
0,47.516231,14.550072,Austria
5,56.26392,9.501785,Denmark
6,58.595272,25.013607,Estonia
7,61.92411,25.748151,Finland
9,51.165691,10.451526,Germany
18,52.132633,5.291266,Netherlands
26,46.818188,8.227512,Switzerland


In [None]:
# Appending coordinates of top 7 and cleaning

cl = cl.sort_values(by='Country',ascending = True) 
final_7 = pd.merge(lats, cl, left_on='Name', right_on='Country')
del final_7['Name']
final_7 = final_7.sort_values(by='Quality of Life Index',ascending = False) 
final_7 = final_7.reset_index(drop=True)

# Convert coordinate strings to floats
final_7['Latitude'] = final_7['Latitude'].astype(float)
final_7['Longitude'] = final_7['Longitude'].astype(float)

final_7

Unnamed: 0,Latitude,Longitude,Country,Quality of Life Index,Purchasing Power Index,Safety Index,Health Care Index,Cost of Living Index,Property Price to Income Ratio,Traffic Commute Time Index,Pollution Index,Climate Index
0,46.818188,8.227512,Switzerland,190.82,110.96,78.65,74.47,131.75,8.42,28.73,20.09,80.05
1,56.26392,9.501785,Denmark,190.01,94.73,73.28,79.96,91.67,6.66,28.69,20.4,81.8
2,52.132633,5.291266,Netherlands,183.31,83.89,72.78,75.76,78.64,7.35,27.81,25.28,87.11
3,61.92411,25.748151,Finland,182.79,89.05,72.99,76.4,77.46,8.64,28.96,11.86,56.64
4,47.516231,14.550072,Austria,182.37,78.23,74.77,78.4,75.49,10.4,25.68,19.2,77.79
5,51.165691,10.451526,Germany,176.76,93.72,64.58,73.77,70.62,9.12,31.36,27.48,82.97
6,58.595272,25.013607,Estonia,173.56,61.22,76.62,72.83,56.45,9.11,24.72,19.01,64.28
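
The merge above joins two frames whose key columns have different names. A minimal sketch on two rows taken from the tables above, assuming each country should appear exactly once on both sides; the `validate` argument is an addition for illustration and is not used in the notebook.

```python
import pandas as pd

coords = pd.DataFrame({
    "Name": ["Denmark", "Switzerland"],
    "Latitude": ["56.26392", "46.818188"],   # scraped values arrive as text
    "Longitude": ["9.501785", "8.227512"],
})
scores = pd.DataFrame({
    "Country": ["Switzerland", "Denmark"],
    "Quality of Life Index": [190.82, 190.01],
})

# validate raises MergeError if a country is duplicated on either side
merged = pd.merge(coords, scores, left_on="Name", right_on="Country",
                  how="inner", validate="one_to_one")
merged = merged.drop(columns="Name")
merged[["Latitude", "Longitude"]] = merged[["Latitude", "Longitude"]].astype(float)
merged = (merged.sort_values("Quality of Life Index", ascending=False)
                .reset_index(drop=True))
print(merged)
```

With `how="inner"`, a country missing from either frame disappears from the result, so comparing `len(merged)` with the expected count is a cheap sanity check.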


In [None]:
# create a Stamen Toner map centered on Europe
europe_latitude = 55.5260
europe_longitude = 20.2551
# map_europe = folium.Map(location=[europe_latitude, europe_longitude], zoom_start=4) - regular map, with colors
europe_map = folium.Map(location=[europe_latitude, europe_longitude], zoom_start=4, tiles='Stamen Toner')

# add one marker per country; rank starts at 1 because final_7 is sorted by score
for rank, (lat, lng, country, score) in enumerate(zip(final_7['Latitude'], final_7['Longitude'], final_7['Country'], final_7['Quality of Life Index']), start=1):
    folium.CircleMarker(
        [lat, lng],
        radius=15,
        tooltip='{} Country: {}, Score: {}'.format(rank, country, score), 
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.8,
        parse_html=False).add_to(europe_map)

# display map
europe_map

In [None]:
# Now the top 3 countries
top_3 = final_7.head(3)

# Create Europe Map
europe_latitude = 52.5260
europe_longitude = 10.2551
# create map of Europe using latitude and longitude values
map_europe = folium.Map(location=[europe_latitude, europe_longitude], zoom_start=5)

# add one marker per country; rank starts at 1 because final_7 is sorted by score
for rank, (lat, lng, country, score) in enumerate(zip(top_3['Latitude'], top_3['Longitude'], top_3['Country'], top_3['Quality of Life Index']), start=1):
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        tooltip='{} Country: {}, Score: {}'.format(rank, country, score), 
        color='red',
        fill=True,
        fill_color='#FF0000',
        fill_opacity=0.8,
        parse_html=False).add_to(map_europe)

map_europe

In [None]:
# plotting stats of the top 3
# drop() returns a new frame, so final_7 keeps its coordinate columns
final = final_7.head(3).drop(columns=['Latitude', 'Longitude'])
final

Unnamed: 0,Country,Quality of Life Index,Purchasing Power Index,Safety Index,Health Care Index,Cost of Living Index,Property Price to Income Ratio,Traffic Commute Time Index,Pollution Index,Climate Index
0,Switzerland,190.82,110.96,78.65,74.47,131.75,8.42,28.73,20.09,80.05
1,Denmark,190.01,94.73,73.28,79.96,91.67,6.66,28.69,20.4,81.8
2,Netherlands,183.31,83.89,72.78,75.76,78.64,7.35,27.81,25.28,87.11


In [None]:
# Getting the main city of each country with its coordinates
# (Zurich is Switzerland's largest city; its capital is Bern):
# Switzerland = Zurich
# Denmark = Copenhagen
# Netherlands = Amsterdam

switzerland = {'Capital':'Zurich', 'Latitude':'47.3769', 'Longitude':'8.5417'} 
denmark = {'Capital':'Copenhagen', 'Latitude':'55.6761', 'Longitude':'12.5683'} 
netherlands = {'Capital':'Amsterdam', 'Latitude':'52.3676', 'Longitude':'4.9041'} 

# And the stats of number 1: Switzerland (Zurich)

final.head(1)


Unnamed: 0,Country,Quality of Life Index,Purchasing Power Index,Safety Index,Health Care Index,Cost of Living Index,Property Price to Income Ratio,Traffic Commute Time Index,Pollution Index,Climate Index
0,Switzerland,190.82,110.96,78.65,74.47,131.75,8.42,28.73,20.09,80.05


In [None]:
# Getting Neighbourhoods and extracting insights from number 1 (Zurich)

# https://en.wikipedia.org/wiki/Subdivisions_of_Z%C3%BCrich

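# Ended up using Google Sheets to clean and extract the data because the page was too messy:
# there were tables inside tables and nested tr's

uploaded = files.upload()

# a re-uploaded file can be saved under a new name (e.g. "zurich (3).csv"),
# so read whichever file was uploaded instead of a fixed key
zurich_file = next(iter(uploaded))
zurich_data = pd.read_csv(io.StringIO(uploaded[zurich_file].decode('utf-8')))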
zurich_data

Saving zurich.csv to zurich (3).csv


Unnamed: 0,Neighborhood,Population,Non-Swiss Citizens
0,Rathaus,3081,23.90%
1,Hochschulen,695,27.50%
2,Lindenhof,950,20.30%
3,City,846,28.30%
4,Wollishofen,15592,22.90%
5,Leimbach,4867,23.00%
6,Enge,8375,26.00%
7,Alt-Wiedikon,14971,32.30%
8,Friesenberg,10360,22.90%
9,Sihlfeld,20554,36.50%


In [None]:
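# Reordering Zurich data by % of foreigners per neighborhood
# the column holds strings such as "46.20%", so sort on the numeric value
zurich_data = zurich_data.sort_values(by='Non-Swiss Citizens', ascending=False,
                                      key=lambda s: s.str.rstrip('%').astype(float))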
zurich_data = zurich_data.reset_index(drop=True)
zurich_data

Unnamed: 0,Neighborhood,Population,Non-Swiss Citizens
0,Hard,12715,46.20%
1,Langstrasse,10368,41.50%
2,Schwamendingen Mitte,10322,40.60%
3,Gewerbeschule,9690,39.30%
4,Hirzenbach,11265,37.50%
5,Altstetten,28278,37.00%
6,Sihlfeld,20554,36.50%
7,Seebach,19879,36.00%
8,Oerlikon,19585,33.70%
9,Werd,3878,33.20%
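
The `Non-Swiss Citizens` column is text, and text sorts character by character. A minimal sketch of the difference between sorting the strings and sorting the numbers they represent; the `Example` row and the `Non-Swiss Share` column name are made up for illustration, the other two rows come from the table above.

```python
import pandas as pd

shares = pd.DataFrame({
    "Neighborhood": ["Hard", "Werd", "Example"],
    "Non-Swiss Citizens": ["46.20%", "33.20%", "9.50%"],  # last row is made up
})

# string sort compares characters, so "9.50%" lands above "46.20%"
as_text = shares.sort_values("Non-Swiss Citizens", ascending=False)

# numeric sort: strip the percent sign and convert before comparing
shares["Non-Swiss Share"] = shares["Non-Swiss Citizens"].str.rstrip("%").astype(float)
as_number = shares.sort_values("Non-Swiss Share", ascending=False)

text_order = as_text["Neighborhood"].tolist()
number_order = as_number["Neighborhood"].tolist()
print(text_order)
print(number_order)
```

The two orders only agree when every value has the same number of digits before the decimal point, which happens to be the case for the rows shown above.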


In [None]:
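# the share is stored as text ("46.20%"), so convert it to a number for the y axis
fig = px.bar(zurich_data, title="Foreigners by Neighborhood", x="Neighborhood", y=zurich_data["Non-Swiss Citizens"].str.rstrip('%').astype(float))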
fig.show()

In [None]:
# Getting data with FourSquare API
# Credentials

CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
ACCESS_TOKEN = '' # your FourSquare Access Token
VERSION = ''
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [None]:
# Defining a function to get coordinates of each neighborhood
def get_latlng(neighborhood, max_attempts=5):
    # retry a few times, since the geocoder occasionally returns no result
    for attempt in range(max_attempts):
        g = geocoder.arcgis('{}, Zurich, Switzerland'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
    # give up instead of looping forever on a name that cannot be resolved
    raise ValueError('No coordinates found for {}'.format(neighborhood))
# Call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in zurich_data["Neighborhood"].tolist()]

In [None]:
# Create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

# Merge the coordinates into the original dataframe
zurich_data['Latitude'] = df_coords['Latitude']
zurich_data['Longitude'] = df_coords['Longitude']

zurich_data

Unnamed: 0,Neighborhood,Population,Non-Swiss Citizens,Latitude,Longitude
0,Hard,12715,46.20%,47.43583,8.62013
1,Langstrasse,10368,41.50%,47.382796,8.530004
2,Schwamendingen Mitte,10322,40.60%,47.40427,8.57326
3,Gewerbeschule,9690,39.30%,47.32496,8.795475
4,Hirzenbach,11265,37.50%,47.40301,8.5903
5,Altstetten,28278,37.00%,47.39175,8.48137
6,Sihlfeld,20554,36.50%,47.37382,8.51164
7,Seebach,19879,36.00%,47.42309,8.54255
8,Oerlikon,19585,33.70%,47.40964,8.54396
9,Werd,3878,33.20%,47.596973,8.702235
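
Some of the geocoded points above sit far from the city: Werd comes back at 47.596973, 8.702235, well over 20 km from the Zurich coordinates used earlier. A minimal sketch of flagging such results with the haversine distance; the 10 km threshold is an assumption, and `haversine_km` is a hypothetical helper, not part of the notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


zurich_center = (47.3769, 8.5417)  # Zurich coordinates used earlier in the notebook
geocoded = {
    "Langstrasse": (47.382796, 8.530004),
    "Werd": (47.596973, 8.702235),
}
max_km = 10  # assumed threshold, tune to the size of the city

for name, (lat, lng) in geocoded.items():
    distance = haversine_km(zurich_center[0], zurich_center[1], lat, lng)
    flag = "check" if distance > max_km else "ok"
    print("{}: {:.1f} km ({})".format(name, distance, flag))
```

The same distance also covers the "distance from city center" factor listed in the introduction.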


In [None]:
# Finally it is time to find the best neighbourhood in the city of Zurich

# Creating a function to get all the amenities near each neighbourhood

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
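
The function above indexes straight into the response, so an error response or a venue without a category would raise `KeyError` or `IndexError`. A minimal sketch of a more forgiving extraction step; `extract_venues` is a hypothetical helper, and the payload shape is inferred from the indexing in `getNearbyVenues`, not from the API documentation.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    venues = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        venues.append((venue.get("name"), location.get("lat"),
                       location.get("lng"), category))
    return venues


# hand-written payload mimicking the structure indexed by getNearbyVenues
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Café",
               "location": {"lat": 47.38, "lng": 8.53},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled Place",
               "location": {"lat": 47.39, "lng": 8.54},
               "categories": []}},
]}]}}

found = extract_venues(sample)
empty = extract_venues({"meta": {"code": 400}})
print(found)
print(empty)
```

Returning an empty list for a failed request lets the loop continue with the remaining neighborhoods instead of stopping at the first error.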

In [None]:
zurich_venues = getNearbyVenues(zurich_data['Neighborhood'], zurich_data['Latitude'], zurich_data['Longitude'], radius=500)

Hard
Langstrasse 
Schwamendingen Mitte 
Gewerbeschule
Hirzenbach
Altstetten
Sihlfeld
Seebach
Oerlikon 
Werd 
Alt-Wiedikon 
Saatlen 
Wipkingen
Affoltern 
Seefeld
City
Hochschulen
Escher Wyss
Enge
Albisrieden 
Weinegg
Oberstrass
Mühlebach 
Rathaus
Fluntern
Hottingen
Unterstrass
Leimbach 
Friesenberg
Wollishofen
Lindenhof 
Höngg 
Hirslanden 
Witikon


In [None]:
print('There are {} unique categories.'.format(len(zurich_venues['Venue Category'].unique())))

There are 115 unique categories.


In [None]:
# Analyzing each neighborhood
# one hot encoding
zurich_onehot = pd.get_dummies(zurich_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
zurich_onehot['Neighborhood'] = zurich_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [zurich_onehot.columns[-1]] + list(zurich_onehot.columns[:-1])
zurich_onehot = zurich_onehot[fixed_columns]

zurich_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bar,Baseball Field,Bistro,Bookstore,Botanical Garden,Burger Joint,Burrito Place,Bus Station,Business Service,Cable Car,Cafeteria,Café,Cemetery,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Cocktail Bar,Coffee Shop,Concert Hall,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Doner Restaurant,Factory,Falafel Restaurant,Farmers Market,...,Nightclub,Other Great Outdoors,Paella Restaurant,Park,Pedestrian Plaza,Pizza Place,Plaza,Pool,Pool Hall,Pub,Restaurant,River,Salon / Barbershop,Sandwich Place,Sauna / Steam Room,Scenic Lookout,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Stables,Steakhouse,Supermarket,Swiss Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Tibetan Restaurant,Trail,Train Station,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop
0,Langstrasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Langstrasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Langstrasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
3,Langstrasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,Langstrasse,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
# Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

zurich_grouped = zurich_onehot.groupby('Neighborhood').mean().reset_index()
zurich_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bar,Baseball Field,Bistro,Bookstore,Botanical Garden,Burger Joint,Burrito Place,Bus Station,Business Service,Cable Car,Cafeteria,Café,Cemetery,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Cocktail Bar,Coffee Shop,Concert Hall,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Doner Restaurant,Factory,Falafel Restaurant,Farmers Market,...,Nightclub,Other Great Outdoors,Paella Restaurant,Park,Pedestrian Plaza,Pizza Place,Plaza,Pool,Pool Hall,Pub,Restaurant,River,Salon / Barbershop,Sandwich Place,Sauna / Steam Room,Scenic Lookout,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Stables,Steakhouse,Supermarket,Swiss Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Tibetan Restaurant,Trail,Train Station,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop
0,Affoltern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Albisrieden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.090909,0.0,0.0,0.0,0.0
2,Alt-Wiedikon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Altstetten,0.0,0.0,0.0,0.090909,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0
4,City,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.1,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.0,0.0,0.0,0.033333,0.0,0.1,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.033333
5,Enge,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.1,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.133333,0.0,0.0,0.033333,0.033333,0.0,0.0,0.1,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0
6,Escher Wyss,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Fluntern,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0
8,Friesenberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Gewerbeschule,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
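
Each value in the table above is the share of a neighborhood's returned venues that fall in a category. A minimal sketch of the one-hot and group-mean steps on a toy frame; the neighborhood names `A` and `B` and their venues are made up, and the venue count is an extra column added here for context.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Café", "Café", "Park", "Stables"],
})

# one column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# the mean of 0/1 indicators is the frequency of each category
grouped = onehot.groupby("Neighborhood").mean().reset_index()

# number of venues behind each row of frequencies
counts = venues.groupby("Neighborhood").size()

print(grouped.round(2))
print(counts)
```

A frequency of 1.0, as in the Affoltern row, only says that every returned venue shares one category; the count shows whether that is one venue or many.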


In [None]:
# Let's print each neighborhood along with the top 5 most common venues

num_top_venues = 5

for hood in zurich_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = zurich_grouped[zurich_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Affoltern ----
                 venue  freq
0              Stables   1.0
1    Accessories Store   0.0
2  Moroccan Restaurant   0.0
3                  Pub   0.0
4            Pool Hall   0.0


----Albisrieden ----
           venue  freq
0    Supermarket  0.18
1    Pizza Place  0.18
2   Tram Station  0.18
3    Bus Station  0.18
4  Grocery Store  0.09


----Alt-Wiedikon ----
                 venue  freq
0          Bus Station  0.67
1          Supermarket  0.17
2                 Café  0.17
3  Moroccan Restaurant  0.00
4                  Pub  0.00


----Altstetten----
              venue  freq
0  Swiss Restaurant  0.18
1            Bakery  0.18
2               Gym  0.09
3      Tram Station  0.09
4              Pool  0.09


----City----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.10
1               Department Store  0.10
2                            Bar  0.10
3                         Lounge  0.07
4                   Cocktail Bar  0.07


----Enge----
       

In [None]:
# Let's put that into a _pandas_ dataframe

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Now let's create the new dataframe and display the top 10 venues for each neighborhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = zurich_grouped['Neighborhood']

for ind in np.arange(zurich_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(zurich_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Affoltern,Stables,Watch Shop,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Doner Restaurant,Factory,Falafel Restaurant
1,Albisrieden,Pizza Place,Tram Station,Supermarket,Bus Station,Swiss Restaurant,Grocery Store,Trattoria/Osteria,Department Store,Dessert Shop,Diner
2,Alt-Wiedikon,Bus Station,Café,Supermarket,Watch Shop,Department Store,Dessert Shop,Diner,Discount Store,Doner Restaurant,Factory
3,Altstetten,Bakery,Swiss Restaurant,Discount Store,Hotel,Asian Restaurant,Tram Station,Pool,Coffee Shop,Gym,Watch Shop
4,City,Bar,Vegetarian / Vegan Restaurant,Department Store,Lounge,Cocktail Bar,Grocery Store,Juice Bar,Gourmet Shop,Chocolate Shop,Gym
5,Enge,Park,Restaurant,Bar,Italian Restaurant,Swiss Restaurant,History Museum,Sauna / Steam Room,Chinese Restaurant,Cupcake Shop,Burger Joint
6,Escher Wyss,Burger Joint,Japanese Restaurant,Bar,Italian Restaurant,Restaurant,River,Korean Restaurant,Cheese Shop,Café,Shopping Mall
7,Fluntern,Plaza,Bakery,Tram Station,Supermarket,Gastropub,Grocery Store,Bus Station,Discount Store,Diner,Food & Drink Shop
8,Friesenberg,Bus Station,Light Rail Station,Cafeteria,Supermarket,Lounge,Restaurant,Discount Store,Doner Restaurant,Food & Drink Shop,Falafel Restaurant
9,Gewerbeschule,Grocery Store,Italian Restaurant,Asian Restaurant,Supermarket,Movie Theater,Watch Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Doner Restaurant
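
In the table above, rows such as Affoltern list categories in near-alphabetical order after the first couple of entries, which suggests those slots are ties at zero frequency rather than venues actually found there. Below is a minimal sketch, assuming that reading is correct, of a variant of `return_most_common_venues` that leaves such slots empty instead. The name `top_nonzero_venues` and the toy frame are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return the top categories by frequency, padding with NaN once frequencies reach zero."""
    # Skip the first entry, which holds the neighborhood name.
    freqs = row.iloc[1:].astype(float)
    # Keep only categories that actually occur, highest frequency first.
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = list(freqs.index[:num_top_venues])
    return top + [np.nan] * (num_top_venues - len(top))

# Made-up frequencies for illustration only, not taken from the Zurich data.
toy_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bar": [0.5, 0.0],
    "Café": [0.2, 0.0],
    "Park": [0.3, 1.0],
})

ranked = [top_nonzero_venues(toy_grouped.iloc[i], 3) for i in range(len(toy_grouped))]
print(ranked)
```

With this variant, neighborhood `B` keeps `Park` as its only ranked venue and the other two slots stay empty, so a sparse neighborhood is not described by categories it does not contain.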


In [None]:
# Run k-means to group the neighborhoods into 5 clusters

# set number of clusters
kclusters = 5

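# keep only the venue-frequency columns; the positional axis argument was removed in pandas 2.0
zurich_grouped_clustering = zurich_grouped.drop(columns='Neighborhood')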

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(zurich_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

zurich_merged = zurich_data

# join the cluster labels and top venues onto zurich_data, which holds latitude/longitude for each neighborhood
zurich_merged = zurich_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

zurich_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Population,Non-Swiss Citizens,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Hard,12715,46.20%,47.43583,8.62013,,,,,,,,,,,
1,Langstrasse,10368,41.50%,47.382796,8.530004,2.0,Café,Bar,Sandwich Place,Asian Restaurant,Thai Restaurant,Accessories Store,Meyhane,Middle Eastern Restaurant,Coffee Shop,Juice Bar
2,Schwamendingen Mitte,10322,40.60%,47.40427,8.57326,2.0,Tram Station,Bus Station,Restaurant,Thai Restaurant,Fast Food Restaurant,Light Rail Station,Café,Supermarket,Swiss Restaurant,Shopping Mall
3,Gewerbeschule,9690,39.30%,47.32496,8.795475,2.0,Grocery Store,Italian Restaurant,Asian Restaurant,Supermarket,Movie Theater,Watch Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Doner Restaurant
4,Hirzenbach,11265,37.50%,47.40301,8.5903,2.0,Tram Station,Pizza Place,Baseball Field,Furniture / Home Store,Steakhouse,Supermarket,Soccer Field,Diner,Discount Store,Fast Food Restaurant
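
The `Hard` row above has no cluster label or venues, presumably because it has no row in `zurich_grouped`, and the other labels display as `2.0` because a column containing NaN is stored as float. Below is a minimal sketch of that join behavior and one possible cleanup. The toy frames are hypothetical stand-ins for `zurich_data` and `neighborhoods_venues_sorted`, and dropping unmatched rows is an assumption about what is wanted, not something the notebook does.

```python
import pandas as pd

# Hypothetical stand-ins for zurich_data and neighborhoods_venues_sorted.
toy_data = pd.DataFrame({"Neighborhood": ["Hard", "Langstrasse", "Enge"]})
toy_sorted = pd.DataFrame({
    "Cluster Labels": [2, 0],
    "Neighborhood": ["Langstrasse", "Enge"],
})

# join() is a left join by default: every row of toy_data is kept.
merged = toy_data.join(toy_sorted.set_index("Neighborhood"), on="Neighborhood")
# "Hard" has no match, so its label is NaN and the column is upcast to float.
print(merged["Cluster Labels"].tolist())

# One option: keep only clustered neighborhoods and restore integer labels.
clustered = merged.dropna(subset=["Cluster Labels"]).copy()
clustered["Cluster Labels"] = clustered["Cluster Labels"].astype(int)
print(clustered)
```

Integer labels matter later if they are used to index a color list, since a float such as `2.0` cannot be used as a list index.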


In [None]:
# Finally, visualize the clusters of the top 10 neighbourhoods ranked by % of foreigners
zurich_merged = zurich_merged.head(10)

# Get the latitude and longitude of Zurich
address = 'Zurich, CH'

geolocator = Nominatim(user_agent="z_explorer")
location = geolocator.geocode(address)
# the offset moves the map center slightly north of the geocoded point
latitude = location.latitude + .015
longitude = location.longitude
print('The geographical coordinates of Zurich are {}, {}.'.format(latitude, longitude))

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster
for lat, lon, poi, cluster in zip(zurich_merged['Latitude'], zurich_merged['Longitude'], zurich_merged['Neighborhood'], zurich_merged['Cluster Labels']):
    # neighborhoods without venue data have no cluster label (NaN); draw them in gray
    if pd.isna(cluster):
        marker_color = 'gray'
    else:
        marker_color = rainbow[int(cluster)]
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=50,
        popup=label,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The geographical coordinates of Zurich are 47.3894489, 8.5410422.
