# The Battle of Neighborhoods
## Which neighborhood is most suitable for living in Overijssel, in the Netherlands?

### 1. Introduction

#### a. Problem definition
Housing prices in major Dutch cities such as Amsterdam and Utrecht are very high, so young people and expats have trouble finding suitable places to live. An alternative is Overijssel, a province in the east of the Netherlands. Local information about Overijssel, such as which neighbourhoods offer which facilities, is not widely known, however. This project therefore aims to give people interested in living in Overijssel, and real-estate agents with properties there, a GIS-based regional guide that presents clustered neighbourhoods.

#### b. Data
This report uses 1) postal code data from a website (https://www.metatopos.eu/overijssel2.html), 2) the Bing Maps API, which provides the latitude and longitude of each neighbourhood, and 3) the Foursquare API, which provides information on venues at each location. Together, these three data sources make it possible to draw a map showing several clusters of neighbourhoods in Overijssel.


#### 1. Scraping the list of postal codes in Overijssel

In [2]:
# import modules to be used for this assignment

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

In [3]:
# The URL offering the list of postal codes in Overijssel

url_O = "https://www.metatopos.eu/overijssel2.html"

In [4]:
# Read HTML tables into a list of DataFrame objects

df = pd.read_html(url_O)

In [5]:
df

[                                                     0  \
 0    Gemeente(in de provincieOverijssel)met link na...   
 1                                             Deventer   
 2                                             Deventer   
 3                                             Deventer   
 4                                             Deventer   
 ..                                                 ...   
 542                                    Steenwijkerland   
 543                                    Steenwijkerland   
 544                                    Steenwijkerland   
 545                                    Steenwijkerland   
 546                                    Steenwijkerland   
 
                                                      1  \
 0    Naam woonplaatszoals ingeschrevenbij het Kadas...   
 1                                             Deventer   
 2                                             Deventer   
 3                                             Devente

In [6]:
# Define the table in the website as the Overijssel dataframe

df_overijssel = df[0]

In [7]:
df_overijssel

Unnamed: 0,0,1,2,3,4
0,Gemeente(in de provincieOverijssel)met link na...,Naam woonplaatszoals ingeschrevenbij het Kadas...,Plaatsen enbuurtschappenzoals vermeldbij de be...,Post-code,Wijknaamof omschrijvingof ligging van de wijk ...
1,Deventer,Deventer,Deventer,7411,"centrum, Knutteloord"
2,Deventer,Deventer,Deventer,7412,"Zandweerd, Zwolse Wijk"
3,Deventer,Deventer,Deventer,7413,Voorstad
4,Deventer,Deventer,Deventer,7414,"Platvoet, Borgele"
...,...,...,...,...,...
542,Steenwijkerland,Oldemarkt,Oldemarkt,8375,
543,Steenwijkerland,Oldemarkt,De Hare,8375,
544,Steenwijkerland,Ossenzijl,Ossenzijl,8376,
545,Steenwijkerland,Kalenberg,Kalenberg,8377,


In [8]:
# Drop unnecessary columns

df_overijssel.drop([1, 2], axis = 1, inplace = True)

In [9]:
# Rename the columns

df_overijssel.rename(columns = {0:"City", 3:"Postal Code", 4:"Neighbourhood"}, inplace = True)

In [10]:
# Drop the first row (the header text) and reorder the columns

df_overijssel.drop([0], inplace = True)
df_overijssel = df_overijssel[['Postal Code', 'City', 'Neighbourhood']].copy()  # copy so later in-place edits do not act on a view

In [11]:
df_overijssel

Unnamed: 0,Postal Code,City,Neighbourhood
1,7411,Deventer,"centrum, Knutteloord"
2,7412,Deventer,"Zandweerd, Zwolse Wijk"
3,7413,Deventer,Voorstad
4,7414,Deventer,"Platvoet, Borgele"
5,7415,Deventer,Keizerslanden
...,...,...,...
542,8375,Steenwijkerland,
543,8375,Steenwijkerland,
544,8376,Steenwijkerland,
545,8377,Steenwijkerland,


In [12]:
# Drop the rows whose values are 'NaN'

df_overijssel.dropna(inplace = True)

#### 2. Acquiring the latitude and longitude of each postal code using the Bing Maps API

In [13]:
# Import geocoder

import geocoder

In [14]:
# Add two columns 'Latitude' and 'Longitude'

df_overijssel['Latitude'] = ''
df_overijssel['Longitude'] = ''

df_overijssel.head()

Unnamed: 0,Postal Code,City,Neighbourhood,Latitude,Longitude
1,7411,Deventer,"centrum, Knutteloord",,
2,7412,Deventer,"Zandweerd, Zwolse Wijk",,
3,7413,Deventer,Voorstad,,
4,7414,Deventer,"Platvoet, Borgele",,
5,7415,Deventer,Keizerslanden,,


In [15]:
# Fill in the two columns by geocoding each postal code in a loop

BING_MAPS_KEY = "YOUR_BING_MAPS_KEY"  # placeholder: supply your own key and keep it out of shared notebooks

for i in range(len(df_overijssel)):
    g = geocoder.bing('{}, the Netherlands, Overijssel, {}'.format(df_overijssel.iloc[i, 0], df_overijssel.iloc[i, 1]), key = BING_MAPS_KEY)
    df_overijssel.iloc[i, 3] = g.latlng[0]
    df_overijssel.iloc[i, 4] = g.latlng[1]
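
The loop above indexes `g.latlng` directly, so a single postal code that Bing cannot resolve would stop the whole run. Below is a minimal sketch of a guard, assuming that a failed lookup yields an empty or `None` `latlng` (an assumption about the failure mode). `split_latlng` is a hypothetical helper, not part of the `geocoder` library.

```python
from typing import Optional, Sequence, Tuple


def split_latlng(latlng: Optional[Sequence[float]]) -> Tuple[Optional[float], Optional[float]]:
    """Return (latitude, longitude), or (None, None) when no coordinates are available."""
    if not latlng or len(latlng) < 2:
        return None, None
    return float(latlng[0]), float(latlng[1])


# Coordinates taken from the first row of the table in this notebook
print(split_latlng([52.255886, 6.16137]))
# Assumed shape of a failed lookup
print(split_latlng([]))
```

Inside the loop, `lat, lng = split_latlng(g.latlng)` would leave unresolved rows empty so that they can be dropped or retried later.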

In [16]:
df_overijssel

Unnamed: 0,Postal Code,City,Neighbourhood,Latitude,Longitude
1,7411,Deventer,"centrum, Knutteloord",52.255886,6.16137
2,7412,Deventer,"Zandweerd, Zwolse Wijk",52.26442,6.135583
3,7413,Deventer,Voorstad,52.26189,6.158082
4,7414,Deventer,"Platvoet, Borgele",52.274899,6.151459
5,7415,Deventer,Keizerslanden,52.270542,6.165892
...,...,...,...,...,...
495,8293,Kampen,Mastenbroek-west (=ten westen van de Oude Wete...,52.561821,5.99851
498,8294,Zwartewaterland,Mastenbroek-oost (=ten oosten van de Oude Wete...,52.573589,6.022351
508,8331,Steenwijkerland,Steenwijk-centrum en -zuid: (ten zuiden van de...,52.786621,6.116048
509,8332,Steenwijkerland,Steenwijk-noord: (ten noorden van de spoorlijn...,52.798843,6.13266


In [17]:
# Reset the index

df_overijssel.reset_index(drop = True, inplace = True)

In [18]:
df_overijssel

Unnamed: 0,Postal Code,City,Neighbourhood,Latitude,Longitude
0,7411,Deventer,"centrum, Knutteloord",52.255886,6.16137
1,7412,Deventer,"Zandweerd, Zwolse Wijk",52.26442,6.135583
2,7413,Deventer,Voorstad,52.26189,6.158082
3,7414,Deventer,"Platvoet, Borgele",52.274899,6.151459
4,7415,Deventer,Keizerslanden,52.270542,6.165892
...,...,...,...,...,...
133,8293,Kampen,Mastenbroek-west (=ten westen van de Oude Wete...,52.561821,5.99851
134,8294,Zwartewaterland,Mastenbroek-oost (=ten oosten van de Oude Wete...,52.573589,6.022351
135,8331,Steenwijkerland,Steenwijk-centrum en -zuid: (ten zuiden van de...,52.786621,6.116048
136,8332,Steenwijkerland,Steenwijk-noord: (ten noorden van de spoorlijn...,52.798843,6.13266


#### 3. Exploring and clustering the neighbourhoods in Overijssel

In [19]:
# import libraries to be used

import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

In [20]:
# import k-means for the clustering stage
from sklearn.cluster import KMeans

# import Nominatim to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

In [21]:
# Use geopy library to get the latitude and longitude values of Overijssel, the Netherlands

address = 'Overijssel, the Netherlands'

geolocator = Nominatim(user_agent = "Overijssel Explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print("The geographical coordinates of Overijssel are {}, {}.".format(latitude, longitude))

The geographical coordinates of Overijssel are 52.4254143, 6.4610611.


In [22]:
# Create a map of Overijssel with neighbourhoods superimposed on top

map_overijssel = folium.Map(location = [latitude, longitude], zoom_start = 10)

# Add markers to map

for lat, lng, neighbourhood, city in zip(df_overijssel["Latitude"], df_overijssel["Longitude"], df_overijssel["Neighbourhood"], df_overijssel["City"]):
    label = '{}, {}'.format(neighbourhood, city)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker([lat, lng], radius = 5, popup = label, color = 'blue', fill = True, fill_color = "#3186cc", fill_opacity = 0.7).add_to(map_overijssel)

map_overijssel

In [23]:
# Define Foursquare credentials and version

CLIENT_ID = "YOUR_FOURSQUARE_CLIENT_ID" # your Foursquare ID (redacted)
CLIENT_SECRET = "YOUR_FOURSQUARE_CLIENT_SECRET" # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
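
Credentials written into a notebook cell, or printed in its output, are visible to anyone who reads the file. One common alternative is to read them from environment variables and print only a masked form. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical and not taken from the original notebook.

```python
import os


def load_credential(name: str) -> str:
    """Read a credential from the environment; return an empty string if it is unset."""
    return os.environ.get(name, "")


def mask(secret: str, visible: int = 4) -> str:
    """Show only the first few characters so notebook outputs do not leak the secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


# Hypothetical environment variable names, chosen for this illustration
client_id = load_credential("FOURSQUARE_CLIENT_ID")
client_secret = load_credential("FOURSQUARE_CLIENT_SECRET")

print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```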


In [24]:
# Import json_normalize (available at the top level of pandas since version 1.0)

from pandas import json_normalize

In [25]:
# Set the maximum number of venues per request and the search radius in meters

LIMIT = 100
radius = 500

In [26]:
# Create a function that gets up to LIMIT venues within a radius of 500 meters of each neighbourhood

def getNearbyVenues(names, latitudes, longitudes, radius = 500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # Make the GET request
        results = requests.get(url).json()["response"]["groups"][0]["items"]

        # Return only relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

    return nearby_venues
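
`getNearbyVenues` reads `v['venue']['categories'][0]['name']` for every item, which would raise an `IndexError` if a venue came back with an empty category list. Whether the API ever returns such venues is an assumption here, not something the notebook confirms. The sketch below shows one way to skip them. `venue_rows` is a hypothetical helper, and the second sample item is made up to exercise the empty case.

```python
def venue_rows(name, lat, lng, items):
    """Flatten response items into tuples, skipping venues that have no category."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        if not categories:
            continue  # nothing to one-hot encode later, so leave this venue out
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"],
        ))
    return rows


# Sample payload with only the fields the notebook uses
items = [
    {"venue": {"name": "Talamini",
               "location": {"lat": 52.253434, "lng": 6.159645},
               "categories": [{"name": "Ice Cream Shop"}]}},
    # Made-up venue without a category, for illustration only
    {"venue": {"name": "Uncategorised venue",
               "location": {"lat": 52.25, "lng": 6.16},
               "categories": []}},
]

rows = venue_rows("centrum, Knutteloord", 52.255886, 6.161370, items)
print(rows)
```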

In [35]:
Overijssel_venues = getNearbyVenues(names = df_overijssel['Neighbourhood'], latitudes = df_overijssel['Latitude'], longitudes = df_overijssel['Longitude'])

centrum, Knutteloord
Zandweerd, Zwolse Wijk
Voorstad
Platvoet, Borgele
Keizerslanden
Brinkgreven
Rivierenwijk
Bergweide, Snippeling, Het Hanzepark, Kloosterlanden
aan de Gelderse kant van de IJssel: De Hoven
Roessink, 't Bramelt, Essenerveld, Swormink, industrieterrein De Weteringen
Colmschater Enk
Groot Douwel, Klein Douwel, Blauwenoord, Het Oostrik
Het Jeurlink, Het Eetlaer, Spijkvoorderenk
De Vijfhoek, Op den Haar, Graveland, Spijkvoorde, Steinvoorde
Starinksweg en Aarninksweg en omgeving
omgeving Rodijksweg en Spitdijk
Nijverdal ten westen van N347
Nijverdal-centrum en -oost (ten oosten van N347 en ten zuiden van N35)
Kruidenwijk (ten noorden van N35)
Rijssen-centrumoost, -noord (ten oosten van de Nijverdalseweg, met bedrijventerrein De Mors), -oost (Reggehal) en -zuidoost
Rijssen-centrumwest, -zuidwest (Braakmanslanden, omg. Haarstraat, Holterstraatweg, Welleweg)
Veeneslagen, bedrijventerrein Plaagslagen
Goor-centrum, De Whee, Kevelhammerhoek
noordoostelijk bedrijventerrein Zenkel

In [36]:
Overijssel_venues

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"centrum, Knutteloord",52.255886,6.161370,Talamini,52.253434,6.159645,Ice Cream Shop
1,"centrum, Knutteloord",52.255886,6.161370,Filmhuis De Keizer,52.255300,6.161460,Movie Theater
2,"centrum, Knutteloord",52.255886,6.161370,Dille & Kamille,52.253459,6.161828,Furniture / Home Store
3,"centrum, Knutteloord",52.255886,6.161370,Simit Sarayı,52.253464,6.164308,Bagel Shop
4,"centrum, Knutteloord",52.255886,6.161370,Kaashandel De Brink,52.253371,6.159947,Cheese Shop
...,...,...,...,...,...,...,...
1093,Steenwijk-centrum en -zuid: (ten zuiden van de...,52.786621,6.116048,Shanghai,52.787248,6.113062,Chinese Restaurant
1094,Steenwijk-centrum en -zuid: (ten zuiden van de...,52.786621,6.116048,Villa Rams Woerthe,52.785336,6.112390,Monument / Landmark
1095,Steenwijk-centrum en -zuid: (ten zuiden van de...,52.786621,6.116048,Park Rams Woerthe,52.785632,6.110811,Park
1096,Steenwijk-noord: (ten noorden van de spoorlijn...,52.798843,6.132660,Roelofs Advies en Ontwerp,52.799770,6.130400,Construction & Landscaping
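
As a sanity check on the 500-meter radius, the distance between a neighbourhood's coordinates and one of its venues can be computed with the haversine formula. This is a sketch that assumes a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper name. The coordinates are those of "centrum, Knutteloord" and the venue Talamini from the table above.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    earth_radius_m = 6371000.0  # mean Earth radius, spherical approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Neighbourhood centre vs. the venue Talamini
distance = haversine_m(52.255886, 6.161370, 52.253434, 6.159645)
print("Distance in meters:", round(distance))
print("Within 500 m:", distance <= 500)
```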


In [37]:
# Count the venues returned for each neighbourhood, then report the number of unique venue categories

Overijssel_venues.groupby('Neighbourhood').count()
print("There are {} unique categories.".format(len(Overijssel_venues['Venue Category'].unique())))

There are 191 unique categories.


#### 4. Analyze each neighbourhood

In [39]:
# one hot encoding
Overijssel_onehot = pd.get_dummies(Overijssel_venues[['Venue Category']], prefix = '', prefix_sep = '')

# add neighbourhood column back to dataframe
Overijssel_onehot['Neighbourhood'] = Overijssel_venues['Neighbourhood']

# Move neighbourhood column to the first column
fixed_columns = [Overijssel_onehot.columns[-1]] + list(Overijssel_onehot.columns[:-1])
Overijssel_onehot = Overijssel_onehot[fixed_columns]

Overijssel_onehot.head()

Unnamed: 0,Neighbourhood,Advertising Agency,American Restaurant,Amphitheater,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,...,Track,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Windmill,Wine Bar,Wine Shop,Women's Store
0,"centrum, Knutteloord",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"centrum, Knutteloord",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"centrum, Knutteloord",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"centrum, Knutteloord",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"centrum, Knutteloord",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [40]:
Overijssel_onehot.shape

(1098, 192)

In [41]:
# Group rows by neighbourhood, taking the mean frequency of occurrence of each category

Overijssel_grouped = Overijssel_onehot.groupby("Neighbourhood").mean().reset_index()
Overijssel_grouped

Unnamed: 0,Neighbourhood,Advertising Agency,American Restaurant,Amphitheater,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,...,Track,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Windmill,Wine Bar,Wine Shop,Women's Store
0,Aalanden behalve straatnamen op -beek,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
1,Aalanden straatnamen op -beek,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
2,Almelo-centrum en -oost,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.032258,0.0,0.032258,0.0,0.0,0.0,0.0
3,"Almelo-noord: Schelfhorst, Kluppelshuizen, Mar...",0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.5,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
4,"Almelo-noordoost: Sluitersveld, Hedeman, Rumer...",0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
116,oostelijke nieuwbouwwijk De Vlierlanden (omgev...,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.166667,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
117,"ten noorden van de Gronausestraat: Het Schild,...",0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
118,"ten noorden van de N34: Marslanden (De Velden,...",0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
119,"ten oosten van de Vecht: centrum, Baalder, Baa...",0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0
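
Taking the mean of one-hot columns per neighbourhood gives the share of that neighbourhood's venues falling in each category. The plain-Python sketch below reproduces the same arithmetic on a small, made-up venue list (the neighbourhood names "A" and "B" are placeholders, not data from this notebook).

```python
from collections import Counter, defaultdict

# Made-up (neighbourhood, venue category) pairs for illustration
venues = [
    ("A", "Café"),
    ("A", "Café"),
    ("A", "Park"),
    ("B", "Supermarket"),
]

# Count categories per neighbourhood
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Divide by the number of venues in the neighbourhood: this equals the mean of the one-hot column
freq = {
    hood: {category: n / sum(c.values()) for category, n in c.items()}
    for hood, c in counts.items()
}

for hood, shares in freq.items():
    print(hood, shares)
```

Because each row is normalised by its own venue count, a neighbourhood with two venues and one with thirty end up on the same scale, which also means that sparse neighbourhoods produce coarse frequencies such as 0.5.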


In [42]:
# Print each neighbourhood along with the top 5 most common values

num_top_venues = 5

for hood in Overijssel_grouped['Neighbourhood']:
    print("----" + hood + "----")
    temp = Overijssel_grouped[Overijssel_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ["venue", 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Aalanden behalve straatnamen op -beek----
                venue  freq
0          Playground   0.4
1           Rest Area   0.2
2         Supermarket   0.2
3             Dog Run   0.2
4  Advertising Agency   0.0


----Aalanden straatnamen op -beek----
         venue  freq
0   Theme Park  0.25
1    Rest Area  0.25
2         Lake  0.25
3      Dog Run  0.25
4  Pizza Place  0.00


----Almelo-centrum en -oost----
              venue  freq
0              Café  0.10
1  Tapas Restaurant  0.06
2               Pub  0.06
3      Optical Shop  0.06
4    Sandwich Place  0.06


----Almelo-noord: Schelfhorst, Kluppelshuizen, Markgraven, Krommendijk, Rumerslanden----
                venue  freq
0    Asian Restaurant   0.5
1               Diner   0.5
2  Advertising Agency   0.0
3         Pizza Place   0.0
4         Music Venue   0.0


----Almelo-noordoost: Sluitersveld, Hedeman, Rumerslanden----
                venue  freq
0              Garden  0.33
1             Dog Run  0.33
2                Café  

In [43]:
# Write a function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)

    return row_categories_sorted.index.values[0:num_top_venues]
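
Note that this function always returns `num_top_venues` names, so a neighbourhood with only a few venues is padded with categories whose frequency is 0 (for example "Advertising Agency" in the printed lists above). A hedged alternative, sketched in plain Python with a hypothetical helper `top_categories`, keeps only categories that actually occur and breaks ties alphabetically. The frequencies are those printed for "Aalanden behalve straatnamen op -beek".

```python
def top_categories(freqs, n):
    """Return up to n category names with non-zero frequency, highest first."""
    # Sort by descending frequency, then by name so ties are deterministic
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, f in ranked[:n] if f > 0]


freqs = {
    "Playground": 0.4,
    "Rest Area": 0.2,
    "Supermarket": 0.2,
    "Dog Run": 0.2,
    "Advertising Agency": 0.0,
}

print(top_categories(freqs, 5))
```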

In [44]:
# Create the new dataframe and display the top 10 venues for each neighbourhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

## create columns according to number of top venues

columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

## create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns = columns)
neighbourhoods_venues_sorted['Neighbourhood'] = Overijssel_grouped['Neighbourhood']

for ind in np.arange(Overijssel_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Overijssel_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aalanden behalve straatnamen op -beek,Playground,Rest Area,Supermarket,Dog Run,Advertising Agency,Pet Store,Museum,Music Venue,Nightclub,Optical Shop
1,Aalanden straatnamen op -beek,Theme Park,Rest Area,Lake,Dog Run,Pizza Place,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
2,Almelo-centrum en -oost,Café,Tapas Restaurant,Pub,Optical Shop,Sandwich Place,Steakhouse,Bus Stop,Shoe Store,Clothing Store,Department Store
3,"Almelo-noord: Schelfhorst, Kluppelshuizen, Mar...",Asian Restaurant,Diner,Advertising Agency,Pizza Place,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park
4,"Almelo-noordoost: Sluitersveld, Hedeman, Rumer...",Garden,Dog Run,Café,Advertising Agency,Pharmacy,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
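
The column labels above rely on a three-element suffix list with a fallback to "th", which is correct for 1 to 10 but would produce labels such as "21th" if `num_top_venues` were raised past 20. A small sketch of a general ordinal helper follows; `ordinal` is a hypothetical name and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighbourhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns)
```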


#### 5. Cluster Neighbourhoods

In [45]:
# Run k-means to group the neighbourhoods into 5 clusters

## set number of clusters
kclusters = 5

Overijssel_grouped_clustering = Overijssel_grouped.drop(columns = 'Neighbourhood')

## run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(Overijssel_grouped_clustering)

## check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 2, 2, 2, 2, 0, 0, 2, 2], dtype=int32)
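
Before interpreting the clusters it helps to check how many neighbourhoods fall into each one, since very small clusters often reflect neighbourhoods with only one or two venues. The sketch below counts the ten labels printed above; on the full result the same count could be taken from `kmeans.labels_.tolist()`.

```python
from collections import Counter

# The first ten labels shown in the output above
first_labels = [0, 2, 2, 2, 2, 2, 0, 0, 2, 2]

sizes = Counter(first_labels)
for label, size in sorted(sizes.items()):
    print("Cluster {}: {} neighbourhoods".format(label, size))
```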

In [46]:
# Create a new dataframe that includes the cluster as well as the top 10 venues for each neighbourhood

## add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Overijssel_merged = df_overijssel

# join the cluster labels and top venues onto df_overijssel so each neighbourhood keeps its latitude/longitude
Overijssel_merged = Overijssel_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on = 'Neighbourhood')

Overijssel_merged.head()

Unnamed: 0,Postal Code,City,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,7411,Deventer,"centrum, Knutteloord",52.255886,6.16137,2.0,Restaurant,Bar,Italian Restaurant,Supermarket,Drugstore,Diner,Deli / Bodega,Electronics Store,Fast Food Restaurant,Plaza
1,7412,Deventer,"Zandweerd, Zwolse Wijk",52.26442,6.135583,0.0,Market,Supermarket,Burger Joint,Scenic Lookout,Bus Stop,Pharmacy,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
2,7413,Deventer,Voorstad,52.26189,6.158082,2.0,Food Truck,Gym / Fitness Center,Gym,Park,Tennis Court,Advertising Agency,Pet Store,Museum,Music Venue,Nightclub
3,7414,Deventer,"Platvoet, Borgele",52.274899,6.151459,0.0,Pharmacy,Supermarket,Bus Stop,Market,Pet Store,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
4,7415,Deventer,Keizerslanden,52.270542,6.165892,0.0,Supermarket,Shopping Mall,Drugstore,Gas Station,Pharmacy,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store


In [61]:
Overijssel_merged.dropna(inplace = True)

In [63]:
# Visualize the resulting clusters

## create map
map_clusters = folium.Map(location = [latitude, longitude], zoom_start = 11)

## set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

## add markers to the map
markers_colors = []
for lat, lng, poi, cluster in zip(Overijssel_merged['Latitude'], Overijssel_merged['Longitude'], Overijssel_merged['Neighbourhood'], Overijssel_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker([lat, lng], radius = 5, popup = label, color = rainbow[int(cluster-1)], fill = True, fill_color = rainbow[int(cluster-1)], fill_opacity = 0.7).add_to(map_clusters)

map_clusters
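
The marker loop picks colours with `rainbow[int(cluster-1)]`. Because k-means labels start at 0, cluster 0 is looked up at index -1 and receives the last colour in the list. Every cluster still gets a distinct colour, but the mapping is shifted by one. The sketch below illustrates this with a made-up five-colour palette (the hex values are placeholders, not the ones `cm.rainbow` produces).

```python
# Hypothetical palette standing in for the `rainbow` list
palette = ["#8000ff", "#00b5eb", "#80ffb4", "#ffb360", "#ff0000"]
kclusters = len(palette)

# Indexing used in the notebook: label 0 wraps around to the last colour
shifted = {cluster: palette[cluster - 1] for cluster in range(kclusters)}

# Direct indexing: label i gets colour i
direct = {cluster: palette[cluster] for cluster in range(kclusters)}

for cluster in range(kclusters):
    print(cluster, shifted[cluster], direct[cluster])
```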

In [64]:
# Examine Clusters

## Cluster 0

Overijssel_merged.loc[Overijssel_merged['Cluster Labels'] == 0, Overijssel_merged.columns[[1] + list(range(5, Overijssel_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Deventer,0.0,Market,Supermarket,Burger Joint,Scenic Lookout,Bus Stop,Pharmacy,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
3,Deventer,0.0,Pharmacy,Supermarket,Bus Stop,Market,Pet Store,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
4,Deventer,0.0,Supermarket,Shopping Mall,Drugstore,Gas Station,Pharmacy,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
6,Deventer,0.0,Supermarket,Record Shop,Soccer Field,Business Service,Movie Theater,Performing Arts Venue,Museum,Music Venue,Nightclub,Optical Shop
13,Deventer,0.0,Drugstore,Bakery,Supermarket,Department Store,Bus Stop,Liquor Store,Snack Place,Pharmacy,Advertising Agency,Optical Shop
17,Hellendoorn,0.0,Soccer Field,Supermarket,Fast Food Restaurant,Advertising Agency,Performing Arts Venue,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation
21,Rijssen-Holten,0.0,Supermarket,Fast Food Restaurant,Advertising Agency,Movie Theater,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
22,Hof van Twente,0.0,Supermarket,Asian Restaurant,Café,Chinese Restaurant,Drugstore,Train Station,Pharmacy,Diner,Italian Restaurant,Department Store
24,Haaksbergen,0.0,Supermarket,Bar,Café,Gastropub,Museum,Department Store,French Restaurant,Arts & Entertainment,Drugstore,Toy / Game Store
29,Enschede,0.0,Supermarket,Dance Studio,Music Venue,Arts & Crafts Store,Bus Stop,Turkish Restaurant,Furniture / Home Store,Automotive Shop,Advertising Agency,Pet Store


In [65]:
# Examine Clusters

## Cluster 1

Overijssel_merged.loc[Overijssel_merged['Cluster Labels'] == 1, Overijssel_merged.columns[[1] + list(range(5, Overijssel_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Enschede,1.0,Playground,Advertising Agency,Pharmacy,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park
43,Enschede,1.0,Playground,Advertising Agency,Pharmacy,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park


In [66]:
# Examine Clusters

## Cluster 2

Overijssel_merged.loc[Overijssel_merged['Cluster Labels'] == 2, Overijssel_merged.columns[[1] + list(range(5, Overijssel_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deventer,2.0,Restaurant,Bar,Italian Restaurant,Supermarket,Drugstore,Diner,Deli / Bodega,Electronics Store,Fast Food Restaurant,Plaza
2,Deventer,2.0,Food Truck,Gym / Fitness Center,Gym,Park,Tennis Court,Advertising Agency,Pet Store,Museum,Music Venue,Nightclub
5,Deventer,2.0,Soccer Field,Snack Place,Cafeteria,Advertising Agency,Multiplex,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
7,Deventer,2.0,Park,Wine Shop,Garden Center,Camera Store,Advertising Agency,Pharmacy,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation
8,Deventer,2.0,Bowling Alley,Garden Center,Cupcake Shop,Campground,Snack Place,Park,Fast Food Restaurant,Flower Shop,Music Venue,Furniture / Home Store
...,...,...,...,...,...,...,...,...,...,...,...,...
127,Kampen,2.0,Rental Car Location,Print Shop,Pet Store,Multiplex,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store
129,Kampen,2.0,Fishing Spot,Hockey Field,Multiplex,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park
130,Kampen,2.0,Construction & Landscaping,Scenic Lookout,Bus Stop,Bed & Breakfast,Pizza Place,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park
135,Steenwijkerland,2.0,Supermarket,Drugstore,Department Store,Plaza,Park,Steakhouse,Monument / Landmark,Bistro,Bar,Chinese Restaurant


In [67]:
# Examine Clusters

## Cluster 3

Overijssel_merged.loc[Overijssel_merged['Cluster Labels'] == 3, Overijssel_merged.columns[[1] + list(range(5, Overijssel_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Deventer,3.0,Park,Advertising Agency,Pharmacy,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Performing Arts Venue


In [70]:
# Examine Clusters

## Cluster 4

Overijssel_merged.loc[Overijssel_merged['Cluster Labels'] == 4, Overijssel_merged.columns[[1] + list(range(5, Overijssel_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Enschede,4.0,Flower Shop,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park
66,Losser,4.0,Flower Shop,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Optical Shop,Outdoors & Recreation,Paper / Office Supplies Store,Park


In [71]:
# Examine Clusters

## Cluster 5 (empty: with kclusters = 5 the labels run from 0 to 4, so no rows match)

Overijssel_merged.loc[Overijssel_merged['Cluster Labels'] == 5, Overijssel_merged.columns[[1] + list(range(5, Overijssel_merged.shape[1]))]]

Unnamed: 0,City,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
