<img src=https://littleml.files.wordpress.com/2016/09/stadsdelen-amsterdam.png width = 500 align="left">
<h1 align=center><font size=5>AMSTERDAM<br><br>NEIGHBORHOODS</font></h1>
<h2 align=center><font size=4>-</font></h2>
<h2 align=center><font size=4>July 2020</font></h2>
<h2 align=center><font size=4><a href="https://www.linkedin.com/in/dgallo88/">Daniel Gallo Sánchez</a></font></h2>


## Introduction
In this notebook I will convert Amsterdam's neighborhoods into their equivalent latitude and longitude values. Also, I will use the Foursquare API to explore these neighborhoods. I will use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. I will use the *k*-means clustering algorithm to complete this task. Finally, I will use the Folium library to visualize the neighborhoods in Amsterdam and their emerging clusters.

## Table of Contents

<div style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Amsterdam</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [47]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

import matplotlib.pyplot as plt
%matplotlib inline

print('Libraries imported!!!')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported!!!


<a id='item1'></a>

## 1. Download and Explore Dataset

In order to segment Amsterdam's neighborhoods and explore them, we need a dataset that contains the names of these neighborhoods as well as the latitude and longitude coordinates of each neighborhood. Luckily, this information is freely available on the web at:  
https://www.amsterdam.nl/en/districts/  

Amsterdam is divided into *7 districts* and each district is divided into neighbourhoods. Amsterdam has *26 neighbourhoods* in total.

I put the names of the neighborhoods in an Excel document and uploaded the file to GitHub. Another option would be to use postal codes, but doing this analysis with the names of the neighborhoods is more comprehensive. Let's have a look at Amsterdam's neighborhoods. 

In [48]:
neighborhoods_data = pd.read_excel('Amsterdam_Districts_and_Neighborhoods.xlsx')
neighborhoods_data

FileNotFoundError: [Errno 2] No such file or directory: 'Amsterdam_Districts_and_Neighborhoods.xlsx'

Let's check the type of the data.

In [None]:
print (neighborhoods_data.dtypes)

The columns in the data frame have type *object*; let's change them to *string* for our own convenience later on. 

In [None]:
neighborhoods_data[["District"]] = neighborhoods_data[["District"]].astype('string')
neighborhoods_data[["Neighborhood"]] = neighborhoods_data[["Neighborhood"]].astype('string')
print (neighborhoods_data.dtypes)

Now that we have built a dataframe of the neighborhood names, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data. We will use the *geopy* library for this purpose.

In order to define an instance of the geocoder, we need to define a *user_agent*. We will name our agent *ams_explorer*, as shown below.

In [None]:
geolocator = Nominatim(user_agent='ams_explorer')
rows = [] # one dict per neighborhood; DataFrame.append was removed in pandas 2.0

for index, row in neighborhoods_data.iterrows():
    district_name = row['District']
    neighborhood_name = row['Neighborhood']
    address = neighborhood_name + ', Amsterdam, Noord-Holland, Nederland'
    location = geolocator.geocode(address)
    if location is None:
        # Nominatim found no match; leave the coordinates empty for now
        latitude = None
        longitude = None
    else:
        latitude = location.latitude
        longitude = location.longitude
    rows.append({'District': district_name,'Neighborhood': str(neighborhood_name), 'Latitude': latitude, 'Longitude': longitude})
    print('{}: {}'.format(index, neighborhood_name))
    print('location: {}'.format(location))
    print('coordinates: {}, {}.'.format(latitude, longitude))

# build the dataframe once, after all neighborhoods have been geocoded
neighborhoods = pd.DataFrame(rows, columns=['District','Neighborhood', 'Latitude', 'Longitude'])

In [None]:
neighborhoods

### Dealing with missing data

Let's see how many rows in the data frame have *Latitude* and *Longitude* equal to *NaN*.

In [None]:
null_values_df = neighborhoods[neighborhoods['Latitude'].isnull().values]
null_values_df

In [None]:
null_values_df.shape[0]

There are 5 neighborhoods with empty coordinates: Nominatim (which uses OpenStreetMap in the background) cannot retrieve their coordinates. This is, for example, due to the fact that Oostelijke Eilanden/Kadijken is considered as two separate neighborhoods in OpenStreetMap. The same applies to Amstel III/Bullewijk. On the other hand, IJburg Oost and IJburg Zuid are not considered two separate neighborhoods but one. The same goes for Bijlmer Centrum and Bijlmer Oost. As there are only 5 neighborhoods missing their latitude and longitude, I am going to look up those values on the internet and fill in the right coordinates. 

**Missing Coordinates:**  
Sloterdijken-> Sloterdijk              (52.3871,4.8465)  

Noordelijke IJ-oever -> Buiksloterham  (52.3923, 4.9014)  
Noordelijke IJ-oever -> NDSM-werf      (52.4008527, 4.8912277)  
Noordelijke IJ-oever -> Overhoeks      (52.3871051, 4.8937385)  
Noordelijke IJ-oever -> Hamerkwartier  (52.3824882, 4.9220725)  

Amstel lll -> Amstel III (52.2952,4.9460)  

Bijlmer-Centrum -> Bijlmer-Centrum (52.3170,4.9650)  
Bijlmer-Oost -> Bijlmer-Oost       (52.3169,4.9801) 

In [None]:
neighborhoods.loc[neighborhoods['Neighborhood']=='Sloterdijken', ['Latitude', 'Longitude']] = [52.3871,4.8465] 
neighborhoods.loc[neighborhoods['Neighborhood']=='Noordelijke IJ-oever', ['Latitude', 'Longitude']] = [52.3923, 4.9014]
neighborhoods.loc[neighborhoods['Neighborhood']=='Amstel lll', ['Latitude', 'Longitude']] = [52.2952,4.9460] 
neighborhoods.loc[neighborhoods['Neighborhood']=='Bijlmer-Centrum', ['Latitude', 'Longitude']] = [52.3170,4.9650]
neighborhoods.loc[neighborhoods['Neighborhood']=='Bijlmer-Oost', ['Latitude', 'Longitude']] = [52.3169,4.9801]

#fixing some incorrect coordinates
neighborhoods.loc[neighborhoods['Neighborhood']=='Centrum-Oost', ['Latitude', 'Longitude']] = [52.36456,4.90678]
neighborhoods.loc[neighborhoods['Neighborhood']=='Waterland', ['Latitude', 'Longitude']] = [52.39345,4.99409]
neighborhoods.loc[neighborhoods['Neighborhood']=='Oud-Oost', ['Latitude', 'Longitude']] = [52.35999,4.92523]

neighborhoods

Let's now show the neighborhoods on a map

In [None]:
address = 'Amsterdam, Noord-Holland, Nederland'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Amsterdam are {}, {}.'.format(latitude, longitude))

Let's create a map of Amsterdam with neighborhoods superimposed on top. Colors represent districts.

In [None]:
# create map of Amsterdam using latitude and longitude values
map_ams = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the districts: one rainbow color per district
districts = neighborhoods['District'].unique()
colors_array = cm.rainbow(np.linspace(0, 1, len(districts)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
district_colors = dict(zip(districts, rainbow))

# add markers to the map
for lat, lng, district, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['District'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, district)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=500,
        popup=label,
        color=district_colors[district],
        fill=True,
        fill_color=district_colors[district],
        fill_opacity=0.7).add_to(map_ams)
    
map_ams

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle to reveal the name of the neighborhood and its respective district.

Next, we are going to start utilizing the **Foursquare API** to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [None]:
# @hidden_cell
import os

# Read the credentials from environment variables instead of hard-coding them in the notebook.
# FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumed names; set them before running.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Credentials loaded.') # avoid printing the secret itself

<a id='item2'></a>

## 2. Explore Neighborhoods in Amsterdam

#### Let's create a function to repeat the same process for all the neighborhoods in Amsterdam

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
            
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name + ':')
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        print('{} {}'.format(len(results), 'venues found'))
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
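
To make the flattening step in `getNearbyVenues` easier to follow, here is a minimal offline sketch. The `mock_response` dictionary is hand-written and only mirrors the keys that the function above reads (`response`, `groups`, `items`, `venue`, `location`, `categories`); it is not real Foursquare output. The helper name `flatten_items` and the `'Unknown'` fallback for venues without a category are assumptions added for illustration.

```python
# Hypothetical stand-in for a parsed explore response (invented data).
mock_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Cafe A",
                               "location": {"lat": 52.37, "lng": 4.89},
                               "categories": [{"name": "Cafe"}]}},
                    {"venue": {"name": "Park B",
                               "location": {"lat": 52.36, "lng": 4.88},
                               "categories": []}},  # venue without a category
                ]
            }
        ]
    }
}


def flatten_items(name, lat, lng, response_json):
    """Turn one parsed response into a list of flat tuples, one per venue."""
    items = response_json["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Assumed fallback: use a placeholder instead of raising IndexError.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


rows = flatten_items("Centrum-West", 52.375, 4.89, mock_response)
for row in rows:
    print(row)
```

Each tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues` above.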

#### Let's now write the code to run the above function on each neighborhood and create a new dataframe called *amsterdam_venues*.

In [None]:
amsterdam_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude'],
                                   radius=500,
                                   limit=100
                                   )

Let's check the size of the resulting dataframe

In [None]:
print(amsterdam_venues.shape)
amsterdam_venues.head()

In [None]:
amsterdam_venues[amsterdam_venues['Neighborhood']=='Zeeburgereiland']

Let's check how many venues were returned for each neighborhood.

In [None]:
amsterdam_venues.groupby('Neighborhood').count().reset_index()

Let's plot this using a histogram to see the distribution of the venues.

In [None]:
amsterdam_venues.groupby('Neighborhood').count()['Venue'].plot(kind='hist', figsize=(8, 5))

In [None]:
amsterdam_venues.groupby('Neighborhood').count().shape

Let's find out how many unique categories can be curated from all the returned venues

In [None]:
print('There are {} unique categories.'.format(len(amsterdam_venues['Venue Category'].unique())))
amsterdam_venues['Venue Category'].unique()

Let's see how many venues we have per category.

In [None]:
amsterdam_venues['Venue Category'].value_counts()

<a id='item3'></a>

## 3. Analyze Each Neighborhood

In [None]:
# one hot encoding
amsterdam_onehot = pd.get_dummies(amsterdam_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
amsterdam_onehot['Neighborhood'] = amsterdam_venues['Neighborhood'] 

# move neighborhood column to the first column
col_name = "Neighborhood"
first_col = amsterdam_onehot.pop(col_name)
amsterdam_onehot.insert(0, col_name, first_col)

amsterdam_onehot.head()

And let's examine the new dataframe size.

In [None]:
amsterdam_onehot.shape

Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [None]:
amsterdam_grouped = amsterdam_onehot.groupby('Neighborhood').mean().reset_index()
amsterdam_grouped
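
The one-hot encoding followed by a grouped mean can be hard to picture, so here is a small sketch of the same arithmetic in plain Python, without pandas. The venue list is invented toy data. The point it illustrates is that the mean of a one-hot column within a neighborhood is simply the share of that neighborhood's venues that fall into the category.

```python
from collections import Counter, defaultdict

# Toy (neighborhood, venue category) pairs; invented for illustration.
venues = [
    ("A", "Bar"), ("A", "Bar"), ("A", "Park"), ("A", "Hotel"),
    ("B", "Park"), ("B", "Park"),
]

# All categories seen anywhere become the "columns" of the one-hot table.
categories = sorted({category for _, category in venues})

# Count how often each category occurs in each neighborhood.
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Mean of a one-hot column = category count / number of venues in the neighborhood.
frequencies = {}
for hood, counter in counts.items():
    total = sum(counter.values())
    frequencies[hood] = {c: counter[c] / total for c in categories}

for hood, freq in frequencies.items():
    print(hood, freq)
```

Every row of frequencies sums to 1, which is why neighborhoods with very different venue counts can still be compared on the same scale.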

#### Let's confirm the new size

In [None]:
amsterdam_grouped.shape

Let's print each neighborhood along with the top 5 most common venues

In [None]:
num_top_venues = 5

for hood in amsterdam_grouped['Neighborhood']:
    print("---- "+hood+" ----")
    temp = amsterdam_grouped[amsterdam_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = amsterdam_grouped['Neighborhood']

for ind in np.arange(amsterdam_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(amsterdam_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted
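
As a rough illustration of what the cell above produces for a single neighborhood, the sketch below ranks one toy row of category frequencies and pairs the result with the same style of column labels. The frequencies are invented, and `ordinal` and `top_categories` are hypothetical helper names, not part of the notebook.

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... using the same rule as the column names above."""
    suffixes = {1: "st", 2: "nd", 3: "rd"}
    return "{}{}".format(n, suffixes.get(n, "th"))


def top_categories(freq_by_category, num_top):
    """Return the num_top category names with the highest frequency."""
    # sorted() is stable, so tied categories keep their original order.
    ranked = sorted(freq_by_category.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:num_top]]


# Invented frequencies for one neighborhood.
row = {"Bar": 0.50, "Hotel": 0.25, "Park": 0.20, "Museum": 0.05}

labels = ["{} Most Common Venue".format(ordinal(i + 1)) for i in range(3)]
summary = dict(zip(labels, top_categories(row, 3)))
print(summary)
```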

<a id='item4'></a>

## 4. Cluster Neighborhoods

Below is the dataset that we are going to use to cluster Amsterdam neighborhoods based on most popular places.

In [None]:
amsterdam_grouped_clustering = amsterdam_grouped.drop('Neighborhood', axis=1)
amsterdam_grouped_clustering

The KMeans class has many parameters that can be used, but we will be using these three:
<ul>
    <li> <b>init</b>: Initialization method of the centroids. </li>
    <ul>
        <li> Value will be: "k-means++" </li>
        <li> k-means++: Selects initial cluster centers for k-means clustering in a smart way to speed up convergence.</li>
    </ul>
    <li> <b>n_clusters</b>: The number of clusters to form as well as the number of centroids to generate. </li>
    <ul> <li> Value will be: 5</li> </ul>
    <li> <b>n_init</b>: Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia. </li>
    <ul> <li> Value will be: 12 </li> </ul>
</ul>

But how can we choose the right value for *k*? The general solution is to run *k*-means with different values of *k*, and choose the *k* that works best for the model.  
For each value of *k*, we will initialise *k*-means and use the inertia attribute to obtain the sum of squared distances of samples to their nearest cluster centre.

In [None]:
# Best k
# Cap Ks at the number of neighborhoods: k-means cannot form more clusters than there are samples
Ks = min(45, amsterdam_grouped_clustering.shape[0])
Sum_of_squared_distances =  np.zeros((Ks-1))

#Run k-means with different Ks 
for k in range(1,Ks):
    k_means = KMeans(init = "k-means++", n_clusters = k, n_init = 12, random_state=0).fit(amsterdam_grouped_clustering)    
    Sum_of_squared_distances[k-1]=k_means.inertia_
    
Sum_of_squared_distances

As k increases, the sum of squared distances tends to zero. If we set k to its maximum value n (where n is the number of samples), each sample forms its own cluster, so the sum of squared distances equals zero.

Below is a plot of the sum of squared distances for k in the range specified above. If the plot looks like an arm, then the elbow on the arm is the optimal k.

In [None]:
k= range(1,Ks)

fig = plt.figure()
fig.set_figwidth(12) # set width
fig.set_figheight(6) # set height

plt.plot(k, Sum_of_squared_distances,'o-')
plt.title('Elbow Method For Optimal k')
plt.ylabel('Sum_of_squared_distances')
plt.xlabel('k')
plt.tight_layout()
plt.xticks(k,k)
plt.show()
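
To make the quantity on the y-axis concrete, here is a tiny hand computation of the sum of squared distances on four invented one-dimensional points. The centroids are picked by hand (the means of the obvious groups) rather than fitted by k-means, so this is only a sketch of the definition, not of the algorithm.

```python
def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


# Four invented 1-D samples forming two obvious groups.
points = [1.0, 2.0, 9.0, 10.0]

inertia_k1 = inertia(points, [5.5])        # k = 1: one centroid at the overall mean
inertia_k2 = inertia(points, [1.5, 9.5])   # k = 2: one centroid per group
inertia_kn = inertia(points, points)       # k = n: every sample is its own centroid

print(inertia_k1, inertia_k2, inertia_kn)
```

The large drop from k = 1 to k = 2, followed by only a small further gain up to k = n, is the kind of bend the elbow plot is meant to reveal.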

Let's now use the Silhouette method to determine the best value for *k*.  

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the mean distance between a sample and the points of the nearest cluster that the sample is not a part of. Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.

The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.
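
The formula (b - a) / max(a, b) can be checked by hand on a toy example. The sketch below uses four invented one-dimensional points and plain Python. The function names are hypothetical, and returning 0 for a sample that is alone in its cluster is an assumed convention.

```python
def mean_distance(x, others):
    """Mean absolute distance from x to every value in others."""
    return sum(abs(x - o) for o in others) / len(others)


def silhouette(i, points, labels):
    """Silhouette coefficient (b - a) / max(a, b) of sample i."""
    x, own = points[i], labels[i]
    same = [p for j, (p, l) in enumerate(zip(points, labels)) if l == own and j != i]
    if not same:
        return 0.0  # assumed convention for a sample alone in its cluster
    a = mean_distance(x, same)  # mean intra-cluster distance
    # b: smallest mean distance to the points of any other cluster
    b = min(
        mean_distance(x, [p for p, l in zip(points, labels) if l == other])
        for other in set(labels) if other != own
    )
    return (b - a) / max(a, b)


points = [1.0, 2.0, 9.0, 10.0]

good_labels = [0, 0, 1, 1]   # groups that match the data
bad_labels = [0, 1, 0, 1]    # groups that cut across the data

good_scores = [silhouette(i, points, good_labels) for i in range(len(points))]
bad_scores = [silhouette(i, points, bad_labels) for i in range(len(points))]

print(good_scores)
print(bad_scores)
```

For the first point under the sensible labelling, a = 1 and b = 8.5, giving (8.5 - 1) / 8.5, close to the best value of 1. Under the scrambled labelling the same point has a = 8 and b = 5, so its coefficient is negative.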

In [None]:
from sklearn.metrics import silhouette_score 

# Best k
# Cap Ks at the number of neighborhoods: the silhouette score needs at most n_samples - 1 clusters
Ks = min(45, amsterdam_grouped_clustering.shape[0])
silhouette_list =  np.zeros((Ks-2))

#Run k-means with different Ks 
for k in range(2,Ks):
    k_means = KMeans(init = "k-means++", n_clusters = k, n_init = 12, random_state = 0).fit(amsterdam_grouped_clustering)    
    silhouette_list[k-2] = silhouette_score(amsterdam_grouped_clustering, k_means.labels_)
    
silhouette_list

In [None]:
k= range(2,Ks)

fig = plt.figure()
fig.set_figwidth(12) # set width
fig.set_figheight(6) # set height

plt.plot(k, silhouette_list,'o-')
plt.title('silhouette_score')
plt.ylabel('silhouette_score')
plt.xlabel('k')
plt.tight_layout()
plt.xticks(k,k)
plt.show()

Let's now try **Agglomerative Clustering**.

In [None]:
from sklearn.cluster import AgglomerativeClustering

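# Best k
# Cap Ks at the number of neighborhoods: the silhouette score needs at most n_samples - 1 clusters
Ks = min(45, amsterdam_grouped_clustering.shape[0])
silhouette_list =  np.zeros((Ks-2))

#Run agglomerative clustering with different Ks 
for k in range(2,Ks):
    clusterer = AgglomerativeClustering(n_clusters = k, linkage='average').fit(amsterdam_grouped_clustering)    
    silhouette_list[k-2] = silhouette_score(amsterdam_grouped_clustering, clusterer.labels_)
    
silhouette_list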

In [None]:
k= range(2,Ks)

fig = plt.figure()
fig.set_figwidth(12) # set width
fig.set_figheight(6) # set height

plt.plot(k, silhouette_list,'o-')
plt.title('silhouette_score')
plt.ylabel('silhouette_score')
plt.xlabel('k')
plt.tight_layout()
plt.xticks(k,k)
plt.show()

In [None]:
# set number of clusters
kclusters = 10
# note: despite its name, k_means holds an agglomerative (Ward linkage) model here
k_means = AgglomerativeClustering(n_clusters = kclusters, linkage='ward')
# run agglomerative clustering
k_means.fit(amsterdam_grouped_clustering)

# check cluster labels generated for each row in the dataframe
k_means.labels_

In [None]:
from scipy.spatial import distance_matrix 
dist_matrix = distance_matrix(amsterdam_grouped_clustering,amsterdam_grouped_clustering) 
df = pd.DataFrame(data=dist_matrix)
df.describe()

In [None]:
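from scipy.cluster import hierarchy 

# linkage expects the raw observations (or a condensed distance vector),
# not a square distance matrix, so pass the feature table directly
Z = hierarchy.linkage(amsterdam_grouped_clustering, 'ward')

fig = plt.figure(figsize=(18,10))

dendro = hierarchy.dendrogram(Z)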

Let's now try **DBSCAN** (Density-Based Spatial Clustering of Applications with Noise).

In [None]:
from sklearn.cluster import DBSCAN

epsilon = 0.4
minimumSamples = 7
db = DBSCAN(eps=epsilon, min_samples=minimumSamples).fit(amsterdam_grouped_clustering)
labels = db.labels_


unique, counts = np.unique(db.labels_, return_counts=True)
for u, c in zip(unique, counts):
    print ('{}:{}'.format(u,c))

***KMEANS!!!!!***

In the elbow plot above, the elbow is at k=7, indicating that the optimal k for this dataset is 7.  
Let's now cluster the neighborhoods into 7 groups. 

In [None]:
'''# set number of clusters
kclusters = 2
k_means = KMeans(init = "k-means++", n_clusters = kclusters, n_init = 12)
# run k-means clustering
k_means.fit(amsterdam_grouped_clustering)

# check cluster labels generated for each row in the dataframe
k_means.labels_
'''

In [None]:
unique, counts = np.unique(k_means.labels_, return_counts=True)
for u, c in zip(unique, counts):
    print ('{}:{}'.format(u,c))

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [None]:
# add clustering labels
neighborhoods_venues_labeled = neighborhoods_venues_sorted.copy()
neighborhoods_venues_labeled.insert(0, 'Cluster Labels', k_means.labels_)

amsterdam_merged = neighborhoods.copy()

# merge neighborhoods_venues_labeled with neighborhoods to add latitude/longitude for each neighborhood
amsterdam_merged = amsterdam_merged.join(neighborhoods_venues_labeled.set_index('Neighborhood'), on='Neighborhood')

amsterdam_merged.head() # check the last columns!

In [None]:
amsterdam_merged['Cluster Labels'].value_counts()

Finally, let's visualize the resulting clusters

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(amsterdam_merged['Latitude'], amsterdam_merged['Longitude'], amsterdam_merged['Neighborhood'], amsterdam_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ', Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. We will start counting from **Cluster 0**. I will leave this exercise to you.

#### Cluster 0

In [None]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 0, amsterdam_merged.columns[[1] + list(range(5, amsterdam_merged.shape[1]))]]

#### Cluster 1

In [None]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 1, amsterdam_merged.columns[[1] + list(range(5, amsterdam_merged.shape[1]))]]

#### Cluster 2

In [None]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 2, amsterdam_merged.columns[[1] + list(range(5, amsterdam_merged.shape[1]))]]

#### Cluster 3

In [None]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 3, amsterdam_merged.columns[[1] + list(range(5, amsterdam_merged.shape[1]))]]

#### Cluster 4

In [None]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 4, amsterdam_merged.columns[[1] + list(range(5, amsterdam_merged.shape[1]))]]