# APPLIED DATA SCIENCE CAPSTONE PROJECT
## The Battle of Neighbourhoods

## INTRODUCTION
In this project we will be analyzing neighbourhoods of Basel City, Switzerland, and cluster them using k-Means clustering algorithm to identify those that would suit individual's taste for moving best.

Since Basel is a relatively small city with the population under 200,000 and only 19 neighbourhoods (known as quartieres), commute usually is not an issue. But identifying personal preferences is still essensial to improve daily convenience, especially during the uncertain COVID-times, when many are doing home-office and restrictions for movement often apply.

We will use data science to collect and analyze available information to generate several sets of neighbourhoods fulfilling different requirements. That would allow the individuals to make an informed decision on which area of the city they would like to reside in. 

## DATA
Based on definition of the problem, factors that may influence the decision are:
- population density
- number of available venues
- types of available venues

We decided to use all available quartieres within the city. Thus we will need to extract the following information:
- names and postal codes of Basel quartieres
- geolocation information of all neighbourhoods, collected using **GeoPy Geocoders**
- number and types of venues in each quartiere, obtained using **Foursquare API**

### 1. Download all necessary libraries

Before we continue on the project, let us prepare all the necessary libraries.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

from bs4 import BeautifulSoup

#!conda install -c conda-forge geopy --yes # uncomment if needed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes # uncomment if necessary
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python-3.7-main

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |           1_llvm           5 KB  conda-forge
    _py-xgboost-mutex-2.0      |            cpu_0           8 KB  conda-forge
    _pytorch_select-0.2        |            gpu_0           2 KB
    absl-py-0.13.0             |     pyhd8ed1ab_0          97 KB  conda-forge
    aiohttp-3.7.4.post0        |   py37h5e8e339_0  

### 2. Collecting Basel neighbourhoods data
Scraping the relevant web-page for Basel district data and creating a dataframe with Postal Codes and Quartieres' names.

In [2]:
url = 'https://www.plz-suche.org/basel-ch7874'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html5lib')

In [3]:
# finding the right table
table = soup.find('table', {'class': 'list-location tablesorter tablesorter-location'})

In [4]:
# scraping the table
table_contents=[]

for row in table.findAll('tr'):
    cell = []
    for td in row:
        try:
            cell.append(td.text.replace('\n', ''))
        except:
            continue
            
    if len(cell) > 0:
        table_contents.append(cell)
    
print(table_contents)

[['PLZ', 'Name', 'Typ', '\xa0'], ['4001-4051', 'Altstadt Grossbasel', 'Quartier', ''], ['4058', 'Altstadt Kleinbasel', 'Quartier', ''], ['4051-4056', 'Am Ring', 'Quartier', ''], ['4054', 'Bachletten', 'Quartier', ''], ['4052', 'Breite', 'Quartier', ''], ['4059', 'Bruderholz', 'Quartier', ''], ['4058', 'Clara', 'Quartier', ''], ['4054', 'Gotthelf', 'Quartier', ''], ['4053', 'Gundeldingen', 'Quartier', ''], ['4058', 'Hirzbrunnen', 'Quartier', ''], ['4055', 'Iselin', 'Quartier', ''], ['4057', 'Kleinhüningen', 'Quartier', ''], ['4057', 'Klybeck', 'Quartier', ''], ['4057', 'Matthäus', 'Quartier', ''], ['4058', 'Rosental', 'Quartier', ''], ['4052', 'Sankt Alban', 'Quartier', ''], ['4056', 'Sankt Johann', 'Quartier', ''], ['4051', 'Vorstädte', 'Quartier', ''], ['4058', 'Wettstein', 'Quartier', '']]


In [10]:
# creating a dataframe
df = pd.DataFrame(table_contents)
print(df.head(10))

           0                    1         2  3
0        PLZ                 Name       Typ   
1  4001-4051  Altstadt Grossbasel  Quartier   
2       4058  Altstadt Kleinbasel  Quartier   
3  4051-4056              Am Ring  Quartier   
4       4054           Bachletten  Quartier   
5       4052               Breite  Quartier   
6       4059           Bruderholz  Quartier   
7       4058                Clara  Quartier   
8       4054             Gotthelf  Quartier   
9       4053         Gundeldingen  Quartier   


In [11]:
# cleaning up the data
df.drop([2, 3], axis = 1, inplace = True)
df.drop(0, axis = 0, inplace = True)
df.columns = ['Postal Code', 'Quartiere']

In [12]:
print(df)
print('There are {} quartieres in Basel City.'. format(df.shape[0]))

   Postal Code            Quartiere
1    4001-4051  Altstadt Grossbasel
2         4058  Altstadt Kleinbasel
3    4051-4056              Am Ring
4         4054           Bachletten
5         4052               Breite
6         4059           Bruderholz
7         4058                Clara
8         4054             Gotthelf
9         4053         Gundeldingen
10        4058          Hirzbrunnen
11        4055               Iselin
12        4057        Kleinhüningen
13        4057              Klybeck
14        4057             Matthäus
15        4058             Rosental
16        4052          Sankt Alban
17        4056         Sankt Johann
18        4051            Vorstädte
19        4058            Wettstein
There are 19 quartieres in Basel City.


Their postal codes are not unique, as they overlap over neighbouring districts. 

Let's collect the latitude and longitude values for each quartiere. This information will be required for obtaining venue information for each district.

In [13]:
# collecting the geospacial data for the quartieres
geolocator = Nominatim(user_agent="ny_explorer")
coords = []

for quartiere in df['Quartiere']:
    latlon = {}
    address = quartiere + ', Basel, Switzerland'
    location = geolocator.geocode(address)
    latlon['Quartiere'] = quartiere
    latlon['Latitude'] = location.latitude
    latlon['Longitude'] = location.longitude
    coords.append(latlon)

In [14]:
# converting coordinates data into a dataframe
coordinates = pd.DataFrame(coords)

In [15]:
basel_data = pd.merge(df, coordinates, left_on = 'Quartiere', right_on = 'Quartiere', how = 'left')
print(basel_data)

   Postal Code            Quartiere   Latitude  Longitude
0    4001-4051  Altstadt Grossbasel  47.556427   7.588259
1         4058  Altstadt Kleinbasel  47.560700   7.593382
2    4051-4056              Am Ring  47.558774   7.577477
3         4054           Bachletten  47.548566   7.571726
4         4052               Breite  47.551809   7.617853
5         4059           Bruderholz  47.530799   7.591624
6         4058                Clara  47.564085   7.596629
7         4054             Gotthelf  47.555819   7.570952
8         4053         Gundeldingen  47.543219   7.591485
9         4058          Hirzbrunnen  47.568873   7.615470
10        4055               Iselin  47.562196   7.565999
11        4057        Kleinhüningen  47.583376   7.597574
12        4057              Klybeck  47.576798   7.590149
13        4057             Matthäus  47.567439   7.591540
14        4058             Rosental  47.567708   7.601491
15        4052          Sankt Alban  47.549565   7.605052
16        4056

Let's create a map of Basel with quartieres superimposed on top.

In [16]:
address = 'Basel, Switzerland'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Basel are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Basel are 47.5581077, 7.5878261.


In [18]:
# create map of Basel using latitude and longitude values
map_Basel = folium.Map(location = [latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, postcode, neighborhood in zip(basel_data['Latitude'], basel_data['Longitude'], basel_data['Postal Code'], basel_data['Quartiere']):
    label = '{}, {}'.format(postcode, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_Basel)  
    
map_Basel

### 3. Explore Basel neighbourhoods

#### 3.1
Define Foursquare Credentials and Version:

In [19]:
CLIENT_ID = '3YSYZDN33OA2YAUZJSOAPVBVKNO1BJMY53IJGT4ZL3YK2G10' # your Foursquare ID
CLIENT_SECRET = 'BSRXIDKQFN1BXCB0XZR131L45Z32MAH4FY3RAB2JWCBUNEZS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 3YSYZDN33OA2YAUZJSOAPVBVKNO1BJMY53IJGT4ZL3YK2G10
CLIENT_SECRET:BSRXIDKQFN1BXCB0XZR131L45Z32MAH4FY3RAB2JWCBUNEZS


#### 3.2
Explore the first neighbourhood in Basel:

In [20]:
print('First neighbourhood on the list is {}.'.format(basel_data.loc[0,'Quartiere']))

neighborhood_latitude = basel_data.loc[0, 'Latitude'] # neighbourhood's latitude value
neighborhood_longitude = basel_data.loc[0, 'Longitude'] # neighbourhood's longitude value

neighborhood_name = basel_data.loc[0, 'Quartiere'] # neighbourhood's name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

First neighbourhood on the list is Altstadt Grossbasel.
Latitude and longitude values of Altstadt Grossbasel are 47.5564274, 7.5882594.


Lets get the first 100 venues in this neighbourhood within a radius of 500 meters and examine the results, if needed.

In [21]:
radius = 500
url = 'http://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)

results = requests.get(url).json()
#results

Define the function that would retrieve the information about every quartiere individually.

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Quartiere', 
                  'Quartiere Latitude', 
                  'Quartiere Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Write the code to run the above function on each neighbourhood and create a new dataframe called _basel_venues_:

In [23]:
basel_venues = getNearbyVenues(names = basel_data['Quartiere'], latitudes = basel_data['Latitude'], longitudes = basel_data['Longitude'])

Altstadt Grossbasel
Altstadt Kleinbasel
Am Ring
Bachletten
Breite
Bruderholz
Clara
Gotthelf
Gundeldingen
Hirzbrunnen
Iselin
Kleinhüningen
Klybeck
Matthäus
Rosental
Sankt Alban
Sankt Johann
Vorstädte
Wettstein


In [24]:
print(basel_venues.shape)
print(basel_venues.head())

(503, 7)
             Quartiere  Quartiere Latitude  Quartiere Longitude  \
0  Altstadt Grossbasel           47.556427             7.588259   
1  Altstadt Grossbasel           47.556427             7.588259   
2  Altstadt Grossbasel           47.556427             7.588259   
3  Altstadt Grossbasel           47.556427             7.588259   
4  Altstadt Grossbasel           47.556427             7.588259   

                      Venue  Venue Latitude  Venue Longitude  Venue Category  
0  The Bird's Eye Jazz Club       47.554796         7.587777       Jazz Club  
1  Naturhistorisches Museum       47.557664         7.590572  Science Museum  
2       Der Teufelhof Basel       47.555893         7.586578           Hotel  
3       Museum der Kulturen       47.557108         7.590558          Museum  
4                Marktplatz       47.558128         7.587754           Plaza  


Let's check how many venues were returned for each neighborhood.

In [25]:
basel_venues_count = basel_venues.loc[:, ['Quartiere', 'Venue']].groupby('Quartiere').count()
basel_venues_count.rename(columns = {'Venue': 'Number of venues'}, inplace = True)
print(basel_venues_count)

                     Number of venues
Quartiere                            
Altstadt Grossbasel                65
Altstadt Kleinbasel                65
Am Ring                            15
Bachletten                          4
Breite                              8
Bruderholz                          7
Clara                              55
Gotthelf                           15
Gundeldingen                       39
Hirzbrunnen                         6
Iselin                              5
Kleinhüningen                      13
Klybeck                             9
Matthäus                           23
Rosental                           29
Sankt Alban                        10
Sankt Johann                       11
Vorstädte                         100
Wettstein                          24


In [26]:
print('There are {} uniques categories.'.format(len(basel_venues['Venue Category'].unique())))

There are 111 uniques categories.


#### 3.3
We are now analyzing each neighbourhood using one hot encoding.

In [27]:
# one hot encoding
basel_onehot = pd.get_dummies(basel_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
basel_onehot['Quartiere'] = basel_venues['Quartiere'] 

# move neighborhood column to the first column
fixed_columns = [basel_onehot.columns[-1]] + list(basel_onehot.columns[:-1])
basel_onehot = basel_onehot[fixed_columns]

print(basel_onehot.head())

             Quartiere  Art Gallery  Art Museum  Asian Restaurant  \
0  Altstadt Grossbasel            0           0                 0   
1  Altstadt Grossbasel            0           0                 0   
2  Altstadt Grossbasel            0           0                 0   
3  Altstadt Grossbasel            0           0                 0   
4  Altstadt Grossbasel            0           0                 0   

   Athletics & Sports  BBQ Joint  Bagel Shop  Bakery  Bar  Beer Garden  \
0                   0          0           0       0    0            0   
1                   0          0           0       0    0            0   
2                   0          0           0       0    0            0   
3                   0          0           0       0    0            0   
4                   0          0           0       0    0            0   

   Beer Store  Bike Shop  Bookstore  Botanical Garden  Boutique  Brewery  \
0           0          0          0                 0         0 

In [28]:
basel_onehot.shape

(503, 112)

Next, let's group rows by quartieres and by taking the mean of the frequency of occurrence of each category.

In [29]:
basel_grouped = basel_onehot.groupby('Quartiere').mean().reset_index()
print(basel_grouped)

              Quartiere  Art Gallery  Art Museum  Asian Restaurant  \
0   Altstadt Grossbasel     0.000000    0.015385          0.000000   
1   Altstadt Kleinbasel     0.000000    0.000000          0.000000   
2               Am Ring     0.000000    0.000000          0.000000   
3            Bachletten     0.000000    0.000000          0.000000   
4                Breite     0.000000    0.000000          0.000000   
5            Bruderholz     0.000000    0.000000          0.000000   
6                 Clara     0.000000    0.000000          0.018182   
7              Gotthelf     0.000000    0.000000          0.000000   
8          Gundeldingen     0.000000    0.000000          0.025641   
9           Hirzbrunnen     0.000000    0.000000          0.000000   
10               Iselin     0.000000    0.000000          0.000000   
11        Kleinhüningen     0.000000    0.000000          0.000000   
12              Klybeck     0.000000    0.000000          0.000000   
13             Matth

In [30]:
basel_grouped.shape

(19, 112)

#### 3.4
Let's print each quartiere along with the top 5 most common venues.

In [34]:
# number of top venues
num_top_venues = 4

for quart in basel_grouped['Quartiere']:
    print("----------"+quart+"----------")
    print('---Number of venues: {}.'.format(basel_venues_count.loc[quart, 'Number of venues']))
    
    temp = basel_grouped[basel_grouped['Quartiere'] == quart].T.reset_index()
    temp.columns = ['Type of venue','Frequency']
    temp = temp.iloc[1:]
    temp['Frequency'] = temp['Frequency'].astype(float)
    temp = temp.round({'Frequency': 2})
    print(temp.sort_values('Frequency', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----------Altstadt Grossbasel----------
---Number of venues: 65.
       Type of venue  Frequency
0                Bar       0.08
1  French Restaurant       0.06
2              Hotel       0.06
3               Café       0.06


----------Altstadt Kleinbasel----------
---Number of venues: 65.
        Type of venue  Frequency
0               Hotel       0.14
1  Italian Restaurant       0.08
2                 Bar       0.06
3                Café       0.06


----------Am Ring----------
---Number of venues: 15.
               Type of venue  Frequency
0               Tram Station       0.20
1                      Hotel       0.13
2                Bus Station       0.07
3  Middle Eastern Restaurant       0.07


----------Bachletten----------
---Number of venues: 4.
          Type of venue  Frequency
0           Zoo Exhibit       0.25
1                 Plaza       0.25
2  Fast Food Restaurant       0.25
3                  Park       0.25


----------Breite----------
---Number of venues: 8.
  T

Let's put this info _pandas_ dataframe. First, write a function to sort the venues in descending order.

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 5 venues for each neighborhood.

In [36]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Quartiere']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind + 1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['Quartiere'] = basel_grouped['Quartiere']

for ind in np.arange(basel_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(basel_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.head)

<bound method NDFrame.head of               Quartiere 1st Most Common Venue 2nd Most Common Venue  \
0   Altstadt Grossbasel                   Bar    Italian Restaurant   
1   Altstadt Kleinbasel                 Hotel    Italian Restaurant   
2               Am Ring          Tram Station                 Hotel   
3            Bachletten           Zoo Exhibit                 Plaza   
4                Breite          Tram Station            Restaurant   
5            Bruderholz          Tram Station      Swiss Restaurant   
6                 Clara                 Hotel            Restaurant   
7              Gotthelf          Tram Station     Indian Restaurant   
8          Gundeldingen           Supermarket      Swiss Restaurant   
9           Hirzbrunnen           Supermarket                  Pool   
10               Iselin              Bus Stop           Coffee Shop   
11        Kleinhüningen  Fast Food Restaurant      Toy / Game Store   
12              Klybeck           Gaming Cafe  

### 4. Clustering neighbourhoods


#### 4.1
Run k-means to cluster the neighborhood into 5 clusters.

In [37]:
# set number of clusters
kclusters = 5

basel_grouped_clustering = basel_grouped.drop('Quartiere', 1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(basel_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_)

[0 0 1 2 4 4 0 1 0 1 3 0 0 0 0 4 0 0 0]


Let's create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.

In [38]:
basel_venues_count.reset_index(inplace = True)
print(basel_venues_count.head())

             Quartiere  Number of venues
0  Altstadt Grossbasel                65
1  Altstadt Kleinbasel                65
2              Am Ring                15
3           Bachletten                 4
4               Breite                 8


In [42]:
# add clustering labels
# uncomment for the first run
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

basel_merged = basel_data

basel_merged = basel_merged.join(basel_venues_count.set_index('Quartiere'), on = 'Quartiere')

# merge basel_grouped with basel_data to add latitude/longitude for each neighborhood
basel_merged = basel_merged.join(neighborhoods_venues_sorted.set_index('Quartiere'), on = 'Quartiere')

In [43]:
print(basel_merged)

   Postal Code            Quartiere   Latitude  Longitude  Number of venues  \
0    4001-4051  Altstadt Grossbasel  47.556427   7.588259                65   
1         4058  Altstadt Kleinbasel  47.560700   7.593382                65   
2    4051-4056              Am Ring  47.558774   7.577477                15   
3         4054           Bachletten  47.548566   7.571726                 4   
4         4052               Breite  47.551809   7.617853                 8   
5         4059           Bruderholz  47.530799   7.591624                 7   
6         4058                Clara  47.564085   7.596629                55   
7         4054             Gotthelf  47.555819   7.570952                15   
8         4053         Gundeldingen  47.543219   7.591485                39   
9         4058          Hirzbrunnen  47.568873   7.615470                 6   
10        4055               Iselin  47.562196   7.565999                 5   
11        4057        Kleinhüningen  47.583376   7.5

#### 4.2
Finally, let's visualize the resulting clusters.

In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(basel_merged['Latitude'], basel_merged['Longitude'], basel_merged['Quartiere'], basel_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[int(cluster)-1],
        fill = True,
        fill_color = rainbow[int(cluster)-1],
        fill_opacity = 0.7).add_to(map_clusters)
       
map_clusters

#### 4.3
Let's examine the resulting clusters in order to make the decision about what neighourhood would be the most suitable for the move.

In [45]:
print(basel_merged.loc[basel_merged['Cluster Labels'] == 0, basel_merged.columns[[1,4] + list(range(6, basel_merged.shape[1]))]])

              Quartiere  Number of venues 1st Most Common Venue  \
0   Altstadt Grossbasel                65                   Bar   
1   Altstadt Kleinbasel                65                 Hotel   
6                 Clara                55                 Hotel   
8          Gundeldingen                39           Supermarket   
11        Kleinhüningen                13  Fast Food Restaurant   
12              Klybeck                 9           Gaming Cafe   
13             Matthäus                23           Beer Garden   
14             Rosental                29                 Hotel   
16         Sankt Johann                11           Supermarket   
17            Vorstädte               100    Italian Restaurant   
18            Wettstein                24     Food & Drink Shop   

   2nd Most Common Venue 3rd Most Common Venue  4th Most Common Venue  \
0     Italian Restaurant                 Hotel      French Restaurant   
1     Italian Restaurant                   Bar   

In [46]:
print(basel_merged.loc[basel_merged['Cluster Labels'] == 1, basel_merged.columns[[1,4] + list(range(6, basel_merged.shape[1]))]])

     Quartiere  Number of venues 1st Most Common Venue 2nd Most Common Venue  \
2      Am Ring                15          Tram Station                 Hotel   
7     Gotthelf                15          Tram Station     Indian Restaurant   
9  Hirzbrunnen                 6           Supermarket                  Pool   

  3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  
2         Historic Site           Supermarket        Sandwich Place  
7              Bus Stop           Bus Station                 Plaza  
9         Grocery Store          Tram Station          Skating Rink  


In [47]:
print(basel_merged.loc[basel_merged['Cluster Labels'] == 2, basel_merged.columns[[1,4] + list(range(6, basel_merged.shape[1]))]])

    Quartiere  Number of venues 1st Most Common Venue 2nd Most Common Venue  \
3  Bachletten                 4           Zoo Exhibit                 Plaza   

  3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  
3                  Park  Fast Food Restaurant                Hostel  


In [48]:
print(basel_merged.loc[basel_merged['Cluster Labels'] == 3, basel_merged.columns[[1,4] + list(range(6, basel_merged.shape[1]))]])

   Quartiere  Number of venues 1st Most Common Venue 2nd Most Common Venue  \
10    Iselin                 5              Bus Stop           Coffee Shop   

   3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  
10    Mexican Restaurant           Bus Station           Zoo Exhibit  


In [49]:
print(basel_merged.loc[basel_merged['Cluster Labels'] == 4, basel_merged.columns[[1,4] + list(range(6, basel_merged.shape[1]))]])

      Quartiere  Number of venues 1st Most Common Venue 2nd Most Common Venue  \
4        Breite                 8          Tram Station            Restaurant   
5    Bruderholz                 7          Tram Station      Swiss Restaurant   
15  Sankt Alban                10          Tram Station                  Park   

   3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  
4            Supermarket                  Park    Italian Restaurant  
5         Scenic Lookout    Italian Restaurant                  Park  
15         Moving Target           Supermarket             BBQ Joint  


The neighbourhoods are clustered into 5 groups by venues' types.

## Conclusion

We have clustered the quarieres in Basel based on the venue types. To give a more detailed outlook of results, the map with plotted quartieres is available. There are only 19 neighbourhoods, a few of which contain very small number of venues (<10). The 2nd cluster (in red) seems to be the most popular type in Basel. It contains multiple shops, restraunts and entertainment venues. Clusters 3 and 4 don't appear to have easily accessible supermarkets.