# APPLIED DATA SCIENCE CAPSTONE PROJECT
## The Battle of Neighbourhoods

In this project I analyse the neighbourhoods of Basel City, Switzerland, and cluster them with the k-means algorithm to identify the ones that best suit my preferences for a move.

Questions to be addressed:

1. What are the features I am looking for in the neighbourhood?
2. What kind of data is required?
3. Where to collect the data from? 
4. How to collect the data?
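The clustering idea behind this project can be sketched without any libraries. The following is a minimal Lloyd's-algorithm k-means, assuming each quartiere is described by a numeric profile vector such as the share of each venue category. The notebook itself relies on scikit-learn's `KMeans`; the function names `kmeans` and `squared_distance` and the toy profiles `A` to `D` are invented purely for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance; enough for picking the nearest centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k distinct input points
    labels = None
    for _ in range(n_iter):
        # assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Invented toy profiles: share of (cafés, restaurants, parks) among a quartiere's venues
profiles = {
    'A': (0.9, 0.1, 0.0),
    'B': (0.8, 0.2, 0.0),
    'C': (0.1, 0.1, 0.8),
    'D': (0.0, 0.2, 0.8),
}
labels, centroids = kmeans(list(profiles.values()), k=2)
print(dict(zip(profiles, labels)))
```

The two café-heavy profiles end up in one cluster and the two park-heavy profiles in the other; with real data the profile vectors would come from the Foursquare venue categories collected later.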


### 1. Import all necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from the clustering stage
from sklearn.cluster import KMeans

from bs4 import BeautifulSoup # library to parse HTML pages

print('Libraries imported.')

Libraries imported.


### 2. Collecting Basel neighbourhoods data
Scraping the relevant web page for Basel district data and creating a dataframe with the postal code and name of each quartiere.

In [2]:
url = 'https://www.plz-suche.org/basel-ch7874'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html5lib')

In [3]:
# finding the right table
table = soup.find('table', {'class': 'list-location tablesorter tablesorter-location'})

In [4]:
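# scraping the table: one list of cell texts per row
table_contents = []

for row in table.find_all('tr'):
    # only the row's own header/data cells, which skips the whitespace
    # text nodes between tags (previously swallowed by a bare except)
    cell = [td.get_text().replace('\n', '') for td in row.find_all(['th', 'td'], recursive=False)]

    if len(cell) > 0:
        table_contents.append(cell)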
    
print(table_contents)

[['PLZ', 'Name', 'Typ', '\xa0'], ['4001-4051', 'Altstadt Grossbasel', 'Quartier', ''], ['4058', 'Altstadt Kleinbasel', 'Quartier', ''], ['4051-4056', 'Am Ring', 'Quartier', ''], ['4054', 'Bachletten', 'Quartier', ''], ['4052', 'Breite', 'Quartier', ''], ['4059', 'Bruderholz', 'Quartier', ''], ['4058', 'Clara', 'Quartier', ''], ['4054', 'Gotthelf', 'Quartier', ''], ['4053', 'Gundeldingen', 'Quartier', ''], ['4058', 'Hirzbrunnen', 'Quartier', ''], ['4055', 'Iselin', 'Quartier', ''], ['4057', 'Kleinhüningen', 'Quartier', ''], ['4057', 'Klybeck', 'Quartier', ''], ['4057', 'Matthäus', 'Quartier', ''], ['4058', 'Rosental', 'Quartier', ''], ['4052', 'Sankt Alban', 'Quartier', ''], ['4056', 'Sankt Johann', 'Quartier', ''], ['4051', 'Vorstädte', 'Quartier', ''], ['4058', 'Wettstein', 'Quartier', '']]
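The row-by-row extraction above can also be sketched with the standard library alone, which may help when BeautifulSoup or html5lib is not installed. This is a minimal sketch: the class `TableRowParser` is a hypothetical helper, and the HTML snippet is invented to mimic the shape of the PLZ table, not copied from the real page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # ignore rows without cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # text outside a cell (e.g. indentation between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)


# Invented snippet shaped like the PLZ table (not the real page markup)
snippet = """
<table>
  <tr><th>PLZ</th><th>Name</th><th>Typ</th></tr>
  <tr><td>4054</td><td>Bachletten</td><td>Quartier</td></tr>
  <tr><td>4059</td><td>Bruderholz</td><td>Quartier</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(snippet)
print(parser.rows)
```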


In [5]:
# creating a dataframe
df = pd.DataFrame(table_contents)
print(df.head(10))

           0                    1         2  3
0        PLZ                 Name       Typ   
1  4001-4051  Altstadt Grossbasel  Quartier   
2       4058  Altstadt Kleinbasel  Quartier   
3  4051-4056              Am Ring  Quartier   
4       4054           Bachletten  Quartier   
5       4052               Breite  Quartier   
6       4059           Bruderholz  Quartier   
7       4058                Clara  Quartier   
8       4054             Gotthelf  Quartier   
9       4053         Gundeldingen  Quartier   


In [6]:
# cleaning up the data
df.drop([2, 3], axis = 1, inplace = True)
df.drop(0, axis = 0, inplace = True)
df.columns = ['Postal Code', 'Quartiere']

In [7]:
print(df)
print('There are {} quartieres in Basel City.'.format(df.shape[0]))

   Postal Code            Quartiere
1    4001-4051  Altstadt Grossbasel
2         4058  Altstadt Kleinbasel
3    4051-4056              Am Ring
4         4054           Bachletten
5         4052               Breite
6         4059           Bruderholz
7         4058                Clara
8         4054             Gotthelf
9         4053         Gundeldingen
10        4058          Hirzbrunnen
11        4055               Iselin
12        4057        Kleinhüningen
13        4057              Klybeck
14        4057             Matthäus
15        4058             Rosental
16        4052          Sankt Alban
17        4056         Sankt Johann
18        4051            Vorstädte
19        4058            Wettstein
There are 19 quartieres in Basel City.


These postal codes are not unique: several quartieres share the same code (4058 alone covers five of them), and some entries are code ranges that overlap neighbouring districts.
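To see the overlap concretely, here is a small sketch over the (postal code, quartiere) pairs printed above. It assumes that an entry such as `4051-4056` means every code between the two endpoints, which is an interpretation of the site's notation rather than something the page states; `code_range` and `quartieres_for` are hypothetical helper names.

```python
from collections import Counter

# (postal code, quartiere) pairs copied from the scraped table above
pairs = [
    ('4001-4051', 'Altstadt Grossbasel'), ('4058', 'Altstadt Kleinbasel'),
    ('4051-4056', 'Am Ring'), ('4054', 'Bachletten'), ('4052', 'Breite'),
    ('4059', 'Bruderholz'), ('4058', 'Clara'), ('4054', 'Gotthelf'),
    ('4053', 'Gundeldingen'), ('4058', 'Hirzbrunnen'), ('4055', 'Iselin'),
    ('4057', 'Kleinhüningen'), ('4057', 'Klybeck'), ('4057', 'Matthäus'),
    ('4058', 'Rosental'), ('4052', 'Sankt Alban'), ('4056', 'Sankt Johann'),
    ('4051', 'Vorstädte'), ('4058', 'Wettstein'),
]


def code_range(plz):
    """Turn '4051' into (4051, 4051) and '4051-4056' into (4051, 4056)."""
    lo, _, hi = plz.partition('-')
    return int(lo), int(hi or lo)


def quartieres_for(code):
    """All quartieres whose postal code or (assumed) code range contains `code`."""
    matches = []
    for plz, name in pairs:
        lo, hi = code_range(plz)
        if lo <= code <= hi:
            matches.append(name)
    return matches


# exact code strings that appear for more than one quartiere
shared = {plz: n for plz, n in Counter(plz for plz, _ in pairs).items() if n > 1}
print(shared)
print(quartieres_for(4051))
```

Because a postal code cannot identify a quartiere on its own, the quartiere name is the safer key for joining the coordinates later on.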

Let's collect the latitude and longitude values for each quartiere. This information will be required for obtaining venue information for each district.

In [8]:
#!conda install -c conda-forge geopy --yes # uncomment if needed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [9]:
# collecting the geospatial data for the quartieres
geolocator = Nominatim(user_agent="basel_explorer")
coords = []

for quartiere in df['Quartiere']:
    address = quartiere + ', Basel, Switzerland'
    location = geolocator.geocode(address)
    if location is None:  # Nominatim found no match: report it instead of crashing
        print('Could not geocode {}'.format(quartiere))
        continue
    coords.append({'Quartiere': quartiere,
                   'Latitude': location.latitude,
                   'Longitude': location.longitude})
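The loop above sends one Nominatim request per quartiere back to back, while Nominatim's usage policy asks for at most one request per second. geopy ships `geopy.extra.rate_limiter.RateLimiter` for this; the following is only a minimal standard-library sketch of the same idea, with a hypothetical `rate_limited` wrapper and a stand-in function so it runs offline.

```python
import time


def rate_limited(func, min_delay_seconds=1.0):
    """Wrap `func` so consecutive calls start at least
    `min_delay_seconds` apart, sleeping when needed."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


# Demo with a stand-in function and a short delay so it finishes quickly
echo = rate_limited(lambda x: x, min_delay_seconds=0.05)
start = time.monotonic()
results = [echo(i) for i in range(3)]
elapsed = time.monotonic() - start
print(results, round(elapsed, 2))
```

In the notebook, `geocode = rate_limited(geolocator.geocode)` could replace the direct `geolocator.geocode` calls, or, with geopy's own helper, `RateLimiter(geolocator.geocode, min_delay_seconds=1)`, which also retries failed requests.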

In [10]:
# converting coordinates data into a dataframe
coordinates = pd.DataFrame(coords)

In [11]:
basel_data = pd.merge(df, coordinates, on = 'Quartiere', how = 'left')
print(basel_data)

   Postal Code            Quartiere   Latitude  Longitude
0    4001-4051  Altstadt Grossbasel  47.556427   7.588259
1         4058  Altstadt Kleinbasel  47.560700   7.593382
2    4051-4056              Am Ring  47.558774   7.577477
3         4054           Bachletten  47.548566   7.571726
4         4052               Breite  47.551809   7.617853
5         4059           Bruderholz  47.530799   7.591624
6         4058                Clara  47.564085   7.596629
7         4054             Gotthelf  47.555819   7.570952
8         4053         Gundeldingen  47.543219   7.591485
9         4058          Hirzbrunnen  47.568873   7.615470
10        4055               Iselin  47.562196   7.565999
11        4057        Kleinhüningen  47.583376   7.597574
12        4057              Klybeck  47.576798   7.590149
13        4057             Matthäus  47.567439   7.591540
14        4058             Rosental  47.567708   7.601491
15        4052          Sankt Alban  47.549565   7.605052
16        4056
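The `how = 'left'` merge keeps every quartiere from `df`, even one whose coordinates are missing. A toy sketch of that behaviour follows; the two small frames are invented stand-ins shaped like `df` and `coordinates` (the Bachletten values are taken from the output above), and `indicator=True` is a standard `pd.merge` option that records where each row was found.

```python
import pandas as pd

# Tiny frames shaped like df and coordinates; Bruderholz deliberately
# has no coordinates to show what a left merge does with it
toy_df = pd.DataFrame({'Postal Code': ['4054', '4059'],
                       'Quartiere': ['Bachletten', 'Bruderholz']})
toy_coords = pd.DataFrame({'Quartiere': ['Bachletten'],
                           'Latitude': [47.548566],
                           'Longitude': [7.571726]})

# indicator=True adds a '_merge' column: 'both' or 'left_only'
merged = pd.merge(toy_df, toy_coords, on='Quartiere', how='left', indicator=True)
print(merged)

# quartieres that would need a manual fix before plotting or querying venues
print(merged.loc[merged['_merge'] == 'left_only', 'Quartiere'].tolist())
```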

Let's create a map of Basel with quartieres superimposed on top.

In [12]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment if necessary
import folium # map rendering library


In [13]:
address = 'Basel, Switzerland'

geolocator = Nominatim(user_agent="basel_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Basel are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Basel are 47.5581077, 7.5878261.


In [14]:
# create map of Basel using latitude and longitude values
map_Basel = folium.Map(location = [latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, postcode, neighborhood in zip(basel_data['Latitude'], basel_data['Longitude'], basel_data['Postal Code'], basel_data['Quartiere']):
    label = '{}, {}'.format(postcode, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7).add_to(map_Basel)
    
map_Basel

### 3. Explore Basel neighbourhoods

#### 3.1
Define Foursquare Credentials and Version:

In [15]:
import os

# read the credentials from environment variables instead of hard-coding them;
# the variable names are placeholders, use whatever your setup defines
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET)) # never print the secret itself


#### 3.2
Explore the first neighbourhood in Basel:

In [16]:
print('First neighbourhood on the list is {}.'.format(basel_data.loc[0,'Quartiere']))

neighborhood_latitude = basel_data.loc[0, 'Latitude'] # neighbourhood's latitude value
neighborhood_longitude = basel_data.loc[0, 'Longitude'] # neighbourhood's longitude value

neighborhood_name = basel_data.loc[0, 'Quartiere'] # neighbourhood's name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

First neighbourhood on the list is Altstadt Grossbasel.
Latitude and longitude values of Altstadt Grossbasel are 47.5564274, 7.5882594.


Let's get up to 100 venues within a 500 m radius of this neighbourhood's centre and examine the results if needed.

In [17]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)

results = requests.get(url).json()
#results
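Formatting the query string by hand works, but it is easy to drop an `&` or leave a stray one. A minimal sketch of building the same kind of URL with `urllib.parse.urlencode` is shown below; `build_foursquare_url` is a hypothetical helper, the parameter names are the ones the notebook already uses, and the credentials are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_foursquare_url(endpoint, client_id, client_secret, version,
                         lat, lng, radius=500, limit=100):
    """Build a v2 venues URL (endpoint 'search' or 'explore') with an
    encoded query string instead of hand-formatted '&' pairs."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/{}?{}'.format(endpoint, urlencode(params))


url = build_foursquare_url('explore', 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                           '20180605', 47.5564274, 7.5882594)
print(url)
print(parse_qs(urlparse(url).query)['ll'])
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` performs the encoding for you.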

Define a function that retrieves the nearby venues for each quartiere in turn.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each quartiere centre
    and return one dataframe row per venue found."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue;
        # a venue without categories gets None instead of raising IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-quartiere lists and name the columns in one step
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Quartiere',
                 'Quartiere Latitude',
                 'Quartiere Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
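The field extraction inside `getNearbyVenues` can be exercised offline against a hand-built response, which is handy when the API quota is exhausted. This sketch assumes the response only needs the keys the function reads (`response`, `groups`, `items`, `venue`, `location`, `categories`); `flatten_venues` and `mock` are hypothetical names. The venue names and coordinates come from the notebook's output, while the empty category list for Marktplatz is invented to exercise the missing-category case.

```python
def flatten_venues(name, lat, lng, response_json):
    """Pull (quartiere, lat, lng, venue, venue lat, venue lng, category)
    tuples out of an explore-style response; venues with no category
    get None instead of raising IndexError."""
    items = response_json['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'] if categories else None,
        ))
    return rows


# Hand-built mock that mirrors only the keys getNearbyVenues reads
mock = {'response': {'groups': [{'items': [
    {'venue': {'name': "The Bird's Eye Jazz Club",
               'location': {'lat': 47.554796, 'lng': 7.587777},
               'categories': [{'name': 'Jazz Club'}]}},
    {'venue': {'name': 'Marktplatz',
               'location': {'lat': 47.558128, 'lng': 7.587754},
               'categories': []}},
]}]}}

rows = flatten_venues('Altstadt Grossbasel', 47.556427, 7.588259, mock)
for row in rows:
    print(row)
```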

Run the above function on each neighbourhood and store the result in a new dataframe called _basel_venues_:

In [19]:
basel_venues = getNearbyVenues(names = basel_data['Quartiere'], latitudes = basel_data['Latitude'], longitudes = basel_data['Longitude'])

Altstadt Grossbasel
Altstadt Kleinbasel
Am Ring
Bachletten
Breite
Bruderholz
Clara
Gotthelf
Gundeldingen
Hirzbrunnen
Iselin
Kleinhüningen
Klybeck
Matthäus
Rosental
Sankt Alban
Sankt Johann
Vorstädte
Wettstein


In [20]:
print(basel_venues.shape)
print(basel_venues.head())

(509, 7)
             Quartiere  Quartiere Latitude  Quartiere Longitude  \
0  Altstadt Grossbasel           47.556427             7.588259   
1  Altstadt Grossbasel           47.556427             7.588259   
2  Altstadt Grossbasel           47.556427             7.588259   
3  Altstadt Grossbasel           47.556427             7.588259   
4  Altstadt Grossbasel           47.556427             7.588259   

                      Venue  Venue Latitude  Venue Longitude  Venue Category  
0  The Bird's Eye Jazz Club       47.554796         7.587777       Jazz Club  
1  Naturhistorisches Museum       47.557664         7.590572  Science Museum  
2       Museum der Kulturen       47.557108         7.590558          Museum  
3       Der Teufelhof Basel       47.555893         7.586578           Hotel  
4                Marktplatz       47.558128         7.587754           Plaza  
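Because each row carries both the quartiere centre and the venue coordinates, the `radius=500` setting can be sanity-checked with the haversine great-circle distance. This is a minimal sketch assuming a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper, and the coordinates are copied from the output above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


centre = (47.556427, 7.588259)  # Altstadt Grossbasel
venues = {
    "The Bird's Eye Jazz Club": (47.554796, 7.587777),
    'Naturhistorisches Museum': (47.557664, 7.590572),
    'Marktplatz': (47.558128, 7.587754),
}
for name, (lat, lng) in venues.items():
    print('{}: {:.0f} m'.format(name, haversine_m(*centre, lat, lng)))
```

The same function could be applied row-wise to `basel_venues` to flag any venue that lies outside the requested radius.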


Let's check how many venues were returned for each neighbourhood.

In [21]:
basel_venues_count = basel_venues.groupby('Quartiere').count()
print(basel_venues_count)

                     Quartiere Latitude  Quartiere Longitude  Venue  \
Quartiere                                                             
Altstadt Grossbasel                  64                   64     64   
Altstadt Kleinbasel                  72                   72     72   
Am Ring                              18                   18     18   
Bachletten                            5                    5      5   
Breite                                9                    9      9   
Bruderholz                            7                    7      7   
Clara                                52                   52     52   
Gotthelf                             14                   14     14   
Gundeldingen                         40                   40     40   
Hirzbrunnen                           6                    6      6   
Iselin                                4                    4      4   
Kleinhüningen                         9                    9      9   
Klybec