# IBM Data Science Professional Certificate Capstone Project: The Battle of Neighbourhoods

## Cologne: Where would I go when I am hungry (Hunger Games)

### Data acquisition
This task requires the Foursquare API, so we will use it to collect venue data. Before that, we need some basic information about the districts of Cologne. My research turned up various tables covering population, the job market, voting results, and more, but for the first step the districts table from Wikipedia is enough.

First, let's import the libraries and set everything up.

In [1]:
import pandas as pd
import numpy as np
import requests
pd.set_option('display.max_colwidth', 1000) # Show up to 1000 characters within each cell
#pd.set_option('display.max_rows', 20) # Show up to 20 dataframe rows
pd.set_option('display.max_columns', 1000) # Show up to 1000 columns

Next, we send a request to the Wikipedia page on the districts of Cologne.

In [2]:
url = "https://en.wikipedia.org/wiki/Districts_of_Cologne"
res = requests.get(url)
res

<Response [200]>

A response code of 200 means the request succeeded. Now, let's use <code>pandas.read_html</code> to parse the tables on the page and take a preliminary look at the one we need.

In [3]:
url_raw = pd.read_html(res.content)
type(url_raw)

list

In [4]:
url_raw = pd.read_html(res.content)[1]
url_raw

Unnamed: 0,Map,Coat,City district,City parts,Area,Population1,Pop. density,District Councils,Town Hall
0,,,District 1 Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-Nord, Neustadt-Süd",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50667 Köln",
1,,,District 2 Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marienburg, Meschenich, Raderberg, Raderthal, Rodenkirchen, Rondorf, Sürth, Weiß, Zollstock",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-50996 Köln",
2,,,District 3 Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenthal, Lövenich, Müngersdorf, Sülz, Weiden, Widdersdorf",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 50931 Köln",
3,,,District 4 Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, Neuehrenfeld, Ossendorf, Vogelsang",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421, D-50825 Köln",
4,,,District 5 Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, Nippes, Riehl, Weidenpesch",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln",
5,,,District 6 Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühlingen, Heimersdorf, Lindweiler, Merkenich, Pesch, Roggendorf/Thenhoven, Seeberg, Volkhoven/Weiler, Worringen",67.2 km²,80.870,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765 Köln",
6,,,District 7 Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven, Grengel, Langel, Libur, Lind, Poll, Porz, Urbach, Wahn, Wahnheide, Westhoven, Zündorf",78.8 km²,106.520,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-51143 Köln",
7,,,District 8 Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Merheim, Neubrück, Ostheim, Rath/Heumar, Vingst",38.2 km²,108.330,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51103 Köln",
8,,,District 9 Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flittard, Höhenhaus, Holweide, Mülheim, Stammheim",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln",
9,,,Cologne,,405.15 km2,1.019.3282,2.516/km2,2.516/km2,2.516/km2


Looks good. Now we can create a proper data frame and do some data cleaning.

### Data Cleaning

In [5]:
df = url_raw.copy() #work on a copy so the raw table stays unchanged
df.drop(['Map', 'Coat', 'Town Hall'], axis=1, inplace=True) #drop columns that hold no usable values (NaN)

In [6]:
df.drop([9,10], inplace=True) #drop the last two rows, which are redundant

In [7]:
df

Unnamed: 0,City district,City parts,Area,Population1,Pop. density,District Councils
0,District 1 Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-Nord, Neustadt-Süd",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50667 Köln"
1,District 2 Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marienburg, Meschenich, Raderberg, Raderthal, Rodenkirchen, Rondorf, Sürth, Weiß, Zollstock",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-50996 Köln"
2,District 3 Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenthal, Lövenich, Müngersdorf, Sülz, Weiden, Widdersdorf",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 50931 Köln"
3,District 4 Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, Neuehrenfeld, Ossendorf, Vogelsang",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421, D-50825 Köln"
4,District 5 Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, Nippes, Riehl, Weidenpesch",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln"
5,District 6 Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühlingen, Heimersdorf, Lindweiler, Merkenich, Pesch, Roggendorf/Thenhoven, Seeberg, Volkhoven/Weiler, Worringen",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765 Köln"
6,District 7 Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven, Grengel, Langel, Libur, Lind, Poll, Porz, Urbach, Wahn, Wahnheide, Westhoven, Zündorf",78.8 km²,106.52,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-51143 Köln"
7,District 8 Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Merheim, Neubrück, Ostheim, Rath/Heumar, Vingst",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51103 Köln"
8,District 9 Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flittard, Höhenhaus, Holweide, Mülheim, Stammheim",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln"


In [8]:
df["City district"]=df["City district"].str[11:] #remove 'District #' from district descriptions
df.rename(columns={'Population1': 'Population'}, inplace=True) #rename the column 'Population1' to 'Population'
df

Unnamed: 0,City district,City parts,Area,Population,Pop. density,District Councils
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-Nord, Neustadt-Süd",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50667 Köln"
1,Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marienburg, Meschenich, Raderberg, Raderthal, Rodenkirchen, Rondorf, Sürth, Weiß, Zollstock",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-50996 Köln"
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenthal, Lövenich, Müngersdorf, Sülz, Weiden, Widdersdorf",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 50931 Köln"
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, Neuehrenfeld, Ossendorf, Vogelsang",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421, D-50825 Köln"
4,Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, Nippes, Riehl, Weidenpesch",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln"
5,Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühlingen, Heimersdorf, Lindweiler, Merkenich, Pesch, Roggendorf/Thenhoven, Seeberg, Volkhoven/Weiler, Worringen",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765 Köln"
6,Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven, Grengel, Langel, Libur, Lind, Poll, Porz, Urbach, Wahn, Wahnheide, Westhoven, Zündorf",78.8 km²,106.52,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-51143 Köln"
7,Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Merheim, Neubrück, Ostheim, Rath/Heumar, Vingst",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51103 Köln"
8,Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flittard, Höhenhaus, Holweide, Mülheim, Stammheim",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln"


In [9]:
print(f"The dataframe has {df.shape[0]} rows and {df.shape[1]} columns in it.")

The dataframe has 9 rows and 6 columns in it.
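
One thing to watch in the table above: the source page appears to use a dot as the thousands separator, so <code>pandas</code> read <code>127.033</code> as a float and <code>80.870</code> lost its trailing zero. The following is a minimal sketch, assuming that interpretation holds, of how such columns could be converted to plain integers. The helper names are hypothetical, and the sample values are copied from the table.

```python
import pandas as pd

def thousands_float_to_int(values: pd.Series) -> pd.Series:
    # Assumption: 127.033 was meant as 127,033 (the dot is a thousands separator).
    return (values.astype(float) * 1000).round().astype(int)

def density_to_int(values: pd.Series) -> pd.Series:
    # "7.746/km²" -> 7746: strip the unit, then drop the thousands separator.
    cleaned = values.str.replace("/km²", "", regex=False).str.replace(".", "", regex=False)
    return cleaned.astype(int)

sample = pd.DataFrame({
    "Population": [127.033, 80.87, 144.374],
    "Pop. density": ["7.746/km²", "1.204/km²", "2.764/km²"],
})
sample["Population"] = thousands_float_to_int(sample["Population"])
sample["Pop. density"] = density_to_int(sample["Pop. density"])
print(sample)
```

The clustering further down does not use these columns, so this step is optional here; it matters only if population or density figures are analysed later.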


### Geo Data acquisition

The next step is to obtain the geographical coordinates of the districts. This step was tricky, but after some searching I arrived at the approach below, which appends the coordinates to the data frame.

In [10]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Cologne_food_explorer") #getting coordinates for a given address

df['Major_Dist_Coord']= df['City district'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude)) #creating a new column with coordinates using the district's name and applying a geocode function on it
df[['Latitude', 'Longitude']] = df['Major_Dist_Coord'].apply(pd.Series) #creating two separate columns with lat and long

df.drop(['Major_Dist_Coord'], axis=1, inplace=True) #dropping the temporary column
df

Unnamed: 0,City district,City parts,Area,Population,Pop. density,District Councils,Latitude,Longitude
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-Nord, Neustadt-Süd",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50667 Köln",50.937328,6.959234
1,Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marienburg, Meschenich, Raderberg, Raderthal, Rodenkirchen, Rondorf, Sürth, Weiß, Zollstock",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-50996 Köln",50.865622,6.969718
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenthal, Lövenich, Müngersdorf, Sülz, Weiden, Widdersdorf",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 50931 Köln",50.935935,6.871246
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, Neuehrenfeld, Ossendorf, Vogelsang",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421, D-50825 Köln",50.951502,6.916529
4,Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, Nippes, Riehl, Weidenpesch",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln",50.958994,6.941777
5,Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühlingen, Heimersdorf, Lindweiler, Merkenich, Pesch, Roggendorf/Thenhoven, Seeberg, Volkhoven/Weiler, Worringen",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765 Köln",51.021167,6.898034
6,Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven, Grengel, Langel, Libur, Lind, Poll, Porz, Urbach, Wahn, Wahnheide, Westhoven, Zündorf",78.8 km²,106.52,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-51143 Köln",50.906705,6.999129
7,Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Merheim, Neubrück, Ostheim, Rath/Heumar, Vingst",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51103 Köln",50.931923,7.005806
8,Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flittard, Höhenhaus, Holweide, Mülheim, Stammheim",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln",50.958147,7.013526
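
The geocoding cell above assumes every district name is found. <code>geocode</code> returns <code>None</code> when there is no match, in which case the <code>lambda</code> would fail on <code>x.latitude</code>. Below is a small sketch of a more defensive lookup. The function <code>safe_coordinates</code> and the offline stand-in <code>fake_geocode</code> are hypothetical names introduced only for illustration.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Tuple

def safe_coordinates(geocode: Callable, query: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (latitude, longitude), or (None, None) when nothing was found."""
    location = geocode(query)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)

# Stand-in for geolocator.geocode so the sketch runs without network access.
def fake_geocode(query: str):
    known = {"Köln-Innenstadt": SimpleNamespace(latitude=50.937328, longitude=6.959234)}
    return known.get(query)

found = safe_coordinates(fake_geocode, "Köln-Innenstadt")
missing = safe_coordinates(fake_geocode, "Nowhere-District")
print(found, missing)
```

With the real geocoder, one would pass <code>geolocator.geocode</code> instead of the stand-in. For longer lists of addresses, <code>geopy.extra.rate_limiter.RateLimiter</code> can space out the requests.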


### Clustering

Now, let's retrieve the coordinates of Cologne.

In [11]:
address = 'Cologne'

geolocator = Nominatim(user_agent="Cologne_food_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinates of Cologne are {latitude}, {longitude}.')

The geographical coordinates of Cologne are 50.938361, 6.959974.


The next step is to build the map on which the clusters will be visualized. For this task the <code>Folium</code> module is needed. Then, as in the previous labs, we add markers to the map using a <code>for</code> loop.

In [12]:
import folium

Cologne_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# use separate loop variables so the Cologne coordinates in latitude/longitude are not overwritten
for lat, lng, dist in zip(df['Latitude'], df['Longitude'], df['City district']):
    popup = folium.Popup(dist, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='green',
        fill=True
        ).add_to(Cologne_map)

Cologne_map

Next, we define our Foursquare credentials for the requests that follow.

In [None]:
CLIENT_ID = '****' 
CLIENT_SECRET = '****'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Then we reuse a familiar function from the previous labs: the one that retrieves the venues near each location, together with their categories.

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=4000, LIMIT=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue',
                  'Venue Latitude', 
                  'Venue Longitude',           
                  'Venue Category']
    
    return nearby_venues
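
The function above indexes straight into the JSON response and into <code>categories[0]</code>, so an empty response or a venue without a category would raise an error. The sketch below shows one way the parsing step could be made more tolerant. It assumes the response has the same nesting the function above relies on; <code>extract_venue_rows</code> is a hypothetical helper, and the second venue in the sample payload is made up for illustration.

```python
from typing import Any, Dict, List, Tuple

def extract_venue_rows(payload: Dict[str, Any], name: str, lat: float, lng: float) -> List[Tuple]:
    # Walk response -> groups[0] -> items, tolerating missing pieces.
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Craftbeer Corner",
               "location": {"lat": 50.937222, "lng": 6.958928},
               "categories": [{"name": "Beer Bar"}]}},
    # Hypothetical venue without a category.
    {"venue": {"name": "Example Venue",
               "location": {"lat": 50.94, "lng": 6.96},
               "categories": []}},
]}]}}

rows = extract_venue_rows(sample_payload, "Köln-Innenstadt", 50.937328, 6.959234)
empty = extract_venue_rows({}, "Köln-Innenstadt", 50.937328, 6.959234)
print(rows, empty)
```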

Next, we call the function with the 'City district', 'Latitude' and 'Longitude' columns as inputs and assign its output to a new variable.

In [15]:
Cologne_venues = getNearbyVenues(df['City district'], df['Latitude'], df['Longitude'])

Köln-Innenstadt
Köln-Rodenkirchen
Köln-Lindenthal
Köln-Ehrenfeld
Köln-Nippes
Köln-Chorweiler
Köln-Porz
Köln-Kalk
Köln-Mülheim


In [20]:
print(f'The Cologne_venues data frame has {Cologne_venues.shape[0]} rows and {Cologne_venues.shape[1]} columns.')

The Cologne_venues data frame has 808 rows and 7 columns.


Having checked the size of the data frame, let's look at its contents.

In [17]:
Cologne_venues

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Köln-Innenstadt,50.937328,6.959234,Craftbeer Corner,50.937222,6.958928,Beer Bar
1,Köln-Innenstadt,50.937328,6.959234,LEGO Store,50.937042,6.956564,Toy / Game Store
2,Köln-Innenstadt,50.937328,6.959234,Papa Joe's Jazzlokal,50.937882,6.962241,Jazz Club
3,Köln-Innenstadt,50.937328,6.959234,Sattgrün,50.938441,6.954965,Vegetarian / Vegan Restaurant
4,Köln-Innenstadt,50.937328,6.959234,Rheinufer Altstadt,50.938827,6.962870,Pedestrian Plaza
...,...,...,...,...,...,...,...
803,Köln-Mülheim,50.958147,7.013526,Kaufland,50.972862,6.977087,Big Box Store
804,Köln-Mülheim,50.958147,7.013526,Lidl,50.927293,7.004272,Supermarket
805,Köln-Mülheim,50.958147,7.013526,AS Köln-Dellbrück (26),50.969116,7.030341,Intersection
806,Köln-Mülheim,50.958147,7.013526,Schlackenbergwerft,50.975157,7.000914,Park


Looks good. Now, we can filter out the venues that deal with food'n'drinks.

In [21]:
Cologne_Food = Cologne_venues[Cologne_venues['Venue Category'].str.contains('Restaurant|Bar|Snack|Food|Pizza')].reset_index(drop=True)
Cologne_Food.index = np.arange(1, len(Cologne_Food) + 1) # start the index at 1 instead of 0
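
Note that <code>str.contains</code> matches substrings, so the pattern <code>Bar</code> would also catch any category that merely contains those letters. The sketch below compares the substring filter with a whole-word variant. The category "Barbershop" is a hypothetical example and is not taken from the data.

```python
import pandas as pd

categories = pd.Series(["Beer Bar", "Italian Restaurant", "Barbershop", "Park", "Pizza Place"])
pattern = "Restaurant|Bar|Snack|Food|Pizza"

# Plain substring match, as in the cell above.
substring_match = categories[categories.str.contains(pattern)]

# Whole-word match: \b marks word boundaries, (?:...) groups without capturing.
word_match = categories[categories.str.contains(rf"\b(?:{pattern})\b")]

print(substring_match.tolist())
print(word_match.tolist())
```

Whether the stricter variant changes anything for the Cologne data depends on the categories actually returned, so it is worth checking the value counts either way.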

In [22]:
print (Cologne_Food['Venue Category'].value_counts())

Italian Restaurant               37
German Restaurant                28
Bar                              16
Turkish Restaurant               14
Restaurant                       13
Greek Restaurant                 11
French Restaurant                11
Sushi Restaurant                  9
Cocktail Bar                      9
Pizza Place                       8
Tapas Restaurant                  7
Vietnamese Restaurant             7
Vegetarian / Vegan Restaurant     6
Middle Eastern Restaurant         6
Snack Place                       5
Spanish Restaurant                4
Mediterranean Restaurant          4
Kebab Restaurant                  4
Mexican Restaurant                4
Seafood Restaurant                3
Modern European Restaurant        3
Beer Bar                          3
Japanese Restaurant               3
Fast Food Restaurant              3
Asian Restaurant                  3
Schnitzel Restaurant              2
Israeli Restaurant                2
Falafel Restaurant          

In [26]:
#print(f'There are {Cologne_Food.groupby("Venue Category").max().shape[0]} unique venue categories.')
print(f'There are {len(Cologne_Food["Venue Category"].unique())} unique types of food\'n\'drinks venues.')

There are 37 unique types of food'n'drinks venues.


In [28]:
Cologne_Food[['Neighbourhood','Venue']].groupby('Neighbourhood').count()

Unnamed: 0_level_0,Venue
Neighbourhood,Unnamed: 1_level_1
Köln-Chorweiler,8
Köln-Ehrenfeld,37
Köln-Innenstadt,35
Köln-Kalk,26
Köln-Lindenthal,30
Köln-Mülheim,24
Köln-Nippes,27
Köln-Porz,33
Köln-Rodenkirchen,17


In [30]:
Cologne_Food.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Asian Restaurant,Köln-Rodenkirchen,50.958147,7.013526,Okinii Köln GmbH,50.938557,6.997982
Bar,Köln-Nippes,50.958994,7.013526,Toddy Tapper,50.953847,6.958031
Beer Bar,Köln-Nippes,50.958994,7.005806,Craftbeer Corner,50.937222,6.958928
Cocktail Bar,Köln-Nippes,50.958994,6.959234,Sudermanbar,50.951297,6.95419
Doner Restaurant,Köln-Porz,50.906705,6.999129,Mangal Döner,50.920646,6.959776
Eastern European Restaurant,Köln-Porz,50.906705,6.999129,HoteLux,50.938852,6.977004
Falafel Restaurant,Köln-Innenstadt,50.951502,6.959234,Habibi,50.929504,6.937835
Fast Food Restaurant,Köln-Rodenkirchen,51.021167,6.969718,Männi's Grillstube,51.020572,6.992931
Food & Drink Shop,Köln-Innenstadt,50.937328,6.959234,Seng Heng Asia Supermarkt,50.934464,6.945148
French Restaurant,Köln-Porz,50.958994,7.005806,Le Moissonnier,50.950796,6.962379


Now, to work with this dataframe further, the code needs to "see" which venue types each neighbourhood has. Our variable is categorical, so we convert it into binary indicator columns, one per venue category. If a venue belongs to a given category, it gets a value of **one (1)** in that column; otherwise it gets a value of **zero (0)**. This is called one-hot encoding, and in <code>pandas</code> it is available via <code>pd.get_dummies</code>.
We pass <code>prefix=""</code> and <code>prefix_sep=""</code> so that the new columns are named after the categories alone.

In [31]:
cologne_venues_encoded = pd.get_dummies(Cologne_Food[['Venue Category']], prefix="", prefix_sep="")
cologne_venues_encoded

Unnamed: 0,Asian Restaurant,Bar,Beer Bar,Cocktail Bar,Doner Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Food & Drink Shop,French Restaurant,German Restaurant,Greek Restaurant,Hookah Bar,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Pizza Place,Restaurant,Scandinavian Restaurant,Schnitzel Restaurant,Seafood Restaurant,Snack Place,Spanish Restaurant,Sports Bar,Sushi Restaurant,Tapas Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
233,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
234,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
235,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
236,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


As we want to know which venues are in which neighbourhood, we need to add a "Neighbourhood" column to our one-hot-encoded dataframe. For this we use the <code>concat()</code> function along axis 1, i.e. the columns.

In [32]:
cologne_venues_encoded = pd.concat([Cologne_Food['Neighbourhood'],cologne_venues_encoded],axis=1)
cologne_venues_encoded.head(10)

Unnamed: 0,Neighbourhood,Asian Restaurant,Bar,Beer Bar,Cocktail Bar,Doner Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Food & Drink Shop,French Restaurant,German Restaurant,Greek Restaurant,Hookah Bar,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Pizza Place,Restaurant,Scandinavian Restaurant,Schnitzel Restaurant,Seafood Restaurant,Snack Place,Spanish Restaurant,Sports Bar,Sushi Restaurant,Tapas Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
1,Köln-Innenstadt,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Köln-Innenstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,Köln-Innenstadt,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Köln-Innenstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
5,Köln-Innenstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
6,Köln-Innenstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
7,Köln-Innenstadt,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Köln-Innenstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
9,Köln-Innenstadt,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10,Köln-Innenstadt,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
print(f'This dataframe has {cologne_venues_encoded.shape[0]} rows and {cologne_venues_encoded.shape[1]} columns.')

This dataframe has 237 rows and 38 columns.


Now, to see which venue categories dominate each neighbourhood and how the venue types are distributed across Cologne, we group the dataframe by neighbourhood and take the mean of each venue-type column. The mean of a 0/1 column is the share of the neighbourhood's venues that fall into that category.

In [35]:
cologne_grouped = cologne_venues_encoded.groupby('Neighbourhood').mean().reset_index()
cologne_grouped

Unnamed: 0,Neighbourhood,Asian Restaurant,Bar,Beer Bar,Cocktail Bar,Doner Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Food & Drink Shop,French Restaurant,German Restaurant,Greek Restaurant,Hookah Bar,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Pizza Place,Restaurant,Scandinavian Restaurant,Schnitzel Restaurant,Seafood Restaurant,Snack Place,Spanish Restaurant,Sports Bar,Sushi Restaurant,Tapas Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,Köln-Chorweiler,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
1,Köln-Ehrenfeld,0.0,0.135135,0.0,0.081081,0.0,0.0,0.027027,0.0,0.0,0.027027,0.081081,0.0,0.0,0.0,0.027027,0.162162,0.027027,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.027027,0.0,0.027027,0.027027,0.0,0.054054,0.081081,0.027027,0.0,0.027027,0.027027
2,Köln-Innenstadt,0.0,0.028571,0.028571,0.085714,0.0,0.0,0.028571,0.028571,0.028571,0.085714,0.028571,0.0,0.0,0.0,0.028571,0.257143,0.028571,0.028571,0.0,0.0,0.0,0.028571,0.028571,0.028571,0.028571,0.0,0.028571,0.0,0.028571,0.0,0.0,0.057143,0.028571,0.0,0.028571,0.028571,0.0
3,Köln-Kalk,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.076923,0.038462,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.038462,0.038462,0.076923,0.038462,0.076923,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.076923,0.076923,0.076923,0.0
4,Köln-Lindenthal,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.1,0.0,0.0,0.0,0.1,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.033333,0.033333,0.1,0.066667,0.0,0.0,0.0,0.0
5,Köln-Mülheim,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.0,0.041667,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.333333,0.041667,0.041667,0.0
6,Köln-Nippes,0.0,0.148148,0.037037,0.111111,0.0,0.0,0.0,0.0,0.0,0.074074,0.074074,0.037037,0.0,0.0,0.0,0.148148,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.037037,0.0,0.0,0.037037,0.037037,0.0,0.037037,0.037037,0.037037
7,Köln-Porz,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.060606,0.181818,0.060606,0.0,0.0,0.0,0.121212,0.0,0.0,0.0,0.030303,0.030303,0.060606,0.030303,0.121212,0.030303,0.0,0.0,0.060606,0.0,0.030303,0.0,0.0,0.0,0.060606,0.030303,0.030303,0.0
8,Köln-Rodenkirchen,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.294118,0.176471,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0


In [36]:
print(f'This dataframe has {cologne_grouped.shape[0]} rows and {cologne_grouped.shape[1]} columns.')

This dataframe has 9 rows and 38 columns.
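
To make the two steps above concrete, here is a toy version of the same pipeline: one-hot encode the category, then average per neighbourhood. The neighbourhoods "A" and "B" and their venues are invented for illustration. <code>dtype=int</code> is passed so that the indicators are 0/1 integers regardless of the <code>pandas</code> version.

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Bar", "Bar", "Pizza Place", "Bar"],
})

# One indicator column per category, named after the category itself.
encoded = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
encoded.insert(0, "Neighbourhood", toy["Neighbourhood"])

# The mean of a 0/1 column is the share of venues in that category.
frequencies = encoded.groupby("Neighbourhood").mean()
print(frequencies)
```

Neighbourhood "A" has three venues, two of which are bars, so its "Bar" frequency is 2/3. Each row of the result sums to 1.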


Again, referring to the previous lab, we can borrow a function from it. The function returns the most common venue categories in a row.

In [37]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Additionally, we can borrow the code that finds the top 10 venue categories in each neighbourhood and writes them to a new dataframe, which lists the categories per neighbourhood in order of frequency. The code uses <code>numpy</code>, which we imported at the beginning.

In [39]:
#import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = cologne_grouped['Neighbourhood']

for ind in np.arange(cologne_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cologne_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Köln-Chorweiler,Italian Restaurant,German Restaurant,Restaurant,Sushi Restaurant,Fast Food Restaurant,Hookah Bar,Snack Place,Pizza Place,Scandinavian Restaurant,Schnitzel Restaurant
1,Köln-Ehrenfeld,Italian Restaurant,Bar,Cocktail Bar,Tapas Restaurant,German Restaurant,Restaurant,Sushi Restaurant,Korean Restaurant,Vietnamese Restaurant,Turkish Restaurant
2,Köln-Innenstadt,Italian Restaurant,Cocktail Bar,French Restaurant,Sushi Restaurant,Israeli Restaurant,Snack Place,Restaurant,Pizza Place,Modern European Restaurant,Middle Eastern Restaurant
3,Köln-Kalk,Italian Restaurant,French Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Turkish Restaurant,German Restaurant,Pizza Place,Asian Restaurant,Spanish Restaurant
4,Köln-Lindenthal,German Restaurant,Bar,Greek Restaurant,Restaurant,Sushi Restaurant,Italian Restaurant,Tapas Restaurant,Mexican Restaurant,Sports Bar,Spanish Restaurant
5,Köln-Mülheim,Turkish Restaurant,Snack Place,Mediterranean Restaurant,Italian Restaurant,German Restaurant,Asian Restaurant,Pizza Place,Bar,Indian Restaurant,Greek Restaurant
6,Köln-Nippes,Bar,Italian Restaurant,Restaurant,Cocktail Bar,French Restaurant,German Restaurant,Kebab Restaurant,Snack Place,Wine Bar,Sushi Restaurant
7,Köln-Porz,German Restaurant,Pizza Place,Italian Restaurant,Middle Eastern Restaurant,Seafood Restaurant,Greek Restaurant,French Restaurant,Turkish Restaurant,Modern European Restaurant,Eastern European Restaurant
8,Köln-Rodenkirchen,German Restaurant,Greek Restaurant,Italian Restaurant,Fast Food Restaurant,Restaurant,Scandinavian Restaurant,Seafood Restaurant,Asian Restaurant,Turkish Restaurant,Mexican Restaurant
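
As a worked example of the ranking logic, the sketch below ranks three of the Köln-Mülheim frequencies from the grouped table and builds the column labels with an explicit ordinal helper instead of the <code>try</code>/<code>except</code> trick. Both function names are hypothetical, and only a subset of the categories is used.

```python
import pandas as pd

def ordinal(n: int) -> str:
    # 11th, 12th and 13th are exceptions to the st/nd/rd rule.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def top_categories(row: pd.Series, n: int) -> list:
    # Skip the first entry (the neighbourhood name), then rank by frequency.
    frequencies = row.iloc[1:].astype(float)
    return frequencies.sort_values(ascending=False).index[:n].tolist()

row = pd.Series({
    "Neighbourhood": "Köln-Mülheim",
    "Bar": 0.041667,
    "Italian Restaurant": 0.083333,
    "Turkish Restaurant": 0.333333,
})
labels = [f"{ordinal(i)} Most Common Venue" for i in range(1, 4)]
ranking = dict(zip(labels, top_categories(row, 3)))
print(ranking)
```

When several categories share the same frequency, as in Köln-Chorweiler, the order among the tied entries depends on the sort and should not be read as a meaningful ranking.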


Now, having this frequency data at hand, we can start the clustering with k=5. For this we need to import <code>KMeans</code> from <code>sklearn</code>.

In [40]:
from sklearn.cluster import KMeans

k_clusters = 5

#drop the Neighbourhood column to work with numerical values only
cologne_k_clustering = cologne_grouped.drop(columns='Neighbourhood')

KM = KMeans(n_clusters=k_clusters, random_state=0)

In [41]:
KM.fit(cologne_k_clustering)
KM

KMeans(n_clusters=5, random_state=0)

In [42]:
KM.labels_[0:10]

array([3, 1, 1, 2, 0, 4, 1, 2, 0])
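
When reading the label array, keep in mind that the numbers are arbitrary cluster ids: what matters is which rows share a label, not the value itself. The toy sketch below, with four invented points, illustrates this. <code>n_init=10</code> is set explicitly as an assumption to keep the behaviour the same across <code>scikit-learn</code> versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups: points near (0, 0) and points near (5, 5).
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = model.labels_

# The ids may come out as [0, 0, 1, 1] or [1, 1, 0, 0]; the grouping is the same.
print(labels)
```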

The output above shows the cluster label assigned to each neighbourhood. Now we need to add these labels to the **neighbourhoods_venues_sorted** dataframe. We then want a single dataframe with as much information as possible, so we will merge a few dataframes together.

In [43]:
#adding the labels to the top10 df
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', KM.labels_)

#creating a copy of df
cologne_final = df.copy()
cologne_final

Unnamed: 0,City district,City parts,Area,Population,Pop. density,District Councils,Latitude,Longitude
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-Nord, Neustadt-Süd",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50667 Köln",50.937328,6.959234
1,Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marienburg, Meschenich, Raderberg, Raderthal, Rodenkirchen, Rondorf, Sürth, Weiß, Zollstock",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-50996 Köln",50.865622,6.969718
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenthal, Lövenich, Müngersdorf, Sülz, Weiden, Widdersdorf",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 50931 Köln",50.935935,6.871246
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, Neuehrenfeld, Ossendorf, Vogelsang",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421, D-50825 Köln",50.951502,6.916529
4,Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, Nippes, Riehl, Weidenpesch",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln",50.958994,6.941777
5,Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühlingen, Heimersdorf, Lindweiler, Merkenich, Pesch, Roggendorf/Thenhoven, Seeberg, Volkhoven/Weiler, Worringen",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765 Köln",51.021167,6.898034
6,Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven, Grengel, Langel, Libur, Lind, Poll, Porz, Urbach, Wahn, Wahnheide, Westhoven, Zündorf",78.8 km²,106.52,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-51143 Köln",50.906705,6.999129
7,Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Merheim, Neubrück, Ostheim, Rath/Heumar, Vingst",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51103 Köln",50.931923,7.005806
8,Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flittard, Höhenhaus, Holweide, Mülheim, Stammheim",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln",50.958147,7.013526


In [47]:
cologne_final.rename(columns={'City district':'Neighbourhood'}, inplace=True)
cologne_ffinal = cologne_final.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
cologne_ffinal

Unnamed: 0,Neighbourhood,City parts,Area,Population,Pop. density,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Köln-Innenstadt,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-Nord, Neustadt-Süd",16.4 km²,127.033,7.746/km²,"Bezirksksamt Innenstadt Brückenstraße 19, D-50667 Köln",50.937328,6.959234,1,Italian Restaurant,Cocktail Bar,French Restaurant,Sushi Restaurant,Israeli Restaurant,Snack Place,Restaurant,Pizza Place,Modern European Restaurant,Middle Eastern Restaurant
1,Köln-Rodenkirchen,"Bayenthal, Godorf, Hahnwald, Immendorf, Marienburg, Meschenich, Raderberg, Raderthal, Rodenkirchen, Rondorf, Sürth, Weiß, Zollstock",54.6 km²,100.936,1.850/km²,"Bezirksamt Rodenkirchen Hauptstraße 85, D-50996 Köln",50.865622,6.969718,0,German Restaurant,Greek Restaurant,Italian Restaurant,Fast Food Restaurant,Restaurant,Scandinavian Restaurant,Seafood Restaurant,Asian Restaurant,Turkish Restaurant,Mexican Restaurant
2,Köln-Lindenthal,"Braunsfeld, Junkersdorf, Klettenberg, Lindenthal, Lövenich, Müngersdorf, Sülz, Weiden, Widdersdorf",41.6 km²,137.552,3.308/km²,"Bezirksamt Lindenthal Aachener Straße 220, 50931 Köln",50.935935,6.871246,0,German Restaurant,Bar,Greek Restaurant,Restaurant,Sushi Restaurant,Italian Restaurant,Tapas Restaurant,Mexican Restaurant,Sports Bar,Spanish Restaurant
3,Köln-Ehrenfeld,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, Neuehrenfeld, Ossendorf, Vogelsang",23.8 km²,103.621,4.348/km²,"Bezirksamt Ehrenfeld Venloer Straße 419 – 421, D-50825 Köln",50.951502,6.916529,1,Italian Restaurant,Bar,Cocktail Bar,Tapas Restaurant,German Restaurant,Restaurant,Sushi Restaurant,Korean Restaurant,Vietnamese Restaurant,Turkish Restaurant
4,Köln-Nippes,"Bilderstöckchen, Longerich, Mauenheim, Niehl, Nippes, Riehl, Weidenpesch",31.8 km²,110.092,3.462/km²,"Bezirksamt NippesNeusser Straße 450,D-50733 Köln",50.958994,6.941777,1,Bar,Italian Restaurant,Restaurant,Cocktail Bar,French Restaurant,German Restaurant,Kebab Restaurant,Snack Place,Wine Bar,Sushi Restaurant
5,Köln-Chorweiler,"Blumenberg, Chorweiler, Esch/Auweiler, Fühlingen, Heimersdorf, Lindweiler, Merkenich, Pesch, Roggendorf/Thenhoven, Seeberg, Volkhoven/Weiler, Worringen",67.2 km²,80.87,1.204/km²,"Bezirksamt Chorweiler Pariser Platz 1, D-50765 Köln",51.021167,6.898034,3,Italian Restaurant,German Restaurant,Restaurant,Sushi Restaurant,Fast Food Restaurant,Hookah Bar,Snack Place,Pizza Place,Scandinavian Restaurant,Schnitzel Restaurant
6,Köln-Porz,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven, Grengel, Langel, Libur, Lind, Poll, Porz, Urbach, Wahn, Wahnheide, Westhoven, Zündorf",78.8 km²,106.52,1.352/km²,"Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-51143 Köln",50.906705,6.999129,2,German Restaurant,Pizza Place,Italian Restaurant,Middle Eastern Restaurant,Seafood Restaurant,Greek Restaurant,French Restaurant,Turkish Restaurant,Modern European Restaurant,Eastern European Restaurant
7,Köln-Kalk,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Merheim, Neubrück, Ostheim, Rath/Heumar, Vingst",38.2 km²,108.33,2.841/km²,"Bezirksamt KalkKalker Hauptstraße 247–273,D-51103 Köln",50.931923,7.005806,2,Italian Restaurant,French Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Turkish Restaurant,German Restaurant,Pizza Place,Asian Restaurant,Spanish Restaurant
8,Köln-Mülheim,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flittard, Höhenhaus, Holweide, Mülheim, Stammheim",52.2 km²,144.374,2.764/km²,"Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln",50.958147,7.013526,4,Turkish Restaurant,Snack Place,Mediterranean Restaurant,Italian Restaurant,German Restaurant,Asian Restaurant,Pizza Place,Bar,Indian Restaurant,Greek Restaurant
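
Note that the labels in the joined table do not appear in the same order as the `KM.labels_` output. A minimal sketch of why, assuming the clustering input was sorted alphabetically by neighbourhood (an assumption, although it is consistent with both outputs shown above); the variable names are illustrative:

```python
# District names in the order they appear in the districts table.
districts = [
    "Köln-Innenstadt", "Köln-Rodenkirchen", "Köln-Lindenthal",
    "Köln-Ehrenfeld", "Köln-Nippes", "Köln-Chorweiler",
    "Köln-Porz", "Köln-Kalk", "Köln-Mülheim",
]

# Labels exactly as printed by KM.labels_ above.
labels = [3, 1, 1, 2, 0, 4, 1, 2, 0]

# Assumption: the rows fed to KMeans were sorted alphabetically by name,
# so the i-th label belongs to the i-th name in sorted order.
label_by_name = dict(zip(sorted(districts), labels))

# Looking the labels up by name restores the districts-table order.
labels_in_table_order = [label_by_name[d] for d in districts]
print(labels_in_table_order)
```

This is why the labels are attached by joining on the `Neighbourhood` key rather than by row position: a positional assignment onto the districts table would pair labels with the wrong districts.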


In [48]:
#check for null values
cologne_ffinal[cologne_ffinal['Cluster Labels'].isnull()]

Unnamed: 0,Neighbourhood,City parts,Area,Population,Pop. density,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
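
The empty result means every district found a match in the join. A minimal sketch of what the check would catch, using hypothetical toy data in which one neighbourhood has no venue record:

```python
import pandas as pd

# Toy stand-ins for the districts table and the labelled venues table.
districts = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
labelled = pd.DataFrame({
    "Neighbourhood": ["A", "B"],  # "C" is deliberately missing
    "Cluster Labels": [1, 0],
})

# Same join pattern as above: left table keeps all of its rows.
joined = districts.join(labelled.set_index("Neighbourhood"), on="Neighbourhood")

# Rows without a match get NaN in the joined columns.
missing = joined[joined["Cluster Labels"].isnull()]
print(missing["Neighbourhood"].tolist())
```

A district with no label would need to be dropped or handled before plotting, because `int()` cannot convert NaN.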


All right, no NaN values were found, so we can move on to plotting. We need to import a few things from <code>matplotlib</code>; the course lab is a useful reference for the plotting code.

In [49]:
import matplotlib.cm as cm
import matplotlib.colors as colors

#creating a map with folium
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_clusters)
ys = [i + x + (i*x)**2 for i in range(k_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cologne_ffinal['Latitude'], cologne_ffinal['Longitude'], cologne_ffinal['Neighbourhood'], cologne_ffinal['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[int(cluster)]
        ).add_to(map_clusters)

map_clusters
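
The colour scheme can be tried in isolation, without drawing a map. A minimal sketch, assuming `k_clusters` is 5 (matching `n_clusters=5` in the KMeans output shown earlier); it builds one hex colour per cluster and pairs each 0-based label with the 1-based number shown in the popups:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

k_clusters = 5  # assumption: same value that was passed to KMeans

# Sample k evenly spaced points from the rainbow colormap (RGBA rows).
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))

# Convert each RGBA row to a hex string that folium accepts.
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Pair every 0-based label with its 1-based display number and colour.
for label in range(k_clusters):
    print(f"Cluster {label + 1}: {rainbow[label]}")
```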

<br>
Now we can examine each cluster to get more insight and compare it to the map.

### Cluster 1

In [50]:
cologne_ffinal.loc[cologne_ffinal['Cluster Labels'] == 0, cologne_ffinal.columns[[1] + list(range(5, cologne_ffinal.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Bayenthal, Godorf, Hahnwald, Immendorf, Marienburg, Meschenich, Raderberg, Raderthal, Rodenkirchen, Rondorf, Sürth, Weiß, Zollstock","Bezirksamt Rodenkirchen Hauptstraße 85, D-50996 Köln",50.865622,6.969718,0,German Restaurant,Greek Restaurant,Italian Restaurant,Fast Food Restaurant,Restaurant,Scandinavian Restaurant,Seafood Restaurant,Asian Restaurant,Turkish Restaurant,Mexican Restaurant
2,"Braunsfeld, Junkersdorf, Klettenberg, Lindenthal, Lövenich, Müngersdorf, Sülz, Weiden, Widdersdorf","Bezirksamt Lindenthal Aachener Straße 220, 50931 Köln",50.935935,6.871246,0,German Restaurant,Bar,Greek Restaurant,Restaurant,Sushi Restaurant,Italian Restaurant,Tapas Restaurant,Mexican Restaurant,Sports Bar,Spanish Restaurant


In Cluster 1 the dominant cuisines are German and Greek, with the occasional bar.

### Cluster 2

In [51]:
cologne_ffinal.loc[cologne_ffinal['Cluster Labels'] == 1, cologne_ffinal.columns[[1] + list(range(5, cologne_ffinal.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Altstadt-Nord, Altstadt-Süd, Deutz, Neustadt-Nord, Neustadt-Süd","Bezirksksamt Innenstadt Brückenstraße 19, D-50667 Köln",50.937328,6.959234,1,Italian Restaurant,Cocktail Bar,French Restaurant,Sushi Restaurant,Israeli Restaurant,Snack Place,Restaurant,Pizza Place,Modern European Restaurant,Middle Eastern Restaurant
3,"Bickendorf, Bocklemünd/Mengenich, Ehrenfeld, Neuehrenfeld, Ossendorf, Vogelsang","Bezirksamt Ehrenfeld Venloer Straße 419 – 421, D-50825 Köln",50.951502,6.916529,1,Italian Restaurant,Bar,Cocktail Bar,Tapas Restaurant,German Restaurant,Restaurant,Sushi Restaurant,Korean Restaurant,Vietnamese Restaurant,Turkish Restaurant
4,"Bilderstöckchen, Longerich, Mauenheim, Niehl, Nippes, Riehl, Weidenpesch","Bezirksamt NippesNeusser Straße 450,D-50733 Köln",50.958994,6.941777,1,Bar,Italian Restaurant,Restaurant,Cocktail Bar,French Restaurant,German Restaurant,Kebab Restaurant,Snack Place,Wine Bar,Sushi Restaurant


This cluster can be described as Italian + Bars.
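
One way to back up such a description is to count how often each category appears among the top venues of the cluster's districts. A minimal sketch using only the three most common venues per district, copied from the table above; the dictionary and variable names are illustrative:

```python
from collections import Counter

# Top three venue categories per district in this cluster (from the table).
top3 = {
    "Köln-Innenstadt": ["Italian Restaurant", "Cocktail Bar", "French Restaurant"],
    "Köln-Ehrenfeld": ["Italian Restaurant", "Bar", "Cocktail Bar"],
    "Köln-Nippes": ["Bar", "Italian Restaurant", "Restaurant"],
}

# Count in how many districts each category reaches the top three.
counts = Counter(category for venues in top3.values() for category in venues)

for category, n in counts.most_common():
    print(f"{category}: {n} of {len(top3)} districts")
```

Italian restaurants reach the top three in all three districts, and bars or cocktail bars appear in each of them as well, which agrees with the description.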

### Cluster 3

In [52]:
cologne_ffinal.loc[cologne_ffinal['Cluster Labels'] == 2, cologne_ffinal.columns[[1] + list(range(5, cologne_ffinal.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,"Eil, Elsdorf, Ensen, Finkenberg, Gremberghoven, Grengel, Langel, Libur, Lind, Poll, Porz, Urbach, Wahn, Wahnheide, Westhoven, Zündorf","Bezirksamt PorzFriedrich-Ebert-Ufer 64–70, D-51143 Köln",50.906705,6.999129,2,German Restaurant,Pizza Place,Italian Restaurant,Middle Eastern Restaurant,Seafood Restaurant,Greek Restaurant,French Restaurant,Turkish Restaurant,Modern European Restaurant,Eastern European Restaurant
7,"Brück, Höhenberg, Humboldt/Gremberg, Kalk, Merheim, Neubrück, Ostheim, Rath/Heumar, Vingst","Bezirksamt KalkKalker Hauptstraße 247–273,D-51103 Köln",50.931923,7.005806,2,Italian Restaurant,French Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Turkish Restaurant,German Restaurant,Pizza Place,Asian Restaurant,Spanish Restaurant


The taste of Europe and the Middle East.

### Cluster 4

In [53]:
cologne_ffinal.loc[cologne_ffinal['Cluster Labels'] == 3, cologne_ffinal.columns[[1] + list(range(5, cologne_ffinal.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,"Blumenberg, Chorweiler, Esch/Auweiler, Fühlingen, Heimersdorf, Lindweiler, Merkenich, Pesch, Roggendorf/Thenhoven, Seeberg, Volkhoven/Weiler, Worringen","Bezirksamt Chorweiler Pariser Platz 1, D-50765 Köln",51.021167,6.898034,3,Italian Restaurant,German Restaurant,Restaurant,Sushi Restaurant,Fast Food Restaurant,Hookah Bar,Snack Place,Pizza Place,Scandinavian Restaurant,Schnitzel Restaurant


Italy + Germany. Italian cuisine seems to be really popular.

### Cluster 5

In [54]:
cologne_ffinal.loc[cologne_ffinal['Cluster Labels'] == 4, cologne_ffinal.columns[[1] + list(range(5, cologne_ffinal.shape[1]))]]

Unnamed: 0,City parts,District Councils,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Buchforst, Buchheim, Dellbrück, Dünnwald, Flittard, Höhenhaus, Holweide, Mülheim, Stammheim","Bezirksamt Mülheim Wiener Platz 2a,D-51065 Köln",50.958147,7.013526,4,Turkish Restaurant,Snack Place,Mediterranean Restaurant,Italian Restaurant,German Restaurant,Asian Restaurant,Pizza Place,Bar,Indian Restaurant,Greek Restaurant


The cluster of Turkish and Mediterranean cuisine.