<h1 align=center><font size = 5>Segmenting and Clustering Districts in Berlin</font></h1>

## Introduction/Business Problem

I live in Berlin, Germany, where competition in the gastronomy market is fierce. Especially during the COVID-19 lockdown, restaurants that offer a delivery service have an advantage and can do good business.
So I want to find out where, and what kind of, restaurant someone should open now.

## Description of the Data and How It Will Be Used to Solve the Problem

To answer that, I want to see which districts have a low density of gastronomy businesses and whether a particular kind of restaurant is missing.
I based this notebook on the New York City lab. First, I tried data published by the Berlin government, which lists gastronomy businesses that reported whether they deliver food or let customers pick up orders. The data is in German, but I translated the necessary fields into English. As you will see, I worked with this data for a while. I noticed early on that only a few venues have the 'neighborhood' field filled in, so I planned to work with postal codes instead. However, after I had already created a map with different marker types, I had to accept that not all venues have a postal code either.
After that I decided to use the Foursquare API again. In addition, the geocoder always returned `None` for me (even with the suggested retry loop), so I looked up the latitudes and longitudes of the Berlin districts manually, focusing on the central districts. With this data I followed the same steps as in the NYC lab, except that I used a different search radius for each district because the Berlin districts differ in size.

Finally, I used the k-means algorithm to cluster the districts. You will see the result at the end of the notebook.


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Districts in Berlin</a>

3. <a href="#item3">Analyze Each District</a>

4. <a href="#item4">Cluster Districts</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [48]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Dataset

In [51]:
!wget -q -O 'delivery_berlin.json' https://www.berlin.de/sen/web/service/liefer-und-abholdienste/index.php/index/all.gjson?q=
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [52]:
with open('delivery_berlin.json') as json_data:
    delivery_berlin = json.load(json_data)

Let's take a quick look at the data.

In [53]:
delivery_berlin

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [13.38212, 52.53116]},
   'properties': {'title': '771',
    'href': '262',
    'description': ' <a href="262">Mehr...</a>',
    'id': '/sen/web/service/liefer-und-abholdienste/index.php/detail/771',
    'data': {'id': '771',
     'unique_id': '262',
     'name': 'Brasserie la bonne franquette',
     'strasse_nr': 'Chausseestraße 110',
     'plz': '10115',
     'art': 'Gastronomie (Café, Restaurant, Imbiss, Lebensmittelhandlung, usw.)',
     'angebot': 'Klassische Französische Küche',
     'lieferung': 'FALSCH',
     'beschreibung_lieferangebot': '',
     'selbstabholung': 'WAHR',
     'angebot_selbstabholung': 'Bestellungen werden jederzeit entgegengenommen. Abholung (später auch Lieferung) immer dienstags bis samstags in der Zeit von 17 bis 21 Uhr',
     'fon': '+493094405363',
     'w3': 'https://labonnefranquette.de',
     'mail': 'essen@labonnefranquette.de',
     'monta

Notice how all the relevant data is in the *features* key, which is basically a list of the venues. So, let's define a new variable that includes this data.

In [54]:
delivery_berlin = delivery_berlin['features']
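Each entry in this list is a GeoJSON feature, and GeoJSON stores point coordinates as `[longitude, latitude]`, the reverse of the order most mapping code expects. The following is a minimal sketch on a shortened copy of the first record shown above; `feature_to_latlon` is a hypothetical helper name and not part of the notebook.

```python
# Shortened copy of the first feature from the output above.
sample_feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point', 'coordinates': [13.38212, 52.53116]},
    'properties': {'data': {'name': 'Brasserie la bonne franquette', 'plz': '10115'}},
}


def feature_to_latlon(feature):
    """Return (latitude, longitude) from a GeoJSON point feature."""
    # GeoJSON orders point coordinates as [longitude, latitude].
    longitude, latitude = feature['geometry']['coordinates'][:2]
    return latitude, longitude


lat_demo, lon_demo = feature_to_latlon(sample_feature)
print(lat_demo, lon_demo)
```

Swapping the two values by accident would place the markers far away from Berlin, so it is worth checking one record by hand.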

Let's take a look at the first two items in this list.

In [55]:
delivery_berlin[0:2]

[{'type': 'Feature',
  'geometry': {'type': 'Point', 'coordinates': [13.38212, 52.53116]},
  'properties': {'title': '771',
   'href': '262',
   'description': ' <a href="262">Mehr...</a>',
   'id': '/sen/web/service/liefer-und-abholdienste/index.php/detail/771',
   'data': {'id': '771',
    'unique_id': '262',
    'name': 'Brasserie la bonne franquette',
    'strasse_nr': 'Chausseestraße 110',
    'plz': '10115',
    'art': 'Gastronomie (Café, Restaurant, Imbiss, Lebensmittelhandlung, usw.)',
    'angebot': 'Klassische Französische Küche',
    'lieferung': 'FALSCH',
    'beschreibung_lieferangebot': '',
    'selbstabholung': 'WAHR',
    'angebot_selbstabholung': 'Bestellungen werden jederzeit entgegengenommen. Abholung (später auch Lieferung) immer dienstags bis samstags in der Zeit von 17 bis 21 Uhr',
    'fon': '+493094405363',
    'w3': 'https://labonnefranquette.de',
    'mail': 'essen@labonnefranquette.de',
    'montag': '',
    'dienstag': '17:00-21:00',
    'mittwoch': '17:00-2

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [56]:
# define the dataframe columns
column_names = ['Name','Business_type','Speciality','Delivery','Pickup','Postal_code','Latitude','Longitude'] 

# instantiate the dataframe
berlin_delivery_df = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [57]:
berlin_delivery_df

Unnamed: 0,Name,Business_type,Speciality,Delivery,Pickup,Postal_code,Latitude,Longitude


Then let's loop through the data, collect one row per venue, and build the dataframe from those rows.

In [58]:
rows = []
for feature in delivery_berlin:
    data = feature['properties']['data']
    # GeoJSON stores point coordinates as [longitude, latitude]
    longitude, latitude = feature['geometry']['coordinates'][:2]

    rows.append({'Name': data['name'],
                 'Business_type': data['art'],       # type of business
                 'Speciality': data['angebot'],      # what the venue offers
                 'Delivery': data['lieferung'],      # 'WAHR' (true) or 'FALSCH' (false)
                 'Pickup': data['selbstabholung'],   # 'WAHR' (true) or 'FALSCH' (false)
                 'Postal_code': data['plz'],
                 'Latitude': latitude,
                 'Longitude': longitude})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
berlin_delivery_df = pd.DataFrame(rows, columns=column_names)

Quickly examine the resulting dataframe.

In [59]:
berlin_delivery_df.head()

Unnamed: 0,Name,Business_type,Speciality,Delivery,Pickup,Postal_code,Latitude,Longitude
0,Brasserie la bonne franquette,"Gastronomie (Café, Restaurant, Imbiss, Lebensm...",Klassische Französische Küche,FALSCH,WAHR,10115,52.53116,13.38212
1,Büroshop Koschel,Bürobedarf,Schreib- und Spielwarenladen,WAHR,FALSCH,10115,52.53228,13.39631
2,Alpenstück,"Gastronomie (Café, Restaurant, Imbiss, Lebensm...",Süddeutsche Spezialitäten wie Maultaschen Köni...,FALSCH,WAHR,10115,52.53033,13.39184
3,sagrantino 136,"Gastronomie (Café, Restaurant, Imbiss, Lebensm...",Italian fusion kitchen 3 -course menus by Mat...,FALSCH,WAHR,10115,52.52633,13.38897
4,Risorante Bonfini,"Gastronomie (Café, Restaurant, Imbiss, Lebensm...","Italienische Küche, Pasta und Pizza",FALSCH,WAHR,10115,52.52953,13.3848


Let's also check how many venues there are in the dataframe.

In [60]:
berlin_delivery_df.shape

(1228, 8)

Row 1 has the business type 'Bürobedarf', which means office supplies in English.
We are only interested in restaurants and other gastronomy businesses, so let's have a look at the different business types.

In [61]:
berlin_delivery_df['Business_type'].value_counts()

Gastronomie (Café, Restaurant, Imbiss, Lebensmittelhandlung, usw.)    733
Anderes (bitte bei der nächsten Frage beschreiben)                    254
Mode / Bekleidung                                                      61
Buchhandlung                                                           32
Gesundheit                                                             30
Sportwaren (inkl. Fahrradgeschäfte)                                    27
Möbel                                                                  26
Getränkemarkt                                                          20
Blumenladen                                                            15
Bürobedarf                                                             15
Haushalt                                                                9
Baumarkt                                                                6
Name: Business_type, dtype: int64

Keep only the gastronomy businesses.

In [62]:
berlin_delivery_df = berlin_delivery_df.loc[berlin_delivery_df['Business_type'] == 'Gastronomie (Café, Restaurant, Imbiss, Lebensmittelhandlung, usw.)']

Let's check the shape again to see whether it worked.

In [63]:
berlin_delivery_df.shape

(733, 8)

Only the gastronomy businesses are left in our dataframe.
From here on we no longer need the business type column.

In [64]:
berlin_delivery_df = berlin_delivery_df.drop(columns='Business_type')

Let's check the shape again to see whether it worked.

In [65]:
berlin_delivery_df.shape

(733, 7)

Some businesses have both Delivery and Pickup set to false ('FALSCH'). Drop them by keeping only the rows where at least one of the two values is true ('WAHR').

In [66]:
berlin_delivery_df = berlin_delivery_df[(berlin_delivery_df['Delivery'] == 'WAHR') | (berlin_delivery_df['Pickup'] == 'WAHR')]

Let's check the shape again to see whether it worked.

In [67]:
berlin_delivery_df.shape

(720, 7)
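The same filter can also be written by first turning the German `'WAHR'`/`'FALSCH'` strings into real booleans. This is a minimal sketch on made-up toy rows; the names `toy_services`, `flags` and `kept` are illustrative and do not appear elsewhere in the notebook.

```python
import pandas as pd

# Toy rows imitating the 'Delivery' and 'Pickup' columns of the dataset.
toy_services = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Delivery': ['WAHR', 'FALSCH', 'WAHR', 'FALSCH'],
    'Pickup': ['WAHR', 'WAHR', 'FALSCH', 'FALSCH'],
})

# Comparing against 'WAHR' yields booleans; any other value becomes False.
flags = toy_services[['Delivery', 'Pickup']].eq('WAHR')

# Keep rows where at least one of the two services is offered.
kept = toy_services[flags.any(axis=1)]
print(kept['Name'].tolist())
```

Working with booleans makes later conditions shorter, because the string comparison happens only once.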

Now that the data is cleaned, we can reset the index.

In [68]:
berlin_delivery_df = berlin_delivery_df.reset_index(drop=True)

Then let's have a look at the dataframe.

In [69]:
berlin_delivery_df.head()

Unnamed: 0,Name,Speciality,Delivery,Pickup,Postal_code,Latitude,Longitude
0,Brasserie la bonne franquette,Klassische Französische Küche,FALSCH,WAHR,10115,52.53116,13.38212
1,Alpenstück,Süddeutsche Spezialitäten wie Maultaschen Köni...,FALSCH,WAHR,10115,52.53033,13.39184
2,sagrantino 136,Italian fusion kitchen 3 -course menus by Mat...,FALSCH,WAHR,10115,52.52633,13.38897
3,Risorante Bonfini,"Italienische Küche, Pasta und Pizza",FALSCH,WAHR,10115,52.52953,13.3848
4,CAFE RIBO,"Maultaschen, schwäbische küche, cafe",FALSCH,WAHR,10115,52.53089,13.39654


As mentioned in the introduction, the geocoder did not work for me, so I took the coordinates of Berlin from the Google Maps URL.

In [70]:
lat_berlin = 52.5201158
lon_berlin = 13.2790854
print('The geographical coordinates of Berlin are {}, {}.'.format(lat_berlin, lon_berlin))

The geographical coordinates of Berlin are 52.5201158, 13.2790854.
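One way to keep the notebook running whether or not the geocoder answers is to wrap the lookup and fall back to manually collected coordinates. The sketch below assumes the geocoder is passed in as a plain callable (for example geopy's `Nominatim(user_agent='my-app-name').geocode`, where `'my-app-name'` is a placeholder); `resolve_location` is a hypothetical helper, not part of geopy or of this notebook.

```python
def resolve_location(geocode, address, fallback):
    """Return (latitude, longitude) for address, or fallback if the lookup fails.

    geocode is any callable that takes an address and returns either None
    or an object with .latitude and .longitude attributes.
    """
    try:
        location = geocode(address)
    except Exception:
        # Network errors or rate limits: fall back instead of failing.
        location = None
    if location is None:
        return fallback
    return location.latitude, location.longitude


# Stand-in geocoder that always returns None, to exercise the fallback path.
lat_fallback, lon_fallback = resolve_location(
    lambda address: None, 'Berlin, Germany', (52.5201158, 13.2790854))
print(lat_fallback, lon_fallback)
```

With a working geocoder the same call returns the looked-up coordinates, so the rest of the notebook does not need to know which path was taken.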


#### Create a map of Berlin with classified venues as markers.

In [None]:
# create map of Berlin using latitude and longitude values
map_berlin = folium.Map(location=[lat_berlin, lon_berlin], zoom_start=10)

# add one marker per venue; the colour encodes which services are offered
for lat, lng, name, goods, pc, d, p in zip(berlin_delivery_df['Latitude'], berlin_delivery_df['Longitude'], berlin_delivery_df['Name'], berlin_delivery_df['Speciality'], berlin_delivery_df['Postal_code'], berlin_delivery_df['Delivery'], berlin_delivery_df['Pickup']):
    if d == 'WAHR' and p == 'WAHR':
        # green: the business delivers AND customers can pick up
        color, fill_color = 'green', 'lime'
        label = '{}, {}, {}'.format(name, goods, pc)
    elif d == 'WAHR':
        # yellow: the business delivers but customers cannot pick up
        color, fill_color = 'olive', 'yellow'
        label = '{}, {}'.format(name, goods)
    else:
        # red: the business doesn't deliver but customers can pick up
        color, fill_color = 'maroon', 'red'
        label = '{}, {}'.format(name, goods)

    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup(label, parse_html=True),
        color=color,
        fill=True,
        fill_color=fill_color,
        fill_opacity=0.7).add_to(map_berlin)

map_berlin

I could have worked with this data even further, but as you can see, some information is missing. Furthermore, many venues are missing altogether, because the data is reported only voluntarily. For these reasons I decided to work with the powerful Foursquare API again.
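To put a number on the missing information, one option is to count empty fields per column. The sketch below uses made-up toy rows and assumes that missing values appear as empty strings, as they do in the raw records shown earlier; `toy_delivery` and `missing_per_column` are illustrative names.

```python
import pandas as pd

# Toy rows; the empty strings imitate venues with unreported fields.
toy_delivery = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Postal_code': ['10115', '', '10117'],
    'Speciality': ['Pizza', 'Sushi', ''],
})

# Treat empty or whitespace-only strings as missing, then count per column.
missing_per_column = toy_delivery.apply(
    lambda col: col.astype(str).str.strip().eq('').sum())
print(missing_per_column)
```

Running the same expression on `berlin_delivery_df` would show which columns are too sparse to rely on.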

#### Define Foursquare Credentials and Version

In [6]:
CLIENT_ID = 'XYQOYS4IBK2455WALEDYOFG50DM0N1V4QX0DTDMC0CWJRGFM' # your Foursquare ID
CLIENT_SECRET = 'MPDR2J3BFMTZO253OX4NYIE0JDZ0JKRPXGMW1B51FMWHDZDC' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XYQOYS4IBK2455WALEDYOFG50DM0N1V4QX0DTDMC0CWJRGFM
CLIENT_SECRET:MPDR2J3BFMTZO253OX4NYIE0JDZ0JKRPXGMW1B51FMWHDZDC
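Because a notebook is often shared together with its outputs, it can be safer to read the credentials from environment variables and to mask them when printing. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `mask` are assumptions, not something Foursquare prescribes.

```python
import os

# Hypothetical variable names; set them in the shell before starting Jupyter.
CLIENT_ID_DEMO = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET_DEMO = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')


def mask(value, visible=4):
    """Show only the first few characters of a secret when printing."""
    if len(value) <= visible:
        return '*' * len(value)
    return value[:visible] + '*' * (len(value) - visible)


print('CLIENT_ID: ' + mask(CLIENT_ID_DEMO))
print('CLIENT_SECRET: ' + mask(CLIENT_SECRET_DEMO))
```

If the variables are not set, both values are empty strings, which makes a missing configuration easy to spot before any request is sent.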


#### Because the geocoder doesn't work, I created the district dataframe by hand

In [7]:
district_data = {'District':['Mitte', 'Tiergarten', 'Hansaviertel', 'Moabit', 'Charlottenburg', 'Halensee', 'Wilmersdorf', 'Schöneberg', 'Tempelhof', 'Neukölln', 'Kreuzberg', 'Alt-Treptow', 'Friedrichshain', 'Prenzlauer Berg', 'Gesundbrunnen', 'Wedding' ],
                 'Latitude':[52.5222952, 52.5108066, 52.5182467, 52.5246541, 52.5172167, 52.4957193, 52.4852483, 52.4799155, 52.4651934, 52.4773654, 52.4966363, 52.4925908, 52.5085849, 52.5392985, 52.5504451, 52.5501229],
                 'Longitude':[13.362587, 13.3366072, 13.3335393, 13.3221234, 13.3454206, 13.2754826, 13.2894013, 13.3213381, 13.3564437, 13.4072598, 13.3758184, 13.3994455, 13.4207506, 13.3995093, 13.3496643, 13.3022043],
                 'Radius':[2000, 1500, 500, 1500, 3000, 500, 2000, 2500, 2000, 2000, 2000, 500, 2000, 2000, 1500, 1500]}
districts_df = pd.DataFrame(district_data)


In [8]:
districts_df

Unnamed: 0,District,Latitude,Longitude,Radius
0,Mitte,52.522295,13.362587,2000
1,Tiergarten,52.510807,13.336607,1500
2,Hansaviertel,52.518247,13.333539,500
3,Moabit,52.524654,13.322123,1500
4,Charlottenburg,52.517217,13.345421,3000
5,Halensee,52.495719,13.275483,500
6,Wilmersdorf,52.485248,13.289401,2000
7,Schöneberg,52.479915,13.321338,2500
8,Tempelhof,52.465193,13.356444,2000
9,Neukölln,52.477365,13.40726,2000


#### Now, let's get the top 100 venues that are in Mitte within a radius of 2000 meters.

In [10]:
district_latitude = districts_df.loc[0, 'Latitude'] 
district_longitude = districts_df.loc[0, 'Longitude'] 
district_radius = districts_df.loc[0, 'Radius']
district_name = districts_df.loc[0, 'District'] 

First, let's create the GET request URL.

In [11]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    district_latitude,
    district_longitude,
    district_radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=XYQOYS4IBK2455WALEDYOFG50DM0N1V4QX0DTDMC0CWJRGFM&client_secret=MPDR2J3BFMTZO253OX4NYIE0JDZ0JKRPXGMW1B51FMWHDZDC&v=20180605&ll=52.5222952,13.362587&radius=2000&limit=100'
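As an alternative to formatting the URL by hand, the query string can be built from a dictionary so that every value is encoded correctly. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL with encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# Placeholder credentials, not real ones.
demo_url = build_explore_url('ID', 'SECRET', '20180605', 52.5222952, 13.362587, 2000, 100)
print(demo_url)
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which avoids building the string at all.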

Send the GET request and examine the results.

In [12]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb161590cc1fd001bac72f2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Mitte',
  'headerFullLocation': 'Mitte, Berlin',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 232,
  'suggestedBounds': {'ne': {'lat': 52.54029521800002,
    'lng': 13.39211503201418},
   'sw': {'lat': 52.504295181999986, 'lng': 13.333058967985819}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4adcda84f964a520d04821e3',
       'name': 'Paris-Moskau',
       'location': {'address': 'Alt-Moabit 141',
        'lat': 52.522695363826976,
        'lng': 13.364400102023664,
        'labeledLatLngs': [{'label': 'display',
          'lat': 52.5226953638269
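The response above reports `'code': 200` in its `meta` block. Before digging into `groups`, it can help to check that code so a failed request does not surface later as a confusing `KeyError`. This is a minimal sketch on trimmed-down payloads in the shape shown above; `extract_items` is a hypothetical helper.

```python
def extract_items(payload):
    """Return the recommended items from an explore response, or [] on failure."""
    code = payload.get('meta', {}).get('code')
    if code != 200:
        # Any non-200 code is treated as a failed request.
        return []
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])


# Trimmed-down payloads: one successful, one with a non-200 code.
ok_payload = {'meta': {'code': 200},
              'response': {'groups': [{'items': [{'venue': {'name': 'Paris-Moskau'}}]}]}}
bad_payload = {'meta': {'code': 400}, 'response': {}}

print(len(extract_items(ok_payload)), len(extract_items(bad_payload)))
```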

From the Foursquare lab, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [13]:
# function that extracts the category of the venue
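def get_category_type(row):
    """Return the name of the first category of a venue row, or None."""
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened responses prefix the column with 'venue.'
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']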

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [14]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Paris-Moskau,French Restaurant,52.522695,13.3644
1,Haus der Kulturen der Welt,Performing Arts Venue,52.518597,13.364739
2,vabali spa,Spa,52.527603,13.360555
3,Steigenberger Spa,Spa,52.523587,13.368029
4,HKW Auditorium,Concert Hall,52.518522,13.36478


And how many venues were returned by Foursquare?

In [15]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [16]:
nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Paris-Moskau,French Restaurant,52.522695,13.3644
1,Haus der Kulturen der Welt,Performing Arts Venue,52.518597,13.364739
2,vabali spa,Spa,52.527603,13.360555
3,Steigenberger Spa,Spa,52.523587,13.368029
4,HKW Auditorium,Concert Hall,52.518522,13.36478
5,Steigenberger Hotel Am Kanzleramt,Hotel,52.52361,13.368045
6,Zollpackhof,Beer Garden,52.521223,13.367035
7,Tipi am Kanzleramt,Comedy Club,52.518247,13.367187
8,Calumet Photographic,Camera Store,52.523183,13.366734
9,Tiergarten,Park,52.514628,13.357208
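To double-check that a returned venue really lies inside the search radius, the great-circle distance between the district centre and the venue can be computed with the haversine formula. The sketch below uses the Mitte coordinates and the first venue from the table above; `haversine_m` and the Earth radius constant are illustrative choices.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# District centre used for Mitte and the first returned venue (Paris-Moskau).
distance = haversine_m(52.5222952, 13.362587, 52.522695, 13.3644)
print(round(distance), distance <= 2000)
```

Applied to every row, the same function would show how far the returned venues spread from the chosen centre point.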


<a id='item2'></a>

## 2. Explore the other districts in Berlin

#### Let's create a function to repeat the same process for all the districts in Berlin

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng, rad in zip(names, latitudes, longitudes, radius):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            rad, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
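The function above indexes `categories[0]` directly, which would raise an `IndexError` for a venue without any category. A defensive variant could look like the following sketch; `first_category_name` and the `'Unknown'` default are hypothetical and not part of the lab code.

```python
def first_category_name(venue, default='Unknown'):
    """Return the first category name of a venue dict, or default if none."""
    categories = venue.get('categories') or []
    if not categories:
        return default
    return categories[0].get('name', default)


with_category = {'name': 'Paris-Moskau', 'categories': [{'name': 'French Restaurant'}]}
without_category = {'name': 'Unnamed place', 'categories': []}

print(first_category_name(with_category))     # French Restaurant
print(first_category_name(without_category))  # Unknown
```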

#### Now run the above function on each district and create a new dataframe called *berlin_venues*.

In [21]:
berlin_venues = getNearbyVenues(names=districts_df['District'],
                                latitudes=districts_df['Latitude'],
                                longitudes=districts_df['Longitude'],
                                radius=districts_df['Radius'])



Mitte
Tiergarten
Hansaviertel
Moabit
Charlottenburg
Halensee
Wilmersdorf
Schöneberg
Tempelhof
Neukölln
Kreuzberg
Alt-Treptow
Friedrichshain
Prenzlauer Berg
Gesundbrunnen
Wedding


#### Let's check the size of the resulting dataframe

In [22]:
print(berlin_venues.shape)
berlin_venues.head()

(1328, 7)


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mitte,52.522295,13.362587,Paris-Moskau,52.522695,13.3644,French Restaurant
1,Mitte,52.522295,13.362587,Haus der Kulturen der Welt,52.518597,13.364739,Performing Arts Venue
2,Mitte,52.522295,13.362587,vabali spa,52.527603,13.360555,Spa
3,Mitte,52.522295,13.362587,Steigenberger Spa,52.523587,13.368029,Spa
4,Mitte,52.522295,13.362587,HKW Auditorium,52.518522,13.36478,Concert Hall


Let's check how many venues were returned for each district

In [23]:
berlin_venues.groupby('District').count()

District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alt-Treptow,56,56,56,56,56,56
Charlottenburg,100,100,100,100,100,100
Friedrichshain,100,100,100,100,100,100
Gesundbrunnen,100,100,100,100,100,100
Halensee,6,6,6,6,6,6
Hansaviertel,26,26,26,26,26,26
Kreuzberg,100,100,100,100,100,100
Mitte,100,100,100,100,100,100
Moabit,100,100,100,100,100,100
Neukölln,100,100,100,100,100,100


#### Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print('There are {} unique categories.'.format(len(berlin_venues['Venue Category'].unique())))

There are 242 unique categories.


<a id='item3'></a>

## 3. Analyze Each District

To make the data processable, we apply one-hot encoding.

In [27]:
# one hot encoding
berlin_onehot = pd.get_dummies(berlin_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
berlin_onehot['District'] = berlin_venues['District'] 

# move district column to the first column
fixed_columns = [berlin_onehot.columns[-1]] + list(berlin_onehot.columns[:-1])
berlin_onehot = berlin_onehot[fixed_columns]

berlin_onehot.head()

Unnamed: 0,District,African Restaurant,Airport Lounge,Airport Service,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Bookstore,Bowling Alley,Brasserie,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Stop,Business Service,Butcher,Café,Camera Store,Canal,Candy Store,Capitol Building,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Roaster,Coffee Shop,College Cafeteria,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cultural Center,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Duty-free Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Fabric Shop,Fair,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Kumpir Restaurant,Lake,Laser Tag,Laundromat,Lebanese Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nightclub,Noodle House,Optical Shop,Organic Grocery,Outdoor Sculpture,Palace,Palatine Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Post Office,Pub,Racecourse,Record Shop,Recreation Center,Rental Car Location,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Schnitzel Restaurant,Science Museum,Sculpture Garden,Seafood Restaurant,Shawarma Place,Shipping Store,Shopping Mall,Skate Park,Skating Rink,Snack Place,Soup Place,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Steakhouse,Storage Facility,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Taverna,Taxi Stand,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Trattoria/Osteria,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yemeni Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [28]:
berlin_onehot.shape

(1328, 243)
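The shape matches the 242 unique categories plus the `District` column. To see on a small scale what the encoding produces, and why averaging the indicator columns per district yields category shares, here is a minimal sketch on a made-up three-venue table; the `toy_` names are illustrative only.

```python
import pandas as pd

# Toy stand-in for berlin_venues with two districts and three venues.
toy_venues = pd.DataFrame({
    'District': ['Mitte', 'Mitte', 'Halensee'],
    'Venue Category': ['Spa', 'Hotel', 'Spa'],
})

# One indicator column per category; exactly one is set in each row.
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='')
toy_onehot = toy_onehot.astype(int)  # newer pandas versions return booleans
toy_onehot.insert(0, 'District', toy_venues['District'])

# Averaging the indicator columns per district gives each category's share.
toy_shares = toy_onehot.groupby('District').mean()
print(toy_shares)
```

In the toy data, half of the venues in Mitte are spas and half are hotels, so both shares are 0.5 there, while Halensee consists of spas only.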

#### Next, let's group rows by district and take the mean of the frequency of occurrence of each category

In [29]:
berlin_grouped = berlin_onehot.groupby('District').mean().reset_index()
berlin_grouped

Unnamed: 0,District,African Restaurant,Airport Lounge,Airport Service,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Bookstore,Bowling Alley,Brasserie,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Stop,Business Service,Butcher,Café,Camera Store,Canal,Candy Store,Capitol Building,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Roaster,Coffee Shop,College Cafeteria,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cultural Center,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Duty-free Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Fabric Shop,Fair,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Kumpir Restaurant,Lake,Laser Tag,Laundromat,Lebanese Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nightclub,Noodle House,Optical Shop,Organic Grocery,Outdoor Sculpture,Palace,Palatine Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Post Office,Pub,Racecourse,Record Shop,Recreation Center,Rental Car Location,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Schnitzel Restaurant,Science Museum,Sculpture Garden,Seafood Restaurant,Shawarma Place,Shipping Store,Shopping Mall,Skate Park,Skating Rink,Snack Place,Soup Place,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Steakhouse,Storage Facility,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Taverna,Taxi Stand,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Trattoria/Osteria,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yemeni Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,Alt-Treptow,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.053571,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.035714,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.089286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.035714,0.0,0.089286,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0
1,Charlottenburg,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.12,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.05
2,Friedrichshain,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.08,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
3,Gesundbrunnen,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.11,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
4,Halensee,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Hansaviertel,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Kreuzberg,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.04,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Mitte,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.11,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Moabit,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.01,0.0,0.0,0.06,0.01,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.05,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
9,Neukölln,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.1,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.01,0.05,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01


#### Let's confirm the new size

In [30]:
berlin_grouped.shape

(16, 243)

#### Let's print each district along with the top 5 most common venues

In [31]:
num_top_venues = 5

for district in berlin_grouped['District']:
    print("----"+district+"----")
    temp = berlin_grouped[berlin_grouped['District'] == district].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alt-Treptow----
                      venue  freq
0                      Café  0.09
1               Coffee Shop  0.09
2                    Bakery  0.05
3        Italian Restaurant  0.05
4  Mediterranean Restaurant  0.04


----Charlottenburg----
          venue  freq
0         Hotel  0.12
1          Café  0.06
2   Zoo Exhibit  0.05
3   Coffee Shop  0.04
4  Cocktail Bar  0.03


----Friedrichshain----
                venue  freq
0                 Bar  0.08
1         Coffee Shop  0.06
2                Café  0.05
3  Turkish Restaurant  0.05
4  Italian Restaurant  0.04


----Gesundbrunnen----
            venue  freq
0            Café  0.15
1             Bar  0.11
2            Park  0.04
3  Ice Cream Shop  0.03
4     Supermarket  0.03


----Halensee----
                venue  freq
0  Italian Restaurant  0.17
1     Automotive Shop  0.17
2  Light Rail Station  0.17
3       Historic Site  0.17
4                Lake  0.17


----Hansaviertel----
            venue  freq
0           Hotel  0.15


#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
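
The helper relies on a plain descending sort, so categories with equal frequency can come out in either order. That is probably why Café and Coffee Shop swap places for Alt-Treptow between the top-5 printout and the top-10 table. As a minimal sketch (plain Python rather than pandas, with the hypothetical name `rank_venues` and the rounded Alt-Treptow frequencies from the printout above), ties can be broken alphabetically to make the ranking reproducible:

```python
def rank_venues(freq_by_venue, num_top_venues):
    """Return the top venue categories, highest frequency first, ties broken by name."""
    ranked = sorted(freq_by_venue.items(), key=lambda item: (-item[1], item[0]))
    return [venue for venue, _ in ranked[:num_top_venues]]


# Rounded frequencies for Alt-Treptow, copied from the top-5 printout
alt_treptow = {
    'Coffee Shop': 0.09,
    'Italian Restaurant': 0.05,
    'Café': 0.09,
    'Mediterranean Restaurant': 0.04,
    'Bakery': 0.05,
}

top3 = rank_venues(alt_treptow, 3)
print(top3)
```

The same idea could be carried over to the pandas helper by sorting on the frequency and then on the category name.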

Now let's create the new dataframe and display the top 10 venues for each district.

In [38]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 3rd fall back to the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
district_venues_sorted_df = pd.DataFrame(columns=columns)
district_venues_sorted_df['District'] = berlin_grouped['District']

for ind in np.arange(berlin_grouped.shape[0]):
    district_venues_sorted_df.iloc[ind, 1:] = return_most_common_venues(berlin_grouped.iloc[ind, :], num_top_venues)

district_venues_sorted_df.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alt-Treptow,Coffee Shop,Café,Italian Restaurant,Bakery,Cocktail Bar,Mediterranean Restaurant,Bookstore,Gift Shop,Kebab Restaurant,Breakfast Spot
1,Charlottenburg,Hotel,Café,Zoo Exhibit,Coffee Shop,Beer Garden,Art Museum,Concert Hall,Cocktail Bar,Monument / Landmark,Gym / Fitness Center
2,Friedrichshain,Bar,Coffee Shop,Café,Turkish Restaurant,Italian Restaurant,Bakery,Hotel,Art Gallery,Plaza,Organic Grocery
3,Gesundbrunnen,Café,Bar,Park,Coffee Shop,Ice Cream Shop,Supermarket,Drugstore,Turkish Restaurant,Chinese Restaurant,Fast Food Restaurant
4,Halensee,Historic Site,Italian Restaurant,Light Rail Station,Lake,Automotive Shop,Beach,Farmers Market,Falafel Restaurant,Fair,Fabric Shop
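
The column labels above are built from a three-element suffix list with a fallback to "th". That is fine for a top 10, but it would produce "21th" for a longer ranking. A minimal sketch of a more general label builder, where `ordinal` is a hypothetical helper and not part of the original notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


num_top_venues = 10
columns = ['District'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns)
```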


<a id='item4'></a>

## 4. Cluster Districts

Run *k*-means to cluster the districts into 5 clusters.

In [39]:
# set number of clusters
kclusters = 5

# drop the label column so only the venue frequencies are clustered
berlin_grouped_clustering = berlin_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(berlin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 1, 4, 0, 1, 0, 1, 1], dtype=int32)
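
To make the clustering step less of a black box, here is a minimal pure-Python sketch of Lloyd's algorithm, the basic assign-then-update iteration that *k*-means is built on. The data are made-up two-feature rows (hotel share, café share), `lloyd_kmeans` is a hypothetical helper, and the naive "first k points" initialisation is a simplification. scikit-learn's `KMeans` uses k-means++ initialisation by default, so this is illustrative only and not a reproduction of what the cell above runs internally.

```python
import math  # math.dist requires Python 3.8+


def lloyd_kmeans(points, k, iterations=20):
    """Cluster points with plain Lloyd iterations; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point goes to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up frequency rows: (hotel share, café share)
toy_rows = [(0.12, 0.02), (0.11, 0.03), (0.01, 0.15), (0.02, 0.14)]
labels, centroids = lloyd_kmeans(toy_rows, k=2)
print(labels)
```

With only 16 districts and 5 clusters, single-member clusters are not surprising, so it may be worth trying a few values of `kclusters` and comparing the groupings.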

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each district.

In [40]:
# add clustering labels
district_venues_sorted_df.insert(0, 'Cluster Labels', kmeans.labels_)

berlin_merged = districts_df

# join district_venues_sorted_df onto districts_df to add cluster labels and top venues to each district's latitude/longitude
berlin_merged = berlin_merged.join(district_venues_sorted_df.set_index('District'), on='District')

berlin_merged.head() # check the last columns!

Unnamed: 0,District,Latitude,Longitude,Radius,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mitte,52.522295,13.362587,2000,0,Hotel,Monument / Landmark,Spa,Park,Science Museum,Art Museum,Plaza,Concert Hall,Indie Movie Theater,Bookstore
1,Tiergarten,52.510807,13.336607,1500,0,Zoo Exhibit,Hotel,Jazz Club,Monument / Landmark,Art Museum,Restaurant,Theater,German Restaurant,Furniture / Home Store,Fried Chicken Joint
2,Hansaviertel,52.518247,13.333539,500,0,Hotel,Hotel Bar,Gastropub,Pizza Place,Pub,Coffee Shop,Cocktail Bar,Clothing Store,Restaurant,River
3,Moabit,52.524654,13.322123,1500,1,Bakery,Hotel,Supermarket,Bar,Asian Restaurant,Coffee Shop,Italian Restaurant,Doner Restaurant,Café,Vietnamese Restaurant
4,Charlottenburg,52.517217,13.345421,3000,0,Hotel,Café,Zoo Exhibit,Coffee Shop,Beer Garden,Art Museum,Concert Hall,Cocktail Bar,Monument / Landmark,Gym / Fitness Center
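
The join above is a left join on `District`, so any district missing from `district_venues_sorted_df` (for example one for which no venues were returned) would get `NaN` in `Cluster Labels`. That would turn the column into floats and break the list indexing used for the map colours. It does not appear to happen here, but as a hedged sketch with made-up data, such rows could be dropped and the labels cast back to integers:

```python
import pandas as pd

# Hypothetical data: district 'C' has coordinates but no cluster label
districts_df = pd.DataFrame({
    'District': ['A', 'B', 'C'],
    'Latitude': [52.52, 52.51, 52.50],
    'Longitude': [13.36, 13.34, 13.32],
})
labelled_df = pd.DataFrame({'District': ['A', 'B'], 'Cluster Labels': [0, 1]})

# left join: 'C' is kept, but its 'Cluster Labels' value is NaN
merged = districts_df.join(labelled_df.set_index('District'), on='District')

# drop unlabelled districts and restore an integer label column
clean = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(clean)
```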


Finally, let's visualize the resulting clusters

In [41]:
# create map
map_clusters = folium.Map(location=[lat_berlin, lon_berlin], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(berlin_merged['Latitude'], berlin_merged['Longitude'], berlin_merged['District'], berlin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
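
The marker colours above are looked up with `rainbow[cluster-1]`, so cluster label 0 wraps around to the last colour in the list. The colours are still distinct, but the mapping is indirect. A minimal sketch of a direct label-to-colour mapping using only the standard library, where `cluster_colors` is a hypothetical helper and the saturation and brightness values are arbitrary choices:

```python
import colorsys


def cluster_colors(n_clusters):
    """Return n_clusters evenly spaced hex colours; index i belongs to label i."""
    hex_colors = []
    for i in range(n_clusters):
        # spread the hues evenly around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.85, 0.9)
        hex_colors.append(
            '#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255))
        )
    return hex_colors


palette = cluster_colors(5)
print(palette)
```

A marker for a district with label `cluster` could then use `palette[cluster]` for both `color` and `fill_color`.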

<a id='item5'></a>

## 5. Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. 

#### Cluster 1

In [43]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 0, berlin_merged.columns[[0] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mitte,Hotel,Monument / Landmark,Spa,Park,Science Museum,Art Museum,Plaza,Concert Hall,Indie Movie Theater,Bookstore
1,Tiergarten,Zoo Exhibit,Hotel,Jazz Club,Monument / Landmark,Art Museum,Restaurant,Theater,German Restaurant,Furniture / Home Store,Fried Chicken Joint
2,Hansaviertel,Hotel,Hotel Bar,Gastropub,Pizza Place,Pub,Coffee Shop,Cocktail Bar,Clothing Store,Restaurant,River
4,Charlottenburg,Hotel,Café,Zoo Exhibit,Coffee Shop,Beer Garden,Art Museum,Concert Hall,Cocktail Bar,Monument / Landmark,Gym / Fitness Center


#### Cluster 2

In [44]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 1, berlin_merged.columns[[0] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Moabit,Bakery,Hotel,Supermarket,Bar,Asian Restaurant,Coffee Shop,Italian Restaurant,Doner Restaurant,Café,Vietnamese Restaurant
9,Neukölln,Bar,Café,Coffee Shop,Italian Restaurant,Restaurant,Pizza Place,Indie Movie Theater,Park,Music Venue,Vietnamese Restaurant
10,Kreuzberg,Italian Restaurant,Hotel,Korean Restaurant,Coffee Shop,Park,Ice Cream Shop,Café,Plaza,Cocktail Bar,Pastry Shop
11,Alt-Treptow,Coffee Shop,Café,Italian Restaurant,Bakery,Cocktail Bar,Mediterranean Restaurant,Bookstore,Gift Shop,Kebab Restaurant,Breakfast Spot
12,Friedrichshain,Bar,Coffee Shop,Café,Turkish Restaurant,Italian Restaurant,Bakery,Hotel,Art Gallery,Plaza,Organic Grocery
13,Prenzlauer Berg,Coffee Shop,Café,Bar,Italian Restaurant,Wine Bar,Ice Cream Shop,Vietnamese Restaurant,Plaza,Bakery,Seafood Restaurant
14,Gesundbrunnen,Café,Bar,Park,Coffee Shop,Ice Cream Shop,Supermarket,Drugstore,Turkish Restaurant,Chinese Restaurant,Fast Food Restaurant
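
One simple way to see what characterises a cluster is to count how often each category shows up among its districts' top venues. The sketch below does this for the top 3 entries of the Cluster 2 table above; `top3_by_district` is a hypothetical name and the lists are copied by hand from that table.

```python
from collections import Counter

# Top 3 venue categories per district, copied from the Cluster 2 table
top3_by_district = {
    'Moabit': ['Bakery', 'Hotel', 'Supermarket'],
    'Neukölln': ['Bar', 'Café', 'Coffee Shop'],
    'Kreuzberg': ['Italian Restaurant', 'Hotel', 'Korean Restaurant'],
    'Alt-Treptow': ['Coffee Shop', 'Café', 'Italian Restaurant'],
    'Friedrichshain': ['Bar', 'Coffee Shop', 'Café'],
    'Prenzlauer Berg': ['Coffee Shop', 'Café', 'Bar'],
    'Gesundbrunnen': ['Café', 'Bar', 'Park'],
}

# count in how many districts each category appears among the top 3
category_counts = Counter(
    venue for venues in top3_by_district.values() for venue in venues
)

for category, count in category_counts.most_common(3):
    print(f'{category}: in the top 3 of {count} of {len(top3_by_district)} districts')
```

On this reading, cafés, coffee shops and bars dominate the cluster, while Moabit's top 3 contains none of them.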


#### Cluster 3

In [45]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 2, berlin_merged.columns[[0] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Wedding,Airport Service,Rental Car Location,Bus Stop,Airport Lounge,Brewery,Rock Climbing Spot,Taxi Stand,Lake,Storage Facility,Bookstore


#### Cluster 4

In [46]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 3, berlin_merged.columns[[0] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Wilmersdorf,Hotel,Italian Restaurant,Supermarket,Plaza,Café,German Restaurant,Trattoria/Osteria,Ice Cream Shop,Drugstore,Chinese Restaurant
7,Schöneberg,Café,Bakery,Italian Restaurant,Supermarket,Park,Ice Cream Shop,Plaza,Bistro,Organic Grocery,Dessert Shop
8,Tempelhof,Café,Italian Restaurant,Park,Supermarket,Bistro,Ice Cream Shop,Korean Restaurant,Vietnamese Restaurant,Thai Restaurant,Gym / Fitness Center


#### Cluster 5

In [47]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 4, berlin_merged.columns[[0] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Halensee,Historic Site,Italian Restaurant,Light Rail Station,Lake,Automotive Shop,Beach,Farmers Market,Falafel Restaurant,Fair,Fabric Shop


### Looking at my district Moabit in cluster 2, a vegan restaurant is still missing from its most common venues.

This notebook is based on a notebook created by [Alex Aklson](https://www.linkedin.com/in/aklson/) and [Polong Lin](https://www.linkedin.com/in/polonglin/).