# Segmenting and Clustering Neighborhoods in Berlin <br> Is Berlin still a divided City?

## Introduction

Berlin, the capital of Germany, celebrates it's 30th anniversary of reunification this year. After almost thirty years of division into a capitalistic western and a socialistic eastern part, the quarters and neighborhoods of Berlin had developed quite differently. In the years after the reunification, the convergence of the eastern and western parts of the city towards a more homogenous, united city, was one of the important political goals.

Therefore, now is a good moment to analyze the similarities and differences of Berlin's boroughs and neighborhoods. There are many approaches to do so. While it is relatively easy to compare figures such as the average income per person or the unemployment rate per borough (and in these terms, the differences between the eastern and western parts of the city are still obivious), it is a more difficult task to compare the "look and feel" of the boroughs and neighborhoods today.

## Data

My approach to do this will be by using the Foursquare API to find out about the most common venue categories in the respective boroughs and neighborhoods. This analysis, supported by the official latitude / longitude data of the Berlin districts, is supposed to lead into a k-means-clustering of the neighborhoods. Is there something like a "typical group" of western and eastern districts? Or will we see a mixed picture of similar boroughs and neighborhoods on both sides of the former Berlin wall? Finally, the answer to this question is going to be illustrated on a Folium map of Berlin.

## Methodology

Firstly, all the necessary dependencies will be downloaded.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

### Download and Exploration of the Dataset

The Berlin district and neighborhood data is downloaded in the format of GeoJSON.

In [188]:
!wget -q -O 'berlin_data.json' https://tsb-opendata.s3.eu-central-1.amazonaws.com/ortsteile/lor_ortsteile.geojson
print('Data downloaded!')

Data downloaded!


#### Load and exploration of the data

Now, the downloaded file is opened and parsed.

In [2]:
with open('C:/Users/Ben/GMX MediaCenter/lor_ortsteile.json') as json_data:
    berlin_data = json.load(json_data)

All the relevant data is in the *features* key, which is basically a list of the neighborhoods. It is stored in the variable "neighborhoods_data".

In [4]:
neighborhoods_data = berlin_data['features']

#### Tranforming the data into a *pandas* dataframe

The next task is to transform this data of nested Python dictionaries into a *pandas* dataframe. Therefore, as a first step, an empty dataframe is prepared.

In [6]:
# Definition of the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# Instantiation of the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

The new, empty dataframe looks as follows:

In [7]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


For some unknown reason, the coordinate data in the GeoJSON file is nested at different levels (three or four levels deep). In order to find out the respective nesting level, the following function is defined:

In [8]:
def nest_level(obj):

    # Not a list? So the nest level will always be 0:
    if type(obj) != list:
        return 0

    # Now we're dealing only with list objects:

    max_level = 0
    for item in obj:
        # Getting recursively the level for each item in the list,
        # then updating the max found level:
        max_level = max(max_level, nest_level(item))

    # Adding 1, because 'obj' is a list (here is the recursion magic):
    return max_level + 1

The next step is to to loop through the data and fill the dataframe one row at a time.

In [9]:
if len(neighborhoods) > 0:
    neighborhoods.drop(neighborhoods.index, inplace=True)

for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['BEZIRK'] 
    neighborhood_name = data['properties']['OTEIL']
    
    neighborhood_latlon = data['geometry']['coordinates']
    
    neighborhood_lat = 0
    neighborhood_lon = 0
    
    #for data2 in neighborhood_latlon:
    if nest_level(neighborhood_latlon) == 3:
        neighborhood_lat = neighborhood_latlon[0][0][1]
        neighborhood_lon = neighborhood_latlon[0][0][0]
    
    if nest_level(neighborhood_latlon) == 4:
        neighborhood_lat = neighborhood_latlon[0][0][0][1]
        neighborhood_lon = neighborhood_latlon[0][0][0][0]
    
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat + 0.01,
                                          'Longitude': neighborhood_lon - 0.01}, ignore_index=True)

The previously prepared dataframe is now filled and looks as follows:

In [10]:
neighborhoods.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Mitte,Mitte,52.536962,13.40649
1,Mitte,Moabit,52.529735,13.328836
2,Mitte,Hansaviertel,52.525565,13.333217
3,Mitte,Tiergarten,52.508781,13.358794
4,Mitte,Wedding,52.548789,13.336565
5,Mitte,Gesundbrunnen,52.573386,13.384491
6,Friedrichshain-Kreuzberg,Friedrichshain,52.535546,13.409753
7,Friedrichshain-Kreuzberg,Kreuzberg,52.49961,13.429264
8,Pankow,Prenzlauer Berg,52.536962,13.40649
9,Pankow,WeiÃŸensee,52.548495,13.45731


The dimensions of the "neighborhoods" dataframe now correspond to the contained number of boroughs and neighborhoods:

In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 12 boroughs and 96 neighborhoods.


#### Using the geopy library to get the coordinate values of Berlin.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>berlin_explorer</em>, as shown below.

In [12]:
address = 'Berlin, Germany'

geolocator = Nominatim(user_agent="berlin_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Berlin are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Berlin are 52.5170365, 13.3888599.


#### Creating a map of Berlin with neighborhoods superimposed on top.

In [13]:
# A map of Berlin is now created using the found latitude and longitude values
map_berlin = folium.Map(location=[latitude, longitude], zoom_start=10)

In [78]:
#Now, the neighborhoods are added to the map as markers
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_berlin)  
    
map_berlin

It has to be noted that the markers are in some cases not perfectly in the middle of the respective neighborhoods. This is because the coordinates were derived from given polygons which were supposed to represent the borders of those neighborhoods, not the centers. Nevertheless, for the current analysis, the precision should be high enough.

The next step , is to start utilizing the Foursquare API to explore the neighborhoods and segment them.

### Exploring the Berlin neighborhoods

#### First, the Foursquare Credentials and Version have to be defined.

In [79]:
CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: XXX
CLIENT_SECRET:XXX


Two more variables are necessary: The max. number of venues to be queried and the radius around the coordinate which shall be considered.

In [18]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

#### As the request has to be done for all the neighborhoods, a function is defined for the re-occuring task:

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now, the function can be called:

In [20]:
# type your answer here

berlin_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )



Mitte
Moabit
Hansaviertel
Tiergarten
Wedding
Gesundbrunnen
Friedrichshain
Kreuzberg
Prenzlauer Berg
WeiÃŸensee
Blankenburg
Heinersdorf
Karow
Stadtrandsiedlung Malchow
Pankow
Blankenfelde
Buch
FranzÃ¶sisch Buchholz
NiederschÃ¶nhausen
Rosenthal
Wilhelmsruh
Charlottenburg
Wilmersdorf
Schmargendorf
Grunewald
Westend
Charlottenburg-Nord
Halensee
Spandau
Haselhorst
Siemensstadt
Staaken
Gatow
Kladow
Hakenfelde
Falkenhagener Feld
Wilhelmstadt
Steglitz
Lichterfelde
Lankwitz
Zehlendorf
Dahlem
Nikolassee
Wannsee
SchÃ¶neberg
Friedenau
Tempelhof
Mariendorf
Marienfelde
Lichtenrade
NeukÃ¶lln
Britz
Buckow
Rudow
Gropiusstadt
Alt-Treptow
PlÃ¤nterwald
Baumschulenweg
Johannisthal
NiederschÃ¶neweide
Altglienicke
Adlershof
Bohnsdorf
OberschÃ¶neweide
KÃ¶penick
Friedrichshagen
Rahnsdorf
GrÃ¼nau
MÃ¼ggelheim
SchmÃ¶ckwitz
Marzahn
Biesdorf
Kaulsdorf
Mahlsdorf
Hellersdorf
Friedrichsfelde
Karlshorst
Lichtenberg
Falkenberg
Malchow
Wartenberg
Neu-HohenschÃ¶nhausen
Alt-HohenschÃ¶nhausen
Fennpfuhl
Rummelsburg
Reinicken

#### The resulting dataframe has 1.114 lines.

In [21]:
print(berlin_venues.shape)
berlin_venues.head()

(1114, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mitte,52.536962,13.40649,Dock 11,52.536546,13.408585,Indie Theater
1,Mitte,52.536962,13.40649,rou,52.535233,13.407213,Vegetarian / Vegan Restaurant
2,Mitte,52.536962,13.40649,Modern Graphics,52.536796,13.407885,Comic Shop
3,Mitte,52.536962,13.40649,Weinerei Perlin,52.535863,13.405094,Wine Bar
4,Mitte,52.536962,13.40649,Café Morgenrot,52.537593,13.408662,Vegetarian / Vegan Restaurant


#### Now it is possible to check how many venues were returned for each neighborhood...

In [22]:
berlin_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adlershof,8,8,8,8,8,8
Alt-HohenschÃ¶nhausen,9,9,9,9,9,9
Alt-Treptow,40,40,40,40,40,40
Altglienicke,3,3,3,3,3,3
Baumschulenweg,4,4,4,4,4,4
Biesdorf,6,6,6,6,6,6
Blankenfelde,1,1,1,1,1,1
Bohnsdorf,5,5,5,5,5,5
Borsigwalde,6,6,6,6,6,6
Britz,8,8,8,8,8,8


#### ... and how many unique categories can be curated from all the returned venues:

In [23]:
print('There are {} unique categories.'.format(len(berlin_venues['Venue Category'].unique())))

There are 208 unique categories.


<a id='item3'></a>

### Analysis of each Neighborhood

For further analysis it is necessary to count how many venues of the respective categories were found for each neighborhood. Therefore, the technique of "one-hot-encoding" is used, giving every occurence of a venue of a certain category a 1, while a non-occurence gets a zero.

In [24]:
# one hot encoding
berlin_onehot = pd.get_dummies(berlin_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
berlin_onehot['Neighborhood'] = berlin_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [berlin_onehot.columns[-1]] + list(berlin_onehot.columns[:-1])
berlin_onehot = berlin_onehot[fixed_columns]

berlin_onehot.head()

Unnamed: 0,Zoo Exhibit,ATM,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Trail,Bistro,Boarding House,Boat or Ferry,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Stop,Café,Camera Store,Campground,Canal Lock,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Creperie,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Doner Restaurant,Drugstore,Eastern European Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Kumpir Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lebanese Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Memorial Site,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Moroccan Restaurant,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Neighborhood,Nightclub,Organic Grocery,Outdoor Sculpture,Paintball Field,Park,Pastry Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Post Office,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Rock Club,Roof Deck,Sauna / Steam Room,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shawarma Place,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Swiss Restaurant,Tapas Restaurant,Taverna,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Trail,Train Station,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


The new dataframe size is the known line size times the number of categories:

In [25]:
berlin_onehot.shape

(1114, 208)

#### Next, the rows are grouped by neighborhood and by taking the mean of the frequency of occurrence of each category

In [26]:
berlin_grouped = berlin_onehot.groupby('Neighborhood').mean().reset_index()
berlin_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,ATM,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Trail,Bistro,Boarding House,Boat or Ferry,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Bus Stop,Café,Camera Store,Campground,Canal Lock,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Creperie,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Doner Restaurant,Drugstore,Eastern European Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Kumpir Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lebanese Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Memorial Site,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Moroccan Restaurant,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nightclub,Organic Grocery,Outdoor Sculpture,Paintball Field,Park,Pastry Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Post Office,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Rock Club,Roof Deck,Sauna / Steam Room,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shawarma Place,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Swiss Restaurant,Tapas Restaurant,Taverna,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Trail,Train Station,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Yoga Studio
0,Adlershof,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alt-HohenschÃ¶nhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alt-Treptow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.05,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.1,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.025,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0
3,Altglienicke,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Baumschulenweg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Biesdorf,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Blankenfelde,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bohnsdorf,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Borsigwalde,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Britz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### The new size now is:

In [27]:
berlin_grouped.shape

(88, 208)

#### Just for reference, the neighborhoods along with the top 5 most common venues are printed

In [28]:
num_top_venues = 5

for hood in berlin_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = berlin_grouped[berlin_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adlershof----
                venue  freq
0    Greek Restaurant  0.25
1  Light Rail Station  0.12
2  Italian Restaurant  0.12
3         Supermarket  0.12
4   Trattoria/Osteria  0.12


----Alt-HohenschÃ¶nhausen----
            venue  freq
0       Drugstore  0.11
1     Supermarket  0.11
2  Hardware Store  0.11
3     Coffee Shop  0.11
4  Discount Store  0.11


----Alt-Treptow----
               venue  freq
0               Café  0.10
1  German Restaurant  0.05
2               Park  0.05
3                Bar  0.05
4    Organic Grocery  0.02


----Altglienicke----
                 venue  freq
0        Auto Workshop  0.33
1        Train Station  0.33
2      Harbor / Marina  0.33
3      Paintball Field  0.00
4  Moroccan Restaurant  0.00


----Baumschulenweg----
             venue  freq
0  Organic Grocery  0.25
1    Garden Center  0.25
2           Bakery  0.25
3             Café  0.25
4      Zoo Exhibit  0.00


----Biesdorf----
                venue  freq
0            Bus Stop  0.33
1      

                   venue  freq
0                   Café  0.09
1       Asian Restaurant  0.09
2     Italian Restaurant  0.06
3              Drugstore  0.06
4  Vietnamese Restaurant  0.06


----MÃ¤rkisches Viertel----
               venue  freq
0  Indian Restaurant  0.17
1    Auto Dealership  0.17
2   Doner Restaurant  0.17
3  Food & Drink Shop  0.17
4       Liquor Store  0.17


----MÃ¼ggelheim----
                 venue  freq
0                 Lake   1.0
1          Zoo Exhibit   0.0
2    Outdoor Sculpture   0.0
3  Moroccan Restaurant   0.0
4      Motorcycle Shop   0.0


----Neu-HohenschÃ¶nhausen----
                 venue  freq
0               Garden   1.0
1          Zoo Exhibit   0.0
2   Miscellaneous Shop   0.0
3  Moroccan Restaurant   0.0
4      Motorcycle Shop   0.0


----NeukÃ¶lln----
          venue  freq
0           Bar  0.10
1  Cocktail Bar  0.10
2          Café  0.10
3         Plaza  0.07
4          Park  0.07


----NiederschÃ¶neweide----
              venue  freq
0            

#### Transformation into a *pandas* dataframe

First, a function to sort the venues in descending order is defined:

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Second, a new dataframe is created that displays the top 10 venues for each neighborhood.

In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = berlin_grouped['Neighborhood']

for ind in np.arange(berlin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(berlin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adlershof,Greek Restaurant,Italian Restaurant,Trattoria/Osteria,Supermarket,Bank,Light Rail Station,Drugstore,Electronics Store,Food & Drink Shop,Flower Shop
1,Alt-HohenschÃ¶nhausen,Tram Station,Coffee Shop,Pharmacy,Drugstore,Doner Restaurant,Discount Store,Hardware Store,Asian Restaurant,Supermarket,Fast Food Restaurant
2,Alt-Treptow,Café,German Restaurant,Park,Bar,Pizza Place,Beer Garden,Sporting Goods Shop,Speakeasy,Soup Place,Lounge
3,Altglienicke,Train Station,Harbor / Marina,Auto Workshop,Yoga Studio,Eastern European Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
4,Baumschulenweg,Garden Center,Bakery,Café,Organic Grocery,Yoga Studio,Electronics Store,Food & Drink Shop,Flower Shop,Flea Market,Fish Market


<a id='item4'></a>

### Clustering Neighborhoods

Now the time has come to cluster the neighborhoods. Multiple tests with the *k*-means algorithm have shown that the number of 5 clusters yields the best results.

In [90]:
# set number of clusters
kclusters = 5

berlin_grouped_clustering = berlin_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(berlin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 3, 0, 0, 0, 0, 0, 3, 0, 3])

In [91]:
#For testing purposes, the cluser labels column sometimes had to be dropped again.
neighborhoods_venues_sorted.drop('Cluster Labels', axis = 1, inplace = True)

A new dataframe that includes the cluster as well as the top 10 venues for each neighborhood is created:

In [92]:
# add clustering labels
#neighborhoods_venues_sorted.drop('Cluster Labels', inplace = True)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

berlin_merged = neighborhoods

berlin_merged = berlin_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
#berlin_merged['Cluster Labels'].fillna(4, inplace=True)
berlin_merged.dropna(inplace=True)
berlin_merged = berlin_merged.reset_index(drop=True)
berlin_merged.head(100)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mitte,Mitte,52.536962,13.40649,0.0,Pizza Place,Wine Bar,Bar,Vegetarian / Vegan Restaurant,Italian Restaurant,Vietnamese Restaurant,Coffee Shop,Sushi Restaurant,Breakfast Spot,German Restaurant
1,Mitte,Moabit,52.529735,13.328836,0.0,Café,Asian Restaurant,Italian Restaurant,Lebanese Restaurant,Pizza Place,Vietnamese Restaurant,Drugstore,Turkish Restaurant,Burger Joint,Persian Restaurant
2,Mitte,Hansaviertel,52.525565,13.333217,0.0,Vietnamese Restaurant,Italian Restaurant,Indian Restaurant,Pizza Place,Asian Restaurant,Café,Lebanese Restaurant,Bakery,Fried Chicken Joint,German Restaurant
3,Mitte,Tiergarten,52.508781,13.358794,0.0,Hotel,Hotel Bar,Lounge,Hostel,German Restaurant,Deli / Bodega,Park,Museum,Scandinavian Restaurant,Sculpture Garden
4,Mitte,Wedding,52.548789,13.336565,0.0,Park,Tennis Stadium,Ice Cream Shop,Pharmacy,Soccer Field,Outdoor Sculpture,Yoga Studio,Flea Market,Fish Market,Film Studio
5,Mitte,Gesundbrunnen,52.573386,13.384491,3.0,Soccer Field,Supermarket,Yoga Studio,Eastern European Restaurant,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Film Studio,Fast Food Restaurant
6,Friedrichshain-Kreuzberg,Friedrichshain,52.535546,13.409753,0.0,Vegetarian / Vegan Restaurant,Pizza Place,Bar,Breakfast Spot,Vietnamese Restaurant,Plaza,Café,Gym / Fitness Center,Italian Restaurant,Beer Bar
7,Friedrichshain-Kreuzberg,Kreuzberg,52.49961,13.429264,0.0,Bar,Café,German Restaurant,Cocktail Bar,Turkish Restaurant,Italian Restaurant,Korean Restaurant,Coffee Shop,Pizza Place,African Restaurant
8,Pankow,Prenzlauer Berg,52.536962,13.40649,0.0,Pizza Place,Wine Bar,Bar,Vegetarian / Vegan Restaurant,Italian Restaurant,Vietnamese Restaurant,Coffee Shop,Sushi Restaurant,Breakfast Spot,German Restaurant
9,Pankow,WeiÃŸensee,52.548495,13.45731,3.0,Supermarket,Ice Cream Shop,Bakery,Asian Restaurant,Drugstore,Vietnamese Restaurant,Bus Stop,Bistro,Mexican Restaurant,German Restaurant


## Results

Finally, the resulting clusters can be visualized:

In [93]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(berlin_merged['Latitude'], berlin_merged['Longitude'], berlin_merged['Neighborhood'], berlin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

### Examination of Clusters

Now it is time to take a closer look into the found clusters.



#### Cluster 1 - The "big" city center and some minor centers around it

In [73]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 0, berlin_merged.columns[[1] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mitte,Pizza Place,Wine Bar,Bar,Vegetarian / Vegan Restaurant,Italian Restaurant,Vietnamese Restaurant,Coffee Shop,Sushi Restaurant,Breakfast Spot,German Restaurant
1,Moabit,Café,Asian Restaurant,Italian Restaurant,Lebanese Restaurant,Pizza Place,Vietnamese Restaurant,Drugstore,Turkish Restaurant,Burger Joint,Persian Restaurant
2,Hansaviertel,Vietnamese Restaurant,Italian Restaurant,Indian Restaurant,Pizza Place,Asian Restaurant,Café,Lebanese Restaurant,Bakery,Fried Chicken Joint,German Restaurant
3,Tiergarten,Hotel,Hotel Bar,Lounge,Hostel,German Restaurant,Deli / Bodega,Park,Museum,Scandinavian Restaurant,Sculpture Garden
4,Wedding,Park,Tennis Stadium,Ice Cream Shop,Pharmacy,Soccer Field,Outdoor Sculpture,Yoga Studio,Flea Market,Fish Market,Film Studio
6,Friedrichshain,Vegetarian / Vegan Restaurant,Pizza Place,Bar,Breakfast Spot,Vietnamese Restaurant,Plaza,Café,Gym / Fitness Center,Italian Restaurant,Beer Bar
7,Kreuzberg,Bar,Café,German Restaurant,Cocktail Bar,Turkish Restaurant,Italian Restaurant,Korean Restaurant,Coffee Shop,Pizza Place,African Restaurant
8,Prenzlauer Berg,Pizza Place,Wine Bar,Bar,Vegetarian / Vegan Restaurant,Italian Restaurant,Vietnamese Restaurant,Coffee Shop,Sushi Restaurant,Breakfast Spot,German Restaurant
11,Pankow,Pool,Clothing Store,Bar,Hardware Store,Yoga Studio,Eastern European Restaurant,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
12,Blankenfelde,Stables,Fried Chicken Joint,French Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Film Studio,Fast Food Restaurant


#### Cluster 2 - "villages" on the outer city borders

In [74]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 1, berlin_merged.columns[[1] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Rosenthal,Garden Center,Bus Stop,Automotive Shop,Yoga Studio,Falafel Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
16,Wilhelmsruh,Garden Center,Bus Stop,Automotive Shop,Yoga Studio,Falafel Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
27,Staaken,Bus Stop,Chinese Restaurant,Drugstore,Yoga Studio,Electronics Store,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
28,Gatow,Bus Stop,Yoga Studio,Coffee Shop,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Film Studio,Fast Food Restaurant


#### Cluster 3 - an outlier?

In [75]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 2, berlin_merged.columns[[1] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
72,Malchow,Athletics & Sports,Yoga Studio,Eastern European Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Film Studio,Fast Food Restaurant


#### Cluster 4 - a place to live - not for party

In [76]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 3, berlin_merged.columns[[1] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Gesundbrunnen,Soccer Field,Supermarket,Yoga Studio,Eastern European Restaurant,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Film Studio,Fast Food Restaurant
9,WeiÃŸensee,Supermarket,Ice Cream Shop,Bakery,Asian Restaurant,Drugstore,Vietnamese Restaurant,Bus Stop,Bistro,Mexican Restaurant,German Restaurant
10,Karow,Supermarket,Bakery,Greek Restaurant,Yoga Studio,Fried Chicken Joint,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
13,FranzÃ¶sisch Buchholz,Supermarket,Tram Station,Drugstore,Flower Shop,Gas Station,Organic Grocery,Eastern European Restaurant,Food & Drink Shop,Flea Market,Fish Market
22,Charlottenburg-Nord,Supermarket,Italian Restaurant,Indian Restaurant,Thai Restaurant,Hotel,Ice Cream Shop,Metro Station,Park,Post Office,Chinese Restaurant
31,Falkenhagener Feld,Drugstore,Supermarket,Soccer Field,Snack Place,Lake,Eastern European Restaurant,Flower Shop,Flea Market,Fish Market,Film Studio
32,Wilhelmstadt,Harbor / Marina,Boat or Ferry,Supermarket,Bus Stop,Yoga Studio,Electronics Store,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
33,Steglitz,Gym,Sports Club,Supermarket,Bakery,Roof Deck,Yoga Studio,Electronics Store,Food & Drink Shop,Flower Shop,Flea Market
36,Zehlendorf,Supermarket,Bar,Italian Restaurant,Gym / Fitness Center,Organic Grocery,Bus Stop,Taverna,Yoga Studio,Electronics Store,Flower Shop
38,Nikolassee,Supermarket,Bar,Italian Restaurant,Gym / Fitness Center,Organic Grocery,Bus Stop,Taverna,Yoga Studio,Electronics Store,Flower Shop


#### Cluster 5 - socialistic mass-housing

In [97]:
berlin_merged.loc[berlin_merged['Cluster Labels'] == 4, berlin_merged.columns[[1] + list(range(5, berlin_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,Marzahn,Garden,Yoga Studio,Fried Chicken Joint,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Film Studio,Fast Food Restaurant
73,Neu-HohenschÃ¶nhausen,Garden,Yoga Studio,Fried Chicken Joint,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Film Studio,Fast Food Restaurant


## Discussion

The results of the k-means clustering are to some extent surprising. Although the venue types that were found do not seem all too different to me, the algorithm managed to find clusters that make sense - mostly. I decided to name the clusters as follows:

#### Cluster 1 - The "big" city center and some minor centers around it

This is, where the life is - the world-famous Berlin center with all the places and venues that attract tourists as well as people who live there. These neighborhoods are strongly concentrated in the city center, but, as Berlin is a city of many centers, also occur in other places around the inner city. However, it is worth to note, that these "red dots" are less frequent not only the further one gets away from the middle of the city, but also the further one gets to the east. In the north-east there are none of them at all, which could be taken as a hint that this is not the most attractive region of the city.

#### Cluster 2 - "villages" on the outer city borders

This little number of neighborhoods were originally villages which were, at some point, politically integrated into the city of Berlin. Nevertheless, they seem to have preserved their village-like character to some extent, at least enabling the venue-comparing algorithm to identify them as a group that's slightly different from the rest.

#### Cluster 3 - an outlier?

This one-neighborhood cluster has no obvious justification of existence to me. Nevertheless, reducing the number of clusters led to completely different results, so I decided to stick to k=5.

#### Cluster 4 - a place to live - not for party

This second really big cluster represents to me neighborhoods where usually high numbers of people live. These places are domniated by venues that are usually needed and used primarily by the inhabitants of the neighborhoods (like supermarkets, traffic infrastructure, local restaurants, etc.) and do not aim too much at tourists as customers. The neighborhoods of cluster 4 are never in the city center but can be found in the sub-urban areas in all cardinal directions. Additionally it can be said that the eastern part of the city has a higher number of cluster 4 neighborhoods than the west. 

#### Cluster 5 - socialistic mass-housing

The two Neighborhoods contained in this cluster represent areas where there are many examples of former socialistic mass-housing, meaning buildings that provided (and still provide) an affordable place to live for many people on limited space.
To me, it is not quite clear why exactly these two neighborhoods stand out from the rest so much that they "deserve" their own cluster, but for the fundamental questions of this analysis, this doesn't seem two important. Just as with cluster 3, I will accept it without further examination.

## Conclusion

The inital question, that is still waiting to be answered, was: Is Berlin still a devided city?
I think, that from what can be seen in this analysis, there is no absolute yes or no as an answer.

Yes, Berlin is devided in a sense, that the hip and world-famous parts in the center of the city stand in contrast to less touristically attractive neighborhoods in the suburban areas, which are home to a high percentage of the Berlin inhabitants.

Yes, Berlin is also devided in a way, that some of the eastern parts of the city still can not offer the same level of quality of life as many western parts can - at least from a venues point of view.

But:
No, the Berlin wall is no longer clearly visible on the map we created here. Berlins attractive center today consists of former eastern and western parts and there are attractive and rather boring neighborhoods on both sides of the former Berlin wall. Thirty years of (at least partial and politically enforced) convergence in a re-united city have clearly left their footprint, especially in the cityscape. The factors that are still dividing the city usually lie deeper and consist of less wealth and income in some of the eastern parts, a higher dependency on social transfers, less well-paid jobs, etc., just to mention a few. But those kinds of statistics were not in the scope of this analysis.

This notebook was created based on an example notebook which is part of a course on **Coursera** called *Applied Data Science Capstone*. The course can be found online by clicking [here](http://cocl.us/DP0701EN_Coursera_Week3_LAB2).

<hr>

Copyright &copy; 2018 [Cognitive Class](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).