# Welcome to week 5 of the Coursera Data Science Capstone Project!


## 1. Introduction

The aim of this notebook is to find a suitable neighborhood in Berlin for a restaurant serving sustainable meat choices.
The data come from Wikipedia, OpenStreetMap (via the Nominatim geocoder) and the Foursquare API.
The machine learning algorithm used is k-means clustering.

## 2. Web scraping to obtain a dataframe with all neighbourhoods of Berlin


First we import the relevant libraries, installing geopy and folium along the way.

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geopy  
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Let's scrape the neighborhoods from Wikipedia.


In [None]:
df=pd.read_html("https://de.wikipedia.org/wiki/Verwaltungsgliederung_Berlins")[2]
df

Unnamed: 0,Nr.,Ortsteil,Bezirk,Fläche(km²),Einwohner[2](30. Juni 2019),Einwohnerpro km²
0,101,Mitte,Mitte,1070,101.932,9526.0
1,102,Moabit,Mitte,772,79.512,10.299
2,103,Hansaviertel,Mitte,53,5.894,11.121
3,104,Tiergarten,Mitte,517,14.753,2854.0
4,105,Wedding,Mitte,923,86.688,9392.0
5,106,Gesundbrunnen,Mitte,613,95.393,15.562
6,201,Friedrichshain,Friedrichshain-Kreuzberg,978,134.9,13.793
7,202,Kreuzberg,Friedrichshain-Kreuzberg,1040,154.862,14.891
8,301,Prenzlauer Berg,Pankow,1100,164.593,14.963
9,302,Weißensee,Pankow,793,53.737,6776.0


Now this dataframe needs some cleaning before we can add location data.

In [None]:
cols = [0,3,4,5]
df.drop(df.columns[cols],axis=1, inplace=True)
df.head()

Unnamed: 0,Ortsteil,Bezirk
0,Mitte,Mitte
1,Moabit,Mitte
2,Hansaviertel,Mitte
3,Tiergarten,Mitte
4,Wedding,Mitte


Now we have a clean dataframe with each Ortsteil and its Bezirk. These correspond to neighborhood and borough, so let's rename the columns accordingly.


In [None]:
df.rename(columns={"Ortsteil":"Neighborhood", "Bezirk":"borough"}, inplace=True)
df.head()

Unnamed: 0,Neighborhood,borough
0,Mitte,Mitte
1,Moabit,Mitte
2,Hansaviertel,Mitte
3,Tiergarten,Mitte
4,Wedding,Mitte


In [None]:
df.shape

(96, 2)

In [None]:
Berlin_neighbourhoods=df
Berlin_neighbourhoods

Unnamed: 0,Neighborhood,borough
0,Mitte,Mitte
1,Moabit,Mitte
2,Hansaviertel,Mitte
3,Tiergarten,Mitte
4,Wedding,Mitte
5,Gesundbrunnen,Mitte
6,Friedrichshain,Friedrichshain-Kreuzberg
7,Kreuzberg,Friedrichshain-Kreuzberg
8,Prenzlauer Berg,Pankow
9,Weißensee,Pankow


In [None]:
Berlin_neighbourhoods.to_excel('Berlin_neighborhoods.xlsx')

That gives us 96 neighborhoods in Berlin.

For the second part of the job, let's get the location data from the Nominatim geocoding API.


In [None]:

# Add latitude and longitude to the DataFrame

# Geocoder used to look up the coordinates of each neighborhood
geolocator = Nominatim(user_agent="Berlin_Agent", timeout=15)

# Collect one latitude and one longitude value per neighborhood
lat = []
lng = []

for neighbourhood in Berlin_neighbourhoods.itertuples():
    # Query with the borough name as well, not the neighborhood name only
    query = '{},{}, Berlin'.format(neighbourhood.Neighborhood, neighbourhood.borough)

    try:
        Berlin_location = geolocator.geocode(query)
    except Exception as err:
        # This generally occurs due to a timeout on the geolocator side
        print('Geocoding failed for {}: {}. Try again.'.format(query, err))
        Berlin_location = None

    # geocode() returns None when nothing is found; store NaN in that case
    # instead of silently reusing the previous neighborhood's coordinates
    lat.append(Berlin_location.latitude if Berlin_location else np.nan)
    lng.append(Berlin_location.longitude if Berlin_location else np.nan)

# Add new columns with the extracted values
Berlin_neighbourhoods['Latitude'] = lat
Berlin_neighbourhoods['Longitude'] = lng

# Examine the data
Berlin_neighbourhoods.head()

Unnamed: 0,Neighborhood,borough,Latitude,Longitude
0,Mitte,Mitte,52.51769,13.402376
1,Moabit,Mitte,52.530102,13.342542
2,Hansaviertel,Mitte,52.519123,13.341872
3,Tiergarten,Mitte,52.509778,13.35726
4,Wedding,Mitte,52.550123,13.34197
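Nominatim is a shared public service, so individual lookups can time out. The following is a minimal sketch of a retry wrapper and is not part of the original notebook: `geocode_with_retry`, `retries` and `delay` are hypothetical names, and the function accepts any callable so that it could wrap `geolocator.geocode`.

```python
import time


def geocode_with_retry(geocode, query, retries=3, delay=1.0):
    """Call geocode(query), retrying on any exception (hypothetical helper).

    Returns the geocoder's result, or None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return geocode(query)
        except Exception as err:  # e.g. a timeout raised by the geocoder
            print('Attempt {} for {!r} failed: {}'.format(attempt, query, err))
            time.sleep(delay)  # wait before trying again
    return None
```

Under these assumptions, a call would look like `geocode_with_retry(geolocator.geocode, 'Mitte,Mitte, Berlin')`. A `None` result still has to be handled by the caller.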


Let's get the coordinates for Berlin and create a folium map.

In [None]:
address = 'Berlin, Germany'

geolocator = Nominatim(user_agent="Berlin_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Berlin are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Berlin are 52.5170365, 13.3888599.


In [None]:
# create map of Berlin using latitude and longitude values
map_berlin = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(Berlin_neighbourhoods['Latitude'], Berlin_neighbourhoods['Longitude'], Berlin_neighbourhoods['borough'], Berlin_neighbourhoods['Neighborhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_berlin)  
    
map_berlin

In [None]:
map_berlin.save('map_clusters.html')

## 3. Retrieving the venue data from the Foursquare API

To use the Foursquare API, we first define the credentials.

In [None]:
CLIENT_ID = 'UH5FRRUJB5GMBXGQMHBDXH4M4304RBY4PNCXFQ5CDS5VKFBR' # your Foursquare ID
CLIENT_SECRET = 'GGH3UZDSKMPSUQRBSS2HXCL0TBBFIZQCOCWRWVH5GB14FVO1' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: UH5FRRUJB5GMBXGQMHBDXH4M4304RBY4PNCXFQ5CDS5VKFBR
CLIENT_SECRET:GGH3UZDSKMPSUQRBSS2HXCL0TBBFIZQCOCWRWVH5GB14FVO1
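Hard-coding and printing the key and secret exposes them to anyone who can read the notebook. As a hedged alternative, the sketch below reads them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration, not something Foursquare or the course prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names used here are assumed for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing))


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

With the variables set in the shell, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the literal strings, and `print(mask(CLIENT_SECRET))` confirms that a value was loaded without revealing it.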


Let's select the first neighborhood in Berlin_neighbourhoods and check it out.

In [None]:
Berlin_neighbourhoods.loc[0, 'Neighborhood']

'Mitte'

In [None]:
neighbourhood_latitude = Berlin_neighbourhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = Berlin_neighbourhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = Berlin_neighbourhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Mitte are 52.5176896, 13.4023757.


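Let's request the venues within 500 m of the first neighborhood from Foursquare (up to 100 results); we will look at the first five of them further down.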

In [None]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighbourhood_latitude, neighbourhood_longitude, radius, LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=UH5FRRUJB5GMBXGQMHBDXH4M4304RBY4PNCXFQ5CDS5VKFBR&client_secret=GGH3UZDSKMPSUQRBSS2HXCL0TBBFIZQCOCWRWVH5GB14FVO1&v=20180605&ll=52.5176896,13.4023757&radius=500&limit=100'
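The request URL above is assembled with one long format string. A sketch of the same idea using the standard library's `urlencode` is shown here; it keeps each query parameter next to its name. `build_explore_url` is a hypothetical helper, while the endpoint and parameter names are the ones used in the notebook.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the venues/explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll parameter unescaped
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')
```

Since `requests` is already imported, passing the same dictionary as `requests.get(base_url, params=params)` would be another option.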

In [None]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f2f00d817a0975c8c56f538'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4adcda7cf964a5205f4721e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/garden_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d15a941735',
         'name': 'Garden',
         'pluralName': 'Gardens',
         'primary': True,
         'shortName': 'Garden'}],
       'id': '4adcda7cf964a5205f4721e3',
       'location': {'address': 'Am Lustgarten',
        'cc': 'DE',
        'city': 'Berlin',
        'country': 'Deutschland',
        'crossStreet': 'Schloßplatz',
        'distance': 216,
        'formattedAddress': ['Am Lustgarten (Schloßplatz)',
         '10178 Berlin',
         'Deutschland'],
        'labeledLatLngs': [{'label'

We don't need all of this information, so we keep only the name, category and coordinates of each venue.

In [None]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # the flattened dataframe uses the prefixed column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Lustgarten,Garden,52.518469,13.399454
1,Kuppelumgang Berliner Dom,Scenic Lookout,52.518966,13.400981
2,Radisson Blu,Hotel,52.519561,13.402857
3,"Bronzestatue ""Heiliger St. Georg im Kampf mit ...",Outdoor Sculpture,52.51629,13.405558
4,Designpanoptikum - surreales Museum für indust...,Museum,52.516941,13.406072
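To make the flattening step easier to follow, here is a small sketch of what `json_normalize` does to a nested record. The single item is a heavily shortened, hand-written stand-in for `results['response']['groups'][0]['items']`, and the sketch assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Hand-written stand-in for one entry of the Foursquare "items" list
items = [
    {'venue': {'name': 'Lustgarten',
               'categories': [{'name': 'Garden'}],
               'location': {'lat': 52.518469, 'lng': 13.399454}}},
]

# Nested dictionary keys are joined with '.', lists are kept as they are
flat = pd.json_normalize(items)
print(sorted(flat.columns))
```

This is why the notebook selects columns such as `venue.location.lat`, and why `venue.categories` still holds a list that `get_category_type` has to unpack.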


In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

49 venues were returned by Foursquare.


Foursquare returned 49 venues within 500 m of the center of Mitte. Let's create a dataframe with the venues of all neighborhoods in Berlin.

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
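The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue without any category. The sketch below shows one defensive way to build the same tuples. `extract_venue_rows` is a hypothetical helper, and the item layout is assumed to match the response shown earlier.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None instead of raising IndexError when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows
```

Rows with a `None` category can then be dropped or labeled explicitly before the one-hot encoding step.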

In [None]:
#All neighborhoods in Berlin
berlin_venues = getNearbyVenues(names=Berlin_neighbourhoods['Neighborhood'],
                                   latitudes=Berlin_neighbourhoods['Latitude'],
                                   longitudes=Berlin_neighbourhoods['Longitude']
                                  )

Mitte
Moabit
Hansaviertel
Tiergarten
Wedding
Gesundbrunnen
Friedrichshain
Kreuzberg
Prenzlauer Berg
Weißensee
Blankenburg
Heinersdorf
Karow
Stadtrandsiedlung Malchow
Pankow
Blankenfelde
Buch
Französisch Buchholz
Niederschönhausen
Rosenthal
Wilhelmsruh
Charlottenburg
Wilmersdorf
Schmargendorf
Grunewald
Westend
Charlottenburg-Nord
Halensee
Spandau
Haselhorst
Siemensstadt
Staaken
Gatow
Kladow
Hakenfelde
Falkenhagener Feld
Wilhelmstadt
Steglitz
Lichterfelde
Lankwitz
Zehlendorf
Dahlem
Nikolassee
Wannsee
Schöneberg
Friedenau
Tempelhof
Mariendorf
Marienfelde
Lichtenrade
Neukölln
Britz
Buckow
Rudow
Gropiusstadt
Alt-Treptow
Plänterwald
Baumschulenweg
Johannisthal
Niederschöneweide
Altglienicke
Adlershof
Bohnsdorf
Oberschöneweide
Köpenick
Friedrichshagen
Rahnsdorf
Grünau
Müggelheim
Schmöckwitz
Marzahn
Biesdorf
Kaulsdorf
Mahlsdorf
Hellersdorf
Friedrichsfelde
Karlshorst
Lichtenberg
Falkenberg
Malchow
Wartenberg
Neu-Hohenschönhausen
Alt-Hohenschönhausen
Fennpfuhl
Rummelsburg
Reinickendorf
Tegel
Kon

In [None]:
print(berlin_venues.shape)
berlin_venues.head()

(1474, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mitte,52.51769,13.402376,Lustgarten,52.518469,13.399454,Garden
1,Mitte,52.51769,13.402376,Kuppelumgang Berliner Dom,52.518966,13.400981,Scenic Lookout
2,Mitte,52.51769,13.402376,Radisson Blu,52.519561,13.402857,Hotel
3,Mitte,52.51769,13.402376,"Bronzestatue ""Heiliger St. Georg im Kampf mit ...",52.51629,13.405558,Outdoor Sculpture
4,Mitte,52.51769,13.402376,Designpanoptikum - surreales Museum für indust...,52.516941,13.406072,Museum


Let's use the groupby function to count how many venues were returned for each neighborhood.

In [None]:
berlin_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Adlershof,10,10,10,10,10,10
Alt-Hohenschönhausen,10,10,10,10,10,10
Alt-Treptow,24,24,24,24,24,24
Baumschulenweg,6,6,6,6,6,6
Biesdorf,8,8,8,8,8,8
Blankenburg,4,4,4,4,4,4
Blankenfelde,3,3,3,3,3,3
Bohnsdorf,3,3,3,3,3,3
Borsigwalde,5,5,5,5,5,5
Britz,6,6,6,6,6,6
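Because `count()` repeats the same number in every column, a single count per neighborhood may be easier to read. The sketch below uses `size()` on a made-up toy frame with the same column names as `berlin_venues`; the values are illustrative only.

```python
import pandas as pd

# Toy data with the same column names as berlin_venues (values are made up)
toy_venues = pd.DataFrame({
    'Neighborhood': ['Mitte', 'Mitte', 'Moabit'],
    'Venue': ['A', 'B', 'C'],
})

# One count per neighborhood, largest first
venue_counts = (toy_venues.groupby('Neighborhood')
                          .size()
                          .sort_values(ascending=False))
print(venue_counts)
```

Note that a neighborhood for which Foursquare returned no venues has no rows at all, so it does not appear in either kind of count.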


How many unique venue categories are there across all neighborhoods?

In [None]:
print('There are {} unique categories.'.format(len(berlin_venues['Venue Category'].unique())))

There are 238 unique categories.


Using one-hot encoding, we mark the category of every venue with a 1 in its own column, so that we can later see which venue categories are present in each neighborhood.

In [None]:
# one hot encoding
berlin_onehot = pd.get_dummies(berlin_venues['Venue Category'], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
berlin_onehot['Neighborhood'] = berlin_venues['Neighborhood']

# use the neighborhood column as the index
berlin_onehot.set_index('Neighborhood', inplace=True)
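One detail is worth checking here: 238 unique categories were counted, yet the one-hot table ends up with 237 columns. A plausible explanation, stated as an assumption rather than something verified against the data, is that one venue category is itself called "Neighborhood", so its dummy column is replaced when the neighborhood names are added back. The toy sketch below reproduces that effect with made-up data; the index name `Neighborhood Name` is a hypothetical choice.

```python
import pandas as pd

# Made-up venues; one of them has the category "Neighborhood" (an assumption)
toy_venues = pd.DataFrame({
    'Neighborhood': ['Mitte', 'Mitte', 'Moabit'],
    'Venue Category': ['Garden', 'Neighborhood', 'Hotel'],
})

dummies = pd.get_dummies(toy_venues['Venue Category'], prefix="", prefix_sep="")
n_categories = dummies.shape[1]  # one dummy column per category

# Same steps as in the cell above: the assignment replaces the dummy column
# called "Neighborhood", and set_index then removes it from the columns
overwritten = dummies.copy()
overwritten['Neighborhood'] = toy_venues['Neighborhood']
overwritten = overwritten.set_index('Neighborhood')
n_after = overwritten.shape[1]

# One possible workaround: set the index directly, so no column is replaced
preserved = dummies.copy()
preserved.index = pd.Index(toy_venues['Neighborhood'], name='Neighborhood Name')
print(n_categories, n_after, preserved.shape[1])
```

If the assumption holds for the Berlin data, only that one category is lost, which is unlikely to change the clustering much but explains the mismatch in the counts.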

In [None]:
berlin_onehot.head()

Neighborhood,Accessories Store,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bavarian Restaurant,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bistro,Board Shop,Boarding House,Boat or Ferry,Bookstore,Bowling Alley,Brasserie,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Stop,Business Service,Butcher,Cafeteria,Café,Canal,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Roaster,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Credit Union,Creperie,Cupcake Shop,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Fishing Store,Flower Shop,Food & Drink Shop,Food Court,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,German Restaurant,Gift Shop,Go Kart Track,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kurdish Restaurant,Lake,Lebanese Restaurant,Light Rail Station,Lighting Store,Liquor Store,Lounge,Martial Arts School,Mediterranean Restaurant,Memorial Site,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Opera House,Optical Shop,Organic Grocery,Outdoor Event Space,Outdoor Sculpture,Outdoor Supply Store,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Perfume Shop,Pet Café,Pet Store,Pharmacy,Pide Place,Pizza Place,Planetarium,Platform,Playground,Plaza,Polish Restaurant,Pool,Post Office,Pub,Record Shop,Residential Building (Apartment / Condo),Restaurant,River,Rock Climbing Spot,Sandwich Place,Sauna / Steam Room,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Stables,Stationery Store,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Syrian Restaurant,Tanning Salon,Tapas Restaurant,Taverna,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Trail,Tram Station,Trattoria/Osteria,Tree,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Water Park,Waterfront,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Let's check the size of the one-hot table with the shape attribute: it has one row per venue and one column per venue category.

In [None]:
berlin_onehot.shape

(1474, 237)

Next, let's group the rows by neighborhood and take the mean of each column, which gives the relative frequency of each category in that neighborhood.

In [None]:
berlin_grouped = berlin_onehot.groupby('Neighborhood').mean().reset_index()
berlin_grouped

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bavarian Restaurant,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bistro,Board Shop,Boarding House,Boat or Ferry,Bookstore,Bowling Alley,Brasserie,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Stop,Business Service,Butcher,Cafeteria,Café,Canal,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Roaster,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Credit Union,Creperie,Cupcake Shop,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Fishing Store,Flower Shop,Food & Drink Shop,Food Court,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,German Restaurant,Gift Shop,Go Kart Track,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kurdish Restaurant,Lake,Lebanese Restaurant,Light Rail Station,Lighting Store,Liquor Store,Lounge,Martial Arts School,Mediterranean Restaurant,Memorial Site,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Opera House,Optical Shop,Organic Grocery,Outdoor Event Space,Outdoor Sculpture,Outdoor Supply Store,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Perfume Shop,Pet Café,Pet Store,Pharmacy,Pide Place,Pizza Place,Planetarium,Platform,Playground,Plaza,Polish Restaurant,Pool,Post Office,Pub,Record Shop,Residential Building (Apartment / Condo),Restaurant,River,Rock Climbing Spot,Sandwich Place,Sauna / Steam Room,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Stables,Stationery Store,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Syrian Restaurant,Tanning Salon,Tapas Restaurant,Taverna,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Trail,Tram Station,Trattoria/Osteria,Tree,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Water Park,Waterfront,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,Adlershof,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alt-Hohenschönhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alt-Treptow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Baumschulenweg,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Biesdorf,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Blankenburg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Blankenfelde,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bohnsdorf,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Borsigwalde,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Britz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Confirm new size.

In [None]:
berlin_grouped.shape

(93, 238)

Let's print each neighborhood together with its 5 most common venues.

In [None]:
num_top_venues = 5

for hood in berlin_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = berlin_grouped[berlin_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adlershof----
              venue  freq
0  Insurance Office   0.1
1              Bank   0.1
2  Greek Restaurant   0.1
3         Drugstore   0.1
4       Pizza Place   0.1


----Alt-Hohenschönhausen----
              venue  freq
0      Tram Station   0.2
1    Discount Store   0.2
2  Greek Restaurant   0.1
3       Post Office   0.1
4  Asian Restaurant   0.1


----Alt-Treptow----
           venue  freq
0           Café  0.08
1         Bakery  0.08
2       Platform  0.08
3  Garden Center  0.04
4       Bus Stop  0.04


----Baumschulenweg----
                venue  freq
0         Supermarket  0.33
1  Italian Restaurant  0.17
2            Bus Stop  0.17
3    Asian Restaurant  0.17
4           Drugstore  0.17


----Biesdorf----
           venue  freq
0           Park  0.12
1  Big Box Store  0.12
2   Liquor Store  0.12
3          Plaza  0.12
4         Bakery  0.12


----Blankenburg----
               venue  freq
0         Playground  0.25
1           Bus Stop  0.25
2   Greek Restaurant  0.25

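Let's collect this information in a new dataframe. First, we define a helper function that returns the most common venues of a row, sorted by descending frequency.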

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
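
To see what the helper returns, here is a minimal sketch on a toy row. The neighborhood name `Exampleville` and the frequencies are made up for illustration and are not taken from the Berlin data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), then rank the rest
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# toy row with made-up frequencies (hypothetical data)
toy_row = pd.Series({
    'Neighborhood': 'Exampleville',
    'Bakery': 0.25,
    'Café': 0.50,
    'Park': 0.10,
    'Zoo Exhibit': 0.15,
})

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

Note that tied frequencies (for example the many 0.0 entries of a real row) come back in whatever order the sort leaves them, so categories that do not occur in a neighborhood at all can still fill the lower ranks.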

Now let's build a dataframe with the top 10 venues for each neighborhood.

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = berlin_grouped['Neighborhood']

for ind in np.arange(berlin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(berlin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adlershof,Insurance Office,Trattoria/Osteria,Pizza Place,Drugstore,Steakhouse,Tanning Salon,Greek Restaurant,Bank,Supermarket,Italian Restaurant
1,Alt-Hohenschönhausen,Discount Store,Tram Station,Post Office,Asian Restaurant,Greek Restaurant,Drugstore,Big Box Store,Indian Restaurant,Austrian Restaurant,Falafel Restaurant
2,Alt-Treptow,Café,Platform,Bakery,Mexican Restaurant,Nightclub,Tapas Restaurant,Garden Center,Sandwich Place,Electronics Store,Outdoor Sculpture
3,Baumschulenweg,Supermarket,Italian Restaurant,Drugstore,Asian Restaurant,Bus Stop,Flower Shop,Farmers Market,Fast Food Restaurant,Fish Market,Fishing Store
4,Biesdorf,Big Box Store,Liquor Store,Plaza,Bakery,Palace,Outdoor Event Space,Park,Light Rail Station,Farmers Market,Farm
5,Blankenburg,Playground,Bus Stop,Café,Greek Restaurant,Zoo Exhibit,Falafel Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop
6,Blankenfelde,Stables,Miscellaneous Shop,Café,Zoo Exhibit,Fried Chicken Joint,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store
7,Bohnsdorf,Park,Italian Restaurant,Flower Shop,Ethiopian Restaurant,French Restaurant,Fountain,Food Court,Food & Drink Shop,Fishing Store,Fish Market
8,Borsigwalde,Motorcycle Shop,Bakery,Italian Restaurant,Go Kart Track,Soccer Field,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Fishing Store
9,Britz,History Museum,Bakery,Soccer Field,German Restaurant,Historic Site,Palace,Zoo Exhibit,Falafel Restaurant,Farm,Farmers Market
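
The column names above are built with a `try`/`except` around a list lookup. As a hedged alternative, the same names could be produced with a dictionary lookup; `ordinal_columns` is a hypothetical helper name, not part of the original notebook.

```python
def ordinal_columns(num_top_venues):
    """Build '1st Most Common Venue', '2nd Most Common Venue', ... column names."""
    # like the original cell, this is only correct for small ranks
    # (21 would become '21th'), which is enough for a top-10 table
    suffixes = {1: 'st', 2: 'nd', 3: 'rd'}
    columns = ['Neighborhood']
    for rank in range(1, num_top_venues + 1):
        suffix = suffixes.get(rank, 'th')
        columns.append('{}{} Most Common Venue'.format(rank, suffix))
    return columns

columns = ordinal_columns(10)
print(columns[:5])
```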


## 4. K-means clustering

We run k-means with 5 clusters.

In [None]:
# set number of clusters
kclusters = 5

# k-means needs numeric features only, so drop the name column
berlin_grouped_clustering = berlin_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(berlin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 4, 0, 4, 4, 4, 4, 4, 4], dtype=int32)
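
The labels are 0-based, so label 0 corresponds to "Cluster 1" in the analysis and label 4 to "Cluster 5". A quick way to see how unevenly the neighborhoods are spread over the clusters is to count the labels. The sketch below only uses the ten labels printed above, not the full array.

```python
from collections import Counter

# first ten labels as printed above; the full array has one entry per neighborhood
first_labels = [4, 4, 4, 0, 4, 4, 4, 4, 4, 4]

cluster_sizes = Counter(first_labels)
for label, size in sorted(cluster_sizes.items()):
    print('cluster label {}: {} neighborhoods'.format(label, size))
```

On the full result, the same count could be obtained with `Counter(kmeans.labels_.tolist())` or `np.bincount(kmeans.labels_)`.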

After applying the algorithm, we create a new dataframe with the cluster of each neighborhood and then analyse the different clusters.

In [None]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels 36', kmeans.labels_)



# merge the Berlin_neighbourhoods with neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
Berlin_neighbourhoods = Berlin_neighbourhoods.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Berlin_neighbourhoods.head()

Unnamed: 0,Neighborhood,borough,Latitude,Longitude,Cluster Labels 36,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mitte,Mitte,52.51769,13.402376,4,German Restaurant,Museum,History Museum,Art Gallery,Café,Hotel,Fountain,Art Museum,Concert Hall,Brewery
1,Moabit,Mitte,52.530102,13.342542,4,Café,Hostel,Doner Restaurant,Hotel,German Restaurant,Bar,Gym / Fitness Center,Burger Joint,Drugstore,Bakery
2,Hansaviertel,Mitte,52.519123,13.341872,4,Café,Art Museum,Bakery,Plaza,Bus Stop,Metro Station,Farmers Market,Sporting Goods Shop,Mediterranean Restaurant,Boat or Ferry
3,Tiergarten,Mitte,52.509778,13.35726,4,Lounge,Hotel Bar,Hotel,Memorial Site,Sculpture Garden,German Restaurant,Garden,Historic Site,Scandinavian Restaurant,Café
4,Wedding,Mitte,52.550123,13.34197,4,Bus Stop,Park,Café,Tram Station,Bakery,Tennis Court,Big Box Store,Gas Station,Organic Grocery,Supermarket
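
`merge` performs an inner join by default, so any neighborhood that is missing from `neighborhoods_venues_sorted` (for example because no venues were returned for it) is dropped without a warning. The sketch below illustrates this on toy tables with made-up values and shows one possible way to list the unmatched rows with `how='left'` and `indicator=True`.

```python
import pandas as pd

# toy stand-ins for the two tables (hypothetical values)
coords = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [52.50, 52.51, 52.52],
    'Longitude': [13.40, 13.41, 13.42],
})
venues = pd.DataFrame({
    'Neighborhood': ['A', 'B'],   # 'C' has no venue data
    'Cluster Labels': [4, 0],
})

# default how='inner': only neighborhoods present in both tables survive
inner = coords.merge(venues, on='Neighborhood')

# a left join keeps every neighborhood and flags the unmatched ones
left = coords.merge(venues, on='Neighborhood', how='left', indicator=True)
missing = left.loc[left['_merge'] == 'left_only', 'Neighborhood'].tolist()

print(len(inner))
print(missing)
```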


Let's create a map of the neighborhoods' clusters.


In [None]:
Berlin_neighbourhoods['Cluster Labels 36'].isna().sum()


# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Berlin_neighbourhoods['Latitude'], Berlin_neighbourhoods['Longitude'], Berlin_neighbourhoods['Neighborhood'], Berlin_neighbourhoods['Cluster Labels 36']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill = True,
        fill_color = rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
map_clusters.save('map_clusters.html')
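
In the color scheme above, `x` and `ys` only contribute their length (`kclusters`) to the result. A minimal sketch of an equivalent, shorter way to obtain one hex color per cluster label:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5  # same value as in the clustering cell

# sample the rainbow colormap at kclusters evenly spaced points
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(rgba) for rgba in colors_array]

# print which color each 0-based cluster label receives
for label, hex_color in enumerate(rainbow):
    print(label, hex_color)
```

Printing the list this way makes it easy to look up which marker color belongs to which cluster label on the map.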


## Results

As the map shows, most of the markers are green, i.e. most neighborhoods fall into a single cluster.

## Examine clusters

### Cluster 1

Neighborhoods on the outskirts of Berlin have a very basic, practical set of venues, with supermarkets very often being the most common one. This cluster is therefore not recommended for our restaurant.

In [None]:
Berlin_neighbourhoods.loc[Berlin_neighbourhoods['Cluster Labels 36'] == 0, Berlin_neighbourhoods.columns[[0] + list(range(5, Berlin_neighbourhoods.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Heinersdorf,Tram Station,Fish Market,Supermarket,Chinese Restaurant,Music Store,Discount Store,Farm,Farmers Market,Fast Food Restaurant,Zoo Exhibit
17,Französisch Buchholz,Supermarket,Chinese Restaurant,Drugstore,Theme Park,Zoo Exhibit,Fast Food Restaurant,Farm,Farmers Market,Fish Market,Event Space
20,Wilhelmsruh,Bus Stop,Bakery,Supermarket,Asian Restaurant,Mexican Restaurant,Clothing Store,Lake,Pharmacy,Discount Store,Farmers Market
26,Charlottenburg-Nord,Supermarket,Metro Station,Post Office,Rock Climbing Spot,Soccer Field,Plaza,Hobby Shop,Flower Shop,Fish Market,Fast Food Restaurant
29,Haselhorst,Supermarket,Bus Stop,Park,Automotive Shop,Metro Station,Food Court,Food & Drink Shop,Fountain,Falafel Restaurant,Flower Shop
31,Staaken,Bus Stop,Restaurant,Eastern European Restaurant,Zoo Exhibit,Event Space,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop
32,Gatow,Bus Stop,Supermarket,Trattoria/Osteria,Italian Restaurant,Hotel,Harbor / Marina,Fish Market,Farm,Farmers Market,Fast Food Restaurant
34,Falkenhagener Feld,Snack Place,Drugstore,Liquor Store,Supermarket,Zoo Exhibit,Falafel Restaurant,French Restaurant,Fountain,Food Court,Food & Drink Shop
35,Wilhelmstadt,Bus Stop,Harbor / Marina,Supermarket,Park,Bakery,Boat or Ferry,Sporting Goods Shop,Lake,Farmers Market,Fast Food Restaurant
41,Nikolassee,Supermarket,Plaza,Lake,Park,Trail,Farmers Market,Event Space,Falafel Restaurant,Farm,Fish Market
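
The same selection expression is repeated for every cluster: it filters the rows by label and keeps column 0 (the neighborhood name) plus everything from column 5 onwards (the ranked venues). As a sketch, this could be wrapped in a small helper; `venues_for_cluster` and the toy frame are hypothetical and only mimic the column layout of `Berlin_neighbourhoods`.

```python
import pandas as pd

def venues_for_cluster(df, label, label_col='Cluster Labels 36', first_venue_pos=5):
    """Return the neighborhood name plus the ranked venue columns for one cluster."""
    keep = [0] + list(range(first_venue_pos, df.shape[1]))
    return df.loc[df[label_col] == label, df.columns[keep]]

# toy frame with the same column order as Berlin_neighbourhoods (made-up rows)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'borough': ['X', 'X', 'Y'],
    'Latitude': [52.50, 52.51, 52.52],
    'Longitude': [13.40, 13.41, 13.42],
    'Cluster Labels 36': [0, 4, 0],
    '1st Most Common Venue': ['Supermarket', 'Café', 'Bus Stop'],
})

cluster_1 = venues_for_cluster(toy, 0)
print(cluster_1)
```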


### Cluster 2
This cluster is very similar to the first one, but offers a slightly wider choice of restaurants.

In [None]:
Berlin_neighbourhoods.loc[Berlin_neighbourhoods['Cluster Labels 36'] == 1, Berlin_neighbourhoods.columns[[0] + list(range(5, Berlin_neighbourhoods.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Karow,Supermarket,Restaurant,Bus Stop,Zoo Exhibit,Ethiopian Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store
19,Rosenthal,Tram Station,Supermarket,German Restaurant,Automotive Shop,Currywurst Joint,Event Space,French Restaurant,Fountain,Food Court,Food & Drink Shop
51,Buckow,Supermarket,Women's Store,Pizza Place,Zoo Exhibit,Fish Market,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop
74,Lichtenberg,Tram Station,Automotive Shop,Gym / Fitness Center,Bowling Alley,Park,Supermarket,Diner,Discount Store,French Restaurant,Fountain
78,Neu-Hohenschönhausen,Supermarket,Movie Theater,Tram Station,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store


### Cluster 3
Miscellaneous.

In [None]:
Berlin_neighbourhoods.loc[Berlin_neighbourhoods['Cluster Labels 36'] == 2, Berlin_neighbourhoods.columns[[0] + list(range(5, Berlin_neighbourhoods.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Stadtrandsiedlung Malchow,Playground,Restaurant,Zoo Exhibit,Electronics Store,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store,Fish Market


### Cluster 4
Miscellaneous.

In [None]:
Berlin_neighbourhoods.loc[Berlin_neighbourhoods['Cluster Labels 36'] == 3, Berlin_neighbourhoods.columns[[0] + list(range(5, Berlin_neighbourhoods.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
85,Heiligensee,Insurance Office,Ethiopian Restaurant,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store,Fish Market,Fast Food Restaurant


### Cluster 5
This cluster suits people who like a variety of places to get coffee and food, and it also has entertainment venues. It seems like an interesting choice.

In [None]:
Cluster_5=Berlin_neighbourhoods.loc[Berlin_neighbourhoods['Cluster Labels 36'] == 4, Berlin_neighbourhoods.columns[[0] + list(range(5, Berlin_neighbourhoods.shape[1]))]]
Cluster_5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mitte,German Restaurant,Museum,History Museum,Art Gallery,Café,Hotel,Fountain,Art Museum,Concert Hall,Brewery
1,Moabit,Café,Hostel,Doner Restaurant,Hotel,German Restaurant,Bar,Gym / Fitness Center,Burger Joint,Drugstore,Bakery
2,Hansaviertel,Café,Art Museum,Bakery,Plaza,Bus Stop,Metro Station,Farmers Market,Sporting Goods Shop,Mediterranean Restaurant,Boat or Ferry
3,Tiergarten,Lounge,Hotel Bar,Hotel,Memorial Site,Sculpture Garden,German Restaurant,Garden,Historic Site,Scandinavian Restaurant,Café
4,Wedding,Bus Stop,Park,Café,Tram Station,Bakery,Tennis Court,Big Box Store,Gas Station,Organic Grocery,Supermarket
5,Gesundbrunnen,Drugstore,Turkish Restaurant,Bar,Supermarket,Trail,Platform,Italian Restaurant,Bookstore,Clothing Store,Hotel
6,Friedrichshain,Coffee Shop,Café,Pub,Bar,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Bookstore,Bagel Shop,Ice Cream Shop,Pizza Place
7,Kreuzberg,Café,Turkish Restaurant,Coffee Shop,Bakery,Bar,Italian Restaurant,German Restaurant,Nightclub,Waterfront,Cocktail Bar
8,Prenzlauer Berg,Café,Bakery,Cocktail Bar,Beer Bar,Vietnamese Restaurant,Falafel Restaurant,Park,Organic Grocery,Coffee Shop,Donut Shop
9,Weißensee,Tram Station,Hotel,Park,Beach,Vietnamese Restaurant,German Restaurant,Flower Shop,Fishing Store,Fish Market,Ethiopian Restaurant


Cluster 5 is big, but it could offer a good choice for any kind of new restaurant in Berlin. Certain neighborhoods are obviously very touristic, so let's drop the rows where we suspect this, i.e. those with German restaurants, hotels or hostels among their most common venues. We also don't want neighborhoods where people consume a lot of fast food, so these rows will be dropped as well.

In [None]:
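# comparing the whole dataframe against a value masks every matching cell with NaN;
# dropna then removes each row that contains at least one masked cell
# note: the comparison is an exact match, so 'Fast Food' does not match 'Fast Food Restaurant'
Cluster_5=Cluster_5[Cluster_5 != 'Fast Food']
Cluster_5=Cluster_5[Cluster_5 != 'German Restaurant']
Cluster_5=Cluster_5[Cluster_5 != 'Hotel']
Cluster_5=Cluster_5[Cluster_5 != 'Hostel']
Cluster_5.dropna(axis=0, inplace=True)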
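
The filtering step can also be written as a single function. The sketch below is one possible formulation, not the notebook's method: `drop_rows_with_venues` is a hypothetical helper, the toy data is made up, and the optional `substring` flag is an assumption added to show how a term such as 'Fast Food' could also catch 'Fast Food Restaurant'.

```python
import pandas as pd

def drop_rows_with_venues(df, unwanted, substring=False):
    """Drop rows in which any venue column matches one of the unwanted categories."""
    venue_cols = df.columns.drop('Neighborhood')
    if substring:
        # True where a cell contains any unwanted term
        hits = df[venue_cols].apply(
            lambda col: col.astype(str).apply(
                lambda cell: any(term in cell for term in unwanted)
            )
        )
    else:
        # exact matches only, like the != comparison in the cell above
        hits = df[venue_cols].isin(unwanted)
    return df[~hits.any(axis=1)]

# toy table (hypothetical rows)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    '1st Most Common Venue': ['Café', 'Hotel', 'Bakery'],
    '2nd Most Common Venue': ['Bar', 'Park', 'Fast Food Restaurant'],
})

exact = drop_rows_with_venues(toy, ['Fast Food', 'Hotel'])
loose = drop_rows_with_venues(toy, ['Fast Food', 'Hotel'], substring=True)
print(exact['Neighborhood'].tolist())
print(loose['Neighborhood'].tolist())
```

Substring matching is broader than it may look: the term 'Hotel' would also remove rows containing 'Hotel Bar', so the list of terms should be chosen with care.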

In [None]:
Cluster_5.shape
Cluster_5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Hansaviertel,Café,Art Museum,Bakery,Plaza,Bus Stop,Metro Station,Farmers Market,Sporting Goods Shop,Mediterranean Restaurant,Boat or Ferry
4,Wedding,Bus Stop,Park,Café,Tram Station,Bakery,Tennis Court,Big Box Store,Gas Station,Organic Grocery,Supermarket
6,Friedrichshain,Coffee Shop,Café,Pub,Bar,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Bookstore,Bagel Shop,Ice Cream Shop,Pizza Place
8,Prenzlauer Berg,Café,Bakery,Cocktail Bar,Beer Bar,Vietnamese Restaurant,Falafel Restaurant,Park,Organic Grocery,Coffee Shop,Donut Shop
10,Blankenburg,Playground,Bus Stop,Café,Greek Restaurant,Zoo Exhibit,Falafel Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop
15,Blankenfelde,Stables,Miscellaneous Shop,Café,Zoo Exhibit,Fried Chicken Joint,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store
16,Buch,Italian Restaurant,Art Gallery,Bakery,Asian Restaurant,Big Box Store,Drugstore,Zoo Exhibit,Flower Shop,Fast Food Restaurant,Fish Market
18,Niederschönhausen,Tram Station,Bakery,Italian Restaurant,Shipping Store,Trattoria/Osteria,Park,Dessert Shop,Thai Restaurant,Hobby Shop,Supermarket
23,Schmargendorf,Italian Restaurant,Ice Cream Shop,Drugstore,Gym,Chinese Restaurant,Restaurant,Café,Trattoria/Osteria,Deli / Bodega,Coffee Shop
25,Westend,Café,Bar,Art Museum,Gourmet Shop,Drugstore,Liquor Store,Italian Restaurant,Plaza,Bus Stop,Ice Cream Shop


We still have 36 neighborhoods to choose from. Let's also drop the rows containing shopping malls, motorcycle shops, electronics stores, insurance offices, supermarkets, discount stores and fried chicken joints to narrow the choice.


In [None]:
Cluster_5=Cluster_5[Cluster_5 != 'Shopping Mall']
Cluster_5=Cluster_5[Cluster_5 != 'Motorcycle Shop']
Cluster_5=Cluster_5[Cluster_5 != 'Electronics Store']
Cluster_5=Cluster_5[Cluster_5 != 'Insurance Office']
Cluster_5=Cluster_5[Cluster_5 != 'Supermarket']
Cluster_5=Cluster_5[Cluster_5 != 'Discount Store']
Cluster_5=Cluster_5[Cluster_5 != 'Fried Chicken Joint']

Cluster_5.dropna(axis=0, inplace=True)

In [None]:
Cluster_5.shape

(16, 11)

In [None]:
Cluster_5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Hansaviertel,Café,Art Museum,Bakery,Plaza,Bus Stop,Metro Station,Farmers Market,Sporting Goods Shop,Mediterranean Restaurant,Boat or Ferry
6,Friedrichshain,Coffee Shop,Café,Pub,Bar,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Bookstore,Bagel Shop,Ice Cream Shop,Pizza Place
8,Prenzlauer Berg,Café,Bakery,Cocktail Bar,Beer Bar,Vietnamese Restaurant,Falafel Restaurant,Park,Organic Grocery,Coffee Shop,Donut Shop
10,Blankenburg,Playground,Bus Stop,Café,Greek Restaurant,Zoo Exhibit,Falafel Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop
16,Buch,Italian Restaurant,Art Gallery,Bakery,Asian Restaurant,Big Box Store,Drugstore,Zoo Exhibit,Flower Shop,Fast Food Restaurant,Fish Market
23,Schmargendorf,Italian Restaurant,Ice Cream Shop,Drugstore,Gym,Chinese Restaurant,Restaurant,Café,Trattoria/Osteria,Deli / Bodega,Coffee Shop
25,Westend,Café,Bar,Art Museum,Gourmet Shop,Drugstore,Liquor Store,Italian Restaurant,Plaza,Bus Stop,Ice Cream Shop
37,Lichterfelde,Bakery,Italian Restaurant,Vietnamese Restaurant,Café,Liquor Store,Chinese Restaurant,Sculpture Garden,Eastern European Restaurant,Pool,Dive Bar
49,Neukölln,Bar,Café,Coffee Shop,Middle Eastern Restaurant,Dive Bar,Cocktail Bar,Bistro,Italian Restaurant,Vegetarian / Vegan Restaurant,Nightclub
57,Johannisthal,Tram Station,Pub,Park,Taverna,Dessert Shop,Burger Joint,Sushi Restaurant,Movie Theater,Pizza Place,Food & Drink Shop


This leaves us with 16 suitable neighborhoods in which to open a modern sustainable restaurant.
They all provide an infrastructure with different kinds of places to eat, and certain venues suggest that they attract younger people. We hope this gives the business some suitable options. This work is mainly intended for outsiders who have no knowledge of the city.