# Capstone Project: The Battle of Neighborhoods
## Leveraging data science to analyze the potential of starting business activities in the most promising neighborhoods of Munich, Germany

### Table of Contents
1. Introduction: Business Problem
2. Data
3. Methodology
4. Analysis
5. Results and Discussion
6. Conclusion

#### 1.  Introduction: Business Problem
In this report, we are trying to find a borough in Munich, where we could facilitate a new business, namely a greek restaurant. 
According to published data, Munich is host to the largest greek community in Germany. 
Therefore, we strongly believe that opening a greek restaurant in a suitable neighborhood in Munich will be a very profitable action, and it will comfort a lot of immigrants from Greece, who are homesick for greek traditional food.
For this reason we will analyze and cluster Munich's neighborhoods in terms of venues# categories and price per square meter.

#### 2. Data

In order to find the most promising borough for opening a greek restaurant in Munich we will utilize following data:

a. Information about the venues in all boroughs of Munich, which is gathered by web scraping 'https://www.muenchen.de/leben/service/postleitzahlen.html'. The Geocoder Python package (https://geocoder.readthedocs.io/index.html) will be used to provide us with the latitude and logitude coordinate for all neighborhoods in Munich.
Longitude and latitude are then used with the Foursquare API to provide information about the nearby venues.

b. Average price per m² of the apartments in Munich: We will use web scraping to read and save the information from 'https://de.statista.com/statistik/daten/studie/260438/umfrage/mietpreise-in-muenchen-nach-bezirken/' into a dataframe.  itude and longitude are used as input for FourSquare to source information about the boroughs.

#### 3. Methodology

The steps will be as follow:
Firstly, we will read the postal codes of each neighborhood into a dataframe and transform the structure of the dataframe, as several postal codes correspond to the same borough.
Then we will utilize folium map and Munich's geographical coordinates in order to visualize the districts of Munich. 
Following, we analyze with Foursquare API the venues and venues' categories and we visualize their distribution on the map with color code, in order to gain insights into the lifestyle of each district and the popularity of different venues' categories such as modern and italian restaurants.
We assume that a neighborhood, where mediterranean and modern restaurants are popular could be a target for our business.
Nevertheless, we should also take into account the rent cost of each district for our decision, so that we start our business in a borough, where a greek restaurant would be popular, but also where the rent price is not too high.
Finally, we perform a clustering and exploratory analysis to obtain information about the rent price per square meter and the number of venues.

#### 4. Analysis

We start with importing and installing all necessary libraries for our analysis.

In [2]:
!pip install pandas
!pip install requests
!pip install folium
!conda install -c conda-forge folium=0.5.0 --yes

  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
[K     |████████████████████████████████| 94 kB 7.0 MB/s  eta 0:00:01
[?25hCollecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python-3.7-main

  add

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


 We will read each postal code in a dataframe using following website: 'https://www.muenchen.de/int/en/living/postal-codes.html'

In [4]:
url = 'https://www.muenchen.de/int/en/living/postal-codes.html'
munich_data_list = pd.read_html(url)
munich_data = munich_data_list[0]
munich_data

Unnamed: 0,District,Postal Code
0,Allach-Untermenzing,"80995, 80997, 80999, 81247, 81249"
1,Altstadt-Lehel,"80331, 80333, 80335, 80336, 80469, 80538, 80539"
2,Au-Haidhausen,"81541, 81543, 81667, 81669, 81671, 81675, 81677"
3,Aubing-Lochhausen-Langwied,"81243, 81245, 81249"
4,Berg am Laim,"81671, 81673, 81735, 81825"
5,Bogenhausen,"81675, 81677, 81679, 81925, 81927, 81929"
6,Feldmoching-Hasenbergl,"80933, 80935, 80995"
7,Hadern,"80689, 81375, 81377"
8,Laim,"80686, 80687, 80689"
9,Ludwigsvorstadt-Isarvorstadt,"80335, 80336, 80337, 80469"


We can see that several districts are assigned to more than one postal codes, therefore we split the above dataframe.

In [5]:
munich_data_cleaned = pd.DataFrame(columns=['District', 'Postal Code'])
munich_data_cleaned.head()

Unnamed: 0,District,Postal Code


In [6]:
items = []
for idx, codes in enumerate(munich_data['Postal Code']):
    code_list = codes.split(',')
    district = munich_data['District'][idx]
    for element in code_list:
        element = element.replace(' ', '')
        items.append({'District': district, 'Postal Code': element})

In [7]:
munich_data_cleaned = munich_data_cleaned.append(items)
munich_data_cleaned.head()

Unnamed: 0,District,Postal Code
0,Allach-Untermenzing,80995
1,Allach-Untermenzing,80997
2,Allach-Untermenzing,80999
3,Allach-Untermenzing,81247
4,Allach-Untermenzing,81249


In [8]:
#Clean data
muc_data_cleaned = pd.DataFrame(columns=['District', 'Postal Code'])
muc_data_cleaned.head()

Unnamed: 0,District,Postal Code


We use our credentials for Foursquare API in order to fetch latitude and longitude data for each postal code.

In [9]:
# credentials
CLIENT_ID = 'SS2MEJUIINVDIJQBQSOVLD4F0GASSZCEKMYI4J35V4H45BQE'
CLIENT_SECRET = 'HMCUCAXXWML5R3KB3DQOWZCGOIRR4LL2QA2P0GPYYVS2RHIL'
ACCESS_TOKEN = '4WBNLZ5S2P2EGGJCKUZY4IS2UEG52BARDFMSWRJDJZXCD4TT'
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: SS2MEJUIINVDIJQBQSOVLD4F0GASSZCEKMYI4J35V4H45BQE
CLIENT_SECRET:HMCUCAXXWML5R3KB3DQOWZCGOIRR4LL2QA2P0GPYYVS2RHIL


We create a new dataframe, which contains the geographical coordinates of each district. 

In [10]:
# create new dataframe containing latitude and longitude values 
munich_data_ll = pd.DataFrame(columns=['District', 'Postal Code', 'Latitude', 'Longitude'])

# loop 
items = []
for idx, district in enumerate(munich_data_cleaned['District']):
    code = munich_data_cleaned['Postal Code'][idx]
    address = district + ', ' + code # to get format of address

    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    items.append({'District': district, 
                  'Postal Code': code,
                  'Latitude': latitude,
                  'Longitude': longitude})

In [11]:
munich_data_ll = munich_data_ll.append(items)
munich_data_ll.head()

Unnamed: 0,District,Postal Code,Latitude,Longitude
0,Allach-Untermenzing,80995,48.195157,11.462973
1,Allach-Untermenzing,80997,48.195157,11.462973
2,Allach-Untermenzing,80999,48.195157,11.462973
3,Allach-Untermenzing,81247,48.195157,11.462973
4,Allach-Untermenzing,81249,48.195157,11.462973


We can now create Munich's map utilizing the folium library.

In [12]:
# create map of munich using coordinates' values
map_munich = folium.Map(location=[munich_data_ll["Latitude"].iloc[0], munich_data_ll["Longitude"].iloc[0]], zoom_start=11)

# add markers to map
for lat, lng, district in zip(munich_data_ll['Latitude'], munich_data_ll['Longitude'], munich_data_ll['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_munich)  
    
map_munich

We explore the nearby in each borough venues with the Foursquare API and we create the Munich venues dataframe.

In [13]:
# get all Munich venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [14]:
munich_venues = getNearbyVenues(names=munich_data_ll['District'],
                                   latitudes=munich_data_ll['Latitude'],
                                   longitudes=munich_data_ll['Longitude']
                                  )

Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Aubing-Lochhausen-Langwied
Aubing-Lochhausen-Langwied
Aubing-Lochhausen-Langwied
Berg am Laim
Berg am Laim
Berg am Laim
Berg am Laim
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Feldmoching-Hasenbergl
Feldmoching-Hasenbergl
Feldmoching-Hasenbergl
Hadern
Hadern
Hadern
Laim
Laim
Laim
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Moosach
Moosach
Moosach
Moosach
Moosach
Neuhausen-Nymphenburg
Neuhausen-Nym

In [15]:
munich_venues.head()

Unnamed: 0,District,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allach-Untermenzing,48.195157,11.462973,Bäckerei Schuhmair,48.197175,11.459016,Bakery
1,Allach-Untermenzing,48.195157,11.462973,dm-drogerie markt,48.194118,11.46564,Drugstore
2,Allach-Untermenzing,48.195157,11.462973,Sport Bittl,48.191447,11.466553,Sporting Goods Shop
3,Allach-Untermenzing,48.195157,11.462973,Sicilia,48.193331,11.459387,Italian Restaurant
4,Allach-Untermenzing,48.195157,11.462973,Lidl,48.194428,11.465612,Supermarket


We use the groupby command and the onehot encoding, as we are particularly interested in the number and types of venue categories per district.

In [16]:
munich_venues.groupby('District').count()

Unnamed: 0_level_0,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allach-Untermenzing,45,45,45,45,45,45
Altstadt-Lehel,210,210,210,210,210,210
Au-Haidhausen,210,210,210,210,210,210
Berg am Laim,30,30,30,30,30,30
Bogenhausen,72,72,72,72,72,72
Feldmoching-Hasenbergl,3,3,3,3,3,3
Hadern,33,33,33,33,33,33
Laim,60,60,60,60,60,60
Ludwigsvorstadt-Isarvorstadt,120,120,120,120,120,120
Maxvorstadt,270,270,270,270,270,270


We are interested in specific venues, i.e greek tavernas and greek restaurants and we create the dataframes of Munich venues of interest

In [17]:
#venues of interest
muc_venueint = munich_venues[(munich_venues['Venue Category'].str.contains('Greek')==True)  | (munich_venues['Venue Category'].str.contains('Taverna')==True)]

#all other venues 
muc_venueother =  munich_venues[(munich_venues['Venue Category'].str.contains('Greek')==False)  | (munich_venues['Venue Category'].str.contains('Taverna')==False)]

In [18]:
#Dataframes with specific favourite venues
muc_venuegreek = munich_venues[munich_venues['Venue Category'].str.match('Greek')]
muc_venuetaverna = munich_venues[munich_venues['Venue Category'].str.match('Taverna')]

In [19]:
muc_venuetaverna.head(10)

Unnamed: 0,District,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1495,Pasing-Obermenzing,48.143502,11.465118,Taverna Kypros,48.145074,11.459974,Taverna
1518,Pasing-Obermenzing,48.165049,11.474079,Taverna Naxos,48.164144,11.479353,Taverna
1721,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna
1742,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna
1763,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna
1784,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna
1805,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna
1826,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna
1847,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna
1868,Schwabing-West,48.168271,11.569873,Georgios,48.165728,11.564714,Taverna


In [20]:
muc_venuegreek.head(10)

Unnamed: 0,District,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
497,Bogenhausen,48.158487,11.636682,Pyrsos,48.154944,11.637708,Greek Restaurant
510,Bogenhausen,48.158487,11.636682,Pyrsos,48.154944,11.637708,Greek Restaurant
523,Bogenhausen,48.158487,11.636682,Pyrsos,48.154944,11.637708,Greek Restaurant
536,Bogenhausen,48.158487,11.636682,Pyrsos,48.154944,11.637708,Greek Restaurant
549,Bogenhausen,48.154782,11.633484,Pyrsos,48.154944,11.637708,Greek Restaurant
556,Bogenhausen,48.158487,11.636682,Pyrsos,48.154944,11.637708,Greek Restaurant
567,Feldmoching-Hasenbergl,48.218462,11.520409,Seehaus Feldmoching,48.216965,11.517452,Greek Restaurant
568,Feldmoching-Hasenbergl,48.218462,11.520409,Seehaus Feldmoching,48.216965,11.517452,Greek Restaurant
569,Feldmoching-Hasenbergl,48.218462,11.520409,Seehaus Feldmoching,48.216965,11.517452,Greek Restaurant
603,Laim,48.139551,11.502166,Potlatsch,48.139821,11.500279,Greek Restaurant


We can now create a Map to visualize the venues of interest per Munich district.

In [21]:
# create map and display it
muc_map = folium.Map(location=[latitude, longitude], zoom_start=11)
incidents = folium.map.FeatureGroup()


for lat, lng, in zip(muc_venuegreek["Venue Latitude"], muc_venuegreek["Venue Longitude"]):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='green',
        )
    )

for lat, lng, in zip(muc_venuetaverna["Venue Latitude"], muc_venuetaverna["Venue Longitude"]):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='orange',
        )
    )    

#muc_map.save('mymap.html') 

# add incidents to map
muc_map.add_child(incidents)
muc_map

In [22]:
#Step 1: One hot encoding of Johns favourite venues 
muc_venueint_onehot = pd.get_dummies(muc_venueint[['Venue Category']], prefix="", prefix_sep="")

#Step 2: Smaller modifications 
########add neighborhood column back to dataframe

muc_venueint_onehot['District'] = muc_venueint['District'] 
########move neighborhood column to the first column
fixed_columns = [muc_venueint_onehot.columns[-1]] + list(muc_venueint_onehot.columns[:-1])
muc_venueint_onehot = muc_venueint_onehot[fixed_columns]

We group the venues'categories by mean and we print our dataframe again.

In [23]:
muc_venueint_grouped = muc_venueint_onehot.groupby('District').sum().reset_index()
muc_venueint_grouped.head(12)

Unnamed: 0,District,Greek Restaurant,Taverna
0,Bogenhausen,6,0
1,Feldmoching-Hasenbergl,3,0
2,Laim,3,0
3,Milbertshofen-Am Hart,8,0
4,Neuhausen-Nymphenburg,5,0
5,Obergiesing,3,0
6,Pasing-Obermenzing,0,2
7,Schwabing-Freimann,9,0
8,Schwabing-West,0,8
9,Sendling,1,0


We are nevertheless interested in other venue categories too, so we check the mean of the frequency of occurrence of each venue category per district and we print the first 5 most common venues.

In [24]:
# one hot encoding 
munich_onehot = pd.get_dummies(munich_venues[['Venue Category']], prefix="", prefix_sep="")

# Add District column 
munich_onehot.insert(0, 'District', munich_data_ll['District'])
munich_onehot.head(10)

Unnamed: 0,District,ATM,Afghan Restaurant,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beer Garden,Beer Store,Big Box Store,Bistro,Bookstore,Boutique,Boxing Gym,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Cafeteria,Café,Candy Store,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Creperie,Cupcake Shop,Currywurst Joint,Deli / Bodega,Department Store,Design Studio,Diner,Discount Store,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Court,Fountain,French Restaurant,Garden,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Hill,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewish Restaurant,Juice Bar,Kebab Restaurant,Lake,Laser Tag,Laundry Service,Light Rail Station,Liquor Store,Manti Place,Market,Men's Store,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater,Museum,Nightclub,Organic Grocery,Outdoor Sculpture,Park,Pastry Shop,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Post Office,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salon / Barbershop,Sandwich Place,School,Snack Place,Soccer Field,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Taverna,Tea Room,Thai Restaurant,Tourist Information Center,Track,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio
0,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Altstadt-Lehel,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
munich_grouped = munich_onehot.groupby('District').mean().reset_index()
munich_grouped.head(10)

Unnamed: 0,District,ATM,Afghan Restaurant,American Restaurant,Arcade,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beer Garden,Beer Store,Big Box Store,Bistro,Bookstore,Boutique,Boxing Gym,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Cafeteria,Café,Candy Store,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Creperie,Cupcake Shop,Currywurst Joint,Deli / Bodega,Department Store,Design Studio,Diner,Discount Store,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Court,Fountain,French Restaurant,Garden,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Hill,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewish Restaurant,Juice Bar,Kebab Restaurant,Lake,Laser Tag,Laundry Service,Light Rail Station,Liquor Store,Manti Place,Market,Men's Store,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater,Museum,Nightclub,Organic Grocery,Outdoor Sculpture,Park,Pastry Shop,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Post Office,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salon / Barbershop,Sandwich Place,School,Snack Place,Soccer Field,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Taverna,Tea Room,Thai Restaurant,Tourist Information Center,Track,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio
0,Allach-Untermenzing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Altstadt-Lehel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Au-Haidhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aubing-Lochhausen-Langwied,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Berg am Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bogenhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Feldmoching-Hasenbergl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hadern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Ludwigsvorstadt-Isarvorstadt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
num_top_venues = 5

for hood in munich_grouped['District']:
    print("----"+hood+"----")
    temp = munich_grouped[munich_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allach-Untermenzing----
                 venue  freq
0          Supermarket   0.2
1   Italian Restaurant   0.2
2               Bakery   0.2
3            Drugstore   0.2
4  Sporting Goods Shop   0.2


----Altstadt-Lehel----
                 venue  freq
0            Drugstore  0.29
1          Supermarket  0.14
2      Automotive Shop  0.14
3  Sporting Goods Shop  0.14
4               Bakery  0.14


----Au-Haidhausen----
                venue  freq
0         Supermarket  0.29
1  Italian Restaurant  0.14
2           Drugstore  0.14
3              Bakery  0.14
4     Automotive Shop  0.14


----Aubing-Lochhausen-Langwied----
                 venue  freq
0  Sporting Goods Shop  0.33
1   Italian Restaurant  0.33
2            Drugstore  0.33
3               Museum  0.00
4        Metro Station  0.00


----Berg am Laim----
                venue  freq
0         Supermarket  0.50
1           Drugstore  0.25
2     Automotive Shop  0.25
3       Metro Station  0.00
4  Miscellaneous Shop  0.00


---

Let's find out the top 10 venues of each neighborhood.

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:] # exclude District column
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
district_venues_sorted = pd.DataFrame(columns=columns)
district_venues_sorted['District'] = munich_grouped['District']

for ind in np.arange(munich_grouped.shape[0]):
    district_venues_sorted.iloc[ind, 1:] = return_most_common_venues(munich_grouped.iloc[ind, :], num_top_venues)

district_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,Supermarket,Italian Restaurant,Bakery,Drugstore,Sporting Goods Shop,Pastry Shop,Park,Peruvian Restaurant,Outdoor Sculpture,Organic Grocery
1,Altstadt-Lehel,Drugstore,Supermarket,Automotive Shop,Sporting Goods Shop,Bakery,Playground,Italian Restaurant,Nightclub,Miscellaneous Shop,Mobile Phone Shop
2,Au-Haidhausen,Supermarket,Italian Restaurant,Drugstore,Bakery,Automotive Shop,Playground,Israeli Restaurant,Insurance Office,Mobile Phone Shop,Modern European Restaurant
3,Aubing-Lochhausen-Langwied,Sporting Goods Shop,Italian Restaurant,Drugstore,Museum,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater
4,Berg am Laim,Supermarket,Drugstore,Automotive Shop,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater,Museum
5,Bogenhausen,Supermarket,Italian Restaurant,Drugstore,Sporting Goods Shop,Bakery,Playground,Israeli Restaurant,Metro Station,Pet Store,Peruvian Restaurant
6,Feldmoching-Hasenbergl,Supermarket,Drugstore,Automotive Shop,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater,Museum
7,Hadern,Drugstore,Bakery,Playground,Pet Store,Museum,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motel
8,Laim,Sporting Goods Shop,Supermarket,Italian Restaurant,Movie Theater,Men's Store,Metro Station,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motel
9,Ludwigsvorstadt-Isarvorstadt,Supermarket,Automotive Shop,Drugstore,Playground,Irish Pub,Nightclub,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater


We are aiming for obtaining information related to similarities between districts, so we split Munich's districts into 5 clusters.

We create a dataframe which includes the clusters and the most common venues and then we visualize the clusters with folium.

In [29]:
#Cluster neighborhoods

num_clusters = 5

X = munich_grouped.drop('District', 1)

kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(X)



In [30]:
#Add clustering labels
district_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = munich_data_ll

# merge labels and data about venues to district data and latitude plus longitude data to have all in one dataframe
munich_merged = munich_merged.join(district_venues_sorted.set_index('District'), on='District')

munich_merged.head()

Unnamed: 0,District,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,80995,48.195157,11.462973,1,Supermarket,Italian Restaurant,Bakery,Drugstore,Sporting Goods Shop,Pastry Shop,Park,Peruvian Restaurant,Outdoor Sculpture,Organic Grocery
1,Allach-Untermenzing,80997,48.195157,11.462973,1,Supermarket,Italian Restaurant,Bakery,Drugstore,Sporting Goods Shop,Pastry Shop,Park,Peruvian Restaurant,Outdoor Sculpture,Organic Grocery
2,Allach-Untermenzing,80999,48.195157,11.462973,1,Supermarket,Italian Restaurant,Bakery,Drugstore,Sporting Goods Shop,Pastry Shop,Park,Peruvian Restaurant,Outdoor Sculpture,Organic Grocery
3,Allach-Untermenzing,81247,48.195157,11.462973,1,Supermarket,Italian Restaurant,Bakery,Drugstore,Sporting Goods Shop,Pastry Shop,Park,Peruvian Restaurant,Outdoor Sculpture,Organic Grocery
4,Allach-Untermenzing,81249,48.195157,11.462973,1,Supermarket,Italian Restaurant,Bakery,Drugstore,Sporting Goods Shop,Pastry Shop,Park,Peruvian Restaurant,Outdoor Sculpture,Organic Grocery


In [31]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
indian_red = '#CD5C5C'
blue = '#2980B9'
purple = '#5B2C6F'
gold = '#F1C40F'
green = '#239B56'
x = np.arange(num_clusters)
rainbow = [indian_red, blue, purple, gold, green]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(munich_merged['Latitude'], munich_merged['Longitude'], munich_merged['District'], munich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [32]:
# first: examine the green cluster (number zero)
cluster0 = munich_merged.loc[munich_merged['Cluster Labels'] == 0, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster0['1st Most Common Venue'].value_counts()

Bookstore       17
Fountain         7
Plaza            6
Gourmet Shop     5
Boutique         4
Creperie         4
Name: 1st Most Common Venue, dtype: int64

In [33]:
# next: examine the red cluster (number one)
cluster1 = munich_merged.loc[munich_merged['Cluster Labels'] == 1, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster1['1st Most Common Venue'].value_counts()

Supermarket            29
Drugstore              10
Sporting Goods Shop     6
Name: 1st Most Common Venue, dtype: int64

In [34]:
# next: examine the blue cluster (number two)
cluster2 = munich_merged.loc[munich_merged['Cluster Labels'] == 2, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster2['1st Most Common Venue'].value_counts()

Plaza          16
Men's Store     5
Name: 1st Most Common Venue, dtype: int64

In [35]:
# examine the purple cluster (number three)
cluster3 = munich_merged.loc[munich_merged['Cluster Labels'] == 3, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster3['1st Most Common Venue'].value_counts()

Café              13
Clothing Store     3
Name: 1st Most Common Venue, dtype: int64

In [36]:
# examine the yellow cluster (number four)
cluster4 = munich_merged.loc[munich_merged['Cluster Labels'] == 4, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster4['1st Most Common Venue'].value_counts()

Hotel    2
Name: 1st Most Common Venue, dtype: int64

We can see that there are neighborhoods in Munich, where mediterranean restaurants are very popular, so we will focus on such districts.

Nevertheless, Munich is a really expensive city in terms of rent prices, so below we will explore rent prices per district by utilizing the website: 'https://www.tz.de/leben/wohnen/uebersicht-muenchner-mieten-preise-nach-postleitzahlen-tz-6133643.html'

In [37]:
#Get rent prices for Munich districts
url = 'https://www.tz.de/leben/wohnen/uebersicht-muenchner-mieten-preise-nach-postleitzahlen-tz-6133643.html'
df_mucPrice = pd.read_html(url, header=0)[0]
df_mucPrice.head(30)
df_mucPrice.dropna(inplace = True)

df_mucPrice.head()

Unnamed: 0,PLZ,Miete,Trend,Kaufpreis,Trend.1
1,80995,1410.0,"1,1%",5000,"6,8%"
2,80997,1325.0,"-1,9%",5430,"12,4%"
3,80999,1305.0,"5,2%",5880,"11,4%"
4,81247,1455.0,"2,5%",6520,"1,4%"
5,81249,1325.0,"5,6%",5100,"3,4%"


In [38]:
#Get rent price per square meter
df_mucPrice.rename(columns={'PLZ':'PostalCode', 'Miete':'PricePerm2'}, inplace = True)
df_mucPrice["PricePerm2"] = df_mucPrice["PricePerm2"] /100
df_mucPrice.head(20)

Unnamed: 0,PostalCode,PricePerm2,Trend,Kaufpreis,Trend.1
1,80995,14.1,"1,1%",5000,"6,8%"
2,80997,13.25,"-1,9%",5430,"12,4%"
3,80999,13.05,"5,2%",5880,"11,4%"
4,81247,14.55,"2,5%",6520,"1,4%"
5,81249,13.25,"5,6%",5100,"3,4%"
7,80331,22.3,"3,5%",k.A.,k.A.
8,80333,19.1,"1,9%",9120,"20,8%"
9,80335,19.55,"2,9%",8690,"5,8%"
10,80336,18.15,"0,0%",8960,"9,0%"
11,80469,2.06,"4,8%",8370,"0,5%"


### 5. Results & Discussion

The above analysis demonstrates, that the most suitable districts in Munich for opening a greek restaurant are located in cluster 3, because the frequency of occurence of gourmet shops is high:

Plaza                  9
Gourmet Shop           7
Fountain               7
Sporting Goods Shop    6
Men's Store            5

Such districts are Neu Langwied, Sendling, Obersendling, Pasing, Obermenzing, Perlach, Maxvorstadt and Neuperlach.

Our findings from the rent price exploratory analysis suggest, that the top 5 neighborhoods with the lowest rent price per m2 in the purple cluster are:
1. Neu Langwied
2. Allach
3. Aubing
4. Pasing
5. Moosach


### 6. Conclusion

Based on our analysis and results, we choose Pasing-Obermenzing as the district that will host our new greek restaurant because there already exist 3 greek tavernas and 0 greek restaurants.

That means that a lot of Greek people live in this neighborhood and since there is no greek restaurant operating there we will have the monopoly of the business.

In terms of price Pasing is not as cheap as Neu Langwied, but also not as pricey as Moosach or Maxvorstadt, therefore it is the most suitable district for our new operations.