# Capstone Project Final Assignment
## Business problem: Location analysis in the center of Hamburg / Germany for a planned opening of an Italian Restaurant


##### In this scenario I run my own consultancy, which specializes in data science. One of my customers plans to open a modern, stylish Italian restaurant in one of three boroughs in the center of Hamburg: Altstadt, HafenCity or Neustadt. I have been asked to support him with a location analysis. In particular, he wants to know about potential competitors in the vicinity and about the people who live there (age, income). The restaurant will target people of working age (18-64) with a good to high income.

##### Data 
#### As the saying goes, the three most important considerations in business are location, location and location (at least for offline businesses such as a restaurant). For the demographic data I use figures from the official statistics office for Northern Germany (https://www.statistik-nord.de), in particular the Excel table on the residents of each borough, "StadtteilprofileBerichtsjahr2017.xlsx". For information about venues I use the Foursquare API (https://api.foursquare.com).

##### A mapping between boroughs and postal codes is available via the website https://ahoihamburg.net/postleitzahlen-plz-liste-hamburg/.

##### Results - At the end, the customer will receive information about
1. Potential competitors in the vicinity
1. Demographic statistics on the residents: the number of people aged 18-64 and their income
1. A recommendation, with a discussion of the results and a conclusion

##### Methodology
* We start by obtaining the statistical data from the statistics office for Northern Germany and reduce it to the data we really need (number of residents, age, income)
* Then we get the latitude and longitude of the three boroughs selected by the customer (Altstadt, Neustadt, HafenCity)
* Then we use the Foursquare API to explore the venues in each neighborhood
* Then we add to our table the number of restaurants, the available income per restaurant and the number of restaurants per resident
* Then we analyze and cluster our data with k-Means
* Finally we give a recommendation (our result)


In [1]:
import numpy as np
import pandas as pd
import requests
import folium 
import matplotlib.pyplot as plt

### Let's start with some demographic data about the three boroughs

In [2]:
url="https://www.statistik-nord.de/fileadmin/Dokumente/Datenbanken_und_Karten/Stadtteilprofile/StadtteilprofileBerichtsjahr2017.xlsx"
df=pd.read_excel(url, skiprows=3)

In [3]:
df.head(5)

Unnamed: 0.1,Unnamed: 0,Anzahl der Einwohnerinnen und Einwohner,Anzahl der Kinder und Jugendlichen unter 18 Jahren,Anteil Kinder und Jugendlicher unter 18 Jahren an der Gesamt-bevölkerung,Anzahl älterer Einwohnerinnen und Einwohner über 64 Jahren,Anteil älterer Einwohnerinnen und Einwohner über 64 Jahren an der Gesamt-bevölkerung,Anzahl ausländischer Einwohnerinnen und Einwohner,Anteil ausländischer Einwohnerinnen und Einwohner an der Gesamt-bevölkerung,Anzahl der Einwohnerinnen und Einwohner mit Migrations-hintergrund,Anteil der Einwohnerinnen und Einwohner mit Migrations-hintergrund an der Gesamt-bevölkerung,...,Anzahl der Grundschulen,Anzahl der Schülerinnen und Schüler der Sekundarstufe I (nach Wohnort),Anteil der Schülerinnen und Schüler in Stadtteilschulen an allen Schülerinnen und Schülern der Sekundarstufe I (nach Wohnort),Anteil der Schülerinnen und Schüler in Gymnasien an allen Schülerinnen und Schülern der Sekundarstufe I (nach Wohnort),Anzahl der niedergelassenen Ärzte,Anzahl der Allgemeinärzte,Anzahl der Zahnärzte,Anzahl der Apotheken,Anzahl privater PKW,Anzahl der privaten PKW je 1 000 Einwohnerinnen und Einwohner
0,Hamburg-Altstadt,2305,277,12.017354,256,11.106291,506,21.952278,990,42.912874,...,0,61,55.737705,40.983607,161,48,45,8,629,272.885033
1,HafenCity,3627,756,20.843672,333,9.181141,1168,32.202923,1718,47.262724,...,1,138,48.550725,46.376812,13,5,7,1,960,264.681555
2,Neustadt,12719,1456,11.447441,1836,14.435097,2580,20.284614,4670,36.644696,...,1,400,50.0,48.25,173,18,76,9,3131,246.167152
3,St. Pauli,22501,2991,13.292743,2150,9.555131,4880,21.687925,8309,36.801311,...,1,818,56.479218,39.242054,39,11,15,8,4370,194.213591
4,St. Georg,11055,1108,10.022614,1397,12.636816,2552,23.084577,4274,38.608853,...,3,299,43.143813,54.180602,106,36,15,7,2300,208.050656


In [4]:
df.drop(df.index[3:], axis=0, inplace=True) # Keep only the first three rows: Altstadt, HafenCity and Neustadt

In [5]:
df.head()

Unnamed: 0.1,Unnamed: 0,Anzahl der Einwohnerinnen und Einwohner,Anzahl der Kinder und Jugendlichen unter 18 Jahren,Anteil Kinder und Jugendlicher unter 18 Jahren an der Gesamt-bevölkerung,Anzahl älterer Einwohnerinnen und Einwohner über 64 Jahren,Anteil älterer Einwohnerinnen und Einwohner über 64 Jahren an der Gesamt-bevölkerung,Anzahl ausländischer Einwohnerinnen und Einwohner,Anteil ausländischer Einwohnerinnen und Einwohner an der Gesamt-bevölkerung,Anzahl der Einwohnerinnen und Einwohner mit Migrations-hintergrund,Anteil der Einwohnerinnen und Einwohner mit Migrations-hintergrund an der Gesamt-bevölkerung,...,Anzahl der Grundschulen,Anzahl der Schülerinnen und Schüler der Sekundarstufe I (nach Wohnort),Anteil der Schülerinnen und Schüler in Stadtteilschulen an allen Schülerinnen und Schülern der Sekundarstufe I (nach Wohnort),Anteil der Schülerinnen und Schüler in Gymnasien an allen Schülerinnen und Schülern der Sekundarstufe I (nach Wohnort),Anzahl der niedergelassenen Ärzte,Anzahl der Allgemeinärzte,Anzahl der Zahnärzte,Anzahl der Apotheken,Anzahl privater PKW,Anzahl der privaten PKW je 1 000 Einwohnerinnen und Einwohner
0,Hamburg-Altstadt,2305,277,12.017354,256,11.106291,506,21.952278,990,42.912874,...,0,61,55.737705,40.983607,161,48,45,8,629,272.885033
1,HafenCity,3627,756,20.843672,333,9.181141,1168,32.202923,1718,47.262724,...,1,138,48.550725,46.376812,13,5,7,1,960,264.681555
2,Neustadt,12719,1456,11.447441,1836,14.435097,2580,20.284614,4670,36.644696,...,1,400,50.0,48.25,173,18,76,9,3131,246.167152
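Dropping rows by position only works as long as the three boroughs stay at the top of the spreadsheet. A minimal sketch of selecting them by name instead, using a toy frame with values copied from the output above (the frame name `profiles` and the list `wanted` are illustrative, not part of the notebook):

```python
import pandas as pd

# Toy stand-in for the first two columns of the statistics table
# (values copied from the df.head(5) output above).
profiles = pd.DataFrame({
    "Unnamed: 0": ["Hamburg-Altstadt", "HafenCity", "Neustadt", "St. Pauli", "St. Georg"],
    "Anzahl der Einwohnerinnen und Einwohner": [2305, 3627, 12719, 22501, 11055],
})

wanted = ["Hamburg-Altstadt", "HafenCity", "Neustadt"]

# Keep only the rows whose borough name is in the list, independent of row order.
selected = profiles[profiles["Unnamed: 0"].isin(wanted)].reset_index(drop=True)
print(selected)
```

Because the filter returns a new frame, the original table stays available for later checks.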


In [6]:
# .copy() makes df_new an independent DataFrame, so later column assignments do not trigger SettingWithCopyWarning
df_new=df[["Unnamed: 0", "Anzahl der Einwohnerinnen und Einwohner", "Anzahl älterer Einwohnerinnen und Einwohner über 64 Jahren", "Anzahl der Kinder und Jugendlichen unter 18 Jahren", "Gesamtbetrag der Einkünfte je Steuerpflichtigen (Lohn- und Einkommen-steuer) im Jahr"]].copy()

### Now we have a first overview of the residents in each borough (the column names are translated a few cells further down)

In [7]:
df_new.head()

Unnamed: 0.1,Unnamed: 0,Anzahl der Einwohnerinnen und Einwohner,Anzahl älterer Einwohnerinnen und Einwohner über 64 Jahren,Anzahl der Kinder und Jugendlichen unter 18 Jahren,Gesamtbetrag der Einkünfte je Steuerpflichtigen (Lohn- und Einkommen-steuer) im Jahr
0,Hamburg-Altstadt,2305,256,277,31336
1,HafenCity,3627,333,756,93206
2,Neustadt,12719,1836,1456,34521


In [8]:
df_new["Adults"]=df_new["Anzahl der Einwohnerinnen und Einwohner"]-df_new["Anzahl älterer Einwohnerinnen und Einwohner über 64 Jahren"]-df_new["Anzahl der Kinder und Jugendlichen unter 18 Jahren"]



In [9]:
df_new.drop(columns=["Anzahl älterer Einwohnerinnen und Einwohner über 64 Jahren", "Anzahl der Einwohnerinnen und Einwohner", "Anzahl der Kinder und Jugendlichen unter 18 Jahren"], axis=1, inplace=True)



In [10]:
df_new.rename(columns={"Unnamed: 0":"Borough", "Gesamtbetrag der Einkünfte je Steuerpflichtigen (Lohn- und Einkommen-steuer) im Jahr":"Yearly Income"}, inplace=True)



In [11]:
df_new.head()

Unnamed: 0,Borough,Yearly Income,Adults
0,Hamburg-Altstadt,31336,1772
1,HafenCity,93206,2538
2,Neustadt,34521,9427


## We now know how many residents of working age (18-64), the potential customers for the restaurant, live in each borough and what the average yearly income per taxpayer is.
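The "Adults" column is simply the total number of residents minus those under 18 and those over 64. A small sketch of that arithmetic with the figures from the table above; the short English column names are used here only for readability and are not the names in the spreadsheet:

```python
import pandas as pd

# Figures copied from the statistics table above; column names shortened for illustration.
residents = pd.DataFrame({
    "Borough": ["Hamburg-Altstadt", "HafenCity", "Neustadt"],
    "Residents": [2305, 3627, 12719],
    "Under 18": [277, 756, 1456],
    "Over 64": [256, 333, 1836],
})

# assign() returns a new frame, so no in-place modification of a slice is involved.
working_age = residents.assign(
    Adults=lambda d: d["Residents"] - d["Under 18"] - d["Over 64"]
)
print(working_age[["Borough", "Adults"]])
```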


## Now let's obtain the latitude and longitude of each borough for the next steps

In [12]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
geolocator = Nominatim(user_agent="explorer")

address1 = 'Altstadt, Hamburg'
address2 = 'HafenCity, Hamburg'
address3 = 'Neustadt, Hamburg'

location1 = geolocator.geocode(address1)
location2 = geolocator.geocode(address2)
location3 = geolocator.geocode(address3)

latitude1 = location1.latitude # Altstadt
longitude1 = location1.longitude

latitude2 = location2.latitude # HafenCity
longitude2 = location2.longitude

latitude3 = location3.latitude # Neustadt
longitude3 = location3.longitude

In [13]:
df_new["Longitude"]=[longitude1, longitude2, longitude3]



In [14]:
df_new["Latitude"]=[latitude1, latitude2, latitude3]



In [15]:
df_new

Unnamed: 0,Borough,Yearly Income,Adults,Longitude,Latitude
0,Hamburg-Altstadt,31336,1772,9.99464,53.550468
1,HafenCity,93206,2538,9.995835,53.542913
2,Neustadt,34521,9427,9.979048,53.549881
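The three geocoding calls can also be written as one loop that fails loudly when an address is not found (geopy's `geocode` returns `None` in that case). This is only a sketch: `geocode_all` is a hypothetical helper, and the stub lookup below stands in for the network call, using the coordinates from the table above.

```python
from types import SimpleNamespace


def geocode_all(addresses, geocode):
    """Return {address: (latitude, longitude)}; raise if an address is not found."""
    coords = {}
    for address in addresses:
        location = geocode(address)
        if location is None:  # no match for this address
            raise ValueError(f"No geocoding result for {address!r}")
        coords[address] = (location.latitude, location.longitude)
    return coords


# Offline stand-in for a geocoder, filled with the values shown in the table above.
known = {
    "Altstadt, Hamburg": SimpleNamespace(latitude=53.550468, longitude=9.99464),
    "HafenCity, Hamburg": SimpleNamespace(latitude=53.542913, longitude=9.995835),
    "Neustadt, Hamburg": SimpleNamespace(latitude=53.549881, longitude=9.979048),
}
coords = geocode_all(list(known), known.get)
print(coords)
```

In the notebook itself, `geolocator.geocode` could be passed in place of `known.get`.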


## Let's create a map now to see the boroughs

In [16]:
address = 'Hamburg, Germany'

location = geolocator.geocode(address)

latitude = location.latitude # Hamburg
longitude = location.longitude

In [17]:
map_hamburg = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough in zip(df_new['Latitude'], df_new['Longitude'], df_new['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hamburg)  
map_hamburg

## Now let's explore the neighborhood with Foursquare

In [18]:
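import os

# Read the Foursquare credentials from environment variables instead of hard-coding them in the notebook
# (the variable names are a convention chosen here; set them before starting the kernel)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']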
VERSION = '20180605'

In [19]:
df_expl = df_new.copy()

## Let's get the top venues within a radius of 500 meters of each borough's coordinates.

In [20]:
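def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues around each borough; relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT."""
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)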
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [21]:
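LIMIT = 5000 # maximum number of venues requested per borough; the API may return fewer
radius = 500 # search radius in meters
hamburg_venues = getNearbyVenues(names=df_expl['Borough'],
                                   latitudes=df_expl['Latitude'],
                                   longitudes=df_expl['Longitude'],
                                   radius=radius
                                  )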

Hamburg-Altstadt
HafenCity
Neustadt
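The list comprehension in `getNearbyVenues` reads `categories[0]` for every venue, which would raise an `IndexError` if a venue ever came back without a category. A defensive sketch of the same flattening step follows. `extract_venues` is a hypothetical helper, and the sample items are hand-written in the shape the function above reads, not a real API response.

```python
def extract_venues(borough, items):
    """Flatten explore-style items into (borough, name, lat, lng, category) tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising IndexError on an empty category list.
        category = categories[0]["name"] if categories else None
        rows.append((
            borough,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written sample in the same shape; the second venue is made up.
sample_items = [
    {"venue": {"name": "Le Lion",
               "location": {"lat": 53.550125, "lng": 9.994436},
               "categories": [{"name": "Cocktail Bar"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 53.55, "lng": 9.99},
               "categories": []}},
]
rows = extract_venues("Hamburg-Altstadt", sample_items)
print(rows)
```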


## Let's have a look at the venues in our boroughs that we obtained via Foursquare

In [22]:
hamburg_venues

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hamburg-Altstadt,53.550468,9.994640,Rathausmarkt,53.550737,9.993503,Plaza
1,Hamburg-Altstadt,53.550468,9.994640,Le Lion,53.550125,9.994436,Cocktail Bar
2,Hamburg-Altstadt,53.550468,9.994640,Picasso,53.549934,9.995627,Spanish Restaurant
3,Hamburg-Altstadt,53.550468,9.994640,Vincent Vegan,53.551407,9.996376,Vegetarian / Vegan Restaurant
4,Hamburg-Altstadt,53.550468,9.994640,estancia steaks,53.548581,9.995539,Steakhouse
5,Hamburg-Altstadt,53.550468,9.994640,Le Plat du Jour,53.548773,9.994295,French Restaurant
6,Hamburg-Altstadt,53.550468,9.994640,Café Paris,53.550106,9.994227,Café
7,Hamburg-Altstadt,53.550468,9.994640,Passage Kino,53.550708,9.998299,Indie Movie Theater
8,Hamburg-Altstadt,53.550468,9.994640,Playground Coffee,53.551171,9.992224,Coffee Shop
9,Hamburg-Altstadt,53.550468,9.994640,Jungfernstieg,53.552862,9.993174,Plaza


## Let's see how many venues were found in each borough

In [23]:
hamburg_venues.groupby('Borough').count()

Unnamed: 0_level_0,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
HafenCity,62,62,62,62,62,62
Hamburg-Altstadt,100,100,100,100,100,100
Neustadt,68,68,68,68,68,68


In [24]:
hc_venues = hamburg_venues[hamburg_venues["Borough"]=="HafenCity"]
ns_venues = hamburg_venues[hamburg_venues["Borough"]=="Neustadt"]
as_venues = hamburg_venues[hamburg_venues["Borough"]=="Hamburg-Altstadt"]

In [25]:
hc_venues[hc_venues["Venue Category"] == "Italian Restaurant"]

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
109,HafenCity,53.542913,9.995835,cantinetta ristorante & bar,53.544114,9.994533,Italian Restaurant
122,HafenCity,53.542913,9.995835,musica e ristorante,53.541466,9.993658,Italian Restaurant
136,HafenCity,53.542913,9.995835,Bella Italia,53.546386,9.997071,Italian Restaurant


In [26]:
ns_venues[ns_venues["Venue Category"] == "Italian Restaurant"]

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
167,Neustadt,53.549881,9.979048,Capriccio,53.551133,9.980351,Italian Restaurant
197,Neustadt,53.549881,9.979048,Restaurant Buon Giorno,53.549316,9.983093,Italian Restaurant
209,Neustadt,53.549881,9.979048,Vin O Vin,53.553018,9.983609,Italian Restaurant
214,Neustadt,53.549881,9.979048,Insieme,53.547947,9.983149,Italian Restaurant
225,Neustadt,53.549881,9.979048,Casa Rita,53.547969,9.982964,Italian Restaurant


In [27]:
as_venues[as_venues["Venue Category"] == "Italian Restaurant"]

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
27,Hamburg-Altstadt,53.550468,9.99464,La Tavola Calda,53.549334,9.993789,Italian Restaurant
36,Hamburg-Altstadt,53.550468,9.99464,Pastaria Da Franco,53.551148,9.994428,Italian Restaurant
88,Hamburg-Altstadt,53.550468,9.99464,Bella Italia,53.546386,9.997071,Italian Restaurant
99,Hamburg-Altstadt,53.550468,9.99464,Il Cappuccino,53.548315,9.999451,Italian Restaurant


In [28]:
df_new["Italian Restaurants"]=[4,3,5] # counts from the three tables above, in the order Altstadt, HafenCity, Neustadt



In [29]:
hc_venues[hc_venues["Venue Category"].str.contains("Restaurant")].count()

Borough              23
Borough Latitude     23
Borough Longitude    23
Venue                23
Venue Latitude       23
Venue Longitude      23
Venue Category       23
dtype: int64

In [30]:
ns_venues[ns_venues["Venue Category"].str.contains("Restaurant")].count()

Borough              29
Borough Latitude     29
Borough Longitude    29
Venue                29
Venue Latitude       29
Venue Longitude      29
Venue Category       29
dtype: int64

In [31]:
as_venues[as_venues["Venue Category"].str.contains("Restaurant")].count()

Borough              28
Borough Latitude     28
Borough Longitude    28
Venue                28
Venue Latitude       28
Venue Longitude      28
Venue Category       28
dtype: int64
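The counts read off the filtered tables above can also be computed in one pass, which avoids typing them in by hand and keeps them in the same borough order as `df_new`. A sketch on a small toy frame (a handful of rows taken from the outputs above; the names `venues`, `order` and `counts` are illustrative):

```python
import pandas as pd

# Toy subset of the venue table, just enough rows to show the mechanics.
venues = pd.DataFrame({
    "Borough": ["Hamburg-Altstadt", "Hamburg-Altstadt", "Hamburg-Altstadt",
                "HafenCity", "Neustadt", "Neustadt"],
    "Venue": ["Le Lion", "Picasso", "La Tavola Calda",
              "cantinetta ristorante & bar", "Capriccio", "Casa Rita"],
    "Venue Category": ["Cocktail Bar", "Spanish Restaurant", "Italian Restaurant",
                       "Italian Restaurant", "Italian Restaurant", "Italian Restaurant"],
})

order = ["Hamburg-Altstadt", "HafenCity", "Neustadt"]  # row order of df_new

is_italian = venues["Venue Category"] == "Italian Restaurant"
is_restaurant = venues["Venue Category"].str.contains("Restaurant")

# One row per borough; a borough with no match gets 0 instead of NaN.
counts = pd.DataFrame({
    "Italian Restaurants": venues[is_italian].groupby("Borough").size(),
    "Restaurants": venues[is_restaurant].groupby("Borough").size(),
}).reindex(order).fillna(0).astype(int)
print(counts)
```

Note that neither filter matches venues categorised as "Trattoria/Osteria", a category that appears in the one-hot table further down, so Italian-style venues may be slightly undercounted.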

In [32]:
df_new["Restaurants"]=[28,23,29] # counts from the three cells above, in the order Altstadt, HafenCity, Neustadt



## Let's review the current state of the analysis: we have the working-age residents, their income, and the number of restaurants in general and Italian restaurants in particular

In [33]:
df_new

Unnamed: 0,Borough,Yearly Income,Adults,Longitude,Latitude,Italian Restaurants,Restaurants
0,Hamburg-Altstadt,31336,1772,9.99464,53.550468,4,28
1,HafenCity,93206,2538,9.995835,53.542913,3,23
2,Neustadt,34521,9427,9.979048,53.549881,5,29


## Let's add some statistics

In [34]:
df_new["Restaurants per Resident"]=df_new["Restaurants"]/df_new["Adults"]



In [35]:
df_new["Summed income per Restaurant"]=df_new["Yearly Income"]*df_new["Adults"]/df_new["Restaurants"]



In [36]:
df_new['Summed income per Restaurant'] = df_new['Summed income per Restaurant'].round(2) # round to two decimals but keep the column numeric



In [37]:
df_new

Unnamed: 0,Borough,Yearly Income,Adults,Longitude,Latitude,Italian Restaurants,Restaurants,Restaurants per Resident,Summed income per Restaurant
0,Hamburg-Altstadt,31336,1772,9.99464,53.550468,4,28,0.015801,1983121.14
1,HafenCity,93206,2538,9.995835,53.542913,3,23,0.009062,10285079.48
2,Neustadt,34521,9427,9.979048,53.549881,5,29,0.003076,11221705.76
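Both derived columns are plain ratios: restaurants divided by working-age residents, and yearly income times working-age residents divided by restaurants. The following sketch recomputes them from the figures in the table and ranks the boroughs by the second ratio; the frame name `stats` is illustrative.

```python
import pandas as pd

# Input figures copied from the table above.
stats = pd.DataFrame({
    "Borough": ["Hamburg-Altstadt", "HafenCity", "Neustadt"],
    "Yearly Income": [31336, 93206, 34521],
    "Adults": [1772, 2538, 9427],
    "Restaurants": [28, 23, 29],
})

# Fewer restaurants per resident means less competition for each potential guest.
stats["Restaurants per Resident"] = stats["Restaurants"] / stats["Adults"]

# Total income of all working-age residents, shared among the existing restaurants.
stats["Summed income per Restaurant"] = (
    stats["Yearly Income"] * stats["Adults"] / stats["Restaurants"]
).round(2)

ranking = stats.sort_values("Summed income per Restaurant", ascending=False)
print(ranking[["Borough", "Restaurants per Resident", "Summed income per Restaurant"]])
```

Keeping the column numeric makes sorting and further arithmetic possible, which a column of formatted strings would not allow.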


# Now let's analyze and cluster the boroughs

In [38]:
# one hot encoding
hamburg_onehot = pd.get_dummies(hamburg_venues[['Venue Category']], prefix="", prefix_sep="")

# add borough column back to dataframe
hamburg_onehot['Borough'] = hamburg_venues['Borough'] 

# move borough column to the first column
fixed_columns = [hamburg_onehot.columns[-1]] + list(hamburg_onehot.columns[:-1])
hamburg_onehot = hamburg_onehot[fixed_columns]

hamburg_onehot

Unnamed: 0,Borough,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Bavarian Restaurant,Bistro,...,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park Ride / Attraction,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
4,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Hamburg-Altstadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [39]:
hamburg_grouped = hamburg_onehot.groupby('Borough').mean().reset_index()
hamburg_grouped

Unnamed: 0,Borough,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Bavarian Restaurant,Bistro,...,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park Ride / Attraction,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,HafenCity,0.016129,0.0,0.0,0.048387,0.0,0.016129,0.048387,0.016129,0.032258,...,0.032258,0.032258,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0
1,Hamburg-Altstadt,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.0,...,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.04,0.01
2,Neustadt,0.014706,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,...,0.0,0.0,0.014706,0.0,0.0,0.029412,0.014706,0.014706,0.014706,0.0


In [40]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 2

hamburg_grouped_clustering = hamburg_grouped.drop(columns='Borough') # keyword form; the positional axis argument is not accepted by newer pandas versions

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hamburg_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1])
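With only three rows and two clusters, the result necessarily consists of one pair and one singleton, and the partition that minimises the k-means objective pairs the two boroughs whose category profiles are closest (the within-cluster sum of squares of a pair is half its squared distance). The sketch below illustrates this with made-up frequencies for three categories; the numbers are hypothetical and are not taken from the venue data.

```python
from itertools import combinations

import numpy as np

names = ["HafenCity", "Hamburg-Altstadt", "Neustadt"]

# Hypothetical category frequencies (rows: boroughs, columns: three made-up categories).
profiles = np.array([
    [0.30, 0.10, 0.60],
    [0.10, 0.50, 0.40],
    [0.25, 0.15, 0.60],
])

# Euclidean distance between every pair of borough profiles.
distances = {
    (a, b): float(np.linalg.norm(profiles[i] - profiles[j]))
    for (i, a), (j, b) in combinations(enumerate(names), 2)
}

# The closest pair is the one an optimal two-cluster solution would group together.
closest = min(distances, key=distances.get)
print(distances)
print("closest pair:", closest)
```

Seen this way, a two-cluster result on three boroughs says which two venue mixes are most alike; it does not rank the boroughs.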

In [41]:
# hamburg_grouped is sorted alphabetically (HafenCity, Hamburg-Altstadt, Neustadt),
# so match the labels to df_new by borough name rather than by row position
df_new["Cluster"]=df_new["Borough"].map(dict(zip(hamburg_grouped["Borough"], kmeans.labels_)))


In [42]:
df_new

Unnamed: 0,Borough,Yearly Income,Adults,Longitude,Latitude,Italian Restaurants,Restaurants,Restaurants per Resident,Summed income per Restaurant,Cluster
0,Hamburg-Altstadt,31336,1772,9.99464,53.550468,4,28,0.015801,1983121.14,0
1,HafenCity,93206,2538,9.995835,53.542913,3,23,0.009062,10285079.48,1
2,Neustadt,34521,9427,9.979048,53.549881,5,29,0.003076,11221705.76,1


## Results - Recommendation, Discussion & Conclusion

##### The inner city of Hamburg consists mainly of three boroughs (Altstadt, HafenCity and Neustadt). We have analyzed them with regard to their demographic data and their existing venues and restaurants, which act as potential competitors. Our descriptive analysis, supplemented by a k-Means clustering of the venue categories, suggests that our customer would have the best chances with his Italian restaurant in Neustadt, even though the average income there is not as high as in HafenCity. Neustadt has the lowest ratio of restaurants per working-age resident (about 0.003), the highest summed income per restaurant, and only 5 Italian restaurants for about 9,400 residents between 18 and 64. On top of that comes a potential of more than 1 million tourists who visit the Neustadt for at least 3 days per year, according to the official tourism office in Northern Germany.

##### Our customer is very thankful and decides to open the restaurant in our recommended borough - good luck! :-)