# Capstone Project

## Introduction

A family living in the center of Düsseldorf, Germany, would like to move to the outskirts of the city and needs to find out which neighborhoods offer venues similar to those around their current home.

For this, they contacted a friend who is currently studying the Data Science specialization, hoping she could give them some ideas of where to start looking.

Once she understood the problem, she remembered a course module on the Foursquare API in which K-means was used to cluster the neighborhoods of Manhattan and Toronto. She went back to the exercises of that course and came up with the following results.

## Data requirements

The data required to solve this problem are the neighborhoods of Düsseldorf, their latitude and longitude, and their postal codes (German: "Postleitzahl").
Our data scientist found the required information on this website: http://postleitzahlen.woxikon.de/plz/duesseldorf

She used the BeautifulSoup package to scrape the table, renamed its columns, and saved the result as a DataFrame and as a CSV file:

In [1]:
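```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# download the page and parse it with the lxml parser
res = requests.get("http://postleitzahlen.woxikon.de/plz/duesseldorf")
soup = BeautifulSoup(res.content, 'lxml')

# take the first <table> on the page and let pandas turn it into a DataFrame
# (wrapping the HTML in StringIO avoids the deprecation of literal strings in recent pandas)
table = soup.find_all('table')[0]
df = pd.read_html(StringIO(str(table)))[0]

df.columns = ['Postalcode', 'District', 'Street']
print(df.shape)
df.to_csv('df.csv')
df.head()
```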

(3079, 3)


|   | Postalcode | District | Street |
|---|---|---|---|
| 0 | 40210 | Düsseldorf Stadtmitte | Oststr. |
| 1 | 40210 | Düsseldorf Stadtmitte | Steinstr. |
| 2 | 40210 | Düsseldorf Stadtmitte | Oststr. |
| 3 | 40210 | Düsseldorf Stadtmitte | Marienstr. |
| 4 | 40210 | Düsseldorf Stadtmitte | Platz der Deutschen Einheit |


To obtain the latitude and longitude of each street, the CSV file was uploaded to the geocoding service https://csv2geo.com/.

Once the CSV file with the latitude and longitude coordinates was ready, it was imported into this notebook, and rows with a repeated street, latitude or longitude were removed:

In [2]:
```python
df_dus = pd.read_csv('dus_lat_long.csv', sep=',')
df_dus = df_dus[['PostalCode', 'District', 'Street', 'Latitude', 'Longitude']]

# drop rows whose street, latitude or longitude already appeared in an earlier row
df_dus = df_dus.drop_duplicates(['Street'])
df_dus = df_dus.drop_duplicates(['Latitude'])
df_dus = df_dus.drop_duplicates(['Longitude'])
print(df_dus.shape)
df_dus.head()
```

(2124, 5)


|   | PostalCode | District | Street | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 40210 | Düsseldorf Stadtmitte | Oststr. | 51.22191 | 6.7858 |
| 1 | 40210 | Düsseldorf Stadtmitte | Steinstr. | 51.223 | 6.78467 |
| 2 | 40210 | Düsseldorf Stadtmitte | Marienstr. | 51.22371 | 6.78631 |
| 3 | 40210 | Düsseldorf Stadtmitte | Platz der Deutschen Einheit | 51.22329 | 6.78351 |
| 4 | 40210 | Düsseldorf Stadtmitte | Stresemannplatz | 51.21922 | 6.788 |
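
Note that the three successive `drop_duplicates` calls above are stricter than removing exact duplicate rows: a row is discarded as soon as its street, its latitude or its longitude has already appeared in an earlier row. The following minimal sketch on hypothetical toy data contrasts this with de-duplicating on all three columns jointly.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 share a street name,
# rows 1 and 3 share a latitude.
streets = pd.DataFrame({
    "Street":    ["Oststr.", "Steinstr.", "Oststr.", "Marienstr."],
    "Latitude":  [51.2219, 51.2230, 51.2250, 51.2230],
    "Longitude": [6.7858, 6.7847, 6.7900, 6.7863],
})

# Sequential de-duplication (as in the cell above): a row is dropped if its
# street OR latitude OR longitude repeats an earlier row's value.
sequential = (streets.drop_duplicates(["Street"])
                     .drop_duplicates(["Latitude"])
                     .drop_duplicates(["Longitude"]))

# Joint de-duplication: a row is dropped only if street, latitude and
# longitude are all repeated together.
joint = streets.drop_duplicates(["Street", "Latitude", "Longitude"])

print(len(sequential), len(joint))
```

Which variant is appropriate depends on whether distinct streets can legitimately share a geocoded coordinate; the sequential version keeps only the first of them.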


Since there can be many streets in one district, the streets were grouped by postal code and district, and the average latitude and longitude of each group was calculated.

In [3]:
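```python
# concatenate all street names of each (PostalCode, District) group
df_dus1 = df_dus.groupby(['PostalCode', 'District'])['Street'].apply(lambda x: ','.join(x)).reset_index()
# average coordinates of each group (a list of columns needs double brackets in current pandas)
df_dus2 = df_dus.groupby(['PostalCode', 'District'])[['Latitude', 'Longitude']].mean().reset_index()
```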
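
The two `groupby` calls above can also be written as a single pass with pandas named aggregation, which keeps the joined street names and the averaged coordinates in one DataFrame so they cannot drift out of alignment. A minimal sketch on a few hypothetical rows shaped like `df_dus` (named aggregation requires pandas 0.25 or later):

```python
import pandas as pd

# Hypothetical sample rows in the same shape as df_dus.
df_dus = pd.DataFrame({
    "PostalCode": [40210, 40210, 40211],
    "District":   ["Düsseldorf Stadtmitte", "Düsseldorf Stadtmitte", "Düsseldorf Pempelfort"],
    "Street":     ["Oststr.", "Steinstr.", "Malkastenstr."],
    "Latitude":   [51.22191, 51.22300, 51.23067],
    "Longitude":  [6.78580, 6.78467, 6.79176],
})

# One groupby: join the street names and average the coordinates
# of every (PostalCode, District) group.
neighborhoods = (
    df_dus.groupby(["PostalCode", "District"])
          .agg(Neighborhood=("Street", ",".join),
               Latitude=("Latitude", "mean"),
               Longitude=("Longitude", "mean"))
          .reset_index()
)
print(neighborhoods)
```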

Then the joined street names were stored in a new Neighborhood column, the columns were reordered, and groups with duplicate coordinates were dropped.

In [4]:
```python
# both groupbys use the same keys, so their rows come out in the same order
df_dus2['Neighborhood'] = df_dus1['Street']
df_dus2 = df_dus2[['PostalCode', 'District', 'Neighborhood', 'Latitude', 'Longitude']]
df_dus2 = df_dus2.drop_duplicates(['Latitude'])
df_dus2 = df_dus2.drop_duplicates(['Longitude'])
print(df_dus2.shape)
df_dus2.head()
```

(107, 5)


|   | PostalCode | District | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 40210 | Düsseldorf Stadtmitte | Oststr.,Steinstr.,Marienstr.,Platz der Deutsch... | 51.221643 | 6.788295 |
| 1 | 40211 | Düsseldorf Pempelfort | Malkastenstr.,Louise-Dumont-Str.,Wielandstr.,C... | 51.230667 | 6.791758 |
| 2 | 40211 | Düsseldorf Stadtmitte | Liesegangstr.,Leopoldstr.,Kölner Str.,Kurfürst... | 51.2266 | 6.789675 |
| 3 | 40212 | Düsseldorf Stadtmitte | Wagnerstr.,Josephinenstr.,Königsallee,Königstr... | 51.224387 | 6.78141 |
| 4 | 40213 | Düsseldorf Altstadt | Flinger Str.,Hunsrückenstr.,Altestadt,Andreass... | 51.227703 | 6.773738 |


## Methodology

First, all the required libraries were imported:

In [5]:
```python
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older: pandas.io.json)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')
```


To center the Folium map on Düsseldorf, the city's coordinates were obtained with the geopy package:

In [6]:
```python
address = 'Düsseldorf, DE'

# recent geopy versions require Nominatim to be given an identifying user_agent;
# 'duesseldorf_explorer' is an arbitrary example name
geolocator = Nominatim(user_agent='duesseldorf_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Düsseldorf are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Düsseldorf are 51.2384394, 6.79027031367694.


#### Create a map of Düsseldorf with Districts superimposed on top

Here the different Districts are displayed:

In [7]:
```python
# create map of Düsseldorf using latitude and longitude values
map_duesseldorf = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per (PostalCode, District) group
for lat, lng, dist, pc in zip(df_dus2['Latitude'], df_dus2['Longitude'], df_dus2['District'], df_dus2['PostalCode']):
    label = '{}, {}'.format(dist, pc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_duesseldorf)

map_duesseldorf
```

#### Define Foursquare Credentials and Version

The same credentials as in the previous labs are used; replace the placeholders with your own Foursquare ID and secret:

In [8]:
```python
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20181227' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
```

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


### Function from the K-means Clustering Lab

This function retrieves, from the Foursquare `explore` endpoint, up to `LIMIT` venues within `radius` meters of the latitude and longitude of each neighborhood/district.

In [9]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # uses the global LIMIT as the maximum number of venues per request
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return(nearby_venues)
```
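
As a hedged variant of `getNearbyVenues`, the sketch below separates building the request URL from parsing the response. It uses `urllib.parse.urlencode` to escape the query parameters and returns `None` as the category for venues whose `categories` list is empty, where the original would fail with an `IndexError`. The function names `build_explore_url` and `parse_venues` are hypothetical, and `sample` is a made-up response trimmed to the fields `getNearbyVenues` reads; no request is sent.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=10):
    """Hypothetical helper: same query parameters as getNearbyVenues, escaped by urlencode."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def parse_venues(name, lat, lng, payload):
    """Hypothetical helper: flatten one explore response into the 7-field tuples
    used by getNearbyVenues; venues without a category get None."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows


# Made-up response containing only the fields read above.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Café A", "location": {"lat": 51.2221, "lng": 6.7880},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled place", "location": {"lat": 51.2225, "lng": 6.7890},
               "categories": []}},
]}]}}

url = build_explore_url("ID", "SECRET", "20181227", 51.221643, 6.788295)
rows = parse_venues("Oststr.,Steinstr.", 51.221643, 6.788295, sample)
print(url)
print(rows)
```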

The number of venues returned per coordinate was limited to 10, so only the first 10 venues within 500 m of each neighborhood are used to cluster the districts.

In [59]:
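```python
LIMIT = 10    # maximum number of venues returned per neighborhood
radius = 500  # search radius in meters
duesseldorf_venues = getNearbyVenues(names=df_dus2['Neighborhood'],
                                     latitudes=df_dus2['Latitude'],
                                     longitudes=df_dus2['Longitude'],
                                     radius=radius)
```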

Oststr.,Steinstr.,Marienstr.,Platz der Deutschen Einheit,Stresemannplatz,Stresemannstr.,Worringer Str.,Harkortstr.,Worringer Platz,Bahnstr.,Immermannstr.,Kreuzstr.,Bendemannstr.,Bismarckstr.,Charlottenstr.,Karlstr.,Graf-Adolf-Str.,Grupellostr.,Friedrich-Ebert-Str.,Alexanderstr.,Konrad-Adenauer-Platz
Malkastenstr.,Louise-Dumont-Str.,Wielandstr.,Couvenstr.,Pempelforter Str.,Schinkelstr.,Schirmerstr.,Am Wehrhahn,Beuthstr.,Düsselthaler Str.,Adlerstr.
Liesegangstr.,Leopoldstr.,Kölner Str.,Kurfürstenstr.,Schützenstr.,Tonhallenstr.,Stephanienstr.,Klosterstr.,Gerresheimer Str.,August-Thyssen-Str.,Bleichstr.,Börnestr.,Gustaf-Gründgens-Platz,Hohenzollernstr.,Jacobistr.
Wagnerstr.,Josephinenstr.,Königsallee,Königstr.,Kö-Passage,Martin-Luther-Platz,Schadowstr.,Jan-Wellem-Platz,Berliner Allee,Schadowplatz,Blumenstr.,Ernst-Schneider-Platz,Grünstr.,Huschbergerstr.
Flinger Str.,Hunsrückenstr.,Altestadt,Andreasstr.,Hofgartenrampe,Grabbeplatz,Marktplatz,Mertensgasse,Marktstr.,Mühlengasse,Mühlenstr.,Müll

In [60]:
```python
print(duesseldorf_venues.shape)

# venues per neighborhood
duesseldorf_venues.groupby('Neighborhood').count()
```

(745, 7)


| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Am Backesberg,Fahneburgstr. | 6 | 6 | 6 | 6 | 6 | 6 |
| Am Gentenberg | 1 | 1 | 1 | 1 | 1 | 1 |
| Am Hasselberg | 3 | 3 | 3 | 3 | 3 | 3 |
| An St. Lambertus,Auf der Bieth,Kleiansring,Oberdorfstr.,Kleianspatt,Unterdorfstr.,Viehstr.,Zeppenheimer Str.,Zum Hohen Bröhl,An der Alten Mühle,An der Anger,Am Frohnhof,Alte Kalkumer Str.,Am Flugfeld,Am Hüttenhof,Am Kleiansacker,Am Klompenkothen,Friedhofsweg,Edmund-Bertrams-Str.,Auf der Hofreith | 2 | 2 | 2 | 2 | 2 | 2 |
| Bankstr.,Schwannstr.,Schwerinstr.,Zietenstr.,Kennedydamm,Mauerstr.,Rolandstr.,Fischerstr.,Hans-Böckler-Str. | 10 | 10 | 10 | 10 | 10 | 10 |
| Barbarossawall,Burgallee,Dauzenbergstr.,Leinpfad,Klosekamp,Kreuzbergstr.,Mühlenweg,Kalkumer Schloßallee,Joseph-Brodmann-Str.,Im Luftfeld,Im Spich,Kaiserswerther Markt,Kesselsbergweg,Kittelbachstr.,Verweyenstr.,Walburgisstr.,Weg Nach den Hingbenden,Zeppenheimer Weg,Paul-Klee-Weg,Pfaffenmühlenweg,Plektrudisstr.,Rheinbrohler Weg,Schleifergasse,St.-Göres-Str.,Stiftsgasse,Stockhausgasse,Suitbertus-Stiftsplatz,Am Mühlenturm,Am Oberen Werth,Am Kreuzberg,Am Wiedenhof,Am Ritterskamp,Alte Landstr.,Am Fronberg,Egbertstr.,Fährerweg,Fliednerstr.,Annostr.,Friedrich-von-Spee-Str.,Gandersheimer Str.,An St. Swidbert | 10 | 10 | 10 | 10 | 10 | 10 |
| Brockenstr. | 1 | 1 | 1 | 1 | 1 | 1 |
| Buddestr.,Maybachstr.,Liststr. | 10 | 10 | 10 | 10 | 10 | 10 |
| Buscherhofstr.,Bamberger Str. | 4 | 4 | 4 | 4 | 4 | 4 |
| Bückerbergweg,Am Scharfenstein,Gantenbergweg,In der Hött,Fleher Deich | 3 | 3 | 3 | 3 | 3 | 3 |


### Neighborhood Analysis

In [61]:
```python
# one hot encoding: one column per venue category
duesseldorf_onehot = pd.get_dummies(duesseldorf_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# (the 'z' prefix avoids a clash with a venue category that might be called 'Neighborhood')
duesseldorf_onehot['zNeighborhood'] = duesseldorf_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [duesseldorf_onehot.columns[-1]] + list(duesseldorf_onehot.columns[:-1])
duesseldorf_onehot = duesseldorf_onehot[fixed_columns]
duesseldorf_onehot.shape
```

(745, 181)

In [62]:
```python
# average the one-hot columns per neighborhood: share of its venues in each category
duesseldorf_grouped = duesseldorf_onehot.groupby('zNeighborhood').mean().reset_index()
print(duesseldorf_grouped.shape)
duesseldorf_grouped.head()
```

(106, 181)


Unnamed: 0,zNeighborhood,Airport Terminal,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Beach,Beach Bar,Bed & Breakfast,Beer Garden,Beer Store,Bike Trail,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Stop,Business Service,Butcher,Cafeteria,Café,Camera Store,Castle,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cultural Center,Currywurst Joint,Dance Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Flea Market,Flower Shop,Food & Drink Shop,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Historic Site,Hobby Shop,Hockey Rink,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Juice Bar,Kids Store,Korean Restaurant,Lake,Lawyer,Light Rail Station,Liquor Store,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mongolian Restaurant,Motorcycle Shop,Music Venue,Nightclub,Office,Opera House,Organic Grocery,Outdoor Sculpture,Palace,Park,Pet Service,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Post Office,Pub,Racecourse,Ramen Restaurant,Residential Building (Apartment / Condo),Rest Area,Restaurant,River,Rock Club,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Slovak Restaurant,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Soup Place,Souvlaki Shop,Spa,Spanish Restaurant,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Taverna,Tennis Court,Thai Restaurant,Theme Park Ride / Attraction,Trail,Train Station,Tram Station,Trattoria/Osteria,Tree,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Wine Bar,Yoga Studio,Zoo Exhibit
0,"Am Backesberg,Fahneburgstr.",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Am Gentenberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Am Hasselberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"An St. Lambertus,Auf der Bieth,Kleiansring,Obe...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bankstr.,Schwannstr.,Schwerinstr.,Zietenstr.,K...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.2,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0
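
To see what the one-hot encoding and the `groupby(...).mean()` step compute, here is a minimal sketch on a hypothetical five-venue dataset: after averaging, each value is the share of a neighborhood's venues that fall into that category, so every row sums to 1.

```python
import pandas as pd

# Hypothetical venue list for two neighborhoods.
venues = pd.DataFrame({
    "Neighborhood":   ["A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Park", "Bakery"],
})

# One indicator column per category (1 if the venue has that category).
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of each indicator within a neighborhood is the share of its
# venues in that category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```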


In [63]:
```python
num_top_venues = 5

# print the 5 most frequent venue categories of every neighborhood
for hood in duesseldorf_grouped['zNeighborhood']:
    print("----"+hood+"----")
    temp = duesseldorf_grouped[duesseldorf_grouped['zNeighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]  # skip the row holding the neighborhood name
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

----Am Backesberg,Fahneburgstr.----
                venue  freq
0               Hotel  0.17
1         Golf Course  0.17
2  Spanish Restaurant  0.17
3              Forest  0.17
4            Building  0.17


----Am Gentenberg----
                        venue  freq
0                  Playground   1.0
1            Airport Terminal   0.0
2           Outdoor Sculpture   0.0
3  Modern European Restaurant   0.0
4        Mongolian Restaurant   0.0


----Am Hasselberg----
                  venue  freq
0                 Trail  0.33
1           Beer Garden  0.33
2                Lawyer  0.33
3     Mobile Phone Shop  0.00
4  Mongolian Restaurant  0.00


----An St. Lambertus,Auf der Bieth,Kleiansring,Oberdorfstr.,Kleianspatt,Unterdorfstr.,Viehstr.,Zeppenheimer Str.,Zum Hohen Bröhl,An der Alten Mühle,An der Anger,Am Frohnhof,Alte Kalkumer Str.,Am Flugfeld,Am Hüttenhof,Am Kleiansacker,Am Klompenkothen,Friedhofsweg,Edmund-Bertrams-Str.,Auf der Hofreith----
                        venue  freq
0        

In [64]:
```python
def return_most_common_venues(row, num_top_venues):
    # skip the neighborhood name, then sort the category frequencies in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

In [65]:
```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th and later use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = duesseldorf_grouped['zNeighborhood']

for ind in np.arange(duesseldorf_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(duesseldorf_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()
```

(106, 11)


|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Am Backesberg,Fahneburgstr. | Forest | Racecourse | Spanish Restaurant | Golf Course | Building | Hotel | Dessert Shop | Dim Sum Restaurant | Flower Shop | Flea Market |
| 1 | Am Gentenberg | Playground | Zoo Exhibit | Ethiopian Restaurant | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant | Event Space |
| 2 | Am Hasselberg | Beer Garden | Lawyer | Trail | Zoo Exhibit | Event Space | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 3 | An St. Lambertus,Auf der Bieth,Kleiansring,Obe... | Gastropub | German Restaurant | Zoo Exhibit | Event Service | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant |
| 4 | Bankstr.,Schwannstr.,Schwerinstr.,Zietenstr.,K... | Trattoria/Osteria | Turkish Restaurant | Thai Restaurant | German Restaurant | Portuguese Restaurant | Sushi Restaurant | Italian Restaurant | Vegetarian / Vegan Restaurant | Burger Joint | Falafel Restaurant |
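
In the table above, neighborhoods with fewer than 10 venues are padded with categories whose frequency is zero (for example, Zoo Exhibit as the second most common venue of Am Gentenberg, which has a single venue). The sketch below, using a hypothetical helper `top_venues`, keeps only categories with a non-zero share and fills the remaining slots with `None`.

```python
import pandas as pd


def top_venues(row, n):
    """Hypothetical helper: up to n categories with a non-zero share,
    most frequent first; unused slots are filled with None."""
    shares = row[row > 0].sort_values(ascending=False)
    names = list(shares.index[:n])
    return names + [None] * (n - len(names))


# Hypothetical frequency row shaped like one row of duesseldorf_grouped
# (without the zNeighborhood column).
row = pd.Series({"Playground": 1.0, "Zoo Exhibit": 0.0, "Flea Market": 0.0})
print(top_venues(row, 3))
```

Ties (such as the three categories of Am Hasselberg at 0.33 each) are still returned in no guaranteed order.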


### Cluster Neighborhoods - K-means

We would like to group the neighborhoods into 5 clusters in order to study their characteristics.

In [66]:
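```python
# set number of clusters
kclusters = 5

# keep only the numeric category frequencies
duesseldorf_grouped_clustering = duesseldorf_grouped.drop(columns='zNeighborhood')

# run k-means clustering
# (n_init=10 was scikit-learn's default at the time; newer versions default to 'auto')
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(duesseldorf_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_
```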

array([3, 2, 2, 1, 3, 3, 4, 3, 0, 3, 3, 3, 3, 3, 3, 0, 0, 3, 3, 3, 3, 3,
       3, 2, 0, 3, 3, 3, 3, 1, 3, 3, 2, 0, 3, 0, 3, 0, 3, 2, 3, 0, 3, 0,
       3, 0, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 2, 3, 3, 3, 0, 3, 3, 3, 3, 3,
       3, 2, 3, 3, 2, 3, 0, 0, 3, 0, 3, 3, 3, 3, 0, 3, 0, 3, 3, 0, 0, 3,
       3, 2, 3, 0, 3, 3, 0, 3, 2, 3, 3, 3, 0, 2, 3, 3, 3, 3], dtype=int32)
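
For intuition about what `KMeans` does with the frequency table, the following is a minimal NumPy sketch of Lloyd's algorithm: alternately assign each neighborhood to its nearest center and move each center to the mean of its members until the centers stop moving. It is a simplified illustration, not scikit-learn's implementation, which additionally uses k-means++ seeding and several restarts; the function name `kmeans_lloyd` and the data are hypothetical.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Simplified Lloyd's algorithm starting from k random rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every row to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its rows
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break

    # Final assignment and within-cluster sum of squared distances (inertia).
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(X)), labels].sum()
    return labels, centers, inertia


# Hypothetical category-frequency rows: two neighborhoods dominated by the
# first category, two dominated by the last one.
X = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
])
labels, centers, inertia = kmeans_lloyd(X, k=2)
print(labels, inertia)
```

`inertia` is the within-cluster sum of squared distances, the quantity K-means tries to minimize.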

In [67]:
```python
duesseldorf_merged = df_dus2.sort_values('Neighborhood').reset_index(drop=True)

# add clustering labels
# (assigned by position: assumes the sorted rows match the row order of duesseldorf_grouped)
duesseldorf_merged['Cluster Labels'] = pd.Series(kmeans.labels_)

# join the sorted venue table to add the top 10 venues of each neighborhood
duesseldorf_merged = duesseldorf_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

duesseldorf_merged = duesseldorf_merged[:-2]
print(duesseldorf_merged.shape)
duesseldorf_merged.head()
```

(105, 16)


|   | PostalCode | District | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40629 | Düsseldorf Rath | Am Backesberg,Fahneburgstr. | 51.25517 | 6.84163 | 3.0 | Forest | Racecourse | Spanish Restaurant | Golf Course | Building | Hotel | Dessert Shop | Dim Sum Restaurant | Flower Shop | Flea Market |
| 1 | 40489 | Düsseldorf Lohausen | Am Gentenberg | 51.28838 | 6.72976 | 2.0 | Playground | Zoo Exhibit | Ethiopian Restaurant | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant | Event Space |
| 2 | 40489 | Düsseldorf | Am Hasselberg | 51.33632 | 6.71606 | 2.0 | Beer Garden | Lawyer | Trail | Zoo Exhibit | Event Space | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 3 | 40489 | Düsseldorf Kalkum | An St. Lambertus,Auf der Bieth,Kleiansring,Obe... | 51.302427 | 6.759643 | 1.0 | Gastropub | German Restaurant | Zoo Exhibit | Event Service | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant |
| 4 | 40476 | Düsseldorf Golzheim | Bankstr.,Schwannstr.,Schwerinstr.,Zietenstr.,K... | 51.24464 | 6.77551 | 3.0 | Trattoria/Osteria | Turkish Restaurant | Thai Restaurant | German Restaurant | Portuguese Restaurant | Sushi Restaurant | Italian Restaurant | Vegetarian / Vegan Restaurant | Burger Joint | Falafel Restaurant |
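
The cell above assigns `kmeans.labels_` by position, which is only correct if `df_dus2` sorted by `Neighborhood` lists exactly the same neighborhoods, in the same order, as `duesseldorf_grouped`. The shapes printed earlier suggest it may not: `df_dus2` has 107 rows while `duesseldorf_grouped` has 106, so from the first missing neighborhood onwards the labels may be shifted by one row. The sketch below, on hypothetical miniature tables, contrasts positional assignment with attaching each label to its neighborhood name and merging on that name.

```python
import pandas as pd

# Hypothetical miniature version of df_dus2 (already sorted by Neighborhood).
df_dus2 = pd.DataFrame({
    "Neighborhood": ["A-str.", "B-str.", "C-str."],
    "Latitude":  [51.25, 51.29, 51.34],
    "Longitude": [6.84, 6.73, 6.72],
})
# Suppose "B-str." returned no venues, so it is absent from the table
# that K-means was fitted on.
grouped_names = ["A-str.", "C-str."]
labels = [3, 2]  # stands in for kmeans.labels_, in the row order of grouped_names

# Positional assignment: "B-str." receives C's label and "C-str." gets NaN.
positional = df_dus2.copy()
positional["Cluster Labels"] = pd.Series(labels)

# Name-based assignment: pair each label with its neighborhood, then merge.
labelled = pd.DataFrame({"Neighborhood": grouped_names, "Cluster Labels": labels})
by_name = df_dus2.merge(labelled, on="Neighborhood", how="inner")

print(positional)
print(by_name)
```

In the notebook, the name-based variant would correspond to inserting `kmeans.labels_` as a column of `neighborhoods_venues_sorted` (which has the same row order as `duesseldorf_grouped`) before the join.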


In [68]:
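```python
# the labels are stored as floats (the column contained NaN before slicing); convert them to int
duesseldorf_merged['Cluster Labels'] = [int(i) for i in duesseldorf_merged['Cluster Labels']]
duesseldorf_merged.head()
```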

|   | PostalCode | District | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40629 | Düsseldorf Rath | Am Backesberg,Fahneburgstr. | 51.25517 | 6.84163 | 3 | Forest | Racecourse | Spanish Restaurant | Golf Course | Building | Hotel | Dessert Shop | Dim Sum Restaurant | Flower Shop | Flea Market |
| 1 | 40489 | Düsseldorf Lohausen | Am Gentenberg | 51.28838 | 6.72976 | 2 | Playground | Zoo Exhibit | Ethiopian Restaurant | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant | Event Space |
| 2 | 40489 | Düsseldorf | Am Hasselberg | 51.33632 | 6.71606 | 2 | Beer Garden | Lawyer | Trail | Zoo Exhibit | Event Space | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 3 | 40489 | Düsseldorf Kalkum | An St. Lambertus,Auf der Bieth,Kleiansring,Obe... | 51.302427 | 6.759643 | 1 | Gastropub | German Restaurant | Zoo Exhibit | Event Service | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant |
| 4 | 40476 | Düsseldorf Golzheim | Bankstr.,Schwannstr.,Schwerinstr.,Zietenstr.,K... | 51.24464 | 6.77551 | 3 | Trattoria/Osteria | Turkish Restaurant | Thai Restaurant | German Restaurant | Portuguese Restaurant | Sushi Restaurant | Italian Restaurant | Vegetarian / Vegan Restaurant | Burger Joint | Falafel Restaurant |


## Results

A map is created with Folium to show the spatial distribution of the clusters:

In [71]:
```python
# create map centered on Düsseldorf
latitude = 51.2384394
longitude = 6.79027031367694
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: kclusters evenly spaced colours of the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, pc, cluster in zip(duesseldorf_merged['Latitude'], duesseldorf_merged['Longitude'], duesseldorf_merged['District'], duesseldorf_merged['PostalCode'], duesseldorf_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ', ' + str(pc) + ', Cluster ' + str(cluster), parse_html=True)
    # rainbow[cluster-1]: cluster 0 wraps around to the last colour of the list
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```
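
The colour names used in the results follow from the `rainbow[cluster-1]` indexing: Python's negative indexing sends cluster 0 to the last colour of the list. The sketch below reproduces the mapping with approximate colour names for the five evenly spaced samples of matplotlib's `rainbow` colormap (an assumption for illustration; the notebook uses the exact hex values).

```python
# Approximate names of the 5 evenly spaced rainbow colours (assumption for illustration).
rainbow = ["purple", "blue", "green", "orange", "red"]

# Same indexing as the map cell: cluster 0 -> rainbow[-1], the last entry.
cluster_colour = {cluster: rainbow[cluster - 1] for cluster in range(5)}
print(cluster_colour)
```

Indexing with `rainbow[cluster]` would be the more direct choice, but it would change the colours referred to in the results.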

The map shows a predominance of green points, which belong to cluster 3.
Based on this analysis, the neighborhoods can be classified into three main clusters (clusters 1 and 4 contain only 3 of the 105 neighborhoods between them):
- High density of venues (green) - cluster 3
- Medium density of venues (red) - cluster 0
- Lower density of venues (blue) - cluster 2

Each main cluster is then inspected to check its most common venues:

In [72]:
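```python
# rows of cluster 3; keep PostalCode, District and the columns from Cluster Labels onwards
cluster_3 = duesseldorf_merged.loc[duesseldorf_merged['Cluster Labels'] == 3, duesseldorf_merged.columns[[0,1] + list(range(5, duesseldorf_merged.shape[1]))]]
print(cluster_3.shape)
cluster_3.head()
```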

(69, 13)


|   | PostalCode | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40629 | Düsseldorf Rath | 3 | Forest | Racecourse | Spanish Restaurant | Golf Course | Building | Hotel | Dessert Shop | Dim Sum Restaurant | Flower Shop | Flea Market |
| 4 | 40476 | Düsseldorf Golzheim | 3 | Trattoria/Osteria | Turkish Restaurant | Thai Restaurant | German Restaurant | Portuguese Restaurant | Sushi Restaurant | Italian Restaurant | Vegetarian / Vegan Restaurant | Burger Joint | Falafel Restaurant |
| 5 | 40489 | Düsseldorf Kaiserswerth | 3 | Plaza | German Restaurant | Breakfast Spot | Italian Restaurant | French Restaurant | Beer Garden | Currywurst Joint | Grocery Store | Gun Range | Flea Market |
| 7 | 40470 | Düsseldorf Düsseltal | 3 | Juice Bar | Gym / Fitness Center | Hotel | Fast Food Restaurant | Drugstore | Café | Italian Restaurant | Music Venue | Supermarket | Kids Store |
| 9 | 40223 | Düsseldorf Flehe | 3 | Soccer Field | Gym | Beach | Event Service | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market |


In [73]:
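```python
# rows of cluster 0; keep PostalCode, District and the columns from Cluster Labels onwards
cluster_0 = duesseldorf_merged.loc[duesseldorf_merged['Cluster Labels'] == 0, duesseldorf_merged.columns[[0,1] + list(range(5, duesseldorf_merged.shape[1]))]]
print(cluster_0.shape)
cluster_0.head()
```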

(21, 13)


|   | PostalCode | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 40599 | Düsseldorf Benrath | 0 | Bus Stop | Supermarket | Liquor Store | Event Service | Zoo Exhibit | Falafel Restaurant | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service |
| 15 | 40489 | Düsseldorf Angermund | 0 | Light Rail Station | Supermarket | Trattoria/Osteria | Restaurant | Zoo Exhibit | Event Service | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 16 | 40227 | Düsseldorf Eller | 0 | Supermarket | Hotel | Light Rail Station | Zoo Exhibit | Event Service | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market |
| 24 | 40595 | Düsseldorf Benrath | 0 | Supermarket | Forest | Light Rail Station | Zoo Exhibit | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market |
| 33 | 40489 | Düsseldorf Wittlaer | 0 | Beach | Pharmacy | Bike Trail | Italian Restaurant | Supermarket | Zoo Exhibit | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |


In [74]:
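```python
# rows of cluster 2; keep PostalCode, District and the columns from Cluster Labels onwards
cluster_2 = duesseldorf_merged.loc[duesseldorf_merged['Cluster Labels'] == 2, duesseldorf_merged.columns[[0,1] + list(range(5, duesseldorf_merged.shape[1]))]]
print(cluster_2.shape)
cluster_2.head()
```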

(12, 13)


|   | PostalCode | District | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 40489 | Düsseldorf Lohausen | 2 | Playground | Zoo Exhibit | Ethiopian Restaurant | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant | Event Space |
| 2 | 40489 | Düsseldorf | 2 | Beer Garden | Lawyer | Trail | Zoo Exhibit | Event Space | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 23 | 40597 | Düsseldorf Hassels | 2 | Playground | Bus Stop | Intersection | Greek Restaurant | Zoo Exhibit | Event Service | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant |
| 32 | 40589 | Düsseldorf Flehe | 2 | Bus Stop | Fountain | Forest | Food & Drink Shop | Flower Shop | Flea Market | Financial or Legal Service | Fast Food Restaurant | Farmers Market | Falafel Restaurant |
| 39 | 40237 | Düsseldorf Düsseltal | 2 | Breakfast Spot | Brewery | Business Service | Playground | Gym / Fitness Center | Event Service | Flower Shop | Flea Market | Hardware Store | Financial or Legal Service |
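
Beyond reading the first rows of each cluster, a cross-tabulation counts how often each category is the most common venue within every cluster. A minimal sketch with `pd.crosstab` on hypothetical rows shaped like `duesseldorf_merged`; in the notebook, the same call would be applied to its `Cluster Labels` and `1st Most Common Venue` columns.

```python
import pandas as pd

# Hypothetical rows shaped like duesseldorf_merged (only the columns used here).
merged = pd.DataFrame({
    "Cluster Labels": [3, 3, 3, 0, 0, 2],
    "1st Most Common Venue": ["Café", "Café", "Hotel", "Supermarket", "Supermarket", "Playground"],
})

# Rows: clusters; columns: categories; cells: number of neighborhoods whose
# most common venue is that category.
ct = pd.crosstab(merged["Cluster Labels"], merged["1st Most Common Venue"])
print(ct)
```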


## Discussion

After running the analysis, it was very interesting to see that the algorithm classified the neighborhoods in a way that matches common local knowledge.
The areas with green markers are precisely the most populated and expensive ones, and most of them lie close to the city center. The family now has additional information about other neighborhoods in Düsseldorf: for example, they could move from Bilk to Garath and still find a similar number and type of venues (supermarkets and restaurants).

## Conclusion

This kind of analysis proved very useful for deciding which neighborhood to move to. Families often spend weeks or even months reaching similar conclusions: they rely on "analog" techniques, such as buying a city map, marking the neighborhoods they could visit, and spending their weekends exploring them.

Here, machine learning proved to be a useful tool for quickly classifying neighborhoods.

A possible extension of this study would be to combine this information with house prices and build a recommendation engine that helps optimize the purchase of a property.