# IBM Data Science Capstone Project Week 4 + 5

## Optimal location for a Sushi restaurant near a subway station in Stuttgart

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a Sushi restaurant in Stuttgart.

Stuttgart is a beautiful city, but like other big cities it suffers from frequent traffic jams. Fortunately, it has a good subway system that connects all parts of the city. Sushi is a delicious food popular with young people, who also tend to travel by public transport such as the subway. In this project we will therefore explore restaurants around subway stations rather than by neighbourhood.

The first criterion for a good restaurant location is the number of people passing by; in Stuttgart this means the location should be near the city center. The second criterion is the number of existing restaurants: a spot that many other restaurant owners have already chosen is likely to be a good location. The third criterion is the absence of direct competition: we do not want to open a Sushi restaurant near an existing one.

We will use data analysis and machine learning to explore the city and find candidate locations that meet these criteria.

## Data <a name="data"></a>

We need the following data for our problem:
* the location of all subway stations in Stuttgart, which can be obtained from Wikipedia by scraping the HTML page
* the total number of existing restaurants near each subway station, obtained using the **Foursquare API**
* the total number of existing Sushi restaurants near each subway station, also obtained using the **Foursquare API**
* the distance of each subway station from the city center, calculated from the station coordinates listed on Wikipedia




#### Web scraping the location data of Stuttgart subway stations from Wikipedia

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

from bs4 import BeautifulSoup 

import requests # library to handle requests

In [2]:
#wikipedia page of subway stations
url = "https://de.wikipedia.org/wiki/Liste_der_Haltestellen_der_Stadtbahn_Stuttgart"
data = requests.get(url).text
soup = BeautifulSoup(data,"html5lib")   

In [3]:

table_contents=[] # rows of the scraped table, filled below
table=soup.find('table') # find table
table_body = table.find('tbody')

rows = table_body.find_all('tr') 
#get column name from the first row
column = rows[0] #get column
column_name = column.find_all('th')
column_name = [ele.text.strip() for ele in column_name]
# get the content of the table
for row in rows[1:]:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    table_contents.append(cols) 

# create pandas dataframe
dataframe= pd.DataFrame(table_contents)
# change the column name
dataframe.columns = column_name
dataframe.head(10)

Unnamed: 0,Name,Linie,Eröffnung,Lage,Stadt,Stadtteil (Stadtbezirk),Anmerkungen,Sehenswürdigkeiten /\nBauwerke,Bild
0,"Antwerpener Straße48° 48′ 30″ N, 9° 14′ 35″ O",U001 11 U016 1616,49860419♠19. Apr. 1986 *,Oberirdisch,Stuttgart,Espan (Bad Cannstatt),Ehemals Friedrich-List-Heim. Erschließt auch d...,,
1,"Arndt-/Spittastraße48° 46′ 30″ N, 9° 9′ 3″ O",U002 22 U009 2929 U014 3434,49890930♠30. Sep. 1989 *,Oberirdisch,Stuttgart,Vogelsang (West),,,
2,"Augsburger Platz48° 48′ 21″ N, 9° 13′ 51″ O",U001 11 U016 1616,49860419♠19. Apr. 1986 *,Oberirdisch,Stuttgart,Vorstadt (Bad Cannstatt),Am Augsburger Platz treffen sich die Grenzen d...,"Sankt-Anna-Klinik, Kurpark",
3,"Auwiesen48° 50′ 22″ N, 9° 13′ 30″ O",U012 1212 U014 1414,49860712♠12. Jul. 1986 *,Oberirdisch,Stuttgart,Mühlhausen (Mühlhausen),,"Kirche Sankt Barbara, Burg Hofen (Ruine)",
4,"Bad Cannstatt Wilhelmsplatz48° 48′ 10″ N, 9° 1...",U001 11 U002 22 U019 1919,49860419♠19. Apr. 1986 *,Oberirdisch,Stuttgart,Cannstatt-Mitte (Bad Cannstatt),,"Historische Altstadt, Stadtmauer",
5,"Bad Cannstatt Wilhelmsplatz48° 48′ 12″ N, 9° 1...",U013 1313 U016 1616,49970913♠13. Sep. 1997 *,Oberirdisch,Stuttgart,Cannstatt-Mitte (Bad Cannstatt),Diese Haltestelle erhielt als letzte im gesamt...,,
6,"Beethovenstraße48° 46′ 38″ N, 9° 8′ 0″ O",U002 22 U009 2929,49940924♠24. Sep. 1994,Oberirdisch,Stuttgart,Botnang-Ost (Botnang),,,
7,"Bergfriedhof (Merz-Akademie)48° 47′ 21″ N, 9° ...",U004 44 U009 99,49890930♠30. Sep. 1989 *,Oberirdisch,Stuttgart,Ostheim (Ost),Die Hochbahnsteige stehen versetzt zueinander.,"Bergfriedhof, Hauptzollamt",
8,"Bergheimer Hof48° 48′ 15″ N, 9° 6′ 3″ O",U006 66 U016 1616,49920926♠26. Sep. 1992 *,Oberirdisch,Stuttgart,Bergheim (Weilimdorf),,,
9,"Berliner Platz (Hohe Straße)48° 46′ 38″ N, 9° ...",U002 22 U004 44 U014 1414 U014 3434,49860712♠12. Jul. 1986 *,Oberirdisch,Stuttgart,Neue Vorstadt (Mitte),Ehemals Berliner Platz (Fritz-Elsas-Straße).,"Kultur- und Kongresszentrum Liederhalle, Hoppe...",


In [4]:
# keep only stations within Stuttgart
index = dataframe[dataframe['Stadt']!='Stuttgart'].index
dataframe.drop(index , inplace=True)
dataframe.head(10)

Unnamed: 0,Name,Linie,Eröffnung,Lage,Stadt,Stadtteil (Stadtbezirk),Anmerkungen,Sehenswürdigkeiten /\nBauwerke,Bild
0,"Antwerpener Straße48° 48′ 30″ N, 9° 14′ 35″ O",U001 11 U016 1616,49860419♠19. Apr. 1986 *,Oberirdisch,Stuttgart,Espan (Bad Cannstatt),Ehemals Friedrich-List-Heim. Erschließt auch d...,,
1,"Arndt-/Spittastraße48° 46′ 30″ N, 9° 9′ 3″ O",U002 22 U009 2929 U014 3434,49890930♠30. Sep. 1989 *,Oberirdisch,Stuttgart,Vogelsang (West),,,
2,"Augsburger Platz48° 48′ 21″ N, 9° 13′ 51″ O",U001 11 U016 1616,49860419♠19. Apr. 1986 *,Oberirdisch,Stuttgart,Vorstadt (Bad Cannstatt),Am Augsburger Platz treffen sich die Grenzen d...,"Sankt-Anna-Klinik, Kurpark",
3,"Auwiesen48° 50′ 22″ N, 9° 13′ 30″ O",U012 1212 U014 1414,49860712♠12. Jul. 1986 *,Oberirdisch,Stuttgart,Mühlhausen (Mühlhausen),,"Kirche Sankt Barbara, Burg Hofen (Ruine)",
4,"Bad Cannstatt Wilhelmsplatz48° 48′ 10″ N, 9° 1...",U001 11 U002 22 U019 1919,49860419♠19. Apr. 1986 *,Oberirdisch,Stuttgart,Cannstatt-Mitte (Bad Cannstatt),,"Historische Altstadt, Stadtmauer",
5,"Bad Cannstatt Wilhelmsplatz48° 48′ 12″ N, 9° 1...",U013 1313 U016 1616,49970913♠13. Sep. 1997 *,Oberirdisch,Stuttgart,Cannstatt-Mitte (Bad Cannstatt),Diese Haltestelle erhielt als letzte im gesamt...,,
6,"Beethovenstraße48° 46′ 38″ N, 9° 8′ 0″ O",U002 22 U009 2929,49940924♠24. Sep. 1994,Oberirdisch,Stuttgart,Botnang-Ost (Botnang),,,
7,"Bergfriedhof (Merz-Akademie)48° 47′ 21″ N, 9° ...",U004 44 U009 99,49890930♠30. Sep. 1989 *,Oberirdisch,Stuttgart,Ostheim (Ost),Die Hochbahnsteige stehen versetzt zueinander.,"Bergfriedhof, Hauptzollamt",
8,"Bergheimer Hof48° 48′ 15″ N, 9° 6′ 3″ O",U006 66 U016 1616,49920926♠26. Sep. 1992 *,Oberirdisch,Stuttgart,Bergheim (Weilimdorf),,,
9,"Berliner Platz (Hohe Straße)48° 46′ 38″ N, 9° ...",U002 22 U004 44 U014 1414 U014 3434,49860712♠12. Jul. 1986 *,Oberirdisch,Stuttgart,Neue Vorstadt (Mitte),Ehemals Berliner Platz (Fritz-Elsas-Straße).,"Kultur- und Kongresszentrum Liederhalle, Hoppe...",


In [5]:
# function to convert coordinates from degrees, minutes and seconds to decimal degrees
def converseCoordinate(degree,minute,second):
    coordDec = degree + minute/60 + second/3600
    return coordDec
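
The `Name` column holds the station name fused with its coordinates, so the conversion can be tried out on a single cell before it is applied to the whole table. The following is a minimal sketch, not the parsing used in this notebook: the helper names `dms_to_decimal` and `split_name_and_coordinates` and the regular expression are assumptions, and it assumes that a station name does not end in a digit.

```python
import re

# One "degrees° minutes′ seconds″" group as it appears in the scraped Name column.
DMS_PATTERN = re.compile(r"(\d+)°\s*(\d+)′\s*(\d+)″")

def dms_to_decimal(degree, minute, second):
    """Convert degrees, minutes and seconds to decimal degrees."""
    return degree + minute / 60 + second / 3600

def split_name_and_coordinates(cell):
    """Split a cell like 'Auwiesen48° 50′ 22″ N, 9° 13′ 30″ O' into (name, lat, lon)."""
    matches = list(DMS_PATTERN.finditer(cell))
    if len(matches) < 2:
        raise ValueError("expected two coordinate groups in: " + cell)
    # The last two groups are latitude and longitude; everything before is the name.
    lat_match, lon_match = matches[-2], matches[-1]
    name = cell[:lat_match.start()].strip()
    latitude = dms_to_decimal(*(int(g) for g in lat_match.groups()))
    longitude = dms_to_decimal(*(int(g) for g in lon_match.groups()))
    return name, latitude, longitude

name, lat, lon = split_name_and_coordinates("Antwerpener Straße48° 48′ 30″ N, 9° 14′ 35″ O")
print(name, round(lat, 6), round(lon, 6))
```

Reading degrees, minutes and seconds from each coordinate group separately avoids relying on the latitude and longitude strings having the same layout.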

In [6]:
# define the dataframe columns
column_names = ['Subway Station', 'Line', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
subway_station = pd.DataFrame(columns=column_names)

In [7]:
# read and prepare the subway station information from the raw data
station_rows = []
for index, row in dataframe.iterrows():
    # the Name cell holds the station name followed directly by the coordinates (latitude starts with 48°)
    name_cell = row['Name']
    stationID = name_cell[:name_cell.find('48°')]
    coordinate = name_cell[name_cell.find('48°'):]
    coordinate_lat, coordinate_lon = coordinate.split('N,')[:2]
    minute_lat = int(coordinate_lat[coordinate_lat.find('°')+1:coordinate_lat.find('′')])
    second_lat = int(coordinate_lat[coordinate_lat.find('′')+1:coordinate_lat.find('″')])
    # positions are looked up in the longitude string itself
    minute_lon = int(coordinate_lon[coordinate_lon.find('°')+1:coordinate_lon.find('′')])
    second_lon = int(coordinate_lon[coordinate_lon.find('′')+1:coordinate_lon.find('″')])
    station_lat = converseCoordinate(48, minute_lat, second_lat)
    station_lon = converseCoordinate(9, minute_lon, second_lon)
    # line entries look like 'U001' or 'U016'; drop the zero padding
    lineID = [i.replace('0','') for i in row['Linie'].split(' ') if i.find('U')!=-1]
    # the district (Stadtbezirk) is given in parentheses after the Stadtteil
    district = row['Stadtteil (Stadtbezirk)']
    neighborhood = district[district.find('(')+1:district.find(')')]

    station_rows.append({'Subway Station': stationID,
                         'Line': lineID,
                         'Neighborhood': neighborhood,
                         'Latitude': station_lat,
                         'Longitude': station_lon})

# build the dataframe once from the collected rows
subway_station = pd.DataFrame(station_rows, columns=column_names)

In [8]:
subway_station.shape

(188, 5)

In [10]:
subway_station.head(10)

Unnamed: 0,Subway Station,Line,Neighborhood,Latitude,Longitude
0,Antwerpener Straße,"[U1, U16]",Bad Cannstatt,48.808333,9.243056
1,Arndt-/Spittastraße,"[U2, U9, U14]",West,48.775,9.150833
2,Augsburger Platz,"[U1, U16]",Bad Cannstatt,48.805833,9.230833
3,Auwiesen,"[U12, U14]",Mühlhausen,48.839444,9.225
4,Bad Cannstatt Wilhelmsplatz,"[U1, U2, U19]",Bad Cannstatt,48.802778,9.215833
5,Bad Cannstatt Wilhelmsplatz,"[U13, U16]",Bad Cannstatt,48.803333,9.215556
6,Beethovenstraße,"[U2, U9]",Botnang,48.777222,9.133333
7,Bergfriedhof (Merz-Akademie),"[U4, U9]",Ost,48.789167,9.205556
8,Bergheimer Hof,"[U6, U16]",Weilimdorf,48.804167,9.100833
9,Berliner Platz (Hohe Straße),"[U2, U4, U14, U14]",Mitte,48.777222,9.168611


## Methodology <a name="methodology"></a>

In this project we will explore the restaurants near each subway station with the Foursquare API and visualize the station locations on a Folium map.

We will count the total number of restaurants and the number of Sushi restaurants near each subway station.

The distance from each subway station to the city center will be calculated and included in the analysis.

We will use k-means to cluster the subway stations based on all of the information above. The resulting clusters will be visualized on the city map.

Based on the clusters we can identify a group of candidate subway stations near which a Sushi restaurant could be opened.


## Analysis <a name="analysis"></a>

Import the libraries

In [11]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
from geopy.geocoders import Nominatim

#### Getting Stuttgart coordinates from the Geopy library

In [12]:
address = 'Stuttgart, Germany'

geolocator = Nominatim(user_agent="can_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Stuttgart are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Stuttgart are 48.7784485, 9.1800132.


Visualize the location of the Subway Stations

In [13]:
# create map of Stuttgart using latitude and longitude values
map_stuttgart = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, station, neighborhood in zip(subway_station['Latitude'], subway_station['Longitude'], subway_station['Subway Station'], subway_station['Neighborhood']):
    label = '{}, {}'.format(station, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_stuttgart)  
    
map_stuttgart

In [14]:
CLIENT_ID = 'UCAD22UHIEXLJT5GFMM1PLTLT1IF0WP1DVVYQU21DQQYJ5FW' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: UCAD22UHIEXLJT5GFMM1PLTLT1IF0WP1DVVYQU21DQQYJ5FW
CLIENT_SECRET:YGTTPPE5VBXUNUMNFNAROUT4HPCTD0SH4JIJFDE0ZS3KB3YL


#### Creating a function to find venues within 300 meters of each subway station

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Subway Station', 
                  'Subway Station Latitude', 
                  'Subway Station Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
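
The function above indexes straight into the nested response, so a station with no results or a venue without a category would raise an error. As a hedged sketch of a more defensive variant of the flattening step only, the helper below (the name `extract_venue_rows` is hypothetical) walks the same keys with fallbacks. The sample payload is hand-written to mimic the keys accessed above and is not real API output.

```python
def extract_venue_rows(station, lat, lng, payload):
    """Flatten one explore-style response into rows, skipping incomplete entries."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if "name" not in venue or "lat" not in location or "lng" not in location:
            continue  # skip venues without a name or coordinates
        # fall back to a placeholder when a venue has no category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((station, lat, lng, venue["name"],
                     location["lat"], location["lng"], category))
    return rows

# Hand-written payload mimicking the nested keys used above; not real API output.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Sushi", "location": {"lat": 48.77, "lng": 9.18},
               "categories": [{"name": "Sushi Restaurant"}]}},
    {"venue": {"name": "No Category", "location": {"lat": 48.78, "lng": 9.17},
               "categories": []}},
]}]}}

print(extract_venue_rows("Example Station", 48.775, 9.18, sample))
print(extract_venue_rows("Empty Station", 48.7, 9.1, {"response": {}}))
```

Keeping the flattening separate from the HTTP request also makes it possible to test it without network access.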

In [16]:
stuttgart_venues = getNearbyVenues(names=subway_station['Subway Station'],
                                   latitudes=subway_station['Latitude'],
                                   longitudes=subway_station['Longitude'],
                                  )

Antwerpener Straße
Arndt-/Spittastraße
Augsburger Platz
Auwiesen
Bad Cannstatt Wilhelmsplatz
Bad Cannstatt Wilhelmsplatz
Beethovenstraße
Bergfriedhof (Merz-Akademie)
Bergheimer Hof
Berliner Platz (Hohe Straße)
Berliner Platz (Liederhalle)
Berliner Platz (Liederhalle)
Beskidenstraße
Bihlplatz
Blick
Bopser
Borsigstraße
Botnang
Bottroper Straße (BiL-Schulen)
Börsenplatz (L-Bank)
Brendle (Großmarkt)
Bubenbad (IB-Jugendgästehaus)
Budapester Platz
Cannstatter Wasen
Charlottenplatz (Ebene -1)
Charlottenplatz (Ebene -2)
Daimlerplatz
Degerloch
Degerloch Albstraße
Dobelstraße
Dürrlewang
Ebitzweg
Elbestraße
Eltinger Straße
EnBW-City
Engelboldstraße
Erwin-Schoettle-Platz
(Marienhospital)
Eszet
Eugensplatz (Jugendherberge)
Europaplatz
Fasanenhof (Bonhoefferkirche)
Fasanenhof Schelmenwasen
Fauststraße
Feuerbach Bahnhof
Feuerbach Pfostenwäldle (AWO)
Föhrich
Freiberg
Freibergstraße
Friedrichswahl
Fürfelder Straße
Gaisburg
Geroksruhe
Giebel
Glockenstraße (Mahle)
Gnesener Straße
Hallschlag
Hauptbahnhof 

#### Let's check how many venues were returned for each subway station

In [17]:
stuttgart_venues.groupby('Subway Station').count()

Unnamed: 0_level_0,Subway Station Latitude,Subway Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Subway Station,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Antwerpener Straße,3,3,3,3,3,3
Arndt-/Spittastraße,7,7,7,7,7,7
Augsburger Platz,5,5,5,5,5,5
Auwiesen,5,5,5,5,5,5
Bad Cannstatt Wilhelmsplatz,47,47,47,47,47,47
...,...,...,...,...,...,...
Wolfbusch,2,2,2,2,2,2
Zuffenhausen Kelterplatz,9,9,9,9,9,9
Zuffenhausen Rathaus,13,13,13,13,13,13
Züricher Straße,4,4,4,4,4,4


In [19]:
stuttgart_venues.head(10)

Unnamed: 0,Subway Station,Subway Station Latitude,Subway Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Antwerpener Straße,48.808333,9.243056,U Antwerpener Straße,48.808411,9.242957,Metro Station
1,Antwerpener Straße,48.808333,9.243056,Melirrytom,48.808361,9.243657,Pastry Shop
2,Antwerpener Straße,48.808333,9.243056,Creative Logo,48.80761,9.240681,Advertising Agency
3,Arndt-/Spittastraße,48.775,9.150833,Meister Lampe,48.774808,9.149525,Dessert Shop
4,Arndt-/Spittastraße,48.775,9.150833,tarte & törtchen,48.773773,9.153773,Dessert Shop
5,Arndt-/Spittastraße,48.775,9.150833,Er Vaquita,48.773664,9.154185,Spanish Restaurant
6,Arndt-/Spittastraße,48.775,9.150833,Taverna Yol,48.775071,9.151474,Turkish Restaurant
7,Arndt-/Spittastraße,48.775,9.150833,West Pizza & Kebap,48.77493,9.15234,Kebab Restaurant
8,Arndt-/Spittastraße,48.775,9.150833,Café Seyffer's,48.773952,9.152532,Café
9,Arndt-/Spittastraße,48.775,9.150833,Restaurant Rösch,48.774791,9.147877,German Restaurant


Let's find all restaurants

In [20]:
stuttgart_restaurant = stuttgart_venues[stuttgart_venues['Venue Category'].map(lambda x: x.find('Restaurant')!=-1)]

Let's remove all kinds of fast food restaurants

In [21]:
fast_food_categories = ['Fast Food Restaurant','Comfort Food Restaurant','Doner Restaurant','Kebab Restaurant']
# keep every row whose category is not in the list; .copy() makes the result an independent dataframe
stuttgart_restaurant = stuttgart_restaurant[~stuttgart_restaurant['Venue Category'].isin(fast_food_categories)].copy()


In [22]:
stuttgart_restaurant.reset_index(inplace=True)

In [23]:
stuttgart_restaurant.head(10)

Unnamed: 0,index,Subway Station,Subway Station Latitude,Subway Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,5,Arndt-/Spittastraße,48.775,9.150833,Er Vaquita,48.773664,9.154185,Spanish Restaurant
1,6,Arndt-/Spittastraße,48.775,9.150833,Taverna Yol,48.775071,9.151474,Turkish Restaurant
2,9,Arndt-/Spittastraße,48.775,9.150833,Restaurant Rösch,48.774791,9.147877,German Restaurant
3,11,Augsburger Platz,48.805833,9.230833,Meze Meze Taverna,48.806317,9.231987,Greek Restaurant
4,21,Bad Cannstatt Wilhelmsplatz,48.802778,9.215833,sushi le,48.804388,9.216886,Sushi Restaurant
5,26,Bad Cannstatt Wilhelmsplatz,48.802778,9.215833,87 Restaurant & Eventlocation,48.800681,9.213318,Restaurant
6,32,Bad Cannstatt Wilhelmsplatz,48.802778,9.215833,Divan Döner,48.803255,9.21701,Middle Eastern Restaurant
7,33,Bad Cannstatt Wilhelmsplatz,48.802778,9.215833,Jakobsbrunnen,48.804341,9.216276,Greek Restaurant
8,34,Bad Cannstatt Wilhelmsplatz,48.802778,9.215833,serhat döner kebap,48.803436,9.215685,Restaurant
9,37,Bad Cannstatt Wilhelmsplatz,48.802778,9.215833,Viet Long Asia Wok,48.804285,9.216936,Thai Restaurant


Let's check how many restaurants there are near each station

In [24]:
stuttgart_restaurant.groupby('Subway Station').count()

Unnamed: 0_level_0,index,Subway Station Latitude,Subway Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Subway Station,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Arndt-/Spittastraße,3,3,3,3,3,3,3
Augsburger Platz,1,1,1,1,1,1,1
Bad Cannstatt Wilhelmsplatz,15,15,15,15,15,15,15
Beethovenstraße,1,1,1,1,1,1,1
Berliner Platz (Hohe Straße),6,6,6,6,6,6,6
...,...,...,...,...,...,...,...
Wilhelm-Geiger-Platz,6,6,6,6,6,6,6
Wilhelma,1,1,1,1,1,1,1
Zuffenhausen Kelterplatz,1,1,1,1,1,1,1
Zuffenhausen Rathaus,3,3,3,3,3,3,3


Let's create a new dataframe with the total number of restaurants per station

In [86]:
station_summary = stuttgart_restaurant.groupby('Subway Station')['Venue Category'].count().reset_index()

In [87]:
station_summary.rename(columns = {'Venue Category':'Total restaurant'}, inplace = True)

In [88]:
station_summary.head(10)

Unnamed: 0,Subway Station,Total restaurant
0,Arndt-/Spittastraße,3
1,Augsburger Platz,1
2,Bad Cannstatt Wilhelmsplatz,15
3,Beethovenstraße,1
4,Berliner Platz (Hohe Straße),6
5,Berliner Platz (Liederhalle),11
6,Blick,1
7,Botnang,1
8,Bubenbad (IB-Jugendgästehaus),1
9,Budapester Platz,1


Let's look at all Sushi restaurants

In [89]:
sushi_restaurant = stuttgart_restaurant[stuttgart_restaurant['Venue Category']=='Sushi Restaurant']

In [90]:
sushi_restaurant.head(10)

Unnamed: 0,index,Subway Station,Subway Station Latitude,Subway Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
4,21,Bad Cannstatt Wilhelmsplatz,48.802778,9.215833,sushi le,48.804388,9.216886,Sushi Restaurant
12,44,Bad Cannstatt Wilhelmsplatz,48.803333,9.215556,sushi le,48.804388,9.216886,Sushi Restaurant
44,211,Budapester Platz,48.790556,9.185833,Tokyo,48.790902,9.183083,Sushi Restaurant
49,233,Charlottenplatz (Ebene -1),48.775833,9.182778,QQ Sushi Lounge,48.775175,9.183356,Sushi Restaurant
52,241,Charlottenplatz (Ebene -1),48.775833,9.182778,wip sushi lounge,48.774476,9.182102,Sushi Restaurant
60,264,Charlottenplatz (Ebene -2),48.776667,9.183056,QQ Sushi Lounge,48.775175,9.183356,Sushi Restaurant
64,275,Charlottenplatz (Ebene -2),48.776667,9.183056,wip sushi lounge,48.774476,9.182102,Sushi Restaurant
70,291,Daimlerplatz,48.805278,9.218889,sushi le,48.804388,9.216886,Sushi Restaurant
149,754,Olgaeck,48.773889,9.186111,QQ Sushi Lounge,48.775175,9.183356,Sushi Restaurant
159,784,Österreichischer Platz (WGV-Versicherungen),48.77,9.175556,Origami,48.77025,9.176105,Sushi Restaurant


Let's calculate the total number of Sushi restaurants per station

In [91]:
sushi_restaurant_count = sushi_restaurant.groupby('Subway Station')['Venue Category'].count().reset_index()

In [92]:
sushi_restaurant_count.rename(columns = {'Venue Category':'Total Sushi restaurant'}, inplace = True)

In [93]:
sushi_restaurant_count.head()

Unnamed: 0,Subway Station,Total Sushi restaurant
0,Bad Cannstatt Wilhelmsplatz,2
1,Budapester Platz,1
2,Charlottenplatz (Ebene -1),2
3,Charlottenplatz (Ebene -2),2
4,Daimlerplatz,1


Add the Sushi restaurant information to the dataframe

In [94]:
station_summary = station_summary.merge(sushi_restaurant_count, how='outer',on='Subway Station')

In [95]:
station_summary = station_summary.fillna(0)
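
The counting logic of the last few cells (filter restaurants, drop fast food, count per station, fill missing Sushi counts with 0) can be checked on a handful of rows without pandas. This is a minimal sketch under that assumption; the function name `summarize` is hypothetical, and the sample rows are taken from the venue table shown earlier.

```python
from collections import Counter

EXCLUDED = {"Fast Food Restaurant", "Comfort Food Restaurant",
            "Doner Restaurant", "Kebab Restaurant"}

def summarize(rows):
    """rows: iterable of (station, category). Returns {station: (total, sushi)}."""
    restaurants = [(station, category) for station, category in rows
                   if "Restaurant" in category and category not in EXCLUDED]
    total = Counter(station for station, _ in restaurants)
    sushi = Counter(station for station, category in restaurants
                    if category == "Sushi Restaurant")
    # A Counter returns 0 for stations without a Sushi restaurant,
    # which plays the role of the outer merge followed by fillna(0).
    return {station: (total[station], sushi[station]) for station in total}

rows = [
    ("Arndt-/Spittastraße", "Dessert Shop"),
    ("Arndt-/Spittastraße", "Spanish Restaurant"),
    ("Arndt-/Spittastraße", "Turkish Restaurant"),
    ("Arndt-/Spittastraße", "Kebab Restaurant"),
    ("Arndt-/Spittastraße", "German Restaurant"),
    ("Budapester Platz", "Sushi Restaurant"),
]
print(summarize(rows))
```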

Let's find the location of each subway station. We use the mean of the coordinates in case a station has several entries, for example multiple exits.

In [96]:
station_location = stuttgart_restaurant.groupby('Subway Station')[['Subway Station Latitude','Subway Station Longitude']].mean().reset_index()


In [97]:
station_location.head(10)

Unnamed: 0,Subway Station,Subway Station Latitude,Subway Station Longitude
0,Arndt-/Spittastraße,48.775,9.150833
1,Augsburger Platz,48.805833,9.230833
2,Bad Cannstatt Wilhelmsplatz,48.803037,9.215704
3,Beethovenstraße,48.777222,9.133333
4,Berliner Platz (Hohe Straße),48.777222,9.168611
5,Berliner Platz (Liederhalle),48.778611,9.168131
6,Blick,48.793333,9.242222
7,Botnang,48.778611,9.122222
8,Bubenbad (IB-Jugendgästehaus),48.773333,9.194722
9,Budapester Platz,48.790556,9.185833


We will define functions to calculate the distance between a subway station and the city center

In [None]:
#!pip install pyproj
import pyproj
import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
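
Projected distances depend on passing longitude before latitude and on choosing the UTM zone that covers the city (Stuttgart, at about 9.18° E, lies in zone 32, which spans 6° E to 12° E). As a projection-free cross-check, the great-circle distance can be computed directly from latitude and longitude. The following is a minimal sketch assuming a spherical Earth with a mean radius of 6371 km; the function name `haversine_m` is an assumption, and its results can differ from the projected distances shown in this notebook.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

center = (48.7784485, 9.1800132)     # Stuttgart coordinates returned by Nominatim above
schlossplatz = (48.778889, 9.178889)  # station coordinates from the scraped table
print(round(haversine_m(*center, *schlossplatz)))
```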

Let's calculate the distance between each subway station and the city center

In [49]:
distance_center = pd.DataFrame(columns= ['Distance'])

In [None]:
for index, rows in station_location.iterrows():
    x1, y1 = lonlat_to_xy(rows['Subway Station Latitude'], rows['Subway Station Longitude'])
    x2, y2 = lonlat_to_xy(latitude,longitude)
    distance = str(calc_xy_distance(x1,y1,x2,y2))
    print(distance)
    
    distance_center = distance_center.append({'Distance': distance}, ignore_index=True)

In [51]:
distance_center.head()

Unnamed: 0,Distance
0,3891.2501100861095
1,7635.814913017591
2,5729.130893618349
3,6184.7048814929085
4,1518.7378735567436


In [98]:
station_summary.insert(3,column='Distance',value=distance_center)

In [99]:
station_summary.head(10)

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance
0,Arndt-/Spittastraße,3,0.0,3891.2501100861095
1,Augsburger Platz,1,0.0,7635.814913017591
2,Bad Cannstatt Wilhelmsplatz,15,2.0,5729.130893618349
3,Beethovenstraße,1,0.0,6184.7048814929085
4,Berliner Platz (Hohe Straße),6,0.0,1518.7378735567436
5,Berliner Platz (Liederhalle),11,0.0,1573.8432885647078
6,Blick,1,0.0,8469.252067060806
7,Botnang,1,0.0,7654.370042621837
8,Bubenbad (IB-Jugendgästehaus),1,0.0,2061.073677377733
9,Budapester Platz,1,1.0,1770.1359677851751


In [100]:
station_summary.shape

(104, 4)

Let's normalize the restaurant counts and the distance to the city center

In [101]:
import pandas as pd
from sklearn import preprocessing
station_summary_norm = station_summary.drop(columns='Subway Station')
x = station_summary_norm.values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
station_summary_norm = pd.DataFrame(x_scaled)

In [102]:
station_summary_norm.head()

Unnamed: 0,0,1,2
0,0.111111,0.0,0.334357
1,0.0,0.0,0.669888
2,0.777778,1.0,0.49904
3,0.0,0.0,0.539861
4,0.277778,0.0,0.121768
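
Min-max scaling maps each column to the range [0, 1] with x' = (x - min) / (max - min), so that the restaurant counts and the distance in meters contribute on a comparable scale. The sketch below applies the formula in plain Python; the function name `min_max_scale` is an assumption. The input assumes that the smallest and largest restaurant counts in the full table are 1 and 19, which is consistent with the values shown above.

```python
def min_max_scale(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]

# Restaurant counts of the first five stations, plus the assumed maximum of 19.
total_restaurants = [3, 1, 15, 1, 6, 19]
print([round(value, 6) for value in min_max_scale(total_restaurants)])
```

With this input, the first five results agree with the first column of the normalized output above.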


#### Cluster subway stations
Run *k*-means to cluster the subway stations into 6 clusters.

In [103]:
# set number of clusters
kclusters = 6

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(station_summary_norm)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 3, 4, 3, 5, 5, 0, 3, 5, 2], dtype=int32)
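
To make clear what the clustering step does, here is a minimal sketch of the basic k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members) in plain Python. It is not scikit-learn's implementation, which by default uses k-means++ initialization and several restarts, so labels from the two need not agree. The points are invented toy values in a normalized (restaurants, Sushi restaurants, distance) space.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd iteration; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = None
    for _ in range(max_iterations):
        # assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Toy data: three central stations with many restaurants, three remote ones with few.
points = [(0.78, 1.0, 0.05), (0.72, 1.0, 0.04), (1.0, 1.0, 0.05),
          (0.0, 0.0, 0.67), (0.0, 0.0, 0.54), (0.0, 0.0, 0.75)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only which stations share a label matters.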

Let's merge the restaurant and location information together

In [104]:
station_summary = station_summary.merge(station_location, on='Subway Station')

station_summary = station_summary.reset_index(drop=True)

station_summary

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
0,Arndt-/Spittastraße,3,0.0,3891.2501100861095,48.775000,9.150833
1,Augsburger Platz,1,0.0,7635.814913017591,48.805833,9.230833
2,Bad Cannstatt Wilhelmsplatz,15,2.0,5729.130893618349,48.803037,9.215704
3,Beethovenstraße,1,0.0,6184.7048814929085,48.777222,9.133333
4,Berliner Platz (Hohe Straße),6,0.0,1518.7378735567436,48.777222,9.168611
...,...,...,...,...,...,...
99,Wilhelm-Geiger-Platz,6,0.0,5191.492881321446,48.811667,9.158889
100,Wilhelma,1,0.0,5000.950811129121,48.803889,9.208056
101,Zuffenhausen Kelterplatz,1,0.0,6769.738400755982,48.829722,9.176111
102,Zuffenhausen Rathaus,3,0.0,6931.840353161858,48.830833,9.174722


In [105]:
station_summary.insert(1, 'Cluster Labels', kmeans.labels_)

station_summary

Unnamed: 0,Subway Station,Cluster Labels,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
0,Arndt-/Spittastraße,1,3,0.0,3891.2501100861095,48.775000,9.150833
1,Augsburger Platz,3,1,0.0,7635.814913017591,48.805833,9.230833
2,Bad Cannstatt Wilhelmsplatz,4,15,2.0,5729.130893618349,48.803037,9.215704
3,Beethovenstraße,3,1,0.0,6184.7048814929085,48.777222,9.133333
4,Berliner Platz (Hohe Straße),5,6,0.0,1518.7378735567436,48.777222,9.168611
...,...,...,...,...,...,...,...
99,Wilhelm-Geiger-Platz,1,6,0.0,5191.492881321446,48.811667,9.158889
100,Wilhelma,1,1,0.0,5000.950811129121,48.803889,9.208056
101,Zuffenhausen Kelterplatz,3,1,0.0,6769.738400755982,48.829722,9.176111
102,Zuffenhausen Rathaus,3,3,0.0,6931.840353161858,48.830833,9.174722


Finally, let's visualize the resulting clusters

In [120]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(station_summary['Subway Station Latitude'], station_summary['Subway Station Longitude'], station_summary['Subway Station'], station_summary['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
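
Any list of distinct colors indexed by the cluster label works for the markers. As an alternative sketch using only the standard library, the hypothetical helper `cluster_palette` below spreads hues evenly and returns hex strings that could be passed as `color` and `fill_color`; the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(6)
# Index the palette directly with the cluster label (0 to 5).
for label, color in enumerate(palette):
    print(label, color)
```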

## Examine Clusters
Now, we can explore each cluster. 

In [130]:
station_summary.loc[station_summary['Cluster Labels'] == 0, station_summary.columns[[0] + list(range(2, station_summary.shape[1]))]]

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
6,Blick,1,0.0,8469.252067060806,48.793333,9.242222
15,Dürrlewang,1,0.0,11319.90646658139,48.719167,9.118056
17,Engelboldstraße,1,0.0,8739.024105833638,48.7375,9.128056
20,Europaplatz,1,0.0,9096.233444334495,48.712222,9.160278
21,Fasanenhof Schelmenwasen,2,0.0,9386.095253041902,48.707778,9.170278
22,Fauststraße,1,0.0,10532.071755697203,48.731944,9.115278
28,Hedelfingen,1,0.0,10115.279810353206,48.76,9.254167
29,Heumaden,1,0.0,8895.472911339675,48.738056,9.233889
31,Hofen,1,0.0,9457.525977991196,48.836389,9.222222
34,Jurastraße,1,0.0,10566.695230123936,48.728889,9.117222


In [131]:
station_summary.loc[station_summary['Cluster Labels'] == 1, station_summary.columns[[0] + list(range(2, station_summary.shape[1]))]]

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
0,Arndt-/Spittastraße,3,0.0,3891.2501100861095,48.775,9.150833
14,Degerloch,3,0.0,4159.278159173068,48.748889,9.168889
18,Erwin-Schoettle-Platz\n(Marienhospital),2,0.0,3358.229092155877,48.762778,9.16
23,Feuerbach Bahnhof,2,0.0,4880.416918951074,48.813611,9.168333
26,Glockenstraße (Mahle),1,0.0,5042.103780061536,48.808056,9.204167
36,Killesberg,1,0.0,2976.5218301473747,48.799444,9.171667
43,Löwentor (SV SparkassenVersicherung),1,0.0,4150.119417878625,48.808333,9.19
46,Maybachstraße,2,0.0,4315.29898130593,48.811111,9.177222
47,Mercedesstraße,1,0.0,5062.0902793368,48.8,9.211667
48,Metzstraße (Südwestrundfunk),1,0.0,3269.467150234373,48.792222,9.200556


In [132]:
station_summary.loc[station_summary['Cluster Labels'] == 2, station_summary.columns[[0] + list(range(2, station_summary.shape[1]))]]

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
9,Budapester Platz,1,1.0,1770.1359677851751,48.790556,9.185833
13,Daimlerplatz,6,1.0,6243.931107911371,48.805278,9.218889
58,Olgaeck,9,1.0,1006.1275402653744,48.773889,9.186111
72,Rotebühlplatz (Das Gerber),9,1.0,1142.457636958647,48.774722,9.172222
79,Schlossplatz,13,1.0,159.79046175711247,48.778889,9.178889
89,Stadtbibliothek (Handwerkskammer),1,1.0,1600.094471702996,48.790556,9.181111
103,Österreichischer Platz (WGV-Versicherungen),7,1.0,1258.8492438984558,48.77,9.175556


In [133]:
station_summary.loc[station_summary['Cluster Labels'] == 3, station_summary.columns[[0] + list(range(2, station_summary.shape[1]))]]

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
1,Augsburger Platz,1,0.0,7635.814913017591,48.805833,9.230833
3,Beethovenstraße,1,0.0,6184.7048814929085,48.777222,9.133333
7,Botnang,1,0.0,7654.370042621837,48.778611,9.122222
16,Elbestraße,1,0.0,7569.0673037925935,48.825556,9.212778
24,Freibergstraße,1,0.0,8189.683792483637,48.824167,9.221944
25,Fürfelder Straße,1,0.0,7253.292880765147,48.832222,9.191944
30,Heumaden Bockelstraße,1,0.0,8121.534033634644,48.739722,9.227778
32,Hohensteinstraße,1,0.0,6435.566299003065,48.826944,9.173889
37,Kirchtalstraße (Volksbank Zuffenhausen),2,0.0,7182.971722788237,48.832222,9.170833
38,Kursaal,1,0.0,6929.671614009908,48.809167,9.2225


In [134]:
station_summary.loc[station_summary['Cluster Labels'] == 4, station_summary.columns[[0] + list(range(2, station_summary.shape[1]))]]

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
2,Bad Cannstatt Wilhelmsplatz,15,2.0,5729.130893618349,48.803037,9.215704
11,Charlottenplatz (Ebene -1),14,2.0,502.5065755336415,48.775833,9.182778
12,Charlottenplatz (Ebene -2),10,2.0,466.2050179406534,48.776667,9.183056
66,Rathaus,19,2.0,600.061270968111,48.773889,9.18


In [135]:
station_summary.loc[station_summary['Cluster Labels'] == 5, station_summary.columns[[0] + list(range(2, station_summary.shape[1]))]]

Unnamed: 0,Subway Station,Total restaurant,Total Sushi restaurant,Distance,Subway Station Latitude,Subway Station Longitude
4,Berliner Platz (Hohe Straße),6,0.0,1518.7378735567436,48.777222,9.168611
5,Berliner Platz (Liederhalle),11,0.0,1573.8432885647078,48.778611,9.168131
8,Bubenbad (IB-Jugendgästehaus),1,0.0,2061.073677377733,48.773333,9.194722
10,Börsenplatz (L-Bank),4,0.0,278.5719176074256,48.779722,9.178333
19,Eugensplatz (Jugendherberge),3,0.0,1397.2177319877555,48.778056,9.190556
27,Hauptbahnhof Arnulf-Klett-Platz,2,0.0,680.3555359069533,48.783611,9.180278
33,Hölderlinplatz,4,0.0,2902.529315539401,48.781667,9.158333
44,Marienplatz,7,0.0,2429.8486390372223,48.764444,9.168056
57,Neckartor,4,0.0,1663.3289127679031,48.786111,9.19
63,Pragfriedhof,1,0.0,2070.094527470249,48.793611,9.184167


Let's look at the stations on the map, with each marker sized by the total number of restaurants near the station.

In [122]:
# create map
map_restaurant = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle per station; the radius grows with the number of nearby restaurants
for lat, lon, poi, cluster, res in zip(
        station_summary['Subway Station Latitude'],
        station_summary['Subway Station Longitude'],
        station_summary['Subway Station'],
        station_summary['Cluster Labels'],
        station_summary['Total restaurant']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=1 + res*2,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=False,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_restaurant)
       
map_restaurant

Let's look at the stations on the map again, this time with each marker sized by the number of Sushi restaurants near the station.

In [127]:
# create map
map_restaurant_Sushi = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle per station; the radius grows with the number of nearby Sushi restaurants
for lat, lon, poi, cluster, res_sushi in zip(
        station_summary['Subway Station Latitude'],
        station_summary['Subway Station Longitude'],
        station_summary['Subway Station'],
        station_summary['Cluster Labels'],
        station_summary['Total Sushi restaurant']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3 + res_sushi*10,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=False,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_restaurant_Sushi)
       
map_restaurant_Sushi

## Results and Discussion <a name="results"></a>

Based on the clustering, clusters 2 and 5 are the most interesting for us. Most stations in cluster 5 are near the Stuttgart city center and have no Sushi restaurant within 300 meters. Most stations in cluster 2 are very close to the center and have only one Sushi restaurant nearby.

We can also see that clusters 0, 1 and 3 are not candidates, because their stations are too far from the city center and have very few restaurants nearby. There are probably reasons why others do not open restaurants there, for example that the stations are far from shopping or residential areas.

Most stations in cluster 4 are right in the middle of the city and have a lot of restaurants nearby, but each of them already has two Sushi restaurants close by. So it would be better to avoid the competition there.
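The comparison between clusters can also be checked numerically with a per-cluster summary. The following is a minimal sketch, assuming the column names of `station_summary` shown in the tables above. The `sample` frame holds only a few rows copied from those tables (distances rounded), and the function name `summarize_clusters` is illustrative and not part of the notebook.

```python
import pandas as pd

# A few rows copied from the per-cluster tables above (distances rounded).
# This is only a sample, not the full station_summary frame.
sample = pd.DataFrame(
    [
        ("Olgaeck", 2, 9, 1, 1006.1),
        ("Rotebühlplatz (Das Gerber)", 2, 9, 1, 1142.5),
        ("Schlossplatz", 2, 13, 1, 159.8),
        ("Charlottenplatz (Ebene -1)", 4, 14, 2, 502.5),
        ("Rathaus", 4, 19, 2, 600.1),
        ("Berliner Platz (Liederhalle)", 5, 11, 0, 1573.8),
        ("Börsenplatz (L-Bank)", 5, 4, 0, 278.6),
        ("Marienplatz", 5, 7, 0, 2429.8),
    ],
    columns=[
        "Subway Station",
        "Cluster Labels",
        "Total restaurant",
        "Total Sushi restaurant",
        "Distance",
    ],
)


def summarize_clusters(df):
    """Return the station count and the feature means for each cluster."""
    return df.groupby("Cluster Labels").agg(
        stations=("Subway Station", "count"),
        mean_restaurants=("Total restaurant", "mean"),
        mean_sushi=("Total Sushi restaurant", "mean"),
        mean_distance=("Distance", "mean"),
    )


cluster_profile = summarize_clusters(sample)
print(cluster_profile.round(1))
```

Calling `summarize_clusters(station_summary)` on the full frame would give one row per cluster, which makes it easier to see which clusters combine a short distance to the center with many restaurants and few Sushi restaurants.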

Finally, we can choose a place to open the Sushi restaurant based on the information we have. I would choose Schlossplatz in cluster 2, because it is in the middle of the city center and has a large number of people around for shopping or eating. There is already one Sushi restaurant there, but the location is otherwise ideal. The other choice would be Berliner Platz in cluster 5: it has two stations close to each other with a large number of restaurants nearby, and there is no Sushi restaurant there.
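The selection above can be expressed as a simple filter on the three criteria from the introduction: close to the center, many restaurants, few Sushi restaurants. This is only a sketch under assumptions. The function `shortlist` and the thresholds (2000 for the distance, 0 or 1 for the Sushi count) are hypothetical and were chosen for illustration, and the sample rows are copied from the cluster tables with rounded distances.

```python
import pandas as pd

# Sample rows copied from the cluster tables (distances rounded).
stations = pd.DataFrame(
    [
        ("Schlossplatz", 13, 1, 159.8),
        ("Rathaus", 19, 2, 600.1),
        ("Berliner Platz (Liederhalle)", 11, 0, 1573.8),
        ("Berliner Platz (Hohe Straße)", 6, 0, 1518.7),
        ("Börsenplatz (L-Bank)", 4, 0, 278.6),
        ("Marienplatz", 7, 0, 2429.8),
        ("Degerloch", 3, 0, 4159.3),
    ],
    columns=["Subway Station", "Total restaurant", "Total Sushi restaurant", "Distance"],
)


def shortlist(df, max_distance=2000.0, max_sushi=0):
    """Keep stations near the center with few Sushi restaurants.

    The result is sorted by the number of restaurants (descending) and,
    for ties, by the distance to the center (ascending).
    Both thresholds are illustrative assumptions.
    """
    mask = (df["Distance"] <= max_distance) & (df["Total Sushi restaurant"] <= max_sushi)
    return (
        df[mask]
        .sort_values(["Total restaurant", "Distance"], ascending=[False, True])
        .reset_index(drop=True)
    )


no_sushi = shortlist(stations, max_sushi=0)   # strict: no competitor nearby
one_sushi = shortlist(stations, max_sushi=1)  # relaxed: tolerate one competitor
print(no_sushi[["Subway Station", "Total restaurant"]])
print(one_sushi[["Subway Station", "Total restaurant"]])
```

With the strict setting the two Berliner Platz stations lead the list, and when one existing Sushi restaurant is tolerated Schlossplatz moves to the top, which matches the two choices discussed above.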

## Conclusion <a name="conclusion"></a>

In this project we explored and visualized the location data of the subway stations and the restaurants near them. We determined the total number of restaurants, and of Sushi restaurants in particular, near each station. We also calculated the distance between each subway station and the city center. With this information we used K-means clustering to group the subway stations into clusters. From these clusters we chose two that contain candidate locations for a Sushi restaurant.

In this project only location data is used as the criterion for opening a restaurant. Of course, more criteria are needed in the real world. But at least we get an overview of the existing restaurants and can narrow down the area we are looking at.

I hope this kind of exploration can help stakeholders who want to open a restaurant.