# Location recommendation for opening a restaurant in Lausanne (Switzerland)

## Introduction

##### Choosing the right location is one of the hardest parts of opening a restaurant in a new city. Many parameters affect whether a new business succeeds: the food and the service matter, but so does the location, which should be chosen with parking availability, visibility and neighborhood competition in mind. Using Foursquare location data, this notebook recommends a location suited to opening a new restaurant in Lausanne.

# Data section

### To recommend a location for a new restaurant, geodata from data.geo.admin.ch is used: for each locality (neighborhood) of the city of interest it provides the commune, postcode, longitude and latitude.

In [1]:
!wget -O PLZO_CSV_WGS84.zip https://data.geo.admin.ch/ch.swisstopo-vd.ortschaftenverzeichnis_plz/PLZO_CSV_WGS84.zip
print('unzipping ...')
!unzip -o -j PLZO_CSV_WGS84.zip


--2020-03-21 20:03:06--  https://data.geo.admin.ch/ch.swisstopo-vd.ortschaftenverzeichnis_plz/PLZO_CSV_WGS84.zip
Resolving data.geo.admin.ch (data.geo.admin.ch)... 13.225.54.29, 13.225.54.52, 13.225.54.48, ...
Connecting to data.geo.admin.ch (data.geo.admin.ch)|13.225.54.29|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 126673 (124K) [application/zip]
Saving to: ‘PLZO_CSV_WGS84.zip’


2020-03-21 20:03:06 (14.8 MB/s) - ‘PLZO_CSV_WGS84.zip’ saved [126673/126673]

unzipping ...
Archive:  PLZO_CSV_WGS84.zip
  inflating: PLZO_CSV_WGS84.csv      
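
The download cell relies on the `wget` and `unzip` command-line tools being available. Where they are not, the same step can be sketched with the standard library alone. The helper names `extract_flat` and `download_and_extract` are hypothetical and not part of the original notebook; `extract_flat` mimics `unzip -o -j` by overwriting existing files and dropping any folder structure inside the archive.

```python
import io
import os
import zipfile
from urllib.request import urlopen

def extract_flat(zip_bytes, target_dir='.'):
    """Write every file of the archive into target_dir, ignoring archive sub-folders."""
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # nothing to write for directory entries
            path = os.path.join(target_dir, os.path.basename(info.filename))
            with open(path, 'wb') as out:
                out.write(archive.read(info))
            written.append(path)
    return written

def download_and_extract(url, target_dir='.'):
    """Fetch the archive over HTTP and unpack it (needs network access)."""
    with urlopen(url) as response:
        return extract_flat(response.read(), target_dir)
```

Calling `download_and_extract` with the data.geo.admin.ch URL from the cell above should leave `PLZO_CSV_WGS84.csv` in the working directory, assuming the archive layout has not changed since the notebook was run.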


In [2]:
# List the codecs that decode the byte 0xfc as 'ü', to pick an encoding for reading the CSV
import pkgutil
import encodings
import os

def all_encodings():
    modnames = set([modname for importer, modname, ispkg in pkgutil.walk_packages(
        path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

text = b'\xfc'
for enc in all_encodings():
    try:
        msg = text.decode(enc)
    except Exception:
        continue
    if msg == 'ü':
        print('Decoding {t} with {enc} is {m}'.format(t=text, enc=enc, m=msg))

Decoding b'\xfc' with iso8859_9 is ü
Decoding b'\xfc' with cp1254 is ü
Decoding b'\xfc' with iso8859_13 is ü
Decoding b'\xfc' with cp1252 is ü
Decoding b'\xfc' with raw_unicode_escape is ü
Decoding b'\xfc' with iso8859_2 is ü
Decoding b'\xfc' with iso8859_10 is ü
Decoding b'\xfc' with iso8859_15 is ü
Decoding b'\xfc' with unicode_escape is ü
Decoding b'\xfc' with iso8859_16 is ü
Decoding b'\xfc' with cp1256 is ü
Decoding b'\xfc' with iso8859_3 is ü
Decoding b'\xfc' with palmos is ü
Decoding b'\xfc' with cp1250 is ü
Decoding b'\xfc' with cp1258 is ü
Decoding b'\xfc' with charmap is ü
Decoding b'\xfc' with cp1257 is ü
Decoding b'\xfc' with latin_1 is ü
Decoding b'\xfc' with iso8859_14 is ü
Decoding b'\xfc' with iso8859_1 is ü
Decoding b'\xfc' with iso8859_4 is ü
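
The scan above lists every codec that maps the byte `0xfc` to `ü`. A shorter check, sketched below, shows why the default UTF-8 decoding fails on such a byte while `latin_1` succeeds. The helper `try_decode` and the sample bytes are illustrative assumptions, not taken from the data file.

```python
def try_decode(raw, encoding):
    """Return the decoded text, or None if raw is not valid in the given encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

sample = b'Z\xfcrich'  # 'Zürich' encoded as Latin-1 (made-up sample)

utf8_result = try_decode(sample, 'utf-8')      # 0xfc is not a valid UTF-8 byte
latin1_result = try_decode(sample, 'latin_1')  # every byte value is valid in Latin-1

print(utf8_result)
print(latin1_result)
```

Since Latin-1 accepts any byte, a successful decode does not prove the encoding is right; the rendered place names still need a visual check.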




In [3]:
# Import data
import pandas as pd
# The file is semicolon-delimited and read with the latin_1 encoding
swiss_df = pd.read_csv('PLZO_CSV_WGS84.csv', sep=';', encoding='latin_1')
swiss_df.head()

Unnamed: 0,Ortschaftsname,PLZ,Zusatzziffer,Gemeindename,BFS-Nr,Kantonskürzel,E,N,Sprache
0,Aeugst am Albis,8914,0,Aeugst am Albis,1,ZH,8.488313,47.267004,de
1,Aeugstertal,8914,2,Aeugst am Albis,1,ZH,8.493642,47.282761,de
2,Zwillikon,8909,0,Affoltern am Albis,2,ZH,8.431459,47.287633,de
3,Affoltern am Albis,8910,0,Affoltern am Albis,2,ZH,8.448945,47.279169,de
4,Bonstetten,8906,0,Bonstetten,3,ZH,8.467611,47.31551,de


In [4]:
# Rename the columns with English names
swiss_df.rename({'Ortschaftsname': 'Locality', 'PLZ': 'Postcode', 'Zusatzziffer': 'Amendment','Gemeindename': 'Commune','Kantonskürzel': 'Canton','E': 'Longitude','N': 'Latitude','Sprache': 'Language'}, axis=1, inplace=True)
swiss_df.head()

Unnamed: 0,Locality,Postcode,Amendment,Commune,BFS-Nr,Canton,Longitude,Latitude,Language
0,Aeugst am Albis,8914,0,Aeugst am Albis,1,ZH,8.488313,47.267004,de
1,Aeugstertal,8914,2,Aeugst am Albis,1,ZH,8.493642,47.282761,de
2,Zwillikon,8909,0,Affoltern am Albis,2,ZH,8.431459,47.287633,de
3,Affoltern am Albis,8910,0,Affoltern am Albis,2,ZH,8.448945,47.279169,de
4,Bonstetten,8906,0,Bonstetten,3,ZH,8.467611,47.31551,de


In [5]:
# Index the dataframe by commune so that the commune of interest (Lausanne) can be selected
swiss_df.set_index('Commune',inplace=True)
swiss_df.head()

Commune,Locality,Postcode,Amendment,BFS-Nr,Canton,Longitude,Latitude,Language
Aeugst am Albis,Aeugst am Albis,8914,0,1,ZH,8.488313,47.267004,de
Aeugst am Albis,Aeugstertal,8914,2,1,ZH,8.493642,47.282761,de
Affoltern am Albis,Zwillikon,8909,0,2,ZH,8.431459,47.287633,de
Affoltern am Albis,Affoltern am Albis,8910,0,2,ZH,8.448945,47.279169,de
Bonstetten,Bonstetten,8906,0,3,ZH,8.467611,47.31551,de


In [6]:
# Select the data that we are interested in (Lausanne)
Lausanne_df=swiss_df.loc['Lausanne'].reset_index(drop=True)
Lausanne_df.head()

Unnamed: 0,Locality,Postcode,Amendment,BFS-Nr,Canton,Longitude,Latitude,Language
0,Lausanne 25,1000,25,5586,VD,6.683444,46.562237,fr
1,Lausanne 26,1000,26,5586,VD,6.696216,46.556483,fr
2,Lausanne 27,1000,27,5586,VD,6.681465,46.541743,fr
3,Lausanne,1003,0,5586,VD,6.630034,46.520004,fr
4,Lausanne,1004,0,5586,VD,6.618678,46.52848,fr


In [7]:
# Remove unnecessary data
Lausanne_df.drop(['Amendment', 'BFS-Nr','Language','Canton'], axis=1,inplace=True)
Lausanne_df

Unnamed: 0,Locality,Postcode,Longitude,Latitude
0,Lausanne 25,1000,6.683444,46.562237
1,Lausanne 26,1000,6.696216,46.556483
2,Lausanne 27,1000,6.681465,46.541743
3,Lausanne,1003,6.630034,46.520004
4,Lausanne,1004,6.618678,46.52848
5,Lausanne,1005,6.6425,46.519859
6,Lausanne,1006,6.63711,46.510849
7,Lausanne,1007,6.608606,46.517754
8,Lausanne,1010,6.65892,46.536143
9,Lausanne,1011,6.64288,46.525635


In [8]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0 also exposes it as pd.json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |   py36h9f0ad1d_1         149 KB  conda-forge
    altair-4.0.1               |             py_0         575 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    openssl-1.1.1e             |       h516909a_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                       

In [9]:
address = 'Lausanne,LA'
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Lausanne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Lausanne are 46.5218269, 6.6327025.


In [10]:
# create map of Lausanne using latitude and longitude values
map_lausanne = folium.Map(location=[latitude, longitude], zoom_start=12)
# add markers to map
for lat, lng, loc, post in zip(Lausanne_df['Latitude'], Lausanne_df['Longitude'], Lausanne_df['Locality'], Lausanne_df['Postcode']):
    label = '{}, {}'.format(post, loc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_lausanne)  
map_lausanne

In [11]:
# Start utilizing the Foursquare API to explore the Postcode and segment them
# Define Foursquare Credentials and Version
CLIENT_ID = '4G233CCR10CY0UMG4ADK5QME3MEPB13SJZ1SPHRSFW4Q3IUC' # your Foursquare ID
CLIENT_SECRET = 'H30UQFMIPEZKQUU41CLHELJSKQ0PIDRL5QSQ4140AJERSFCV' # your Foursquare Secret
VERSION = '20200319' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

# Let's explore the first postcode in our dataframe.
# Get the postcode.
Lausanne_df.loc[0, 'Postcode']

# Get the postcode's latitude and longitude values.
postcode_latitude = Lausanne_df.loc[0, 'Latitude'] # postcode latitude value
postcode_longitude = Lausanne_df.loc[0, 'Longitude'] # postcode longitude value

postcode_name = Lausanne_df.loc[0, 'Postcode'] # postcode used as the name

print('Latitude and longitude values of {} are {}, {}.'.format(postcode_name, 
                                                               postcode_latitude, 
                                                               postcode_longitude))

Your credentials:
CLIENT_ID: 4G233CCR10CY0UMG4ADK5QME3MEPB13SJZ1SPHRSFW4Q3IUC
CLIENT_SECRET:H30UQFMIPEZKQUU41CLHELJSKQ0PIDRL5QSQ4140AJERSFCV
Latitude and longitude values of 1000 are 46.56223671140397, 6.683443539253286.


In [12]:
# Get the top 100 venues around postcode 1000 within a radius of 500 meters
# First, let's create the GET request URL.
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    postcode_latitude, 
    postcode_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=4G233CCR10CY0UMG4ADK5QME3MEPB13SJZ1SPHRSFW4Q3IUC&client_secret=H30UQFMIPEZKQUU41CLHELJSKQ0PIDRL5QSQ4140AJERSFCV&v=20200319&ll=46.56223671140397,6.683443539253286&radius=500&limit=100'
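
Formatting the query string by hand works, but it is easy to forget a separator or leave a value unescaped. As a hedged alternative, the sketch below builds the same kind of URL with `urllib.parse.urlencode`. The function name `build_explore_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, for illustration only
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20200319', 46.562237, 6.683444)

# Round-trip the query string to confirm the parameters survived encoding
query = parse_qs(urlsplit(example_url).query)
print(query['ll'], query['radius'], query['limit'])
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` has the same effect and avoids building the string at all.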

In [13]:
# Send the GET request and examine the results
results = requests.get(url).json()
results
# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or, after json_normalize, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# Now we are ready to clean the json and structure it into a pandas dataframe.

venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

# And how many venues were returned by Foursquare?
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.
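
To make the flattening step easier to follow, here is a minimal sketch that does the same extraction on plain dictionaries, without pandas. The payload is made up and only mimics the shape of `results['response']['groups'][0]['items']` as it is used above; `first_category_name` and `flatten_items` are hypothetical helper names.

```python
def first_category_name(venue):
    """Return the name of the first category, or None when the list is empty or missing."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None

def flatten_items(items):
    """Turn nested venue items into flat rows with name, category, lat and lng."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append({
            'name': venue['name'],
            'category': first_category_name(venue),
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return rows

# Made-up items, for illustration only
sample_items = [
    {'venue': {'name': 'Example Bistro', 'categories': [{'name': 'Bistro'}],
               'location': {'lat': 46.5620, 'lng': 6.6830}}},
    {'venue': {'name': 'Unlabelled Place', 'categories': [],
               'location': {'lat': 46.5630, 'lng': 6.6840}}},
]

rows = flatten_items(sample_items)
for row in rows:
    print(row['name'], row['category'])
```

The second item shows the case that `get_category_type` guards against: a venue with no category yields `None` rather than an error.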


In [14]:
# Explore the localities of Lausanne

# Let's create a function to repeat the same process for every locality in Lausanne
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)



In [18]:
# Now write the code to run the above function on each locality and create a new
# dataframe called lausanne_venues.

lausanne_venues = getNearbyVenues(names=Lausanne_df['Postcode'],
                                   latitudes=Lausanne_df['Latitude'],
                                   longitudes=Lausanne_df['Longitude']
                                  )

# Let's check the size of the resulting dataframe
print(lausanne_venues.shape)
lausanne_venues.head()

# Let's check how many venues were returned for each neighborhood
lausanne_venues.groupby('Neighborhood').count()

# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(lausanne_venues['Venue Category'].unique())))



1000
1000
1000
1003
1004
1005
1006
1007
1010
1011
1012
1015
1018
(235, 7)
There are 93 unique categories.


In [19]:
# Analyze Each Neighborhood

# one hot encoding
lausanne_onehot = pd.get_dummies(lausanne_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
lausanne_onehot['Neighborhood'] = lausanne_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [lausanne_onehot.columns[-1]] + list(lausanne_onehot.columns[:-1])
lausanne_onehot = lausanne_onehot[fixed_columns]
lausanne_onehot.head()

# And let's examine the new dataframe size.
lausanne_onehot.shape

# Next, let's group rows by neighborhood and take the mean of each one-hot column,
# i.e. the frequency of occurrence of each category
lausanne_grouped = lausanne_onehot.groupby('Neighborhood').mean().reset_index()
lausanne_grouped


Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Bakery,Bar,Bed & Breakfast,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Burger Joint,Bus Station,Bus Stop,Café,Candy Store,Chinese Restaurant,Church,Coffee Shop,College Cafeteria,Construction & Landscaping,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Electronics Store,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Food Court,French Restaurant,Furniture / Home Store,Gas Station,Gastropub,Greek Restaurant,Grocery Store,Gym,Gym Pool,Home Service,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Light Rail Station,Lounge,Massage Studio,Mediterranean Restaurant,Metro Station,Modern European Restaurant,Moroccan Restaurant,Multiplex,Museum,Nightclub,Opera House,Park,Pedestrian Plaza,Pharmacy,Pizza Place,Platform,Plaza,Pool,Restaurant,Rock Club,Sandwich Place,Shoe Store,Skating Rink,Snack Place,Spa,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Train Station,Trattoria/Osteria,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,1000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.25,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1003,0.010526,0.010526,0.0,0.010526,0.0,0.021053,0.0,0.115789,0.0,0.010526,0.010526,0.0,0.021053,0.010526,0.031579,0.0,0.0,0.052632,0.010526,0.010526,0.010526,0.010526,0.0,0.0,0.021053,0.010526,0.0,0.010526,0.010526,0.0,0.010526,0.010526,0.0,0.010526,0.010526,0.0,0.042105,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.0,0.010526,0.010526,0.0,0.010526,0.010526,0.052632,0.031579,0.0,0.0,0.031579,0.0,0.010526,0.0,0.010526,0.0,0.010526,0.010526,0.010526,0.0,0.010526,0.010526,0.0,0.021053,0.021053,0.042105,0.0,0.010526,0.010526,0.021053,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.031579,0.010526,0.0,0.010526,0.0,0.010526,0.0,0.010526,0.010526,0.021053
2,1004,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.136364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.045455,0.090909,0.045455,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0
3,1005,0.0,0.0,0.019608,0.0,0.0,0.019608,0.039216,0.137255,0.0,0.0,0.0,0.019608,0.019608,0.0,0.039216,0.0,0.0,0.019608,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.019608,0.0,0.0,0.0,0.039216,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.039216,0.019608,0.0,0.0,0.039216,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.058824,0.019608,0.039216,0.0,0.0,0.078431,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.039216,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.0,0.0
4,1006,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,1007,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.157895,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.052632,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,1010,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,1011,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.090909,0.0
8,1012,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,1015,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [20]:
# Let's confirm the new size
lausanne_grouped.shape


(11, 94)
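
Taking the mean of one-hot columns per neighborhood amounts to dividing each category count by the number of venues in that neighborhood. The sketch below shows this with the standard library only. The function `category_shares` is a hypothetical name, and the sample assumes one venue per category for postcode 1010, which is consistent with the four 0.25 values in the grouped table.

```python
from collections import Counter, defaultdict

def category_shares(venues):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares

# (neighborhood, category) pairs; assumed sample based on the 1010 row
sample_venues = [
    (1010, 'Gym'),
    (1010, 'Gas Station'),
    (1010, 'Bus Station'),
    (1010, 'Sushi Restaurant'),
]

shares = category_shares(sample_venues)
print(shares[1010])
```

A neighborhood with few venues therefore gets large frequencies from single venues, which is worth keeping in mind when comparing it with a dense postcode such as 1003.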

In [23]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5
for hood in lausanne_grouped['Neighborhood']:
    temp = lausanne_grouped[lausanne_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

               venue  freq
0   Swiss Restaurant  0.25
1  French Restaurant  0.25
2         Food Court  0.12
3         Restaurant  0.12
4                Spa  0.12


                venue  freq
0                 Bar  0.12
1  Italian Restaurant  0.05
2                Café  0.05
3               Plaza  0.04
4   French Restaurant  0.04


           venue  freq
0       Bus Stop  0.14
1  Grocery Store  0.09
2    Supermarket  0.09
3       Gym Pool  0.05
4     Restaurant  0.05


               venue  freq
0                Bar  0.14
1        Pizza Place  0.08
2          Nightclub  0.06
3  French Restaurant  0.04
4       Burger Joint  0.04


          venue  freq
0          Café  0.17
1         Plaza  0.08
2        Museum  0.08
3    Art Museum  0.08
4  Skating Rink  0.08


           venue  freq
0    Bus Station  0.16
1    Snack Place  0.11
2     Restaurant  0.11
3    Supermarket  0.11
4  Grocery Store  0.05


               venue  freq
0                Gym  0.25
1        Gas Station  0.25
2   Sus

In [24]:
# Let's put that into a pandas dataframe
# First, let's write a function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False) 
    return row_categories_sorted.index.values[0:num_top_venues]

# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond the 3rd position, every ordinal in this range ends in 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))
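
The column names rely on an exception to switch from `st`, `nd`, `rd` to `th`, which works for ten columns but would label an eleventh or twenty-first column incorrectly. A small helper avoids the exception and handles those cases. The names `ordinal`, `venue_columns` and `columns_sketch` are hypothetical and chosen so that the notebook's own `columns` list is left untouched.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def venue_columns(num_top_venues):
    """Build the header row: 'Neighborhood' followed by one column per rank."""
    ranked = ['{} Most Common Venue'.format(ordinal(i))
              for i in range(1, num_top_venues + 1)]
    return ['Neighborhood'] + ranked

columns_sketch = venue_columns(10)
print(columns_sketch)
```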

In [32]:
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = lausanne_grouped['Neighborhood']

for ind in np.arange(lausanne_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(lausanne_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1000,French Restaurant,Swiss Restaurant,Spa,Restaurant,Gas Station,Food Court,Wine Bar,Creperie,Cupcake Shop,Deli / Bodega
1,1003,Bar,Café,Italian Restaurant,French Restaurant,Plaza,Japanese Restaurant,Swiss Restaurant,Lounge,Burger Joint,Gym
2,1004,Bus Stop,Grocery Store,Supermarket,Bus Station,Jewish Restaurant,Restaurant,Italian Restaurant,Sporting Goods Shop,Stadium,Steakhouse
3,1005,Bar,Pizza Place,Nightclub,Hotel,Italian Restaurant,French Restaurant,Park,Bakery,Supermarket,Burger Joint
4,1006,Café,Park,Skating Rink,Museum,Pharmacy,Art Museum,Bakery,Bar,Plaza,Pool
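
One caveat when reading this table: sorting a whole row always returns ten names, even for a postcode with fewer than ten venue categories. Postcode 1010, for example, has only four non-zero frequencies in the grouped table, so the remaining slots of its top 10 are filled with categories whose frequency is 0. The sketch below keeps only categories that actually occur. `top_categories` is a hypothetical helper, and the dictionary is an excerpt of the 1010 row.

```python
def top_categories(frequencies, n=10):
    """Return up to n category names with a non-zero frequency, most frequent first."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # Sort by descending frequency, then alphabetically to make ties deterministic
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:n]]

# Excerpt of the 1010 row of the grouped table
freq_1010 = {
    'Bus Station': 0.25,
    'Gas Station': 0.25,
    'Gym': 0.25,
    'Sushi Restaurant': 0.25,
    'Wine Bar': 0.0,
    'Cupcake Shop': 0.0,
}

top_1010 = top_categories(freq_1010)
print(top_1010)
```

With this filter, sparse postcodes yield shorter lists, which makes it clearer how little data stands behind them.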


In [33]:
# Cluster Neighborhoods

# Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5
lausanne_grouped_clustering = lausanne_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lausanne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 2, 0, 2, 2, 0, 0, 2, 4, 1], dtype=int32)
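
For readers unfamiliar with what `KMeans` does to the frequency rows, the following is a minimal sketch of the two alternating steps of k-means (assign each point to its nearest centroid, then move each centroid to the mean of its members). It uses naive seeding and a fixed number of iterations purely for illustration; scikit-learn's implementation handles initialisation and convergence on its own and is not reproduced here. The function `kmeans_sketch` and the toy points are assumptions.

```python
def kmeans_sketch(points, k, iterations=10):
    """Plain k-means; the first k points are used as starting centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Toy two-category frequency vectors (made up): two rows lean towards the
# first category, two towards the second
toy_points = [(0.25, 0.0), (0.0, 0.25), (0.2, 0.05), (0.05, 0.2)]
toy_labels, toy_centroids = kmeans_sketch(toy_points, k=2)
print(toy_labels)
```

With only 11 grouped rows and 5 clusters in the notebook, some clusters contain a single postcode, so the resulting labels should be read as a rough grouping rather than a stable segmentation.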

In [34]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each
# neighborhood.
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
lausanne_merged = Lausanne_df

# join neighborhoods_venues_sorted onto Lausanne_df (keyed by postcode) so that each locality
# keeps its latitude/longitude
lausanne_merged = lausanne_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Postcode')
lausanne_merged.head() # the cluster label and top-venue columns are appended on the right

Unnamed: 0,Locality,Postcode,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Lausanne 25,1000,6.683444,46.562237,3,French Restaurant,Swiss Restaurant,Spa,Restaurant,Gas Station,Food Court,Wine Bar,Creperie,Cupcake Shop,Deli / Bodega
1,Lausanne 26,1000,6.696216,46.556483,3,French Restaurant,Swiss Restaurant,Spa,Restaurant,Gas Station,Food Court,Wine Bar,Creperie,Cupcake Shop,Deli / Bodega
2,Lausanne 27,1000,6.681465,46.541743,3,French Restaurant,Swiss Restaurant,Spa,Restaurant,Gas Station,Food Court,Wine Bar,Creperie,Cupcake Shop,Deli / Bodega
3,Lausanne,1003,6.630034,46.520004,2,Bar,Café,Italian Restaurant,French Restaurant,Plaza,Japanese Restaurant,Swiss Restaurant,Lounge,Burger Joint,Gym
4,Lausanne,1004,6.618678,46.52848,0,Bus Stop,Grocery Store,Supermarket,Bus Station,Jewish Restaurant,Restaurant,Italian Restaurant,Sporting Goods Shop,Stadium,Steakhouse


In [36]:
# Finally, let's visualize the resulting clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)
# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
for lat, lon, poi, cluster in zip(lausanne_merged['Latitude'], lausanne_merged['Longitude'], lausanne_merged['Postcode'], lausanne_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)], # labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
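
Since k-means labels run from 0 to k-1, a palette with one entry per cluster can be indexed by the label directly. The sketch below builds such a palette with the standard library's `colorsys` module instead of matplotlib; `cluster_palette` is a hypothetical helper and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_palette(k):
    """Return k hex colors with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)

# Look up the marker color for each of the five cluster labels
for cluster_label in range(5):
    print(cluster_label, palette[cluster_label])
```

The resulting strings can be passed as `color` and `fill_color` to `folium.CircleMarker` in the same way as the `rainbow` list above.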

In [37]:
# Examine Clusters

# Now, you can examine each cluster and determine the discriminating venue categories that
# distinguish each cluster.

# Cluster 1
lausanne_merged.loc[lausanne_merged['Cluster Labels'] == 0, lausanne_merged.columns[[1] + list(range(5, lausanne_merged.shape[1]))]]


Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,1004,Bus Stop,Grocery Store,Supermarket,Bus Station,Jewish Restaurant,Restaurant,Italian Restaurant,Sporting Goods Shop,Stadium,Steakhouse
7,1007,Bus Station,Snack Place,Restaurant,Supermarket,Hotel,Grocery Store,Gas Station,Construction & Landscaping,Park,Gym
8,1010,Gym,Gas Station,Bus Station,Sushi Restaurant,Wine Bar,Ethiopian Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop
12,1018,Grocery Store,Bus Stop,Bus Station,Supermarket,Wine Bar,Ethiopian Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop


In [38]:
# Cluster 2
lausanne_merged.loc[lausanne_merged['Cluster Labels'] == 1, lausanne_merged.columns[[1] + list(range(5, lausanne_merged.shape[1]))]]


Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,1015,Stadium,Deli / Bodega,Light Rail Station,College Cafeteria,Wine Bar,Creperie,Cupcake Shop,Department Store,Dessert Shop,Diner


In [40]:
# Cluster 3
lausanne_merged.loc[lausanne_merged['Cluster Labels'] == 2, lausanne_merged.columns[[1] + list(range(5, lausanne_merged.shape[1]))]]


Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,1003,Bar,Café,Italian Restaurant,French Restaurant,Plaza,Japanese Restaurant,Swiss Restaurant,Lounge,Burger Joint,Gym
5,1005,Bar,Pizza Place,Nightclub,Hotel,Italian Restaurant,French Restaurant,Park,Bakery,Supermarket,Burger Joint
6,1006,Café,Park,Skating Rink,Museum,Pharmacy,Art Museum,Bakery,Bar,Plaza,Pool
9,1011,Thai Restaurant,Hotel,Sushi Restaurant,Vietnamese Restaurant,French Restaurant,Italian Restaurant,Massage Studio,Metro Station,Museum,Pizza Place


In [41]:
# Cluster 4
lausanne_merged.loc[lausanne_merged['Cluster Labels'] == 3, lausanne_merged.columns[[1] + list(range(5, lausanne_merged.shape[1]))]]

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1000,French Restaurant,Swiss Restaurant,Spa,Restaurant,Gas Station,Food Court,Wine Bar,Creperie,Cupcake Shop,Deli / Bodega
1,1000,French Restaurant,Swiss Restaurant,Spa,Restaurant,Gas Station,Food Court,Wine Bar,Creperie,Cupcake Shop,Deli / Bodega
2,1000,French Restaurant,Swiss Restaurant,Spa,Restaurant,Gas Station,Food Court,Wine Bar,Creperie,Cupcake Shop,Deli / Bodega


In [43]:
# Cluster 5
lausanne_merged.loc[lausanne_merged['Cluster Labels'] == 4, lausanne_merged.columns[[1] + list(range(5, lausanne_merged.shape[1]))]]


Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,1012,Department Store,Bakery,Sushi Restaurant,Supermarket,Fast Food Restaurant,Wine Bar,Ethiopian Restaurant,Creperie,Cupcake Shop,Deli / Bodega


# Conclusion

#### Judging by restaurant competition, clusters 3 and 4 should be avoided when opening a new restaurant, because bars, cafés and restaurants already dominate their most common venues. Clusters 1, 2 and 5 are better places to start a new restaurant business because relatively few restaurants appear among their most common venues.
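
The comparison behind this conclusion can be made a little more explicit by counting restaurant categories in each postcode's top 10 list. The sketch below does so for two rows copied from the cluster tables above (1003 from cluster 3, 1015 from cluster 2). The helper `restaurant_count` is hypothetical, and the substring test is a deliberate simplification: it ignores venues such as pizza places, burger joints or bars.

```python
def restaurant_count(top_venues):
    """Count the categories whose name contains the word 'Restaurant'."""
    return sum('Restaurant' in category for category in top_venues)

# Top 10 most common venue categories, copied from the cluster tables
top10 = {
    1003: ['Bar', 'Café', 'Italian Restaurant', 'French Restaurant', 'Plaza',
           'Japanese Restaurant', 'Swiss Restaurant', 'Lounge', 'Burger Joint', 'Gym'],
    1015: ['Stadium', 'Deli / Bodega', 'Light Rail Station', 'College Cafeteria',
           'Wine Bar', 'Creperie', 'Cupcake Shop', 'Department Store',
           'Dessert Shop', 'Diner'],
}

competition = {postcode: restaurant_count(venues) for postcode, venues in top10.items()}
print(competition)
```

Such a count is only a rough proxy for competition: it says nothing about footfall, rents or the number of venues behind each category, and for sparse postcodes part of the top 10 list has a frequency of 0.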