## 1. A description of the problem and a discussion of the background

In this project, I analyze which location in Melbourne is most suitable for opening a Chinese restaurant.

Melbourne has a population of almost 5 million, of which 650,000 are Chinese. As in other Australian cities, foot traffic is concentrated in the CBD, and Melbourne is no exception. A restaurant's profitability depends largely on a steady flow of people, so the preferred location is somewhere in the city center. Restaurants of the same type also compete with each other, so we analyze the types of nearby restaurants, exclude the locations with the most direct competition, and choose the best of the remaining ones.

## 2. A description of the data and how it will be used to solve the problem

In this project, we use restaurants in and around central Melbourne as the data set.  
The data is obtained through the Foursquare API.

In [141]:
#data
print(melbourne_venues.shape)
melbourne_venues.head()

(2979, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Tipo 00,-37.813527,144.961978,Tipo 00,-37.813527,144.961978,Italian Restaurant
1,Tipo 00,-37.813527,144.961978,Calia,-37.812724,144.96393,Japanese Restaurant
2,Tipo 00,-37.813527,144.961978,Trattoria Emilia,-37.81522,144.962636,Italian Restaurant
3,Tipo 00,-37.813527,144.961978,Nosh,-37.815396,144.962999,Asian Restaurant
4,Tipo 00,-37.813527,144.961978,B'cos Brazil,-37.815486,144.963085,Brazilian Restaurant


A larger number of restaurants in an area also suggests more local foot traffic.  
We group locations by the types of restaurants around them through clustering, and exclude places that already have many Chinese restaurants.
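
The selection idea described above can be sketched on a toy table. The spot names, the frequencies, the restaurant counts, and the 0.10 cut-off below are made-up assumptions for illustration, not values taken from the Foursquare data.

```python
import pandas as pd

# Hypothetical share of each restaurant category around three candidate spots
freq = pd.DataFrame(
    {
        "Neighborhood": ["Spot A", "Spot B", "Spot C"],
        "Chinese Restaurant": [0.14, 0.00, 0.04],
        "Italian Restaurant": [0.10, 0.22, 0.19],
        "Restaurant Count": [29, 23, 27],
    }
)

MAX_CHINESE_SHARE = 0.10  # assumed cut-off for "too much direct competition"

# Drop spots with too many Chinese restaurants, then prefer the busiest ones
candidates = (
    freq[freq["Chinese Restaurant"] <= MAX_CHINESE_SHARE]
    .sort_values("Restaurant Count", ascending=False)
    .reset_index(drop=True)
)
print(candidates[["Neighborhood", "Chinese Restaurant", "Restaurant Count"]])
```

Here the restaurant count stands in for foot traffic, which is the same proxy the analysis relies on.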


# Code

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
address = 'Melbourne'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Melbourne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Melbourne are -37.8142176, 144.9631608.


In [4]:
# create map of Melbourne using latitude and longitude values
map_Melbourne = folium.Map(location=[latitude, longitude], zoom_start=11)

map_Melbourne

In [5]:
CLIENT_ID = 'TMEG43JMICJTXSLJPELQVVWJSSBRL5D2GXM34Q2P1UKGUVIC' # your Foursquare ID
CLIENT_SECRET = 'L2UAX2FU0ITUHBJACKVSE1HAGPLJOMWK1CK3OCZAJ0OBWXSM' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: TMEG43JMICJTXSLJPELQVVWJSSBRL5D2GXM34Q2P1UKGUVIC
CLIENT_SECRET: L2UAX2FU0ITUHBJACKVSE1HAGPLJOMWK1CK3OCZAJ0OBWXSM


In [17]:
# build the explore request for venues around the city center
LIMIT = 300 # maximum number of venues requested from the Foursquare API
radius = 3000 # search radius in meters
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL




'https://api.foursquare.com/v2/venues/explore?&client_id=TMEG43JMICJTXSLJPELQVVWJSSBRL5D2GXM34Q2P1UKGUVIC&client_secret=L2UAX2FU0ITUHBJACKVSE1HAGPLJOMWK1CK3OCZAJ0OBWXSM&v=20180605&ll=-37.8142176,144.9631608&radius=3000&limit=300'
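
The same request URL can also be assembled from named parameters, which keeps the argument order from mattering. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper name and `MY_ID` / `MY_SECRET` are placeholders, not real credentials.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a Foursquare v2 explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the comma in "ll" as %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


demo_url = build_explore_url(
    "MY_ID", "MY_SECRET", "20180605", -37.8142176, 144.9631608, 3000, 300
)
print(demo_url)
```

Reading the credentials from environment variables instead of typing them into a cell would also keep them out of the printed URL.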

In [18]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5df86c8740a7ea001bc67df2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Melbourne',
  'headerFullLocation': 'Melbourne',
  'headerLocationGranularity': 'city',
  'totalResults': 244,
  'suggestedBounds': {'ne': {'lat': -37.78721757299997,
    'lng': 144.99727411135387},
   'sw': {'lat': -37.84121762700003, 'lng': 144.92904748864612}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54631f6e498ed0dde017e53c',
       'name': 'Tipo 00',
       'location': {'address': '361 Little Bourke St',
        'lat': -37.81352651659617,
        'lng': 144.96197769842925,
        'labeledLatLngs': [{'label': 'display',
          'lat': -37.81352651659617,
 

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    # the category list is stored under 'categories' or, after flattening, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [102]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Tipo 00,Italian Restaurant,-37.813527,144.961978
1,Brother Baba Budan,Coffee Shop,-37.813445,144.962137
2,Kirk's Wine Bar,Wine Bar,-37.813661,144.961351
3,B'cos Brazil,Brazilian Restaurant,-37.815486,144.963085
4,Calia,Japanese Restaurant,-37.812724,144.96393


In [103]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.
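
The flattening step can be tried offline on a hand-written stand-in for the API response. The two items below are invented and only mimic the nesting visible in the output above; they are an assumption about the parts of the payload that this notebook uses.

```python
import pandas as pd

# Hand-written stand-in for two entries of results['response']['groups'][0]['items']
items = [
    {"venue": {"name": "Venue A",
               "categories": [{"name": "Italian Restaurant"}],
               "location": {"lat": -37.8135, "lng": 144.9620}}},
    {"venue": {"name": "Venue B",
               "categories": [],
               "location": {"lat": -37.8134, "lng": 144.9621}}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names
print(sorted(flat.columns))

# Take the first category name, or None when the list is empty
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted column name
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```

The second item shows why the empty-list check matters: a venue without categories yields a missing value instead of an error.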


In [113]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return the restaurants Foursquare recommends within `radius` meters of each point."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only venues whose first category is some kind of restaurant;
        # venues without any category are skipped instead of raising an IndexError
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'])
            for v in results
            if v['venue']['categories'] and "Restaurant" in v['venue']['categories'][0]['name']])

    # flatten the per-point lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    return nearby_venues
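
The function above indexes straight into the response, so a reply without `groups` would raise an error. A more defensive extraction could look like the sketch below. `extract_items` and `restaurant_category` are hypothetical helper names, and both payloads are invented; in particular, the shape of the failed reply is an assumption, not documented Foursquare behavior.

```python
def extract_items(payload):
    """Return the list of recommended items, or [] if the payload lacks them."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    return groups[0].get("items", [])


def restaurant_category(item):
    """Return the first category name if it is a restaurant, else None."""
    categories = item.get("venue", {}).get("categories", [])
    if categories and "Restaurant" in categories[0]["name"]:
        return categories[0]["name"]
    return None


# Invented payloads: one well-formed reply and one reply without any groups
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "categories": [{"name": "Chinese Restaurant"}]}},
    {"venue": {"name": "Venue B", "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Venue C", "categories": []}},
]}]}}
empty_payload = {"meta": {"code": 429}, "response": {}}

kept = [c for c in map(restaurant_category, extract_items(ok_payload)) if c]
print(kept)
print(extract_items(empty_payload))
```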

In [114]:
# use each venue found above as a search center (stored in the 'Neighborhood' column)

melbourne_venues = getNearbyVenues(names=nearby_venues['name'],
                                   latitudes=nearby_venues['lat'],
                                   longitudes=nearby_venues['lng']
                                  )



Tipo 00
Brother Baba Budan
Kirk's Wine Bar
B'cos Brazil
Calia
Nosh
Koko Black
Little Rogue Coffee
Boilermaker House
Shortstop Coffee & Donuts
Degraves Street
Gewürzhaus
Union Electric
Beneath Driver Lane
La Belle Miette
Chuckle Park
Curtin House Rooftop Bar
Patricia Coffee Brewers
Pidapipó Gelateria
Brick Lane
Dukes Coffee Roasters
Minotaur
Trattoria Emilia
Emporium Melbourne
Hopetoun Tea Rooms
Manchester Press
The Butterfly Club
State Library of Victoria
Whitehart
Roule Galette
Bar Americano
Red Spice Road
All Star Comics
The Gin Palace
Dragon Hot Pot
The Hardware Société
Ganache Chocolate
Candela Nuevo
Rooftop at QT
Gong Cha (貢茶)
Supernormal
Grand Hyatt Melbourne
Whisky + Alement
Hosier Lane
Federation Square
Coda
Games Laboratory
Bartronica
Australian Centre for the Moving Image (ACMI)
Louis Vuitton
Chin Chin
The Wheeler Centre
Southbank Promenade
Meatmaiden
Embla
Whisky Den
Oli & Levi
State Library Lawn
MoVida
Heartbreaker
Clementine's Fine Food & Gifts
Simpsons Burgers
Royal Stack

In [115]:
print(melbourne_venues.shape)
melbourne_venues.head()

(2979, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Tipo 00,-37.813527,144.961978,Tipo 00,-37.813527,144.961978,Italian Restaurant
1,Tipo 00,-37.813527,144.961978,Calia,-37.812724,144.96393,Japanese Restaurant
2,Tipo 00,-37.813527,144.961978,Trattoria Emilia,-37.81522,144.962636,Italian Restaurant
3,Tipo 00,-37.813527,144.961978,Nosh,-37.815396,144.962999,Asian Restaurant
4,Tipo 00,-37.813527,144.961978,B'cos Brazil,-37.815486,144.963085,Brazilian Restaurant


In [116]:
melbourne_data = melbourne_venues.drop(['Neighborhood',"Neighborhood Latitude",'Neighborhood Longitude','Venue Category'],axis=1)

In [117]:
melbourne_data

Unnamed: 0,Venue,Venue Latitude,Venue Longitude
0,Tipo 00,-37.813527,144.961978
1,Calia,-37.812724,144.96393
2,Trattoria Emilia,-37.81522,144.962636
3,Nosh,-37.815396,144.962999
4,B'cos Brazil,-37.815486,144.963085
5,Red Spice Road,-37.815332,144.961644
6,Mjølner,-37.81214,144.96068
7,Blok M Express,-37.813665,144.961404
8,Palermo,-37.813851,144.961015
9,Guzman y Gomez,-37.812144,144.96355


In [118]:
melbourne_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ABC Chicken,41,41,41,41,41,41
All Star Comics,23,23,23,23,23,23
Arbory Bar & Eatery,21,21,21,21,21,21
Australian Centre for the Moving Image (ACMI),27,27,27,27,27,27
B'cos Brazil,19,19,19,19,19,19
Bar Americano,22,22,22,22,22,22
Bar Lourinhã,35,35,35,35,35,35
Bartronica,21,21,21,21,21,21
Beneath Driver Lane,25,25,25,25,25,25
Boilermaker House,29,29,29,29,29,29


In [119]:
print('There are {} unique categories.'.format(len(melbourne_venues['Venue Category'].unique())))

There are 47 unique categories.


In [120]:
# one hot encoding
melbourne_onehot = pd.get_dummies(melbourne_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
melbourne_onehot['Neighborhood'] = melbourne_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [melbourne_onehot.columns[-1]] + list(melbourne_onehot.columns[:-1])
melbourne_onehot = melbourne_onehot[fixed_columns]

melbourne_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Brazilian Restaurant,Cantonese Restaurant,Caucasian Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Lebanese Restaurant,Malay Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Persian Restaurant,Peruvian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,Soba Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Tipo 00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Tipo 00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Tipo 00,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Tipo 00,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Tipo 00,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [121]:
melbourne_onehot.shape

(2979, 48)

In [122]:
melbourne_grouped = melbourne_onehot.groupby('Neighborhood').mean().reset_index()
melbourne_grouped

Unnamed: 0,Neighborhood,African Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Brazilian Restaurant,Cantonese Restaurant,Caucasian Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Lebanese Restaurant,Malay Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Persian Restaurant,Peruvian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,Soba Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,ABC Chicken,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.073171,0.02439,0.0,0.04878,0.02439,0.0,0.0,0.0,0.0,0.0,0.073171,0.02439,0.073171,0.0,0.365854,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.02439,0.02439,0.02439,0.02439,0.02439
1,All Star Comics,0.0,0.043478,0.043478,0.0,0.043478,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.043478,0.0,0.0,0.043478,0.043478,0.043478,0.217391,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.130435,0.0,0.0,0.0
2,Arbory Bar & Eatery,0.0,0.0,0.142857,0.047619,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.190476,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.095238,0.095238,0.0,0.0,0.047619,0.0,0.0,0.0
3,Australian Centre for the Moving Image (ACMI),0.0,0.0,0.111111,0.037037,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.074074,0.0,0.0,0.111111,0.037037,0.0,0.185185,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.074074,0.074074,0.0,0.0,0.0,0.0,0.0,0.037037
4,B'cos Brazil,0.0,0.052632,0.052632,0.0,0.052632,0.0,0.0,0.0,0.052632,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.157895,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.105263,0.0,0.052632,0.0
5,Bar Americano,0.0,0.0,0.090909,0.0,0.045455,0.0,0.045455,0.045455,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136364,0.136364,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.090909,0.0,0.045455,0.045455,0.0,0.045455,0.045455
6,Bar Lourinhã,0.0,0.028571,0.142857,0.028571,0.0,0.0,0.028571,0.028571,0.0,0.028571,0.0,0.0,0.057143,0.0,0.0,0.085714,0.057143,0.0,0.171429,0.114286,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.028571,0.028571,0.0,0.0,0.028571
7,Bartronica,0.0,0.0,0.047619,0.047619,0.047619,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.047619,0.238095,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.095238,0.0,0.0,0.0
8,Beneath Driver Lane,0.0,0.04,0.04,0.0,0.04,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.04,0.08,0.08,0.0,0.08,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.04,0.04,0.08,0.0,0.0,0.0
9,Boilermaker House,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.137931,0.0,0.103448,0.068966,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.103448,0.0,0.068966,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.068966,0.0,0.068966,0.034483,0.0,0.0,0.0


In [123]:
melbourne_grouped.shape

(100, 48)
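
To see what the one-hot encoding and the grouped mean produce, the same two steps can be run on a toy table. The six rows below are invented; `dtype=float` is passed only so the indicator columns are numeric regardless of the pandas version.

```python
import pandas as pd

# Invented venues around two search centers
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Italian Restaurant", "Italian Restaurant", "Chinese Restaurant",
                       "Thai Restaurant", "Chinese Restaurant", "Thai Restaurant"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot["Neighborhood"] = toy["Neighborhood"]

# The mean of the indicators is the share of each category per neighborhood
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of the grouped table sums to 1, so the values can be read as the share of each restaurant type around that point.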

Print each neighborhood along with its top 5 most common venue categories.

In [124]:
num_top_venues = 5

for hood in melbourne_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = melbourne_grouped[melbourne_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ABC Chicken----
                   venue  freq
0      Korean Restaurant  0.37
1  Indonesian Restaurant  0.07
2     Chinese Restaurant  0.07
3    Japanese Restaurant  0.07
4  Portuguese Restaurant  0.05


----All Star Comics----
                   venue  freq
0     Italian Restaurant  0.22
1        Thai Restaurant  0.13
2  Portuguese Restaurant  0.04
3       Sushi Restaurant  0.04
4  Indonesian Restaurant  0.04


----Arbory Bar & Eatery----
                venue  freq
0  Italian Restaurant  0.19
1    Asian Restaurant  0.14
2  Spanish Restaurant  0.10
3    Sushi Restaurant  0.10
4    Ramen Restaurant  0.05


----Australian Centre for the Moving Image (ACMI)----
                 venue  freq
0   Italian Restaurant  0.19
1     Asian Restaurant  0.11
2     Greek Restaurant  0.11
3     Sushi Restaurant  0.07
4  Japanese Restaurant  0.07


----B'cos Brazil----
                 venue  freq
0   Italian Restaurant  0.16
1  Dumpling Restaurant  0.11
2     Ramen Restaurant  0.11
3      Thai Res

                   venue  freq
0     Italian Restaurant  0.14
1    Japanese Restaurant  0.11
2  Vietnamese Restaurant  0.08
3       Asian Restaurant  0.08
4       Sushi Restaurant  0.08


----Lucy Liu----
                 venue  freq
0   Italian Restaurant  0.18
1  Japanese Restaurant  0.12
2     Asian Restaurant  0.09
3     Sushi Restaurant  0.09
4   Spanish Restaurant  0.06


----Lune Croissanterie----
                 venue  freq
0   Italian Restaurant  0.16
1  Japanese Restaurant  0.12
2     Asian Restaurant  0.09
3     Sushi Restaurant  0.09
4     Greek Restaurant  0.06


----Lupino----
                 venue  freq
0   Italian Restaurant  0.17
1     Asian Restaurant  0.14
2  Japanese Restaurant  0.11
3     Greek Restaurant  0.09
4    French Restaurant  0.06


----Manchester Press----
                 venue  freq
0  Japanese Restaurant  0.15
1     Sushi Restaurant  0.12
2    Korean Restaurant  0.08
3      Thai Restaurant  0.08
4   Italian Restaurant  0.08


----Market Lane Coffee--

In [125]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
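
A quick check of this ranking logic on a single invented row is sketched below. `top_categories` is a hypothetical copy of the function above that returns a plain list, and the frequencies are made up.

```python
import pandas as pd


def top_categories(row, num_top_venues):
    # same logic as return_most_common_venues: skip the name, sort, take the head
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return list(row_categories_sorted.index.values[0:num_top_venues])


# Invented frequencies for one neighborhood, with no ties
toy_row = pd.Series({
    "Neighborhood": "Spot A",
    "Chinese Restaurant": 0.07,
    "Italian Restaurant": 0.02,
    "Korean Restaurant": 0.37,
    "Thai Restaurant": 0.05,
})
print(top_categories(toy_row, 3))
```

When several categories share the same frequency, their relative order is not guaranteed, so two runs of the ranking may list tied categories differently.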

In [126]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = melbourne_grouped['Neighborhood']

for ind in np.arange(melbourne_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(melbourne_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABC Chicken,Korean Restaurant,Chinese Restaurant,Indonesian Restaurant,Japanese Restaurant,Portuguese Restaurant,Fast Food Restaurant,Vietnamese Restaurant,Argentinian Restaurant,Dim Sum Restaurant,Filipino Restaurant
1,All Star Comics,Italian Restaurant,Thai Restaurant,Seafood Restaurant,Indonesian Restaurant,Indian Restaurant,Greek Restaurant,Middle Eastern Restaurant,French Restaurant,Portuguese Restaurant,Ramen Restaurant
2,Arbory Bar & Eatery,Italian Restaurant,Asian Restaurant,Sushi Restaurant,Spanish Restaurant,Dumpling Restaurant,Middle Eastern Restaurant,Thai Restaurant,Australian Restaurant,Brazilian Restaurant,Portuguese Restaurant
3,Australian Centre for the Moving Image (ACMI),Italian Restaurant,Asian Restaurant,Greek Restaurant,Japanese Restaurant,Sushi Restaurant,Spanish Restaurant,French Restaurant,Seafood Restaurant,Indian Restaurant,Peruvian Restaurant
4,B'cos Brazil,Italian Restaurant,Ramen Restaurant,Thai Restaurant,Dumpling Restaurant,Argentinian Restaurant,Asian Restaurant,Indonesian Restaurant,Sushi Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant


In [130]:
neighborhoods_venues_sorted.sort_values("1st Most Common Venue",inplace=False)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Dukes Coffee Roasters,Asian Restaurant,Italian Restaurant,Sushi Restaurant,Spanish Restaurant,Vietnamese Restaurant,Seafood Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant,Dumpling Restaurant,Portuguese Restaurant
23,Degraves Street,Asian Restaurant,Sushi Restaurant,Italian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Spanish Restaurant,Vietnamese Restaurant,Seafood Restaurant,Greek Restaurant,Middle Eastern Restaurant
27,Embla,Asian Restaurant,Italian Restaurant,Dumpling Restaurant,Vietnamese Restaurant,Chinese Restaurant,Japanese Restaurant,French Restaurant,Tapas Restaurant,Ramen Restaurant,Sushi Restaurant
76,Simpsons Burgers,Asian Restaurant,Dumpling Restaurant,Italian Restaurant,Vietnamese Restaurant,Chinese Restaurant,Japanese Restaurant,Greek Restaurant,Tapas Restaurant,Thai Restaurant,Ramen Restaurant
35,Ganache Chocolate,Asian Restaurant,Italian Restaurant,Japanese Restaurant,Thai Restaurant,Sushi Restaurant,Spanish Restaurant,Dumpling Restaurant,Korean Restaurant,Indonesian Restaurant,Vegetarian / Vegan Restaurant
18,Clementine's Fine Food & Gifts,Asian Restaurant,Sushi Restaurant,Italian Restaurant,Spanish Restaurant,Vietnamese Restaurant,Seafood Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Portuguese Restaurant
70,Rooftop at QT,Asian Restaurant,Japanese Restaurant,Dumpling Restaurant,Vietnamese Restaurant,Chinese Restaurant,Italian Restaurant,Sushi Restaurant,Ramen Restaurant,Greek Restaurant,Korean Restaurant
61,Oli & Levi,Asian Restaurant,Italian Restaurant,Dumpling Restaurant,Japanese Restaurant,Vietnamese Restaurant,Tapas Restaurant,Thai Restaurant,French Restaurant,Ramen Restaurant,Chinese Restaurant
66,Pidapipó Gelateria,Asian Restaurant,Italian Restaurant,Sushi Restaurant,Spanish Restaurant,Vietnamese Restaurant,Seafood Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Portuguese Restaurant
94,Union Electric,Chinese Restaurant,Dumpling Restaurant,Japanese Restaurant,Mexican Restaurant,Vietnamese Restaurant,Ramen Restaurant,Greek Restaurant,Sushi Restaurant,Tapas Restaurant,German Restaurant


In [131]:
# set number of clusters
kclusters = 5

melbourne_grouped_clustering = melbourne_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(melbourne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 2, 1, 2, 4, 1, 2, 4, 3], dtype=int32)
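
The number of clusters is fixed at 5 here. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The sketch below does this on synthetic points, not on the Melbourne data; the three blob centers, the noise scale, and the range of k are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

Running the same loop on the frequency table used for clustering above would show whether 5 clusters is a reasonable setting for this data.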

In [135]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

melbourne_merged = melbourne_data

# join the cluster label and top categories onto each venue by name;
# venues that were not used as a search center get NaN in the new columns
melbourne_merged = melbourne_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Venue')

melbourne_merged.head() # check the last columns!

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Tipo 00,-37.813527,144.961978,4.0,Japanese Restaurant,Korean Restaurant,Sushi Restaurant,Thai Restaurant,Italian Restaurant,Chinese Restaurant,Indonesian Restaurant,Mexican Restaurant,Gluten-free Restaurant,Dumpling Restaurant
1,Calia,-37.812724,144.96393,4.0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Mexican Restaurant,Greek Restaurant,Dumpling Restaurant,Asian Restaurant,Indonesian Restaurant,Tapas Restaurant,Korean Restaurant
2,Trattoria Emilia,-37.81522,144.962636,2.0,Italian Restaurant,Thai Restaurant,Ramen Restaurant,Dumpling Restaurant,Scandinavian Restaurant,Chinese Restaurant,Indonesian Restaurant,Middle Eastern Restaurant,Dim Sum Restaurant,Japanese Restaurant
3,Nosh,-37.815396,144.962999,2.0,Italian Restaurant,Ramen Restaurant,Thai Restaurant,Dumpling Restaurant,Argentinian Restaurant,Asian Restaurant,Indonesian Restaurant,Sushi Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant
4,B'cos Brazil,-37.815486,144.963085,2.0,Italian Restaurant,Ramen Restaurant,Thai Restaurant,Dumpling Restaurant,Argentinian Restaurant,Asian Restaurant,Indonesian Restaurant,Sushi Restaurant,Brazilian Restaurant,Vegetarian / Vegan Restaurant


In [136]:
melbourne_merged = melbourne_merged.dropna()

In [137]:
melbourne_merged['Cluster Labels'] = melbourne_merged['Cluster Labels'].astype(int)

In [139]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=15)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (only the first 1000 venues)
shown = melbourne_merged.head(1000)
for lat, lon, poi, cluster in zip(shown['Venue Latitude'], shown['Venue Longitude'], shown['Venue'], shown['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
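
To connect the clusters back to the original question, the share of Chinese-style restaurants can be compared across clusters. The sketch below uses an invented table. Treating the three categories in `CHINESE_FAMILY` as direct competition is an assumption made for illustration, although the category names themselves appear in the one-hot table above.

```python
import pandas as pd

# Assumed set of categories that compete directly with a Chinese restaurant
CHINESE_FAMILY = ["Chinese Restaurant", "Dumpling Restaurant", "Szechuan Restaurant"]

# Invented frequency table with a cluster label per neighborhood
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 1, 1],
    "Chinese Restaurant": [0.10, 0.14, 0.00, 0.04],
    "Dumpling Restaurant": [0.10, 0.06, 0.04, 0.00],
    "Szechuan Restaurant": [0.00, 0.00, 0.00, 0.04],
    "Italian Restaurant": [0.05, 0.00, 0.22, 0.18],
})

# Combined share of Chinese-style restaurants per neighborhood
toy["Chinese Share"] = toy[CHINESE_FAMILY].sum(axis=1)

# Average share per cluster, least competitive cluster first
by_cluster = toy.groupby("Cluster Labels")["Chinese Share"].mean().sort_values()
print(by_cluster)
```

The cluster at the top of this ranking has the least direct competition, which makes its neighborhoods the first candidates to inspect on the map.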

In [75]:
latitude

-37.8142176

In [76]:
longitude

144.9631608