# Detecting promising neighborhoods for the next boba storefront

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data that we need](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem  <a name='introduction'></a>

Toronto's boba (bubble tea) market has grown rapidly in recent years, and competition among bubble tea brands is ongoing. This project aims to identify promising neighborhoods in Toronto for the next boba storefront. The target audience is bubble tea sellers, including large brands such as Coco, Chatime and Gong Cha, whose target consumers are young people.  

__In this project, places that are not already crowded with bubble tea cafes and that are near universities or colleges are considered optimal locations.__  

After generating a number of prospective neighborhoods, we will take a closer look at them and analyze their advantages and disadvantages, to help decision makers find the best possible site.

## Data that we need <a name='data'></a>

1. number of existing boba storefronts in the neighborhood   
2. number of universities and colleges, if any  

The following data sources will be used to extract or generate the required information:
* names of Toronto neighborhoods will be obtained from **Wikipedia**
* coordinates of neighborhoods will be obtained from **Coursera**, as a fallback in case **geocoder** cannot connect
* number of existing boba storefronts will be obtained using the **Foursquare API**
* number of universities and colleges and their locations in every neighborhood will be obtained using the **Foursquare API**
* number of shopping places and their locations in every neighborhood will be obtained using the **Foursquare API**

In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np
import string
import requests
import geocoder
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim 
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from folium.plugins import HeatMap  # needed for the boba shop heatmap layers below
from sklearn.cluster import KMeans
import csv

### Names and Coordinates of Neighborhoods 

#### The geographical scope is Toronto, so I reuse the neighbourhood data prepared last week. The Jupyter notebook with the related code is linked below.    
[applied_data_science_capstone.ipynb](https://github.com/JiaqiChen0119/coursera-IBM-capstone/blob/master/applied_data_science_capstone.ipynb)

In [3]:
toronto_data = pd.read_csv('toronto_data.csv')

In [4]:
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


#### Let's create a map showing the _distribution of neighborhoods_. The coordinates of Toronto are from Wikipedia.

In [5]:
#geographical coordinates of Toronto
latitude = 43.653226
longitude = -79.383184

In [6]:
# visualize neighborhoods in Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

With the coordinates of each neighborhood, let's use the Foursquare API to get information on __bubble tea shops, schools, and shopping malls or shopping plazas__ in each neighborhood. The search radius around each neighborhood center is set to 1 km.
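
The query for one neighbourhood can also be assembled from a parameter dictionary, which keeps each field of the request visible. The sketch below only builds the URL and sends nothing; `build_explore_url` is a hypothetical helper name, the credentials are placeholders, and the coordinates are the Agincourt centre from the table above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, category_id, radius=1000, limit=100,
                      client_id='YOUR_ID', client_secret='YOUR_SECRET',
                      version='20180605'):
    """Return the explore URL for one neighbourhood centre (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # centre of the search
        'categoryId': category_id,       # restrict results to one venue category
        'radius': radius,                # search radius in metres
        'limit': limit,                  # maximum number of venues returned
    }
    # urlencode percent-encodes the values, e.g. the comma inside 'll'
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# bubble tea shops around the Agincourt centre
url = build_explore_url(43.7942, -79.262029, '52e81612bcbc57f1066b7a0c')
```

Passing the same dictionary as `params=` to `requests.get` would be an equivalent way to send the request.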

In [10]:
CLIENT_ID = '*********' # your Foursquare ID
CLIENT_SECRET = '**********' # your Foursquare Secret
VERSION = '20180605'

Bubble_Tea_Shop = '52e81612bcbc57f1066b7a0c'  # Foursquare category ID of bubble tea shops
College_University = '4d4b7105d754a06372d81259'  # Foursquare category ID of colleges and universities
Shopping_Plaza = '5744ccdfe4b0c0459246b4dc'  # Foursquare category ID of shopping plazas
Shopping_Mall = '4bf58dd8d48988d1fd941735'  # Foursquare category ID of shopping malls

def getNearbyvenues(names,latitudes,longitudes,category,radius = 1000,LIMIT = 100):
    # query Foursquare once per neighbourhood and collect the venues of the given category
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(
                CLIENT_ID,
                CLIENT_SECRET,
                lat,
                lng,
                VERSION,
                category,
                radius,
                LIMIT)
        try:
            results = requests.get(url).json()['response']['groups'][0]['items']
            venues_list.append([(name,
                               lat,
                               lng,
                               v['venue']['name'],
                               v['venue']['location']['lat'],
                               v['venue']['location']['lng']) for v in results])
        except (requests.RequestException, ValueError, KeyError, IndexError):
            # skip this neighbourhood if the request fails or the response lacks the expected fields
            continue
    columns = ['Neighbourhood',
               'Neighbourhood Latitude',
               'Neighbourhood Longitude',
               'Venue',
               'Venue Latitude',
               'Venue Longitude']
    # passing columns here also works when no venue was found at all
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list], columns=columns)

    return nearby_venues

In [11]:
nearbybobashop = getNearbyvenues(toronto_data['Neighbourhood'],
                                  toronto_data['Latitude'],
                                  toronto_data['Longitude'],Bubble_Tea_Shop,radius = 1000,LIMIT = 100)

In [12]:
nearbybobashop.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
0,"Maryvale,Wexford",43.750071,-79.295849,Pho Metro,43.745365,-79.294462
1,Agincourt,43.7942,-79.262029,Health Oolong Tea,43.789042,-79.268513
2,Agincourt,43.7942,-79.262029,Real Fruit Bubble Tea 真果茶坊,43.797208,-79.271523
3,"Agincourt North,L'Amoreaux East,Milliken,Steel...",43.815252,-79.284577,Go For Tea,43.814701,-79.292643
4,"Agincourt North,L'Amoreaux East,Milliken,Steel...",43.815252,-79.284577,OneZo Tapioca 丸作食茶,43.815838,-79.293655


In [13]:
nearbyCollege_University = getNearbyvenues(toronto_data['Neighbourhood'],
                                  toronto_data['Latitude'],
                                  toronto_data['Longitude'],College_University,radius = 1000,LIMIT = 100)

In [14]:
nearbyCollege_University.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
0,"Rouge,Malvern",43.806686,-79.194353,Evergreen College,43.802162,-79.199654
1,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,UTSC - Physics Labs,43.780079,-79.156193
2,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,st brendan catholic school,43.783052,-79.149267
3,"Guildwood,Morningside,West Hill",43.763573,-79.188711,Z Cups Etc,43.767973,-79.188123
4,"Guildwood,Morningside,West Hill",43.763573,-79.188711,Boys and Girls Club of East Scarborough,43.757549,-79.193691


We are interested in schools because their students are young people who are likely to enjoy sweet bubble tea. Students spend most of their time at venues such as schools, colleges, offices, institutes and universities rather than at places like laboratories, so we keep only venues whose names contain any of the words __'school', 'college', 'office', 'institute' and 'university'__. 

In [15]:
uni_words = ['school', 'college', 'office', 'institute','university']
uni_venues = []
for venue in nearbyCollege_University['Venue']:
    words=venue.lower()
    # keep the venue once if its lower-cased name contains at least one keyword
    if any(uni_word in words for uni_word in uni_words):
        print(words)
        uni_venues.append(venue)

evergreen college
st brendan catholic school
st. richard catholic school
oxford college
uwin pro o/a the canadian college for higher studies
cedarbrook jr public school
salaheddin islamic school
computek college
everest college - scarborough
institute of technical trades
brown's schoolhouse
cdi college scarborough
bond academy private school
medix college
trios college ica
trios college pta/ota
trios college career services
trios college scarborough
trios college book room
stephen leacock collegiate institute
success tutorial school
cliffwood public school
canadian memorial chiropractic college
william lyon mackenzie collegiate institute
master's college & seminary
umc high school
school of design and arts - room a802
yamaha music school
upper madison college
tdsb head office
toronto district school board
toronto catholic district school board
st. andrew's junior high school
eitz chaim day schools - administrative/patricia branch
toronto international college
metamorphosis greek orthod
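
As a variation on the filter above, the keywords can be compiled into a single case-insensitive regular expression, so each venue name is tested once. This is a sketch on a few names taken from the outputs above; `is_school_like` is a hypothetical helper, and like the original loop it matches substrings, so a name such as "Brown's Schoolhouse" still passes.

```python
import re

uni_words = ['school', 'college', 'office', 'institute', 'university']
# one alternation pattern: school|college|office|institute|university
pattern = re.compile('|'.join(map(re.escape, uni_words)), re.IGNORECASE)

def is_school_like(venue_name):
    """True if the venue name contains any of the keywords (hypothetical helper)."""
    return pattern.search(venue_name) is not None

sample = ['Evergreen College', 'UTSC - Physics Labs', "Brown's Schoolhouse",
          'Z Cups Etc', 'TDSB Head Office']
kept = [name for name in sample if is_school_like(name)]
```

Note that the keyword 'office' also keeps administrative venues such as a school board head office, which may or may not be desirable for this analysis.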

In [16]:
nearbyCollege_University1 = nearbyCollege_University[nearbyCollege_University['Venue'].isin(uni_venues)]

In [17]:
nearbyCollege_University1.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
0,"Rouge,Malvern",43.806686,-79.194353,Evergreen College,43.802162,-79.199654
2,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,st brendan catholic school,43.783052,-79.149267
5,Cedarbrae,43.773136,-79.239476,St. Richard Catholic School,43.768646,-79.240365
8,Cedarbrae,43.773136,-79.239476,Oxford College,43.777108,-79.250061
9,Cedarbrae,43.773136,-79.239476,UWIN Pro O/A the Canadian College for Higher S...,43.776911,-79.246111


__Shopping malls and shopping plazas__ attract visitors and residents, who are potential consumers of bubble tea.

In [18]:
nearby_shopping_plaza = getNearbyvenues(toronto_data['Neighbourhood'],
                                  toronto_data['Latitude'],
                                  toronto_data['Longitude'],Shopping_Plaza,radius = 1000,LIMIT = 100)

In [20]:
nearby_shopping_plaza.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
0,"Guildwood,Morningside,West Hill",43.763573,-79.188711,Westhill Plaza,43.768039,-79.189845
1,"Dorset Park,Scarborough Town Centre,Wexford He...",43.75741,-79.273304,Midland Lawrence Plaza,43.756688,-79.265784
2,"Clarks Corners,Sullivan,Tam O'Shanter",43.781638,-79.304302,Warden Plaza,43.785275,-79.31041
3,"Agincourt North,L'Amoreaux East,Milliken,Steel...",43.815252,-79.284577,Midland Square,43.818752,-79.290249
4,"Agincourt North,L'Amoreaux East,Milliken,Steel...",43.815252,-79.284577,Maxim ii,43.814862,-79.293021


In [21]:
nearby_shopping_mall =getNearbyvenues(toronto_data['Neighbourhood'],
                                  toronto_data['Latitude'],
                                  toronto_data['Longitude'],Shopping_Mall,radius = 1000,LIMIT = 100)

In [22]:
nearby_shopping_mall.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
0,"Rouge,Malvern",43.806686,-79.194353,Pleasant Corner,43.801164,-79.200254
1,"Guildwood,Morningside,West Hill",43.763573,-79.188711,Kingston Square,43.76986,-79.187158
2,"Guildwood,Morningside,West Hill",43.763573,-79.188711,Morningside Crossing,43.770599,-79.185541
3,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029,Midland Plaza,43.73473,-79.259544
4,"Maryvale,Wexford",43.750071,-79.295849,Wexford Heights Plaza,43.746136,-79.293782


In [23]:
nearby_shopping = pd.concat([nearby_shopping_plaza,nearby_shopping_mall],axis = 0)

__Let's create an interactive map showing all the neighbourhoods, schools, shopping malls and shopping plazas in Toronto in different colors. The density of boba shops is expressed via a heatmap. By toggling the layers in the layer control, we can explore how the different types of venues (boba shops, schools, shopping malls and shopping plazas) relate to each other and try to discover useful information.__

In [168]:
# create map
lgd_txt = '<span style="color: {col};">{txt}</span>'
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
fgn = folium.FeatureGroup(name= lgd_txt.format( txt= 'neighborhoods',col='blue'))
fgs = folium.FeatureGroup(name= lgd_txt.format( txt= 'schools',col='green'))
fgsp = folium.FeatureGroup(name= lgd_txt.format( txt= 'shopping malls',col='orange'))
fgbb = folium.FeatureGroup(name= lgd_txt.format( txt= 'boba storefronts heatmap',col='red'))
# add markers to the map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgn) 
map_Toronto.add_child(fgn)
for lat, lng, label in zip(nearbyCollege_University1['Venue Latitude'], nearbyCollege_University1['Venue Longitude'], nearbyCollege_University1['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgs)
map_Toronto.add_child(fgs)

for lat, lng, label in zip(nearby_shopping['Venue Latitude'], nearby_shopping['Venue Longitude'], nearby_shopping['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgsp)
map_Toronto.add_child(fgsp)
HeatMap(nearbybobashop[['Venue Latitude','Venue Longitude']],radius=25, blur=10).add_to(fgbb)
map_Toronto.add_child(fgbb)
folium.map.LayerControl('bottomright', collapsed= False).add_to(map_Toronto)
map_Toronto   

Now we have all the boba shops, school venues and shopping venues in the neighbourhoods of Toronto, and we can roughly estimate the number of boba shops per borough: Old Toronto and North York have the most, while Scarborough, East York, Mississauga and Etobicoke have not been "invaded" by a large number of boba shops. 

We're now ready to combine these data for analysis and produce the report on optimal locations for a new boba storefront.

In [54]:
neighborhoodslist = toronto_data[['Neighbourhood','Latitude','Longitude']]

In [55]:
neighborhoodslist.set_index('Neighbourhood',inplace = True)

In [56]:
neighborhoodslist.head(1)

Neighbourhood,Latitude,Longitude
"Rouge,Malvern",43.806686,-79.194353


In [57]:
neighborhoodslist1 = pd.merge(neighborhoodslist, pd.DataFrame(nearbybobashop.groupby('Neighbourhood').count()['Venue']), how='outer',left_index=True, right_index=True)
neighborhoodslist1.rename(columns={'Venue':'bobashop'}, inplace=True)

In [58]:
neighborhoodslist1.head()

Neighbourhood,Latitude,Longitude,bobashop
"Adelaide,King,Richmond",43.650571,-79.384568,27.0
Agincourt,43.7942,-79.262029,2.0
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",43.815252,-79.284577,2.0
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",43.739416,-79.588437,1.0
"Alderwood,Long Branch",43.602414,-79.543484,


In [59]:
neighborhoodslist2 = pd.merge(neighborhoodslist1, pd.DataFrame(nearbyCollege_University1.groupby('Neighbourhood').count()['Venue']), how='outer',left_index=True, right_index=True)
neighborhoodslist2.rename(columns={'Venue':'school'}, inplace=True)

In [60]:
neighborhoodslist3 = pd.merge(neighborhoodslist2, pd.DataFrame(nearby_shopping.groupby('Neighbourhood').count()['Venue']), how='outer',left_index=True, right_index=True)
neighborhoodslist3.rename(columns={'Venue':'shopping'}, inplace=True)

In [61]:
neighborhoodslist3.fillna(0,inplace = True)
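
The three merges followed by `fillna(0)` amount to counting venues per neighbourhood and writing 0 wherever a neighbourhood has no match. A minimal sketch of that logic in plain Python follows; the school venue names are invented for illustration and `count_per_neighbourhood` is a hypothetical helper.

```python
from collections import Counter

neighbourhoods = ['Agincourt', 'Alderwood,Long Branch', 'Woburn']

# one (neighbourhood, venue) pair per venue returned for a neighbourhood
boba_rows = [('Agincourt', 'Health Oolong Tea'),
             ('Agincourt', 'Real Fruit Bubble Tea')]
school_rows = [('Agincourt', 'Example College'),              # invented name
               ('Alderwood,Long Branch', 'Example School')]   # invented name

def count_per_neighbourhood(rows):
    """Count how many venue rows each neighbourhood has (hypothetical helper)."""
    return Counter(name for name, _venue in rows)

boba = count_per_neighbourhood(boba_rows)
school = count_per_neighbourhood(school_rows)

# a Counter returns 0 for missing keys, which plays the role of fillna(0)
table = {n: {'bobashop': boba[n], 'school': school[n]} for n in neighbourhoods}
```

Here `table['Woburn']` holds zeros for both columns, just as a neighbourhood without any matching venue ends up with 0 after `fillna`.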

## Methodology <a name="methodology"></a>

The goal of this project is to provide a list of promising neighbourhoods for new boba storefronts. These neighbourhoods should satisfy the primary criterion of a low density of boba shops. Among those, we give preference to neighbourhoods with as many schools and shopping venues as possible.  

To clearly define the search area, we only consider venues within a radius of 1000 meters of the given coordinates of each neighbourhood.  
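
To check what the 1000 m radius means for a concrete venue, the great-circle distance between a neighbourhood centre and a venue can be computed with the haversine formula. This is a sketch assuming a spherical Earth with a mean radius of 6371 km; `haversine_m` is a hypothetical helper, and the two coordinate pairs are the Agincourt centre and one of its boba venues from the tables above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Agincourt centre -> Health Oolong Tea
distance = haversine_m(43.7942, -79.262029, 43.789042, -79.268513)
within_radius = distance <= 1000
```

For this pair the distance comes out at a little under 800 m, so the venue lies inside the search radius.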

Above we have collected the required **data: the coordinates of each neighbourhood in Toronto, and the location and number of boba shops, school venues and shopping venues within 1 km of each neighbourhood center**.

In the next step, we will use a machine learning method (k-means clustering) to create clusters and look for those that meet the expectations for promising candidates. After clustering, we will also filter for promising neighbourhoods that have no more than 5 boba shops and more than 3 school venues.  

In the final part, we will visualize the clustering result and the filter result on a map. We will present all candidate neighbourhoods for new boba stores on the map, to offer a 'neighbourhood-level' recommendation to a specific boba brand.


#### Try the k-means machine learning method

In [62]:
neighborhoodslist3 = neighborhoodslist3.reset_index()
# set number of clusters
kclusters = 4

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighborhoodslist3[['bobashop','school','shopping']])

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 1, 1, 1, 1, 3, 1])

In [63]:
# add clustering labels
neighborhoodslist3.insert(0, 'Cluster Labels', kmeans.labels_)
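
To make the clustering step less of a black box, here is a small pure-Python sketch of the assign-and-update loop that k-means performs on `(bobashop, school, shopping)` vectors. It is not scikit-learn's implementation: the farthest-first initialisation and the helper names are assumptions chosen so that the toy run is deterministic, and the six points are a handful of rows from this notebook's tables.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Start with the first point, then repeatedly add the point farthest from
    the centroids chosen so far (an assumption of this sketch, not sklearn's init)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# (bobashop, school, shopping) counts: three low-count and three high-count rows
points = [(2, 5, 5), (0, 1, 2), (1, 2, 1), (27, 26, 20), (24, 30, 10), (45, 37, 16)]
labels, centroids = kmeans(points, k=2)
```

On this toy input the low-count and high-count neighbourhoods end up in separate clusters. Because the distances are computed on raw counts, columns with larger values weigh more heavily in the result.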

Manually select promising neighbourhoods that have no more than 5 boba shops and more than 3 schools.

In [64]:
neighborhoodslist3.head()

Unnamed: 0,Cluster Labels,Neighbourhood,Latitude,Longitude,bobashop,school,shopping
0,0,"Adelaide,King,Richmond",43.650571,-79.384568,27.0,26.0,20.0
1,1,Agincourt,43.7942,-79.262029,2.0,5.0,5.0
2,1,"Agincourt North,L'Amoreaux East,Milliken,Steel...",43.815252,-79.284577,2.0,1.0,5.0
3,1,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",43.739416,-79.588437,1.0,2.0,1.0
4,1,"Alderwood,Long Branch",43.602414,-79.543484,0.0,1.0,2.0


In [65]:
prolist = neighborhoodslist3[(neighborhoodslist3['bobashop']<6)&(neighborhoodslist3['school']>3)].sort_values(by=['school','shopping'],ascending = False)
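
The selection rule in the cell above can be spelled out on a few rows from the tables in this notebook. This is a plain-Python sketch of the same thresholds and sort order; the constant names are introduced here for readability only.

```python
rows = [
    {'Neighbourhood': 'Agincourt', 'bobashop': 2, 'school': 5, 'shopping': 5},
    {'Neighbourhood': 'Harbord,University of Toronto', 'bobashop': 8, 'school': 28, 'shopping': 2},
    {'Neighbourhood': 'Berczy Park', 'bobashop': 2, 'school': 14, 'shopping': 6},
    {'Neighbourhood': 'Alderwood,Long Branch', 'bobashop': 0, 'school': 1, 'shopping': 2},
]

MAX_BOBA = 5     # "no more than 5 boba shops", i.e. bobashop < 6
MIN_SCHOOLS = 4  # "more than 3 schools", i.e. school > 3

selected = [r for r in rows
            if r['bobashop'] <= MAX_BOBA and r['school'] >= MIN_SCHOOLS]
# most schools first, ties broken by the number of shopping venues
selected.sort(key=lambda r: (r['school'], r['shopping']), reverse=True)
names = [r['Neighbourhood'] for r in selected]
```

Harbord,University of Toronto is dropped because it has 8 boba shops, and Alderwood,Long Branch because it has only 1 school venue.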

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's take a closer look at each cluster we got from k-means:

There are 7 neighbourhoods in cluster 0, and their numbers of boba shops, schools and shopping venues are close to one another.

In [67]:
neighborhoodslist3[neighborhoodslist3['Cluster Labels'] == 0].reset_index()

Unnamed: 0,index,Cluster Labels,Neighbourhood,Latitude,Longitude,bobashop,school,shopping
0,0,0,"Adelaide,King,Richmond",43.650571,-79.384568,27.0,26.0,20.0
1,20,0,"Chinatown,Grange Park,Kensington Market",43.653206,-79.400049,19.0,28.0,7.0
2,22,0,Church and Wellesley,43.66586,-79.38316,24.0,30.0,10.0
3,27,0,"Commerce Court,Victoria Hotel",43.648198,-79.379817,19.0,18.0,18.0
4,32,0,"Design Exchange,Toronto Dominion Centre",43.647177,-79.381576,16.0,15.0,18.0
5,44,0,"First Canadian Place,Underground city",43.648429,-79.38228,21.0,18.0,20.0
6,83,0,St. James Town,43.651494,-79.375418,17.0,34.0,16.0


There are 89 neighbourhoods in cluster 1, the biggest cluster. Most of them are far from schools and shopping areas. This cluster is consistent with the assumption that a low density of schools and shopping venues hinders the development of the boba market.

In [68]:
neighborhoodslist3[neighborhoodslist3['Cluster Labels'] == 1].reset_index()

Unnamed: 0,index,Cluster Labels,Neighbourhood,Latitude,Longitude,bobashop,school,shopping
0,1,1,Agincourt,43.794200,-79.262029,2.0,5.0,5.0
1,2,1,"Agincourt North,L'Amoreaux East,Milliken,Steel...",43.815252,-79.284577,2.0,1.0,5.0
2,3,1,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",43.739416,-79.588437,1.0,2.0,1.0
3,4,1,"Alderwood,Long Branch",43.602414,-79.543484,0.0,1.0,2.0
4,5,1,"Bathurst Manor,Downsview North,Wilson Heights",43.754328,-79.442259,0.0,0.0,1.0
5,6,1,Bayview Village,43.786947,-79.385975,0.0,0.0,2.0
6,7,1,"Bedford Park,Lawrence Manor East",43.733283,-79.419750,0.0,0.0,0.0
7,9,1,"Birch Cliff,Cliffside West",43.692657,-79.264848,0.0,1.0,0.0
8,10,1,"Bloordale Gardens,Eringate,Markland Wood,Old B...",43.643515,-79.577201,0.0,1.0,2.0
9,11,1,"Brockton,Exhibition Place,Parkdale Village",43.636847,-79.428191,2.0,1.0,3.0


Cluster 2, with 3 neighbourhoods, is distinguished by an extremely high number of boba shops, although it also has a large number of schools and shopping places.

In [69]:
neighborhoodslist3[neighborhoodslist3['Cluster Labels'] == 2]

Unnamed: 0,Cluster Labels,Neighbourhood,Latitude,Longitude,bobashop,school,shopping
19,2,Central Bay Street,43.657952,-79.387383,45.0,37.0,16.0
75,2,Queen's Park,43.662301,-79.389494,35.0,33.0,10.0
80,2,"Ryerson,Garden District",43.657162,-79.378937,28.0,37.0,16.0


The 4 neighbourhoods in cluster 3 have fewer boba shops than those in cluster 0 and cluster 2, but more schools and shopping places than those in cluster 1.

In [77]:
neighborhoodslist3[neighborhoodslist3['Cluster Labels'] == 3].reset_index(drop=True)

Unnamed: 0,Cluster Labels,Neighbourhood,Latitude,Longitude,bobashop,school,shopping
0,3,Berczy Park,43.644771,-79.373306,2.0,14.0,6.0
1,3,"Harbord,University of Toronto",43.662696,-79.400049,8.0,28.0,2.0
2,3,"Harbourfront,Regent Park",43.65426,-79.360636,1.0,17.0,1.0
3,3,Stn A PO Boxes 25 The Esplanade,43.646435,-79.374846,5.0,20.0,11.0


__After looking through the 4 clusters, we can regard cluster 3 as the one that meets the criterion for promising starting points for the next boba storefront.__  
Now we can visualize __the resulting clusters on a map of Toronto__ and show the __promising neighbourhoods in cluster 3 on the heatmap of boba shops__.

In [71]:
neighborhoodslist3.groupby('Cluster Labels').mean()

Cluster Labels,Latitude,Longitude,bobashop,school,shopping
0,43.652134,-79.383838,20.428571,24.142857,15.571429
1,43.71263,-79.399497,0.808989,1.550562,1.370787
2,43.659139,-79.385271,36.0,35.666667,14.0
3,43.65204,-79.377209,4.0,19.75,5.0


In [184]:
neighborhoodslist3.reset_index(inplace = True)

In [186]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(neighborhoodslist3['Latitude'], neighborhoodslist3['Longitude'], neighborhoodslist3['Neighbourhood'], neighborhoodslist3['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [160]:
proneis = neighborhoodslist3[neighborhoodslist3['Cluster Labels'] == 3]
# create map
lgd_txt = '<span style="color: {col};">{txt}</span>'
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
fgn = folium.FeatureGroup(name= lgd_txt.format( txt= 'neighborhoods',col='blue'))
fgs = folium.FeatureGroup(name= lgd_txt.format( txt= 'schools',col='green'))
fgsp = folium.FeatureGroup(name= lgd_txt.format( txt= 'shopping malls',col='orange'))
fgbb = folium.FeatureGroup(name= lgd_txt.format( txt= 'boba storefronts heatmap',col='red'))
# add markers to the map
for lat, lng, label in zip(proneis['Latitude'], proneis['Longitude'], proneis['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgn) 
map_Toronto.add_child(fgn)
for lat, lng, label in zip(nearbyCollege_University1['Venue Latitude'], nearbyCollege_University1['Venue Longitude'], nearbyCollege_University1['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgs)
map_Toronto.add_child(fgs)

for lat, lng, label in zip(nearby_shopping['Venue Latitude'], nearby_shopping['Venue Longitude'], nearby_shopping['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgsp)
map_Toronto.add_child(fgsp)
HeatMap(nearbybobashop[['Venue Latitude','Venue Longitude']],radius=25, blur=10).add_to(fgbb)
map_Toronto.add_child(fgbb)
folium.map.LayerControl('bottomright', collapsed= False).add_to(map_Toronto)
map_Toronto   

Let's look at the manually selected neighbourhoods.

In [250]:
# create map
lgd_txt = '<span style="color: {col};">{txt}</span>'
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
fgn = folium.FeatureGroup(name= lgd_txt.format( txt= 'promising neighborhoods',col='blue'))
fgs = folium.FeatureGroup(name= lgd_txt.format( txt= 'schools',col='green'))
fgsp = folium.FeatureGroup(name= lgd_txt.format( txt= 'shopping malls',col='orange'))
fgbb = folium.FeatureGroup(name= lgd_txt.format( txt= 'boba storefronts heatmap',col='red'))
# add markers to the map
for lat, lng, label in zip(prolist['Latitude'], prolist['Longitude'], prolist['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgn) 
map_Toronto.add_child(fgn)
for lat, lng, label in zip(nearbyCollege_University1['Venue Latitude'], nearbyCollege_University1['Venue Longitude'], nearbyCollege_University1['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgs)
map_Toronto.add_child(fgs)

for lat, lng, label in zip(nearby_shopping['Venue Latitude'], nearby_shopping['Venue Longitude'], nearby_shopping['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(fgsp)
map_Toronto.add_child(fgsp)
HeatMap(nearbybobashop[['Venue Latitude','Venue Longitude']],radius=25, blur=10).add_to(fgbb)
map_Toronto.add_child(fgbb)
folium.map.LayerControl('bottomright', collapsed= False).add_to(map_Toronto)
map_Toronto   

Check which neighbourhoods appear in both cluster 3 and the manual selection.

In [75]:
prolist.reset_index(drop=True)

Unnamed: 0,Cluster Labels,Neighbourhood,Latitude,Longitude,bobashop,school,shopping
0,3,Stn A PO Boxes 25 The Esplanade,43.646435,-79.374846,5.0,20.0,11.0
1,3,"Harbourfront,Regent Park",43.65426,-79.360636,1.0,17.0,1.0
2,3,Berczy Park,43.644771,-79.373306,2.0,14.0,6.0
3,1,"The Annex,North Midtown,Yorkville",43.67271,-79.405678,4.0,8.0,2.0
4,1,Davisville,43.704324,-79.38879,4.0,7.0,3.0
5,1,Agincourt,43.7942,-79.262029,2.0,5.0,5.0
6,1,Davisville North,43.712751,-79.390197,4.0,5.0,3.0
7,1,Don Mills North,43.745906,-79.352188,0.0,5.0,0.0
8,1,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049,0.0,4.0,4.0
9,1,"Little Portugal,Trinity",43.647927,-79.41975,0.0,4.0,3.0


In [253]:
proneis.set_index('Neighbourhood')

Neighbourhood,Cluster Labels,Latitude,Longitude,bobashop,school,shopping
Berczy Park,3,43.644771,-79.373306,2.0,14.0,6.0
"Harbord,University of Toronto",3,43.662696,-79.400049,8.0,28.0,2.0
"Harbourfront,Regent Park",3,43.65426,-79.360636,1.0,17.0,1.0
Stn A PO Boxes 25 The Esplanade,3,43.646435,-79.374846,5.0,20.0,11.0


In [255]:
pd.merge(proneis.set_index('Neighbourhood'),prolist.set_index('Neighbourhood'),left_index=True, right_index=True)

Neighbourhood,Cluster Labels_x,Latitude_x,Longitude_x,bobashop_x,school_x,shopping_x,index,Cluster Labels_y,Latitude_y,Longitude_y,bobashop_y,school_y,shopping_y
Berczy Park,3,43.644771,-79.373306,2.0,14.0,6.0,8,3,43.644771,-79.373306,2.0,14.0,6.0
"Harbourfront,Regent Park",3,43.65426,-79.360636,1.0,17.0,1.0,51,3,43.65426,-79.360636,1.0,17.0,1.0
Stn A PO Boxes 25 The Esplanade,3,43.646435,-79.374846,5.0,20.0,11.0,84,3,43.646435,-79.374846,5.0,20.0,11.0


Berczy Park; Harbourfront,Regent Park; and Stn A PO Boxes 25 The Esplanade are selected by both approaches, and they are all located in the south of Old Toronto near the lakeshore.
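
The same comparison can be expressed with set operations on the neighbourhood names. The sketch below uses the cluster 3 members and the manually selected rows whose names are shown in full above; the truncated name is left out.

```python
cluster3 = {
    'Berczy Park',
    'Harbord,University of Toronto',
    'Harbourfront,Regent Park',
    'Stn A PO Boxes 25 The Esplanade',
}
manual = {
    'Stn A PO Boxes 25 The Esplanade',
    'Harbourfront,Regent Park',
    'Berczy Park',
    'The Annex,North Midtown,Yorkville',
    'Davisville',
    'Agincourt',
    'Davisville North',
    'Don Mills North',
    'Little Portugal,Trinity',
}

common = sorted(cluster3 & manual)       # selected by both approaches
only_kmeans = sorted(cluster3 - manual)  # in cluster 3 but rejected by the manual filter
```

The only cluster 3 neighbourhood missing from the manual selection is Harbord,University of Toronto, whose 8 boba shops exceed the limit of 5.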

## Results and Discussion <a name="results"></a>

The purpose of this project was to detect promising neighborhoods for a new boba storefront: neighborhoods with a low density of bubble tea shops but a high density of schools and shopping malls. We first collected all bubble tea shops, schools and shopping places within 1 km of each neighborhood center. Plotting a heatmap gave a rough picture of the distribution of boba shops in Toronto. To reach the goal, we tried two methods: k-means clustering and manual selection. The k-means result contains 4 neighborhoods and the manual selection 13. The overlap of 3 neighborhoods lends some support to the clustering result.  

The k-means result is a cluster containing Berczy Park; Harbord,University of Toronto; Harbourfront,Regent Park; and Stn A PO Boxes 25 The Esplanade. The manual selection contains 13 neighborhoods. The ones selected by both are Berczy Park; Harbourfront,Regent Park; and Stn A PO Boxes 25 The Esplanade. However, these results certainly do not imply that these neighborhoods are ideal locations for a new boba storefront. Our goal in this analysis is to detect neighborhoods with as few boba shops and as many schools and shopping places as possible within 1 km. It is entirely possible that there are factors that stop boba brands from opening storefronts in these neighborhoods, such as low consumer spending in the neighborhood, high rent, or pressure from strong competitors. For further research, the top 10 boba brands, sorted by number of boba storefronts in Toronto, are listed below.

In [35]:
pd.DataFrame(nearbybobashop.groupby('Venue').count().sort_values(by='Neighbourhood',ascending = False)['Neighbourhood'][:10])

Venue,Neighbourhood
Real Fruit Bubble Tea,20
Tea Shop 168,15
Sharetea,12
Chatime,11
Presotea,10
CoCo Fresh Tea & Juice,9
Chatime Atealier,9
Chatime 日出茶太,9
The Alley,9
ZenQ,8


In [31]:
pd.DataFrame(nearbybobashop.groupby('Venue').count().sort_values(by='Neighbourhood',ascending = False).index.values[:10]).rename(columns={0:'boba brand'})

Unnamed: 0,boba brand
0,Real Fruit Bubble Tea
1,Tea Shop 168
2,Sharetea
3,Chatime
4,Presotea
5,CoCo Fresh Tea & Juice
6,Chatime Atealier
7,Chatime 日出茶太
8,The Alley
9,ZenQ
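
One caveat when reading this ranking: each row of `nearbybobashop` is a (neighbourhood, venue) pair, so a shop lying within 1 km of several neighbourhood centres is counted once per neighbourhood. A sketch of counting distinct storefronts instead, by de-duplicating on venue name and coordinates first, is shown below; all rows are invented for illustration.

```python
from collections import Counter

# (neighbourhood, venue name, venue latitude, venue longitude), invented rows
rows = [
    ('Neighbourhood A', 'Example Tea', 43.6500, -79.3800),
    ('Neighbourhood B', 'Example Tea', 43.6500, -79.3800),  # same shop, seen from a second centre
    ('Neighbourhood B', 'Example Tea', 43.6600, -79.3900),  # a different branch
    ('Neighbourhood C', 'Other Boba', 43.7000, -79.4000),
]

# what the groupby on 'Venue' counts: one per (neighbourhood, venue) row
pair_counts = Counter(venue for _n, venue, _lat, _lng in rows)

# distinct storefronts: the same name at the same coordinates counts once
storefronts = {(venue, lat, lng) for _n, venue, lat, lng in rows}
storefront_counts = Counter(venue for venue, _lat, _lng in storefronts)
```

In this toy data 'Example Tea' has 3 rows but only 2 distinct storefronts. Brand names that Foursquare lists under several spellings would additionally need to be merged by hand before comparing brands.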


The recommended neighbourhoods should therefore be considered only as a starting point for a more detailed analysis, which could eventually identify a location that not only lacks nearby competition but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The final decision on the optimal location will be made by the stakeholders of boba shops, based on the specific characteristics of neighborhoods and locations in every recommended zone and taking into account additional factors such as the attractiveness of each location (traffic, parking, sightseeing attractions), resident composition, real estate availability, prices, and the social and economic dynamics of every neighborhood.