# 1. Description of the problem

Cynthia would like to open a new clothing store in the city of Toronto. More specifically, she is interested in the area of downtown Toronto, so she wants to explore where are located the already existing stores so as to decide accordingly.

# 2. Description of the Data
The data that will be used are:
* **Toronto neighborhood data:**\
The following wikipedia page (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) will be used to pull information such as postal codes, boroughs and neighborhoods of Canada.
* **Venues in Toronto:**\
Foursquare API will help us to get venues by category, location and the relevant business names.

In [1]:
# Import the necessary libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
from IPython.display import display_html
import numpy as np
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
import random
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.21.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

In [2]:
# pull Canada info from wikipedia
wiki_page = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(wiki_page, 'xml')
table=str(soup.table)

In [5]:
df = pd.read_html(table)
df=df[0]
df1=df[df.Borough!='Not assigned']

In [6]:
df2 = df1.groupby(['Postcode','Borough'], sort=False).agg(', '.join)
df2.reset_index(inplace=True)
df2['Neighbourhood'] = np.where(df2['Neighbourhood'] == 'Not assigned',df2['Borough'], df2['Neighbourhood'])
df2

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


In [7]:
df2.shape

(103, 3)

In [8]:
# keep only Downtown Toronto data
downtown_data = df2[df2['Borough']=='Downtown Toronto'].reset_index(drop=True)
downtown_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M5A,Downtown Toronto,Harbourfront
1,M7A,Downtown Toronto,Queen's Park
2,M5B,Downtown Toronto,"Ryerson, Garden District"
3,M5C,Downtown Toronto,St. James Town
4,M5E,Downtown Toronto,Berczy Park


In [9]:
# Find longitude and latitude of Downtown Toronto area
address = 'Downtown Toronto'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Downtown Toronto are 43.6541737, -79.38081164513409.


In [10]:
CLIENT_ID = 'VMPSY13RK0C2I2BB3EULDMKY11ZS3TKKIUGGPHWPTQZOEJDW' # Foursquare ID
CLIENT_SECRET = 'LUDOEOOOXPHFN5XNRGFB4GRMA45GNOW5XT2J104V1T2UAJPB' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: VMPSY13RK0C2I2BB3EULDMKY11ZS3TKKIUGGPHWPTQZOEJDW
CLIENT_SECRET:LUDOEOOOXPHFN5XNRGFB4GRMA45GNOW5XT2J104V1T2UAJPB


In [80]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=VMPSY13RK0C2I2BB3EULDMKY11ZS3TKKIUGGPHWPTQZOEJDW&client_secret=LUDOEOOOXPHFN5XNRGFB4GRMA45GNOW5XT2J104V1T2UAJPB&v=20180605&ll=43.6541737,-79.38081164513409&radius=500&limit=100'

In [81]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e62584840a7ea001b97ed28'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 128,
  'suggestedBounds': {'ne': {'lat': 43.6586737045, 'lng': -79.37460365419369},
   'sw': {'lat': 43.6496736955, 'lng': -79.38701963607448}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '57eda381498ebe0e6ef40972',
       'name': 'UNIQLO ユニクロ',
       'location': {'address': '220 Yonge St',
        'crossStreet': 'at Dundas St W',
        'lat': 43.65591027779457,
        'lng': -79.38064099181345,
        'labeledLatLngs': [

In [122]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [146]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.location.postalCode', 'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)

Unnamed: 0,postalCode,name,categories,lat,lng
0,M5B 2H1,UNIQLO ユニクロ,Clothing Store,43.65591,-79.380641
1,M5B 2H1,Elgin And Winter Garden Theatres,Theater,43.653394,-79.378507
2,M5B 2H1,LUSH,Cosmetics Shop,43.653557,-79.3804
3,M5B 1V8,Ed Mirvish Theatre,Theater,43.655102,-79.379768
4,M5B 2H1,Indigo,Bookstore,43.653515,-79.380696
5,M5B 2H1,CF Toronto Eaton Centre,Shopping Mall,43.65454,-79.380677
6,M5B 2R8,Yonge-Dundas Square,Plaza,43.656054,-79.380495
7,M5G 2C9,Eggspectation Bell Trinity Square,Breakfast Spot,43.653144,-79.38198
8,M5G 1Z3,JOEY Eaton Centre,Restaurant,43.655404,-79.381929
9,M5B,Samsung Experience Store (Eaton Centre),Electronics Store,43.655648,-79.381011


In [147]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [156]:
# Select Clothing Stores
nearby_venues_cloth = nearby_venues[nearby_venues['categories']=='Clothing Store']
nearby_venues_cloth

Unnamed: 0,postalCode,name,categories,lat,lng
0,M5B 2H1,UNIQLO ユニクロ,Clothing Store,43.65591,-79.380641
13,M5B 2L9,Nordstrom,Clothing Store,43.655041,-79.380966
19,,Magic Tailor,Clothing Store,43.653742,-79.379745
20,M5B 2H1,Roots,Clothing Store,43.653613,-79.380244
27,M5B 2H1,Hollister Co.,Clothing Store,43.65448,-79.380914
38,M5B 2H1,Abercrombie & Fitch,Clothing Store,43.652915,-79.380495
45,M5B 2H1,lululemon athletica,Clothing Store,43.653394,-79.380722
92,M5B 1N8,Urban Outfitters,Clothing Store,43.654411,-79.380055


In [149]:
# create map of Downtown Toronto to have a view of Clothing Stores locations
map_downtown_toronto = folium.Map(location=[43.6541737, -79.38081164513409],zoom_start=10)

for lat,lng,name,postalCode in zip(nearby_venues_cloth['lat'],nearby_venues_cloth['lng'],nearby_venues_cloth['name'], nearby_venues_cloth['postalCode']):
    label = '{}, {}'.format(name, postalCode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_downtown_toronto)
map_downtown_toronto

## Modeling

In [157]:
# kmeans clustering
k=2
downtown_toronto_cluster = nearby_venues_cloth.drop(['name', 'categories', 'postalCode'],1)
kmeans = KMeans(n_clusters = k,random_state=0).fit(downtown_toronto_cluster)
kmeans.labels_
nearby_venues_cloth.insert(0, 'Cluster Labels', kmeans.labels_)
nearby_venues_cloth

Unnamed: 0,Cluster Labels,postalCode,name,categories,lat,lng
0,1,M5B 2H1,UNIQLO ユニクロ,Clothing Store,43.65591,-79.380641
13,1,M5B 2L9,Nordstrom,Clothing Store,43.655041,-79.380966
19,0,,Magic Tailor,Clothing Store,43.653742,-79.379745
20,0,M5B 2H1,Roots,Clothing Store,43.653613,-79.380244
27,1,M5B 2H1,Hollister Co.,Clothing Store,43.65448,-79.380914
38,0,M5B 2H1,Abercrombie & Fitch,Clothing Store,43.652915,-79.380495
45,0,M5B 2H1,lululemon athletica,Clothing Store,43.653394,-79.380722
92,0,M5B 1N8,Urban Outfitters,Clothing Store,43.654411,-79.380055


In [158]:
# create map that shows store clusters
map_clusters = folium.Map(location=[43.6541737, -79.38081164513409],zoom_start=10)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, name, postalCode, cluster in zip(nearby_venues_cloth['lat'], nearby_venues_cloth['lng'], nearby_venues_cloth['name'], nearby_venues_cloth['postalCode'], nearby_venues_cloth['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Results
We identified that the downtown Toronto clothing stores are located very close each other on a specific area. they can clearly also be separated into two groups.
As a result if Cynthia wants to open a new clothing store she should consider opening it in this area neat any of those two groups of stores.

# Conclusion
This project helps a person to get a better understanding of the neighborhoods and the existing venues per neighborhood and more specifically for a neighberhood of his choice. So if someone wants for example, to open a new store in a specific area, he can apply this code with his choice of area so that he gets a vew of the locations of existing stores and how those are grouped.