# A- Introduction and Business Problem

### A company that sells chocolate wants to enter a new market by targeting coffee shops. It wants to know the most promising places in Canada to start a social media marketing campaign. The target region has to be relatively small, with a high number of well-rated coffee shops.

### To provide the best recommendations for the chocolate company, we propose choosing a well-connected city with a relatively large population. Then, we will find the neighborhoods where coffee shops are the most frequent venues and choose the best places among them based on ratings.

### For the location, we propose Toronto for the following reasons:
1-	Toronto is the capital of Ontario and the most populous city in Canada (around 3 million people in 2018). The Toronto census metropolitan area (CMA) has a population of approximately 6 million, which makes it Canada's most populous metropolitan area.

2-	Toronto is also an international center of “business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world”, as per Wikipedia.

3-	The Toronto area is interspersed with rivers, ravines and forests, and the city covers 630.2 km². The city has a diverse population and is an important destination for immigrants to Canada.

4-	The city is a center for music, theatre, movie production and television production. It contains cultural institutions like museums, galleries, festivals and entertainment districts, national historic sites, and sports centers, and it attracts over 43 million tourists each year.

5-	The Toronto Stock Exchange, the headquarters of Canada's five largest banks, and many multinational corporations are also located in Toronto. Also as per Wikipedia, “Its economy is highly diversified with strengths in technology, design, financial services, life sciences, education, arts, fashion, aerospace, environmental innovation, food services, and tourism.”

6-	Toronto is a great distribution point for the industrial sector. The city has a strategic position along the Quebec City–Windsor Corridor and has well-connected infrastructure, with roads and rails linking it to the surrounding cities.

### Therefore, as part of this project, we will list and visualize the coffee shops to target in the social media marketing strategy for selling new chocolate bars in Toronto. The targets are shops with good ratings, located in neighborhoods where coffee shops are the most frequent venue type.

---------

# B- Data:

### For this project we need the following data. We will also download all the dependencies that the notebook needs:

1•	Toronto data that lists the neighborhoods. We will rely on postal codes to represent neighborhoods, taking the table from Wikipedia.

    To get their latitudes and longitudes, we will rely on the data source: https://cocl.us/Geospatial_data

    Description: This data set contains the postal code, latitude and longitude of each area.
    We will use it to explore the neighborhoods of Toronto while focusing on coffee shops.

2•	Coffee shops in each neighborhood of Toronto.

    Data source: Foursquare API

    Description: By using this API, we will get the coffee-related venues in each neighborhood.
    We can then filter these venues to keep only the categories with the highest frequencies.

3•	Maps:

    We will rely on Folium to get the maps and visualize the locations of the chosen shops.


In [5]:
# Getting the data
# Downloading all the dependencies

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
try:
    from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize # location in older pandas versions
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means for the clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
from bs4 import BeautifulSoup
import lxml
print('Libraries imported.')

Libraries imported.


In [7]:
#Getting data and preparing it for analysis

In [8]:
# download data and parse it:
r = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(r.text, 'html.parser')
table=soup.find('table', attrs={'class':'wikitable sortable'})

#get headers:
headers=table.findAll('th')
for i, head in enumerate(headers): headers[i]=str(headers[i]).replace("<th>","").replace("</th>","").replace("\n","")

#Find all items and skip first one:
rows=table.findAll('tr')
rows=rows[1:len(rows)]

# skip all meta symbols and line feeds between rows:
for i, row in enumerate(rows): rows[i] = str(rows[i]).replace("\n</td></tr>","").replace("<tr>\n<td>","")

# make dataframe, expand rows and drop the old one:
df=pd.DataFrame(rows)
df[headers] = df[0].str.split("</td>\n<td>", n = 2, expand = True) 
df.drop(columns=[0],inplace=True)
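The string replacements above depend on the exact markup of the page. As a hedged alternative, the sketch below extracts the cell text of a table with Python's standard `html.parser`. The class name `TableRows` and the inline HTML sample are ours and only imitate the layout of the Wikipedia table.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> still arrives here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Illustrative sample, not the real page content.
html = """<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York" title="North York">North York</a></td><td>Parkwoods</td></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
</table>"""

parser = TableRows()
parser.feed(html)
header, *records = parser.rows
print(header)
print(records)
```

Because the parser only keeps text nodes, link markup around a borough name does not need a separate clean-up step.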

In [9]:
# skip not assigned boroughs:
df = df.drop(df[(df.Borough == "Not assigned")].index)
# give "Not assigned" Neighborhoods same name as Borough:
df.Neighbourhood.replace("Not assigned", df.Borough, inplace=True)

# copy Borough value to Neighborhood if NaN:
df.Neighbourhood.fillna(df.Borough, inplace=True)
# drop duplicate rows:
df=df.drop_duplicates()

# extract titles from columns
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('title')
    ].str.extract('title=\"([^\"]*)',expand=False))

df.update(
    df.Borough.loc[
        lambda x: x.str.contains('title')
    ].str.extract('title=\"([^\"]*)',expand=False))

# delete Toronto annotation from Neighbourhood:
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('Toronto')
    ].str.replace(", Toronto",""))
df.update(
    df.Neighbourhood.loc[
        lambda x: x.str.contains('Toronto')
    ].str.replace("\(Toronto\)",""))


In [10]:
# combine multiple neighborhoods with the same post code
df2 = pd.DataFrame({'Postcode':df.Postcode.unique()})
df2['Borough']=pd.DataFrame(list(set(df['Borough'].loc[df['Postcode'] == x['Postcode']])) for i, x in df2.iterrows())
df2['Neighborhood']=pd.Series(list(set(df['Neighbourhood'].loc[df['Postcode'] == x['Postcode']])) for i, x in df2.iterrows())
df2['Neighborhood']=df2['Neighborhood'].apply(lambda x: ', '.join(x))
df2.dtypes

df2.head()

|   | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park |
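Combining neighborhoods that share a postal code can also be written as a single `groupby`. The sketch below is a possible alternative on toy rows shaped like the scraped table; the values are illustrative, and it assumes every postal code belongs to exactly one borough.

```python
import pandas as pd

# Toy rows in the same shape as the scraped table (values are illustrative).
raw = pd.DataFrame({
    "Postcode": ["M3A", "M6A", "M6A", "M7A"],
    "Borough": ["North York", "North York", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Lawrence Heights", "Lawrence Manor", "Queen's Park"],
})

# One row per postal code: keep the first borough, join the neighbourhood names.
combined = (
    raw.groupby("Postcode", sort=False)
       .agg(Borough=("Borough", "first"),
            Neighborhood=("Neighbourhood", ", ".join))
       .reset_index()
)
print(combined)
```

`sort=False` keeps the postal codes in their order of first appearance.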


In [11]:
# Getting more data: geospatial data

dfll = pd.read_csv("http://cocl.us/Geospatial_data")
dfll.rename(columns={'Postal Code':'Postcode'}, inplace=True)
# join the coordinates to the neighborhoods on the shared Postcode column
toronto_data = pd.merge(df2, dfll, on='Postcode')

toronto_data.head()

|   | Postcode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 |
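When joining coordinates to neighborhoods, it can help to check that no postal code is lost or duplicated. The sketch below uses the `how`, `validate` and `indicator` options of `pd.merge` on made-up rows; the `M9Z` entry is hypothetical and exists only to show an unmatched code.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],  # M9Z is a made-up code without coordinates
    "Neighborhood": ["Parkwoods", "Victoria Village", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every neighborhood; validate raises if a code appears twice
# on either side; indicator adds a _merge column naming the source of each row.
merged = pd.merge(neighborhoods, coords, on="Postcode", how="left",
                  validate="one_to_one", indicator=True)

missing = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print("Postal codes without coordinates:", missing)
```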


In [12]:
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Geographical coordinates of Toronto, ON, Canada: {}, {}.'.format(latitude, longitude))

Geographical coordinates of Toronto, ON, Canada: 43.653963, -79.387207.


In [18]:
# We now have the longitude and latitude of Toronto
!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library



In [19]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Let's enter credentials for Foursquare to get the remaining data

In [21]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep the real value out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (keep the real value out of shared notebooks)
VERSION = '20200301' # Foursquare API version
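Credentials are easier to keep out of a shared notebook when they are read from the environment. The sketch below shows one way to do this; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are our assumptions, not names defined by Foursquare.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping such as os.environ."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret

# Demonstration with a plain dictionary instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
print(load_foursquare_credentials(demo_env))
```

In the notebook, calling `load_foursquare_credentials()` without arguments would read the real environment.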

In [29]:
toronto_data.loc[0, 'Neighborhood']
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name


## Here, we will make a query for coffee shops in a radius of 2000m

In [37]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 2000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query=coffee'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20200301&ll=43.7532586,-79.3296565&radius=2000&limit=100&query=coffee'
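The request URL can also be assembled from a dictionary of parameters, which avoids counting `{}` placeholders. The sketch below uses `urllib.parse.urlencode` from the standard library; `build_explore_url` is a hypothetical helper of ours, and the credentials are dummy values.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit, query="coffee"):
    """Build the explore URL with the same parameters as the query above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "query": query,
    }
    # urlencode percent-encodes reserved characters, so the comma in ll becomes %2C.
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

url = build_explore_url("ID", "SECRET", "20200301", 43.7532586, -79.3296565, 2000, 100)
print(url)
```

With `requests`, the same dictionary could be passed as `params=` to `requests.get`, which performs the encoding itself.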

In [38]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e5d4cbc0de0d9001b8c8f3a'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'query': 'coffee',
  'totalResults': 16,
  'suggestedBounds': {'ne': {'lat': 43.77125861800002,
    'lng': -79.30478345939711},
   'sw': {'lat': 43.735258581999986, 'lng': -79.35452954060288}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '57e286f2498e43d84d92d34a',
       'name': 'Tim Hortons',
       'location': {'address': '215 Brookbanks',
        'crossStreet': 'York Miils Rd',
        'lat': 43.76066827030228,
        'lng': -79.326367635

In [39]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [40]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Tim Hortons | Café | 43.760668 | -79.326368 |
| 1 | Baretto Caffé | Café | 43.744456 | -79.34646 |
| 2 | Starbucks | Coffee Shop | 43.754199 | -79.351382 |
| 3 | Aroma Espresso Bar | Coffee Shop | 43.77088 | -79.331775 |
| 4 | Tim Hortons | Coffee Shop | 43.741579 | -79.318966 |


# As we can see in the table above, the first venues returned are all in the Café or Coffee Shop category.

In [41]:

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

16 venues were returned by Foursquare.


# Below, we will do the same for all the other neighborhoods to get the coffee shops around each of them. Note that the function below uses a default radius of 500 m.

In [45]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query=coffee'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [46]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park
Lawrence Manor, Lawrence Heights
Queen's Park 
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Ryerson, Garden District
Glencairn
West Deane Park, Cloverdale, Princess Gardens, Martin Grove, Islington
Rouge Hill, Highland Creek , Port Union
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens, Eringate, Old Burnhamthorpe, Markland Wood
Morningside, Guildwood, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
King, Richmond, Adelaide
Dovercourt Village, Dufferin
Scarborough Village
Henry Farm, Oriole, Fairview
Northwood Park, York University
East Toronto
Union Station , Toronto Islands, Harbourfront East
Little Portugal, Trinity–Bellwoods
East Birchmount Park, Ionview, Kennedy Park
Bayview Village
CFB Toronto, Downsview East
Riverd
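The list comprehension in `getNearbyVenues` assumes that every venue has at least one category. The sketch below shows a more defensive way to flatten the items of one response. `flatten_items` is a hypothetical helper, and the sample items are a hand-written imitation of the structure shown earlier, not real API output.

```python
def flatten_items(neighborhood, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None instead of failing when a venue has no category.
        category = categories[0].get("name") if categories else None
        rows.append((neighborhood, lat, lng,
                     venue.get("name"), location.get("lat"), location.get("lng"),
                     category))
    return rows

# Hand-written sample; the second venue is invented to show the empty-category case.
sample_items = [
    {"venue": {"name": "Tim Hortons",
               "location": {"lat": 43.7606, "lng": -79.3263},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]

rows = flatten_items("Parkwoods", 43.753259, -79.329656, sample_items)
for row in rows:
    print(row)
```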

In [49]:
print(toronto_venues.shape)
toronto_venues.head(2)

(944, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 1 | Regent Park | 43.65426 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |


## The size of the resulting dataframe is (944, 7)

In [50]:
toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Agincourt | 2 | 2 | 2 | 2 | 2 | 2 |
| Alderwood, Long Branch | 4 | 4 | 4 | 4 | 4 | 4 |
| Bathurst Manor, Downsview North, Wilson Heights | 2 | 2 | 2 | 2 | 2 | 2 |
| Bayview Village | 1 | 1 | 1 | 1 | 1 | 1 |
| Bedford Park, Lawrence Manor East | 4 | 4 | 4 | 4 | 4 | 4 |
| Berczy Park | 23 | 23 | 23 | 23 | 23 | 23 |
| Bloordale Gardens, Eringate, Old Burnhamthorpe, Markland Wood | 2 | 2 | 2 | 2 | 2 | 2 |
| Brockton, Parkdale Village, Exhibition Place | 6 | 6 | 6 | 6 | 6 | 6 |
| Business Reply Mail Processing Centre 969 Eastern | 1 | 1 | 1 | 1 | 1 | 1 |
| Caledonia-Fairbanks | 1 | 1 | 1 | 1 | 1 | 1 |


In [51]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 31 unique categories.
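Since the query returns 31 categories but the company only cares about coffee shops and cafés, one option is to filter the venues before the analysis. The sketch below shows this on toy rows; the set `COFFEE_CATEGORIES` is our assumption about which categories count, and the last two venues are invented.

```python
import pandas as pd

venues = pd.DataFrame({
    "Venue": ["Tim Hortons", "Baretto Caffé", "Example Pool Hall", "Example Comics"],
    "Venue Category": ["Coffee Shop", "Café", "Pool Hall", "Comic Shop"],
})

# Assumption: only these two categories are relevant for the campaign.
COFFEE_CATEGORIES = {"Coffee Shop", "Café"}

coffee_only = venues[venues["Venue Category"].isin(COFFEE_CATEGORIES)].reset_index(drop=True)
print(coffee_only)
```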


# Now that we have our data, we will start the analysis in week 2 with one-hot encoding (pandas `get_dummies`), category frequencies and, finally, maps.

In [52]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Arts & Crafts Store,Bakery,Bank,Bar,Bookstore,Breakfast Spot,Bubble Tea Shop,Cafeteria,Café,Chinese Restaurant,Coffee Shop,College Quad,Comic Shop,Convenience Store,Coworking Space,Creperie,Deli / Bodega,Dessert Shop,Donut Shop,Eastern European Restaurant,French Restaurant,Hotel Bar,Ice Cream Shop,Italian Restaurant,Juice Bar,Lounge,Pool Hall,Restaurant,Sandwich Place,Tea Room,Vegetarian / Vegan Restaurant
0,Victoria Village,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Regent Park,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Regent Park,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
4,Regent Park,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Group rows by neighborhood, taking the mean of the one-hot columns to get the frequency of occurrence of each category

In [None]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped
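To see why the mean of the one-hot columns is a frequency, here is a small sketch on made-up venues. Neighborhood "A" has three coffee shops and one café, so its Coffee Shop frequency is 3/4.

```python
import pandas as pd

# Made-up venues: neighborhood "A" has four venues, "B" has one.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Coffee Shop", "Café", "Café"],
})

# One column per category, with 1 where the venue belongs to that category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```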

In [55]:
num_top_venues = 3

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                 venue  freq
0          Coffee Shop   1.0
1  Arts & Crafts Store   0.0
2        Deli / Bodega   0.0


----Alderwood, Long Branch----
                 venue  freq
0                 Café   0.5
1          Coffee Shop   0.5
2  Arts & Crafts Store   0.0


----Bathurst Manor, Downsview North, Wilson Heights----
                 venue  freq
0          Coffee Shop   1.0
1  Arts & Crafts Store   0.0
2        Deli / Bodega   0.0


----Bayview Village----
                 venue  freq
0                 Café   1.0
1  Arts & Crafts Store   0.0
2        Deli / Bodega   0.0


----Bedford Park, Lawrence Manor East----
                 venue  freq
0                 Café   0.5
1          Coffee Shop   0.5
2  Arts & Crafts Store   0.0


----Berczy Park----
         venue  freq
0  Coffee Shop  0.74
1         Café  0.17
2     Creperie  0.04


----Bloordale Gardens, Eringate, Old Burnhamthorpe, Markland Wood----
                 venue  freq
0                 Café   0.5
1    

## So, the highest frequencies are mainly for the Coffee Shop and Café categories

#### We put them in a pandas dataframe
First, we sort the venues in descending order.

In [56]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [57]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|
| 0 | Agincourt | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 1 | Alderwood, Long Branch | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 2 | Bathurst Manor, Downsview North, Wilson Heights | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 3 | Bayview Village | Café | Vegetarian / Vegan Restaurant | Coworking Space |
| 4 | Bedford Park, Lawrence Manor East | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
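Some of the "2nd" and "3rd" most common venues in this kind of ranking can have a frequency of 0, because sorting still has to place the zero-frequency categories somewhere. The sketch below shows one possible way to keep only categories that actually occur. `top_categories` is a hypothetical helper, and the frequencies are made up in the style of the printed output above.

```python
import pandas as pd

def top_categories(freq_row, n=3):
    """Return up to n categories with non-zero frequency, padded with None."""
    nonzero = freq_row[freq_row > 0].sort_values(ascending=False)
    top = list(nonzero.index[:n])
    return top + [None] * (n - len(top))

# Illustrative frequency rows.
mixed = pd.Series({"Arts & Crafts Store": 0.0, "Café": 0.17,
                   "Coffee Shop": 0.74, "Creperie": 0.04, "Bakery": 0.0})
single = pd.Series({"Arts & Crafts Store": 0.0, "Coffee Shop": 1.0, "Deli / Bodega": 0.0})

print(top_categories(mixed))
print(top_categories(single))
```

With this variant, a neighborhood that only has coffee shops reports a single category instead of two arbitrary zero-frequency ones.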


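### As the table above shows, "Coffee Shop" and "Café" are the most common venues in these neighborhoods. When a neighborhood has fewer than three categories, the remaining columns are filled with categories whose frequency is 0.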

#### Now we will cluster the neighborhoods.
We run k-means to group the neighborhoods into 5 clusters.

In [58]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 2, 4, 3, 2, 0, 2, 2, 1, 3], dtype=int32)
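To make clear what the clustering step does, here is a minimal NumPy sketch of Lloyd's algorithm, the iteration usually meant by k-means. It is an illustration on made-up frequency rows, not scikit-learn's implementation, which adds refinements such as k-means++ initialization. `lloyd_kmeans` is a hypothetical helper.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=50, seed=0):
    """Alternate between assigning points to the nearest center and re-centering."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance of every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Rows are neighborhoods; columns are [Café frequency, Coffee Shop frequency].
freqs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, centers = lloyd_kmeans(freqs, k=2)
print(labels)
print(centers)
```

The café-dominated rows end up in one cluster and the coffee-shop-dominated rows in the other, which mirrors how the clusters below separate neighborhoods.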

In [59]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

|   | Postcode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | NaN | NaN | NaN | NaN |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 2 | M5A | Downtown Toronto | Regent Park | 43.65426 | -79.360636 | 0.0 | Coffee Shop | Café | Bakery |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 | 0.0 | Coffee Shop | Café | Vegetarian / Vegan Restaurant |
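Neighborhoods for which no venue was returned have no cluster, so the join leaves a missing label and the column becomes floating point. The sketch below reproduces this on toy data and shows one way to clean it up before using the labels as indices.

```python
import pandas as pd

neighborhoods = pd.DataFrame({"Neighborhood": ["Parkwoods", "Victoria Village", "Regent Park"]})
# Only neighborhoods that returned venues receive a cluster label.
clusters = pd.DataFrame({"Neighborhood": ["Victoria Village", "Regent Park"],
                         "Cluster Labels": [4, 0]})

merged = neighborhoods.join(clusters.set_index("Neighborhood"), on="Neighborhood")
print(merged["Cluster Labels"].tolist())  # the Parkwoods label is missing (NaN)

# Drop unlabeled neighborhoods and restore integer labels.
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
print(clean)
```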


### Visualize the resulting clusters on map

In [60]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, skipping neighborhoods without a cluster label (no venues returned)
clustered = toronto_merged.dropna(subset=['Cluster Labels'])
for lat, lon, poi, cluster in zip(clustered['Latitude'], clustered['Longitude'], clustered['Neighborhood'], clustered['Cluster Labels']):
    cluster = int(cluster)  # labels are floats after the join because of the NaN rows
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examining each cluster and determining the discriminating venue categories that distinguish each cluster.

#### Cluster 1

In [61]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 2 | Downtown Toronto | 0.0 | Coffee Shop | Café | Bakery |
| 4 | Downtown Toronto | 0.0 | Coffee Shop | Café | Vegetarian / Vegan Restaurant |
| 9 | Downtown Toronto | 0.0 | Coffee Shop | Café | Tea Room |
| 15 | Downtown Toronto | 0.0 | Coffee Shop | Café | Restaurant |
| 20 | Downtown Toronto | 0.0 | Coffee Shop | Café | Creperie |
| 24 | Downtown Toronto | 0.0 | Coffee Shop | Café | Tea Room |
| 29 | East York | 0.0 | Coffee Shop | Café | Vegetarian / Vegan Restaurant |
| 30 | Downtown Toronto | 0.0 | Coffee Shop | Café | Tea Room |
| 33 | North York | 0.0 | Coffee Shop | Tea Room | Juice Bar |
| 36 | Downtown Toronto | 0.0 | Coffee Shop | Café | Tea Room |

### So, the first cluster contains neighborhoods (listed by borough) where Coffee Shop is the most common venue and Café is usually the second most common venue.

#### Cluster 2

In [62]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 100 | East Toronto | 1.0 | Comic Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 102 | Etobicoke | 1.0 | Convenience Store | Vegetarian / Vegan Restaurant | Coworking Space |


### The second cluster is of no use to our chocolate company, which is interested only in Coffee Shops and Cafés.

#### Cluster 3

In [63]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 8 | East York | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 10 | North York | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 13 | North York | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 17 | Etobicoke | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 25 | Downtown Toronto | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 31 | West Toronto | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 34 | North York | 2.0 | Pool Hall | Café | Coffee Shop |
| 43 | West Toronto | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |
| 54 | East Toronto | 2.0 | Coffee Shop | Café | Coworking Space |
| 55 | North York | 2.0 | Café | Coffee Shop | Vegetarian / Vegan Restaurant |

### So, the third cluster contains neighborhoods where Café is usually the most common venue and Coffee Shop the second most common venue.
This cluster is also important for our purpose.

#### Cluster 4

In [64]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 7 | North York | 3.0 | Café | Vegetarian / Vegan Restaurant | Coworking Space |
| 14 | East York | 3.0 | Café | Vegetarian / Vegan Restaurant | Coworking Space |
| 21 | York | 3.0 | Café | Vegetarian / Vegan Restaurant | Coworking Space |
| 26 | Scarborough, Toronto | 3.0 | Café | Vegetarian / Vegan Restaurant | Coworking Space |
| 39 | North York | 3.0 | Café | Vegetarian / Vegan Restaurant | Coworking Space |
| 58 | Scarborough, Toronto | 3.0 | Café | Vegetarian / Vegan Restaurant | Coworking Space |


### So, the fourth cluster contains neighborhoods where Café is the most common venue and Coffee Shop does not appear in the top 3.

#### Cluster 5

In [65]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 1 | North York | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 3 | North York | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 19 | East Toronto | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 22 | Scarborough, Toronto | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 23 | East York | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 28 | North York | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 35 | East York | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 38 | Scarborough, Toronto | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 47 | East Toronto | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |
| 56 | York, Toronto | 4.0 | Coffee Shop | Vegetarian / Vegan Restaurant | Coworking Space |

### So, the fifth cluster contains neighborhoods where Coffee Shop is the most common venue and Café does not appear in the top 3.

In [136]:
def get_rating(row):
    # the venue details response nests the rating under response -> venue
    try:
        rating = row['response']['venue']['rating']
    except (KeyError, TypeError):
        rating = None
    return rating

def get_likes(row):
    # the number of likes is nested under response -> venue -> likes -> count
    try:
        likes = row['response']['venue']['likes']['count']
    except (KeyError, TypeError):
        likes = None
    return likes


def get_venue_details(Venue_ID):
    venues_list=[]
    for venue_id in Venue_ID:
        print('. ')
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

        # make the GET request
        results = requests.get(url).json()

        # keep only the rating and the number of likes for each venue
        venues_list.append((
            get_rating(results),
            get_likes(results)))

    # passing the column names here also works when the list is empty
    venues_details = pd.DataFrame(venues_list, columns=['Rating', 'Likes'])

    return(venues_details)

In [138]:
Toronto_Merged_2 = pd.DataFrame({})

#Cluster 1
#Toronto_Merged_2 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
#Toronto_Merged_2 = Toronto_Merged_2.append(get_venue_details(Toronto_Merged_2 ['Borough'].iloc[1:10])).reset_index(drop=True)

In [139]:
Toronto_Merged_2.head(11)
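Once ratings and likes are available for each shop, the final selection could be a simple filter and sort. The sketch below uses entirely made-up shops and numbers; the threshold `MIN_RATING` is our assumption, not a value from the analysis.

```python
import pandas as pd

# Made-up shops; a missing rating means the venue has not been rated yet.
shops = pd.DataFrame({
    "Venue": ["Shop A", "Shop B", "Shop C", "Shop D"],
    "Neighborhood": ["Berczy Park", "Berczy Park", "Regent Park", "Regent Park"],
    "Rating": [8.7, None, 7.9, 9.1],
    "Likes": [120, 15, 60, 300],
})

MIN_RATING = 8.0  # assumption: what counts as a "good" rating

rated = shops.dropna(subset=["Rating"])
ranked = (
    rated[rated["Rating"] >= MIN_RATING]
    .sort_values(["Rating", "Likes"], ascending=False)
    .reset_index(drop=True)
)
print(ranked)
```

Sorting on likes as a second key breaks ties between shops with the same rating.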

# Explore Trending Venues

In [140]:
# define URL
url = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION)

# send GET request and get trending venues
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e5d8df50be7b4002914b9e9'},
 'response': {'venues': []}}

In [141]:
if len(results['response']['venues']) == 0:
    trending_venues_df = 'No trending venues are available at the moment!'
    
else:
    trending_venues = results['response']['venues']
    trending_venues_df = json_normalize(trending_venues)

    # filter columns
    columns_filtered = ['name', 'categories'] + ['location.distance', 'location.city', 'location.postalCode', 'location.state', 'location.country', 'location.lat', 'location.lng']
    trending_venues_df = trending_venues_df.loc[:, columns_filtered]

    # filter the category for each row
    trending_venues_df['categories'] = trending_venues_df.apply(get_category_type, axis=1)

In [142]:
# display trending venues
trending_venues_df

'No trending venues are available at the moment!'
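In the cell above, `trending_venues_df` is a string in one case and a dataframe in the other. A possible alternative is to always return a dataframe, which is simply empty when there is nothing trending. `trending_to_frame` is a hypothetical helper, and the non-empty sample payload is invented to imitate the response layout.

```python
import pandas as pd

COLUMNS = ["name", "category", "lat", "lng"]

def trending_to_frame(payload):
    """Convert a trending response into a dataframe, empty if no venues."""
    venues = payload.get("response", {}).get("venues", [])
    rows = []
    for v in venues:
        categories = v.get("categories") or []
        location = v.get("location", {})
        rows.append({
            "name": v.get("name"),
            "category": categories[0].get("name") if categories else None,
            "lat": location.get("lat"),
            "lng": location.get("lng"),
        })
    return pd.DataFrame(rows, columns=COLUMNS)

empty_payload = {"meta": {"code": 200}, "response": {"venues": []}}
sample_payload = {"response": {"venues": [
    {"name": "Example Café", "categories": [{"name": "Café"}],
     "location": {"lat": 43.65, "lng": -79.38}},
]}}

print(trending_to_frame(empty_payload).empty)
print(trending_to_frame(sample_payload))
```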

### As a result, no trending venues were available around the center of Toronto at the time of the query.