## Part I - Generate Location Information for Toronto Neighborhoods by Scraping a Public Website and Generating Clean Starting Data

Using the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, generate a pandas dataframe.
Index + Columns: PostalCode, Borough, Neighborhood

Notes on Data Wrangling:
1. Only postal codes with an assigned borough are included in the dataset.  Rows where Borough is "Not assigned" or null are excluded.
2. Neighborhood to PostalCode is M:1.  Many neighborhoods can exist in a single postal code.
3. Postal codes with more than one neighborhood are combined into a single record.  All neighborhoods within the postal code are combined into the Neighborhood column for that record as a comma-separated list.
4. If a record has a Borough but its Neighborhood is "Not assigned", the Borough name is copied into the Neighborhood column.

The result is a pandas dataframe with these columns; the number of records is whatever pandas reports for the cleaned dataframe (`df.shape`).
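The four wrangling rules can be sketched on a small hand-made table. This is a minimal sketch rather than the notebook's own cells: the sample rows and the helper name `clean_postal_table` are hypothetical, and the column names follow the notes above.

```python
import pandas as pd


def clean_postal_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the four wrangling rules to a raw postal-code table (illustrative helper)."""
    df = raw.copy()
    # Rule 1: keep only rows with an assigned borough
    df = df[df["Borough"].notna() & (df["Borough"] != "Not assigned")].copy()
    # Rule 4: an unassigned neighborhood takes the borough name
    unassigned = df["Neighborhood"].isna() | (df["Neighborhood"] == "Not assigned")
    df.loc[unassigned, "Neighborhood"] = df.loc[unassigned, "Borough"]
    # Rules 2 and 3: one record per postal code, neighborhoods joined with commas
    return (
        df.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
        .agg(", ".join)
        .reset_index()
    )


# Hypothetical sample rows, invented for illustration
sample = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)
cleaned = clean_postal_table(sample)
print(cleaned)
```

Grouping with `sort=False` keeps the postal codes in the order they first appear, which makes the result easier to compare against the source table.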



### Install HTML Parser Packages

In [1]:
!pip install beautifulsoup4
!pip install lxml
!pip install html5lib



### Make an HTML Request to Get Wikipedia Content, Store to BeautifulSoup Container

In [2]:
import requests
website_url = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text

from bs4 import BeautifulSoup
soup=BeautifulSoup(website_url, 'lxml')
#print(soup.prettify())         #Visual check of content grab

### Process BeautifulSoup into Data Table - Extract the Raw Data

In [3]:
#Grab the table with the information
location_info_raw=soup.find('table', {'class':'wikitable sortable'})
#location_info_raw

#Create list of all the rows to process in the table
rows = location_info_raw.tbody.find_all('tr')
#print(rows[0:3])

#Extract the Column Headers from the th row
headings= [] #Create empty list for column headers
for th in rows[0].find_all("th"):    #Extract text from each th and add to the headings variable.  Make sure to strip special char.
    headings.append(th.text.replace('\n', ' ').strip())    
print(headings)

#Extract the Data Elements from the td cells in each row, skipping the first tr that contains the headers
table_data_rows = []  # Create empty list of table data
for table_row in rows[1:]:
    #print(table_row)
    columns=table_row.find_all("td")
    row_content=[]
    for column in columns:
        row_content.append(column.text.replace('\n', " ").strip())
    table_data_rows.append(row_content)
print(table_data_rows[0:5])        

['Postal Code', 'Borough', 'Neighbourhood']
[['M1A', 'Not assigned', 'Not assigned'], ['M2A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods'], ['M4A', 'North York', 'Victoria Village'], ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront']]


### Create the Pandas Dataframe from the raw data

In [4]:
import pandas as pd
import numpy as np
df=pd.DataFrame(table_data_rows, columns=headings)

print(df.shape)
df.head()

(180, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Clean the Data Frame Values

In [5]:
#Remove Records with Borough values = Not assigned
df.drop(df[(df['Borough'] == 'Not assigned')].index, inplace = True) 
df.shape

(103, 3)

In [6]:
#Add Borough Name to Any Unassigned Neighborhood Names
#df.nunique(axis=1)

#Find Neighborhoods labelled unassigned.
unassigned_list=[]
for ind in df.index:
    if df['Neighbourhood'][ind] == "Not assigned":
        unassigned_list.append([df['Postal Code'][ind], df['Borough'][ind]])
print("There are {} unassigned neighbourhood names.".format(len(unassigned_list)))

#Find duplicate Postal Code instances.  Find their index and combine their neighborhood values.
zip_list=[]
appears_multiple=[]

for ind in df.index:
    if df['Postal Code'][ind] not in zip_list:
        zip_list.append(df['Postal Code'][ind])
    else: 
        appears_multiple.append(df['Postal Code'][ind])
    #print(df['Postal Code'][ind], df['Borough'][ind]) 
    
print("There are {} duplicate postal codes.".format(len(appears_multiple)))

There are 0 unassigned neighbourhood names.
There are 0 duplicate postal codes.


A note about duplicate postal codes versus the instructions: in this dataset each postal code is already unique, so no neighborhoods need to be merged into a shared postal code to eliminate redundancy.  The Wikipedia table already lists the combined neighbourhoods for each postal code.
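The two loop-based checks can also be written as vectorized pandas operations. A minimal sketch, using a three-row sample (taken from the first assigned rows shown earlier) in place of the full scraped dataframe; the variable names are illustrative:

```python
import pandas as pd

# Small sample standing in for the scraped dataframe
sample = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A", "M5A"],
        "Borough": ["North York", "North York", "Downtown Toronto"],
        "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
    }
)

# Count neighbourhoods still labelled "Not assigned"
n_unassigned = int((sample["Neighbourhood"] == "Not assigned").sum())

# Count rows whose postal code already appeared earlier in the table
n_duplicates = int(sample["Postal Code"].duplicated().sum())

print("There are {} unassigned neighbourhood names.".format(n_unassigned))
print("There are {} duplicate postal codes.".format(n_duplicates))
```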

In [7]:
df.shape

(103, 3)

## Continuing Part II of Submission Request - Make External Request for Lat/Long and Merge with Clean Neighborhood Data

Using the df created with the Toronto Borough data, add the lat and long data to the dataframe.

In [8]:
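#!pip install geocoder

# NOTE: This cell is retired.  The geocoder service was not responding, so the
# next section loads the coordinates from a local CSV file instead.

import geocoder # import geocoder

MAX_ATTEMPTS = 5  # stop retrying instead of looping forever when the service is down

# create the target columns before writing to them
df['Latitude'] = None
df['Longitude'] = None

for ind in df.index:
    postal_code = df.loc[ind, 'Postal Code']

    # initialize your variable to None
    lat_lng_coords = None

    # retry a bounded number of times until you get the coordinates
    for attempt in range(MAX_ATTEMPTS):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            break

    # use .loc to write into the dataframe itself rather than a temporary copy
    if lat_lng_coords is not None:
        df.loc[ind, 'Latitude'] = lat_lng_coords[0]
        df.loc[ind, 'Longitude'] = lat_lng_coords[1]

df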

### Use Local File of Postal Codes to Add Lat/Long Coordinates to Dataframe

In [9]:
!wget -q -O 'Geospatial_Coordinates.csv' https://cocl.us/Geospatial_data
print('Data downloaded!')

Data downloaded!


In [10]:
df['Latitude'] = None
df['Longitude'] = None

with open("Geospatial_Coordinates.csv") as gd:
    gd.readline() #Skip header
    
    for line in gd:
        code, lat, long = line.strip().split(",")
        #Write the coordinates to the row with this Postal Code.
        #.loc avoids the chained assignment df['Longitude'][ind] = ...
        df.loc[df['Postal Code'] == code, ['Latitude', 'Longitude']] = [lat, long]
                
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.7532586,-79.3296565
3,M4A,North York,Victoria Village,43.7258823,-79.3155716
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6542599,-79.3606359
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.4647633
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623015,-79.3894938
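The same lookup can be expressed as a single pandas merge, which also parses the coordinates as numbers. A minimal sketch, assuming the CSV header is `Postal Code,Latitude,Longitude` (an assumption about the file, not confirmed here); the `M9Z` row is a made-up code included only to show what happens to unmatched rows:

```python
import io

import pandas as pd

# Short excerpt standing in for Geospatial_Coordinates.csv; header names are assumed
csv_text = """Postal Code,Latitude,Longitude
M3A,43.7532586,-79.3296565
M4A,43.7258823,-79.3155716
"""
coords = pd.read_csv(io.StringIO(csv_text))

boroughs = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A", "M9Z"],  # M9Z is a hypothetical unmatched code
        "Borough": ["North York", "North York", "Example Borough"],
    }
)

# A left merge keeps every borough row; codes missing from the file get NaN
merged = boroughs.merge(coords, on="Postal Code", how="left")
print(merged)
```

With a left merge, any postal code that is absent from the coordinates file shows up as `NaN` instead of being silently skipped, which makes gaps easy to spot with `merged.isna().sum()`.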


## Continuing Part III of Submission Request - A.  Generate Map of Neighborhoods Superimposed on Toronto

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. 

Just make sure:

- to add enough Markdown cells to explain what you decided to do and to report any observations you make.
- to generate maps to visualize your neighborhoods and how they cluster together.

In [11]:
!pip install geopy
!pip install folium==0.5.0
print('Folium installed')

Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
     |████████████████████████████████| 79 kB 11.1 MB/s eta 0:00:01
Collecting branca
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... done
  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=464ac47f958e07b4dca8b22d0eda8daa71374da3b0b5126806fe633f470b173e
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.5.0
Folium installed


In [12]:
import requests # library to handle requests
import pandas as pd # library for data analysis

import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.display import HTML 
    
# transforming json file into a pandas dataframe library
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0

import folium # plotting library

print('Libraries imported.')

Libraries imported.


### Revisit This Cell

In [13]:
print('The dataframe has {} boroughs and {} neighborhood groups.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)
#Extract the comma-separated boroughs as unique?

The dataframe has 10 boroughs and 103 neighborhood groups.


### Create a Map of Toronto, Superimpose the Neighborhoods

Each map label shows the comma-separated list of neighborhoods within a postal code.

In [14]:
# establish map center of Toronto
latitude = 43.6532
longitude = -79.3832

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    #label = '{}, {}'.format(neighborhood, borough)
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto) 
    
map_toronto

### Observations

1. Postal codes are spaced fairly evenly in suburban areas.  They are more concentrated in the urban center, but still spaced apart.  This suggests deliberate planning from a postal-system perspective, and possibly an even distribution of population per postal code; that needs verification.

2. Outlying (non-city-center) postal codes tend to contain fewer neighborhoods, often exactly one, whereas postal codes toward the city center contain more.

3. On inspection, there are some neighborhoods that spread across multiple postal codes.
       FUTURE PROGRAMMATIC INQUIRY:  Can code generate a report of how many named neighborhoods span more than one postal code?
       EXTENSION INQUIRY:  Are there any neighborhoods that span more than one borough?
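One possible way to answer both inquiries is to split the comma-separated names into one row per neighbourhood and count distinct postal codes and boroughs per name. A minimal sketch on a hypothetical sample (the rows below are invented for illustration; in the notebook the input would be `df`):

```python
import pandas as pd

# Hypothetical sample rows standing in for the notebook's dataframe
sample = pd.DataFrame(
    {
        "Postal Code": ["M3H", "M3K", "M3L", "M5A"],
        "Borough": ["North York", "North York", "North York", "Downtown Toronto"],
        "Neighbourhood": [
            "Bathurst Manor, Downsview North",
            "Downsview",
            "Downsview",
            "Regent Park, Harbourfront",
        ],
    }
)

# One row per (postal code, single neighbourhood name)
exploded = sample.assign(
    Neighbourhood=sample["Neighbourhood"].str.split(",")
).explode("Neighbourhood")
exploded["Neighbourhood"] = exploded["Neighbourhood"].str.strip()

# Number of distinct postal codes and boroughs for each neighbourhood name
span = exploded.groupby("Neighbourhood").agg(
    postal_codes=("Postal Code", "nunique"),
    boroughs=("Borough", "nunique"),
)
multi_code = span[span["postal_codes"] > 1]
multi_borough = span[span["boroughs"] > 1]

print(multi_code)
print("Neighbourhoods spanning more than one borough:", len(multi_borough))
```

This matches names exactly, so spelling variants of the same neighbourhood would be counted as different names.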
       
#### Commentary
These overlap questions make analysis inconsistent when neighborhood features are overlaid by merging additional datasets.  For further Toronto analysis, consider developing bins that handle the special cases, based on an agreed-upon group definition.  Coding effort to bin and realign outlier groupings could yield better conclusions, assuming the groupings are designed to prevent noise when merging data.  While this code can be reused to repeat the analysis across urban centers, analyzing any other city, or drawing deeper conclusions that a business might act upon, requires a deeper understanding of that city's urban planning background.

## Overlay Borough Borders on Neighborhood Groupings
Questions for Observation:
 
 1.  Are the areas visually equal?
 2.  How many neighborhoods exist per borough?
 3.  Do any neighborhoods spread across more than one borough?

 FUTURE EXPLORATION:  How many people live in each borough or neighborhood?  Is that population distributed evenly or concentrated?
 
 FUTURE EXPLORATION:  Can the geographic square footage be calculated?
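Question 2 can be approximated directly from the comma-separated Neighbourhood column. A minimal sketch, using three rows from the table shown earlier; the variable names are illustrative. It counts listed names, so a name that appears under two postal codes in the same borough would be counted twice.

```python
import pandas as pd

# Three rows from the scraped table, used as a small stand-in for df
sample = pd.DataFrame(
    {
        "Borough": ["North York", "North York", "Downtown Toronto"],
        "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
    }
)

# Count the comma-separated names in each row, then total them per borough
names_per_row = sample["Neighbourhood"].str.split(",").str.len()
per_borough = names_per_row.groupby(sample["Borough"]).sum()

print(per_borough)
```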


## Part III - B.  Find FourSquare Data about Toronto

### The PreWork - Test Call to FourSquare

Hidden Cells for URL Generation to FourSquare

In [164]:
# The code was removed by Watson Studio for sharing.

In [166]:
# The code was removed by Watson Studio for sharing.
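Because the credential cells were removed, the notebook cannot be rerun as shared. The sketch below shows one way such a cell might look; it is not the removed code. Every value is a placeholder or an assumption (the environment variable names, `VERSION`, `LIMIT`, and `radius`); only the endpoint and parameter names are taken from the request URLs used later in this notebook.

```python
import os
from urllib.parse import urlencode

# Placeholders: read credentials from the environment instead of hard-coding them
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")
VERSION = "20180605"  # assumed API version date
LIMIT = 100           # assumed maximum number of venues per call
radius = 500          # assumed search radius in meters

latitude, longitude = 43.6532, -79.3832  # Toronto map center used in this notebook

params = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "v": VERSION,
    "ll": "{},{}".format(latitude, longitude),
    "radius": radius,
    "limit": LIMIT,
}
# safe="," keeps the comma in the ll parameter readable
url = "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")
```

Avoid printing `url` in a shared notebook, since it embeds the client secret.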

In [17]:
results = requests.get(url).json()

#### Important Method for Extracting Category Information from FourSquare

In [18]:
#Function to extract the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [19]:
#Clean json response and put into pandas dataframe
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,LUSH,Cosmetics Shop,43.653557,-79.3804
3,Indigo,Bookstore,43.653515,-79.380696
4,CF Toronto Eaton Centre,Shopping Mall,43.654447,-79.380952


In [20]:
#Check results
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

79 venues were returned by Foursquare.


#### Important Method for Getting Nearby Venue Lists for a LIST of LAT/LONG
    - request generator
    - make request
    - process new data into df

In [130]:
#Repeating Process for every neighborhood in Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name) - A visual check to make sure each neighbourhood is being processed, if request is slowed
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
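`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for any venue returned without a category. A minimal sketch of a more tolerant flattening step follows; the helper name `flatten_explore_items` and the sample items are hypothetical, and the items only mimic the response shape used above rather than real API output.

```python
def flatten_explore_items(name, lat, lng, items):
    """Flatten Foursquare-style 'items' into rows, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None instead of raising IndexError when a venue has no category
        category = categories[0]["name"] if categories else None
        rows.append(
            (name, lat, lng, venue.get("name"),
             location.get("lat"), location.get("lng"), category)
        )
    return rows


# Hand-made items that mimic the response shape (not real API output)
fake_items = [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]

rows = flatten_explore_items("Example Neighbourhood", 43.6532, -79.3832, fake_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or kept explicitly before one-hot encoding, rather than aborting the whole request loop.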

### Finally Generate the Venue Information

In [131]:
#Generate Toronto Data   --- "Notice the spelling change of Neighbourhood from original data set"
#This call is expensive, even though you can't see it.  The FourSquare Call is embedded in here.
toronto_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

In [132]:
#Data Review
#print(toronto_venues.shape)
#print(toronto_venues.head())
#print(toronto_venues.groupby('Neighborhood').count())

print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

venue_list = sorted(toronto_venues['Venue Category'].unique())
print(venue_list[:10])

There are 329 unique categories.
['ATM', 'Accessories Store', 'Afghan Restaurant', 'African Restaurant', 'Airport', 'Airport Lounge', 'American Restaurant', 'Amphitheater', 'Antique Shop', 'Aquarium']


### Encode the Venue Information 

In [133]:
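# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# NOTE: the venue data includes a category that is itself named "Neighborhood"
# (see the test call above), so this assignment overwrites that category column
# instead of adding a new one.  That is why the shape below has 329 columns, not 330.
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column (select it by name, not position,
# because after the name collision it is no longer the last column)
fixed_columns = ['Neighborhood'] + [col for col in toronto_onehot.columns if col != 'Neighborhood']
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()
toronto_onehot.shape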

(8581, 329)

In [134]:
#Group Rows by Neighborhood and Take the Mean of the Frequency of Occurrence of Each Category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

toronto_grouped.shape

(99, 329)
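To make the encode-then-average step concrete, here is a minimal sketch on a toy table (the neighbourhood names "A" and "B" and their venues are invented). Each venue becomes a row of 0/1 indicators, and the per-neighbourhood mean of those indicators is the share of venues in each category.

```python
import pandas as pd

# Toy venue list, invented for illustration
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B", "B"],
        "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Bank", "Park", "Park"],
    }
)

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# insert() raises ValueError if a category column with this name already exists,
# which surfaces a name collision instead of silently overwriting a column
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the frequency of each category per neighbourhood
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Neighbourhood "A" has four venues, two of them coffee shops, so its Coffee Shop frequency is 0.5; every row of frequencies sums to 1.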

In [135]:
#Print Each Neighborhood along with 5 Most Common Venues
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                venue  freq
0  Chinese Restaurant  0.12
1         Coffee Shop  0.08
2          Restaurant  0.06
3            Pharmacy  0.04
4                Bank  0.03


----Alderwood, Long Branch----
                  venue  freq
0           Coffee Shop  0.09
1  Fast Food Restaurant  0.05
2                  Café  0.03
3        Breakfast Spot  0.03
4          Burger Joint  0.03


----Bathurst Manor, Wilson Heights, Downsview North----
           venue  freq
0    Coffee Shop  0.08
1           Park  0.07
2    Pizza Place  0.07
3           Bank  0.05
4  Deli / Bodega  0.03


----Bayview Village----
                venue  freq
0                Park  0.10
1  Chinese Restaurant  0.10
2         Coffee Shop  0.08
3                Bank  0.06
4            Pharmacy  0.04


----Bedford Park, Lawrence Manor East----
                venue  freq
0         Coffee Shop  0.11
1  Italian Restaurant  0.06
2    Sushi Restaurant  0.06
3              Bakery  0.05
4      Sandwich Place  0.04

In [136]:
#Put into a Pandas Dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [137]:
#New Dataframe with Top 10 Venues for Each Neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Coffee Shop,Restaurant,Pharmacy,Bank,Sandwich Place,Indian Restaurant,Clothing Store,Gas Station,Cantonese Restaurant
1,"Alderwood, Long Branch",Coffee Shop,Fast Food Restaurant,Café,Burger Joint,Sandwich Place,Department Store,Electronics Store,Breakfast Spot,Restaurant,Pizza Place
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Park,Pizza Place,Bank,Sandwich Place,Pharmacy,Restaurant,Convenience Store,Gas Station,Intersection
3,Bayview Village,Park,Chinese Restaurant,Coffee Shop,Bank,Pharmacy,Café,Grocery Store,Gas Station,Restaurant,Clothing Store
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sushi Restaurant,Bakery,Pizza Place,Sandwich Place,Restaurant,Bagel Shop,Café,Pharmacy


### Apply K-means Clustering to Toronto Venue Information to Find Archetype Clusters

In [138]:
from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs

In [139]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')  # positional axis argument was removed in pandas 2.0

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([2, 2, 2, 2, 2, 1, 2, 3, 3, 3], dtype=int32)
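The value `kclusters = 5` is set by hand. One common way to compare candidate values is the silhouette score, which is higher when clusters are compact and well separated. The sketch below is an illustration on synthetic data, not a result for Toronto: the blob centers are invented, and in the notebook `X` would be `toronto_grouped_clustering`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (invented for illustration)
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Fit k-means for several candidate k values and score each clustering
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print("k={}: silhouette={:.3f}".format(k, score))
print("Best k by silhouette:", best_k)
```

On real venue frequencies the scores are usually much closer together than on synthetic blobs, so the silhouette is better treated as one input alongside how interpretable the clusters look on the map.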

In [140]:
print(type(kmeans.labels_[0]))

<class 'numpy.int32'>


In [141]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Coffee Shop,Restaurant,Pharmacy,Bank,Sandwich Place,Indian Restaurant,Clothing Store,Gas Station,Cantonese Restaurant
1,"Alderwood, Long Branch",Coffee Shop,Fast Food Restaurant,Café,Burger Joint,Sandwich Place,Department Store,Electronics Store,Breakfast Spot,Restaurant,Pizza Place
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Park,Pizza Place,Bank,Sandwich Place,Pharmacy,Restaurant,Convenience Store,Gas Station,Intersection
3,Bayview Village,Park,Chinese Restaurant,Coffee Shop,Bank,Pharmacy,Café,Grocery Store,Gas Station,Restaurant,Clothing Store
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sushi Restaurant,Bakery,Pizza Place,Sandwich Place,Restaurant,Bagel Shop,Café,Pharmacy


In [142]:
#Create New Dataframe with Clusters As Well as the Top 10 Venue for each neighborhood

#drop Cluster Labels column in case other kmeans.labels_ have already been inserted
if "Cluster Labels" in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.drop(['Cluster Labels'], axis=1, inplace=True)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head()
print(type(neighborhoods_venues_sorted['Cluster Labels'].iloc[0]))

toronto_merged = df  #df is the basic Toronto data

#join the cluster labels and top venues onto the Toronto data, which already holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')  #Note the spelling change

#check for rows with no information combined
print(toronto_merged.size)
toronto_merged = toronto_merged.dropna()
print(toronto_merged.size) 
    
#print(type(toronto_merged['Cluster Labels'].iloc[0]))
#toronto_merged.astype({"Cluster Labels":int})
#print(type(toronto_merged['Cluster Labels'].iloc[0]))

<class 'numpy.int32'>
1648
1648


# -- Clean Merged Dataframe with Cluster Data Achieved --

## Begin Map Visualization with Overlay

In [143]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [144]:
#Visualize Clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [146]:
#Test Record Set for NaN errors
#for i in toronto_merged:
 #   if i['Cluster Labels']==NaN:
  #      print("Found a NaN value: ", )
        
# add markers to the map (rainbow[cluster-1] shifts the palette by one, so cluster 0 wraps around to the last color)
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lon)],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## City Groupings
- Cluster 0:Red = Outlier (Malvern)
- Cluster 1:Purple = Urban, City Center 
- Cluster 2:Blue = City Surroundings
- Cluster 3:Light Green = Near City
- Cluster 4:Orange = Outlier (Upper Rouge) 

After extending the search radius from 500 m to 2000 m, the clusters segmented better.  Toronto is not as dense as New York.  Even so, the radius could be larger still.  However, with a larger radius, the number of calls to FourSquare would become cost-prohibitive.  Future analysis could focus on a smaller area of Toronto, or could use enterprise-level accounts and servers.



# Conclusions

### Cluster 0, the Malvern neighborhood within the Scarborough borough of Toronto, has the zoo and multiple parks. Its spread-out geography covers a lot of territory.  It is unique in its offerings.

### Cluster 4, the Upper Rouge part of Scarborough, is similar, but takes advantage of being adjacent to the zoo while still offering many spaces.

In [160]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Scarborough,0,Zoo Exhibit,Restaurant,Fast Food Restaurant,Coffee Shop,Gas Station,Park,Pizza Place,Zoo,Women's Store,Theme Park Ride / Attraction


In [161]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
153,Scarborough,4,Grocery Store,Playground,Farm,Golf Course,Trail,Zoo,Escape Room,Donut Shop,Dumpling Restaurant,Eastern European Restaurant


### Cluster 1 is located in the city center.  Its venues support an active workforce and both upscale and convenience dining.  

#### One section of town outside the center has a similar venue offering and is included in Cluster 1.  It lies along a major road that leads into the city, but on the outskirts of Toronto.  This Willowdale area spans two postal codes.  

#### The Willowdale neighbourhood consists of single-family homes, condominium townhouses and high-rise condominium towers

In [157]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Downtown Toronto,1,Coffee Shop,Park,Café,Restaurant,Gastropub,Hotel,Farmers Market,Clothing Store,French Restaurant,Pub
6,Downtown Toronto,1,Coffee Shop,Park,Café,Theater,Yoga Studio,Pizza Place,Ramen Restaurant,Restaurant,Hotel,Shopping Mall
13,Downtown Toronto,1,Coffee Shop,Gastropub,Restaurant,Japanese Restaurant,Café,Italian Restaurant,Park,Diner,Plaza,Art Gallery
22,Downtown Toronto,1,Coffee Shop,Café,Japanese Restaurant,Plaza,Gastropub,Italian Restaurant,Farmers Market,Park,Restaurant,Historic Site
30,East Toronto,1,Coffee Shop,Pub,Breakfast Spot,Beach,Bakery,Japanese Restaurant,Ice Cream Shop,BBQ Joint,Park,Middle Eastern Restaurant


### Cluster 2 is the Outer City Ring. Mixed housing, mostly residential.  Venues support local living.

In [156]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,North York,2,Coffee Shop,Japanese Restaurant,Pharmacy,Sandwich Place,Chinese Restaurant,Discount Store,Fried Chicken Joint,Pizza Place,Supermarket,Gas Station
3,North York,2,Coffee Shop,Fast Food Restaurant,Gym,Sandwich Place,Clothing Store,Middle Eastern Restaurant,Park,Japanese Restaurant,Grocery Store,Beer Store
5,North York,2,Clothing Store,Coffee Shop,Vietnamese Restaurant,Furniture / Home Store,Fast Food Restaurant,Restaurant,Bank,Grocery Store,Italian Restaurant,Cosmetics Shop
8,Etobicoke,2,Coffee Shop,Pharmacy,Grocery Store,Park,Bank,Shopping Mall,Golf Course,Pizza Place,Liquor Store,Café
11,North York,2,Coffee Shop,Restaurant,Japanese Restaurant,Park,Pizza Place,Bank,Sandwich Place,Italian Restaurant,Supermarket,Middle Eastern Restaurant


### Cluster 3 is the Near City Surrounds. Most offerings are a variety of restaurants and food establishments.

In [158]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,East York,3,Park,Coffee Shop,Pizza Place,Café,Gastropub,Thai Restaurant,Skating Rink,Ice Cream Shop,Ethiopian Restaurant,Breakfast Spot
23,York,3,Coffee Shop,Italian Restaurant,Bank,Café,Bakery,Trail,Caribbean Restaurant,Mexican Restaurant,Indian Restaurant,Ice Cream Shop
41,Downtown Toronto,3,Café,Italian Restaurant,Coffee Shop,Bar,Beer Bar,Indian Restaurant,Vegetarian / Vegan Restaurant,Park,Restaurant,Korean Restaurant
50,West Toronto,3,Café,Coffee Shop,Park,Italian Restaurant,Bar,Cocktail Bar,Indian Restaurant,Pub,Liquor Store,Brewery
57,East York,3,Greek Restaurant,Café,Bakery,Gastropub,Coffee Shop,Italian Restaurant,Park,Ethiopian Restaurant,American Restaurant,Caribbean Restaurant


# Final Conclusions
The data analysis raises many more questions about the urban makeup of Toronto.  The high number of restaurants suggests that a further breakdown of the restaurant offerings might yield interesting information about how the clusters further differentiate in both variety and availability.  

Separating out the volume of food offerings might also start to show how the recreational offerings differentiate.   

From a real-estate similarity perspective, the analysis is not granular enough to make statements like: "If you like living here, you might also consider this neighborhood for its similar makeup."

Also, from a data-gathering perspective, we are still hamstrung by using the postal code as the identifier for the latitude and longitude of each neighborhood grouping. Street-level border information can be gathered manually from Wikipedia and other sources to define the neighborhood boundaries better and give a more accurate center for comparison.

In [150]:
#This extra code takes a latitude/longitude value and returns the places around the location with their categories labelled.
latitude = 43.6532
longitude = -79.3832

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

results = requests.get(url).json()
#print('There are {} around location.'.format(len(results['response']['groups'][0]['items']))

items = results['response']['groups'][0]['items'] 
items[0]   #Return values associated to first item in list.

dataframe = json_normalize(items) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the column assignment below does not target a slice

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(10)

venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred on the search location


# add central focus Circle Marker
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Search center',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add popular spots to the map as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

