# 1. Introduction
## 1.1 Background
As the financial capital of Canada, Toronto has seen rapid growth in its immigrant population. For most newcomers, finding a place to live is at the top of the list. Apart from price, the convenience of the location is usually the first concern, and what counts as convenient depends on a person's age, marital status, background and so on.

Young people may like to be surrounded by restaurants, cafés and bars. Families with young children will probably prefer a neighbourhood with accessible grocery stores and museums, while older people may value being close to a hospital.

Therefore, showing what kinds of venues are in each neighbourhood may give people some guidance when they are choosing where to live.

## 1.2 Interest
Newcomers, including new immigrants and people moving from other parts of Canada, may find this report useful when looking for a place to live. Even local residents who have lived in Toronto for a couple of years may gain a different perspective on the city.

# 2. Data acquisition and cleaning
This report continues previous work: https://github.com/WenjieZzz/Coursera_Capstone/blob/master/Segmenting%20and%20clustering%20neighhoods%20.ipynb

First, I scrape the postal code table from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

## 2.1 Import the data


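```python
import pandas as pd
import numpy as np

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
# read_html returns a list of every <table> on the page; the first one is the postal code table
df = pd.read_html(url)[0]
df.head()
```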

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


## 2.2 Clean the data
The raw data contains many 'Not assigned' values, so I clean it in three steps:
1. Drop the rows whose Borough is 'Not assigned'.
2. If a row has a borough but its Neighbourhood is 'Not assigned', use the borough name as the neighbourhood.
3. If several rows share the same postal code, combine their neighbourhoods into one row, separated by commas.

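```python
# Step 1: drop rows whose Borough is 'Not assigned' (.copy() avoids a SettingWithCopyWarning below)
df = df[df["Borough"] != 'Not assigned'].copy()

# Step 2: fill 'Not assigned' neighbourhoods with the borough name.
# Assigning to `row` inside df.iterrows() only changes a copy, so use .loc on df itself.
mask = df["Neighbourhood"] == 'Not assigned'
df.loc[mask, "Neighbourhood"] = df.loc[mask, "Borough"]

# Step 3: combine neighbourhoods that share a postal code into one comma-separated string
grouped = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].agg(', '.join)
df = grouped.reset_index()
df.head()
```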

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
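
To check the three cleaning rules without downloading anything, they can be run on a small, made-up table shaped like the Wikipedia one. This sketch wraps the same steps in a hypothetical `clean_postcodes` function; the postcodes and names are illustrative, not the real scraped data.

```python
import pandas as pd

# A small, made-up table that mimics the Wikipedia layout (not real data)
raw = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront", "Regent Park", "Not assigned"],
})

def clean_postcodes(raw):
    """Apply the three cleaning rules and return one row per postal code (hypothetical helper)."""
    # Rule 1: drop rows without a borough
    df = raw[raw["Borough"] != "Not assigned"].copy()
    # Rule 2: a missing neighbourhood takes the borough's name
    mask = df["Neighbourhood"] == "Not assigned"
    df.loc[mask, "Neighbourhood"] = df.loc[mask, "Borough"]
    # Rule 3: join neighbourhoods that share a postal code
    return (df.groupby(["Postcode", "Borough"])["Neighbourhood"]
              .agg(", ".join)
              .reset_index())

cleaned = clean_postcodes(raw)
print(cleaned)
```

M1A disappears, M5A becomes "Harbourfront, Regent Park", and M7A takes its borough name "Queen's Park".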


# 3. Methodology
## 3.1 Objective
This report applies the methods from previous lessons, including segmentation and clustering. With the help of the **FourSquare API**, we will try to identify the main function of each neighbourhood. For example: is Markham more of a financial district or a residential area?
## 3.2 Analysis
### 3.2.1 Clustering
Now that the data is clean, I can cluster the neighbourhoods of Toronto.




```python
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # send HTTP requests to the Foursquare API (used below)
from pandas import json_normalize  # flatten JSON; for pandas < 1.0 use: from pandas.io.json import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library

print('Libraries imported.')
```

Libraries imported.


```python
address = 'Toronto, ON'

# identify this application to the Nominatim geocoding service via user_agent
geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_toronto, longitude_toronto))
```

The geographical coordinates of Toronto are 43.653963, -79.387207.


```python
# get the coordinates for each postal code area
geo = pd.read_csv("https://cocl.us/Geospatial_data")
geodf = pd.merge(df, geo, how='left', left_on='Postcode', right_on='Postal Code')
geodf = geodf.drop('Postal Code', axis=1)
geodf.head()
```

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
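
Because the merge is a left join, a postcode missing from the geospatial CSV would silently get NaN coordinates and later break the Foursquare requests. A quick check, sketched here on made-up rows (the postcode M9Z is invented), uses the `indicator` and `validate` options of `pd.merge`.

```python
import pandas as pd

# Made-up example: postcode M9Z has no entry in the coordinate table
postcodes = pd.DataFrame({"Postcode": ["M1B", "M1C", "M9Z"]})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# indicator=True adds a '_merge' column; validate raises if either key is duplicated
merged = pd.merge(postcodes, coords, how="left",
                  left_on="Postcode", right_on="Postal Code",
                  indicator=True, validate="one_to_one")

# Rows marked 'left_only' found no coordinates and have NaN Latitude/Longitude
missing = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(missing)
```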


```python
# draw the map centred on Toronto
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# add one marker per postal code area, labelled "neighbourhood, borough"
for lat, lng, borough, neighborhood in zip(geodf['Latitude'], geodf['Longitude'], geodf['Borough'], geodf['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto
```

Now it is time to use the **FourSquare API** to count the different kinds of venues in each neighbourhood of Toronto.
First, we define the **FourSquare API** credentials and version.

```python
# The code was removed by Watson Studio for sharing.
```
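
The hidden cell also has to define `CLIENT_ID`, `CLIENT_SECRET`, `VERSION` and the `get_category_type` helper used in the next cell, since none of them appear elsewhere in this notebook. A minimal sketch of what it might contain follows; the credential values are hypothetical placeholders, and `get_category_type` is written to match how the next cell calls it (one row of the flattened venue table in, the first category's name out), not copied from the removed code.

```python
# Hypothetical placeholders: replace with your own Foursquare developer credentials
CLIENT_ID = 'your-client-id'
CLIENT_SECRET = 'your-client-secret'
VERSION = '20180605'  # API version date in YYYYMMDD form (value assumed)

def get_category_type(row):
    """Return the name of the first category of a flattened venue row, or None."""
    # After json_normalize the column is 'venue.categories';
    # fall back to it when the short name 'categories' is not present.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if not categories_list:
        return None
    return categories_list[0]['name']

# Usage with a plain dict standing in for a DataFrame row
print(get_category_type({'venue.categories': [{'name': 'Spa'}]}))
```

Indexing a pandas `Series` with a missing label also raises `KeyError`, so the same function works when passed to `DataFrame.apply(..., axis=1)`.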

```python
# Try the first postal code area in the table (M1B, in Scarborough) to see how it works
neighborhood_latitude = geodf.loc[0, 'Latitude']
neighborhood_longitude = geodf.loc[0, 'Longitude']
neighborhood_name = geodf.loc[0, 'Borough']

LIMIT = 100    # maximum number of venues returned per request
radius = 1500  # search radius in metres
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude,
    neighborhood_longitude,
    radius,
    LIMIT)

# send the GET request and examine the results
results = requests.get(url).json()

# clean the JSON and structure it into a pandas DataFrame
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)  # flatten JSON

# keep only the columns we need
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace each list of categories with the name of its first category
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# strip the 'venue.' and 'venue.location.' prefixes from the column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for {}.'.format(nearby_venues.shape[0], neighborhood_name))
nearby_venues.head()
```

29 venues were returned by Foursquare for Scarborough.


| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Images Salon & Spa | Spa | 43.802283 | -79.198565 |
| 1 | Canadiana exhibit | Zoo Exhibit | 43.817962 | -79.193374 |
| 2 | Caribbean Wave | Caribbean Restaurant | 43.798558 | -79.195777 |
| 3 | LCBO | Liquor Store | 43.796671 | -79.204586 |
| 4 | Wendy's | Fast Food Restaurant | 43.802008 | -79.19808 |


To explore the rest of the city, we define a function that automates the process for every postal code area: for each set of coordinates, it requests the nearby venues and collects each venue's name, category and coordinates, labelled with the area's borough.
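
Both cells build the same explore URL by string formatting. A small helper that builds it from a dictionary of parameters keeps the two in sync and URL-encodes the values. This is a sketch using the standard library's `urllib.parse.urlencode`; the helper name `build_explore_url` is hypothetical, and the parameter names are the ones already used in the URLs above.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=1500, limit=100):
    """Build a Foursquare 'explore' request URL (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # the comma is encoded as %2C
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

print(build_explore_url('ID', 'SECRET', '20180605', 43.806686, -79.194353))
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_URL, params=params)` should produce an equivalent request without building the string by hand.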

```python
def VenuesWithin(names, latitudes, longitudes, radius=1500):
    """Return a DataFrame of nearby venues for each (name, latitude, longitude)."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # create the API request URL (CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT are defined above)
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        # (venues without any category are skipped to avoid an IndexError)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Boroughs',
                             'Borough Latitude',
                             'Borough Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
```
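
The nested indexing in `VenuesWithin` follows the shape of the explore response. To make that shape concrete, here is the same extraction as a standalone function run on a made-up, heavily trimmed response. The function name `parse_explore_items` is hypothetical, and the JSON contains only the fields the notebook reads, not a full Foursquare payload.

```python
def parse_explore_items(name, lat, lng, response_json):
    """Extract (area, area lat, area lng, venue, venue lat, venue lng, category) tuples."""
    items = response_json['response']['groups'][0]['items']
    rows = []
    for v in items:
        venue = v['venue']
        if not venue['categories']:  # skip venues without a category
            continue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows

# Made-up response containing only the fields read above
fake_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'LCBO', 'location': {'lat': 43.796671, 'lng': -79.204586},
               'categories': [{'name': 'Liquor Store'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]}]}}

rows = parse_explore_items('Scarborough', 43.806686, -79.194353, fake_response)
print(rows)
```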

```python
# run the function for every postal code area, labelling each result with its borough
TorontoVenues = VenuesWithin(names=geodf['Borough'],
                             latitudes=geodf['Latitude'],
                             longitudes=geodf['Longitude'])

TorontoVenues.head()
```


| | Boroughs | Borough Latitude | Borough Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Scarborough | 43.806686 | -79.194353 | Images Salon & Spa | 43.802283 | -79.198565 | Spa |
| 1 | Scarborough | 43.806686 | -79.194353 | Canadiana exhibit | 43.817962 | -79.193374 | Zoo Exhibit |
| 2 | Scarborough | 43.806686 | -79.194353 | Caribbean Wave | 43.798558 | -79.195777 | Caribbean Restaurant |
| 3 | Scarborough | 43.806686 | -79.194353 | LCBO | 43.796671 | -79.204586 | Liquor Store |
| 4 | Scarborough | 43.806686 | -79.194353 | Wendy's | 43.802008 | -79.19808 | Fast Food Restaurant |


```python
print('There are {} unique categories in Toronto.'.format(TorontoVenues['Venue Category'].nunique()))
TorontoVenues.groupby('Boroughs').count()
```

There are 346 unique categories in Toronto.


| Boroughs | Borough Latitude | Borough Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Central Toronto | 835 | 835 | 835 | 835 | 835 | 835 |
| Downtown Toronto | 1770 | 1770 | 1770 | 1770 | 1770 | 1770 |
| East Toronto | 492 | 492 | 492 | 492 | 492 | 492 |
| East York | 358 | 358 | 358 | 358 | 358 | 358 |
| Etobicoke | 448 | 448 | 448 | 448 | 448 | 448 |
| Mississauga | 60 | 60 | 60 | 60 | 60 | 60 |
| North York | 1227 | 1227 | 1227 | 1227 | 1227 | 1227 |
| Queen's Park | 100 | 100 | 100 | 100 | 100 | 100 |
| Scarborough | 650 | 650 | 650 | 650 | 650 | 650 |
| West Toronto | 556 | 556 | 556 | 556 | 556 | 556 |
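
These counts are per borough, but they come from one 1,500 m search circle per postal code area. Neighbouring circles overlap, so the same venue can be returned for several areas and counted more than once within a borough. One way to check this, sketched on made-up rows with the column names used above, is to compare raw rows with distinct venues via `drop_duplicates`. Treating "same name and same coordinates" as the same venue is an assumption.

```python
import pandas as pd

# Made-up rows: the same LCBO is returned by two overlapping search circles
venues = pd.DataFrame({
    'Boroughs': ['Scarborough', 'Scarborough', 'Scarborough', 'North York'],
    'Venue': ['LCBO', 'LCBO', "Wendy's", 'LCBO'],
    'Venue Latitude': [43.796671, 43.796671, 43.802008, 43.7600],
    'Venue Longitude': [-79.204586, -79.204586, -79.198080, -79.4100],
    'Venue Category': ['Liquor Store', 'Liquor Store', 'Fast Food Restaurant', 'Liquor Store'],
})

# A venue counts as a duplicate if borough, name and coordinates all match
key = ['Boroughs', 'Venue', 'Venue Latitude', 'Venue Longitude']
distinct = venues.drop_duplicates(subset=key)

comparison = pd.DataFrame({
    'rows': venues.groupby('Boroughs').size(),
    'distinct venues': distinct.groupby('Boroughs').size(),
})
print(comparison)
```

Whether to drop duplicates depends on the question: keeping them weights each postal code area equally, while dropping them counts each venue once per borough.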


### 3.2.2 Analyze the venue categories in each Borough

```python
# one-hot encoding: one column per venue category, 1 if the venue belongs to it
Toronto_onehot = pd.get_dummies(TorontoVenues[['Venue Category']], prefix="", prefix_sep="")

# add the borough column back to the DataFrame
Toronto_onehot['Boroughs'] = TorontoVenues['Boroughs']

# move the borough column to the first position
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

# examine the size of the DataFrame after one-hot encoding
print('{} rows were returned after one hot encoding.'.format(Toronto_onehot.shape[0]))

# group rows by borough, taking the mean frequency of occurrence of each category
Toronto_grouped = Toronto_onehot.groupby('Boroughs').mean().reset_index()

# examine the size of the DataFrame after grouping
print('{} rows were returned after grouping.'.format(Toronto_grouped.shape[0]))
```

6815 rows were returned after one hot encoding.
11 rows were returned after grouping.
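
Taking the mean of the one-hot columns turns counts into shares: a frequency of 0.08 for Coffee Shop means that 8% of the venues returned for that borough are coffee shops. A toy example with two made-up boroughs shows the effect.

```python
import pandas as pd

# Made-up venues for two boroughs
venues = pd.DataFrame({
    'Boroughs': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bar', 'Park', 'Bar'],
})

# one column per category; astype(int) because newer pandas returns booleans
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(int)
onehot['Boroughs'] = venues['Boroughs']

# mean of 0/1 columns = share of the borough's venues in each category
freq = onehot.groupby('Boroughs').mean()
print(freq)
```

Each row sums to 1 because every venue has exactly one category.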


```python
# print each borough along with its top 5 most common venues
num_top_venues = 5

for borough in Toronto_grouped['Boroughs']:
    print("----" + borough + "----")
    # transpose the borough's row into a two-column (venue, freq) table
    temp = Toronto_grouped[Toronto_grouped['Boroughs'] == borough].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:].copy()  # drop the 'Boroughs' row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

```
----Central Toronto----
                venue  freq
0         Coffee Shop  0.08
1  Italian Restaurant  0.06
2    Sushi Restaurant  0.04
3                Café  0.04
4                Park  0.04


----Downtown Toronto----
                venue  freq
0         Coffee Shop  0.08
1                Café  0.07
2                Park  0.03
3  Italian Restaurant  0.03
4               Hotel  0.03


----East Toronto----
         venue  freq
0  Coffee Shop  0.08
1         Café  0.05
2         Park  0.04
3          Pub  0.03
4  Pizza Place  0.03


----East York----
               venue  freq
0        Coffee Shop  0.06
1        Pizza Place  0.04
2               Park  0.03
3  Indian Restaurant  0.03
4       Burger Joint  0.03


----Etobicoke----
           venue  freq
0    Coffee Shop  0.08
1    Pizza Place  0.05
2  Grocery Store  0.05
3           Park  0.05
4       Pharmacy  0.04


----Mississauga----
                       venue  freq
0                Coffee Shop  0.15
1                      Hotel  0.
```

# 4. Results and discussion

```python
# put the top venues of each borough into a pandas DataFrame

# return the names of the num_top_venues categories with the highest frequency in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Boroughs' column
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

# create the new DataFrame and display the top 10 venues for each borough
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names according to the number of top venues: 1st, 2nd, 3rd, 4th, ...
columns = ['Boroughs']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new DataFrame
boroughs_venues_sorted = pd.DataFrame(columns=columns)
boroughs_venues_sorted['Boroughs'] = Toronto_grouped['Boroughs']

for ind in np.arange(Toronto_grouped.shape[0]):
    boroughs_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

boroughs_venues_sorted
```

| | Boroughs | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Central Toronto | Coffee Shop | Italian Restaurant | Café | Sushi Restaurant | Park | Pizza Place | Bakery | Japanese Restaurant | Gym | Restaurant |
| 1 | Downtown Toronto | Coffee Shop | Café | Hotel | Park | Italian Restaurant | Gastropub | Restaurant | Japanese Restaurant | Theater | Bar |
| 2 | East Toronto | Coffee Shop | Café | Park | Pizza Place | Pub | Bakery | Brewery | Bar | Indian Restaurant | Italian Restaurant |
| 3 | East York | Coffee Shop | Pizza Place | Sandwich Place | Park | Pharmacy | Restaurant | Burger Joint | Café | Grocery Store | Indian Restaurant |
| 4 | Etobicoke | Coffee Shop | Pizza Place | Park | Grocery Store | Pharmacy | Bank | Fast Food Restaurant | Sandwich Place | Café | Restaurant |
| 5 | Mississauga | Coffee Shop | Hotel | Middle Eastern Restaurant | Restaurant | Sandwich Place | Indian Restaurant | Mexican Restaurant | Electronics Store | Asian Restaurant | Caribbean Restaurant |
| 6 | North York | Coffee Shop | Park | Sandwich Place | Fast Food Restaurant | Pizza Place | Bank | Grocery Store | Pharmacy | Japanese Restaurant | Restaurant |
| 7 | Queen's Park | Coffee Shop | Japanese Restaurant | Gastropub | Pizza Place | Restaurant | Ramen Restaurant | Tea Room | Clothing Store | Café | Park |
| 8 | Scarborough | Coffee Shop | Fast Food Restaurant | Chinese Restaurant | Pizza Place | Pharmacy | Sandwich Place | Park | Grocery Store | Bank | Bakery |
| 9 | West Toronto | Café | Coffee Shop | Bar | Bakery | Italian Restaurant | Park | Restaurant | Pizza Place | Sushi Restaurant | Breakfast Spot |
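
For reference, the same kind of top-N table can be built more compactly with `Series.nlargest` applied row by row. This sketch assumes a frequency table shaped like `Toronto_grouped` (a `Boroughs` column followed by one float column per category) and uses made-up numbers; the function name `top_venues` and the plain "Rank n" column names are hypothetical choices, and ties may be ordered differently than with `sort_values`.

```python
import pandas as pd

# Made-up frequency table shaped like Toronto_grouped
grouped = pd.DataFrame({
    'Boroughs': ['A', 'B'],
    'Bar': [0.25, 0.50],
    'Coffee Shop': [0.50, 0.10],
    'Park': [0.25, 0.40],
})

def top_venues(freq, n):
    """Return a DataFrame with the n most frequent categories per borough."""
    table = freq.set_index('Boroughs')
    # nlargest keeps the n highest values; their index holds the category names
    ranked = table.apply(lambda row: pd.Series(row.nlargest(n).index), axis=1)
    ranked.columns = ['Rank {}'.format(i + 1) for i in range(n)]
    return ranked.reset_index()

print(top_venues(grouped, 2))
```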


The result is unsurprising and, on its own, not very useful. It is strong evidence that Toronto people have coffee in their blood: coffee shops are the most common venue in every borough in the table except West Toronto, where cafés come first and coffee shops second. A news report even claimed that Canadians drink more coffee than anyone else in the world, averaging 5.1 cups a day.

The initial goal was to identify the character of each borough, but the data is dominated by coffee shops and many different kinds of restaurants, which makes the result less informative. Grouping all coffee shops and all restaurants into broader categories might give more insight.
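
A minimal sketch of that grouping, assuming a venue table with the `Boroughs` and `Venue Category` columns used above. The `broad_category` function and its rules (coffee shops and cafés become "Coffee"; any category containing "Restaurant", plus pizza, sandwich and burger places, becomes "Restaurant") are hypothetical choices, and the rows are made up.

```python
import pandas as pd

def broad_category(category):
    """Map a venue category to a broader, hypothetical group."""
    if category in ('Coffee Shop', 'Café'):
        return 'Coffee'
    if 'Restaurant' in category or category in ('Pizza Place', 'Sandwich Place', 'Burger Joint'):
        return 'Restaurant'
    return category  # everything else keeps its own name

# Made-up venues for one borough
venues = pd.DataFrame({
    'Boroughs': ['A', 'A', 'A', 'A', 'A'],
    'Venue Category': ['Coffee Shop', 'Café', 'Italian Restaurant', 'Pizza Place', 'Bank'],
})
venues['Broad Category'] = venues['Venue Category'].map(broad_category)

# share of each broad category within each borough
shares = venues.groupby('Boroughs')['Broad Category'].value_counts(normalize=True)
print(shares)
```

Rerunning the top-venue ranking on the broad categories would then let less common but more telling venues, such as banks or hotels, move up the list.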

# 5. Conclusion
Setting coffee shops and cafés aside, the table suggests the following:
1. Italian restaurants are the second most common venue in Central Toronto, so immigrants from Italy may feel more at home there.
2. Banks rank sixth in both Etobicoke and North York, so immigrants with a financial background might find it easier to get a job there.
3. Italian restaurants or pizza places appear among the ten most common venues in every borough except Mississauga, so opening one could be an option for new immigrants.
   - Pro: Toronto people love them!
   - Con: There are many competitors.
4. Hotels are the second most common venue in Mississauga, probably because it is near Toronto Pearson airport.
5. Bars rank third in West Toronto, which suggests a lively nightlife.

What other conclusions can you draw?