# Capstone Project - The Best Neighborhoods for Students in Toronto
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: The Problem](#introduction)
* [Data: How can we use data to solve the problem?](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: The Problem <a name="introduction"></a>

This project aims to find the best places in **Toronto** for a university student to live, based on their personal preferences and lifestyle. This can help students with their housing search, especially if they are new to Toronto. To be more specific, we will only perform the analysis for students attending the **University of Toronto**, as it is the largest university in the city. Students from nearby universities, such as Ryerson University in downtown Toronto, will also be able to use this as a reference.

We will only examine places in the city of Toronto, not the Greater Toronto Area (GTA), which includes neighboring towns and cities. Areas outside the city would mean too long a commute for students, so they are not ideal.

We will use unsupervised machine learning to group areas into clusters based on criteria such as **distance from the university**, **the number of grocery stores**, and **access to public transit**.

## Data <a name="data"></a>

Based on our goals for this project, some of the data we will require are:
- The coordinates of different areas in Toronto based on their postal code
- The coordinates of the university
- The number of nearby grocery/convenience stores in each area
- The number of coffee shops near each area
- The number of libraries near each area
- Whether there are public transit services nearby
- If the nearby transit is a subway station or bus station
- Number of gyms nearby

##### Our Data Sources:
- Wikipedia for a table of Toronto areas by postal code
- This [link](https://cocl.us/Geospatial_data) for getting the coordinates data for each postal code in Toronto
- Google Maps for the coordinates of the University of Toronto
- Foursquare for gathering venue data for each postal code area

### Data Collection
Let's now import all the packages needed in this project and collect the data into pandas dataframes for ease of analysis and editing flexibility. We will then move on to preprocessing, or cleaning up, the data to make it usable for our machine learning algorithm later on.

##### Importing Needed Libraries

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install geopy #using pip to install the library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!pip install folium==0.5.0 # using pip to install a pinned version of the library
import folium # map rendering library

# install lxml since it is needed by pandas for reading the wiki table
!pip3 install lxml
import lxml

print('Libraries imported.')


Libraries imported.


##### Gathering Toronto Neighborhoods from Wikipedia

In [None]:
# scraping with pandas
df_borough = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M", header = 0)[0]

# dropping boroughs that are not assigned
df_borough.drop(df_borough[df_borough.Borough == 'Not assigned'].index, inplace=True)

# grouping by postal code and list the group of neighborhoods for each
df_borough = df_borough.groupby(["Postal Code"], as_index=False).sum()

# checking our dataframe so far
df_borough.sample(3)

Unnamed: 0,Postal Code,Borough,Neighbourhood
97,M9M,North York,"Humberlea, Emery"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
96,M9L,North York,Humber Summit


##### Collecting Coordinates of Our Postal Codes

In [None]:
# coordinates of the University of Toronto - St George Campus
uni_lat = 43.6629
uni_lon = -79.3957

# load data from our link
latlong = pd.read_csv("https://cocl.us/Geospatial_data")

# merging the two dataframes on the postal code, so the rows are matched by value rather than by position
df = pd.merge(df_borough, latlong, on='Postal Code', how='left')

# filtering out the boroughs outside of the main city of Toronto
df = df[df['Borough'].str.contains('Toronto', regex=False) | df['Borough'].str.contains('North York', regex=False)].reset_index(drop=True)

# approximating the distance (roughly in kilometers) of each postal code to the university: the Euclidean distance in degrees is scaled by 100
df['Distance to University'] = (np.sqrt((df['Latitude'] - uni_lat)**2 + (df['Longitude'] - uni_lon)**2) * 100).round(3)

# check
df.sample(3)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Distance to University
42,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.696948,-79.411307,3.745
48,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,1.974
58,M6S,West Toronto,"Runnymede, Swansea",43.651571,-79.48445,8.947
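
The distance column above is the Euclidean distance in degrees multiplied by 100, which is only a rough stand-in for kilometers: one degree of latitude is about 111 km, while one degree of longitude at Toronto's latitude is about 80 km. As a hedged alternative, the sketch below computes the great-circle (haversine) distance in kilometers. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for this illustration and are not part of the original notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# coordinates taken from the notebook
uni_lat, uni_lon = 43.6629, -79.3957
m5s_lat, m5s_lon = 43.662696, -79.400049   # postal code M5S

m5s_distance = haversine_km(uni_lat, uni_lon, m5s_lat, m5s_lon)
print(round(m5s_distance, 3))
```

Since the clustering only needs a consistent measure of how far each area is from the campus, the scaled-degree proxy still ranks areas sensibly, but a haversine distance makes the values directly interpretable as kilometers.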


Let's now visualize our dataframe so far with an interactive map.

In [None]:
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(width = 700, height = 700, location=[43.70672, -79.39744], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map) 

# add a marker for UofT
folium.CircleMarker(
        [uni_lat, uni_lon],
        radius=8,
        popup="University of Toronto",
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map) 
    
toronto_map

##### Collecting Venue Data from Foursquare
We can now move on to collecting nearby venue details for each postal code. 

Let's first define our Foursquare credentials.

***NOTE:*** The credentials are censored for privacy purposes.

In [None]:
# credentials
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

Let's now create a function that returns a dataframe of venues close to each location, specifically within a 500-meter radius. First, I collected the category IDs of all the relevant categories listed earlier from this [link](https://developer.foursquare.com/docs/build-with-foursquare/categories/). I did not collect any venue details other than the category, as they will not be used in this analysis.

In [None]:
# creating a list of categoryids of libraries, gyms, grocery stores, convenience stores, coffee shops, bus stops, subway stations, and cable cars
CATEGORY_IDS = ['4bf58dd8d48988d1a7941735', '4bf58dd8d48988d12f941735', '4bf58dd8d48988d1b2941735', 
                '4bf58dd8d48988d176941735', '5745c2e4498e11e7bccabdbd', '52f2ab2ebcbc57f1066b8b51',
                '4bf58dd8d48988d118951735', '50aa9e744b90af0d42d5de0e', '52f2ab2ebcbc57f1066b8b45',
                '52f2ab2ebcbc57f1066b8b1c', '4d954b0ea243a5684a65b473', '4bf58dd8d48988d16d941735',
                '4bf58dd8d48988d1e0931735', '4bf58dd8d48988d1fe931735', '52f2ab2ebcbc57f1066b8b4f',
                '4bf58dd8d48988d1fd931735']

# since this is a long list, we will split it into 5 chunks and call our function 5 separate times to try to stay under the limit each time
# a function to get the nearby venues of all postal codes
def getNearbyVenues(postal_codes, latitudes, longitudes, categories, radius=500):
    
    venues_list=[]
    for pc, lat, lng in zip(postal_codes, latitudes, longitudes):
        print(pc)
            
        # create the API request URL; the category IDs are joined with commas so any number of them can be passed
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            ','.join(categories))
            
        # make the GET request
        results = requests.get(url).json()["response"]["venues"]
        
        # return only relevant information for each nearby venue, skipping venues that have no category
        venues_list.append([(
            pc, 
            lat, 
            lng,  
            v['categories'][0]['name']) for v in results if v['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'PC Latitude', 
                  'PC Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

# use the function to collect data
venues1 = getNearbyVenues(postal_codes=df['Postal Code'],
  latitudes=df['Latitude'],
  longitudes=df['Longitude'], 
  categories=CATEGORY_IDS[0:3])
venues2 = getNearbyVenues(postal_codes=df['Postal Code'],
  latitudes=df['Latitude'],
  longitudes=df['Longitude'], 
  categories=CATEGORY_IDS[3:6])
venues3 = getNearbyVenues(postal_codes=df['Postal Code'],
  latitudes=df['Latitude'],
  longitudes=df['Longitude'],
  categories=CATEGORY_IDS[6:9])
venues4 = getNearbyVenues(postal_codes=df['Postal Code'],
  latitudes=df['Latitude'],
  longitudes=df['Longitude'], 
  categories=CATEGORY_IDS[13:16])

In [None]:
# defining the function again for this cell; the category ids are joined with commas, so it works for the 4 ids passed below
def getNearbyVenues(postal_codes, latitudes, longitudes, categories, radius=500):
    
    venues_list=[]
    for pc, lat, lng in zip(postal_codes, latitudes, longitudes):
        print(pc)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            ','.join(categories))
            
        # make the GET request
        results = requests.get(url).json()["response"]["venues"]
        
        # return only relevant information for each nearby venue, skipping venues that have no category
        venues_list.append([(
            pc, 
            lat, 
            lng,  
            v['categories'][0]['name']) for v in results if v['categories']])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'PC Latitude', 
                  'PC Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

# use the function to collect data
venues5 = getNearbyVenues(postal_codes=df['Postal Code'],
  latitudes=df['Latitude'],
  longitudes=df['Longitude'], 
  categories=CATEGORY_IDS[9:13])
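
The manual slicing of `CATEGORY_IDS` in the cells above can also be expressed with a small chunking helper, and the query string can be assembled as a dictionary instead of a long format string. The following is a minimal sketch under stated assumptions: `chunked` and `build_search_params` are hypothetical helper names introduced here, the IDs are placeholders rather than real Foursquare category IDs, and the credentials and version parameters are left out on purpose.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_search_params(lat, lng, category_ids, radius=500, limit=100):
    """Build the query parameters for one venue search (credentials omitted in this sketch)."""
    return {
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'categoryId': ','.join(category_ids),   # comma-separated, as in the notebook's URL
    }

ids = ['id{}'.format(i) for i in range(16)]     # placeholders for the 16 category IDs
groups = list(chunked(ids, 4))                  # 4 groups of 4 IDs each
params = build_search_params(43.6629, -79.3957, groups[0])
print(len(groups), params['categoryId'])
```

A dictionary like `params` can be handed to `requests.get(url, params=params)`, which takes care of URL encoding, so the function no longer depends on how many category IDs are in each group.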

Let's check how many different categories we now have.

In [None]:
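# ignore_index gives the combined table a unique index, so later row operations by index label are safe
toronto_venues = pd.concat([venues1, venues2, venues3, venues4, venues5], axis = 0, ignore_index=True)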
toronto_venues['Venue Category'].unique()

array(['Library', 'College Gym', 'College Library',
       'College Academic Building', 'Art Gallery', 'College Rec Center',
       'Office', 'College Engineering Building', 'College Classroom',
       'Gym', 'Gym / Fitness Center', 'Yoga Studio', 'Tram Station',
       'Gym Pool', 'Pool', 'Hotel Pool', 'Drugstore',
       'Martial Arts School', 'Climbing Gym', 'Health Food Store',
       'Grocery Store', 'Supermarket', 'Convenience Store',
       'Fruit & Vegetable Store', 'Food & Drink Shop', 'Gourmet Shop',
       'Market', 'Organic Grocery', 'Breakfast Spot', 'Butcher',
       'Herbs & Spices Store', 'Farmers Market', 'Metro Station',
       'Bus Stop', 'Bus Station', 'Bus Line', 'Light Rail Station',
       'Platform', 'Train Station', 'Coffee Shop', 'Juice Bar', 'Café',
       'Gas Station', 'Tea Room', 'Pharmacy', 'Smoke Shop', 'Pool Hall',
       'Bakery', 'Vegetarian / Vegan Restaurant', 'French Restaurant',
       'Wine Bar', 'Discount Store', 'Coworking Space', 'Dessert Shop

##### Cleaning Up the Dataframe
We can see that, unfortunately, the data collection process was not as accurate as we would like: the results include unrelated categories such as restaurants. We should therefore organize this list and discard the unwanted categories. Simplifying the data will help us later on in the analysis.

In [None]:
# the list of unwanted categories
unwanted_categories = ['Art Gallery', 'Office', 'College Classroom', 'Hotel Pool', 'Herbs & Spices Store', 'Butcher',
                       'Gourmet Shop', 'Juice Bar', 'Gas Station', 'Smoke Shop', 'Pool Hall', 'French Restaurant', 'Vegetarian / Vegan Restaurant', 'Bakery',
                       'Wine Bar', 'Discount Store', 'Coworking Space', 'Dessert Shop',
                       'Restaurant', 'Italian Restaurant', 'Bookstore', 'Donut Shop',
                       'Smoothie Shop', 'Creperie', 'Chinese Restaurant',
                       'Sandwich Place', 'Flower Shop', 'College Quad', 'Student Center', 'Ice Cream Shop', 'Bar',
                       'Arts & Crafts Store', 'Event Space', 'Comic Shop', 'Bus Line']

# removing from table with a boolean mask (dropping by index label could remove unrelated rows if the index is not unique)
toronto_venues = toronto_venues[~toronto_venues['Venue Category'].isin(unwanted_categories)].reset_index(drop=True)

# mapping values to simplify the data
toronto_venues['Venue Category'] = toronto_venues['Venue Category'].map({'Library': 'Library', 'Gym':'Gym','College Gym': 'Gym', 'College Library': 'Library', 'College Academic Building': 'Library', 'College Rec Center': 'Gym',
       'College Engineering Building': 'Library', 'Gym / Fitness Center': 'Gym',
       'Yoga Studio': 'Gym', 'Tram Station': 'Train/Subway Station', 'Gym Pool': 'Gym', 'Pool': 'Gym', 'Drugstore': 'Grocery/Convenience/Drug Store',
       'Martial Arts School': 'Gym', 'Climbing Gym': 'Gym', 'Health Food Store': 'Grocery/Convenience/Drug Store',
       'Grocery Store': 'Grocery/Convenience/Drug Store', 'Supermarket': 'Grocery/Convenience/Drug Store', 'Convenience Store': 'Grocery/Convenience/Drug Store',
       'Fruit & Vegetable Store': 'Grocery/Convenience/Drug Store', 'Food & Drink Shop': 'Grocery/Convenience/Drug Store', 'Market': 'Grocery/Convenience/Drug Store',
       'Organic Grocery': 'Grocery/Convenience/Drug Store', 'Breakfast Spot': 'Coffee Shop / Café', 'Farmers Market': 'Grocery/Convenience/Drug Store',
       'Metro Station': 'Train/Subway Station', 'Bus Stop': "Bus Station/Stop", 'Bus Station': "Bus Station/Stop",
       'Light Rail Station': 'Train/Subway Station', 'Platform': 'Train/Subway Station', 'Train Station': 'Train/Subway Station', 'Coffee Shop': 'Coffee Shop / Café',
       'Café': 'Coffee Shop / Café', 'Tea Room': 'Coffee Shop / Café', 'Pharmacy': 'Grocery/Convenience/Drug Store', 'College Cafeteria': 'Coffee Shop / Café'})

# check
toronto_venues['Venue Category'].unique()

array(['Library', 'Gym', 'Train/Subway Station',
       'Grocery/Convenience/Drug Store', 'Coffee Shop / Café',
       'Bus Station/Stop'], dtype=object)
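
The filter-then-map pattern used in this cleaning step can be tried out on a small made-up table. This is only an illustrative sketch: the rows, the shortened `unwanted` list, and the `simplify` dictionary below are toy stand-ins, not the notebook's data. One detail worth checking in practice is that `Series.map` turns any category missing from the dictionary into `NaN`, so counting the missing values reveals gaps in the mapping.

```python
import pandas as pd

# toy stand-in for the venue table
venues = pd.DataFrame({
    'Postal Code': ['M5S', 'M5S', 'M5S', 'M4Y', 'M4Y'],
    'Venue Category': ['College Library', 'Café', 'Wine Bar', 'Yoga Studio', 'Comic Shop'],
})
unwanted = ['Wine Bar', 'Comic Shop']
simplify = {'College Library': 'Library', 'Café': 'Coffee Shop / Café', 'Yoga Studio': 'Gym'}

# keep only the rows whose category is not in the unwanted list
kept = venues[~venues['Venue Category'].isin(unwanted)].copy()

# collapse the detailed categories into the general ones
kept['Venue Category'] = kept['Venue Category'].map(simplify)

# categories that are absent from the mapping would show up here as NaN
unmapped = int(kept['Venue Category'].isna().sum())
print(kept['Venue Category'].tolist(), unmapped)
```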

##### One-Hot Encoding
Let's now one-hot encode the venue categories in our dataframe. We then concatenate the result to the main dataframe and group by postal code. Summing the values gives us how many venues of each category are available near each location.

In [None]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# concatenate to the original table
toronto_grouped = pd.concat([toronto_venues, toronto_onehot], axis = 1).drop(['Venue Category'], axis=1)

# group by the postal code and sum to get the number of that venue type near each location
toronto_grouped = toronto_grouped.groupby(by=["Postal Code", "PC Latitude", "PC Longitude"], as_index= False).sum().reset_index(drop = True)

# merge in the distance from UofT column
toronto_grouped = pd.merge(toronto_grouped, df[['Distance to University', 'Postal Code']], how='left', on=['Postal Code'])

# check our dataframe
toronto_grouped.head(3)

Unnamed: 0,Postal Code,PC Latitude,PC Longitude,Bus Station/Stop,Coffee Shop / Café,Grocery/Convenience/Drug Store,Gym,Library,Train/Subway Station,Distance to University
0,M2J,43.778517,-79.346556,10,9,5,2,1,3,12.563
1,M2L,43.75749,-79.374714,0,0,0,0,1,0,9.689
2,M2N,43.77012,-79.408493,1,11,9,2,0,1,10.798
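
To make the one-hot-then-sum step concrete, here is a minimal sketch on made-up rows. It also shows that `pd.crosstab` produces the same per-postal-code counts in one call, which is offered as an alternative and not as the method used in this notebook. The toy table and the variable names are assumptions for illustration only.

```python
import pandas as pd

# toy stand-in for the cleaned venue table
venues = pd.DataFrame({
    'Postal Code': ['M5S', 'M5S', 'M5S', 'M4Y', 'M4Y'],
    'Venue Category': ['Library', 'Library', 'Gym', 'Gym', 'Coffee Shop / Café'],
})

# route 1: one-hot encode each venue, then sum the indicator columns per postal code
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
counts_onehot = pd.concat([venues[['Postal Code']], onehot], axis=1).groupby('Postal Code').sum()

# route 2: count category occurrences per postal code directly
counts_crosstab = pd.crosstab(venues['Postal Code'], venues['Venue Category'])

print(counts_onehot)
print(counts_crosstab)
```

Both tables have one row per postal code and one column per category, so either can serve as the feature matrix for clustering.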


## Methodology <a name="methodology"></a>

This project focused on students attending the **University of Toronto**, so only data relevant to this university was collected for analysis. However, the same process can be applied to any other university around the world.

We started by collecting Toronto neighborhood data from Wikipedia and then moved on to collecting each neighborhood's coordinates alongside the coordinates of the university. The coordinates help us visualize the data on an interactive Folium map, and we also used them to collect information about each area, specifically the venues close to each coordinate. We chose a **radius of 500 meters** around each postal code, as that is a reasonably convenient walking distance.

Our next step was cleaning the gathered data into something that the algorithm can cluster and that we can analyze easily once the clustering is done. We used only the **category** of each venue, as the other details were not important and would just make the cleaning process longer. We then grouped the data by postal code and used the **aggregate sum**, as we wanted to know how many venues of each general category are near each area.

In the final step we focus on analyzing our clean data in order to differentiate between the neighborhoods. We will use unsupervised learning, specifically **k-means clustering**, to group areas that are similar to each other. We will most likely divide the neighborhoods into 3 or 4 clusters. Afterwards, we will look at the properties that grouped the areas together and use a map to visualize everything.
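
One possible way to compare candidate numbers of clusters is the silhouette score, computed on standardized features so that no single column dominates the distances. The sketch below is not the procedure used in this notebook: it runs on synthetic two-dimensional points generated around three made-up centers, and the use of `StandardScaler` and `silhouette_score` is an assumption added for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# synthetic data: 30 points around each of three well-separated, made-up centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(30, 2)) for c in centers])

# put every feature on a comparable scale before measuring distances
X_scaled = StandardScaler().fit_transform(X)

# fit k-means for several values of k and record the silhouette score of each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=123).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# a higher silhouette score means tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(best_k)
```

On real data the scores are usually closer together than on this toy example, so the silhouette score works best as one input alongside a manual look at the resulting clusters.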

## Analysis <a name="analysis"></a>

We can now use k-means clustering to group our areas into clusters of similar areas. Afterwards, we can examine those clusters and weigh the pros and cons of each.
##### Clustering

In [None]:
# set number of clusters
kclusters = 3

toronto_clustering = toronto_grouped.drop(['Postal Code', 'PC Latitude', 'PC Longitude'], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=123).fit(toronto_clustering)

# creating a new dataframe (a copy, so that toronto_grouped itself is left unchanged)

toronto_merged = toronto_grouped.copy()

# add clustering labels
toronto_merged.insert(0, 'Cluster Labels', kmeans.labels_)

# check
toronto_merged.head(3)

Unnamed: 0,Cluster Labels,Postal Code,PC Latitude,PC Longitude,Bus Station/Stop,Coffee Shop / Café,Grocery/Convenience/Drug Store,Gym,Library,Train/Subway Station,Distance to University
0,2,M2J,43.778517,-79.346556,10,9,5,2,1,3,12.563
1,0,M2L,43.75749,-79.374714,0,0,0,0,1,0,9.689
2,2,M2N,43.77012,-79.408493,1,11,9,2,0,1,10.798


##### Visualizing the Clusters

In [None]:
# create map
map_clusters = folium.Map(width = 700, height = 700, location=[43.70672, -79.39744], zoom_start=11)

# set color scheme for the clusters (one color per cluster label)
rainbow = ['red', 'green', 'purple']

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['PC Latitude'], toronto_merged['PC Longitude'], toronto_merged['Postal Code'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

# add marker for UofT
folium.CircleMarker(
        [uni_lat, uni_lon],
        radius=8,
        popup="University of Toronto",
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters)
       
map_clusters

### Examining the Clusters
After many trials with different values of k in the k-means clustering, 3 clusters made the most sense. Higher numbers of clusters created extra, unnecessary groups, and fewer clusters did not explain much. Let's now examine each cluster to see what patterns can be noticed among them.
#### Cluster 1

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0].iloc[:, 4:].reset_index(drop = True)

Unnamed: 0,Bus Station/Stop,Coffee Shop / Café,Grocery/Convenience/Drug Store,Gym,Library,Train/Subway Station,Distance to University
0,0,0,0,0,1,0,9.689
1,0,0,1,0,0,0,8.996
2,0,1,1,0,0,0,12.856
3,3,0,1,0,0,0,11.192
4,0,2,0,2,0,0,9.372
5,0,5,1,2,0,0,8.348
6,1,1,5,0,0,0,10.26
7,1,1,0,0,0,1,13.938
8,1,0,0,0,0,0,10.164
9,0,1,1,0,1,0,13.479


This cluster consists of areas that are, on average, the furthest away from the university. As we can see, many do not have a library or gym near them, and some are not even close to a convenience store. These are the less crowded parts of the city, in the central and northern parts of Toronto.

#### Cluster 2

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1].reset_index(drop = True)

Unnamed: 0,Cluster Labels,Postal Code,PC Latitude,PC Longitude,Bus Station/Stop,Coffee Shop / Café,Grocery/Convenience/Drug Store,Gym,Library,Train/Subway Station,Distance to University
0,1,M4Y,43.66586,-79.38316,3,21,20,20,3,5,1.288
1,1,M5B,43.657162,-79.378937,10,29,15,16,5,5,1.772
2,1,M5C,43.651494,-79.375418,9,28,14,25,4,5,2.327
3,1,M5E,43.644771,-79.373306,15,34,10,10,1,3,2.881
4,1,M5G,43.657952,-79.387383,12,37,10,12,6,6,0.968
5,1,M5H,43.650571,-79.384568,12,40,13,18,4,6,1.661
6,1,M5J,43.640816,-79.381752,5,40,12,13,1,2,2.612
7,1,M5K,43.647177,-79.381576,19,38,18,22,1,8,2.114
8,1,M5L,43.648198,-79.379817,21,39,16,22,0,5,2.164
9,1,M5S,43.662696,-79.400049,2,31,8,6,24,2,0.435


This cluster represents the core of downtown Toronto. All of these areas are very close to the university and have many venues of each category nearby.

#### Cluster 3

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2].iloc[:, 4:].reset_index(drop = True)

Unnamed: 0,Bus Station/Stop,Coffee Shop / Café,Grocery/Convenience/Drug Store,Gym,Library,Train/Subway Station,Distance to University
0,10,9,5,2,1,3,12.563
1,1,11,9,2,0,1,10.798
2,7,11,8,4,1,1,4.659
3,2,11,4,5,0,1,5.488
4,1,9,2,8,1,0,4.2
5,3,12,8,3,0,0,2.848
6,8,15,7,9,0,1,3.611
7,1,10,5,6,0,1,1.399
8,1,9,3,0,0,0,2.767
9,5,8,7,3,2,0,4.696


This cluster consists mainly of neighborhoods that are a moderate distance away from the university. Most are near a subway station or, at the very least, close to a bus stop. Almost all have many coffee shops, stores, and gyms nearby. Some are close to a library, but about half are not.

## Results and Discussion <a name="results"></a>

The clustering separates the areas quite clearly. We analyzed the city of Toronto and found three different categories of areas. Although this is a very basic model with a small number of features, it is quite descriptive and also very easy to interpret. Let us now discuss each cluster in more detail and determine the pros and cons of each.

#### Cluster 1: Calmer Lifestyle, Longer Commutes
These are the areas in the quieter parts of the city, without all the bright lights and the tall buildings stacked one after another. One advantage of these areas is that after long days at the university downtown, you can come home to a peaceful area away from all the noisy streets. You can also spend your leisure time in the beautiful parks and neighborhoods around, which are found all over Toronto. These areas are also, on average, on the cheaper side, and you don't have to pay large sums of money for parking each month if you plan on owning a car.

The cons are that, as expected, there is not always a variety of each venue category located near you. Chances are that you will not be within walking distance of any grocery stores, gyms, or cafes. Last but certainly not least, these are the areas with the longest commute times to the university, so they are not ideal if you cannot stand spending around 1 to 2 hours each day on public transit.
#### Cluster 2: Downtown Lifestyle, Neighboring the University
The second cluster is basically downtown Toronto in a nutshell. Toronto is amongst the busiest and most populated cities in North America. Thus, living there might not suit everybody's lifestyle preferences, especially if you are used to smaller cities or towns. The streets are always noisy and filled with people rushing from building to building. Some may enjoy this lifestyle and get a boost of motivation from all the downtown energy, and some may find it overwhelming.

Some great things about these areas are that you are very close to the university, so your daily commute is very convenient. Being close to the university also gives you easy access to all the university's amenities and facilities. You can use its gyms and libraries with the comfort of being close to home. You are also in close proximity to all the downtown cafes, restaurants, stores, etc. On the downside, living in these areas can get very expensive. Owning a car is almost never an option for a student living there. You might also have to settle for a much smaller space compared to areas in cluster 1 or 3.
#### Cluster 3: The Average
The last cluster can be seen as the average of the first two in almost every aspect. These areas are in the middle in terms of distance to the university and commute times. They are certainly not as crowded as the downtown areas, but they are busier than the areas in cluster 1. They normally have a good number of each category of amenity close to them, and most are close to a subway station.

They are also average in terms of rental prices. You might be able to own a car and live in a decent sized space depending on the exact location, building, and of course your own budget. These areas are the ones that most students prefer given that they are decently close to the university and still have all categories of venues near them.


## Conclusion <a name="conclusion"></a>

This project aimed to find the best areas for university students to live in, given their preferences and lifestyles. Many students come to Toronto from other places for their studies and are not sure where to live. There are also students who have lived in Toronto previously and are not sure which areas would fit their needs the most. By collecting information about the different areas in the city of Toronto that most students tend to care about when deciding where to live, we tried to make the process of finding a home easier.

We created 3 different clusters of areas based on postal code, and we analyzed all 3 to find the pros and cons of each. The first cluster is best for people who prefer quieter areas and do not have an issue with longer commutes. The second cluster is basically downtown Toronto near the university, with all its glory and noisy streets. The third is an average of the first two, not too far from the university and not too close to the busy downtown areas.

The ultimate decision is for the students to make, depending on which areas they prefer and what budgets they have. Based on the data, I believe most people would prefer the third cluster, which is in close proximity to different categories of venues and also decently close to downtown Toronto and the university. This analysis can be performed for any other city and around any other university to find the most suitable places to live.