# Battle of the Neighborhoods IBM Capstone Report: Schools and Residential Analyses for the Toronto Metropolitan Area

## Introduction (Business Problem)

Toronto is the largest metropolitan area in Canada. With over six million people, it is the center of the country's financial and commercial activity. It is growing and is projected to attract many people and businesses over the coming decade.

The objective of this study is to identify the neighborhoods of Toronto and cluster them by their similarities in business, dining, entertainment, and housing. The results of this analysis are intended for individuals and corporations looking to relocate to Toronto.

Our target audience is twofold: individuals and families looking to relocate to a growing, dynamic, and relatively safe metropolitan area, and businesses, whether established or start-up, that wish to leverage the commercial and financial advantages not just of Canada but of North America in general.


## Data

The datasets are compiled from several sites that focus on geospatial and demographic data. The first source is https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, which lists the boroughs and neighborhoods within Toronto along with their postal codes. This is the geographical foundation for our study.

Our next data source, http://cocl.us/Geospatial_data, provides the latitudes and longitudes for Toronto and its constituent neighborhoods. With these coordinates we can build our interactive, detailed maps.
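
Joining the postal-code table to the coordinate table can be sketched as below. This is a minimal, dependency-free illustration rather than the notebook's actual code: the column names (`PostalCode`, `Borough`, `Neighborhood`, `Latitude`, `Longitude`) and the helper `join_coordinates` are assumptions, the coordinates are rounded illustrative values, and the `M0X` row is a made-up placeholder that shows how unmatched codes are dropped.

```python
def join_coordinates(postal_rows, coord_rows, key="PostalCode"):
    """Inner-join postal-code rows with coordinate rows on the postal code."""
    coords_by_code = {row[key]: row for row in coord_rows}
    joined = []
    for row in postal_rows:
        match = coords_by_code.get(row[key])
        if match is None:
            continue  # no coordinates available for this postal code
        joined.append(
            {**row, "Latitude": match["Latitude"], "Longitude": match["Longitude"]}
        )
    return joined


# Illustrative rows only; the real tables come from the two sources above.
postal_rows = [
    {"PostalCode": "M4A", "Borough": "North York", "Neighborhood": "Victoria Village"},
    {"PostalCode": "M5A", "Borough": "Downtown Toronto", "Neighborhood": "Regent Park, Harbourfront"},
    {"PostalCode": "M0X", "Borough": "Placeholder", "Neighborhood": "No coordinates"},  # made up
]
coord_rows = [
    {"PostalCode": "M4A", "Latitude": 43.7259, "Longitude": -79.3156},
    {"PostalCode": "M5A", "Latitude": 43.6543, "Longitude": -79.3606},
]

toronto = join_coordinates(postal_rows, coord_rows)
for row in toronto:
    print(row["PostalCode"], row["Neighborhood"], row["Latitude"], row["Longitude"])
```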
The data retrieved from Foursquare contains information on venues within a specified distance of the latitude and longitude of each postal code. The information obtained per venue is as follows:

1. Neighborhood
2. Neighborhood Latitude
3. Neighborhood Longitude
4. Venue: the name of the venue, e.g. the name of a store or restaurant
5. Venue Latitude
6. Venue Longitude
7. Venue Category
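
A request for nearby venues, and the flattening of its response into rows with the fields listed above, might look like the following sketch. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual response nesting (`response.groups[0].items[].venue`); the helper names, the version date, the credentials, and the sample payload are all hypothetical. The defaults of 500 for `radius` and 100 for `limit` are assumed to correspond to the constraints noted in the Methodology section. No network call is made here.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the query URL for venues around one neighborhood."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (assumed value)
        "ll": f"{lat},{lng}",  # neighborhood latitude,longitude
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def flatten_venues(neighborhood, lat, lng, payload):
    """Turn one explore response into a list of flat venue rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append({
            "Neighborhood": neighborhood,
            "Neighborhood Latitude": lat,
            "Neighborhood Longitude": lng,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0].get("name"),
        })
    return rows


# Hypothetical response fragment standing in for a real API reply.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.6545, "lng": -79.3601},
               "categories": [{"name": "Coffee Shop"}]}},
]}]}}

url = build_explore_url(43.6543, -79.3606, "YOUR_ID", "YOUR_SECRET")
rows = flatten_venues("Regent Park, Harbourfront", 43.6543, -79.3606, sample_payload)
print(url)
print(rows[0])
```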

Finally, we use the Foursquare API to populate the venues, categories, businesses, and ratings for the establishments within each neighborhood.
These resources allow us to create an objective, data-driven product that lets prospective businesses and residents review the analysis, determine whether relocation is the right choice, and, if so, decide where in Toronto to relocate.


![Toronto_map.jpg](attachment:Toronto_map.jpg)

## Methodology

### Clustering Approach

To compare the venues, housing prices, and school rankings in Toronto, we explore the neighborhoods, segment them, and group them into clusters so that we can analyze the differences and similarities between them. Clustering is an unsupervised machine learning task, and we use the k-means clustering algorithm for it.

#### K-means Clustering Coding and Output Sample

##### Utilize Foursquare API to Explore Neighborhoods

![Neighborhood_explore.jpg](attachment:Neighborhood_explore.jpg)

![venues.jpg](attachment:venues.jpg)

##### One Hot Encoding

We normalize and group the data, then determine the frequencies and unique values of the venues within the various neighborhoods.
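
One plausible reading of this step is sketched below: each venue's category becomes a one-hot vector, and averaging those vectors per neighborhood gives the share of each category. The code uses only the standard library so it runs anywhere (with Pandas the same idea is usually expressed with `get_dummies` followed by a group-by mean). The function names and the sample rows are hypothetical.

```python
from collections import defaultdict


def one_hot(category, categories):
    """Return a 0/1 vector with a single 1 at the category's position."""
    return [1 if c == category else 0 for c in categories]


def category_frequencies(venue_rows):
    """Average the one-hot vectors of each neighborhood's venues."""
    categories = sorted({row["Venue Category"] for row in venue_rows})
    sums = defaultdict(lambda: [0] * len(categories))
    counts = defaultdict(int)
    for row in venue_rows:
        vector = one_hot(row["Venue Category"], categories)
        name = row["Neighborhood"]
        sums[name] = [s + v for s, v in zip(sums[name], vector)]
        counts[name] += 1
    frequencies = {
        name: [s / counts[name] for s in vector] for name, vector in sums.items()
    }
    return categories, frequencies


# Made-up venue rows for illustration.
venue_rows = [
    {"Neighborhood": "A", "Venue Category": "Cafe"},
    {"Neighborhood": "A", "Venue Category": "Cafe"},
    {"Neighborhood": "A", "Venue Category": "Park"},
    {"Neighborhood": "B", "Venue Category": "Park"},
]

categories, frequencies = category_frequencies(venue_rows)
print(categories)
for name, vector in frequencies.items():
    print(name, vector)
```

Each neighborhood's frequency vector sums to 1, which puts neighborhoods with very different venue counts on a comparable scale before clustering.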

##### K-means

![Kmeans.jpg](attachment:Kmeans.jpg)
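
To make the algorithm itself concrete, here is a small sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code behind the screenshot above, which relies on Scikit Learn; the function names, the seed, and the toy feature vectors are assumptions chosen for illustration, and `k=2` is used only to keep the example small.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy category-frequency vectors: two cafe-heavy and two park-heavy neighborhoods.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Cluster numbers are arbitrary identifiers: only the grouping matters, not which group is called 0 or 1.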

* Due to API constraints, we limit the neighborhood parameter to 100 and the radius parameter to 500 meters.

## Results

### Map of Clusters

![Clusters.jpg](attachment:Clusters.jpg)

### Average Housing Price by Clusters

![housing.jpg](attachment:housing.jpg)

### School Ratings by Clusters

![school.jpg](attachment:school.jpg)

#### Foursquare API

This project uses the Foursquare API as its primary data source. Foursquare's database stores millions of places, and its Places API supports location search, location sharing, and retrieval of details about a business.

## Discussion

###### Opportunity/Problem:
The main purpose of this project is to analyze the neighborhoods of a new city for people and businesses that are considering relocating. Housing prices and school district rankings are highly influential factors in major life decisions. The hope is that people can use this analysis to inform their choices through:

1. A sorted list of houses by price, in ascending or descending order
2. A sorted list of schools by location, fees, rating, and reviews
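
The two sorted lists could be produced along the lines of the sketch below. The records and field names are hypothetical placeholders, not data from this study, and the choice to rank schools by rating first and fees second is only one possible ordering.

```python
# Hypothetical records for illustration only.
houses = [
    {"address": "House A", "price": 950_000},
    {"address": "House B", "price": 620_000},
    {"address": "House C", "price": 780_000},
]
schools = [
    {"name": "School X", "rating": 8.5, "fees": 12_000},
    {"name": "School Y", "rating": 9.1, "fees": 15_000},
    {"name": "School Z", "rating": 8.5, "fees": 9_000},
]

# Houses by price, ascending and descending.
cheapest_first = sorted(houses, key=lambda h: h["price"])
priciest_first = sorted(houses, key=lambda h: h["price"], reverse=True)

# Schools by rating (highest first), then by fees (lowest first) to break ties.
best_schools = sorted(schools, key=lambda s: (-s["rating"], s["fees"]))

print([h["address"] for h in cheapest_first])
print([h["address"] for h in priciest_first])
print([s["name"] for s in best_schools])
```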


## Conclusion

Using the k-means clustering algorithm, I separated Toronto into five distinct clusters across the different latitudes and longitudes in the dataset, each grouping neighborhoods with similar profiles. Individuals can use the outputs of these neighborhood analyses to help decide their next actions.

The advantage of this output is that the data is modular, so people at different points in their lives can make decisions appropriate to them (people with children will weigh school ranking heavily, while those without will give it less weight). This lets people use the data without having end results prescribed to them.
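
As an illustration of this modularity, the sketch below ranks clusters with reader-chosen weights. It is not part of the analysis itself: the cluster summaries, the field names, the min-max scaling, and the weights are all hypothetical, and lower prices are assumed to be preferable.

```python
def min_max(values):
    """Scale values to the 0..1 range (0.5 for all if they are identical)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_clusters(summaries, school_weight, price_weight):
    """Return cluster ids ordered from best to worst weighted score."""
    school = min_max([s["school_rating"] for s in summaries])
    # Invert the scaled price so that cheaper clusters score higher.
    affordability = [1 - p for p in min_max([s["avg_price"] for s in summaries])]
    scored = [
        (school_weight * sc + price_weight * af, s["cluster"])
        for s, sc, af in zip(summaries, school, affordability)
    ]
    return [cluster for _, cluster in sorted(scored, reverse=True)]


# Made-up per-cluster summaries.
summaries = [
    {"cluster": 0, "avg_price": 500_000, "school_rating": 6.0},
    {"cluster": 1, "avg_price": 900_000, "school_rating": 9.0},
    {"cluster": 2, "avg_price": 700_000, "school_rating": 7.0},
]

family_ranking = rank_clusters(summaries, school_weight=0.8, price_weight=0.2)
no_children_ranking = rank_clusters(summaries, school_weight=0.1, price_weight=0.9)
print(family_ranking)
print(no_children_ranking)
```

The same summaries yield different orderings under different weights, which is the sense in which the output informs a decision without prescribing one.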

This project has shown me a practical application of data science tools to a real situation with significant personal and financial impact. Mapping with Folium is a very powerful technique for consolidating information and for supporting analysis and decisions made with confidence.


#### Libraries Used in this Analysis

* Pandas: for creating and manipulating dataframes.
* Folium: Python visualization library used to show the distribution of neighborhood clusters on an interactive Leaflet map.
* Scikit Learn: for the k-means clustering implementation.
* JSON: library to handle JSON files.
* XML: to separate data from presentation; XML stores data in plain text format.
* Geocoder: to retrieve location data.
* Beautiful Soup and Requests: to scrape web pages and to handle HTTP requests.
* Matplotlib: Python plotting module.
