# Final Report - Capstone Project - The Battle of Neighborhoods - Calgary, AB

## 1. Introduction:

The purpose of this project is to highlight the best and most affordable neighborhoods in Calgary, AB, one of Canada's most reputable cities. The city at the heart of Canada's energy sector has been a sought-after destination for new families and immigrants looking for new opportunities and an affordable cost of living without sacrificing their standard of living.

This project aims to analyze the different neighborhoods in Calgary and supply a report that will determine the median house price, the median elementary public school rating, and the frequency of the nearby amenities available within a 5 km radius.

## 2. Data Section:

Data Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T

We will use the link above to scrape the neighborhoods whose postal codes start with the letter 'T'. The data will then be cleaned to include only neighborhoods in Calgary.
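The cleaning step can be sketched as a simple filter. This is a minimal illustration that assumes the scraped table has already been loaded into a pandas dataframe; the column names (`PostalCode`, `Borough`, `Neighborhood`) and the neighborhood names are placeholders, not the actual layout of the Wikipedia page.

```python
import pandas as pd

# Placeholder rows standing in for the scraped table. The column names and
# neighborhood names are assumptions made for this sketch.
postal_codes = pd.DataFrame(
    {
        "PostalCode": ["T2A", "T5A", "T3A", "T1A"],
        "Borough": ["Calgary", "Edmonton", "Calgary", "Medicine Hat"],
        "Neighborhood": ["Example North", "Example East", "Example West", "Example South"],
    }
)


def keep_calgary(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose postal code starts with 'T' and whose borough is Calgary."""
    starts_with_t = df["PostalCode"].str.startswith("T")
    is_calgary = df["Borough"].str.strip().eq("Calgary")
    return df[starts_with_t & is_calgary].reset_index(drop=True)


calgary = keep_calgary(postal_codes)
print(calgary)
```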

#### Foursquare API Data:

We need data about the venues in each Calgary neighborhood. To obtain it, we use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus, and even photos. The Foursquare location platform is therefore used as the sole source of venue data, since all of the required venue information can be obtained through its API.

After finding the list of neighborhoods, we then connect to the Foursquare API to gather information about venues inside each and every neighborhood. For each neighborhood, we have chosen the radius to be 5 kilometers.

The data retrieved from Foursquare contains information about venues within a specified distance of the latitude and longitude of each postal code. The information obtained per venue is as follows:

1. Neighborhood
2. Neighborhood Latitude
3. Neighborhood Longitude
4. Venue (the name of the venue, e.g. the name of a store or restaurant)
5. Venue Latitude
6. Venue Longitude
7. Venue Category
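As a rough sketch, the fields above can be collected into one row per venue. The response layout used here (`response` → `groups` → `items` → `venue`) is an assumption based on the legacy Foursquare v2 "explore" response, which the report does not name; `flatten_venues` and the sample payload are hypothetical and contain invented values.

```python
from typing import Any


def flatten_venues(
    neighborhood: str, lat: float, lng: float, payload: dict[str, Any]
) -> list[dict[str, Any]]:
    """Turn one explore-style response into rows with the fields listed above."""
    # Assumed layout: payload["response"]["groups"][0]["items"][i]["venue"]
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append(
            {
                "Neighborhood": neighborhood,
                "Neighborhood Latitude": lat,
                "Neighborhood Longitude": lng,
                "Venue": venue["name"],  # the venue's name
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                "Venue Category": categories[0]["name"] if categories else None,
            }
        )
    return rows


# Invented sample payload, for illustration only.
sample = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Example Cafe",
                            "location": {"lat": 51.05, "lng": -114.07},
                            "categories": [{"name": "Coffee Shop"}],
                        }
                    }
                ]
            }
        ]
    }
}

rows = flatten_venues("Example Neighborhood", 51.0447, -114.0719, sample)
print(rows)
```

A list of such rows can be passed straight to `pandas.DataFrame(rows)` to build the venue table.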


#### Map of Calgary

![Map of Calgary](https://raw.githubusercontent.com/FarshadZafari/Coursera_Capstone/master/MapofCalgary1.png)

## 3. Methodology Section

#### Clustering Approach:

We decided to explore the neighborhoods, segment them, and group them into clusters based on their nearby venues and facilities. To do that, we cluster the data with the k-means algorithm, a form of unsupervised machine learning.
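A minimal sketch of this clustering step, assuming each neighborhood is described by the relative frequency of its venue categories and that scikit-learn's `KMeans` is the implementation used. The neighborhood names and venue categories below are invented placeholders; `n_clusters=3` matches the three clusters reported in the conclusion.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder venue table: one row per (neighborhood, venue category) pair.
venues = pd.DataFrame(
    {
        "Neighborhood": ["A"] * 3 + ["B"] * 3 + ["C"] * 3
        + ["D"] * 3 + ["E"] * 3 + ["F"] * 3,
        "Venue Category": [
            "Cafe", "Cafe", "Park",
            "Cafe", "Park", "Park",
            "Gym", "Gym", "Hotel",
            "Gym", "Hotel", "Hotel",
            "Bar", "Bar", "Pub",
            "Bar", "Pub", "Pub",
        ],
    }
)

# Share of each venue category within each neighborhood (rows sum to 1).
frequencies = pd.crosstab(
    venues["Neighborhood"], venues["Venue Category"], normalize="index"
)

# Fit k-means on the frequency vectors; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(frequencies)
clusters = pd.Series(kmeans.labels_, index=frequencies.index, name="Cluster")
print(clusters)
```

Neighborhoods with similar venue mixes receive the same label; the label numbers themselves are arbitrary.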

#### Using K-Means Clustering Approach

![K-means clustering](https://raw.githubusercontent.com/FarshadZafari/Coursera_Capstone/master/KMeans.png)

#### Most Common Venues Near Neighborhood

![Most common venues near each neighborhood](https://raw.githubusercontent.com/FarshadZafari/Coursera_Capstone/master/Venues.png)

#### Work Flow:

Using Foursquare API credentials, we retrieve the nearby places for each neighborhood. Because of HTTP request limits, the number of places returned per neighborhood is capped at 100, and the radius parameter is set to 5000 meters (5 km).
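The request for one neighborhood could be assembled roughly as follows. This sketch assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` query parameters; the report does not state which endpoint was used, and the credentials and version date here are placeholders. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint (legacy Foursquare v2 API); not confirmed by the report.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(
    lat: float,
    lng: float,
    client_id: str,
    client_secret: str,
    version: str = "20180605",  # placeholder API version date
    radius_m: int = 5000,       # search radius in meters (5 km)
    limit: int = 100,           # maximum venues per neighborhood
) -> str:
    """Build the query URL for venues around one neighborhood's coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius_m,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and approximate coordinates for central Calgary.
url = build_explore_url(51.0447, -114.0719, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```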

## 4. Results Section

#### Map of Clusters in Calgary

![Map of clusters in Calgary](https://raw.githubusercontent.com/FarshadZafari/Coursera_Capstone/master/Clusters.png)

#### Average Housing Price by Clusters in Calgary

![Average housing price by cluster in Calgary](https://raw.githubusercontent.com/FarshadZafari/Coursera_Capstone/master/Housing.png)

#### Elementary Public School Ratings by Clusters in Calgary

![Elementary public school ratings by cluster in Calgary](https://raw.githubusercontent.com/FarshadZafari/Coursera_Capstone/master/Schools.png)

#### The Location:

Calgary has been, and continues to be, a hot spot for new immigrants and for Canadians moving between provinces. Although the downturn in Canada's energy sector has hurt the province of Alberta, and the greater Calgary area in particular, the economic climate in Calgary remains hopeful: diversification is well under way and is proving lucrative for commercial real estate.

#### Foursquare API:

This project uses the Foursquare API as its primary data-gathering source. Foursquare maintains a database of millions of locations and supports location search, location sharing, and business details.

## 5. Discussion Section

#### Problem to Solve:

The main purpose of this project is to suggest a suitable neighborhood for people moving to a new city by analyzing socio-economic status, connectivity to the airport, proximity to the city center, public transportation, and the everyday living essentials of the average Canadian. The intended outputs are:

1. A sorted list of houses by price, in ascending or descending order
2. A sorted list of schools by location, fees, rating, and reviews
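The sorted lists and the per-cluster averages shown in the results charts can be sketched with the standard library alone. All records below (neighborhood names, cluster labels, prices, and ratings) are invented placeholders and are not the project's data; `sort_by_price` and `cluster_averages` are hypothetical helper names.

```python
from statistics import mean

# Invented placeholder records, for illustration only.
records = [
    {"neighborhood": "N1", "cluster": 0, "price": 400_000, "rating": 7.0},
    {"neighborhood": "N2", "cluster": 0, "price": 500_000, "rating": 8.0},
    {"neighborhood": "N3", "cluster": 1, "price": 300_000, "rating": 6.0},
    {"neighborhood": "N4", "cluster": 1, "price": 350_000, "rating": 6.5},
]


def sort_by_price(rows, descending=False):
    """Return the rows ordered by house price."""
    return sorted(rows, key=lambda row: row["price"], reverse=descending)


def cluster_averages(rows):
    """Average house price and school rating for each cluster label."""
    by_cluster = {}
    for row in rows:
        by_cluster.setdefault(row["cluster"], []).append(row)
    return {
        label: {
            "avg_price": mean(r["price"] for r in group),
            "avg_rating": mean(r["rating"] for r in group),
        }
        for label, group in by_cluster.items()
    }


cheapest_first = sort_by_price(records)
averages = cluster_averages(records)
print([r["neighborhood"] for r in cheapest_first])
print(averages)
```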

## 6. Conclusion Section

In this project, I used the k-means clustering algorithm to separate the neighborhoods, represented by 35 latitude and longitude pairs from the dataset, into 3 clusters, each grouping neighborhoods with very similar surroundings. Using the charts above, a particular neighborhood can be recommended based on average house prices and school ratings.

I feel rewarded by the effort and believe this course, with all the topics it covered, is well worth appreciating. This project has shown me a practical application of data science tools to a real situation with both personal and financial impact. Mapping with Folium is a very powerful technique for consolidating information and supporting analysis and decisions with confidence.

#### Future Works:

This project can be extended to locate the optimal neighborhood in Calgary more precisely. It is flexible: a richer dataset can be fed in and the parameters adjusted as required.

#### Libraries Used to Develop this Project:

* Pandas: For creating and manipulating dataframes.
* Folium: Python visualization library used to show the distribution of neighborhood clusters on an interactive Leaflet map.
* Scikit-learn: For the k-means clustering implementation.
* Geocoder: To retrieve Location Data.
* Matplotlib: Python Plotting Module.