# **CAPSTONE PROJECT: BATTLE OF THE NEIGHBORHOODS**
---
## **Assessment of the Relocation Options**
___


### _Table of Contents_

1. [Purpose](#1)<br>
2. [Introduction](#2)<br>
3. [Data Acquisition](#3) <br>
4. [Methodology](#4) <br>
5. [Box Plots](#8) <br>
6. [Scatter Plots](#10) <br>
7. [Bubble Plots](#12) <br> 

***

### _1. PURPOSE_ <a id="1"></a>
This is the final project for the IBM Data Science Professional Certificate. Its main purpose is to apply data science methodologies to the available data and recommend the best option for a family relocation.
***

### _2. INTRODUCTION_ <a id="2"></a>
For my final project I decided to explore the options for relocating to Kitchener or Waterloo, Ontario. Given the current economic situation in Western Canada, some families may consider relocating to other provinces. This project was mainly inspired by my own thoughts about such a move, and I used it as an opportunity to practice what I learned during this course.

There are several open resources where people can find information about a potential destination, but most of the time the information is unstructured and, in many cases, biased. Some websites are based only on tourist attractions and reviews; others pull information from real estate agencies, food chains, etc. When someone decides to research a particular town or area, in the best case it takes more than a dozen different search requests, and the results may not be as informative as anticipated. Eventually the question becomes: what is next? Where to look for more information?

The purpose of this project is to collect the available open data, apply data analytics methodologies and provide recommendations based on statistical data. The recommendations compare the options by rental prices, access to recreational areas, schools, etc.

This project uses data from the following sources:
* Canada Mortgage and Housing Corporation (CMHC) 
* Open Data from City of Kitchener website 
* Foursquare City Guide 
* Wikipedia.
***

### _3. DATA ACQUISITION_ <a id="3"></a>
The following data was acquired from public sources:

[**Canada Mortgage and Housing Corporation (CMHC)**](https://www03.cmhc-schl.gc.ca/hmip-pimh/en#TableMapChart/0850/3/Kitchener%20-%20Cambridge%20-%20Waterloo) 

The dataset “Kitchener-Cambridge-Waterloo — Average Rent by Bedroom Type by Zone” was retrieved from CMHC. It is used to extract average apartment rental prices as of October 2019.

[**Kitchener GeoHub**](https://open-kitchenergis.opendata.arcgis.com/) 

Kitchener GeoHub is part of the Waterloo Region open data initiative and contains a variety of information. The retrieved shape files and datasets were used to generate JavaScript Object Notation (JSON) files for Kitchener and Waterloo with the following information: ward numbers, ward boundaries, neighborhoods, addresses and educational facilities. The JSON files were created with third-party software: [QGIS](https://qgis.org/en/site/), "A Free and Open Source Geographic Information System".

[**Foursquare City Guide**](https://foursquare.com/) 

The Foursquare API is used to explore and retrieve up-to-date information about venues and points of interest in Kitchener and Waterloo. The result is returned as JSON and includes the following information:
* Venue ID
* Venue Name
* Coordinates: Latitude and Longitude
* Category Name

[**Wikipedia**](https://en.wikipedia.org/wiki/Kitchener_City_Council)

Wikipedia was used to retrieve the list of City of Kitchener wards and the communities included in each ward.

**Note:** No such information is available for the City of Waterloo. For this project, a dataset with similar information was created manually in CSV format. The data was gathered from several sources:
* Real estate agents 
* Wikipedia 
* City of Waterloo 
* Google Maps
***

### _4. METHODOLOGY_ <a id="4"></a>

#### _4.1. DATA SELECTION_
Only data from open sources was used for the project. Two targets were selected, Kitchener and Waterloo in Ontario; the two cities directly adjoin each other. The communities/neighborhoods are relatively small, with few venues in each separate community, so the best way to group communities is by ward number. On average every ward includes about three communities. There are 17 wards in total (ten in Kitchener and seven in Waterloo).

This project requires a dataset containing ward numbers and the coordinates of the geographical center of each ward. These coordinates are used as reference points to explore and retrieve venue data from Foursquare.

A “ready-to-go” dataset containing center coordinates was not available and had to be created manually. The best source for the required data is Kitchener GeoHub. The website provides various statistical data, maps, datasets, etc., in the form of CSV or shape files. The following shape files were retrieved from Kitchener GeoHub:
* Addresses – physical addresses of the buildings in the Waterloo area
* Ward Boundaries – file containing polygons for every ward in both cities
* Educational Facilities – locations of every educational institution in both cities

These shape files were loaded into QGIS to create customized JSON files. For the ward center file, the coordinates of a building at the geographical center of each ward were used. After 17 addresses were selected, the JSON file with ward center coordinates was generated. For the educational facilities file no cleaning or manual selection was required: the shape files were loaded and the JSON files generated.

![Ward Map](Project_Files/QGIS_map.png)
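The ward centers in this project were picked by hand from the address layer. As a rough cross-check, the area centroid of a ward polygon can also be computed directly from its boundary ring with the shoelace formula. The sketch below is an alternative to the manual selection, not the method used in the project. It treats longitude and latitude as planar coordinates, which is a reasonable approximation at the scale of a city ward, and the sample ring is made up for illustration.

```python
def polygon_centroid(ring):
    """Return the area centroid (x, y) of a simple polygon.

    `ring` is a list of (x, y) vertices; the last vertex may repeat the
    first one, as GeoJSON rings do.
    """
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]  # close the ring if needed
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area) and area = area2 / 2
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical rectangular "ward" given as (longitude, latitude) pairs
sample_ring = [(-80.52, 43.44), (-80.48, 43.44), (-80.48, 43.46), (-80.52, 43.46)]
lon, lat = polygon_centroid(sample_ring)
print(round(lon, 4), round(lat, 4))
```

A centroid can fall outside a strongly concave ward, which is one reason a hand-picked address near the center can be the more practical reference point.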

The list of communities/neighborhoods was created manually from several sources, such as real estate agencies and Wikipedia. The resulting files were in CSV format and ready to upload to the notebook.

The average monthly rental price data was acquired from the CMHC website as a CSV file.

#### _4.1. DATA CLEANING AND RE-GROUPING_
The data was loaded into a Jupyter Notebook for further work. Except for the manually created CSV files, all datasets created from JSON files required additional cleaning, grouping and normalization.

The table shown below is the final dataset containing the unique ward ID, the ward center coordinates and the list of communities.

![Ward List](Project_Files/Ward_List.JPG)
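The cleaning code itself is not reproduced here. As a minimal sketch of the flattening step, assuming the QGIS export follows the GeoJSON `FeatureCollection` layout with one point feature per ward center, the features can be turned into flat rows like this. The property name `WARDID` is hypothetical; the two coordinates are taken from the ward table in this report.

```python
import json


def ward_centers_from_geojson(text, id_field="WARDID"):
    """Flatten GeoJSON point features into rows of ward ID, latitude, longitude."""
    collection = json.loads(text)
    rows = []
    for feature in collection["features"]:
        # GeoJSON stores positions as [longitude, latitude]
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Ward": str(feature["properties"][id_field]).strip(),
            "Latitude": lat,
            "Longitude": lon,
        })
    # Sort by ward ID so that later merges are reproducible
    return sorted(rows, key=lambda row: row["Ward"])


# Small hand-written sample; `WARDID` is an assumed property name
sample_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"WARDID": "W01"},
         "geometry": {"type": "Point", "coordinates": [-80.56254102, 43.44702464]}},
        {"type": "Feature",
         "properties": {"WARDID": "K01 "},
         "geometry": {"type": "Point", "coordinates": [-80.45676691, 43.46750655]}},
    ],
})

for row in ward_centers_from_geojson(sample_geojson):
    print(row)
```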
    

#### _4.2. SEGMENTING AND CLUSTERING KITCHENER AND WATERLOO WARDS_

After the dataset with ward information for both cities was prepared, the next step was to acquire information about venues and points of interest in the area. Foursquare was used for this task.

The explore function of the Foursquare API was used to retrieve information about the most popular or common venues in the communities, such as food courts, shops, restaurants, parks, public transportation, etc.
The k-means clustering algorithm was used to group the wards into clusters. The final result, with communities color-coded by cluster, was visualized using the Folium library.

To retrieve venue information from the Foursquare API, the function **getNearbyVenues** was defined. This function uses a predefined search radius and the ward center coordinates to locate venues and generate a dataset containing the following venue information:
* Venue ID
* Venue Name
* Latitude and Longitude
* Category Name
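The body of **getNearbyVenues** is not shown in this report. The sketch below illustrates one way such a function could look, assuming the legacy Foursquare v2 `venues/explore` endpoint and its usual response layout (`response.groups[0].items[*].venue`). The credentials, the 500 m radius and the limit of 100 are placeholders, not the values used in the project, and the `fetch` parameter is an addition that allows the function to be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Legacy v2 endpoint (assumed; not confirmed by this report)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def parse_venues(payload):
    """Extract (id, name, lat, lng, category) tuples from an explore response."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0]["items"] if groups else []
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        venues.append((
            venue["id"],
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name"),  # None if the venue has no category
        ))
    return venues


def get_nearby_venues(wards, credentials, radius=500, limit=100, fetch=fetch_json):
    """Query venues around each (ward, lat, lng) center and tag rows with the ward ID."""
    rows = []
    for ward, lat, lng in wards:
        query = urlencode({**credentials, "ll": f"{lat},{lng}",
                           "radius": radius, "limit": limit})
        for venue in parse_venues(fetch(f"{EXPLORE_URL}?{query}")):
            rows.append((ward, *venue))
    return rows


# Offline demonstration with a hand-written response instead of a live request
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"id": "demo-1", "name": "Demo Coffee",
               "location": {"lat": 43.4664, "lng": -80.5230},
               "categories": [{"name": "Coffee Shop"}]}},
]}]}}
print(parse_venues(sample_payload))
```

A live call would pass a dictionary such as `{"client_id": ..., "client_secret": ..., "v": "YYYYMMDD"}` as `credentials` and leave `fetch` at its default.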

This information was extracted from the JSON response and a dataset was created. In total, 238 venues in 96 unique categories were retrieved from Foursquare.

Venue count by Ward:

![Venues_per_ward](Project_Files/Venues_per_Ward.JPG)

To analyze the venues located in each ward, the **One Hot Encoding** technique was used. The result was grouped by ward and produced 16 rows (one row for each ward, excluding Kitchener Ward 3) and 97 columns.
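As an illustration of this step, the sketch below one-hot encodes venue categories and then averages the indicator rows per ward, which yields the share of each category within a ward. Averaging is an assumption about how the grouping was done. In pandas the same step is typically written with `pd.get_dummies` followed by `groupby(...).mean()`; the version here uses only the standard library to stay self-contained, and the sample venues are made up.

```python
from collections import defaultdict


def one_hot(venues):
    """Return (categories, rows): one 0/1 indicator row per venue, tagged with its ward."""
    categories = sorted({category for _, category in venues})
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for ward, category in venues:
        indicators = [0] * len(categories)
        indicators[index[category]] = 1
        rows.append((ward, indicators))
    return categories, rows


def mean_by_ward(rows):
    """Average the indicator rows of each ward into category frequencies."""
    grouped = defaultdict(list)
    for ward, indicators in rows:
        grouped[ward].append(indicators)
    return {
        ward: [sum(column) / len(members) for column in zip(*members)]
        for ward, members in grouped.items()
    }


# Made-up (ward, category) pairs for illustration
sample_venues = [
    ("K01", "Coffee Shop"),
    ("K01", "Pub"),
    ("K01", "Coffee Shop"),
    ("W07", "Café"),
]
categories, encoded = one_hot(sample_venues)
frequencies = mean_by_ward(encoded)
print(categories)
print(frequencies)
```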

The next step was to analyze the popularity of the venues and identify the top 10 venue categories for each ward in both cities.

![Top10_per_ward](Project_Files/Top10_per_ward.JPG)
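Ranking the categories of one ward amounts to sorting its frequency row in descending order and keeping the first entries. A minimal sketch, with made-up frequencies and a hypothetical helper name `top_venues`:

```python
def top_venues(categories, frequencies, n=10):
    """Return the n category names with the highest frequency, most common first."""
    ranked = sorted(zip(categories, frequencies),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Made-up frequency row for a single ward
sample_categories = ["Bar", "Café", "Park", "Pub"]
sample_frequencies = [0.2, 0.5, 0.0, 0.3]
print(top_venues(sample_categories, sample_frequencies, n=3))
```

When a ward has fewer than ten distinct categories, the remaining positions are filled by categories with a frequency of zero. This may be why several rows of the final table end with the same entries, such as "Deli / Bodega" and "Department Store".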

The final step was to cluster the results using **k-means** clustering. After some testing it was decided to proceed with five clusters. The result was merged with the communities list for final presentation on the map.
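The clustering code is not shown in this report, and the notebook most likely relied on a library implementation. To make the step concrete, here is a plain-Python sketch of Lloyd's algorithm, the standard k-means iteration, applied to made-up frequency vectors. The function name, the seed and the sample data are illustrative; the project itself used five clusters on the ward frequency table.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric vectors with Lloyd's algorithm; return labels."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up category-frequency vectors for four wards
sample_points = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
]
print(kmeans(sample_points, k=2))
```

Cluster numbers are arbitrary labels: a different seed can swap them without changing which wards are grouped together.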



|	Latitude	|	Longitude	|	Municipality	|	Ward	|	Community	|	Cluster Labels	|	1st Most Common Venue	|	2nd Most Common Venue	|	3rd Most Common Venue	|	4th Most Common Venue	|	5th Most Common Venue	|	6th Most Common Venue	|	7th Most Common Venue	|	8th Most Common Venue	|	9th Most Common Venue	|	10th Most Common Venue	|
|	:--:	|	:--:	|	:---	|	:--:	|	:---	|	:--:	|	:---	|	:---	|	:---	|	:---	|	:---	|	:---	|	:---	|	:---	|	:---	|	:---	|
|	43.46750655	|	-80.45676691	|	Kitchener	|	K01	|	Bridgeport, RiverRidge, Rosemount, Heritage Park	|	1	|	Fast Food Restaurant	|	Coffee Shop	|	Restaurant	|	Construction & Landscaping	|	Clothing Store	|	Brewery	|	Bowling Alley	|	Rental Car Location	|	Beer Store	|	Pub	|
|	43.44183516	|	-80.42731253	|	Kitchener	|	K02	|	Stanley Park, Centreville	|	1	|	Pizza Place	|	Park	|	Trail	|	Bus Station	|	Fast Food Restaurant	|	Convenience Store	|	Cosmetics Shop	|	Deli / Bodega	|	Department Store	|	Dim Sum Restaurant	|
|	43.38286678	|	-80.43175205	|	Kitchener	|	K04	|	Strasburg, Doon	|	1	|	Pizza Place	|	Pharmacy	|	Park	|	Spa	|	Dog Run	|	Convenience Store	|	Cosmetics Shop	|	Deli / Bodega	|	Department Store	|	Dim Sum Restaurant	|
|	43.38415865	|	-80.4921803	|	Kitchener	|	K05	|	Laurentian West, Huron Park, Williamsburg	|	3	|	Coffee Shop	|	Park	|	Bar	|	Furniture / Home Store	|	Yoga Studio	|	Fast Food Restaurant	|	Cosmetics Shop	|	Deli / Bodega	|	Department Store	|	Dim Sum Restaurant	|
|	43.41523759	|	-80.48778382	|	Kitchener	|	K06	|	Laurentian Hills, Country Hills, Alpine Village	|	1	|	Convenience Store	|	Park	|	Fast Food Restaurant	|	Clothing Store	|	Bus Station	|	Discount Store	|	Dog Run	|	Shopping Mall	|	Skating Rink	|	Pizza Place	|
|	43.4253858	|	-80.54489523	|	Kitchener	|	K07	|	Forest Heights, Waldau	|	1	|	Coffee Shop	|	Grocery Store	|	Supermarket	|	Pizza Place	|	Convenience Store	|	Restaurant	|	Sandwich Place	|	Breakfast Spot	|	Pet Store	|	Liquor Store	|
|	43.43707244	|	-80.51932757	|	Kitchener	|	K08	|	Forest Hills, Victoria Hills	|	1	|	Fast Food Restaurant	|	Restaurant	|	Pizza Place	|	Grocery Store	|	Food & Drink Shop	|	Skating Rink	|	Gym / Fitness Center	|	Gas Station	|	Discount Store	|	Convenience Store	|
|	43.44130888	|	-80.48855592	|	Kitchener	|	K09	|	Victoria Park, Southdale, Cherry Hill, Rockway	|	4	|	Brewery	|	Ice Cream Shop	|	Latin American Restaurant	|	Baseball Field	|	Beer Garden	|	French Restaurant	|	Deli / Bodega	|	Department Store	|	Dim Sum Restaurant	|	Diner	|
|	43.45767164	|	-80.48457583	|	Kitchener	|	K10	|	Fairfield, Northward, Central Frederick, Auditorium, King East, Eastwood	|	1	|	Train Station	|	Restaurant	|	Gym / Fitness Center	|	Sporting Goods Shop	|	Concert Hall	|	Coffee Shop	|	Clothing Store	|	Chinese Restaurant	|	Dim Sum Restaurant	|	Movie Theater	|
|	43.44702464	|	-80.56254102	|	Waterloo	|	W01	|	Beechwood West, Upper Beechwood, Westvale	|	1	|	Grocery Store	|	Pizza Place	|	Pet Store	|	Park	|	Bank	|	Deli / Bodega	|	Department Store	|	Yoga Studio	|	Dog Run	|	Cosmetics Shop	|
|	43.46752806	|	-80.59171463	|	Waterloo	|	W02	|	Clair Creek Meadows, Clair Hills, Erbsville, Laurelwood, Vista Hills	|	1	|	Video Game Store	|	Park	|	Asian Restaurant	|	Ice Cream Shop	|	Supermarket	|	Pizza Place	|	Chinese Restaurant	|	Sandwich Place	|	Pharmacy	|	Wings Joint	|
|	43.49248702	|	-80.56311178	|	Waterloo	|	W03	|	Lakeshore, Lakeshore North/Conservation Meadows, Parkdale	|	2	|	Park	|	College Classroom	|	Sandwich Place	|	Yoga Studio	|	Fast Food Restaurant	|	Cosmetics Shop	|	Deli / Bodega	|	Department Store	|	Dim Sum Restaurant	|	Diner	|
|	43.50533723	|	-80.51575545	|	Waterloo	|	W04	|	Colonial Acres, Rural East Country Squire, Glenridge, Eastbridge	|	0	|	Pizza Place	|	Pharmacy	|	Arts & Crafts Store	|	Indian Restaurant	|	Furniture / Home Store	|	Martial Arts Dojo	|	Fast Food Restaurant	|	Cosmetics Shop	|	Deli / Bodega	|	Department Store	|
|	43.48892327	|	-80.49853611	|	Waterloo	|	W05	|	Lexington and University Downs, Lincoln Heights, Lincoln Village	|	1	|	Pizza Place	|	Pharmacy	|	Pet Store	|	Park	|	Cosmetics Shop	|	Coffee Shop	|	Sandwich Place	|	Chinese Restaurant	|	Mediterranean Restaurant	|	Grocery Store	|
|	43.47046663	|	-80.55186311	|	Waterloo	|	W06	|	Beechwood, Uptown Waterloo North, Maple Hills	|	1	|	Coffee Shop	|	Sandwich Place	|	Juice Bar	|	Café	|	College Gym	|	Outdoor Sculpture	|	Bar	|	Sushi Restaurant	|	Convenience Store	|	Dog Run	|
|	43.46636455	|	-80.52301221	|	Waterloo	|	W07	|	Uptown Waterloo South, Westmount	|	1	|	Café	|	Restaurant	|	Bar	|	Coffee Shop	|	Pizza Place	|	Brewery	|	Indie Movie Theater	|	Grocery Store	|	Gastropub	|	Pub	|


![Cluster_map](Project_Files/Cluster_map.JPG)

#### _4.3. RENTAL PRICES_

The most recent average monthly apartment rents for Kitchener and Waterloo (CMHC, October 2019, in Canadian dollars) were added to this project as additional input for the decision-making process.

|	Zone	|	Bachelor	|	1 Bedroom	|	2 Bedroom	|	3 Bedroom +	|
|	:--	|	:--:	|	:--:	|	:--:	|	:--:	|
|	Kitchener East	|	752 	|	974 	|	1168 	|	1321 	|
|	Kitchener Central	|	774 	|	959 	|	1176 	|	1599 	|
|	Kitchener West	|	797 	|	1044 	|	1264 	|	1336 	|
|	Waterloo	|	1032 	|	1197 	|	1354 	|	1337 	|


![Rental_prices](Project_Files/Rental_Prices.JPG)
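To make the comparison concrete, the short sketch below takes the values from the rental table and reports, for each bedroom type, the cheapest and the most expensive zone and the difference between them. The helper name `rent_spread` is illustrative.

```python
# Average monthly rents from the table above
rents = {
    "Kitchener East":    {"Bachelor": 752,  "1 Bedroom": 974,  "2 Bedroom": 1168, "3 Bedroom +": 1321},
    "Kitchener Central": {"Bachelor": 774,  "1 Bedroom": 959,  "2 Bedroom": 1176, "3 Bedroom +": 1599},
    "Kitchener West":    {"Bachelor": 797,  "1 Bedroom": 1044, "2 Bedroom": 1264, "3 Bedroom +": 1336},
    "Waterloo":          {"Bachelor": 1032, "1 Bedroom": 1197, "2 Bedroom": 1354, "3 Bedroom +": 1337},
}


def rent_spread(rents, bedroom_type):
    """Return (cheapest zone, its rent, priciest zone, its rent) for one bedroom type."""
    cheapest = min(rents, key=lambda zone: rents[zone][bedroom_type])
    priciest = max(rents, key=lambda zone: rents[zone][bedroom_type])
    return cheapest, rents[cheapest][bedroom_type], priciest, rents[priciest][bedroom_type]


for bedroom_type in ["Bachelor", "1 Bedroom", "2 Bedroom", "3 Bedroom +"]:
    low_zone, low, high_zone, high = rent_spread(rents, bedroom_type)
    print(f"{bedroom_type}: {low_zone} ({low}) to {high_zone} ({high}), difference {high - low}")
```

With these figures, Waterloo is the most expensive zone for every type except "3 Bedroom +", where Kitchener Central is highest.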