In [1]:
import pandas as pd
import numpy as np
import lxml
import urllib.request
import geocoder
import folium

# Capstone Project - The Battle of Neighborhoods (Week 1)
## Finding the best neighborhood for my new Japanese restaurant
*This notebook contains my submission for Capstone Project week 1 of the Coursera course 'Applied Data Science Capstone'.*

## Introduction
As a restaurant owner I would like to open a Japanese restaurant in Toronto, Canada. As I would like to maximize my profits, I am looking for a neighborhood that would meet the following requirements:

* have a relatively high population;
* have a low crime rate;
* have a low number of Japanese restaurants.

To determine the best neighborhood for a new Japanese restaurant, we can use data science to support this business decision. The business problem is to find the neighborhood in which a Japanese restaurant would be most profitable. I make two important assumptions: 1) a higher population leads to higher profits, because more people bring more business, and 2) a higher crime rate leads to lower profits, because more crime brings less business.

### Target audience
Even though this project is tailored to (potentially new) restaurant owners, the target audience is anyone who is interested in opening a business anywhere in the world. The techniques used here could also be applied to various other businesses (coffee shops, clothing stores, supermarkets, etc.) and locations. Provided that data is available, the sky is the limit. Combining data sources and datasets would lead to an even more comprehensive analysis.

## Data
Since we would like to determine the best location for the restaurant using data science, one of the first steps is to determine what data we need, what data is available, and how we could use the available data to solve our business problem. Below, I describe the data that is available and that I would like to use to determine which neighborhood best fulfills the requirements listed above.

### Neighborhood data
In previous assignments of this Coursera course we have already scraped a Wikipedia page to obtain data on the neighborhoods of Toronto, Canada. The analysis can be found [here](https://github.com/dkreeft/Coursera_Capstone/blob/master/capstone_week3.ipynb). That dataset will be used here to have a list of neighborhoods and their geographical coordinates. 

### High population / low crime rate
Population data for Toronto, Canada as a whole is readily available, but population per neighborhood is harder to find. The Toronto Police shares [data on crimes in neighborhoods](http://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-boundary-file-) that also happens to contain the population of each neighborhood. Using this dataset, we can determine the number of crimes per neighborhood, account for the population (calculate crime rates), and determine an overall crime rate by extracting a new feature (i.e., taking the average of the rates for the different crimes). The dataset looks as follows.

In [2]:
df_crime = pd.read_csv('Neighbourhood_Crime_Rates_Boundary_File_.csv')
df_crime.head(10)

| Unnamed: 0 | OBJECTID | Neighbourhood_Crime_Rates_Neigh | Neighbourhood_Crime_Rates_Hood_ | Hood_ID | Neighbourhood | Assault_2014 | Assault_2015 | Assault_2016 | Assault_2017 | Assault_2018 | ... | Homicide_2015 | Homicide_2016 | Homicide_2017 | Homicide_2018 | Homicide_AVG | Homicide_CHG | Homicide_Rate_2018 | Population | Shape__Area | Shape__Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Yonge-St.Clair | 97 | 97 | Yonge-St.Clair | 58 | 38 | 51 | 46 | 61 | ... | 0 | 0 | 0 | 0 |  |  | 0.0 | 3189 | 1161315.0 | 5873.270507 |
| 1 | 2 | York University Heights | 27 | 27 | York University Heights | 78 | 101 | 111 | 120 | 138 | ... | 0 | 2 | 0 | 1 | 1.3 |  | 2.7 | 36764 | 13246660.0 | 18504.777616 |
| 2 | 3 | Lansing-Westgate | 38 | 38 | Lansing-Westgate | 216 | 203 | 223 | 226 | 197 | ... | 0 | 0 | 0 | 0 |  |  | 0.0 | 10242 | 5346186.0 | 11112.109419 |
| 3 | 4 | Yorkdale-Glen Park | 31 | 31 | Yorkdale-Glen Park | 121 | 141 | 136 | 124 | 127 | ... | 1 | 1 | 1 | 2 | 1.2 | 100% | 11.0 | 18233 | 6038326.0 | 10079.426837 |
| 4 | 5 | Stonegate-Queensway | 16 | 16 | Stonegate-Queensway | 109 | 140 | 124 | 112 | 128 | ... | 0 | 0 | 0 | 0 | 1.0 |  | 0.0 | 22207 | 7946202.0 | 11853.189803 |
| 5 | 6 | Tam O'Shanter-Sullivan | 118 | 118 | Tam O'Shanter-Sullivan | 63 | 58 | 50 | 57 | 56 | ... | 1 | 0 | 0 | 1 | 1.0 |  | 2.3 | 43695 | 5422345.0 | 10750.46829 |
| 6 | 7 | The Beaches | 63 | 63 | The Beaches | 349 | 392 | 380 | 435 | 457 | ... | 0 | 0 | 0 | 0 |  |  | 0.0 | 28378 | 3595829.0 | 11275.181284 |
| 7 | 8 | Thistletown-Beaumond Heights | 3 | 3 | Thistletown-Beaumond Heights | 45 | 47 | 39 | 21 | 30 | ... | 1 | 0 | 0 | 2 | 1.5 |  | 12.5 | 16039 | 3339481.0 | 10828.444269 |
| 8 | 9 | Thorncliffe Park | 55 | 55 | Thorncliffe Park | 111 | 124 | 157 | 147 | 135 | ... | 1 | 1 | 4 | 0 | 2.3 | -100% | 0.0 | 8352 | 3126554.0 | 7502.70932 |
| 9 | 10 | Danforth East York | 59 | 59 | Danforth East York | 214 | 203 | 214 | 203 | 227 | ... | 0 | 0 | 0 | 0 |  |  | 0.0 | 10485 | 2188598.0 | 7623.857803 |
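
The crime-rate feature described above (counts adjusted for population, then averaged across crime types) could be sketched roughly as follows. This is only a minimal illustration, not the final analysis: the three sample rows are copied from the table, the helper `add_crime_rates` and the column names ending in `_rate` and `overall_crime_rate` are hypothetical names I introduce here, and the scaling of 100,000 residents is an assumption that happens to be consistent with the `Homicide_Rate_2018` values shown above.

```python
import pandas as pd

# A few rows and columns copied from the table above, for illustration only.
sample = pd.DataFrame({
    "Neighbourhood": ["York University Heights", "Yorkdale-Glen Park", "The Beaches"],
    "Assault_2018": [138, 127, 457],
    "Homicide_2018": [1, 2, 0],
    "Population": [36764, 18233, 28378],
})


def add_crime_rates(df, crime_cols, per=100_000):
    """Hypothetical helper: add one rate column per crime type and their mean.

    Assumption: rates are expressed per `per` residents (100,000 by default).
    """
    out = df.copy()
    rate_cols = []
    for col in crime_cols:
        rate_col = f"{col}_rate"
        # Crime count divided by population, scaled to `per` residents.
        out[rate_col] = out[col] / out["Population"] * per
        rate_cols.append(rate_col)
    # Overall crime rate: plain average of the individual crime rates.
    out["overall_crime_rate"] = out[rate_cols].mean(axis=1)
    return out


rates = add_crime_rates(sample, ["Assault_2018", "Homicide_2018"])
print(rates[["Neighbourhood", "Homicide_2018_rate", "overall_crime_rate"]])
```

Note that a plain average lets frequent crimes such as assault dominate the overall rate; whether the crime types should be weighted differently is a choice to revisit during the actual analysis.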


### Restaurant data
I would like to use data from Foursquare to determine whether a neighborhood already contains Japanese restaurants, and if so, how many. Using this location data, I can estimate how fierce the competition would be in a given neighborhood. As my goal is to maximize profits, I should find a neighborhood that has a relatively low number of Japanese restaurants, at least per capita (per person). Of course, the assumption here is that a neighborhood with relatively few Japanese restaurants presents a business opportunity for a new one.
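
The per-capita comparison described above might look roughly like the sketch below. It does not call Foursquare: the `venues` rows are made-up placeholder data standing in for whatever the venue search returns, and the column names `venue_category`, `japanese_restaurants` and `restaurants_per_10k` are hypothetical. Only the two population figures are taken from the crime dataset shown earlier.

```python
import pandas as pd

# Placeholder venue data (NOT real Foursquare results), one row per venue.
venues = pd.DataFrame({
    "Neighbourhood": ["The Beaches", "The Beaches", "Thorncliffe Park", "The Beaches"],
    "venue_category": ["Japanese Restaurant", "Coffee Shop", "Pizza Place", "Japanese Restaurant"],
})

# Population per neighborhood, taken from the crime dataset above.
population = pd.DataFrame({
    "Neighbourhood": ["The Beaches", "Thorncliffe Park"],
    "Population": [28378, 8352],
})

# Count Japanese restaurants per neighborhood.
japanese_counts = (
    venues[venues["venue_category"] == "Japanese Restaurant"]
    .groupby("Neighbourhood")
    .size()
    .rename("japanese_restaurants")
    .reset_index()
)

# Left merge keeps neighborhoods without any Japanese restaurant (count 0).
per_capita = population.merge(japanese_counts, on="Neighbourhood", how="left")
per_capita["japanese_restaurants"] = (
    per_capita["japanese_restaurants"].fillna(0).astype(int)
)

# Restaurants per 10,000 residents (the scaling factor is an arbitrary choice).
per_capita["restaurants_per_10k"] = (
    per_capita["japanese_restaurants"] / per_capita["Population"] * 10_000
)
print(per_capita.sort_values("restaurants_per_10k"))
```

The left merge matters here: neighborhoods with no Japanese restaurant at all are exactly the ones of interest, so they must stay in the result with a count of zero rather than drop out.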