# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera



## Introduction: Business Problem

As increasing numbers of consumers want to dine out or take prepared food home, the number of food-service operations has skyrocketed today. But there's still room in the market for your food-service business. Choosing location is vitally important to start a restaurant.

This project is trying to find an optimal location for a restaurant. In particular, this project will be targeted to stakeholders interested in opening an **Chinese restaurant** in **Toronto, Canada**.

Since there are lots of restaurants in Toronto, we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Chinese restaurants in vicinity**. We would also prefer locations **as close to city center as possible**, assuming that first two conditions are met.

We will use our data science powers to generate a few most promissing neighborhoods based on this criteria. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Data

Based on definition of our problem, factors that will influence our decission are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of Chinese restaurants in the neighborhood, if any
* distance of neighborhood from city center

We decided to use regularly spaced grid of locations, centered around city center, to define our neighborhoods.

Following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **geopy library**
* number of restaurants and their type and location in every neighborhood will be obtained using **Foursquare API**
* coordinate of Toronto center will be obtained using **geopy library**

### Neighborhood Candidates

Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods. 

Let's first find the latitude & longitude of Toronto city center.

#### Use geopy library to get the latitude and longitude values of Toronto.

In [1]:
address = 'Toronto'

from geopy.geocoders import Nominatim 
geolocator = Nominatim(user_agent="ny_explorer", timeout=10)
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


#### Explore the neighborhood in the city of Toronto, Canada.
For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore the neighborhoods in Toronto. We will scrape the Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format.

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

req = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(req.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
neighborhood=pd.DataFrame(df[0])

neighborhood

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,
176,M6Z,Not assigned,
177,M7Z,Not assigned,
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


#### Clean the data

In [3]:
neighborhood = neighborhood[neighborhood.Borough != 'Not assigned']
neighborhood.groupby('Postal Code')['Neighborhood'].apply(', '.join)
neighborhood.loc[(neighborhood['Neighborhood'] == 'Not assigned'), 'Neighborhood'] = neighborhood['Borough'] 
neighborhood.reset_index(drop=True, inplace=True)
neighborhood

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing Centre
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [4]:
neighborhood.shape

(103, 3)

#### Get the geo data

In [5]:
geospatial_data = pd.read_csv('http://cocl.us/Geospatial_data')
neighborhood_toronto = pd.merge(neighborhood, geospatial_data, on = 'Postal Code')
neighborhood_toronto

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business reply mail Processing Centre,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


#### Create a map for neighborhood in Toronto City.

In [6]:
import folium
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(neighborhood_toronto['Latitude'], neighborhood_toronto['Longitude'], neighborhood_toronto['Borough'], neighborhood_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
map_toronto

#### Calculate distance from center

In [7]:
import geopy.distance

distances_from_centers = []
for index, row in neighborhood_toronto.iterrows():
    distance = geopy.distance.vincenty(
        (row['Longitude'], row['Latitude']), 
        (longitude, latitude)
    ).km
    distances_from_centers.append(distance)

neighborhood_toronto['Distances_from_Centers'] = distances_from_centers
neighborhood_toronto

  import sys


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances_from_Centers
0,M3A,North York,Parkwoods,43.753259,-79.329656,6.400372
1,M4A,North York,Victoria Village,43.725882,-79.315572,7.778012
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,2.601491
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,9.122914
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.646674
...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,13.734695
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,0.268979
100,M7Y,East Toronto,Business reply mail Processing Centre,43.662744,-79.321558,6.967320
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,12.797783


### Foursquare
Now that we have our location candidates, let's use Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in 'food' category, but only those that are proper restaurants - coffe shops, pizza places, bakeries etc. are not direct competitors so we don't care about those. So we will include in out list only venues that have 'restaurant' in category name, and we'll make sure to detect and include all the subcategories of specific 'Chinese restaurant' category, as we need info on Chinese restaurants in the neighborhood.

In [8]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180604'

#### Explore Neighborhoods in Toronto¶

In [9]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

toronto_venues = getNearbyVenues(names=neighborhood_toronto['Neighborhood'],latitudes=neighborhood_toronto['Latitude'],longitudes=neighborhood_toronto['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [10]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [11]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2114,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2115,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2116,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2117,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [12]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').sum().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Alderwood, Long Branch",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Bathurst Manor, Wilson Heights, Downsview North",0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,Bayview Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Bedford Park, Lawrence Manor East",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,"Wexford, Maryvale",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
89,Willowdale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
90,Woburn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
91,Woodbine Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0


#### Select columns contains name 'Restaurant'

In [13]:
toronto_restaurants = toronto_grouped
toronto_restaurants.set_index('Neighborhood', inplace=True)
columns = [columnname for columnname in toronto_restaurants.columns if 'Restaurant' in columnname]
toronto_restaurants = toronto_restaurants[columns]
toronto_restaurants

Unnamed: 0_level_0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,...,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Agincourt,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
"Alderwood, Long Branch",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
"Bathurst Manor, Wilson Heights, Downsview North",0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,1,0,0,0,0,0
Bayview Village,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
"Bedford Park, Lawrence Manor East",0,1,0,0,0,0,0,0,0,1,...,0,0,1,0,1,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Wexford, Maryvale",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Willowdale,0,0,0,0,0,0,0,0,0,0,...,0,3,2,0,1,0,0,0,0,1
Woburn,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Woodbine Heights,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [14]:
toronto_restaurants['Total'] = toronto_restaurants.sum(axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [15]:
toronto_restaurants

Unnamed: 0_level_0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,...,Ramen Restaurant,Restaurant,Seafood Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Total
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Agincourt,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
"Alderwood, Long Branch",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
"Bathurst Manor, Wilson Heights, Downsview North",0,0,0,0,0,0,0,0,0,0,...,0,1,0,1,0,0,0,0,0,3
Bayview Village,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,2
"Bedford Park, Lawrence Manor East",0,1,0,0,0,0,0,0,0,1,...,0,1,0,1,0,1,0,0,0,10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Wexford, Maryvale",0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
Willowdale,0,0,0,0,0,0,0,0,0,0,...,3,2,0,1,0,0,0,0,1,11
Woburn,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
Woodbine Heights,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Caculate the number of total restaurants and Chinese restaurants within the neighborhood.

In [16]:
toronto_restaurants_agg = toronto_restaurants[['Chinese Restaurant', 'Total']]
toronto_restaurants_agg

Unnamed: 0_level_0,Chinese Restaurant,Total
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1
Agincourt,0,1
"Alderwood, Long Branch",0,0
"Bathurst Manor, Wilson Heights, Downsview North",0,3
Bayview Village,1,2
"Bedford Park, Lawrence Manor East",0,10
...,...,...
"Wexford, Maryvale",0,2
Willowdale,0,11
Woburn,0,1
Woodbine Heights,0,0


In [17]:
neighborhood_toronto

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances_from_Centers
0,M3A,North York,Parkwoods,43.753259,-79.329656,6.400372
1,M4A,North York,Victoria Village,43.725882,-79.315572,7.778012
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,2.601491
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,9.122914
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.646674
...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,13.734695
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,0.268979
100,M7Y,East Toronto,Business reply mail Processing Centre,43.662744,-79.321558,6.967320
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,12.797783


In [18]:
neighborhood_restaurants = pd.merge(neighborhood_toronto, toronto_restaurants_agg, on = 'Neighborhood', how = 'left')

In [19]:
neighborhood_restaurants.head(20)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances_from_Centers,Chinese Restaurant,Total
0,M3A,North York,Parkwoods,43.753259,-79.329656,6.400372,0.0,0.0
1,M4A,North York,Victoria Village,43.725882,-79.315572,7.778012,0.0,2.0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2.601491,0.0,5.0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,9.122914,0.0,1.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.646674,0.0,6.0
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,16.562078,,
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,21.405203,0.0,1.0
7,M3B,North York,Don Mills,43.745906,-79.352188,4.023882,1.0,10.0
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,8.334155,0.0,1.0
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0.563126,1.0,24.0


In [20]:
print('Total number of restaurants:', int(neighborhood_restaurants['Total'].sum()))
print('Total number of Chinese restaurants:', int(neighborhood_restaurants['Chinese Restaurant'].sum()))
print('Percentage of Chinese restaurants: {:.2f}%'.format(int(neighborhood_restaurants['Chinese Restaurant'].sum()) / int(neighborhood_restaurants['Total'].sum()) * 100))

Total number of restaurants: 514
Total number of Chinese restaurants: 15
Percentage of Chinese restaurants: 2.92%


#### Let's now see all the restaurants in toronto city on map.

In [21]:
map_restaurants = folium.Map(location=[latitude, longitude], zoom_start=11)
map_neighborhood = neighborhood_restaurants
for lat, lng, borough, neighborhood, total in zip(map_neighborhood['Latitude'], map_neighborhood['Longitude'], map_neighborhood['Borough'], map_neighborhood['Neighborhood'], map_neighborhood['Total']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=total,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_restaurants)  
map_restaurants

#### Let's now see all the Chinese restaurants in toronto city on map.

In [22]:
map_chinese_restaurants = folium.Map(location=[latitude, longitude], zoom_start=11)
map_neighborhood = neighborhood_restaurants
for lat, lng, borough, neighborhood, total in zip(map_neighborhood['Latitude'], map_neighborhood['Longitude'], map_neighborhood['Borough'], map_neighborhood['Neighborhood'], map_neighborhood['Chinese Restaurant']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5*total,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chinese_restaurants)  
map_chinese_restaurants

## Data Analysis

Let's perform some basic explanatory data analysis and derive some additional info from our raw data. 
First let's sort the neighborhood by **distances from toronto center**:

In [23]:
neighborhood_candidates = neighborhood_restaurants.sort_values(by=['Distances_from_Centers'], ascending=True)
neighborhood_candidates.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances_from_Centers,Chinese Restaurant,Total
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0.092617,0.0,24.0
97,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,0.211978,0.0,30.0
99,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0.268979,0.0,25.0
42,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,0.293543,1.0,29.0
36,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,0.356797,1.0,13.0
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0.395811,0.0,18.0
48,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,0.472455,0.0,30.0
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0.563126,1.0,24.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.646674,0.0,6.0
83,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0.747643,0.0,0.0


Let's remove the neighborhood that contains **Chinese restaurants**.

In [24]:
neighborhood_candidates=neighborhood_candidates.loc[neighborhood_restaurants['Chinese Restaurant']==0]
neighborhood_candidates.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances_from_Centers,Chinese Restaurant,Total
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0.092617,0.0,24.0
97,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,0.211978,0.0,30.0
99,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0.268979,0.0,25.0
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0.395811,0.0,18.0
48,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,0.472455,0.0,30.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.646674,0.0,6.0
83,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0.747643,0.0,0.0
91,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,0.89421,0.0,0.0
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0.951829,0.0,22.0
92,M5W,Downtown Toronto,Stn A PO Boxes,43.646435,-79.374846,1.025117,0.0,21.0


We can see that *Moore Park, Summerhill East* is nearest to the center of toronto with least rastarants. *Rosedale* can also be taken into consideration since it is also not far away from center. *Queen's Park, Ontario Provincial Government* is also a good choice since it's near to the center and with relatively less reataurants in the neighborhood.

## Conclusion

#### We can see that *Moore Park, Summerhill East*, *Rosedale*, *Queen's Park, Ontario Provincial Government* are the optimal locations for opening a Chinese restaraunt.