# Capstone Project - The Battle of the Neighborhoods

### Business Problem

In this project, we attempt to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders who are interested in opening an Indian restaurant in Toronto.

Because Toronto already has numerous restaurants (Chinese, Japanese, Korean, French, etc.), we are trying to identify locations that are not already crowded with similar businesses. We are particularly interested in areas with no Indian restaurants in the vicinity. We would also prefer locations as close to the city center as possible so that the restaurant is highly visible, assuming the first two conditions are met.

We will use our tools and techniques to generate a short list of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.

### Data 

Based on the definition of our problem, the factors that will influence our decision are:
- number of existing restaurants in the borough (any type of restaurant)
- number of and distance to Indian restaurants in the borough, if any
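
The two factors above could be computed for a single candidate location roughly as follows. This is only a minimal sketch, not the method used in this notebook: the function names `haversine_km` and `summarize_candidate`, the `(lat, lon, is_indian)` tuple format, the 1 km radius, and the sample coordinates are all assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def summarize_candidate(center, restaurants, radius_km=1.0):
    """Count restaurants near `center` and find the nearest Indian one.

    `restaurants` is a list of (lat, lon, is_indian) tuples (hypothetical format).
    """
    total_nearby = 0
    indian_nearby = 0
    nearest_indian_km = None
    for lat, lon, is_indian in restaurants:
        distance = haversine_km(center[0], center[1], lat, lon)
        if distance <= radius_km:
            total_nearby += 1
            if is_indian:
                indian_nearby += 1
        if is_indian and (nearest_indian_km is None or distance < nearest_indian_km):
            nearest_indian_km = distance
    return {
        'restaurants_nearby': total_nearby,
        'indian_nearby': indian_nearby,
        'nearest_indian_km': nearest_indian_km,
    }


# Made-up sample venues around the Toronto center coordinates
sample_restaurants = [
    (43.6540, -79.3840, False),
    (43.6600, -79.3832, True),
    (43.7000, -79.3832, True),
]
summary = summarize_candidate((43.6532, -79.3832), sample_restaurants)
print(summary)
```

A candidate with a low restaurant count, no Indian restaurants nearby, and a large distance to the nearest Indian restaurant would rank well under the criteria stated in the Business Problem section.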

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods. The following data sources will be needed to extract or generate the required information:

- centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained from the given list of Toronto neighborhoods
- the number of restaurants in every neighborhood, with their type and location, will be obtained using the Foursquare API
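
As an illustration of how candidate centers might be generated algorithmically, here is a minimal sketch of a square grid centered on the Toronto coordinates used later in this notebook. The function name `generate_grid`, the 1 km spacing, the grid size, and the flat-earth conversion from kilometres to degrees are assumptions for this example, not the notebook's actual procedure.

```python
import math


def generate_grid(center_lat, center_lon, spacing_km=1.0, half_steps=2):
    """Return (lat, lon) points on a square grid centered on the given point.

    Uses a local flat-earth approximation, which is reasonable for a
    city-sized area but not for large distances.
    """
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(center_lat))
    points = []
    for i in range(-half_steps, half_steps + 1):
        for j in range(-half_steps, half_steps + 1):
            lat = center_lat + i * spacing_km / km_per_deg_lat
            lon = center_lon + j * spacing_km / km_per_deg_lon
            points.append((lat, lon))
    return points


# 5 x 5 grid of candidate centers, 1 km apart, around the city center
grid = generate_grid(43.6532, -79.3832, spacing_km=1.0, half_steps=2)
print(len(grid))
```

Each grid point could then be matched to its nearest postal-code centroid to obtain an approximate address.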

### Borough Candidate

Using the given list of Toronto neighborhoods, we import the relevant data, including postal code, borough, neighborhood name, latitude, and longitude. In this study, we focus on a single borough: Scarborough.

In [1]:
# Import needed libraries
import pandas as pd 
import numpy as np 
print('libraries imported!')

# Load table into a dataframe
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url,skiprows=1)[0]

# Apply table headers as described by the activity instructions
df.columns = ['PostalCode', 'Borough', 'Neighborhood']


# Drop rows whose borough is 'Not assigned'
df = df[df.Borough != 'Not assigned'].copy()

# If a cell has a borough but a 'Not assigned' neighborhood, use the borough as the neighborhood.
# Assigning to `row` inside iterrows() does not modify df, so use a boolean mask with .loc instead.
mask = df['Neighborhood'] == 'Not assigned'
df.loc[mask, 'Neighborhood'] = df.loc[mask, 'Borough']

# Group the dataset by PostalCode and Borough, joining neighborhood names with commas.
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].agg(', '.join).reset_index()

# Show the cleansed dataframe
df.head()
      
# Install needed libraries
!pip install shapely
!pip install geopandas
print('Libraries installed!')



# Import needed libraries
import requests
import io
from shapely.geometry import Point
import geopandas as gpd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
print('Libraries imported and environment set up!')


# Load the latitude and longitude coordinates of each postal code.
url2 = "http://cocl.us/Geospatial_data"
s = requests.get(url2).content
df_coordinates = pd.read_csv(io.StringIO(s.decode('utf-8')))

# Rename the columns so the dataframes can be merged on PostalCode
df_coordinates.columns = ['PostalCode', 'Latitude', 'Longitude']

# Show a sample of the df_coordinates dataframe
df_coordinates.head()


      

# Join the neighborhoods and coordinates datasets
df = pd.merge(df, df_coordinates, on='PostalCode')

# Select and order the columns of interest
df = df[['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']]

# Show the merged dataframe
df

libraries imported!

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |


In [2]:
!pip install folium



In [3]:
df['Neighborhood']

0                                         Rouge, Malvern
1                 Highland Creek, Rouge Hill, Port Union
2                      Guildwood, Morningside, West Hill
3                                                 Woburn
4                                              Cedarbrae
5                                    Scarborough Village
6            East Birchmount Park, Ionview, Kennedy Park
7                        Clairlea, Golden Mile, Oakridge
8        Cliffcrest, Cliffside, Scarborough Village West
9                            Birch Cliff, Cliffside West
10     Dorset Park, Scarborough Town Centre, Wexford ...
11                                     Maryvale, Wexford
12                                             Agincourt
13               Clarks Corners, Sullivan, Tam O'Shanter
14     Agincourt North, L'Amoreaux East, Milliken, St...
15                                       L'Amoreaux West
16                                           Upper Rouge
17                             

In [4]:

# Create a map of Toronto centered on the city center coordinates
import folium

toronto_latitude = 43.6532
toronto_longitude = -79.3832
map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10.7)

# add markers to the map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    

map_toronto

### Foursquare Data
The Foursquare API provides the list of venues around each location, in particular the restaurant venues.
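
To make the venue lookup concrete, the following sketch builds a request URL for the legacy Foursquare v2 `venues/explore` endpoint, which takes the client credentials, an API version date, and a `ll` latitude/longitude pair. The helper name `build_explore_url`, the placeholder credentials, and the 500 m radius and 100-venue limit are assumptions for illustration. The v2 API is a legacy interface and may no longer be available to new accounts, so treat this as a sketch rather than a working integration.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 endpoint (assumed to still be reachable)
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build a venues/explore URL for venues around (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,  # search radius in metres
        'limit': limit,    # maximum number of venues to return
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; substitute real values before sending a request
explore_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                                '20190610', 43.806686, -79.194353)
print(explore_url)
```

The resulting URL could then be fetched with `requests.get(explore_url).json()` and the returned venues filtered by category to isolate restaurants.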

In [5]:

# Foursquare API credentials and version
CLIENT_ID = 'KXND1X0TY1G0FUT2ZC1YQVFQOIAI04ZRF24SN0LJVOLR1M25' 
CLIENT_SECRET = 'HVPDG0RIJVRUBGKUSVNFCV4QGEWCGJEQHQNN4ZVUGEYWAH35'
VERSION = '20190610'

# Select the Scarborough rows explicitly
Scarborough_data = df[df['Borough'] == 'Scarborough'].reset_index(drop=True)
Scarborough_data.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
