# Final Project - Coursera Capstone

Andrew R.

## Intro

I would like to explore where a prospective owner should open a restaurant in Toronto. A good location combines a high population with few competitors. Using the Foursquare API and other data, population trends and competitor numbers will be analyzed to identify an ideal location for a new restaurant.

## Data

Basic Toronto borough data scraped from Wikipedia and the location coordinates obtained in the previous lab will be used as the starting point. Government population data will then be appended to it, along with Foursquare API data for restaurant venues.
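As a rough sketch of how the restaurant venues could be requested, the snippet below builds a query URL for one neighbourhood centre and counts the venues in a parsed response. It assumes the legacy Foursquare v2 `venues/explore` endpoint used in the course labs and its `response -> groups -> items` layout. The helper names `build_explore_url` and `count_venues`, the placeholder credentials, and the hand-written payload are all hypothetical. No network request is made here.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "explore" endpoint, as used in the course labs.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100,
                      query="restaurant"):
    """Build a venues/explore request URL for one neighbourhood centre."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                       # API version date (YYYYMMDD)
        "ll": "{},{}".format(lat, lng),     # latitude,longitude of the search centre
        "radius": radius,                   # search radius in metres
        "limit": limit,                     # maximum number of venues returned
        "query": query,                     # free-text filter for restaurants
    }
    return EXPLORE_URL + "?" + urlencode(params)


def count_venues(payload):
    """Count venues in an explore-style JSON payload already parsed to a dict."""
    groups = payload.get("response", {}).get("groups", [])
    return sum(len(group.get("items", [])) for group in groups)


# Placeholder credentials; the URL is only built, not requested.
url = build_explore_url(43.65426, -79.360636, "YOUR_ID", "YOUR_SECRET")

# Hand-written payload that mimics the assumed response shape (illustration only).
fake_payload = {"response": {"groups": [{"items": [{"venue": {"name": "A"}},
                                                   {"venue": {"name": "B"}}]}]}}
print(url)
print(count_venues(fake_payload))
```

In a real run, the URL would be fetched with `requests.get(url).json()` and the result passed to `count_venues`, once per row of the neighbourhood table.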

## Methodology

The analysis starts with a table of Toronto boroughs and neighborhoods with their latitude and longitude, which is already enough to draw a map of Toronto. The population data will be appended to the dataframe and then added to the map. The number of restaurants already located in each borough/neighborhood will also be appended to the table and the map. This will allow the most populated and most competitive boroughs to be compared.
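The appending step could look roughly like the sketch below. It assumes the population data and the venue results are both keyed by `Postcode`. The frames `df_areas`, `df_pop`, and `df_venues` and every value in them are made-up placeholders, not the project's data.

```python
import pandas as pd

# Toy rows in the shape of the neighbourhood table (placeholder values).
df_areas = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})

# Hypothetical population table keyed by postcode (made-up numbers).
df_pop = pd.DataFrame({
    "Postcode": ["M3A", "M5A"],
    "Population": [1000, 3000],
})

# Hypothetical one-row-per-venue table, e.g. flattened API results.
df_venues = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M3A"],
    "Venue": ["R1", "R2", "R3"],
})

# Count venues per postcode, then left-join both tables so every area is kept.
counts = df_venues.groupby("Postcode").size().rename("Restaurants").reset_index()
df_combined = (
    df_areas
    .merge(df_pop, on="Postcode", how="left")
    .merge(counts, on="Postcode", how="left")
)

# Areas with no matching venues get a count of zero rather than NaN.
df_combined["Restaurants"] = df_combined["Restaurants"].fillna(0).astype(int)
print(df_combined)
```

A left join keeps areas that are missing from the population table, so gaps show up as `NaN` instead of silently dropping rows.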

## Results

The table was extended with the additional data, and a map was produced showing the same information. Population seemed to be densest in the Downtown Toronto area. Restaurant numbers were directly proportional to population.

## Discussion

Seeing as restaurant numbers were directly proportional to population, a new business owner may want to simply pick a location that is densely populated to maximize revenue. A lower-population area would theoretically have similar, proportional competition but fewer customers.
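One way to check the proportionality claim, sketched here under assumptions, is to compute restaurants per 1,000 residents and the correlation between the two columns. The numbers below are invented for illustration and are not the project's results. The column names and the `df_check` frame are hypothetical.

```python
import pandas as pd

# Made-up numbers for illustration only; these are not the project's results.
df_check = pd.DataFrame({
    "Borough": ["A", "B", "C", "D"],
    "Population": [10000, 20000, 40000, 80000],
    "Restaurants": [10, 20, 40, 40],
})

# Restaurants per 1,000 residents: roughly constant if counts scale with population.
df_check["PerThousand"] = df_check["Restaurants"] / df_check["Population"] * 1000

# Pearson correlation between population and restaurant count.
r = df_check["Population"].corr(df_check["Restaurants"])

# The area with the fewest restaurants per resident is the least saturated one.
least_saturated = df_check.loc[df_check["PerThousand"].idxmin(), "Borough"]

print(df_check)
print("correlation:", round(r, 3))
print("least saturated:", least_saturated)
```

If the per-capita ratio really is constant everywhere, no area stands out and population alone drives the choice. An area with a clearly lower ratio would be the kind of high-population, low-competition location the introduction is looking for.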

## Conclusion

New business owners may consider opening in a densely populated area to maximize their customer base. However, population and number of competitors are likely not the only factors at play. Further analysis is needed to draw a more significant conclusion. For example, revenue figures for competitor restaurants could be a valuable input, among other factors.

## Appendix

In [1]:
# Install the mapping and geocoding packages before importing them
!conda install -c conda-forge folium=0.5.0 --yes
!conda install -c conda-forge geopy --yes

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
import folium
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

In [2]:
!conda install -c conda-forge beautifulsoup4 --yes
from bs4 import BeautifulSoup

In [3]:
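# Read the first table on the Wikipedia page listing Toronto (M) postal codes
link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(link)[0]
# Keep a local copy; header = True writes the column names (header = 0 is falsy and omits them)
df.to_csv('beautifulsoup_new.csv', header = True, index = False)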
df.head()

|   | Postcode | Borough | Neighbourhood |
|---|----------|---------|---------------|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


In [4]:
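# Drop postcodes that have no borough assigned
df = df[df.Borough != 'Not assigned']
# Combine neighbourhoods that share a postcode into one comma-separated row
df = df.groupby(['Postcode', 'Borough'], sort = False).agg(','.join).reset_index()
# Where a neighbourhood is unassigned, fall back to the borough name
df.loc[df['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = df['Borough']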
df.head(10)

|   | Postcode | Borough | Neighbourhood |
|---|----------|---------|---------------|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront,Regent Park |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |
| 5 | M9A | Etobicoke | Islington Avenue |
| 6 | M1B | Scarborough | Rouge,Malvern |
| 7 | M3B | North York | Don Mills North |
| 8 | M4B | East York | Woodbine Gardens,Parkview Hill |
| 9 | M5B | Downtown Toronto | Ryerson,Garden District |


In [5]:
!wget -q -O 'Toronto_long_lat_data.csv'  http://cocl.us/Geospatial_data
df_lat_long = pd.read_csv('Toronto_long_lat_data.csv')

In [6]:
df_lat_long.head()

|   | Postal Code | Latitude | Longitude |
|---|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [7]:
df_lat_long.columns = ['Postcode', 'Latitude', 'Longitude']

df_full = pd.merge(df, df_lat_long[['Postcode', 'Latitude', 'Longitude']], on = 'Postcode')
df_full.head()

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|----------|---------|---------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront,Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 |


In [8]:
city = 'Toronto, ON'
geolocator = Nominatim(user_agent = "Toronto")
location = geolocator.geocode(city)
lat = location.latitude
long = location.longitude
print('Toronto Coordinates (Lat,Long): {} , {}'.format(lat, long))

Toronto Coordinates (Lat,Long): 43.653963 , -79.387207


In [9]:
toronto_map = folium.Map(location = [lat, long], zoom_start = 11)

# Use distinct loop names so the city-centre lat/long from the previous cell are not overwritten
for n_lat, n_long, borough, neighbourhood in zip(df_full['Latitude'], df_full['Longitude'], df_full['Borough'], df_full['Neighbourhood']):
    label = "{}, {}".format(neighbourhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [n_lat, n_long],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.5).add_to(toronto_map)
    
toronto_map

In [10]:
kluster = 5

# Cluster on coordinates only; the keyword form of drop also works on current pandas
df_cluster = df_full.drop(columns = ['Neighbourhood', 'Borough', 'Postcode'])

kmeans = KMeans(n_clusters = kluster, random_state = 0).fit(df_cluster)

kmeans.labels_[1:10]

array([4, 2, 3, 2, 1, 0, 4, 4, 2], dtype=int32)

In [11]:
cluster_labels = pd.DataFrame(kmeans.labels_)[0]
cluster_labels.head()

0    4
1    4
2    2
3    3
4    2
Name: 0, dtype: int32

In [12]:
df_full['Cluster'] = cluster_labels
df_full.head()

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude | Cluster |
|---|----------|---------|---------------|----------|-----------|---------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 4 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 4 |
| 2 | M5A | Downtown Toronto | Harbourfront,Regent Park | 43.65426 | -79.360636 | 2 |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor | 43.718518 | -79.464763 | 3 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 | 2 |


In [13]:
# Centre on the geocoded city location; lat/long may have been reassigned by earlier loops
map_clusters = folium.Map(location = [location.latitude, location.longitude], zoom_start = 10)

# One distinct colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kluster))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for n_lat, n_lng, poi, cluster in zip(df_full['Latitude'], df_full['Longitude'], df_full['Neighbourhood'], df_full['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [n_lat, n_lng],
        radius = 5,
        popup = label,
        color = rainbow[cluster],
        fill = True,
        fill_color = rainbow[cluster],
        fill_opacity = 0.7).add_to(map_clusters)
    
map_clusters