## Introduction/Business Problem
Plant-based eating is a growing trend in the restaurant industry. As more people move towards a plant-based diet, the demand for vegan and vegetarian restaurants is rising. A few such restaurants already exist, but there is still a large gap to fill, and this project aims to help bridge it. The audience is anyone who wants to open a restaurant in Toronto. The project helps them by identifying the areas where a vegan or vegetarian restaurant has high potential to succeed.

## Data
In this project we use Foursquare location data to search for vegan or vegetarian restaurants in Toronto. We first obtain the boroughs of Toronto from Wikipedia, then use that borough data to count the vegan or vegetarian restaurants in each borough. Next, we compare the boroughs to find the ones with the fewest such restaurants. Finally, we suggest the best places to open a vegan or vegetarian restaurant in Toronto. For example, if "Harbourfront, Regent Park" has 60 vegan or vegetarian restaurants but "Central Bay Street" has only 10, then "Central Bay Street" could be a possible suggestion.

Link to the Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

## Methodology

In [1]:
!conda install beautifulsoup4
!conda install lxml
!conda install requests


In [2]:
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd
import numpy as np

In [3]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [4]:
soup = BeautifulSoup(source,'lxml')

In [5]:
table = soup.find_all('table')[0]
#table

In [6]:
#print(soup.prettify())

In [7]:
# find the postal code table by tag name and CSS class
table_raw = soup.find('table', class_='wikitable sortable')

In [8]:
#print(table_raw.prettify())

In [9]:
b = table_raw.tbody.text
#b

In [10]:
c = b.split('\n\n\n')
#c

In [11]:
d = pd.DataFrame(c)
#d

In [12]:
new = d[0].str.split('\n',n=0,expand = True)

new.drop(columns =[3,4], inplace = True) 
new.drop([0],inplace=True)
new.columns=['PostalCode','Borough','Neighborhood']
new = new.reset_index(drop=True)
new.shape
new.tail()

|     | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 284 | M8Z | Etobicoke | Mimico NW |
| 285 | M8Z | Etobicoke | The Queensway West |
| 286 | M8Z | Etobicoke | Royal York South West |
| 287 | M8Z | Etobicoke | South of Bloor |
| 288 | M9Z | Not assigned | Not assigned |


In [13]:
new.shape

(289, 3)
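
Cells 9 to 12 recover the rows by splitting the table's text on newlines, which depends on the exact whitespace of the page. As a hedged alternative, the sketch below reads the table cell by cell with the standard library's `html.parser` (a swapped-in technique, not the parser used in this notebook). The class name `TableRowParser` and the miniature `sample_html` are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # header rows hold only <th> cells, so they are skipped
                self.rows.append(self._row)
            self._row = None


# Hypothetical miniature of the Wikipedia table, for illustration only
sample_html = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td><a href="#">North York</a></td><td>Parkwoods</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
rows = parser.rows
print(rows)
```

Each inner list could then be passed to `pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])` in place of the string splitting.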

In [14]:
# Mark boroughs listed as 'Not assigned' as missing
new.loc[new['Borough'] == 'Not assigned', 'Borough'] = None
print(new.head())

  PostalCode           Borough      Neighborhood
0        M1A              None      Not assigned
1        M2A              None      Not assigned
2        M3A        North York         Parkwoods
3        M4A        North York  Victoria Village
4        M5A  Downtown Toronto      Harbourfront


In [15]:
new = new.dropna(subset=["Borough"],axis=0)
new = new.reset_index(drop=True)
new.head()

|     | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |


In [16]:
# Where a neighborhood is 'Not assigned', fall back to the borough name
mask = new['Neighborhood'] == 'Not assigned'
new.loc[mask, 'Neighborhood'] = new.loc[mask, 'Borough']
new.head(10)

|     | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |
| 5 | M6A | North York | Lawrence Manor |
| 6 | M7A | Queen's Park | Queen's Park |
| 7 | M9A | Etobicoke | Islington Avenue |
| 8 | M1B | Scarborough | Rouge |
| 9 | M1B | Scarborough | Malvern |


In [17]:
group1 = new["Neighborhood"].groupby(new["PostalCode"]).agg([('Neighborhood', ', '.join)])
group1=group1.reset_index()
print(group1.size)

206


In [18]:
df_inner = pd.merge(new, group1, on = "PostalCode", how ='inner')
df_inner.head()

|     | PostalCode | Borough | Neighborhood_x | Neighborhood_y |
| --- | --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods | Parkwoods |
| 1 | M4A | North York | Victoria Village | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront | Harbourfront, Regent Park |
| 3 | M5A | Downtown Toronto | Regent Park | Harbourfront, Regent Park |
| 4 | M6A | North York | Lawrence Heights | Lawrence Heights, Lawrence Manor |


In [19]:
df_inner.drop(columns=["Neighborhood_x"], inplace=True)

In [20]:
df_inner.shape

(212, 3)

In [21]:
df_inner = df_inner.drop_duplicates()

In [22]:
df_inner.columns=['PostalCode','Borough','Neighborhood']
df_inner = df_inner.reset_index(drop=True)
df_inner.shape

(103, 3)

In [23]:
df_inner.head()

|     | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |


In [24]:
#Number of rows in the final dataframe
df_inner.shape

(103, 3)
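
Cells 17 to 22 build the combined neighborhood string through a group, a merge, and a de-duplication. A possible one-step equivalent is sketched below on a hypothetical three-row frame copied from the output above; the names `sample` and `combined` are illustrative and not part of the notebook. `sort=False` is used so that postal codes keep their order of first appearance.

```python
import pandas as pd

# Hypothetical sample with the same columns as the cleaned `new` frame
sample = pd.DataFrame(
    {
        "PostalCode": ["M3A", "M5A", "M5A"],
        "Borough": ["North York", "Downtown Toronto", "Downtown Toronto"],
        "Neighborhood": ["Parkwoods", "Harbourfront", "Regent Park"],
    }
)

# Group by postal code and borough, then join the neighborhoods of each group
combined = (
    sample.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```

This yields one row per postal code directly, so no merge or `drop_duplicates` step is needed afterwards.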

In [25]:
geo_cord = pd.read_csv('https://cocl.us/Geospatial_data')

In [26]:
geo_cord.head()

|     | Postal Code | Latitude | Longitude |
| --- | --- | --- | --- |
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [27]:
geo_cord.columns=["PostalCode","Latitude","Longitude"]

In [36]:
geo_df = pd.merge(df_inner,geo_cord, on="PostalCode", how="inner")
geo_df.head()

|     | PostalCode | Borough | Neighborhood | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 |


In [37]:
geo_df.drop(columns=["PostalCode"], inplace=True)
geo_df.head()

|     | Borough | Neighborhood | Latitude | Longitude |
| --- | --- | --- | --- | --- |
| 0 | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 |
| 3 | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | Queen's Park | Queen's Park | 43.662301 | -79.389494 |


In [38]:
geo_df.drop(columns=["Neighborhood"], inplace=True)
geo_df.head()

|     | Borough | Latitude | Longitude |
| --- | --- | --- | --- |
| 0 | North York | 43.753259 | -79.329656 |
| 1 | North York | 43.725882 | -79.315572 |
| 2 | Downtown Toronto | 43.65426 | -79.360636 |
| 3 | North York | 43.718518 | -79.464763 |
| 4 | Queen's Park | 43.662301 | -79.389494 |


In [39]:
df_fresh = geo_df.drop_duplicates("Borough")

In [40]:
df_fresh = df_fresh.reset_index(drop=True)
df_fresh

|     | Borough | Latitude | Longitude |
| --- | --- | --- | --- |
| 0 | North York | 43.753259 | -79.329656 |
| 1 | Downtown Toronto | 43.65426 | -79.360636 |
| 2 | Queen's Park | 43.662301 | -79.389494 |
| 3 | Etobicoke | 43.667856 | -79.532242 |
| 4 | Scarborough | 43.806686 | -79.194353 |
| 5 | East York | 43.706397 | -79.309937 |
| 6 | York | 43.693781 | -79.428191 |
| 7 | East Toronto | 43.676357 | -79.293031 |
| 8 | West Toronto | 43.669005 | -79.442259 |
| 9 | Central Toronto | 43.72802 | -79.38879 |


In [41]:
df_latest = df_fresh

In [42]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
try:
    from pandas import json_normalize # transform JSON into a pandas dataframe (newer pandas)
except ImportError:
    from pandas.io.json import json_normalize # location in older pandas releases

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


In [96]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '' # Foursquare API version

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)
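
The credentials above are left blank so that they are not published with the notebook. One way to supply them at run time without typing them into a cell is to read them from environment variables. The following is a minimal sketch under that assumption; the function name `load_foursquare_credentials` and the three variable names are hypothetical, not something Foursquare or this notebook defines.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret, version) read from a mapping.

    The environment variable names below are hypothetical placeholders.
    """
    names = (
        "FOURSQUARE_CLIENT_ID",
        "FOURSQUARE_CLIENT_SECRET",
        "FOURSQUARE_VERSION",
    )
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return tuple(env[name] for name in names)


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
    "FOURSQUARE_VERSION": "demo-version",
}
creds = load_foursquare_credentials(demo_env)
print(creds)
```

In the notebook the call would take no argument, for example `CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials()`, so an unset variable fails early with a clear message.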

In [45]:
veg_df=pd.DataFrame(columns=['name','categories','lat','lng','address','Borough'])
veg_df

|     | name | categories | lat | lng | address | Borough |
| --- | --- | --- | --- | --- | --- | --- |


In [46]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

search_query = 'Vegetarian / Vegan Restaurant'
LIMIT = 500 # limit of number of venues returned by Foursquare API

for i in range(df_latest.shape[0]):
    print(df_latest['Borough'][i],df_latest['Latitude'][i],df_latest['Longitude'][i])

    address = df_latest['Borough'][i]
    latitude = df_latest['Latitude'][i]
    longitude = df_latest['Longitude'][i]

    # search around the coordinates stored for this borough
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&intent={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query,'browse',LIMIT)
    results = requests.get(url).json()

    venues = results['response']['groups'][0]['items']
    nearby_venues = json_normalize(venues) # flatten JSON

    # filter columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng','venue.location.address']
    nearby_venues = nearby_venues.loc[:, filtered_columns]
    # replace each category list with the name of its first category
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    # clean columns: keep only the last part of each dotted name
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    nearby_venues['Borough'] = address
    # DataFrame.append was removed in pandas 2.0; pd.concat works across versions
    veg_df = pd.concat([veg_df, nearby_venues])
  

North York 43.7532586 -79.3296565
Downtown Toronto 43.6542599 -79.3606359
Queen's Park 43.6623015 -79.3894938
Etobicoke 43.6678556 -79.5322424
Scarborough 43.8066863 -79.1943534
East York 43.7063972 -79.309937
York 43.6937813 -79.4281914
East Toronto 43.6763574 -79.2930312
West Toronto 43.6690051 -79.4422593
Central Toronto 43.7280205 -79.3887901
Mississauga 43.6369656 -79.615819
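
The request URL in the loop above is assembled with `str.format`, so the query text, which contains spaces and a slash, is inserted as written. A hedged alternative is to let the standard library escape each parameter. The sketch below uses the same endpoint and parameter names that appear in the loop; the helper name `build_explore_url` and the placeholder credential strings are hypothetical.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, query, limit):
    """Build the explore URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "intent": "browse",
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; the coordinates are the North York row printed above
url = build_explore_url(
    "ID", "SECRET", "VERSION",
    43.7532586, -79.3296565,
    "Vegetarian / Vegan Restaurant", 500,
)
print(url)
```

Passing the same dictionary as `params=` to `requests.get` is another option, since `requests` encodes query parameters itself.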


## Results Section

The following table lists the vegetarian / vegan restaurants returned by the search, together with the borough each one was found for.

In [47]:
veg_df

|     | name | categories | lat | lng | address | Borough |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Freshii | Vegetarian / Vegan Restaurant | 43.754221 | -79.351665 | 861 York Mills Rd. | North York |
| 1 | Pita Pit | Vegetarian / Vegan Restaurant | 43.769842 | -79.281952 | 1975 Kennedy Rd | North York |
| 2 | Ital Vital Rastarant | Vegetarian / Vegan Restaurant | 43.717216 | -79.293808 | 741 Pharmacy Ave | North York |
| 3 | Fresh | Vegetarian / Vegan Restaurant | 43.707324 | -79.395649 | 90 Eglinton Avenue East | North York |
| 4 | Lotus Pond Vegetarian Restaurant 蓮花素食 | Vegetarian / Vegan Restaurant | 43.819421 | -79.294682 | 3838 Midland Ave. | North York |
| 5 | Gourmet Malaysia 膳園 | Vegetarian / Vegan Restaurant | 43.788118 | -79.266678 | 4466 Sheppard Ave. E Unit 101 | North York |
| 6 | Magic Oven | Vegetarian / Vegan Restaurant | 43.679637 | -79.341752 | 1450 Danforth Ave. | North York |
| 7 | Udupi Palace | Vegetarian / Vegan Restaurant | 43.67248 | -79.321275 | 1460 Gerrard St E | North York |
| 8 | Tori's Bakeshop | Vegetarian / Vegan Restaurant | 43.672114 | -79.290331 | 2188 Queen Street E | North York |
| 9 | A1 Sweets | Vegetarian / Vegan Restaurant | 43.820629 | -79.261592 | 3300 McNicoll Ave (Unit 18A) | North York |


## Discussion Section

Since each borough has many restaurants, we group the restaurants by borough and count how many fall in each one.

In [79]:
veggroup= veg_df["Borough"].groupby(veg_df["Borough"])
veggroup= veggroup.count()
veggroup

Borough
Central Toronto      8
Downtown Toronto    21
East Toronto        15
East York           12
Etobicoke            7
Mississauga          5
North York          91
Queen's Park        34
Scarborough          8
West Toronto        30
York                41
Name: Borough, dtype: int64

Based on the number of restaurants found for each borough, Mississauga has the fewest (5) and should therefore be the first choice for opening a vegan or vegetarian restaurant.
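
The ordering behind this recommendation can be reproduced by sorting the counts in ascending order. The sketch below copies the counts from the output above into a plain dictionary (the name `borough_counts` is illustrative) and sorts it with the standard library. Central Toronto and Scarborough are tied at 8, and Python's stable sort keeps them in their input order.

```python
# Counts copied from the grouped output above
borough_counts = {
    "Central Toronto": 8,
    "Downtown Toronto": 21,
    "East Toronto": 15,
    "East York": 12,
    "Etobicoke": 7,
    "Mississauga": 5,
    "North York": 91,
    "Queen's Park": 34,
    "Scarborough": 8,
    "West Toronto": 30,
    "York": 41,
}

# Fewest restaurants first; ties keep their original order
ranking = sorted(borough_counts.items(), key=lambda item: item[1])

for borough, count in ranking:
    print(borough, count)
```

With the pandas Series from the cell above, `veggroup.sort_values()` gives the same ascending view, although the order of tied boroughs may differ.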

## Conclusion Section

In [94]:
venues_map = folium.Map(location=[43.653963, -79.387207], zoom_start=12) # generate map centred on the given coordinates
for lat, lng, label in zip(veg_df['lat'], veg_df['lng'], veg_df['name']):
    # add one blue circle per restaurant, with its name as the popup
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup(label, parse_html=True),
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

In [95]:
venues_map

Thus, we see that some boroughs have few vegan or vegetarian restaurants whereas others have many. Based on the number of restaurants found for each borough, Mississauga should be the first choice for opening a vegan or vegetarian restaurant, followed by Etobicoke, Central Toronto, Scarborough, East York, East Toronto, Downtown Toronto, West Toronto, Queen's Park, York, and North York.