## Introduction/Business Problem


Toronto is one of the most cosmopolitan cities in the world. It is home to people from a wide range of cultures and, as a result, has become an international business hub. As Canada's largest city, with a metropolitan population of over 6 million, it is an ideal location to set up a business. In such a busy environment time is at a premium, and an efficient food delivery service would serve its residents well.

The project uses Foursquare location data together with segmentation and clustering techniques to identify the most suitable neighborhood in which to station the office/base of a food delivery service. This lets the business pilot its service in a neighborhood that offers access to a wide variety of foods within the shortest possible distance of the base, which increases its business opportunities and helps it keep delivery times short.


## Audience

Toronto is a city of opportunity. Every year thousands of people migrate to Toronto, Canada, in search of new opportunities, and small-scale businesses that require minimal capital are an attractive option for the budding entrepreneurs among them. A food delivery business is one such opportunity, and this project aims to give these entrepreneurs valuable information on which locations to target before they move, giving them an upper hand and enabling them to hit the ground running.

## Data Description

The sources of data for this project are:


1.	The list of postal codes of Canada on Wikipedia, which provided the neighborhoods and boroughs. Specifically, the page:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

![3.JPG](attachment:3.JPG)

2.	A CSV file providing the geographical coordinates of the neighborhoods in Canada. Specifically:
https://cocl.us/Geospatial_data
From this file, the coordinates of the Toronto neighborhoods are extracted and used in the Foursquare API calls.

![1.JPG](attachment:1.JPG)

3.	Venue data from the Foursquare API.
The data from these sources will be merged, cleaned and visualized, and the Foursquare API will be used to gather nearby venues in each neighborhood. Each neighborhood will then be analysed for the frequency and variety of its food venues and its proximity to residential and business areas, and neighborhoods with similar venue profiles will be grouped using a machine learning technique called clustering. The end product of this comparison is the location that best combines exposure to a variety of foods with proximity to residential and business areas.
![4.JPG](attachment:4.JPG)

## METHODOLOGY

Data Cleansing

The data from Wikipedia and the geographical coordinates were cleaned and merged under the following assumptions:

1.	Postal codes with multiple neighborhoods are merged into one row.
2.	Entries without a borough are eliminated.
3.	Rows that have a borough but no neighborhood take the borough's name as their neighborhood.

After cleansing, the resulting data frame was as below:
![df.JPG](attachment:df.JPG)
After the coordinates were merged in, the final data frame was as below:
![Merged%20Df.JPG](attachment:Merged%20Df.JPG)
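The three cleansing rules and the coordinate merge can be sketched in pandas as below. This is a minimal sketch, not the project's own code: the column names `PostalCode`, `Borough` and `Neighborhood`, the `"Not assigned"` placeholder for empty cells, the `Postal Code`/`Latitude`/`Longitude` columns of the coordinates CSV, joining merged neighborhoods with commas, and the helper names are all assumptions. The toy rows at the bottom are illustrative only.

```python
import pandas as pd

MISSING = "Not assigned"  # assumed placeholder for empty cells in the scraped table


def clean_postal_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleansing rules (hypothetical helper, assumed column names)."""
    # Rule 2: eliminate entries without a borough.
    df = raw[raw["Borough"] != MISSING].copy()
    # Rule 3: a row with a borough but no neighborhood takes the borough's name.
    no_hood = df["Neighborhood"] == MISSING
    df.loc[no_hood, "Neighborhood"] = df.loc[no_hood, "Borough"]
    # Rule 1: one row per postal code; its neighborhoods are joined with commas.
    return (
        df.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
        .agg(", ".join)
        .reset_index()
    )


def attach_coordinates(postal: pd.DataFrame, coords: pd.DataFrame) -> pd.DataFrame:
    """Left-join latitude/longitude on the postal code (CSV column names assumed)."""
    coords = coords.rename(columns={"Postal Code": "PostalCode"})
    return postal.merge(coords, on="PostalCode", how="left")


# Illustrative toy rows, not the real data.
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": [MISSING, "North York", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": [MISSING, "Parkwoods", "Regent Park", "Harbourfront", MISSING],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M5A", "M7A"],
    "Latitude": [43.75, 43.65, 43.66],
    "Longitude": [-79.33, -79.36, -79.39],
})
toronto = attach_coordinates(clean_postal_table(raw), coords)
```

Grouping on both the postal code and the borough keeps the borough column in the result, since each postal code belongs to a single borough.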
   


Data Exploration

The neighborhoods in the final data frame were visualized with Folium. The resulting map is shown below.
![map%20toronto%20alone.JPG](attachment:map%20toronto%20alone.JPG)
With the data prepared, exploration could begin. We first used the Foursquare API to collect data on the venues in each neighborhood under investigation, as shown below.
![4sqaure%20Initial.JPG](attachment:4sqaure%20Initial.JPG)
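A venue request of this kind can be sketched with the standard library as below. It assumes the legacy Foursquare v2 `venues/explore` endpoint commonly used for this type of project (Foursquare has since introduced a newer Places API), placeholder credentials, a hypothetical 500 m radius and 100-venue limit, and the v2 response layout `response → groups → items → venue`. The HTTP call is kept separate so that the parsing can be tried on a saved response.

```python
import json
import urllib.parse
import urllib.request

# Legacy v2 endpoint (assumption; check current Foursquare documentation).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build an 'explore' request for venues around one neighborhood."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (assumed value)
        "ll": f"{lat},{lng}",  # neighborhood centre
        "radius": radius,      # metres (hypothetical choice)
        "limit": limit,        # maximum venues returned (hypothetical choice)
    }
    return f"{EXPLORE_URL}?{urllib.parse.urlencode(params)}"


def parse_venues(neighborhood, payload):
    """Flatten a v2 explore response into (neighborhood, venue, lat, lng, category) rows."""
    rows = []
    for group in payload["response"]["groups"]:
        for item in group["items"]:
            venue = item["venue"]
            # Some venues carry no category; label them "Unknown" (our choice).
            categories = venue.get("categories") or [{"name": "Unknown"}]
            rows.append((neighborhood, venue["name"],
                         venue["location"]["lat"], venue["location"]["lng"],
                         categories[0]["name"]))
    return rows


def fetch_venues(neighborhood, lat, lng, client_id, client_secret):
    """Perform the HTTP call (needs network access and valid credentials)."""
    url = build_explore_url(lat, lng, client_id, client_secret)
    with urllib.request.urlopen(url) as resp:
        return parse_venues(neighborhood, json.load(resp))
```

Calling `fetch_venues` once per row of the merged data frame and collecting the rows into a data frame gives a venue table like the one above.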
The number of venues returned for each neighborhood was also computed, as below.
![5%20count.JPG](attachment:5%20count.JPG)
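Assuming the venue rows are held in a data frame with a hypothetical `Neighborhood` column, the per-neighborhood count is a single `groupby`. The toy rows are illustrative only.

```python
import pandas as pd


def venue_counts(venues: pd.DataFrame) -> pd.Series:
    """Number of venues returned for each neighborhood, largest first."""
    # size() counts rows per group, regardless of missing values in other columns.
    return venues.groupby("Neighborhood").size().sort_values(ascending=False)


# Illustrative toy rows, not the real data.
venues = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood A", "Hood B"],
    "Venue": ["Cafe A", "Sushi B", "Park C"],
    "Venue Category": ["Coffee Shop", "Sushi Restaurant", "Park"],
})
counts = venue_counts(venues)
```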

Machine Learning

The venue categories are strings, so they must be represented as numbers before they can be analysed and compared across neighborhoods. One-hot encoding makes this possible: each venue category becomes its own column, and each venue is marked 1 in the column for its category and 0 in all others.
The result lists each neighborhood with the frequency of each venue category.
![6%20freq.JPG](attachment:6%20freq.JPG)
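The encoding step can be sketched with `pandas.get_dummies`. Averaging the indicator columns per neighborhood is one common way to turn them into category frequencies; whether the table above was produced exactly this way is an assumption, and the column names are hypothetical.

```python
import pandas as pd


def category_frequencies(venues: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode venue categories, then average them per neighborhood."""
    # One indicator column per category; each venue row has a single 1.
    onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
    onehot.insert(0, "Neighborhood", venues["Neighborhood"])
    # The mean of an indicator column is that category's share of the
    # neighborhood's venues, so each row sums to 1.
    return onehot.groupby("Neighborhood").mean().reset_index()


# Illustrative toy rows, not the real data.
venues = pd.DataFrame({
    "Neighborhood": ["Hood A"] * 4 + ["Hood B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Sushi Restaurant", "Park", "Park"],
})
freq = category_frequencies(venues)
```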
This was then transformed into a data frame listing the ten most common venues of each neighborhood, from 1st to 10th, as shown below.
![7%20freq%20df.JPG](attachment:7%20freq%20df.JPG)
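Ranking the frequency table into "most common venue" columns can be sketched as below, assuming a frequency table with one `Neighborhood` column and one column per venue category. The column labels such as `1st Most Common Venue` are hypothetical.

```python
import pandas as pd


def ordinal(i: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th'."""
    if 10 <= i % 100 <= 20:
        return f"{i}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(i % 10, "th")
    return f"{i}{suffix}"


def most_common_venues(freq: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """For each neighborhood, list venue categories from most to least frequent."""
    values = freq.set_index("Neighborhood")
    top_n = min(top_n, values.shape[1])
    rows = []
    for hood, row in values.iterrows():
        # mergesort is stable, so categories with equal frequency keep column order.
        ranked = row.sort_values(ascending=False, kind="mergesort").index[:top_n]
        rows.append([hood, *ranked])
    columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, top_n + 1)]
    return pd.DataFrame(rows, columns=columns)


# Illustrative toy frequencies, not the real data.
freq = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood B"],
    "Coffee Shop": [0.5, 0.0],
    "Sushi Restaurant": [0.3, 0.1],
    "Park": [0.2, 0.9],
})
top = most_common_venues(freq, top_n=2)
```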

Clustering

To reduce the risk of targeting a neighborhood that is an outlier, we grouped neighborhoods with similar average venue frequencies into clusters. This ensures that the chosen neighborhood comes from a pool of others with a similar variety of venues, and it also makes unusual neighborhoods easy to identify and analyse.
The data frame produced by clustering the neighborhoods into 5 clusters is shown below.
![8%20clus%20df.JPG](attachment:8%20clus%20df.JPG)
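The text does not name the clustering algorithm. K-means is a common choice for this kind of segmentation, so the sketch below assumes scikit-learn's `KMeans` run on the category-frequency table with five clusters by default. The fixed `random_state` and `n_init` values are assumptions for reproducibility, and numbering clusters from 1 is only a presentation choice.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighborhoods(freq: pd.DataFrame, k: int = 5, seed: int = 0) -> pd.DataFrame:
    """Label each neighborhood with a k-means cluster based on its venue mix."""
    features = freq.drop(columns="Neighborhood")
    model = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(features)
    out = freq[["Neighborhood"]].copy()
    # KMeans labels start at 0; shift by 1 to number clusters from 1.
    out["Cluster"] = model.labels_ + 1
    return out


# Illustrative toy frequencies with two obvious groups (k=2 for the demo).
freq = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E", "F"],
    "Coffee Shop": [0.9, 0.8, 0.85, 0.1, 0.0, 0.05],
    "Park": [0.1, 0.2, 0.15, 0.9, 1.0, 0.95],
})
labels = cluster_neighborhoods(freq, k=2)
```

Because k-means depends on its random initialization, fixing the seed keeps the cluster numbering stable between runs.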
The clusters were then colour-coded and visualized on a Folium map, shown below.
![9%20clus%20map.JPG](attachment:9%20clus%20map.JPG)

## RESULTS

The main results on which the analysis is based are listed below.
1.	The total number of venues in each neighborhood.
![5%20count.JPG](attachment:5%20count.JPG)
2.	The frequency of venue types in each neighborhood.
![7%20freq%20df.JPG](attachment:7%20freq%20df.JPG)
3.	The neighborhoods in each of the clusters produced.

Cluster 1
![10%20clus%201.JPG](attachment:10%20clus%201.JPG)
Cluster 2
![10%20clus%202.JPG](attachment:10%20clus%202.JPG)
Cluster 3
![10%20clus%20%203.JPG](attachment:10%20clus%20%203.JPG)
Cluster 4
![10%20clus%204.JPG](attachment:10%20clus%204.JPG)
Cluster 5
![10%20clus%205.JPG](attachment:10%20clus%205.JPG)
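Per-cluster tables like those above can be produced by joining the cluster labels to the most-common-venue table and filtering on one label. A sketch with hypothetical column names and illustrative toy rows:

```python
import pandas as pd


def cluster_table(labels: pd.DataFrame, top: pd.DataFrame, cluster: int) -> pd.DataFrame:
    """Neighborhoods of one cluster alongside their most common venues."""
    merged = labels.merge(top, on="Neighborhood", how="left")
    return merged[merged["Cluster"] == cluster].reset_index(drop=True)


# Illustrative toy rows, not the real clusters.
labels = pd.DataFrame({"Neighborhood": ["Hood A", "Hood B", "Hood C"], "Cluster": [5, 2, 5]})
top = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood B", "Hood C"],
    "1st Most Common Venue": ["Coffee Shop", "Park", "Sushi Restaurant"],
})
c5 = cluster_table(labels, top, 5)
```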

## DISCUSSION

Our objective was to identify a neighborhood in which to set up a food delivery business in the cosmopolitan city of Toronto. The results of the data exploration support the following line of reasoning.

The first result is the number of venues per neighborhood. The ideal base for the food delivery service is a neighborhood with a variety of foods available but only a moderate number of venues in total. This rules out the neighborhood with over 90 venues: the more densely packed a neighborhood is with venues, the more likely people are to eat out casually, so a food delivery service there might struggle. On the other hand, a neighborhood with fewer venues might not offer enough variety of foods. A balance therefore had to be struck, hence the focus on neighborhoods with a moderate number of venues.
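This criterion amounts to keeping neighborhoods whose venue count lies within a band. The upper bound of 90 follows the text; no lower bound is stated, so the default of 20 below is a hypothetical placeholder to be tuned. The counts are illustrative.

```python
import pandas as pd


def within_band(counts: pd.Series, low: int = 20, high: int = 90) -> pd.Series:
    """Keep neighborhoods with a moderate number of venues (low <= count <= high)."""
    return counts[counts.between(low, high)]


# Illustrative venue counts, not the real data.
counts = pd.Series({"Hood A": 100, "Hood B": 45, "Hood C": 5})
moderate = within_band(counts)
```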

The second result is the frequency of each type of food venue. A neighborhood might offer a variety of foods but too few restaurants of each type for people to choose from. The chosen neighborhood should therefore have as many different restaurants as possible among its 1st to 10th most common venues.
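One way to score this criterion is to count how many distinct food categories appear among a neighborhood's most common venues. The sketch assumes a category counts as food when its name contains a keyword from a small hypothetical list; a real analysis would need a curated list of Foursquare food categories. Column names and toy rows are illustrative.

```python
import pandas as pd

# Hypothetical keyword list for recognising food venues.
FOOD_KEYWORDS = ("Restaurant", "Pizza", "Sandwich", "Burger", "Bakery", "Diner", "Food")


def food_variety(top: pd.DataFrame, keywords=FOOD_KEYWORDS) -> pd.Series:
    """Number of distinct food categories among each neighborhood's top venues."""
    ranked = top.set_index("Neighborhood")

    def count_food(row):
        foods = {cat for cat in row.dropna() if any(k in cat for k in keywords)}
        return len(foods)

    return ranked.apply(count_food, axis=1).sort_values(ascending=False)


# Illustrative toy rows, not the real data.
top = pd.DataFrame({
    "Neighborhood": ["Hood A", "Hood B"],
    "1st Most Common Venue": ["Sushi Restaurant", "Park"],
    "2nd Most Common Venue": ["Pizza Place", "Gym"],
    "3rd Most Common Venue": ["Coffee Shop", "Thai Restaurant"],
})
score = food_variety(top)
```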

The third result, the cluster analysis, helps avoid an outlier neighborhood that may not have a ready market nearby. Based on it, the selected neighborhood had to come from Cluster 5, whose neighborhoods share a similar set of venues and the potential for customers who commute there regularly.
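The three criteria can be combined into a shortlist: keep neighborhoods in the target cluster whose venue count lies in a moderate band, then rank them by food variety. This is an illustration of the selection logic under assumed inputs (a label table with hypothetical `Neighborhood` and `Cluster` columns, plus venue-count and food-variety series indexed by neighborhood), not the procedure actually used to pick the neighborhood. The band limits are parameters, and the toy values are illustrative.

```python
import pandas as pd


def shortlist(labels: pd.DataFrame, counts: pd.Series, variety: pd.Series,
              cluster: int, low: int, high: int) -> pd.DataFrame:
    """Neighborhoods in `cluster` with a moderate venue count, ranked by food variety."""
    # assign() aligns the series on the neighborhood index.
    table = labels.set_index("Neighborhood").assign(Venues=counts, FoodVariety=variety)
    keep = (table["Cluster"] == cluster) & table["Venues"].between(low, high)
    return table[keep].sort_values("FoodVariety", ascending=False)


# Illustrative toy inputs, not the real data.
labels = pd.DataFrame({"Neighborhood": ["Hood A", "Hood B", "Hood C", "Hood D"],
                       "Cluster": [5, 5, 5, 2]})
counts = pd.Series({"Hood A": 95, "Hood B": 40, "Hood C": 60, "Hood D": 50})
variety = pd.Series({"Hood A": 6, "Hood B": 3, "Hood C": 5, "Hood D": 7})
result = shortlist(labels, counts, variety, cluster=5, low=20, high=90)
```

Here Hood A drops out for having too many venues and Hood D for being in another cluster, leaving Hood C ranked ahead of Hood B.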


The chosen neighborhood had to satisfy all of the above criteria. Central Bay Street is the best match, and it is displayed below.
![image.png](attachment:image.png)
![last.JPG](attachment:last.JPG)
As an added bonus, its location in Downtown Toronto and its proximity to the University of Toronto help ensure a ready supply of customers for the business.

## CONCLUSION

The project provided an opportunity to solve a real business problem. Using Python, we were able to scrape data from websites to good effect, and the Foursquare API supplied live venue data to complement the analysis. Combining these sources allowed the analysis to incorporate machine learning and to visualize the data on maps.
The same approach can be fine-tuned to identify other business opportunities.
The code is available on GitHub.
