# Distribution solution for milk delivery to Restaurants/Cafes in Scarborough, Toronto

### Capstone Project - Battle of the Neighbourhoods


## Problem Description

A milk contractor wants to start distributing milk in all neighbourhoods of Scarborough, Toronto. The contractor needs timely delivery of milk every morning to all major clusters of restaurants, cafes, bakeries and breakfast places.

The contractor wants to build an efficient delivery network that uses at most 10 delivery trucks and still covers every area on time. Every probable customer (restaurant, cafe, bakery or breakfast place) is to be assigned to a group, and each group is operated as a separate entity for better, more efficient customer service.

## Data we need
 - We will need geolocation information about the borough and its neighbourhoods, specifically their latitude and longitude coordinates. We can obtain this from the geopy `geocoders` module and the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M


 - To cluster every restaurant, cafe, bakery and breakfast place, we will need data about the venues in each Scarborough neighbourhood. We will get this from Foursquare's location data: for each venue, its venue id, name, precise latitude and longitude coordinates, and category.
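The sketch below shows one way to turn Foursquare venue data into rows shaped like the sample table that follows. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its JSON layout (`response → groups → items → venue`). Newer Foursquare Places API versions use different endpoints and fields. The helper names `build_explore_url` and `flatten_explore_response` are hypothetical, and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "venues/explore" endpoint
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=100, version="20180605"):
    """Build an explore request for venues within `radius` metres of (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def flatten_explore_response(resp, postcode, neighbourhood, n_lat, n_lng):
    """Turn one explore response (parsed JSON) into rows like the sample table."""
    rows = []
    for item in resp["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": None}]
        rows.append({
            "Postcode": postcode,
            "Neighbourhood": neighbourhood,
            "Neighbourhood Latitude": n_lat,
            "Neighbourhood Longitude": n_lng,
            "Venue_id": venue["id"],
            "Venue": venue["name"],
            "Venue Category": categories[0]["name"],  # primary category only
            "Venue_lat": venue["location"]["lat"],
            "Venue_lng": venue["location"]["lng"],
        })
    return rows


# Placeholder credentials; the neighbourhood coordinates come from the sample table
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 43.806686, -79.194353)

# Hand-built response in the assumed v2 shape, using the first venue of the sample
fake = {"response": {"groups": [{"items": [
    {"venue": {"id": "4b914562f964a520d4ae33e3", "name": "Caribbean Wave",
               "location": {"lat": 43.798558, "lng": -79.195777},
               "categories": [{"name": "Caribbean Restaurant"}]}}
]}]}}
rows = flatten_explore_response(fake, "M1B", "Malvern,Rouge", 43.806686, -79.194353)
print(rows[0])
```

Collecting the rows for every Scarborough postcode and passing them to `pandas.DataFrame` would give a table with the columns shown below.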

A sample of the data, in the format we need, is shown below:

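```python
import pandas as pd

# Load the Scarborough venue data collected from Foursquare
df = pd.read_csv('Scarborough_Venues')

# Drop the unnamed index column that pandas reads back from the saved file
df.drop(['Unnamed: 0'], axis=1, inplace=True)

df.head()
```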

| | Postcode | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue_id | Venue | Venue Category | Venue_lat | Venue_lng |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Malvern,Rouge | 43.806686 | -79.194353 | 4b914562f964a520d4ae33e3 | Caribbean Wave | Caribbean Restaurant | 43.798558 | -79.195777 |
| 1 | M1B | Malvern,Rouge | 43.806686 | -79.194353 | 4b6718c2f964a5203f3a2be3 | Harvey's | Fast Food Restaurant | 43.800106 | -79.198258 |
| 2 | M1B | Malvern,Rouge | 43.806686 | -79.194353 | 579a91b3498e9bd833afa78a | Wendy's | Fast Food Restaurant | 43.802008 | -79.19808 |
| 3 | M1B | Malvern,Rouge | 43.806686 | -79.194353 | 4b16e23bf964a520edbe23e3 | Tim Hortons | Coffee Shop | 43.802 | -79.198169 |
| 4 | M1B | Malvern,Rouge | 43.806686 | -79.194353 | 4bb6b9446edc76b0d771311c | Wendy's | Fast Food Restaurant | 43.807448 | -79.199056 |


## Methodology

**K-means Clustering** was performed on the data in the format above. The aim was to find the best value of k (the number of clusters), which corresponds to the number of delivery trucks.
The **Silhouette Coefficient** was used as the metric for choosing the best k.

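The Silhouette Coefficient is calculated for each sample from its mean intra-cluster distance `a` and its mean nearest-cluster distance `b` as `s = (b - a) / max(a, b)`. The score for a clustering is the mean of `s` over all samples.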
The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.
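To make the definition concrete, the sketch below computes the mean Silhouette Coefficient directly from `s = (b - a) / max(a, b)`, using Euclidean distances, and checks it against scikit-learn's `silhouette_score`. The helper `mean_silhouette` and the toy points are illustrative assumptions, not part of the original analysis. As in scikit-learn, a sample that is alone in its cluster gets `s = 0`.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def mean_silhouette(X, labels):
    """Mean of s = (b - a) / max(a, b) over all samples (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples, shape (n, n)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False  # exclude the sample itself from its own cluster
        if not own.any():
            continue  # singleton cluster: s stays 0
        a = dists[i, own].mean()  # mean intra-cluster distance
        # mean nearest-cluster distance: smallest mean distance to another cluster
        b = min(dists[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()


# Toy 2-D points (hypothetical), clustered two different ways
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
labels_two = [0, 0, 1, 1, 1]
labels_three = [0, 0, 1, 1, 2]  # the last point forms a singleton cluster

print(mean_silhouette(X, labels_two), silhouette_score(X, labels_two))
print(mean_silhouette(X, labels_three), silhouette_score(X, labels_three))
```

The explicit double loop is O(n²) in memory and time. That is fine for a few hundred venues, but `silhouette_score` is the practical choice for real data.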

The clusters for the best value of k were visualized with the folium library.

## Results

#### Foursquare Dataset

The Foursquare dataset returned 385 venues in total across 115 categories (e.g. Spa, Caribbean Restaurant, Fast Food Restaurant, Coffee Shop, Paper/Office Supplies Store).

From these, we kept the categories covering coffee shops, restaurants, bakeries and grocery stores. The final dataset contained 163 venues across 35 Foursquare categories.
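The category filter can be sketched as a keyword match on the `Venue Category` column. The exact categories kept in the analysis are not listed, so the keyword list here is an illustrative assumption, and `filter_food_venues` is a hypothetical helper. The first three sample rows come from the table above. The last two are made-up venues carrying categories mentioned in the Foursquare results.

```python
import re

import pandas as pd

# Hypothetical keyword list: substrings assumed to identify food-related categories
KEYWORDS = ["Coffee Shop", "Café", "Restaurant", "Bakery", "Grocery Store", "Breakfast Spot"]


def filter_food_venues(venues, keywords=KEYWORDS):
    """Keep venues whose 'Venue Category' contains any keyword (case-insensitive)."""
    pattern = "|".join(re.escape(k) for k in keywords)
    mask = venues["Venue Category"].str.contains(pattern, case=False, regex=True, na=False)
    return venues[mask].reset_index(drop=True)


sample = pd.DataFrame({
    "Venue": ["Caribbean Wave", "Harvey's", "Tim Hortons", "Venue A", "Venue B"],
    "Venue Category": ["Caribbean Restaurant", "Fast Food Restaurant", "Coffee Shop",
                       "Spa", "Paper/Office Supplies Store"],
})
food = filter_food_venues(sample)
print(food)
```

Matching on substrings such as "Restaurant" keeps every restaurant subtype (Caribbean, Fast Food, and so on) without listing each one. The trade-off is that an overly broad keyword could pull in unrelated categories, so the resulting category list is worth checking by hand.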

#### Best Value of K
Running k-means for each k from 2 to 10 gave the following Silhouette Coefficients:

| k (n_clusters) | Silhouette Coefficient |
|---|---|
| 2 | 0.4010976248963884 |
| 3 | 0.5033882735481788 |
| 4 | 0.5094938106397492 |
| 5 | 0.5164653401561993 |
| 6 | 0.5232348359922512 |
| 7 | 0.5510735675113302 |
| 8 | 0.5676479743210102 |
| 9 | 0.5875320034995417 |
| 10 | 0.6074921736586862 |
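A sweep of this kind can be sketched with scikit-learn's `KMeans` and `silhouette_score`. The sketch assumes the clustering features are the venue coordinates (`Venue_lat`, `Venue_lng`). Because the venue file is not reproduced here, it runs on random stand-in points scattered around the Malvern/Rouge coordinates from the sample table. Its scores will therefore not match the values reported above. The helper name `silhouette_sweep` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_sweep(coords, k_values=range(2, 11), random_state=0):
    """Fit k-means for each k and return {k: mean silhouette coefficient}."""
    scores = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(coords)
        scores[k] = silhouette_score(coords, labels)
        print(f"For n_clusters={k}, The Silhouette Coefficient is {scores[k]}")
    return scores


# Hypothetical stand-in for the 163 venue coordinates (Venue_lat, Venue_lng)
rng = np.random.default_rng(42)
coords = rng.normal(loc=[43.806686, -79.194353], scale=[0.03, 0.05], size=(163, 2))

scores = silhouette_sweep(coords)
best_k = max(scores, key=scores.get)  # k with the highest silhouette coefficient
print("Best k:", best_k)
```

With the real data, `coords` would be `df[["Venue_lat", "Venue_lng"]].to_numpy()` on the filtered venues. Clustering raw degrees with Euclidean distance is an approximation. At Toronto's latitude (about 43.8° N), a degree of longitude covers only about 0.72 of the distance of a degree of latitude. Scaling longitude by the cosine of the latitude, or projecting the coordinates first, would make distances closer to ground distance.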

## Discussion and Conclusion

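This analysis concludes that the efficient number of clusters lies between 8 and 10. The Silhouette Coefficient rises with every increase in k and is highest at k = 10 (about 0.607), which is also the largest value the 10-truck limit allows.
The suggested number of clusters, i.e. the number of delivery trucks, is therefore 10. Each venue is assigned to one cluster, and deliveries can be organised per cluster.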
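Assigning venues to trucks and building one delivery list per truck could look like the sketch below. It assumes the final model is k-means on the venue coordinates. The helper `assign_trucks` is hypothetical. The demo uses the five sample venues from the table with `n_trucks=2` only because there are just five rows. The real run would use `n_trucks=10` on the full filtered dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans


def assign_trucks(venues, n_trucks=10, random_state=0):
    """Label every venue with a k-means cluster, i.e. a delivery truck."""
    coords = venues[["Venue_lat", "Venue_lng"]].to_numpy()
    km = KMeans(n_clusters=n_trucks, n_init=10, random_state=random_state)
    assigned = venues.copy()
    assigned["Cluster"] = km.fit_predict(coords)
    return assigned


# The five sample venues from the table above
sample = pd.DataFrame({
    "Venue": ["Caribbean Wave", "Harvey's", "Wendy's", "Tim Hortons", "Wendy's"],
    "Venue_lat": [43.798558, 43.800106, 43.802008, 43.802, 43.807448],
    "Venue_lng": [-79.195777, -79.198258, -79.19808, -79.198169, -79.199056],
})
assigned = assign_trucks(sample, n_trucks=2)

# One delivery list per truck (cluster)
routes = assigned.groupby("Cluster")["Venue"].apply(list)
print(routes)
```

The cluster labels only group venues. The order in which a truck visits its venues would still need to be planned separately.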
Thank You!