# Capstone Project - The Battle of the Neighborhoods (Week 2)
## Applied Data Science Capstone by IBM/Coursera
### Using Data to Establish Location of a new Bar/Nightclub in Milton Ontario

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project I will try to find an optimal location for a bar and nightclub. Specifically, this report will be targeted to stakeholders interested in opening a **bar** in **Milton**, Ontario, Canada.

Since Milton already has many restaurants, bars and pubs, I will try to detect **locations that are not already crowded with these amenities**. I am particularly interested in **areas with no bars, lounges, or pubs**. I would also prefer locations **as close to the city center as possible, close to bus routes, and, most importantly, close to the Milton Education Village, the site of Milton's two new campuses associated with Wilfrid Laurier University and Conestoga College**, assuming that the first two conditions are met.

I will use data science methodologies to generate a few neighborhoods of interest based on these criteria. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence my decision are:
* number of existing bars, pubs, nightclubs and lounges in the neighborhood
* number of bus stops or stations
* distance of the neighborhood from the new university campuses

I decided to use a search radius of 2 kilometers around the geographical center of each neighborhood.

The following data sources will be needed to extract/generate the required information:
* the center of each neighborhood was acquired from Google Earth and stored in a CSV file. This CSV file was loaded into a pandas dataframe, and the **Foursquare API** venues method was used to extract nearby venues for the analysis of each neighborhood
* the number of venues, and their type and location, in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of the new university campus site are 43.4842417, -79.883314 (the values used in the code), and the distance from each neighborhood to this location will be analyzed

### Neighborhood Database Creation

This section will import the coordinates of each neighbourhood for storage in a pandas dataframe. 

In [1]:
# The section opens and reads the csv file into the notebook, and stores the university development site as a point of interest

In [2]:
import requests

newCoords = [43.4842417, -79.883314]
print('Site of new Milton Education Village is at {},{}'.format(newCoords[0], newCoords[1]))

Site of new Milton Education Village is at 43.4842417,-79.883314


In [10]:
import pandas as pd
import numpy as np

hoods = pd.read_csv("miltonneighborhoods.csv")
print(hoods.head())

   Neighborhood   Latitude  Longitude
0       Agerton  43.555278 -79.808113
1           Ash  43.501779 -79.836021
2         Beaty  43.515338 -79.835494
3  Blue Springs  43.620228 -80.095909
4         Boyne  43.486744 -79.834765


Now we can plot these neighbourhoods on a map of Milton to visualize the data.

In [7]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

First, let's look up the **geographical coordinates of Milton** with the Nominatim geocoder so that the map can be centred on the town.

In [8]:
address = 'Milton, Ontario, Canada'

geolocator = Nominatim(user_agent="Milton_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Milton are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Milton are 43.513671, -79.882817.


In [11]:
# create map of Milton using latitude and longitude values
map_Milton = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(hoods['Latitude'], hoods['Longitude'], hoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Milton)
    
#Add a distinct Marker for the Milton Education Village
label = 'Milton Education Village'
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [newCoords[0], newCoords[1]],
    radius = 10,
    popup = label,
    color = 'red',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.7,
    parse_html = False).add_to(map_Milton)
    
map_Milton

Now we need to compute the distance from each neighborhood to the site of the new development.

In [17]:
!pip install pyproj
import pyproj

import math

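# NOTE: UTM zone 33 covers central Europe, whereas Milton, Ontario lies in UTM zone 17.
# zone=33 is kept here because the recorded outputs below were produced with it.
# Projecting this far outside the zone appears to overstate distances (by roughly 1.4x
# for the rows shown), so treat the values as a relative ranking rather than true metres.
def lonlat_to_xy(lon, lat):
    # project WGS84 longitude/latitude (degrees) to UTM x/y
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # inverse of lonlat_to_xy: UTM x/y back to WGS84 longitude/latitude
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]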

Collecting pyproj
  Downloading pyproj-3.1.0-cp38-cp38-win_amd64.whl (14.5 MB)
Installing collected packages: pyproj
Successfully installed pyproj-3.1.0
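
For readers who want metric coordinates in the UTM zone that actually contains Milton, the following is a minimal sketch using pyproj's `Transformer` interface. It is not the code that produced the outputs in this notebook. The function name `lonlat_to_xy_utm17` is hypothetical, and the choice of EPSG:32617 (WGS 84 / UTM zone 17N) is my assumption based on Milton's longitude of about -79.9 degrees.

```python
from pyproj import Transformer

# Assumption: EPSG:32617 (WGS 84 / UTM zone 17N) is the zone containing Milton, Ontario.
# always_xy=True fixes the axis order to (longitude, latitude) in, (easting, northing) out.
_to_utm17 = Transformer.from_crs("EPSG:4326", "EPSG:32617", always_xy=True)


def lonlat_to_xy_utm17(lon, lat):
    """Project WGS84 longitude/latitude in degrees to UTM zone 17N metres."""
    return _to_utm17.transform(lon, lat)


# Milton Education Village coordinates as defined earlier in the notebook
mev_x, mev_y = lonlat_to_xy_utm17(-79.883314, 43.4842417)
print(mev_x, mev_y)
```

Creating the transformer once and reusing it avoids rebuilding the projection on every call, and the same `transform` call also accepts arrays or pandas Series of longitudes and latitudes.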


In [32]:
MiltonEV_x, MiltonEV_y = lonlat_to_xy(newCoords[1], newCoords[0]) # Milton Education Village in projected (x, y) coordinates
# repeat the site's coordinates once per neighborhood so they line up with the rows of hoods
x_list = [MiltonEV_x] * len(hoods)
y_list = [MiltonEV_y] * len(hoods)

print(hoods.shape, len(x_list))
hoods['x-coord'], hoods['y-coord'] = lonlat_to_xy(hoods['Longitude'], hoods['Latitude'])

(22, 5) 22




In [35]:
hoods['Distance to Milton Education Village']= ((((x_list - hoods['x-coord'] )**2) + ((y_list-hoods['y-coord'])**2) )**0.5)
hoods.sort_values(by=['Distance to Milton Education Village'], inplace = True)
hoods.head()

| | Neighborhood | Latitude | Longitude | x-coord | y-coord | Distance to Milton Education Village |
| --- | --- | --- | --- | --- | --- | --- |
| 19 | Scott | 43.501648 | -79.886141 | -5327883.0 | 10568470.0 | 2811.650124 |
| 21 | Wilmott | 43.495457 | -79.8607 | -5329232.0 | 10565650.0 | 3195.957976 |
| 17 | Peru | 43.510885 | -79.905953 | -5326128.0 | 10570590.0 | 5024.993714 |
| 4 | Boyne | 43.486744 | -79.834765 | -5330988.0 | 10562810.0 | 5685.718321 |
| 1 | Ash | 43.501779 | -79.836021 | -5328577.0 | 10562660.0 | 6198.572993 |


Great! Now we know the location of each neighbourhood, and have sorted our database to understand which neighbourhoods are best located for a new bar/nightclub. 
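
As a cross-check that does not depend on any map projection, the great-circle distance can be computed directly from latitude and longitude with the haversine formula. The sketch below is an illustration rather than part of the original analysis; the helper name `haversine_m` and the spherical Earth radius of 6,371 km are my assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; treats the Earth as a sphere


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Coordinates taken from the notebook: Milton Education Village and the Scott neighborhood
mev = (43.4842417, -79.883314)
scott = (43.501648, -79.886141)
scott_to_mev = haversine_m(*scott, *mev)
print(round(scott_to_mev))
```

For Scott this gives a little under 2 km. Applied to a dataframe, the same function can be mapped over the `Latitude` and `Longitude` columns to produce a distance column for comparison with the projected one.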

### Foursquare
Now that I have my location candidates, I can use Foursquare API to get info on venues in each neighborhood.

I'm interested in venues in the 'food', 'travel and transport' and 'nightlife spot' categories, but within the food category I am only interested in fast food, and my travel and transport query will focus on buses.

In [36]:
#@ hidden_cell
# Credentials are redacted here; supply your own Foursquare developer keys
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
import json
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

In [52]:
# Category IDs corresponding to venues of interest were taken from the Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

categories_of_interest = ['4bf58dd8d48988d16c941735', '52e81612bcbc57f1066b7a00', '4bf58dd8d48988d1c9941735',
                         '4bf58dd8d48988d16e941735', '4d4ae6fc7a7b7dea34424761', '4bf58dd8d48988d1ca941735',
                         '4bf58dd8d48988d1c7941735', '4d4b7105d754a06376d81259', '4bf58dd8d48988d1fe931735',
                         '4bf58dd8d48988d12b951735', '52f2ab2ebcbc57f1066b8b4f']


def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        for category in categories_of_interest :       
            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng,
                category,
                radius, 
                LIMIT,
                )
            
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                'Neighborhood Latitude', 
                'Neighborhood Longitude', 
                'Venue', 
                'Venue Latitude', 
                'Venue Longitude', 
                'Venue Category']
    
    return(nearby_venues)
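
The function above indexes straight into the response, so a reply with no groups or a venue with no categories would raise an error. The following is a minimal sketch of a more defensive flattening step, assuming only the JSON nesting that the code above already relies on. The helper name `extract_venue_rows` and the sample payload are hypothetical and exist purely for illustration.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into row tuples; missing fields become None."""
    groups = payload.get("response", {}).get("groups") or []
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]
        rows.append((
            name,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),
        ))
    return rows


# Hypothetical payload that mimics the nesting used above (not real Foursquare data)
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Pub",
                           "location": {"lat": 43.51, "lng": -79.88},
                           "categories": [{"name": "Pub"}]}},
                {"venue": {"name": "Uncategorized Place",
                           "location": {"lat": 43.52, "lng": -79.87},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venue_rows("Scott", 43.501648, -79.886141, sample_payload)
empty = extract_venue_rows("Nowhere", 0.0, 0.0, {"response": {"groups": []}})
print(rows)
print(empty)
```

Returning an empty list for an empty response means a neighborhood with no matching venues simply contributes no rows instead of stopping the whole loop.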

In [53]:
Milton_venues = getNearbyVenues(names=hoods['Neighborhood'],
                                   latitudes=hoods['Latitude'],
                                   longitudes=hoods['Longitude']
                                  )

Scott
Wilmott
Peru
Boyne
Ash
Timberlea
Beaty
Kelso
Clarke
Dempsey
Hawthorne Village
Omagh
Campbellville
Guelph Junction
Agerton
Drumquin
Sayers Mills
Moffat
Haltonville
Brookville
Darbyville
Blue Springs


In [71]:
print(Milton_venues.shape)
print((Milton_venues).head())

(226, 7)
  Neighborhood  Neighborhood Latitude  Neighborhood Longitude  \
0        Scott              43.501648              -79.886141   
1        Scott              43.501648              -79.886141   
2        Scott              43.501648              -79.886141   
3        Scott              43.501648              -79.886141   
4        Scott              43.501648              -79.886141   

                                 Venue  Venue Latitude  Venue Longitude  \
0      The Works Gourmet Burger Bistro       43.512589       -79.883587   
1                         Troy's Diner       43.515083       -79.881386   
2             Halifax Donair and Pizza       43.515091       -79.881277   
3                      Jay's Ice Cream       43.511843       -79.884502   
4  Jay's Ice Cream & Sunshine's Gelato       43.511750       -79.884449   

        Venue Category  
0         Burger Joint  
1  American Restaurant  
2          Pizza Place  
3       Ice Cream Shop  
4       Ice Cream Shop  

Looking good. I now have all the venues of interest within two kilometers of each neighborhood center.

This concludes the data gathering phase - I can now use this data for analysis to produce the report on optimal location for a new bar/nightclub!

## Methodology <a name="methodology"></a>

In this project I will direct my efforts toward detecting areas of Milton that have low nightlife establishment density and high fast food and transportation density. I will then evaluate the areas of interest derived from these criteria based on their distance from the Milton Education Village development.

In the first step I collected the required **data: location and type (category) of every venue of interest within 2km of each neighborhood center**.

The second step in the analysis will be the calculation and exploration of '**venue-type density**' across each neighbourhood.

In the third and final step I will focus on the most promising areas and group the neighborhoods into **clusters with similar venue profiles** (using **k-means clustering**). I will present a map of these clusters to identify general zones / neighborhoods which should be a starting point for the final 'street level' exploration and search for an optimal venue location by stakeholders.

Of note during this process, Scott, Wilmott, Peru, Boyne and Ash are the closest neighborhoods to the development, and this will be a key evaluation factor.
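
To make 'venue-type density' concrete: it is the share of a neighborhood's venues that fall into each category, obtained by one-hot encoding the category column and averaging per neighborhood. The sketch below shows the idea on made-up data; the neighborhood names `North` and `South` and their venues are hypothetical.

```python
import pandas as pd

# Hypothetical venue list: four venues in "North", two in "South"
toy = pd.DataFrame({
    "Neighborhood": ["North", "North", "North", "North", "South", "South"],
    "Venue Category": ["Bar", "Pizza Place", "Pizza Place", "Bus Stop", "Pub", "Bus Stop"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of the indicators is the fraction of venues in each category
density = onehot.groupby("Neighborhood").mean()
print(density)
```

Because each venue contributes to exactly one category, every row of `density` sums to 1. A neighborhood with a single venue therefore shows a density of 1.0 for that category, which is worth keeping in mind when comparing sparse neighborhoods with busy ones.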

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional info from the raw data. First I count the **number of venues of interest in each neighborhood**:

In [55]:
Milton_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
| --- | --- | --- | --- | --- | --- | --- |
| Agerton | 1 | 1 | 1 | 1 | 1 | 1 |
| Ash | 10 | 10 | 10 | 10 | 10 | 10 |
| Beaty | 13 | 13 | 13 | 13 | 13 | 13 |
| Boyne | 3 | 3 | 3 | 3 | 3 | 3 |
| Campbellville | 3 | 3 | 3 | 3 | 3 | 3 |
| Clarke | 24 | 24 | 24 | 24 | 24 | 24 |
| Darbyville | 1 | 1 | 1 | 1 | 1 | 1 |
| Dempsey | 28 | 28 | 28 | 28 | 28 | 28 |
| Guelph Junction | 1 | 1 | 1 | 1 | 1 | 1 |
| Hawthorne Village | 8 | 8 | 8 | 8 | 8 | 8 |


Now I can one-hot encode these venue types for further analysis.

In [56]:
# one hot encoding
Milton_onehot = pd.get_dummies(Milton_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Milton_onehot['Neighborhood'] = Milton_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Milton_onehot.columns[-1]] + list(Milton_onehot.columns[:-1])
Milton_onehot = Milton_onehot[fixed_columns]

Milton_onehot.head()

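| | Neighborhood | American Restaurant | Bakery | Bar | Brewery | Burger Joint | Bus Line | Bus Station | Bus Stop | Chinese Restaurant | ... | Ice Cream Shop | Italian Restaurant | Lounge | Nightlife Spot | Pizza Place | Pub | Restaurant | Speakeasy | Sports Bar | Steakhouse |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Scott | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Scott | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Scott | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | Scott | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Scott | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |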


Now I can group the neighbourhoods and take the mean frequency of each venue

In [57]:
Milton_grouped = Milton_onehot.groupby('Neighborhood').mean().reset_index()
Milton_grouped

| | Neighborhood | American Restaurant | Bakery | Bar | Brewery | Burger Joint | Bus Line | Bus Station | Bus Stop | Chinese Restaurant | ... | Ice Cream Shop | Italian Restaurant | Lounge | Nightlife Spot | Pizza Place | Pub | Restaurant | Speakeasy | Sports Bar | Steakhouse |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Agerton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Ash | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.0 | 0.3 | 0.0 | 0.0 | 0.0 |
| 2 | Beaty | 0.0 | 0.0 | 0.0 | 0.0 | 0.076923 | 0.076923 | 0.076923 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.307692 | 0.0 | 0.230769 | 0.0 | 0.0 | 0.0 |
| 3 | Boyne | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.333333 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Campbellville | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.333333 |
| 5 | Clarke | 0.0 | 0.0 | 0.0 | 0.0 | 0.083333 | 0.083333 | 0.125 | 0.0 | 0.0 | ... | 0.083333 | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.166667 | 0.0 | 0.125 | 0.0 |
| 6 | Darbyville | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Dempsey | 0.0 | 0.0 | 0.035714 | 0.0 | 0.035714 | 0.071429 | 0.071429 | 0.0 | 0.0 | ... | 0.107143 | 0.0 | 0.035714 | 0.0 | 0.178571 | 0.0 | 0.107143 | 0.0 | 0.071429 | 0.0 |
| 8 | Guelph Junction | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Hawthorne Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 | 0.25 | 0.25 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 |


#### Let's print each neighborhood along with the top 5 most common venues

For each neighborhood, the loop below transposes its row of mean frequencies, sorts the venue categories by frequency, and prints the five most common ones.

In [58]:
num_top_venues = 5

for hood in Milton_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Milton_grouped[Milton_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agerton----
                  venue  freq
0  Fast Food Restaurant   1.0
1   American Restaurant   0.0
2             Gastropub   0.0
3            Sports Bar   0.0
4             Speakeasy   0.0


----Ash----
                  venue  freq
0            Restaurant   0.3
1           Pizza Place   0.3
2  Fast Food Restaurant   0.2
3          Burger Joint   0.1
4             Gastropub   0.1


----Beaty----
                  venue  freq
0           Pizza Place  0.31
1            Restaurant  0.23
2  Fast Food Restaurant  0.15
3          Burger Joint  0.08
4              Bus Line  0.08


----Boyne----
                  venue  freq
0           Pizza Place  0.33
1  Fast Food Restaurant  0.33
2        Ice Cream Shop  0.33
3   American Restaurant  0.00
4             Gastropub  0.00


----Campbellville----
         venue  freq
0   Steakhouse  0.33
1    Speakeasy  0.33
2  Bus Station  0.33
3    Gastropub  0.00
4   Sports Bar  0.00


----Clarke----
                 venue  freq
0           Restaurant

From the data above, Wilmott and Ash already stand out as locations to keep in mind. They have many fast food restaurants but no nightlife establishments, and both are on our list of key neighbourhoods. Even better, Peru fulfills these criteria while also having access to bus lines.

In [59]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [61]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Milton_grouped['Neighborhood']

for ind in np.arange(Milton_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Milton_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Agerton | Fast Food Restaurant | American Restaurant | Gastropub | Sports Bar | Speakeasy | Restaurant | Pub | Pizza Place | Nightlife Spot | Lounge |
| 1 | Ash | Restaurant | Pizza Place | Fast Food Restaurant | Burger Joint | Gastropub | American Restaurant | Ice Cream Shop | Sports Bar | Speakeasy | Pub |
| 2 | Beaty | Pizza Place | Restaurant | Fast Food Restaurant | Burger Joint | Bus Line | Bus Station | Gastropub | American Restaurant | Italian Restaurant | Sports Bar |
| 3 | Boyne | Pizza Place | Fast Food Restaurant | Ice Cream Shop | American Restaurant | Gastropub | Sports Bar | Speakeasy | Restaurant | Pub | Nightlife Spot |
| 4 | Campbellville | Steakhouse | Speakeasy | Bus Station | Gastropub | Sports Bar | Restaurant | Pub | Pizza Place | Nightlife Spot | Lounge |


#### Finally, let's use k-means clustering to cluster the neighborhoods

In [62]:
# set number of clusters
kclusters = 5

# drop the label column so that only the numeric venue frequencies are clustered
Milton_grouped_clustering = Milton_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Milton_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 0, 0, 0, 0, 2, 0, 1, 0])
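
The value `kclusters = 5` is chosen by hand above. One common way to sanity-check such a choice is to compare silhouette scores over a range of cluster counts. The sketch below illustrates this on a small made-up feature matrix, not on the Milton data; the toy points and the range of `k` values are my assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical feature matrix: two tight, well-separated groups of four points
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)

scores = {}
for k in range(2, 6):
    # n_init is set explicitly so the behaviour does not depend on the library default
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The cluster count with the highest mean silhouette is the best-supported one
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real data the same loop would run over `Milton_grouped_clustering`. With only 22 neighborhoods, several of which have a single venue, the scores may well favour fewer clusters than five, so the result is a guide rather than a rule.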

In [63]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Milton_merged = hoods

# join the sorted venue data onto hoods so each neighborhood keeps its latitude/longitude
Milton_merged = Milton_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# any neighborhood missing from the venue data has no cluster label; fillna(0) assigns it to cluster 0
Milton_merged['Cluster Labels'] = Milton_merged['Cluster Labels'].fillna(0).astype(int)

Milton_merged.head() 

| | Neighborhood | Latitude | Longitude | x-coord | y-coord | Distance to Milton Education Village | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 19 | Scott | 43.501648 | -79.886141 | -5327883.0 | 10568470.0 | 2811.650124 | 0 | Pizza Place | Fast Food Restaurant | Ice Cream Shop | Pub | American Restaurant | Sports Bar | Italian Restaurant | Bakery | Fried Chicken Joint | Chinese Restaurant |
| 21 | Wilmott | 43.495457 | -79.8607 | -5329232.0 | 10565650.0 | 3195.957976 | 0 | Pizza Place | Fast Food Restaurant | Restaurant | Ice Cream Shop | Fried Chicken Joint | Bar | Sports Bar | Burger Joint | Pub | Chinese Restaurant |
| 17 | Peru | 43.510885 | -79.905953 | -5326128.0 | 10570590.0 | 5024.993714 | 0 | Fast Food Restaurant | Burger Joint | Restaurant | Ice Cream Shop | Bus Line | Pub | Fried Chicken Joint | Brewery | Bus Station | Nightlife Spot |
| 4 | Boyne | 43.486744 | -79.834765 | -5330988.0 | 10562810.0 | 5685.718321 | 0 | Pizza Place | Fast Food Restaurant | Ice Cream Shop | American Restaurant | Gastropub | Sports Bar | Speakeasy | Restaurant | Pub | Nightlife Spot |
| 1 | Ash | 43.501779 | -79.836021 | -5328577.0 | 10562660.0 | 6198.572993 | 0 | Restaurant | Pizza Place | Fast Food Restaurant | Burger Joint | Gastropub | American Restaurant | Ice Cream Shop | Sports Bar | Speakeasy | Pub |


Now I can map these clusters

In [64]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Milton_merged['Latitude'], Milton_merged['Longitude'], Milton_merged['Neighborhood'], Milton_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Let's examine each cluster in more detail to understand what makes its members similar.

In [67]:
Milton_merged.set_index('Neighborhood', inplace=True)
print('Cluster 1')
Milton_merged.loc[Milton_merged['Cluster Labels'] == 0, Milton_merged.columns[[1] + list(range(5, Milton_merged.shape[1]))]]

Cluster 1


| Neighborhood | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Scott | -79.886141 | 0 | Pizza Place | Fast Food Restaurant | Ice Cream Shop | Pub | American Restaurant | Sports Bar | Italian Restaurant | Bakery | Fried Chicken Joint | Chinese Restaurant |
| Wilmott | -79.8607 | 0 | Pizza Place | Fast Food Restaurant | Restaurant | Ice Cream Shop | Fried Chicken Joint | Bar | Sports Bar | Burger Joint | Pub | Chinese Restaurant |
| Peru | -79.905953 | 0 | Fast Food Restaurant | Burger Joint | Restaurant | Ice Cream Shop | Bus Line | Pub | Fried Chicken Joint | Brewery | Bus Station | Nightlife Spot |
| Boyne | -79.834765 | 0 | Pizza Place | Fast Food Restaurant | Ice Cream Shop | American Restaurant | Gastropub | Sports Bar | Speakeasy | Restaurant | Pub | Nightlife Spot |
| Ash | -79.836021 | 0 | Restaurant | Pizza Place | Fast Food Restaurant | Burger Joint | Gastropub | American Restaurant | Ice Cream Shop | Sports Bar | Speakeasy | Pub |
| Timberlea | -79.86416 | 0 | Pizza Place | Ice Cream Shop | Fast Food Restaurant | Restaurant | Pub | Sports Bar | Burger Joint | Bus Station | Bus Line | American Restaurant |
| Beaty | -79.835494 | 0 | Pizza Place | Restaurant | Fast Food Restaurant | Burger Joint | Bus Line | Bus Station | Gastropub | American Restaurant | Italian Restaurant | Sports Bar |
| Clarke | -79.850107 | 0 | Restaurant | Sports Bar | Bus Station | Pizza Place | Fried Chicken Joint | Burger Joint | Bus Line | Fast Food Restaurant | Ice Cream Shop | Coffee Shop |
| Dempsey | -79.866434 | 0 | Pizza Place | Fast Food Restaurant | Fried Chicken Joint | Restaurant | Ice Cream Shop | Sports Bar | Bus Line | Bus Station | Lounge | Bar |
| Hawthorne Village | -79.837518 | 0 | Bus Line | Bus Station | Sports Bar | Burger Joint | Pizza Place | Fast Food Restaurant | American Restaurant | Italian Restaurant | Speakeasy | Restaurant |


In [68]:
print('Cluster 2')
Milton_merged.loc[Milton_merged['Cluster Labels'] == 1, Milton_merged.columns[[1] + list(range(5, Milton_merged.shape[1]))]]

Cluster 2


| Neighborhood | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Guelph Junction | -79.999197 | 1 | Bar | American Restaurant | Gastropub | Sports Bar | Speakeasy | Restaurant | Pub | Pizza Place | Nightlife Spot | Lounge |


In [69]:
print('Cluster 3')
Milton_merged.loc[Milton_merged['Cluster Labels'] == 2, Milton_merged.columns[[1] + list(range(5, Milton_merged.shape[1]))]]

Cluster 3


| Neighborhood | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Darbyville | -80.064378 | 2 | Pub | American Restaurant | Gastropub | Sports Bar | Speakeasy | Restaurant | Pizza Place | Nightlife Spot | Lounge | Italian Restaurant |


In [70]:
print('Cluster 4')
Milton_merged.loc[Milton_merged['Cluster Labels'] == 3, Milton_merged.columns[[1] + list(range(5, Milton_merged.shape[1]))]]

Cluster 4


| Neighborhood | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Agerton | -79.808113 | 3 | Fast Food Restaurant | American Restaurant | Gastropub | Sports Bar | Speakeasy | Restaurant | Pub | Pizza Place | Nightlife Spot | Lounge |


This concludes our analysis. As it turns out, many of these neighborhoods are very similar. Perhaps this is to be expected when our focus is on a medium-sized suburban area. However, this can be used to our benefit. We can use the 10 most common venue types for the 5 neighborhoods closest to the Milton Education Village to choose our new bar/nightclub location. The clustering also shows that cluster 1 contains the greatest number of locations of interest, concentrated in the Milton downtown core, and any location of interest should fall in cluster 1 (labelled cluster 0 on our map).

## Results and Discussion <a name="results"></a>

Let's look at each of the 5 neighborhoods on our shortlist in detail to determine the best spot for the new bar.

1) Scott

Scott is the closest neighborhood to the Milton Education Village and has a high density of pizza places, fast food shops and ice cream shops, which makes it a good candidate for my new bar. However, it lacks easy access to transportation, and other pubs and sports bars in the area could act as competition.

2) Wilmott
 
Wilmott's 5 most frequent venues are all food establishments. However, the sixth and seventh spots on the venue density list show that there are already bars and sports bars in Wilmott, pubs appear in ninth place, and there is no access to transportation. This makes Wilmott a less appealing option.
 
3) Peru

Peru is an incredibly promising location. Peru's top four venues are fast food establishments, the fifth and ninth spots show that Peru has a high density of bus lines and stations, there are no pubs or bars, and while there is some nightlife spot density, it is only in tenth place on the list.

4) Boyne

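Boyne's three recorded venues are a pizza place, a fast food restaurant and an ice cream shop. The sports bar, speakeasy and nightlife spot categories lower in its top-10 list have zero frequency in the table above, so they do not indicate existing competition. Boyne has no transportation access, however, and it is farther from the development site than Peru, which makes it less appealing.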

5) Ash

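Ash has a large number of fast food joints, but it has no access to transportation, it is the farthest of the five from the development site, and a gastropub already appears among its venues. The sports bar, speakeasy and pub categories in spots 8-10 have zero frequency in the table above, so they do not represent existing competitors. On balance, Ash is a less appealing option.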

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify Milton neighborhoods close to the Milton Education Village with a low number of nightlife establishments (particularly bars and nightclubs), a high number of fast food restaurants, and good transportation access, in order to help stakeholders narrow down the search for an optimal location for a new bar or nightclub. By calculating the venue-of-interest density distribution from Foursquare data, I first identified neighborhoods that justify further analysis and then compared the most common venue types in each. The neighborhoods were then clustered in order to identify major zones of interest (those containing the greatest number of potential locations).

The final decision on the optimal bar location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood. However, it is strongly recommended that Peru be considered as an optimal neighborhood for the reasons listed above.