# IBM Capstone Week 4
## Battle of the Neighborhoods
### Calvin Todorovich 6/4/20

##### Presentation is important

### Introduction

Toronto, Canada was my city of choice for this project. I checked the postal-code lists for every other letter in Canada (A-Z) and for more than 20 U.S. cities, and none of them had table data similar to week 3's assignment. It is surprising that the page we were taught to scrape is unique among all the Canadian postal-code pages.

If someone were to open a coffee shop, proximity to competing businesses, the number of potential customers (population), and parking could be good predictors of how well the shop will do.

In [1]:
```python
#Setting up Libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform a JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

import urllib

print("Libraries imported.")
```

```
Libraries imported.
```


In [2]:
```python
#Toronto Data from last week

wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(wiki_url)
soup = BeautifulSoup(response.text, "lxml")

canada_table = soup.find("table", {"class": "wikitable sortable"})

# soup.find() returns a BeautifulSoup Tag that still contains all the HTML tags, not a table of values,
# so I found some user-defined functions to translate the HTML table to CSV

table = canada_table

def get_table_headers(table):
    """Return the text of the header cells in the table's first row."""
    headers = []
    for th in table.find("tr").find_all("th"):
        headers.append(th.text.strip())
    return headers

# Load the table from can_table.csv
df = pd.read_csv("can_table.csv")

# Drop the extra 'Unnamed' column
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

# Treat 'Not assigned' as missing, then drop rows whose borough is unassigned
df = df.replace('Not assigned', np.nan).dropna(subset=['Borough'])

# If a row has a borough but no assigned neighborhood, use the borough as the neighborhood
df['Neighborhood'] = df['Neighborhood'].fillna(df['Borough'])

df2 = pd.read_csv(r'C:\Users\Todo\Documents\Geospatial_Coordinates.csv')

# Join the postal-code table with its coordinates on the shared 'Postal Code' column
TorLoc = pd.merge(left=df, right=df2, on='Postal Code')
TorLoc.head()
```

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
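The order of the cleaning steps matters: `fillna` only fills real missing values, so the 'Not assigned' strings have to be turned into `NaN` before a neighborhood can fall back to its borough. A minimal sketch on a made-up three-row table (the postal codes and names below are illustrative, not scraped data; `clean_postal_table` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the scraped postal-code table (illustrative values only)
raw = pd.DataFrame({
    "Postal Code": ["P1", "P2", "P3"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

def clean_postal_table(df):
    """Drop unassigned boroughs; use the borough name for unassigned neighborhoods."""
    df = df.replace("Not assigned", np.nan)   # turn placeholders into real missing values
    df = df.dropna(subset=["Borough"])        # a row without a borough is unusable
    df = df.assign(Neighborhood=df["Neighborhood"].fillna(df["Borough"]))
    return df.reset_index(drop=True)

cleaned = clean_postal_table(raw)
print(cleaned)
```

Doing the `fillna` first (as in a plain `fillna` followed by `replace`) would leave the 'Not assigned' neighborhoods untouched, because at that point they are strings, not missing values.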


In [45]:
```python
data = requests.get('https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods').text
soup = BeautifulSoup(data, 'lxml')
```


In [46]:
```python
NeighborhoodList = []
BoroughList = []
PopulationList = []
DensityList = []
AvgIncomeList = []
CommutingPercList = []
```

In [47]:
```python
# find the demographics table (the first "wikitable sortable" on the page)
tab = soup.find("table", {"class": "wikitable sortable"})
tab_body = tab.find('tbody')

# find all the rows of the table
rows = tab_body.find_all('tr')

# for each row of the table, find all the table data;
# keeping only rows with exactly 13 <td> cells skips the header row
for row in rows:
    cells = row.find_all('td')
    if len(cells) == 13:
        NeighborhoodList.append(cells[0].find(string=True))
        BoroughList.append(cells[1].find(string=True))
        PopulationList.append(cells[3].find(string=True))
        DensityList.append(cells[5].find(string=True))
        AvgIncomeList.append(cells[7].find(string=True))
        CommutingPercList.append(cells[8].find(string=True))
```


In [48]:
```python
demographics_df = pd.DataFrame({"Neighborhood": NeighborhoodList,
                                "Borough": BoroughList,
                                "Population": PopulationList,
                                "Density": DensityList,
                                "AvgIncome": AvgIncomeList,
                                "Commuting%": CommutingPercList})
demographics_df.head()
```

|   | Neighborhood | Borough | Population | Density | AvgIncome | Commuting% |
|---|---|---|---|---|---|---|
| 0 | Toronto | `\n` | 5113149 | 866 | 40704 | 10.6 |
| 1 | Agincourt | `S\n` | `44,577\n` | `3580\n` | `25,750\n` | `11.1\n` |
| 2 | Alderwood | `E\n` | `11,656\n` | `2360\n` | `35,239\n` | `8.8\n` |
| 3 | Alexandra Park | `OCoT\n` | `4,355\n` | `13,609\n` | `19,687\n` | `13.8\n` |
| 4 | Allenby | `OCoT\n` | `2,513\n` | `4333\n` | `245,592\n` | `5.2\n` |


In [49]:
```python
#Remove the messy stuff from the table
demographics_df = demographics_df.replace(',', '', regex=True)  # remove the thousands separators
demographics_df = demographics_df.replace('\n', '', regex=True)  # remove newline characters

demographics_df.head()  # will use this data frame later for predicting profit

# The neighborhood names here differ too much from the postal-code table's names to match them,
# so I must use the borough instead
```

|   | Neighborhood | Borough | Population | Density | AvgIncome | Commuting% |
|---|---|---|---|---|---|---|
| 0 | Toronto |  | 5113149 | 866 | 40704 | 10.6 |
| 1 | Agincourt | S | 44577 | 3580 | 25750 | 11.1 |
| 2 | Alderwood | E | 11656 | 2360 | 35239 | 8.8 |
| 3 | Alexandra Park | OCoT | 4355 | 13609 | 19687 | 13.8 |
| 4 | Allenby | OCoT | 2513 | 4333 | 245592 | 5.2 |
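After the regex clean-up the cells are still strings, so they need converting before any arithmetic (for example, before combining population with a competition metric). A minimal standard-library sketch of the conversion for a single scraped cell; `parse_number` is a hypothetical helper name:

```python
def parse_number(cell):
    """Convert a scraped cell such as '44,577' with a trailing newline to a float.

    Returns None when the cell is not numeric (e.g. a borough code like 'OCoT').
    """
    text = cell.replace(",", "").strip()  # drop thousands separators and surrounding whitespace
    try:
        return float(text)
    except ValueError:
        return None

for cell in ["44,577\n", "3580\n", "10.6", "OCoT\n"]:
    print(repr(cell), "->", parse_number(cell))
```

On the whole DataFrame, pandas offers the same idea column by column through `pd.to_numeric(series, errors="coerce")`, which turns unparseable cells into `NaN` instead of raising.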


In [8]:
```python
#Set up Lat and Long
address = 'Toronto'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a marker for each postal-code area
for lat, lng, borough, neighborhood in zip(TorLoc['Latitude'], TorLoc['Longitude'], TorLoc['Borough'], TorLoc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
```

```
The geographical coordinates of Toronto are 43.6534817, -79.3839347.
```


In [9]:
```python
#Setting up foursquare data

# define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_CLIENT_ID'          # redacted: never publish real API keys
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # redacted
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: (hidden)')  # don't print the secret itself
```

```
Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: (hidden)
```
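Hard-coding API keys in a notebook (and printing them) exposes them to anyone who sees the notebook. One common alternative, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions for illustration, not anything Foursquare requires:

```python
import os

def load_credentials(id_var="FOURSQUARE_CLIENT_ID", secret_var="FOURSQUARE_CLIENT_SECRET"):
    """Read API credentials from environment variables (variable names are hypothetical)."""
    client_id = os.environ.get(id_var)
    client_secret = os.environ.get(secret_var)
    if not client_id or not client_secret:
        raise RuntimeError("Set {} and {} before running this cell".format(id_var, secret_var))
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the last few characters of a secret, e.g. for a sanity-check print."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]
```

With the variables set in the shell before starting Jupyter, the cell becomes `CLIENT_ID, CLIENT_SECRET = load_credentials()`, and `print(mask(CLIENT_SECRET))` confirms which key is loaded without revealing it.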


In [12]:
```python
#Get Venues for Toronto
#Foursquare requests sometimes fail (the response then has no 'groups' entry and the lookup below errors out),
#so this cell may need to be run two or three times
radius = 500  # search radius in meters
LIMIT = 100   # maximum number of venues per request

venues = []

for lat, lng, post, borough, neighborhood in zip(TorLoc['Latitude'], TorLoc['Longitude'], TorLoc['Postal Code'], TorLoc['Borough'], TorLoc['Neighborhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius,
        LIMIT)

    results = requests.get(url).json()["response"]['groups'][0]['items']

    # keep the area info plus each venue's name, coordinates and primary category
    for venue in results:
        venues.append((
            post,
            borough,
            neighborhood,
            lat,
            lng,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            venue['venue']['categories'][0]['name']))
```
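Rather than re-running the whole cell when a request fails, each request can be retried a few times. A minimal sketch, assuming a failed call either returns a payload without the `groups` entry or returns something that is not valid JSON; network exceptions are not retried here. `fetch_with_retry` and its parameters are hypothetical, and `fetch` stands in for a zero-argument function such as `lambda: requests.get(url).json()`:

```python
import time

def fetch_with_retry(fetch, attempts=3, delay=1.0):
    """Call fetch() until its payload contains response['groups'][0]['items'].

    Only malformed/error payloads are retried (KeyError, IndexError, ValueError).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            payload = fetch()
            return payload["response"]["groups"][0]["items"]
        except (KeyError, IndexError, ValueError) as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay * (attempt + 1))  # simple linear backoff between attempts
    raise RuntimeError("request failed after {} attempts".format(attempts)) from last_error

# Usage with a fake fetch that fails once, then succeeds (no network needed)
_responses = iter([{"response": {}}, {"response": {"groups": [{"items": ["venue A"]}]}}])
items = fetch_with_retry(lambda: next(_responses), attempts=3, delay=0)
print(items)
```

Inside the loop above, `results = fetch_with_retry(lambda: requests.get(url).json())` would replace the direct lookup.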

In [14]:
```python
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
# (BoroughLatitude/BoroughLongitude hold the postal-code area's coordinates from TorLoc)
venues_df.columns = ['PostalCode', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

#Maybe we have to use the 'similar' endpoint?
```

```
(2153, 9)
```

|   | PostalCode | Borough | Neighborhood | BoroughLatitude | BoroughLongitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 4 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |


In [15]:
```python
#So the point is to have the user input the category of the business they want to open, and there will be a k-means clustered map
# which highlights all businesses of that type

#Need to use StandardScaler on lat and long

# one hot encoding
toronto_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood columns back to the dataframe
# ('Neighborhoods' is plural, so it does not clash with the 'Neighborhood' venue-category column)
toronto_onehot['PostalCode'] = venues_df['PostalCode']
toronto_onehot['Borough'] = venues_df['Borough']
toronto_onehot['Neighborhoods'] = venues_df['Neighborhood']

# move postal, borough and neighborhood columns to the front
fixed_columns = list(toronto_onehot.columns[-3:]) + list(toronto_onehot.columns[:-3])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()
```

```
(2153, 274)
```


,PostalCode,Borough,Neighborhoods,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,College Stadium,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Luggage Store,Mac & Cheese Joint,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Social Club,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M3A,North York,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M3A,North York,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M4A,North York,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M4A,North York,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M4A,North York,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
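What the one-hot encoding above and the `groupby(...).mean()` in the next cell compute together can be seen on a toy example: each venue becomes a row with a single 1, and averaging those rows per area gives the share of that area's venues in each category. The venues below are made up:

```python
import pandas as pd

# Made-up venues for two areas
venues = pd.DataFrame({
    "Area": ["A", "A", "A", "B"],
    "VenueCategory": ["Coffee Shop", "Park", "Coffee Shop", "Bakery"],
})

onehot = pd.get_dummies(venues["VenueCategory"]).astype(int)  # one column per category, 1 = this venue's category
onehot.insert(0, "Area", venues["Area"])                     # keep the grouping key as the first column

freq = onehot.groupby("Area").mean()  # share of each area's venues falling in each category
print(freq)
```

Area A ends up with two thirds of its venues in "Coffee Shop" and one third in "Park", which is the same kind of fraction (0.666667, 0.333333) that appears for Woburn in the grouped table.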


In [18]:
```python
toronto_grouped = toronto_onehot.groupby(["PostalCode", "Borough", "Neighborhoods"]).mean().reset_index()

print(toronto_grouped.shape)
toronto_grouped.head()
```

```
(99, 274)
```


,PostalCode,Borough,Neighborhoods,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,College Stadium,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Luggage Store,Mac & Cheese Joint,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Social Club,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,Scarborough,"Malvern, Rouge",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,Scarborough,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,Scarborough,Cedarbrae,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [19]:
```python
toronto_grouped = toronto_onehot.groupby(["PostalCode", "Borough", "Neighborhoods"]).mean().reset_index()
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough', 'Neighborhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th and later use the 'th' suffix
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns + freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']
neighborhoods_venues_sorted['Borough'] = toronto_grouped['Borough']
neighborhoods_venues_sorted['Neighborhoods'] = toronto_grouped['Neighborhoods']

# for each area, sort its category frequencies and keep the names of the top categories
for ind in np.arange(toronto_grouped.shape[0]):
    row_categories = toronto_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

#Will use this later to confirm my analysis
```

```
(99, 13)
```

|   | PostalCode | Borough | Neighborhoods | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | Fast Food Restaurant | Print Shop | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | Bar | Construction & Landscaping | Yoga Studio | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | Intersection | Medical Center | Breakfast Spot | Electronics Store | Restaurant | Mexican Restaurant | Rental Car Location | Bank | Discount Store | Dim Sum Restaurant |
| 3 | M1G | Scarborough | Woburn | Coffee Shop | Korean Restaurant | Yoga Studio | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 4 | M1H | Scarborough | Cedarbrae | Bank | Fried Chicken Joint | Hakka Restaurant | Gas Station | Athletics & Sports | Thai Restaurant | Caribbean Restaurant | Bakery | Dim Sum Restaurant | Diner |
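Areas with only a few venue categories get padded with zero-frequency categories: Woburn's grouped row has non-zero values only for Coffee Shop (0.666667) and Korean Restaurant (0.333333), yet its list continues with Yoga Studio, Donut Shop, Diner, and so on. For the later check that coffee shops are not among an area's most common venues, it may help to keep only categories whose frequency is above zero. A minimal sketch; `top_venues` is a hypothetical helper, and the example row is a shortened version of Woburn's:

```python
import pandas as pd

def top_venues(freq_row, n=10):
    """Return up to n category names with the highest non-zero frequency, padded with ''."""
    nonzero = freq_row[freq_row > 0]
    # stable sort, so tied categories keep their original column order
    ordered = nonzero.sort_values(ascending=False, kind="mergesort")
    names = list(ordered.index[:n])
    return names + [""] * (n - len(names))

# Shortened frequency row shaped like Woburn's (only two non-zero categories)
row = pd.Series({"Coffee Shop": 0.666667, "Diner": 0.0, "Korean Restaurant": 0.333333, "Yoga Studio": 0.0})
print(top_venues(row, n=4))
```

In the loop above, `toronto_grouped.iloc[ind, 3:].astype(float)` could be passed to this helper in place of the plain `sort_values` call.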


In [20]:
```python
#The user input is the Venue Category
user_in = "Coffee Shop"

# Flag venues of the chosen category (True/False); the column keeps the name 'VenueCategory'
toronto_cluster = venues_df[['VenueCategory']] == user_in

# add Lat and Long columns back to the dataframe
toronto_cluster['VenueLatitude'] = venues_df['VenueLatitude']
toronto_cluster['VenueLongitude'] = venues_df['VenueLongitude']
toronto_cluster.head()
```

|   | VenueCategory | VenueLatitude | VenueLongitude |
|---|---|---|---|
| 0 | False | 43.751976 | -79.33214 |
| 1 | False | 43.751974 | -79.333114 |
| 2 | False | 43.723481 | -79.315635 |
| 3 | True | 43.725517 | -79.313103 |
| 4 | False | 43.725819 | -79.312785 |


In [21]:
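```python
num_clusters = 5

# Features: the True/False category flag plus raw latitude and longitude. They are unscaled,
# so the 0/1 flag spans a wider range than the coordinates and can dominate the clusters.
k_means = KMeans(init="k-means++", n_clusters=num_clusters, n_init=12)
k_means.fit(toronto_cluster)
labels = k_means.labels_

print(labels[1:5])
labels
```

```
[4 4 1 4]
```

```
array([4, 4, 4, ..., 3, 3, 3])
```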
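The note in In [15] about scaling applies here: `k_means` is fit on the unscaled True/False flag together with raw latitude and longitude. Standardizing each column (subtracting its mean and dividing by its population standard deviation, which is what scikit-learn's `StandardScaler` does) puts the features on comparable scales. A NumPy-only sketch with made-up points; `standardize` is a hypothetical helper:

```python
import numpy as np

def standardize(X):
    """Column-wise z-scores: (x - mean) / std with ddof=0, like StandardScaler."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (X - X.mean(axis=0)) / std

# Columns: is_target_category (0/1), latitude, longitude (made-up points)
X = np.array([
    [1, 43.65, -79.38],
    [0, 43.66, -79.39],
    [0, 43.75, -79.33],
    [1, 43.72, -79.31],
])
Z = standardize(X)
print("raw column ranges:   ", X.max(axis=0) - X.min(axis=0))
print("scaled column ranges:", Z.max(axis=0) - Z.min(axis=0))
```

In the raw data the 0/1 flag spans a range of 1 while the coordinates span about 0.1, so Euclidean distances (and therefore k-means) are driven mostly by the flag; after scaling, each column contributes on a similar scale. `StandardScaler().fit_transform(toronto_cluster)` would produce the scaled features for `k_means.fit`.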

In [22]:
```python
toronto_cluster["Labels"] = labels
toronto_cluster['VenueName'] = venues_df['VenueName']
toronto_cluster.head()
```

|   | VenueCategory | VenueLatitude | VenueLongitude | Labels | VenueName |
|---|---|---|---|---|---|
| 0 | False | 43.751976 | -79.33214 | 4 | Brookbanks Park |
| 1 | False | 43.751974 | -79.333114 | 4 | Variety Store |
| 2 | False | 43.723481 | -79.315635 | 4 | Victoria Village Arena |
| 3 | True | 43.725517 | -79.313103 | 1 | Tim Hortons |
| 4 | False | 43.725819 | -79.312785 | 4 | Portugril |


In [23]:
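```python
# one marker color per cluster label (0-4); a direct lookup does not depend on every label being present
cluster_colors = ['yellow', 'green', 'blue', 'red', 'purple']
toronto_cluster['marker_color'] = toronto_cluster['Labels'].map(dict(enumerate(cluster_colors)))
toronto_cluster.head()
```

|   | VenueCategory | VenueLatitude | VenueLongitude | Labels | VenueName | marker_color |
|---|---|---|---|---|---|---|
| 0 | False | 43.751976 | -79.33214 | 4 | Brookbanks Park | purple |
| 1 | False | 43.751974 | -79.333114 | 4 | Variety Store | purple |
| 2 | False | 43.723481 | -79.315635 | 4 | Victoria Village Arena | purple |
| 3 | True | 43.725517 | -79.313103 | 1 | Tim Hortons | green |
| 4 | False | 43.725819 | -79.312785 | 4 | Portugril | purple |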


In [24]:
```python
locations = toronto_cluster[['VenueLatitude', 'VenueLongitude']]
locationlist = locations.values.tolist()
len(locationlist)
```

```
2153
```

In [25]:
```python
map_cluster = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per venue, colored by its k-means cluster
for lat, lng, m, category, name in zip(toronto_cluster['VenueLatitude'], toronto_cluster['VenueLongitude'], toronto_cluster['marker_color'], toronto_cluster['VenueCategory'], toronto_cluster['VenueName']):
    label = '{}, {}, {}, {}'.format(category, name, lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=m,
        fill=True,
        fill_color=m,
        fill_opacity=0.7).add_to(map_cluster)
map_cluster
```

## Now that it's clustered, we can use a distance calculation to find areas with small competition for coffee only.

### Combine this metric with the demographics data set to predict potential profit.

### At the end, we should have a list of neighborhoods with good potential profit. We can then check the top venues data set to ensure there aren't any top 10 coffee shops in the area.

Use nearest-neighbor distances to find the most isolated coffee shops. These are the venues with the least competition.
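
The cells below run scikit-learn's `NearestNeighbors` on raw latitude/longitude, which measures Euclidean distance in degrees. At Toronto's latitude (about 43.7°N), a degree of longitude is only about cos(43.7°) ≈ 0.72 times as long as a degree of latitude, so east-west gaps are undercounted. The following is a hedged alternative, not what this notebook does. It builds a full great-circle distance matrix with NumPy broadcasting and the haversine formula, assuming a spherical Earth of radius 6371 km, and reads off each shop's nearest competitor in kilometers. The three coordinates are copied from the coffee-shop table. `haversine_km` and `nearest_other_km` are illustrative helper names.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (broadcastable arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_other_km(coords):
    """For each (lat, lon) row, the distance in km to the nearest *other* row."""
    coords = np.asarray(coords, dtype=float)
    lat, lon = coords[:, 0], coords[:, 1]
    # Full n x n distance matrix via broadcasting (fine for a few hundred shops)
    d = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    np.fill_diagonal(d, np.inf)  # ignore each shop's zero distance to itself
    return d.min(axis=1)


# Coordinates copied from the coffee-shop table below
shops = [
    [43.725517, -79.313103],  # Tim Hortons
    [43.653559, -79.361809],  # Tandem Coffee
    [43.649963, -79.361442],  # Arvo
]
nearest_km = nearest_other_km(shops)
print(nearest_km.round(2))
```

Tandem Coffee and Arvo come out as each other's nearest competitor, a few hundred meters apart, while the Tim Hortons is several kilometers from either.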

In [26]:
from sklearn.neighbors import NearestNeighbors
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

In [27]:
coffee = toronto_cluster.loc[toronto_cluster['VenueCategory'] == True].copy() #coffee shops only; .copy() avoids SettingWithCopyWarning when columns are added later
coffee.head()

Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color
3,True,43.725517,-79.313103,1,Tim Hortons,green
8,True,43.653559,-79.361809,1,Tandem Coffee,green
19,True,43.649963,-79.361442,1,Arvo,green
20,True,43.6519,-79.365609,1,Rooster Coffee,green
22,True,43.653081,-79.357078,1,Dark Horse Espresso Bar,green


In [28]:
#Latitude/longitude pairs for every coffee shop
dists = coffee[['VenueLatitude', 'VenueLongitude']]

X = dists.values

In [29]:
nbrs = NearestNeighbors(n_neighbors = 2, algorithm = 'ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)

scaler = StandardScaler()
scaler.fit(distances)
print(scaler.transform(distances)[1:5])

#The first column is all zeroes, since it is each point's distance to itself
#The second column is the distance from each coffee shop to its nearest other coffee shop

Y = abs(scaler.transform(distances)[:,1]).tolist() #absolute value of the standardized (z-score) nearest distance
coffee['Distance'] = Y
coffee.head()

[[ 0.         -0.01108333]
 [ 0.         -0.01108333]
 [ 0.         -0.31635967]
 [ 0.         -0.26661508]]


Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color,Distance
3,True,43.725517,-79.313103,1,Tim Hortons,green,2.389286
8,True,43.653559,-79.361809,1,Tandem Coffee,green,0.011083
19,True,43.649963,-79.361442,1,Arvo,green,0.011083
20,True,43.6519,-79.365609,1,Rooster Coffee,green,0.31636
22,True,43.653081,-79.357078,1,Dark Horse Espresso Bar,green,0.266615
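
One caveat about `abs(scaler.transform(distances)[:,1])`: standardizing subtracts the mean nearest-neighbor distance. A shop with a competitor right next to it therefore gets a large negative z-score, and `abs()` turns that into a large positive "Distance". A shop sitting on top of a competitor can then outrank one whose nearest competitor is a moderate distance away. The toy distances below are made up to show the effect. Because standardization is monotone, sorting by the unscaled second column of `distances` (or by the z-score without `abs`) gives an isolation ranking without this problem.

```python
import numpy as np

# Made-up nearest-competitor distances for five coffee shops
nearest = np.array([0.0, 1.0, 1.0, 1.0, 5.0])

# What the cell above does: z-score (population std, like StandardScaler), then abs()
z = (nearest - nearest.mean()) / nearest.std()
abs_z = np.abs(z)

rank_abs_z = np.argsort(-abs_z)   # "most isolated first" according to |z|
rank_raw = np.argsort(-nearest)   # "most isolated first" according to the raw distance
print(rank_abs_z, rank_raw)
```

Under `|z|`, shop 0, whose competitor is at distance 0, ranks second most isolated. Under the raw distance it correctly ranks last.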


In [30]:
print("Coffee Shops with least competition in Toronto:")
coffee.sort_values(by=['Distance'], ascending=False)[0:14]

Coffee Shops with least competition in Toronto:


Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color,Distance
1433,True,43.696338,-79.533398,1,Starbucks,green,5.903431
1891,True,43.602396,-79.545048,1,Tim Hortons,green,4.96298
930,True,43.726895,-79.266157,1,Tim Hortons,green,4.607307
344,True,43.641312,-79.576924,1,Starbucks,green,3.76455
1779,True,43.799102,-79.318715,1,Tim Hortons,green,3.120531
1368,True,43.757604,-79.518882,1,Tim Hortons,green,2.887792
779,True,43.764289,-79.48879,1,Tim Hortons,green,2.887792
1326,True,43.690072,-79.474599,1,Timothy's World Coffee,green,2.809388
56,True,43.719427,-79.467995,1,Tim Hortons,green,2.809388
1142,True,43.667662,-79.312006,1,Country Style,green,2.512808


In [None]:
#Mark the top 10 locations to build near / buy out with a new color
#The best locations will be marked with BLACK

#toronto_onehot['CoffeeDistance'] = coffee['Distance']
#toronto_onehot['CoffeeDistance']= toronto_onehot['CoffeeDistance'].fillna(value=0)
#toronto_onehot.head()


In [None]:
#categories = np.array(
#     ['yellow', 'green', 'blue', 'red', 'purple', 'black']
#)
#toronto_onehot.marker_color = pd.Categorical(toronto_onehot.marker_color, categories=categories, ordered=True)


#toronto_onehot.loc[(toronto_onehot.CoffeeDistance > 1.0),'marker_color']='black'
#toronto_onehot.sort_values(by=['CoffeeDistance'], ascending=False).head()
#The best locations are now marked with black

In [None]:
#map_cluster2 = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
#for lat, lng, m, category, name, d in zip(toronto_onehot['VenueLatitude'], toronto_onehot['VenueLongitude'], toronto_onehot['marker_color'], toronto_onehot['VenueCategory'], toronto_onehot['VenueName'], toronto_onehot['Distance']):
#    label = '{}, {}, {}, {}'.format(name, lat, lng, d)
#    label = folium.Popup(label, parse_html=True)
#    folium.CircleMarker(
#        [lat, lng],
#        radius=5,
#        popup=label,
#        color=m,
#        fill=True,
#        fill_color=m,
#        fill_opacity=0.7,
#        parse_html=False).add_to(map_cluster2)  
#map_cluster2


In [31]:
#I need to find points on the map that are furthest from coffee shops, but how?

#IDEA:
#Take the coffee distances from above and do the same calculation, but...
#Write a loop that puts one venue from toronto_cluster into the first row of the coffee-shop coordinates
#Then in each iteration, distances[0,1] gives the distance from that venue to the nearest coffee shop
#Then in each iteration set toronto_cluster.at[i, 'Coffee Distance'] to that distance
#The result is toronto_cluster with a new column giving, for each venue, how far away the nearest coffee shop is

#t_row = pd.DataFrame(toronto_cluster.loc[0,:])

#Latitude/longitude pairs for every venue
dists2 = toronto_cluster[['VenueLatitude', 'VenueLongitude']]
X2 = dists2.to_numpy()
X = dists.to_numpy()

#Now we have lat and long for all the venues
#Go one at a time to find nearest coffee distance

t_loc = X2[0,:]
print("Find this value: ", t_loc)

X = np.concatenate(([t_loc], X))
X[0:5]
#It worked

Find this value:  [ 43.75197605 -79.33214045]


array([[ 43.75197605, -79.33214045],
       [ 43.72551663, -79.31310251],
       [ 43.65355871, -79.36180946],
       [ 43.6499628 , -79.36144178],
       [ 43.65189966, -79.36560912]])

In [32]:

nbrs = NearestNeighbors(n_neighbors = 2, algorithm = 'ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)

scaler = StandardScaler()
scaler.fit(distances)
#print(scaler.transform(distances)[1:5])

#The first column is all zeroes, since it represents the distance between a point and itself
#The second column represents the distance between a point and the nearest point

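Y = abs(scaler.transform(distances)[:,1]).tolist() #absolute value of the standardized nearest distance
#coffee['Distance'] = Y
#coffee.head()

#Y[0] = the (standardized) distance from the test point to the nearest coffee shop

#Remove the first row from the X array
#Note: without axis=0, np.delete flattens X first, which is why the output below is 1-D;
#np.delete(X, 0, axis=0) would drop only the first row. The loop below rebuilds X anyway.
X = np.delete(X, 0)
X[0:5]
#Then the loop will start over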

array([-79.33214045,  43.72551663, -79.31310251,  43.65355871,
       -79.36180946])

In [33]:
for i in range(0,len(toronto_cluster)):
    X2 = dists2.to_numpy() #resets X and X2 
    X = dists.to_numpy()
    t_loc = X2[i,:]
    X = np.concatenate(([t_loc], X))
    
    nbrs = NearestNeighbors(n_neighbors = 2, algorithm = 'ball_tree').fit(X)
    distances, indices = nbrs.kneighbors(X)

    scaler = StandardScaler()
    scaler.fit(distances)
    #print(scaler.transform(distances)[1:5])

    #The first column is all zeroes, since it represents the distance between a point and itself
    #The second column represents the distance between a point and the nearest point

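    Y = abs(scaler.transform(distances)[:,1]).tolist() #absolute value of the standardized nearest distance
    #coffee['Distance'] = Y
    #coffee.head()
    #print(Y[i])
    #Y[0] = the (standardized) distance from venue i to the nearest coffee shop;
    #for a coffee shop, its own copy in X is the nearest, so the raw distance is 0
    toronto_cluster.at[i, 'Coffee Distance'] = Y[0]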


toronto_cluster.head()

Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color,Coffee Distance
0,False,43.751976,-79.33214,4,Brookbanks Park,purple,2.283982
1,False,43.751974,-79.333114,4,Variety Store,purple,2.256852
2,False,43.723481,-79.315635,4,Victoria Village Arena,purple,0.036837
3,True,43.725517,-79.313103,1,Tim Hortons,green,0.385463
4,False,43.725819,-79.312785,4,Portugril,purple,0.338651
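
The loop above refits a `NearestNeighbors` model once per venue (2,153 times), even though the set of coffee shops never changes. The following sketch is an alternative, assuming the haversine helper from earlier (redefined here so the block runs on its own). It computes all venue-to-coffee-shop great-circle distances in one broadcasted matrix and takes the row minimum. It reports kilometers rather than standardized values and, like the loop, gives a coffee shop a distance of 0 to itself. The three rows are copied from the table above. With scikit-learn, fitting `NearestNeighbors(n_neighbors=1)` once on the coffee-shop coordinates and calling `kneighbors` on all venue coordinates would be the equivalent single pass.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (broadcastable arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_coffee_km(venues, coffee):
    """Distance in km from every venue to its nearest coffee shop, in one pass."""
    vlat = venues['VenueLatitude'].to_numpy()[:, None]   # shape (n_venues, 1)
    vlon = venues['VenueLongitude'].to_numpy()[:, None]
    clat = coffee['VenueLatitude'].to_numpy()[None, :]   # shape (1, n_shops)
    clon = coffee['VenueLongitude'].to_numpy()[None, :]
    d = haversine_km(vlat, vlon, clat, clon)             # shape (n_venues, n_shops)
    return pd.Series(d.min(axis=1), index=venues.index)


# Three rows copied from the table above
sample_venues = pd.DataFrame({
    'VenueName': ['Brookbanks Park', 'Victoria Village Arena', 'Tim Hortons'],
    'VenueCategory': [False, False, True],   # True = coffee shop
    'VenueLatitude': [43.751976, 43.723481, 43.725517],
    'VenueLongitude': [-79.33214, -79.315635, -79.313103],
})
sample_coffee = sample_venues[sample_venues['VenueCategory']]
sample_venues['Coffee Distance km'] = nearest_coffee_km(sample_venues, sample_coffee)
print(sample_venues[['VenueName', 'Coffee Distance km']])
```

For the full data this matrix is 2,153 rows by the number of coffee shops, which fits comfortably in memory.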


In [34]:
#Next, calculate a score that combines coffee distance with the demographics

#Get the population demographics from Wikipedia

#Standardize population and coffee distance

#toronto_onehot['Profit Score'] = x * toronto_onehot['C_Distance'] + y * toronto_onehot['Population']
#Find weights x and y for this linear combination that give good results

#Order by Profit Score; the top rows are the best places to open a shop
#Need to combine this with the cluster df, joining on Neighborhood
#First off, remove the city-wide "Toronto" total row (row 0 of demographics_df)

#demographics_df = demographics_df.drop(demographics_df.index[0])


#demographics_df = demographics_df.drop(columns = ['Borough'])
#demographics_df.head()

toronto_cluster['Neighborhood'] = venues_df['Neighborhood']
toronto_cluster['Borough'] = venues_df['Borough']
toronto_cluster.head()

Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color,Coffee Distance,Neighborhood,Borough
0,False,43.751976,-79.33214,4,Brookbanks Park,purple,2.283982,Parkwoods,North York
1,False,43.751974,-79.333114,4,Variety Store,purple,2.256852,Parkwoods,North York
2,False,43.723481,-79.315635,4,Victoria Village Arena,purple,0.036837,Victoria Village,North York
3,True,43.725517,-79.313103,1,Tim Hortons,green,0.385463,Victoria Village,North York
4,False,43.725819,-79.312785,4,Portugril,purple,0.338651,Victoria Village,North York


In [35]:
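#Abbreviate borough names to match the abbreviations in demographics_df
#Note: regex=True on the whole DataFrame also rewrites matching text in VenueName and Neighborhood
toronto_cluster = toronto_cluster.replace('North York','NY', regex=True)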
toronto_cluster = toronto_cluster.replace('Scarborough','S', regex=True)
toronto_cluster = toronto_cluster.replace('Downtown Toronto','OCoT', regex=True)
toronto_cluster = toronto_cluster.replace('Etobicoke','E', regex=True)
toronto_cluster = toronto_cluster.replace('East York','EY', regex=True)

In [36]:
toronto_cluster = toronto_cluster.replace('York', 'Y', regex = True)
toronto_cluster.head()

Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color,Coffee Distance,Neighborhood,Borough
0,False,43.751976,-79.33214,4,Brookbanks Park,purple,2.283982,Parkwoods,NY
1,False,43.751974,-79.333114,4,Variety Store,purple,2.256852,Parkwoods,NY
2,False,43.723481,-79.315635,4,Victoria Village Arena,purple,0.036837,Victoria Village,NY
3,True,43.725517,-79.313103,1,Tim Hortons,green,0.385463,Victoria Village,NY
4,False,43.725819,-79.312785,4,Portugril,purple,0.338651,Victoria Village,NY
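
With `regex=True`, `DataFrame.replace` rewrites matching substrings in every string column, not only `Borough`. Venue and neighborhood names containing "York" or "Scarborough" therefore change too. If `demographics_df` spells such neighborhood names in full, the merge below would silently drop those venues; whether that happens here hasn't been checked. The sketch below uses made-up rows to contrast the whole-frame replacement with an exact-match replacement on the `Borough` column only. `BOROUGH_ABBREV` is a hypothetical name for the mapping used in the two cells above.

```python
import pandas as pd

# Hypothetical name for the abbreviations used in the two cells above
BOROUGH_ABBREV = {
    'North York': 'NY',
    'Scarborough': 'S',
    'Downtown Toronto': 'OCoT',
    'Etobicoke': 'E',
    'East York': 'EY',
    'York': 'Y',
}

# Made-up rows, chosen so that names outside Borough contain a borough name
df = pd.DataFrame({
    'VenueName': ['Yorkdale Mall', 'Tim Hortons'],
    'Neighborhood': ['Lawrence Heights', 'Scarborough Village'],
    'Borough': ['North York', 'Scarborough'],
})

# Same style as the cells above: regex substring replacement across every column
whole_frame = df.replace('Scarborough', 'S', regex=True).replace('York', 'Y', regex=True)

# Exact-value replacement on the Borough column only; other columns are untouched
borough_only = df.assign(Borough=df['Borough'].replace(BOROUGH_ABBREV))

print(whole_frame)
print(borough_only)
```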


In [40]:
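#Dropping Neighborhood was considered because some names don't match between the two tables,
#but the line stays commented out, so the merge below still joins on Neighborhood and Borough
#demographics_df = demographics_df.drop(['Neighborhood'], axis = 1)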

In [50]:
demographics_df.head()

Unnamed: 0,Neighborhood,Borough,Population,Density,AvgIncome,Commuting%
0,Toronto,,5113149,866,40704,10.6
1,Agincourt,S,44577,3580,25750,11.1
2,Alderwood,E,11656,2360,35239,8.8
3,Alexandra Park,OCoT,4355,13609,19687,13.8
4,Allenby,OCoT,2513,4333,245592,5.2


In [53]:
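merged = pd.merge(toronto_cluster,demographics_df)
merged.head()
#To keep all data in toronto_cluster this needs how='left'; as written it is an inner join on the
#shared columns (Neighborhood, Borough), so venues without a demographics match are dropped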

Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color,Coffee Distance,Neighborhood,Borough,Population,Density,AvgIncome,Commuting%
0,False,43.751976,-79.33214,4,Brookbanks Park,purple,2.283982,Parkwoods,NY,26533,5349,34811,14.0
1,False,43.751974,-79.333114,4,Variety Store,purple,2.256852,Parkwoods,NY,26533,5349,34811,14.0
2,False,43.723481,-79.315635,4,Victoria Village Arena,purple,0.036837,Victoria Village,NY,17047,3612,29657,15.6
3,True,43.725517,-79.313103,1,Tim Hortons,green,0.385463,Victoria Village,NY,17047,3612,29657,15.6
4,False,43.725819,-79.312785,4,Portugril,purple,0.338651,Victoria Village,NY,17047,3612,29657,15.6
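
To see which venues the inner merge drops, a left merge with `indicator=True` adds a `_merge` column that marks rows with no match in the demographics table. The sketch uses two real rows from the tables above plus one hypothetical venue in a made-up neighborhood ("Nowhere Heights") that has no demographics entry.

```python
import pandas as pd

sample_venues = pd.DataFrame({
    'VenueName': ['Brookbanks Park', 'Tim Hortons', 'Hypothetical Cafe'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Nowhere Heights'],  # last one is made up
    'Borough': ['NY', 'NY', 'NY'],
})
sample_demo = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Victoria Village'],
    'Borough': ['NY', 'NY'],
    'Population': [26533, 17047],
})

keys = ['Neighborhood', 'Borough']
inner = pd.merge(sample_venues, sample_demo, on=keys)                               # default how='inner'
left = pd.merge(sample_venues, sample_demo, on=keys, how='left', indicator=True)   # keeps every venue
unmatched = left.loc[left['_merge'] == 'left_only', keys].drop_duplicates()

print(len(inner), len(left))
print(unmatched)
```

Running the same two lines on `toronto_cluster` and `demographics_df` would list exactly which neighborhood/borough pairs need their names reconciled.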


In [55]:
#merged
#It actually worked

In [44]:
#Drop everything after 5th most common
neighborhoods_venues_sorted = neighborhoods_venues_sorted.loc[:, :'5th Most Common Venue']
neighborhoods_venues_sorted.head()
#Should I cluster this too?

Unnamed: 0,PostalCode,Borough,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",Fast Food Restaurant,Print Shop,Donut Shop,Dim Sum Restaurant,Diner
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",Bar,Construction & Landscaping,Yoga Studio,Donut Shop,Diner
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",Intersection,Medical Center,Breakfast Spot,Electronics Store,Restaurant
3,M1G,Scarborough,Woburn,Coffee Shop,Korean Restaurant,Yoga Studio,Donut Shop,Diner
4,M1H,Scarborough,Cedarbrae,Bank,Fried Chicken Joint,Hakka Restaurant,Gas Station,Athletics & Sports
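
The plan at the start of this section was to check that coffee shops aren't already among each neighborhood's most common venues. With the trimmed table, one way to do that is `DataFrame.isin` across the "Most Common Venue" columns. For example, Woburn (M1G), which shows up in the profit ranking at the end, already lists Coffee Shop as its most common venue. The rows below are copied from the table, keeping only the first three venue columns. Treating only "Coffee Shop" as coffee is an assumption; related categories could be added to the list if the data has them.

```python
import pandas as pd

# Rows copied from the table above (first three venue columns only, for brevity)
top = pd.DataFrame({
    'PostalCode': ['M1B', 'M1G', 'M1H'],
    'Neighborhoods': ['Malvern, Rouge', 'Woburn', 'Cedarbrae'],
    '1st Most Common Venue': ['Fast Food Restaurant', 'Coffee Shop', 'Bank'],
    '2nd Most Common Venue': ['Print Shop', 'Korean Restaurant', 'Fried Chicken Joint'],
    '3rd Most Common Venue': ['Donut Shop', 'Yoga Studio', 'Hakka Restaurant'],
})

COFFEE_CATEGORIES = ['Coffee Shop']  # assumption: extend with other coffee-related categories

venue_cols = [c for c in top.columns if c.endswith('Most Common Venue')]
top['CoffeeInTopVenues'] = top[venue_cols].isin(COFFEE_CATEGORIES).any(axis=1)
print(top[['PostalCode', 'Neighborhoods', 'CoffeeInTopVenues']])
```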


In [56]:
#Columns that feed the profit score
Profit_Calc = merged[['Coffee Distance', 'Population', 'Density', 'AvgIncome', 'Commuting%']]

#Min-max scale each column to the range [0, 1]
x = Profit_Calc.values
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)

Profit_Calc = pd.DataFrame(x_scaled)

Profit_Calc.head()


Unnamed: 0,0,1,2,3,4
0,0.550459,0.484783,0.060035,0.065084,0.333333
1,0.543911,0.484783,0.060035,0.065084,0.333333
2,0.008117,0.262368,0.032085,0.038184,0.412935
3,0.092257,0.262368,0.032085,0.038184,0.412935
4,0.080959,0.262368,0.032085,0.038184,0.412935


In [58]:
#Linear combination of the variables
#Equal weights for now; each variable may eventually need its own weight

merged['Profit Score'] = Profit_Calc.mean(axis=1)  #Average of the 5 scaled columns, for now
merged.head()

Unnamed: 0,VenueCategory,VenueLatitude,VenueLongitude,Labels,VenueName,marker_color,Coffee Distance,Neighborhood,Borough,Population,Density,AvgIncome,Commuting%,Profit Score
0,False,43.751976,-79.33214,4,Brookbanks Park,purple,2.283982,Parkwoods,NY,26533,5349,34811,14.0,0.298739
1,False,43.751974,-79.333114,4,Variety Store,purple,2.256852,Parkwoods,NY,26533,5349,34811,14.0,0.297429
2,False,43.723481,-79.315635,4,Victoria Village Arena,purple,0.036837,Victoria Village,NY,17047,3612,29657,15.6,0.150738
3,True,43.725517,-79.313103,1,Tim Hortons,green,0.385463,Victoria Village,NY,17047,3612,29657,15.6,0.167566
4,False,43.725819,-79.312785,4,Portugril,purple,0.338651,Victoria Village,NY,17047,3612,29657,15.6,0.165306
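
The commented plan earlier (`x * C_Distance + y * Population`) calls for weighting the variables rather than averaging them equally. Here is a minimal sketch of that idea, assuming non-negative weights chosen by hand. Each column is min-max scaled to [0, 1], the same (x - min)/(max - min) transform `MinMaxScaler` applies by default, and the scaled columns are then combined as a weighted average. With all weights equal, it reduces to the `mean` used above. `profit_score` is a hypothetical helper and the weights are arbitrary. The three rows are copied from the merged table, so the scaled values differ from the notebook's, which are scaled over all venues.

```python
import pandas as pd


def profit_score(df, weights):
    """Weighted average of min-max-scaled columns; `weights` maps column -> weight."""
    cols = list(weights)
    data = df[cols].astype(float)
    col_min, col_max = data.min(), data.max()
    span = (col_max - col_min).replace(0, 1.0)  # constant column scales to 0, like MinMaxScaler
    scaled = (data - col_min) / span
    w = pd.Series(weights, dtype=float)
    return (scaled * w).sum(axis=1) / w.sum()


# Three rows copied from the merged table above
rows = pd.DataFrame({
    'Coffee Distance': [2.283982, 0.036837, 0.385463],
    'Population': [26533, 17047, 17047],
    'Density': [5349, 3612, 3612],
    'AvgIncome': [34811, 29657, 29657],
    'Commuting%': [14.0, 15.6, 15.6],
}, index=['Brookbanks Park', 'Victoria Village Arena', 'Tim Hortons'])

equal = profit_score(rows, {c: 1 for c in rows.columns})                 # same as the mean above
coffee_heavy = profit_score(rows, {'Coffee Distance': 3, 'Population': 1})  # arbitrary example weights
print(equal.round(3))
print(coffee_heavy.round(3))
```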


In [73]:
merge_group = merged.groupby('Neighborhood').mean(numeric_only=True) #Average of each numeric column per neighborhood
merge_group.sort_values('Profit Score', ascending = False).head(15) #The top 15 neighborhoods by average profit score

Neighborhood,VenueCategory,VenueLatitude,VenueLongitude,Labels,Coffee Distance,Profit Score
St. James Town,0.059524,43.649895,-79.375883,0.059524,0.285635,0.454941
Agincourt,0.0,43.792049,-79.259427,4.0,3.899862,0.417344
Church and Wellesley,0.077922,43.666231,-79.382507,0.077922,0.215707,0.311894
Downsview,0.058824,43.741456,-79.500979,2.705882,1.626432,0.299926
Parkwoods,0.0,43.751975,-79.332627,4.0,2.270417,0.298084
Rosedale,0.0,43.679754,-79.377335,0.0,0.778552,0.289609
Bayview Village,0.0,43.787903,-79.38086,2.0,3.26947,0.288246
Woburn,0.666667,43.770559,-79.219579,2.0,0.36785,0.287815
Weston,0.0,43.704486,-79.515789,3.0,1.847797,0.241671
Humber Summit,0.0,43.757519,-79.563653,3.0,4.146612,0.23634
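
Averaging a True/False column gives a proportion, so in the ranking above `VenueCategory` is the share of each neighborhood's venues that are coffee shops. A sketch with pandas named aggregation makes that explicit. It also adds a venue count, which could help flag neighborhoods whose averages rest on only a few venues. The five rows are copied from the merged table, and a real run would use every venue. The Parkwoods mean it produces, 0.298084, matches the Parkwoods row in the ranking above.

```python
import pandas as pd

# Five rows copied from the merged table above
sample = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Victoria Village',
                     'Victoria Village', 'Victoria Village'],
    'VenueName': ['Brookbanks Park', 'Variety Store', 'Victoria Village Arena',
                  'Tim Hortons', 'Portugril'],
    'VenueCategory': [False, False, False, True, False],
    'Profit Score': [0.298739, 0.297429, 0.150738, 0.167566, 0.165306],
})

summary = (
    sample.assign(IsCoffee=sample['VenueCategory'].astype(float))  # True -> 1.0, False -> 0.0
    .groupby('Neighborhood')
    .agg(coffee_share=('IsCoffee', 'mean'),      # share of venues that are coffee shops
         mean_profit=('Profit Score', 'mean'),
         n_venues=('VenueName', 'count'))
    .sort_values('mean_profit', ascending=False)
)
print(summary)
```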


## Results

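#### VenueCategory shows how common coffee shops are in each neighborhood (the share of its venues that are coffee shops), while Profit Score is the neighborhood's average profit score across its venues, so higher values point to better locations.

#### Averaging by neighborhood seems to help most for Coffee Distance, presumably because it smooths out individual venues that happen to sit unusually far from (or right next to) a coffee shop.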

### Discussion 
#### (of interesting observations and recommendations)

