<h1>Battle of the Neighborhoods: Toronto vs. Twin Cities</h1>
<h4>
    By: Alexander Stetzer
</h4>

In [1]:
#Imports for dataframes and extra processes
import numpy as np
import pandas as pd

#Set columns and rows so that they all print out instead of cutoff view
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#Used to get Foursquare Data
import requests 

#Webscraping 
from bs4 import BeautifulSoup

#KMeans Clustering import
from sklearn.cluster import KMeans 

#Colors for cluster map
import matplotlib.cm as cm
import matplotlib.colors as colors

#Folium used to create maps with markers
#!pip install folium
import folium

#Used to find lat, long of cities
#!pip install geopy
from geopy.geocoders import Nominatim

#JSON handling
import json
from pandas import json_normalize #pandas.io.json.json_normalize is deprecated since pandas 1.0

#Import all math equations for the cosine similarity equations
from math import*

<h2>Table of Contents:</h2>
<ul>
<li>
    <a href=#introduction >Introduction </a>
</li>

<li>
    <a href=#data > Data </a>
</li>

<li>
    <a href=#methodology > Methodology </a>
</li>

<li>
    <a href=#results > Results </a>
    <ul>
    <li>
        <a href=#clustering > Clustering </a>
    </li>
    <li>
        <a href=#cosine > Cosine Similarity </a>
    </li>
    <li>
        <a href=#find_neigh > Find_neigh Function</a>
    </li>
    </ul>
</li>

<li>
    <a href=#discussion > Discussion </a>
</li>

<li>
    <a href=#conclusion > Conclusion </a>
</li>
</ul>


## Introduction <a name = 'introduction'></a>

<p>The goal of this project is to find similarities between the neighborhoods of Toronto and the Twin Cities (Minneapolis and St. Paul). The interested parties are individuals or families who may be moving from one of the cities to another. As a first look into a potential new home, how similar a new neighborhood is to their current one is important. All three cities are diverse, growing communities, and movement between them is common. Using KMeans clustering and cosine similarity, similar neighborhoods can be found and potential new neighborhoods recommended.</p>

## Data <a name = 'data'></a>

<p>
    The main source of data for the project is the Wikipedia pages for the cities.
</p>

<p>Wikipedia Pages:</p>

<ul>
    <li> <a href= https://en.wikipedia.org/wiki/Neighborhoods_of_Minneapolis> Minneapolis</a></li>
    <li> <a href= https://en.wikipedia.org/wiki/Neighborhoods_in_Saint_Paul,_Minnesota> St. Paul</a></li>
    <li> <a href= http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M> Toronto</a></li>
</ul>

<p>
    With the neighborhood data found, the next step is to find the latitude and longitude of each neighborhood. For Toronto, the coordinates come from a geospatial coordinates csv keyed by postal code. Because the zip code system is not as precise, the Google Geocoding API is needed to find the coordinates for the Twin Cities.
</p>

<p>
    After collecting the location data, Foursquare comes into play. Using the Foursquare API, venue data can be found for each neighborhood. The data from the API creates a DataFrame with each row being a venue. To continue the data wrangling, the venue categories are split using one hot encoding. This creates a DataFrame with venues as rows and all of the possible venue categories as columns. To complete the data wrangling, the venues are grouped by neighborhood and the mean of each column is taken for each neighborhood. The final DataFrame has one row per neighborhood and one column per venue category, holding the mean of that category for the neighborhood.
</p>
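
<p>
    As a rough illustration of the wrangling described above, the sketch below runs one hot encoding and the group-by-mean step on a tiny made-up table. The neighborhood names, venue categories, and variable names are placeholders chosen for the example and are not part of the project data.
</p>

```python
import pandas as pd

# Toy venue table: one row per venue (names and categories are made up)
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Cafe', 'Park', 'Cafe', 'Bar'],
})

# One hot encode the category so each category becomes its own 0/1 column
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# Group by neighborhood; the mean of a 0/1 column is the share of venues in that category
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

<p>
    Because every venue contributes exactly one 1 across the category columns, each neighborhood row sums to 1, so the rows can be read as category proportions.
</p>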

<h3>Webscraping the Data</h3>
<p>
    Every data science project requires some data, and the first part of the data collection is to get the urls and create the BeautifulSoup objects. The next step is to collect the data for each of the cities, communities, and neighborhoods. For Minneapolis, simple for loops are used to collect the data from the page.
</p>

In [2]:
#url of the neighborhoods
url = 'https://en.wikipedia.org/wiki/Neighborhoods_of_Minneapolis'
html = requests.get(url)
d = html.text

url2 = 'https://en.wikipedia.org/wiki/Neighborhoods_in_Saint_Paul,_Minnesota'
html2 = requests.get(url2)
d2 = html2.text

url3 = 'http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html3 = requests.get(url3)
d3 = html3.text

#Creation of the BeautifulSoup object
soup = BeautifulSoup(d, 'html5lib')
soup2 = BeautifulSoup(d2, 'html5lib')
soup3 = BeautifulSoup(d3, 'html5lib')
#print(soup.prettify)


In [3]:
#This is a for loop to go through the wikipedia page for the neighborhoods of Minneapolis 
minn_html = soup.find_all('table')
count = 0
minn_neigh_dict = {}
for item in minn_html:
    minn_neigh = []
    for row in item.find_all('td'):
        minn_neigh.append(row.find('a').string)
    
    minn_neigh_dict[str(count)] = minn_neigh
    count += 1

In [4]:
#This loop goes through and collects all of the community names 
community_list = []
for comm in soup.find_all('h3'):
    try:
        community_list.append(comm.find('a').string)
    except AttributeError:
        #Headings without a link are skipped
        pass

In [5]:
#For loop used to combine the community and neighborhoods into a dict in order to make dataFrames
#Also the for loop puts minneapolis into city column
minn_list = []
temp = 0
city = 'Minneapolis'
for community in community_list:
    for item in range(0, len(minn_neigh_dict[str(temp)])):
        new = minn_neigh_dict[str(temp)][item]
        minn_list.append([city, community, new])
    temp += 1
 

In [6]:
#Conversion of the Minneapolis list from the website into the proper dataFrame
minn_df = pd.DataFrame(minn_list)
minn_df.rename(columns ={0: 'City', 1:'Community', 2: 'Neighborhood'}, inplace = True)
minn_df

Unnamed: 0,City,Community,Neighborhood
0,Minneapolis,Calhoun-Isles,Bryn Mawr
1,Minneapolis,Calhoun-Isles,Cedar-Isles-Dean
2,Minneapolis,Calhoun-Isles,East Calhoun
3,Minneapolis,Calhoun-Isles,East Isles
4,Minneapolis,Calhoun-Isles,Kenwood
5,Minneapolis,Calhoun-Isles,Lowry Hill
6,Minneapolis,Calhoun-Isles,Lowry Hill East
7,Minneapolis,Calhoun-Isles,South Uptown
8,Minneapolis,Calhoun-Isles,West Maka Ska
9,Minneapolis,Camden,Cleveland


<p>For the St. Paul data, the Wikipedia page is quite messy. To ensure that the data is correct, a brute force approach is used: the communities and neighborhoods are entered by hand into a dictionary. Since the data is not too large, the brute force method is the best option.</p>

In [7]:
#Full dictionary of the communities and neighborhoods of St. Paul
st_dict = {'Southeast Side': ['Eastview', 'Conway', 'Battle Creek', 'Highwood Woods'],\
           'Greater East Side': ['Frost Lake', 'Hillcrest', 'Prosperity Heights', 'Hayden Heights', 'Beaver Lake', 'Hazel Park', 'Phalen Village'],\
           'West Side': ['Baker-Annapolis', 'Riverview', 'Concord-Robert'],\
           'Dayton\'s Bluff': ['Dayton\'s Bluff'],\
           'Payne-Phalen': ['Railroad Island', 'Phalen Park', 'Rivoli Bluff', 'Vento', 'Wheelock Park', 'Willams Hill'],\
           'North End': ['North of Maryland', 'South of Maryland', 'South Como'],\
           'Thomas-Dale': ['East Midway', 'West Frogtown', 'North Frogtown', 'Capitol', 'Mt. Airy'],\
           'Summit-University': ['Cathedral Hill'],\
           'West Seventh': ['West Seventh'],\
           'Como Park': ['Energy Park'],\
           'Hamline-Midway': ['Midway'],\
           'Saint Anthony Park':['Langford Park Area', 'South St. Anthony Park'],\
           'Union Park': ['Lexington-Hamline', 'Snelling Hamline', 'Merriam Park'],\
           'Macalester-Groveland': ['TangleTown'],\
           'Highland Park': ['Highland Park'],\
           'Summit Hill': ['Crocus Hill', 'Grand Hill'],\
           'Downtown': ['Core', 'Lowertown']}

In [8]:
#For loop to put St. Paul in the city column of the DataFrame
stpaul_list = []
city = 'St. Paul'
for comm in st_dict.keys():
    for item in st_dict[comm]:
        stpaul_list.append([city, comm, item])
        
stpaul_df = pd.DataFrame(stpaul_list)
stpaul_df.rename(columns ={0:'City', 1:'Community', 2: 'Neighborhood'}, inplace = True)
stpaul_df.head()

Unnamed: 0,City,Community,Neighborhood
0,St. Paul,Southeast Side,Eastview
1,St. Paul,Southeast Side,Conway
2,St. Paul,Southeast Side,Battle Creek
3,St. Paul,Southeast Side,Highwood Woods
4,St. Paul,Greater East Side,Frost Lake


<p>
The Twin Cities data are combined into one DataFrame to simplify the final concatenation.
</p>

In [9]:
#The complete DataFrame of the cities, communities, and neighborhoods
twin_df = pd.concat([minn_df, stpaul_df]).reset_index(drop=True)
twin_df.head()

Unnamed: 0,City,Community,Neighborhood
0,Minneapolis,Calhoun-Isles,Bryn Mawr
1,Minneapolis,Calhoun-Isles,Cedar-Isles-Dean
2,Minneapolis,Calhoun-Isles,East Calhoun
3,Minneapolis,Calhoun-Isles,East Isles
4,Minneapolis,Calhoun-Isles,Kenwood


<p>
For Toronto, the postal code table from the Wikipedia page is used, and the community and neighborhood data comes from there.
</p>

In [10]:
#Empty list used to gather all of the Postal Data
toronto_list = []
postal_table = soup3.find('table')

#Loop to go through the table data and extract the Postal Codes, Boroughs, and Neighborhoods of Toronto
for item in postal_table.find_all('td'):
    code = {}
    if item.span.text == 'Not assigned': #used to remove all of the postal codes with no Borough assignment
        pass
    else:
        code['Postal Code'] = item.p.text[:3]
        code['City'] = 'Toronto'
        code['Community'] = item.span.text.split('(')[0]
        code['Neighborhood'] = item.span.text.split('(')[1].replace(' /', ',').replace(')','').strip(' ')
        toronto_list.append(code)
    
#Creation of the DataFrame from the list created using the previous loop
toronto_df = pd.DataFrame(toronto_list)
toronto_df['Community']=toronto_df['Community'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
print(toronto_df.shape)
toronto_df.head()

(103, 4)


Unnamed: 0,Postal Code,City,Community,Neighborhood
0,M3A,Toronto,North York,Parkwoods
1,M4A,Toronto,North York,Victoria Village
2,M5A,Toronto,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,Toronto,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Toronto,Queen's Park,Ontario Provincial Government
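
<p>
The string handling inside the loop above can be looked at on its own. The sketch below is a minimal, stand-alone version of that split, assuming the cell text has the form <code>Community(Neighborhood / Neighborhood)</code>. The helper name <code>split_postal_cell</code> and the fallback for cells without parentheses are assumptions made for this example and are not part of the original notebook.
</p>

```python
def split_postal_cell(text):
    """Split 'Community(Neigh A / Neigh B)' into (community, neighborhoods)."""
    community, _, rest = text.partition('(')
    community = community.strip()
    # Assumption for this sketch: a cell without parentheses reuses the community name
    if not rest:
        return community, community
    # Same clean-up as the loop: drop ')' and turn ' /' separators into commas
    neighborhoods = rest.replace(')', '').replace(' /', ',').strip()
    return community, neighborhoods


print(split_postal_cell('North York(Lawrence Manor / Lawrence Heights)'))
print(split_postal_cell('North York(Parkwoods)'))
```

<p>
Using <code>partition</code> instead of indexing <code>split('(')[1]</code> avoids an <code>IndexError</code> when a cell has no opening parenthesis.
</p>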


<h3>Neighborhood Latitude and Longitude Coordinates</h3>

<p>
Once the neighborhood data is collected, the location data needs to be added. For Toronto, a csv file with geospatial coordinates for each postal code is used. The two datasets are combined with a left merge, which keeps every postal code in the Toronto data and excludes the latitudes and longitudes of postal codes that are not in it.
</p>

In [11]:
#Geospatial file used to get lat, long of each postal code 
file = 'Geospatial_Coordinates.csv'

#Reading the geospatial file
geo_df = pd.read_csv(file)
geo_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [12]:
#Merging of the two dataframes but only with postal codes in the toronto data
torontoGeo_df = toronto_df.merge(geo_df, on = 'Postal Code', how = 'left')
torontoGeo_df.head()

Unnamed: 0,Postal Code,City,Community,Neighborhood,Latitude,Longitude
0,M3A,Toronto,North York,Parkwoods,43.753259,-79.329656
1,M4A,Toronto,North York,Victoria Village,43.725882,-79.315572
2,M5A,Toronto,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,Toronto,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Toronto,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
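
<p>
To make the behavior of the left merge concrete, here is a small sketch on toy data. The frames <code>left</code> and <code>right</code> stand in for the Toronto and geospatial tables, and the coordinate values are placeholders rather than real coordinates. The <code>indicator=True</code> option is an addition for this example that makes it easy to check whether any postal code failed to find coordinates.
</p>

```python
import pandas as pd

# Toy stand-ins for the Toronto table and the geospatial table (placeholder values)
left = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                     'Community': ['North York', 'North York']})
right = pd.DataFrame({'Postal Code': ['M1B', 'M3A', 'M4A'],
                      'Latitude': [1.0, 2.0, 3.0],
                      'Longitude': [-1.0, -2.0, -3.0]})

# how='left' keeps every row of `left`; rows that exist only in `right` (M1B) are dropped
merged = left.merge(right, on='Postal Code', how='left', indicator=True)
print(merged)

# A 'left_only' row would be a postal code that has no coordinates
unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched))
```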


<p>
The Twin Cities lat-long data is found using the Google Geocoding API. The next section shows the for loop that goes through each Twin Cities neighborhood and finds its lat-long coordinates, with printouts to show progress through the loop.
</p>

In [13]:
#This section is to collect the Lat-Long Coordinates for each of the neighborhoods in the twin cities
name = twin_df['City']
community = twin_df['Community']
neighborhood = twin_df['Neighborhood']

#Key for Google Geocoder API. Deleted since it is not needed for report
key = ''
twin_list = []

#For loop to go through each neighborhood and check for Lat-Long Coords
for city, comm, neigh in zip(name, community, neighborhood):
    #Print to check progress of the loop
    print(neigh)
    #Address string for the query; requests URL-encodes the spaces when it builds the url
    address = '{} {}'.format(neigh, city)
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    
    #JSON handling
    r = requests.get(url, params={'address': address, 'key': key})
    results = r.json()
    
    #Lat-Long Coords from JSON file
    lat = results['results'][0]['geometry']['location']['lat']
    long = results['results'][0]['geometry']['location']['lng']
    
    #Final List
    twin_list.append([city, comm, neigh, lat, long])
    
#print(twin_list)

Bryn Mawr
Cedar-Isles-Dean
East Calhoun
East Isles
Kenwood
Lowry Hill
Lowry Hill East
South Uptown
West Maka Ska
Cleveland
Folwell
Lind-Bohanon
McKinley
Shingle Creek
Victory
Webber-Camden
Downtown East
Downtown West
Elliot Park
Loring Park
North Loop
Stevens Square/Loring Heights
Cooper
Hiawatha
Howe
Longfellow
Seward
Harrison
Hawthorne
Jordan
Near North
Sumner-Glenwood
Willard Hay
Diamond Lake
Ericsson
Field
Hale
Keewaydin
Minnehaha
Morris Park
Northrop
Page
Regina
Wenonah
Audubon Park
Beltrami
Bottineau
Columbia Park
Holland
Logan Park
Marshall Terrace
Northeast Park
Sheridan
St. Anthony East
St. Anthony West
Waite Park
Windom Park
East Phillips
Midtown Phillips
Phillips West
Ventura Village
Bancroft
Bryant
Central
Corcoran
Lyndale
Powderhorn Park
Standish
Whittier
Armatage
East Harriet
Fulton
Kenny
King Field
Linden Hills
Lynnhurst
Tangletown
Windom
Cedar-Riverside
Como
Marcy-Holmes
Nicollet Island/East Bank
Prospect Park
University
Eastview
Conway
Battle Creek
Highwood Woods
Frost

In [14]:
#Creation of the DataFrame and renaming of the columns
twinGeo_df = pd.DataFrame(twin_list)
twinGeo_df.columns = ['City', 'Community', 'Neighborhood', 'Latitude', 'Longitude']
print(twinGeo_df.shape)
twinGeo_df.head()

(128, 5)


Unnamed: 0,City,Community,Neighborhood,Latitude,Longitude
0,Minneapolis,Calhoun-Isles,Bryn Mawr,44.973721,-93.308377
1,Minneapolis,Calhoun-Isles,Cedar-Isles-Dean,44.954166,-93.321534
2,Minneapolis,Calhoun-Isles,East Calhoun,44.952149,-93.297887
3,Minneapolis,Calhoun-Isles,East Isles,44.955947,-93.300271
4,Minneapolis,Calhoun-Isles,Kenwood,44.959105,-93.312002
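
<p>
The geocoding loop above reads <code>results['results'][0]</code> directly, so a query that returns an empty result list would raise an <code>IndexError</code> and stop the loop. The sketch below shows one possible way to guard against that. The helper name <code>extract_lat_lng</code> and the hand-written payloads are assumptions for this example; only the nested keys are taken from the loop above.
</p>

```python
def extract_lat_lng(payload):
    """Return (lat, lng) from a geocoding JSON payload, or (None, None) if it has no results."""
    results = payload.get('results', [])
    if not results:
        return None, None
    location = results[0]['geometry']['location']
    return location['lat'], location['lng']


# Hand-written payloads in the same shape the loop reads (values are placeholders)
found = {'results': [{'geometry': {'location': {'lat': 44.97, 'lng': -93.31}}}]}
empty = {'results': []}

print(extract_lat_lng(found))
print(extract_lat_lng(empty))
```

<p>
Neighborhoods that come back as <code>(None, None)</code> could then be listed and checked by hand instead of interrupting the run.
</p>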


<h3>Maps of Cities and the Neighborhoods Involved</h3>
<p>
After the data is collected, folium maps are made to check that the data looks correct and to show the general areas of interest. Nominatim is used to find the coordinates on which each city map is centered.
</p>

In [15]:
address = 'Minneapolis, MN'

#Using geolocator to find the coordinates for the folium map
geolocator = Nominatim(user_agent='minneapolis')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

#Print out to show that the geolocator worked
print('Lat = {}, Long = {}'.format(latitude, longitude))

Lat = 44.9772995, Long = -93.2654692


In [16]:
#Address used to find the lat and long for toronto
address = 'Toronto, CA'

#Using geolocator to find the coordinates for the folium map
geolocator = Nominatim(user_agent='toronto')
location = geolocator.geocode(address)
latitude2 = location.latitude
longitude2 = location.longitude

#Print out to show that the geolocator worked
print('Lat = {}, Long = {}'.format(latitude2, longitude2))

Lat = 43.6534817, Long = -79.3839347


In [17]:
#Creation of the folium Map for the Twin Cities map
minn_map = folium.Map(location=[latitude,longitude], zoom_start = 11)

#Creation of the circle markers 
for lat, long, community, neigh, city in zip(twinGeo_df.Latitude, twinGeo_df.Longitude, twinGeo_df.Community, twinGeo_df.Neighborhood, twinGeo_df.City):
    label = '{}, {}, {}'.format(neigh, community, city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius = 5,
        popup = label,
        color = 'green',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(minn_map)

minn_map

In [18]:
#Creation of the folium Map for the Toronto map
toronto_map = folium.Map(location=[latitude2,longitude2], zoom_start = 10)

#Creation of the circle markers 
for lat, long, comm, neigh in zip(torontoGeo_df.Latitude, torontoGeo_df.Longitude, torontoGeo_df.Community, torontoGeo_df.Neighborhood):
    label = '{}, {}'.format(neigh, comm)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius = 5,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(toronto_map)

toronto_map

<h3>Combining the Neighborhoods</h3>
<p>
The neighborhoods are now combined to allow for the future KMeans clustering.
</p>

In [19]:
fullcities_df = pd.concat([twinGeo_df, torontoGeo_df]).reset_index()
fullcities_df.drop(columns=['Postal Code'], inplace=True)
fullcities_df

Unnamed: 0,index,City,Community,Neighborhood,Latitude,Longitude
0,0,Minneapolis,Calhoun-Isles,Bryn Mawr,44.973721,-93.308377
1,1,Minneapolis,Calhoun-Isles,Cedar-Isles-Dean,44.954166,-93.321534
2,2,Minneapolis,Calhoun-Isles,East Calhoun,44.952149,-93.297887
3,3,Minneapolis,Calhoun-Isles,East Isles,44.955947,-93.300271
4,4,Minneapolis,Calhoun-Isles,Kenwood,44.959105,-93.312002
5,5,Minneapolis,Calhoun-Isles,Lowry Hill,44.96705,-93.295442
6,6,Minneapolis,Calhoun-Isles,Lowry Hill East,44.955577,-93.291875
7,7,Minneapolis,Calhoun-Isles,South Uptown,44.941056,-93.291062
8,8,Minneapolis,Calhoun-Isles,West Maka Ska,44.944701,-93.3263
9,9,Minneapolis,Camden,Cleveland,45.019713,-93.313961
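
<p>
The extra <code>index</code> column in the table above comes from calling <code>reset_index()</code> without <code>drop=True</code>, which keeps each frame's old row numbers as a column. The toy frames below are placeholders used only to show the difference, and <code>ignore_index=True</code> is offered as one alternative rather than as the approach used in this notebook.
</p>

```python
import pandas as pd

# Two toy frames standing in for the Twin Cities and Toronto tables
a = pd.DataFrame({'Neighborhood': ['Bryn Mawr', 'Kenwood']})
b = pd.DataFrame({'Neighborhood': ['Parkwoods']})

# reset_index() keeps the old per-frame row numbers as an extra 'index' column
kept = pd.concat([a, b]).reset_index()

# ignore_index=True renumbers the rows 0..n-1 without adding a column
clean = pd.concat([a, b], ignore_index=True)

print(list(kept.columns))
print(list(clean.columns))
```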


<h3>Foursquare Time!</h3>
<p>
In order to compare the neighborhoods, venue data from Foursquare is used. The key part of the venue data used for comparison is the category of the venue. The following sections show the url components, a function that extracts the venues and returns them as a clean DataFrame, and finally the call to the getVenues function.
</p>

In [20]:
#Foursquare data. Deleted because not important to report
CLIENT_ID = ''
CLIENT_SECRET = ''
VERSION = ''
LIMIT = 100

In [21]:
#Function that creates a dataframe of all the venues from the foursquare api
def getVenues(city, community, neighborhood, latitude, longitude, rad = 500):
    venues_list=[]
    for city, comm, neigh, lat, long in zip(city, community, neighborhood, latitude, longitude):
        #print-out to check progress on the function
        print('{}, {}, {}'.format(neigh, comm, city))
        
        #url creation for each neighborhood
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'\
        .format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, long, rad, LIMIT)
        
        #results from Foursquare
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        #row creation for each venue and neighborhood
        venues_list.append([(
            city,
            comm,
            neigh,
            lat,
            long,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    
    #Creation of the final data frame    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 'Community', 'Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', \
                                'Venue Latitude', 'Venue Longitude', 'Venue Category']
       
    return nearby_venues
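
<p>
The list comprehension in getVenues reads <code>v['venue']['categories'][0]</code> for every item, which would fail for a venue whose category list is empty. The sketch below shows one way the row-building step could skip such venues. The helper name <code>venue_rows</code> and the sample items are made up for this example; only the nested keys follow the fields read in getVenues.
</p>

```python
def venue_rows(items, neigh):
    """Flatten Foursquare-style items into (neighborhood, name, lat, lng, category) rows."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Skip venues without a category, since the category is what gets compared later
        if not categories:
            continue
        rows.append((neigh,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows


# Hand-written items mimicking the fields read by getVenues (values are placeholders)
items = [
    {'venue': {'name': 'Cafe One', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'No Category', 'location': {'lat': 1.1, 'lng': 2.1},
               'categories': []}},
]
print(venue_rows(items, 'Bryn Mawr'))
```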

In [22]:
venues = getVenues(fullcities_df.City, fullcities_df.Community, fullcities_df.Neighborhood, fullcities_df.Latitude, fullcities_df.Longitude)  

Bryn Mawr, Calhoun-Isles, Minneapolis
Cedar-Isles-Dean, Calhoun-Isles, Minneapolis
East Calhoun, Calhoun-Isles, Minneapolis
East Isles, Calhoun-Isles, Minneapolis
Kenwood, Calhoun-Isles, Minneapolis
Lowry Hill, Calhoun-Isles, Minneapolis
Lowry Hill East, Calhoun-Isles, Minneapolis
South Uptown, Calhoun-Isles, Minneapolis
West Maka Ska, Calhoun-Isles, Minneapolis
Cleveland, Camden, Minneapolis
Folwell, Camden, Minneapolis
Lind-Bohanon, Camden, Minneapolis
McKinley, Camden, Minneapolis
Shingle Creek, Camden, Minneapolis
Victory, Camden, Minneapolis
Webber-Camden, Camden, Minneapolis
Downtown East, Central, Minneapolis
Downtown West, Central, Minneapolis
Elliot Park, Central, Minneapolis
Loring Park, Central, Minneapolis
North Loop, Central, Minneapolis
Stevens Square/Loring Heights, Central, Minneapolis
Cooper, Longfellow, Minneapolis
Hiawatha, Longfellow, Minneapolis
Howe, Longfellow, Minneapolis
Longfellow, Longfellow, Minneapolis
Seward, Longfellow, Minneapolis
Harrison, Near North, M

Enclave of L4W, Mississauga, Toronto
Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens, Etobicoke, Toronto
Agincourt, Scarborough, Toronto
Davisville, Central Toronto, Toronto
University of Toronto, Harbord, Downtown Toronto, Toronto
Runnymede, Swansea, West Toronto, Toronto
Clarks Corners, Tam O'Shanter, Sullivan, Scarborough, Toronto
Moore Park, Summerhill East, Central Toronto, Toronto
Kensington Market, Chinatown, Grange Park, Downtown Toronto, Toronto
Milliken, Agincourt North, Steeles East, L'Amoreaux East, Scarborough, Toronto
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park, Central Toronto, Toronto
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport, Downtown Toronto, Toronto
New Toronto, Mimico South, Humber Bay Shores, Etobicoke, Toronto
South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens, Etobicoke, Toronto
Steeles West, L'A

<h3>One Hot Encoding and Set-up for KMeans Clustering</h3>
<p>
Using the data in its current state would be difficult, so one hot encoding is used to allow for clustering. In the following code block, the dummy variables are made, but each venue is still in its own row. In the block after, the venues are grouped by the neighborhood they are a part of and the mean of each column is taken.
</p>

In [23]:
#one hot encoding
full_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

#City and Neighborhood added back in
full_onehot[['City', 'Neighborhood']] = venues[['City', 'Neighborhood']]

#Move City and Neighborhood to the front, as they are otherwise not placed in the correct spot
#Building the column list directly avoids dropping columns from a slice of the DataFrame
category_columns = [col for col in full_onehot.columns if col not in ('City', 'Neighborhood')]
fixed_columns = ['City', 'Neighborhood'] + category_columns
full_onehot = full_onehot[fixed_columns]

#First rows of the dataframe
full_onehot.head()


Unnamed: 0,City,Neighborhood,ATM,Accessories Store,Adult Boutique,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,College Residence Hall,College Stadium,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Fishing Spot,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lawyer,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Nightlife Spot,Non-Profit,Noodle House,Office,Opera House,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Paintball Field,Paper / Office Supplies Store,Park,Performing Arts Venue,Persian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pier,Pizza Place,Plane,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Public Art,Ramen Restaurant,Real Estate Office,Record Shop,Recording Studio,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Club,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Squash Court,Sri Lankan Restaurant,Stadium,Stationery Store,Steakhouse,Storage Facility,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Train,Train Station,Travel & Transport,Tree,Truck Stop,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vehicle Inspection Station,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Waterfall,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Minneapolis,Bryn Mawr,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Minneapolis,Bryn Mawr,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Minneapolis,Bryn Mawr,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Minneapolis,Bryn Mawr,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Minneapolis,Bryn Mawr,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
#Grouping of the venues into one neighborhood with mean for each venue category
full_grouped = full_onehot.groupby(['City','Neighborhood']).mean().reset_index()
print(full_grouped.shape)
full_grouped

(226, 343)


Unnamed: 0,City,Neighborhood,ATM,Accessories Store,Adult Boutique,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,College Residence Hall,College Stadium,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Fishing Spot,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt 
Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lawyer,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Nightlife Spot,Non-Profit,Noodle House,Office,Opera House,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Paintball Field,Paper / Office Supplies Store,Park,Performing Arts Venue,Persian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pier,Pizza Place,Plane,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Public Art,Ramen Restaurant,Real Estate Office,Record Shop,Recording Studio,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Club,Sake Bar,Salad Place,Salon / Barbershop,Sandwich 
Place,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Squash Court,Sri Lankan Restaurant,Stadium,Stationery Store,Steakhouse,Storage Facility,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Train,Train Station,Travel & Transport,Tree,Truck Stop,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vehicle Inspection Station,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Waterfall,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Minneapolis,Armatage,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Minneapolis,Audubon Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Minneapolis,Bancroft,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Minneapolis,Beltrami,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Minneapolis,Bottineau,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Minneapolis,Bryant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Minneapolis,Bryn Mawr,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Minneapolis,Cedar-Isles-Dean,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Minneapolis,Cedar-Riverside,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Minneapolis,Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064935,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.038961,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.064935,0.0,0.0,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.0,0.051948,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.025974,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.012987,0.064935,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0
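
<p>
As a small illustration of what the grouped means represent, the sketch below applies the same groupby-and-mean step to a made-up one-hot table. The neighborhood names and venue categories are hypothetical and are not taken from the project data. Each resulting value is the share of a neighborhood's venues that fall in a category, which is why a value such as 0.166667 for Armatage corresponds to one venue out of six.
</p>

```python
import pandas as pd

# Toy one-hot table: one row per venue, one 0/1 column per venue category (made-up data)
toy_onehot = pd.DataFrame({
    'City': ['Minneapolis', 'Minneapolis', 'Minneapolis', 'Toronto'],
    'Neighborhood': ['Example A', 'Example A', 'Example A', 'Example B'],
    'Bakery': [1, 0, 0, 0],
    'Park': [0, 1, 1, 1],
})

# The mean of each 0/1 column per neighborhood is the share of venues in that category
toy_grouped = toy_onehot.groupby(['City', 'Neighborhood']).mean().reset_index()
print(toy_grouped)
```

<p>
In this toy table, Example A has three venues (one bakery and two parks), so its row holds one third for Bakery and two thirds for Park, and each row's shares sum to one.
</p>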


<p>
The next two code blocks build a table of the most common venue categories in each neighborhood. This table is not used in the analysis itself; it is used in the final portion to show the interested party what types of venues each neighborhood has. 
</p>

In [25]:
#Function to find the most common venue categories in each neighborhood 
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[2:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
#Number of top venues for the list and ordinal suffixes for 1st, 2nd, and 3rd
num_top_venues = 10
indic = ['st', 'nd', 'rd']

#Setup of the columns: City and Neighborhood first, then the most common venues in order
columns = ['City', 'Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indic[ind]))
    except IndexError:
        #4th and later all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
#DataFrame of the sorted venues for every neighborhood in the Twin Cities and Toronto
full_venues_sorted = pd.DataFrame(columns=columns)
full_venues_sorted.City = full_grouped.City
full_venues_sorted.Neighborhood = full_grouped.Neighborhood

#Filling in the most common venues for each neighborhood
for ind in np.arange(full_grouped.shape[0]):
    full_venues_sorted.iloc[ind, 2:] = return_most_common_venues(full_grouped.iloc[ind,:], num_top_venues) 
    
full_venues_sorted.head()

Unnamed: 0,City,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Minneapolis,Armatage,Pizza Place,Convenience Store,Trail,Skate Park,Park,Business Service,Fast Food Restaurant,Filipino Restaurant,Doner Restaurant,Donut Shop
1,Minneapolis,Audubon Park,Convenience Store,Pizza Place,American Restaurant,Bakery,Thrift / Vintage Store,Jewelry Store,Arts & Crafts Store,Clothing Store,Coffee Shop,Chinese Restaurant
2,Minneapolis,Bancroft,Dive Bar,Candy Store,Discount Store,Chinese Restaurant,Garden,English Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store
3,Minneapolis,Beltrami,Playground,Gay Bar,Cosmetics Shop,Clothing Store,Café,Bus Station,Escape Room,Brewery,Liquor Store,Platform
4,Minneapolis,Bottineau,Garden Center,Harbor / Marina,Art Gallery,Steakhouse,Grocery Store,Bus Station,Building,Coffee Shop,Scenic Lookout,Theme Restaurant


## Methodology <a name = 'methodology'></a>

<p>
    Once the data is collected and wrangled, the real fun begins. The main goal for the interested parties is to find a similar neighborhood to live in. The Foursquare venue category data gives a general picture of each neighborhood. Neighborhoods with a similar mix of venues, such as parks, elementary schools, and restaurants, tend to have a similar feel. 
To find these similar neighborhoods, KMeans clustering was used. KMeans was chosen as the clustering algorithm because of its ability to eke out similarities within the data. Five clusters were chosen for the final project because that number seemed to produce a reasonable set of clusters. </p>
<p>
Even with a large number of clusters, finding differences between the neighborhoods within a cluster is difficult. With KMeans on this data, there tend to be a few clusters of 10-15 neighborhoods and then one major cluster of 100+. To help the interested parties, a second similarity measure, cosine similarity, is used. 
    </p>
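
<p>
One way to look at the kind of cluster-size imbalance described above is to fit KMeans for several values of k and compare the sizes of the resulting clusters. The sketch below does this on synthetic data, not on the project's venue data; the feature matrix, the range of k, and the <code>random_state</code> and <code>n_init</code> settings are all assumptions made for illustration.
</p>

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the venue-share matrix: 60 rows, 8 categories, each row sums to 1
features = rng.dirichlet(np.ones(8) * 0.3, size=60)

sizes_by_k = {}
for k in range(2, 8):
    # A fixed random_state keeps the labels the same from run to run
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    # Number of rows assigned to each cluster label 0..k-1
    sizes_by_k[k] = np.bincount(model.labels_, minlength=k)
    print(k, sorted(sizes_by_k[k].tolist(), reverse=True), round(model.inertia_, 3))
```

<p>
Printing the sorted sizes next to the inertia makes it easier to see whether adding clusters splits the largest cluster or only peels off a few small ones.
</p>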


<p>
Using the per-neighborhood venue frequencies (the grouped means of full_onehot), the dot product of two neighborhood vectors makes up the numerator of the cosine similarity. The lengths of the two vectors are then multiplied together to make up the denominator. Because the venue frequencies are never negative, the resulting cosine similarity is a score between zero and one, and the closer it is to one, the more similar the neighborhoods are. With the KMeans clustering and the cosine similarity calculations, the final recommendations can be made. The final part is to allow an interested person to input their neighborhood and find recommendations based on these results. The neigh_finder function scans through the cosine similarities and collects every pair that includes the chosen neighborhood. It then returns the five most similar neighborhoods with their similarity scores and the ten most common venues of each.     
</p>
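
<p>
The cosine similarity described above is cos(theta) = (X . Y) / (|X| |Y|). As a minimal worked example, the sketch below computes it for three made-up venue-share vectors. The function name <code>cosine_similarity_sketch</code> and the vectors are hypothetical and serve only to illustrate the arithmetic.
</p>

```python
from math import sqrt

def cosine_similarity_sketch(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))   # numerator: dot product
    norm_u = sqrt(sum(a * a for a in u))     # length of u
    norm_v = sqrt(sum(b * b for b in v))     # length of v
    return dot / (norm_u * norm_v)

# Hypothetical venue-share vectors over four categories
hood_a = [0.5, 0.5, 0.0, 0.0]
hood_b = [0.5, 0.0, 0.5, 0.0]   # shares one category with hood_a
hood_c = [0.0, 0.0, 0.0, 1.0]   # shares no category with hood_a

sim_ab = cosine_similarity_sketch(hood_a, hood_b)
sim_ac = cosine_similarity_sketch(hood_a, hood_c)
sim_aa = cosine_similarity_sketch(hood_a, hood_a)
print(sim_ab, sim_ac, sim_aa)
```

<p>
Here the two neighborhoods that share one of their two categories score about 0.5, the pair with no category in common scores 0, and a neighborhood compared with itself scores about 1.
</p>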

## Results <a name = 'results'></a>

<h3>Clustering</h3><a name = 'clustering'></a>
<p>
Now it is time to cluster the neighborhoods. Five clusters were used, as that seemed to be a good number: most clusters have 5-10 neighborhoods, with one or two much larger clusters.
</p>

In [47]:
# Number of clusters for the KMeans clustering
kclusters = 5

#Dropping the City and Neighborhood columns, as they are labels rather than features
full_grouped_cluster = full_grouped.drop(columns=['City', 'Neighborhood'])

#Fitting of the KMeans model (no random_state is set, so the labels can differ between runs)
kmeans = KMeans(n_clusters = kclusters).fit(full_grouped_cluster)

#Printout of all the labels for the cluster
kmeans.labels_

array([3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
       4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 0,
       4, 4, 4, 4, 3, 4, 4, 4, 4, 0, 4, 4, 4, 4, 3, 4, 4, 4, 1, 4, 4, 3,
       4, 4, 3, 4, 4, 1, 4, 4, 4, 4, 3, 3, 2, 4, 4, 3, 4, 3, 4, 4, 4, 4,
       4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4,
       2, 4, 4, 3, 4, 3, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
       4, 4, 4, 4, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 0, 4,
       3, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3, 3, 4, 4, 4, 4, 1, 2, 4, 4, 4, 4,
       3, 4, 3, 4, 4, 2, 3, 4, 1, 4, 3, 4, 4, 2, 4, 4, 4, 3, 4, 4, 0, 4,
       4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 4, 4,
       0, 4, 4, 4, 3, 0])

<p>
The next block is used to put the neighborhoods and the cluster numbers together.
</p>

In [49]:
#Can be uncommented if the cell needs to be run more than once, since inserting a second 'Cluster Labels' column raises an error
#full_venues_sorted = full_venues_sorted.drop(columns='Cluster Labels')

#Insertion of the cluster labels into the sorted venues DataFrame
full_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#Merging of the location data with the cluster labels and common venues
fullGeo = fullcities_df.reset_index(drop=True)
full_merged = fullGeo
full_merged = full_merged.merge(full_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how='inner')
full_merged.drop(columns=['City_y'], inplace=True)
full_merged.rename(columns={'City_x':'City'}, inplace=True)

#Printout showing the shape, the size of each cluster, and the first 5 rows of data
print(full_merged.shape)
print(full_merged['Cluster Labels'].value_counts())
full_merged.head()

(226, 17)
4    177
3     31
0      8
2      6
1      4
Name: Cluster Labels, dtype: int64


Unnamed: 0,index,City,Community,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Minneapolis,Calhoun-Isles,Bryn Mawr,44.973721,-93.308377,4,Men's Store,Coffee Shop,Furniture / Home Store,Grocery Store,Antique Shop,Intersection,Park,Pizza Place,Farmers Market,Falafel Restaurant
1,1,Minneapolis,Calhoun-Isles,Cedar-Isles-Dean,44.954166,-93.321534,4,Intersection,Tourist Information Center,Speakeasy,Beach,Yoga Studio,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
2,2,Minneapolis,Calhoun-Isles,East Calhoun,44.952149,-93.297887,4,Coffee Shop,Pizza Place,Bar,American Restaurant,Salon / Barbershop,Sandwich Place,Clothing Store,Gym,Mobile Phone Shop,Electronics Store
3,3,Minneapolis,Calhoun-Isles,East Isles,44.955947,-93.300271,4,Pizza Place,Coffee Shop,Mexican Restaurant,Gas Station,Pharmacy,Indian Restaurant,Boutique,Shipping Store,Bike Rental / Bike Share,Sandwich Place
4,4,Minneapolis,Calhoun-Isles,Kenwood,44.959105,-93.312002,4,Bookstore,Bakery,Trail,Arts & Crafts Store,Skating Rink,Tailor Shop,Café,American Restaurant,Yoga Studio,Escape Room


In [50]:
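#The first 127 rows go to the Twin Cities DataFrame; all remaining rows go to the Toronto DataFrame
twin_merged = full_merged[0:127].reset_index(drop=True)
#Slicing with [127:] rather than [127:-1] keeps the last neighborhood in the table
toronto_merged = full_merged[127:].reset_index(drop=True)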

<h3>Showing off the cluster colors of each neighborhood</h3>
<p>
To show which cluster each neighborhood was assigned to, folium maps of Toronto and the Twin Cities are made.
</p>

In [51]:
#Creation of the Twin Cities map with the clusters shown as colored markers
minn_map_clusters = folium.Map(location=[latitude, longitude], zoom_start = 11)

#One rainbow color per cluster, converted to hex strings for folium
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#Creation of the markers for the map, colored by cluster
#Note: rainbow[cluster-1] wraps around to the last color for cluster 0, so every cluster still gets its own color
for lat, long, poi, cluster in zip(twin_merged.Latitude, twin_merged.Longitude, twin_merged.Neighborhood, twin_merged['Cluster Labels'] ):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat, long],
        radius = 5,
        popup = label,
        color=rainbow[cluster-1],
        fill = True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(minn_map_clusters)
    
    
minn_map_clusters

In [70]:
#Creation of the Toronto map with the clusters shown as colored markers
tor_map_clusters = folium.Map(location=[latitude2, longitude2], zoom_start = 10)

#One rainbow color per cluster, converted to hex strings for folium
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#Creation of the markers for the map, colored by cluster
#Note: rainbow[cluster-1] wraps around to the last color for cluster 0, so every cluster still gets its own color
for lat, long, poi, cluster in zip(toronto_merged.Latitude, toronto_merged.Longitude, toronto_merged.Neighborhood, toronto_merged['Cluster Labels'] ):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat, long],
        radius = 5,
        popup = label,
        color=rainbow[cluster-1],
        fill = True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(tor_map_clusters)
    
    
tor_map_clusters

<h3>Finding the Similarity of the Neighborhoods in the Clusters</h3><a name = 'cosine'></a>
<p>
    To compare the neighborhoods within a cluster, cosine similarity is used. With cosine similarity, venue categories that two neighborhoods share increase the score, and categories found in only one of them decrease it. The goal is to loop through each neighborhood and compare it to every other neighborhood in the cluster.
</p>
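
<p>
As an alternative to looping over every pair, all pairwise cosine similarities can also be computed at once with NumPy. The sketch below is one possible way to do this, assuming a small made-up matrix with one row per neighborhood; the array names are hypothetical and this is not the approach used in the notebook.
</p>

```python
import numpy as np

# Hypothetical venue-share matrix: one row per neighborhood, one column per category
shares = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Scale every row to unit length; the dot product of two unit vectors is their cosine
norms = np.linalg.norm(shares, axis=1, keepdims=True)
unit = shares / norms

# Entry [i, j] is the cosine similarity between neighborhoods i and j
sim_matrix = unit @ unit.T
print(np.round(sim_matrix, 3))
```

<p>
The matrix is symmetric with ones on the diagonal, so only the entries above the diagonal need to be inspected when ranking pairs.
</p>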

In [53]:
#Function to find the cosine similarity of the neighborhoods
def cosine_sim(X, Y):
    
    #Dot product of the X and Y Vectors
    num = sum(x*y for x,y in zip(X,Y))
    
    #Product of the lengths (Euclidean norms) of the X and Y vectors
    denom = sqrt(sum(x*x for x in X))*sqrt(sum(y*y for y in Y))
    
    #Final cos(theta) similarity
    similarity = (num/denom)
    
    return similarity

In [54]:
#Function to go through each of the neighborhoods and finds the cosine similarity of each
#Looks quite large and complicated but isn't
def find_neigh_sim(neighborhoods_df):
    sim_list=[]
    neigh_list=[]
    
    #Series for removal of the neigh list while keeping the names
    neigh = neighborhoods_df[['City','Neighborhood']]
    neighborhoods_df.drop(['City'], axis=1, inplace=True)
    neigh_df = neighborhoods_df.drop(columns={'Neighborhood'}, axis=1)
    #For loop where each neighborhood is gone through and the cosine similarity is found
    #First neighborhood
    for index1 in neigh_df.index:
        neigh1 = list(neigh_df.loc[index1])
        name1 = neigh.loc[index1]
        
        #Second Neighborhood
        for index2 in neigh_df.index:
            
            #Skip comparing a neighborhood with itself, since that similarity is always 1
            if index1 != index2:
                neigh2 = list(neigh_df.loc[index2])                
                name2 = neigh.loc[index2]
                #Sort the two neighborhood names so each pair is always stored in the same order,
                #which lets drop_duplicates remove the mirrored comparison later
                A, B = sorted([name1['Neighborhood'], name2['Neighborhood']])
                city_A = neigh[neigh['Neighborhood'] == A]['City'].item()
                city_B = neigh[neigh['Neighborhood'] == B]['City'].item()
                #Only keep pairs whose neighborhoods are in different cities
                if city_A != city_B:
                    similarity = [city_A, A, city_B, B, cosine_sim(neigh1, neigh2)]
                    sim_list.append(similarity)
                #Uncomment to print each similarity value (very verbose for large clusters)
                #print('The cosine similarity between {} and {} neighborhoods is {}'.format(name1, name2, similarity[4]))            
    
    sim_df = pd.DataFrame(sim_list, columns=['City 1', 'Neighborhood 1', 'City 2', 'Neighborhood 2', 'Cosine Sim'])
    sim_df.drop_duplicates(inplace=True)
    sim_df = sim_df.sort_values(by=['Cosine Sim'], ascending=False).reset_index(drop=True)
    return sim_df
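
<p>
    The nested loops above can also be expressed with matrix operations. The following is a minimal sketch under stated assumptions, not the method used in this notebook: it assumes a DataFrame laid out like the input to find_neigh_sim (a City column, a Neighborhood column, and numeric venue columns, with at least one non-zero value per row). The function name cross_city_similarity and the toy data are hypothetical. Unlike find_neigh_sim, pairs keep their input order instead of being sorted alphabetically by name.
</p>

```python
import numpy as np
import pandas as pd

def cross_city_similarity(df):
    """Pairwise cosine similarity between neighborhoods in different cities."""
    features = df.drop(columns=['City', 'Neighborhood']).to_numpy(dtype=float)
    # Scale every row to unit length so a dot product equals the cosine similarity
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T

    cities = df['City'].tolist()
    names = df['Neighborhood'].tolist()
    rows = []
    # Visit each unordered pair once (j > i) and keep only cross-city pairs
    for i in range(len(df)):
        for j in range(i + 1, len(df)):
            if cities[i] != cities[j]:
                rows.append([cities[i], names[i], cities[j], names[j], sim[i, j]])

    columns = ['City 1', 'Neighborhood 1', 'City 2', 'Neighborhood 2', 'Cosine Sim']
    result = pd.DataFrame(rows, columns=columns)
    return result.sort_values(by='Cosine Sim', ascending=False).reset_index(drop=True)

# Toy data: the neighborhood names and values are made up
toy = pd.DataFrame({
    'City': ['Toronto', 'Minneapolis', 'St. Paul'],
    'Neighborhood': ['A', 'B', 'C'],
    'Park': [1.0, 0.5, 0.0],
    'Cafe': [0.0, 0.5, 1.0],
})
toy_sim = cross_city_similarity(toy)
print(toy_sim)
```

<p>
    Visiting each pair only once removes the need for drop_duplicates, and normalizing the rows up front means the similarity for every pair comes from a single matrix product.
</p>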

<p>The following code boxes go through each cluster from the KMeans solution and run the find_neigh_sim function. Each code box prints the number of neighborhoods in the cluster and, if there is more than one neighborhood in the cluster, the five most similar neighborhood pairs.</p>

<h4>Cluster 0:</h4>

In [55]:
cluster0 = pd.DataFrame([])
clust0 = full_merged[full_merged['Cluster Labels'] == 0]
print('The amount of neighborhoods in cluster 0 is {}.'.format(len(clust0)))
#Check if the cluster is greater than 1
if len(clust0) >= 2: 
    
    #For loop to put the cluster together for  find_neigh_sim
    for city, neigh in zip(clust0['City'], clust0['Neighborhood']):
        row = (full_grouped.loc[full_grouped['Neighborhood'] == neigh])
        cluster0 = pd.concat([cluster0, row])
        cluster0_df = pd.DataFrame(cluster0, columns = full_grouped.columns)

    #Finding the similarities of the cluster
    clust_sim0 = find_neigh_sim(cluster0_df)
    print(clust_sim0.head())

#If the cluster has less than 2 neighborhoods this prints out
else:
    clust_sim0 = pd.DataFrame([])
    print('{}, {} is the only neighborhood in this cluster.'.format(clust0['Neighborhood'].item(), clust0['City'].item()))

The amount of neighborhoods in cluster 0 is 8.
        City 1       Neighborhood 1       City 2           Neighborhood 2  \
0  Minneapolis           Lowry Hill      Toronto  Willowdale, Newtonbrook   
1  Minneapolis           Lowry Hill      Toronto          York Mills West   
2  Minneapolis           Lowry Hill      Toronto                 Rosedale   
3      Toronto  Caledonia-Fairbanks  Minneapolis               Lowry Hill   
4  Minneapolis           Near North      Toronto  Willowdale, Newtonbrook   

   Cosine Sim  
0    1.000000  
1    0.894427  
2    0.816497  
3    0.816497  
4    0.816497  


<h4>Cluster 1:</h4>

In [56]:
cluster1 = pd.DataFrame([])
clust1 = full_merged[full_merged['Cluster Labels'] == 1]
print('The amount of neighborhoods in cluster 1 is {}.'.format(len(clust1)))
#Check if the cluster is greater than 1
if len(clust1) >= 2: 
    
    #For loop to put the cluster together for  find_neigh_sim
    for city, neigh in zip(clust1['City'], clust1['Neighborhood']):
        row = (full_grouped.loc[full_grouped['Neighborhood'] == neigh])
        cluster1 = pd.concat([cluster1, row])
        cluster1_df = pd.DataFrame(cluster1, columns = full_grouped.columns)

    #Finding the similarities of the cluster
    clust_sim1 = find_neigh_sim(cluster1_df)
    print(clust_sim1.head())

#If the cluster has less than 2 neighborhoods this prints out    
else:
    clust_sim1 = pd.DataFrame([])
    print('{}, {} is the only neighborhood in this cluster.'.format(clust1['Neighborhood'].item(), clust1['City'].item()))

The amount of neighborhoods in cluster 1 is 4.
    City 1               Neighborhood 1       City 2   Neighborhood 2  \
0  Toronto                Humber Summit  Minneapolis           Regina   
1  Toronto  Moore Park, Summerhill East  Minneapolis           Regina   
2  Toronto                Humber Summit  Minneapolis  Sumner-Glenwood   
3  Toronto  Moore Park, Summerhill East  Minneapolis  Sumner-Glenwood   

   Cosine Sim  
0     0.57735  
1     0.57735  
2     0.50000  
3     0.50000  


<h4>Cluster 2:</h4>

In [57]:
cluster2 = pd.DataFrame([])
clust2 = full_merged[full_merged['Cluster Labels'] == 2]
print('The amount of neighborhoods in cluster 2 is {}.'.format(len(clust2)))
#Check if the cluster is greater than 1
if len(clust2) >= 2: 
    
    #For loop to put the cluster together for  find_neigh_sim
    for city, neigh in zip(clust2['City'], clust2['Neighborhood']):
        row = (full_grouped.loc[full_grouped['Neighborhood'] == neigh])
        cluster2 = pd.concat([cluster2, row])
        cluster2_df = pd.DataFrame(cluster2, columns = full_grouped.columns)

    #Finding the similarities of the cluster
    clust_sim2 = find_neigh_sim(cluster2_df)
    print(clust_sim2.head())

#If the cluster has less than 2 neighborhoods this prints out    
else:
    clust_sim2 = pd.DataFrame([])
    print('{}, {} is the only neighborhood in this cluster.'.format(clust2['Neighborhood'].item(), clust2['City'].item()))

The amount of neighborhoods in cluster 2 is 6.
     City 1     Neighborhood 1       City 2  \
0   Toronto   Humberlea, Emery     St. Paul   
1  St. Paul  North of Maryland      Toronto   
2   Toronto   Humberlea, Emery  Minneapolis   
3  St. Paul  North of Maryland  Minneapolis   
4   Toronto   DownsviewCentral     St. Paul   

                                      Neighborhood 2  Cosine Sim  
0                                  North of Maryland    0.816497  
1  Old Mill South, King's Mill Park, Sunnylea, Hu...    0.577350  
2                                            Wenonah    0.577350  
3                                            Wenonah    0.471405  
4                                  North of Maryland    0.471405  


<h4>Cluster 3:</h4>

In [58]:
cluster3 = pd.DataFrame([])
clust3 = full_merged[full_merged['Cluster Labels'] == 3]
print('The amount of neighborhoods in cluster 3 is {}.'.format(len(clust3)))
#Check if the cluster is greater than 1
if len(clust3) >= 2: 
    for city, neigh in zip(clust3['City'], clust3['Neighborhood']):
        row = (full_grouped.loc[full_grouped['Neighborhood'] == neigh])
        cluster3 = pd.concat([cluster3, row])
        cluster3_df = pd.DataFrame(cluster3, columns = full_grouped.columns)
    
    #Finding the similarities of the cluster
    clust_sim3 = find_neigh_sim(cluster3_df)
    print(clust_sim3.head())

#If the cluster has less than 2 neighborhoods this prints out    
else:
    clust_sim3 = pd.DataFrame([])
    print('{}, {} is the only neighborhood in this cluster.'.format(clust3['Neighborhood'].item(), clust3['City'].item()))

The amount of neighborhoods in cluster 3 is 31.
        City 1 Neighborhood 1       City 2  \
0  Minneapolis        Folwell     St. Paul   
1  Minneapolis   Linden Hills      Toronto   
2      Toronto  DownsviewWest  Minneapolis   
3  Minneapolis   Lind-Bohanon     St. Paul   
4      Toronto      Glencairn  Minneapolis   

                                      Neighborhood 2  Cosine Sim  
0                                      Highland Park    0.816497  
1  Milliken, Agincourt North, Steeles East, L'Amo...    0.654654  
2                                        Willard Hay    0.566947  
3                                          Riverview    0.516398  
4                                       Linden Hills    0.507093  


<h4>Cluster 4:</h4>

In [59]:
cluster4 = pd.DataFrame([])
clust4 = full_merged[full_merged['Cluster Labels'] == 4]
print('The amount of neighborhoods in cluster 4 is {}.'.format(len(clust4)))
#Check if the cluster is greater than 1
if len(clust4) >= 2: 
    
    #For loop to put the cluster together for  find_neigh_sim
    for city, neigh in zip(clust4['City'], clust4['Neighborhood']):
        row = (full_grouped.loc[full_grouped['Neighborhood'] == neigh])
        cluster4 = pd.concat([cluster4, row])
        cluster4_df = pd.DataFrame(cluster4, columns = full_grouped.columns)
    
    #Finding the similarities of the cluster
    clust_sim4 = find_neigh_sim(cluster4_df)
    print(clust_sim4.head())

#If the cluster has less than 2 neighborhoods this prints out    
else:
    clust_sim4 = pd.DataFrame([])
    print('{}, {} is the only neighborhood in this cluster.'.format(clust4['Neighborhood'].item(), clust4['City'].item()))

The amount of neighborhoods in cluster 4 is 177.
        City 1                                     Neighborhood 1  \
0  Minneapolis                                           Standish   
1      Toronto  Harbourfront East, Union Station, Toronto Islands   
2      Toronto                           Richmond, Adelaide, King   
3  Minneapolis                                       East Harriet   
4      Toronto                      Ontario Provincial Government   

        City 2 Neighborhood 2  Cosine Sim  
0      Toronto         Woburn    0.894427  
1  Minneapolis       Standish    0.809524  
2  Minneapolis       Standish    0.776750  
3      Toronto       Roselawn    0.750000  
4  Minneapolis       Standish    0.744208  


In [60]:
#Combine the similarity results from all of the clusters into one DataFrame
clust_sim = pd.concat([clust_sim0, clust_sim1, clust_sim2, clust_sim3, clust_sim4]).reset_index(drop=True)
clust_sim.head()

Unnamed: 0,City 1,Neighborhood 1,City 2,Neighborhood 2,Cosine Sim
0,Minneapolis,Lowry Hill,Toronto,"Willowdale, Newtonbrook",1.0
1,Minneapolis,Lowry Hill,Toronto,York Mills West,0.894427
2,Minneapolis,Lowry Hill,Toronto,Rosedale,0.816497
3,Toronto,Caledonia-Fairbanks,Minneapolis,Lowry Hill,0.816497
4,Minneapolis,Near North,Toronto,"Willowdale, Newtonbrook",0.816497
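
<p>
    The five per-cluster cells above follow the same pattern, so they could be collapsed into a loop. The sketch below is one possible way to do that, assuming full_merged holds City, Neighborhood, and Cluster Labels columns and full_grouped holds City, Neighborhood, and the venue columns, as the cells above suggest. The helper name split_by_cluster and the toy DataFrames are hypothetical stand-ins for illustration.
</p>

```python
import pandas as pd

def split_by_cluster(full_merged, full_grouped):
    """Map each cluster label to the matching rows of full_grouped."""
    clusters = {}
    for label, members in full_merged.groupby('Cluster Labels'):
        keys = members[['City', 'Neighborhood']]
        # Match on both columns so equal names in different cities stay distinct
        clusters[label] = full_grouped.merge(keys, on=['City', 'Neighborhood'], how='inner')
    return clusters

# Toy stand-ins for the notebook's DataFrames (values are made up)
full_merged = pd.DataFrame({
    'City': ['Toronto', 'Minneapolis', 'St. Paul'],
    'Neighborhood': ['A', 'B', 'C'],
    'Cluster Labels': [0, 0, 1],
})
full_grouped = pd.DataFrame({
    'City': ['Toronto', 'Minneapolis', 'St. Paul'],
    'Neighborhood': ['A', 'B', 'C'],
    'Park': [1.0, 0.5, 0.0],
    'Cafe': [0.0, 0.5, 1.0],
})

clusters = split_by_cluster(full_merged, full_grouped)
for label, cluster_df in clusters.items():
    print('Cluster {} has {} neighborhood(s).'.format(label, len(cluster_df)))
    # In the notebook, clusters with 2 or more rows would be passed to find_neigh_sim(cluster_df)
```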


<h3>Checking Specific Neighborhoods</h3><a name = 'find_neigh'></a>
<p>
    Since the goal of the project involves specific neighborhoods, a function to find similarity scores for a specific neighborhood is necessary. The following function looks up the named neighborhood in the similarity DataFrame and returns every similarity pair that includes it, along with the top ten venues for each of the neighborhoods involved.
</p>

In [61]:
#Function to find the similar neighborhoods of the target neighborhood 
def neigh_finder(neigh, clust_sim):
    #Initialization of the similarity list and the match counter
    neigh_list = []
    count = 0
    #Start the top ten table with the target neighborhood's own row
    top_ten = pd.DataFrame(full_venues_sorted[full_venues_sorted.Neighborhood == neigh].values, columns = full_venues_sorted.columns)
    #For loop to go through each of the rows in the similarity DataFrame
    for index, row in clust_sim.iterrows():
        
        #Check if target neighborhood is in the row
        if neigh in row[['Neighborhood 1', 'Neighborhood 2']].values:
            #Count the matches so a neighborhood that is missing or alone in its cluster can be reported
            count += 1
            #Put the target neighborhood into the first slot of the output row
            #Use == for an exact match, since "in" on two strings would also match substrings
            if neigh == row['Neighborhood 1']:
                neigh_list.append(row.values)
                temp_df = pd.DataFrame(full_venues_sorted[full_venues_sorted.Neighborhood == row['Neighborhood 2']], columns = full_venues_sorted.columns)
                top_ten = pd.concat([top_ten, temp_df])
            elif neigh == row['Neighborhood 2']:
                neigh_list.append([row['City 2'], row['Neighborhood 2'], row['City 1'] \
                                   , row['Neighborhood 1'], row['Cosine Sim']])
                temp_df = pd.DataFrame(full_venues_sorted[full_venues_sorted.Neighborhood == row['Neighborhood 1']], columns = full_venues_sorted.columns)
                top_ten = pd.concat([top_ten, temp_df])
            
    #Printout if count is zero, meaning the target neighborhood never appeared in clust_sim
    if count == 0: 
        print('Sorry, your neighborhood was not included or was in a cluster alone.')
    neigh_df = pd.DataFrame(neigh_list, columns = clust_sim.columns)
    top_ten.reset_index(drop=True, inplace=True)
    return neigh_df, top_ten
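
<p>
    The lookup part of neigh_finder can also be written with boolean masks instead of iterrows. The following is a rough sketch of that idea, not a replacement for the function above: it only covers the similarity rows (not the top ten venues), it re-sorts the result by similarity, and the function name similar_to and the toy table are hypothetical.
</p>

```python
import pandas as pd

def similar_to(target, clust_sim):
    """Rows of clust_sim involving target, with target moved to the first slot."""
    first = clust_sim[clust_sim['Neighborhood 1'] == target]
    second = clust_sim[clust_sim['Neighborhood 2'] == target]
    # Swap the column labels so the target's city and name land in the "1" columns
    swapped = second.rename(columns={
        'City 1': 'City 2', 'Neighborhood 1': 'Neighborhood 2',
        'City 2': 'City 1', 'Neighborhood 2': 'Neighborhood 1',
    })[clust_sim.columns]
    combined = pd.concat([first, swapped])
    return combined.sort_values(by='Cosine Sim', ascending=False).reset_index(drop=True)

# Toy similarity table with made-up names and scores
toy_sim = pd.DataFrame(
    [['Toronto', 'A', 'Minneapolis', 'B', 0.9],
     ['Minneapolis', 'B', 'St. Paul', 'C', 0.4],
     ['Toronto', 'A', 'St. Paul', 'C', 0.1]],
    columns=['City 1', 'Neighborhood 1', 'City 2', 'Neighborhood 2', 'Cosine Sim'])

matches = similar_to('B', toy_sim)
print(matches)
```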

<p>
To demonstrate the neigh_finder function, the following boxes use the Near North neighborhood of Minneapolis. In this example, someone from Near North is looking for similar neighborhoods to move to. To show that the function works across cities and clusters, examples from St. Paul and Toronto are included, using neighborhoods from a different cluster.
</p>

In [62]:
new_neigh, top_ten = neigh_finder('Near North', clust_sim)
new_neigh.head()

Unnamed: 0,City 1,Neighborhood 1,City 2,Neighborhood 2,Cosine Sim
0,Minneapolis,Near North,Toronto,"Willowdale, Newtonbrook",0.816497
1,Minneapolis,Near North,Toronto,York Mills West,0.730297
2,Minneapolis,Near North,Toronto,Caledonia-Fairbanks,0.666667
3,Minneapolis,Near North,Toronto,Rosedale,0.666667
4,Minneapolis,Near North,Toronto,DownsviewEast,0.57735


In [63]:
top_ten.head(6)

Unnamed: 0,Cluster Labels,City,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Minneapolis,Near North,Park,Miscellaneous Shop,Wine Bar,Yoga Studio,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant
1,0,Toronto,"Willowdale, Newtonbrook",Park,Yoga Studio,Ethiopian Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room
2,0,Toronto,York Mills West,Park,Convenience Store,Yoga Studio,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room
3,0,Toronto,Caledonia-Fairbanks,Park,Women's Store,Pool,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School
4,0,Toronto,Rosedale,Park,Trail,Playground,Fish & Chips Shop,English Restaurant,Doner Restaurant,Fish Market,Donut Shop,Drugstore,Eastern European Restaurant
5,0,Toronto,DownsviewEast,Airport,Park,Yoga Studio,Ethiopian Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant


In [64]:
new_neigh, top_ten = neigh_finder('Lexington-Hamline', clust_sim)
new_neigh.head()

Unnamed: 0,City 1,Neighborhood 1,City 2,Neighborhood 2,Cosine Sim
0,St. Paul,Lexington-Hamline,Toronto,"Alderwood, Long Branch",0.365148
1,St. Paul,Lexington-Hamline,Minneapolis,Northeast Park,0.351763
2,St. Paul,Lexington-Hamline,Minneapolis,Elliot Park,0.341565
3,St. Paul,Lexington-Hamline,Minneapolis,South Uptown,0.331133
4,St. Paul,Lexington-Hamline,Toronto,"Regent Park, Harbourfront",0.314918


In [65]:
top_ten.head(6)

Unnamed: 0,Cluster Labels,City,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4,St. Paul,Lexington-Hamline,Baseball Field,Theater,Pizza Place,Athletics & Sports,College Gym,Video Store,Football Stadium,Coffee Shop,Park,Yoga Studio
1,4,Toronto,"Alderwood, Long Branch",Pizza Place,Pub,Sandwich Place,Athletics & Sports,Coffee Shop,Playground,Pharmacy,Construction & Landscaping,Escape Room,Doner Restaurant
2,4,Minneapolis,Northeast Park,Yoga Studio,Theater,Food Truck,Coffee Shop,Brewery,Health & Beauty Service,Gym,Event Space,Music Store,Diner
3,4,Minneapolis,Elliot Park,Coffee Shop,Park,Pharmacy,BBQ Joint,Football Stadium,Grocery Store,Outdoor Sculpture,Bank,Outdoors & Recreation,Brewery
4,4,Minneapolis,South Uptown,Coffee Shop,Park,Music Store,Vietnamese Restaurant,Gift Shop,Café,Vegetarian / Vegan Restaurant,Intersection,Donut Shop,Convenience Store
5,4,Toronto,"Regent Park, Harbourfront",Coffee Shop,Park,Bakery,Pub,Café,Restaurant,Sushi Restaurant,Discount Store,Chocolate Shop,Distribution Center


In [68]:
new_neigh, top_ten = neigh_finder('Woburn', clust_sim)
new_neigh.head()

Unnamed: 0,City 1,Neighborhood 1,City 2,Neighborhood 2,Cosine Sim
0,Toronto,Woburn,Minneapolis,Standish,0.894427
1,Toronto,Woburn,Minneapolis,Elliot Park,0.604743
2,Toronto,Woburn,Minneapolis,University,0.582223
3,Toronto,Woburn,St. Paul,Midway,0.516398
4,Toronto,Woburn,Minneapolis,Stevens Square/Loring Heights,0.474342


In [69]:
top_ten.head(6)

Unnamed: 0,Cluster Labels,City,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4,Toronto,Woburn,Coffee Shop,Korean BBQ Restaurant,Yoga Studio,Ethiopian Restaurant,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School
1,4,Minneapolis,Standish,Coffee Shop,Yoga Studio,Event Space,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room
2,4,Minneapolis,Elliot Park,Coffee Shop,Park,Pharmacy,BBQ Joint,Football Stadium,Grocery Store,Outdoor Sculpture,Bank,Outdoors & Recreation,Brewery
3,4,Minneapolis,University,Coffee Shop,Bowling Alley,College Rec Center,Bagel Shop,Pharmacy,Burger Joint,Rock Club,Restaurant,Chinese Restaurant,Pub
4,4,St. Paul,Midway,Korean Restaurant,Coffee Shop,Playground,Music Venue,Music Store,Turkish Restaurant,English Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
5,4,Minneapolis,Stevens Square/Loring Heights,Coffee Shop,Pharmacy,Park,Asian Restaurant,Fast Food Restaurant,Brewery,Bridal Shop,Liquor Store,Sandwich Place,Music Venue


## Discussion <a name = 'discussion'></a>

<p>The results show that certain neighborhoods in different cities have very similar venues. One notable observation from the maps is that the central neighborhoods of each city tend to share the same cluster. This makes sense because city centers tend to have a large number of venues, and those venues can overlap between neighborhoods because the neighborhoods are small. The opposite holds for the outskirts of the cities. The outskirts tend to have fewer venues, and those venues tend to be very similar, e.g., parks, ball fields, and schools.</p>
<p>
The cosine similarity results also revealed a notable match between the neighborhoods of Lowry Hill, Minneapolis and Willowdale, Newtonbrook, Toronto. As shown in the Cluster 0 top five, these two neighborhoods have a cosine similarity of 1.0, meaning their venue category profiles match exactly. The following output shows the same connection in their top ten venue lists. Most of the shared venue categories are unsurprising, but one that stood out was Ethiopian Restaurant. Minneapolis has a strong East African influence due to Somali refugees, but it is still interesting that both neighborhoods feature this type of restaurant.
</p>

In [80]:
new_neigh, top_ten = neigh_finder('Lowry Hill', clust_sim)
top_ten.head()

Unnamed: 0,Cluster Labels,City,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Minneapolis,Lowry Hill,Park,Yoga Studio,Ethiopian Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room
1,0,Toronto,"Willowdale, Newtonbrook",Park,Yoga Studio,Ethiopian Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room
2,0,Toronto,York Mills West,Park,Convenience Store,Yoga Studio,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School,English Restaurant,Escape Room
3,0,Toronto,Rosedale,Park,Trail,Playground,Fish & Chips Shop,English Restaurant,Doner Restaurant,Fish Market,Donut Shop,Drugstore,Eastern European Restaurant
4,0,Toronto,Caledonia-Fairbanks,Park,Women's Store,Pool,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Electronics Store,Elementary School


## Conclusion <a name = 'conclusion'></a>

<p>In this project, Foursquare venue category data was analyzed to find similar neighborhoods in different cities. This allows potential home buyers or renters to find neighborhoods in their new city that resemble their current one, which can help ease the transition. With the final neigh_finder function, the similar neighborhoods and their common venue categories can be shown to the interested parties. The solution is a good first step in the search for a potential new home. It could be improved by also comparing housing costs, crime rates, venue ratings, and other factors. These comparisons matter to any home buyer and, in some cases, may outweigh venue similarity. In the end, the final deliverable for this project met its goal and provides a good overview of Toronto and the Twin Cities for potential home buyers and renters.
</p>