# Finding the Best Neighborhood for an NFL Stadium in Portland, OR
## Applied Data Science Capstone
## David Johnson

### Introduction and Business Problem
The National Football League (NFL) is a 32-team professional sports league that plays American football in the United States. The NFL is very popular, with an average live attendance of 68,400 spectators per game. The average revenue for an NFL team is around 450 million dollars per year, with approximately two thirds coming from television deals. The remaining third, approximately 150 million dollars per year, comes from local business at the stadium, such as tickets, parking, and concessions.

Portland, Oregon, whose metropolitan area has a population of around 2.3 million, is one of the largest metropolitan areas in the United States without an NFL team. There is a potential business opportunity to build a football stadium in Portland, in the hope of eventually gaining an NFL team. But where should the stadium be built?

This report uses data science methods to answer the question of where the stadium should be built. By evaluating which neighborhood in Portland is most similar to the neighborhoods around existing NFL stadiums in other cities, we will recommend a construction location to the stakeholders.

### Data
Data sources for this project will include:

* List of neighborhoods in Portland, Oregon https://en.wikipedia.org/wiki/Neighborhoods_of_Portland,_Oregon
* List of current NFL stadiums https://en.wikipedia.org/wiki/List_of_current_National_Football_League_stadiums
* Foursquare data for venues https://foursquare.com/developers/apps

Since every city is different, we will take the Foursquare data for venues near NFL stadiums and use a K-means algorithm to cluster these venues (the analysis below uses a single cluster, so the result is one typical stadium-neighborhood profile). We will then evaluate the neighborhoods in Portland to determine which neighborhood is the most similar to that profile.
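
The comparison described above can be illustrated on a toy scale. The following is a minimal sketch, not the analysis itself: the category names, counts, and neighborhood labels are all hypothetical, and it only shows the idea of averaging stadium profiles and picking the neighborhood with the smallest Euclidean distance to that average.

```python
import math

# Hypothetical venue counts: one row per stadium, one column per venue category.
categories = ["Bar", "Hotel", "Parking"]
stadium_profiles = [
    [4, 2, 3],
    [2, 2, 1],
    [3, 2, 2],
]

# Hypothetical candidate neighborhoods described with the same categories.
neighborhood_profiles = {
    "Neighborhood A": [0, 0, 0],
    "Neighborhood B": [3, 1, 2],
    "Neighborhood C": [9, 6, 0],
}

# The "typical" stadium surroundings: the mean count in each category.
centroid = [sum(col) / len(col) for col in zip(*stadium_profiles)]

# Euclidean distance from each neighborhood to that typical profile.
distances = {
    name: math.dist(profile, centroid)
    for name, profile in neighborhood_profiles.items()
}

# The recommended neighborhood is the one with the smallest distance.
best = min(distances, key=distances.get)
print(centroid, best)
```

Note that an empty neighborhood (all zeros) still gets a finite distance, so it is worth checking what the winning profile actually contains.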


In [2]:
#Import necessary libraries and functions for the analysis

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup


In [134]:
#Scrape source data from wikipedia
portland_neighborhoods_website=requests.get('https://en.wikipedia.org/wiki/Neighborhoods_of_Portland,_Oregon').text
nfl_stadiums_website=requests.get('https://en.wikipedia.org/wiki/List_of_current_National_Football_League_stadiums').text

In [135]:
#Turn neighborhoods page into soup
neighborhoods_soup = BeautifulSoup(portland_neighborhoods_website,'html.parser')
#print(neighborhoods_soup.prettify())

In [136]:
#select all tables
test=neighborhoods_soup.find_all('table')

In [137]:
#scrape text from each line item in the table
df=[]
for row in test:
    df.append([t.text.strip() for t in row.find_all('li')])

In [138]:
#Select the first item from the list
neighborhoods=df[0]
#neighborhoods

In [139]:
#Remove ...(including ...) from some lines
neighborhoods_final = [item.split(' (incl', 1)[0] for item in neighborhoods]
#neighborhoods_final

In [140]:
#Build a dataframe with the corrected list in a Name column
df_neighborhoods=pd.DataFrame(neighborhoods_final, columns=['Name'])
#df_neighborhoods

In [141]:
#Test the nominatim geolocator for finding lat and long of addresses
address = 'Lloyd District, OR'

geolocator = Nominatim(user_agent="portland_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Lloyd District, OR are 45.5313815, -122.6600824.


In [142]:
#Create list of neighborhood latitudes
lat=[]
for name in df_neighborhoods['Name']:
    lat.append(geolocator.geocode(name +', OR').latitude)

In [144]:
#create list of neighborhood longitudes
long=[]
for name in df_neighborhoods['Name']:
    long.append(geolocator.geocode(name +', OR').longitude)
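
The two loops above call the geocoder twice per neighborhood and fail if a lookup returns nothing. A minimal sketch of a single-pass alternative follows. It is an illustration under assumptions: `geocode_once` and `fake_geocode` are hypothetical helpers, the stand-in geocoder replaces the real Nominatim call so the example runs offline, and the `', Portland, OR'` suffix is a suggested disambiguation rather than what the notebook uses.

```python
def geocode_once(names, geocode):
    """Look up each name a single time.

    `geocode` is any callable that takes a query string and returns a
    (latitude, longitude) tuple, or None when nothing is found.
    Returns a dict of found coordinates and a list of names that failed.
    """
    found = {}
    missing = []
    for name in names:
        result = geocode(name + ', Portland, OR')  # assumed query suffix
        if result is None:
            missing.append(name)
        else:
            found[name] = result
    return found, missing


# Hypothetical stand-in for the real geocoder, so the sketch runs offline.
def fake_geocode(query):
    table = {'Goose Hollow, Portland, OR': (45.517749, -122.692819)}
    return table.get(query)


found, missing = geocode_once(['Goose Hollow', 'Nowhere'], fake_geocode)
print(found, missing)
```

With the real geolocator, the callable could wrap `geolocator.geocode` and convert its result to a `(latitude, longitude)` tuple, returning `None` when the lookup fails.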

In [145]:
#append the latitudes to the dataframe
df_neighborhoods['Latitude']=lat
#df_neighborhoods

In [146]:
#append the longitudes to the data frame
df_neighborhoods['Longitude']=long
#df_neighborhoods

In [147]:
df_neighborhoods

Unnamed: 0,Name,Latitude,Longitude
0,Arlington Heights,45.519496,-122.710667
1,Forest Park,45.561376,-122.758458
2,Goose Hollow,45.517749,-122.692819
3,Hillside,45.527439,-122.713120
4,Linnton,45.600330,-122.786779
5,Northwest District,45.533013,-122.698845
6,Northwest Heights,45.540806,-122.774354
7,Northwest Industrial,23.598351,58.273050
8,Old Town Chinatown,45.524934,-122.673516
9,Pearl District,45.529044,-122.681598


In [148]:
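#Read the Foursquare client ID and secret from environment variables instead of hard-coding them
#(the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention)
import os
CLIENT_ID=os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET=os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION='20191012'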

In [149]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [150]:
#function to get nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
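
`getNearbyVenues` indexes straight into the response, so a venue with an empty `categories` list or a response without groups would raise an error. The following is a minimal sketch of a more defensive parser. The helper name `extract_venues` and the sample payload are hypothetical; the payload only mimics the nesting that the function above already relies on (`response` → `groups` → `items` → `venue`).

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Missing groups yield an empty list; a venue with no categories gets
    None as its category instead of raising IndexError.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hypothetical payload shaped like the response used above.
sample = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Washington Park',
                           'location': {'lat': 45.517835, 'lng': -122.705784},
                           'categories': [{'name': 'Park'}]}},
                {'venue': {'name': 'Uncategorized Venue',
                           'location': {'lat': 45.5, 'lng': -122.7},
                           'categories': []}},
            ]
        }]
    }
}

rows = extract_venues(sample)
print(rows)
```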

In [151]:
#Get venues near the center of each neighborhood
neighborhood_venues = getNearbyVenues(names=df_neighborhoods['Name'],
                                   latitudes=df_neighborhoods['Latitude'],
                                   longitudes=df_neighborhoods['Longitude']
                                  )
#neighborhood_venues

In [152]:
neighborhood_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Arlington Heights,45.519496,-122.710667,Portland Japanese Garden,45.519457,-122.706937,Garden
1,Arlington Heights,45.519496,-122.710667,International Rose Test Garden,45.519075,-122.705616,Botanical Garden
2,Arlington Heights,45.519496,-122.710667,Washington Park,45.517835,-122.705784,Park
3,Arlington Heights,45.519496,-122.710667,Umami Cafe,45.519022,-122.708381,Café
4,Arlington Heights,45.519496,-122.710667,Shakespeare Garden,45.518558,-122.704685,Garden
5,Arlington Heights,45.519496,-122.710667,Washington Park Amphitheater,45.519620,-122.705220,Amphitheater
6,Arlington Heights,45.519496,-122.710667,Portland Japanese Garden Gift Store,45.519099,-122.708644,Gift Shop
7,Arlington Heights,45.519496,-122.710667,Ellie M Hill Bonsai Terrace,45.518887,-122.708725,Botanical Garden
8,Arlington Heights,45.519496,-122.710667,Eggy Pocket,45.518956,-122.708292,Food Truck
9,Arlington Heights,45.519496,-122.710667,The Sand and Stone Garden,45.518439,-122.708433,Garden


In [153]:
# one hot encoding for each venue category
neighborhood_onehot = pd.get_dummies(neighborhood_venues[['Venue Category']], prefix="", prefix_sep="")
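#the venue categories include one that is itself called 'Neighborhood'; drop it so it does not clash with the neighborhood-name column added below
neighborhood_onehot=neighborhood_onehot.drop('Neighborhood',axis=1)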

#add neighborhood column back to dataframe
neighborhood_onehot['Neighborhood'] = neighborhood_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neighborhood_onehot.columns[-1]] + list(neighborhood_onehot.columns[:-1])
neighborhood_onehot = neighborhood_onehot[fixed_columns]

neighborhood_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Amphitheater,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Arlington Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arlington Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Arlington Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Arlington Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Arlington Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [154]:
#group neighborhoods by the total number of venues in each category - will continue after collecting similar
#information in the vicinity of each stadium
neighborhood_grouped=neighborhood_onehot.groupby('Neighborhood').sum().reset_index()
neighborhood_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Amphitheater,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arbor Lodge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ardenwald-Johnson Creek,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Argay,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Arlington Heights,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Beaumont-Wilshire,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Boise,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
7,Brentwood-Darlington,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Bridgeton,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Bridlemile,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
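
The one-hot encoding followed by a grouped sum produces a table of venue counts per neighborhood and category. As a hedged alternative, the same kind of table can be sketched in one step with `pd.crosstab`. The example uses only the ten Arlington Heights rows shown earlier. Because the neighborhood name becomes the index rather than a column, a venue category named 'Neighborhood' would not collide with it.

```python
import pandas as pd

# The ten venue categories listed for Arlington Heights in the output above.
venues = pd.DataFrame({
    'Neighborhood': ['Arlington Heights'] * 10,
    'Venue Category': [
        'Garden', 'Botanical Garden', 'Park', 'Café', 'Garden',
        'Amphitheater', 'Gift Shop', 'Botanical Garden', 'Food Truck', 'Garden',
    ],
})

# Count venues per neighborhood (rows) and category (columns).
counts = pd.crosstab(venues['Neighborhood'], venues['Venue Category'])
print(counts)
```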


In [155]:
#create soup object to find NFL stadiums
stadiums_soup = BeautifulSoup(nfl_stadiums_website,'html.parser')
#print(stadiums_soup.prettify())

In [156]:
#select tables
stadiums_table=stadiums_soup.find_all('table')
#stadiums_table

In [188]:
#Scrape the header cells of each table
df=[]
for row in stadiums_table:
    df.append([t.text.strip() for t in row.find_all('th')])
#df

In [158]:
#The list of stadiums is the second element in the table list (first is headers)
new_stadiums=df[1]
#new_stadiums

In [159]:
#Select the stadiums
final_stadiums=pd.DataFrame(new_stadiums[9:40],columns=['Name'])
#final_stadiums

In [160]:
#Drop two stadiums that do not return a lat/long in Nominatim
cleaned_stadiums=final_stadiums.drop([4,5],axis=0)
#cleaned_stadiums

In [189]:
#get stadium latitudes
lat=[]
for name in cleaned_stadiums['Name']:
    #print(name)
    lat.append(geolocator.geocode(name).latitude)

In [190]:
#get stadium longitudes
long=[]
for name in cleaned_stadiums['Name']:
    long.append(geolocator.geocode(name).longitude)
    #print(name)

In [163]:
#combine dataframes
cleaned_stadiums['Latitude']=lat
cleaned_stadiums['Longitude']=long
cleaned_stadiums

Unnamed: 0,Name,Latitude,Longitude
0,Arrowhead Stadium,39.04894,-94.483003
1,AT&T Stadium,32.747842,-97.092844
2,Bank of America Stadium,35.225736,-80.853882
3,CenturyLink Field,47.595346,-122.331644
6,FedExField,38.907687,-76.864487
7,FirstEnergy Stadium,41.506056,-81.699712
8,Ford Field,42.339957,-83.045617
9,Gillette Stadium,42.091253,-71.264465
10,Hard Rock Stadium,25.95792,-80.238838
11,Heinz Field,40.446716,-80.015755


In [164]:
#Get venues near each stadium
stadium_venues = getNearbyVenues(names=cleaned_stadiums['Name'],
                                   latitudes=cleaned_stadiums['Latitude'],
                                   longitudes=cleaned_stadiums['Longitude']
                                  )
#stadium_venues

In [165]:
# one hot encoding for venue categories
stadium_onehot = pd.get_dummies(stadium_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
stadium_onehot['Neighborhood'] = stadium_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [stadium_onehot.columns[-1]] + list(stadium_onehot.columns[:-1])
stadium_onehot = stadium_onehot[fixed_columns]

stadium_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Art Gallery,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio
0,Arrowhead Stadium,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arrowhead Stadium,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Arrowhead Stadium,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Arrowhead Stadium,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Arrowhead Stadium,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [166]:
#group stadiums by sum of venue categories
stadium_grouped = stadium_onehot.groupby('Neighborhood').sum().reset_index()
stadium_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Art Gallery,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio
0,AT&T Stadium,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arrowhead Stadium,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
2,Bank of America Stadium,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
3,CenturyLink Field,0,0,0,2,0,1,0,0,1,...,0,0,0,0,0,1,0,0,0,0
4,FedExField,0,0,0,2,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
5,FirstEnergy Stadium,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Ford Field,0,0,0,4,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
7,Gillette Stadium,0,2,0,1,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
8,Hard Rock Stadium,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Heinz Field,0,0,0,2,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


In [167]:
# Use K-means with a single cluster as a convenient way to find the "centroid" of the NFL stadium group.
# With one cluster the centroid is the mean venue count per category, which should represent
# the typical number and categories of venues near a stadium
kclusters = 1

stadium_grouped_clustering = stadium_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(stadium_grouped_clustering)

# inspect the cluster center (one row, one value per venue category)
kmeans.cluster_centers_

array([[0.03448276, 0.06896552, 0.03448276, 1.03448276, 0.06896552,
        0.03448276, 0.06896552, 0.03448276, 0.17241379, 0.03448276,
        0.06896552, 0.96551724, 0.03448276, 0.03448276, 0.34482759,
        0.06896552, 0.06896552, 0.82758621, 0.24137931, 0.65517241,
        0.03448276, 0.27586207, 0.03448276, 0.13793103, 0.03448276,
        0.03448276, 0.03448276, 0.17241379, 0.03448276, 0.03448276,
        0.03448276, 0.06896552, 0.17241379, 0.06896552, 0.4137931 ,
        0.06896552, 0.06896552, 0.03448276, 0.06896552, 0.03448276,
        0.03448276, 0.03448276, 0.03448276, 0.31034483, 0.10344828,
        0.62068966, 0.03448276, 0.03448276, 0.06896552, 0.03448276,
        0.03448276, 0.03448276, 0.03448276, 0.03448276, 0.03448276,
        0.03448276, 0.10344828, 0.03448276, 0.03448276, 0.06896552,
        0.03448276, 0.17241379, 0.03448276, 0.03448276, 0.03448276,
        0.03448276, 0.03448276, 0.4137931 , 0.03448276, 0.06896552,
        0.20689655, 0.44827586, 2.75862069, 0.06

In [168]:
#convert output to data frame with same column names
out=pd.DataFrame(kmeans.cluster_centers_,columns=stadium_grouped_clustering.columns)
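
With a single cluster, the k-means centroid is the arithmetic mean of the rows, so the same profile can be obtained directly from the dataframe. This would also explain values such as 0.03448276 in the output above, which is approximately 1/29 and so consistent with a category that appears once across 29 stadiums. The sketch below shows the direct calculation on hypothetical counts; the stadium labels and categories are made up for illustration.

```python
import pandas as pd

# Hypothetical venue counts per stadium (rows) and category (columns).
counts = pd.DataFrame(
    {'Bar': [4, 2, 3], 'Hotel': [2, 2, 2], 'Parking': [3, 1, 2]},
    index=['Stadium A', 'Stadium B', 'Stadium C'],
)

# Column-wise mean, reshaped to a one-row dataframe like `out` above.
centroid = counts.mean(axis=0).to_frame().T
centroid.index = ['Stadium']
print(centroid)
```

Computing the mean this way avoids the scikit-learn dependency for this step and makes it explicit that no actual grouping takes place.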

In [169]:
out

Unnamed: 0,ATM,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Art Gallery,Arts & Entertainment,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio
0,0.034483,0.068966,0.034483,1.034483,0.068966,0.034483,0.068966,0.034483,0.172414,0.034483,...,0.068966,0.137931,0.034483,0.034483,0.034483,0.068966,0.068966,0.034483,0.068966,0.034483


In [170]:


#add neighborhood column back to dataframe
out['Neighborhood'] = ['Stadium'] 

# move neighborhood column to the first column
fixed_columns = [out.columns[-1]] + list(out.columns[:-1])
out = out[fixed_columns]

out

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Art Gallery,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Wings Joint,Yoga Studio
0,Stadium,0.034483,0.068966,0.034483,1.034483,0.068966,0.034483,0.068966,0.034483,0.172414,...,0.068966,0.137931,0.034483,0.034483,0.034483,0.068966,0.068966,0.034483,0.068966,0.034483


In [171]:
#identify and drop columns that are not in both data frames
diffs=out.columns.difference(neighborhood_grouped.columns)
diffs

Index(['Airport Terminal', 'Aquarium', 'Auditorium', 'Baseball Stadium',
       'Basketball Stadium', 'Beer Garden', 'Belgian Restaurant',
       'Boat or Ferry', 'Bowling Alley', 'Cha Chaan Teng',
       'College Football Field', 'College Rec Center', 'Convention Center',
       'Courthouse', 'Cycle Studio', 'Donburi Restaurant',
       'Dumpling Restaurant', 'Entertainment Service', 'Football Stadium',
       'General Entertainment', 'Health & Beauty Service', 'Hockey Arena',
       'Hospital', 'Hostel', 'Hotpot Restaurant', 'Indie Theater', 'Jazz Club',
       'Kitchen Supply Store', 'Metro Station', 'Mini Golf',
       'Miscellaneous Shop', 'Music School', 'Non-Profit', 'Office',
       'Opera House', 'Other Great Outdoors', 'Outdoors & Recreation',
       'Parking', 'Pedestrian Plaza', 'Piano Bar', 'Planetarium', 'Public Art',
       'Racetrack', 'River', 'Science Museum', 'Smoothie Shop', 'Snack Place',
       'Souvenir Shop', 'Tailor Shop', 'Tennis Stadium',
       'Theme Park R

In [172]:
common_stadium=out.drop(diffs,axis=1)
common_stadium

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Amphitheater,Antique Shop,Arcade,Art Gallery,Arts & Entertainment,Asian Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Stadium,0.034483,0.068966,1.034483,0.068966,0.034483,0.034483,0.172414,0.034483,0.068966,...,0.034483,0.068966,0.137931,0.034483,0.034483,0.034483,0.068966,0.068966,0.068966,0.034483


In [173]:
#identify and drop columns that are not present in both data frames
diffs2=neighborhood_grouped.columns.difference(out.columns)
diffs2

Index(['Art Museum', 'Arts & Crafts Store', 'Auto Dealership', 'Auto Garage',
       'Bagel Shop', 'Bed & Breakfast', 'Bike Rental / Bike Share',
       'Bike Shop', 'Bistro', 'Board Shop',
       ...
       'Tourist Information Center', 'Tram Station', 'Transportation Service',
       'Tunnel', 'Vacation Rental', 'Video Store', 'Weight Loss Center',
       'Whisky Bar', 'Women's Store', 'Zoo Exhibit'],
      dtype='object', length=127)

In [174]:
neighborhood_final=neighborhood_grouped.drop(diffs2,axis=1)
neighborhood_final

Unnamed: 0,Neighborhood,ATM,Accessories Store,American Restaurant,Amphitheater,Antique Shop,Arcade,Art Gallery,Arts & Entertainment,Asian Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arbor Lodge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ardenwald-Johnson Creek,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Argay,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Arlington Heights,0,0,0,1,0,0,0,0,0,...,0,2,1,0,0,0,0,0,0,0
5,Beaumont-Wilshire,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Boise,0,0,1,0,0,0,0,0,1,...,1,0,0,3,0,0,0,0,0,1
7,Brentwood-Darlington,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Bridgeton,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Bridlemile,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
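
The two `drop` calls above leave both dataframes with the categories they share. A minimal sketch of the same alignment in one step is shown below, using `Index.intersection` so that both frames are selected with the identical column list and therefore the identical order. The frames and values are hypothetical. Note that restricting to shared categories discards those found only near stadiums (the output above lists 'Football Stadium' and 'Parking' among them), which is a limitation of the approach.

```python
import pandas as pd

# Hypothetical stadium centroid and neighborhood counts with partly different categories.
stadium = pd.DataFrame([[0.5, 1.0, 0.2]], columns=['Bar', 'Football Stadium', 'Park'])
hoods = pd.DataFrame(
    [[1, 0, 2], [0, 3, 0]],
    columns=['Bar', 'Park', 'Zoo Exhibit'],
    index=['Neighborhood A', 'Neighborhood B'],
)

# Categories present in both frames, in a single shared order.
shared = stadium.columns.intersection(hoods.columns)

# Select the same columns, in the same order, from both frames.
stadium_aligned = stadium[shared]
hoods_aligned = hoods[shared]
print(list(shared))
```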


In [175]:
from scipy.spatial import distance

In [176]:
#drop name column and flatten the single centroid row into a 1-D numpy vector
stadium_numpy=common_stadium.drop('Neighborhood',axis=1).to_numpy().ravel()

In [177]:
#drop name column to convert to numpy array
neighborhood_numpy=neighborhood_final.drop('Neighborhood',axis=1).to_numpy()

In [178]:
#test Euclidean distance
distance.euclidean(stadium_numpy,neighborhood_numpy[0])

3.174849140198667

In [180]:
#iterate through the neighborhood array to find the distance from the nominal stadium neighborhood
dist=[]
for i in range(np.size(neighborhood_numpy,0)):
    dist.append(distance.euclidean(stadium_numpy,neighborhood_numpy[i]))
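
The loop above can also be written as a single vectorized NumPy expression. The sketch below uses small hypothetical arrays in place of `stadium_numpy` and `neighborhood_numpy`, and assumes the centroid is a 1-D vector with one entry per category.

```python
import numpy as np

# Hypothetical centroid (one value per category) and neighborhood counts (one row each).
centroid = np.array([3.0, 2.0, 2.0])
neighborhoods = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 1.0, 2.0],
    [9.0, 6.0, 0.0],
])

# Broadcasting subtracts the centroid from every row; the norm along axis 1
# gives the Euclidean distance of each neighborhood to the centroid.
dist = np.linalg.norm(neighborhoods - centroid, axis=1)

# Row index of the closest neighborhood.
closest = int(dist.argmin())
print(dist, closest)
```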

In [181]:
dist

[3.174849140198667,
 3.992710361212976,
 3.4805387257780804,
 4.324390907190587,
 6.089307601281136,
 3.8115538727780276,
 11.983192748275835,
 3.228698599935756,
 3.573418844109099,
 3.2072673126423923,
 4.387719322851413,
 10.200254129402833,
 8.700595193329743,
 3.5830556446523945,
 2.967120657144683,
 5.855472962515014,
 7.052293206951433,
 3.108998867734104,
 3.6118117758819497,
 2.961304144365115,
 2.961304144365115,
 4.959527038169119,
 3.0978877388198818,
 2.967120657144683,
 3.7934169149470534,
 2.978719609378512,
 4.481034151066048,
 2.961304144365115,
 8.026487067779223,
 2.961304144365115,
 3.7843158395362617,
 3.103448275862069,
 4.647247404897931,
 4.609997923657196,
 9.092077538141211,
 3.415534212911933,
 9.287199839355438,
 6.368863767086706,
 3.3851111367055293,
 4.903588920821108,
 5.733478046639411,
 5.7364843998669395,
 5.096687040338866,
 4.549764468430495,
 3.3026130972181353,
 12.430926873129339,
 3.4054233866482093,
 3.007521289101629,
 3.7060544352199303,
 3.1

In [182]:
#find the smallest distance to the stadium neighborhood
min(dist)

2.961304144365115

In [183]:
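#find the index of the closest neighborhood; dist.index returns only the first match,
#and the list printed above shows the same minimum at indices 19, 20, 27 and 29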
dist.index(min(dist))

19
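
Because `dist.index(min(dist))` reports only the first position of the minimum, neighborhoods that are tied for closest go unnoticed. The following minimal sketch lists every tied index. It uses only the first 30 distances printed above (the full list is longer), so further ties beyond those may exist.

```python
import math

# First 30 distances copied from the output above.
dist = [
    3.174849140198667, 3.992710361212976, 3.4805387257780804,
    4.324390907190587, 6.089307601281136, 3.8115538727780276,
    11.983192748275835, 3.228698599935756, 3.573418844109099,
    3.2072673126423923, 4.387719322851413, 10.200254129402833,
    8.700595193329743, 3.5830556446523945, 2.967120657144683,
    5.855472962515014, 7.052293206951433, 3.108998867734104,
    3.6118117758819497, 2.961304144365115, 2.961304144365115,
    4.959527038169119, 3.0978877388198818, 2.967120657144683,
    3.7934169149470534, 2.978719609378512, 4.481034151066048,
    2.961304144365115, 8.026487067779223, 2.961304144365115,
]

# Collect every index whose distance equals the minimum (within float tolerance).
smallest = min(dist)
ties = [i for i, d in enumerate(dist) if math.isclose(d, smallest)]
print(ties)
```

For these 30 values the tied indices are 19, 20, 27 and 29, so East Columbia is one of several equally close neighborhoods rather than a unique best match.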

In [184]:
#East Columbia is the most similar neighborhood
neighborhood_final.iloc[19]

Neighborhood                       East Columbia
ATM                                            0
Accessories Store                              0
American Restaurant                            0
Amphitheater                                   0
Antique Shop                                   0
Arcade                                         0
Art Gallery                                    0
Arts & Entertainment                           0
Asian Restaurant                               0
Athletics & Sports                             0
Automotive Shop                                0
BBQ Joint                                      0
Bakery                                         0
Bank                                           0
Bar                                            0
Baseball Field                                 0
Basketball Court                               0
Beer Bar                                       0
Beer Store                                     0
Big Box Store       

In [185]:
df_neighborhoods.loc[df_neighborhoods['Name']=='East Columbia']

Unnamed: 0,Name,Latitude,Longitude
32,East Columbia,45.593837,-122.663537


In [186]:
#Draw East Columbia on a map
map_portland = folium.Map(location=[df_neighborhoods.loc[32,'Latitude'],df_neighborhoods.loc[32,'Longitude']], zoom_start=15)
folium.CircleMarker(
        [df_neighborhoods.loc[32,'Latitude'],df_neighborhoods.loc[32,'Longitude']],
        radius=5,
        popup='East Columbia',
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_portland)

<folium.features.CircleMarker at 0x7f5ff43b3128>

In [187]:
map_portland
#This is the recommended location to build an NFL stadium