### Coursera IBM Data Science Capstone
# The Battle of Neighborhoods

This Jupyter notebook will be used to complete my Coursera Capstone Project.

### Problem & Discussion
Many people are forced to relocate for their careers and/or choose to move to a new city for better opportunities. However, most still have a home neighborhood they are fond of, or a favorite neighborhood from their hometown. This notebook will attempt to characterize the venues of a chosen neighborhood in one city, compare that profile against every neighborhood in a different city, and return the neighborhoods that are most similar.

Beyond the personal use case of finding a neighborhood in a new city that resembles a favorite neighborhood back home, this workflow also has business applications.

Say a company is looking to expand into a new city. By characterizing the area around an already high-performing location, and comparing that characterization against all neighborhoods in the new city, the business might be able to predict which new locations would be most likely to perform well. Similarly, by characterizing very low-performing locations and utilizing the same workflow, it might be possible to predict which neighborhoods should be avoided. 

### Description of Data & Execution
For this project, I will select a neighborhood from Houston, TX and a list of all neighborhoods from Seattle, WA. These will be merged into a single DataFrame and analyzed using venue data from Foursquare. I will then run a clustering algorithm on the DataFrame and return the cluster that the Houston neighborhood belongs to. Finally, I will produce a map that depicts all neighborhoods in Seattle within the same cluster as the input neighborhood from Houston.

Neighborhood names for Houston and Seattle will be obtained from https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods and https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Seattle, respectively. Location data for each neighborhood will be gathered by iteratively feeding each neighborhood name into the Nominatim geocoder (via geopy). Once neighborhood locations are defined, each latitude-longitude point can be used to query Foursquare for nearby venues; the venue categories are then one-hot encoded and clustered.




In [1]:
# Install dependencies

!pip install BeautifulSoup4
!pip install lxml

import pandas as pd
import numpy as np
from urllib.request import urlopen
from bs4 import BeautifulSoup


# Install additional dependencies

!conda install -c conda-forge geopy --yes
!pip install geocoder

from pandas.io.json import json_normalize # transform JSON into a pandas dataframe

import json
import requests
from geopy.geocoders import Nominatim
import geocoder

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium


print("Libraries imported successfully")


In [2]:
# Define function to fetch html content from url using BeautifulSoup

def HTMLContent(url):
    html = urlopen(url)
    soup = BeautifulSoup(html, 'lxml')
    return soup

In [4]:
# First, we will gather all neighborhood names for Seattle

# Parse url to html
table = HTMLContent('https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Seattle').find('table')

# Create empty DataFrame
column_labels = ['Neighborhood', 'District', 'Annexed', 'Loc_Map', 'Street_Map', 'Image', 'Notes']
neighborhoods_SEA = pd.DataFrame(columns = column_labels)

# Parse html to DataFrame
rows = table.find_all('tr')
for row in rows:
    data = []
    cells = row.find_all('td')
    for cell in cells:
        data.append(cell.text.strip())
    if len(data)==7:
        neighborhoods_SEA.loc[len(neighborhoods_SEA)] = data

# Clean DataFrame by dropping unneeded columns and stripping bracketed footnote markers such as [1]
neighborhoods_SEA = neighborhoods_SEA.drop(['Annexed', 'Loc_Map', 'Street_Map', 'Image', 'Notes'], axis = 1)
# regex=True must be passed explicitly in pandas >= 2.0, where str.replace defaults to literal matching
neighborhoods_SEA['District'] = neighborhoods_SEA['District'].str.replace(r"\[.*\]", "", regex=True)
neighborhoods_SEA['Neighborhood'] = neighborhoods_SEA['Neighborhood'].str.replace(r"\[.*\]", "", regex=True)

neighborhoods_SEA.head()

Unnamed: 0,Neighborhood,District
0,North Seattle,Seattle
1,Broadview,North Seattle
2,Bitter Lake,North Seattle
3,North Beach / Blue Ridge,North Seattle
4,Crown Hill,North Seattle
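
The cleaning step above strips footnote markers with the pattern `\[.*\]`. That pattern is greedy, so a cell holding two markers would also lose the text between them. The sketch below compares it with a bounded alternative using only the standard library; the sample strings are hypothetical and are not taken from the Wikipedia table.

```python
import re

# Hypothetical cell texts with Wikipedia-style footnote markers
samples = ["Ballard[12]", "Fremont[3] / Wallingford[4]", "Green Lake"]

greedy = re.compile(r"\[.*\]")        # the pattern used in the notebook
bounded = re.compile(r"\[[^\]]*\]")   # stops at the first closing bracket

for text in samples:
    # Show both results side by side for comparison
    print(repr(greedy.sub("", text)), "|", repr(bounded.sub("", text)))
```

For cells with at most one marker the two patterns behave the same, so the choice only matters if the source table contains multi-marker cells.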


In [6]:
# Next we iterate through the list of neighborhoods to obtain lat-lng coordinates of each neighborhood

# Make each neighborhood name able to be queried by Nominatim
addresses = neighborhoods_SEA['Neighborhood'] + ', Seattle, WA'

# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="GeoLoc_Agent")

# Iterate through neighborhoods, collecting lat-lng for each one that Nominatim can find
found = []
for neigh, address in zip(neighborhoods_SEA['Neighborhood'], addresses):
    location = geolocator.geocode(address)

    if location is None:
        print("Location information for", neigh, "not available")
    else:
        found.append({'Neighborhood': neigh,
                      'Latitude': location.latitude,
                      'Longitude': location.longitude})

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the collected rows
locations_SEA = pd.DataFrame(found, columns = ['Neighborhood', 'Latitude', 'Longitude'])

locations_SEA.head(10)


Location information for North Beach / Blue Ridge not available
Location information for North College Park
(Licton Springs) not available
Location information for Portage Bay / Roanoke not available
Location information for Pike-Pine Corridor / Pike/Pine not available
Location information for International District ("ID") not available
Location information for Central Area / Central District ("CD") not available
Location information for Cherry Hill & Squire Park not available
Location information for South End not available
Location information for Dunlap / Othello not available
Location information for Rainier Beach / Atlantic City Beach not available
Location information for Mid Beacon Hill (Maplewood) not available
Location information for South Beacon Hill / Van Asselt not available
Location information for Industrial District not available
Location information for North Admiral / Admiral District not available
Location information for Junction / West S

Unnamed: 0,Neighborhood,Latitude,Longitude
0,North Seattle,47.660773,-122.291497
1,Broadview,47.72232,-122.360407
2,Bitter Lake,47.726236,-122.348764
3,Crown Hill,47.694715,-122.371459
4,Greenwood,47.690981,-122.354877
5,Northgate,47.713153,-122.321231
6,Haller Lake,47.719748,-122.333751
7,Pinehurst,47.603832,-122.330062
8,Maple Leaf,47.693987,-122.322905
9,Lake City,47.719162,-122.295494
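
One row in the table above deserves a second look: Pinehurst is listed at 47.603832, -122.330062, which matches the coordinates this notebook later prints for Seattle as a whole (47.6038321, -122.3300624). That suggests the geocoder may have resolved the query to the city rather than the neighborhood. The sketch below flags such rows; the helper name and the tolerance are assumptions, and the coordinates are copied from this notebook's output.

```python
import math

def is_city_fallback(lat, lng, city_lat, city_lng, tol=1e-4):
    """Return True when a geocoded point coincides with the city's own coordinates."""
    return (math.isclose(lat, city_lat, abs_tol=tol)
            and math.isclose(lng, city_lng, abs_tol=tol))

# City-level coordinates for Seattle, as printed later in this notebook
seattle = (47.6038321, -122.3300624)

# Two rows copied from the table above
rows = [
    ("Pinehurst", 47.603832, -122.330062),
    ("Maple Leaf", 47.693987, -122.322905),
]

suspect = [name for name, lat, lng in rows if is_city_fallback(lat, lng, *seattle)]
print(suspect)
```

Rows flagged this way could be dropped or re-queried before the venue search, since their venues would describe the city center rather than the neighborhood.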


In [7]:
# Check the length of the dataframes with and without lat-lng data. The difference (16) indicates the number of neighborhoods that Nominatim could not find
print(len(neighborhoods_SEA))
print(len(locations_SEA))

127
111
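
Most of the names that failed to geocode above combine several labels with " / " or " & ", carry a parenthetical alias, or contain an embedded line break. A possible way to recover some of them is to split each label into simpler candidate queries and try them in turn. The helper below is a hypothetical sketch of that idea using only the standard library; whether Nominatim resolves the simplified names is not verified here.

```python
import re

def query_candidates(name, city="Seattle, WA"):
    """Return simplified query strings for one neighborhood label."""
    # Collapse embedded newlines and repeated whitespace
    flat = " ".join(name.split())
    # Drop parenthetical aliases such as ("ID") or (Licton Springs)
    flat = re.sub(r"\s*\([^)]*\)", "", flat)
    # Treat " / " and " & " as separators between alternative names
    parts = [p.strip() for p in re.split(r"\s+/\s+|\s+&\s+", flat) if p.strip()]
    return [f"{part}, {city}" for part in parts]

for label in ["North College Park\n(Licton Springs)",
              "Cherry Hill & Squire Park",
              "Pike-Pine Corridor / Pike/Pine"]:
    print(query_candidates(label))
```

Each candidate could then be passed to `geolocator.geocode` until one returns a result.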


In [11]:
# Obtain lat long of the Houston Heights neighborhood in Houston, TX. To query against a different neighborhood, change the string of Input_neighborhood

Input_neighborhood = 'Houston Heights, Houston, TX'

geolocator = Nominatim(user_agent="GeoLoc_Agent")
location = geolocator.geocode(Input_neighborhood)
Input_neighborhood_lat = location.latitude
Input_neighborhood_lng = location.longitude

# We create a dataframe to hold the information for our Input Neighborhood and place it at index 0 so it's visible using .head()
Input_df=pd.DataFrame({'Neighborhood': Input_neighborhood, 'Latitude': Input_neighborhood_lat, 'Longitude': Input_neighborhood_lng}, index = [0])

print('The geographical coordinates of', Input_neighborhood, 'are {}, {}.'.format(Input_neighborhood_lat, Input_neighborhood_lng))

# and then merge it with the Seattle locations dataframe
Neighborhoods_merged = pd.concat([Input_df, locations_SEA]).reset_index(drop = True)

Neighborhoods_merged.head()

The geographical coordinates of Houston Heights, Houston, TX are 29.797687, -95.3984463.


Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Houston Heights, Houston, TX",29.797687,-95.398446
1,North Seattle,47.660773,-122.291497
2,Broadview,47.72232,-122.360407
3,Bitter Lake,47.726236,-122.348764
4,Crown Hill,47.694715,-122.371459


In [15]:
# Define function to get venues for each neighborhood, limited to 1000m radius around each neighborhood center and limited to 100 venues

def getVenues(names, latitudes, longitudes, radius = 1000, LIMIT = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL (CLIENT_ID, CLIENT_SECRET, and VERSION are globals that must be set before this function is called)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]["items"]
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe with named columns
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns = ['Neighborhood',
                   'Neighborhood Latitude',
                   'Neighborhood Longitude',
                   'Venue',
                   'Venue Latitude',
                   'Venue Longitude',
                   'Venue Category'])

    return nearby_venues
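
The function above indexes straight into `["response"]["groups"][0]["items"]` and `['categories'][0]`, so a response without groups, or a venue without categories, would raise an exception partway through the loop. The sketch below shows one defensive way to read the same keys. The helper name, the `"Uncategorized"` label, and the sample payload are all assumptions made for illustration; only the key names come from the function above.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder label when a venue has no category
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hypothetical payload mirroring the keys that getVenues reads
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe X", "location": {"lat": 47.6, "lng": -122.3},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 47.7, "lng": -122.4},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({}))
```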

In [16]:
# We will now use the defined function getVenues to find the venues associated with each neighborhood in our merged dataframe

# Define Foursquare credentials
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

# Execute function getVenues for each neighborhood 
df_venues = getVenues(names = Neighborhoods_merged['Neighborhood'],
                      latitudes = Neighborhoods_merged['Latitude'],
                      longitudes = Neighborhoods_merged['Longitude']
                     )

Houston Heights, Houston, TX
North Seattle
Broadview
Bitter Lake
Crown Hill
Greenwood
Northgate
Haller Lake
Pinehurst
Maple Leaf
Lake City
Cedar Park
Matthews Beach
Meadowbrook
Olympic Hills
Victory Heights
Wedgwood
View Ridge
Sand Point
Roosevelt
Ravenna
Bryant
Windermere
Hawthorne Hills
Laurelhurst
University District (U District)
University Village
Wallingford
Northlake
Green Lake
Fremont
Phinney Ridge
Ballard
West Woodland
Whittier Heights
Adams
Sunset Hill
Loyal Heights
Central Seattle
Magnolia
Lawton Park
Briarcliff
Southeast Magnolia
Interbay
Queen Anne
North Queen Anne
East Queen Anne
Lower Queen Anne
West Queen Anne
Capitol Hill
Broadway
Montlake
Stevens
Interlaken
Madison Valley
Renton Hill
Madison Park
Broadmoor
Lake Union
South Lake Union, Seattle
Cascade, Seattle
Westlake
Eastlake
Downtown
Denny Triangle
Belltown
Pike-Market
Central Business District
First Hill
Pioneer Square
Yesler Terrace
Central Waterfront
West Edge
Mann
Minor
Atlantic
Judkins Park
Madrona
Madrona Valle
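
The credentials cell above assigns the client ID and secret as string literals inside the notebook. A common alternative is to read them from environment variables so they never appear in a shared file. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical and not defined by Foursquare.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (names are assumptions)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret

# Demonstration with an explicit mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
print(load_credentials(demo_env))
```

In the notebook, the returned pair could be assigned to `CLIENT_ID` and `CLIENT_SECRET` before `getVenues` is called.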

In [17]:
# Get dummies for venue categories
venue_dummies = pd.get_dummies(df_venues[['Venue Category']], prefix="", prefix_sep="")

# If one of the venue categories is itself named 'Neighborhood', get_dummies creates a column with that name.
# Drop it (if present) so it does not collide with the neighborhood-name column inserted next
venue_dummies = venue_dummies.drop(columns = ['Neighborhood'], errors = 'ignore')
venue_dummies.insert(0, 'Neighborhood', df_venues['Neighborhood'])

# Group venues by neighborhood; each value is the share of a neighborhood's venues that fall in that category
venues_grouped = venue_dummies.groupby('Neighborhood').mean().reset_index()

# Determine the index location of the Input neighborhood
input_idx = venues_grouped[venues_grouped["Neighborhood"] == Input_neighborhood].index.values
print(input_idx)

# and display the Input neighborhood first so it's easy to check
pd.concat([venues_grouped.loc[input_idx], venues_grouped.drop(input_idx, axis=0)], axis=0)

[44]


Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Terminal,Alternative Healer,American Restaurant,Animal Shelter,...,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
44,"Houston Heights, Houston, TX",0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.012500,0.0,...,0.00,0.00,0.0,0.000000,0.000000,0.0,0.0125,0.000000,0.000000,0.000000
0,Adams,0.0,0.00,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,...,0.00,0.00,0.0,0.000000,0.017241,0.0,0.0000,0.017241,0.000000,0.000000
1,Alki Point,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,...,0.00,0.00,0.0,0.000000,0.000000,0.0,0.0000,0.000000,0.000000,0.000000
2,Arbor Heights,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,...,0.00,0.00,0.0,0.000000,0.000000,0.0,0.0000,0.000000,0.000000,0.000000
3,Atlantic,0.0,0.01,0.0,0.000000,0.0,0.0,0.0,0.020000,0.0,...,0.01,0.00,0.0,0.000000,0.000000,0.0,0.0000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
107,West Woodland,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,...,0.00,0.00,0.0,0.011236,0.000000,0.0,0.0000,0.000000,0.011236,0.179775
108,Westlake,0.0,0.01,0.0,0.000000,0.0,0.0,0.0,0.030000,0.0,...,0.00,0.01,0.0,0.010000,0.010000,0.0,0.0100,0.020000,0.000000,0.000000
109,Whittier Heights,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.011236,0.0,...,0.00,0.00,0.0,0.000000,0.011236,0.0,0.0000,0.011236,0.000000,0.000000
110,Windermere,0.0,0.00,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,...,0.00,0.00,0.0,0.000000,0.000000,0.0,0.0000,0.000000,0.000000,0.000000
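
Each value in the table above is the fraction of a neighborhood's returned venues that belong to a category. For example, 0.0125 is what one venue out of 80 would produce. The toy example below reproduces the one-hot-then-average step on hypothetical data so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical venues: neighborhood A has 4 venues, neighborhood B has 2
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Bar", "Park", "Park"],
})

# One-hot encode the category, then average per neighborhood
dummies = pd.get_dummies(toy["Venue Category"], dtype=float)
dummies.insert(0, "Neighborhood", toy["Neighborhood"])
profile = dummies.groupby("Neighborhood").mean()

print(profile)
```

Because every venue contributes exactly one category, each row of `profile` sums to 1, which makes neighborhoods with different venue counts comparable.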


In [18]:
# Define function to get top ranked venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [19]:
# Define the number of venues to use when characterizing a neighborhood
num_venues = 20

indicators = ['st', 'nd', 'rd']

# Loop to create columns for the specified number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks past 3rd take the 'th' suffix (correct for the 20 ranks used here)
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new dataframe to house the most common venues
venues_grouped_sorted = pd.DataFrame(columns=columns)
venues_grouped_sorted['Neighborhood'] = venues_grouped['Neighborhood']

for ind in np.arange(venues_grouped.shape[0]):
    venues_grouped_sorted.iloc[ind, 1:] = return_most_common_venues(venues_grouped.iloc[ind, :], num_venues)

# Check the most common venues for our Input neighborhood to make sure they are reasonable (in this case, they are)
venues_grouped_sorted[venues_grouped_sorted['Neighborhood'] == Input_neighborhood].iloc[0]

Neighborhood              Houston Heights, Houston, TX
1st Most Common Venue                      Coffee Shop
2nd Most Common Venue                     Burger Joint
3rd Most Common Venue                             Park
4th Most Common Venue                         Pharmacy
5th Most Common Venue                      Flower Shop
6th Most Common Venue           Furniture / Home Store
7th Most Common Venue          New American Restaurant
8th Most Common Venue           Thrift / Vintage Store
9th Most Common Venue               Mexican Restaurant
10th Most Common Venue                           Trail
11th Most Common Venue                           Diner
12th Most Common Venue                       Pet Store
13th Most Common Venue                  Sandwich Place
14th Most Common Venue                       Gift Shop
15th Most Common Venue                  Cosmetics Shop
16th Most Common Venue                             Spa
17th Most Common Venue                     Pizza Place
18th Most 
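
The column-naming loop in this cell gives every rank past the third a "th" suffix, which is correct for the 20 ranks used here but would yield "21th" if `num_venues` were raised above 20. A small helper can produce the right suffix for any rank; the function name below is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Build the same style of column labels for 23 ranks
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 24)]
print(columns[1], "|", columns[11], "|", columns[21])
```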

In [20]:
# Now we will cluster all neighborhoods in the dataset using k-means

# Set number of clusters. In this case, we will use a large number of clusters to limit the number of matches to our input neighborhood
kclusters = 25

# First we create a new dataframe without the neighborhood labels, since k-means needs purely numeric features
venues_clustering = venues_grouped.drop(columns = 'Neighborhood')

# Run clustering algorithm on all locations
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venues_clustering)

# Add cluster number to each row
venues_grouped_sorted.insert(0, 'Cluster', kmeans.labels_)

mapping_locations = Neighborhoods_merged

# Merge the neighborhood-grouped venues dataframe with the locations dataframe to link data
mapping_locations = mapping_locations.join(venues_grouped_sorted.set_index('Neighborhood'), on='Neighborhood')
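
The cluster numbers that k-means assigns are arbitrary labels and can change with the data, the seed, or the number of clusters. The sketch below therefore looks up the input row's label from `labels_` and collects the rows that share it. The profiles and names are hypothetical toy data, and `n_init=10` is set explicitly as an assumption to keep behavior stable across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category-share profiles; row 0 plays the role of the input neighborhood
names = ["Input", "A", "B", "C", "D"]
profiles = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
])

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(profiles)

# Look up the input row's label, then collect the other rows that share it
input_label = model.labels_[0]
matches = [n for n, lab in zip(names[1:], model.labels_[1:]) if lab == input_label]
print(matches)
```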

In [21]:
# Check to ensure that lat-lng and venue data are all present
mapping_locations.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,"Houston Heights, Houston, TX",29.797687,-95.398446,24,Coffee Shop,Burger Joint,Park,Pharmacy,Flower Shop,Furniture / Home Store,...,Diner,Pet Store,Sandwich Place,Gift Shop,Cosmetics Shop,Spa,Pizza Place,Italian Restaurant,Movie Theater,Indian Chinese Restaurant
1,North Seattle,47.660773,-122.291497,24,Coffee Shop,Furniture / Home Store,Clothing Store,Women's Store,Arts & Crafts Store,Pizza Place,...,New American Restaurant,Mobile Phone Shop,Electronics Store,Cosmetics Shop,Brewery,Burger Joint,Ice Cream Shop,Café,Shopping Plaza,College Baseball Diamond
2,Broadview,47.72232,-122.360407,12,Furniture / Home Store,Pizza Place,Trail,Food Truck,Antique Shop,Sushi Restaurant,...,Convenience Store,Fabric Shop,Fast Food Restaurant,Drugstore,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Field,Electronics Store,Farmers Market
3,Bitter Lake,47.726236,-122.348764,4,Hotel,Sushi Restaurant,Fast Food Restaurant,Asian Restaurant,Vietnamese Restaurant,Thai Restaurant,...,Marijuana Dispensary,Food Truck,Sandwich Place,Soccer Field,Casino,Noodle House,Rental Car Location,Chinese Restaurant,Café,Grocery Store
4,Crown Hill,47.694715,-122.371459,1,Food Truck,Pizza Place,Coffee Shop,Mexican Restaurant,Greek Restaurant,Gas Station,...,Flower Shop,Mobile Phone Shop,Beer Bar,Beer Store,Sports Bar,Rental Car Location,Moroccan Restaurant,Pub,Taco Place,Fast Food Restaurant


In [27]:
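# Great! We now have each neighborhood in Seattle, as well as our Input neighborhood, with location information and a cluster based on venue occurrence

# Look up the cluster that our Input neighborhood was assigned to (24 in this run) instead of hard-coding it
input_cluster = mapping_locations.loc[mapping_locations["Neighborhood"] == Input_neighborhood, "Cluster"].iloc[0]
Houston_cluster = mapping_locations[mapping_locations["Cluster"] == input_cluster]
Houston_cluster

# The final part of the analysis will be to map all locations within this cluster, the cluster that our Input neighborhood appears in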

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,"Houston Heights, Houston, TX",29.797687,-95.398446,24,Coffee Shop,Burger Joint,Park,Pharmacy,Flower Shop,Furniture / Home Store,...,Diner,Pet Store,Sandwich Place,Gift Shop,Cosmetics Shop,Spa,Pizza Place,Italian Restaurant,Movie Theater,Indian Chinese Restaurant
1,North Seattle,47.660773,-122.291497,24,Coffee Shop,Furniture / Home Store,Clothing Store,Women's Store,Arts & Crafts Store,Pizza Place,...,New American Restaurant,Mobile Phone Shop,Electronics Store,Cosmetics Shop,Brewery,Burger Joint,Ice Cream Shop,Café,Shopping Plaza,College Baseball Diamond
25,University District (U District),47.661191,-122.292083,24,Coffee Shop,Furniture / Home Store,Clothing Store,Arts & Crafts Store,Pizza Place,Italian Restaurant,...,Thai Restaurant,Mobile Phone Shop,Electronics Store,Lingerie Store,Cosmetics Shop,Burger Joint,Ice Cream Shop,College Science Building,Ramen Restaurant,Salad Place
26,University Village,47.66274,-122.298925,24,Pizza Place,Arts & Crafts Store,Burger Joint,Thai Restaurant,Coffee Shop,Italian Restaurant,...,Brewery,Electronics Store,Cosmetics Shop,Clothing Store,Mexican Restaurant,Sandwich Place,Lingerie Store,Ice Cream Shop,Ramen Restaurant,College Science Building
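
The problem statement aims to return the neighborhood with the highest similarity, whereas k-means only groups neighborhoods and does not order the members of a cluster. As a complementary technique (cosine similarity, which is not the method this notebook uses), the profiles could be ranked directly against the input row. The function name and toy profiles below are hypothetical.

```python
import numpy as np

def rank_by_cosine(profiles, names, query_index):
    """Rank rows of `profiles` by cosine similarity to the row at `query_index`."""
    X = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for empty profiles
    unit = X / norms[:, None]
    scores = unit @ unit[query_index]
    order = np.argsort(-scores)  # highest similarity first
    return [(names[i], float(scores[i])) for i in order if i != query_index]

# Hypothetical category-share profiles; row 0 is the query neighborhood
toy_names = ["Input", "B", "C"]
toy_profiles = [[0.5, 0.5, 0.0],
                [0.4, 0.6, 0.0],
                [0.0, 0.0, 1.0]]

ranking = rank_by_cosine(toy_profiles, toy_names, query_index=0)
print(ranking)
```

In the notebook, `profiles` would correspond to the numeric columns of `venues_grouped` and `names` to its `Neighborhood` column.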


In [23]:
# Obtain lat-lng of Seattle for Folium Map

geolocator = Nominatim(user_agent="GeoLoc_Agent")
location = geolocator.geocode("Seattle, WA")
Seattle_lat = location.latitude
Seattle_lng = location.longitude
print('The geographical coordinates of Seattle are {}, {}.'.format(Seattle_lat, Seattle_lng))

The geographical coordinates of Seattle are 47.6038321, -122.3300624.


In [26]:
# create map
map_clusters = folium.Map(location=[Seattle_lat, Seattle_lng], zoom_start = 12)

# set color scheme for the clusters: one color per cluster id
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Houston_cluster['Latitude'], Houston_cluster['Longitude'], Houston_cluster['Neighborhood'], Houston_cluster['Cluster']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
# There we have it! We have identified the neighborhoods in Seattle most similar to our input neighborhood, Houston Heights: in this run, North Seattle, the University District, and University Village.

# If we were looking to move from Houston Heights to the city of Seattle, we now know exactly where to begin our search for new housing.

# Similarly, if we were seeking to expand our business from Houston to Seattle, and knew that locations within Houston Heights performed well, we would know which neighborhoods would also likely support our business.