# Data Science Capstone Project - The Battle of the Neighborhoods (Week 2 Assignment)
## IBM Coursera Data Science Professional Certificate 

### Introduction: Business Problem

Location data can help people moving to a new city find an area that resembles their current neighborhood. Knowing before the move that the new area has the resources and venues they care about helps them adjust more quickly, because the environment already feels familiar and comfortable. This is of personal interest to this data scientist, who is looking to relocate to a new city soon.

For someone who currently lives in one city, how can we find the most similar neighborhood in another city based on the kinds of venues that are most common in that neighborhood?

In this project, we will explore this question - specifically for someone who currently lives in __Charleston, South Carolina__, is moving to __Pittsburgh, Pennsylvania__, and is interested in neighborhood similarity based on the number of __craft breweries__ in that neighborhood. Additionally, we will look at median home values in those neighborhoods to better inform our decision about where to relocate with a home-buying budget of USD $300,000.

### Data 

Based on our business problem and the criteria described above, we need data for the following factors:
* Neighborhoods in Charleston, SC
* Neighborhoods in Pittsburgh, PA
* The number of breweries per neighborhood in Charleston, SC
* The number of breweries per neighborhood in Pittsburgh, PA
* Median home values per neighborhood in Charleston, SC
* Median home values per neighborhood in Pittsburgh, PA


The following data sources will be used to extract or generate the relevant data:
* Neighborhoods in Charleston & Pittsburgh, with their median home values, will be obtained from [Zillow's publicly available database](https://www.zillow.com/research/data/)
   * Under Housing Data, we will export ZHVI Single-Family Homes Time Series ($) by Neighborhood; the export downloads automatically in .csv format
* Venues categorized as breweries in each relevant neighborhood will be obtained using the __Foursquare API__

#### Neighborhoods and Median Home Value Data

As mentioned above, data extracted from Zillow's research data is downloaded as a .csv file. We will import this file and convert to a Pandas DataFrame, then clean the data to only include our relevant neighborhoods. We will complete this process twice, once for Charleston neighborhoods, once for Pittsburgh neighborhoods.

In [1]:
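#import relevant libraries
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

#helper generated by Watson Studio; the hidden cell below presumably attaches it to the downloaded file stream so pandas can read it
def __iter__(self): return 0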

In [2]:
# The code was removed by Watson Studio for sharing.

In [3]:
#uploaded .csv file downloaded from Zillow into IBM Cloud Object Storage & inserted into notebook using my credentials in hidden cell above
#convert file to dataframe and inspect number of neighborhoods included in whole database
zillow_zhvi = pd.read_csv(body)
print('there are',zillow_zhvi.shape[0],'neighborhoods in this dataframe')
zillow_zhvi.head()

there are 16120 neighborhoods in this dataframe


Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,1996-01-31,...,2020-02-29,2020-03-31,2020-04-30,2020-05-31,2020-06-30,2020-07-31,2020-08-31,2020-09-30,2020-10-31,2020-11-30
0,274772,0,Northeast Dallas,Neighborhood,TX,TX,Dallas,Dallas-Fort Worth-Arlington,Dallas County,155511.0,...,366864.0,367041.0,368564.0,370745.0,373354.0,376478.0,379331.0,382966.0,388495.0,394667.0
1,112345,1,Maryvale,Neighborhood,AZ,AZ,Phoenix,Phoenix-Mesa-Scottsdale,Maricopa County,,...,191287.0,193544.0,196301.0,199044.0,201852.0,204835.0,208309.0,212375.0,216496.0,221432.0
2,192689,2,Paradise,Neighborhood,NV,NV,Las Vegas,Las Vegas-Henderson-Paradise,Clark County,151416.0,...,301051.0,303430.0,305488.0,306858.0,307410.0,308912.0,311808.0,315707.0,318784.0,321558.0
3,270958,3,Upper West Side,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,New York County,,...,3571704.0,3551526.0,3534224.0,3521037.0,3536661.0,3535563.0,3558189.0,3573975.0,3602224.0,3617474.0
4,118208,4,South Los Angeles,Neighborhood,CA,CA,Los Angeles,Los Angeles-Long Beach-Anaheim,Los Angeles County,,...,533204.0,538849.0,543582.0,546210.0,549003.0,554329.0,561817.0,569292.0,575813.0,581009.0


We can see that the raw data downloaded from Zillow includes information on 16,120 neighborhoods. Now, let's focus only on the neighborhoods located in Charleston, South Carolina. The Zillow data also includes a monthly time series running from January 1996 through November 2020. Since we are looking at relocating in the near future, we only need the most recent value (November 2020), so we will drop the other time series columns from our focused dataframe.

In [4]:
chs_zhvi=zillow_zhvi[zillow_zhvi['City']=='Charleston'].reset_index(drop=True)
chs_zhvi=chs_zhvi[chs_zhvi['State']=='SC'].reset_index(drop=True)
chs_zhvi=chs_zhvi.loc[:,['RegionName','State','City','Metro','2020-11-30']]
chs_zhvi.rename(columns={'2020-11-30':'Median Home Value 2020-11'},inplace=True)
chs_zhvi['neighborhood']=chs_zhvi[['RegionName', 'State']].agg(', '.join, axis=1)
print('there are',chs_zhvi.shape[0],"neighborhoods in Zillow's Home Index database for Charleston, SC")
chs_zhvi.head()

there are 26 neighborhoods in Zillow's Home Index database for Charleston, SC


Unnamed: 0,RegionName,State,City,Metro,Median Home Value 2020-11,neighborhood
0,Harleston Village,SC,Charleston,Charleston-North Charleston,892637.0,"Harleston Village, SC"
1,Daniel Island,SC,Charleston,Charleston-North Charleston,876967.0,"Daniel Island, SC"
2,Cannonborough-Elliottbororugh,SC,Charleston,Charleston-North Charleston,568652.0,"Cannonborough-Elliottbororugh, SC"
3,Shadowmoss,SC,Charleston,Charleston-North Charleston,314499.0,"Shadowmoss, SC"
4,Wagener Terrace,SC,Charleston,Charleston-North Charleston,564916.0,"Wagener Terrace, SC"


We extracted information for 26 neighborhoods in Charleston.

Now we focus on the neighborhoods located only in Pittsburgh, Pennsylvania. We will also drop the time series data from this focused dataframe.

In [5]:
pit_zhvi=zillow_zhvi[zillow_zhvi['City']=='Pittsburgh'].reset_index(drop=True)
pit_zhvi=pit_zhvi[pit_zhvi['State']=='PA'].reset_index(drop=True)
pit_zhvi=pit_zhvi.loc[:,['RegionName','State','City','Metro','2020-11-30']]
pit_zhvi.rename(columns={'2020-11-30':'Median Home Value 2020-11'},inplace=True)
pit_zhvi['neighborhood']=pit_zhvi[['RegionName', 'State']].agg(', '.join, axis=1)
print('there are',pit_zhvi.shape[0],"neighborhoods in Zillow's Home Index database for Pittsburgh, PA")
pit_zhvi.head()

there are 77 neighborhoods in Zillow's Home Index database for Pittsburgh, PA


Unnamed: 0,RegionName,State,City,Metro,Median Home Value 2020-11,neighborhood
0,Mount Lebanon,PA,Pittsburgh,Pittsburgh,354231.0,"Mount Lebanon, PA"
1,Squirrel Hill South,PA,Pittsburgh,Pittsburgh,442777.0,"Squirrel Hill South, PA"
2,Shadyside,PA,Pittsburgh,Pittsburgh,612067.0,"Shadyside, PA"
3,Brookline,PA,Pittsburgh,Pittsburgh,163387.0,"Brookline, PA"
4,Squirrel Hill North,PA,Pittsburgh,Pittsburgh,669792.0,"Squirrel Hill North, PA"


We extracted information for 77 neighborhoods in Pittsburgh.
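The Charleston and Pittsburgh cells above repeat the same filtering steps. As a minimal sketch, the steps could be wrapped in one helper. The function name `city_neighborhoods` and its parameters are hypothetical, and the toy frame only mimics the Zillow column names (the West Virginia row is made up to show why the state filter matters).

```python
import pandas as pd

def city_neighborhoods(zhvi, city, state,
                       value_col='2020-11-30',
                       value_label='Median Home Value 2020-11'):
    """Return one row per neighborhood in the given city with its latest home value."""
    # Filter on city and state together so same-named cities in other states are excluded
    mask = (zhvi['City'] == city) & (zhvi['State'] == state)
    cols = ['RegionName', 'State', 'City', 'Metro', value_col]
    out = zhvi.loc[mask, cols].reset_index(drop=True)
    out = out.rename(columns={value_col: value_label})
    # Build the "Name, ST" label used later for geocoding and merging
    out['neighborhood'] = out['RegionName'] + ', ' + out['State']
    return out

# Toy frame with the same column names as the Zillow export
toy = pd.DataFrame({
    'RegionName': ['Shadyside', 'Shadowmoss', 'Downtown'],
    'State': ['PA', 'SC', 'WV'],
    'City': ['Pittsburgh', 'Charleston', 'Charleston'],
    'Metro': ['Pittsburgh', 'Charleston-North Charleston', 'Charleston'],
    '2020-11-30': [612067.0, 314499.0, 100000.0],  # last value is made up
})
chs_toy = city_neighborhoods(toy, 'Charleston', 'SC')
print(chs_toy)
```

With the real data, `city_neighborhoods(zillow_zhvi, 'Pittsburgh', 'PA')` would be expected to reproduce the Pittsburgh dataframe built above.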

Now let's map the neighborhoods in each city. First, we need to find the latitude & longitude of each neighborhood's center. We'll use Nominatim to do this.

In [6]:
! pip install geopy
from geopy.geocoders import Nominatim
import itertools

chs_lat=[]
chs_long=[]
for index, row in chs_zhvi.iterrows():
    try:
        address=row['neighborhood']
        geolocator=Nominatim(user_agent='chs_explorer')
        location=geolocator.geocode(address)
        lat = location.latitude
        long = location.longitude 
        chs_lat.append(lat)
        chs_long.append(long)
    except:
        chs_lat.append('N/A')
        chs_long.append('N/A')
chs_zhvi['lat']=chs_lat
chs_zhvi['long']=chs_long

chs_zhvi.head()



Unnamed: 0,RegionName,State,City,Metro,Median Home Value 2020-11,neighborhood,lat,long
0,Harleston Village,SC,Charleston,Charleston-North Charleston,892637.0,"Harleston Village, SC",32.7781,-79.9435
1,Daniel Island,SC,Charleston,Charleston-North Charleston,876967.0,"Daniel Island, SC",32.8591,-79.912
2,Cannonborough-Elliottbororugh,SC,Charleston,Charleston-North Charleston,568652.0,"Cannonborough-Elliottbororugh, SC",,
3,Shadowmoss,SC,Charleston,Charleston-North Charleston,314499.0,"Shadowmoss, SC",32.844,-80.0651
4,Wagener Terrace,SC,Charleston,Charleston-North Charleston,564916.0,"Wagener Terrace, SC",33.6524,-81.3612


Looks like some neighborhoods didn't return latitude & longitude data. Let's look at those neighborhoods and inspect why they didn't return any information.
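The lookup loop above builds a new geocoder for every row and stores the string 'N/A' when a lookup fails. A minimal sketch of an alternative is shown here: one helper that accepts any geocoding callable, pauses between requests (the public Nominatim service asks clients to stay at or below one request per second), and records failures as NaN so the columns stay numeric. The helper name `geocode_all` and the stand-in `fake_geocode` are hypothetical and exist only so the example runs without network access.

```python
import math
import time
from types import SimpleNamespace

def geocode_all(addresses, geocode, delay_seconds=1.0):
    """Geocode each address with the supplied callable; return (lat, long) tuples.

    Failed lookups become (nan, nan) instead of a string placeholder.
    """
    coords = []
    for i, address in enumerate(addresses):
        if i > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        try:
            location = geocode(address)
        except Exception:
            location = None  # treat a failed request like "not found"
        if location is None:
            coords.append((math.nan, math.nan))
        else:
            coords.append((location.latitude, location.longitude))
    return coords

# Stand-in geocoder for illustration; the coordinates are copied from the table above
def fake_geocode(address):
    known = {'Harleston Village, SC': SimpleNamespace(latitude=32.7781, longitude=-79.9435)}
    return known.get(address)  # None when the name is unknown

demo = geocode_all(['Harleston Village, SC', 'South Windenere, SC'],
                   fake_geocode, delay_seconds=0)
print(demo)
```

With geopy, the callable would be `Nominatim(user_agent='chs_explorer').geocode`, created once outside the loop. Rows that failed could then be found with `chs_zhvi['lat'].isna()` rather than a comparison against 'N/A'.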

In [7]:
chs_zhvi.loc[chs_zhvi['lat'] == 'N/A']

Unnamed: 0,RegionName,State,City,Metro,Median Home Value 2020-11,neighborhood,lat,long
2,Cannonborough-Elliottbororugh,SC,Charleston,Charleston-North Charleston,568652.0,"Cannonborough-Elliottbororugh, SC",,
15,South Windenere,SC,Charleston,Charleston-North Charleston,627225.0,"South Windenere, SC",,


These neighborhoods were misspelled! Let's fix that & try to find their coordinates again.

In [None]:
chs_zhvi['neighborhood']=chs_zhvi['neighborhood'].str.replace('Cannonborough-Elliottbororugh, SC','Cannonborough-Elliotborough, SC')
chs_zhvi['RegionName']=chs_zhvi['RegionName'].str.replace('Cannonborough-Elliottbororugh','Cannonborough-Elliotborough')
chs_zhvi['neighborhood']=chs_zhvi['neighborhood'].str.replace('South Windenere, SC','South Windermere, SC')
chs_zhvi['RegionName']=chs_zhvi['RegionName'].str.replace('South Windenere','South Windermere')

chs_lat=[]
chs_long=[]
for index, row in chs_zhvi.iterrows():
    try:
        address=row['neighborhood']
        geolocator=Nominatim(user_agent='chs_explorer')
        location=geolocator.geocode(address)
        lat = location.latitude
        long = location.longitude 
        chs_lat.append(lat)
        chs_long.append(long)
    except:
        chs_lat.append('N/A')
        chs_long.append('N/A')
chs_zhvi['lat']=chs_lat
chs_zhvi['long']=chs_long

chs_zhvi.loc[chs_zhvi['lat'] == 'N/A']

The neighborhood of Cannonborough-Elliotborough still doesn't return any coordinates. This may be because the neighborhood is so small or so new that Nominatim does not have any geospatial data associated with it. Let's add the coordinates manually.

In [None]:
import numpy as np
chs_zhvi['lat']=np.where((chs_zhvi.lat=='N/A'),32.7906,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.long=='N/A'),-79.9459,chs_zhvi.long)

chs_zhvi

Now let's get the coordinates for all the neighborhood centers in Pittsburgh.

In [None]:
pit_lat=[]
pit_long=[]
for index, row in pit_zhvi.iterrows():
    try:
        address=row['neighborhood']
        geolocator=Nominatim(user_agent='pit_explorer')
        location=geolocator.geocode(address)
        lat = location.latitude
        long = location.longitude 
        pit_lat.append(lat)
        pit_long.append(long)
    except:
        pit_lat.append('N/A')
        pit_long.append('N/A')
pit_zhvi['lat']=pit_lat
pit_zhvi['long']=pit_long

pit_zhvi.loc[pit_zhvi['lat'] == 'N/A']

This neighborhood name was also spelled incorrectly! Let's fix that.

In [None]:
pit_zhvi['neighborhood']=pit_zhvi['neighborhood'].str.replace('Southside Slopes, PA','South Side Slopes, PA')

pit_lat=[]
pit_long=[]
for index, row in pit_zhvi.iterrows():
    try:
        address=row['neighborhood']
        geolocator=Nominatim(user_agent='pit_explorer')
        location=geolocator.geocode(address)
        lat = location.latitude
        long = location.longitude 
        pit_lat.append(lat)
        pit_long.append(long)
    except:
        pit_lat.append('N/A')
        pit_long.append('N/A')
pit_zhvi['lat']=pit_lat
pit_zhvi['long']=pit_long

pit_zhvi.loc[pit_zhvi['lat'] == 'N/A']

Great! Every neighborhood has coordinates associated with it now. Let's double check our Pittsburgh dataframe.

In [None]:
pit_zhvi.head()

Now let's map the neighborhoods!

In [None]:
import json
import requests
from pandas import json_normalize #top-level import location since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline
!pip install folium
import folium
print('libraries imported')

In [None]:
#finding geographical coordinates of Charleston
address_chs = 'Charleston, South Carolina'
geolocator_chs=Nominatim(user_agent='chs_explorer')
location_chs=geolocator_chs.geocode(address_chs)
latitude_chs=location_chs.latitude
longitude_chs=location_chs.longitude
print('the geographical coordinates of Charleston, SC are {},{}'.format(latitude_chs,longitude_chs))


#visualizing Charleston & the neighborhoods in it
map_chs=folium.Map(location=[latitude_chs,longitude_chs],zoom_start=8)
#add markers to map
for lat,long,label in zip(chs_zhvi['lat'],chs_zhvi['long'],chs_zhvi['neighborhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],radius=5,popup=label,color='blue',fill=True,fill_color='#003366',fill_opacity=0.6,parse_html=False).add_to(map_chs)
map_chs

This map doesn't look quite right... some neighborhoods that are supposed to be in Charleston show up near Myrtle Beach or Columbia! Let's look up the geographic bounds of Charleston.

In [None]:
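# @hidden cell
#the key is read from an environment variable so it is not stored in the notebook; GOOGLE_MAPS_API_KEY is an assumed variable name
import os
api = 'https://maps.googleapis.com/maps/api/geocode/json?components=locality:charleston&key=' + os.environ['GOOGLE_MAPS_API_KEY']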

In [None]:
#api with credentials defined in hidden cell above
results=requests.get(api).json()
results

Now let's look at each neighborhood to see which neighborhoods have the incorrect coordinates. If their latitude and longitude coordinates are not within bounds of what we know Charleston's northeast and southwest corners are, they are incorrectly labeled.
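As a minimal sketch, the same bounds check can be written as a vectorized filter that returns the out-of-bounds rows directly. The function name `outside_bounds` is hypothetical. The bounding-box values are the ones used in the loop in this notebook, and the two toy rows reuse coordinates from the Charleston table above.

```python
import pandas as pd

def outside_bounds(df, south, north, west, east):
    """Return the rows whose (lat, long) fall outside the bounding box."""
    # Coerce to numbers so placeholder strings become NaN and are flagged too
    lat = pd.to_numeric(df['lat'], errors='coerce')
    long = pd.to_numeric(df['long'], errors='coerce')
    inside = lat.between(south, north) & long.between(west, east)
    return df[~inside]

toy = pd.DataFrame({
    'neighborhood': ['Harleston Village, SC', 'Wagener Terrace, SC'],
    'lat': [32.7781, 33.6524],
    'long': [-79.9435, -81.3612],
})
flagged = outside_bounds(toy, south=32.6685048, north=32.973586,
                         west=-80.14379, east=-79.7971659)
print(flagged['neighborhood'].tolist())
```

Returning a dataframe instead of printing one line per row makes it easy to count the flagged neighborhoods with `len(flagged)`.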

In [None]:
for index, row in chs_zhvi.iterrows():
    lat=row['lat']
    long=row['long']
    if (32.6685048 <=lat <= 32.973586) and (-80.14379 <= long <=-79.7971659):
        print('neighborhood in bounds')
    else:
        print (row['neighborhood'])

Unfortunately, there are 13 neighborhoods that are too specific and are not returning the correct coordinates. Let's add them manually as well.

In [None]:
chs_zhvi['lat']=chs_zhvi['lat'].astype(float)
chs_zhvi['long']=chs_zhvi['long'].astype(float)

#Wagener Terrace
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Wagener Terrace'),32.8162,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Wagener Terrace'),-79.9514,chs_zhvi.long)

#Westside
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Westside'),32.7913,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Westside'),-79.9542,chs_zhvi.long)

#North Central
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='North Central'),32.8021,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='North Central'),-79.9507,chs_zhvi.long)

#Eastside
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Eastside'),32.7931,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Eastside'),-79.9365,chs_zhvi.long)

#Radcliffeborough
chs_zhvi['neighborhood']=chs_zhvi['neighborhood'].str.replace('Radcliffborough, SC','Radcliffeborough, SC')
chs_zhvi['RegionName']=chs_zhvi['RegionName'].str.replace('Radcliffborough','Radcliffeborough')
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Radcliffeborough'),32.7874,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Radcliffeborough'),-79.9403,chs_zhvi.long)

#Charlestowne
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Charlestowne'),32.7765,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Charlestowne'),-79.9311,chs_zhvi.long)

#Ashley Forest
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Ashley Forest'),32.7846,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Ashley Forest'),-79.9876,chs_zhvi.long)

#Silver Hill-Magnolia
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Silver Hill-Magnolia'),32.8209,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Silver Hill-Magnolia'),-79.9516,chs_zhvi.long)

#East Central
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='East Central'),32.8070,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='East Central'),-79.9473,chs_zhvi.long)

#Hampton Park Terrace
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Hampton Park Terrace'),32.7963,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Hampton Park Terrace'),-79.9547,chs_zhvi.long)

#Cresent
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Cresent'),32.7722,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Cresent'),-79.9693,chs_zhvi.long)

#Ansonborough
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='Ansonborough'),32.7918,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='Ansonborough'),-79.9263,chs_zhvi.long)

#French Quarter
chs_zhvi['lat']=np.where((chs_zhvi.RegionName=='French Quarter'),32.7787,chs_zhvi.lat)
chs_zhvi['long']=np.where((chs_zhvi.RegionName=='French Quarter'),-79.9284,chs_zhvi.long)

chs_zhvi
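The repeated `np.where` pairs above could also be expressed as a lookup table applied in a loop. This is only a sketch: `manual_coords` and `apply_overrides` are hypothetical names, and the two entries reuse coordinates from the cell above.

```python
import pandas as pd

# Manual neighborhood centers keyed by RegionName (values copied from the cell above)
manual_coords = {
    'Wagener Terrace': (32.8162, -79.9514),
    'Westside': (32.7913, -79.9542),
}

def apply_overrides(df, overrides):
    """Return a copy of df with lat/long replaced for the named neighborhoods."""
    out = df.copy()
    for name, (lat, long) in overrides.items():
        mask = out['RegionName'] == name
        out.loc[mask, 'lat'] = lat    # no effect when the name is absent
        out.loc[mask, 'long'] = long
    return out

toy = pd.DataFrame({
    'RegionName': ['Wagener Terrace', 'Daniel Island'],
    'lat': [33.6524, 32.8591],
    'long': [-81.3612, -79.912],
})
fixed = apply_overrides(toy, manual_coords)
print(fixed)
```

Keeping the corrections in one dictionary makes each manual value easy to audit, and the same helper would work for the Pittsburgh overrides.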

Let's try visualizing the neighborhoods in a map again.

In [None]:
#visualizing Charleston & the neighborhoods in it
map_chs=folium.Map(location=[latitude_chs,longitude_chs],zoom_start=13)
#add markers to map
for lat,long,label in zip(chs_zhvi['lat'],chs_zhvi['long'],chs_zhvi['neighborhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],radius=5,popup=label,color='blue',fill=True,fill_color='#003366',fill_opacity=0.6,parse_html=False).add_to(map_chs)
map_chs

That looks a lot better! Now let's look at the map of Pittsburgh. Will we have the same issue?

In [None]:
address_pit = 'Pittsburgh, Pennsylvania'
geolocator_pit=Nominatim(user_agent='pit_explorer')
location_pit=geolocator_pit.geocode(address_pit)
latitude_pit=location_pit.latitude
longitude_pit=location_pit.longitude
print('the geographical coordinates of Pittsburgh, PA are {},{}'.format(latitude_pit,longitude_pit))


#visualizing Pittsburgh & the neighborhoods in it
map_pit=folium.Map(location=[latitude_pit,longitude_pit],zoom_start=8)
#add markers to map
for lat,long,label in zip(pit_zhvi['lat'],pit_zhvi['long'],pit_zhvi['neighborhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],radius=5,popup=label,color='yellow',fill=True,fill_color='#ffb612',fill_opacity=0.6,parse_html=False).add_to(map_pit)
map_pit

Unfortunately, it looks like we have the same issue with the Pittsburgh neighborhoods. Again, let's define the bounds of Pittsburgh and investigate which neighborhoods have been assigned coordinates outside those bounds. Then, let's manually add in the correct coordinates for each neighborhood's center.

In [None]:
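# @hidden cell
#the key is read from an environment variable so it is not stored in the notebook; GOOGLE_MAPS_API_KEY is an assumed variable name
import os
api_pit = 'https://maps.googleapis.com/maps/api/geocode/json?components=locality:pittsburgh&key=' + os.environ['GOOGLE_MAPS_API_KEY']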

In [None]:
#api with credentials defined in hidden cell above
results=requests.get(api_pit).json()
results

In [None]:
for index, row in pit_zhvi.iterrows():
    lat=row['lat']
    long=row['long']
    if (40.3613689 <= lat <= 40.501368) and (-80.0952779 <= long <= -79.8657231):
        print('neighborhood in bounds')
    else:
        print (row['neighborhood'])

In [None]:
#Highland Park
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Highland Park'),40.4799,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Highland Park'),-79.9165,pit_zhvi.long)

#South Side Flats
pit_zhvi['neighborhood']=pit_zhvi['neighborhood'].str.replace('Southside Flats, PA','South Side Flats, PA')
pit_zhvi['RegionName']=pit_zhvi['RegionName'].str.replace('Southside Flats','South Side Flats')
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='South Side Flats'),40.4284,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='South Side Flats'),-79.9736,pit_zhvi.long)

#Point Breeze
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Point Breeze'),40.449,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Point Breeze'),-79.910,pit_zhvi.long)

#Westwood
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Westwood'),40.434,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Westwood'),-80.049,pit_zhvi.long)

#Overbrook
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Overbrook'),40.3863,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Overbrook'),-80.0004,pit_zhvi.long)

#Manchester
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Manchester'),40.4552,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Manchester'),-80.0241,pit_zhvi.long)

#Windgap
pit_zhvi['neighborhood']=pit_zhvi['neighborhood'].str.replace('Wind Gap, PA','Windgap, PA')
pit_zhvi['RegionName']=pit_zhvi['RegionName'].str.replace('Wind Gap','Windgap')
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Windgap'),40.4546,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Windgap'),-80.0744,pit_zhvi.long)

#Fineview
pit_zhvi['neighborhood']=pit_zhvi['neighborhood'].str.replace('Fine View, PA','Fineview, PA')
pit_zhvi['RegionName']=pit_zhvi['RegionName'].str.replace('Fine View','Fineview')
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Fineview'),40.464,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Fineview'),-80.003,pit_zhvi.long)

#Summer Hill
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Summer Hill'),40.493,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Summer Hill'),-80.008,pit_zhvi.long)

#Oakwood
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Oakwood'),40.4263,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Oakwood'),-80.0689,pit_zhvi.long)

#Spring Garden
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Spring Garden'),40.471,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Spring Garden'),-79.988,pit_zhvi.long)

#St. Clair
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='St. Clair'),40.4091,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='St. Clair'),-79.9724,pit_zhvi.long)

#Ridgemont
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Ridgemont'),40.4282,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Ridgemont'),-80.0325,pit_zhvi.long)

#Allegheny West
pit_zhvi['lat']=np.where((pit_zhvi.RegionName=='Allegheny West'),40.4282,pit_zhvi.lat)
pit_zhvi['long']=np.where((pit_zhvi.RegionName=='Allegheny West'),-80.0158,pit_zhvi.long)

#double check that the values were input correctly
pit_zhvi.tail(10)

Now let's look at the map of Pittsburgh again.

In [None]:
#visualizing Pittsburgh & the neighborhoods in it
map_pit=folium.Map(location=[latitude_pit,longitude_pit],zoom_start=13)
#add markers to map
for lat,long,label in zip(pit_zhvi['lat'],pit_zhvi['long'],pit_zhvi['neighborhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],radius=5,popup=label,color='yellow',fill=True,fill_color='#ffb612',fill_opacity=0.7,parse_html=False).add_to(map_pit)
map_pit

Now let's look at venues near each neighborhood's center. The Foursquare Client ID, Client Secret, and Access Token are all defined in the hidden cell below, along with the API version (20200605) and the result limit (100).

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
#defining a function to get the details of nearby venues for each neighborhood within each city
def getNearbyVenues(names, latitudes, longitudes, radius=1609.34):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,limit)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['neighborhood','neighborhood latitude','neighborhood longitude','venue','venue latitude','venue longitude','venue category']
    return(nearby_venues)

#converting Charleston list into dataframe
chs_venues=getNearbyVenues(names=chs_zhvi['neighborhood'],latitudes=chs_zhvi['lat'],longitudes=chs_zhvi['long'],radius=1609.34)
chs_venues=pd.DataFrame(chs_venues)
print(chs_venues.shape)
chs_venues.head()
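The parsing step inside `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue with an empty category list. A minimal sketch of a more defensive parser follows. The function name `parse_venue_items`, the fallback label 'Uncategorized', and the hand-written sample items are all assumptions for illustration; the items mimic only the fields the function above reads and are not real API output.

```python
def parse_venue_items(name, lat, lng, items):
    """Flatten Foursquare 'items' into one tuple per venue."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a label instead of failing on an empty category list
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-written items in the shape the function above indexes into
sample_items = [
    {'venue': {'name': 'Example Brewing',
               'location': {'lat': 32.82, 'lng': -79.95},
               'categories': [{'name': 'Brewery'}]}},
    {'venue': {'name': 'Unlabeled Spot',
               'location': {'lat': 32.81, 'lng': -79.94},
               'categories': []}},
]
rows = parse_venue_items('Silver Hill-Magnolia, SC', 32.8209, -79.9516, sample_items)
print(rows)
```

Separating parsing from the HTTP request also lets the parsing logic be checked without calling the API.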

From the shape of the dataframe above, we see that there are a total of 1711 nearby venues within 1609.34 meters (1 mile) of the neighborhood centers. Now we group venues by neighborhood and sort in descending order to see which neighborhoods have the most venues. Then we separate venue categories with one-hot encoding and add the neighborhood column back into the dataframe.

In [None]:
chs_venues.groupby('neighborhood').count().sort_values(by='venue',ascending=False)

#separate venue categories with one hot encoding & add neighborhood column back to dataframe
chs_onehot=pd.get_dummies(chs_venues[['venue category']],prefix='',prefix_sep='')
chs_onehot['neighborhood']=chs_venues['neighborhood']
#move neighborhood column to first column/index
fixed_columns = [chs_onehot.columns[-1]] + list(chs_onehot.columns[:-1])
chs_onehot = chs_onehot[fixed_columns]
#make sure we didn't lose any venue data by printing shape; should still have 1711 rows
print(chs_onehot.shape)
chs_onehot.head()

Now we group by neighborhood and calculate the mean of venue occurrence per category to see what types of venues are most common in each neighborhood.
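To see why the mean of a one-hot column gives a frequency, consider a toy example with made-up venues. The neighborhood names 'A' and 'B' and the categories are invented for illustration only.

```python
import pandas as pd

# Made-up venues: neighborhood A has 1 brewery out of 4 venues, B has 1 out of 2
venues = pd.DataFrame({
    'neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'venue category': ['Brewery', 'Cafe', 'Cafe', 'Park', 'Brewery', 'Cafe'],
})

# One indicator column per category; dtype=float keeps the columns numeric
onehot = pd.get_dummies(venues['venue category'], dtype=float)
onehot.insert(0, 'neighborhood', venues['neighborhood'])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby('neighborhood').mean()
print(freq['Brewery'])
```

Here A scores 0.25 and B scores 0.5 even though each has exactly one brewery. The value is a share of the venues returned for a neighborhood, not a count of breweries, so it also depends on how many other venues the API returned.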

In [None]:
chs_grouped=chs_onehot.groupby('neighborhood').mean().reset_index()
chs_grouped

Lastly, we isolate the category for brewery since that is our venue of interest and how we will determine neighborhood similarity.

In [None]:
chs_breweries=chs_grouped.loc[:,['neighborhood','Brewery']].sort_values(by='Brewery',ascending=False)
chs_breweries

We see that the neighborhood of Silver Hill-Magnolia has the greatest frequency of breweries at 0.24.

Now let's repeat the whole process with the neighborhoods in Pittsburgh.

In [None]:
#convert list into dataframe
pit_venues=getNearbyVenues(names=pit_zhvi['neighborhood'],latitudes=pit_zhvi['lat'],longitudes=pit_zhvi['long'],radius=1609.34)
pit_venues=pd.DataFrame(pit_venues)
print(pit_venues.shape)
pit_venues.head()

In [None]:
#from the shape of the dataframe found above, we see that there are a total of 4451 nearby venues within 1609.34 meters (1 mile) of the neighborhood centers
#group venues by neighborhood and sort in descending order to see which neighborhoods have the most venues
pit_venues.groupby('neighborhood').count().sort_values(by='venue',ascending=False)

#separate venue categories with one hot encoding & add neighborhood column back to dataframe
pit_onehot=pd.get_dummies(pit_venues[['venue category']],prefix='',prefix_sep='')
pit_onehot['neighborhood']=pit_venues['neighborhood']
#move neighborhood column to first column/index
fixed_columns_pit = [pit_onehot.columns[-1]] + list(pit_onehot.columns[:-1])
pit_onehot = pit_onehot[fixed_columns_pit]
#make sure we didn't lose any venue data by printing shape; should still have 4451 rows
print(pit_onehot.shape)
pit_onehot.head()

In [None]:
#now group by neighborhood & calculate mean of venue occurrence per category to see what types of venues are most common in each neighborhood
pit_grouped=pit_onehot.groupby('neighborhood').mean().reset_index()
pit_grouped

In [None]:
#now isolate the brewery category so we can see which neighborhood has the greatest frequency of breweries
pit_breweries=pit_grouped.loc[:,['neighborhood','Brewery']].sort_values(by='Brewery',ascending=False)
pit_breweries.head(10)

We see that the neighborhoods of Spring Garden and Upper Lawrenceville both have the greatest frequency of breweries at 0.125.

Lastly, we combine the home value dataframe with the brewery frequency dataframe for each city.

In [None]:
chs_merged=chs_zhvi.merge(chs_breweries, on='neighborhood',how='inner')
chs_merged.sort_values(by='Brewery',ascending=False,inplace=True)
chs_merged.head(10)

In [None]:
pit_merged=pit_zhvi.merge(pit_breweries, on='neighborhood',how='inner')
pit_merged.sort_values(by='Brewery',ascending=False,inplace=True)
pit_merged.head(10)
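An inner merge silently drops any neighborhood that appears in only one of the two dataframes, for example one for which no venues were returned. As a minimal sketch, a left merge with `indicator=True` can list such neighborhoods. The toy frames reuse values quoted in this notebook, and the missing brewery row for Shadowmoss is a made-up scenario.

```python
import pandas as pd

home_values = pd.DataFrame({
    'neighborhood': ['East Central, SC', 'Shadowmoss, SC'],
    'Median Home Value 2020-11': [343867.0, 314499.0],
})
breweries = pd.DataFrame({
    'neighborhood': ['East Central, SC'],  # Shadowmoss deliberately left out
    'Brewery': [0.112903],
})

# A left merge keeps every neighborhood; the _merge column records where each row matched
checked = home_values.merge(breweries, on='neighborhood', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'neighborhood'].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the inner merges above kept every neighborhood.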

### Results and Discussion

Our analysis shows that the Charleston neighborhood with the highest brewery frequency within a one-mile radius of the neighborhood center is Silver Hill-Magnolia, at 0.24. This means that almost a quarter of the venues returned for this neighborhood are categorized as breweries. As of November 2020, this neighborhood had a median home value of $205,635.

Our analysis also shows that two Pittsburgh neighborhoods tie for the highest brewery frequency within a one-mile radius of the neighborhood center, at 0.125: Spring Garden and Upper Lawrenceville. As of November 2020, these neighborhoods had median home values of $129,360 and $251,403, respectively.

However, the intention of this project is to find the neighborhoods that are most similar between Charleston and Pittsburgh based on the frequency of breweries as a venue. By this criterion, Silver Hill-Magnolia (0.24) is not very similar to Spring Garden or Upper Lawrenceville (0.125). Taking another look at the Charleston neighborhoods, we find that East Central has a brewery frequency of 0.112903. This is still not a perfect match, but it is much closer to the 0.125 found in Spring Garden and Upper Lawrenceville. Although we do not need the median home value in East Central to inform our home-buying decision in Pittsburgh, it was $343,867 as of November 2020.
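The comparison described here can be sketched as a small calculation: pair every Charleston neighborhood with every Pittsburgh neighborhood within budget and rank the pairs by the absolute gap in brewery frequency. The frequencies, home values, and budget are the figures quoted in this report. The column name `gap` and the constant `BUDGET` are hypothetical, and `how='cross'` assumes pandas 1.2 or later.

```python
import pandas as pd

# Brewery frequencies and home values quoted in this report
chs = pd.DataFrame({
    'neighborhood': ['Silver Hill-Magnolia, SC', 'East Central, SC'],
    'Brewery': [0.24, 0.112903],
})
pit = pd.DataFrame({
    'neighborhood': ['Spring Garden, PA', 'Upper Lawrenceville, PA'],
    'Brewery': [0.125, 0.125],
    'Median Home Value 2020-11': [129360.0, 251403.0],
})

BUDGET = 300_000  # home-buying budget in USD, from the introduction

# Keep only affordable Pittsburgh neighborhoods, then form every cross-city pair
affordable = pit[pit['Median Home Value 2020-11'] <= BUDGET]
pairs = chs.merge(affordable, how='cross', suffixes=('_chs', '_pit'))

# Smaller gap means more similar brewery frequency
pairs['gap'] = (pairs['Brewery_chs'] - pairs['Brewery_pit']).abs()
ranked = pairs.sort_values('gap').reset_index(drop=True)
print(ranked[['neighborhood_chs', 'neighborhood_pit', 'gap']])
```

On these figures the East Central pairs rank first with a gap of about 0.012, while the Silver Hill-Magnolia pairs have a gap of 0.115. Running the same calculation on the full `chs_merged` and `pit_merged` dataframes would rank every pair rather than only the top neighborhoods.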

Of course, it would be most helpful to see current (January 2021) median home values in the identified neighborhoods, but that data is not yet available from Zillow. The process also involved a fair amount of manual data cleaning, because the neighborhoods in Zillow's home value data are more specific and narrowly bounded than the place names that Nominatim could geocode reliably. In order to compare as many neighborhoods as possible and visualize them across each city, we manually entered center coordinates for each neighborhood that Nominatim could not locate correctly. This process may be easier for bigger, more populous, or more tourist-centric cities, where the data may be more robust. Additional analysis could include other factors, such as school districts, proximity to grocery stores, or proximity to a known workplace (if the relocation is due to a job change), to narrow down the ideal neighborhood(s) even further.

### Conclusion

The purpose of this project was to find the most similar neighborhoods between Charleston, SC and Pittsburgh, PA based on the frequency of breweries in each neighborhood. A secondary purpose was to evaluate the median home value in each of those neighborhoods to inform and narrow down our search for a new home while relocating from Charleston to Pittsburgh. Our analysis took the neighborhoods for which we have home value data and examined the venues within one mile of each neighborhood's center to calculate the frequency of breweries. Once we found which neighborhoods had the highest brewery frequency, we could compare the neighborhoods from Charleston and Pittsburgh to find which ones had the closest frequency of breweries per neighborhood. Based on our analysis, Spring Garden and Upper Lawrenceville are the Pittsburgh counterparts to Charleston's top brewery neighborhood, Silver Hill-Magnolia, although by brewery frequency (0.125) they sit closer to East Central (0.113) than to Silver Hill-Magnolia (0.24). We can also compare median home values in these neighborhoods. The median home values in Spring Garden ($129,360) and Upper Lawrenceville ($251,403) both fall below our home-buying budget of $300,000.

Thus, we can conclude that moving from Charleston to either Spring Garden or Upper Lawrenceville will help a craft beer enthusiast feel at home in their new surroundings more quickly, because breweries make up a larger share of nearby venues there than in any other Pittsburgh neighborhood we analyzed. We can also conclude that either neighborhood will be affordable for this buyer, since neither neighborhood's median home value exceeds the predetermined budget of $300,000.