# The Battle of Neighbourhoods
    
    This notebook completes the Coursera capstone project assignment.

#### 1. Introduction / Business Problem

    This report addresses the following question: which neighbourhoods in and around Pittsburgh are likely candidates for a new grocery store location? To answer it, we focus on factors such as the variety of venues and their locations within the neighbourhoods of Pittsburgh, PA. We then group the neighbourhoods based on these factors and make observations. The target audience of this report is the stakeholders of a national grocery chain.

#### 2. Data

    For this project, we use the Allegheny County Zip Codes dataset in conjunction with Foursquare location data. The Allegheny Zip Codes dataset demarcates the zip code boundaries that lie within Allegheny County and provides zip codes, neighborhood names, and geospatial coordinates. The Foursquare Places API is used to gather the venues found within a 500 m radius of each neighborhood's coordinates. The data is organized into dataframes and projected onto an interactive map for visualization.
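
    As a rough illustration of the 500 m radius query described above, the sketch below only builds a request URL and does not call the API. It assumes the legacy Foursquare v2 `venues/explore` endpoint with the parameters `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit`; this notebook section does not show which endpoint or credentials are actually used, so `build_explore_url`, the version date, and the placeholder credentials are all hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint. The notebook does not
# show which endpoint it calls, so treat this URL as a placeholder.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Return a request URL for venues within `radius` metres of (lat, lon)."""
    params = {
        "client_id": client_id,          # hypothetical credential
        "client_secret": client_secret,  # hypothetical credential
        "v": version,                    # API version date (assumed value)
        "ll": f"{lat},{lon}",            # centre point as "latitude,longitude"
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Coordinates of zip code 15224, taken from the dataset preview in this notebook
url = build_explore_url(40.464263, -79.945118, "YOUR_ID", "YOUR_SECRET")
print(url)
```

    Keeping URL construction separate from the network call makes it easy to inspect the query for one neighborhood before looping over all of them.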

    The Allegheny Zip Codes dataset is accessed via Pennsylvania Spatial Data Access (PASDA), an official public-access open geospatial data portal.

In [89]:
import pandas as pd


url = 'https://www.pasda.psu.edu/spreadsheet/AlleghenyCounty__ZipCodeBoundaries2020.csv'
pittsburgh_nbh = pd.read_csv(url)
pittsburgh_nbh.head()


Unnamed: 0,FID,ZIP,NAME,ZIPTYPE,STATE,STATEFIPS,COUNTYFIPS,COUNTYNAME,S3DZIP,LAT,...,EMPTYCOL,TOTRESCNT,MFDU,SFDU,BOXCNT,BIZCNT,RELVER,COLOR,Shape_Leng,Shape_Area
0,0,15224,PITTSBURGH,NON-UNIQUE,PA,42,42003,ALLEGHENY,152,40.464263,...,,5113,845,4063,205,495,1.9.3,0,34291.10853,27729040.0
1,1,15202,PITTSBURGH,NON-UNIQUE,PA,42,42003,ALLEGHENY,152,40.467764,...,,14090,2933,10866,291,961,1.9.3,8,96211.11859,146664800.0
2,2,15012,BELLE VERNON,NON-UNIQUE,PA,42,42129,WESTMORELAND,150,40.15614,...,,7110,180,6786,144,651,1.9.3,10,8748.136272,3777756.0
3,3,15142,PRESTO,NON-UNIQUE,PA,42,42003,ALLEGHENY,151,40.380401,...,,1037,0,919,118,30,1.9.3,5,56105.50969,55544800.0
4,4,15216,PITTSBURGH,NON-UNIQUE,PA,42,42003,ALLEGHENY,152,40.401802,...,,11008,1817,9054,137,535,1.9.3,6,80277.15303,95351930.0


    The CSV file contains many fields describing the zip code boundaries and neighborhoods found within Pittsburgh and Allegheny County. Our analysis uses four of them: zip code (`ZIP`), neighborhood name (`NAME`), latitude (`LAT`), and longitude (`LON`).
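
    One compact way to pull out these four fields is to select and rename them in a single step. The sketch below is only an illustration on a small stand-in frame named `raw` (a hypothetical name), built from the first rows of the dataset preview; `BIZCNT` stands in for the many columns that are not needed.

```python
import pandas as pd

# Stand-in for the PASDA table: values copied from the first three preview rows.
raw = pd.DataFrame({
    "ZIP": [15224, 15202, 15012],
    "NAME": ["PITTSBURGH", "PITTSBURGH", "BELLE VERNON"],
    "LAT": [40.464263, 40.467764, 40.156140],
    "LON": [-79.945118, -80.053123, -79.812132],
    "BIZCNT": [495, 961, 651],  # example of a column we do not need
})

# Keep only the four target columns and give them descriptive names
selected = (
    raw[["ZIP", "NAME", "LAT", "LON"]]
    .rename(columns={"NAME": "Neighborhood", "LAT": "Latitude", "LON": "Longitude"})
)
print(selected)
```

    Selecting by column list returns a new dataframe, so the original table is left unchanged.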

In [90]:
# Create an empty dataframe to hold our target columns
column_names = ['ZIP', 'Neighborhood', 'Latitude', 'Longitude']
neighborhoods = pd.DataFrame(columns=column_names)

In [91]:
neighborhoods

Unnamed: 0,ZIP,Neighborhood,Latitude,Longitude


In [94]:
# Fill the new dataframe with the relevant columns from the source data
neighborhoods['ZIP'] = pittsburgh_nbh['ZIP']
neighborhoods['Neighborhood'] = pittsburgh_nbh['NAME']
neighborhoods['Latitude'] = pittsburgh_nbh['LAT']
neighborhoods['Longitude'] = pittsburgh_nbh['LON']

In [96]:
neighborhoods

Unnamed: 0,ZIP,Neighborhood,Latitude,Longitude
0,15224,PITTSBURGH,40.464263,-79.945118
1,15202,PITTSBURGH,40.467764,-80.053123
2,15012,BELLE VERNON,40.156140,-79.812132
3,15142,PRESTO,40.380401,-80.120993
4,15216,PITTSBURGH,40.401802,-80.034334
...,...,...,...,...
119,15241,PITTSBURGH,40.331863,-80.082840
120,15219,PITTSBURGH,40.442916,-79.988152
121,15236,PITTSBURGH,40.382820,-79.945023
122,15642,IRWIN,40.302046,-79.703196


In [115]:
# Remove rows with missing location data: zeros are treated as missing
# values and converted to NaN, then rows without a latitude are dropped
nan_value = float('nan')

neighborhoods.replace(0, nan_value, inplace=True)

neighborhoods.dropna(subset=['Latitude'], inplace=True)

In [116]:
# The resulting dataframe contains our target locations
print('The dataframe has {} zip codes and {} unique neighborhoods.'.format(
    neighborhoods.shape[0],
    len(neighborhoods['Neighborhood'].unique())))



The dataframe has 120 zip codes and 73 unique neighborhoods.
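
    The cleaning step above can be checked on a small frame. The sketch below uses hypothetical rows (zip code 15999 and the name `EXAMPLE` are made up) and assumes, as the notebook's code implies, that a coordinate of 0 marks a missing location. Unlike the cell above, it limits the replacement to the two coordinate columns and drops rows missing either one.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second one uses 0 as a placeholder for missing coordinates.
toy = pd.DataFrame({
    "ZIP": [15224, 15999, 15642],
    "Neighborhood": ["PITTSBURGH", "EXAMPLE", "IRWIN"],
    "Latitude": [40.464263, 0.0, 40.302046],
    "Longitude": [-79.945118, 0.0, -79.703196],
})

coords = ["Latitude", "Longitude"]

# Convert placeholder zeros to NaN only in the coordinate columns,
# so legitimate zeros elsewhere are left alone
toy[coords] = toy[coords].replace(0, np.nan)

# Drop rows that are missing either coordinate
cleaned = toy.dropna(subset=coords)

print(f"{len(cleaned)} zip codes, "
      f"{cleaned['Neighborhood'].nunique()} unique neighborhoods")
```

    Restricting the replacement to the coordinate columns avoids turning a genuine zero in another field into a missing value.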
