# Capstone Project - The Battle of Neighborhoods (Week 1)
## By Chase K.
## Data Section

**This notebook will be parsed out into individual sections during the first week.**

*The content of this project is hypothetical, and while it addresses a real issue, the premise is fictional and is a work of academic demonstration and not necessarily a real-world solution.*
***

**Brief**

*Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.*
***

# Data Introduction

**To reintroduce, the problem is to find appropriate pharmacies within an appropriate distance to public transit that will serve as vaccination sites.** 

The Foursquare API has location data for both pharmacies and train stations, though we will also pull station data from the state agency's <a href="https://www.njtransit.com/developer-tools" target="_blank">NJ Transit API</a> and from sources like Wikipedia to verify or enhance it. One expanded use would be to match pharmacy operating hours with nearby public transit schedules; the granularity of bus schedules would likely require the NJ Transit API.

**The stakeholders are the county/state governments requesting the *hypothetical* proposal, and ultimately the general public, who will be able to use the information to find a nearby vaccination site that is accessible by public transit.**

# Data Requirements

1. **NJ Transit train stations**
    * Data source: <a href="https://www.njtransit.com/developer-tools" target="_blank">NJ Transit API</a> - <a href="https://raw.githubusercontent.com/cwkruppo/Coursera_Capstone/master/stops.txt" target="_blank">stops.txt</a> - General Transit Feed Specification (GTFS)
    * Description: A .txt list that can be converted into a pandas DataFrame and used to identify train stations by name, location coordinates, and a unique stop code, which in turn identifies the stations within the target county.
    
2. **Wikipedia List of Railway Stations in Morris County, NJ**
    * Data source: <a href="https://en.wikipedia.org/wiki/Category:Railway_stations_in_Morris_County,_New_Jersey" target="_blank">Wikipedia</a>
    * Description: A list of railway station stops in Morris County, NJ. This source can be formatted to be compared to the complete list of station data.    
    
3. **US Census County Data - FIPS codes**
    * Data source: <a href="https://www2.census.gov/geo/docs/reference/codes/files/st34_nj_cousub.txt" target="_blank">US Census website</a> - State = 34 County = 027
    * Description: A list of New Jersey county subdivisions with their state and county FIPS codes. Filtering on county code 027 yields the municipalities in Morris County, which define the target county.
    
4. **Foursquare API**
    * Data source: <a href="https://developer.foursquare.com/" target="_blank">Developer Tools portal for Foursquare API</a>
    * Description: A series of venue search calls to the API to yield the specific pharmacies (Walgreens, CVS, Rite Aid) within a set radius of 3/4 mi (1200 m) of each station.

**Approach**

The idea is to cluster the pharmacies around the train stations by specifying a radius of 1200 meters, roughly 3/4 of a mile, as described in the proposal.

Since there are fewer train stations than pharmacies, using the list of stations as a reference to find suitable venues should yield the desired result, i.e., finding pharmacies near train stations instead of finding train stations near pharmacies.

We will use the Census data to map the county, the Foursquare API to pull venue data, and the NJ Transit data to map the clusters of pharmacies around the stations.
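As a rough illustration of the radius test described above, the sketch below computes great-circle distance with the haversine formula and keeps only the venues within 1200 m of a station. The function names, the dictionary layout, and the two pharmacy entries are hypothetical and exist only for demonstration; the station coordinates are those of MOUNT ARLINGTON in stops.txt.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def pharmacies_near(station, pharmacies, radius_m=1200):
    """Return the pharmacies that lie within radius_m of the station."""
    return [
        p for p in pharmacies
        if haversine_m(station["lat"], station["lon"], p["lat"], p["lon"]) <= radius_m
    ]


# Station coordinates taken from stops.txt (MOUNT ARLINGTON).
station = {"lat": 40.896590, "lon": -74.632731}

# Hypothetical venues, offset north of the station for illustration only.
pharmacies = [
    {"name": "hypothetical near", "lat": 40.901590, "lon": -74.632731},
    {"name": "hypothetical far", "lat": 40.916590, "lon": -74.632731},
]

nearby = pharmacies_near(station, pharmacies)
print([p["name"] for p in nearby])
```

Looping over stations rather than pharmacies keeps the number of outer iterations small, which matches the reasoning above about using the shorter list as the reference.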


# Train Station Data

**The following demonstrates importing and cleaning the train station data, downloaded as a .txt file from the NJ Transit Developer Tools portal.**

In [1]:
import pandas as pd

# stops.txt is a comma-separated GTFS file, so read_csv can parse it directly
dfstops = pd.read_csv("https://raw.githubusercontent.com/cwkruppo/Coursera_Capstone/master/stops.txt", header=0)
dfstops.head()

|  | stop_id | stop_code | stop_name | stop_desc | stop_lat | stop_lon | zone_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 95001 | 30TH ST. PHL. |  | 39.956565 | -75.182327 | 5961 |
| 1 | 2 | 95002 | ABSECON |  | 39.424333 | -74.502094 | 333 |
| 2 | 3 | 95003 | ALLENDALE |  | 41.030902 | -74.130957 | 2893 |
| 3 | 4 | 95004 | ALLENHURST |  | 40.237659 | -74.006769 | 5453 |
| 4 | 5 | 95005 | ANDERSON STREET |  | 40.894458 | -74.043781 | 1357 |


In [2]:
dfstops.columns

Index(['stop_id', 'stop_code', 'stop_name', 'stop_desc', 'stop_lat',
       'stop_lon', 'zone_id'],
      dtype='object')

In [3]:
dfstops.shape

(228, 7)

In [4]:
## Drop unwanted columns, keeping only the unique stop code, the stop name, and the stop's lat & long
dfstops.drop(['stop_id', 'stop_desc', 'zone_id'], axis=1, inplace=True)
dfstops.columns

Index(['stop_code', 'stop_name', 'stop_lat', 'stop_lon'], dtype='object')

In [5]:
dfstops.shape

(228, 4)

In [6]:
dfstops.head()

|  | stop_code | stop_name | stop_lat | stop_lon |
| --- | --- | --- | --- | --- |
| 0 | 95001 | 30TH ST. PHL. | 39.956565 | -75.182327 |
| 1 | 95002 | ABSECON | 39.424333 | -74.502094 |
| 2 | 95003 | ALLENDALE | 41.030902 | -74.130957 |
| 3 | 95004 | ALLENHURST | 40.237659 | -74.006769 |
| 4 | 95005 | ANDERSON STREET | 40.894458 | -74.043781 |


In [10]:
## Drop unwanted rows: cut the light rail stops, since our focus is on main railway stations only (and light rail service doesn't operate in Morris County)
# Keep only the rows whose stop_name does not contain the substring 'LIGHT RAIL'
dfstops = dfstops[~dfstops['stop_name'].str.contains('LIGHT RAIL')]
dfstops.shape

(168, 4)

In [8]:
dfstops

|  | stop_code | stop_name | stop_lat | stop_lon |
| --- | --- | --- | --- | --- |
| 0 | 95001 | 30TH ST. PHL. | 39.956565 | -75.182327 |
| 1 | 95002 | ABSECON | 39.424333 | -74.502094 |
| 2 | 95003 | ALLENDALE | 41.030902 | -74.130957 |
| 3 | 95004 | ALLENHURST | 40.237659 | -74.006769 |
| 4 | 95005 | ANDERSON STREET | 40.894458 | -74.043781 |
| ... | ... | ... | ... | ... |
| 213 | 30824 | BERGENLINE AVE | 40.782225 | -74.022271 |
| 221 | 95171 | MOUNT ARLINGTON | 40.896590 | -74.632731 |
| 222 | 95172 | WAYNE/ROUTE 23 TRANSIT CENTER [RR] | 40.900254 | -74.256971 |
| 226 | 95183 | PENNSAUKEN TRANSIT CENTER | 39.977769 | -75.061796 |


# Foursquare Pharmacy Data

**The following demonstrates importing and cleaning the pharmacy data requested through the Foursquare API: specifically, Walgreens, CVS, and Rite Aid pharmacies within 3/4 of a mile of an NJ Transit rail station.**
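A minimal sketch of how one such venue search request might be assembled is shown here. It assumes the legacy Foursquare v2 `venues/search` endpoint and its `ll`, `query`, `radius`, `limit`, and `v` parameters; the function name, the version date, and the credential placeholders are hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 venue search endpoint.
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lon, query, client_id, client_secret,
                     radius_m=1200, limit=50, version="20210201"):
    """Build a venue search URL for one station and one pharmacy chain."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (assumed value)
        "ll": f"{lat},{lon}",            # station latitude,longitude
        "query": query,                  # chain name to search for
        "radius": radius_m,              # search radius in meters
        "limit": limit,                  # maximum venues to return
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# One URL per chain for a single station (MOUNT ARLINGTON from stops.txt).
chains = ["Walgreens", "CVS", "Rite Aid"]
urls = [
    build_search_url(40.896590, -74.632731, chain, "YOUR_ID", "YOUR_SECRET")
    for chain in chains
]
print(urls[0])
```

Each URL could then be fetched with an HTTP client and the JSON response flattened into a DataFrame, repeating the call for every station in `dfstops`.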

# Mapping Data


**At this point we have the train station names, their lat & long, as well as the original unique stop_code by which to identify them.**

From here, we can use this data as reference points against Foursquare API data, allowing comparison between the two datasets.

**The next steps are to narrow the stations and pharmacies by county, cluster the pharmacies by their distance from those stations, and finally plot the results on a map.**
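The county-narrowing step could be sketched as a name match between the station list and the Wikipedia category list. The helper names below are hypothetical, the three stops are rows shown in the output above, and the one-entry name list is a stand-in for the full set of Morris County stations. Plain Python is used so the example has no dependencies; the same idea maps onto filtering `dfstops` by `stop_name`.

```python
def normalize(name):
    """Upper-case and collapse whitespace so names from both sources compare equal."""
    return " ".join(name.upper().split())


def filter_by_names(stops, wanted_names):
    """Keep the stops whose normalized name appears in wanted_names."""
    wanted = {normalize(n) for n in wanted_names}
    return [s for s in stops if normalize(s["stop_name"]) in wanted]


# A few rows from the cleaned station data.
stops = [
    {"stop_code": 95002, "stop_name": "ABSECON"},
    {"stop_code": 95003, "stop_name": "ALLENDALE"},
    {"stop_code": 95171, "stop_name": "MOUNT ARLINGTON"},
]

# Stand-in for the Wikipedia list of Morris County stations (one entry only).
county_names = ["Mount  Arlington"]

morris_stops = filter_by_names(stops, county_names)
print(morris_stops)
```

Exact matching after normalization is a deliberate simplification; names that differ between the two sources would need a manual mapping or a fuzzier comparison.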

-CK Feb 2021

![Vaccine vials](https://c.files.bbci.co.uk/53A9/production/_115371412_gettyimages-1265248637.jpg)