## Capstone Project - The Battle of Neighborhoods (Week 2)
## By Chase K.
## Full Report

**This notebook contains the individual intro/business problem and data requirements sections from Week 1.**

*The content of this project is hypothetical, and while it addresses a real issue, the premise is fictional and is a work of academic demonstration and not necessarily a real-world solution.*

*Data source required disclosure for <a href="https://www.njtransit.com/developer-tools" target="_blank">NJ Transit API</a>* - **"Data provided by NJ TRANSIT, which is the sole owner of the Data."**

***

**Brief**

*A full report consisting of all of the following components:*

* **Introduction** where you discuss the business problem and who would be interested in this project.
* **Data** where you describe the data that will be used to solve the problem and the source of the data.
* **Methodology** section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learning methods were used and why.
* **Results** section where you discuss the results.
* **Discussion** section where you discuss any observations you noted and any recommendations you can make based on the results.
* **Conclusion** section where you conclude the report.
***

# Introduction

As of February 2021, the global coronavirus pandemic continues to cause major disruptions to daily life and to sicken and kill many people.
To combat the virus, scientists around the world have been developing vaccines to prevent new infections.

At this time, two vaccines are authorized for emergency use in the United States. They are distributed by the federal government to the states and administered by county governments, hospital networks, and, more recently, retail pharmacy chains.

# The Business Problem

Vaccine distribution logistics have been hampered by miscommunication, a slow rollout of each eligibility phase, and difficulty bringing vaccination sites online in ways that are accessible to the people who need them most. For those who rely on public transit, just getting to a vaccination site can be difficult or impossible.

**A *hypothetical* request for proposals has been issued by the <a href="https://morriscountynj.gov/" target="_blank">County of Morris</a> in the State of New Jersey, asking for bids to identify the most efficient choice of retail pharmacy locations, accessible by public transit, at which to administer the vaccine.**

For this exercise we will consider public transit options through the state agency, <a href="https://www.njtransit.com/" target="_blank">NJ Transit</a>. This agency operates trains, light rail, buses, and an ADA paratransit program known as <a href="https://www.njtransit.com/accessibility/access-link-ada-paratransit" target="_blank">Access Link</a>.

Access Link only picks up or drops off within 3/4 of a mile of a bus or light rail station, so for the purposes of this project we will **LIMIT** our results to locations within 3/4 of a mile of any bus, train, or light rail station.
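For reference, the 3/4-mile limit works out to about 1,207 meters, so the 1,200 m search radius used later in this report is a slight, conservative rounding down. A quick check in Python, the notebook's own language:

```python
# One international mile is defined as exactly 1,609.344 meters.
METERS_PER_MILE = 1609.344


def miles_to_meters(miles: float) -> float:
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE


# Access Link's 3/4-mile limit, expressed in meters
access_link_limit_m = miles_to_meters(0.75)
print(round(access_link_limit_m, 3))  # 1207.008
```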

Train stations have fixed street addresses, while bus stops may be roadside with no real street address. Train stations are also listed as physical places on mapping apps like Google Maps and Foursquare, whereas bus stops usually have no such profiles and are often marked only by a street sign or a shelter.

*Because the bus stop data set is much larger and varies more in service, this proposal focuses on rail station data only. If the proposal is accepted, we will expand the report to include bus stops by geocoding each stop or approximating it to a street address.*


Our firm has been retained to explore the issue, to collect retail pharmacy location data from Foursquare's API, and to use train station data from Foursquare while also considering data from <a href="https://www.njtransit.com/developer-tools" target="_blank">NJ Transit's own developer tools API</a>. As needed, public information on train stations or pharmacies will be considered, with preference for live rather than static data.

For practical purposes, we will only consider stand-alone Walgreens, CVS, and Rite-Aid pharmacies, and disregard in-store pharmacies such as those inside Walmart and ShopRite stores.
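Deciding whether a Foursquare venue is a stand-alone target-chain pharmacy largely comes down to matching on the venue name. The sketch below is one hypothetical way to do that: the helper names and the `EXCLUDED_HOSTS` list are assumptions for illustration, not part of the original analysis, and name matching alone can miss edge cases (for example, a chain pharmacy counter inside another retailer), so results should be spot-checked.

```python
import re

# Chains in scope for this proposal (matched on normalized venue names).
TARGET_CHAINS = ("walgreens", "cvs", "rite aid")

# Hypothetical list of host retailers whose in-store pharmacies are excluded.
EXCLUDED_HOSTS = ("walmart", "shoprite")


def normalize(name: str) -> str:
    """Lower-case a venue name and collapse punctuation/hyphens to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def is_standalone_target(name: str) -> bool:
    """Return True if the name looks like a stand-alone Walgreens, CVS, or Rite Aid."""
    n = normalize(name)
    if any(host in n for host in EXCLUDED_HOSTS):
        return False
    # Whole-word match so "cvs" does not match inside a longer word.
    return any(re.search(rf"\b{chain}\b", n) for chain in TARGET_CHAINS)


venues = ["CVS/pharmacy", "Walgreens", "Rite Aid", "ShopRite Pharmacy", "Walmart Pharmacy"]
print([v for v in venues if is_standalone_target(v)])
# ['CVS/pharmacy', 'Walgreens', 'Rite Aid']
```

Normalizing first means spellings such as "Rite-Aid" and "Rite Aid" are treated the same.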

**In summary, the problem is to find retail pharmacies within 3/4 of a mile of public transit that can serve as vaccination sites.**

The Foursquare API has location data for both pharmacies and train stations, and we will also attempt to pull data from the NJ Transit API and from sources like Wikipedia to verify or enhance that data. One possible extension would be to match pharmacy operating hours with nearby public transit schedules; the granularity of bus schedules would likely require the NJ Transit API.

**The stakeholders are the county/state governments requesting the proposal, and ultimately the general public, who will be able to use the information to find a vaccination site near them that is accessible by public transit.**

# Data Introduction

**To restate, the problem is to find retail pharmacies within 3/4 of a mile of public transit that can serve as vaccination sites.**

The Foursquare API has location data for both pharmacies and train stations, and we will also attempt to pull data from the state agency's <a href="https://www.njtransit.com/developer-tools" target="_blank">NJ Transit API</a> and from sources like Wikipedia to verify or enhance that data. One possible extension would be to match pharmacy operating hours with nearby public transit schedules; the granularity of bus schedules would likely require the NJ Transit API.

**The stakeholders are the county/state governments requesting the *hypothetical* proposal, and ultimately the general public, who will be able to use the information to find a vaccination site near them that is accessible by public transit.**

# Data Requirements

1. **NJ Transit train stations**
    * Data source: <a href="https://www.njtransit.com/developer-tools" target="_blank">NJ Transit API</a> - <a href="https://raw.githubusercontent.com/cwkruppo/Coursera_Capstone/master/stops.txt" target="_blank">stops.txt</a> - General Transit Feed Specification (GTFS)
    * Description: A comma-delimited .txt file (the GTFS `stops.txt` feed) that can be loaded into a pandas DataFrame. It identifies each train station by name, location coordinates, and a unique stop code, which will be used to pick out the stations within the target county.
    
2. **Wikipedia List of Railway Stations in Morris County, NJ**
    * Data source: <a href="https://en.wikipedia.org/wiki/Category:Railway_stations_in_Morris_County,_New_Jersey" target="_blank">Wikipedia</a>
    * Description: A list of railway station stops in Morris County, NJ. This source can be formatted to be compared to the complete list of station data.    
    
3. **US Census County Data - FIPS codes**
    * Data source: <a href="https://www2.census.gov/geo/docs/reference/codes/files/st34_nj_cousub.txt" target="_blank">US Census website</a> - State = 34 County = 027
    * Description: A list of New Jersey county subdivisions (municipalities) with their FIPS codes. Filtering on state code 34 and county code 027 yields the municipalities of Morris County, which can be used to map the county.
    
4. **Foursquare API**
    * Data source: <a href="https://developer.foursquare.com/" target="_blank">Developer Tools portal for Foursquare API</a>
    * Description: A series of venue search calls to the API to find the target pharmacies (Walgreens, CVS, Rite-Aid) within a set radius of each station: 3/4 mi, or about 1,200 m.
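The Census file is filtered by FIPS code to keep only Morris County (state 34, county 027). The sketch below assumes the file is comma-delimited with no header row and columns in the order `STATE, STATEFP, COUNTYFP, COUNTYNAME, COUSUBFP, COUSUBNAME, FUNCSTAT`; that layout should be verified against the actual download. Codes are compared as strings so the leading zero in `027` is not lost, a common pitfall when such files are parsed as integers. The function name is hypothetical and the sample rows are made-up placeholders, not real Census records.

```python
import csv
import io

# Assumed column order for the Census county-subdivision file (verify first).
COLUMNS = ["STATE", "STATEFP", "COUNTYFP", "COUNTYNAME",
           "COUSUBFP", "COUSUBNAME", "FUNCSTAT"]


def county_subdivisions(text: str, state_fp: str = "34", county_fp: str = "027"):
    """Return rows whose state and county FIPS codes match, compared as strings."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    return [row for row in reader
            if row["STATEFP"] == state_fp and row["COUNTYFP"] == county_fp]


# Placeholder rows in the assumed layout (names and codes are invented).
sample = ("NJ,34,027,Morris County,00001,Example A township,A\n"
          "NJ,34,013,Essex County,00002,Example B city,A\n")
rows = county_subdivisions(sample)
print([r["COUSUBNAME"] for r in rows])  # ['Example A township']
```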

**Approach**

The idea is to cluster the pharmacies around the train stations by specifying a radius of 1,200 meters, roughly 3/4 of a mile, as described in the proposal.

Since there are fewer train stations than pharmacies, using the list of stations as reference points to search for venues should yield the desired result; that is, we find pharmacies near train stations instead of train stations near pharmacies.

We will use the Census data to map the county, the Foursquare API to pull venue data, and the NJ Transit data to map the clusters of pharmacies around the stations.
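The station-centered search described above can be sketched with a haversine (great-circle) distance check. This is a minimal illustration under stated assumptions: the function names are hypothetical, the Earth is treated as a sphere of mean radius, and the pharmacy coordinates in the example are made up (the station coordinates are MOUNT ARLINGTON from `stops.txt`). The Foursquare search radius is meant to limit results already, but a local check like this can verify returned venues or reuse pharmacy data that has already been downloaded.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # approximate mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pharmacies_near_stations(stations, pharmacies, radius_m=1200):
    """Map each station's stop_code to the pharmacies within radius_m meters.

    stations:   iterable of (stop_code, lat, lon)
    pharmacies: iterable of (name, lat, lon)
    """
    pharmacies = list(pharmacies)
    result = {}
    for code, s_lat, s_lon in stations:
        result[code] = [name for name, p_lat, p_lon in pharmacies
                        if haversine_m(s_lat, s_lon, p_lat, p_lon) <= radius_m]
    return result


stations = [("95171", 40.896590, -74.632731)]  # MOUNT ARLINGTON, from stops.txt
pharmacies = [("Pharmacy A", 40.901590, -74.632731),   # hypothetical, ~556 m north
              ("Pharmacy B", 40.916590, -74.632731)]   # hypothetical, ~2.2 km north
print(pharmacies_near_stations(stations, pharmacies))  # {'95171': ['Pharmacy A']}
```

Because each station is searched independently, a pharmacy within 1,200 m of two stations appears in both lists; assigning each pharmacy to its nearest station instead would produce non-overlapping clusters.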


# Train Station Data

**The following demonstrates the import and cleaning of the train station data, a .txt file downloaded from the NJ Transit Developer Tools portal.**

In [1]:
import pandas as pd
# stops.txt is the GTFS stops file: comma-delimited with a header row
dfstops = pd.read_csv("https://raw.githubusercontent.com/cwkruppo/Coursera_Capstone/master/stops.txt")
dfstops.head()

|   | stop_id | stop_code | stop_name | stop_desc | stop_lat | stop_lon | zone_id |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 95001 | 30TH ST. PHL. |  | 39.956565 | -75.182327 | 5961 |
| 1 | 2 | 95002 | ABSECON |  | 39.424333 | -74.502094 | 333 |
| 2 | 3 | 95003 | ALLENDALE |  | 41.030902 | -74.130957 | 2893 |
| 3 | 4 | 95004 | ALLENHURST |  | 40.237659 | -74.006769 | 5453 |
| 4 | 5 | 95005 | ANDERSON STREET |  | 40.894458 | -74.043781 | 1357 |


In [2]:
dfstops.columns

Index(['stop_id', 'stop_code', 'stop_name', 'stop_desc', 'stop_lat',
       'stop_lon', 'zone_id'],
      dtype='object')

In [3]:
dfstops.shape

(228, 7)

In [4]:
## Drop unwanted columns, keeping the unique stop code, the stop name, and the stop's lat & long
dfstops.drop(['stop_id', 'stop_desc', 'zone_id'], axis=1, inplace=True)
dfstops.columns

Index(['stop_code', 'stop_name', 'stop_lat', 'stop_lon'], dtype='object')

In [5]:
dfstops.shape

(228, 4)

In [6]:
dfstops.head()

|   | stop_code | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|
| 0 | 95001 | 30TH ST. PHL. | 39.956565 | -75.182327 |
| 1 | 95002 | ABSECON | 39.424333 | -74.502094 |
| 2 | 95003 | ALLENDALE | 41.030902 | -74.130957 |
| 3 | 95004 | ALLENHURST | 40.237659 | -74.006769 |
| 4 | 95005 | ANDERSON STREET | 40.894458 | -74.043781 |


In [10]:
## Drop unwanted rows: remove the light rail stops, since our focus is on main railway stations only (and light rail service doesn't operate in Morris County)
# Keep only rows whose stop_name does NOT contain the literal string 'LIGHT RAIL'
dfstops = dfstops[~dfstops['stop_name'].str.contains('LIGHT RAIL', regex=False, na=False)]
dfstops.shape

(168, 4)

In [8]:
dfstops

|   | stop_code | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|
| 0 | 95001 | 30TH ST. PHL. | 39.956565 | -75.182327 |
| 1 | 95002 | ABSECON | 39.424333 | -74.502094 |
| 2 | 95003 | ALLENDALE | 41.030902 | -74.130957 |
| 3 | 95004 | ALLENHURST | 40.237659 | -74.006769 |
| 4 | 95005 | ANDERSON STREET | 40.894458 | -74.043781 |
| ... | ... | ... | ... | ... |
| 213 | 30824 | BERGENLINE AVE | 40.782225 | -74.022271 |
| 221 | 95171 | MOUNT ARLINGTON | 40.896590 | -74.632731 |
| 222 | 95172 | WAYNE/ROUTE 23 TRANSIT CENTER [RR] | 40.900254 | -74.256971 |
| 226 | 95183 | PENNSAUKEN TRANSIT CENTER | 39.977769 | -75.061796 |


# Foursquare Pharmacy Data

**The following demonstrates the import and cleaning of the pharmacy data as requested through the Foursquare API.
Specifically, Walgreens, CVS, and Rite-Aid pharmacies within 3/4 of a mile of an NJ Transit rail station.**
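A request for one chain around one station might be built as sketched below. This assumes the Foursquare v2 `venues/search` endpoint with userless authentication (`client_id`, `client_secret`, and a `v` version date) as it was offered in early 2021; Foursquare has since introduced a newer Places API, so the endpoint and parameter names should be checked against current documentation. The function names, the version date, and the credentials are placeholders, and the response shape assumed in `fetch_venues` (`response.venues`) should likewise be verified. One call per chain name per station keeps each query narrow.

```python
import json
import urllib.parse
import urllib.request

# Assumed Foursquare v2 venue search endpoint (verify against current docs).
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lon, query,
                     radius_m=1200, limit=50, version="20210201"):
    """Build a venue search URL for one chain name around one station."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, YYYYMMDD (placeholder)
        "ll": f"{lat},{lon}",    # search center: the station coordinates
        "query": query,          # e.g. "Walgreens", "CVS", "Rite Aid"
        "radius": radius_m,      # meters; about 3/4 of a mile
        "limit": limit,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_venues(url):
    """Call the API and return the venue list (needs valid credentials and network)."""
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["response"]["venues"]


url = build_search_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                       40.896590, -74.632731, "Rite Aid")
print(url)
```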

# Mapping data


**At this point we have the train station names, their lat & long, as well as the original unique stop_code by which to identify them.**

From here, we can use this data as reference points against Foursquare API data, allowing comparison between the two datasets.

**The next steps are to narrow the station and pharmacy data to Morris County, cluster the pharmacies by their distance from those stations, and finally plot the results on a map.**
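One way to narrow the NJ Transit stations to Morris County is to match their names against the Wikipedia category list. The sketch below assumes Wikipedia titles take forms such as "Convent Station station" or "Dover station (NJ Transit)", so it drops parenthetical qualifiers and one trailing "station" from the Wikipedia side only. The function names are hypothetical, the titles in the example are illustrative, and any station that fails to match should be reviewed by hand.

```python
import re


def normalize_station(name: str, strip_suffix: bool = False) -> str:
    """Normalize a station name for cross-source matching.

    Lower-cases, removes parenthetical qualifiers, collapses punctuation to
    single spaces, and optionally drops one trailing "station".
    """
    n = name.lower()
    n = re.sub(r"\(.*?\)", " ", n)             # drop e.g. "(NJ Transit)"
    n = re.sub(r"[^a-z0-9]+", " ", n).strip()
    if strip_suffix and n.endswith(" station"):
        n = n[: -len(" station")]
    return n


def stations_in_county(njt_names, wiki_titles):
    """Return the NJ Transit station names that also appear in the Wikipedia list."""
    wiki = {normalize_station(t, strip_suffix=True) for t in wiki_titles}
    return [s for s in njt_names if normalize_station(s) in wiki]


njt = ["MOUNT ARLINGTON", "CONVENT STATION", "ALLENDALE"]
wiki = ["Mount Arlington station", "Convent Station station", "Dover station (NJ Transit)"]
print(stations_in_county(njt, wiki))  # ['MOUNT ARLINGTON', 'CONVENT STATION']
```

Stripping the suffix only on the Wikipedia side keeps station names that genuinely contain the word "station", such as CONVENT STATION, intact.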

# Methodology

This section represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learning methods were used and why.

# Results

This section discusses the results.

# Discussion

The section where you discuss any observations you noted and any recommendations you can make based on the results.

# Conclusion

Section where you conclude the report.

In summary, our firm was able to produce {# clusters} mapped clusters of {# pharmacies} retail pharmacy locations within the 3/4-mile Access Link service distance of {#} public train stations in Morris County, New Jersey, United States. This represents {%} of stations in Morris County.

The data and resulting visual representations show that {there is/isn't sufficient access to coronavirus vaccine sites}.

# Future directions

Based on our conclusions, we recommend the following changes to improve access to vaccination sites by public transit and to expand vaccination efforts to account for those who want a vaccine but cannot get to a site from where they live.

* **Extending the limit of Access Link ride coverage from the existing 3/4 mile limit - OR -**
* **Waiving the 3/4 mile Access Link ride limit for the purpose of vaccination**
* **Coordinating with ride share apps to waive fares to and from vaccination sites**
* **Offering in-home vaccination for homebound senior citizens by appointment**
* **Expanding vaccination sites to include other publicly accessible spaces such as libraries, schools, and municipal offices**
* **Expanding vaccination sites to include population centers that do not have direct public transit access by train, light rail, or bus**

Finally, if the proposal is approved, the same approach can be expanded to include bus stops and, further, the entire state, covering pharmacies in close proximity to all train, bus, and light rail stops.

![Vaccine vials](https://cloudfront-us-east-1.images.arcpublishing.com/advancelocal/4ZJWSPGZ55GBROZG7KSQA7BWWM.jpg)

-CK Feb 2021