# Objective

Help the nonprofit organization YoLocal Snack find three potential locations to open up shop. Our goal is to find the stations with the highest entries and exits during meal hours. To cater to our target market of New Yorkers with long commutes, we will establish filters that indicate long commutes.

Long Commute Indicators:

1. Boroughs Outside of the City
2. Stations with only one or two subway lines
3. Number of unlimited and student MetroCards used

After filtering and identifying potential stations, we can hand-check these stations by opening Google Maps to see how many local food stores are near each station. Google's activity tracker can also reveal whether traffic within these stores is higher during meal hours. In the future, YoLocal Snack will work with these vendors to efficiently cater to the local commuters.


# Gathering Data

I will use MTA data from January 2021 to April 2021 as the basis of my analysis. This is a good time frame for observing New York's commuter cycle: students go back to school in January and workers return after the major holidays. Additionally, the turnstile data has reset, so it's possible to detect where anomalies begin and decide what to do with them. The total ridership from January 1, 2021 to April 23, 2021 is [171,715,108](https://new.mta.info/document/20441).

To check consistency, I gathered January to April data from previous years to compare with traffic in 2021. If stations remain consistently busy during meal hours across the last three years, they are strong candidates for a YoLocal Snack store.

Datasets stored: 

- MTA Location Data
- MTA Turnstile Data January to April 2019 - 2021
- MTA Fare Data January to April 2019 - 2021

In [105]:
from sqlalchemy import create_engine
import urllib.request
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import datetime 
from datetime import timedelta
%config InlineBackend.figure_format = 'svg'
%matplotlib inline 

turnstile_url = "http://web.mta.info/developers/data/nyct/turnstile/turnstile_{}.txt"
daily_total_riders_url = "https://new.mta.info/document/20441"
fare_url = "http://web.mta.info/developers/data/nyct/fares/fares_{}.csv"
location_url = "https://atisdata.s3.amazonaws.com/Station/Stations.csv"

In [3]:
pd.set_option('display.max_colwidth', None)

```
def get_serial_date(start_date, end_date, month):
    week_nums = []
    date = datetime.date(*start_date)
    end_date = datetime.date(*end_date)
    delta = timedelta(weeks = 1)
    while date <= end_date:
        date_month = date.month
        if date_month in month:
            week_nums.append(date.strftime("%y%m%d"))
        date += delta
    return week_nums
```
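
One way the helper above might be used is to build the weekly turnstile file URLs. The sketch below assumes the weekly files are named after a Saturday date, so the start and end dates (January 2 and April 24, 2021) are chosen as Saturdays; those dates and the variable names `weeks_2021` and `turnstile_urls` are illustrative assumptions rather than values taken from the notebook.

```python
import datetime
from datetime import timedelta

turnstile_url = "http://web.mta.info/developers/data/nyct/turnstile/turnstile_{}.txt"

def get_serial_date(start_date, end_date, month):
    """Return weekly yymmdd strings between two (year, month, day) tuples."""
    week_nums = []
    date = datetime.date(*start_date)
    end_date = datetime.date(*end_date)
    while date <= end_date:
        # Keep only weeks whose date falls in one of the requested months.
        if date.month in month:
            week_nums.append(date.strftime("%y%m%d"))
        date += timedelta(weeks=1)
    return week_nums

# Assumed start and end dates: both are Saturdays.
weeks_2021 = get_serial_date((2021, 1, 2), (2021, 4, 24), [1, 2, 3, 4])

# One URL per weekly file.
turnstile_urls = [turnstile_url.format(week) for week in weeks_2021]
```

Each URL could then be passed to `pd.read_csv` and the weekly frames concatenated before being written to the database.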

In [114]:
daily_total_df = pd.read_csv(daily_total_riders_url)
daily_total_df.rename(columns = {'Date': 'DATE', 'Subways: Total Estimated Ridership':'SUBWAY_TOTAL'}, inplace = True)
daily_total_check_df = daily_total_df.loc[:, ['DATE','SUBWAY_TOTAL']]
daily_total_check_df['YEAR'] = daily_total_check_df['DATE'].str.extract(r'\b(\d+)$')

Unnamed: 0,DATE,SUBWAY_TOTAL,Subways: % Change From Pre-Pandemic Equivalent Day,Buses: Total Estimated Ridership,Buses: % Change From Pre-Pandemic Equivalent Day,LIRR: Total Estimated Ridership,LIRR: % Change From 2019 Monthly Weekday/Saturday/Sunday Average,Metro-North: Total Estimated Ridership,Metro-North: % Change From 2019 Monthly Weekday/Saturday/Sunday Average,Access-A-Ride: Total Scheduled Trips,Access-A-Ride: % Change From Pre-Pandemic Equivalent Day,Bridges and Tunnels: Total Traffic,Bridges and Tunnels: % Change From Pre-Pandemic Equivalent Day,Unnamed: 13
0,4/27/2021,2042686,-64.5%,1136003,-50.4%,89000.0,-71%,69300.0,-76%,22603,-23.7%,839788,-9.6%,
1,4/26/2021,1957949,-64.8%,1136590,-49.4%,89300.0,-71%,68600.0,-76%,21083,-22.6%,826605,-8.7%,
2,4/25/2021,984480,-59.3%,538059,-45.8%,34100.0,-63%,37100.0,-64%,9777,-44.0%,695863,-16.9%,
3,4/24/2021,1519427,-53.5%,806098,-39.7%,51600.0,-55%,78900.0,-47%,13351,-18.6%,854964,-7.3%,
4,4/23/2021,2124496,-57.9%,1178025,-32.6%,92800.0,-70%,76100.0,-73%,22035,-13.2%,920588,2.0%,


In [4]:
engine = create_engine("sqlite:///Data/mta.db")
turnstile_df_21 = pd.read_sql("SELECT * FROM turnstile_data WHERE DATE LIKE '%2021';", engine)

In [5]:
turnstile_df_21.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3377520 entries, 0 to 3377519
Data columns (total 11 columns):
 #   Column                                                                Dtype 
---  ------                                                                ----- 
 0   C/A                                                                   object
 1   UNIT                                                                  object
 2   SCP                                                                   object
 3   STATION                                                               object
 4   LINENAME                                                              object
 5   DIVISION                                                              object
 6   DATE                                                                  object
 7   TIME                                                                  object
 8   DESC                                                          

In [6]:
turnstile_df_21.columns = turnstile_df_21.columns.str.replace(' ','')

In [7]:
mta_dfs = [turnstile_df_21]
#mta_dfs = [turnstile_df_19, turnstile_df_20, turnstile_df_21]

for mta_df in mta_dfs:
    
    mta_df['DATETIME'] = pd.to_datetime(mta_df.DATE + " " + mta_df.TIME, 
                                        format="%m/%d/%Y %H:%M:%S")
    
    mta_df['TURNSTILES'] = mta_df['C/A'] + " - " +\
                           mta_df['UNIT'] + " - " +\
                           mta_df['SCP'] + " - " +\
                           mta_df['STATION'] 

In [8]:
turnstile_df_21 = turnstile_df_21[['TURNSTILES', 'C/A', 'UNIT', 'SCP', 'STATION', 'LINENAME', 'DATETIME', 'DATE', 'TIME',
                   'ENTRIES', 'EXITS']].copy()  # explicit copy avoids SettingWithCopyWarning on later assignments

In [14]:
turnstile_df_21['ENTRIES'] = turnstile_df_21['ENTRIES'].astype('int')
turnstile_df_21['EXITS'] = turnstile_df_21['EXITS'].astype('int')


In [11]:
turnstile_df_21.describe()


Unnamed: 0,TURNSTILES,C/A,UNIT,SCP,STATION,LINENAME,DATETIME,DATE,TIME,ENTRIES,EXITS
count,3377520,3377520,3377520,3377520,3377520,3377520.0,3377520,3377520,3377520,3377520.0,3377520.0
unique,5056,749,468,226,378,114.0,216474,113,62524,2127757.0,2686004.0
top,N339A - R114 - 00-00-00 - PARSONS BLVD,PTH22,R549,00-00-00,34 ST-PENN STA,1.0,2021-04-05 08:00:00,01/01/2021,04:00:00,0.0,0.0
freq,810,29007,46309,313237,69058,413514.0,2562,30696,239910,47280.0,14505.0
first,,,,,,,2021-01-01 00:00:00,,,,
last,,,,,,,2021-04-23 23:59:55,,,,


In [13]:
# Count rows per turnstile and timestamp; a count above 1 signals duplicate readings.
(turnstile_df_21.groupby(['TURNSTILES','DATETIME'])
[['ENTRIES', 'EXITS']].count()
.reset_index()
.sort_values(["ENTRIES", "EXITS"], ascending=False)).head()


Unnamed: 0,TURNSTILES,DATETIME,ENTRIES,EXITS
304390,B028 - R136 - 01-00-01 - SHEEPSHEAD BAY,2021-01-08 04:00:00,2,2
912572,N071 - R013 - 00-00-00 - 34 ST-PENN STA,2021-04-08 08:00:00,2,2
913251,N071 - R013 - 00-00-01 - 34 ST-PENN STA,2021-04-08 08:00:00,2,2
913930,N071 - R013 - 00-00-02 - 34 ST-PENN STA,2021-04-08 08:00:00,2,2
914609,N071 - R013 - 00-00-03 - 34 ST-PENN STA,2021-04-08 08:00:00,2,2


In [17]:
turnstile_df_21['ENTRIES'].describe()

count    3.377520e+06
mean     4.215707e+07
std      2.186629e+08
min      0.000000e+00
25%      2.253830e+05
50%      1.505995e+06
75%      6.173308e+06
max      2.147432e+09
Name: ENTRIES, dtype: float64

In [18]:
turnstile_df_21['EXITS'].describe()

count    3.377520e+06
mean     3.392197e+07
std      1.943887e+08
min      0.000000e+00
25%      9.431400e+04
50%      9.045045e+05
75%      4.055988e+06
max      2.123068e+09
Name: EXITS, dtype: float64

In [96]:
perc_25_entries = turnstile_df_21['ENTRIES'].quantile(.25)
perc_75_entries = turnstile_df_21['ENTRIES'].quantile(.75)

In [97]:
perc_25_exits = turnstile_df_21['EXITS'].quantile(.25)
perc_75_exits = turnstile_df_21['EXITS'].quantile(.75)

# DATA CLEANING Part 1

A quick exploration of the dataset reveals many cleaning tasks. There are a number of duplicate rows, the exits and entries columns contain outliers far above the mean, and the time column holds 62,524 unique values instead of the expected 14. The entries and exits columns also hold cumulative counter values rather than the number of entries or exits during each interval.

The next steps will include:
1. Remove the duplicate values 
2. Locate the outliers and save their indexes. Use the unique identifiers to replace the outlier values with numbers from a previous year if traffic patterns are similar to current.
3. Check the unique time values 
4. Calculate the number of entries and exits  

In [None]:
turnstile_df_21.sort_values(['TURNSTILES','DATETIME'], 
                   ascending = True, inplace = True)
turnstile_df_21.drop_duplicates(subset = ['TURNSTILES', 'DATETIME'], keep = 'first',
                      inplace = True)

In [None]:
turnstile_df_21['TIME'].value_counts().head(40)

This code snaps the more than 60,000 distinct time stamps onto a consistent four-hour grid running from 12 am to 12 am.
```
temp_time = turnstile_df_21['TIME'].reset_index().copy()
# 'first3' holds the first four characters of TIME (e.g. "04:3"), read as a float such as 4.3
temp_time['first3'] = temp_time['TIME'].str[:4]
temp_time['first3'] = temp_time['first3'].str.replace(":", ".", regex=False).astype('float')
temp_time['first3'] = temp_time['first3'].round(0)
time_bin = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0]
# Snap each rounded hour up to the nearest bin edge
temp_time['first3'] = temp_time['first3'].apply(lambda x: time_bin[np.digitize(x,time_bin, right = True)])
# Note: 24.0 maps back to '00:00:00' while DATE stays unchanged
time_dict = {'0.0': '00:00:00' , '4.0': '04:00:00' , '8.0': '08:00:00' , '12.0': '12:00:00', 
             '16.0': '16:00:00', '20.0': '20:00:00', '24.0': '00:00:00' }    
# Translate the bin edge into an HH:MM:SS label and write it back positionally
temp_time['TIME'] = temp_time['first3'].astype('str').map(time_dict)
turnstile_df_21['TIME'] = temp_time['TIME'].values

```
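
An alternative worth considering is to bin on the full `DATETIME` value instead of the `TIME` string, so that a reading late in the evening rolls over to midnight of the next day. The sketch below uses toy data and assumes that always rounding up to the next four-hour boundary is acceptable; this differs slightly from the cell above, which first rounds to the nearest hour. The column name `DATETIME_BIN` is hypothetical.

```python
import pandas as pd

# Toy readings; the column name mirrors the notebook's DATETIME column.
toy = pd.DataFrame({
    "DATETIME": pd.to_datetime([
        "2021-01-01 03:59:55",
        "2021-01-01 04:00:00",
        "2021-01-01 09:12:30",
        "2021-01-01 22:15:00",
    ])
})

# Round every reading up to the next four-hour boundary.
toy["DATETIME_BIN"] = toy["DATETIME"].dt.ceil("4h")

# Rebuild DATE and TIME strings in the notebook's formats from the binned value.
toy["DATE"] = toy["DATETIME_BIN"].dt.strftime("%m/%d/%Y")
toy["TIME"] = toy["DATETIME_BIN"].dt.strftime("%H:%M:%S")
```

Because the date is derived from the binned timestamp, the 22:15 reading lands on 00:00:00 of January 2 rather than on midnight of January 1.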

In [342]:
turnstile_df_21.drop('DATETIME',axis = 1, inplace = True)
turnstile_df_21['DATETIME'] = pd.to_datetime(turnstile_df_21.DATE + " " + turnstile_df_21.TIME, 
                                        format="%m/%d/%Y %H:%M:%S")



In [None]:
# Previous reading per turnstile; rows must already be sorted by TURNSTILES and DATETIME.
turnstile_df_21[["PREV_DATE", "PREV_ENTRIES", "PREV_EXITS"]] = (turnstile_df_21
                                                       .groupby(["TURNSTILES"])[["DATE", "ENTRIES", "EXITS"]]
                                                       .shift(1))
turnstile_df_21.dropna(subset=["PREV_DATE"], axis=0, inplace=True)

In [52]:
exit_mask_0 = (turnstile_df_21['EXITS'] == 0) & (turnstile_df_21['DATE'] > '03/01/2021')
turnstile_df_21[exit_mask_0].shape

(63291, 11)

In [55]:
entry_mask_0 = (turnstile_df_21['ENTRIES'] == 0) & (turnstile_df_21['DATE'] > '03/01/2021')
turnstile_df_21[entry_mask_0].shape

(22046, 11)

In [91]:
turnstile_df_21[entry_mask_0].groupby('STATION')['TURNSTILES'].value_counts()


STATION          TURNSTILES                               
111 ST           N138 - R355 - 01-04-01 - 111 ST              319
14 ST            N513 - R163 - 04-05-01 - 14 ST               316
168 ST           N013 - R035 - 02-05-01 - 168 ST              314
                 N012 - R035 - 01-05-00 - 168 ST               94
175 ST           N011 - R126 - 01-05-01 - 175 ST              317
                                                             ... 
THIRTY ST        PTH13 - R541 - 00-00-00 - THIRTY ST            1
THIRTY THIRD ST  PTH17 - R541 - 01-00-00 - THIRTY THIRD ST    303
UTICA AV         N120A - R153 - 01-05-01 - UTICA AV           328
W 4 ST-WASH SQ   N083 - R138 - 01-05-00 - W 4 ST-WASH SQ        9
W 8 ST-AQUARIUM  G015 - R312 - 01-05-01 - W 8 ST-AQUARIUM      44
Name: TURNSTILES, Length: 82, dtype: int64

In [None]:
#Think about what to do with turnstiles with zero. We can exclude the stations if they are coming from stations with low traffic and if there are a lot of zero entries within March 1st, 2021.
#turnstile_df_21_not_performing = pd.concat(turnstile_df_21[exit_mask_0],entry_mask_0)
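
One hedged way to act on the note above is to measure, per station, what share of readings are zero and review the stations above some cut-off. The sketch uses toy data; the 0.5 cut-off and the names `IS_ZERO`, `ZERO_SHARE`, and `stations_to_review` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "STATION": ["111 ST", "111 ST", "14 ST", "14 ST", "14 ST", "14 ST"],
    "ENTRIES": [0, 0, 0, 120, 180, 260],
    "EXITS":   [0, 15, 40, 90, 130, 200],
})

# A row counts as "zero" when either counter reads zero.
toy["IS_ZERO"] = (toy["ENTRIES"] == 0) | (toy["EXITS"] == 0)

# Share of zero readings per station (the mean of a boolean column).
zero_share = toy.groupby("STATION")["IS_ZERO"].mean().rename("ZERO_SHARE")

# Hypothetical cut-off: flag stations where more than half the readings are zero.
stations_to_review = zero_share[zero_share > 0.5].index.tolist()
```

The flagged stations could then be cross-checked against their overall traffic before deciding whether to exclude them.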

In [177]:
total_traffic_21 = daily_total_check_df.groupby('YEAR')['SUBWAY_TOTAL'].sum()['2021']
print(len(str(total_traffic_21)))

9


In [172]:
total_traffic_21

178219650

In [339]:
#Flag irregular counter readings with 8 or more digits (total 2021 ridership is only 9 digits long). Exclude them if they come from low-traffic stations or replace them with 2020's or 2019's values.
turnstile_df_21['irr_entry']=turnstile_df_21['ENTRIES'].apply(lambda x: len(str(x))>=8) 
turnstile_df_21['irr_exit']=turnstile_df_21['EXITS'].apply(lambda x: len(str(x))>=8) 
irr_entry_df = turnstile_df_21[turnstile_df_21['irr_entry'] == True]
irr_exit_df = turnstile_df_21[turnstile_df_21['irr_exit'] == True]




In [340]:
irr_entry_df.shape

(510763, 16)

In [341]:
irr_exit_df.shape

(364353, 16)

In [344]:
clean_turnstile_df_21 = turnstile_df_21[~(turnstile_df_21['irr_entry'] == True) & 
                                        ~(turnstile_df_21['irr_exit'] == True)]

In [345]:
clean_turnstile_df_21.shape

(2767875, 16)
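
The digit-length filter above can also be written as a numeric comparison, since a non-negative integer has eight or more digits exactly when it is at least 10,000,000. The sketch below checks that equivalence on toy data; the constant name `MAX_PLAUSIBLE_COUNT` is hypothetical.

```python
import pandas as pd

MAX_PLAUSIBLE_COUNT = 10_000_000  # smallest 8-digit number; hypothetical constant name

toy = pd.DataFrame({
    "ENTRIES": [13725, 9_999_999, 10_000_000, 2_147_432_000],
    "EXITS":   [10317, 120, 45, 2_123_068_000],
})

# Vectorized flags: True where the counter reading has 8 or more digits.
irr_entry = toy["ENTRIES"] >= MAX_PLAUSIBLE_COUNT
irr_exit = toy["EXITS"] >= MAX_PLAUSIBLE_COUNT

# Keep rows where neither counter is flagged.
clean_toy = toy[~irr_entry & ~irr_exit]

# The string-length version of the same entry flag, for comparison.
by_length = toy["ENTRIES"].apply(lambda x: len(str(x)) >= 8)
```

The comparison avoids converting millions of values to strings, which may matter on the full 3.4-million-row table.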

# DATA CLEANING Part 2

We reformatted the time stamps to fall into four-hour ranges (0-4, 4-8, 8-12, 12-16, 16-20, 20-24) to make analysis easier. Counter readings with eight or more digits, which approach the scale of total subway ridership for January to April 2021, have been masked and removed.

Before we calculate the entries and exits for a particular point in time, we need to perform a gut check. Ideally, every previous reading is less than or equal to the current one. We want to check for cases where PREV_ENTRIES > ENTRIES or PREV_EXITS > EXITS and then decide how to calculate the entries and exits. We can count these cases per turnstile with a groupby and sort the counts in descending order.

In [351]:
mask = (turnstile_df_21["ENTRIES"] < turnstile_df_21["PREV_ENTRIES"])
turnstile_df_21[mask].groupby(["TURNSTILES"]).size().sort_values(ascending = False).head(20)

TURNSTILES
N063A - R011 - 00-00-08 - 42 ST-PORT AUTH    678
C008 - R099 - 00-00-00 - DEKALB AV           676
N063A - R011 - 00-00-04 - 42 ST-PORT AUTH    676
R401 - R445 - 00-00-00 - 3 AV 138 ST         675
H003 - R163 - 01-00-02 - 6 AV                674
N601 - R319 - 00-00-01 - LEXINGTON AV/63     673
R127 - R105 - 00-00-00 - 14 ST               673
R161B - R452 - 00-00-03 - 72 ST              671
H023 - R236 - 00-00-01 - DEKALB AV           671
N006A - R280 - 00-00-00 - 190 ST             671
N203 - R195 - 00-00-01 - 161/YANKEE STAD     670
N606 - R025 - 00-00-01 - JAMAICA CENTER      668
N506 - R022 - 00-05-03 - 34 ST-HERALD SQ     667
N207 - R104 - 00-00-00 - 167 ST              666
N063A - R011 - 00-00-05 - 42 ST-PORT AUTH    666
H023 - R236 - 00-06-00 - DEKALB AV           666
A011 - R080 - 01-03-00 - 57 ST-7 AV          666
N215 - R237 - 00-00-02 - 182-183 STS         665
R304 - R206 - 00-00-00 - 125 ST              664
A066 - R118 - 00-00-00 - CANAL ST            663
dtype: int64

In [366]:
clean_mask = (clean_turnstile_df_21["ENTRIES"] < clean_turnstile_df_21["PREV_ENTRIES"])
# Use the mask built from the cleaned frame so the boolean index aligns with it.
clean_turnstile_df_21[clean_mask].groupby(["TURNSTILES"]).size().sort_values(ascending = False).head(20)


TURNSTILES
PTH03 - R552 - 00-01-08 - JOURNAL SQUARE     18
PTH03 - R552 - 00-01-01 - JOURNAL SQUARE     10
PTH04 - R551 - 00-04-06 - GROVE STREET        9
PTH22 - R540 - 00-01-04 - PATH NEW WTC        5
N012 - R035 - 01-05-00 - 168 ST               4
PTH03 - R552 - 00-05-03 - JOURNAL SQUARE      4
PTH11 - R545 - 00-00-02 - 14TH STREET         4
H007A - R248 - 02-05-00 - 1 AV                4
R201 - R041 - 00-00-02 - BOWLING GREEN        3
PTH04 - R551 - 00-04-04 - GROVE STREET        3
R240 - R047 - 00-03-00 - GRD CNTRL-42 ST      3
PTH09 - R548 - 00-00-00 - CHRISTOPHER ST      3
B032 - R264 - 00-00-01 - OCEAN PKWY           3
PTH02 - R544 - 00-06-06 - HARRISON            3
PTH07 - R550 - 00-00-00 - CITY / BUS          3
PTH01 - R549 - 00-00-08 - NEWARK HW BMEBE     3
N400A - R359 - 02-00-02 - COURT SQ            3
PTH20 - R549 - 03-00-05 - NEWARK HM HE        2
R203A - R043 - 01-05-00 - WALL ST             2
N083 - R138 - 01-06-00 - W 4 ST-WASH SQ       2
dtype: int64

In [368]:
clean_turnstile_df_21[clean_turnstile_df_21['TURNSTILES'] == 'PTH03 - R552 - 00-01-08 - JOURNAL SQUARE']

Unnamed: 0,TURNSTILES,C/A,UNIT,SCP,STATION,LINENAME,DATE,TIME,ENTRIES,EXITS,PREV_DATE,PREV_ENTRIES,PREV_EXITS,irr_entry,irr_exit,DATETIME
17610,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,01/01/2021,00:00:00,13725,10317,01/01/2021,13723.0,10293.0,False,False,2021-01-01 00:00:00
17606,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,01/01/2021,04:00:00,13708,10265,01/01/2021,13708.0,10253.0,False,False,2021-01-01 04:00:00
17607,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,01/01/2021,12:00:00,13711,10266,01/01/2021,13708.0,10265.0,False,False,2021-01-01 12:00:00
17608,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,01/01/2021,16:00:00,13717,10273,01/01/2021,13711.0,10266.0,False,False,2021-01-01 16:00:00
17609,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,01/01/2021,20:00:00,13723,10293,01/01/2021,13717.0,10273.0,False,False,2021-01-01 20:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3287810,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,04/23/2021,04:00:00,26959,17513,04/23/2021,26955.0,17503.0,False,False,2021-04-23 04:00:00
3287811,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,04/23/2021,08:00:00,27023,17519,04/23/2021,26959.0,17513.0,False,False,2021-04-23 08:00:00
3287812,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,04/23/2021,12:00:00,1,0,04/23/2021,27023.0,17519.0,False,False,2021-04-23 12:00:00
3287813,PTH03 - R552 - 00-01-08 - JOURNAL SQUARE,PTH03,R552,00-01-08,JOURNAL SQUARE,1,04/23/2021,16:00:00,27063,17527,04/23/2021,1.0,0.0,False,False,2021-04-23 16:00:00
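
The last rows above show a counter dropping to 1 and then jumping back, which would produce one negative and one very large difference. One possible way to turn cumulative readings into per-interval counts is sketched below on those four rows. It assumes that negative differences, or differences above a cap, are unreliable and should become NaN rather than be guessed; the function name `interval_count`, the column `ENTRY_COUNT`, and the cap value are hypothetical.

```python
import numpy as np
import pandas as pd

MAX_INTERVAL_COUNT = 20_000  # hypothetical cap on riders per turnstile per four hours

def interval_count(current, previous, cap=MAX_INTERVAL_COUNT):
    """Difference of cumulative readings; NaN when negative or above the cap."""
    diff = current - previous
    return diff.where((diff >= 0) & (diff <= cap), np.nan)

# The four JOURNAL SQUARE readings from 04/23/2021 shown above.
toy = pd.DataFrame({
    "ENTRIES":      [26959, 27023, 1, 27063],
    "PREV_ENTRIES": [26955.0, 26959.0, 27023.0, 1.0],
})
toy["ENTRY_COUNT"] = interval_count(toy["ENTRIES"], toy["PREV_ENTRIES"])
```

Leaving the suspect intervals as NaN keeps them out of sums and means by default, so they can be imputed later from another year if traffic patterns turn out to be similar.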


# DATA ANALYSIS WITH ONLY TURNSTILE DATA

After finding the entry and exit counts, we can add the two to get the total traffic for a particular turnstile at a given time of day.

Questions:

1. Find the top 20 stations with the highest number of exits, entries, and total traffic.
    - Now find the top stations served by only one or two lines with the highest number of exits, entries, and total traffic.
2. Using the results from question one, find the stations with the highest exits, entries, and traffic for the meal-hour time ranges 8 am-12 pm, 12-4 pm, and 4-8 pm.
    
    - Which stations have the most entries around 8 am-12 pm?
    - Which stations have the most exits around 4-8 pm?
 
    
    - Which stations have the most exits around 8 am-12 pm?
    - Which stations have the most entries around 4-8 pm?
    
3. Find the average total of exits, entries, and traffic for each weekday.
    - Do entries equal exits?
    - Is traffic consistent throughout the weekdays?
    - Using total traffic, what percentage of riders pass through particular stations?
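
The first two questions could be approached along the following lines. This is a minimal sketch on toy data, assuming per-interval columns named `ENTRY_COUNT` and `EXIT_COUNT` already exist (hypothetical names), that each character of `LINENAME` stands for one line, and that each `TIME` label marks the end of its four-hour bin.

```python
import pandas as pd

toy = pd.DataFrame({
    "STATION":     ["A ST", "A ST", "B AV", "B AV", "C SQ", "C SQ"],
    "LINENAME":    ["7", "7", "NQRW", "NQRW", "FG", "FG"],
    "TIME":        ["12:00:00", "20:00:00", "12:00:00", "04:00:00", "12:00:00", "16:00:00"],
    "ENTRY_COUNT": [300, 500, 900, 50, 200, 100],
    "EXIT_COUNT":  [200, 400, 800, 40, 150, 50],
})
toy["TRAFFIC"] = toy["ENTRY_COUNT"] + toy["EXIT_COUNT"]

# Question 1: stations ranked by total traffic.
top_stations = toy.groupby("STATION")["TRAFFIC"].sum().sort_values(ascending=False)

# Question 1, sub-item: keep stations served by one or two lines.
few_lines = toy[toy["LINENAME"].astype(str).str.len() <= 2]
top_few_lines = few_lines.groupby("STATION")["TRAFFIC"].sum().sort_values(ascending=False)

# Question 2: restrict to the meal-hour bins (labels assumed to be bin end times).
MEAL_BINS = ["12:00:00", "16:00:00", "20:00:00"]
meal_traffic = (toy[toy["TIME"].isin(MEAL_BINS)]
                .groupby("STATION")["TRAFFIC"].sum()
                .sort_values(ascending=False))
```

On the real data, `.head(20)` on each ranked series would give the top-20 lists the questions ask for.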


# DATA VISUALIZATIONS WITH ONLY TURNSTILE DATA

Plot the answers to the questions to identify insights and potential gaps in the data.

- Bar Chart -> Top 20 stations highest exits, entries, traffic 
- Line chart -> Consistency of Entries and Exits over time for a station (We're looking for consistent traffic)
- Scatter Plot -> Exits versus Entries for a particular station 
- Heatmap -> Traffic flow during the weekday by TIME of a particular station
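
For the heatmap, the data first needs a weekday-by-time layout. The sketch below builds that table for a single station from toy data; the `TRAFFIC` and `WEEKDAY` columns and the `heat` variable are hypothetical names, and averaging across weeks is an assumed choice.

```python
import pandas as pd

# Toy per-interval traffic for one station.
toy = pd.DataFrame({
    "DATETIME": pd.to_datetime([
        "2021-04-05 08:00:00", "2021-04-05 12:00:00",
        "2021-04-06 08:00:00", "2021-04-06 12:00:00",
        "2021-04-12 08:00:00",
    ]),
    "TRAFFIC": [100, 300, 120, 280, 140],
})
toy["WEEKDAY"] = toy["DATETIME"].dt.day_name()
toy["TIME"] = toy["DATETIME"].dt.strftime("%H:%M:%S")

# Rows are weekdays, columns are time bins, cells are mean traffic across weeks.
heat = toy.pivot_table(index="WEEKDAY", columns="TIME", values="TRAFFIC", aggfunc="mean")
```

Passing `heat` to `sns.heatmap` would then render the chart; reindexing the rows in calendar order first would make it easier to read.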

# ADDING FARE AND LOCATION DATASETS

# CONCLUSION

# FUTURE IDEAS