Load in data from turnstile file

### Back Story

An email from a potential client:

> Lara & Alice -
>
> It was great to meet and chat with you at the recent event. We’d love to take some next steps to see if working together would make sense for both parties.
>
> As we mentioned, we are interested in harnessing the power of data and analytics to optimize the effectiveness of our street team work, which is a significant portion of our fundraising efforts.
>
> WomenTechWomenYes (WTWY) has an annual gala at the beginning of the summer. As we are a new and inclusive organization, we try to do double duty with the gala, both to fill our event space with individuals passionate about increasing the participation of women in technology, and to concurrently build awareness and reach.
>
> To this end we place street teams at entrances to subway stations. The street teams collect email addresses and those who sign up are sent free tickets to our gala.
>
> Where we’d like to solicit your engagement is to use MTA subway data, which as I’m sure you know is available freely from the city, to help us optimize the placement of our street teams, such that we can gather the most signatures, ideally from those who will attend the gala and contribute to our cause.
>
> The ball is in your court now—do you think this is something that would be feasible for your group? From there we can explore what kind of an engagement would make sense for all of us.
>
> Best,
>
> Karrine and Dahlia
>
> WTWY International




#### Data:

 * MTA Data (Google it!)
 * Additional data sources welcome!

#### Skills:

 * `python` and `pandas`
 * visualizations via Matplotlib & seaborn

#### Analysis:

 * Exploratory Data Analysis


#### Deliverable/communication:


In [1]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
#import seaborn as sns

%matplotlib inline

# set display options
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 200)

# load data from MTA
df = pd.read_csv('./data/turnstile_180922.txt')

# combine the DATE and TIME strings into a single datetime column
df['TIMING'] = pd.to_datetime(df['DATE'] + ' ' + df['TIME'], format='%m/%d/%Y %H:%M:%S')

In [19]:
df.head()

Unnamed: 0,C/A,UNIT,SCP,STATION,LINENAME,DIVISION,DATE,TIME,DESC,ENTRIES,EXITS,TIMING
0,A002,R051,02-00-00,59 ST,NQR456W,BMT,09/15/2018,00:00:00,REGULAR,6759219,2291425,2018-09-15 00:00:00
1,A002,R051,02-00-00,59 ST,NQR456W,BMT,09/15/2018,04:00:00,REGULAR,6759234,2291429,2018-09-15 04:00:00
2,A002,R051,02-00-00,59 ST,NQR456W,BMT,09/15/2018,08:00:00,REGULAR,6759251,2291453,2018-09-15 08:00:00
3,A002,R051,02-00-00,59 ST,NQR456W,BMT,09/15/2018,12:00:00,REGULAR,6759330,2291532,2018-09-15 12:00:00
4,A002,R051,02-00-00,59 ST,NQR456W,BMT,09/15/2018,16:00:00,REGULAR,6759538,2291574,2018-09-15 16:00:00


In [27]:
df.groupby(['STATION','TIMING']).ENTRIES.size()

STATION          TIMING             
1 AV             2018-09-15 00:00:00    10
                 2018-09-15 04:00:00    10
                 2018-09-15 08:00:00    10
                 2018-09-15 12:00:00    10
                 2018-09-15 16:00:00    10
                 2018-09-15 20:00:00    10
                 2018-09-16 00:00:00    10
                 2018-09-16 04:00:00    10
                 2018-09-16 08:00:00    10
                 2018-09-16 12:00:00    10
                 2018-09-16 16:00:00    10
                 2018-09-16 20:00:00    10
                 2018-09-17 00:00:00    10
                 2018-09-17 04:00:00    10
                 2018-09-17 08:00:00    10
                 2018-09-17 12:00:00    10
                 2018-09-17 16:00:00    10
                 2018-09-17 20:00:00    10
                 2018-09-18 00:00:00    10
                 2018-09-18 04:00:00    10
                 2018-09-18 08:00:00    10
                 2018-09-18 12:00:00    10
                 

In [31]:
# cumulative ENTRIES counter for station '1 AV', ordered by turnstile (SCP) and time
(df[df['STATION'] == '1 AV']
 .sort_values(['SCP', 'TIMING'])
 .ENTRIES
)

30340      13935028
30341      13935146
30342      13935209
30343      13935751
30344      13936521
30345      13937295
30346      13937772
30347      13937895
30348      13937938
30349      13938317
30350      13938951
30351      13939593
30352      13939870
30353      13939898
30354      13940333
30355      13941526
30356      13942407
30357      13943442
30358      13943784
30359      13943784
30360      13944186
30361      13945380
30362      13946379
30363      13947434
30364      13947734
30365      13947734
30366      13948025
30367      13949198
30368      13949973
30369      13950873
30370      13951231
30371      13951231
30372      13951554
30373      13952710
30374      13953674
30375      13954908
30376      13955290
30377      13955290
30378      13955620
30379      13956828
30380      13957859
30381      13959046
30382      59333625
30383      59333800
30384      59333926
30385      59334585
30386      59335577
30387      59336628
30388      59337286
30389      59337468
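
The output above shows that `ENTRIES` is a running counter rather than a per-interval count: values climb steadily, then jump from 13959046 to 59333625 where the sort moves on to the next turnstile. A minimal sketch of turning the counter into per-interval entries follows. It assumes that a turnstile is identified by `C/A`, `UNIT`, `SCP` and `STATION` together. The `ENTRIES` values are copied from the output above, while the `C/A`, `UNIT` and `SCP` labels, the timestamps, and the names `toy`, `turnstile_keys`, `naive_diff` and `ENTRIES_DIFF` are placeholders introduced here for illustration.

```python
import pandas as pd

# ENTRIES values are copied from the '1 AV' output above.
# The C/A, UNIT and SCP labels and the timestamps are placeholders (assumptions).
toy = pd.DataFrame({
    'C/A': ['X000'] * 5,
    'UNIT': ['R000'] * 5,
    'SCP': ['00-00-00', '00-00-00', '00-00-01', '00-00-01', '00-00-01'],
    'STATION': ['1 AV'] * 5,
    'TIMING': pd.to_datetime([
        '2018-09-21 16:00:00', '2018-09-21 20:00:00',
        '2018-09-15 00:00:00', '2018-09-15 04:00:00', '2018-09-15 08:00:00',
    ]),
    'ENTRIES': [13957859, 13959046, 59333625, 59333800, 59333926],
})

# Assumption: these columns together identify one physical turnstile.
turnstile_keys = ['C/A', 'UNIT', 'SCP', 'STATION']
toy = toy.sort_values(turnstile_keys + ['TIMING'])

# Naive difference: crosses the turnstile boundary and produces a bogus jump.
naive_diff = toy['ENTRIES'].diff()

# Per-turnstile difference: the first reading of each turnstile becomes NaN.
toy['ENTRIES_DIFF'] = toy.groupby(turnstile_keys)['ENTRIES'].diff()

print(toy[['SCP', 'TIMING', 'ENTRIES', 'ENTRIES_DIFF']])
```

On the full frame the same `groupby(...).diff()` call could be applied to `df`. Negative or implausibly large differences (for example from counter resets) would still need separate handling, which this sketch does not attempt.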


In [32]:
df[df['DESC']=='RECOVR AUD']

Unnamed: 0,C/A,UNIT,SCP,STATION,LINENAME,DIVISION,DATE,TIME,DESC,ENTRIES,EXITS,TIMING
1142,A010,R080,00-00-00,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,3667922,1734377,2018-09-21 08:00:00
1184,A010,R080,00-00-01,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,14656870,5538653,2018-09-21 08:00:00
1226,A010,R080,00-00-02,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,10676499,4185373,2018-09-21 08:00:00
1268,A010,R080,00-00-03,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,3187014,1575879,2018-09-21 08:00:00
1310,A010,R080,00-00-04,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,3976988,1998272,2018-09-21 08:00:00
1352,A010,R080,00-00-05,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,1041670,297490,2018-09-21 08:00:00
1394,A010,R080,00-00-06,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,18074098,5092966,2018-09-21 08:00:00
1436,A010,R080,00-00-07,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,717797,90046,2018-09-21 08:00:00
1478,A011,R080,01-00-00,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,885875210,490879476,2018-09-21 08:00:00
1520,A011,R080,01-00-01,57 ST-7 AV,NQRW,BMT,09/21/2018,08:00:00,RECOVR AUD,16722369,18155144,2018-09-21 08:00:00
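
As far as the MTA field description goes, `RECOVR AUD` marks a missed audit that was later recovered, so these rows may or may not coincide with a `REGULAR` reading for the same turnstile and timestamp. Before deciding whether to keep or drop them, it can help to count the `DESC` values and check for repeated readings. The sketch below does this on a handful of rows copied from the outputs above (a subset of columns). The names `sample`, `desc_counts`, `reading_keys` and `n_duplicates` are introduced here for illustration, and the choice of key columns is an assumption.

```python
import pandas as pd

# Rows copied from the df.head() and RECOVR AUD outputs above (subset of columns).
sample = pd.DataFrame({
    'C/A': ['A002', 'A002', 'A010', 'A010'],
    'UNIT': ['R051', 'R051', 'R080', 'R080'],
    'SCP': ['02-00-00', '02-00-00', '00-00-00', '00-00-01'],
    'STATION': ['59 ST', '59 ST', '57 ST-7 AV', '57 ST-7 AV'],
    'DESC': ['REGULAR', 'REGULAR', 'RECOVR AUD', 'RECOVR AUD'],
    'ENTRIES': [6759219, 6759234, 3667922, 14656870],
    'TIMING': pd.to_datetime([
        '2018-09-15 00:00:00', '2018-09-15 04:00:00',
        '2018-09-21 08:00:00', '2018-09-21 08:00:00',
    ]),
})

# How many readings of each audit type are present?
desc_counts = sample['DESC'].value_counts()

# Assumption: one reading is identified by turnstile columns plus timestamp.
reading_keys = ['C/A', 'UNIT', 'SCP', 'STATION', 'TIMING']

# Count rows that share a turnstile and timestamp with another row.
n_duplicates = sample.duplicated(subset=reading_keys, keep=False).sum()

print(desc_counts)
print('rows sharing a turnstile and timestamp:', n_duplicates)
```

Running the same two checks on `df` would show whether the `RECOVR AUD` rows replace missing `REGULAR` readings or duplicate existing ones. Only in the second case would de-duplication be needed.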
