# Time-Series Building
Here we take the Uber pickup logs and the station exit data, build a time series for each, and combine them into a final, cleaned dataset for data exploration.

In [1]:
import pickle
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from dateutil import parser as psr
%matplotlib inline

In [2]:
# Loading the Uber pickups data and their closest stations:
with open('UberData/UberStationPickups.pickle', 'rb') as f:  # pickles must be read in binary mode
    datalist = pickle.load(f)
    pickups = datalist[0]
# Loading the dictionaries of the station timeseries:
with open('StationData/StationExitsTimeSeries.pickle', 'rb') as f:
    times_sta, counts_sta, keylist = pickle.load(f)

First, we have to build the time series of Uber pickups near each station...

In [4]:
pickups.tail()

| Unnamed: 0 | Date/Time | Lat | Lon | Base | Closest_Station | Closest_Distance |
|---|---|---|---|---|---|---|
| 564511 | 4/30/2014 23:22:00 | 40.764 | -73.9744 | B02764 | 5TH AV-53RD ST | 0.003921 |
| 564512 | 4/30/2014 23:26:00 | 40.7629 | -73.9672 | B02764 | LEXINGTON AV | 0.002042 |
| 564513 | 4/30/2014 23:31:00 | 40.7443 | -73.9889 | B02764 | 28TH ST | 0.004796 |
| 564514 | 4/30/2014 23:32:00 | 40.6756 | -73.9405 | B02764 | KINGSTON-THROOP | 0.004336 |
| 564515 | 4/30/2014 23:48:00 | 40.688 | -73.9608 | B02764 | CLASSON AV | 0.001138 |


In [164]:
# Each station's Uber series reuses the time grid of its exit series.
# dict() makes a shallow copy, so deleting a key from one dictionary
# later does not remove it from the other.
times_ube = dict(times_sta)

In [165]:
# Turn the date/time strings (e.g. '4/30/2014 23:22:00') into datetime objects.
pickups['Date/Time'] = pd.to_datetime(pickups['Date/Time'], format='%m/%d/%Y %H:%M:%S')
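
Before converting the whole column, it can help to check that every string actually matches the expected layout. The following is a minimal sketch on a hypothetical three-row sample (the `raw` series and its deliberately malformed last entry are made up for illustration); `errors='coerce'` turns anything unparseable into `NaT` so that bad rows can be counted instead of raising.

```python
import pandas as pd

# Hypothetical sample in the same layout as the 'Date/Time' column;
# the last entry is intentionally malformed.
raw = pd.Series(['4/30/2014 23:22:00', '4/30/2014 23:48:00', 'not a date'])

# errors='coerce' replaces unparseable strings with NaT instead of raising.
parsed = pd.to_datetime(raw, format='%m/%d/%Y %H:%M:%S', errors='coerce')

# Number of rows that did not match the format.
n_bad = int(parsed.isna().sum())
print(n_bad)
```

If `n_bad` is nonzero on the real data, those rows are worth inspecting before any counting is done.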

In [166]:
counts_ube = {}
for station in times_ube:
    # Pickups assigned to this station, in chronological order.
    station_pickups = pickups[pickups.Closest_Station == station].sort_values(['Date/Time'])
    stamps = station_pickups['Date/Time']
    counts_ube[station] = []
    for t in range(len(times_ube[station])):
        t2 = times_ube[station][t]
        if t == 0:
            # First interval: every pickup before the first timestamp.
            pickupcount = int((stamps < t2).sum())
        else:
            # Half-open interval [t1, t2): a pickup that falls exactly on a
            # boundary is counted once instead of being dropped.
            t1 = times_ube[station][t-1]
            pickupcount = int(((stamps >= t1) & (stamps < t2)).sum())
        counts_ube[station].append(pickupcount)
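
The per-interval loop above scans the station's pickups once for every timestamp. As a possible alternative, here is a minimal vectorized sketch of the same binning idea using `np.searchsorted`. The helper name `count_per_interval` and the toy timestamps are hypothetical, and the sketch assumes the interval boundaries are sorted and that intervals are half-open, `[t1, t2)`.

```python
import numpy as np
import pandas as pd

def count_per_interval(event_times, edges):
    """Count events per interval defined by sorted `edges` (hypothetical helper).

    Bin 0 holds every event before edges[0]; bin i (i >= 1) holds events
    with edges[i-1] <= t < edges[i]. Events at or after edges[-1] are ignored.
    """
    events = pd.to_datetime(list(event_times)).values.astype('datetime64[ns]')
    bounds = pd.to_datetime(list(edges)).values.astype('datetime64[ns]')
    # side='right' returns the number of boundaries <= t, which is the
    # index of the bin that contains t.
    idx = np.searchsorted(bounds, events, side='right')
    # The extra last bin collects events past the final boundary; drop it.
    return np.bincount(idx, minlength=len(bounds) + 1)[:len(bounds)]

# Toy data: three boundaries and five events, two of them exactly on a boundary.
edges = ['2014-04-01 04:00', '2014-04-01 08:00', '2014-04-01 12:00']
events = ['2014-04-01 03:59', '2014-04-01 04:00', '2014-04-01 07:59',
          '2014-04-01 08:00', '2014-04-01 12:00']
counts = count_per_interval(events, edges)
print(list(counts))
```

With this approach each station needs one sort-and-search pass rather than one filter per interval, which may matter when there are hundreds of thousands of pickups.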

Many stations have no Uber pickups at all in the dataset. Let's remove those stations from all of our series...

In [169]:
for station in keylist:
    if all(counts == 0 for counts in counts_ube[station]):
        del times_ube[station]
        del counts_ube[station]
        del times_sta[station]
        del counts_sta[station]
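
Before deleting anything, it may be useful to see which stations would be removed. The following is a minimal, non-mutating sketch of the same all-zero test; the helper `split_active` and the toy counts are hypothetical and stand in for `counts_ube`.

```python
def split_active(counts_by_station):
    """Return (active, empty) station lists without modifying the input."""
    active, empty = [], []
    for station, counts in counts_by_station.items():
        # A station is active if at least one interval has a nonzero count.
        if any(c != 0 for c in counts):
            active.append(station)
        else:
            empty.append(station)
    return active, empty

# Toy stand-in for counts_ube.
toy_counts = {'A': [0, 2, 0], 'B': [0, 0, 0], 'C': [1, 0, 0]}
active, empty = split_active(toy_counts)
print(sorted(active), empty)
```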

In [193]:
#remake the list of stations
stationslist = list(times_sta.keys())

Now we can finally build the dataset as a dictionary where

key = **< station name >**

value = pandas dataframe of the form **< time | exits from the station | Uber pickups closest to the station >**

In [194]:
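dataset = {}
START = 17  # drop the first 17 samples of every series
for station in stationslist:
    times = times_sta[station][START:]
    exits = counts_sta[station][START:]
    # Named uber_counts so the `pickups` DataFrame loaded earlier is not overwritten.
    uber_counts = counts_ube[station][START:]
    dataset[station] = pd.DataFrame({'times': times, 'exits': exits, 'pickups': uber_counts})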
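
To illustrate the resulting structure, here is a minimal sketch that builds a toy dictionary of the same shape and reads one station back out. The station name, the 4-hour spacing, and all counts are made-up values for illustration only and are not taken from the real data.

```python
import pandas as pd

# Toy stand-in for `dataset`: one station, four evenly spaced samples.
toy = {
    'STATION A': pd.DataFrame({
        'times': pd.date_range('2014-04-04', periods=4, freq=pd.Timedelta(hours=4)),
        'exits': [120, 340, 560, 410],
        'pickups': [3, 0, 7, 5],
    })
}

# Index by time so the two series can be sliced and plotted together.
frame = toy['STATION A'].set_index('times')
totals = frame[['exits', 'pickups']].sum()
print(totals)
```

Because every value shares the same three columns, the same few lines should work for any station key in the dictionary.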

In [195]:
# Saving the dataset and a list of stations:
with open('FinalTimeSeriesDataset.pickle', 'wb') as f:  # pickles must be written in binary mode
    pickle.dump([dataset, stationslist], f)
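
A quick way to confirm that a file like this can be read back is a round-trip check. The sketch below uses a toy `dataset` and a temporary file (both hypothetical) so that it does not touch `FinalTimeSeriesDataset.pickle`.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy stand-ins with the same structure as the saved objects.
dataset = {'STATION A': pd.DataFrame({'exits': [1, 2], 'pickups': [0, 3]})}
stationslist = list(dataset.keys())

# Write to a temporary location, then read the same file back.
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.pickle')
with open(path, 'wb') as f:
    pickle.dump([dataset, stationslist], f)
with open(path, 'rb') as f:
    loaded_dataset, loaded_stations = pickle.load(f)

# The station list and every DataFrame should survive unchanged.
same = (loaded_stations == stationslist and
        all(loaded_dataset[s].equals(dataset[s]) for s in stationslist))
print(same)
```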