# Baking Stations

At the interface layer I'd like to be able to bring up the actual counts for the datasets of interest. These can go in a flat file; none of the database machinery is necessary.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import requests
import io
import zipfile
from tqdm import tqdm

In [2]:
r = requests.get('https://s3.amazonaws.com/tripdata/201606-citibike-tripdata.zip')
with zipfile.ZipFile(io.BytesIO(r.content)) as ar:
    trip_data = pd.read_csv(ar.open('201606-citibike-tripdata.csv'))
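
The download cell above assumes the request succeeds and that the archive member name is known. A slightly more defensive sketch is below. The helper name `read_csv_from_zip` and the 60-second timeout are my own assumptions, not part of the original notebook.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_bytes, member=None):
    """Read one CSV member out of an in-memory zip archive.

    Hypothetical helper, not part of any library. If `member` is None,
    the first entry whose name ends in .csv is used.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if member is None:
            csv_names = [n for n in archive.namelist() if n.endswith('.csv')]
            if not csv_names:
                raise ValueError("no .csv member found in archive")
            member = csv_names[0]
        with archive.open(member) as handle:
            return pd.read_csv(handle)


# Usage against the same URL as above (commented out so the sketch runs offline):
# import requests
# r = requests.get('https://s3.amazonaws.com/tripdata/201606-citibike-tripdata.zip', timeout=60)
# r.raise_for_status()  # fail loudly on a 4xx/5xx instead of trying to unzip an error page
# trip_data = read_csv_from_zip(r.content, '201606-citibike-tripdata.csv')
```

Keeping the parsing separate from the network call means the zip handling can be exercised on a small in-memory archive without hitting S3.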

In [3]:
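# The file name says June 22 (matching the 201606 download above); the variable name july_22 is kept as-is.
july_22 = pd.read_csv("../data/final/all_june_22_citibike_trips.csv", index_col=0)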

In [4]:
all_station_ids = set(trip_data['start station id'].values).union(set(trip_data['end station id'].values))

In [8]:
trip_data.head(1)

| | tripduration | starttime | stoptime | start station id | start station name | start station latitude | start station longitude | end station id | end station name | end station latitude | end station longitude | bikeid | usertype | birth year | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1470 | 6/1/2016 00:00:18 | 6/1/2016 00:24:48 | 380 | W 4 St & 7 Ave S | 40.734011 | -74.002939 | 3236 | W 42 St & Dyer Ave | 40.758985 | -73.9938 | 19859 | Subscriber | 1972.0 | 1 |


In [44]:
subframes = []
start_counts = []
end_counts = []

for station_id in tqdm(all_station_ids):
    # Station metadata (name, coordinates) comes from the full monthly data.
    started_here = trip_data[trip_data['start station id'] == station_id]
    ended_here = trip_data[trip_data['end station id'] == station_id]
    # Trip counts come from the single-day subset.
    start_counts.append(len(july_22[july_22['start station id'] == station_id]))
    end_counts.append(len(july_22[july_22['end station id'] == station_id]))
    if len(started_here) > 0:
        s = started_here.iloc[0]
        subframes.append({'latitude': s['start station latitude'],
                          'longitude': s['start station longitude'],
                          'station name': s['start station name'],
                          'station id': s['start station id']})
    elif len(ended_here) > 0:
        # Station only ever appears as a destination; use the end-side fields.
        s = ended_here.iloc[0]
        subframes.append({'latitude': s['end station latitude'],
                          'longitude': s['end station longitude'],
                          'station name': s['end station name'],
                          'station id': s['end station id']})
    else:
        # Should be unreachable: every id was collected from trip_data itself.
        print(f"ERROR: station {station_id} not found in trip_data")
        
stations = pd.DataFrame(subframes).set_index('station id', drop=True)
# Trips that start at a station leave it (outgoing); trips that end there arrive (incoming).
stations['outgoing trips'] = start_counts
stations['incoming trips'] = end_counts
stations['all trips'] = np.array(start_counts) + np.array(end_counts)
stations.index.name = 'station id'
del subframes

100%|████████████████████████████████████████| 489/489 [00:05<00:00, 95.56it/s]


In [45]:
stations.head(5)

| station id | latitude | longitude | station name | outgoing trips | incoming trips | all trips |
|---|---|---|---|---|---|---|
| 72 | 40.767272 | -73.993929 | W 52 St & 11 Ave | 133 | 148 | 281 |
| 79 | 40.719116 | -74.006667 | Franklin St & W Broadway | 113 | 115 | 228 |
| 82 | 40.711174 | -74.000165 | St James Pl & Pearl St | 53 | 46 | 99 |
| 83 | 40.683826 | -73.976323 | Atlantic Ave & Fort Greene Pl | 43 | 51 | 94 |
| 116 | 40.741776 | -74.001497 | W 17 St & 8 Ave | 265 | 267 | 532 |
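
The per-station loop above filters both frames once per station. The same table can probably be built without a Python-level loop; here is a rough sketch, assuming both frames use the Citi Bike column names shown earlier. The function name `build_station_table` and the neutral count labels `trips started here` / `trips ended here` are my own, not the notebook's.

```python
import pandas as pd


def build_station_table(all_trips, day_trips):
    """Vectorized sketch of the station loop (hypothetical helper).

    all_trips supplies station names and coordinates; day_trips supplies counts.
    """
    fields = ['id', 'name', 'latitude', 'longitude']
    starts = all_trips[[f'start station {f}' for f in fields]].copy()
    starts.columns = fields
    ends = all_trips[[f'end station {f}' for f in fields]].copy()
    ends.columns = fields

    # Start rows come first, so a station seen at both ends keeps its
    # start-side metadata, mirroring the if/elif order of the loop.
    stations = (pd.concat([starts, ends], ignore_index=True)
                  .drop_duplicates(subset='id')
                  .set_index('id')
                  .sort_index())
    stations.index.name = 'station id'
    stations = stations.rename(columns={'name': 'station name'})

    # value_counts gives trips per station; stations with no trips that day get 0.
    started = day_trips['start station id'].value_counts()
    ended = day_trips['end station id'].value_counts()
    stations['trips started here'] = started.reindex(stations.index, fill_value=0)
    stations['trips ended here'] = ended.reindex(stations.index, fill_value=0)
    stations['all trips'] = stations['trips started here'] + stations['trips ended here']
    return stations
```

One difference from the loop: the result is sorted by station id rather than following set iteration order, and the counts are aligned by index instead of by list position.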


In [47]:
stations.to_csv("../data/final/july_22_station_metadata.csv")
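
Since the interface layer will read this flat file back later, a quick sanity check on load might catch a bad export early. This is a minimal sketch; `check_station_metadata` is a hypothetical helper, and it only assumes the columns written above.

```python
import pandas as pd


def check_station_metadata(stations):
    """Return a list of problems found in a station metadata frame (hypothetical helper)."""
    problems = []
    if not stations.index.is_unique:
        problems.append("duplicate station ids")
    # Per-direction count columns: everything ending in 'trips' except the total.
    count_cols = [c for c in stations.columns
                  if c.endswith('trips') and c != 'all trips']
    if 'all trips' in stations.columns and count_cols:
        if not (stations[count_cols].sum(axis=1) == stations['all trips']).all():
            problems.append("'all trips' does not equal the sum of the per-direction counts")
    if stations[['latitude', 'longitude']].isna().any().any():
        problems.append("missing coordinates")
    return problems


# Usage with the file written above:
# stations = pd.read_csv("../data/final/july_22_station_metadata.csv", index_col='station id')
# print(check_station_metadata(stations))
```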

Conclusion:

OK, at this point I am going to defer the bike roundabout thing, because it actually involves a *lot* more work.

For now I will concentrate on mapping stations instead, and on gathering up all of the data (which, I am realizing, will need to be transformed yet again later... sigh).

In [None]:
# Unfinished sketch for the deferred per-bike idea; left commented out.
# Assumes july_22 has the same columns as trip_data (e.g. 'bikeid').
# all_bike_ids = set(july_22['bikeid'])
# station_counts = pd.Series(0, index=stations.index)

# for bike_id in all_bike_ids:
#     trips = july_22[july_22['bikeid'] == bike_id]
#     start_station_trip = trips.sort_values(by='starttime', ascending=False).iloc[0]
#     start_station = start_station_trip['start station id']
#     station_counts