# Getting the data

Citibike data is provided as `zip` files containing `csv` files organized by trip start time. For the purposes of visualization we need to download a month's sample of CitiBike trips, unpack it, load the `csv` into a `pandas` `DataFrame`, reslice the data by bike-day, draw samples from this resliced data, and save the results.

In [64]:
import pandas as pd
import requests
import zipfile
import io

Download a month's sample of CitiBike trips, unpack it into a `csv`, and load it into a `pandas` `DataFrame`.

In [65]:
r = requests.get('https://s3.amazonaws.com/tripdata/201603-citibike-tripdata.zip')
with zipfile.ZipFile(io.BytesIO(r.content)) as ar:
    trip_data = pd.read_csv(ar.open('201603-citibike-tripdata.csv'))

In [70]:
trip_data.head(5)

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender
0,1491,3/1/2016 06:52:42,3/1/2016 07:17:33,72,W 52 St & 11 Ave,40.767272,-73.993929,427,Bus Slip & State St,40.701907,-74.013942,23914,Subscriber,1982.0,1
1,1044,3/1/2016 07:05:50,3/1/2016 07:23:15,72,W 52 St & 11 Ave,40.767272,-73.993929,254,W 11 St & 6 Ave,40.735324,-73.998004,23697,Subscriber,1978.0,1
2,714,3/1/2016 07:15:05,3/1/2016 07:26:59,72,W 52 St & 11 Ave,40.767272,-73.993929,493,W 45 St & 6 Ave,40.7568,-73.982912,21447,Subscriber,1960.0,2
3,329,3/1/2016 07:26:04,3/1/2016 07:31:34,72,W 52 St & 11 Ave,40.767272,-73.993929,478,11 Ave & W 41 St,40.760301,-73.998842,22351,Subscriber,1986.0,1
4,1871,3/1/2016 07:31:30,3/1/2016 08:02:41,72,W 52 St & 11 Ave,40.767272,-73.993929,151,Cleveland Pl & Spring St,40.722104,-73.997249,20985,Subscriber,1978.0,1


Sense check: how many bicycles are in the system?

In [71]:
len(trip_data['bikeid'].unique())

7484

Sense check: how many stations are in the system?

In [72]:
len(trip_data['start station name'].unique())

473

Everything seems tidy. We need to covert those `starttime` and `stoptime` objects into `datetime` objects.

In [105]:
trip_data['starttime'] = trip_data['starttime'].map(lambda x: datetime.strptime(x, "%m/%d/%Y %H:%M:%S"))
trip_data['stoptime'] = trip_data['stoptime'].map(lambda x: datetime.strptime(x, "%m/%d/%Y %H:%M:%S"))

Now let's take a sample of 100 random CitiBike-weeks. We'll randomly pick 10 bikes on the week starting on Saturday, March 6th.

CitiBike ridership is heavily dependent on the day of the week, the time of day, and, to a lesser extent, on the weather. This one specific week was chosen because it was one of good weather (unusually hot, even) which ought to be representative of the greater whole of ridership days in the city.

We could easily extend this to selecting a random week starting on Saturday from throughout the year or the years before, but this seems to be an unnecessary complexity.

In [120]:
import random

In [148]:
# random_days = random.sample(range(1, 31 - 7), 10)
random_bikes = random.sample(list(trip_data['bikeid'].unique()), 10)

sample_trips = pd.DataFrame(columns=trip_data.columns)

# for day, bike in zip(random_days, random_bikes):
for day, bike in zip([6]*7, random_bikes):
    selected_trips = trip_data[trip_data['starttime'] < datetime(2016, 3, day + 7)]
    selected_trips = selected_trips[selected_trips['stoptime'] >= datetime(2016, 3, day)]
    selected_trips = selected_trips[selected_trips['bikeid'] == bike]
    sample_trips = sample_trips.append(selected_trips)

One quick interesting result: CitiBike daily ridership doesn't seem to be quite what we expected it to be (hence why weeks are the unit of interest here, instead of the days used by the Life Of A Taxi viz).

In [143]:
(len(sample_trips)/10)/7

3.6

With our sample computed we can proceed to saving our output. Only `sample_trips` is saved&mdash;`trip_data` is a small and easily enough accessible file that there's not much benefit to keeping our own copy of the dataset.

In [152]:
sample_trips.index.name = 'Index'
sample_trips.to_csv("sample_trips.csv")

The next notebook handles constructing coordinate data and converting the data to the `geojson` format.

In [9]:
def import_credentials(filename='google_maps_api_key.json'):
    if filename in [f for f in os.listdir('.') if os.path.isfile(f)]:
        data = json.load(open(filename))['key']
        return data
    else:
        raise IOError(
            'This API requires a Google Maps credentials token to work. Did you forget to define one?')

In [10]:
gmaps = googlemaps.Client(key=import_credentials())

In [15]:
gmaps.directions([40.76727216,-73.99392888], [40.701907,-74.013942], mode='bicycling')

[{'bounds': {'northeast': {'lat': 40.76747049999999, 'lng': -73.993723},
   'southwest': {'lat': 40.7019074, 'lng': -74.0170893}},
  'copyrights': 'Map data ©2016 Google',
  'legs': [{'distance': {'text': '5.1 mi', 'value': 8257},
    'duration': {'text': '26 mins', 'value': 1539},
    'end_address': 'Peter Minuit Plaza, New York, NY 10004, USA',
    'end_location': {'lat': 40.7019074, 'lng': -74.0140093},
    'start_address': '600 W 52nd St, New York, NY 10019, USA',
    'start_location': {'lat': 40.7672333, 'lng': -73.9939587},
    'steps': [{'distance': {'text': '75 ft', 'value': 23},
      'duration': {'text': '1 min', 'value': 5},
      'end_location': {'lat': 40.7671295, 'lng': -73.993723},
      'html_instructions': 'Head <b>southeast</b> on <b>W 52nd St</b> toward <b>11th Ave</b>',
      'polyline': {'points': 'ejywFf}rbMRo@'},
      'start_location': {'lat': 40.7672333, 'lng': -73.9939587},
      'travel_mode': 'BICYCLING'},
     {'distance': {'text': '259 ft', 'value': 79},
 

In [16]:
req = _

In [39]:
[leg['steps'] for leg in req[0]['legs']][0]

[{'distance': {'text': '75 ft', 'value': 23},
  'duration': {'text': '1 min', 'value': 5},
  'end_location': {'lat': 40.7671295, 'lng': -73.993723},
  'html_instructions': 'Head <b>southeast</b> on <b>W 52nd St</b> toward <b>11th Ave</b>',
  'polyline': {'points': 'ejywFf}rbMRo@'},
  'start_location': {'lat': 40.7672333, 'lng': -73.9939587},
  'travel_mode': 'BICYCLING'},
 {'distance': {'text': '259 ft', 'value': 79},
  'duration': {'text': '1 min', 'value': 39},
  'end_location': {'lat': 40.7665144, 'lng': -73.9941846},
  'html_instructions': 'Turn <b>right</b> onto <b>11th Ave</b>',
  'maneuver': 'turn-right',
  'polyline': {'points': 'qiywFv{rbMzBzA'},
  'start_location': {'lat': 40.7671295, 'lng': -73.993723},
  'travel_mode': 'BICYCLING'},
 {'distance': {'text': '0.1 mi', 'value': 218},
  'duration': {'text': '1 min', 'value': 27},
  'end_location': {'lat': 40.76747049999999, 'lng': -73.9964382},
  'html_instructions': 'Turn <b>right</b> onto <b>W 51st St</b>',
  'maneuver': 'turn

In [54]:
polylines = [step['polyline']['points'] for step in [leg['steps'] for leg in req[0]['legs']][0]]

In [59]:
coords = []
for polyline in polylines:
    coords += PolylineCodec().decode(polylines[5])

In [60]:
coords

[(40.7669, -73.99703),
 (40.76564, -73.99793),
 (40.76555, -73.99799),
 (40.76451, -73.99875),
 (40.76448, -73.99877),
 (40.76445, -73.9988),
 (40.76444, -73.99882),
 (40.76441, -73.99887),
 (40.76438, -73.99891),
 (40.76436, -73.99893),
 (40.76432, -73.99895),
 (40.76379, -73.99934),
 (40.76372, -73.99939),
 (40.76366, -73.99944),
 (40.76359, -73.99949),
 (40.76355, -73.99953),
 (40.7635, -73.99957),
 (40.76345, -73.99962),
 (40.7634, -73.99968),
 (40.76335, -73.99974),
 (40.76331, -73.99981),
 (40.76327, -73.99987),
 (40.76322, -73.99995),
 (40.76317, -74.00003),
 (40.76298, -74.00031),
 (40.76281, -74.00056),
 (40.76277, -74.00062),
 (40.76272, -74.00067),
 (40.76267, -74.00073),
 (40.7625, -74.00098),
 (40.76244, -74.00106),
 (40.76239, -74.00113),
 (40.76233, -74.00121),
 (40.76229, -74.00126),
 (40.76225, -74.00131),
 (40.7622, -74.00137),
 (40.76216, -74.00141),
 (40.76212, -74.00146),
 (40.76208, -74.0015),
 (40.76203, -74.00154),
 (40.76198, -74.0016),
 (40.76192, -74.00164),


In [62]:
geojson.Feature, geojson.Point, geojson.FeatureCollection

(geojson.feature.Feature,
 geojson.geometry.Point,
 geojson.feature.FeatureCollection)