In [None]:
# Formatting the data

At this point the data is saved locally in `csv` format as `sample_trips.csv`. We need to grab that file and run it through the wringer to format it into a GeoJSON store that can be consumed by `d3.js`.

In [1]:
import googlemaps
import json
import os
from polyline.codec import PolylineCodec
import geojson
import pandas as pd
import mplleaflet
import matplotlib.pyplot as plt

In [19]:
%matplotlib inline

First retrieve the data from the `csv` store.

In [2]:
sample_trips = pd.read_csv("sample_trips.csv", index_col=0)

We now need to pass the data through the Google Maps API and package the result into a `GeoJSON`-formatted `dict` that we can save and have `d3.js` consume. This is a multi-step process; here are the steps:

1. Call the Google Maps Directions API using the `googlemaps` module and parse out the [encoded polylines](https://developers.google.com/maps/documentation/utilities/polylinealgorithm).
2. Use the `polyline` module to decode these into coordinates.
3. The Google Maps Directions API represents each trip as a series of subcomponents called "legs". For our purposes we want the entire coordinate list proper, so we have to flatten these coordinate sublists into a single list.
4. Package these coordinates into a unified GeoJSON `FeatureCollection`, in which each trip's coordinates form a single `Feature` with a `LineString` geometry (all represented as objects from the `geojson` module in-code).
5. Save this as `sample_trips.geojson`. Done!
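
To make step 2 less of a black box, here is a minimal pure-Python sketch of the encoded polyline decoding that the `polyline` module performs for us. The function name `decode_polyline` is hypothetical and this is not the module's own implementation; it only follows the published algorithm, and the test string is the worked example from Google's documentation.

```python
def decode_polyline(encoded, precision=5):
    """Decode a Google encoded polyline string into (lat, lng) tuples."""
    coords = []
    index = 0
    lat = 0
    lng = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta first, then longitude delta
            shift = 0
            result = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:  # no continuation bit: last chunk of this value
                    break
            # The lowest bit flags a negative value, which is stored inverted.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        # Every point is stored as an offset from the previous point.
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / factor, lng / factor))
    return coords


print(decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@"))
# [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
```

Note that the decoded pairs come out in latitude, longitude order.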

Set up Google Maps API access. You need to have your Google Maps API credentials stored locally as `google_maps_api_key.json` using the following format:

> `{ "key": "..." }`

See the [Google Developer Console](https://console.developers.google.com/) for information on getting your own API key!

In [3]:
def import_credentials(filename='google_maps_api_key.json'):
    # Read the API key from a local JSON file of the form {"key": "..."}.
    if os.path.isfile(filename):
        with open(filename) as f:
            return json.load(f)['key']
    else:
        raise IOError(
            'This API requires a Google Maps credentials token to work. Did you forget to define one?')

gmaps = googlemaps.Client(key=import_credentials())

First I wrote and tested a method which passes a start and end point to the Google Maps Directions API and parses the response down to coordinates, along with an `mplleaflet` mapper to check the results.

In [4]:
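def get_path(start, end, client):
    # Request a cycling route; the response is a list of candidate routes.
    req = client.directions(start, end, mode='bicycling')
    # Collect the encoded polyline of every step in every leg of the first route.
    polylines = [step['polyline']['points']
                 for leg in req[0]['legs']
                 for step in leg['steps']]
    coords = []
    for polyline in polylines:
        # Each decoded polyline is a list of (lat, lng) tuples.
        coords += PolylineCodec().decode(polyline)
    return coords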
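
The flattening step can be checked without calling the API at all. The sketch below uses a hypothetical, heavily trimmed stand-in for one route of a Directions response (the `legs`, `steps`, `polyline` and `points` keys match the real response, but the polyline strings are placeholders) and a hypothetical helper, `flatten_step_polylines`, to show that every step of every leg ends up in one list in travel order.

```python
def flatten_step_polylines(route):
    """Return the encoded polyline of every step in every leg, in travel order."""
    return [step['polyline']['points']
            for leg in route['legs']
            for step in leg['steps']]


# Hypothetical mock of a single route; real polylines are much longer strings.
mock_route = {
    'legs': [
        {'steps': [{'polyline': {'points': 'aaa'}},
                   {'polyline': {'points': 'bbb'}}]},
        {'steps': [{'polyline': {'points': 'ccc'}}]},
    ]
}

print(flatten_step_polylines(mock_route))
# ['aaa', 'bbb', 'ccc']
```

A request with only a start and an end point normally comes back with a single leg, so the second leg here exists only to exercise the flattening.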

In [33]:
def plot_path(coords):
    x_coords = [coord[1] for coord in coords] # Notice the position swap! Longitude is x.
    y_coords = [coord[0] for coord in coords] # Latitude is y.
    plt.plot(x_coords, y_coords, 'b')
    # mplleaflet.display(fig=plt.figure())
    mplleaflet.show()

The following opens the full `Leaflet` plot in a separate window.

In [35]:
test_path = get_path([40.76727216,-73.99392888], [40.701907,-74.013942], gmaps) # actual path from the data

In [36]:
plot_path(test_path)

Before we can go further, however, we have to address an issue that crops up. The Citibike dataset includes basic demographic information about riders (`birth year` and `gender`), which is available only for subscribers (that is, records with `usertype=Subscriber`). Non-subscribers (probably primarily tourists, indicated by `usertype=Customer`) are not required to provide this data to ride, so these columns are empty for their trips.

The Citibike dataset encodes unknown `gender` as `0.0` (`1.0` is male, `2.0` is female), and it encodes unknown `birth year` as a blank, which `pandas` converts to an `np.nan` value. However, the JSON format underlying GeoJSON has no representation for `np.nan` or other such sentinel values, so if we pass `birth year=np.nan` to `geojson.dumps` and/or validate our data, it will not take. This is easy to fix: to maintain formatting consistency we simply replace `birth year=np.nan` with `birth year=0.0` (as with `gender`) using `DataFrame.fillna()`.


In [135]:
sample_trips = sample_trips.fillna(0.0)
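
The NaN problem can be reproduced with nothing but the standard library. This is a rough sketch on a single hand-written record rather than the real DataFrame: `fill_missing` is a hypothetical helper that does for one `dict` what `fillna(0.0)` does for the whole table, and the `trip` values are made up for illustration.

```python
import json
import math


def fill_missing(record, fill_value=0.0):
    """Return a copy of record with float NaN values replaced by fill_value."""
    return {key: fill_value if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()}


# Made-up record standing in for a non-subscriber trip.
trip = {'usertype': 'Customer', 'birth year': float('nan'), 'gender': 0.0}

try:
    # allow_nan=False makes json refuse NaN instead of emitting a bare NaN token.
    json.dumps(trip, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

cleaned = fill_missing(trip)
cleaned_json = json.dumps(cleaned, allow_nan=False)
print(strict_ok, cleaned_json)
```

By default `json.dumps` writes NaN as the bare token `NaN`, which strict JSON parsers reject, so the failure may only show up once something else tries to read the file.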

The following method reads the list of paths associated with a particular bicycle from our dataset and packages it into a `FeatureCollection` containing the bicycle's entire trip-week.

In [136]:
def get_bike_paths(df, bike_id, client):
    feature_list = []
    for _, trip in df[df['bikeid'] == bike_id].iterrows():
        path = get_path([trip['start station latitude'], trip['start station longitude']],
                        [trip['end station latitude'], trip['end station longitude']],
                        client)
        # GeoJSON positions are (longitude, latitude), the reverse of the decoded polyline order.
        line = geojson.LineString([(lng, lat) for lat, lng in path])
        # The trip attributes belong to the Feature, not to its geometry.
        feature = geojson.Feature(geometry=line, properties=trip.to_dict())
        feature_list.append(feature)
    return geojson.FeatureCollection(feature_list, properties={'bike_id': bike_id})
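
For reference, this is roughly the structure a single trip should have once serialized, built here with plain `dict`s instead of the `geojson` module so the layout is visible. `make_line_feature` is a hypothetical helper, the two coordinates are the test path endpoints used earlier, and the property values are illustrative only. The GeoJSON specification puts `properties` on the `Feature` and orders each position as longitude, then latitude.

```python
import json


def make_line_feature(path_latlng, properties):
    """Build a GeoJSON Feature dict from (lat, lng) pairs and a properties dict."""
    return {
        'type': 'Feature',
        'geometry': {
            'type': 'LineString',
            # GeoJSON positions are [longitude, latitude].
            'coordinates': [[lng, lat] for lat, lng in path_latlng],
        },
        'properties': properties,
    }


feature = make_line_feature(
    [(40.76727216, -73.99392888), (40.701907, -74.013942)],
    {'bikeid': 23367.0, 'usertype': 'Subscriber'},  # illustrative values
)
collection = {'type': 'FeatureCollection', 'features': [feature]}
print(json.dumps(collection, indent=2))
```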

For now I'm working with only a single sample. Once I can verify that this sample works and get the frontend working around it, I can call the method above on every bike in our sample dataset and extend the visualization to cover all of the samples.

In [137]:
one_feature_collection = get_bike_paths(sample_trips, 23367.0, gmaps)

In [141]:
with open('one_trip.geojson', 'w') as outfile:
    outfile.write(geojson.dumps(one_feature_collection))

In [143]:
!dir

 Volume in drive C is SSD_80GB
 Volume Serial Number is 9279-00B2

 Directory of C:\Users\Alex\Desktop\citibike

04/27/2016  05:11 PM    <DIR>          .
04/27/2016  05:11 PM    <DIR>          ..
04/26/2016  09:59 PM                45 .gitignore
04/26/2016  09:34 PM    <DIR>          .ipynb_checkpoints
04/27/2016  05:11 PM            12,210 further-parsing.ipynb
04/26/2016  07:31 PM                56 google_maps_api_key.json
04/26/2016  09:39 PM            60,512 initial-parsing.ipynb
04/27/2016  05:12 PM           127,518 one_trip.geojson
04/26/2016  09:37 PM            43,030 sample_trips.csv
04/27/2016  03:09 PM            13,266 _map.html
               7 File(s)        256,637 bytes
               3 Dir(s)  10,286,772,224 bytes free
