# Citi Bike Data
- https://www.citibikenyc.com/system-data 

This Jupyter notebook does the following:
1. Imports .json data of Station Info (consider to be static)
2. Imports .json data of Station Status (may change frequently)
3. Imports .csv of Citi Bike trips


- We'll use the Station Info/Status data to build a VeRoViz "nodes" dataframe.
- We'll use the trips data to build a VeRoViz "assignments" dataframe.

With the nodes and assignments dataframes, we can then generate Leaflet maps (static) and Cesium movies.

---

In [1]:
# We'll need these libraries
import numpy as np
import pandas as pd

In [2]:
# These libraries will help us import JSON data:
import json
import urllib.request

In [3]:
# Go ahead and import VeRoViz
import veroviz as vrv
vrv.checkVersion()

'Your current installed version of veroviz is 0.4.5. You are up-to-date with the latest available version.'

In [4]:
# I like to use "environment" variables to store "private" stuff
# (like API keys, or paths to installed files).
# We'll need the `os` library for that:
import os

# See https://veroviz.org/documentation.html#installation for details

--- 

## 1. Import Station Info (from .json)
- These data are *mostly* static...certainly won't change throughout the course of a day.

In [5]:
# Here's one way to import JSON data.
# I'm leaving this here, because it will work with "GET" and "POST" requests,
# which we might use later this semester.
# The approach below is a bit cleaner.
'''
import json
import urllib3

urllib3.disable_warnings()

http = urllib3.PoolManager()

response = http.request('GET', "https://gbfs.citibikenyc.com/gbfs/en/station_information.json")
station_info_data = json.loads(response.data.decode('utf-8'))
station_info_data
''';

# The trailing `;` keeps Jupyter from regurgitating our block comment

In [6]:
# A cleaner approach for grabbing JSON data:
with urllib.request.urlopen("https://gbfs.citibikenyc.com/gbfs/en/station_information.json") as url:
    station_info_data = json.loads(url.read().decode())
station_info_data

{'data': {'stations': [{'name': 'W 52 St & 11 Ave',
    'short_name': '6926.01',
    'lon': -73.99392888,
    'eightd_station_services': [],
    'station_id': '72',
    'rental_uris': {'ios': 'https://bkn.lft.to/lastmile_qr_scan',
     'android': 'https://bkn.lft.to/lastmile_qr_scan'},
    'legacy_id': '72',
    'lat': 40.76727216,
    'rental_methods': ['CREDITCARD', 'KEY'],
    'external_id': '66db237e-0aca-11e7-82f6-3863bb44ef7c',
    'station_type': 'classic',
    'capacity': 55,
    'region_id': '71',
    'has_kiosk': True,
    'eightd_has_key_dispenser': False,
    'electric_bike_surcharge_waiver': False},
   {'name': 'Franklin St & W Broadway',
    'short_name': '5430.08',
    'lon': -74.00666661,
    'eightd_station_services': [],
    'station_id': '79',
    'rental_uris': {'ios': 'https://bkn.lft.to/lastmile_qr_scan',
     'android': 'https://bkn.lft.to/lastmile_qr_scan'},
    'legacy_id': '79',
    'lat': 40.71911552,
    'rental_methods': ['CREDITCARD', 'KEY'],
    'external

In [59]:
# station_info_data is a dictionary (which contains several sub-dictionaries).
# Get a list of keys within the station_info_data['data'] dictionary:
list(station_info_data['data'].keys())

['stations']

In [60]:
# How many stations are there?
len(station_info_data['data']['stations'])

1621

In [7]:
# Convert the JSON data into a Pandas dataframe:
station_info_df = pd.DataFrame(station_info_data['data']['stations'])
station_info_df.head()

Unnamed: 0,name,short_name,lon,eightd_station_services,station_id,rental_uris,legacy_id,lat,rental_methods,external_id,station_type,capacity,region_id,has_kiosk,eightd_has_key_dispenser,electric_bike_surcharge_waiver
0,W 52 St & 11 Ave,6926.01,-73.993929,[],72,"{'ios': 'https://bkn.lft.to/lastmile_qr_scan',...",72,40.767272,"[CREDITCARD, KEY]",66db237e-0aca-11e7-82f6-3863bb44ef7c,classic,55,71,True,False,False
1,Franklin St & W Broadway,5430.08,-74.006667,[],79,"{'ios': 'https://bkn.lft.to/lastmile_qr_scan',...",79,40.719116,"[CREDITCARD, KEY]",66db269c-0aca-11e7-82f6-3863bb44ef7c,classic,33,71,True,False,False
2,St James Pl & Pearl St,5167.06,-74.000165,[],82,"{'ios': 'https://bkn.lft.to/lastmile_qr_scan',...",82,40.711174,"[CREDITCARD, KEY]",66db277a-0aca-11e7-82f6-3863bb44ef7c,classic,27,71,True,False,False
3,Atlantic Ave & Fort Greene Pl,4354.07,-73.976323,[],83,"{'ios': 'https://bkn.lft.to/lastmile_qr_scan',...",83,40.683826,"[CREDITCARD, KEY]",66db281e-0aca-11e7-82f6-3863bb44ef7c,classic,62,71,True,False,False
4,W 17 St & 8 Ave,6148.02,-74.001497,[],116,"{'ios': 'https://bkn.lft.to/lastmile_qr_scan',...",116,40.741776,"[CREDITCARD, KEY]",66db28b5-0aca-11e7-82f6-3863bb44ef7c,classic,50,71,True,False,False


In [10]:
station_info_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1621 entries, 0 to 1620
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   capacity                        1621 non-null   int64  
 1   lon                             1621 non-null   float64
 2   electric_bike_surcharge_waiver  1621 non-null   bool   
 3   eightd_has_key_dispenser        1621 non-null   bool   
 4   station_type                    1621 non-null   object 
 5   rental_uris                     1621 non-null   object 
 6   lat                             1621 non-null   float64
 7   name                            1621 non-null   object 
 8   short_name                      1621 non-null   object 
 9   region_id                       1621 non-null   object 
 10  station_id                      1621 non-null   object 
 11  has_kiosk                       1621 non-null   bool   
 12  legacy_id                       16

---

## 2.  Get Station Status Data (from .json)
- These data may change frequently.  I don't know how often they're updated.

In [8]:
# Using the "old" approach:
'''
response = http.request('GET', "https://gbfs.citibikenyc.com/gbfs/en/station_status.json")
station_status_data = json.loads(response.data.decode('utf-8'))
station_status_data
''';

In [9]:
# The cleaner approach approach for grabbing JSON data:
with urllib.request.urlopen("https://gbfs.citibikenyc.com/gbfs/en/station_status.json") as url:
    station_status_data = json.loads(url.read().decode())
station_status_data

{'data': {'stations': [{'legacy_id': '72',
    'num_docks_available': 52,
    'eightd_has_available_keys': False,
    'station_status': 'active',
    'station_id': '72',
    'is_returning': 1,
    'num_docks_disabled': 0,
    'last_reported': 1649170549,
    'num_bikes_disabled': 0,
    'is_installed': 1,
    'num_ebikes_available': 0,
    'is_renting': 1,
    'num_bikes_available': 3},
   {'legacy_id': '79',
    'num_docks_available': 0,
    'eightd_has_available_keys': False,
    'station_status': 'active',
    'station_id': '79',
    'is_returning': 1,
    'num_docks_disabled': 0,
    'last_reported': 1649165672,
    'num_bikes_disabled': 2,
    'is_installed': 1,
    'num_ebikes_available': 4,
    'is_renting': 1,
    'num_bikes_available': 31},
   {'legacy_id': '82',
    'num_docks_available': 0,
    'eightd_has_available_keys': False,
    'station_status': 'active',
    'station_id': '82',
    'is_returning': 1,
    'num_docks_disabled': 0,
    'last_reported': 1649170746,
    'n

In [10]:
# Convert the data into a Pandas dataframe:
station_status_df = pd.DataFrame(station_status_data['data']['stations'])
station_status_df.head()

Unnamed: 0,legacy_id,num_docks_available,eightd_has_available_keys,station_status,station_id,is_returning,num_docks_disabled,last_reported,num_bikes_disabled,is_installed,num_ebikes_available,is_renting,num_bikes_available,eightd_active_station_services,valet
0,72,52,False,active,72,1,0,1649170549,0,1,0,1,3,,
1,79,0,False,active,79,1,0,1649165672,2,1,4,1,31,,
2,82,0,False,active,82,1,0,1649170746,0,1,0,1,27,,
3,83,5,False,active,83,1,0,1649169936,0,1,1,1,57,,
4,116,3,False,active,116,1,0,1649170638,1,1,2,1,46,,


In [11]:
station_status_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1623 entries, 0 to 1622
Data columns (total 15 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   legacy_id                       1623 non-null   object
 1   num_docks_available             1623 non-null   int64 
 2   eightd_has_available_keys       1623 non-null   bool  
 3   station_status                  1623 non-null   object
 4   station_id                      1623 non-null   object
 5   is_returning                    1623 non-null   int64 
 6   num_docks_disabled              1623 non-null   int64 
 7   last_reported                   1623 non-null   int64 
 8   num_bikes_disabled              1623 non-null   int64 
 9   is_installed                    1623 non-null   int64 
 10  num_ebikes_available            1623 non-null   int64 
 11  is_renting                      1623 non-null   int64 
 12  num_bikes_available             1623 non-null   

--- 

## 3.  Import Trip Data (from .csv)
- We'll create a pandas dataframe from the data.
- See https://s3.amazonaws.com/tripdata/index.html for available datasets.

In [12]:
!head '202001-citibike-tripdata.csv'

"tripduration","starttime","stoptime","start station id","start station name","start station latitude","start station longitude","end station id","end station name","end station latitude","end station longitude","bikeid","usertype","birth year","gender"
789,"2020-01-01 00:00:55.3900","2020-01-01 00:14:05.1470",504,"1 Ave & E 16 St",40.73221853,-73.98165557,307,"Canal St & Rutgers St",40.71427487,-73.98990025,30326,"Subscriber",1992,1
1541,"2020-01-01 00:01:08.1020","2020-01-01 00:26:49.1780",3423,"West Drive & Prospect Park West",40.6610633719006,-73.97945255041122,3300,"Prospect Park West & 8 St",40.66514681533792,-73.97637605667114,17105,"Customer",1969,1
1464,"2020-01-01 00:01:42.1400","2020-01-01 00:26:07.0110",3687,"E 33 St & 1 Ave",40.74322681432173,-73.97449783980846,259,"South St & Whitehall St",40.70122128,-74.01234218,40177,"Subscriber",1963,1
592,"2020-01-01 00:01:45.5610","2020-01-01 00:11:38.1550",346,"Bank St & Hudson St",40.73652889,-74.00618026,490,"8 Ave & W 33 St"

In [13]:
# I just randomly chose/downloaded this file:
bike_trips_df = pd.read_csv('202001-citibike-tripdata.csv')
bike_trips_df.head()

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender
0,789,2020-01-01 00:00:55.3900,2020-01-01 00:14:05.1470,504,1 Ave & E 16 St,40.732219,-73.981656,307,Canal St & Rutgers St,40.714275,-73.9899,30326,Subscriber,1992,1
1,1541,2020-01-01 00:01:08.1020,2020-01-01 00:26:49.1780,3423,West Drive & Prospect Park West,40.661063,-73.979453,3300,Prospect Park West & 8 St,40.665147,-73.976376,17105,Customer,1969,1
2,1464,2020-01-01 00:01:42.1400,2020-01-01 00:26:07.0110,3687,E 33 St & 1 Ave,40.743227,-73.974498,259,South St & Whitehall St,40.701221,-74.012342,40177,Subscriber,1963,1
3,592,2020-01-01 00:01:45.5610,2020-01-01 00:11:38.1550,346,Bank St & Hudson St,40.736529,-74.00618,490,8 Ave & W 33 St,40.751551,-73.993934,27690,Subscriber,1980,1
4,702,2020-01-01 00:01:45.7880,2020-01-01 00:13:28.2400,372,Franklin Ave & Myrtle Ave,40.694546,-73.958014,3637,Fulton St & Waverly Ave,40.683239,-73.965996,32583,Subscriber,1982,1


In [14]:
# bike_trips_df.columns

# Using `list()` formats things a little better:
list(bike_trips_df.columns)

['tripduration',
 'starttime',
 'stoptime',
 'start station id',
 'start station name',
 'start station latitude',
 'start station longitude',
 'end station id',
 'end station name',
 'end station latitude',
 'end station longitude',
 'bikeid',
 'usertype',
 'birth year',
 'gender']

--- 

## Create a VeRoViz "nodes" Dataframe
- We'll populate this with data from Station Info and Station Status
- We'll also hard-code some columns

In [15]:
nodes = vrv.initDataframe('nodes')

In [16]:
# Here are the columns we'll need to populate:
list(nodes.columns)

['id',
 'lat',
 'lon',
 'altMeters',
 'nodeName',
 'nodeType',
 'popupText',
 'leafletIconPrefix',
 'leafletIconType',
 'leafletColor',
 'leafletIconText',
 'cesiumIconType',
 'cesiumColor',
 'cesiumIconText',
 'elevMeters']

In [17]:
# Here are the columns from our "Station Info":
list(station_info_df.columns)

['name',
 'short_name',
 'lon',
 'eightd_station_services',
 'station_id',
 'rental_uris',
 'legacy_id',
 'lat',
 'rental_methods',
 'external_id',
 'station_type',
 'capacity',
 'region_id',
 'has_kiosk',
 'eightd_has_key_dispenser',
 'electric_bike_surcharge_waiver']

In [18]:
# An example to show the syntax for displaying 2 particular columns from a df:
station_info_df[['lat', 'lon']].head()

Unnamed: 0,lat,lon
0,40.767272,-73.993929
1,40.719116,-74.006667
2,40.711174,-74.000165
3,40.683826,-73.976323
4,40.741776,-74.001497


In [19]:
# Let's go ahead and re-initialize an empty dataframe within this cell:
nodes = vrv.initDataframe('nodes')

# Now, copy the relevant columns from our Station Info dataframe:
# NOTE: We were getting some size mis-match errors until we copied 
#       just a single column first.  
nodes['id'] = station_info_df['station_id'].values
nodes[['id', 'lat', 'lon', 'nodeName']] = station_info_df[['station_id', 'lat', 'lon', 'name']].values
nodes[['leafletIconText', 'cesiumIconText']] = station_info_df[['name', 'station_id']].values
nodes['popupText'] = station_info_df['name'].values

# Finally, we'll fill in the rest of our nodes dataframe with some hard-coded/constant values:
nodes.loc[:,'altMeters'] = 0
nodes.loc[:,['nodeType', 'leafletIconPrefix', 'leafletIconType', 'leafletColor']] = [
             'CitiBikeStation',  'fa',                'bicycle',         'orange']
nodes.loc[:,['cesiumIconType', 'cesiumColor']] = ['pin', 'Cesium.Color.ORANGE']

In [20]:
nodes.head()

Unnamed: 0,id,lat,lon,altMeters,nodeName,nodeType,popupText,leafletIconPrefix,leafletIconType,leafletColor,leafletIconText,cesiumIconType,cesiumColor,cesiumIconText,elevMeters
0,72,40.7673,-73.9939,0,W 52 St & 11 Ave,CitiBikeStation,W 52 St & 11 Ave,fa,bicycle,orange,W 52 St & 11 Ave,pin,Cesium.Color.ORANGE,72,
1,79,40.7191,-74.0067,0,Franklin St & W Broadway,CitiBikeStation,Franklin St & W Broadway,fa,bicycle,orange,Franklin St & W Broadway,pin,Cesium.Color.ORANGE,79,
2,82,40.7112,-74.0002,0,St James Pl & Pearl St,CitiBikeStation,St James Pl & Pearl St,fa,bicycle,orange,St James Pl & Pearl St,pin,Cesium.Color.ORANGE,82,
3,83,40.6838,-73.9763,0,Atlantic Ave & Fort Greene Pl,CitiBikeStation,Atlantic Ave & Fort Greene Pl,fa,bicycle,orange,Atlantic Ave & Fort Greene Pl,pin,Cesium.Color.ORANGE,83,
4,116,40.7418,-74.0015,0,W 17 St & 8 Ave,CitiBikeStation,W 17 St & 8 Ave,fa,bicycle,orange,W 17 St & 8 Ave,pin,Cesium.Color.ORANGE,116,


In [21]:
# Show all of the nodes on a Leaflet map:
vrv.createLeaflet(nodes=nodes)

#### <font color='orange'>Improvement:  Color-code the nodes</font>

In the next cell, we adjust the colors of each node to reflect the "current" station status.
- green  --> bikes and docks are available
- orange --> no docks available to return (but renting is possible)
- red    --> no bikes available to rent (but returning is possible)

In [22]:
# Add 2 new columns to the nodes dataframe,
# where the values come from the station_status_df dataframe:
nodes[['is_renting', 'is_returning']] = pd.merge(nodes, station_status_df, left_on='id', right_on='station_id')[['is_renting', 'is_returning']] 

In [23]:
nodes

Unnamed: 0,id,lat,lon,altMeters,nodeName,nodeType,popupText,leafletIconPrefix,leafletIconType,leafletColor,leafletIconText,cesiumIconType,cesiumColor,cesiumIconText,elevMeters,is_renting,is_returning
0,72,40.7673,-73.9939,0,W 52 St & 11 Ave,CitiBikeStation,W 52 St & 11 Ave,fa,bicycle,orange,W 52 St & 11 Ave,pin,Cesium.Color.ORANGE,72,,1,1
1,79,40.7191,-74.0067,0,Franklin St & W Broadway,CitiBikeStation,Franklin St & W Broadway,fa,bicycle,orange,Franklin St & W Broadway,pin,Cesium.Color.ORANGE,79,,1,1
2,82,40.7112,-74.0002,0,St James Pl & Pearl St,CitiBikeStation,St James Pl & Pearl St,fa,bicycle,orange,St James Pl & Pearl St,pin,Cesium.Color.ORANGE,82,,1,1
3,83,40.6838,-73.9763,0,Atlantic Ave & Fort Greene Pl,CitiBikeStation,Atlantic Ave & Fort Greene Pl,fa,bicycle,orange,Atlantic Ave & Fort Greene Pl,pin,Cesium.Color.ORANGE,83,,1,1
4,116,40.7418,-74.0015,0,W 17 St & 8 Ave,CitiBikeStation,W 17 St & 8 Ave,fa,bicycle,orange,W 17 St & 8 Ave,pin,Cesium.Color.ORANGE,116,,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1618,4738,40.7906,-73.942,0,E 106 St & 2 Ave,CitiBikeStation,E 106 St & 2 Ave,fa,bicycle,orange,E 106 St & 2 Ave,pin,Cesium.Color.ORANGE,4738,,1,1
1619,4739,40.8541,-73.8993,0,Valentine Ave & E 181 St,CitiBikeStation,Valentine Ave & E 181 St,fa,bicycle,orange,Valentine Ave & E 181 St,pin,Cesium.Color.ORANGE,4739,,1,1
1620,4742,40.7066,-73.9683,0,Kent Ave & Division Ave,CitiBikeStation,Kent Ave & Division Ave,fa,bicycle,orange,Kent Ave & Division Ave,pin,Cesium.Color.ORANGE,4742,,1,1
1621,4743,40.8559,-73.9271,0,Audubon Ave & W 192 St,CitiBikeStation,Audubon Ave & W 192 St,fa,bicycle,orange,Audubon Ave & W 192 St,pin,Cesium.Color.ORANGE,4743,,1,1


In [24]:
# Modify the leaflet and cesium colors for the nodes:
nodes.loc[(nodes['is_renting'] == 1) & (nodes['is_returning'] == 1), 
          ['leafletColor', 'cesiumColor']] = ['green', 'Cesium.Color.GREEN']

nodes.loc[(nodes['is_renting'] == 1) & (nodes['is_returning'] == 0), 
          ['leafletColor', 'cesiumColor']] = ['orange', 'Cesium.Color.ORANGE']

nodes.loc[(nodes['is_renting'] == 0) & (nodes['is_returning'] == 0), 
          ['leafletColor', 'cesiumColor']] = ['red', 'Cesium.Color.RED']

In [25]:
# Show all of the nodes on a Leaflet map:
vrv.createLeaflet(nodes=nodes)

--- 

## Create a VeRoViz "assignments" Dataframe
- We'll populate this with trip data
- We'll also hard-code some columns

In [26]:
# NOTE:  VeRoViz also has an "arcs" dataframe,
#        but it doesn't have time-related columns.
arcs = vrv.initDataframe('arcs')
list(arcs.columns)

# We won't use the "arcs" dataframe

['odID',
 'objectID',
 'startLat',
 'startLon',
 'endLat',
 'endLon',
 'leafletColor',
 'leafletWeight',
 'leafletStyle',
 'leafletOpacity',
 'leafletCurveType',
 'leafletCurvature',
 'useArrows',
 'cesiumColor',
 'cesiumWeight',
 'cesiumStyle',
 'cesiumOpacity',
 'popupText',
 'startElevMeters',
 'endElevMeters']

In [27]:
# Initialize an empty "assignments" dataframe:
assignments = vrv.initDataframe('assignments')
assignments.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 34 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   odID              0 non-null      object
 1   objectID          0 non-null      object
 2   modelFile         0 non-null      object
 3   modelScale        0 non-null      object
 4   modelMinPxSize    0 non-null      object
 5   startTimeSec      0 non-null      object
 6   startLat          0 non-null      object
 7   startLon          0 non-null      object
 8   startAltMeters    0 non-null      object
 9   endTimeSec        0 non-null      object
 10  endLat            0 non-null      object
 11  endLon            0 non-null      object
 12  endAltMeters      0 non-null      object
 13  leafletColor      0 non-null      object
 14  leafletWeight     0 non-null      object
 15  leafletStyle      0 non-null      object
 16  leafletOpacity    0 non-null      object
 17  leafletCurveType  0 non-null     

### Here's the plan:
- These columns will come directly from bike trip data:
    - `objectID` (from `bikeid`)
    - `startLat` and `startLon` (from `start station latitude` and `start station longitude`)
    - `endLat` and `endLon` (from `end station latitude` and `end station longitude`)
- These columns will need to be calculated:
    - `startTimeSec` (from `starttime`, but converted to "seconds since the first event")
    - `endTimeSec`   (from `starttime` and `tripduration`, or `starttime` and `stoptime`)
    - We'll create some new columns in `bike_trips_df` to hold our calculations.  Then we'll copy these calculated columns into our assignments dataframe.
- This column will need to be auto generated:
    - `odID` (each origin/destination pair should get a unique integer)
- The remaining columns will be hard-coded (for now)

In [28]:
# What is the first start time in our bike_trips_df?
min(bike_trips_df['starttime'])

'2020-01-01 00:00:55.3900'

In [29]:
bike_trips_df['starttime'].dtype

dtype('O')

In [30]:
max(bike_trips_df['stoptime'])

'2020-02-04 08:42:03.2980'

In [31]:
pd.to_datetime(bike_trips_df['starttime'])

0         2020-01-01 00:00:55.390
1         2020-01-01 00:01:08.102
2         2020-01-01 00:01:42.140
3         2020-01-01 00:01:45.561
4         2020-01-01 00:01:45.788
                    ...          
1240591   2020-01-31 23:59:26.882
1240592   2020-01-31 23:59:32.641
1240593   2020-01-31 23:59:39.178
1240594   2020-01-31 23:59:49.231
1240595   2020-01-31 23:59:57.036
Name: starttime, Length: 1240596, dtype: datetime64[ns]

In [32]:
pd.to_datetime(bike_trips_df['starttime']) - \
pd.to_datetime(min(bike_trips_df['starttime']))

0                 0 days 00:00:00
1          0 days 00:00:12.712000
2          0 days 00:00:46.750000
3          0 days 00:00:50.171000
4          0 days 00:00:50.398000
                    ...          
1240591   30 days 23:58:31.492000
1240592   30 days 23:58:37.251000
1240593   30 days 23:58:43.788000
1240594   30 days 23:58:53.841000
1240595   30 days 23:59:01.646000
Name: starttime, Length: 1240596, dtype: timedelta64[ns]

In [33]:
# Add a new column to bike_trips_df...

# This next command will produce a "timestamp" (days HH:MM:SS.ms) 
# showing the time since the first observed `starttime`:
bike_trips_df['timeAfterStart'] = pd.to_datetime(bike_trips_df['starttime']) - \
                                  pd.to_datetime(min(bike_trips_df['starttime']))

# Now, convert this to a decimal number of seconds:
bike_trips_df['timeAfterStart'] = bike_trips_df['timeAfterStart'].dt.total_seconds().astype(int)

bike_trips_df[['bikeid','timeAfterStart']].tail()

Unnamed: 0,bikeid,timeAfterStart
1240591,40662,2678311
1240592,28722,2678317
1240593,32530,2678323
1240594,15314,2678333
1240595,30947,2678341


In [34]:
# Just for fun, here's the time differences between start/stop times:
pd.to_datetime(bike_trips_df['stoptime']) - pd.to_datetime(bike_trips_df['starttime'])

0         0 days 00:13:09.757000
1         0 days 00:25:41.076000
2         0 days 00:24:24.871000
3         0 days 00:09:52.594000
4         0 days 00:11:42.452000
                   ...          
1240591   0 days 00:26:27.607000
1240592   0 days 00:03:42.831000
1240593   0 days 00:02:43.862000
1240594   0 days 00:05:27.148000
1240595   0 days 00:08:04.146000
Length: 1240596, dtype: timedelta64[ns]

In [35]:
# Make sure we're starting with an empty dataframe:
assignments = vrv.initDataframe('assignments')
assignments

Unnamed: 0,odID,objectID,modelFile,modelScale,modelMinPxSize,startTimeSec,startLat,startLon,startAltMeters,endTimeSec,...,ganttColor,popupText,startElevMeters,endElevMeters,wayname,waycategory,surface,waytype,steepness,tollway


In [36]:
assignments['objectID'] = bike_trips_df['bikeid']
assignments

Unnamed: 0,odID,objectID,modelFile,modelScale,modelMinPxSize,startTimeSec,startLat,startLon,startAltMeters,endTimeSec,...,ganttColor,popupText,startElevMeters,endElevMeters,wayname,waycategory,surface,waytype,steepness,tollway
0,,30326,,,,,,,,,...,,,,,,,,,,
1,,17105,,,,,,,,,...,,,,,,,,,,
2,,40177,,,,,,,,,...,,,,,,,,,,
3,,27690,,,,,,,,,...,,,,,,,,,,
4,,32583,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1240591,,40662,,,,,,,,,...,,,,,,,,,,
1240592,,28722,,,,,,,,,...,,,,,,,,,,
1240593,,32530,,,,,,,,,...,,,,,,,,,,
1240594,,15314,,,,,,,,,...,,,,,,,,,,


In [37]:
# In one cell, we'll create our assignments dataframe.

# Make sure we're starting with an empty dataframe:
assignments = vrv.initDataframe('assignments')

# Copy over the static values.
# We'll start by copying a single column, to avoid the size mis-match issue:
assignments['objectID'] = bike_trips_df['bikeid']
assignments[['startLat', 'startLon', 'endLat', 'endLon']] = bike_trips_df[['start station latitude', 
                                                                          'start station longitude',
                                                                          'end station latitude',
                                                                          'end station longitude']].values

# Copy our new calculated column:
assignments['startTimeSec'] = bike_trips_df['timeAfterStart'].values

# Use the calculated column and tripduration to get the end time (in seconds):
assignments['endTimeSec'] = (bike_trips_df['timeAfterStart'] + bike_trips_df['tripduration']).values

# Fill in the rest of our assignments df with some hard-coded values:
# (we'll probably want to revisit this later)
assignments.loc[:,['modelFile', 'modelScale', 'modelMinPxSize', 'startAltMeters', 'endAltMeters', 
                   'leafletColor', 'leafletWeight', 'leafletStyle', 'leafletOpacity', 'useArrows',
                   'cesiumColor', 'cesiumWeight', 'cesiumStyle', 'cesiumOpacity']] = \
                  ['veroviz/models/car_blue.gltf', 100, 45, 0, 0, 
                   'blue', 2, 'solid', 0.8, False, 
                   'Cesium.Color.BLUE', 2, 'solid', 0.7]

# More hard-coded values:
assignments.loc[:,['leafletCurveType', 'leafletCurvature', 'ganttColor', 'popupText',
             'startElevMeters', 'endElevMeters', 'wayname', 'waycategory', 'surface',
             'waytype', 'steepness', 'tollway']] = \
             ['straight', 45, None, None, 0, 0, None, None, None, None, 0, False]

# Finally (for now), let's generate a unique odID value for each row.
# This will make sense only if we assume that each row corresponds to a specific
# O/D pair.  Conversely, if we have turn-by-turn arcs, we'll need to group
# multiple rows into the same O/D pair.  We'll tackle that case if/when 
# we encounter it.
assignments.loc[:,'odID'] = list(range(0, len(assignments)))

#### <font color='orange'>ISSUE:  Some bikes are getting moved</font>

Here's the plan:
1. Sort the bike_routes dataframe according to bikeid (to group the bikes) and then according to start time (to keep the sequence of routes appropriately ordered.
2. Add some new columns to the bike_routes dataframe, containing values from the **next row**.  This will allow us to identify if a bike has been moved, by comparing the end location of one row with the start location of the next row.
3. Apply a filter to identify the bikes that have magically moved.  Save the results in a new dataframe.
4. Generate new "assignments" from these new rows, and add to the assignments dataframe

Here's an example of how Step 2 works:

In [38]:
# Example -- Part 1
# Initialize a dummy dataframe with 4 rows, 2 columns:
dummy_df = pd.DataFrame({'a': [1, 2, 3, 4],
                         'b': [4, 6, 8, 12]})
dummy_df

Unnamed: 0,a,b
0,1,4
1,2,6
2,3,8
3,4,12


In [39]:
pd.DataFrame(dummy_df[1:][['a', 'b']].values)

Unnamed: 0,0,1
0,2,6
1,3,8
2,4,12


In [40]:
# Example -- Part 2
# Add 2 new columns (x and y) comprised the last 3 rows of columns a and b
dummy_df[['x', 'y']] = pd.DataFrame(dummy_df[1:][['a', 'b']].values)
dummy_df

Unnamed: 0,a,b,x,y
0,1,4,2.0,6.0
1,2,6,3.0,8.0
2,3,8,4.0,12.0
3,4,12,,


Back to our plan...

In [41]:
bike_trips_df.sort_values(by=['bikeid', 'timeAfterStart'])

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender,timeAfterStart
26822,548,2020-01-02 09:25:11.9390,2020-01-02 09:34:20.3600,261,Johnson St & Gold St,40.694749,-73.983625,2000,Front St & Washington St,40.702551,-73.989402,14530,Subscriber,1987,1,120256
50614,1061,2020-01-02 18:16:45.1630,2020-01-02 18:34:26.5780,2000,Front St & Washington St,40.702551,-73.989402,3414,Bergen St & Flatbush Ave,40.680945,-73.975673,14530,Subscriber,1992,1,152149
134556,400,2020-01-05 14:27:01.9750,2020-01-05 14:33:42.1210,3414,Bergen St & Flatbush Ave,40.680945,-73.975673,3486,Schermerhorn St & Bond St,40.688417,-73.984517,14530,Customer,1969,0,397566
171702,301,2020-01-06 16:10:01.6600,2020-01-06 16:15:03.0080,3486,Schermerhorn St & Bond St,40.688417,-73.984517,241,DeKalb Ave & S Portland Ave,40.689810,-73.974931,14530,Subscriber,1985,1,490146
175456,167,2020-01-06 17:15:51.0260,2020-01-06 17:18:38.1240,241,DeKalb Ave & S Portland Ave,40.689810,-73.974931,324,DeKalb Ave & Hudson Ave,40.689888,-73.981013,14530,Subscriber,1998,1,494095
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1089454,1231,2020-01-28 18:22:48.4450,2020-01-28 18:43:19.9380,379,W 31 St & 7 Ave,40.749156,-73.991600,3165,Central Park West & W 72 St,40.775794,-73.976206,42091,Subscriber,1967,1,2398913
1110032,511,2020-01-29 08:46:08.5270,2020-01-29 08:54:40.2700,3165,Central Park West & W 72 St,40.775794,-73.976206,3140,1 Ave & E 78 St,40.771404,-73.953517,42091,Subscriber,1969,0,2450713
1131355,611,2020-01-29 17:17:20.7780,2020-01-29 17:27:32.7280,3140,1 Ave & E 78 St,40.771404,-73.953517,3338,2 Ave & E 99 St,40.786259,-73.945526,42091,Subscriber,1975,1,2481385
1211408,356,2020-01-31 12:04:49.3410,2020-01-31 12:10:45.8400,3338,2 Ave & E 99 St,40.786259,-73.945526,3286,E 89 St & 3 Ave,40.780628,-73.952167,42091,Subscriber,1965,1,2635433


In [42]:
# Just in case, let's begin by re-sorting 
# the bike_trips_df by bikeid and timeAfterStart:
bike_trips_df = bike_trips_df.sort_values(by=['bikeid', 'timeAfterStart'])

# Reset the index to start at 0 for the first row:
bike_trips_df = bike_trips_df.reset_index(drop=True)
bike_trips_df

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,end station longitude,bikeid,usertype,birth year,gender,timeAfterStart
0,548,2020-01-02 09:25:11.9390,2020-01-02 09:34:20.3600,261,Johnson St & Gold St,40.694749,-73.983625,2000,Front St & Washington St,40.702551,-73.989402,14530,Subscriber,1987,1,120256
1,1061,2020-01-02 18:16:45.1630,2020-01-02 18:34:26.5780,2000,Front St & Washington St,40.702551,-73.989402,3414,Bergen St & Flatbush Ave,40.680945,-73.975673,14530,Subscriber,1992,1,152149
2,400,2020-01-05 14:27:01.9750,2020-01-05 14:33:42.1210,3414,Bergen St & Flatbush Ave,40.680945,-73.975673,3486,Schermerhorn St & Bond St,40.688417,-73.984517,14530,Customer,1969,0,397566
3,301,2020-01-06 16:10:01.6600,2020-01-06 16:15:03.0080,3486,Schermerhorn St & Bond St,40.688417,-73.984517,241,DeKalb Ave & S Portland Ave,40.689810,-73.974931,14530,Subscriber,1985,1,490146
4,167,2020-01-06 17:15:51.0260,2020-01-06 17:18:38.1240,241,DeKalb Ave & S Portland Ave,40.689810,-73.974931,324,DeKalb Ave & Hudson Ave,40.689888,-73.981013,14530,Subscriber,1998,1,494095
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1240591,1231,2020-01-28 18:22:48.4450,2020-01-28 18:43:19.9380,379,W 31 St & 7 Ave,40.749156,-73.991600,3165,Central Park West & W 72 St,40.775794,-73.976206,42091,Subscriber,1967,1,2398913
1240592,511,2020-01-29 08:46:08.5270,2020-01-29 08:54:40.2700,3165,Central Park West & W 72 St,40.775794,-73.976206,3140,1 Ave & E 78 St,40.771404,-73.953517,42091,Subscriber,1969,0,2450713
1240593,611,2020-01-29 17:17:20.7780,2020-01-29 17:27:32.7280,3140,1 Ave & E 78 St,40.771404,-73.953517,3338,2 Ave & E 99 St,40.786259,-73.945526,42091,Subscriber,1975,1,2481385
1240594,356,2020-01-31 12:04:49.3410,2020-01-31 12:10:45.8400,3338,2 Ave & E 99 St,40.786259,-73.945526,3286,E 89 St & 3 Ave,40.780628,-73.952167,42091,Subscriber,1965,1,2635433


In [44]:
# Now, add 5 new columns, comprised of records from the NEXT line of data:
bike_trips_df[['next_bikeid', 'next_startStationID', 'next_timeAfterStart', 
               'next_startLat', 'next_startLon']] = pd.DataFrame(bike_trips_df[1:][['bikeid', 'start station id', 
                                                'timeAfterStart', 'start station latitude', 
                                                'start station longitude']].values)

bike_trips_df


# FYI...Here's how to use Pandas "Series" (1-dimensional dataframes) individually:
#bike_trips_df['nextline_bikeid'] = pd.Series(bike_trips_df[1:]['bikeid'].values)
#bike_trips_df['nextline_startStationID'] = pd.Series(bike_trips_df[1:]['start station id'].values)

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,...,bikeid,usertype,birth year,gender,timeAfterStart,next_bikeid,next_startStationID,next_timeAfterStart,next_startLat,next_startLon
0,548,2020-01-02 09:25:11.9390,2020-01-02 09:34:20.3600,261,Johnson St & Gold St,40.694749,-73.983625,2000,Front St & Washington St,40.702551,...,14530,Subscriber,1987,1,120256,14530.0,2000.0,152149.0,40.702551,-73.989402
1,1061,2020-01-02 18:16:45.1630,2020-01-02 18:34:26.5780,2000,Front St & Washington St,40.702551,-73.989402,3414,Bergen St & Flatbush Ave,40.680945,...,14530,Subscriber,1992,1,152149,14530.0,3414.0,397566.0,40.680945,-73.975673
2,400,2020-01-05 14:27:01.9750,2020-01-05 14:33:42.1210,3414,Bergen St & Flatbush Ave,40.680945,-73.975673,3486,Schermerhorn St & Bond St,40.688417,...,14530,Customer,1969,0,397566,14530.0,3486.0,490146.0,40.688417,-73.984517
3,301,2020-01-06 16:10:01.6600,2020-01-06 16:15:03.0080,3486,Schermerhorn St & Bond St,40.688417,-73.984517,241,DeKalb Ave & S Portland Ave,40.689810,...,14530,Subscriber,1985,1,490146,14530.0,241.0,494095.0,40.689810,-73.974931
4,167,2020-01-06 17:15:51.0260,2020-01-06 17:18:38.1240,241,DeKalb Ave & S Portland Ave,40.689810,-73.974931,324,DeKalb Ave & Hudson Ave,40.689888,...,14530,Subscriber,1998,1,494095,14530.0,324.0,494643.0,40.689888,-73.981013
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1240591,1231,2020-01-28 18:22:48.4450,2020-01-28 18:43:19.9380,379,W 31 St & 7 Ave,40.749156,-73.991600,3165,Central Park West & W 72 St,40.775794,...,42091,Subscriber,1967,1,2398913,42091.0,3165.0,2450713.0,40.775794,-73.976206
1240592,511,2020-01-29 08:46:08.5270,2020-01-29 08:54:40.2700,3165,Central Park West & W 72 St,40.775794,-73.976206,3140,1 Ave & E 78 St,40.771404,...,42091,Subscriber,1969,0,2450713,42091.0,3140.0,2481385.0,40.771404,-73.953517
1240593,611,2020-01-29 17:17:20.7780,2020-01-29 17:27:32.7280,3140,1 Ave & E 78 St,40.771404,-73.953517,3338,2 Ave & E 99 St,40.786259,...,42091,Subscriber,1975,1,2481385,42091.0,3338.0,2635433.0,40.786259,-73.945526
1240594,356,2020-01-31 12:04:49.3410,2020-01-31 12:10:45.8400,3338,2 Ave & E 99 St,40.786259,-73.945526,3286,E 89 St & 3 Ave,40.780628,...,42091,Subscriber,1965,1,2635433,42091.0,3286.0,2636553.0,40.780628,-73.952167


In [45]:
# Let's create a new dataframe that just contains instances of bike repositions.
# A repositioning occurs when
#    - We have matching bike IDs, and 
#    - The end station ID doesn't match the next start station ID
bike_trips_repos_df = bike_trips_df[ \
    (bike_trips_df['bikeid'] == bike_trips_df['next_bikeid']) &
    (bike_trips_df['end station id'] != bike_trips_df['next_startStationID'])]
bike_trips_repos_df

Unnamed: 0,tripduration,starttime,stoptime,start station id,start station name,start station latitude,start station longitude,end station id,end station name,end station latitude,...,bikeid,usertype,birth year,gender,timeAfterStart,next_bikeid,next_startStationID,next_timeAfterStart,next_startLat,next_startLon
8,441,2020-01-06 21:52:16.6600,2020-01-06 21:59:37.9540,3420,Douglass St & 3 Ave,40.680213,-73.984327,3232,Bond St & Fulton St,40.689622,...,14530,Customer,1991,2,510681,14530.0,3377.0,566536.0,40.678612,-73.990373
44,412,2020-01-16 09:42:05.8220,2020-01-16 09:48:57.9800,127,Barrow St & Hudson St,40.731724,-74.006744,280,E 10 St & 5 Ave,40.733320,...,14530,Subscriber,1963,2,1330870,14530.0,326.0,1339185.0,40.729538,-73.984267
45,288,2020-01-16 12:00:41.2330,2020-01-16 12:05:29.6630,326,E 11 St & 1 Ave,40.729538,-73.984267,3812,University Pl & E 14 St,40.734814,...,14530,Subscriber,1974,1,1339185,14530.0,504.0,1415672.0,40.732219,-73.981656
49,417,2020-01-17 17:06:55.9070,2020-01-17 17:13:53.6720,482,W 15 St & 7 Ave,40.739355,-73.999318,509,9 Ave & W 22 St,40.745497,...,14530,Subscriber,1986,1,1443960,14530.0,3724.0,1608598.0,40.766741,-73.979069
75,938,2020-01-05 22:03:49.9150,2020-01-05 22:19:28.6610,3085,Roebling St & N 4 St,40.714690,-73.957390,265,Stanton St & Chrystie St,40.722293,...,14533,Subscriber,1987,1,424974,14533.0,473.0,578472.0,40.721101,-73.991925
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1240497,318,2020-01-28 08:53:26.8450,2020-01-28 08:58:45.1130,494,W 26 St & 8 Ave,40.747348,-73.997236,446,W 24 St & 7 Ave,40.744876,...,42046,Subscriber,1981,2,2364751,42046.0,423.0,2484047.0,40.765849,-73.986905
1240512,303,2020-01-03 16:23:51.9300,2020-01-03 16:28:55.6320,484,W 44 St & 5 Ave,40.755003,-73.980144,515,W 43 St & 10 Ave,40.760094,...,42091,Subscriber,1974,1,231776,42091.0,533.0,499420.0,40.752996,-73.987216
1240567,247,2020-01-23 07:30:27.7230,2020-01-23 07:34:35.4880,3390,E 109 St & 3 Ave,40.793297,-73.943208,3521,Lenox Ave & W 111 St,40.798786,...,42091,Subscriber,1990,1,1927772,42091.0,3443.0,1977579.0,40.761330,-73.979820
1240576,1122,2020-01-25 19:38:40.8040,2020-01-25 19:57:23.4000,173,Broadway & W 49 St,40.760683,-73.984527,326,E 11 St & 1 Ave,40.729538,...,42091,Subscriber,1968,1,2144265,42091.0,349.0,2211529.0,40.718502,-73.983299


In [46]:
bike_trips_repos_df.iloc[8]

tripduration                                    186
starttime                  2020-01-15 21:14:02.5970
stoptime                   2020-01-15 21:17:09.2300
start station id                               3905
start station name                  4 Ave & E 12 St
start station latitude                      40.7326
start station longitude                    -73.9901
end station id                                 3708
end station name                    W 13 St & 5 Ave
end station latitude                        40.7354
end station longitude                      -73.9943
bikeid                                        14534
usertype                                 Subscriber
birth year                                     1969
gender                                            1
timeAfterStart                              1285987
next_bikeid                                   14534
next_startStationID                             326
next_timeAfterStart                     1.35637e+06
next_startLa

In [47]:
# In one cell, we'll create another assignments dataframe,
# JUST FOR REPOSITIONS.

# Make sure we're starting with an empty dataframe:
assignments_repos = vrv.initDataframe('assignments')

# Copy columns from the bike_trips_repos_df to the assignments_repos dataframe:
assignments_repos[['objectID', 'startLat', 'startLon', 'endLat',  'endLon', 'endTimeSec']] = \
    pd.DataFrame(bike_trips_repos_df[['bikeid', 'end station latitude', 'end station longitude', 
                                      'next_startLat', 'next_startLon', 'next_timeAfterStart']].values)

# The start time (in seconds) of the reposition will be the end time of the previous move:
assignments_repos['startTimeSec'] = (bike_trips_repos_df['timeAfterStart'] + bike_trips_repos_df['tripduration']).values

# Fill in the rest of our assignments df with some hard-coded values:
# (we'll probably want to revisit this later)
assignments_repos.loc[:,['modelFile', 'modelScale', 'modelMinPxSize', 'startAltMeters', 'endAltMeters', 
                         'leafletColor', 'leafletWeight', 'leafletStyle', 'leafletOpacity', 'useArrows',
                         'cesiumColor', 'cesiumWeight', 'cesiumStyle', 'cesiumOpacity']] = \
                        ['veroviz/models/car_red.gltf', 100, 45, 0, 0, 
                         'red', 2, 'dashed', 0.8, False, 
                         'Cesium.Color.RED', 2, 'solid', 0.7]

# More hard-coded values:
assignments_repos.loc[:,['leafletCurveType', 'leafletCurvature', 'ganttColor', 'popupText',
             'startElevMeters', 'endElevMeters', 'wayname', 'waycategory', 'surface',
             'waytype', 'steepness', 'tollway']] = \
             ['straight', 45, None, None, 0, 0, None, None, None, None, 0, False]

# Finally (for now), let's generate a unique odID value for each row.
# This will make sense only if we assume that each row corresponds to a specific
# O/D pair.  Conversely, if we have turn-by-turn arcs, we'll need to group
# multiple rows into the same O/D pair.  We'll tackle that case if/when 
# we encounter it.
assignments_repos.loc[:,'odID'] = list(range(0, len(assignments_repos)))

In [48]:
assignments_repos.head()

Unnamed: 0,odID,objectID,modelFile,modelScale,modelMinPxSize,startTimeSec,startLat,startLon,startAltMeters,endTimeSec,...,ganttColor,popupText,startElevMeters,endElevMeters,wayname,waycategory,surface,waytype,steepness,tollway
0,0,14530.0,veroviz/models/car_red.gltf,100,45,511122,40.689622,-73.983043,0,566536.0,...,,,0,0,,,,,0,False
1,1,14530.0,veroviz/models/car_red.gltf,100,45,1331282,40.73332,-73.995101,0,1339185.0,...,,,0,0,,,,,0,False
2,2,14530.0,veroviz/models/car_red.gltf,100,45,1339473,40.734814,-73.992085,0,1415672.0,...,,,0,0,,,,,0,False
3,3,14530.0,veroviz/models/car_red.gltf,100,45,1444377,40.745497,-74.001971,0,1608598.0,...,,,0,0,,,,,0,False
4,4,14533.0,veroviz/models/car_red.gltf,100,45,425912,40.722293,-73.991475,0,578472.0,...,,,0,0,,,,,0,False


In [49]:
# Finally, combine the `assignments` and `assignments_repos` dataframe
assignments = pd.concat([assignments, assignments_repos], ignore_index=True)

In [50]:
# Display what we've created:
assignments.head()

Unnamed: 0,odID,objectID,modelFile,modelScale,modelMinPxSize,startTimeSec,startLat,startLon,startAltMeters,endTimeSec,...,ganttColor,popupText,startElevMeters,endElevMeters,wayname,waycategory,surface,waytype,steepness,tollway
0,0,30326.0,veroviz/models/car_blue.gltf,100,45,0,40.732219,-73.981656,0,789.0,...,,,0,0,,,,,0,False
1,1,17105.0,veroviz/models/car_blue.gltf,100,45,12,40.661063,-73.979453,0,1553.0,...,,,0,0,,,,,0,False
2,2,40177.0,veroviz/models/car_blue.gltf,100,45,46,40.743227,-73.974498,0,1510.0,...,,,0,0,,,,,0,False
3,3,27690.0,veroviz/models/car_blue.gltf,100,45,50,40.736529,-74.00618,0,642.0,...,,,0,0,,,,,0,False
4,4,32583.0,veroviz/models/car_blue.gltf,100,45,50,40.694546,-73.958014,0,752.0,...,,,0,0,,,,,0,False


--- 

### Create a Leaflet map 
- We have a lot of bikes...let's just display one.

In [51]:
# I'll just choose the bike with the smallest ID number:
assignments[assignments['objectID'] == min(assignments['objectID'])]

Unnamed: 0,odID,objectID,modelFile,modelScale,modelMinPxSize,startTimeSec,startLat,startLon,startAltMeters,endTimeSec,...,ganttColor,popupText,startElevMeters,endElevMeters,wayname,waycategory,surface,waytype,steepness,tollway
26822,26822,14530.0,veroviz/models/car_blue.gltf,100,45,120256,40.694749,-73.983625,0,120804.0,...,,,0,0,,,,,0,False
50614,50614,14530.0,veroviz/models/car_blue.gltf,100,45,152149,40.702551,-73.989402,0,153210.0,...,,,0,0,,,,,0,False
134556,134556,14530.0,veroviz/models/car_blue.gltf,100,45,397566,40.680945,-73.975673,0,397966.0,...,,,0,0,,,,,0,False
171702,171702,14530.0,veroviz/models/car_blue.gltf,100,45,490146,40.688417,-73.984517,0,490447.0,...,,,0,0,,,,,0,False
175456,175456,14530.0,veroviz/models/car_blue.gltf,100,45,494095,40.689810,-73.974931,0,494262.0,...,,,0,0,,,,,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1201970,1201970,14530.0,veroviz/models/car_blue.gltf,100,45,2623259,40.749156,-73.991600,0,2623788.0,...,,,0,0,,,,,0,False
1240596,0,14530.0,veroviz/models/car_red.gltf,100,45,511122,40.689622,-73.983043,0,566536.0,...,,,0,0,,,,,0,False
1240597,1,14530.0,veroviz/models/car_red.gltf,100,45,1331282,40.733320,-73.995101,0,1339185.0,...,,,0,0,,,,,0,False
1240598,2,14530.0,veroviz/models/car_red.gltf,100,45,1339473,40.734814,-73.992085,0,1415672.0,...,,,0,0,,,,,0,False


In [52]:
# Show all of the arcs for this particular bike:
vrv.createLeaflet(arcs=assignments[assignments['objectID'] == min(assignments['objectID'])])

In [53]:
# Show just the repositioning moves for this particular bike:
vrv.createLeaflet(arcs      = assignments_repos[assignments_repos['objectID'] == min(assignments_repos['objectID'])], 
                  useArrows = True)

--- 

#### <font color='orange'>Add "Static" Assignments for Stationary Bikes</font>

In [77]:
# Create a dataframe consisting of only bike trips where the bike is stationary at the end.
bt_stat_df = bike_trips_df[(bike_trips_df['bikeid'] == bike_trips_df['next_bikeid']) & 
                           (bike_trips_df['end station id'] == bike_trips_df['next_startStationID']) &
                           (bike_trips_df['timeAfterStart'] < bike_trips_df['next_timeAfterStart'])]

In [78]:
# We have a lot of bikes...let's just limit our focus to one.
# Get a list of bike_trips_df indices where the bike is stationary at the end of each trip.
stationary_indices = list(bike_trips_df[(bike_trips_df['bikeid'] == min(bike_trips_df['bikeid'])) &
                                        (bike_trips_df['bikeid'] == bike_trips_df['next_bikeid']) & 
                                        (bike_trips_df['end station id'] == bike_trips_df['next_startStationID']) &
                                        (bike_trips_df['timeAfterStart'] < bike_trips_df['next_timeAfterStart'])].index)

In [79]:
len(stationary_indices)

69

In [81]:
# Initialize an empty assignments dataframe to hold the static assignments:
stat_asgn_df = vrv.initDataframe('assignments')

# Append stationary bikes to this temporary dataframe:
for i in stationary_indices:
    stat_asgn_df = vrv.addStaticAssignment( \
                            # odID          = 1,
                            objectID        = bike_trips_df.loc[i]['bikeid'],
                            modelFile       = 'veroviz/models/car_blue.gltf',
                            modelScale      = 100,
                            modelMinPxSize  = 75,
                            loc             = list(bike_trips_df.loc[i][['end station latitude', 'end station longitude']]),
                            startTimeSec    = bike_trips_df.loc[i]['timeAfterStart'] + bike_trips_df.loc[i]['tripduration'],
                            endTimeSec      = bike_trips_df.loc[i]['next_timeAfterStart']) 
    
# Finally, combine the `assignments` and `stat_asgn_df` dataframes
assignments = pd.concat([assignments, stat_asgn_df], ignore_index=True)


# NOTE:  It is **WAY** faster to do it this way, rather than appending to
#        the "assignments" dataframe within the "for" loop.

--- 

### Create a Cesium movie for one bike

In [82]:
# Use this command to get documentation on the `createCesium()` function:
vrv.createCesium?

In [83]:
# Create properly-formatted start date and time strings.

# startDate: Format is "YYYY-MM-DD"
startDate = pd.to_datetime(min(bike_trips_df['starttime'])).strftime('%Y-%m-%d')

# startTime: Format is "HH:MM:SS"
startTime = pd.to_datetime(min(bike_trips_df['starttime'])).strftime('%H:%M:%S')

In [84]:
# Let's pick a specific bike to follow:
myBike = min(assignments['objectID'])
myBike

14530.0

#### Here's the original way we created the Cesium movie

In [85]:
vrv.createCesium(
    assignments = assignments[assignments['objectID'] == myBike],
    nodes       = nodes,
    startDate   = startDate,
    startTime   = startTime,
    cesiumDir   = os.environ['CESIUMDIR'],
    problemDir  = 'IE_670/citibike_example_cluttered')

Message: File selector was written to /Users/murray/cesium/IE_670/citibike_example_cluttered/;IE_670;citibike_example_cluttered.vrv ...
Message: Configs were written to /Users/murray/cesium/IE_670/citibike_example_cluttered/config.js ...
Message: Nodes were written to /Users/murray/cesium/IE_670/citibike_example_cluttered/displayNodes.js ...
Message: Assignments (.js) were written to /Users/murray/cesium/IE_670/citibike_example_cluttered/displayPaths.js ...
Message: Assignments (.czml) were written to /Users/murray/cesium/IE_670/citibike_example_cluttered/routes.czml ...


#### <font color="orange">An Improved Version</font>

- The Cesium movie is cluttered with all of our station markers.  It would be better to only include the markers that are actually relevant to our given bike.

- Fortunately, our bike trips df contains the station IDs.
    - We just need to get a list of unique IDs, and then pass to `createCesium()` only the subset of nodes corresponding to these IDs.

In [86]:
# First, let's grab a list of *unique* station IDs associated with `myBike`.
setOfStationIDs = set(bike_trips_df[bike_trips_df['bikeid'] == myBike]['start station id']).union(set(bike_trips_df[bike_trips_df['bikeid'] == myBike]['end station id']))
listOfStationIDs = list(map(str, setOfStationIDs))
listOfStationIDs

# NOTES:  
# 1. The first line is rather long.  
#    It takes the union of 2 Pandas DF columns (start and end station IDs).
# 2. The bikeids in bike_trips_df appear to be integers,
#    but the nodes dataframe is treating id as a string.
#    The map() function is converting to strings.

['3584',
 '261',
 '519',
 '3346',
 '280',
 '3362',
 '297',
 '3377',
 '3382',
 '311',
 '3386',
 '3388',
 '3648',
 '321',
 '3137',
 '324',
 '326',
 '3409',
 '3412',
 '3414',
 '3674',
 '3420',
 '3676',
 '3422',
 '355',
 '3429',
 '3687',
 '362',
 '364',
 '367',
 '3440',
 '376',
 '379',
 '127',
 '3724',
 '398',
 '143',
 '402',
 '151',
 '157',
 '3486',
 '3232',
 '418',
 '420',
 '167',
 '3242',
 '174',
 '439',
 '456',
 '462',
 '2000',
 '465',
 '3282',
 '467',
 '3283',
 '217',
 '3804',
 '476',
 '3298',
 '482',
 '3812',
 '3047',
 '3308',
 '496',
 '241',
 '3569',
 '3571',
 '3059',
 '499',
 '3314',
 '501',
 '504',
 '3579',
 '509']

In [87]:
# Just for fun, let's filter the nodes DF for just the unique station IDs
# associated with our particular bike:
nodes[nodes['id'].isin(listOfStationIDs)]

# NOTE:  This cell isn't necessary...it just demonstrates the use of the filter.

Unnamed: 0,id,lat,lon,altMeters,nodeName,nodeType,popupText,leafletIconPrefix,leafletIconType,leafletColor,leafletIconText,cesiumIconType,cesiumColor,cesiumIconText,elevMeters,is_renting,is_returning
7,127,40.7317,-74.0067,0,Barrow St & Hudson St,CitiBikeStation,,fa,bicycle,green,Barrow St & Hudson St,pin,Cesium.Color.GREEN,127,,1,1
9,143,40.6924,-73.9934,0,Clinton St & Joralemon St,CitiBikeStation,,fa,bicycle,green,Clinton St & Joralemon St,pin,Cesium.Color.GREEN,143,,1,1
13,151,40.7221,-73.9972,0,Cleveland Pl & Spring St,CitiBikeStation,,fa,bicycle,green,Cleveland Pl & Spring St,pin,Cesium.Color.GREEN,151,,1,1
16,157,40.6909,-73.9961,0,Henry St & Atlantic Ave,CitiBikeStation,,fa,bicycle,green,Henry St & Atlantic Ave,pin,Cesium.Color.GREEN,157,,1,1
21,174,40.7382,-73.9774,0,E 25 St & 1 Ave,CitiBikeStation,,fa,bicycle,green,E 25 St & 1 Ave,pin,Cesium.Color.GREEN,174,,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
742,3676,40.6758,-74.0147,0,Van Brunt St & Van Dyke St,CitiBikeStation,,fa,bicycle,red,Van Brunt St & Van Dyke St,pin,Cesium.Color.RED,3676,,0,0
748,3687,40.7432,-73.9745,0,E 33 St & 1 Ave,CitiBikeStation,,fa,bicycle,green,E 33 St & 1 Ave,pin,Cesium.Color.GREEN,3687,,1,1
768,3724,40.7667,-73.9791,0,7 Ave & Central Park South,CitiBikeStation,,fa,bicycle,green,7 Ave & Central Park South,pin,Cesium.Color.GREEN,3724,,1,1
826,3804,40.7025,-73.9868,0,Front St & Jay St,CitiBikeStation,,fa,bicycle,green,Front St & Jay St,pin,Cesium.Color.GREEN,3804,,1,1


In [88]:
# Here's the "improved" way of generating the Cesium movie with less clutter:
vrv.createCesium(
    assignments = assignments[assignments['objectID'] == myBike],
    nodes       = nodes[nodes['id'].isin(listOfStationIDs)],      # <-- changed
    startDate   = startDate,
    startTime   = startTime,
    cesiumDir   = os.environ['CESIUMDIR'],
    problemDir  = 'IE_670/citibike_example_clean')                # <-- renamed

Message: File selector was written to /Users/murray/cesium/IE_670/citibike_example_clean/;IE_670;citibike_example_clean.vrv ...
Message: Configs were written to /Users/murray/cesium/IE_670/citibike_example_clean/config.js ...
Message: Nodes were written to /Users/murray/cesium/IE_670/citibike_example_clean/displayNodes.js ...
Message: Assignments (.js) were written to /Users/murray/cesium/IE_670/citibike_example_clean/displayPaths.js ...
Message: Assignments (.czml) were written to /Users/murray/cesium/IE_670/citibike_example_clean/routes.czml ...


In [89]:
# We can also use our nodes filter to create a "cleaner" Leaflet map:
vrv.createLeaflet(
    arcs      = assignments[assignments['objectID'] == myBike],
    nodes     = nodes[nodes['id'].isin(listOfStationIDs)],
    useArrows = True)

--- 

#### Playing around with dates/times
- Here's some code related to formatting dates/times.  There might be something useful here in the future...

In [61]:
pd.to_datetime(bike_trips_df['starttime']).dt.date

0          2020-01-02
1          2020-01-02
2          2020-01-05
3          2020-01-06
4          2020-01-06
              ...    
1240591    2020-01-28
1240592    2020-01-29
1240593    2020-01-29
1240594    2020-01-31
1240595    2020-01-31
Name: starttime, Length: 1240596, dtype: object

In [62]:
pd.to_datetime(min(bike_trips_df['starttime'])).strftime('%Y-%m-%d')

'2020-01-01'

In [63]:
pd.to_datetime(min(bike_trips_df['starttime'])).strftime('%H:%M:%S')

'00:00:55'

--- 

### NYC Subway Stations

- A list of subway stations may be found here:
    - http://web.mta.info/developers/data/nyct/subway/Stations.csv 

- Other links:
    - http://web.mta.info/developers/index.html
    - http://datamine.mta.info/list-of-feeds 
    
Ideas:
- For a given location, find the nearest subway station.
- For a given destination, find the nearest **available** CitiBike station.
- For a given O/D pair, determine the best combination of subways/bikes to use.
