# LA Bikeshare Data
After working on my [Google Location History Notebook](https://nbviewer.jupyter.org/github/black-tea/google_location_history/blob/master/MyTimeinLA.ipynb), I wanted to dig into spatial data a bit more. I've seen a number of bikeshare data analyses over the past few years, so I thought I would try one myself.
### Extract: Get the data from LA Metro
Go to https://bikeshare.metro.net/about/data/ and download (1) all of the trip data and (2) the station information. As of 7/15/2017, one year of trip data has been released, split by quarter. Everything is in CSV format.

In [16]:
##### Setup
%matplotlib inline
import urllib2
import json
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import pysal as ps
import matplotlib.pyplot as plt
import datetime
import folium

##### About LA Metro Bikeshare Data
Attribute information is provided directly on the [LA Metro website](https://bikeshare.metro.net/about/data/). The station information contains the following fields:
* Station ID: Unique identifier for the station
* Station Name: Name of the station
* Go Live Date: Date that the station first became active
* Status: "Active" for stations available or "Inactive" for stations that are not available as of the latest update

The trip data contains the following fields:
* Trip ID ("trip_id"): Unique identifier for the trip
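* Duration ("duration"): Duration of the trip. Metro's description says minutes, but the values in the files loaded below are in seconds (e.g. 480 for a trip from 0:15 to 0:23)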
* Start Time ("start_time"): Date/Time that the trip began, in ISO 8601 format in local time
* End Time ("end_time"): Date/Time that the trip ended, in ISO 8601 format in local time
* Start Station ("start_station"): Station ID where the trip originated
* Start Latitude ("start_lat"): Y coordinate of the station where the trip originated
* Start Longitude ("start_lon"): X coordinate of the station where the trip originated
* End Station ("end_station"): Station ID where the trip terminated
* End Latitude ("end_lat"): Y coordinate of the station where the trip terminated
* End Longitude ("end_lon"): X coordinate of the station where the trip terminated
* Bike ID ("bike_id"): Unique identifier for the bike
* Plan Duration ("plan_duration"): Number of days that the passholder's plan entitles them to ride; 0 is used for a single ride plan (Walk-up)
* Trip Type ("trip_route_category"): One Way or Round Trip
* Passholder Type ("passholder_type"): Name of the passholder's plan
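
One quick way to confirm the unit of `duration` is to compare it with the gap between `start_time` and `end_time`. The sketch below uses three rows copied from the Q1 2017 output shown later in this notebook. The timestamp format string is an assumption that matches those rows only; other quarterly files may need a different one.

```python
from datetime import datetime

# (start_time, end_time, duration) copied from the first three Q1 2017 rows
SAMPLE = [
    ("1/1/2017 0:15", "1/1/2017 0:23", 480),
    ("1/1/2017 0:24", "1/1/2017 0:36", 720),
    ("1/1/2017 0:28", "1/1/2017 0:45", 1020),
]

# Assumption: M/D/YYYY H:MM, as seen in the sample rows
TIME_FORMAT = "%m/%d/%Y %H:%M"


def elapsed_seconds(start, end, fmt=TIME_FORMAT):
    """Seconds between two timestamp strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# A ratio near 1 means `duration` is in seconds; near 1/60 would mean minutes
ratios = [duration / elapsed_seconds(start, end) for start, end, duration in SAMPLE]
print(ratios)  # [1.0, 1.0, 1.0]
```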

I wanted to start by looking at the stations on a map. However, since the station information table doesn't contain lat/lon coordinates, I would need to either merge the table with the trip data or use the [updated GeoJSON feed](https://bikeshare.metro.net/stations/json/) on the LA Metro website. I opted for the latter.
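
For comparison, the first option (pulling coordinates out of the trip data and merging them onto the station table) could be sketched roughly as follows. The trip rows are copied from the Q1 2017 output later in this notebook. The station table here is a hypothetical stand-in: its column names (`Station_ID`, `Station_Name`) and placeholder names are assumptions, not the real file's contents.

```python
import pandas as pd

# Start-station columns copied from the Q1 2017 sample rows
trips_sample = pd.DataFrame({
    "start_station_id": [3030, 3028, 3027, 3007, 3007],
    "start_lat": [34.051941, 34.058319, 34.04998, 34.05048, 34.05048],
    "start_lon": [-118.24353, -118.246094, -118.247162, -118.254593, -118.254593],
})

# One coordinate pair per station: take the first non-null lat/lon seen
station_coords = (
    trips_sample
    .dropna(subset=["start_lat", "start_lon"])
    .groupby("start_station_id")[["start_lat", "start_lon"]]
    .first()
    .rename(columns={"start_lat": "lat", "start_lon": "lon"})
    .reset_index()
)

# Hypothetical station table (column names and station names are placeholders)
station_table = pd.DataFrame({
    "Station_ID": [3007, 3027],
    "Station_Name": ["placeholder A", "placeholder B"],
})

# Attach coordinates to the station table by station ID
stations_with_coords = station_table.merge(
    station_coords, left_on="Station_ID", right_on="start_station_id", how="left"
)
print(stations_with_coords[["Station_ID", "Station_Name", "lat", "lon"]])
```

Taking the first coordinate per station assumes a station's location does not change between trips; the GeoJSON feed avoids that assumption, which is one reason to prefer it.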

In [26]:
# The request initially failed with an HTTP Forbidden error, so send browser-like headers
url = "https://bikeshare.metro.net/stations/json/"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

# Load the GeoJSON
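req = urllib2.Request(url, headers=hdr)
try:
    response = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    # Show the server's error body, then re-raise: without a response,
    # the json.loads call below would fail with a NameError instead
    print(e.fp.read())
    raise
station_json = json.loads(response.read())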

# Import to GeoDataFrame
stations = gpd.GeoDataFrame.from_features(station_json['features'])

# Create basemap and add station points
station_map = folium.Map([34.047677, -118.3073917], tiles='CartoDB positron', zoom_start=11)
for index, row in stations.iterrows():
    # Folium expects [lat, lon], i.e. [y, x]
    folium.Marker(
        [row.geometry.centroid.y, row.geometry.centroid.x],
        popup=row['name']
    ).add_to(station_map)
station_map
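
The request cell above relies on `urllib2`, which exists only in Python 2. For anyone following along on Python 3, a minimal sketch of the same fetch might look like this. The function names are my own, and the shortened `User-Agent` header is an assumption; the server may require the fuller set of headers used above.

```python
import json
import urllib.request

STATIONS_URL = "https://bikeshare.metro.net/stations/json/"
# Assumption: a browser-like User-Agent is enough to avoid the Forbidden response
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url=STATIONS_URL, headers=HEADERS):
    """Build the request object without sending it."""
    return urllib.request.Request(url, headers=headers)


def fetch_geojson(url=STATIONS_URL):
    """Download and decode the feed; any HTTPError propagates to the caller."""
    with urllib.request.urlopen(build_request(url)) as response:
        return json.loads(response.read().decode("utf-8"))


def station_points(geojson):
    """Return (name, lat, lon) per Point feature; GeoJSON stores [lon, lat]."""
    points = []
    for feature in geojson.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        lon, lat = geometry["coordinates"][:2]
        name = (feature.get("properties") or {}).get("name")
        points.append((name, lat, lon))
    return points
```

Calling `station_points(fetch_geojson())` would then give the coordinates to pass to `folium.Marker`, without going through GeoPandas.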

For now, the system covers only downtown and, as of very recently, Pasadena. We will come back to this map later; first, let's turn to the trip data.

In [27]:
# Load the station file (note: this rebinds `stations`, replacing the GeoDataFrame built above)
station_path = 'data/metro_station_table.csv'
stations = pd.read_csv(station_path)

# Load the trip data
q1 = pd.read_csv('data/la_metro_gbfs_trips_Q1_2017.csv')
q2 = pd.read_csv('data/la_metro_gbfs_trips_Q2_2017.csv')
q3 = pd.read_csv('data/MetroBikeShare_2016_Q3_trips.csv')
q4 = pd.read_csv('data/Metro_trips_Q4_2016.csv')

# Concatenate data
frames = [q1,q2,q3,q4]
trips = pd.concat(frames)
trips.head()

| | bike_id | duration | end_lat | end_lon | end_station | end_station_id | end_time | passholder_type | plan_duration | start_lat | start_lon | start_station | start_station_id | start_time | trip_id | trip_route_category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6220 | 480 | 34.048851 | -118.246422 | NaN | 3029 | 1/1/2017 0:23 | Monthly Pass | 30.0 | 34.051941 | -118.24353 | NaN | 3030 | 1/1/2017 0:15 | 17059131 | One Way |
| 1 | 6351 | 720 | 34.058319 | -118.246094 | NaN | 3028 | 1/1/2017 0:36 | Walk-up | 0.0 | 34.058319 | -118.246094 | NaN | 3028 | 1/1/2017 0:24 | 17059130 | Round Trip |
| 2 | 5836 | 1020 | 34.043732 | -118.260139 | NaN | 3018 | 1/1/2017 0:45 | Walk-up | 0.0 | 34.04998 | -118.247162 | NaN | 3027 | 1/1/2017 0:28 | 17059129 | One Way |
| 3 | 6142 | 300 | 34.044701 | -118.252441 | NaN | 3031 | 1/1/2017 0:43 | Monthly Pass | 30.0 | 34.05048 | -118.254593 | NaN | 3007 | 1/1/2017 0:38 | 17059128 | One Way |
| 4 | 6135 | 300 | 34.044701 | -118.252441 | NaN | 3031 | 1/1/2017 0:43 | Monthly Pass | 30.0 | 34.05048 | -118.254593 | NaN | 3007 | 1/1/2017 0:38 | 17059127 | One Way |
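
In the output above, `start_station` and `end_station` are empty while `start_station_id` and `end_station_id` are filled. That pattern suggests the quarterly files do not all use the same column names, so `pd.concat` keeps both spellings side by side. A minimal sketch of one way to harmonize the names before concatenating is shown below. The two tiny frames are illustrative stand-ins rather than real rows, and which file uses which spelling is an assumption to check against the actual CSV headers.

```python
import pandas as pd

# Illustrative frames only: one per naming style, with made-up values
with_id_suffix = pd.DataFrame({
    "trip_id": [1], "start_station_id": [3030], "end_station_id": [3029],
})
without_id_suffix = pd.DataFrame({
    "trip_id": [2], "start_station": [3030], "end_station": [3029],
})

# Map the shorter names onto the *_id names; columns not present are ignored
RENAMES = {"start_station": "start_station_id", "end_station": "end_station_id"}


def harmonize(frame):
    """Return a copy of the frame with station columns renamed consistently."""
    return frame.rename(columns=RENAMES)


# ignore_index avoids repeated row labels from the individual files
trips_demo = pd.concat(
    [harmonize(with_id_suffix), harmonize(without_id_suffix)],
    ignore_index=True,
)
print(trips_demo)
```

With the names aligned first, the combined frame has a single pair of station columns and no gaps introduced by the concatenation.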
