# Weather and the MBTA #

## Motivation ##

This year, Boston has seen unprecedented levels of snow. As expected, the result has been a transportation nightmare of congestion, delays, and cancellations. The situation became so chaotic that the head of the MBTA decided to step down under public pressure. Given such a rare opportunity, we decided to concentrate on the effects of snow on ridership.

## Methodology ##

### Setup ###

In [1]:
# Libraries.
import os

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_style("whitegrid")
sns.set_context("paper")
%matplotlib inline

### Data Wrangling ###

#### MBTA ####

The MBTA provided us with entry data for each station at 15-minute intervals. After some wrangling, our base data consisted of a ***stations*** data set and a ***gate count*** data set.

##### Stations #####

The station data set contained basic information for each station. One row corresponded to one station. Some of the data, in particular the latitude and longitude, was scraped from the web.

- ***stationid***: The unique identifier for the station.
- ***name***: The full name of the station.
- ***line_1***: The primary line of the station (red/green/blue/orange).
- ***line_2***: The secondary line of the station (red/green/blue/orange). Only a few stations, such as Park Street (green/red line), had a value for this field.
- ***lat***: The latitude of the station.
- ***lon***: The longitude of the station.

In [3]:
stations = pd.read_csv('../../../data/stations.csv', low_memory=False)
stations.head()

| Unnamed: 0 | stationid | name | line_1 | line_2 | lat | lon |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1002 | Andrew Square | Red | | 42.32955 | -71.05696 |
| 1 | 1004 | JFK/U Mass | Red | | 42.321438 | -71.052393 |
| 2 | 1005 | North Quincy | Red | | 42.274816 | -71.029176 |
| 3 | 1006 | Wollaston | Red | | 42.265615 | -71.019402 |
| 4 | 1007 | Quincy Center | Red | | 42.250879 | -71.004798 |


##### Gate Count #####

The gate count data set contained basic information about entries at each station. Each row held the number of entries at one station during one 15-minute interval.

- ***locationid***: The unique identifier for the station.
- ***entries***: The number of entries for the 15-minute interval.
- ***exits***: The number of exits for the 15-minute interval (NOT USED - exits are unreliable due to the nature of the system).
- ***service_day***: The actual day the service started (services on weekends can run into the next day).
- ***service_datetime***: The 15-minute interval over which the entries/exits were aggregated.

In [4]:
gatecounts = pd.read_csv('../../../data/gatecounts.csv', low_memory=False)
gatecounts.head()

| Unnamed: 0 | locationid | entries | exits | service_day | service_datetime |
| --- | --- | --- | --- | --- | --- |
| 0 | 1002 | 0 | 1 | 2013-01-01 00:00:00 | 2013-01-01 03:00:00 |
| 1 | 1002 | 1 | 0 | 2013-01-01 00:00:00 | 2013-01-01 05:00:00 |
| 2 | 1002 | 2 | 0 | 2013-01-01 00:00:00 | 2013-01-01 05:15:00 |
| 3 | 1002 | 3 | 0 | 2013-01-01 00:00:00 | 2013-01-01 05:30:00 |
| 4 | 1002 | 6 | 0 | 2013-01-01 00:00:00 | 2013-01-01 05:45:00 |
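
The 15-minute rows described above can be rolled up into one total per station per service day. The following is a minimal sketch on a tiny stand-in frame, not the project's actual wrangling code. The name `gatecounts_demo` and its values are illustrative assumptions; only the column names come from the data set.

```python
import pandas as pd

# Tiny stand-in for gatecounts.csv; values are illustrative, column names match the data set.
gatecounts_demo = pd.DataFrame({
    "locationid": [1002, 1002, 1002, 1004],
    "entries": [1, 2, 3, 10],
    "exits": [0, 0, 0, 4],
    "service_day": ["2013-01-01 00:00:00"] * 4,
    "service_datetime": [
        "2013-01-01 05:00:00",
        "2013-01-01 05:15:00",
        "2013-01-01 05:30:00",
        "2013-01-01 05:00:00",
    ],
})

# Timestamps are read from CSV as plain strings, so parse them first.
for col in ("service_day", "service_datetime"):
    gatecounts_demo[col] = pd.to_datetime(gatecounts_demo[col])

# Sum the 15-minute entries into one row per station per service day.
# Grouping on service_day (not the calendar date of service_datetime)
# keeps late-night service attached to the day it started.
daily_entries = (
    gatecounts_demo
    .groupby(["locationid", "service_day"], as_index=False)["entries"]
    .sum()
)
print(daily_entries)
```

The `exits` column is deliberately left out of the aggregation, since the data set description marks it as unreliable.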


#### Weather ####

The weather data was pulled from the [wunderground](http://www.wunderground.com/) API. Each row corresponded to the weather for a single day in Boston. A sample of the data is provided below. The most important feature was the "snow_fall" column, which held the recorded snowfall for that day.

In [5]:
weather = pd.read_csv('../../../data/weather.csv', low_memory=False)
weather.head()

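| Unnamed: 0 | date | fog | hail | rain | snow | temp_min | temp_max | temp_mean | rain_fall | snow_fall | wind_speed | vis_min | vis_max | vis_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012-01-01 00:00:00 | 0 | 0 | 1 | 0 | 39 | 52 | 46 | 0.01 | 0 | 9 | 9 | 10 | 10 |
| 1 | 2012-01-02 00:00:00 | 0 | 0 | 1 | 0 | 34 | 50 | 42 | 0.01 | 0 | 14 | 10 | 10 | 10 |
| 2 | 2012-01-03 00:00:00 | 0 | 0 | 0 | 0 | 14 | 35 | 25 | 0.0 | 0 | 15 | 10 | 10 | 10 |
| 3 | 2012-01-04 00:00:00 | 0 | 0 | 0 | 0 | 10 | 28 | 19 | 0.0 | 0 | 12 | 10 | 10 | 10 |
| 4 | 2012-01-05 00:00:00 | 0 | 0 | 0 | 0 | 25 | 39 | 32 | 0.0 | 0 | 12 | 10 | 10 | 10 |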
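
Because the weather table has one row per day, it can be attached to per-station daily rows by matching on the date. The sketch below shows one way this could be done. The frames `weather_demo` and `daily_demo` are hypothetical stand-ins, the snowfall values are made up, and the join key (`service_day` against `date`) is an assumption about how the two sources line up.

```python
import pandas as pd

# Hypothetical stand-ins; only the column names are taken from the data sets.
weather_demo = pd.DataFrame({
    "date": ["2013-01-01 00:00:00", "2013-01-02 00:00:00"],
    "snow_fall": [0.0, 1.5],  # made-up values for illustration
})
daily_demo = pd.DataFrame({
    "locationid": [1002, 1002, 1004],
    "service_day": [
        "2013-01-01 00:00:00",
        "2013-01-02 00:00:00",
        "2013-01-02 00:00:00",
    ],
    "entries": [1892, 5134, 7000],
})

# Parse both key columns so the join compares timestamps, not strings.
weather_demo["date"] = pd.to_datetime(weather_demo["date"])
daily_demo["service_day"] = pd.to_datetime(daily_demo["service_day"])

# Left join keeps every station-day row; validate guards against
# duplicate dates in the weather table silently multiplying rows.
merged = daily_demo.merge(
    weather_demo,
    left_on="service_day",
    right_on="date",
    how="left",
    validate="many_to_one",
).drop(columns="date")
print(merged)
```

A left join is used so that station-days without a matching weather row are kept (with missing weather values) rather than dropped.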


#### Daily Data ####

The daily data set aggregated the entries per day for each station and added a few extra features. We omit the details of how these features were generated (please refer to the ***features*** section of our repository), but some important ones are:

- ***entries_weeks_ago_1***: The number of entries for this station on the same day one week ago.
- ***snow_fall***: The amount of snow that fell on that day, in inches.
- ***snow_accum***: The amount of snow accumulated up to the current day. The snow accumulation was calculated using a quasi-linear decay function based on the snowfall of the previous two weeks.
- ***dist_to_center***: The distance (in kilometers) to the center of the city (city hall).
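
To make two of these features concrete, here is a rough sketch of how a one-week lag and a decayed snow accumulation could be computed. It is not the code from the repository. The plain linear weights, the 14-day window, the helper name `snow_accumulation`, and the last three entry counts in the sample frame are all assumptions for illustration.

```python
import pandas as pd

# Assumed window: the text only says "the previous two weeks".
WINDOW_DAYS = 14


def snow_accumulation(snow_fall, window=WINDOW_DAYS):
    """Weighted sum of recent snowfall with linearly decreasing weights.

    Snow from today has weight 1.0 and the weight drops by 1/window per
    day of age. This plain linear decay is an assumption standing in for
    the project's "quasi-linear" function, whose exact form is not given.
    """
    accum = []
    for today in range(len(snow_fall)):
        total = 0.0
        for age in range(window):
            day = today - age
            if day < 0:
                break
            total += snow_fall[day] * (1.0 - age / window)
        accum.append(total)
    return accum


# One station, eight consecutive days (the last three counts are made up).
daily = pd.DataFrame({
    "locationid": [1002] * 8,
    "service_day": pd.date_range("2013-01-01", periods=8, freq="D"),
    "entries": [1892, 5134, 5733, 6125, 3410, 3000, 5000, 2000],
})

# Shift a copy of the table forward by 7 days and join it back, so each
# row picks up the entries from the same station one week earlier.
lagged = daily[["locationid", "service_day", "entries"]].copy()
lagged["service_day"] = lagged["service_day"] + pd.Timedelta(days=7)
lagged = lagged.rename(columns={"entries": "entries_weeks_ago_1"})
daily = daily.merge(lagged, on=["locationid", "service_day"], how="left")

print(snow_accumulation([4.0, 0, 0, 0, 0], window=4))
print(daily[["service_day", "entries", "entries_weeks_ago_1"]])
```

Joining on a shifted date, rather than shifting rows by position, keeps the lag correct even if a station is missing some days. Rows with no data one week earlier are left empty, which matches the missing values in the first rows of the daily table.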


In [5]:
mbta_daily = pd.read_csv('../../../data/mbta_daily.csv', low_memory=False)
print("Rows: " + str(mbta_daily.shape[0]))
print("Cols: " + str(mbta_daily.shape[1]))
mbta_daily.head()

Rows: 47901
Cols: 47


| Unnamed: 0 | locationid | service_day | entries | name | line_1 | line_2 | lat | lon | service_datetime | fog | ... | entries_weeks_ago_1 | entries_weeks_ago_2 | entries_weeks_ago_3 | rain_predict | rain_fall_predict | snow_predict | snow_fall_predict | snow_accum | snow_accum_predict | dist_to_center |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1002 | 2013-01-01 00:00:00 | 1892 | Andrew Square | Red | | 42.32955 | -71.05696 | 2013-01-01 03:00:00 | 0 | ... | | | | 0 | 0 | 0 | 0 | 0 | 0 | 3.404767 |
| 1 | 1002 | 2013-01-02 00:00:00 | 5134 | Andrew Square | Red | | 42.32955 | -71.05696 | 2013-01-02 04:45:00 | 0 | ... | | | | 0 | 0 | 0 | 0 | 0 | 0 | 3.404767 |
| 2 | 1002 | 2013-01-03 00:00:00 | 5733 | Andrew Square | Red | | 42.32955 | -71.05696 | 2013-01-03 05:00:00 | 0 | ... | | | | 0 | 0 | 0 | 0 | 0 | 0 | 3.404767 |
| 3 | 1002 | 2013-01-04 00:00:00 | 6125 | Andrew Square | Red | | 42.32955 | -71.05696 | 2013-01-04 05:00:00 | 0 | ... | | | | 0 | 0 | 0 | 0 | 0 | 0 | 3.404767 |
| 4 | 1002 | 2013-01-05 00:00:00 | 3410 | Andrew Square | Red | | 42.32955 | -71.05696 | 2013-01-05 04:15:00 | 0 | ... | | | | 0 | 0 | 1 | 0 | 0 | 0 | 3.404767 |
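
A distance such as the `dist_to_center` column can be derived from the station coordinates with the haversine (great-circle) formula. The sketch below is one possible approach, not necessarily the one used for this data set. The City Hall coordinates and the function name `haversine_km` are assumptions, so the result will be close to the tabulated value but need not match it exactly.

```python
import math

# Assumed reference point for Boston City Hall; the notebook does not give it.
CITY_HALL = (42.3604, -71.0580)
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Andrew Square coordinates, taken from the stations table.
andrew_km = haversine_km(42.32955, -71.05696, *CITY_HALL)
print(round(andrew_km, 2))
```

Over distances of a few kilometers, the haversine result differs very little from a flat-plane approximation, so either would serve as a model feature.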


### Analysis ###

#### TODO: Daily Snow Examples for Different Bins? ####

##### Introduction #####

TODO

##### Conclusion #####

TODO

#### TODO: Linear Trend Models ####

##### Introduction #####

TODO

##### Conclusion #####

TODO

#### TODO: Prediction Improvement ####

##### Introduction #####

TODO

##### Conclusion #####

TODO

## Conclusion ##

- *Does snow affect ridership?* **Yes!**

    TODO: Details.
    
    TODO: Linear Trend Image?


- *How does understanding snow help the MBTA?* **It can improve ridership predictions and the staffing process!**

    TODO: Details.