In [3]:
import pandas as pd

# Bike Sharing

## Objectives

- Predict the count of total rental bikes based on temporal and meteorological factors, *or*
- find days with anomalous counts of total rental bikes and try to explain why they occur.


## Algorithms/Models

- Multiple Regression (Regression)
- Support Vector Machine (Regression)
- Random Forest (Regression)
- Seasonal Hybrid ESD (Anomaly Detection)
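
As a rough illustration of how the three regression models listed above could be compared, here is a minimal sketch using scikit-learn. The library choice, the function name `compare_regressors`, the feature list, the hyperparameters, and the chronological 80/20 split are all assumptions made for this example, not a prescribed setup. `casual` and `registered` are left out of the features because they sum to `cnt`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumed feature set; integer-coded categories are used as plain numbers
# here for brevity (one-hot encoding may work better in practice).
FEATURES = ["season", "yr", "mnth", "holiday", "weekday", "workingday",
            "weathersit", "temp", "atemp", "hum", "windspeed"]


def compare_regressors(data, test_fraction=0.2):
    """Fit the three regressors and return their test RMSE by model name."""
    # Chronological split: the last rows are held out as the test set.
    split = int(len(data) * (1 - test_fraction))
    train, test = data.iloc[:split], data.iloc[split:]

    models = {
        "multiple_regression": LinearRegression(),
        # SVR is scale-sensitive, so the features are standardized first.
        "svm": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1000.0)),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    }

    scores = {}
    for name, model in models.items():
        model.fit(train[FEATURES], train["cnt"])
        pred = model.predict(test[FEATURES])
        scores[name] = float(np.sqrt(mean_squared_error(test["cnt"], pred)))
    return scores
```

Once the data is loaded into a DataFrame, `compare_regressors(data)` returns a dictionary with one RMSE value per model, which gives a first baseline before any tuning.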

## Data Preview

In [4]:
data = pd.read_csv('bike_sharing.csv')
data.head()

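| Unnamed: 0 | instant | dteday | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 6 | 0 | 2 | 0.344167 | 0.363625 | 0.805833 | 0.160446 | 331 | 654 | 985 |
| 1 | 2 | 2011-01-02 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 0.363478 | 0.353739 | 0.696087 | 0.248539 | 131 | 670 | 801 |
| 2 | 3 | 2011-01-03 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.196364 | 0.189405 | 0.437273 | 0.248309 | 120 | 1229 | 1349 |
| 3 | 4 | 2011-01-04 | 1 | 0 | 1 | 0 | 2 | 1 | 1 | 0.2 | 0.212122 | 0.590435 | 0.160296 | 108 | 1454 | 1562 |
| 4 | 5 | 2011-01-05 | 1 | 0 | 1 | 0 | 3 | 1 | 1 | 0.226957 | 0.22927 | 0.436957 | 0.1869 | 82 | 1518 | 1600 |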


## Data Dictionary

- instant = record index
- dteday = date
- season = season (1: spring, 2: summer, 3: fall, 4: winter)
- yr = year (0: 2011, 1: 2012)
- mnth = month (1 to 12)
- hr = hour (0 to 23); present only in the hourly version of the dataset, not in the daily data previewed above
- holiday = whether the day is a holiday or not (extracted from [Web Link])
- weekday = day of the week
- workingday = 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit = weather situation
 - 1: Clear, Few clouds, Partly cloudy
 - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
 - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
 - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp = normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
- atemp = normalized feels-like temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
- hum = normalized humidity. The values are divided by 100 (max)
- windspeed = normalized wind speed. The values are divided by 67 (max)
- casual = count of casual users
- registered = count of registered users
- cnt = count of total rental bikes including both casual and registered
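
The normalization formulas above can be inverted to recover approximate physical values, which is handy when explaining results. The sketch below does that with pandas. The helper name `denormalize_weather` and the output column names are hypothetical. Because the dictionary states the temperature constants for the hourly scale, applying them to daily values is an assumption.

```python
import pandas as pd


def denormalize_weather(data):
    """Invert the normalizations described in the data dictionary."""
    out = pd.DataFrame(index=data.index)
    # temp: (t - t_min) / (t_max - t_min) with t_min = -8, t_max = +39
    out["temp_c"] = data["temp"] * (39 - (-8)) + (-8)
    # atemp: same formula with t_min = -16, t_max = +50
    out["atemp_c"] = data["atemp"] * (50 - (-16)) + (-16)
    # hum was divided by 100, windspeed by 67 (units are not stated in the source)
    out["hum_pct"] = data["hum"] * 100
    out["windspeed_raw"] = data["windspeed"] * 67
    return out


# First row of the preview, copied by hand for illustration.
preview = pd.DataFrame({
    "temp": [0.344167],
    "atemp": [0.363625],
    "hum": [0.805833],
    "windspeed": [0.160446],
})
print(denormalize_weather(preview).round(2))
```

For the first preview row this gives a temperature of about 8.2 °C and a humidity of about 80.6 %.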

## Basic Literature

- DeMaio, P. (2009). Bike-sharing: History, Impacts, Models of Provision, and Future. *Journal of Public Transportation, 12(4)*, 41-56. http://doi.org/10.5038/2375-0901.12.4.
- Hochenbaum, J. et al. (2017). Automatic Anomaly Detection in the Cloud via Statistical Learning. Published on arXiv. Retrieved from https://arxiv.org/pdf/1704.07706.pdf.
- Singhvi, D. et al. (2015). Predicting Bike Usage for New York City's Bike Sharing System. In AAAI Workshops. Retrieved from https://aaai.org/ocs/index.php/WS/AAAIW15/paper/view/10115.

## Additional Packages

- Anomaly Detection: thermometr (https://github.com/nlittlepoole/thermometr)
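
To make the idea behind Seasonal Hybrid ESD more concrete without relying on the API of thermometr, here is a simplified sketch built on NumPy, pandas, and SciPy (all three are assumptions of this example). It removes a seasonal component and the median, then runs a generalized ESD test that uses the median and MAD instead of the mean and standard deviation. The published method estimates seasonality with STL decomposition. This sketch substitutes a much cruder per-phase median, so it is a stand-in rather than a faithful implementation. The function name and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def seasonal_hybrid_esd_sketch(series, period=7, max_anoms=0.05, alpha=0.05):
    """Return the index labels of points flagged as anomalies."""
    x = series.astype(float)
    n = len(x)

    # Crude seasonal estimate (assumption): median of each position within
    # the period, centered on the overall median. The original method uses STL.
    phase = np.arange(n) % period
    seasonal = x.groupby(phase).transform("median") - x.median()
    resid = (x - seasonal - x.median()).to_numpy()

    positions = np.arange(n)
    removed = []
    n_anoms = 0
    for i in range(1, max(1, int(n * max_anoms)) + 1):
        med = np.median(resid)
        mad = 1.4826 * np.median(np.abs(resid - med))
        if mad == 0:
            break
        scores = np.abs(resid - med) / mad
        j = int(np.argmax(scores))
        r_i = scores[j]

        # Critical value of the generalized ESD test for the current sample size.
        m = n - i + 1
        t = stats.t.ppf(1 - alpha / (2 * m), m - 2)
        lam = (m - 1) * t / np.sqrt((m - 2 + t ** 2) * m)

        removed.append(positions[j])
        resid = np.delete(resid, j)
        positions = np.delete(positions, j)
        if r_i > lam:
            n_anoms = i  # anomalies are the first n_anoms removed points

    return series.index[np.array(removed[:n_anoms], dtype=int)]
```

With the daily data, a call such as `seasonal_hybrid_esd_sketch(data.set_index('dteday')['cnt'])` would return the dates flagged as anomalous. Since `cnt` also has a strong yearly trend, detrending first is likely needed for sensible results.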

## Source

https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset