<img src="./images/Divvy_Logo.svg" alt="Drawing" align="left" style="width: 200px;"/>

# Chicago [Divvy](https://www.divvybikes.com/) Bicycle Sharing Data Analysis and Modeling

In this notebook, I conducted a series of exploratory data analyses and modeling experiments on [Chicago Divvy bicycle sharing data](https://www.divvybikes.com/system-data). The goals of this project are to:

* Visualize the bicycle sharing data
* Look for interesting phenomena in the data
* Model the demand for bicycles

```python
# import necessary packages
import gc
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# local helper module: wrapper around the Weather Underground download
from utils import query_weather

%matplotlib inline
```

# Data Preprocessing

### Weather information

Of all external factors, weather is expected to have a strong influence on bicycle usage in Chicago. As a first step, I wrote a wrapper, `query_weather` (in the local `utils` module), to download historical weather data from [Weather Underground](https://www.wunderground.com/).

```python
# Query one year of Chicago weather per API key and cache it as CSV;
# years that already have a cached file are skipped.
keys = ['ac6a917d396d3bd0', '37c617f5f653f918', '7a60a4d9659f26ff',
        'ccdd498e9a04cf55', '86c52e1a015baa55']
years = [2013, 2014, 2015, 2016, 2017]

for key, year in zip(keys, years):
    path = f'./data/weather_{year}.csv'
    if os.path.isfile(path):
        continue  # already downloaded
    df, _ = query_weather(key=key, year=year, state='IL', area='Chicago')
    df.to_csv(path, index=False)
    print('File saved:\t', path)
```

```python
# load the cached weather information, parsing the date column
weather_2013 = pd.read_csv('./data/weather_2013.csv', parse_dates=['date'])
weather_2014 = pd.read_csv('./data/weather_2014.csv', parse_dates=['date'])
weather_2015 = pd.read_csv('./data/weather_2015.csv', parse_dates=['date'])
weather_2016 = pd.read_csv('./data/weather_2016.csv', parse_dates=['date'])
weather_2017 = pd.read_csv('./data/weather_2017.csv', parse_dates=['date'])
```
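Because all five yearly files come from the same wrapper, they could be stacked into one daily table before joining with trips. A minimal sketch, assuming each frame has the parsed `date` column used above; the helper name `combine_weather` is hypothetical, and the tiny frames below are synthetic stand-ins for the real files.

```python
import pandas as pd


def combine_weather(frames):
    """Stack yearly weather frames into one table sorted by calendar day.

    Hypothetical helper; assumes every frame has a datetime `date` column.
    """
    weather = pd.concat(frames, ignore_index=True)
    # Normalize to midnight so the key matches dates derived from trip timestamps.
    weather['date'] = weather['date'].dt.normalize()
    # If two files overlap on a day, keep the row from the earlier frame in the list.
    weather = weather.drop_duplicates(subset='date', keep='first')
    return weather.sort_values('date').reset_index(drop=True)


# Synthetic stand-ins for weather_2013 / weather_2014 (one overlapping day).
w13 = pd.DataFrame({'date': pd.to_datetime(['2013-12-31']), 'temp': [20.0]})
w14 = pd.DataFrame({'date': pd.to_datetime(['2014-01-01', '2013-12-31']),
                    'temp': [15.0, 21.0]})
weather = combine_weather([w13, w14])
print(weather)

# On the real data (not run here):
# weather = combine_weather([weather_2013, weather_2014, weather_2015,
#                            weather_2016, weather_2017])
```

Sorting after de-duplication keeps the result independent of the order in which the yearly files were loaded, apart from which duplicate wins.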

### Divvy bicycle sharing data

```python
# read data from ./data/
# year 2013
trip_2013 = pd.read_csv('./data/2013/Divvy_Trips_2013.csv', low_memory=False)
station_2013 = pd.read_csv('./data/2013/Divvy_Stations_2013.csv')

# year 2014, Q1 and Q2
trip_2014_Q1Q2 = pd.read_csv('./data/2014_Q1Q2/Divvy_Trips_2014_Q1Q2.csv', low_memory=False)
station_2014_Q1Q2 = pd.read_excel('./data/2014_Q1Q2/Divvy_Stations_2014-Q1Q2.xlsx')

# year 2014, Q3 and Q4
trip_2014_Q3_07 = pd.read_csv('./data/2014_Q3Q4/Divvy_Trips_2014-Q3-07.csv', low_memory=False)
trip_2014_Q3_0809 = pd.read_csv('./data/2014_Q3Q4/Divvy_Trips_2014-Q3-0809.csv', low_memory=False)
trip_2014_Q4 = pd.read_csv('./data/2014_Q3Q4/Divvy_Trips_2014-Q4.csv', low_memory=False)
station_2014_Q3Q4 = pd.read_csv('./data/2014_Q3Q4/Divvy_Stations_2014-Q3Q4.csv')

# year 2015, Q1 and Q2
trip_2015_Q1 = pd.read_csv('./data/2015_Q1Q2/Divvy_Trips_2015-Q1.csv', low_memory=False)
trip_2015_Q2 = pd.read_csv('./data/2015_Q1Q2/Divvy_Trips_2015-Q2.csv', low_memory=False)
station_2015 = pd.read_csv('./data/2015_Q1Q2/Divvy_Stations_2015.csv')

# year 2015, Q3 and Q4
trip_2015_Q3_07 = pd.read_csv('./data/2015_Q3Q4/Divvy_Trips_2015_07.csv', low_memory=False)
trip_2015_Q3_08 = pd.read_csv('./data/2015_Q3Q4/Divvy_Trips_2015_08.csv', low_memory=False)
trip_2015_Q3_09 = pd.read_csv('./data/2015_Q3Q4/Divvy_Trips_2015_09.csv', low_memory=False)
trip_2015_Q4 = pd.read_csv('./data/2015_Q3Q4/Divvy_Trips_2015_Q4.csv', low_memory=False)

# year 2016, Q1 and Q2
trip_2016_Q1 = pd.read_csv('./data/2016_Q1Q2/Divvy_Trips_2016_Q1.csv', low_memory=False)
trip_2016_Q2_04 = pd.read_csv('./data/2016_Q1Q2/Divvy_Trips_2016_04.csv', low_memory=False)
trip_2016_Q2_05 = pd.read_csv('./data/2016_Q1Q2/Divvy_Trips_2016_05.csv', low_memory=False)
trip_2016_Q2_06 = pd.read_csv('./data/2016_Q1Q2/Divvy_Trips_2016_06.csv', low_memory=False)
station_2016_Q1Q2 = pd.read_csv('./data/2016_Q1Q2/Divvy_Stations_2016_Q1Q2.csv')

# year 2016, Q3 and Q4
trip_2016_Q3 = pd.read_csv('./data/2016_Q3Q4/Divvy_Trips_2016_Q3.csv', low_memory=False)
station_2016_Q3 = pd.read_csv('./data/2016_Q3Q4/Divvy_Stations_2016_Q3.csv')
trip_2016_Q4 = pd.read_csv('./data/2016_Q3Q4/Divvy_Trips_2016_Q4.csv', low_memory=False)
station_2016_Q4 = pd.read_csv('./data/2016_Q3Q4/Divvy_Stations_2016_Q4.csv')

# year 2017, Q1 and Q2
trip_2017_Q1 = pd.read_csv('./data/2017_Q1Q2/Divvy_Trips_2017_Q1.csv', low_memory=False)
trip_2017_Q2 = pd.read_csv('./data/2017_Q1Q2/Divvy_Trips_2017_Q2.csv', low_memory=False)
station_2017_Q1Q2 = pd.read_csv('./data/2017_Q1Q2/Divvy_Stations_2017_Q1Q2.csv')

# year 2017, Q3 and Q4
trip_2017_Q3 = pd.read_csv('./data/2017_Q3Q4/Divvy_Trips_2017_Q3.csv', low_memory=False)
trip_2017_Q4 = pd.read_csv('./data/2017_Q3Q4/Divvy_Trips_2017_Q4.csv', low_memory=False)
station_2017_Q3Q4 = pd.read_csv('./data/2017_Q3Q4/Divvy_Stations_2017_Q3Q4.csv')
```
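The trip files are split by quarter or month, so each year is easier to work with once its pieces are stacked into one frame. A minimal sketch using `pd.concat`; the helper `stack_trips` and the `COLUMN_ALIASES` map are hypothetical. The map only illustrates how differing column spellings across releases (if any) could be aligned, so inspect each frame's `columns` before relying on it.

```python
import pandas as pd

# Hypothetical mapping from alternative column spellings to one schema.
# Verify against `df.columns` of the real files before use.
COLUMN_ALIASES = {'start_time': 'starttime', 'end_time': 'stoptime'}


def stack_trips(frames, aliases=COLUMN_ALIASES):
    """Concatenate trip chunks after renaming aliased columns."""
    renamed = [df.rename(columns=aliases) for df in frames]
    # ignore_index avoids duplicate row labels carried over from each file.
    return pd.concat(renamed, ignore_index=True, sort=False)


# Synthetic chunks standing in for two releases with different spellings.
a = pd.DataFrame({'trip_id': [1, 2],
                  'starttime': ['2016-12-31 23:50', '2016-12-31 23:55']})
b = pd.DataFrame({'trip_id': [3], 'start_time': ['2017-01-01 00:05']})
trips = stack_trips([a, b])
print(trips.columns.tolist())  # ['trip_id', 'starttime']

# On the real data (not run here), e.g.:
# trips_2014 = stack_trips([trip_2014_Q1Q2, trip_2014_Q3_07,
#                           trip_2014_Q3_0809, trip_2014_Q4])
```

Renaming before concatenation matters: otherwise `pd.concat` would keep both spellings as separate, half-empty columns.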

### Combine bicycle, station, and weather data

```python
# run the garbage collector to reclaim memory from unreferenced objects
_ = gc.collect()
```
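One way to combine the three sources is a pair of left joins with `DataFrame.merge`: trips to their start station, then trips to the weather of their start day. A minimal sketch; the column names (`starttime`, `from_station_id` on trips; `id`, `latitude`, `longitude` on stations; a daily `date` on weather) are assumptions to check against the actual files, and `combine_sources` is a hypothetical helper shown on synthetic data.

```python
import pandas as pd


def combine_sources(trips, stations, weather):
    """Attach start-station coordinates and daily weather to each trip.

    Column names are assumptions, not confirmed from the Divvy files.
    """
    out = trips.copy()
    # Calendar day of the trip start, used as the weather join key.
    out['date'] = pd.to_datetime(out['starttime']).dt.normalize()
    # Rename station fields so they read as start-station attributes.
    start = stations[['id', 'latitude', 'longitude']].rename(
        columns={'id': 'from_station_id',
                 'latitude': 'from_lat', 'longitude': 'from_lon'})
    # Left joins keep every trip even if a station or day is missing.
    out = out.merge(start, on='from_station_id', how='left')
    # validate raises if the weather table has more than one row per day.
    out = out.merge(weather, on='date', how='left', validate='many_to_one')
    return out


# Synthetic stand-ins for the trip, station, and weather tables.
trips = pd.DataFrame({'trip_id': [1, 2, 3],
                      'starttime': ['2017-06-01 08:00', '2017-06-01 17:30',
                                    '2017-06-02 09:15'],
                      'from_station_id': [10, 11, 10]})
stations = pd.DataFrame({'id': [10, 11], 'latitude': [41.88, 41.89],
                         'longitude': [-87.63, -87.62]})
weather = pd.DataFrame({'date': pd.to_datetime(['2017-06-01', '2017-06-02']),
                        'temp': [75.0, 68.0]})
combined = combine_sources(trips, stations, weather)
print(combined[['trip_id', 'from_lat', 'temp']])
```

Since station files are published per period, which station table to pair with which trip file is a choice this sketch leaves open.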

# Visualization and Analysis

# Data Modeling

# What's Next?

### Fixed stations vs. station-less: which is better?


| Aspect      | Divvy (Chicago) | Ofo / Mobike (China)           |
|-------------|-----------------|--------------------------------|
| Parking     | Fixed stations  | Station-less                   |
| Management  | Easy to manage  | Hard to manage                 |
| Tracking    | Easy to track   | Hard to track individual users |
| Cost        | High cost       | Low cost                       |
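One concrete way fixed stations ease management: each trip records a start and an end station, so per-station imbalance (departures minus arrivals) falls out of a simple count. A minimal sketch, assuming `from_station_id` and `to_station_id` columns; the helper `station_net_flow` is hypothetical and the data are synthetic.

```python
import pandas as pd


def station_net_flow(trips):
    """Departures minus arrivals per station.

    Positive values mean the station loses bikes over the period.
    """
    departures = trips['from_station_id'].value_counts()
    arrivals = trips['to_station_id'].value_counts()
    # Align on station id; a station absent from one side counts as zero.
    net = departures.sub(arrivals, fill_value=0).astype(int)
    return net.sort_values(ascending=False)


trips = pd.DataFrame({'from_station_id': [1, 1, 1, 2],
                      'to_station_id':   [2, 2, 3, 1]})
net = station_net_flow(trips)
print(net)
```

Because every departure is also an arrival somewhere, the net flows sum to zero; a station-less system would first need to map free-floating drop-off locations to zones to compute the same quantity.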