<a href="https://colab.research.google.com/github/SamuelWanjiru/Bike-sharing-forecast/blob/main/BikeSharing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  **Bike Sharing Washington DC 🚲** 
---
## **Context**
Climate change is forcing cities to reimagine their transportation infrastructure. Shared mobility concepts, such as car sharing, bike sharing or scooter sharing, are becoming increasingly popular.
If they are implemented well, they can actually contribute to mitigating climate change. Bike sharing is particularly interesting because this mode of transportation needs no electricity or gasoline (unless e-bikes are used). However, this type of shared mobility has inherent problems:
*   Varying demand at bike sharing stations needs to be balanced to avoid oversupply or shortages
*   Heavily used bikes break down more often

Forecasting future demand can help address these issues. Demand forecasts can also help operators decide whether to expand the business, set adequate prices and generate additional income through advertisements at particularly busy stations.
Other challenges for operators include redistributing bikes between stations along optimal routes and choosing locations for new stations.

## **Content**
This dataset can be used to forecast demand to avoid oversupply and shortages. It spans from January 1, 2011, until December 31, 2018. Determining new station locations, analyzing movement patterns or planning routes will only be possible with additional data.

## **Mounting Google Drive**

In [1]:
from google.colab import drive 
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## **Importing the relevant data analysis libraries**

In [4]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sb
import math
from scipy.stats import kruskal, pearsonr, randint, uniform, chi2_contingency, boxcox

## **Loading the dataset from Google Drive and converting the date column to datetime**

In [6]:
bike_data=pd.read_csv(r'/content/gdrive/My Drive/KAGGLE PROJECTS/Bike Sharing Washington DC/bike_sharing_dataset.csv',parse_dates=['date'])

## **Understanding the data**

In [7]:
# Display the first five rows of the bike dataset
bike_data.head()

Unnamed: 0,date,temp_avg,temp_min,temp_max,temp_observ,precip,wind,wt_fog,wt_heavy_fog,wt_thunder,...,wt_freeze_rain,wt_snow,wt_ground_fog,wt_ice_fog,wt_freeze_drizzle,wt_unknown,casual,registered,total_cust,holiday
0,2011-01-01,,-1.566667,11.973333,2.772727,0.069333,2.575,1.0,,,...,,,,,,,330.0,629.0,959.0,
1,2011-01-02,,0.88,13.806667,7.327273,1.037349,3.925,1.0,1.0,,...,,,,,,,130.0,651.0,781.0,
2,2011-01-03,,-3.442857,7.464286,-3.06,1.878824,3.625,,,,...,,,,,,,120.0,1181.0,1301.0,
3,2011-01-04,,-5.957143,4.642857,-3.1,0.0,1.8,,,,...,,,,,,,107.0,1429.0,1536.0,
4,2011-01-05,,-4.293333,6.113333,-1.772727,0.0,2.95,,,,...,,,,,,,82.0,1489.0,1571.0,


In [8]:
# Descriptive statistics
bike_data.describe()

Unnamed: 0,temp_avg,temp_min,temp_max,temp_observ,precip,wind,wt_fog,wt_heavy_fog,wt_thunder,wt_sleet,...,wt_freeze_rain,wt_snow,wt_ground_fog,wt_ice_fog,wt_freeze_drizzle,wt_unknown,casual,registered,total_cust,holiday
count,2101.0,2922.0,2922.0,2922.0,2922.0,2922.0,1503.0,208.0,694.0,129.0,...,5.0,84.0,36.0,10.0,4.0,1.0,2918.0,2918.0,2918.0,89.0
mean,14.419007,8.506468,19.015689,11.069243,3.435734,3.162898,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1679.776217,6046.297121,7726.073338,1.0
std,9.556401,9.473941,9.835524,9.481232,8.183658,1.379582,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,,1560.762932,2756.888032,3745.220092,0.0
min,-12.1,-16.99375,-7.98,-15.658333,0.0,0.375,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,2.0,19.0,21.0,1.0
25%,6.566667,0.516538,11.081562,3.013068,0.00551,2.2,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,512.25,3839.25,4628.5,1.0
50%,15.433333,8.504911,19.992857,11.619091,0.271504,2.9,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1220.5,5964.0,7442.5,1.0
75%,23.066667,17.338393,27.874583,19.767083,2.885381,3.875,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,2357.25,8187.5,10849.5,1.0
max,31.733333,26.20625,37.85,28.666667,118.789796,12.75,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,10173.0,15419.0,19113.0,1.0
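The count of 2,922 reported for the fully populated weather columns matches the number of calendar days between January 1, 2011 and December 31, 2018, which suggests one row per day. The sketch below shows one way to confirm this; the helper `missing_days` is a hypothetical name introduced here for illustration and is not part of the original notebook.

```python
import pandas as pd

# Every calendar day the dataset is supposed to cover.
expected_days = pd.date_range("2011-01-01", "2018-12-31", freq="D")
print(len(expected_days))


def missing_days(dates: pd.Series) -> pd.DatetimeIndex:
    """Return the expected calendar days that do not appear in `dates`."""
    return expected_days.difference(pd.DatetimeIndex(dates))


# Demonstration on a stand-in series that lacks the final day.
demo_dates = pd.Series(expected_days[:-1])
print(missing_days(demo_dates))
```

Once the dataset is loaded, calling `missing_days(bike_data['date'])` should return an empty index if every day is present.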


In [9]:
# Checking the data types of every variable

bike_data.dtypes

date                 datetime64[ns]
temp_avg                    float64
temp_min                    float64
temp_max                    float64
temp_observ                 float64
precip                      float64
wind                        float64
wt_fog                      float64
wt_heavy_fog                float64
wt_thunder                  float64
wt_sleet                    float64
wt_hail                     float64
wt_glaze                    float64
wt_haze                     float64
wt_drift_snow               float64
wt_high_wind                float64
wt_mist                     float64
wt_drizzle                  float64
wt_rain                     float64
wt_freeze_rain              float64
wt_snow                     float64
wt_ground_fog               float64
wt_ice_fog                  float64
wt_freeze_drizzle           float64
wt_unknown                  float64
casual                      float64
registered                  float64
total_cust                  float64
holiday                     float64
dtype: object

### All variables apart from date are floats. The date column was converted to datetime when the data was loaded.
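To illustrate what `parse_dates` changes at load time, here is a minimal sketch that reads the same small CSV twice. The inline CSV is a stand-in built from two rows of the `head()` output, not the real file.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV (values copied from the head() output).
csv_text = """date,temp_min,total_cust
2011-01-01,-1.566667,959.0
2011-01-02,0.88,781.0
"""

# Without parse_dates, the date column is read as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, pandas converts the column to datetime while loading.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(raw["date"].dtype)
print(parsed["date"].dtype)
```

A datetime column gives access to the `.dt` accessor (for example `parsed["date"].dt.month`), which is convenient for time series work.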


### *Checking and dealing with missing values*

In [10]:
bike_data.isnull().sum()

date                    0
temp_avg              821
temp_min                0
temp_max                0
temp_observ             0
precip                  0
wind                    0
wt_fog               1419
wt_heavy_fog         2714
wt_thunder           2228
wt_sleet             2793
wt_hail              2872
wt_glaze             2769
wt_haze              2217
wt_drift_snow        2915
wt_high_wind         2664
wt_mist              2551
wt_drizzle           2794
wt_rain              2516
wt_freeze_rain       2917
wt_snow              2838
wt_ground_fog        2886
wt_ice_fog           2912
wt_freeze_drizzle    2918
wt_unknown           2921
casual                  4
registered              4
total_cust              4
holiday              2833
dtype: int64
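Raw counts are easier to judge as a share of all rows. For instance, the 821 missing `temp_avg` values are roughly 28% of the 2,922 rows. The sketch below shows how such shares can be computed; the toy frame and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame imitating the pattern above: a sparse flag column, a partly
# missing measurement and a complete column.
toy = pd.DataFrame({
    "wt_fog": [1.0, np.nan, np.nan, 1.0],
    "temp_avg": [np.nan, 3.5, 4.0, 5.5],
    "wind": [2.5, 3.9, 3.6, 1.8],
})

# isnull().mean() gives the fraction of missing rows per column.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

On the real data the equivalent call would be `bike_data.isnull().mean()`.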

In [11]:
# fill missing values with 0 where applicable

wt_features = [x for x in bike_data.columns if 'wt' in x]
bike_data['holiday'] = bike_data['holiday'].fillna(0)
bike_data[wt_features] = bike_data[wt_features].fillna(0)
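Filling with 0 is reasonable here because the `wt_*` and `holiday` columns only ever contain 1.0 (their minimum and maximum in the descriptive statistics are both 1.0), so a missing value plausibly means "did not occur". The following sketch shows the idea on a toy frame. The `startswith` test and the cast to integer are optional variations assumed for illustration, not steps taken in the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with two weather flags, a holiday flag and a real measurement.
toy = pd.DataFrame({
    "wt_fog": [1.0, np.nan, 1.0],
    "wt_snow": [np.nan, np.nan, 1.0],
    "holiday": [np.nan, 1.0, np.nan],
    "temp_avg": [np.nan, 3.5, 4.0],
})

# startswith("wt_") is slightly stricter than a substring test on "wt".
flag_cols = [c for c in toy.columns if c.startswith("wt_")] + ["holiday"]

# A missing flag is treated as "event did not occur", so 0 is the fill value.
toy[flag_cols] = toy[flag_cols].fillna(0).astype(int)
print(toy)
```

Note that `temp_avg` is deliberately left untouched, since a missing temperature is not the same as a temperature of 0.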

In [13]:
# check casual, registered and total_cust missing rows
missing_target = bike_data[bike_data['total_cust'].isna()]
missing_target

Unnamed: 0,date,temp_avg,temp_min,temp_max,temp_observ,precip,wind,wt_fog,wt_heavy_fog,wt_thunder,...,wt_freeze_rain,wt_snow,wt_ground_fog,wt_ice_fog,wt_freeze_drizzle,wt_unknown,casual,registered,total_cust,holiday
1848,2016-01-23,-4.366667,-6.128571,-2.392857,-4.688889,42.045946,8.08,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0
1849,2016-01-24,-2.666667,-7.985714,-1.028571,-6.366667,19.33913,3.75,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0
1850,2016-01-25,-5.133333,-11.128571,2.028571,-9.877778,0.0,1.15,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0
1851,2016-01-26,2.333333,-7.871429,7.471429,3.588889,0.0,2.85,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0


There are four consecutive days (January 23 to 26, 2016) for which no rental data was captured.
Because this is a time series, I will fill these gaps with the forward-fill method, which carries the last observed value forward.
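Forward fill repeats the last observed value across a gap, whereas linear interpolation draws a straight line between the neighbours on either side. The sketch below compares the two on a made-up series with a four-day gap. The counts are hypothetical and chosen only to make the difference visible.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts with a four-day gap.
dates = pd.date_range("2016-01-21", periods=8, freq="D")
counts = pd.Series(
    [3000.0, 2000.0, np.nan, np.nan, np.nan, np.nan, 2500.0, 3500.0],
    index=dates,
)

# Forward fill: every missing day inherits the last known value.
forward_filled = counts.ffill()

# Linear interpolation: missing days are spaced evenly between neighbours.
interpolated = counts.interpolate(method="linear")

print(pd.DataFrame({"raw": counts, "ffill": forward_filled, "linear": interpolated}))
```

Forward fill uses only past information, which avoids leaking future values into a forecasting dataset, at the cost of a flat plateau across the gap.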

In [14]:
# fill the gaps in the customer columns by carrying the last observed value forward
# (DataFrame.ffill() replaces the deprecated fillna(method='ffill'))
cust_cols = ['total_cust', 'casual', 'registered']
bike_data[cust_cols] = bike_data[cust_cols].ffill()
bike_data.isnull().sum()

date                   0
temp_avg             821
temp_min               0
temp_max               0
temp_observ            0
precip                 0
wind                   0
wt_fog                 0
wt_heavy_fog           0
wt_thunder             0
wt_sleet               0
wt_hail                0
wt_glaze               0
wt_haze                0
wt_drift_snow          0
wt_high_wind           0
wt_mist                0
wt_drizzle             0
wt_rain                0
wt_freeze_rain         0
wt_snow                0
wt_ground_fog          0
wt_ice_fog             0
wt_freeze_drizzle      0
wt_unknown             0
casual                 0
registered             0
total_cust             0
holiday                0
dtype: int64