 # Bike Sharing

> **Background Information**

>> Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Today there is great interest in these systems because of their important role in traffic, environmental, and health issues.

> **Research Interest**
>> Apart from their interesting real-world applications, bike sharing systems generate data whose characteristics make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.

> **Attribute Information**

>> - instant : record index
>> - dteday : date
>> - season : season (1: winter, 2: spring, 3: summer, 4: fall)
>> - yr : year (0: 2011, 1: 2012)
>> - mnth : month (1 to 12)
>> - hr : hour (0 to 23)
>> - holiday : whether the day is a holiday or not
>> - weekday : day of the week
>> - workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
>> - weathersit :
>>     - 1: Clear, Few clouds, Partly cloudy
>>     - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
>>     - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
>>     - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
>> - temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (hourly scale only)
>> - atemp : Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (hourly scale only)
>> - hum : Normalized humidity. The values are divided by 100 (max)
>> - windspeed : Normalized wind speed. The values are divided by 67 (max)
>> - casual : count of casual users
>> - registered : count of registered users
>> - cnt : count of total rental bikes, including both casual and registered
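
The normalization formulas in the attribute list can be inverted to recover approximate physical values. The following is a minimal sketch, assuming the hourly-scale constants quoted in the list; the function names `denormalize_min_max` and `denormalize_by_max` are illustrative and not part of the dataset or of any library.

```python
def denormalize_min_max(value, t_min, t_max):
    """Invert (t - t_min) / (t_max - t_min) to recover t."""
    return value * (t_max - t_min) + t_min


def denormalize_by_max(value, maximum):
    """Invert a 'divided by max' normalization."""
    return value * maximum


# Values of the first hourly record printed by hour.head():
# temp=0.24, atemp=0.2879, hum=0.81
temp_c = denormalize_min_max(0.24, t_min=-8, t_max=39)
atemp_c = denormalize_min_max(0.2879, t_min=-16, t_max=50)
humidity = denormalize_by_max(0.81, 100)

print(round(temp_c, 2), round(atemp_c, 2), round(humidity, 1))
```

Under these assumptions the first hourly record corresponds to a temperature of roughly 3 degrees Celsius and a relative humidity of about 81 percent.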

In [1]:
# Imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
# Get Rid of Warnings

import warnings
warnings.filterwarnings('ignore')

In [3]:
day = pd.read_csv('day.csv')

In [4]:
day.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
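
The attribute list describes `cnt` as the total of `casual` and `registered`. A quick sanity check on the five rows printed above is sketched below in plain Python; the commented pandas one-liner is an assumed equivalent for the full frame and is not run here.

```python
# (casual, registered, cnt) copied from the five rows of day.head()
rows = [
    (331, 654, 985),
    (131, 670, 801),
    (120, 1229, 1349),
    (108, 1454, 1562),
    (82, 1518, 1600),
]

# Collect every row where cnt differs from casual + registered
mismatches = [row for row in rows if row[0] + row[1] != row[2]]
print(len(mismatches))

# Assumed pandas equivalent for the whole dataset:
# (day['casual'] + day['registered'] == day['cnt']).all()
```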


In [5]:
day.tail()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
726,727,2012-12-27,1,1,12,0,4,1,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
727,728,2012-12-28,1,1,12,0,5,1,2,0.253333,0.255046,0.59,0.155471,644,2451,3095
728,729,2012-12-29,1,1,12,0,6,0,2,0.253333,0.2424,0.752917,0.124383,159,1182,1341
729,730,2012-12-30,1,1,12,0,0,0,1,0.255833,0.2317,0.483333,0.350754,364,1432,1796
730,731,2012-12-31,1,1,12,0,1,1,2,0.215833,0.223487,0.5775,0.154846,439,2290,2729
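
The attribute list only says that `weekday` is the day of the week. Comparing the dates and `weekday` values in the rows above suggests a 0 = Sunday to 6 = Saturday coding. The sketch below checks that reading against a few of the printed rows; the helper name `dataset_weekday` is hypothetical, and the coding is inferred from the samples rather than taken from official documentation.

```python
from datetime import date


def dataset_weekday(iso_date):
    """Map an ISO date string to the assumed 0=Sunday .. 6=Saturday coding."""
    # date.weekday() uses 0=Monday .. 6=Sunday, so shift by one day
    return (date.fromisoformat(iso_date).weekday() + 1) % 7


# (dteday, weekday) pairs copied from day.head() and day.tail()
samples = [
    ("2011-01-01", 6), ("2011-01-02", 0), ("2011-01-03", 1),
    ("2012-12-29", 6), ("2012-12-30", 0), ("2012-12-31", 1),
]

all_match = all(dataset_weekday(d) == w for d, w in samples)
print(all_match)
```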


In [6]:
hour = pd.read_csv('hour.csv')

In [7]:
hour.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


In [8]:
hour.tail()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
17374,17375,2012-12-31,1,1,12,19,0,1,1,2,0.26,0.2576,0.6,0.1642,11,108,119
17375,17376,2012-12-31,1,1,12,20,0,1,1,2,0.26,0.2576,0.6,0.1642,8,81,89
17376,17377,2012-12-31,1,1,12,21,0,1,1,1,0.26,0.2576,0.6,0.1642,7,83,90
17377,17378,2012-12-31,1,1,12,22,0,1,1,1,0.26,0.2727,0.56,0.1343,13,48,61
17378,17379,2012-12-31,1,1,12,23,0,1,1,1,0.26,0.2727,0.65,0.1343,12,37,49


# Day Dataset

In [9]:
day.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


In [10]:
day.drop(["instant"], axis = 1, inplace = True)

In [11]:
day.head()

Unnamed: 0,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [12]:
day.shape

(731, 15)

In [13]:
day.describe()

Unnamed: 0,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0
mean,2.49658,0.500684,6.519836,0.028728,2.997264,0.683995,1.395349,0.495385,0.474354,0.627894,0.190486,848.176471,3656.172367,4504.348837
std,1.110807,0.500342,3.451913,0.167155,2.004787,0.465233,0.544894,0.183051,0.162961,0.142429,0.077498,686.622488,1560.256377,1937.211452
min,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.05913,0.07907,0.0,0.022392,2.0,20.0,22.0
25%,2.0,0.0,4.0,0.0,1.0,0.0,1.0,0.337083,0.337842,0.52,0.13495,315.5,2497.0,3152.0
50%,3.0,1.0,7.0,0.0,3.0,1.0,1.0,0.498333,0.486733,0.626667,0.180975,713.0,3662.0,4548.0
75%,3.0,1.0,10.0,0.0,5.0,1.0,2.0,0.655417,0.608602,0.730209,0.233214,1096.0,4776.5,5956.0
max,4.0,1.0,12.0,1.0,6.0,1.0,3.0,0.861667,0.840896,0.9725,0.507463,3410.0,6946.0,8714.0


In [14]:
# Getting rid of duplicates

day = day.drop_duplicates()
day.duplicated().sum()

0

In [15]:
# What are our headers in the dataset?

header_day = day.columns
print(header_day)

In [None]:
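# Correlation between numeric columns only (dteday is a string column);
# numeric_only requires pandas >= 1.5
plt.subplots(figsize=(20,15))
sns.heatmap(day.corr(numeric_only=True), annot = True, vmin=-1, vmax=1, center= 0, cmap= 'Greens', cbar_kws= {'orientation': 'horizontal'})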

In [None]:
null_columns=day.columns[day.isnull().any()]
day[null_columns].isnull().sum()

In [None]:
day.isnull().sum()

In [None]:
%matplotlib inline
pd.set_option('display.max_columns',None)
sns.set(style="darkgrid", palette="pastel", color_codes=True)
sns.set_context('paper')

import plotly.graph_objs as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = "plotly_dark"

In [None]:
# Season Distribution of Total Bike Users

fig = go.Figure()
fig.add_trace(go.Histogram(x = day[day['season'] == 1]["cnt"],marker_color="blue",name="Winter"))
fig.add_trace(go.Histogram(x = day[day['season'] == 2]["cnt"],marker_color="green",name="Spring"))
fig.add_trace(go.Histogram(x = day[day['season'] == 3]["cnt"],marker_color="red",name="Summer"))
fig.add_trace(go.Histogram(x = day[day['season'] == 4]["cnt"],marker_color="orange",name="Fall"))


# Overlay the four seasonal histograms
fig.update_layout(barmode='overlay')

# Reduce opacity so overlapping histograms stay visible
fig.update_traces(opacity=0.75)
fig.update_layout(title="Season Distribution of Total Bike Users",xaxis_title="Usage Count",yaxis_title="Counts")
fig.show()

In [None]:
# Season Distribution of Registered Bike Users

fig = go.Figure()
fig.add_trace(go.Histogram(x = day[day['season'] == 1]["registered"],marker_color="blue",name="Winter"))
fig.add_trace(go.Histogram(x = day[day['season'] == 2]["registered"],marker_color="green",name="Spring"))
fig.add_trace(go.Histogram(x = day[day['season'] == 3]["registered"],marker_color="red",name="Summer"))
fig.add_trace(go.Histogram(x = day[day['season'] == 4]["registered"],marker_color="orange",name="Fall"))


# Overlay the four seasonal histograms
fig.update_layout(barmode='overlay')

# Reduce opacity so overlapping histograms stay visible
fig.update_traces(opacity=0.75)
fig.update_layout(title="Season Distribution of Registered Bike Users",xaxis_title="Usage Count",yaxis_title="Counts")
fig.show()

In [None]:
# Season Distribution of Casual Bike Users

fig = go.Figure()
fig.add_trace(go.Histogram(x = day[day['season'] == 1]["casual"],marker_color="blue",name="Winter"))
fig.add_trace(go.Histogram(x = day[day['season'] == 2]["casual"],marker_color="green",name="Spring"))
fig.add_trace(go.Histogram(x = day[day['season'] == 3]["casual"],marker_color="red",name="Summer"))
fig.add_trace(go.Histogram(x = day[day['season'] == 4]["casual"],marker_color="orange",name="Fall"))


# Overlay the four seasonal histograms
fig.update_layout(barmode='overlay')

# Reduce opacity so overlapping histograms stay visible
fig.update_traces(opacity=0.75)
fig.update_layout(title="Season Distribution of Casual Bike Users",xaxis_title="Usage Count",yaxis_title="Counts")
fig.show()

In [None]:
# Humidity Distribution

fig =  go.Figure(data=[go.Histogram(x= day["hum"])])
fig.show()

# Hour Dataset

In [None]:
hour.info()

In [None]:
hour.drop(["instant"], axis = 1, inplace = True)

In [None]:
hour.head()

In [None]:
hour.shape

In [None]:
hour.describe()

In [None]:
# Getting rid of duplicates

hour = hour.drop_duplicates()
hour.duplicated().sum()

In [None]:
# What are our headers in the dataset?

header_hour = hour.columns
print(header_hour)

In [None]:
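# Correlation between numeric columns only (dteday is a string column);
# numeric_only requires pandas >= 1.5
plt.subplots(figsize=(20,15))
sns.heatmap(hour.corr(numeric_only=True), annot = True, vmin=-1, vmax=1, center= 0, cmap= 'YlOrBr', cbar_kws= {'orientation': 'horizontal'})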