# BIKE SHARING CASE STUDY

## <font color="blue">Introduction
>A bike-sharing system is a shared transport service in which bikes are made available to individuals on a short-term basis for a price or fee. Many bike-sharing systems allow customers to borrow a bike from one dock and return it at another dock of the same system. Many bike-sharing companies, such as BLOOM, OFO, Mobike, BoomBikes, Bounce, joyride, and evemo, offer services in major metro cities around the world.

## <font color="blue">Problem Statement
> A US bike-sharing provider, BoomBikes, recently suffered revenue dips due to the ongoing corona pandemic. Because of the unexpected loss, the company is finding it difficult to sustain itself in the current market scenario. It has therefore decided to come up with a mindful business plan so that it can accelerate its revenue once the pandemic comes to an end and also boost the economy.
>    
> To improve its business, BoomBikes wants to understand the demand for shared bikes once the pandemic comes to an end. The aim is to be prepared to cater to people's needs, to stand out among competitors, and to make large profits.
>

## <font color="blue">Business Objectives

>The company specifically needs to understand the factors affecting the demand for these shared bikes. It wants to find:
> - Which variables are significant in predicting the demand of the shared bikes
> - How well the variables describe the demand of the shared bikes
>    
>We are required to model the demand for shared bikes with the available independent variables. Management will use the model to understand how exactly the demand varies with different features.

## <font color="blue">Data Understanding
    
>### 'day.csv'
> It contains meteorological information, timing details, and rental counts for casual and registered users.
>    
> - instant: record index
> - dteday : date
> - season : season (1:spring, 2:summer, 3:fall, 4:winter)
> - yr : year (0: 2018, 1:2019)
> - mnth : month ( 1 to 12)
> - holiday : whether the day is a holiday or not
> - weekday : day of the week
> - workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
> - weathersit : 
>     1. Clear, Few clouds, Partly cloudy
>     2. Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
>     3. Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
>     4. Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
> - temp : temperature in Celsius
> - atemp: feeling temperature in Celsius
> - hum: humidity
> - windspeed: wind speed
> - casual: count of casual users
> - registered: count of registered users
> - cnt: count of total rental bikes including both casual and registered
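
The integer codes in the data dictionary can be turned into readable labels when exploring the data. The following is a minimal sketch on a small stand-in frame; the short weather labels (`clear`, `mist`, `light_precip`, `heavy_precip`) are hypothetical abbreviations of the four weathersit groups, not names from the dataset.

```python
import pandas as pd

# Code-to-label mappings taken from the data dictionary above.
SEASON_LABELS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
YEAR_LABELS = {0: 2018, 1: 2019}
# Hypothetical short names for the four weathersit groups.
WEATHER_LABELS = {1: "clear", 2: "mist", 3: "light_precip", 4: "heavy_precip"}

# Stand-in rows with made-up values; the notebook would use the frame read from day.csv.
sample = pd.DataFrame({"season": [1, 1, 3], "yr": [0, 0, 1], "weathersit": [2, 1, 3]})

# Replace each integer code with its label, leaving the original frame untouched.
decoded = sample.assign(
    season=sample["season"].map(SEASON_LABELS),
    yr=sample["yr"].map(YEAR_LABELS),
    weathersit=sample["weathersit"].map(WEATHER_LABELS),
)
print(decoded)
```

Decoding is only for readability; the modelling steps later in the notebook keep working with the original codes.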

## <font color="blue">Data Preparation

### Importing libraries

In [1]:
# Importing basic libraries required
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels
import statsmodels.api as sm
import sklearn
import warnings
warnings.filterwarnings("ignore")

### Customizing Settings

In [2]:
# Customizing the settings
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 20)

### Importing Data

In [3]:
# Importing the data
bike = pd.read_csv("day.csv")
bike.head()

,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,01-01-2018,1,0,1,0,6,0,2,14.110847,18.18125,80.5833,10.749882,331,654,985
1,2,02-01-2018,1,0,1,0,0,0,2,14.902598,17.68695,69.6087,16.652113,131,670,801
2,3,03-01-2018,1,0,1,0,1,1,1,8.050924,9.47025,43.7273,16.636703,120,1229,1349
3,4,04-01-2018,1,0,1,0,2,1,1,8.2,10.6061,59.0435,10.739832,108,1454,1562
4,5,05-01-2018,1,0,1,0,3,1,1,9.305237,11.4635,43.6957,12.5223,82,1518,1600


### Data Cleaning

In [4]:
# Analyzing the shape of the data
print("The shape of the bike sharing data is",bike.shape)

The shape of the bike sharing data is (730, 16)


In [5]:
# Checking the column information (non-null counts and data types)
bike.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     730 non-null    int64  
 1   dteday      730 non-null    object 
 2   season      730 non-null    int64  
 3   yr          730 non-null    int64  
 4   mnth        730 non-null    int64  
 5   holiday     730 non-null    int64  
 6   weekday     730 non-null    int64  
 7   workingday  730 non-null    int64  
 8   weathersit  730 non-null    int64  
 9   temp        730 non-null    float64
 10  atemp       730 non-null    float64
 11  hum         730 non-null    float64
 12  windspeed   730 non-null    float64
 13  casual      730 non-null    int64  
 14  registered  730 non-null    int64  
 15  cnt         730 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.4+ KB


In [6]:
# Checking for null values in all the columns
round((bike.isnull().sum()/len(bike))*100,2).sort_values(ascending=False)

instant       0.0
dteday        0.0
season        0.0
yr            0.0
mnth          0.0
holiday       0.0
weekday       0.0
workingday    0.0
weathersit    0.0
temp          0.0
atemp         0.0
hum           0.0
windspeed     0.0
casual        0.0
registered    0.0
cnt           0.0
dtype: float64

There are no missing values in any of the rows or columns of the dataset.

In [7]:
# Checking the number of unique values in each column of the dataset
bike.nunique()

instant       730
dteday        730
season          4
yr              2
mnth           12
holiday         2
weekday         7
workingday      2
weathersit      3
temp          498
atemp         689
hum           594
windspeed     649
casual        605
registered    678
cnt           695
dtype: int64
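
The counts above show only 3 distinct values for weathersit, although the data dictionary lists 4 codes. A quick way to see which documented code never occurs is a set difference. This is a sketch on a stand-in column with made-up values; in the notebook the column would be `bike["weathersit"]`, and the output above does not show which code is absent there.

```python
import pandas as pd

# Codes documented for weathersit in the data dictionary.
documented_codes = {1, 2, 3, 4}

# Stand-in column with made-up values, used only to illustrate the check.
weathersit = pd.Series([2, 2, 1, 1, 3, 1], name="weathersit")

# Compare the documented codes with the codes that actually occur.
observed_codes = set(weathersit.unique().tolist())
missing_codes = sorted(documented_codes - observed_codes)
print("Documented codes that never occur:", missing_codes)
```

A code that never occurs produces no information for the model, which matters later when dummy variables are created for this column.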

In [58]:
# Checking for duplicates
bike[bike.duplicated()]

Index,Season,Year,Month,Holiday,Weekday,Workingday,Weather,Temperature,Humidity,Windspeed,Count


There are no duplicate records in the dataset.

### Removing Redundant and unwanted columns

#### <font color="blue">Unwanted columns

#### 1. instant column:
> - We can drop this column because it is just an index of the rows.

In [8]:
# Dropping instant column
bike.drop("instant", axis=1, inplace=True)

#### <font color="blue">Redundant columns
#### 1. dteday column:
> - All the required information in this column is already available in separate columns (year, month, day of the week, holiday, and working day), so we can drop it too.
#### 2. atemp column:
> - The atemp column (the "feels like" temperature) depends on the air temperature, so the two will be highly correlated. We can therefore drop this column.
#### 3. casual and registered columns:
> - These columns contain the counts of bikes rented by casual (unregistered) users and by registered users. Another column, cnt, already holds the total of both and will be used as the dependent variable, so we can drop these two columns.
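
The three redundancy arguments above can be checked before dropping anything. The sketch below uses only the five rows displayed by `bike.head()` earlier, so it illustrates the checks rather than proving them for the full dataset. It assumes that dteday is in day-first (DD-MM-YYYY) format and that yr counts years from 2018, as the data dictionary states.

```python
import pandas as pd

# The five rows shown by bike.head() above, restricted to the relevant columns.
head = pd.DataFrame({
    "dteday": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018"],
    "yr": [0, 0, 0, 0, 0],
    "mnth": [1, 1, 1, 1, 1],
    "temp": [14.110847, 14.902598, 8.050924, 8.2, 9.305237],
    "atemp": [18.18125, 17.68695, 9.47025, 10.6061, 11.4635],
    "casual": [331, 131, 120, 108, 82],
    "registered": [654, 670, 1229, 1454, 1518],
    "cnt": [985, 801, 1349, 1562, 1600],
})

# 1. dteday: the year and month are already stored in yr and mnth.
#    Assumption: dates are day-first (DD-MM-YYYY).
dates = pd.to_datetime(head["dteday"], format="%d-%m-%Y")
date_is_redundant = bool(
    ((dates.dt.year - 2018) == head["yr"]).all()
    and (dates.dt.month == head["mnth"]).all()
)

# 2. atemp: Pearson correlation with temp.
temp_atemp_corr = head["temp"].corr(head["atemp"])

# 3. casual + registered should add up to cnt on every row.
count_is_sum = bool((head["casual"] + head["registered"] == head["cnt"]).all())

print(date_is_redundant, round(temp_atemp_corr, 3), count_is_sum)
```

Running the same three checks on the full `bike` frame, before the drop, would confirm whether they hold for all 730 rows.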

In [9]:
# Dropping the dteday, atemp, casual, and registered columns
bike.drop(["dteday","atemp","casual","registered"], axis=1,inplace=True)

In [10]:
bike.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      730 non-null    int64  
 1   yr          730 non-null    int64  
 2   mnth        730 non-null    int64  
 3   holiday     730 non-null    int64  
 4   weekday     730 non-null    int64  
 5   workingday  730 non-null    int64  
 6   weathersit  730 non-null    int64  
 7   temp        730 non-null    float64
 8   hum         730 non-null    float64
 9   windspeed   730 non-null    float64
 10  cnt         730 non-null    int64  
dtypes: float64(3), int64(8)
memory usage: 62.9 KB


In [33]:
# Let's rename the columns for convenience
bike.rename(columns={"yr":"Year","mnth":"Month","weathersit":"Weather","temp":"Temperature","hum":"Humidity","cnt":"Count"},
           inplace=True)
bike.columns = bike.columns.str.strip().str.title()
bike.index.name = "Index"
bike.columns.name = "Features"

In [34]:
bike.head()

Index,Season,Year,Month,Holiday,Weekday,Workingday,Weather,Temperature,Humidity,Windspeed,Count
0,1,0,1,0,6,0,2,14.110847,80.5833,10.749882,985
1,1,0,1,0,0,0,2,14.902598,69.6087,16.652113,801
2,1,0,1,0,1,1,1,8.050924,43.7273,16.636703,1349
3,1,0,1,0,2,1,1,8.2,59.0435,10.739832,1562
4,1,0,1,0,3,1,1,9.305237,43.6957,12.5223,1600


### Handling Data Types

In [35]:
# Checking the data types of the dataset
bike.dtypes

Features
Season           int64
Year             int64
Month            int64
Holiday          int64
Weekday          int64
Workingday       int64
Weather          int64
Temperature    float64
Humidity       float64
Windspeed      float64
Count            int64
dtype: object

We know that the columns Season, Year, Month, Holiday, Weekday, Workingday, and Weather are categorical variables expressed as integers. We therefore need to convert them to categorical types so that we can create dummy variables.

In [43]:
# Converting the columns into categorical types
bike_cat = ["Season","Year","Month","Holiday","Weekday","Workingday","Weather"]

for cols in bike_cat:
    bike[cols] = bike[cols].astype("category")
    
print(bike.dtypes)

Features
Season         category
Year           category
Month          category
Holiday        category
Weekday        category
Workingday     category
Weather        category
Temperature     float64
Humidity        float64
Windspeed       float64
Count             int64
dtype: object
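
With the columns stored as categories, dummy variables can be created with `pd.get_dummies`. The sketch below uses a stand-in frame with made-up rows and only two of the categorical columns. Passing `drop_first=True` is an assumption about how the dummies will be built later; it keeps k-1 indicator columns for a k-level variable, which avoids perfect collinearity with the intercept in a linear model.

```python
import pandas as pd

# Stand-in frame with made-up rows; column names follow the renamed notebook columns.
sample = pd.DataFrame({
    "Season": [1, 2, 3, 4],
    "Weather": [1, 2, 1, 3],
    "Temperature": [9.3, 18.2, 27.5, 12.1],
})
for col in ["Season", "Weather"]:
    sample[col] = sample[col].astype("category")

# One indicator column per level, minus the first level of each variable.
dummies = pd.get_dummies(sample, columns=["Season", "Weather"], drop_first=True)
print(dummies.columns.tolist())
```

The dropped first level of each variable (here Season 1 and Weather 1) becomes the baseline against which the other levels' coefficients are interpreted.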


### Sanity Checks

In [50]:
# A holiday should never be marked as a working day; list any rows that violate this
bike[(bike["Holiday"] == 1) & (bike["Workingday"] == 1) ]

Index,Season,Year,Month,Holiday,Weekday,Workingday,Weather,Temperature,Humidity,Windspeed,Count


No holiday is marked as a working day, so the data passes this sanity check.
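
The check above covers only holidays. A fuller version would test the whole definition of Workingday: 1 when the day is neither a weekend nor a holiday, otherwise 0. The sketch below assumes that weekday codes 0 and 6 denote the weekend. That is inferred from the first two rows of the data (weekday 6 and 0, both with workingday 0) and is not stated in the data dictionary.

```python
import pandas as pd

# Assumption: weekday codes 0 and 6 are the weekend days.
WEEKEND_CODES = {0, 6}

# Holiday, Weekday and Workingday values of the first five rows shown by bike.head().
sample = pd.DataFrame({
    "Holiday":    [0, 0, 0, 0, 0],
    "Weekday":    [6, 0, 1, 2, 3],
    "Workingday": [0, 0, 1, 1, 1],
})

# Expected flag: not a weekend day and not a holiday.
expected = (~sample["Weekday"].isin(WEEKEND_CODES)) & (sample["Holiday"] == 0)

# Rows where the stored flag disagrees with the definition.
violations = sample[sample["Workingday"] != expected.astype(int)]
print(len(violations), "inconsistent rows")
```

In the notebook these columns are categorical at this point, so casting them back with `.astype(int)` before the comparison is the safer route.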

### Exploratory Data Analysis