<h2 align='center'> Bike Sharing Dataset </h2>

#### Import pandas, numpy, seaborn, matplotlib.pyplot packages

In [1]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns 

from warnings import filterwarnings
filterwarnings('ignore')

#### Importing the Dataset

In [7]:
df = pd.read_csv('Datasets/hour.csv')
df.head()

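| | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |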


**This is a regression problem: the dependent variable is `cnt`, the count of total rental bikes.**


* **Shape of Dataset**

In [3]:
df.shape

(17379, 17)

* **The dataset has 17379 rows and 17 columns.**

* **Checking Information of Dataset**

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17379 non-null  int64  
 1   dteday      17379 non-null  object 
 2   season      17379 non-null  int64  
 3   yr          17379 non-null  int64  
 4   mnth        17379 non-null  int64  
 5   hr          17379 non-null  int64  
 6   holiday     17379 non-null  int64  
 7   weekday     17379 non-null  int64  
 8   workingday  17379 non-null  int64  
 9   weathersit  17379 non-null  int64  
 10  temp        17379 non-null  float64
 11  atemp       17379 non-null  float64
 12  hum         17379 non-null  float64
 13  windspeed   17379 non-null  float64
 14  casual      17379 non-null  int64  
 15  registered  17379 non-null  int64  
 16  cnt         17379 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


* The dataset has 4 float columns, 12 integer columns, and 1 object (string) column.

## Data preprocessing

Pre-processing techniques include:

1. Handling missing data
2. Removing outliers
3. Encoding categorical text variables
4. Feature scaling
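
The four steps listed above can be sketched on a small example. This is only an illustration under stated assumptions: the `toy` frame, its `temp` and `weather` columns, the median fill, the 1.5 × IQR outlier rule, and the standardization formula are all hypothetical choices, not steps taken from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the bike sharing dataset
toy = pd.DataFrame({
    "temp": [0.20, 0.25, np.nan, 0.30, 0.22, 0.95],
    "weather": ["clear", "mist", "clear", "rain", "mist", "clear"],
})

# 1. Handle missing data: fill numeric gaps with the column median
toy["temp"] = toy["temp"].fillna(toy["temp"].median())

# 2. Remove outliers: keep rows within 1.5 * IQR of the quartiles
q1, q3 = toy["temp"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = toy["temp"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
toy = toy[in_range].reset_index(drop=True)

# 3. Encode categorical text variables as one-hot columns
toy = pd.get_dummies(toy, columns=["weather"])

# 4. Feature scaling: standardize to zero mean and unit variance
toy["temp_scaled"] = (toy["temp"] - toy["temp"].mean()) / toy["temp"].std(ddof=0)

print(toy)
```

Which of these steps a given dataset needs depends on its contents; a dataset with no missing values or text columns can skip the first and third.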

#### Check Null Values

In [8]:
df.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

- There are no null values.

#### Dropping irrelevant columns

In [9]:
df = df.drop(['instant','dteday'], axis=1)
df.head()

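| | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |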


- No object (string) columns remain, so label encoding or one-hot encoding is not needed.

### Splitting Data into x and y


In [11]:
x = df.drop(['cnt'], axis=1)
x.head()

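| | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 |
| 1 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 |
| 2 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 |
| 3 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 |
| 4 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 |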


In [12]:
y = df.iloc[:,-1:]
y.head()

| | cnt |
|---|---|
| 0 | 16 |
| 1 | 40 |
| 2 | 32 |
| 3 | 13 |
| 4 | 1 |


### Splitting data into training and test sets

In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.30)

In [14]:
print("Dataset shape:", df.shape)
print("Training set shape (X, y): ", X_train.shape, y_train.shape)
print("Test set shape (X, y): ", X_test.shape, y_test.shape)

Dataset shape: (17379, 15)
Training set shape (X, y):  (12165, 14) (12165, 1)
Test set shape (X, y):  (5214, 14) (5214, 1)
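
The split above is shuffled at random, so each run selects different rows. As a minimal sketch, assuming reproducibility is wanted, `train_test_split` accepts a `random_state` seed. The toy `features` and `target` objects and the seed value 42 are hypothetical and are not part of this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for x and y
features = pd.DataFrame({"hr": np.arange(10), "temp": np.linspace(0.1, 1.0, 10)})
target = pd.Series(np.arange(10) * 3, name="cnt")

# random_state pins the shuffle so repeated calls produce the same split
X_tr, X_te, y_tr, y_te = train_test_split(
    features, target, test_size=0.30, random_state=42
)
X_tr2, X_te2, _, _ = train_test_split(
    features, target, test_size=0.30, random_state=42
)

print(X_tr.shape, X_te.shape)
print("Same rows selected:", X_tr.index.equals(X_tr2.index))
```

Without a fixed seed, metrics computed on the test set can shift slightly from run to run.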


### Applying Linear Regression

In [15]:
from sklearn.linear_model import LinearRegression
lin = LinearRegression()

#### Fitting model

In [16]:
lin.fit(X_train,y_train)

LinearRegression()

#### Predicting values

In [17]:
pred = lin.predict(X_test)

In [18]:
pred

array([[289.],
       [461.],
       [337.],
       ...,
       [487.],
       [170.],
       [108.]])

### Evaluation Metrics

In [19]:
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

In [20]:
print('Mean Absolute Error:', mean_absolute_error(y_test, pred))
print('Mean Squared Error:', mean_squared_error(y_test, pred))
print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, pred)))
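print('R squared score:', r2_score(y_test, pred))

Mean Absolute Error: 1.0648290841326865e-13
Mean Squared Error: 2.06387408709926e-26
Root Mean Squared Error: 1.436618977703991e-13
R squared score: 1.0

- The errors are essentially zero and the R squared score is 1.0 because `cnt` is the sum of `casual` and `registered` (3 + 13 = 16 in the first row), and both columns are among the features in `x`. The model reproduces that sum rather than predicting demand; drop `casual` and `registered` from `x` to get a meaningful evaluation.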
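
To illustrate how a target that is an exact linear function of its features produces metrics like these, the sketch below fits a linear model twice on synthetic data. The data, the `fit_and_score` helper, and the seeds are hypothetical; only the column names mirror this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical synthetic data: the target is the exact sum of two columns
toy = pd.DataFrame({
    "hr": rng.integers(0, 24, n),
    "casual": rng.integers(0, 50, n),
    "registered": rng.integers(0, 200, n),
})
toy["cnt"] = toy["casual"] + toy["registered"]

def fit_and_score(feature_names):
    """Fit a linear model on the given columns and return the test R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        toy[feature_names], toy["cnt"], test_size=0.30, random_state=0
    )
    model = LinearRegression().fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))

r2_leaky = fit_and_score(["hr", "casual", "registered"])
r2_clean = fit_and_score(["hr"])

print("R^2 with casual and registered:", r2_leaky)
print("R^2 with hr only:", r2_clean)
```

In this synthetic setup `hr` carries no information about `cnt`, so the second score is expected to be far below the first. On the real data the remaining features would likely explain part of the variance, but not all of it.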
