## Problem Statement:
The objective of this project is to develop a predictive model that accurately estimates the demand for rental bikes in any given hour, so that a stable supply of bikes can be maintained in all conditions. The dataset contains hourly rental records with the date, hour, weather conditions, temperature, working-day flag, season, humidity, and the number of bikes rented.

The model is evaluated on how accurately it predicts the number of bikes rented in a given hour. The project aims to help bike rental companies optimize their bike supply by forecasting demand, which in turn improves customer satisfaction.

## Attribute information of Data:

 - Instant: Index number
 - Dteday: Date (format in the data: DD-MM-YYYY, e.g. 01-01-2011)
 - Season: Season name (Winter, Spring, Summer, Autumn)
 - Yr: Year
 - Mnth: Month (1-12)
 - Hr: Hour (0 to 23)
 - Holiday: Whether the day is a holiday or not
 - Weekday: Day of the week
 - Workingday: Whether it is a working day or not
 - Weathersit: Weather situation (e.g. Clear, Mist)
 - Temp: Temperature in Celsius, stored as a normalized value (e.g. 0.24)
 - Atemp: Normalized feeling temperature
 - Hum: Normalized humidity. The values are divided by 100
 - Windspeed: Normalized wind speed. The values are divided by 67 (m/s)
 - Casual: Count of casual users
 - Registered: Count of registered users
 - Cnt: Count of total rental bikes, including both casual and registered users
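The attribute list above implies two simple transformations: parsing the date string and undoing the stated normalization of humidity and wind speed. The following is a minimal sketch on a hand-made two-row frame (the values are illustrative, and the new column names `hum_pct` and `windspeed_raw` are hypothetical). It assumes the day-month-year order seen in the sample rows of this notebook.

```python
import pandas as pd

# Two illustrative rows in the layout the attribute list describes.
sample = pd.DataFrame({
    "dteday": ["01-01-2011", "31-12-2012"],
    "hum": [0.81, 0.60],
    "windspeed": [0.0, 0.1642],
})

# Parse the date column, assuming day-month-year order.
sample["dteday"] = pd.to_datetime(sample["dteday"], format="%d-%m-%Y")

# Undo the normalization stated in the attribute list.
sample["hum_pct"] = sample["hum"] * 100             # humidity was divided by 100
sample["windspeed_raw"] = sample["windspeed"] * 67  # wind speed was divided by 67

print(sample)
```

Keeping the rescaled values in separate columns leaves the normalized inputs available for modeling.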


## Project Pipeline

1. Import necessary packages (libraries): Pandas, NumPy, Statsmodels, Matplotlib, Seaborn, Scikit-learn and others, plus Streamlit for model deployment.

2. Data collection.

3. Data preprocessing: Cleaning the data and transforming it into a format suitable for machine learning algorithms.

4. Exploratory data analysis (EDA): Analyzing the data to gain insights into the relationships between the variables and identifying any outliers or missing values.

5. Feature engineering: Creating new features from the existing data or transforming the existing features.

6. Model selection: Selecting an appropriate machine learning algorithm such as Linear Regression, Random Forest, or Gradient Boosting, depending on the problem.

7. Splitting the data into training and testing sets.

8. Model evaluation: Evaluating the model's performance using metrics such as MSE, RMSE, and R-squared.

9. Hyperparameter tuning: Tuning the hyperparameters of the machine learning algorithm to improve the model's performance.

10. Selecting the model with the best fit.

11. Load the pickle file of the best-performing ML model.

12. Model deployment: Deploy the model using the .pkl file and build a dashboard for the customer.
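The modeling steps in the pipeline above (6 to 8 and 10 to 12) can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is synthetic, the demand formula is invented, and the two candidate models and their settings are arbitrary choices rather than the ones this project ends up using. Hyperparameter tuning (step 9) is left out for brevity.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hourly data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "hr": rng.integers(0, 24, n),
    "temp": rng.random(n),
    "hum": rng.random(n),
})
# Hypothetical demand signal: peaks around 17:00 and rises with temperature.
y = 200 * X["temp"] + 100 * np.exp(-((X["hr"] - 17) ** 2) / 8) + rng.normal(0, 10, n)

# Step 7: hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 6 and 8: fit candidate models and score each on the held-out data.
candidates = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=50, random_state=42),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    scores[name] = {"mse": mse, "rmse": float(np.sqrt(mse)), "r2": r2_score(y_test, pred)}

# Step 10: keep the model with the lowest RMSE.
best_name = min(scores, key=lambda k: scores[k]["rmse"])

# Steps 11-12: serialize the chosen model; a deployment script would load these bytes.
payload = pickle.dumps(candidates[best_name])
restored = pickle.loads(payload)

print(best_name, scores[best_name])
```

In a real deployment the bytes would be written to a `.pkl` file with `pickle.dump` and read back by the dashboard application.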

## 1. Import Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn import preprocessing, linear_model
from sklearn.metrics import r2_score, mean_squared_error, accuracy_score

import warnings
warnings.filterwarnings('ignore')

### 1.1 Importing Dataset:

In [2]:
df=pd.read_csv('bike_rent.csv')
df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,01-01-2011,springer,2011,1,0,No,6,No work,Clear,0.24,0.2879,0.81,0,3,13,16
1,2,01-01-2011,springer,2011,1,1,No,6,No work,Clear,0.22,0.2727,0.8,0,8,32,40
2,3,01-01-2011,springer,2011,1,2,No,6,No work,Clear,0.22,0.2727,?,0,5,27,32
3,4,01-01-2011,springer,2011,1,3,No,6,No work,Clear,0.24,0.2879,0.75,0,3,10,13
4,5,01-01-2011,springer,2011,1,4,No,6,No work,Clear,0.24,0.2879,0.75,0,0,1,1


In [3]:
df.tail()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
17374,17375,31-12-2012,springer,2012,12,19,No,1,Working Day,Mist,0.26,0.2576,0.6,0.1642,11,108,119
17375,17376,31-12-2012,springer,2012,12,20,No,1,Working Day,Mist,0.26,0.2576,0.6,0.1642,8,81,89
17376,17377,31-12-2012,springer,2012,12,21,No,1,Working Day,Clear,?,0.2576,0.6,0.1642,7,83,90
17377,17378,31-12-2012,springer,2012,12,22,No,1,Working Day,Clear,0.26,0.2727,0.56,0.1343,13,48,61
17378,17379,31-12-2012,springer,2012,12,23,No,1,?,Clear,0.26,0.2727,0.65,0.1343,12,37,49
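Several cells in the rows above contain `?` instead of a value. One way to handle this at load time is the `na_values` parameter of `pd.read_csv`, which marks given strings as missing. The sketch below uses a small in-memory CSV with a few columns copied from the rows shown above, so it runs without `bike_rent.csv`. Whether `?` is the only placeholder in the full file is an assumption.

```python
import io

import pandas as pd

# A few columns from rows shown above, reproduced as an in-memory CSV.
csv_text = """instant,workingday,temp,hum,cnt
3,No work,0.22,?,32
17377,Working Day,?,0.6,90
17379,?,0.26,0.65,49
"""

# na_values tells the parser to treat "?" as a missing value.
demo = pd.read_csv(io.StringIO(csv_text), na_values="?")

print(demo.dtypes)          # temp and hum are parsed as floats
print(demo.isnull().sum())  # the placeholders now count as nulls
```

With the placeholder declared up front, numeric columns are parsed as numbers instead of falling back to strings.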


In [4]:
# Rename selected columns; 'windspeed' keeps its original lowercase name.
Df=df.rename(columns={'cnt':'Bike Count','temp':'Temp','hum':'Hum',
                      'dteday':'Date'})
Df.head()

Unnamed: 0,instant,Date,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,Temp,atemp,Hum,windspeed,casual,registered,Bike Count
0,1,01-01-2011,springer,2011,1,0,No,6,No work,Clear,0.24,0.2879,0.81,0,3,13,16
1,2,01-01-2011,springer,2011,1,1,No,6,No work,Clear,0.22,0.2727,0.8,0,8,32,40
2,3,01-01-2011,springer,2011,1,2,No,6,No work,Clear,0.22,0.2727,?,0,5,27,32
3,4,01-01-2011,springer,2011,1,3,No,6,No work,Clear,0.24,0.2879,0.75,0,3,10,13
4,5,01-01-2011,springer,2011,1,4,No,6,No work,Clear,0.24,0.2879,0.75,0,0,1,1


In [6]:
Df.shape

(17379, 17)

In [7]:
Df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   instant     17379 non-null  int64 
 1   Date        17379 non-null  object
 2   season      17379 non-null  object
 3   yr          17379 non-null  object
 4   mnth        17379 non-null  object
 5   hr          17379 non-null  int64 
 6   holiday     17379 non-null  object
 7   weekday     17379 non-null  int64 
 8   workingday  17379 non-null  object
 9   weathersit  17379 non-null  object
 10  Temp        17379 non-null  object
 11  atemp       17379 non-null  object
 12  Hum         17379 non-null  object
 13  windspeed   17379 non-null  object
 14  casual      17379 non-null  object
 15  registered  17379 non-null  object
 16  Bike Count  17379 non-null  int64 
dtypes: int64(4), object(13)
memory usage: 2.3+ MB


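##### Most columns are of object (string) type, including columns such as Temp, Hum and casual that should be numeric. info() reports no null values, but the head and tail outputs show '?' placeholders, which are stored as strings and are therefore not counted as nulls.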
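A quick way to check how many `?` placeholders each column holds is to compare every cell against that string. The sketch below uses a small hand-made frame with the renamed column names from this notebook. The values are illustrative, not counts from the real data.

```python
import pandas as pd

# Illustrative frame: string columns with "?" placeholders plus one integer column.
demo = pd.DataFrame({
    "Temp": ["0.24", "?", "0.26"],
    "Hum": ["0.81", "0.8", "?"],
    "Bike Count": [16, 40, 32],
})

# Cast each column to string and count cells that are exactly "?".
placeholder_counts = demo.apply(lambda col: col.astype(str).eq("?").sum())

print(placeholder_counts)
```

Casting to string first keeps the comparison well defined for integer columns as well.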

In [11]:
len(Df[Df.duplicated()])

0

#####
There are no duplicate rows.

In [12]:
Df.isnull().sum()

instant       0
Date          0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
Temp          0
atemp         0
Hum           0
windspeed     0
casual        0
registered    0
Bike Count    0
dtype: int64

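#####
From the above we can see that isnull() finds no null values in any column. This check does not detect the '?' placeholders visible in the sample rows, because they are stored as ordinary strings.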
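To make placeholder values visible to `isnull()`, they can be replaced with `NaN`, and the affected columns converted to numeric types. The following is a minimal sketch on an illustrative frame. The choice of which columns to convert is an assumption based on the attribute descriptions.

```python
import numpy as np
import pandas as pd

# Illustrative frame with "?" placeholders in numeric and categorical columns.
demo = pd.DataFrame({
    "Temp": ["0.24", "?", "0.26"],
    "casual": ["3", "8", "?"],
    "workingday": ["No work", "?", "Working Day"],
})

# Turn the placeholder into a real missing value.
cleaned = demo.replace("?", np.nan)

# Convert columns that should be numeric; anything unparseable becomes NaN.
for col in ["Temp", "casual"]:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

print(cleaned.isnull().sum())
print(cleaned.dtypes)
```

After this step the missing entries can be imputed or dropped like any other null values.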

In [13]:
Df.columns

Index(['instant', 'Date', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'Temp', 'atemp', 'Hum', 'windspeed',
       'casual', 'registered', 'Bike Count'],
      dtype='object')

In [14]:
Df.describe()

Unnamed: 0,instant,hr,weekday,Bike Count
count,17379.0,17379.0,17379.0,17379.0
mean,8690.0,11.546752,3.003683,189.463088
std,5017.0295,6.914405,2.005771,181.387599
min,1.0,0.0,0.0,1.0
25%,4345.5,6.0,1.0,40.0
50%,8690.0,12.0,3.0,142.0
75%,13034.5,18.0,5.0,281.0
max,17379.0,23.0,6.0,977.0
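The summary above covers only the four `int64` columns, because `describe()` skips non-numeric columns by default. As a sketch on an illustrative frame, passing `include="all"` adds count, unique, top and freq rows for the string columns too.

```python
import pandas as pd

# Illustrative frame: one integer column and two string columns.
demo = pd.DataFrame({
    "hr": [0, 1, 2, 3],
    "Temp": ["0.24", "0.22", "?", "0.24"],
    "season": ["springer"] * 4,
})

numeric_only = demo.describe()             # default: numeric columns only
everything = demo.describe(include="all")  # adds unique/top/freq for strings

print(numeric_only)
print(everything)
```

Once the string columns are converted to numeric types, the default `describe()` will include them as well.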
