# Bike Rental
### **Task 1: Data Analysis**

* Explored trends in **daily bike rentals**.
* Key insights:

  * **More rentals** on **working days** compared to weekends.
  * Rentals peak during **spring and summer** seasons.
  * **Temperature and clear weather** positively affect rental counts.
  * Rainy and humid days show a noticeable drop in rentals.

---

### **Task 2: Predictive Modeling**

* Built models to predict daily rental counts using features like:

  * **Season, weather, temperature, humidity, windspeed**.
* **Random Forest Regressor** and **Gradient Boosting** performed well.
* Temperature and weather conditions were the most important predictors.

In [246]:
!pip install numpy



In [248]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings

warnings.filterwarnings('ignore')

In [250]:
# Load the data
data = pd.read_csv("C:/New project CDA/Data/day.csv")
data

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.200000,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.229270,0.436957,0.186900,82,1518,1600
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
726,727,2012-12-27,1,1,12,0,4,1,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
727,728,2012-12-28,1,1,12,0,5,1,2,0.253333,0.255046,0.590000,0.155471,644,2451,3095
728,729,2012-12-29,1,1,12,0,6,0,2,0.253333,0.242400,0.752917,0.124383,159,1182,1341
729,730,2012-12-30,1,1,12,0,0,0,1,0.255833,0.231700,0.483333,0.350754,364,1432,1796


In [252]:
data.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [254]:
# Check structure
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB
None


In [256]:
data["season"].unique()

array([1, 2, 3, 4], dtype=int64)

In [258]:
data["weekday"].unique() 

array([6, 0, 1, 2, 3, 4, 5], dtype=int64)

In [260]:
# unique value in  workingday
data["workingday"].unique()

array([0, 1], dtype=int64)

In [262]:
# valuecounts in  weathersit
data["weathersit"].value_counts()

weathersit
1    463
2    247
3     21
Name: count, dtype: int64

In [264]:
data.describe()

Unnamed: 0,instant,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0
mean,366.0,2.49658,0.500684,6.519836,0.028728,2.997264,0.683995,1.395349,0.495385,0.474354,0.627894,0.190486,848.176471,3656.172367,4504.348837
std,211.165812,1.110807,0.500342,3.451913,0.167155,2.004787,0.465233,0.544894,0.183051,0.162961,0.142429,0.077498,686.622488,1560.256377,1937.211452
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.05913,0.07907,0.0,0.022392,2.0,20.0,22.0
25%,183.5,2.0,0.0,4.0,0.0,1.0,0.0,1.0,0.337083,0.337842,0.52,0.13495,315.5,2497.0,3152.0
50%,366.0,3.0,1.0,7.0,0.0,3.0,1.0,1.0,0.498333,0.486733,0.626667,0.180975,713.0,3662.0,4548.0
75%,548.5,3.0,1.0,10.0,0.0,5.0,1.0,2.0,0.655417,0.608602,0.730209,0.233214,1096.0,4776.5,5956.0
max,731.0,4.0,1.0,12.0,1.0,6.0,1.0,3.0,0.861667,0.840896,0.9725,0.507463,3410.0,6946.0,8714.0


In [266]:
data.shape

(731, 16)

In [268]:
data.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

In [270]:
data.dtypes

instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object

In [272]:
data.describe(include=['O'])

Unnamed: 0,dteday
count,731
unique,731
top,2011-01-01
freq,1


In [274]:
data.rename(columns={'cnt': 'count'}, inplace=True)

In [276]:
data = data.drop('instant', axis=1)

### Exploratory Data Analysis  (EDA)

In [279]:
!pip install sweetviz



In [288]:
import sweetviz as sv            # library for univariant analysis
if 'count' in data.columns:
    data = data.drop(columns=['count'])

my_report = sv.analyze(data)
my_report.show_html()

                                             |                                             | [  0%]   00:00 ->…

Report SWEETVIZ_REPORT.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.


In [None]:
plt.figure(figsize=(20,20))
plotnumber=1    #counter variable
for column in data:
    ax=plt.subplot(4,4,plotnumber)
    sns.histplot(data[column])
    plt.xlabel(column,fontsize=20)
    plt.ylabel('Count',fontsize=20)
    plotnumber+=1
plt.tight_layout 

In [None]:
plt.figure(figsize=(20, 20))
plotnumber = 1    # Counter variable

for column in data:
    ax = plt.subplot(4, 4, plotnumber)
    sns.scatterplot(data=data, x=column, y="count")
    plt.xlabel(column, fontsize=20)
    plt.ylabel('Count', fontsize=20)
    plotnumber += 1

plt.title('Scatter Plot', fontsize=20)
plt.tight_layout()
plt.show()


In [None]:
data = data.drop(['dteday'], axis=1)

In [None]:
data=data.drop(['casual','registered'],axis=1)

In [None]:
#how data is distributed for every column
plt.figure(figsize=(20,25), facecolor='white')#defining  canvas size
plotnumber = 1 #maintian count for graph

for column in data:
    if plotnumber<=16 :# as there are 9 columns in the data
        ax = plt.subplot(4,4,plotnumber)# plotting 9 graphs (14-rows,4-columns) ,plotnumber is for count 
        sns.distplot(data[column])#plotting dist plot to know distribution
        plt.xlabel(column,fontsize=20)
    plotnumber+=1
plt.show()

In [None]:
plt.figure(figsize=(20,25), facecolor='white')
plotnumber = 1

for column in data:
    if plotnumber<=16:
        ax = plt.subplot(4,4,plotnumber)
        sns.boxplot(data[column]) 
        plt.xlabel(column,fontsize=20)
        
    plotnumber+=1
plt.show()

In [None]:
plt.figure(figsize=(30, 30))
sns.heatmap(data.corr(),annot=True)

In [None]:
print(data.columns)

data = data.drop(['atemp'], axis=1, errors='ignore')
data.columns = data.columns.str.strip()


In [None]:
plt.figure(figsize=(30, 30))
sns.heatmap(data.corr(),annot=True)

In [None]:
x=data.drop(["count", ], axis=1, )
x

In [None]:
y=data["count"]

In [None]:
## creating training and testing data
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split (x,y)
#splitting data into train and test

### Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(x_train, y_train)
lr = model.score(x_train, y_train)
print('Accuracy of the model:', lr)


In [None]:
#Accuracy of the trained model
lr = model.score(x_train,y_train)
print('Accuracy of the model:',lr)

In [None]:
y_predict = model.predict(x_test) #prediction

In [None]:
from sklearn import preprocessing,metrics,linear_model
from sklearn.model_selection import cross_val_score,cross_val_predict,train_test_split

r2_scores=cross_val_score(model,x_train,y_train,cv=3)
print('R-squared scores:',np.average(r2_scores))

In [None]:
predict=cross_val_predict(model,x_train,y_train,cv=3)
predict

### R-squared and mean squared error score

In [None]:
r2_scores=cross_val_score(model,x_train,y_train,cv=3)
print('R-squared scores:',np.average(r2_scores))

In [None]:
## calculation of adjusted r2 score
adjusted_r2 = 1-(1-0.74)*(183-1)/(183-12-1)
adjusted_r2

In [None]:
from sklearn.metrics import mean_absolute_error
import math
mse=np.square(np.subtract(y_test,y_predict)).mean()
rmse=math.sqrt(mse)
mae=mean_absolute_error(y_test,y_predict)
print('Root mean square error :',rmse)
print('Mean absolute error :',mae)

### Rigid regression

In [None]:
from sklearn.linear_model import Ridge

In [None]:
model2 = Ridge()  # Set the regularization parameter (alpha)
model2.fit(x_train, y_train)

In [None]:
y_pred2 = model2.predict(x_test)
y_pred = model2.predict(x_train)

In [None]:
#Accuracy of the trained model
lr2=model2.score(x_train,y_train)
print('Accuracy of the model:',lr2)

In [None]:
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
r2score1=r2_score(y_test,y_pred2)

In [None]:

r2score1

In [None]:
adjusted_r2 = 1-(1-0.74)*(183-1)/(183-12-1)
adjusted_r2

In [None]:
from sklearn.metrics import mean_absolute_error
import math
mse1=np.square(np.subtract(y_test,y_pred2)).mean()
rmse1=math.sqrt(mse)
mae1=mean_absolute_error(y_test,y_pred2)
print('Root mean square error :',rmse1)
print('Mean absolute error :',mae1)

### Lasso Regression

In [143]:
from sklearn.linear_model import Lasso
ls = Lasso()  # Set the regularization parameter (alpha)
ls.fit(x_train, y_train)

In [145]:
y_1 = ls.predict(x_test)
y_11 = ls.predict(x_train)

In [147]:
ls1=ls.score(x_train,y_train)
print('Accuracy of the model:',ls1)

Accuracy of the model: 0.8096150544817893


In [149]:
r2=r2_score(y_test,y_1)

In [151]:
r2

0.753442014259956

In [153]:
from sklearn.metrics import mean_absolute_error
import math
mse2=np.square(np.subtract(y_test,y_1)).mean()
rmse2=math.sqrt(mse)
mae2=mean_absolute_error(y_test,y_1)
print('Root mean square error :',rmse2)
print('Mean absolute error :',mae2)

Root mean square error : 888.902637195385
Mean absolute error : 675.6621979792852


### Support Vector Regression

In [156]:
from sklearn.svm import SVR

In [158]:
sv = SVR()  # Set the kernel, regularization parameter C, and epsilon
sv.fit(x_train, y_train)

In [160]:
y_7= sv.predict(x_test)

In [162]:
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
r8=r2_score(y_test,y_7)

In [164]:
r8

0.014455676224671699

In [166]:
from sklearn.metrics import mean_absolute_error
import math
rmse3 = math.sqrt(metrics.mean_squared_error(y_test,y_7))
print('Root mean square error :',rmse3)
#Mean absolute error
mae3=metrics.mean_absolute_error(y_test,y_7)
print('Mean absolute error :',mae3)

Root mean square error : 1781.2999800738623
Mean absolute error : 1431.6664918195816


### DecisionTreeRegressor

In [169]:
from sklearn.tree import DecisionTreeRegressor

In [171]:
dt = DecisionTreeRegressor()  # Set the maximum depth of the tree
dt.fit(x_train, y_train)

In [173]:

y_3 = dt.predict(x_test)

In [175]:
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
r3=r2_score(y_test,y_3)

In [177]:
r3

0.73664750206926

In [179]:
import math
print(mean_squared_error(y_test,y_3))
print(math.sqrt(mean_squared_error(y_test,y_3)))

847881.9836065574
920.8050736212075


### KNeighborsRegressor

In [182]:
from sklearn.neighbors import KNeighborsRegressor

In [184]:
knn = KNeighborsRegressor(n_neighbors=5)  # Set the number of neighbors
knn.fit(x_train, y_train)

In [186]:
knn = KNeighborsRegressor(n_neighbors=5)  # Set the number of neighbors
knn.fit(x_train, y_train)

In [188]:
knn_score=knn.score(x_train,y_train)
print('Accuracy of model:',knn_score)

Accuracy of model: 0.8446651547773454


In [190]:
y_8= knn.predict(x_test)

In [192]:
r2score_knn=r2_score(y_test,y_8)

In [194]:
r2score_knn

0.7188747153625348

In [196]:

#Accuracy of test data
rmse_knn=math.sqrt(metrics.mean_squared_error(y_test,y_8))
#Mean absolute error
mae_knn=metrics.mean_absolute_error(y_test,y_8)
print('Root mean square error :',rmse_knn)
print('Mean absolute error :',mae_knn)

Root mean square error : 951.3688699846955
Mean absolute error : 718.0601092896175


In [198]:
from sklearn.ensemble import RandomForestRegressor
MR=RandomForestRegressor(n_estimators=100)
MR.fit(x_train,y_train)

In [199]:
rf_score=MR.score(x_train,y_train)
print('Accuracy of the model :', rf_score)

Accuracy of the model : 0.984728173757349


In [200]:
predict=cross_val_predict(MR,x_train,y_train,cv=3)
predict

array([4799.19, 6135.19, 4734.79,  923.17, 1100.45, 4895.45, 3535.81,
       7525.  , 4987.45, 1422.8 , 2901.8 , 4013.14, 2430.38, 3631.78,
       4469.9 , 5610.75, 4969.57, 6680.92, 6590.44, 3667.53, 1851.93,
       5810.89, 7172.46, 1807.35, 4069.21, 2510.75, 2426.28, 6616.23,
       3765.23, 4466.74, 2064.24, 5004.75, 2398.48, 3902.96, 5222.11,
       1906.2 , 6180.18, 1996.91, 4405.68, 3544.3 , 1970.65, 4840.37,
       4418.21, 6770.19, 7065.96, 4423.64, 4562.  , 4303.46, 5242.53,
       4634.83, 7028.22, 7601.26, 4275.83, 3382.32, 3964.53, 2191.75,
       5234.29, 2031.16, 2119.7 , 2165.7 , 6558.99, 4777.73, 3184.95,
       6495.92, 3973.18, 2225.8 , 2261.3 , 7951.98, 1168.53, 1399.55,
       2679.1 , 7587.65, 4563.96, 4906.36, 2185.56, 6630.45, 4642.55,
       3095.31, 4415.31, 4458.8 , 3241.52, 6698.94, 3794.79, 7114.71,
       6958.47, 6513.18, 4597.42, 6928.01, 7221.6 , 6882.26, 7676.8 ,
       1875.52, 3588.74, 7428.26, 6034.99, 4388.45, 5390.95, 7529.76,
       4411.28, 5121

In [203]:
r2_scores_rt = cross_val_score(MR, x_train, y_train, cv=3)
print('R-squared scores :',np.average(r2_scores_rt))

R-squared scores : 0.8792194334584719


In [204]:
rf_pred_rt=MR.predict(x_test)

In [205]:
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
r2score_rt=r2_score(y_test,rf_pred_rt)
r2score_rt

0.8385539310538062

In [207]:
rmse_rt = math.sqrt(metrics.mean_squared_error(y_test,rf_pred_rt))
print('Root mean square error :',rmse_rt)
#Mean absolute error
mae_rt=metrics.mean_absolute_error(y_test,rf_pred_rt)
print('Mean absolute error :',mae_rt)

Root mean square error : 720.9625730056056
Mean absolute error : 500.662349726776


#### Root mean square error : 718.4630808277961 ####
#### Mean absolute error : 494.68327868852464 ####