## AdaBoost Regressor on Day.csv Dataset

## 1. Introduction

This project applies scikit-learn's AdaBoost Regressor to the Day.csv bike-sharing dataset. As defined in Section 5.1, the target variable is the `holiday` flag and all remaining columns serve as features. The report covers dataset exploration, preprocessing, model building, evaluation, and conclusions.

## 2. Libraries Used

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error, r2_score

In [3]:
data = pd.read_csv(r"C:\Users\Shaik Sakhlaih\Downloads\day.csv")
data

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.200000,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.229270,0.436957,0.186900,82,1518,1600
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
726,727,2012-12-27,1,1,12,0,4,1,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
727,728,2012-12-28,1,1,12,0,5,1,2,0.253333,0.255046,0.590000,0.155471,644,2451,3095
728,729,2012-12-29,1,1,12,0,6,0,2,0.253333,0.242400,0.752917,0.124383,159,1182,1341
729,730,2012-12-30,1,1,12,0,0,0,1,0.255833,0.231700,0.483333,0.350754,364,1432,1796


## 3. Dataset Overview  

The dataset **Day.csv** consists of daily records containing multiple variables that impact the number of bicycle rentals. It includes **temporal, weather, and user-specific details**.  

### 3.1 Columns in the Dataset  

| Column       | Description                                              |  
|--------------|----------------------------------------------------------|  
| **instant**  | Unique record index                                       |  
| **dteday**   | Date                                                     |  
| **season**   | Season (1: winter, 2: spring, 3: summer, 4: fall)        |  
| **yr**       | Year (0: 2011, 1: 2012)                                  |  
| **mnth**     | Month (1 to 12)                                           |  
| **holiday**  | Whether the day is a holiday (1: yes, 0: no)             |  
| **weekday**  | Day of the week (0: Sunday, 6: Saturday)                 |  
| **workingday**| Whether the day is a working day (1: yes, 0: no)         |  
| **weathersit**| Weather condition (1: Clear, 2: Mist, 3: Light Snow/Rain, 4: Heavy Rain) |  
| **temp**     | Normalized temperature                                    |  
| **atemp**    | Normalized feeling temperature                            |  
| **hum**      | Normalized humidity                                      |  
| **windspeed**| Normalized wind speed                                    |  
| **casual**   | Count of casual bike users                               |  
| **registered**| Count of registered bike users                           |  
| **cnt**      | Total bike users (casual + registered)                   |  


## 4. Data Preprocessing


### 4.1 Checking for Null Values

In [4]:
print(data.isnull().sum())

instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64


- The dataset has no missing values, so no imputation is required.

### 4.2 Encoding Categorical Data

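The `dteday` column holds date strings, which the regressor cannot consume directly, so it is converted to integer codes using **Label Encoding**. Because the dates are written as `YYYY-MM-DD` and the rows are in date order, the resulting codes follow chronological order and carry essentially the same information as the `instant` index.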

In [5]:
le = LabelEncoder()
data['dteday'] = le.fit_transform(data['dteday'])
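
To illustrate what the encoding does, the following minimal sketch applies `LabelEncoder` to three made-up date strings (not rows from Day.csv). The encoder assigns codes by the sorted order of the distinct values, which for this date format is chronological order.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample dates, deliberately out of order
sample_dates = ["2011-01-03", "2011-01-01", "2011-01-02"]

encoder = LabelEncoder()
codes = encoder.fit_transform(sample_dates)

# Each code is the position of the value in the sorted list of distinct values
print(encoder.classes_.tolist())  # ['2011-01-01', '2011-01-02', '2011-01-03']
print(codes.tolist())             # [2, 0, 1]
```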

## 5. Model Building

### 5.1 Defining Features and Target Variable

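The dataset is split into independent variables (`X`) and the dependent variable (`y`). Here the target is `holiday`, a binary flag (1: yes, 0: no), and every other column is kept as a feature, including the rental counts `casual`, `registered`, and `cnt`.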

In [6]:
X = data.drop(['holiday'], axis=1)
y = data['holiday']
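
The cell above uses `holiday` as the target. If the goal were instead to predict the total rental count, one possible setup (an assumption, not what this report ran) is sketched below on a tiny made-up frame. It uses `cnt` as the target and drops `casual` and `registered`, since Section 3.1 defines `cnt` as their sum and keeping them would reveal the answer. The helper name `split_features_target` is hypothetical.

```python
import pandas as pd

def split_features_target(frame, target, leak_cols=()):
    """Return (X, y), dropping the target and any columns that reveal it."""
    X = frame.drop(columns=[target, *leak_cols])
    y = frame[target]
    return X, y

# Made-up rows using a subset of the Day.csv column names
toy = pd.DataFrame({
    "temp": [0.34, 0.36, 0.20],
    "hum": [0.81, 0.70, 0.44],
    "holiday": [0, 0, 1],
    "casual": [331, 131, 120],
    "registered": [654, 670, 1229],
    "cnt": [985, 801, 1349],
})

X_alt, y_alt = split_features_target(toy, "cnt", leak_cols=["casual", "registered"])
print(X_alt.columns.tolist())  # ['temp', 'hum', 'holiday']
```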

### 5.2 Splitting the Dataset

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
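
When the target is a binary flag with few positive rows, a purely random split can leave very few of them in the test set. One option, not used in this report, is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both parts. The sketch below uses made-up arrays rather than Day.csv.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows, 4 of them flagged 1
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([1] * 4 + [0] * 16)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

# Positives are divided in the same 75/25 ratio as the rows
print(int(y_tr.sum()), int(y_te.sum()))  # 3 1
```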

### 5.3 Training the AdaBoost Regressor

In [8]:
ar = AdaBoostRegressor()
ar.fit(X_train, y_train)
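
`AdaBoostRegressor()` with no arguments uses scikit-learn's defaults and an unseeded random state, so repeated runs can give slightly different scores. The sketch below, on synthetic data from `make_regression` (not Day.csv), spells out the defaults `n_estimators=50`, `learning_rate=1.0`, and `loss="linear"` and fixes `random_state`. The seed value 42 and the helper name `fit_seeded` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

# Synthetic regression data standing in for the real features
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

def fit_seeded():
    """Fit a model with the default hyperparameters written out and a fixed seed."""
    model = AdaBoostRegressor(
        n_estimators=50, learning_rate=1.0, loss="linear", random_state=42
    )
    return model.fit(X_demo, y_demo)

# Two independent fits with the same seed give identical predictions
pred_a = fit_seeded().predict(X_demo[:5])
pred_b = fit_seeded().predict(X_demo[:5])
print((pred_a == pred_b).all())  # True
```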

## 6. Model Evaluation

### 6.1 Predictions

In [9]:
y_pred = ar.predict(X_test)

### 6.2 Mean Squared Error (MSE)

In [10]:
mse = mean_squared_error(y_test, y_pred)
print("The mean squared error:", mse)

The mean squared error: 0.03942801610440094


MSE ≈ 0.0394 (lower is better)

### 6.3 R-Squared Score

In [11]:
r2 = r2_score(y_test, y_pred)
print("The R2 score:", r2)

The R2 score: -0.007092198581560627


R² score ≈ -0.0071 (higher is better; a value below 0 is worse than always predicting the mean of `y_test`)
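
R² compares the model's squared error with that of a constant prediction equal to the mean of the true values, $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below uses a made-up binary target rather than the report's test set. It shows that this constant baseline scores exactly 0, and that the baseline's MSE equals the variance of the target, which is already small for a 0/1 flag.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up binary target: one positive out of four
y_true = np.array([0, 0, 0, 1])

# Baseline that always predicts the mean of y_true (0.25)
baseline = np.full(len(y_true), y_true.mean())

baseline_mse = mean_squared_error(y_true, baseline)
baseline_r2 = r2_score(y_true, baseline)

print(baseline_mse)  # 0.1875, equal to the variance 0.25 * 0.75
print(baseline_r2)   # 0.0
```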

## 7. Conclusion  

- The dataset contains **16 columns** related to daily bicycle rentals, weather, and time.  
- No missing values were found, and the `dteday` date column was converted to integer codes using **Label Encoding**.  
- The **AdaBoost Regressor** model was trained on the dataset.  
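- The **Mean Squared Error (MSE)** was approximately **0.0394**. Because the target `holiday` takes only the values 0 and 1, this figure is small mainly because of the target's scale and is not, on its own, evidence of accurate predictions.  
- The **R² Score** was approximately **-0.0071**, meaning the model explains none of the variance in the target and performs slightly worse than always predicting the test-set mean.  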
- Future improvements could involve **feature selection, hyperparameter tuning**, or using more advanced models.  
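
As one way the hyperparameter tuning mentioned above might look, the sketch below runs a small grid search with `GridSearchCV` on synthetic data from `make_regression`. The grid values, the 3-fold setting, and the seed are arbitrary assumptions, and nothing here was run on Day.csv.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the real features
X_demo, y_demo = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)

# Small illustrative grid over two AdaBoostRegressor parameters
param_grid = {
    "n_estimators": [25, 50],
    "learning_rate": [0.1, 1.0],
}

search = GridSearchCV(
    AdaBoostRegressor(random_state=42),
    param_grid,
    scoring="r2",
    cv=3,
)
search.fit(X_demo, y_demo)

# Best combination according to mean cross-validated R²
print(search.best_params_)
```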

This concludes the project report on the **AdaBoost Regressor** applied to the **Day.csv** dataset.  
