# Bike Sharing Demand Classification - Final Report

## Overview
The **Bike Sharing Dataset** contains hourly rental data from Capital Bikeshare in Washington, D.C., collected during the years 2011 and 2012. It is used to analyze and predict bike rental demand based on temporal and environmental features.

In this project, we reformulated the problem into a **multiclass classification task** by converting the continuous rental count into 3 demand categories: `Low`, `Medium`, and `High`.


## Objective
The goal of this project is to classify hourly bike rental demand as Low, Medium, or High from temporal and weather features, and to compare several machine learning models on this task.

## Dataset Overview
- **Dataset**: [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset)
- Total records: ~21,000
- Target feature: `cnt_class` (created by binning rental counts)
- Classes:
  - 0 → Low Demand
  - 1 → Medium Demand
  - 2 → High Demand

### Original Features

| Feature        | Type      | Description |
|----------------|-----------|-------------|
| `instant`      | Integer   | Record index |
| `date`         | Date      | Date in `yyyy-mm-dd` |
| `season`       | Categorical (1-4) | 1: Spring, 2: Summer, 3: Fall, 4: Winter |
| `yr`           | Binary (0/1) | 0: 2011, 1: 2012 |
| `mnth`         | Integer (1-12) | Month |
| `hr`           | Integer (0-23) | Hour of the day |
| `holiday`      | Binary (0/1) | Whether the day is a holiday |
| `weekday`      | Integer (0-6) | Day of the week |
| `workingday`   | Binary (0/1) | 1 if working day, 0 otherwise |
| `weathersit`   | Categorical (1-4) | 1: Clear, 2: Mist, 3: Light Snow/Rain, 4: Heavy Rain |
| `temp`         | Float     | Normalized temperature (0 to 1) |
| `atemp`        | Float     | Normalized "feels like" temperature |
| `hum`          | Float     | Normalized humidity |
| `windspeed`    | Float     | Normalized wind speed |
| `casual`       | Integer   | Count of casual users |
| `registered`   | Integer   | Count of registered users |
| `count`        | Integer   | Total bike rentals (target for regression) |


## Data Preprocessing


### Target Variable Transformation:

```python
import pandas as pd
# Right-inclusive bins: (0, 100] → Low, (100, 300] → Medium, (300, max] → High.
df['cnt_class'] = pd.cut(df['count'], bins=[0, 100, 300, df['count'].max()],
                         labels=['Low', 'Medium', 'High'])
```

- `cnt_class`: categorical label for bike demand:
  - `Low` (count ≤ 100)
  - `Medium` (100 < count ≤ 300)
  - `High` (count > 300)
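Because `pd.cut` uses right-inclusive intervals by default, the boundary values 100 and 300 fall into the lower class. The short check below uses a handful of made-up counts, not rows from the dataset, to show how each boundary is handled.

```python
import pandas as pd

# Illustrative counts chosen to sit on and around the bin edges.
counts = pd.Series([1, 100, 101, 300, 301, 977])
cnt_class = pd.cut(counts, bins=[0, 100, 300, counts.max()],
                   labels=['Low', 'Medium', 'High'])
print(cnt_class.tolist())  # ['Low', 'Low', 'Medium', 'Medium', 'High', 'High']
```

With these bins a count of exactly 0 would become `NaN`, because the lowest edge is exclusive unless `include_lowest=True` is passed.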


### Label Encoding:

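```python
# Ordered integer codes from pd.cut's categories: Low → 0, Medium → 1, High → 2.
# (LabelEncoder sorts labels alphabetically, giving High → 0, Low → 1, Medium → 2.)
df['cnt_class'] = df['cnt_class'].cat.codes
```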
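The class list numbers the classes in demand order, so the encoding order matters. scikit-learn's `LabelEncoder` sorts string labels alphabetically. pandas categorical codes, by contrast, follow the category order that `pd.cut` assigns. The labels below are a toy example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

labels = pd.Series(pd.Categorical(['Low', 'High', 'Medium', 'Low'],
                                  categories=['Low', 'Medium', 'High'], ordered=True))

# LabelEncoder learns its classes in sorted (alphabetical) order.
le = LabelEncoder()
alpha_codes = le.fit_transform(labels)
print(list(le.classes_))      # ['High', 'Low', 'Medium']
print(alpha_codes.tolist())   # [1, 0, 2, 1]

# Categorical codes follow the declared category order instead.
ordered_codes = labels.cat.codes
print(ordered_codes.tolist())  # [0, 2, 1, 0]
```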



### Categorical Encoding:
Applied one-hot encoding to:

- Year (`yr`)
- Month (`mnth`)
- Weekday (`weekday`)
- Working day (`workingday`)
- Season (`season`)
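The report does not say which encoder was used. `pandas.get_dummies` is one common option, and scikit-learn's `OneHotEncoder` is another. The sketch below uses the dataset's column names on three made-up rows. Each listed column is replaced by one indicator column per observed value.

```python
import pandas as pd

# Toy rows using the dataset's column names; the values are invented.
df = pd.DataFrame({
    'yr': [0, 1, 1],
    'mnth': [1, 6, 12],
    'weekday': [0, 3, 6],
    'workingday': [0, 1, 0],
    'season': [1, 2, 4],
    'temp': [0.24, 0.62, 0.30],
})

cat_cols = ['yr', 'mnth', 'weekday', 'workingday', 'season']
# Numeric columns such as 'temp' pass through unchanged.
encoded = pd.get_dummies(df, columns=cat_cols, dtype=int)
print(encoded.columns.tolist())
```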



### Balancing the Dataset:
Used SMOTE (Synthetic Minority Over-sampling Technique) to oversample the minority classes and address class imbalance.
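The report does not show the SMOTE call. In practice SMOTE usually comes from the imbalanced-learn package, as `SMOTE(random_state=...).fit_resample(X, y)`. To show what that call does, here is a minimal NumPy-only sketch of the core idea. Each synthetic row is placed at a random point on the segment between a real minority row and one of its k nearest neighbours from the same class. The function name `smote_oversample` is hypothetical, and the sketch omits the refinements of the library version.

```python
import numpy as np

def smote_oversample(X, y, k=5, seed=0):
    """Hypothetical minimal SMOTE: oversample each minority class to the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        n_new = target - n
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        if len(X_cls) < 2:
            raise ValueError("SMOTE needs at least 2 samples per class")
        # Pairwise Euclidean distances within the class; exclude self-matches.
        dist = np.linalg.norm(X_cls[:, None, :] - X_cls[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        k_eff = min(k, len(X_cls) - 1)
        neighbours = np.argsort(dist, axis=1)[:, :k_eff]
        # For each new sample: a random base row and one of its k nearest neighbours.
        base = rng.integers(0, len(X_cls), size=n_new)
        nbr = neighbours[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[nbr] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Oversampling is normally applied only to the training split, for example `X_bal, y_bal = smote_oversample(X_train, y_train)`. The test set then keeps the real class distribution.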



### Feature Selection:
Applied SelectKBest to keep the k features with the highest univariate scores against the class label.
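The report gives neither the scoring function nor `k`. The sketch below assumes the ANOVA F-test (`f_classif`) and `k=10` on a synthetic 3-class matrix. It also assumes that `casual` and `registered` were dropped beforehand: in this dataset they sum to `count`, so keeping them would leak the target.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the encoded feature matrix: 3 classes, 20 features.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# Score each feature with the ANOVA F-statistic and keep the 10 best (k is an assumption).
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)                    # (500, 10)
print(selector.get_support(indices=True))  # column indices that were kept
```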



### Feature Scaling:
Used StandardScaler to standardize features (zero mean, unit variance) before modeling.
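`StandardScaler` computes z = (x - mean) / std per column, using the population standard deviation. The scaler is fitted on the training data only, and the learned statistics are then reused on the test data. A small check on invented numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0.2, 10.0], [0.4, 20.0], [0.6, 30.0]])
X_test = np.array([[0.5, 25.0]])

# Learn mean/std from the training rows, then apply them unchanged to test rows.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same result as the formula, using the population std (ddof=0).
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, manual))  # True
```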



## Models Used


| Model               | Accuracy | Precision | Recall | F1 Score |
| ------------------- | -------- | --------- | ------ | -------- |
| Logistic Regression | 0.63     | 0.63      | 0.63   | 0.63     |
| Decision Tree       | 0.84     | 0.84      | 0.84   | 0.84     |
| Random Forest       | 0.86     | 0.86      | 0.86   | 0.86     |
| AdaBoost            | 0.65     | 0.69      | 0.65   | 0.65     |
| Gradient Boosting   | 0.83     | 0.83      | 0.83   | 0.83     |
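The table does not say how precision, recall and F1 were averaged over the three classes. In every row the Recall column matches Accuracy, which is consistent with `average='weighted'`. Weighted recall is Σ_c (n_c / N) · (TP_c / n_c) = Σ_c TP_c / N, which is exactly accuracy. The labels below are a toy example, not the project's predictions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels (0 = Low, 1 = Medium, 2 = High).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 1]
y_pred = [0, 0, 1, 1, 2, 2, 2, 1, 2, 1]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average='weighted')
# Weighted recall always equals accuracy; weighted precision and F1 generally do not.
print(round(acc, 2), round(rec, 2))
```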


✅ Best Model: Random Forest Classifier



### Hyperparameter Tuning
- GridSearchCV was used to optimize the Random Forest hyperparameters.
- The best estimator improved accuracy slightly.
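The search grid is not listed in the report. The grid below is hypothetical and kept small so the sketch runs quickly on synthetic 3-class data. It uses 4 parameter combinations and 3-fold cross-validation, for 12 fits plus a final refit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class data standing in for the preprocessed features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

# Hypothetical search space; the actual grid used in the project is not stated.
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 10],
}

search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      cv=3, scoring='accuracy')
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))  # mean cross-validated accuracy of the best setting
best_rf = search.best_estimator_      # refitted on all of X, y
```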


Tuned Random Forest evaluation:
- Accuracy: 0.864




## Evaluation Summary
Random Forest performed best, reaching about 86% accuracy (0.864 after tuning), which indicates a strong ability to classify rental demand.

Feature Importance: The most influential features included `hr` (hour of day), `temp`, `hum` (humidity), and the one-hot encoded time features (month, weekday).
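The report does not say how importances were computed. For Random Forest a common choice is the impurity-based `feature_importances_` attribute. In the sketch below the column names mirror the dataset, but the values and the labelling rule are invented so that only `hr` and `temp` carry signal.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    'hr': rng.integers(0, 24, 600),
    'temp': rng.random(600),
    'hum': rng.random(600),
    'noise': rng.random(600),
})
# Invented rule: rush hours and warm weather each add one demand level (0, 1 or 2).
y = ((X['hr'].between(7, 9) | X['hr'].between(16, 19)).astype(int)
     + (X['temp'] > 0.5).astype(int))

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))  # 'hr' and 'temp' should rank highest
```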

Dimensionality Reduction: PCA and t-SNE projections were used to visualize class separation and the class distribution.
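A minimal sketch of the two projections, assuming standardized features and synthetic 3-class data in place of the project's matrix. Plotting is left out. Each 2-D result would typically be drawn as a scatter plot coloured by `y`.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=1)
X_scaled = StandardScaler().fit_transform(X)

# Linear projection onto the two directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)

# Non-linear embedding that tries to preserve local neighbourhoods.
X_tsne = TSNE(n_components=2, random_state=1).fit_transform(X_scaled)
print(X_pca.shape, X_tsne.shape)  # (300, 2) (300, 2)
```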

## Pipeline Deployment
A complete Pipeline was created with the following steps:

- Feature selection
- Scaling
- Model fitting (Random Forest)
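A sketch of such a pipeline in scikit-learn, following the step order listed above. The data is synthetic, and `k=10` and `n_estimators=100` are assumptions. `CLASS_NAMES` is a hypothetical lookup that maps codes 0/1/2 back to Low/Medium/High, matching the class list. Predicting on a single unseen row shows that the pipeline re-applies selection and scaling by itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

CLASS_NAMES = np.array(['Low', 'Medium', 'High'])  # hypothetical: index = class code

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=3, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=7)

pipe = Pipeline([
    ('select', SelectKBest(score_func=f_classif, k=10)),
    ('scale', StandardScaler()),
    ('model', RandomForestClassifier(n_estimators=100, random_state=7)),
])
pipe.fit(X_train, y_train)
print(round(pipe.score(X_test, y_test), 3))  # test accuracy

# One unseen row in, one class name out.
new_point = X_test[:1]
predicted = CLASS_NAMES[pipe.predict(new_point)][0]
print(f"Predicted Bike Count Class: {predicted}")
```

A SMOTE step cannot go inside a plain scikit-learn `Pipeline`, because samplers are not transformers. imbalanced-learn ships its own `Pipeline` class for that case.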

## Predictions on Unseen Data
The trained model was tested on a new, unseen data point.

### Output

**Predicted Bike Count Class: High**


## Conclusion
This classification system can serve as a strong component in a real-time bike-sharing prediction service, helping cities:

- Anticipate high- and low-demand windows.
- Deploy bikes efficiently to key zones.
- Improve user experience by preventing shortages.