# 🏠 Energy Efficiency Prediction using Machine Learning

This project uses the **Energy Efficiency dataset** from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/242/energy+efficiency) to predict the **Heating Load (HL)** of buildings based on their architectural features. The goal is to identify energy-efficient design patterns and deploy a robust regression model to estimate heating energy requirements.

---

## 📁 Dataset Description

- **Source**: UCI Machine Learning Repository  
- **Observations**: 768  
- **Features**: 8 (X1–X8)  
- **Targets**: Heating Load (Y1), Cooling Load (Y2)  
- **Prediction Focus**: Heating Load only
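
A minimal sketch of preparing the data, assuming the raw file uses the UCI column codes `X1`–`X8`, `Y1`, and `Y2`. The `COLUMN_NAMES` mapping and the `prepare_frame` helper are illustrative names, not part of the original project, and the two rows are placeholder values standing in for the real file.

```python
import pandas as pd

# Mapping from the raw UCI column codes to the readable names used in this README.
COLUMN_NAMES = {
    "X1": "Relative_Compactness",
    "X2": "Surface_Area",
    "X3": "Wall_Area",
    "X4": "Roof_Area",
    "X5": "Overall_Height",
    "X6": "Orientation",
    "X7": "Glazing_Area",
    "X8": "Glazing_Area_Distribution",
    "Y1": "Heating_Load",
    "Y2": "Cooling_Load",
}


def prepare_frame(raw):
    """Rename the raw columns and split into the 8 features and the heating-load target."""
    df = raw.rename(columns=COLUMN_NAMES)
    feature_cols = [COLUMN_NAMES[f"X{i}"] for i in range(1, 9)]
    return df[feature_cols], df["Heating_Load"]


# Placeholder rows; in practice `raw` would come from pd.read_excel or pd.read_csv.
raw = pd.DataFrame(
    {
        "X1": [0.98, 0.62], "X2": [514.5, 808.5], "X3": [294.0, 367.5],
        "X4": [110.25, 220.5], "X5": [7.0, 3.5], "X6": [2, 5],
        "X7": [0.0, 0.4], "X8": [0, 5], "Y1": [15.55, 16.64], "Y2": [21.33, 16.03],
    }
)
X, y = prepare_frame(raw)
print(X.shape, y.name)
```

Only `Heating_Load` is returned as the target, matching the prediction focus stated above; `Cooling_Load` stays in the renamed frame for later use.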

### 🔧 Feature Definitions

| Feature | Description |
|--------|-------------|
| X1 – `Relative_Compactness` | Ratio of volume to surface area. More compact buildings lose less heat. |
| X2 – `Surface_Area` | Total outer surface area (walls, roof). Higher = more exposure to heat exchange. |
| X3 – `Wall_Area` | Total wall area in square meters. |
| X4 – `Roof_Area` | Area of roof in square meters. |
| X5 – `Overall_Height` | Building height (low-rise or high-rise). |
| X6 – `Orientation` | Orientation of building (2–5; categorical). |
| X7 – `Glazing_Area` | Ratio of wall area taken up by windows. |
| X8 – `Glazing_Area_Distribution` | Orientation of glazing/windows (0–5; categorical). |

---

## 🧪 Exploratory Data Analysis (EDA)

- No missing values were found.
- **`Relative_Compactness`** showed a strong negative correlation with **Heating_Load**.
- **`Surface_Area`**, **`Wall_Area`**, and **`Roof_Area`** had a moderate positive correlation with Heating Load.
- `Orientation` and `Glazing_Area_Distribution` are categorical and were encoded accordingly.
- Target (`Heating_Load`) is continuous and right-skewed, but doesn't require transformation.
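
The checks listed above can be reproduced along these lines. This is a sketch on a small synthetic frame, not the project's actual notebook code; `eda_summary` and the `Feature_A`/`Feature_B` columns are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


def eda_summary(df, target="Heating_Load"):
    """Return missing-value counts, each feature's correlation with the target, and target skewness."""
    missing = df.isna().sum()
    corr = df.corr()[target].drop(target).sort_values()
    skew = df[target].skew()
    return missing, corr, skew


# Synthetic stand-in: Feature_A rises with the target, Feature_B falls with it.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame(
    {
        "Feature_A": a,
        "Feature_B": -a + 0.5 * rng.normal(size=200),
        "Heating_Load": 2.0 * a + 0.5 * rng.normal(size=200),
    }
)

missing, corr, skew = eda_summary(demo)
print(missing.sum())  # total number of missing cells
print(corr)           # Pearson correlation of each feature with the target
print(skew)           # positive values indicate a right-skewed target
```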

---

## ⚙️ Feature Engineering & Selection

- Features were scaled using **StandardScaler** within a pipeline.
- Used `SelectKBest(f_regression, k='all')` to score all features.
- All original features were retained as they provided valuable signal.
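
As a rough illustration of the scoring step, the sketch below fits `SelectKBest` with `k="all"` on synthetic data, so every column is scored and none is dropped. The data is an assumption made for the example; only the first column drives the target.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: column 0 determines the target, columns 1 and 2 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

# k="all" keeps every feature; the selector is used only to expose F-scores.
selector = SelectKBest(score_func=f_regression, k="all").fit(X, y)
X_selected = selector.transform(X)

print(selector.scores_)   # one F-statistic per feature
print(X_selected.shape)   # same number of columns as X
```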

---

## 🧠 Models Trained & Results

| Model                | MAE   | MSE   | RMSE  | R² Score |
|---------------------|-------|-------|-------|----------|
| Linear Regression   | 2.06  | 8.78  | 2.96  | 0.9035   |
| Decision Tree       | 0.36  | 0.32  | 0.57  | 0.9964   |
| Random Forest       | 0.36  | 0.32  | 0.56  | 0.9965   |
| AdaBoost            | 1.39  | 3.02  | 1.74  | 0.9668   |
| Gradient Boosting   | ✅ 0.36  | ✅ 0.28  | ✅ 0.53  | ✅ **0.9969** |
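
The four metrics in the table can be computed with scikit-learn as sketched below. `regression_report` is a hypothetical helper, and the three-value example is made up to show the arithmetic, not taken from the project's results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "R2": r2_score(y_true, y_pred),
    }


report = regression_report([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
print(report)
```

Because RMSE is derived from MSE, the two columns always rank models in the same order; MAE can differ because it penalizes large errors less.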

### ✅ Final Model Selected: `GradientBoostingRegressor`

---

## 🧪 Hyperparameter Tuning

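`GridSearchCV` with cross-validation was used to tune the model. The tuned configuration:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Hyperparameters selected by the grid search
model = GradientBoostingRegressor(
    learning_rate=0.05,
    n_estimators=200,
    max_depth=5,
    subsample=1.0,
    random_state=42
)
```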

The tuned parameters improved RMSE slightly and showed no signs of overfitting.
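
A search of this kind might look like the sketch below. The grid values, the 3-fold split, and the RMSE-based scoring are assumptions for illustration, since the README does not list the grid that was actually searched; synthetic data keeps the example self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in with 8 features, like the real dataset.
X, y = make_regression(n_samples=120, n_features=8, noise=5.0, random_state=42)

# Hypothetical grid; deliberately small so the example runs quickly.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [3, 5],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # higher (less negative) is better
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # mean cross-validated RMSE of the best candidate
```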

---

## 🔄 Pipeline Setup

The final pipeline includes:
- `StandardScaler` (standardizes features to zero mean and unit variance)
- `SelectKBest` (for feature scoring)
- `GradientBoostingRegressor` (best performer)

Wrapped in `Pipeline` for seamless prediction and reuse.
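
One way to assemble these three steps is sketched below. The step names (`scaler`, `select`, `model`) and the `build_pipeline` helper are illustrative choices, and synthetic data stands in for the real features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline():
    """Scale, score all features, then fit the tuned gradient boosting model."""
    return Pipeline(
        [
            ("scaler", StandardScaler()),
            ("select", SelectKBest(score_func=f_regression, k="all")),
            (
                "model",
                GradientBoostingRegressor(
                    learning_rate=0.05,
                    n_estimators=200,
                    max_depth=5,
                    subsample=1.0,
                    random_state=42,
                ),
            ),
        ]
    )


# Synthetic stand-in with 8 features.
X, y = make_regression(n_samples=100, n_features=8, noise=1.0, random_state=0)
pipeline = build_pipeline().fit(X, y)
preds = pipeline.predict(X[:5])
print(preds)
```

Keeping the scaler inside the pipeline means it is fitted only on the training data passed to `fit`, which avoids leaking statistics from held-out rows.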

---

## 📈 Evaluation on Unseen Data

On a batch of unseen data, the model achieved an R² above **0.995**, indicating that it generalizes well.
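
A hold-out check of this kind can be sketched as follows. The 75/25 split and the synthetic data are assumptions for the example, so the printed score will not match the figure reported above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 8 features.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=3, noise=2.0, random_state=0
)

# Hold back 25% of the rows; the model never sees them during fitting.
X_train, X_unseen, y_train, y_unseen = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_unseen)
unseen_r2 = r2_score(y_unseen, y_pred)
print(unseen_r2)
```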

---

## 🧠 Inference Summary

- **Relative Compactness** is the strongest negative predictor of energy consumption.
- **Surface Area** and **Roof Area** increase energy needs.
- **Window size and position** impact cooling more than heating.
- Tree-based ensembles such as **Gradient Boosting** and **Random Forest** perform extremely well (R² > 0.99).
- Feature scaling and selection improve consistency but all 8 features contribute meaningfully.

---

## 🏁 Conclusion

This project demonstrates that **machine learning models can effectively predict energy performance** in buildings. Using architectural attributes alone, we can estimate heating loads with high accuracy. These insights can help architects and engineers **optimize building design** for energy efficiency.

---

## 📌 Future Work

- Extend prediction to **Cooling Load**.
- Deploy as a **web app** for real-time energy simulation.



## 🧱 Input Features (X1–X8)


| Column Name                 | Description                                                                                           | Technical Impact on Energy Efficiency                                                                                        |
| --------------------------- | ----------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `Relative_Compactness`      | Ratio of building volume to its surface area. Higher values imply more compact, cube-like structures. | More compact buildings have **less exposed surface**, which reduces heat loss/gain, leading to **lower energy demand**.      |
| `Surface_Area`              | Total exterior surface area of the building (walls, roof, etc.) in m².                                | Larger surface area increases **heat transfer** through building envelope, generally **increasing energy loads**.            |
| `Wall_Area`                 | Total wall area in m².                                                                                | Affects **heat conduction** through walls. **More wall area = more thermal exchange** if not insulated properly.             |
| `Roof_Area`                 | Total roof area in m².                                                                                | Roofs are major points of heat gain/loss. **High roof area** can lead to **more energy required** for temperature control.   |
| `Overall_Height`            | Building height in meters (two distinct values: low-rise vs high-rise).                               | Taller buildings may have **more air volume to condition** and different heat stratification patterns.                       |
| `Orientation`               | Direction the building faces (categorical: 2, 3, 4, or 5).                                            | Affects **solar exposure**. Some orientations (e.g., south-facing) may get more sun, impacting **cooling or heating needs**. |
| `Glazing_Area`              | Fraction of wall surface occupied by windows (0 to 0.4).                                              | **More glazing** allows **natural light** and **solar gain**, but also increases **heat loss** and **cooling loads**.        |
| `Glazing_Area_Distribution` | Position of the windows (categorical: 0 to 5).                                                        | Affects **how sunlight enters** the building, especially during specific times of day or seasons, impacting energy balance.  |


## 🎯 Target Variables



| Column Name    | Description                                                           | Significance                                                                                                      |
| -------------- | --------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| `Heating_Load` | Energy needed (in kWh/m²) to heat the building to comfortable levels. | Main dependent variable in winter-focused climates. Lower values indicate better insulation or passive heat gain. |
| `Cooling_Load` | Energy needed (in kWh/m²) to cool the building in warm conditions.    | Important in hot climates. Affected by orientation, glazing, and internal gains.                                  |


## 🔍 Summary
- Thermal performance of a building is mostly dictated by how compact, insulated, and oriented it is.
- `Relative_Compactness` and `Overall_Height` have a direct geometric effect on energy use.
- Glazing features (area & distribution) impact solar heat gain/loss, critical for passive design.
- `Orientation` is a categorical proxy for how much sunlight different walls receive.