# 🏡 **Eco Planners Predictor - Project Summary**

## 1. 🌍 **Project Overview**

The **Eco Planners Predictor** is a comprehensive analysis and prediction tool designed for **EcoCity Planners** to aid in sustainable urban development. By utilizing **machine learning models** and real estate data, the project aims to:

- **Predict house prices** with high accuracy.
- **Classify properties** based on energy efficiency.
- Provide **data-driven insights** for sustainable housing projects, balancing **affordability** and **environmental efficiency**.

This project is structured into multiple stages, including **exploratory data analysis (EDA)**, **model development**, and **insight generation**, all geared towards informing **EcoCity Planners** about the most efficient and valuable housing strategies.

---

## 2. 📊 **Dataset Summary**

The dataset contains detailed information on various aspects of housing properties, with key features including:

- **Lot Acres**: Size of the property in acres.
- **Taxes**: Annual property tax value.
- **Year Built**: Year the property was constructed.
- **Bedrooms & Bathrooms**: Number of bedrooms and bathrooms in each property.
- **Square Footage (`sqrt_ft`)**: Total livable square footage of the property.
- **Garage Size**: Capacity of the garage (number of cars).
- **Fireplaces**: Number of fireplaces in the property.
- **Distance to Phoenix/Tucson**: Proximity to major cities in Arizona.
- **Property Age**: The age of the property, calculated as `2024 - Year Built`.
- **Price per Square Foot**: Sold price divided by the property's square footage.
- **Energy Efficiency**: A numeric score representing the energy efficiency of the property.

These features form the foundation of our **predictive models** and **classification tasks**, enabling accurate forecasting of housing prices and evaluation of sustainability measures.
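
The two derived features in the list above (property age and price per square foot) can be sketched in plain Python. This is an illustration only: the keys `year_built` and `sold_price` are hypothetical names chosen for the example, and the project's own column names may differ.

```python
def add_derived_features(record, reference_year=2024):
    """Return a copy of `record` with property age and price per square foot added."""
    enriched = dict(record)
    # Property age as described in the feature list: 2024 - Year Built.
    enriched["property_age"] = reference_year - record["year_built"]
    # Guard against a missing or zero square footage to avoid division by zero.
    sqft = record.get("sqrt_ft")
    enriched["price_per_sqft"] = record["sold_price"] / sqft if sqft else None
    return enriched


# Hypothetical example row (not taken from the dataset).
example = {"year_built": 1999, "sold_price": 750_000, "sqrt_ft": 3_000}
result = add_derived_features(example)
print(result["property_age"], result["price_per_sqft"])  # 25 250.0
```

Keeping the reference year as a parameter makes it easy to recompute ages if the analysis is rerun in a later year.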

---

## 3. 🔍 **Exploratory Data Analysis (EDA)**

### 3.1 **Key Findings from EDA**

- **Price Distribution**: The price distribution is heavily right-skewed, with most homes priced between **$500,000 and $1,500,000**, and a few luxury homes exceeding **$5 million**.
- **Correlations**:
  - **Lot Acres** (r = +0.53) and **Square Footage** show the clearest positive correlations with house price, indicating that larger properties with more living space are valued higher.
  - **Energy Efficiency** shows a moderate correlation with **price**, suggesting that more efficient homes tend to sell for more, although correlation alone does not establish that efficiency upgrades cause the increase.
- **Outliers**: High-value outliers were identified in **Lot Acres**, **Taxes**, and **Energy Efficiency**. These were addressed during data cleaning so that they would not distort the models.
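
For readers who want to reproduce this kind of check, the sketch below computes a Pearson correlation coefficient and flags outliers with Tukey's 1.5 × IQR rule. The IQR rule is an assumption made for illustration; the summary does not state which outlier method the project actually used, and the sample values are invented.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, an assumed method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Invented lot sizes in acres, including one extreme value.
lot_acres = [0.3, 0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 1000.0]
flags = iqr_outlier_flags(lot_acres)
print(flags)
```

Flagged rows can then be capped, transformed, or dropped, depending on what the downstream model tolerates.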

### 3.2 **Feature Distributions**
- **Lot Acres**: Most properties have small lot sizes, with some extreme outliers reaching **1,000 acres**.
- **Taxes**: The majority of properties pay between **$3,000 and $7,500** in annual taxes, with a few high-value properties paying significantly more.
- **Bedrooms & Bathrooms**: Residential homes typically have **3 to 5 bedrooms** and **2 to 4 bathrooms**.
- **Square Footage**: Most properties fall between **2,500 and 4,500 square feet**, with a few luxury properties exceeding **20,000 square feet**.
- **Energy Efficiency**: Properties were grouped into **Low**, **Medium**, and **High** efficiency categories so that sustainability levels are easier to compare.
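
One simple way to turn a numeric efficiency score into the three categories is to apply two cut points. The sketch below is a hedged illustration: the thresholds `40.0` and `70.0` are placeholders invented for the example, not the cut points used in the project.

```python
from bisect import bisect_right

LABELS = ("Low", "Medium", "High")


def efficiency_category(score, thresholds=(40.0, 70.0)):
    """Map a numeric score to Low/Medium/High using two cut points.

    The default thresholds are placeholders, not the project's values.
    A score equal to a threshold falls into the higher category.
    """
    return LABELS[bisect_right(thresholds, score)]


print([efficiency_category(s) for s in (35, 55, 90)])  # ['Low', 'Medium', 'High']
```

Quantile-based cut points (for example, terciles of the score) are a common alternative when balanced class sizes are wanted.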

---

## 4. 🤖 **Model Development and Performance**

We developed and evaluated several machine learning models to **predict house prices** and **classify energy efficiency**.

### 4.1 **Predictive Models for House Prices**

#### 4.1.1 **Simple Linear Regression**
- **Model Summary**: A baseline model to assess the linear relationship between property features and house prices.
- **Performance**:
  - **MSE**: `113,489,800,730.7467`
  - **R²**: `0.3789`
  - **MAPE**: `25.57%`
- **Insights**: Simple Linear Regression predicted house prices poorly. It explains only about 38% of the variance in price (R² = 0.3789), and its MSE corresponds to a root-mean-squared error of roughly $337,000.
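
The three regression metrics reported in this section can be computed directly from actual and predicted prices. The sketch below spells out the standard definitions in plain Python; the sample prices are invented and the function name is illustrative.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, R², and MAPE (in percent) for paired actual/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    # MAPE is undefined when an actual value is zero; prices are assumed positive.
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return {"mse": mse, "r2": r2, "mape": mape}


# Invented prices for illustration.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(metrics)
```

Because MSE is in squared dollars, taking its square root (RMSE) gives an error in the same units as price, which is usually easier to interpret.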

#### 4.1.2 **K-Nearest Neighbors (KNN) Regressor**
- **Model Summary**: A model that predicts a property's price from the prices of its most similar properties (its nearest neighbors in feature space).
- **Performance**:
  - **MSE**: `1,480,483,973.4149`
  - **R²**: `0.9919`
  - **MAPE**: `0.7795%`
- **Insights**: The KNN Regressor delivered excellent predictive accuracy, explaining about 99% of the variance in price (R² = 0.9919) with a MAPE below 1%. Its MSE corresponds to a root-mean-squared error of roughly $38,500. It is the recommended model for future price forecasting.
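
To make the nearest-neighbor idea concrete, here is a minimal from-scratch sketch of KNN regression. It is not the project's implementation: it assumes features are already on comparable scales (KNN is distance-based, so unscaled features such as taxes would dominate), and the training rows are invented.

```python
import math


def knn_predict_price(train_X, train_y, query, k=3):
    """Predict a price as the mean price of the k nearest training rows."""
    # Pair each training row's Euclidean distance to the query with its price.
    distances = sorted(
        (math.dist(row, query), price) for row, price in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(price for _, price in nearest) / len(nearest)


# Invented, pre-scaled feature rows and prices.
train_X = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
train_y = [500_000, 600_000, 700_000, 2_000_000]
predicted = knn_predict_price(train_X, train_y, query=(2.1, 2.1), k=3)
print(predicted)  # 600000.0
```

The distant fourth row does not influence the prediction, which is how KNN adapts to local price patterns that a single linear fit misses.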

### 4.2 **Classification Models for Energy Efficiency**

#### 4.2.1 **K-Nearest Neighbors (KNN) Classifier**
- **Model Summary**: The KNN Classifier categorizes properties into **Low**, **Medium**, and **High** energy efficiency levels.
- **Performance**:
  - **Accuracy**: `98.30%`
  - **Precision, Recall, F1-Score**: All metrics achieved **0.9830**, showing balanced and consistent performance.
- **Insights**: The KNN Classifier is the stronger of the two classifiers evaluated, with accuracy, precision, recall, and F1-score all at 0.9830.
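
Precision, recall, and F1-score are defined per class before any averaging. The sketch below shows the standard per-class calculation for the three efficiency labels; the label lists are invented, and how the project averaged the per-class scores into a single figure is not stated in this summary.

```python
def per_class_scores(y_true, y_pred, labels=("Low", "Medium", "High")):
    """Return precision, recall, and F1 for each label."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Invented labels: one Low property is misclassified as Medium.
y_true = ["Low", "Low", "Medium", "Medium", "High", "High"]
y_pred = ["Low", "Medium", "Medium", "Medium", "High", "High"]
scores = per_class_scores(y_true, y_pred)
print(scores["Medium"])
```

Inspecting the per-class values alongside the overall figure helps confirm that a high average is not hiding a weak class.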

#### 4.2.2 **Gaussian Naive Bayes Classifier**
- **Model Summary**: An alternative, probability-based classifier that models each feature as normally distributed within each class and treats the features as conditionally independent.
- **Performance**:
  - **Accuracy**: `79.99%`
  - **F1-Scores**:
    - **High Efficiency**: `0.8511`
    - **Medium Efficiency**: `0.7709`
    - **Low Efficiency**: `0.7871`
- **Insights**: The Gaussian Naive Bayes classifier performed best on **high efficiency** properties (F1 = 0.8511) and was weakest on **medium efficiency** properties (F1 = 0.7709), with low efficiency only slightly better (F1 = 0.7871). It can be used as a supplementary model in niche cases.
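
For context on how this classifier reaches its predictions, the following is a compact from-scratch sketch of Gaussian Naive Bayes. It is an illustration of the general technique under simplifying assumptions (a small variance floor of `1e-9`, invented one-feature data), not the project's code.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Invented one-feature efficiency scores.
X = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
y = ["Low", "Low", "Low", "High", "High", "High"]
nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, (1.1,)), predict_gaussian_nb(nb_model, (4.9,)))
```

Working in log space avoids numerical underflow when many feature likelihoods are multiplied together.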

---

## 5. 🎯 **Insights and Recommendations for EcoCity Planners**

### 5.1 **Price Determinants**
- **Lot Size**, **Square Footage**, and **Energy Efficiency** are key price drivers. Larger and more efficient properties tend to command higher market prices.
- **Energy Efficiency**: Although it is not the strongest predictor of price, higher energy efficiency is associated with higher value, making it an area of opportunity for **EcoCity Planners** to promote sustainable housing.

### 5.2 **Energy Efficiency Classification**
- The **KNN Classifier** offers a powerful tool for assessing the energy efficiency of properties, allowing planners to identify **eco-friendly neighborhoods**.
- **Gaussian Naive Bayes** can serve as a supplementary check alongside the KNN Classifier, particularly for high efficiency properties, where its F1-score is highest (0.8511).

---

## 6. 🏆 **Final Thoughts and Future Directions**

The **Eco Planners Predictor** combines data-driven predictions with sustainability insights to support **EcoCity Planners** in their urban development projects. The **KNN Regressor** and **KNN Classifier** outperform Simple Linear Regression and Gaussian Naive Bayes, respectively, and are recommended for **pricing predictions** and **energy efficiency classifications**. These models can help planners work towards housing that is both **affordable** and **eco-friendly**, supporting **sustainable urban growth**.

As cities continue to grow, integrating these models can help ensure that **sustainable housing** initiatives not only reduce environmental impact but also enhance the overall value of urban properties.

**Next Steps**:
- Further refine energy efficiency metrics and explore **additional features** to enhance prediction accuracy.
- Continue integrating sustainability efforts into urban planning to align with long-term **environmental goals**.
