# 📘 P2.2.2.3 – Supervised Learning

## Topic: House Price Prediction Example (Regression)
---


## 🎯 Learning Objectives

By the end of this notebook, you will be able to:

- Say what **regression** is and give real-world examples
- Follow the steps in a regression pipeline (prepare features → split → train → predict → evaluate)
- Build a simple house price model and interpret **RMSE**, **R2**, and **feature coefficients**


## 📌 What is regression?

**Regression** means predicting a **number** (a continuous value) rather than a category: a quantity such as price, temperature, or salary.

- We train a regression model using **labeled examples**: many inputs, each paired with the correct numeric output.
- The model learns patterns from those examples and then predicts a **number** for **new** inputs it has never seen.

**Some real-world examples:**

| Problem | Input | Output (number) |
|--------|--------|------------------|
| House price | Size, bedrooms, age, location | Price |
| Temperature forecast | Date, location, past weather | Temperature |
| Salary prediction | Experience, role, education | Salary |
| Sales forecast | Season, promotions, history | Sales amount |
| Demand prediction | Price, season, events | Demand (units) |

In this notebook we focus on **one** of these: **house price** from features like size, bedrooms, and age. The same idea (labeled data → train → predict a number) applies to all regression problems.
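
To make the idea concrete, here is a minimal sketch with a single feature. It assumes only house size as input and reuses the sizes and prices from the dataset that appears later in this notebook; the variable names are illustrative.

```python
from sklearn.linear_model import LinearRegression

# Labeled examples: size in sqft (input) paired with price in thousands (output).
# These are the size column and the prices from the dataset used later on.
sizes = [[1000], [1200], [1500], [1800], [2000], [2300]]
prices = [200, 250, 300, 360, 400, 450]

# Learn the pattern from the labeled examples
model = LinearRegression()
model.fit(sizes, prices)

# Predict a number for a new input the model has not seen
predicted = model.predict([[1600]])[0]
print(f"Predicted price for 1600 sqft: {predicted:.1f} thousand")
```

The prediction is a continuous value, not a category, which is what makes this a regression problem.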

## 📝 Problem Statement

We want to build a program that predicts house prices based on features like size, bedrooms, and age. The goal is to automate price estimation for new houses.


**Why is this important?**

- Helps buyers and sellers make informed decisions
- Automates price prediction for real estate platforms

## 🤖 Choosing the Model & Why

We use the **Linear Regression** model because:
- It predicts continuous values (house prices)
- It is simple and interpretable
- It shows how features affect price

**Why not other models?**
- Decision Trees, Random Forests, etc. can also be used for regression, but Linear Regression is a classic starting point and its coefficients are easy to interpret
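
The interpretability point can be made concrete by writing out the model's equation. As a sketch, with $b_0, \dots, b_3$ as notation introduced here (not names used in the notebook), a linear regression on the three features predicts:

$$
\hat{y} = b_0 + b_1 \cdot \text{size} + b_2 \cdot \text{bedrooms} + b_3 \cdot \text{age}
$$

In scikit-learn, $b_0$ is exposed as `model.intercept_` and $b_1, b_2, b_3$ as `model.coef_`. Each coefficient is the change in predicted price for a one-unit increase in its feature while the other features stay fixed.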

## 🛠️ Example: House Price Prediction Pipeline

This example shows the steps:
1. Prepare features and target
2. Split data into train/test
3. Train Linear Regression model
4. Predict on test set and evaluate (RMSE, R2, coefficients)
5. Predict on a **new** house

*We use a **tiny dataset** (6 houses) so the flow is easy to follow; in practice you would use thousands of labeled properties.*
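
Step 2 can be tried on its own. A minimal sketch, assuming the same six houses and the same `test_size` and `random_state` as the full program in this notebook:

```python
from sklearn.model_selection import train_test_split

# Features: [Size (sqft), Bedrooms, Age]; target: price in thousands
X = [
    [1000, 2, 10],
    [1200, 3, 5],
    [1500, 3, 8],
    [1800, 4, 3],
    [2000, 4, 2],
    [2300, 5, 1],
]
y = [200, 250, 300, 360, 400, 450]

# Hold out 30% of the houses for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print("Training houses:", len(X_train))
print("Test houses:", len(X_test))
print("Test features:", X_test)
print("Test prices:", y_test)
```

With 6 houses and `test_size=0.3`, scikit-learn rounds the test set up to 2 houses and trains on the remaining 4. That is very little data, so the evaluation numbers in this notebook should be read as an illustration of the workflow rather than as a reliable estimate.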


**When you run the code below**, you'll see **RMSE**, **R2 score**, and **feature coefficients**; we explain what they mean in the section right after the code.

```python
"""
House Price Prediction using Scikit-learn
-----------------------------------------
This program predicts house prices using
Linear Regression and a standard ML workflow.
"""

import math
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score


def main():
    print("HOUSE PRICE PREDICTION MODEL")
    print("-----------------------------")

    # Dataset
    # Features: [Size (sqft), Bedrooms, Age]
    X = [
        [1000, 2, 10],
        [1200, 3, 5],
        [1500, 3, 8],
        [1800, 4, 3],
        [2000, 4, 2],
        [2300, 5, 1]
    ]

    # Target: Price in thousands
    y = [200, 250, 300, 360, 400, 450]

    # Train-Test Split (30% held out for testing; fixed seed for repeatability)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    # Model: fit on the training houses only
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predictions on the held-out test houses
    predictions = model.predict(X_test)

    # Evaluation: RMSE is the square root of the mean squared error
    rmse = math.sqrt(mean_squared_error(y_test, predictions))
    r2 = r2_score(y_test, predictions)

    print("Predictions:", predictions)
    print("RMSE:", round(rmse, 2))
    print("R2 Score:", round(r2, 2))

    # Coefficients: change in predicted price (thousands) per one-unit
    # increase in each feature, with the other features held fixed
    print("\nFeature Coefficients:")
    for feature, coef in zip(["Size", "Bedrooms", "Age"], model.coef_):
        print(f"{feature}: {round(coef, 2)}")

    # New Prediction
    new_house = [[1600, 3, 4]]
    predicted_price = model.predict(new_house)
    print("\nPredicted price for new house:", round(predicted_price[0], 2))


if __name__ == "__main__":
    main()
```

```text
HOUSE PRICE PREDICTION MODEL
-----------------------------
Predictions: [210.   251.25]
RMSE: 7.13
R2 Score: 0.92

Feature Coefficients:
Size: 0.19
Bedrooms: -8.75
Age: -2.5

Predicted price for new house: 328.75
```
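
To see where the last number comes from, the prediction can be rebuilt by hand from the fitted model. This is a minimal sketch that assumes the same data and split as the program above; `manual` and `from_model` are illustrative names.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same dataset and split settings as the program above
X = [
    [1000, 2, 10],
    [1200, 3, 5],
    [1500, 3, 8],
    [1800, 4, 3],
    [2000, 4, 2],
    [2300, 5, 1],
]
y = [200, 250, 300, 360, 400, 450]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

new_house = [1600, 3, 4]  # [Size (sqft), Bedrooms, Age]

# A linear model's prediction is the intercept plus
# each coefficient multiplied by its feature value
manual = model.intercept_ + sum(
    coef * value for coef, value in zip(model.coef_, new_house)
)
from_model = model.predict([new_house])[0]

print("By hand:      ", round(manual, 2))
print("model.predict:", round(from_model, 2))
```

Both lines should print the same value, and it should match the predicted price in the output above.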


## 📊 Understanding RMSE, R2 & Feature Importance

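- **RMSE (Root Mean Squared Error):** The typical size of the prediction error, in the same units as the target. Lower = better. In the run above, RMSE is 7.13, so predictions on the test houses are off by roughly 7 thousand.

- **R2 Score:** The fraction of the variation in prices that the model explains. Closer to 1 = better. In the run above it is 0.92.

- **Feature Coefficients:** How much the predicted price changes when a feature (size, bedrooms, age) increases by one unit and the others stay fixed. In the run above, each extra square foot adds about 0.19 thousand to the prediction. The negative Bedrooms coefficient is most likely an artifact of training on only four houses in which size and bedroom count rise together; it should not be read as "more bedrooms lower the price".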
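
Both metrics can be computed by hand to see what they measure. The sketch below uses the two predictions printed above and assumes the two test houses are the ones priced at 200 and 250 (an assumption: the notebook does not print `y_test`). The helper names `rmse` and `r2` are illustrative, not scikit-learn functions.

```python
import math


def rmse(actual, predicted):
    """Square root of the average squared prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(actual, predicted):
    """1 minus (error left after the model / variation around the mean price)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Assumed test prices (thousands) and the predictions from the output above
actual = [200, 250]
predicted = [210.0, 251.25]

print("RMSE:", round(rmse(actual, predicted), 2))
print("R2 Score:", round(r2(actual, predicted), 2))
```

Under that assumption the results agree with the printed RMSE of 7.13 and R2 of 0.92. With only two test houses, both numbers can change a lot if a different split is used.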

**Why we use these:**
- To measure how well the model works
- To see how each feature relates to price (coefficients depend on each feature's units, so compare them with care)
- To decide if it's reliable enough for real use

---
## 📝 Key Takeaways

- **House price prediction** = supervised **regression**: features + prices → split → train → predict → evaluate.
- **Train–test split** from Core Concepts is used here; no vectorization needed (features are already numbers).
- We evaluate with **RMSE**, **R2**, and **feature coefficients**.

