# Problem Statement

Imagine a scenario where someone wants to help a friend estimate the price of a house.  
The friend asks:

**“If a house has a certain size, what price should I expect?”**

They have a small dataset containing:
- House area (in square feet)  
- Actual selling price  

The task is to use **machine learning** to learn the relationship between **area** and **price**.  
By drawing a best-fit line through the data, the system should be able to **predict the price of a new house** based solely on its area.

This represents one of the simplest real-life examples of **regression**, where the goal is to predict a numerical value.


# Load the Dataset

In [1]:
import pandas as pd

# A tiny sample dataset: Area (sqft) vs Price (₹ in lakhs)
data = {
    "area": [600, 700, 800, 900, 1000, 1100, 1200],
    "price": [30, 35, 40, 45, 50, 55, 60]
}

df = pd.DataFrame(data)

X = df[["area"]]   # input feature
y = df["price"]    # target variable


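## Dataset Creation (Area vs Price)

- We create a **small custom dataset** with two columns:  
  **house area** and **house price**.
- `X` contains the **feature** (the area of the house). The double brackets in `df[["area"]]` keep it as a two-dimensional DataFrame, which is the shape scikit-learn expects for features.
- `y` contains the **target value** we want to predict (the price), stored as a one-dimensional Series.
- No external dataset is required, which keeps the example suitable for **simple classroom demonstrations**.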
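
As a quick check on the points above, the following sketch rebuilds the same dataset, prints the shapes of `X` and `y`, and computes the price per square foot for each row. The name `price_per_sqft` is introduced here only for illustration.

```python
import pandas as pd

# Same toy dataset as in the cell above
df = pd.DataFrame({
    "area": [600, 700, 800, 900, 1000, 1100, 1200],
    "price": [30, 35, 40, 45, 50, 55, 60],
})

X = df[["area"]]   # double brackets -> DataFrame with shape (7, 1)
y = df["price"]    # single brackets -> Series with shape (7,)

print("X shape:", X.shape)
print("y shape:", y.shape)

# Price per square foot (in lakhs) for every row (illustrative helper column)
price_per_sqft = df["price"] / df["area"]
print(price_per_sqft.round(4).tolist())
```

Every row should give the same ratio, which is a hint that this toy data lies exactly on a straight line.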


# Train the Regression Model

In [2]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)


## Model Training (Linear Regression)

- We use **Linear Regression**, one of the simplest and most widely used machine learning algorithms.
- The model finds the **best-fit straight line** for the relationship between **area** and **price**: the line that minimizes the sum of squared differences between the actual and predicted prices (ordinary least squares).
- `model.fit()` trains the model by learning the slope and intercept of this line from the dataset.
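
To see what the model has actually learned, one option is to inspect the fitted line's slope and intercept through the `coef_` and `intercept_` attributes of scikit-learn's `LinearRegression`. This is a minimal sketch on the same toy data; the variable names `slope` and `intercept` are chosen here for readability.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same toy dataset as above
df = pd.DataFrame({
    "area": [600, 700, 800, 900, 1000, 1100, 1200],
    "price": [30, 35, 40, 45, 50, 55, 60],
})

model = LinearRegression()
model.fit(df[["area"]], df["price"])

slope = model.coef_[0]        # change in price (lakhs) per extra sq ft
intercept = model.intercept_  # predicted price (lakhs) when area is 0

print("Slope:", round(slope, 4))
print("Intercept:", round(intercept, 4))
```

For this dataset the slope should come out very close to 0.05 lakhs per sq ft and the intercept very close to 0, so the learned line is approximately `price = 0.05 * area`.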


# Evaluate the Model

In [3]:
# R² computed on the same data the model was trained on
score = model.score(X, y)
print("R² Score:", score)


R² Score: 1.0


## Model Evaluation (R² Score)

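- The **R² Score** measures how well the regression line fits the data: it is the fraction of the variation in price that the model explains.
- The value is **at most 1**, where:
  - **Closer to 1** → The model fits the data very well.
  - **Closer to 0** → The model does little better than always predicting the average price.
  - **Below 0** → The model fits worse than always predicting the average price (this can happen on data the model was not trained on).
- Here the score is exactly **1.0** because the sample prices rise by exactly 5 lakhs for every 100 sq ft, so all points lie on one straight line. The score is also computed on the training data itself, so it does not show how the model would perform on unseen houses.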
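
The definition behind the score can be sketched by hand as R² = 1 − SS_res / SS_tot, where SS_res is the squared error of the model and SS_tot is the squared error of always predicting the mean. The sketch below computes this with NumPy and compares it with `model.score`; the names `ss_res`, `ss_tot`, and `r2_manual` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same toy dataset as above
df = pd.DataFrame({
    "area": [600, 700, 800, 900, 1000, 1100, 1200],
    "price": [30, 35, 40, 45, 50, 55, 60],
})
X = df[["area"]]
y = df["price"]

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

ss_res = np.sum((y - y_pred) ** 2)     # squared error of the fitted line
ss_tot = np.sum((y - y.mean()) ** 2)   # squared error of "always predict the mean"
r2_manual = 1 - ss_res / ss_tot

print("Manual R²:", round(r2_manual, 6))
print("model.score:", round(model.score(X, y), 6))
```

Both values should agree. Because SS_res can exceed SS_tot for a poor model, the formula also makes it clear why R² is not bounded below by 0.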


# Predict the Price of a New House

In [4]:
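# Predict the price of a 950 sq ft house.
# Passing a DataFrame with the same column name used in training
# avoids scikit-learn's "X does not have valid feature names" warning.
new_house = pd.DataFrame({"area": [950]})
predicted_price = model.predict(new_house)
print("Predicted Price (in lakhs):", predicted_price[0])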


Predicted Price (in lakhs): 47.5




## Making a Prediction

- We give the model a **new input value**: **950 sq ft**.
- The model uses the learned relationship to return the **predicted price**.
- This demonstrates a practical real-world use case:  
  **“How much should I pay for a house of this size?”**
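
The same call also works for several houses at once. The sketch below predicts prices for three hypothetical areas (750, 950, and 1250 sq ft, chosen here for illustration); note that 1250 sq ft lies outside the 600 to 1200 sq ft range of the training data, so that prediction is an extrapolation and should be trusted less.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same toy dataset as above
df = pd.DataFrame({
    "area": [600, 700, 800, 900, 1000, 1100, 1200],
    "price": [30, 35, 40, 45, 50, 55, 60],
})
model = LinearRegression().fit(df[["area"]], df["price"])

# Hypothetical new houses; the column name must match the training feature
new_houses = pd.DataFrame({"area": [750, 950, 1250]})
predictions = model.predict(new_houses)

for area, price in zip(new_houses["area"], predictions):
    print(f"{area} sq ft -> {price:.1f} lakhs")
```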


# Workflow Summary (Area vs Price Prediction)

### 1. **Create a Small Dataset**
- Build a simple dataset containing **house area** and **price**.

### 2. **Separate Features and Target**
- `X` → the **feature** (area)  
- `y` → the **target** (price)

### 3. **Train a Linear Regression Model**
- Fit the model to learn the relationship between area and price.

### 4. **Evaluate Using R² Score**
- Check how well the regression line fits the data.

### 5. **Predict Price for a New House**
- Give an area (e.g., **950 sq ft**) and get the predicted price.

### 6. **(Optional) Visualize the Model**
- Plot:
  - Actual data points (scatter plot)  
  - Regression line  
- Helps students easily understand what the model has learned.
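
One possible way to do this optional step with matplotlib is sketched below: a scatter plot of the actual data with the fitted line drawn on top. The colors, labels, and title are arbitrary choices made for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same toy dataset as above
df = pd.DataFrame({
    "area": [600, 700, 800, 900, 1000, 1100, 1200],
    "price": [30, 35, 40, 45, 50, 55, 60],
})
X = df[["area"]]
y = df["price"]
model = LinearRegression().fit(X, y)

fig, ax = plt.subplots()

# Actual data points
ax.scatter(df["area"], df["price"], label="Actual data")

# Regression line: the model's predictions over the same areas
ax.plot(df["area"], model.predict(X), color="red", label="Regression line")

ax.set_xlabel("Area (sq ft)")
ax.set_ylabel("Price (lakhs)")
ax.set_title("Area vs Price")
ax.legend()
# In a notebook the figure is displayed at the end of the cell;
# in a plain script, call plt.show() here.
```

With this dataset every point should sit on the red line, which matches the R² score of 1.0.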

---

# Libraries Used

## 1. **pandas**
- Stores data in **DataFrame** format.  
- Makes it easy to extract columns for `X` and `y`.

## 2. **scikit-learn (sklearn)**
- Trains the **Linear Regression** model.  
- Provides built-in functions for:
  - Training (`fit`)
  - Evaluating (`score`)
  - Predicting (`predict`)

## 3. **matplotlib**
- Used in the optional visualization step to create:
  - Scatter plots  
  - Regression line plots  
- Helps visualize how well the model fits the data.
