# Supervised Learning: Regression Models and Performance Metrics

Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.


**Answer :**

Simple Linear Regression (SLR) is a statistical method used to find the relationship between **one independent variable (X)** and **one dependent variable (Y)**. It tries to fit the **best straight line** through the data points.

The purpose of SLR is to:

* **Understand how X affects Y**
* **Predict the value of Y** for any given value of X
* **Identify trends or patterns** between the two variables

In simple words, it helps us see if changing one factor (X) can help us predict or estimate another factor (Y).


Question 2: What are the key assumptions of Simple Linear Regression?

**Answer :**

The key assumptions of Simple Linear Regression are:

1. **Linearity** – The relationship between X and Y should be approximately linear, so a straight line describes it well.
2. **Independence** – All the observations should be independent of each other.
3. **Homoscedasticity** – The variance of errors should be constant (errors should not increase or decrease with X).
4. **Normality of errors** – The residuals (errors) should follow a normal distribution.
5. **No major outliers** – There should not be extreme values that distort the model.

When these assumptions hold, the model's estimates and predictions are more accurate and reliable.
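Some of these assumptions can be checked roughly by looking at the residuals. The sketch below is only an illustration on synthetic data (the true line, the noise level, and the random seed are all assumptions made up for this example). It fits a line with `np.polyfit` and then compares the residual spread for small and large X values.

```python
import numpy as np

# Synthetic data: a true straight line plus constant-variance normal noise (assumed values)
rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = np.linspace(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Degree-1 least squares fit; polyfit returns the slope first, then the intercept
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Homoscedasticity (rough check): residual spread should be similar for low and high X
low_spread = residuals[x < 5].std()
high_spread = residuals[x >= 5].std()
spread_ratio = high_spread / low_spread

# Normality (rough check): for normal errors, about 68% of residuals
# fall within one standard deviation of zero
within_one_sd = np.mean(np.abs(residuals) <= residuals.std())

print("Spread ratio (high X / low X):", spread_ratio)
print("Share of residuals within one SD:", within_one_sd)
```

A spread ratio far from 1 would hint that the error variance changes with X. These are quick checks only; residual plots and formal tests give a fuller picture.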


Question 3: Write the mathematical equation for a simple linear regression model and
explain each term.

**Answer :**

The mathematical equation for a Simple Linear Regression model is:

$$
Y = b_0 + b_1 X
$$

Here’s what each term means:

* **Y** → Dependent variable (the value we want to predict)
* **X** → Independent variable (the value we use to make the prediction)
* **b₀ (Intercept)** → The value of Y when X = 0
* **b₁ (Slope)** → Shows how much Y changes when X increases by 1 unit
* **Error term (not shown here)** → The difference between the actual and predicted values

This equation represents a straight line that best fits the data.


Question 4: Provide a real-world example where simple linear regression can be
applied.

**Answer :**

A real-world example of simple linear regression is:

**Predicting a student’s exam score based on the number of hours they studied.**

* **X (independent variable):** Hours studied
* **Y (dependent variable):** Exam score

As study hours increase, exam scores usually increase. So we can fit a straight line to predict a student’s score from how many hours they studied.

This helps teachers or students understand the relationship between study time and performance.


Question 5: What is the method of least squares in linear regression?

**Answer :**

The **method of least squares** is a technique used in linear regression to find the **best-fitting line** for the data.

It works by:

* Calculating the **difference** between the actual Y values and the predicted Y values from the line.
* Squaring these differences (errors).
* Adding them all together.
* Choosing the line that has the **smallest total squared error**.

In simple words:
It finds the line that is **closest to all the data points** by minimizing the squared mistakes.
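The steps above can be sketched in a few lines of NumPy. This is a minimal illustration on a small toy dataset (the data values and the helper name `sum_squared_error` are chosen only for this example). It uses the standard closed-form least squares formulas for one predictor and then checks that a slightly shifted line has a larger total squared error.

```python
import numpy as np

# Toy data, used only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def sum_squared_error(b0, b1, x, y):
    """Total squared error of the line y = b0 + b1 * x on the data."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# Closed-form least squares estimates for a single predictor
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

best_sse = sum_squared_error(b0, b1, x, y)
shifted_sse = sum_squared_error(b0 + 0.5, b1, x, y)  # same slope, intercept moved up

print("Slope:", b1)
print("Intercept:", b0)
print("SSE of fitted line:", best_sse)
print("SSE of shifted line:", shifted_sse)
```

For this data the slope comes out at about 0.6 and the intercept at about 2.2, and any other line, such as the shifted one, gives a larger total squared error.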


Question 6: What is Logistic Regression? How does it differ from Linear Regression?


**Answer :**

Logistic Regression is a method used to **predict categories**, mainly **0 or 1** (like yes/no or pass/fail).

**How it is different from Linear Regression:**

* Linear Regression predicts **numbers**.
* Logistic Regression predicts **classes**.
* Linear Regression gives a **straight line**.
* Logistic Regression gives an **S-shaped curve** and outputs a **probability**.

So, Logistic Regression is used when the output is **categorical**, not numeric.
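The S-shaped curve comes from the sigmoid function, which turns a linear score into a probability. The sketch below shows the idea with made-up coefficients and inputs (the values of `b0`, `b1`, and `hours` are assumptions for illustration, not fitted from real data).

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and inputs, chosen only for illustration
b0, b1 = -4.0, 1.0
hours = np.array([1.0, 4.0, 8.0])

linear_score = b0 + b1 * hours       # unbounded, like a linear regression output
probability = sigmoid(linear_score)  # squeezed into the range (0, 1)

# 0.5 is a common default threshold for turning a probability into a class
predicted_class = (probability >= 0.5).astype(int)

print("Probabilities:", probability)
print("Predicted classes:", predicted_class)
```

The linear part looks just like linear regression. The difference is the sigmoid step and the threshold that converts the probability into a class label.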


Question 7: Name and briefly describe three common evaluation metrics for regression
models.


**Answer :**

Here are three common evaluation metrics for regression:

1. **MSE (Mean Squared Error)**

   * It is the average of the squared differences between the actual and predicted values, so large errors are penalized heavily.
   * Smaller MSE = better model.

2. **MAE (Mean Absolute Error)**

   * It is the average of the absolute difference between actual and predicted values.
   * Easy to interpret (it is in the same units as Y) and less sensitive to large errors than MSE.

3. **R² (R-squared)**

   * It shows how well the model explains the variation in the data.
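   * Value usually ranges from 0 to 1, but it can be negative when the model fits worse than simply predicting the mean of Y (for example, on unseen data).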
   * Higher R² = better performance.

These metrics help us understand how accurate the regression model is.
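All three metrics can be computed by hand from the actual and predicted values. The sketch below uses small example arrays (the numbers in `y_true` and `y_pred` are made up for illustration) and follows the standard definitions.

```python
import numpy as np

# Example actual and predicted values, used only for illustration
y_true = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_pred = np.array([2.8, 3.4, 4.0, 4.6, 5.2])

errors = y_true - y_pred

mse = np.mean(errors ** 2)     # Mean Squared Error
mae = np.mean(np.abs(errors))  # Mean Absolute Error

# R-squared: 1 minus (unexplained variation / total variation)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse)
print("MAE:", mae)
print("R2:", r2)
```

In practice, scikit-learn provides the same calculations as `mean_squared_error`, `mean_absolute_error`, and `r2_score` in `sklearn.metrics`.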


Question 8: What is the purpose of the R-squared metric in regression analysis?


**Answer :**

R-squared tells us **how well the regression model fits the data**.

It shows the **proportion of variation in the dependent variable (Y)** (often stated as a percentage) that can be explained by the independent variable (X).

* R² = 1 → Perfect model
* R² = 0 → Model explains nothing

In simple words:
**Higher R-squared means the model’s predictions are closer to the actual values.**
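The reference points for R² are easier to see with a small experiment. The sketch below uses a hypothetical helper `r_squared` and made-up values for Y, and compares three sets of predictions: perfect ones, the mean of Y for every point, and deliberately poor ones.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R-squared as 1 minus the ratio of residual to total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Example values, used only for illustration
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

perfect = r_squared(y, y)                            # predictions equal the actual values
mean_only = r_squared(y, np.full_like(y, y.mean()))  # always predict the mean of Y
poor = r_squared(y, y[::-1])                         # predictions in reversed order

print("Perfect predictions:", perfect)   # 1.0
print("Mean-only predictions:", mean_only)  # 0.0
print("Poor predictions:", poor)         # -2.0
```

So R² = 0 means the model does no better than always predicting the mean of Y, and a negative value means it does worse than that baseline.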


Question 9: Write Python code to fit a simple linear regression model using scikit-learn
and print the slope and intercept.

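```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Feature matrix must be 2D: one row per sample, one column per feature
X = np.array([[1], [2], [3], [4], [5]])
Y = np.array([2, 4, 5, 4, 5])

# Create the model and fit it with ordinary least squares
model = LinearRegression()
model.fit(X, Y)

# coef_ holds one slope per feature; intercept_ is the constant term
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```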
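Once the model is fitted, it can also predict Y for a new X. The sketch below refits the same data so that it runs on its own, then predicts for X = 6 (a new input chosen only for illustration) and checks the result against the line equation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as in the fitting example
X = np.array([[1], [2], [3], [4], [5]])
Y = np.array([2, 4, 5, 4, 5])
model = LinearRegression().fit(X, Y)

# New input must also be 2D: one row, one feature (the value 6 is an arbitrary example)
X_new = np.array([[6]])
y_hat = model.predict(X_new)[0]

# The same prediction written out as intercept + slope * X
manual = model.intercept_ + model.coef_[0] * 6

print("Predicted Y for X = 6:", y_hat)
print("Computed from the line equation:", manual)
```

Both values agree (about 5.8 for this data), because `predict` simply applies the fitted line to the new input.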


Question 10: How do you interpret the coefficients in a simple linear regression model?

**Answer :**

In simple linear regression:

* **Slope (b₁):**
  It tells us **how much Y changes** when X increases by **1 unit**.
  If the slope is positive → Y increases with X.
  If the slope is negative → Y decreases with X.

* **Intercept (b₀):**
  It is the **value of Y when X = 0**.

So, the coefficients help us understand the **direction and size** of the effect that X has on Y. The strength of the relationship is measured separately, for example by R².
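This interpretation can be checked with a few lines of code. The sketch below assumes an intercept of 2.2 and a slope of 0.6 (values picked for illustration; the helper `predict` is also hypothetical) and shows that moving X up by one unit changes the prediction by exactly the slope.

```python
# Assumed coefficients, used only for illustration
b0, b1 = 2.2, 0.6  # intercept and slope

def predict(x):
    """Predicted Y from the line Y = b0 + b1 * X."""
    return b0 + b1 * x

change_per_unit = predict(4) - predict(3)  # effect of increasing X by 1
value_at_zero = predict(0)                 # predicted Y when X = 0

print("Change in Y for a 1-unit increase in X:", change_per_unit)
print("Predicted Y at X = 0:", value_at_zero)
```

The change per unit matches the slope (up to floating-point rounding), and the prediction at X = 0 matches the intercept. The intercept is only meaningful in practice when X = 0 is a realistic value for the data.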
