# Supervised Learning: Regression Models and Performance Metrics

Q.1 What is Simple Linear Regression (SLR)? Explain its purpose.

--> Simple Linear Regression (SLR) is a statistical method used to study the relationship between two continuous variables — one independent variable (X) and one dependent variable (Y).

Simple Linear Regression finds the best-fitting straight line (called the regression line) through a set of data points, which can be represented by the equation:

$$Y = b_0 + b_1 X + \varepsilon$$

Where:

Y = Dependent variable

X = Independent variable

b0 = Intercept (value of Y when X = 0)

b1 = Slope (how much Y changes for each unit change in X)

ε = Error term (difference between actual and predicted values)

Purpose of Simple Linear Regression

- Prediction:
To predict the value of the dependent variable based on a known value of the independent variable.
Example: Predicting house price (Y) based on its size (X).

- Understanding relationships:
To quantify the strength and direction of the relationship between two variables.

    - Positive slope -> as X increases, Y increases.

    - Negative slope -> as X increases, Y decreases.

- Trend estimation: To identify and model trends in data, often used in forecasting.

Example

Suppose you want to predict a student's exam score (Y) based on study hours (X):

Hours Studied (X) : 2, 4, 6, 8

Exam Score (Y) : 50, 65, 80, 90

Fitting this data by least squares gives the equation:

$$\hat{Y} = 37.5 + 6.75X$$

That means for every additional hour studied, the predicted score increases by 6.75 points.
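As a quick check, the four data points can be fitted with NumPy's `np.polyfit` using degree 1, which performs an ordinary least-squares straight-line fit. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

# Study hours (X) and exam scores (Y) from the example above
hours = np.array([2, 4, 6, 8])
scores = np.array([50, 65, 80, 90])

# A degree-1 polyfit is an ordinary least-squares straight-line fit;
# coefficients come back from highest degree down: [slope, intercept]
slope, intercept = np.polyfit(hours, scores, 1)

print(f"Intercept b0 = {intercept:.2f}")
print(f"Slope b1 = {slope:.2f}")

# Use the fitted line to predict the score for 5 hours of study
print(f"Predicted score for 5 hours: {intercept + slope * 5:.2f}")
```

For these points the exact least-squares line is 37.5 + 6.75X, so the script prints an intercept of 37.50, a slope of 6.75 and a predicted score of 71.25 for 5 hours.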

Q.2 What are the key assumptions of Simple Linear Regression?

--> The key assumptions of Simple Linear Regression (SLR) ensure that the model estimates are reliable, unbiased, and statistically valid.

Here are the five main assumptions:

1. Linearity

The relationship between the independent variable (X) and the dependent variable (Y) is linear.

This means that a change in X results in a proportional change in Y.

Example: If study hours increase, marks should increase at a roughly constant rate.

2. Independence of Errors

The residuals (errors) — the differences between actual and predicted Y — should be independent of each other.

This means there is no correlation among error terms.

Example: One student's exam performance should not influence another's.

3. Homoscedasticity (Constant Variance of Errors)

The variance of the residuals should be constant across all values of X.

If the spread of residuals increases or decreases with X, it indicates heteroscedasticity.

Example: Prediction errors should be equally spread for both low and high study hours.

4. Normality of Errors

The residuals should be normally distributed (bell-shaped).

This matters for valid statistical inference, such as t-tests and confidence intervals for the coefficients, especially with small samples.

5. No Perfect Multicollinearity

In Simple Linear Regression there is only one predictor, so this assumption is automatically satisfied.

(It becomes important in Multiple Linear Regression, where predictors should not be highly correlated.)
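In practice these assumptions are checked on the residuals of a fitted model. The sketch below uses simulated data (an assumption, generated so that the assumptions hold) and three common numeric checks: the Durbin-Watson statistic for independence, the correlation between X and the absolute residuals as a rough homoscedasticity check, and SciPy's `scipy.stats.shapiro` (Shapiro-Wilk test) for normality. Linearity is usually judged visually from a residual-versus-X plot, so it is only noted in a comment.

```python
import numpy as np
from scipy import stats

# Simulated (hypothetical) data: a linear trend plus independent,
# normally distributed noise with constant variance
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=x.size)

# Fit a straight line by least squares and compute the residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Linearity: plot residuals against x and look for curvature
# (with an intercept, least-squares residuals always average to about 0)

# Independence: Durbin-Watson statistic; values near 2 suggest no autocorrelation
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity (rough check): |residual| should not trend with x
spread_corr = np.corrcoef(x, np.abs(residuals))[0, 1]

# Normality: Shapiro-Wilk test; a large p-value gives no evidence against normality
stat, p_value = stats.shapiro(residuals)

print(f"Durbin-Watson: {dw:.3f}")
print(f"Corr(x, |residual|): {spread_corr:.3f}")
print(f"Shapiro-Wilk p-value: {p_value:.3f}")
```

On real data, a Durbin-Watson value far from 2, a clear trend in the absolute residuals, or a very small Shapiro-Wilk p-value would point to a violated assumption.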






Q.3 Write the mathematical equation for a simple linear regression model and explain each term.


--> The mathematical equation for a Simple Linear Regression (SLR) model is:

$$Y = b_0 + b_1 X + \varepsilon$$

Explanation of Each Term

Y (Dependent Variable) : The variable we are trying to predict or explain (e.g., house price, marks, sales).

X (Independent Variable) : The variable used to make predictions (e.g., area of house, study hours, advertisement spend).

b0 (Intercept) : The expected value of Y when X=0. It represents the point where the regression line crosses the Y-axis.

b1 (Slope Coefficient) : The average change in Y for each one-unit increase in X. Its sign gives the direction of the relationship between X and Y, and its magnitude gives the size of the effect.

ε (Error Term) : The difference between the actual value of Y and the predicted value from the model. It accounts for factors not explained by X.

Example

Suppose you are predicting a student's exam score (Y) based on the number of hours studied (X).

The regression equation might be:

Y = 40 + 6X + ε

Interpretation:

b0 = 40 : A student who studies 0 hours is expected to score 40 marks.

b1 = 6 : For each additional hour studied, the exam score increases by 6 marks on average.

ε : Random error that captures variation in scores not explained by study hours.
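To make the error term concrete, the sketch below simulates scores from the example equation Y = 40 + 6X + ε. The number of students (200), the range of study hours (0 to 10) and the noise level (ε drawn from a normal distribution with standard deviation 5) are all assumptions for illustration. Refitting a line recovers estimates close to, but not exactly equal to, b0 = 40 and b1 = 6, because ε adds random scatter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical study hours between 0 and 10 for 200 students
hours = rng.uniform(0, 10, size=200)

# True model from the example (b0 = 40, b1 = 6) plus random error;
# epsilon ~ Normal(0, 5) is an assumed noise level
epsilon = rng.normal(0, 5, size=hours.size)
scores = 40 + 6 * hours + epsilon

# Least-squares estimates land near, but not exactly on, 40 and 6
b1_hat, b0_hat = np.polyfit(hours, scores, 1)
print(f"Estimated b0 = {b0_hat:.2f}, estimated b1 = {b1_hat:.2f}")
```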





Q.4 Provide a real-world example where simple linear regression can be applied.

--> A real-world example where Simple Linear Regression (SLR) can be applied:

Predicting House Prices Based on Size

Scenario:

A real estate company wants to predict the price of a house based on its size (in square feet).

They collect data on several houses:

House Size (sq. ft.) : 800, 1000, 1200, 1500, 1800

Price (₹ Lakhs) : 45, 55, 65, 80, 95

Applying Simple Linear Regression:

We assume a linear relationship between house size (X) and price (Y):

$$Y = b_0 + b_1 X + \varepsilon$$

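After fitting the model by least squares, we get (these five points happen to lie exactly on the line):

Predicted Price = 5 + 0.05X

Interpretation:

b0 = 5: The intercept is ₹5 lakhs. It is the predicted price at 0 sq. ft., which lies far outside the data, so it should be read as the line's starting point rather than a real price.

b1 = 0.05: For every additional 1 sq. ft. of area, the house price increases by ₹0.05 lakhs (i.e., ₹5,000).

So, for a 1,500 sq. ft. house:

Price = 5 + 0.05(1500) = 80 lakhs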
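The house-price fit can be reproduced with scikit-learn's `LinearRegression`. For these five points the least-squares line is exactly Price = 5 + 0.05X, because every point lies on it. A minimal sketch:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# House sizes (sq. ft.) and prices (₹ lakhs) from the data above
size = np.array([800, 1000, 1200, 1500, 1800]).reshape(-1, 1)  # 2D: (n_samples, 1)
price = np.array([45, 55, 65, 80, 95])

model = LinearRegression().fit(size, price)
print(f"Intercept b0 = {model.intercept_:.2f} lakhs")
print(f"Slope b1 = {model.coef_[0]:.4f} lakhs per sq. ft.")

# Predict the price of a 1,500 sq. ft. house
pred = model.predict(np.array([[1500]]))[0]
print(f"Predicted price for 1,500 sq. ft.: {pred:.2f} lakhs")
```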

Purpose of Using SLR Here

- Prediction: Estimate house prices for new listings.

- Understanding: Quantify how much house size affects price.

- Decision-making: Help buyers and sellers determine fair market values.



Q.5 What is the method of least squares in linear regression?

--> The method of least squares is the most common technique used to find the best-fitting line in a linear regression model.

It works by minimizing the sum of the squared differences (errors) between the observed values and the predicted values given by the regression line.

Mathematical Explanation

The simple linear regression model is:

$$Y = b_0 + b_1 X + \varepsilon$$

We have:

$Y_i$ : Actual observed value

$\hat{Y}_i = b_0 + b_1 X_i$ : Predicted value from the regression line

$e_i = Y_i - \hat{Y}_i$ : Residual

Goal of Least Squares

The goal is to minimize the sum of squared residuals (errors):

$$\text{Minimize } S = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2$$

Formulas for the Coefficients

By minimizing S with respect to b0 and b1, we get:

$$b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$

Where:

$\bar{X}$ = Mean of X values

$\bar{Y}$ = Mean of Y values

These formulas give the slope and intercept of the line of best fit.
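The two closed-form formulas translate directly into NumPy. In the sketch below, the helper names `least_squares_fit` and `sum_squared_residuals` are hypothetical, not library functions. It reuses the data from the Q.9 example and also shows that nudging either coefficient away from its least-squares value increases S.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (b0, b1) from the closed-form least-squares formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(x, y, b0, b1):
    """S = sum of (Y_i - b0 - b1 * X_i)^2 for a candidate line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((y - b0 - b1 * x) ** 2)

# Same data as the scikit-learn example in Q.9
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_fit(x, y)
s_best = sum_squared_residuals(x, y, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, S = {s_best:.2f}")

# Moving either coefficient away from its least-squares value increases S
print(f"S with b1 + 0.1: {sum_squared_residuals(x, y, b0, b1 + 0.1):.2f}")
print(f"S with b0 + 0.1: {sum_squared_residuals(x, y, b0 + 0.1, b1):.2f}")
```

This prints b0 = 2.20, b1 = 0.60 and S = 2.40, matching the scikit-learn result in Q.9, followed by the larger values 2.95 and 2.45 for the perturbed lines.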

Interpretation

The slope (b1) shows how much Y changes for a one-unit change in X.

The intercept (b0) shows the predicted value of Y when X=0.

The method of least squares finds the best-fitting regression line by minimizing the sum of squared errors between the actual and predicted values. When the assumptions listed in Q.2 hold, the resulting estimates of b0 and b1 are unbiased.







Q.6 What is Logistic Regression? How does it differ from Linear Regression?


--> Logistic Regression – Overview

Logistic Regression is a statistical method used for classification problems, where the dependent variable (Y) is categorical, usually binary (e.g., Yes/No, 0/1, Pass/Fail).

It predicts the probability that an observation belongs to a particular category.

Purpose

To model the probability that Y=1 (the “success” class) given an independent variable X.

Mathematical Form

Instead of fitting a straight line (like in linear regression), Logistic Regression fits an S-shaped curve (sigmoid function):

$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$$

Where:

P(Y=1∣X) = Probability that the outcome is 1

b0 = Intercept

b1= Coefficient of X

e = Base of the natural logarithm (~2.718)

The output is always between 0 and 1, making it ideal for probability estimation.

Decision Rule

If:
P(Y=1∣X)≥0.5 - Predict 1 (Yes / Success)

P(Y=1∣X)<0.5 - Predict 0 (No / Failure)
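The sigmoid and the 0.5 decision rule can be sketched in a few lines of NumPy. The coefficients b0 = -4 and b1 = 1 are hypothetical values chosen for illustration, not estimates from any data.

```python
import numpy as np

def predict_proba(x, b0, b1):
    """P(Y=1 | X=x) from the logistic (sigmoid) model."""
    z = b0 + b1 * np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, b0, b1, threshold=0.5):
    """Decision rule: predict 1 if the probability is >= threshold, else 0."""
    return (predict_proba(x, b0, b1) >= threshold).astype(int)

# Hypothetical coefficients, e.g. X = hours studied, Y = pass (1) / fail (0)
hours = np.array([1, 3, 4, 5, 7])
print(predict_proba(hours, b0=-4, b1=1).round(3))
print(predict_class(hours, b0=-4, b1=1))
```

With these coefficients b0 + b1X = 0 at X = 4, so the probability there is exactly 0.5; X = 4 is the decision boundary and, under the ≥ 0.5 rule, is classified as 1.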

Difference Between Linear Regression and Logistic Regression

| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Outcome | Continuous | Categorical (e.g., yes/no, pass/fail) |
| Output | Any real number (−∞ to +∞) | A probability between 0 and 1 |
| Equation | $Y = b_0 + b_1 X + \varepsilon$ | $P(Y=1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$ |
| Relationship | Linear relationship between X and Y | Non-linear (S-shaped) relationship between X and the probability of Y; the log-odds are linear in X |
| Typical uses | Predicting sales, income, temperature, etc. | Predicting churn, disease presence, email spam, etc. |
| Loss function | Mean Squared Error (MSE) | Log-Loss / Cross-Entropy |


 Linear Regression predicts continuous values, while Logistic Regression predicts probabilities for categorical outcomes using the sigmoid function to keep predictions between 0 and 1.
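The practical difference shows up when both models are fitted to the same binary outcome. The pass/fail data below is hypothetical; `LinearRegression` and `LogisticRegression` are scikit-learn's standard estimators, used here with default settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: hours studied (X) and pass (1) / fail (0) outcome (y)
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

linear_model = LinearRegression().fit(X, y)
logistic_model = LogisticRegression().fit(X, y)

# Score two inputs outside the training range: 0 hours and 15 hours
X_new = np.array([[0], [15]])
linear_pred = linear_model.predict(X_new)
logistic_prob = logistic_model.predict_proba(X_new)[:, 1]  # column 1 = P(Y=1)

print("Linear regression output:", linear_pred.round(3))
print("Logistic P(Y=1 | X):", logistic_prob.round(3))
print("Logistic predicted class:", logistic_model.predict(X_new))
```

The straight line gives a negative value at 0 hours and a value above 1 at 15 hours, neither of which can be a probability, while the logistic model's outputs stay between 0 and 1.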

Q.7 Name and briefly describe three common evaluation metrics for regression models.


--> Three common evaluation metrics for regression models, along with brief descriptions:

1. Mean Absolute Error (MAE)

MAE measures the average absolute difference between the actual values and the predicted values.

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$$

 - It shows how far predictions are from actual values on average.

 - Smaller MAE → better model performance.

 - It's easy to interpret since it uses the same unit as the target variable.

2. Mean Squared Error (MSE)

MSE calculates the average of squared differences between actual and predicted values.

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

 - Penalizes larger errors more heavily.

 - Smaller MSE indicates better performance.

 - Useful when large errors are especially undesirable.

3. R-squared (R²)

R² measures the proportion of variance in the dependent variable that is predictable from the independent variable(s).

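$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where $SS_{res}$ is the residual sum of squares and $SS_{tot}$ is the total sum of squares.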

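 - For a least-squares fit with an intercept, R² on the training data ranges from 0 to 1; on new data it can even be negative if the model predicts worse than the mean.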

 - Higher R² - better fit (closer to 1 means predictions explain most of the variance).
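All three metrics can be computed by hand and cross-checked against `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`). The actual and predicted values below are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                   # average absolute error
mse = np.mean(errors ** 2)                      # average squared error
ss_res = np.sum(errors ** 2)                    # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"MAE = {mae:.4f}, MSE = {mse:.4f}, R^2 = {r2:.4f}")

# Cross-check against scikit-learn's implementations
print(np.isclose(mae, mean_absolute_error(y_true, y_pred)),
      np.isclose(mse, mean_squared_error(y_true, y_pred)),
      np.isclose(r2, r2_score(y_true, y_pred)))
```

This prints MAE = 0.3750, MSE = 0.3125 and R^2 = 0.9375. The single 1-unit error contributes 1.0 of the 1.25 squared-error total, which illustrates how MSE weights larger errors more heavily than MAE does.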



Q.8 What is the purpose of the R-squared metric in regression analysis?


--> The purpose of the R-squared (R²) metric in regression analysis is to measure how well the regression model explains the variability of the dependent (target) variable based on the independent (predictor) variables.

R² tells you how much of the total variation in the actual data is captured or explained by your model.

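It is calculated as:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where:

$SS_{res} = \sum_{i}(y_i - \hat{y}_i)^2$ - Residual sum of squares (unexplained variation)

$SS_{tot} = \sum_{i}(y_i - \bar{y})^2$ - Total sum of squares (total variation)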

Interpretation:

 - R² = 1: Perfect fit — model explains all the variation in the data.

 - R² = 0: Model explains none of the variation (as good as using the mean).

 - Higher R² → Better model fit, meaning predictions are closer to actual values.

If R² = 0.85, it means 85% of the variation in the dependent variable can be explained by the model, while 15% is due to other factors or random noise.
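The two sums of squares can be computed directly for the data and fitted line from Q.9 (intercept 2.2, slope 0.6). The sketch also scores a baseline that always predicts the mean of y, which by definition has R² = 0.

```python
import numpy as np
from sklearn.metrics import r2_score

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Predictions from the fitted line y_hat = 2.2 + 0.6x (see Q.9)
y_hat = 2.2 + 0.6 * x

ss_res = np.sum((y - y_hat) ** 2)     # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
r2_model = 1 - ss_res / ss_tot

# Baseline that always predicts the mean of y
r2_mean = r2_score(y, np.full_like(y, y.mean(), dtype=float))

print(f"SS_res = {ss_res:.2f}, SS_tot = {ss_tot:.2f}")
print(f"R^2 of fitted line = {r2_model:.2f}")
print(f"R^2 of mean-only baseline = {r2_mean:.2f}")
```

Here SS_res = 2.40 and SS_tot = 6.00, so R² = 0.60: the line explains 60% of the variation in y, while the mean-only baseline explains none.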



Q.9 Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.


```python
from sklearn.linear_model import LinearRegression
import numpy as np

# data (X = independent variable, y = dependent variable)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # X must be 2D
y = np.array([2, 4, 5, 4, 5])

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope (coefficient) and intercept
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

Slope (Coefficient): 0.6
Intercept: 2.2


Q.10 How do you interpret the coefficients in a simple linear regression model?

--> In a Simple Linear Regression model, the relationship between the independent variable X and the dependent variable Y is expressed as:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Where:

β0= Intercept

β1= Slope (Coefficient)

ε = Error term

Interpretation of Coefficients:

1. Intercept (β0)

It represents the predicted value of Y when X = 0.

In other words, it's the point where the regression line crosses the Y-axis.

Example: If β0 = 5, then when X = 0, the model predicts Y = 5.

2. Slope (β1)

It represents the change in Y for a one-unit increase in X.

In other words, it tells how much Y increases (or decreases) when X increases by 1 unit.

Example: If β1 = 2.5, then for every 1-unit increase in X, Y increases by 2.5 units, on average.

Example Interpretation:

If your regression equation is:

$$\hat{Y} = 3 + 1.2X$$

Then:

Intercept (3): When X = 0, the predicted Y is 3.

Slope (1.2): For every increase of 1 unit in X, the predicted Y increases by 1.2 units.
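Both interpretations can be checked numerically on the Q.9 data. In scikit-learn, β0 corresponds to `intercept_` and β1 to `coef_[0]`. The sketch confirms that the intercept equals the prediction at X = 0 and that the slope equals the change in prediction when X rises by one unit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data from Q.9
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression().fit(X, y)

beta0 = model.intercept_  # intercept (β0)
beta1 = model.coef_[0]    # slope (β1)

# Intercept: the model's prediction when X = 0
pred_at_zero = model.predict(np.array([[0]]))[0]

# Slope: change in the prediction when X rises by one unit (3 -> 4 here)
one_unit_change = model.predict(np.array([[4]]))[0] - model.predict(np.array([[3]]))[0]

print(f"β0 = {beta0:.2f}, prediction at X = 0: {pred_at_zero:.2f}")
print(f"β1 = {beta1:.2f}, change in prediction from X = 3 to X = 4: {one_unit_change:.2f}")
```

Because the model is a straight line, the one-unit change is the same no matter which starting value of X is chosen.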
