# Assignment Solution

## 1. What is Simple Linear Regression (SLR)? Explain its purpose.
  -> Simple Linear Regression tries to fit a straight line through a set of data points in such a way that it best represents the relationship between X and Y.
The general equation is:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Where:

$Y$ = Dependent (response) variable

$X$ = Independent (predictor) variable

$\beta_0$ = Intercept (value of $Y$ when $X = 0$)

$\beta_1$ = Slope (change in $Y$ for a one-unit change in $X$)

$\varepsilon$ = Error term (difference between observed and predicted values)
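
As a minimal sketch of how these terms fit together, the snippet below plugs made-up coefficient values into the equation and computes the error for a single observation. The values of `beta_0`, `beta_1`, and the observed point are assumptions chosen only for illustration.

```python
# Hypothetical coefficients, chosen only for illustration
beta_0 = 2.0   # intercept
beta_1 = 0.5   # slope

def predict(x):
    """Return the predicted Y for a given X using the line beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * x

# One hypothetical observation
x_observed = 4.0
y_observed = 4.5

y_predicted = predict(x_observed)   # value on the line at x_observed
error = y_observed - y_predicted    # observed minus predicted

print("Predicted Y:", y_predicted)
print("Error:", error)
```

A positive error means the observed point lies above the line; a negative error means it lies below.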

Purpose of SLR

The main goals of Simple Linear Regression are:

Prediction:
Estimate or predict the value of the dependent variable Y based on a known value of X.

Example: Predict a person’s weight (Y) based on their height (X).

Relationship Understanding:
Determine whether and how strongly two variables are related.

Example: Does advertising expenditure (X) significantly affect sales revenue (Y)?

Trend Analysis:
Identify trends and patterns over time or across datasets.

Example: Estimate how temperature (X) affects electricity consumption (Y).


## 2. What are the key assumptions of Simple Linear Regression?
  -> 1. Linearity

The relationship between the independent variable and the dependent variable should be straight-line in nature.

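This means a one-unit change in the independent variable should be associated with the same average change in the dependent variable across its whole range, not merely a change in a consistent direction.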

You can check this by making a scatter plot — the points should form roughly a straight line.

2. Independence of Errors

The prediction errors (the differences between actual and predicted values) should be independent of each other.

In simple terms, one observation’s error should not affect another’s.

This is especially important for time-based data to ensure no pattern or trend exists in the errors.

3. Constant Variance of Errors (Homoscedasticity)

The errors should have equal spread across all levels of the independent variable.

The amount of variation in the errors should not increase or decrease as the predicted values get larger or smaller.

If the spread of errors widens or narrows, this assumption is violated.

4. Normality of Errors

The errors should be normally distributed — meaning most errors are small, and large errors are rare.

This helps ensure that hypothesis tests and confidence intervals are valid.

You can check this using a histogram or a normal probability plot.

5. No Outliers or Influential Points

The dataset should not contain extreme or unusual values that can distort the results.

Outliers can pull the regression line away from the true pattern.

It is good practice to identify and handle such points before building the model.
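
Some of these checks can be sketched numerically. The example below reuses the five-point dataset from question 9 and computes the residuals, the Durbin-Watson statistic (a standard check for independence that this answer does not name; values near 2 suggest no first-order autocorrelation), and a crude comparison of residual spread for small and large X. With only five points this is an illustration, not a reliable diagnostic.

```python
import numpy as np

# Dataset reused from question 9 of this document
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

# Fit a straight line by least squares; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(X, Y, deg=1)
residuals = Y - (intercept + slope * X)

# Independence: Durbin-Watson statistic on the residuals
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Constant variance: compare residual spread for small vs large X
# (a crude check; a residual-vs-fitted plot is more informative)
order = np.argsort(X)
half = len(X) // 2
spread_low = np.std(residuals[order[:half]])
spread_high = np.std(residuals[order[-half:]])

print("Residuals:", np.round(residuals, 3))
print("Durbin-Watson:", round(float(durbin_watson), 3))
print("Residual spread (low X, high X):", spread_low, spread_high)
```

Linearity and normality are usually judged visually, with a scatter plot and a histogram or normal probability plot of the residuals.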

## 3. Write the mathematical equation for a simple linear regression model and explain each term.
  -> Simple Linear Regression fits a straight line through the data points so that it best predicts the value of Y based on X.

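$$\text{Predicted } Y = \text{Intercept} + (\text{Slope} \times X)$$

Here, Predicted Y is the value the line gives for the dependent variable, X is the independent variable, Intercept is the predicted Y when X = 0, and Slope is the change in predicted Y for a one-unit increase in X.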

## 4. Provide a real-world example where simple linear regression can be applied.
  -> Scenario

A real estate company wants to predict the price of a house based on its size (in square feet).

They collect data from several houses in a city:

| House Size (sq. ft.) | House Price (₹ in lakhs) |
|---|---|
| 800 | 40 |
| 1000 | 50 |
| 1200 | 60 |
| 1500 | 75 |
| 1800 | 90 |
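
A minimal sketch of how the company could fit the line to this data, assuming NumPy is available. The 1300 sq. ft. house used for the prediction is a hypothetical input, not part of the collected data.

```python
import numpy as np

# Data from the table above
size_sqft = np.array([800, 1000, 1200, 1500, 1800], dtype=float)
price_lakhs = np.array([40, 50, 60, 75, 90], dtype=float)

# Least-squares line: price = intercept + slope * size
slope, intercept = np.polyfit(size_sqft, price_lakhs, deg=1)

# Predict the price of a hypothetical 1300 sq. ft. house
new_size = 1300.0
predicted_price = intercept + slope * new_size

print(f"Slope: {slope:.4f} lakhs per sq. ft.")
print(f"Intercept: {intercept:.4f} lakhs")
print(f"Predicted price for {new_size:.0f} sq. ft.: {predicted_price:.2f} lakhs")
```

In this particular dataset every house lies exactly on the line Price = 0.05 × Size, so the fitted slope is about 0.05 lakhs per square foot and the intercept is about zero. Real housing data would scatter around the line.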

## 5. What is the method of least squares in linear regression?
  -> The method of least squares is a mathematical approach used to determine the line of best fit by minimizing the sum of the squared differences between the actual and predicted values of the dependent variable.
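
For a single predictor, the minimizing line has a well-known closed form: the slope is the sum of (x − mean of x)(y − mean of y) divided by the sum of (x − mean of x)², and the intercept is mean of y − slope × mean of x. The sketch below applies these formulas to the dataset from question 9 and compares the sum of squared errors at the fitted line with that of a slightly shifted line. The helper name `sse` is introduced here for illustration.

```python
# Dataset reused from question 9 of this document
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
x_mean = sum(X) / n
y_mean = sum(Y) / n

# Closed-form least-squares estimates for one predictor
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))
s_xx = sum((x - x_mean) ** 2 for x in X)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

def sse(b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

print("Slope:", slope)
print("Intercept:", intercept)
print("SSE at the fitted line:", sse(intercept, slope))
print("SSE with intercept shifted by 0.1:", sse(intercept + 0.1, slope))
```

Any other choice of slope or intercept gives a larger sum of squared errors than the fitted line, which is what "best fit" means here.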

## 6. What is Logistic Regression? How does it differ from Linear Regression?
  -> Logistic Regression is a statistical method used to predict a categorical (discrete) outcome — usually when the target variable has two possible values, such as:

Yes / No

Pass / Fail

Spam / Not Spam

Disease / No Disease

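Even though it’s called “regression,” it’s actually used for classification problems, not for predicting continuous numbers.

Linear Regression, in contrast, predicts a continuous value directly from a straight line. Logistic Regression passes the same kind of linear combination through the sigmoid function to produce a probability between 0 and 1, which is then converted to a class label.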
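
The core difference can be sketched in a few lines: logistic regression computes a linear combination just as linear regression does, then squeezes it through the sigmoid function so the output is a probability. The coefficients and the pass/fail scenario below are hypothetical, chosen only for illustration, and the 0.5 threshold is a common default rather than a fixed rule.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration
beta_0 = -4.0   # intercept
beta_1 = 1.0    # coefficient for hours studied

def predict_pass_probability(hours):
    # Linear combination, as in linear regression ...
    z = beta_0 + beta_1 * hours
    # ... passed through the sigmoid to give a probability
    return sigmoid(z)

for hours in [2, 4, 6]:
    p = predict_pass_probability(hours)
    label = "Pass" if p >= 0.5 else "Fail"
    print(f"{hours} hours -> P(pass) = {p:.3f} -> {label}")
```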



## 7. Name and briefly describe three common evaluation metrics for regression models.
  -> 1. Mean Absolute Error (MAE)

Definition:
The average of the absolute differences between actual and predicted values.

Formula (in simple words):
Take the difference between actual and predicted values, make them positive, and find the average.

Meaning:
It tells how far the predictions are from the real values on average.

Example:
If MAE = 5, it means the model’s predictions are off by about 5 units on average.
Smaller MAE = better accuracy

2. Mean Squared Error (MSE)

Definition:
The average of the squared differences between actual and predicted values.

Meaning:
It penalizes larger errors more because the differences are squared.

Use Case:
Useful when you want to strongly punish large mistakes.

Smaller MSE = better model performance

3. Root Mean Squared Error (RMSE)

Definition:
The square root of the Mean Squared Error.

Meaning:
It brings the error back to the same unit as the original data, making it easier to interpret.

Example:
If RMSE = 8, the model’s predictions are typically off by about 8 units, with large errors weighted more heavily than in MAE.

Smaller RMSE = more accurate predictions
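
The three metrics can be computed by hand in a few lines. This sketch uses the actual values from question 9 and the predictions given by the line 2.2 + 0.6 × X reported there; treat it as an illustration of the definitions rather than a full evaluation workflow.

```python
import math

# Actual values from question 9 and predictions from its fitted line
actual    = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]   # 2.2 + 0.6 * x for x = 1..5

errors = [a - p for a, p in zip(actual, predicted)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n   # average absolute error
mse = sum(e ** 2 for e in errors) / n   # average squared error
rmse = math.sqrt(mse)                   # back in the units of Y

print(f"MAE:  {mae:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```

RMSE is never smaller than MAE on the same data, because squaring gives the larger errors more weight.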

## 8. What is the purpose of the R-squared metric in regression analysis?
  -> The purpose of R-squared is to show how well the regression model explains the variability of the dependent variable (the one you are trying to predict).
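
R-squared is commonly computed as 1 minus the ratio of the unexplained variation (sum of squared residuals) to the total variation of Y around its mean. The sketch below applies that formula to the dataset and fitted line from question 9.

```python
# Actual values from question 9 and predictions from its fitted line
actual    = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]   # 2.2 + 0.6 * x for x = 1..5

mean_actual = sum(actual) / len(actual)

# Variation left unexplained by the model
ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
# Total variation of Y around its mean
ss_total = sum((a - mean_actual) ** 2 for a in actual)

r_squared = 1 - ss_residual / ss_total
print(f"R-squared: {r_squared:.2f}")
```

For this data the result is about 0.60, meaning the line accounts for roughly 60% of the variation in Y.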

## 9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.



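    from sklearn.linear_model import LinearRegression
    import numpy as np

    # scikit-learn expects X as a 2-D array of shape (n_samples, n_features)
    X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
    Y = np.array([2, 4, 5, 4, 5])

    # Fit the model by ordinary least squares
    model = LinearRegression()
    model.fit(X, Y)

    # coef_ holds one slope per feature; intercept_ is a scalar
    print("Slope (β₁):", model.coef_[0])
    print("Intercept (β₀):", model.intercept_)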


Output:

    Slope (β₁): 0.6
    Intercept (β₀): 2.2


## 10. How do you interpret the coefficients in a simple linear regression model?
  -> Intercept (β₀): The predicted value of Y when X is zero. It’s the baseline value.

Slope (β₁): The change in Y for a one-unit increase in X. Positive slope → Y increases; negative slope → Y decreases.

Example:
If Marks = 30 + 5*Hours_Studied:

Intercept = 30 → 0 hours → 30 marks

Slope = 5 → Each extra hour → marks increase by 5
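
The same interpretation can be checked with a short sketch of the example equation. The function name `predicted_marks` is introduced here for illustration.

```python
def predicted_marks(hours_studied):
    """Marks = 30 + 5 * Hours_Studied, the example equation above."""
    intercept = 30   # marks predicted at 0 hours
    slope = 5        # extra marks per additional hour
    return intercept + slope * hours_studied

print(predicted_marks(0))                       # 30
print(predicted_marks(3))                       # 45
print(predicted_marks(4) - predicted_marks(3))  # 5
```

The difference between any two predictions one hour apart equals the slope, regardless of the starting point.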