Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

Answer:
- Simple Linear Regression (SLR) is a statistical technique that models the relationship between two continuous variables: one independent (predictor) variable and one dependent (response) variable. It assumes that this relationship can be represented by a straight line. SLR has two main purposes: prediction, i.e. estimating the value of the dependent variable from a given value of the independent variable, and interpretation, i.e. quantifying how much the dependent variable changes, on average, for a one-unit change in the independent variable. On its own, SLR describes an association; it does not prove that changes in one variable cause changes in the other.

Question 2: What are the key assumptions of Simple Linear Regression?

Answer:
- The key assumptions of Simple Linear Regression are:

Linearity: The relationship between the independent and dependent variable is linear.

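Independence: The errors are independent of one another. For example, in data collected over time, the error in one period should not help predict the error in the next (no autocorrelation).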

Homoscedasticity: The variance of errors remains constant across all levels of the independent variable.

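Normality of Errors: The errors (estimated by the residuals) follow a normal distribution. This assumption mainly matters for confidence intervals and hypothesis tests on the coefficients.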

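No Omitted Variables: The model is correctly specified. In SLR this means that no important variable that affects y and is correlated with x has been left out; otherwise the estimated slope is biased (omitted-variable bias).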

If these assumptions are violated, the coefficient estimates may be biased, and the standard errors, confidence intervals, and p-values may be unreliable.
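A minimal sketch of how some of these assumptions can be checked informally from the residuals, assuming NumPy is available and reusing the small dataset from Question 9. The Durbin-Watson statistic is computed by hand here and is only meaningful when the observations have a natural order (e.g. time); with just five points the check is illustrative only.

```python
import numpy as np

# Toy data (same values as in Question 9); replace with your own data
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Fit a straight line; for deg=1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

# Least-squares residuals of a model with an intercept average to zero
print("Mean residual:", residuals.mean())

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print("Durbin-Watson:", round(dw, 3))

# Residuals vs fitted values: look for curvature (non-linearity)
# or a funnel shape (non-constant variance, i.e. heteroscedasticity)
for f, r in zip(fitted, residuals):
    print(f"fitted={f:.2f}  residual={r:+.2f}")
```

On real data, plotting residuals against fitted values (and a histogram or Q-Q plot of residuals) is the usual way to inspect linearity, homoscedasticity, and normality.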


Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

Answer:
- The equation of a simple linear regression model is:

y = b0 + b1 * x + e

Where:

y = Dependent variable (the variable we want to predict)

x = Independent variable (the variable used for prediction)

b0 = Intercept (predicted value of y when x = 0)

b1 = Slope (change in y for one-unit change in x)

e = Error term (the part of y not explained by the line; for a fitted model it is estimated by the residual, i.e. actual value minus predicted value)

The goal is to find the values of b0 and b1 that minimize the overall error, most commonly measured as the sum of squared residuals (the method of least squares).
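As a small worked example, the equation can be evaluated directly. The sketch below assumes the coefficients b0 = 2.2 and b1 = 0.6 obtained in Question 9; `predict` is a hypothetical helper name introduced for illustration.

```python
def predict(x, b0, b1):
    """Return the predicted value y_hat = b0 + b1 * x (the error term is not predictable)."""
    return b0 + b1 * x

# Coefficients from the scikit-learn fit in Question 9
b0, b1 = 2.2, 0.6

x_obs, y_obs = 3, 5           # one observed data point from that dataset
y_hat = predict(x_obs, b0, b1)
residual = y_obs - y_hat      # estimate of the error term e for this point

print("Predicted y:", round(y_hat, 2))
print("Residual:", round(residual, 2))
```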

Question 4: Provide a real-world example where simple linear regression can be applied.

Answer:
- A common application of simple linear regression is predicting house prices from house size. Here, the independent variable (x) is the size of the house in square feet, and the dependent variable (y) is the price. The fitted slope estimates how much the price increases, on average, for every additional square foot of area. This helps property dealers and buyers estimate a reasonable market price.
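A sketch of this example on synthetic data, assuming NumPy and scikit-learn. The "true" relationship (price = 50,000 + 150 × size plus noise) and all numbers are made up purely for illustration, so the point is only that the fitted slope recovers the assumed price per square foot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, made-up data: price = 50,000 + 150 * size + random noise
rng = np.random.default_rng(seed=0)
size_sqft = rng.uniform(800, 3000, size=200)
price = 50_000 + 150 * size_sqft + rng.normal(0, 20_000, size=200)

# scikit-learn expects a 2-D feature array of shape (n_samples, 1)
model = LinearRegression()
model.fit(size_sqft.reshape(-1, 1), price)

# The slope estimates the average price increase per additional square foot
price_per_sqft = model.coef_[0]
print(f"Estimated price per extra sq ft: {price_per_sqft:.1f}")

# Estimated price for a hypothetical 1,500 sq ft house
print(f"Predicted price (1,500 sq ft): {model.predict([[1500]])[0]:,.0f}")
```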

Question 5: What is the method of least squares in linear regression?

Answer:
- The method of least squares is used to find the best-fitting line for the given data points. It works by minimizing the sum of the squared differences between the observed values and the predicted values.
Mathematically, it minimizes:

Sum of (Actual value – Predicted value)²

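The result gives the least-squares estimates of the slope (b1) and intercept (b0). No other straight line produces a smaller sum of squared residuals for the same data. Because the differences are squared, large deviations count much more than small ones, which also makes the fit sensitive to outliers.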
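For simple linear regression the minimizing values have a well-known closed form: b1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b0 = ȳ − b1·x̄. The sketch below, assuming NumPy and the toy data from Question 9, computes them directly and checks that nudging the slope or intercept increases the sum of squared residuals; `sse` is a hypothetical helper name.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form least-squares estimates for simple linear regression
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

def sse(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

print("b1 (slope):", round(b1, 3))
print("b0 (intercept):", round(b0, 3))
print("SSE at the least-squares line:", round(sse(b0, b1), 3))

# Any other line gives a larger SSE, e.g. after nudging slope or intercept
print("SSE with slope + 0.1:", round(sse(b0, b1 + 0.1), 3))
print("SSE with intercept + 0.1:", round(sse(b0 + 0.1, b1), 3))
```

The estimates match the slope and intercept printed by scikit-learn in Question 9.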

Question 6: What is Logistic Regression? How does it differ from Linear Regression?

Answer:
- Logistic Regression is a statistical method used for classification problems, where the dependent variable is categorical (for example, Yes/No, 0/1). It predicts the probability that an observation belongs to a particular category using a logistic function, which always gives values between 0 and 1.

Differences from Linear Regression:

Linear Regression predicts continuous values, while Logistic Regression predicts probabilities or categories.

Logistic Regression passes a linear combination of the predictor (b0 + b1 * x) through the logistic (sigmoid) function, producing an S-shaped curve instead of a straight line.

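Coefficients in Logistic Regression are interpreted in terms of log-odds: b1 is the change in the log-odds of the outcome for a one-unit increase in x, and exp(b1) is the corresponding odds ratio. In Linear Regression, b1 is a direct change in the value of y.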

Logistic Regression uses Maximum Likelihood Estimation instead of Least Squares for model fitting.
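A minimal sketch with scikit-learn's `LogisticRegression` on hypothetical toy data (hours studied vs. pass/fail, made up for illustration). It checks that the predicted probabilities equal the sigmoid of b0 + b1 * x and shows the odds-ratio reading of the coefficient. Note that scikit-learn applies L2 regularization by default, so its fit is a regularized form of maximum likelihood.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied (x) and outcome (y = 1 means pass)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Predicted probability of class 1 is sigmoid(b0 + b1 * x)
z = clf.intercept_[0] + clf.coef_[0, 0] * X[:, 0]
manual_probs = sigmoid(z)
sk_probs = clf.predict_proba(X)[:, 1]
print("Probabilities match:", np.allclose(manual_probs, sk_probs))

# Class labels come from thresholding the probability at 0.5
print("Predicted classes:", clf.predict(X))

# exp(b1) is the odds ratio: the factor by which the odds of y = 1
# are multiplied for each additional unit of x
print("Odds ratio per unit of x:", round(float(np.exp(clf.coef_[0, 0])), 3))
```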

Question 7: Name and briefly describe three common evaluation metrics for regression models.

Answer:

- Mean Squared Error (MSE): Measures the average of squared differences between actual and predicted values. It penalizes large errors more heavily.

Root Mean Squared Error (RMSE): The square root of MSE. It expresses the typical size of the prediction errors in the same units as the dependent variable, which makes it easier to interpret than MSE.

Mean Absolute Error (MAE): Measures the average of the absolute differences between actual and predicted values. It is less sensitive to outliers compared to MSE.
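A short sketch computing the three metrics with NumPy and cross-checking them against `sklearn.metrics`. The predictions are those of the line y = 2.2 + 0.6x fitted in Question 9.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Actual values and predictions of the line y = 2.2 + 0.6 * x (Question 9)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = np.array([2.8, 3.4, 4.0, 4.6, 5.2])

errors = y_true - y_pred
mse = np.mean(errors ** 2)       # average squared error
rmse = np.sqrt(mse)              # back in the units of y
mae = np.mean(np.abs(errors))    # average absolute error

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE:  {mae:.4f}")

# Cross-check against scikit-learn's implementations
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
```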

Question 8: What is the purpose of the R-squared metric in regression analysis?

Answer:
- R-squared (R²) measures how well the regression model explains the variation in the dependent variable. It shows the proportion of the variance in the dependent variable that is predictable from the independent variable.

Formula:
R² = 1 – (Sum of Squared Residuals / Total Sum of Squares)

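For a least-squares model with an intercept, evaluated on the data it was fitted to, R² lies between 0 and 1; on new data it can even be negative. A higher R² indicates that the model explains more of the variation in y. However, a high R² does not necessarily mean that the model is the best or that the relationship is causal.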
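A sketch computing R² from the formula above and comparing it with `sklearn.metrics.r2_score`, using the predictions of the Question 9 line. The last line uses a deliberately poor constant prediction (an arbitrary value of 6, chosen for illustration) to show that R² can fall below 0 when predictions do worse than simply predicting the mean.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = np.array([2.8, 3.4, 4.0, 4.6, 5.2])  # fitted line from Question 9

ss_res = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

print("R² (manual):", round(r2, 3))
print("R² (scikit-learn):", round(r2_score(y_true, y_pred), 3))

# A poor model (always predicting 6) scores below 0
poor_pred = np.full_like(y_true, 6.0)
print("R² of a poor model:", round(r2_score(y_true, poor_pred), 3))
```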

Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

# Simple Linear Regression using scikit-learn

import numpy as np
from sklearn.linear_model import LinearRegression


X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])


model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_

print("Slope:", slope)
print("Intercept:", intercept)


Slope: 0.6
Intercept: 2.2


Question 10: How do you interpret the coefficients in a simple linear regression model?

Answer:

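- Intercept (b0): This is the predicted value of the dependent variable (y) when the independent variable (x) is 0. It represents the baseline level of y, but it is only meaningful if x = 0 is a realistic value that lies within (or near) the range of the observed data.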

Slope (b1): This shows how much the dependent variable changes for every one-unit increase in the independent variable.

For example, with the model fitted in Question 9 (b1 = 0.6), every 1-unit increase in x raises the predicted value of y by 0.6 units on average.
A positive slope indicates a direct relationship (as x increases, y tends to increase), while a negative slope indicates an inverse relationship (as x increases, y tends to decrease).
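Both interpretations can be verified numerically with the model from Question 9. In this sketch (assuming scikit-learn), the prediction at x = 0 equals the intercept, and increasing x by one unit changes the prediction by exactly the slope. Note that x = 0 lies outside the observed range 1 to 5 here, so the intercept is an extrapolation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression().fit(X, y)

# Intercept: the prediction at x = 0
pred_at_0 = model.predict([[0]])[0]
print("Prediction at x = 0:", round(pred_at_0, 3), "| intercept:", round(model.intercept_, 3))

# Slope: the change in the prediction when x increases by one unit
change = model.predict([[4]])[0] - model.predict([[3]])[0]
print("Change from x = 3 to x = 4:", round(change, 3), "| slope:", round(model.coef_[0], 3))
```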