Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.
- Simple Linear Regression (SLR) is a statistical method used to model the relationship between two variables: one independent variable (the predictor) and one dependent variable (the response).
- SLR tries to fit a straight line (called a regression line) through a set of data points so that the line best describes the relationship between the independent variable (X) and the dependent variable (Y).
- Purpose:
- Prediction: Estimate or predict the value of the dependent variable (Y) for a given value of the independent variable (X).
- Understanding Relationships: Determine whether a relationship exists between X and Y, and if so, how strong and in what direction (positive or negative).
- Quantifying Impact: Measure how much change in Y is associated with a one-unit change in X (interpreted through the slope).
- Trend Analysis: Identify patterns or trends in data, especially over time (e.g., sales vs. advertising spend).


Question 2: What are the key assumptions of Simple Linear Regression?
- Key Assumptions of SLR:
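 - 1) Linearity: The relationship between X and the expected value of Y is linear.
 - 2) Independence: Observations (and therefore their errors) are independent of one another.
 - 3) Homoscedasticity: The errors have constant variance across all values of X.
 - 4) Normality: The errors are normally distributed. This matters mainly for inference, such as confidence intervals and hypothesis tests.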
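
The assumptions above are usually checked on the residuals of a fitted line. The following is only a rough sketch of that idea, assuming made-up data and hypothetical helper names (`fit_line`, `var_low`, `var_high`); real diagnostics would normally use residual plots or formal tests.

```python
from statistics import mean, pvariance


def fit_line(xs, ys):
    """Closed-form least-squares fit; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up illustrative data (not from any real study)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# With an intercept in the model, least-squares residuals sum to (about) zero
residual_sum = sum(residuals)

# Very rough homoscedasticity check: compare residual spread for low vs high X
half = len(xs) // 2
var_low = pvariance(residuals[:half])
var_high = pvariance(residuals[half:])

print(f"Sum of residuals: {residual_sum:.2e}")
print(f"Residual variance (low X):  {var_low:.4f}")
print(f"Residual variance (high X): {var_high:.4f}")
```

If the two variances differ greatly, or the residuals show a curved pattern against X, the homoscedasticity or linearity assumption is worth a closer look.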

Question 3: Write the mathematical equation for a simple linear regression model and explain each term.
- The mathematical equation for a Simple Linear Regression (SLR) model is:

$$Y = \beta_0 + \beta_1 X$$

- Y: Dependent variable (also called the response or outcome variable). This is what we are trying to predict or explain.
- X: Independent variable (also called the predictor or explanatory variable). This is the variable we use to make predictions.

- $\beta_0$: Intercept. The expected value of Y when X = 0. It represents the point where the regression line crosses the Y-axis.

- $\beta_1$: Slope coefficient. It shows the rate of change in Y for a one-unit increase in X. Its sign gives the direction of the relationship between X and Y, and its magnitude gives the size of the change in Y per unit of X.


Question 4: Provide a real-world example where simple linear regression can be applied.
- A school wants to understand the relationship between the number of hours students study and their scores on a final exam.
- Application of Simple Linear Regression: Independent Variable (X): Number of hours studied; Dependent Variable (Y): Exam score (percentage)
- Purpose: Using simple linear regression, the school can:
 - Predict a student's expected exam score based on how many hours they studied.
 - Measure how much an additional hour of studying is expected to increase the exam score.
 - Evaluate the strength and direction of the relationship between study time and performance.
- Regression Equation Example: $\text{Exam Score} = \beta_0 + \beta_1 \times \text{Hours Studied}$

Suppose that, after analysis, the estimated model is: $\hat{Y} = 50 + 5X$
Here, $\hat{Y}$ = predicted exam score, X = hours studied, 50 = intercept (predicted baseline score with zero study hours), and 5 = slope (the predicted score increases by 5 points for each additional hour studied).
- Interpretation: If a student studies for 4 hours, the predicted score is:
$\hat{Y} = 50 + 5(4) = 70$
- So, a student who studies 4 hours is expected to score 70% on the exam.
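
The worked example above can be reproduced in a few lines of Python. This is a minimal sketch; the function name `predict_exam_score` is hypothetical, and the coefficients 50 and 5 are the illustrative values from the example, not estimates from real data.

```python
def predict_exam_score(hours, intercept=50.0, slope=5.0):
    """Predicted exam score (%) from the example model Y-hat = 50 + 5X."""
    return intercept + slope * hours


score_4h = predict_exam_score(4)
print(f"Predicted score for 4 hours: {score_4h}")

# Each extra hour adds exactly `slope` points to the prediction
gain_per_hour = predict_exam_score(5) - predict_exam_score(4)
print(f"Gain from one more hour: {gain_per_hour}")

# Far outside the observed range the line can give impossible values
score_12h = predict_exam_score(12)
print(f"Predicted score for 12 hours: {score_12h}")
```

The last call returns a score above 100%, which shows why predictions should stay close to the range of study hours actually observed.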

Question 5: What is the method of least squares in linear regression?
- The method of least squares is a standard approach in linear regression for finding the best-fitting line (or hyperplane in higher dimensions) through a set of data points by minimizing the sum of the squares of the residuals.
- Residuals are the differences between the observed values and the predicted values from the regression model.

For a data point $(x_i, y_i)$ and a linear model $y = mx + b$ (or, more generally, $y = \beta_0 + \beta_1 x$), the residual is:

$$\text{residual}_i = y_i - \hat{y}_i = y_i - (\beta_0 + \beta_1 x_i)$$

- Goal of Least Squares: Minimize the sum of squared residuals:

$$\text{RSS} = \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2$$

- This is known as the Residual Sum of Squares (RSS). The method of least squares chooses the coefficients $\beta_0$ and $\beta_1$ that minimize this sum.
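
For a single predictor, the minimizing coefficients have a standard closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below applies it to a small made-up dataset; the function names `rss` and `least_squares_fit` are hypothetical.

```python
def rss(xs, ys, b0, b1):
    """Residual sum of squares for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_fit(xs, ys):
    """Closed-form least-squares estimates; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 4, 8]

b0, b1 = least_squares_fit(xs, ys)
best_rss = rss(xs, ys, b0, b1)

# Nudging the coefficients away from the fitted values increases the RSS
nudged_rss = rss(xs, ys, b0 + 0.1, b1 - 0.1)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"RSS at the fitted line: {best_rss:.3f}")
print(f"RSS at a nudged line:   {nudged_rss:.3f}")
```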

Question 6: What is Logistic Regression? How does it differ from Linear Regression?
- Logistic Regression is, despite its name, mainly used as a classification algorithm. It predicts categorical outcomes, typically binary (yes/no, 0/1, spam/not spam).
- It models the probability that a given input belongs to a certain class.
- Instead of predicting a continuous value (like in linear regression), it predicts a probability between 0 and 1.
- It uses the logistic (sigmoid) function to "squash" the output of a linear equation into a 0–1 range.
- Difference between Linear and Logistic Regression:
- Linear:
 - 1) Its goal is to predict a continuous value.
 - 2) Its output can be any real number.
 - 3) It is used for regression problems.
 - 4) The loss function is Mean Squared Error (MSE).
 - 5) It directly models the relationship between input and output.
- Logistic:
 - 1) Its goal is to predict a categorical label (usually binary).
 - 2) Its output is a probability between 0 and 1.
 - 3) It is used for classification problems.
 - 4) The loss function is Log Loss (Cross-Entropy Loss).
 - 5) It models the relationship between the input and the log-odds of the output.
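
The "squashing" step can be sketched as follows. This is a minimal illustration, not a fitted model: the coefficients `b0 = -4.0` and `b1 = 1.0`, the 0.5 threshold, and the function names are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any moderate real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """Probability of class 1: sigmoid applied to the linear part b0 + b1 * x."""
    return sigmoid(b0 + b1 * x)


def predict_class(x, b0, b1, threshold=0.5):
    """Turn the probability into a 0/1 label using a cut-off."""
    return 1 if predict_probability(x, b0, b1) >= threshold else 0


# Hypothetical coefficients, chosen only for illustration
b0, b1 = -4.0, 1.0

for x in [2, 4, 6]:
    p = predict_probability(x, b0, b1)
    print(f"x = {x}: probability = {p:.3f}, class = {predict_class(x, b0, b1)}")
```

The linear part `b0 + b1 * x` is the log-odds; the sigmoid converts it to a probability, and the threshold converts the probability to a label.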


Question 7: Name and briefly describe three common evaluation metrics for regression models
- 1) Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted values and the actual values.
- Formula: $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|$
- Interpretation: Lower values indicate better model performance. It's easy to interpret but doesn't heavily penalize large errors.
- 2) Mean Squared Error (MSE): MSE calculates the average of the squared differences between predicted and actual values.
- Formula: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Interpretation: Like MAE, lower values are better. It penalizes larger errors more than MAE due to squaring, making it sensitive to outliers.
- 3) R-squared (R²): R² measures the proportion of variance in the dependent variable that is predictable from the independent variable(s).
- Formula: $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
- Interpretation: Typically ranges from 0 to 1, but can be negative when the model fits worse than always predicting the mean. A value closer to 1 means the model explains more of the variance.
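
The three metrics can be computed directly from their definitions. The sketch below uses made-up actual and predicted values and hypothetical function names; in practice a library such as scikit-learn provides equivalent functions.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of (y_i - y_hat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Made-up actual values and predictions from a hypothetical fitted line
y_true = [2, 4, 6, 4, 8]
y_pred = [2.4, 3.6, 4.8, 6.0, 7.2]

print(f"MAE: {mae(y_true, y_pred):.4f}")
print(f"MSE: {mse(y_true, y_pred):.4f}")
print(f"R^2: {r_squared(y_true, y_pred):.4f}")
```

Note how the single large error (4 vs. 6.0) contributes far more to MSE than to MAE, which is the sensitivity to outliers described above.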

Question 8: What is the purpose of the R-squared metric in regression analysis?
- The purpose of the R-squared (R²) metric in regression analysis is to measure how well the independent variables explain the variability in the dependent variable.
- R-squared tells you how much of the variation in the outcome (dependent variable) is accounted for by the model.
- Range: R² ranges from 0 to 1 (but can be negative if the model is worse than just predicting the mean).
 - R² = 1 → Perfect fit (the model explains all variability in the data).
 - R² = 0 → The model explains none of the variability.
- Interpretation: If R² = 0.85, it means 85% of the variance in the target variable is explained by the model, and 15% remains unexplained (due to randomness or missing variables).
- It helps evaluate the explanatory power of the model.
- It is especially useful for comparing different models fitted on the same dataset.
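
The boundary cases of R² can be checked numerically. This is a small sketch on made-up values, with a hypothetical `r_squared` helper: perfect predictions give 1, always predicting the mean gives 0, and predictions worse than the mean give a negative value.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Made-up observed values
y_true = [2, 4, 6, 4, 8]
y_mean = sum(y_true) / len(y_true)

r2_perfect = r_squared(y_true, y_true)                  # predictions match exactly
r2_mean = r_squared(y_true, [y_mean] * len(y_true))     # always predict the mean
r2_bad = r_squared(y_true, [8, 4, 6, 4, 2])             # worse than the mean

print(f"Perfect predictions: R^2 = {r2_perfect:.3f}")
print(f"Mean-only baseline:  R^2 = {r2_mean:.3f}")
print(f"Poor predictions:    R^2 = {r2_bad:.3f}")
```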


# Question 9: Write Python code to fit a simple linear regression model using scikit-learn
# and print the slope and intercept

from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
# X: feature (independent variable)
# y: target (dependent variable)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 4, 8])


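# Create the model and fit it to the data by ordinary least squares
model = LinearRegression()
model.fit(X, y)

# coef_ holds one slope per feature; intercept_ is the fitted intercept
print(f"Slope (coefficient): {model.coef_[0]}")
print(f"Intercept: {model.intercept_}")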


Slope (coefficient): 1.2000000000000002
Intercept: 1.1999999999999993
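
As a cross-check on the result above, the same line can be fitted with NumPy alone. This sketch swaps in `np.polyfit` with degree 1 in place of scikit-learn; it returns the coefficients from the highest power down, so the slope comes first. The value `x_new = 6` is an arbitrary illustrative input.

```python
import numpy as np

# Same example data as above, with x as a 1-D array
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 4, 8])

# Degree-1 polynomial fit: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

print(f"Slope: {slope:.4f}")
print(f"Intercept: {intercept:.4f}")

# Use the fitted line to predict y for a new x value
x_new = 6
y_new = intercept + slope * x_new
print(f"Predicted y at x = {x_new}: {y_new:.4f}")
```

Both values agree with the scikit-learn output up to floating-point rounding, which is why the printed results above show 1.2000000000000002 rather than exactly 1.2.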


Question 10: How do you interpret the coefficients in a simple linear regression model?
- In a simple linear regression model, the goal is to describe the relationship between a dependent variable Y and a single independent variable X using a straight line:

$$Y = \beta_0 + \beta_1 X$$

- Where: Y is the dependent variable (outcome), X is the independent variable (predictor), $\beta_0$ is the intercept, and $\beta_1$ is the slope (coefficient).
- Interpreting the Coefficients:
- Intercept ($\beta_0$): This is the expected value of Y when X = 0. How useful this is depends on whether X = 0 lies within the range of the observed data:
 - Meaningful if X = 0 is in or near the range of the data.
 - Less meaningful or even misleading if X = 0 is outside the data range (extrapolation).
- Slope ($\beta_1$): This is the change in the expected value of Y for a one-unit increase in X. For each additional unit of X, Y is expected to increase (or decrease, if $\beta_1 < 0$) by $|\beta_1|$ units, on average.

Example: If $\beta_1 = 2$, then for each 1-unit increase in X, Y is expected to increase by 2 units on average.
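
The slope interpretation can be verified with a quick calculation. In this sketch the slope of 2 comes from the example above, while the intercept of 1.0 and the function name `predict` are assumptions made only so the line can be evaluated.

```python
def predict(x, b0, b1):
    """Value of the regression line b0 + b1 * x."""
    return b0 + b1 * x


b1 = 2.0  # slope from the example
b0 = 1.0  # hypothetical intercept, chosen only for illustration

# The change in the prediction for a one-unit increase in x is always b1,
# regardless of the starting value of x
changes = [predict(x + 1, b0, b1) - predict(x, b0, b1) for x in [0, 3, 10.5]]
print(f"Change per unit of x: {changes}")

# The intercept is the prediction at x = 0
at_zero = predict(0, b0, b1)
print(f"Prediction at x = 0: {at_zero}")
```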