# Supervised Learning: Regression Models and Performance Metrics

1. What is Simple Linear Regression (SLR)? Explain its purpose.
- Simple Linear Regression is a simple way to understand how two things are related. It looks at one factor (like study hours) and sees how it affects another factor (like marks). We collect data, then draw the “best possible straight line” through it. That line shows the general trend — whether things move up together, move in opposite directions, or hardly change at all. Once the line is drawn, we can also use it to make rough predictions, like guessing marks based on study time. In short, Simple Linear Regression helps us see the pattern between two variables and use that pattern to explain or predict what might happen next.

2. What are the key assumptions of Simple Linear Regression?
- Simple Linear Regression works under several important assumptions. First, it assumes that there is a linear relationship between the independent variable (X) and the dependent variable (Y). The errors or residuals should be independent of each other and should have constant variance, a property called homoscedasticity. It is also assumed that these residuals are approximately normally distributed, especially when we want to perform hypothesis testing or build confidence intervals. The model further assumes that there are no serious outliers that can unduly influence the regression line, and that the independent variable is measured reliably and the model is correctly specified. When these assumptions are reasonably satisfied, the results from Simple Linear Regression become more accurate, meaningful, and trustworthy.

3. Write the mathematical equation for a simple linear regression model and
explain each term.
- The mathematical equation for a simple linear regression model is Y = a + bX + e. In simple words, this equation describes a straight line that best fits the data. Here, Y is the value we want to predict (like marks), and X is the factor we use to predict it (like study hours). The term a is the intercept: it tells us the value of Y when X is zero, which is where the line crosses the Y-axis. The term b is the slope: it shows how much Y changes when X increases by one unit; for example, how many extra marks are gained for each extra hour of study. Finally, e represents the error (or residual), which is the difference between the actual value and the value predicted by the line, because real-life data rarely falls perfectly on the line. Altogether, this equation helps us understand the relationship between two variables and make sensible predictions.
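
Written out for a single observation, the equation can be split into a fitted part and a leftover part. The subscript i and the hat notation are conventions assumed here for illustration; they do not appear in the equation as stated above.

$$
\begin{aligned}
y_i &= a + b\,x_i + e_i \\
\hat{y}_i &= a + b\,x_i \\
e_i &= y_i - \hat{y}_i
\end{aligned}
$$

As a purely hypothetical illustration, with a = 20 and b = 5, a student who studies 4 hours would be predicted to score 20 + 5 × 4 = 40 marks; if that student actually scored 43, the residual would be 3.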

4. Provide a real-world example where simple linear regression can be
applied.
- A real-world example of simple linear regression is predicting students’ marks from the number of hours they study. Suppose we collect data from many students — how long they studied (X) and how many marks they scored (Y). When we plot this data, we usually see that students who study more tend to score higher. Using simple linear regression, we draw the best-fitting straight line through these points. That line helps us understand the pattern and also predict, for example, what marks a student might get if they study 2 hours, 4 hours, or 6 hours. The model won’t be perfect — because motivation, teaching quality, health, and many other factors also matter — but it still gives a useful estimate and shows the general trend: more study time is usually linked with better marks.

5. What is the method of least squares in linear regression?
- The method of least squares is the technique used in linear regression to find the “best-fitting” straight line through the data. When we draw a regression line, most data points do not fall exactly on it; there is some gap, called the error or residual (the difference between the actual value and the predicted value). The least squares method squares each of those gaps and then chooses the line for which the sum of the squared errors is as small as possible. We square the errors so that negatives don’t cancel positives and so that bigger mistakes are penalized more. In simple words, least squares gives the line that stays as close as possible to all the data points overall, where closeness is measured by the total squared error.
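
The idea can be sketched in plain Python using the standard closed-form solution for one predictor. This is a minimal illustration and not how scikit-learn implements it internally; the function names `least_squares_fit` and `sum_squared_errors` are made up for this sketch, and the data are the same five points used in the scikit-learn example in this document.

```python
# Same example data as the scikit-learn cell in this document.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def least_squares_fit(x, y):
    """Return (intercept, slope) of the line that minimizes the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_errors(x, y, intercept, slope):
    """Total of the squared gaps between actual and predicted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


a, b = least_squares_fit(x, y)
best_sse = sum_squared_errors(x, y, a, b)
print("Intercept:", round(a, 4), "Slope:", round(b, 4), "SSE:", round(best_sse, 4))

# Any other line should give a larger total squared error.
print("SSE with a nudged slope:", round(sum_squared_errors(x, y, a, b + 0.1), 4))
```

Nudging the slope or intercept away from the fitted values increases the total squared error, which is exactly what "least squares" means.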

6. What is Logistic Regression? How does it differ from Linear Regression?
- Logistic Regression is a statistical method used when the outcome is categorical — usually questions like yes/no, pass/fail, disease/no disease, spam/not spam. Instead of drawing a straight line, logistic regression predicts the probability that something will happen (between 0 and 1). It uses an S-shaped curve called the logistic (sigmoid) function to keep predictions within that range. Based on the probability, we then decide the class (for example, if probability ≥ 0.5 → “yes”, otherwise “no”).

Linear Regression, on the other hand, is used when the outcome is numeric, such as marks, salary, height, price, or temperature. It fits a straight line to estimate a continuous value and predictions can be any number — even negative or very large.

In simple words:
- Linear Regression predicts a number.
- Logistic Regression predicts a probability (and then a category).
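
A minimal sketch of how the sigmoid turns a straight-line score into a probability and then a class is shown below. The coefficients (`intercept = -4.0`, `slope = 1.0`) are hypothetical values chosen for illustration, not fitted from any data, and `predict_class` is a made-up helper name.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x, intercept, slope, threshold=0.5):
    """Return (probability, label) for a single input value x."""
    probability = sigmoid(intercept + slope * x)
    label = "yes" if probability >= threshold else "no"
    return probability, label


# Hypothetical coefficients for "pass/fail from study hours" (assumed, not fitted).
intercept, slope = -4.0, 1.0

for hours in (2, 4, 6):
    probability, label = predict_class(hours, intercept, slope)
    print(hours, "hours -> probability", round(probability, 3), "->", label)
```

The linear part `intercept + slope * x` can be any number, just as in linear regression; it is the sigmoid that squeezes it into a probability.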

7. Name and briefly describe three common evaluation metrics for regression
models.
- Mean Absolute Error (MAE): Measures the average size of the mistakes, ignoring their sign. It tells you, on average, how far predictions are from the true values. It is easy to interpret: “On average, my predictions are off by about 3 units.”
- Mean Squared Error (MSE): Squares the errors before averaging them. Because of the squaring, large mistakes are punished more heavily. It is useful when big errors are especially bad, but harder to interpret because it is in squared units.
- R-squared (Coefficient of Determination): Tells you how much of the variation in the target your model explains. It usually falls between 0 and 1 (higher is better), although it can be negative when a model predicts worse than simply using the mean. Example: R² = 0.80 means the model explains 80% of the variation in the data.
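
The three metrics can be computed by hand in a few lines. The sketch below uses plain Python so each step is visible; the true values are the example data from the scikit-learn cell in this document, and the predictions are assumed to come from the fitted line reported there (intercept 2.2, slope 0.6). scikit-learn offers equivalent functions in `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (unexplained variation / total variation around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Actual values from the example data; predictions from the line y = 2.2 + 0.6x.
y_true = [2, 4, 5, 4, 5]
y_pred = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]

mae = mean_absolute_error(y_true, y_pred)   # about 0.64
mse = mean_squared_error(y_true, y_pred)    # about 0.48
r2 = r_squared(y_true, y_pred)              # about 0.60

print("MAE:", round(mae, 2))
print("MSE:", round(mse, 2))
print("R-squared:", round(r2, 2))
```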

8. What is the purpose of the R-squared metric in regression analysis?
- The purpose of the R-squared (R²) metric in regression analysis is to measure how well the regression model explains the variation in the dependent variable. In other words, it tells us what share of the variation in Y is explained by X (or by the predictors in the model). For a least-squares line with an intercept, evaluated on the data it was fitted to, R² lies between 0 and 1: a value close to 0 means the model explains very little and does not fit the data well, while a value close to 1 means the model explains most of the variation and fits the data well. On new data, R² can even be negative if the model predicts worse than simply using the mean of Y. For example, an R² of 0.75 means that 75% of the variation in the outcome is explained by the model and 25% is still unexplained. So, the main purpose of R-squared is to evaluate how well the regression model describes the relationship between the variables.
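
The usual definition behind this interpretation is written below. The symbols (ŷ for a predicted value, ȳ for the mean of the observed values, and the labels SS_res and SS_tot) are standard notation assumed here rather than taken from the text above.

$$
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

If the model predicts no better than the mean, the two sums are equal and R² = 0; if every prediction is exact, the numerator is zero and R² = 1.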

10. How do you interpret the coefficients in a simple linear regression model?
- In a simple linear regression model, the coefficients tell us how the variables are related. The equation is usually written as Y = a + bX, where a is the intercept and b is the slope. The intercept (a) represents the predicted value of Y when X is zero; it shows where the regression line crosses the Y-axis. The slope (b) tells us how much Y is expected to change when X increases by one unit. If the slope is positive, Y increases as X increases; if it’s negative, Y decreases as X increases. In simple words, the coefficients explain the direction and the size of the expected change in the outcome for each one-unit change in the predictor. (How closely the points follow the line is measured by R², not by the slope.)
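
A small sketch of this reading, assuming the coefficients reported by the scikit-learn example in this document (intercept 2.2, slope 0.6); the helper name `predict` is made up for illustration.

```python
# Coefficients taken from the scikit-learn output shown in this document.
intercept = 2.2
slope = 0.6


def predict(x):
    """Predicted Y for a given X on the fitted line Y = a + bX."""
    return intercept + slope * x


# The intercept is the prediction at X = 0.
at_zero = predict(0)

# Moving X up by one unit changes the prediction by exactly the slope.
change_per_unit = predict(4) - predict(3)

print("Prediction at X = 0:", round(at_zero, 2))
print("Change in prediction from X = 3 to X = 4:", round(change_per_unit, 2))
```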

```python
# Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
# X must be 2-D (n_samples, n_features)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope (coefficient) and intercept
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```

```
Slope: 0.6
Intercept: 2.2
```
