# Supervised Learning: Regression Models and Performance Metrics Solution 

Q 1. What is Simple Linear Regression (SLR)? Explain its purpose.

->   Simple Linear Regression is a statistical and supervised learning method used to study how one independent variable affects one dependent variable. It tries to draw a straight line that best fits the given data points and represents the overall trend. The model helps in understanding whether the relationship between two variables is positive, negative, or weak.

The purpose of SLR is not only to make predictions but also to analyze how strongly one factor influences another. It is widely used in research, academic studies, economics, and business forecasting. In many cases, SLR provides a simple yet powerful explanation of real-world behavior by converting raw data into meaningful relationships. This makes it one of the most foundational techniques in machine learning and statistics.

Q 2. What are the key assumptions of Simple Linear Regression? 

->   SLR relies on several assumptions that make its estimates and inferences trustworthy. The first is linearity: the dependent and independent variables must have a linear relationship. Without it, the regression line will not represent the true pattern.
Another assumption is that the residuals follow a normal distribution, which is needed for accurate confidence intervals and hypothesis tests. Homoscedasticity, meaning constant variance of the errors, is also essential because unequal variance makes the standard errors, and therefore the confidence intervals, unreliable.

The errors should be independent, meaning the error at one data point should not influence the error at another. Lastly, the independent variable must be measured correctly because errors in X can distort the regression results. These assumptions together ensure a strong and reliable regression model.
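
As a rough illustration of how some of these assumptions can be checked, the sketch below fits a line to simulated data and inspects the residuals. The data, the seed, and the split into a lower and an upper half are assumptions made only for this example; the Durbin-Watson statistic is one common check for independence, not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Hypothetical data: a linear trend plus independent, constant-variance noise
x = np.linspace(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line (np.polyfit returns slope first, then intercept)
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Homoscedasticity: residual spread should be similar across the range of x
half = x.size // 2
spread_low = residuals[:half].std()
spread_high = residuals[half:].std()

# Independence: Durbin-Watson statistic, close to 2 when errors are uncorrelated
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(f"mean residual: {residuals.mean():.4f}")
print(f"residual spread (low x, high x): {spread_low:.2f}, {spread_high:.2f}")
print(f"Durbin-Watson: {durbin_watson:.2f}")
```

On real data, a residual-versus-X plot and a normal Q-Q plot of the residuals are the usual visual complements to these numbers.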

Q 3. Write the mathematical equation for a simple linear regression model and explain each term. 

->  The equation for a simple linear regression model is:
Y = β0 + β1X + ε

Here, Y is the dependent variable whose value we want to predict based on X. X is the independent variable that influences Y. β0, the intercept, is the point where the regression line crosses the Y-axis, that is, the expected value of Y when X equals zero.

β1 is the slope of the regression line and indicates how much Y changes when X increases by one unit. The term ε represents the error or disturbance in the model, which captures the effects of factors not included in the equation.

This equation helps convert real-life relationships into a mathematical form. It allows us to analyze trends, calculate predictions, and interpret data more clearly. The simplicity of this equation is what makes SLR widely used in various fields.
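
To make the terms concrete, the sketch below splits one observation into its fitted part and its error. The coefficient values and the observation are illustrative assumptions, not estimates from any particular dataset.

```python
# Illustrative (assumed) coefficients for Y = beta0 + beta1 * X + epsilon
beta0 = 2.2  # intercept: expected Y when X is zero
beta1 = 0.6  # slope: change in Y per one-unit increase in X

# One hypothetical observation
x_obs, y_obs = 3, 5

# Fitted part of the equation: beta0 + beta1 * X
y_hat = beta0 + beta1 * x_obs

# Error term: whatever the line does not explain for this observation
epsilon = y_obs - y_hat

print(f"fitted value: {y_hat:.2f}")
print(f"error term:   {epsilon:.2f}")
```

Adding `y_hat` and `epsilon` recovers the observed value, which is what the equation states.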

Q 4. Provide a real-world example where simple linear regression can be applied.

->  A practical example of simple linear regression is predicting house prices based on the area of the house. In most cases, larger houses have higher prices, so the size acts as the independent variable, and the price is the dependent variable. By taking historical data of house areas and prices, a regression line can be created to show the overall pattern.

This line can then be used to estimate the price of any new house just by knowing its size. Apart from real estate, SLR can also be used in education to predict student performance based on study hours.
Similarly, in business, companies use it to forecast sales based on advertising expenditure. These examples show how SLR helps in decision-making and planning using simple yet effective models.
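
The house-price case can be sketched as follows. All areas and prices are made-up values for illustration, and `np.polyfit` is used here simply as a convenient way to fit a straight line.

```python
import numpy as np

# Hypothetical historical data: area in square feet, price in thousands
area = np.array([800, 1000, 1200, 1500, 1800, 2000], dtype=float)
price = np.array([160, 195, 240, 310, 355, 400], dtype=float)

# Fit price = intercept + slope * area (polyfit returns slope first)
slope, intercept = np.polyfit(area, price, 1)

# Estimate the price of a new house from its area alone
new_area = 1600.0
estimated_price = intercept + slope * new_area

print(f"price change per extra square foot: {slope:.3f} thousand")
print(f"estimated price for {new_area:.0f} sq ft: {estimated_price:.1f} thousand")
```

The estimate is only meaningful for areas within or close to the range of the historical data.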

Q 5. What is the method of least squares in linear regression? 

->  The method of least squares is a mathematical approach used to determine the best-fitting regression line for a dataset. It works by calculating the difference between actual values and predicted values, then squaring these errors. Squaring ensures that both positive and negative differences are treated equally and that larger errors receive greater penalty.

The regression line chosen is the one for which the total squared error is the smallest, which minimizes the overall prediction error on the data used for fitting.
Least squares is widely used because it is simple, computationally efficient, and has a closed-form solution. Because errors are squared, however, it is sensitive to outliers. It is also the foundation of many more advanced regression techniques.
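
For a single predictor, the least-squares slope and intercept can be computed directly from the data. The sketch below uses a small illustrative dataset; the variable names are chosen for this example.

```python
import numpy as np

# Small illustrative dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form least-squares estimates for one predictor
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Sum of squared errors for the fitted line
residuals = y - (intercept + slope * x)
sse = np.sum(residuals ** 2)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, SSE={sse:.2f}")
# prints: slope=0.60, intercept=2.20, SSE=2.40
```

Any other straight line through these points gives a larger SSE than 2.40, which is what "best fit" means under this criterion.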

Q 6. What is Logistic Regression? How does it differ from Linear Regression?

->  Logistic Regression is a supervised learning algorithm that is mainly used for classification tasks rather than numerical prediction. It predicts the probability of an event occurring, usually in the form of binary outcomes such as 0 and 1. The model applies a sigmoid function that converts any input into a value between 0 and 1.

In contrast, Linear Regression predicts continuous values like height, salary, or temperature. Logistic Regression predicts categories like approved/rejected or pass/fail. Linear Regression uses a straight-line relationship, while Logistic Regression forms an S-shaped probability curve.

These differences make Logistic Regression suitable for situations where the goal is to classify data instead of predicting numerical outcomes. It is widely used in medical diagnosis, email spam filtering, and risk prediction.
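
The sigmoid function mentioned above can be sketched in a few lines. The input scores and the 0.5 decision threshold are assumptions for illustration; 0.5 is a common default, not a requirement.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores (beta0 + beta1 * x) for three inputs
z = np.array([-4.0, 0.0, 4.0])

# Probabilities of the positive class
probs = sigmoid(z)

# Convert probabilities to class labels with an assumed 0.5 threshold
labels = (probs >= 0.5).astype(int)

print(probs.round(3))
print(labels)
```

Large negative scores map close to 0, large positive scores map close to 1, and a score of zero maps to exactly 0.5, which produces the S-shaped curve.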

Q 7. Name and briefly describe three common evaluation metrics for regression models. 

-> 1. Mean Absolute Error (MAE): MAE calculates the average of absolute errors between predicted and actual values. It is easy to understand because it shows real differences without squaring them.

2. Mean Squared Error (MSE): MSE takes the square of errors and then averages them. Squaring errors gives more importance to larger mistakes, which makes MSE sensitive to outliers.

3. R-squared: This metric tells how much of the variation in the dependent variable can be explained by the model. A higher R-squared value indicates a stronger relationship between the variables and a better fit of the model.

These metrics together help determine whether the model is accurate, reliable, and suitable for prediction.
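
The three metrics can be computed by hand as in the sketch below. The actual and predicted values are made up for illustration. scikit-learn offers equivalent functions in `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])

errors = y_true - y_pred

# Mean Absolute Error: average size of the errors
mae = np.mean(np.abs(errors))

# Mean Squared Error: average of squared errors, so the error of 3 dominates
mse = np.mean(errors ** 2)

# R-squared: 1 minus (unexplained variation / total variation)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE={mae:.2f}, MSE={mse:.2f}, R2={r2:.2f}")
# prints: MAE=1.25, MSE=2.75, R2=0.45
```

The single error of 3 contributes 9 of the 11 squared-error units, which shows why MSE reacts more strongly to large mistakes than MAE does.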

Q 8. What is the purpose of the R-squared metric in regression analysis?

->  R-squared is used to measure the goodness of fit of a regression model. It shows what proportion of the total variation in the dependent variable is explained by the independent variable. A higher R-squared value indicates that the model fits the data more closely.
For example, an R-squared of 0.90 means that 90 percent of the variation in the dependent variable is explained by the model. This makes R-squared an important indicator for evaluating model performance.

However, R-squared alone cannot establish that a model is good: a high value does not rule out violated assumptions or overfitting, so it should be studied along with other metrics. Still, it remains one of the most frequently used evaluation tools in regression analysis.
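
For reference, the usual definition of R-squared can be written out as below. The symbols ŷ (predicted value) and ȳ (mean of the observed values) are notation introduced for this note. The worked numbers use the sample data and fitted line from Q 9.

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}

% Worked example with X = 1..5, Y = 2, 4, 5, 4, 5 and \hat{y} = 2.2 + 0.6x:
% SS_res = 0.64 + 0.36 + 1.00 + 0.36 + 0.04 = 2.4
% SS_tot = 4 + 0 + 1 + 0 + 1 = 6
R^2 = 1 - \frac{2.4}{6} = 0.6
```

Here the line explains 60 percent of the variation in Y, and the remaining 40 percent is left in the residuals.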

Q 9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept. 
(Include your Python code and output in the code box below.)

-> Python Code:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data: X must be 2-D (n_samples, n_features) for scikit-learn
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Create the model and fit it to the data
model = LinearRegression()
model.fit(X, y)

# Print the fitted slope and intercept
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

```
Slope: 0.6
Intercept: 2.2
```


Q 10. How do you interpret the coefficients in a simple linear regression model?

->   In a regression model, the slope and intercept tell us how the two variables are related. The slope (β1) is the expected change in the dependent variable when the independent variable increases by one unit. A positive slope means the dependent variable tends to increase as the independent variable increases, while a negative slope means it tends to decrease.

The intercept (β0) shows the expected value of the dependent variable when the independent variable is zero. Although the intercept may not always have real-world meaning, it is important for constructing the regression line.

These coefficients together help in understanding the trend, analyzing the direction and size of the relationship, and making predictions. They form the core interpretation of any simple linear regression model.
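
The interpretation above can be checked numerically, as in the sketch below. It refits the small sample dataset with `np.polyfit`; the helper `predict` is a hypothetical name used only in this example.

```python
import numpy as np

# Small sample dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a straight line (polyfit returns slope first, then intercept)
slope, intercept = np.polyfit(x, y, 1)

def predict(value):
    """Predicted Y for a given X using the fitted line."""
    return intercept + slope * value

# Slope: the prediction changes by exactly this much per one-unit step in X
change_per_unit = predict(4.0) - predict(3.0)

# Intercept: the prediction when X is zero
value_at_zero = predict(0.0)

# Sign of the slope gives the direction of the relationship
direction = "increases" if slope > 0 else "decreases" if slope < 0 else "stays flat"

print(f"change in Y per unit of X: {change_per_unit:.2f}")
print(f"predicted Y at X = 0:      {value_at_zero:.2f}")
print(f"Y {direction} as X increases")
```

Because X = 0 lies outside the observed range of 1 to 5, the intercept here is an extrapolation, which is why it may lack real-world meaning.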