# **Supervised Learning: Regression Models and Performance Metrics**

1. What is Simple Linear Regression (SLR)? Explain its purpose.

    - Simple Linear Regression (SLR) is a statistical method used to study the relationship between *one independent variable (X)* and *one dependent variable (Y)*. It assumes that the relationship between X and Y can be represented by a straight line.

The mathematical form of Simple Linear Regression is:

$$
Y = \beta_0 + \beta_1 X + \varepsilon
$$

where:

* $Y$ = dependent variable
* $X$ = independent variable
* $\beta_0$ = intercept
* $\beta_1$ = slope of the line
* $\varepsilon$ = error term

***Purpose of Simple Linear Regression:***

* To *predict* the value of the dependent variable based on the independent variable
* To *measure the strength and direction* of the relationship between two variables
* To understand how changes in the independent variable affect the dependent variable
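To make the roles of $\beta_0$, $\beta_1$ and $\varepsilon$ concrete, the sketch below simulates data from the model with assumed values $\beta_0 = 3$, $\beta_1 = 2$ and normally distributed noise, then fits a straight line with NumPy's `np.polyfit`. The parameter values and simulated data are illustrative assumptions, not data from these notes; the point is that the fitted line recovers values close to the true intercept and slope, while the noise term explains why the points do not lie exactly on the line.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Assumed "true" parameters, chosen only for illustration
true_beta_0, true_beta_1 = 3.0, 2.0

x = rng.uniform(0, 10, size=200)             # independent variable X
epsilon = rng.normal(0, 1, size=200)         # error term: mean 0, standard deviation 1
y = true_beta_0 + true_beta_1 * x + epsilon  # Y = beta_0 + beta_1 * X + epsilon

# np.polyfit with degree 1 returns the coefficients as [slope, intercept]
est_beta_1, est_beta_0 = np.polyfit(x, y, 1)
print(f"estimated intercept: {est_beta_0:.3f}, estimated slope: {est_beta_1:.3f}")
```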


2. What are the key assumptions of Simple Linear Regression?
    
   The key assumptions of Simple Linear Regression are:

   1. ***Linearity:*** There is a linear relationship between the independent variable (X) and the dependent variable (Y).

   2. ***Independence of errors:*** The residuals (errors) are independent of each other.

   3. ***Homoscedasticity:*** The variance of the errors is constant for all values of X.

   4. ***Normality of errors:*** The error terms are normally distributed. This assumption matters mainly for confidence intervals and hypothesis tests on the coefficients, not for computing the least-squares estimates themselves.

   5. ***No multicollinearity:*** Since SLR has only one independent variable, multicollinearity is not an issue (this assumption mainly applies to multiple regression).
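These assumptions are usually checked by examining the residuals after fitting. The sketch below computes three rough indicators with NumPy only: the Durbin-Watson statistic (values near 2 suggest uncorrelated residuals, relevant to independence), the ratio of residual variances for the upper and lower halves of X (values near 1 are consistent with homoscedasticity), and the sample skewness of the residuals (values near 0 are consistent with a symmetric, possibly normal, error distribution). These are heuristics rather than formal tests; the function name `residual_diagnostics` and the simulated data are hypothetical. In practice, residual plots and formal tests such as Breusch-Pagan or Shapiro-Wilk are commonly used instead.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Rough, illustrative residual checks for a simple linear regression (not formal tests)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)

    # Independence: Durbin-Watson statistic on residuals in observation order
    # (near 2 = little first-order autocorrelation)
    durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

    # Homoscedasticity: residual variance for the upper half of x divided by
    # that of the lower half (near 1 = similar spread)
    order = np.argsort(x)
    half = len(x) // 2
    variance_ratio = residuals[order[half:]].var() / residuals[order[:half]].var()

    # Normality: sample skewness of the residuals (near 0 = symmetric)
    z = (residuals - residuals.mean()) / residuals.std()
    skewness = np.mean(z ** 3)

    return {"durbin_watson": durbin_watson, "variance_ratio": variance_ratio, "skewness": skewness}

# Simulated data that satisfies the assumptions (assumed parameters, for illustration only)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=300)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=300)

diagnostics = residual_diagnostics(x, y)
print(diagnostics)
```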


3.  Write the mathematical equation for a simple linear regression model and
explain each term.

     The mathematical equation of a simple linear regression model is:

     $$
     Y = \beta_0 + \beta_1 X + \varepsilon
     $$

**Where:**

* $Y$ = Dependent variable (the variable to be predicted)
* $X$ = Independent variable (the predictor variable)
* $\beta_0$ = Intercept (value of Y when X = 0)
* $\beta_1$ = Slope of the regression line (change in Y for a one-unit change in X)
* $\varepsilon$ = Error term (random deviation of Y from the true regression line; its sample counterpart, the residual, is the difference between the actual and predicted values)
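A small worked example, using hypothetical coefficient values $\beta_0 = 1.5$ and $\beta_1 = 0.8$ (chosen for illustration, not estimated from any data), shows how each term is used: the intercept is the prediction at $X = 0$, the slope is the change in the prediction when $X$ grows by one unit, and the residual is the observed value minus the predicted one.

```python
# Hypothetical coefficients, chosen only for illustration
beta_0 = 1.5  # intercept
beta_1 = 0.8  # slope

def predict(x):
    """Predicted Y from the line: beta_0 + beta_1 * x."""
    return beta_0 + beta_1 * x

# Hypothetical observation
x_obs, y_obs = 10.0, 10.2

y_hat = predict(x_obs)     # 1.5 + 0.8 * 10 = 9.5
residual = y_obs - y_hat   # 10.2 - 9.5, approximately 0.7

print("prediction at X = 0:", predict(0.0))  # equals the intercept, 1.5
print("change for one extra unit of X:", predict(11.0) - predict(10.0))  # approximately 0.8
print("predicted Y:", y_hat, "residual:", residual)
```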

4.  Provide a real-world example where simple linear regression can be
applied.
    
     - A real-world example of simple linear regression is *predicting house prices based on house size*.

  * **Independent variable (X)**: Size of the house (in square feet)
  * **Dependent variable (Y)**: Price of the house

  Simple linear regression estimates how much the house price changes, on average, for each additional square foot of size.
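A minimal sketch of this example with scikit-learn's `LinearRegression` (the same class used in question 9) is shown below. The five size/price pairs are invented purely for illustration, with prices in thousands of dollars; real house prices depend on many more factors than size.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size in square feet and price in thousands of dollars
size_sqft = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
price_k = np.array([200, 260, 310, 370, 420])

model = LinearRegression()
model.fit(size_sqft, price_k)

slope = model.coef_[0]        # thousands of dollars per extra square foot
intercept = model.intercept_  # thousands of dollars

# Predicted price for a 2,200 sq ft house (input must be 2-D)
predicted = model.predict(np.array([[2200]]))[0]

print(f"Each extra square foot adds about ${slope * 1000:.0f} to the price")
print(f"Predicted price for 2,200 sq ft: ${predicted:.0f}k")
```

For these numbers the least-squares slope is 0.11 (about $110 per square foot) and the intercept is 92, so a 2,200 sq ft house is predicted at about $334k. The intercept itself has no practical meaning here, since a house of 0 sq ft lies far outside the data.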


5. What is the method of least squares in linear regression?
    
    - The ***method of least squares*** is a technique used to estimate the parameters of a linear regression model. It determines the best-fitting regression line by ***minimizing the sum of the squares of the residuals*** (errors).

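      Residuals are the differences between the observed values and the predicted values.

      Squaring the residuals makes every error count positively and penalizes large errors more heavily than small ones. The resulting line is the one with the smallest possible sum of squared residuals for the given data, which is what "best-fitting" means here.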
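For simple linear regression, minimizing the sum of squared residuals has a closed-form solution: $\hat\beta_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch below implements these two formulas directly in NumPy on a small hypothetical dataset, checks the result against `np.polyfit`, and shows that nudging the slope away from the estimate increases the sum of squared residuals. The helper names are hypothetical.

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least-squares estimates (beta_0, beta_1) for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1

def sum_squared_residuals(x, y, beta_0, beta_1):
    """Sum of squared differences between observed and predicted values."""
    residuals = np.asarray(y, dtype=float) - (beta_0 + beta_1 * np.asarray(x, dtype=float))
    return np.sum(residuals ** 2)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.3, 7.9, 10.1]

b0, b1 = least_squares_fit(x, y)
print("intercept:", b0, "slope:", b1)
print("SSR at the estimate:", sum_squared_residuals(x, y, b0, b1))
print("SSR with slope + 0.1:", sum_squared_residuals(x, y, b0, b1 + 0.1))
```

For these numbers the formulas give $\hat\beta_1 = 19.6 / 10 = 1.96$ and $\hat\beta_0 = 6.12 - 1.96 \times 3 = 0.24$.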


6. What is Logistic Regression? How does it differ from Linear Regression?
     
    - **Logistic Regression** is a statistical technique used for *classification* problems, where the dependent variable is **binary** (e.g., Yes/No, 0/1, True/False). It estimates the ***probability*** of an event occurring using the *logistic (sigmoid) function*, which outputs values between 0 and 1.

     ***Differences between Logistic Regression and Linear Regression:***

| Linear Regression                         | Logistic Regression                            |
| ----------------------------------------- | ---------------------------------------------- |
| Used for predicting *continuous values* | Used for predicting *categorical outcomes*   |
| Output can be any real number             | Output is a probability between 0 and 1        |
| Uses a *straight line* equation         | Uses a *sigmoid (S-shaped) curve*            |
| Solved using *least squares method*     | Solved using *maximum likelihood estimation* |
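
The sigmoid function $\sigma(z) = 1 / (1 + e^{-z})$ turns the linear score $z = \beta_0 + \beta_1 X$ into a probability. The sketch below fits scikit-learn's `LogisticRegression` to a small hypothetical pass/fail dataset (hours studied vs. outcome) and checks that `predict_proba` matches the sigmoid of the fitted linear score. Note that scikit-learn applies L2 regularization by default (`C=1.0`), so its coefficients are penalized maximum-likelihood estimates rather than the plain maximum-likelihood solution.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: hours studied (X) and exam outcome (Y: 1 = pass, 0 = fail)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

clf = LogisticRegression()  # default settings, including L2 regularization
clf.fit(X, y)

# predict_proba returns one row per sample: [P(Y=0), P(Y=1)]; column 1 is P(pass)
X_new = np.array([[2], [7]])
p_pass = clf.predict_proba(X_new)[:, 1]

# The same probabilities by hand: sigmoid of the linear score beta_0 + beta_1 * x
z = clf.intercept_[0] + clf.coef_[0, 0] * X_new.ravel()

print("P(pass) via predict_proba:", p_pass)
print("P(pass) via sigmoid:      ", sigmoid(z))
print("Predicted classes:", clf.predict(X_new))  # class 1 when P(pass) > 0.5
```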

---


7. Name and briefly describe three common evaluation metrics for regression
models.
   
   1. ***Mean Absolute Error (MAE):***
   Measures the average of the absolute differences between actual and predicted values. Lower MAE indicates better model performance.

   2. ***Mean Squared Error (MSE):***
   Calculates the average of the squared differences between actual and predicted values. It penalizes larger errors more heavily.

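   3. ***R-squared (Coefficient of Determination):***
   Indicates the proportion of the variation in the dependent variable that is explained by the model. For a least-squares fit with an intercept, evaluated on the data it was fitted to, it lies between 0 and 1; on new data it can even be negative. Higher values indicate a better fit.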
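The three metrics can be computed by hand with NumPy and cross-checked against `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`). The actual and predicted values below are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

errors = y_true - y_pred                        # [0.5, 0.0, -0.5, -1.0]
mae = np.mean(np.abs(errors))                   # (0.5 + 0 + 0.5 + 1) / 4 = 0.5
mse = np.mean(errors ** 2)                      # (0.25 + 0 + 0.25 + 1) / 4 = 0.375
ss_res = np.sum(errors ** 2)                    # residual sum of squares = 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares = 20.0
r2 = 1 - ss_res / ss_tot                        # 1 - 1.5 / 20 = 0.925

print("MAE:", mae, mean_absolute_error(y_true, y_pred))
print("MSE:", mse, mean_squared_error(y_true, y_pred))
print("R^2:", r2, r2_score(y_true, y_pred))
```

The single error of size 1 accounts for two thirds of the squared-error total (1 out of 1.5) but only half of the absolute-error total (1 out of 2), which is the sense in which MSE penalizes large errors more heavily.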

8. What is the purpose of the R-squared metric in regression analysis?
    
    - The *R-squared (R²)* metric measures how well a regression model explains the variability of the dependent variable.

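* It represents the *proportion of variance in the dependent variable that is explained by the independent variable(s)*.
* For a least-squares model with an intercept, evaluated on its training data, R-squared ranges from *0 to 1*:

  * *R² = 0* → the model explains none of the variability (it does no better than predicting the mean of Y)
  * *R² = 1* → the model explains all the variability (every prediction matches the observed value)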

***Purpose of R-squared:***

* To evaluate the *goodness of fit* of a regression model
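* To compare regression models fitted to the same data (adjusted R² is preferred when the models have different numbers of predictors, because R² never decreases when a predictor is added)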
* To understand how well the model explains the data
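R² can be written as $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the total sum of squares around the mean of Y. The sketch below, on hypothetical data, checks two consequences of that definition: for simple linear regression fitted by least squares with an intercept, R² equals the square of the Pearson correlation between X and Y; and a model that always predicts the mean of Y scores exactly 0, while predictions worse than the mean give a negative value. The helper name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.8, 4.1, 5.9, 8.2, 9.9, 12.1])

slope, intercept = np.polyfit(x, y, 1)
r2_fit = r_squared(y, intercept + slope * x)       # least-squares line
r = np.corrcoef(x, y)[0, 1]                        # Pearson correlation
r2_mean = r_squared(y, np.full_like(y, y.mean()))  # always predict the mean
r2_bad = r_squared(y, y[::-1])                     # deliberately poor predictions

print(f"R^2 of fitted line: {r2_fit:.4f}, squared correlation: {r ** 2:.4f}")
print("R^2 when predicting the mean:", r2_mean)
print("R^2 of reversed predictions:", r2_bad)
```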


9. Write Python code to fit a simple linear regression model using scikit-learn
and print the slope and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable (2-D: n_samples x 1 feature)
Y = np.array([2, 4, 6, 8, 10])               # Dependent variable

model = LinearRegression()
model.fit(X, Y)

print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

```
Slope (Coefficient): 2.0
Intercept: 0.0
```


*Explanation:*

* `coef_` gives the *slope (β₁)* of the regression line
* `intercept_` gives the *intercept (β₀)*
* The model learns the relationship $Y = 2X$
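Once fitted, the same model can predict Y for new X values with `predict`, which computes $\beta_0 + \beta_1 X$ for each row, and `score` returns R² on the data it is given. The block below repeats the fit so that it runs on its own; the new X values 6 and 7.5 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2, 4, 6, 8, 10])
model = LinearRegression().fit(X, Y)  # fit() returns the fitted estimator

# New inputs must also be 2-D: one row per sample, one column per feature
X_new = np.array([[6.0], [7.5]])
Y_pred = model.predict(X_new)

# With a single feature, predict() is intercept + slope * X
Y_manual = model.intercept_ + model.coef_[0] * X_new[:, 0]

print("Predictions:", Y_pred)  # approximately [12. 15.]
print("R^2 on training data:", model.score(X, Y))
```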


10. How do you interpret the coefficients in a simple linear regression model?

     - In a simple linear regression model, the coefficients explain how the dependent variable changes with the independent variable. The model is written as:

     $$
     Y = \beta_0 + \beta_1 X + \varepsilon
     $$

* ***Intercept ($\beta_0$)***:
  It represents the expected value of the dependent variable *Y when the independent variable X is zero*. It shows the baseline level of Y. This interpretation is only meaningful when X = 0 is a realistic value within (or near) the range of the observed data.

* ***Slope ($\beta_1$)***:
  It indicates the *average change in Y for a one-unit increase in X*.

  * If $\beta_1 > 0$, Y increases as X increases (positive relationship).
  * If $\beta_1 < 0$, Y decreases as X increases (negative relationship).
  * If $\beta_1 = 0$, X has no linear effect on Y.

   *In summary*, the intercept shows the starting point, while the slope shows the direction and size of the relationship between X and Y. How closely the data follow the line is measured separately, for example by R².

   This interpretation helps in *understanding, predicting, and explaining* the relationship between variables in real-world problems.
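The interpretation can be checked numerically. The sketch below fits a line to hypothetical data (hours studied vs. exam score, invented for illustration) and confirms that raising X by one unit changes the prediction by exactly $\beta_1$, while the intercept is the prediction at X = 0, an extrapolation here because the observed hours run from 2 to 10.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied (X) and exam score (Y)
hours = np.array([2, 4, 6, 8, 10]).reshape(-1, 1)
score = np.array([55, 62, 70, 76, 85])

model = LinearRegression().fit(hours, score)
beta_0, beta_1 = model.intercept_, model.coef_[0]

# Effect of one extra hour: prediction at x + 1 minus prediction at x
p = model.predict(np.array([[5.0], [6.0]]))
one_unit_change = p[1] - p[0]

print(f"Intercept: {beta_0:.2f} (predicted score with 0 hours, outside the observed range)")
print(f"Slope: {beta_1:.2f} points per extra hour")
print(f"Change in prediction from 5 to 6 hours: {one_unit_change:.2f}")
```

For these numbers the least-squares estimates are $\beta_0 = 47.4$ and $\beta_1 = 3.7$, so each additional hour is associated with about 3.7 more points on average.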