## **Assignment Code: D-AG-008**

### **Supervised Learning: Regression Models and Performance Metrics | Solution**

### **Q1. What is Simple Linear Regression (SLR)? Explain its purpose.**

**Answer**:

Simple linear regression (SLR) is a statistical method used to find a linear relationship between two continuous variables. It aims to model how a dependent variable, also known as the response variable, is affected by a single independent variable, also called the explanatory or predictor variable. The model is "simple" because it only involves one independent variable. The goal is to create a straight line that best fits the data, which can then be used to predict future outcomes or understand the strength and direction of the relationship.

It assumes a linear relationship between them, represented by the equation:

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$

Where:

* Y = dependent variable

* X = independent variable

* β₀ = intercept

* β₁ = slope coefficient

* ε = error term

It is primarily used for prediction and trend estimation.

**Purpose**:

The main purpose of SLR is to predict the value of one variable (Y) based on the known value of another variable (X) and to quantify the strength and direction of their linear relationship. It is also used to understand how changes in X are associated with changes in Y.


### **Q2. What are the key assumptions of Simple Linear Regression?**

**Answer**:

1. **Linearity**: The relationship between X and Y is linear.

2. **Independence**: Observations are independent of each other.

3. **Homoscedasticity**: Constant variance of residuals across all values of X.

4. **Normality of Errors**: Residuals (errors) are normally distributed.

5. **No Multicollinearity**: Not applicable in simple regression (only one predictor).
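
Some of these assumptions can be checked informally by looking at the residuals of a fitted line. The sketch below is only a rough illustration: it reuses the toy data and the fitted coefficients (slope 0.6, intercept 2.2) from the Q9 example, and comparing the residual spread in the lower and upper halves of X is a crude stand-in for a proper residual plot or a formal test. The variable names are illustrative choices, not part of any library.

```python
# Toy data and fitted coefficients taken from the Q9 example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6

# Residual = actual value minus value predicted by the fitted line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# For a least-squares line with an intercept, residuals sum to about zero
print("Sum of residuals:", round(sum(residuals), 10))

# Crude homoscedasticity check: mean squared residual for low vs. high X
half = len(x) // 2
spread_low = sum(r ** 2 for r in residuals[:half]) / half
spread_high = sum(r ** 2 for r in residuals[-half:]) / half
print("Mean squared residual (low X): ", round(spread_low, 4))
print("Mean squared residual (high X):", round(spread_high, 4))
```

With only five points the comparison says very little. On a real dataset, a large and systematic difference between the two spreads would be a hint that the constant-variance assumption deserves a closer look.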

### **Q3. Write the mathematical equation for a simple linear regression model and explain each term**.

**Answer**:

The mathematical equation for a Simple Linear Regression (SLR) model is:

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$

**Explanation of each term:**

- **Y** → Dependent variable (the target or response variable we want to predict)
- **X** → Independent variable (the predictor or input variable)
- **β₀ (beta zero)** → Intercept, the predicted value of Y when X = 0
- **β₁ (beta one)** → Slope, the change in Y for a one-unit change in X
- **ε (epsilon)** → Error term, the part of Y that the linear relationship with X does not explain (estimated in practice by the residual, the actual value minus the predicted value)



### **Q4. Provide a real-world example where simple linear regression can be applied**.

**Answer**:

Predicting **house prices** based on **square footage**.

**Explanation:**  
In this case,  
- **Dependent variable (Y):** House price  
- **Independent variable (X):** Area of the house (in square feet)

A simple linear regression model can be used to understand how much the price of a house increases for each additional square foot of area.


### **Q5. What is the method of least squares in linear regression?**

**Answer**:

The **method of least squares** is a mathematical technique used to find the best-fitting line in linear regression.  
It works by **minimizing the sum of the squared differences** between the actual values (Y) and the predicted values (Ŷ) from the model.

Mathematically, it minimizes the following:

$$
\text{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y_i})^2
$$

Where:  
- $Y_i$: Actual observed value
- $\hat{Y_i}$: Predicted value from the regression line
- $n$: Number of observations

By minimizing this sum of squared errors (SSE), we obtain the line that best represents the data.
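
For a single predictor, minimizing the SSE has a standard closed-form solution: the slope is $S_{xy} / S_{xx}$ and the intercept is $\bar{Y} - \hat{\beta}_1 \bar{X}$, where $S_{xy} = \sum (X_i - \bar{X})(Y_i - \bar{Y})$ and $S_{xx} = \sum (X_i - \bar{X})^2$. The sketch below applies these formulas in plain Python to the toy data from Q9. The function names `least_squares_fit` and `sse` are hypothetical helpers written for this illustration.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(x, y, intercept, slope):
    """Sum of squared errors for the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Toy data from the Q9 example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = least_squares_fit(x, y)
print("Slope:", round(slope, 4))
print("Intercept:", round(intercept, 4))
print("SSE at the fitted line:", round(sse(x, y, intercept, slope), 4))

# Moving away from the fitted line (here, a slightly steeper slope) raises the SSE
print("SSE with slope + 0.1:", round(sse(x, y, intercept, slope + 0.1), 4))
```

The slope and intercept should agree with the values scikit-learn reports in Q9 (0.6 and 2.2).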

### **Q6. What is Logistic Regression? How does it differ from Linear Regression?**

**Answer**:

**Logistic Regression** is a **statistical model used for binary classification** problems, in which the outcome variable has two possible values (e.g., yes/no, 0/1, true/false). It predicts the **probability** that a given input belongs to a particular category.
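
To make the probability output concrete, the sketch below passes a linear combination β₀ + β₁x through the logistic (sigmoid) function, P = 1 / (1 + e^-(β₀ + β₁x)). The coefficients are hypothetical values chosen only for illustration, and the 0.5 cut-off is a common default rather than a fixed rule.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, beta0, beta1):
    """P(y = 1 | x) for a one-feature logistic regression model."""
    return sigmoid(beta0 + beta1 * x)


# Hypothetical coefficients, not fitted to any real data
beta0, beta1 = -4.0, 2.0

for x in (0.0, 2.0, 4.0):
    p = predict_probability(x, beta0, beta1)
    label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
    print(f"x = {x}: P(y=1) = {p:.3f}, predicted class = {label}")
```

Unlike a linear regression prediction, the output here always stays between 0 and 1, so it can be read as a probability and thresholded into a class label.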


### **Key Differences Between Logistic Regression and Linear Regression:**

| Feature           | **Linear Regression**                                  | **Logistic Regression**                                       |
| ----------------- | ------------------------------------------------------ | ------------------------------------------------------------- |
| **Purpose**       | Predicts **continuous** numerical values               | Predicts **probability** of a binary outcome (classification) |
| **Output**        | Real numbers (can be any value from -∞ to +∞)          | Values between **0 and 1** (probabilities)                    |
| **Function Used** | Linear function: `y = β₀ + β₁x₁ + ... + βₙxₙ`          | Logistic (sigmoid) function: `P = 1 / (1 + e^-(β₀ + β₁x))`    |
| **Linearity**     | Assumes a linear relationship between input and output | Models a **log-odds linear** relationship                     |
| **Loss Function** | Mean Squared Error (MSE)                               | Binary Cross-Entropy (Log Loss)                               |
| **Used For**      | Regression tasks (e.g., predicting price, temperature) | Classification tasks (e.g., spam detection, churn prediction) |

---

### **Q7. Name and briefly describe three common evaluation metrics for regression models**.

**Answer**:

1. **Mean Absolute Error (MAE):**  
   Measures the average magnitude of errors in a set of predictions, without considering their direction.

   $$
   MAE = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y_i}|
   $$

2. **Mean Squared Error (MSE):**  
   Measures the average of the squared errors, so larger errors are weighted more heavily.

   $$
   MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y_i})^2
   $$

3. **R-squared (R²):**  
   Indicates how well the independent variable(s) explain the variability in the dependent variable.  
   Usually falls between 0 and 1, where higher values indicate a better fit. It can be negative when the model predicts worse than simply using the mean of Y, for example on unseen test data.
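
As a small worked example, the sketch below computes MAE and MSE in plain Python for the toy data and fitted line from Q9 (slope 0.6, intercept 2.2). The function names are hypothetical helpers written for this illustration. scikit-learn provides equivalent functions in `sklearn.metrics`.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy data from Q9 and predictions from its fitted line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_pred = [2.2 + 0.6 * xi for xi in x]

mae = mean_absolute_error(y, y_pred)
mse = mean_squared_error(y, y_pred)
print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
```

MAE is in the same units as Y, while MSE is in squared units, which is one reason its square root (RMSE) is often reported alongside it.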


### **Q8. What is the purpose of the R-squared metric in regression analysis?**

**Answer**:

The **R-squared (R²)** metric measures how well the independent variable(s) explain the variation in the dependent variable in a regression model.

It represents the **proportion of the variance** in the dependent variable that is predictable from the independent variable(s).

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$

Where:  
- $SS_{res}$: Sum of squared residuals (errors)
- $SS_{tot}$: Total sum of squares

**Interpretation:**  
- $R^2 = 1$: Perfect fit (model explains all variability in Y)
- $R^2 = 0$: Model explains none of the variability in Y

In short, R-squared helps determine **how well the regression model fits the data**.
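
The formula can be applied directly. The sketch below computes R² in plain Python for three sets of predictions on the Q9 toy data: the fitted line (slope 0.6, intercept 2.2), a baseline that always predicts the mean of Y, and predictions that match Y exactly. The function name `r_squared` is a hypothetical helper written for this illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy data from the Q9 example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

fitted = [2.2 + 0.6 * xi for xi in x]   # predictions from the Q9 line
baseline = [sum(y) / len(y)] * len(y)   # always predict the mean of Y
perfect = list(y)                       # predictions equal to the actual values

r2_fitted = r_squared(y, fitted)
r2_baseline = r_squared(y, baseline)
r2_perfect = r_squared(y, perfect)

print(f"Fitted line:   R^2 = {r2_fitted:.2f}")
print(f"Mean baseline: R^2 = {r2_baseline:.2f}")
print(f"Perfect fit:   R^2 = {r2_perfect:.2f}")
```

For this data the fitted line explains 60% of the variance in Y, while the baseline and perfect predictions reproduce the two reference points listed in the interpretation above.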

### **Q9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept**.

**Answer**:


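```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data: X must be 2-D (n_samples, n_features) for scikit-learn
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Fit an ordinary least squares line to the data
model = LinearRegression()
model.fit(X, y)

# coef_ holds one slope per feature; intercept_ is a single value
print("Slope (β1):", model.coef_[0])
print("Intercept (β0):", model.intercept_)
```

**Output**:

```
Slope (β1): 0.6
Intercept (β0): 2.2
```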


### **Q10. How do you interpret the coefficients in a simple linear regression model?**

**Answer**:

In a simple linear regression model:

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$

- **Intercept (β₀):**  
  The predicted value of the dependent variable (Y) when the independent variable (X) is 0.  
  It represents the point where the regression line crosses the Y-axis.

- **Slope (β₁):**  
  The average change in the dependent variable (Y) for a one-unit increase in the independent variable (X).  
  - If **β₁ > 0**, there is a **positive relationship** (Y increases as X increases).  
  - If **β₁ < 0**, there is a **negative relationship** (Y decreases as X increases).

In short, **β₀ gives the starting point**, and **β₁ shows the direction and size of the change in Y for each one-unit change in X**.
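
Both interpretations can be checked numerically. The sketch below uses the coefficients reported in Q9 (intercept 2.2, slope 0.6). The `predict` helper is a hypothetical function written for this illustration.

```python
# Coefficients reported by the Q9 example
beta0, beta1 = 2.2, 0.6


def predict(x):
    """Predicted Y on the fitted line for a given X."""
    return beta0 + beta1 * x


# At X = 0 the prediction is the intercept
print("Prediction at X = 0:", predict(0))

# Each one-unit increase in X changes the prediction by the slope,
# no matter where on the line the step is taken
for x in (1, 2, 10):
    change = predict(x + 1) - predict(x)
    print(f"Change from X = {x} to X = {x + 1}: {change:.2f}")
```

An intercept is only meaningful when X = 0 lies within, or close to, the range of the observed data. Otherwise it is best read as a fitting constant.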