# **Supervised Learning: Regression Models and Performance Metrics Assignment**
---

## Question 1 : What is Simple Linear Regression (SLR)? Explain its purpose.

### Answer:

Simple Linear Regression (SLR) is a statistical method used to study the relationship between **one independent variable (X)** and **one dependent variable (Y)**. It shows how the value of Y changes when X changes.

It is represented by the equation:

$$
Y = a + bX
$$

Where:

* **Y** = Dependent variable
* **X** = Independent variable
* **a** = Intercept
* **b** = Slope

### **Purpose of Simple Linear Regression**

1. **To find the relationship between two variables**
   It helps to understand whether and how two variables are related.

2. **To predict future values**
   SLR is used to predict the value of Y for a given value of X.
   Example: Predicting marks based on study hours.

3. **To measure the effect of X on Y**
   The slope tells how much Y changes for one unit change in X.

4. **To identify trends in data**
   It helps in identifying increasing or decreasing trends.

5. **To support decision making**
   Used in business, economics, and research for making data-based decisions.

### **Example**

If X is the number of hours studied and Y is exam marks, SLR helps estimate how marks increase with more study hours.
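
As a rough illustration of this example, the sketch below fits a straight line to a small data set with NumPy's `polyfit`. The hours and marks values are hypothetical, chosen only for demonstration.

```python
import numpy as np

# Hypothetical data: hours studied (X) and exam marks (Y)
hours = np.array([1, 2, 3, 4, 5], dtype=float)
marks = np.array([52, 58, 65, 70, 75], dtype=float)

# Fit a degree-1 polynomial (a straight line); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(hours, marks, 1)

# Estimate marks for a student who studies 6 hours
predicted_marks = intercept + slope * 6

print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")
print(f"Predicted marks for 6 hours: {predicted_marks:.1f}")
```

With these made-up values, each extra hour of study is associated with about 5.8 more marks.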

### **Conclusion**

Simple Linear Regression is an easy and useful technique to analyze, understand, and predict the relationship between two variables. It is widely used because it is simple, clear, and effective.

---

## Question 2: What are the key assumptions of Simple Linear Regression?

### Answer:

Simple Linear Regression works properly only when certain assumptions are satisfied. These assumptions ensure that the results are reliable and accurate.

### **1. Linearity**

There should be a **linear relationship** between the independent variable (X) and the dependent variable (Y).
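This means the expected change in Y for a one-unit change in X is the same at every value of X. Linearity describes the shape of the relationship; it does not by itself show that X causes Y.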

### **2. Independence**

The observations should be **independent of each other**.
One data point should not influence another.

### **3. Homoscedasticity**

The **variance of errors** should be constant for all values of X.
In simple words, the spread of data points around the regression line should be roughly the same across the whole range of X.

### **4. Normality of Errors**

The error terms (residuals) should be **normally distributed**.
This assumption is important for valid statistical inference.

### **5. No Significant Outliers**

There should be **no extreme outliers** that strongly affect the regression line.

### **Conclusion**

If these assumptions are satisfied, Simple Linear Regression provides valid predictions and accurate interpretation of results.
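
Some of these assumptions can be checked informally from the residuals of a fitted line. The sketch below is one possible set of numeric checks, not a standard procedure: the function name `residual_diagnostics`, the choice of checks, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Return simple numeric checks on the residuals of a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)

    # Sort residuals by x so they can be compared across the range of X
    ordered = residuals[np.argsort(x)]
    half = len(ordered) // 2

    return {
        # Close to 0 by construction when the model includes an intercept
        "residual_mean": float(residuals.mean()),
        # Homoscedasticity: a ratio near 1 suggests similar spread at low and high X
        "spread_ratio": float(ordered[half:].std() / ordered[:half].std()),
        # Independence: a value near 0 suggests neighbouring residuals are unrelated
        "lag1_autocorr": float(np.corrcoef(ordered[:-1], ordered[1:])[0, 1]),
    }

# Simulated data that satisfies the assumptions (for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 1, size=200)

checks = residual_diagnostics(x, y)
print(checks)
```

In practice these numbers are usually read alongside a residual plot, since a single summary value can hide a pattern.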

---

## Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

### Answer: **Mathematical Equation of Simple Linear Regression (SLR)**

The mathematical equation of a Simple Linear Regression model is:

$$
Y = a + bX + \varepsilon
$$

## **Explanation of Each Term**

### **1. Y (Dependent Variable)**

* Y is the **output or response variable**.
* It is the variable we want to **predict or explain**.
* Example: marks, salary, sales.

### **2. X (Independent Variable)**

* X is the **input or predictor variable**.
* It is used to predict the value of Y.
* Example: hours studied, work experience, advertising cost.

### **3. a (Intercept)**

* The intercept is the **value of Y when X = 0**.
* It shows where the regression line **cuts the Y-axis**.
* It provides the baseline level of Y.

### **4. b (Slope / Regression Coefficient)**

* The slope shows the **change in Y for a one-unit change in X**.
* If b is positive, Y increases as X increases.
* If b is negative, Y decreases as X increases.

### **5. ε (Error Term)**

* The error term represents **random variation** not explained by X.
* It includes effects of other factors, measurement errors, and noise.
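
To make the role of each term concrete, the following sketch simulates data from the model and then estimates the line from it. The "true" values of a and b, the noise level, and the sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed "true" parameters for the simulation
a_true, b_true = 5.0, 2.0

X = rng.uniform(0, 10, size=200)                    # independent variable
epsilon = rng.normal(loc=0.0, scale=1.0, size=200)  # error term (random noise)
Y = a_true + b_true * X + epsilon                   # dependent variable

# Estimate the line from the noisy data; polyfit returns [slope, intercept]
b_hat, a_hat = np.polyfit(X, Y, 1)
print(f"Estimated intercept: {a_hat:.2f}, estimated slope: {b_hat:.2f}")
```

The estimates should land close to, but not exactly on, the assumed values, because the error term prevents the points from lying on a perfect line.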

### **Conclusion**

The Simple Linear Regression equation explains how the dependent variable (Y) is related to the independent variable (X). Each term plays an important role in prediction and understanding the relationship between variables.

---

## Question 4: Provide a real-world example where simple linear regression can be applied.

### Answer: **Real-World Example of Simple Linear Regression**

A common real-world example of **Simple Linear Regression** is studying the relationship between **hours studied** and **exam marks**.

### **Example Explanation**

* **Independent Variable (X):** Number of hours studied
* **Dependent Variable (Y):** Marks obtained in the exam

Simple Linear Regression can be used to analyze how exam marks change as study hours increase. By fitting a regression line, we can **predict the expected marks** for a student based on the number of hours they study.

### **1. Business Example**

* **X (Independent Variable):** Advertising expenditure
* **Y (Dependent Variable):** Sales revenue

Simple Linear Regression is used to study how sales change when advertising spending increases. Businesses use it to **predict future sales** and plan marketing budgets.

### **2. Medical Example**

* **X (Independent Variable):** Dosage of medicine
* **Y (Dependent Variable):** Reduction in blood pressure

Doctors and researchers use Simple Linear Regression to understand how changes in medicine dosage affect patient health outcomes.

### **3. Economics Example**

* **X (Independent Variable):** Years of work experience
* **Y (Dependent Variable):** Monthly income

Simple Linear Regression helps analyze how income increases with experience and is used for **salary prediction and policy analysis**.

### **Application of the Study-Hours Example**

* Helps teachers understand the impact of study time on performance
* Helps students plan their study schedule
* Used to predict future exam results

### **Conclusion**

These examples show that Simple Linear Regression is widely used in **business, medicine, education, and economics** to analyze relationships and make predictions.
---


## Question 5: What is the method of least squares in linear regression?

### Answer: **Method of Least Squares in Linear Regression**

The **method of least squares** is a mathematical technique used in linear regression to find the **best-fitting straight line** that represents the relationship between the independent variable (X) and the dependent variable (Y).

### **Explanation**

In real data, the observed values of Y do not lie exactly on a straight line. The difference between the **actual value (Y)** and the **predicted value (Ŷ)** is called the **error or residual**.

$$
\text{Residual} = Y - \hat{Y}
$$

The method of least squares works by:

1. Calculating the residuals for all data points
2. Squaring each residual (to remove negative signs)
3. Adding all the squared residuals
4. Choosing the regression line for which this **sum of squared residuals is minimum**

Mathematically, it minimizes:

$$
\sum (Y - \hat{Y})^2
$$
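
For reference, this minimization has a standard closed-form solution. The index $i$, the sample size $n$, and the means $\bar{X}$ and $\bar{Y}$ are notation introduced here for the derivation. Writing the sum as a function of the intercept and slope:

$$
S(a, b) = \sum_{i=1}^{n} (Y_i - a - bX_i)^2
$$

Setting $\partial S / \partial a = 0$ and $\partial S / \partial b = 0$ and solving the two resulting equations gives:

$$
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}
$$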

### **Why Squaring the Errors is Important**

* It ensures all errors are positive
* It gives more weight to larger errors
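* It makes the quantity being minimized smooth and convex, so the best-fitting line can be found with a direct formula and is unique whenever X takes at least two distinct values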

### **Role in Linear Regression**

Using the least squares method, we calculate the best values of:

* **Intercept (a)**
* **Slope (b)**

These values make the regression line as close as possible to all data points.
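
A minimal sketch of this calculation, assuming the standard closed-form estimates (slope = sum of cross-deviations divided by sum of squared X deviations, intercept = mean of Y minus slope times mean of X). The function name `least_squares_line` and the sample data are illustrative.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (intercept a, slope b) that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()

    # Slope: sum of cross-deviations divided by sum of squared X deviations
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    a = y_mean - b * x_mean
    return a, b

a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Intercept a = {a:.1f}, Slope b = {b:.1f}")  # Intercept a = 2.2, Slope b = 0.6
```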

### **Purpose of the Least Squares Method**

1. To find the most accurate regression line
2. To reduce overall prediction error
3. To improve reliability of predictions
4. To provide a clear measure of model fit

### **Conclusion**

The method of least squares is the foundation of linear regression. By minimizing the total squared errors, it ensures the best possible fit of the regression line and produces reliable and meaningful predictions.

---

## Question 6: What is Logistic Regression? How does it differ from Linear Regression?

### Answer:

Logistic Regression is a statistical technique used when the **dependent variable is categorical**, usually **binary** in nature.
It is mainly used for **classification problems**, such as predicting **Yes/No**, **True/False**, or **0/1** outcomes.

Instead of predicting a direct numerical value, Logistic Regression predicts the **probability** that an event will occur.

The model uses the **logistic (sigmoid) function**, which gives values between 0 and 1.
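
The sigmoid function is $\sigma(z) = 1 / (1 + e^{-z})$. The short sketch below applies it to a linear combination $a + bX$; the coefficient values are hypothetical and serve only to show the shape of the curve.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, used only to illustrate the S-shaped curve
a, b = -4.0, 1.0

for x in [0, 2, 4, 6, 8]:
    probability = sigmoid(a + b * x)
    print(f"X = {x}: P(Y = 1) = {probability:.3f}")
```

The printed probabilities rise from near 0 to near 1 and pass through 0.5 where $a + bX = 0$ (here at X = 4).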

### **How Logistic Regression Differs from Linear Regression**

| Basis              | Linear Regression       | Logistic Regression   |
| ------------------ | ----------------------- | --------------------- |
| Type of problem    | Regression (prediction) | Classification        |
| Dependent variable | Continuous              | Categorical (Binary)  |
| Output             | Any real value          | Probability (0 to 1)  |
| Equation form      | $Y = a + bX$            | Uses sigmoid function |
| Curve shape        | Straight line           | S-shaped (Sigmoid)    |
| Example            | Predicting salary       | Predicting pass/fail  |


### **Example**

* **Linear Regression:** Predicting house price based on area
* **Logistic Regression:** Predicting whether a person has a disease (Yes/No)
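
The difference in outputs can be seen by fitting both models to the same binary target. This is a sketch on a small hypothetical pass/fail data set; it uses scikit-learn's `LinearRegression` and `LogisticRegression` with their default settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(hours, passed)
logistic = LogisticRegression().fit(hours, passed)

new_hours = np.array([[0], [5], [15]])

# Linear regression output is unbounded, so it can fall outside [0, 1]
print("Linear output:       ", linear.predict(new_hours))
# Logistic regression returns the probability of the positive class
print("Logistic probability:", logistic.predict_proba(new_hours)[:, 1])
# The class label is obtained by thresholding that probability at 0.5
print("Logistic class:      ", logistic.predict(new_hours))
```

For 0 and 15 hours the linear model returns values below 0 and above 1, which cannot be read as probabilities, while the logistic model always stays between 0 and 1.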

### **Conclusion**

Linear Regression is used to predict **continuous values**, while Logistic Regression is used to predict **categorical outcomes**. Although the names are similar, their applications and outputs are different.

---

## Question 7: Name and briefly describe three common evaluation metrics for regression models.

### Answer: **Evaluation Metrics for Regression Models**

Evaluation metrics are used to measure how well a regression model performs by comparing **actual values** with **predicted values**.

### **1. Mean Absolute Error (MAE)**

* MAE is the **average of the absolute differences** between actual and predicted values.
* It shows the **average size of errors** without considering their direction.

$$
MAE = \frac{1}{n}\sum |Y - \hat{Y}|
$$

**Interpretation:** Lower MAE means better model performance.


### **2. Mean Squared Error (MSE)**

* MSE is the **average of the squared differences** between actual and predicted values.
* It penalizes larger errors more heavily.

$$
MSE = \frac{1}{n}\sum (Y - \hat{Y})^2
$$

**Interpretation:** Smaller MSE indicates a more accurate model.

### **3. R-squared (R²)**

* R² measures the **proportion of variance** in the dependent variable explained by the model.
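* For a least-squares fit with an intercept, evaluated on the data it was fitted to, its value lies between 0 and 1. On new data it can be negative if the model predicts worse than simply using the mean of Y.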

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$

**Interpretation:** Higher R² means the model explains the data better.

### **Conclusion**

MAE, MSE, and R² are commonly used metrics to evaluate regression models. Together, they help assess **accuracy, error magnitude, and model effectiveness**.
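
The three formulas can be computed directly with NumPy and cross-checked against the corresponding helpers in `sklearn.metrics`. The actual and predicted values below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

# Manual calculations following the formulas
mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, R^2 = {r2:.3f}")
# MAE = 0.500, MSE = 0.375, R^2 = 0.925

# The scikit-learn functions give the same results
print(mean_absolute_error(y_true, y_pred),
      mean_squared_error(y_true, y_pred),
      r2_score(y_true, y_pred))
```

Note that MSE is in squared units of Y, so MAE is often easier to read in the original units.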

---

## Question 8: What is the purpose of the R-squared metric in regression analysis?

### Answer: **Purpose of the R-squared (R²) Metric in Regression Analysis**

The **R-squared (R²)** metric is used to measure how well a regression model explains the relationship between the **independent variable(s)** and the **dependent variable**.

### **Main Purposes of R-squared**

1. **Measures Explained Variance**
   R² shows the **proportion of variation** in the dependent variable that is explained by the regression model.

2. **Evaluates Model Fit**
   It helps determine how well the regression line fits the observed data.

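3. **Compares Models**
   R² can be used to compare regression models fitted to the same data; a higher R² indicates a closer fit. Because R² never decreases when more predictors are added, adjusted R² is often preferred when the models have different numbers of predictors.

4. **Indicates Predictive Strength**
   A higher R² value suggests stronger predictive power, although a high R² on the training data does not guarantee accurate predictions on new data.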

### **Value Range**

* For a least-squares model with an intercept, evaluated on its training data, R² ranges from **0 to 1**
* **R² = 0** → model explains none of the variation
* **R² = 1** → model explains all the variation
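
The two reference points above can be reproduced with scikit-learn's `r2_score`. The sketch also includes a third, deliberately bad set of predictions to show that the formula $1 - SS_{res}/SS_{tot}$ is not bounded below for arbitrary predictions: predictions worse than always guessing the mean give a negative value. The data are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0])

# Predictions that match the actual values exactly
perfect = r2_score(y_true, y_true)
# Always predicting the mean of Y explains none of the variation
mean_only = r2_score(y_true, np.full(3, y_true.mean()))
# Predictions in reversed order are worse than predicting the mean
reversed_fit = r2_score(y_true, y_true[::-1])

print(perfect, mean_only, reversed_fit)  # 1.0 0.0 -3.0
```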


### **Conclusion**

The purpose of the R-squared metric is to assess how effectively a regression model explains the variability in the dependent variable and how well the model fits the data.

---

## Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.


### Answer:

```python
# Simple Linear Regression using scikit-learn

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # Independent variable (2-D: one column per feature)
y = np.array([2, 4, 6, 8, 10])                 # Dependent variable

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope and intercept
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```

**Output:**

```
Slope: 2.0000000000000004
Intercept: -1.7763568394002505e-15
```

The data follow y = 2x exactly, so the slope is 2 and the intercept is 0; the tiny deviations in the printed values are floating-point rounding error.
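
As a possible follow-up, the fitted model can be used to predict Y for new values of X and to report R² on the training data. This sketch refits the same sample data so that it runs on its own; the new X values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as in the answer
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])
model = LinearRegression().fit(X, y)

# predict expects a 2-D array of shape (n_samples, n_features)
X_new = np.array([[6], [7.5]])
predictions = model.predict(X_new)
print("Predictions:", predictions)  # approximately [12. 15.]

# score returns R^2 for the given data
r_squared = model.score(X, y)
print("R^2:", r_squared)
```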


---

## Question 10: How do you interpret the coefficients in a simple linear regression model?

### Answer: **Interpretation of Coefficients in Simple Linear Regression**

In a **Simple Linear Regression model**, the equation is:

$$
Y = a + bX
$$

The model has **two coefficients**: the **intercept (a)** and the **slope (b)**.

### **1. Intercept (a)**

* The intercept represents the **expected value of Y when X = 0**.
* It shows the **starting or baseline value** of the dependent variable.
* Sometimes it may not have a practical meaning, but it is important for forming the regression line.

**Example:**
If the intercept is 10, the model predicts Y = 10 when X = 0.

### **2. Slope (b)**

* The slope indicates the **change in Y for a one-unit increase in X**.
* If the slope is positive, Y increases as X increases.
* If the slope is negative, Y decreases as X increases.

**Example:**
If the slope is 2, it means Y increases by 2 units for every 1-unit increase in X.
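
Both interpretations can be verified with a few lines of code, using the intercept of 10 and the slope of 2 from the examples. The function name `predict` is illustrative.

```python
def predict(x, a=10.0, b=2.0):
    """Predicted Y for the example line Y = 10 + 2X."""
    return a + b * x

print(predict(0))               # 10.0: the intercept is the prediction at X = 0
print(predict(4) - predict(3))  # 2.0: a one-unit increase in X adds the slope
```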

### **Conclusion**

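The coefficients in a simple linear regression model describe the **direction** and **magnitude** of the relationship between the independent and dependent variables: the sign of the slope gives the direction, its size gives the expected change in Y per unit of X, and the intercept gives the baseline. The strength of the relationship is measured separately, for example with R². This makes the model easy to interpret and useful for prediction.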

---