# Supervised Learning: Regression
Models and Performance Metrics

1.  What is Simple Linear Regression (SLR)? Explain its purpose.
- Simple Linear Regression (SLR) is a statistical and machine-learning technique used to model the relationship between one independent variable (X) and one dependent variable (Y) by fitting a straight line to the data.
- SLR assumes a linear relationship between X and Y and is represented by the equation:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Where:

* Y = dependent (target) variable
* X = independent (predictor) variable
* β₀ (intercept) = value of Y when X = 0
* β₁ (slope) = change in Y for a one-unit change in X
* ε (error term) = random error or noise



- Purpose of Simple Linear Regression
The main purposes of SLR are:


1. To understand the relationship
It helps identify how one variable affects another (e.g., how study hours affect exam scores).


2. To make predictions
Once the model is trained, it can predict the value of Y for a given value of X.


3. To quantify the effect
The slope (β₁) tells how much Y changes when X increases by one unit.


4. To explain trends in data
It helps analyze patterns and trends in real-world data such as sales vs. advertising spend, salary vs. experience, etc.



Example
If we want to predict salary (Y) based on years of experience (X), Simple Linear Regression can estimate the expected salary for any given experience level.



2.  What are the key assumptions of Simple Linear Regression?
- The key assumptions of Simple Linear Regression (SLR) are conditions that must be satisfied for the model to give valid, reliable, and unbiased results.
1. Linearity

* There is a **linear relationship** between the independent variable (X) and the dependent variable (Y).
* This means Y changes at a constant rate with respect to X.
2. Independence of Errors

* The residuals (errors) are **independent** of each other.
* One observation should not influence another.
3. Homoscedasticity

* The variance of residuals is **constant** across all values of X.
* Errors should be evenly spread, not increasing or decreasing.

4. Normality of Errors

* The residuals should be **normally distributed**.
* Important for hypothesis testing and confidence intervals.

5. No Perfect Multicollinearity

* Since SLR has only **one independent variable**, multicollinearity is **not an issue** here.
* (This assumption becomes important in multiple linear regression.)

6. Zero Mean of Errors

* The average value of errors should be zero: $E(\varepsilon) = 0$
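Some of these assumptions can be checked informally from the residuals of a fitted line. The sketch below is only a rough illustration: the study-hours data are made up, and the split-half spread and lag-1 autocorrelation are simple heuristics chosen for this example, not formal statistical tests.

```python
import numpy as np

# Hypothetical data (made up for illustration): X = study hours, y = exam scores
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0, 72.0])

# Fit a straight line by least squares; np.polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(X, y, deg=1)
residuals = y - (intercept + slope * X)

# Zero mean of errors: with an intercept in the model, residuals average to about 0
print("Mean residual:", residuals.mean())

# Homoscedasticity (rough check): compare residual spread for low and high X
half = len(X) // 2
print("Residual std (low X):", residuals[:half].std())
print("Residual std (high X):", residuals[half:].std())

# Independence (rough check): correlation between consecutive residuals
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print("Lag-1 autocorrelation:", lag1)
```

In practice these checks are usually complemented by residual plots, which make curvature or a changing spread easier to spot than summary numbers.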
  






3. Write the mathematical equation for a simple linear regression model and
explain each term.
- The mathematical equation of a Simple Linear Regression (SLR) model is:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$

* Y (Dependent Variable)
  The outcome or target variable we want to predict or explain
  Example: salary, marks, sales

* X (Independent Variable)
  The predictor or input variable that is used to explain changes in Y
  Example: years of experience, study hours, advertising cost

* β₀ (Intercept)
  The value of Y when X = 0
  It shows where the regression line **cuts the Y-axis**

* β₁ (Slope / Regression Coefficient)
  Represents the **change in Y for a one-unit increase in X**

  * If β₁ > 0 → positive relationship
  * If β₁ < 0 → negative relationship

* ε (Error Term)
  Represents random error or variation in Y that **cannot be explained by X**. Its observed counterpart for a fitted line is the residual, $e_i = y_i - \hat{y}_i$.

Example

If the regression equation is:
Salary = 20,000 + 5,000 × Experience

* β₀ = 20,000 → base salary
* β₁ = 5,000 → salary increases by ₹5,000 for each additional year of experience
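The salary equation above can be evaluated directly. This is a minimal sketch; the function name `predict_salary` is hypothetical and the coefficients are the ones from the example.

```python
def predict_salary(years_experience, intercept=20_000, slope=5_000):
    """Evaluate Salary = intercept + slope * Experience (coefficients from the example)."""
    return intercept + slope * years_experience

# Zero years of experience gives the base salary (the intercept)
print(predict_salary(0))  # 20000

# Each additional year adds the slope: 20,000 + 5,000 * 3
print(predict_salary(3))  # 35000
```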



4.  Provide a real-world example where simple linear regression can be
applied.
- A real-world example of Simple Linear Regression (SLR) is predicting house price based on area.

Example: House Price Prediction

* Independent Variable (X): Area of the house (in square feet)
* Dependent Variable (Y): Price of the house (in ₹)

Using past data of houses, we can fit a simple linear regression model:

Price = β₀ + β₁ × Area

How SLR is Applied

* The model learns how house prices change as the area increases.
* **β₁ (slope)** tells how much the house price increases per extra square foot.
* Once trained, the model can **predict the price** of a new house based only on its area.

 Why This Is a Good Example

* Only **one predictor** is used → suitable for Simple Linear Regression
* Easy to understand and interpret
* Commonly used in **real estate, data analytics, and business decision-making**

 Other Real-World Examples

* Predicting salary based on years of experience
* Predicting exam marks based on study hours
* Predicting sales based on advertising spend




5. What is the method of least squares in linear regression?
- The method of least squares is a standard technique used in linear regression to find the best-fitting regression line for a given set of data points.

 What It Means

In linear regression, many straight lines can pass through the data.
The **least squares method** selects the line that **minimizes the total error** between the actual values and the predicted values.

Specifically, it minimizes the **sum of the squares of the residuals**.

-  Mathematical Idea

For a Simple Linear Regression model:

$$Y = \beta_0 + \beta_1 X$$

* **Residual (error)** for each data point:
  $$e_i = y_i - \hat{y}_i$$
* **Objective of least squares:**
  $$\text{Minimize } \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

This ensures that:

* Large errors are penalized more (because of squaring)
* Positive and negative errors don't cancel each other out

 Why Squared Errors?

* Makes the function **smooth and differentiable**
* Gives a **unique optimal solution**
* Emphasizes larger deviations

 Result

Using the least squares method, we obtain:

* **β₀ (intercept)** and
* **β₁ (slope)**

such that the regression line fits the data as closely as possible.
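For simple linear regression, minimizing the sum of squared residuals has a well-known closed-form solution: β₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β₀ = ȳ − β₁x̄. The sketch below applies these formulas to a small made-up dataset; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: covariation of x and y divided by the variation of x
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: forces the fitted line through the point (x_mean, y_mean)
beta0 = y_mean - beta1 * x_mean

# Sum of squared residuals for the fitted line
sse = np.sum((y - (beta0 + beta1 * x)) ** 2)

print("Slope:", beta1)        # approximately 0.6
print("Intercept:", beta0)    # approximately 2.2
print("Sum of squared residuals:", sse)

# Any other line gives a larger sum of squared residuals, e.g. a slightly steeper one
sse_other = np.sum((y - (beta0 + (beta1 + 0.1) * x)) ** 2)
print("SSE for a perturbed line:", sse_other)
```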




6. What is Logistic Regression? How does it differ from Linear Regression?
- Logistic Regression is a supervised machine-learning and statistical classification algorithm used when the dependent variable is categorical, most commonly binary (e.g., Yes/No, 0/1, Pass/Fail).

Instead of predicting a continuous value, logistic regression predicts the probability that an observation belongs to a particular class.

It uses the logistic (sigmoid) function:

$$P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

 The output is always between 0 and 1, which is then converted into a class label using a threshold (usually 0.5).
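As a minimal sketch of the sigmoid and the thresholding step, the code below uses hypothetical coefficients (`beta0 = -4`, `beta1 = 1`) chosen only for illustration, not estimated from any data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -4.0, 1.0

x = np.array([2.0, 4.0, 6.0])
p = sigmoid(beta0 + beta1 * x)      # probabilities, roughly 0.12, 0.50, 0.88
labels = (p >= 0.5).astype(int)     # apply the 0.5 threshold

print("P(Y=1):", p)
print("Predicted class:", labels)
```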


- How Logistic Regression differs from Linear Regression
1. Meaning

* Linear Regression: Used to predict a continuous numerical value.
* Logistic Regression: Used to predict a categorical (mostly binary) outcome.

2. Type of Problem

* Linear Regression: Regression problem
* Logistic Regression: Classification problem

3. Dependent Variable

* Linear Regression: Continuous (e.g., salary, marks)
* Logistic Regression: Categorical (e.g., Yes/No, Pass/Fail)

4. Output Range

* Linear Regression: Output ranges from $-\infty$ to $+\infty$
* Logistic Regression: Output ranges between 0 and 1

5. Function Used

* Linear Regression: Linear (straight line) function
* Logistic Regression: Sigmoid (logistic) function

6. Model Equation

* Linear Regression:

  $$Y = \beta_0 + \beta_1 X$$
* Logistic Regression:

  $$P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
  
7. Error Minimization Method

* Linear Regression: Least Squares Method
* Logistic Regression: Maximum Likelihood Estimation

8. Interpretation of Output

* Linear Regression: Predicts an exact numeric value
* Logistic Regression: Predicts probability of a class

9. Decision Boundary

* Linear Regression: Not applicable
* Logistic Regression: Uses a threshold (commonly 0.5)

10. Example

* Linear Regression: Predict house price based on area
* Logistic Regression: Predict pass/fail based on study hours
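The difference in output range can be seen by fitting both models to the same pass/fail data. The sketch below assumes scikit-learn is available and uses made-up study-hours data; `LogisticRegression` is used with its default settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data (made up): study hours and pass (1) / fail (0) outcome
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Logistic regression: predicts a probability, then a class label
clf = LogisticRegression()
clf.fit(hours, passed)
proba = clf.predict_proba(hours)[:, 1]   # P(pass), always between 0 and 1
labels = clf.predict(hours)              # class labels using the 0.5 threshold

# Linear regression on the same 0/1 target: outputs are not limited to [0, 1]
lin = LinearRegression()
lin.fit(hours, passed)
lin_pred = lin.predict(hours)

print("Logistic probabilities:", proba.round(3))
print("Logistic labels:", labels)
print("Linear regression outputs:", lin_pred.round(3))
```

On this data the straight-line fit produces values below 0 and above 1 at the extremes, which is one reason it is not suited to modelling probabilities.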



7. Name and briefly describe three common evaluation metrics for regression.
- Three common evaluation metrics for regression are:

1. Mean Absolute Error (MAE)

* Measures the average absolute difference between actual and predicted values.
* Easy to understand because it is in the same units as the target variable.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

2. Mean Squared Error (MSE)

* Measures the average of squared differences between actual and predicted values.
* Penalizes large errors more strongly due to squaring.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

3. Root Mean Squared Error (RMSE)

* Square root of MSE.
* Expressed in the same units as the dependent variable, making it easier to interpret.

$$\text{RMSE} = \sqrt{\text{MSE}}$$
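The three metrics can be computed directly from their formulas. This is a small worked sketch with made-up actual and predicted values.

```python
import numpy as np

# Hypothetical actual and predicted values, made up for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_pred            # [1, 0, -1, -2]

mae = np.mean(np.abs(errors))       # (1 + 0 + 1 + 2) / 4 = 1.0
mse = np.mean(errors ** 2)          # (1 + 0 + 1 + 4) / 4 = 1.5
rmse = np.sqrt(mse)                 # sqrt(1.5), about 1.22

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
```

Here RMSE is larger than MAE because the single error of 2 is weighted more heavily once squared.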




8. What is the purpose of the R-squared metric in regression analysis?
- Purpose of the R-squared (R²) Metric in Regression Analysis

R-squared (R²), also called the Coefficient of Determination, is used to measure **how well a regression model explains the variability of the dependent variable**.

 Main Purposes of R-squared

1. Measures Goodness of Fit

* Indicates how well the regression line fits the data.
* Shows the proportion of variance in the dependent variable explained by the model.

2. Explains Variance in Percentage

* R² is usually expressed as a value between **0 and 1** (or 0%–100%).
* Example:

  * R² = 0.80 → **80% of the variation** in Y is explained by X.

3. Compares Regression Models

* Helps compare different regression models using the same dataset.
* Higher R² generally indicates a better fit.

4. Evaluates Model Effectiveness

* Tells how effective the independent variable(s) are in predicting the dependent variable.
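R² is commonly computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total variation of Y around its mean. The sketch below applies this to a line fitted on made-up data; the variable names are illustrative.

```python
import numpy as np

# Hypothetical data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a least-squares line; np.polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = intercept + slope * x

ss_res = np.sum((y - y_pred) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
r2 = 1.0 - ss_res / ss_tot

print("R-squared:", r2)  # about 0.6, i.e. roughly 60% of the variation is explained
```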




9. Write Python code to fit a simple linear regression model using scikit-learn
and print the slope and intercept.


# Import required libraries
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data (Independent variable X and Dependent variable y)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # Feature
y = np.array([2, 4, 6, 8, 10])                # Target

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Get slope and intercept
slope = model.coef_[0]
intercept = model.intercept_

# Print results
print("Slope (Coefficient):", slope)
print("Intercept:", intercept)


Slope (Coefficient): 2.0
Intercept: 0.0


10.  How do you interpret the coefficients in a simple linear regression model?
- Interpretation of Coefficients in a Simple Linear Regression Model

A simple linear regression model is written as:

$$Y = \beta_0 + \beta_1 X$$

1. Intercept (β₀)

* Represents the **expected value of Y when X = 0**.
* It is the point where the regression line **cuts the Y-axis**.
* Sometimes it may not have a practical meaning if X = 0 is not realistic.

Example:
If β₀ = 10, then when X = 0, the predicted Y is 10.

2. Slope / Coefficient (β₁)

* Represents the expected change in Y for a one-unit increase in X.
* Its sign shows the direction of the relationship, and its magnitude shows how steeply Y changes with X:

  * β₁ > 0 → Positive relationship
  * β₁ < 0 → Negative relationship
  * β₁ = 0 → No linear relationship

Example:
If β₁ = 5, then for every 1-unit increase in X, Y increases by 5 units.



* Intercept: Starting value of Y
* Slope: Rate at which Y changes with X
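Both interpretations can be verified on a fitted line. The sketch below uses made-up data that lie exactly on Y = 10 + 5X, matching the example values above, and a hypothetical helper named `predict`.

```python
import numpy as np

# Hypothetical data lying exactly on the line Y = 10 + 5X
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 15.0, 20.0, 25.0, 30.0])

# Least-squares fit; np.polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)

def predict(x_value):
    """Prediction from the fitted line."""
    return intercept + slope * x_value

# Intercept: the predicted Y when X = 0
print("Prediction at X = 0:", predict(0.0))                         # about 10

# Slope: the change in predicted Y for a one-unit increase in X
print("Change from X = 2 to X = 3:", predict(3.0) - predict(2.0))   # about 5
```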


