# Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

Simple Linear Regression (SLR) is a statistical and machine learning technique used to model the relationship between one independent variable (X) and one dependent variable (Y) by fitting a straight line to the observed data.

It assumes that the relationship between X and Y can be expressed using a linear equation:

\[
Y = \beta_0 + \beta_1 X + \varepsilon
\]


## Purpose of Simple Linear Regression

The main objectives of SLR are:

- **To understand the relationship between variables:** It helps determine whether, and how strongly, changes in one variable are associated with changes in another.
- **To predict outcomes:** Given a new value of X, SLR can predict the corresponding value of Y.
- **To quantify the effect of X on Y:** The slope \(\beta_1\) indicates how much Y changes, on average, for a one-unit increase in X.
- **To perform trend analysis:** It is useful for identifying trends such as growth, decline, or stability in data.

# Question 2: What are the key assumptions of Simple Linear Regression?

1. Linearity

The relationship between the independent variable (X) and dependent variable (Y) must be linear.

This means changes in Y are proportional to changes in X.

✔️ Checked using scatter plots or residual plots.

2. Independence of Errors

The residuals (errors) should be independent of each other.

One observation’s error should not influence another’s.

✔️ Commonly violated in time-series data.

3. Homoscedasticity (Constant Variance)

The variance of errors should be constant across all values of X.

Errors should not fan out or shrink as X increases.

✔️ Checked using residual vs. fitted value plots.

4. Normality of Errors

The residuals should be normally distributed, especially for hypothesis testing and confidence intervals.

This does not mean X or Y must be normal.

✔️ Checked using histograms or Q–Q plots.

5. No Perfect Multicollinearity

Since SLR has only one independent variable, this assumption is automatically satisfied.

(More relevant in multiple linear regression.)

6. Zero Mean of Errors

The expected value of the error term is zero:

E(ε)=0

Ensures unbiased predictions.
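
The checks listed above can be approximated numerically as well as visually. The following is a minimal sketch, assuming synthetic data that satisfies the assumptions by construction; the variable names, the choice of the Durbin-Watson statistic for independence, and the half-split comparison for constant variance are illustrative choices, not part of the original answer.

```python
import numpy as np

# Synthetic data (an assumption of this sketch): linear signal plus
# independent, normally distributed noise with constant variance.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200)
Y = 3.0 + 2.0 * X + rng.normal(loc=0.0, scale=1.0, size=X.size)

# Fit the least-squares line and compute the residuals.
slope, intercept = np.polyfit(X, Y, deg=1)
residuals = Y - (intercept + slope * X)

# Zero mean of errors: with an intercept in the model, residuals average to ~0.
mean_resid = residuals.mean()

# Independence: the Durbin-Watson statistic is close to 2 when consecutive
# residuals are uncorrelated.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity: residual spread should be similar for low and high X.
half = X.size // 2
spread_ratio = residuals[half:].std() / residuals[:half].std()

print(f"mean residual: {mean_resid:.2e}")
print(f"Durbin-Watson: {durbin_watson:.2f}")
print(f"spread ratio (upper half / lower half): {spread_ratio:.2f}")
```

A Durbin-Watson value far from 2, or a spread ratio far from 1, would suggest that the independence or constant-variance assumption deserves a closer look with the plots mentioned above.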

# Question 3: What is the mathematical equation of a Simple Linear Regression (SLR) model?

## Mathematical Equation of Simple Linear Regression

The mathematical equation of a **Simple Linear Regression (SLR)** model is:

\[
Y = \beta_0 + \beta_1 X + \varepsilon
\]

---

## Explanation of Each Term

- **\(Y\)** → *Dependent Variable*  
  The outcome or response variable that the model aims to predict or explain.

- **\(X\)** → *Independent Variable*  
  The predictor variable used to explain variations in \(Y\).

- **\(\beta_0\)** → *Intercept*  
  The value of \(Y\) when \(X = 0\). It represents the point where the regression line intersects the Y-axis.

- **\(\beta_1\)** → *Slope (Regression Coefficient)*  
  Indicates the change in \(Y\) for a one-unit increase in \(X\).  
  - \(\beta_1 > 0\): Positive relationship  
  - \(\beta_1 < 0\): Negative relationship

- **\(\varepsilon\)** → *Error Term*  
  Represents random error or the part of \(Y\) not explained by \(X\).
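
To see how the terms fit together, one can generate data from the equation and then estimate the coefficients back from it. This is a small sketch under stated assumptions: the "true" values of the intercept and slope, the noise level, and the sample size are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for illustration.
beta0_true, beta1_true = 5.0, 2.0

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=500)           # independent variable
epsilon = rng.normal(0.0, 1.0, size=500)   # error term with mean zero
Y = beta0_true + beta1_true * X + epsilon  # dependent variable

# Estimate the coefficients from the noisy data.
# np.polyfit with deg=1 returns [slope, intercept].
beta1_hat, beta0_hat = np.polyfit(X, Y, deg=1)

print(f"estimated intercept: {beta0_hat:.3f} (true {beta0_true})")
print(f"estimated slope:     {beta1_hat:.3f} (true {beta1_true})")
```

The estimates land close to, but not exactly on, the true values; the gap is the effect of the error term.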

---




# Question 4:
## Real-World Example of Simple Linear Regression

A common real-world application of **Simple Linear Regression** is in **predicting house prices**.

### Example Scenario
- **Independent Variable (X):** Size of the house (in square feet)
- **Dependent Variable (Y):** Price of the house

### Regression Model
\[
\text{House Price} = \beta_0 + \beta_1 \times (\text{House Size}) + \varepsilon
\]

### Explanation
- The model analyzes how house prices change as the size of the house increases.
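- The slope (\(\beta_1\)) indicates how much the predicted house price changes, on average, for every additional square foot.
- The intercept (\(\beta_0\)) is the model's estimated price when the house size is zero. No real house has zero size, so it is best read as a baseline offset for the line rather than a meaningful price.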

### Purpose
- Helps real estate companies estimate property prices.
- Assists buyers and sellers in making data-driven decisions.
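
A sketch of this scenario in code might look as follows. The sizes and prices are invented for illustration and do not come from any real market; the fit uses NumPy's general least-squares solver on a design matrix with a column of ones for the intercept.

```python
import numpy as np

# Hypothetical data, invented for illustration:
# house size in square feet and sale price in dollars.
size_sqft = np.array([800.0, 1000.0, 1200.0, 1500.0, 2000.0])
price = np.array([155_000.0, 178_000.0, 215_000.0, 250_000.0, 332_000.0])

# Design matrix: a column of ones (intercept) and the size column (slope).
A = np.column_stack([np.ones_like(size_sqft), size_sqft])
(beta0, beta1), *_ = np.linalg.lstsq(A, price, rcond=None)

# Predict the price of a house whose size lies inside the observed range.
new_size = 1800.0
predicted_price = beta0 + beta1 * new_size

print(f"price per extra square foot: {beta1:.2f}")
print(f"predicted price for {new_size:.0f} sq ft: {predicted_price:.0f}")
```

Predictions are most trustworthy for sizes within the range of the data used to fit the line.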

---

### One-line Exam Answer
> Simple linear regression can be applied to predict house prices based on house size, where price is the dependent variable and size is the independent variable.



# Question 5
## Method of Least Squares in Linear Regression

The **Method of Least Squares** is a statistical technique used in **linear regression** to estimate the best-fitting regression line by **minimizing the sum of squared errors** between the observed values and the predicted values.

---

## Concept

In simple linear regression, the predicted value of \(Y\) is:

\[
\hat{Y} = \beta_0 + \beta_1 X
\]

The error (residual) for each observation is:

\[
e_i = Y_i - \hat{Y}_i
\]

The method of least squares chooses the values of \(\beta_0\) and \(\beta_1\) such that the **sum of squared residuals** is minimized:

\[
\text{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
\]

---

## Why Squared Errors?

- Squaring makes every error non-negative, so positive and negative errors cannot cancel each other out.
- Larger errors are penalized more heavily.
- Makes the optimization mathematically tractable.

---

## Result

By minimizing the SSE, the method finds:
- The **best-fit straight line**
- The **intercept (\(\beta_0\))** and **slope (\(\beta_1\))** estimates that give the smallest possible SSE for the observed data
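
For simple linear regression the minimization has a well-known closed-form solution: the slope is the sum of products of deviations of X and Y from their means divided by the sum of squared deviations of X, and the intercept follows from the means. The sketch below applies those formulas to the same five data points used in the scikit-learn example in Question 9; the helper name `sse` is an illustrative choice.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def sse(b0, b1):
    """Sum of squared residuals for the line Y_hat = b0 + b1 * X."""
    return float(np.sum((Y - (b0 + b1 * X)) ** 2))

# Closed-form least-squares estimates.
x_mean, y_mean = X.mean(), Y.mean()
b1 = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

best = sse(b0, b1)          # SSE at the least-squares solution
nudged = sse(b0, b1 + 0.1)  # SSE after moving the slope slightly

print(f"slope={b1:.2f}, intercept={b0:.2f}")
print(f"SSE at optimum={best:.2f}, SSE with nudged slope={nudged:.2f}")
```

Any change to the slope or intercept away from these values increases the SSE, which is what "best fit" means here.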

---

## In Simple Words

> The method of least squares finds the regression line that keeps the total squared distance between actual data points and the predicted line as small as possible.




# Question 6: What is Logistic Regression? How does it differ from Linear Regression?
## Logistic Regression

**Logistic Regression** is a **supervised machine learning and statistical classification technique** used when the **dependent variable is categorical**, most commonly **binary** (e.g., Yes/No, 0/1, True/False).

Instead of predicting a continuous value, logistic regression predicts the **probability** that an observation belongs to a particular class using the **sigmoid (logistic) function**.

### Mathematical Model
\[
P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
\]

Where:
- \(P(Y=1 \mid X)\) → Probability of the positive class given \(X\)  
- \(\beta_0\) → Intercept  
- \(\beta_1\) → Coefficient  
- \(X\) → Independent variable  
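
The formula can be evaluated directly. In this minimal sketch the coefficient values and the study-hours inputs are hypothetical (not fitted to any data), and the 0.5 threshold for turning a probability into a class label is a common convention rather than part of the model itself.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -4.0, 1.0

# Hypothetical inputs: hours of study for three students.
hours = np.array([1.0, 4.0, 8.0])

# Probability of the positive class (e.g. "pass") for each student.
prob_pass = sigmoid(beta0 + beta1 * hours)

# Convert probabilities to class labels with a 0.5 threshold.
predicted_class = (prob_pass >= 0.5).astype(int)

print("probabilities:", np.round(prob_pass, 3))
print("predicted classes:", predicted_class)
```

The linear part \(\beta_0 + \beta_1 X\) can take any real value, but the sigmoid always returns a number strictly between 0 and 1, which is why the output can be read as a probability.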

---



## Difference Between Logistic Regression and Linear Regression

| Feature | Linear Regression | Logistic Regression |
|------|------------------|--------------------|
| Type of problem | Regression | Classification |
| Output | Continuous values | Probability (0–1) |
| Dependent variable | Continuous | Categorical (Binary) |
| Function used | Linear function | Sigmoid (Logistic) |
| Output range | \(-\infty\) to \(+\infty\) | 0 to 1 |
| Example | Predict salary | Predict pass/fail |

---

## Real-World Example

- **Linear Regression:** Predicting house price based on size  
- **Logistic Regression:** Predicting whether a student will **pass or fail** an exam based on study hours

---

## One-line Exam Answer

> Logistic regression is used for classification problems where the dependent variable is categorical, while linear regression is used for predicting continuous numerical values.


# Question 7 
## Evaluation Metrics for Regression Models

Evaluation metrics are used to measure how accurately a regression model predicts the target variable.

---

## 1. Mean Absolute Error (MAE)

Mean Absolute Error measures the **average absolute difference** between actual values and predicted values.

Formula (plain text):

MAE = (1/n) * Σ | Yi − Y_pred_i |

Where:
- Yi = actual value
- Y_pred_i = predicted value
- n = total number of observations

Key points:
- Easy to interpret
- Treats all errors equally
- Less sensitive to outliers than MSE or RMSE, because errors are not squared

---

## 2. Mean Squared Error (MSE)

Mean Squared Error measures the **average of squared differences** between actual and predicted values.

Formula (plain text):

MSE = (1/n) * Σ ( Yi − Y_pred_i )²

Key points:
- Penalizes large errors more
- Sensitive to outliers
- Commonly used in optimization

---

## 3. Root Mean Squared Error (RMSE)

Root Mean Squared Error is the **square root of MSE**, expressed in the same units as the target variable.

Formula (plain text):

RMSE = √MSE

Key points:
- More interpretable than MSE
- Widely used in practice
- Sensitive to large prediction errors
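
The three metrics can be computed in a few lines. This sketch assumes a small set of actual values and the predictions of the line Y = 2.2 + 0.6X at X = 1 to 5; the array names are illustrative.

```python
import numpy as np

# Actual values and predictions from the line Y = 2.2 + 0.6 * X, X = 1..5.
y_true = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_pred = np.array([2.8, 3.4, 4.0, 4.6, 5.2])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))  # average absolute error
mse = np.mean(errors ** 2)     # average squared error
rmse = np.sqrt(mse)            # back in the units of the target

print(f"MAE={mae:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}")
```

RMSE is never smaller than MAE on the same errors, and the gap widens when a few errors are much larger than the rest.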

---

## One-line Exam Answer

> Common regression evaluation metrics include MAE, MSE, and RMSE, which measure the difference between actual and predicted values.


# Question 8: What is the purpose of the R-squared metric in regression analysis?

## Purpose of R-squared (R²) in Regression Analysis

**R-squared (R²)** is a statistical metric used to measure **how well a regression model explains the variability of the dependent variable**.

---

## What R-squared Indicates

- R-squared represents the **proportion of variance in the dependent variable** that is explained by the independent variable(s).
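- For a least-squares model with an intercept, evaluated on the data it was fitted to, its value lies between **0 and 1**. On new data, or for a model without an intercept, it can be negative.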

Formula (plain text):

R² = 1 − ( Sum of Squared Residuals / Total Sum of Squares )

---

## Interpretation of R-squared Values

- **R² = 0** → The model explains none of the variability
- **R² = 1** → The model explains all the variability
- **R² = 0.85** → 85% of the variation in the dependent variable is explained by the model

---

## Purpose of Using R-squared

- To evaluate the **goodness of fit** of a regression model
- To compare different regression models
- To understand how strongly the independent variable explains the dependent variable

---

## Important Note

- A high R² does **not always mean** the model is good
- It does not indicate causation
- It should be used along with other evaluation metrics
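
The formula translates directly into code. The sketch below assumes the same five actual values and line predictions used as a running example (Y = 2.2 + 0.6X at X = 1 to 5); the function name `r_squared` is an illustrative choice.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - (sum of squared residuals / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_line = np.array([2.8, 3.4, 4.0, 4.6, 5.2])    # predictions from the fitted line
y_mean = np.full_like(y_true, y_true.mean())    # baseline: always predict the mean

r2_line = r_squared(y_true, y_line)
r2_mean = r_squared(y_true, y_mean)

print(f"R^2 of fitted line:   {r2_line:.2f}")
print(f"R^2 of mean baseline: {r2_mean:.2f}")
```

Predicting the mean for every observation gives an R² of zero, so R² can be read as the improvement of the model over that baseline.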

---

## One-line Exam Answer

> R-squared measures the proportion of variance in the dependent variable that is explained by the regression model.


# Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept



```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data: independent variable X (2-D, one column) and dependent variable y
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print slope and intercept
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

```
Slope (Coefficient): 0.6
Intercept: 2.2
```


# Question 10: How do you interpret the coefficients in a simple linear regression model?
## Interpretation of Coefficients in Simple Linear Regression

In a **Simple Linear Regression (SLR)** model, the coefficients explain how the **dependent variable changes** with respect to the **independent variable**.

The regression equation is:

Y = Intercept + (Slope × X)

---

## 1. Intercept (β0)

- The intercept represents the **expected value of Y when X = 0**.
- It indicates the baseline level of the dependent variable.
- In some real-world cases, it may not have a practical meaning but is necessary for the model.

Example:
- If Intercept = 20, then Y is expected to be 20 when X is 0.

---

## 2. Slope (β1)

- The slope represents the **change in Y for a one-unit increase in X**.
- It shows the **direction and magnitude** of the relationship. How tightly the points cluster around the line is measured separately, for example by R².

Interpretation:
- Positive slope → Y increases as X increases
- Negative slope → Y decreases as X increases
- Zero slope → No linear relationship between X and Y

Example:
- If Slope = 5, then Y increases by 5 units for every 1-unit increase in X.

---

## In Simple Words

> The slope tells **how much Y changes when X changes**, and the intercept tells **where the regression line crosses the Y-axis**.
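
Both interpretations can be checked with simple arithmetic. This sketch assumes the slope (0.6) and intercept (2.2) reported in the Question 9 output; the helper name `predict` is hypothetical.

```python
# Coefficients taken from the Question 9 output.
slope, intercept = 0.6, 2.2

def predict(x):
    """Predicted Y for a given X under the fitted line."""
    return intercept + slope * x

# Increasing X by one unit changes the prediction by exactly the slope.
change_per_unit = predict(4.0) - predict(3.0)

# The prediction at X = 0 is the intercept.
value_at_zero = predict(0.0)

print(f"change in Y per unit of X: {change_per_unit:.2f}")
print(f"predicted Y at X = 0: {value_at_zero:.2f}")
```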

---

## One-line Exam Answer

> In simple linear regression, the slope indicates the change in the dependent variable per unit change in the independent variable, while the intercept represents the value of the dependent variable when the independent variable is zero.
