### Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

**Answer:**
**Simple Linear Regression (SLR)** is a statistical method used to model the linear relationship between two continuous variables: a **predictor variable** (or independent variable, $X$) and a **response variable** (or dependent variable, $Y$).

Its **purpose** is to:

1.  **Understand:** Determine the strength and direction of the relationship between $X$ and $Y$.
2.  **Model:** Fit a straight line (the "line of best fit") to the observed data.
3.  **Predict:** Use the fitted line to predict the value of $Y$ for a given value of $X$, most reliably within the range of the observed data.

-----

### Question 2: What are the key assumptions of Simple Linear Regression?

**Answer:**
SLR relies on the following four key assumptions, often summarized by the acronym **LINE**:

  * **L - Linearity:** The relationship between the independent variable ($X$) and the dependent variable ($Y$) must be **linear**.
  * **I - Independence of Errors (or Observations):** The residuals (errors) must be **independent** of each other. In other words, the error for one observation doesn't influence the error for another.
  * **N - Normality of Errors:** The residuals are **normally distributed** (bell-shaped curve) around the regression line for any given value of $X$.
  * **E - Equal Variance of Errors (Homoscedasticity):** The variance of the residuals is **constant** for all levels of the predictor variable $X$.
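
Some of these assumptions can be checked roughly from the residuals of a fitted line. The sketch below is an informal illustration on simulated data, not a formal diagnostic procedure: `residual_summary` is a hypothetical helper, the data and noise levels are invented, and `np.polyfit` stands in for any least-squares fit.

```python
import numpy as np

def residual_summary(x, y):
    """Fit a least-squares line and return simple residual diagnostics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 fit returns [slope, intercept]
    order = np.argsort(x)                    # order residuals by increasing X
    resid = (y - (intercept + slope * x))[order]
    half = len(resid) // 2
    return {
        "mean": float(resid.mean()),                 # close to 0 for a line with intercept
        "spread_low_x": float(resid[:half].std()),   # residual spread at smaller X
        "spread_high_x": float(resid[half:].std()),  # residual spread at larger X
    }

rng = np.random.default_rng(42)              # fixed seed so the run is repeatable
x = rng.uniform(1, 10, size=400)

# Simulated data (assumption): constant noise vs. noise that grows with X.
y_equal = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)
y_unequal = 3.0 + 2.0 * x + rng.normal(0.0, 0.3 * x)

equal = residual_summary(x, y_equal)
unequal = residual_summary(x, y_unequal)
print("constant variance:", equal)
print("growing variance: ", unequal)
```

When the two spread values differ markedly, as they should for the second dataset, that hints at a violation of the equal-variance assumption. Normality and independence are usually inspected with plots (for example a Q-Q plot or residuals against observation order), which this sketch does not cover.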

-----

### Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

**Answer:**
The mathematical equation for a simple linear regression model is:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

| Term | Explanation |
| :--- | :--- |
| **$Y$** | **Dependent (Response) Variable** (The variable we are trying to predict). |
| **$\beta_0$** | **Y-Intercept** (The predicted value of $Y$ when $X$ is zero). |
| **$\beta_1$** | **Slope (Regression Coefficient)** (The expected change in $Y$ for a one-unit increase in $X$). |
| **$X$** | **Independent (Predictor) Variable** (The variable we use to predict $Y$). |
| **$\epsilon$** | **Error Term** (The random deviation of an observed $Y$ from the true regression line, capturing all factors not included in the model. Its estimate from fitted data, the observed value minus the fitted value, is called the **residual**). |
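
To make the equation concrete, the following sketch simulates data from it and then estimates the coefficients back from the sample. The "true" parameter values, sample size, and noise level are all invented for illustration, and `np.polyfit` is just one convenient way to obtain a least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed for repeatability

# Hypothetical "true" parameters, chosen only for this illustration.
beta0_true, beta1_true = 2.0, 0.5

x = rng.uniform(0, 10, size=200)             # predictor values
eps = rng.normal(0.0, 1.0, size=200)         # random error term
y = beta0_true + beta1_true * x + eps        # Y = beta0 + beta1 * X + eps

# Estimate the coefficients from the simulated sample.
slope_hat, intercept_hat = np.polyfit(x, y, 1)

# Residuals: observed minus fitted values, computed from the estimates.
residuals = y - (intercept_hat + slope_hat * x)

print(f"true slope {beta1_true}, estimated {slope_hat:.3f}")
print(f"true intercept {beta0_true}, estimated {intercept_hat:.3f}")
print(f"mean residual {residuals.mean():.2e}")
```

The estimates land near the true values but not exactly on them, because the fit only sees a noisy sample.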

-----

### Question 4: Provide a real-world example where simple linear regression can be applied.

**Answer:**
A common real-world example is predicting the **selling price of a house** ($Y$) based solely on its **square footage** ($X$).

  * **$X$** = Square Footage of the house (Predictor).
  * **$Y$** = Selling Price of the house (Response).

SLR would model the relationship, showing how much the price tends to increase for every additional square foot of space.
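
As a small sketch of this example, the code below fits a line to five made-up houses. The figures are invented purely for illustration and are not real market data; `predict_price` is a hypothetical helper.

```python
import numpy as np

# Invented figures for illustration only, not real market data.
square_feet = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
price = np.array([200_000, 270_000, 330_000, 410_000, 460_000], dtype=float)

# Least-squares line: price = intercept + slope * square_feet
slope, intercept = np.polyfit(square_feet, price, 1)

def predict_price(sqft):
    """Predicted price from the fitted line."""
    return intercept + slope * sqft

print(f"Estimated price increase per extra square foot: {slope:.2f}")
print(f"Predicted price for an 1,800 sq ft house: {predict_price(1800):,.0f}")
```

The slope is the quantity of interest here: it estimates how much the price changes per additional square foot in this toy dataset.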

-----

### Question 5: What is the method of least squares in linear regression?

**Answer:**
The **Method of Least Squares** is the technique used to determine the unique "line of best fit" (i.e., to estimate the coefficients $\beta_0$ and $\beta_1$) for the regression model.

It works by **minimizing the Sum of Squared Errors (SSE)**, $\sum_i (y_i - \hat{y}_i)^2$, where each residual $y_i - \hat{y}_i$ is the vertical distance from a data point to the line. Squaring makes every residual positive and penalizes large residuals more heavily than small ones, so the chosen line is the one with the smallest possible sum of squared vertical distances.
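
For SLR the minimization has a well-known closed-form solution: the slope is $\sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and the intercept is $\bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below applies these formulas to a tiny made-up dataset; the function names `least_squares_fit` and `sse` are hypothetical helpers introduced here.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sse(x, y, intercept, slope):
    """Sum of squared vertical distances from the points to a given line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_fit(x, y)
print(f"slope = {b1:.2f}, intercept = {b0:.2f}")
print(f"SSE at the fitted line:  {sse(x, y, b0, b1):.2f}")
print(f"SSE with a nudged slope: {sse(x, y, b0, b1 + 0.1):.2f}")
```

Changing the slope or intercept away from the fitted values can only increase the SSE, which is what "best fit" means in this context.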

-----

## Logistic Regression & Evaluation Metrics

### Question 6: What is Logistic Regression? How does it differ from Linear Regression?

**Answer:**
**Logistic Regression** is a statistical model used for **classification** problems, particularly **binary classification**, where the dependent variable ($Y$) is categorical (e.g., Yes/No, 0/1, True/False).

It differs from Linear Regression in three key ways:

| Feature | Simple Linear Regression | Logistic Regression |
| :--- | :--- | :--- |
| **Purpose** | **Regression** (Predict a continuous value). | **Classification** (Predict a categorical class/probability). |
| **Output** | A **continuous value** (e.g., price, temperature). | A **probability** between 0 and 1, obtained by passing a linear function of $X$ through the **Sigmoid function**. |
| **Equation** | Models $Y$ directly as a linear function of $X$. | Models the **log-odds** ($\ln(\frac{p}{1-p})$) as a linear function of $X$. |
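
The link between the linear part, the sigmoid, and the log-odds can be sketched in a few lines. The coefficients `beta0` and `beta1` below are hypothetical values picked for illustration, not estimates from any data.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -4.0, 1.0

for x in (0, 2, 4, 6, 8):
    z = beta0 + beta1 * x          # linear function of X (the log-odds)
    p = sigmoid(z)                 # predicted probability of class 1
    label = 1 if p >= 0.5 else 0   # classify with a 0.5 threshold
    print(f"x={x}: log-odds={z:+.1f}, p={p:.3f}, class={label}")
```

With these coefficients the linear part is zero at $X = 4$, so the probability there is exactly 0.5; applying `log_odds` to any output of `sigmoid` recovers the linear value. This simple form of `sigmoid` is fine for moderate inputs but can overflow for very large negative `z`.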

-----

### Question 7: Name and briefly describe three common evaluation metrics for regression models.

**Answer:**
Three common evaluation metrics for regression models are:

1.  **Mean Absolute Error (MAE):** The average of the **absolute differences** between the actual values and the predicted values. It measures the average magnitude of the errors, without considering their direction.
2.  **Mean Squared Error (MSE):** The average of the **squared differences** between the actual values and the predicted values. It penalizes large errors more heavily than MAE, as the errors are squared.
3.  **Root Mean Squared Error (RMSE):** The **square root of the MSE**. It is popular because the resulting value is in the **same units** as the dependent variable ($Y$), making it easier to interpret.
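
The three definitions above translate directly into code. This is a minimal sketch with made-up values; `regression_metrics` is a hypothetical helper (scikit-learn also ships equivalent functions in `sklearn.metrics`).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE and RMSE from actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = float(np.mean(np.abs(errors)))   # average absolute error
    mse = float(np.mean(errors ** 2))      # average squared error
    rmse = float(np.sqrt(mse))             # back in the units of Y
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}

# Made-up values for illustration.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 12]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy example the single error of 3 contributes 9 of the 11 units of total squared error, which is why RMSE comes out noticeably larger than MAE.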

-----

### Question 8: What is the purpose of the R-squared metric in regression analysis?

**Answer:**
The **R-squared** metric (also known as the coefficient of determination) is used to assess the **goodness-of-fit** of the regression model.

Its **purpose** is to represent the **proportion of the variance** in the dependent variable ($Y$) that is predictable from the independent variable ($X$).

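  * For a least-squares line with an intercept, evaluated on the data it was fitted to, it ranges from 0 to 1 (or 0% to 100%); on new data it can be negative if the model predicts worse than simply using the mean of $Y$.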
  * An R-squared of 0.85 means that **85%** of the variability in $Y$ can be explained by the linear relationship with $X$, while the remaining $15\%$ is due to unexplained factors or random error.
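
R-squared is commonly computed as $1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of $Y$ from its mean. The sketch below uses a tiny made-up dataset; `r_squared` is a hypothetical helper.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation in Y
    return float(1.0 - ss_res / ss_tot)

# Made-up data for illustration.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x            # predictions from the fitted line
baseline = np.full_like(y, y.mean())      # always predict the mean of Y

print(f"R-squared of the fitted line:   {r_squared(y, fitted):.2f}")
print(f"R-squared of the mean baseline: {r_squared(y, baseline):.2f}")
```

Predicting the mean of $Y$ for every observation gives an R-squared of 0, so the metric can be read as the improvement of the fitted line over that baseline.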

-----

## Practical Application

### Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

**Answer:**

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 1. Sample Data (X = Hours Studied, Y = Test Score)
# Reshape X to be a 2D array as required by scikit-learn
X = np.array([2.5, 5.1, 3.2, 8.5, 4.7]).reshape(-1, 1)
Y = np.array([21, 47, 24, 75, 50])

# 2. Create the Linear Regression Model
model = LinearRegression()

# 3. Fit the model to the data
model.fit(X, Y)

# 4. Print the slope (coefficient) and intercept
slope = model.coef_[0]
intercept = model.intercept_

print(f"Independent variable X (Hours Studied):\n{X.flatten()}")
print(f"Dependent variable Y (Test Score):\n{Y}")
print("-" * 30)
print(f"Slope (Coefficient, β1): {slope:.2f}")
print(f"Intercept (β0): {intercept:.2f}")
```

**Output:**

```
Independent variable X (Hours Studied):
[2.5 5.1 3.2 8.5 4.7]
Dependent variable Y (Test Score):
[21 47 24 75 50]
------------------------------
Slope (Coefficient, β1): 9.24
Intercept (β0): -0.94
```

-----

### Question 10: How do you interpret the coefficients in a simple linear regression model?

**Answer:**
Interpretation of the coefficients ($\beta_0$ and $\beta_1$) is crucial for deriving insights from the model:

1.  **Slope ($\beta_1$):**

      * It represents the **expected change** in the dependent variable ($Y$) for every **one-unit increase** in the independent variable ($X$). Because SLR has only one predictor, there are no other variables to hold constant.
      * **Example (from Q9):** A slope of **9.24** means that for every **one additional hour studied**, the predicted test score **increases by about 9.24 points**.

2.  **Intercept ($\beta_0$):**

      * It represents the **expected value** of the dependent variable ($Y$) when the independent variable ($X$) is **zero**.
      * **Example (from Q9):** An intercept of **-0.94** means that a student who **studies 0 hours** is predicted to get a score of **-0.94**. A negative score is impossible in practice: the observed study times run from 2.5 to 8.5 hours, so $X=0$ lies outside the data range and the intercept here only anchors the line mathematically rather than giving a practically meaningful prediction.
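
Both interpretations can be checked numerically on the Q9 data. This sketch uses `np.polyfit` instead of scikit-learn purely to keep it dependency-light (both perform ordinary least squares); `predict` is a hypothetical helper.

```python
import numpy as np

# Same data as in Question 9.
hours = np.array([2.5, 5.1, 3.2, 8.5, 4.7])
scores = np.array([21, 47, 24, 75, 50], dtype=float)

slope, intercept = np.polyfit(hours, scores, 1)

def predict(h):
    """Predicted test score for h hours of study."""
    return intercept + slope * h

# Slope: the change in the prediction for one extra hour of study.
gain = predict(5.0) - predict(4.0)
print(f"slope = {slope:.2f}, gain from 4 to 5 hours = {gain:.2f}")

# Intercept: the prediction at zero hours, which lies outside the observed range.
print(f"intercept = {intercept:.2f}, prediction at 0 hours = {predict(0.0):.2f}")
print(f"observed hours range: {hours.min()} to {hours.max()}")
```

The gain for one extra hour equals the slope regardless of the starting point, which is exactly what a straight-line model implies.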