# Assignment: Supervised Learning — Regression Models and Performance Metrics

### Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

**Answer:**

Simple Linear Regression (SLR) models the relationship between a single independent variable (X) and a dependent variable (Y) using a straight line. Its purpose is to predict Y from X and to quantify the strength and direction of the linear relationship.

### Question 2: What are the key assumptions of Simple Linear Regression?

**Answer:**

Key assumptions of SLR:
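1. Linearity: the relationship between X and Y is linear.
2. Independence: the residuals are independent of one another.
3. Homoscedasticity: the residuals have constant variance across all values of X.
4. Normality: the residuals are approximately normally distributed. This matters for inference (confidence intervals and hypothesis tests), not for computing the fitted line itself.
5. No (or negligible) measurement error in X, and the errors have a mean of zero.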
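
Some of these assumptions can be checked roughly from the residuals of a fitted line. The sketch below is illustrative only: the data are synthetic (generated from an assumed line with normal noise), and the three checks shown (a Durbin-Watson statistic, a variance ratio between the lower and upper halves of X, and the share of residuals within two standard deviations) are simple numeric stand-ins for the residual plots and formal tests one would normally use.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 0.5*x + normal noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line; for deg=1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Independence: Durbin-Watson statistic, close to 2 when consecutive
# residuals are uncorrelated
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity: residual variance in the upper half of x divided by
# the variance in the lower half (x is sorted), close to 1 if constant
half = x.size // 2
variance_ratio = residuals[half:].var() / residuals[:half].var()

# Normality (rough check): share of residuals within 2 standard deviations,
# about 0.95 for normally distributed residuals
within_2sd = np.mean(np.abs(residuals) < 2 * residuals.std())

print("Durbin-Watson:", durbin_watson)
print("Variance ratio (upper/lower half):", variance_ratio)
print("Share within 2 SD:", within_2sd)
```

Values far from 2, 1, and 0.95 respectively would suggest looking more closely at independence, homoscedasticity, and normality.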

### Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

**Answer:**

Mathematical equation: y = β0 + β1*x + ε, where β0 is the intercept (the expected value of y when x = 0), β1 is the slope (the expected change in y for a one-unit increase in x), and ε is the error term (the variation in y that the line does not explain).

### Question 4: Provide a real-world example where simple linear regression can be applied.

**Answer:**

Example: Predicting house price (Y) from house size in square feet (X). SLR can estimate how much price increases per additional square foot.

### Question 5: What is the method of least squares in linear regression?

**Answer:**

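The method of least squares finds the model parameters (β0, β1) that minimize the sum of squared residuals ∑(yi - (β0 + β1 xi))^2. Setting the partial derivatives with respect to β0 and β1 to zero gives the normal equations, whose closed-form solution is β1 = ∑(xi - x̄)(yi - ȳ) / ∑(xi - x̄)^2 and β0 = ȳ - β1 x̄, where x̄ and ȳ are the sample means.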
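
As a minimal sketch, the closed-form least-squares estimates can be computed directly with NumPy. The data are the same sample values used in Question 9, so the result should agree with the scikit-learn fit there (slope about 1.27, intercept about 0.29). The variable names are chosen here for illustration.

```python
import numpy as np

# Same sample data as the scikit-learn example in Question 9
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.5, 3.1, 3.9, 5.2, 6.8])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form least-squares estimates for slope and intercept
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta0 = y_mean - beta1 * x_mean

# Sum of squared residuals at the fitted parameters (the minimized quantity)
residuals = y - (beta0 + beta1 * x)
ssr = np.sum(residuals ** 2)

print(f"beta1 = {beta1:.2f}, beta0 = {beta0:.2f}, SSR = {ssr:.3f}")
```

Any other choice of slope and intercept gives a larger sum of squared residuals on this data.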

### Question 6: What is Logistic Regression? How does it differ from Linear Regression?

**Answer:**

Logistic Regression is a classification model, most commonly used for binary outcomes. It models the probability that Y=1 by applying the logistic (sigmoid) function to a linear combination of the inputs.
Difference from Linear Regression: linear regression predicts continuous values and is fit by minimizing squared error; logistic regression predicts probabilities between 0 and 1, uses a logit link, and is fit by minimizing log-loss (equivalently, maximizing the likelihood).
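
The sketch below illustrates the two ingredients named above: the sigmoid applied to a linear combination, and the log-loss. The coefficients and the tiny data set are made-up values for illustration, not fitted results.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p):
    """Average negative log-likelihood of binary labels under probabilities p."""
    eps = 1e-12                      # avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical coefficients and data, chosen only for illustration
beta0, beta1 = -3.0, 1.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_true = np.array([0, 0, 1, 1, 1])

p = sigmoid(beta0 + beta1 * x)       # predicted P(Y=1 | x)
labels = (p >= 0.5).astype(int)      # classify with a 0.5 threshold
loss = log_loss(y_true, p)

print("Probabilities:", np.round(p, 3))
print("Predicted labels:", labels)
print("Log-loss:", round(loss, 4))
```

Unlike a linear regression output, every value in `p` stays strictly between 0 and 1, which is what allows it to be read as a probability.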

### Question 7: Name and briefly describe three common evaluation metrics for regression models.

**Answer:**

Three common regression evaluation metrics:
1. Mean Absolute Error (MAE): the average absolute difference between predicted and actual values. It is in the same units as Y and easy to interpret.
2. Mean Squared Error (MSE) / Root MSE (RMSE): MSE is the average of the squared errors, so it penalizes large errors more heavily; RMSE is its square root and is in the same units as Y.
3. R-squared (coefficient of determination): the proportion of variance in Y explained by the model.
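
A minimal sketch of how these metrics are computed, using NumPy and the sample data from Question 9. The predictions come from the fitted line reported there (slope 1.27, intercept 0.29); the variable names are illustrative.

```python
import numpy as np

# Sample data from Question 9 and predictions from its fitted line
x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([1.5, 3.1, 3.9, 5.2, 6.8])
y_pred = 0.29 + 1.27 * x

errors = y_true - y_pred

mae = np.mean(np.abs(errors))        # Mean Absolute Error
mse = np.mean(errors ** 2)           # Mean Squared Error
rmse = np.sqrt(mse)                  # Root MSE, same units as y

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE = {mae:.3f}, MSE = {mse:.4f}, RMSE = {rmse:.3f}, R^2 = {r2:.4f}")
```

In practice, `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error`, and `r2_score`, which compute the same quantities.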

### Question 8: What is the purpose of the R-squared metric in regression analysis?

**Answer:**

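R-squared measures the proportion of variance in the dependent variable explained by the independent variable(s). For a least-squares fit with an intercept, evaluated on the training data, it ranges from 0 to 1 (higher is better); on new data, or for a model that fits worse than simply predicting the mean, it can be negative. Adjusted R-squared additionally penalizes the number of predictors relative to the sample size.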
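
The adjustment for predictors and sample size can be sketched as follows, using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The helper name `adjusted_r_squared` is hypothetical, and the data are the sample values from Question 9 with that question's fitted line.

```python
import numpy as np

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Sample data and fitted line from Question 9
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.5, 3.1, 3.9, 5.2, 6.8])
y_pred = 0.29 + 1.27 * x

r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

# One predictor, five samples
adj_r2 = adjusted_r_squared(r2, n_samples=len(y), n_predictors=1)

# Same R-squared with a hypothetical second predictor: the adjusted value drops
adj_r2_two = adjusted_r_squared(r2, n_samples=len(y), n_predictors=2)

print(f"R^2 = {r2:.4f}, adjusted (p=1) = {adj_r2:.4f}, adjusted (p=2) = {adj_r2_two:.4f}")
```

The adjusted value is always at or below the plain R-squared, and the gap grows as predictors are added without a matching improvement in fit.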

### Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

**Answer:**

Below is the code and its output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data (X: feature, y: target)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1.5, 3.1, 3.9, 5.2, 6.8])

model = LinearRegression()
model.fit(X, y)

print('Slope (beta1):', model.coef_[0])
print('Intercept (beta0):', model.intercept_)
```

Output:

```
Slope (beta1): 1.2700000000000002
Intercept (beta0): 0.28999999999999915
```


### Question 10: How do you interpret the coefficients in a simple linear regression model?

**Answer:**

Interpretation: The slope (β1) represents the expected change in Y for a one-unit increase in X. The intercept (β0) is the expected value of Y when X=0. Coefficients should be interpreted in the context of the data and with attention to statistical significance and assumptions.
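
As a small worked illustration, the coefficients reported in Question 9 (slope 1.27, intercept 0.29, rounded) can be read off directly from predictions. The helper `predict` is a hypothetical name used only for this sketch.

```python
# Rounded coefficients from the Question 9 output
beta0 = 0.29   # intercept
beta1 = 1.27   # slope

def predict(x):
    """Predicted y on the fitted line for a given x."""
    return beta0 + beta1 * x

# Intercept: the predicted value at x = 0
at_zero = predict(0)

# Slope: the change in the prediction for a one-unit increase in x
change_per_unit = predict(4) - predict(3)

print("Prediction at x = 0:", at_zero)
print("Change from x = 3 to x = 4:", round(change_per_unit, 2))
```

In that example the observed X values run from 1 to 5, so the prediction at X=0 is an extrapolation and the intercept should be read with that in mind.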