# Machine Learning — Regression Assignment (2nd ml)

**ashok:** _kumar_

---

**Contents**

1. Questions 1–8: Concepts (Markdown)
2. Question 9: Executed linear regression example (code + output)
3. Question 10: Interpretation (Markdown)
4. How to run / Notes


## Q1 — What is Simple Linear Regression (SLR)?

Simple Linear Regression models the relationship between a single predictor `x` and a continuous target `y` with a straight line: $y = \beta_0 + \beta_1 x + \epsilon$. It is used to predict `y` from `x` and to quantify the strength and direction of their association.

## Q2 — Key assumptions of SLR

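SLR relies on a linear relationship between `x` and `y`, independent errors, homoscedasticity (constant error variance across `x`), normally distributed errors (needed for inference such as confidence intervals and hypothesis tests, not for the fit itself), and the absence of unduly influential outliers.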
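Some of these assumptions can be checked informally from the residuals. The following is a minimal sketch in plain Python, not a formal diagnostic: it assumes synthetic data generated with the same parameters as the assignment's example (intercept 1.5, slope 4.2, noise standard deviation 0.8), while the seed, the sample size, and the names `spread_ratio` and `lag1` are illustrative choices.

```python
import random
import statistics

# Synthetic data. The generating parameters mirror the assignment's example;
# the seed (0) and sample size (200) are assumptions made for this sketch.
rng = random.Random(0)
x = [rng.uniform(0.0, 2.5) for _ in range(200)]
y = [1.5 + 4.2 * xi + rng.gauss(0.0, 0.8) for xi in x]

# Fit the line with the closed-form least-squares estimates.
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Homoscedasticity: the residual spread should be similar for small and large x,
# so the ratio of the two standard deviations should be close to 1.
x_median = statistics.median(x)
low = [r for xi, r in zip(x, residuals) if xi <= x_median]
high = [r for xi, r in zip(x, residuals) if xi > x_median]
spread_ratio = statistics.stdev(high) / statistics.stdev(low)

# Independence: lag-1 autocorrelation of the residuals ordered by x.
# Values near 0 are reassuring; this matters most for time-ordered data.
ordered = [r for _, r in sorted(zip(x, residuals))]
lag1 = sum(a * b for a, b in zip(ordered, ordered[1:])) / sum(r * r for r in ordered)

print(f"spread ratio (high x / low x): {spread_ratio:.3f}")
print(f"lag-1 residual autocorrelation: {lag1:.3f}")
```

A spread ratio far from 1, or a lag-1 autocorrelation far from 0, would suggest plotting the residuals against `x` and looking more closely.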

## Q3 — Equation and terms

$y = \beta_0 + \beta_1 x + \epsilon$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon$ is the random error (noise) term.

## Q4 — Real-world example

Predicting exam scores from hours studied; predicting house price from area (single predictor).

## Q5 — Method of least squares

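Least squares estimates the coefficients by minimizing the sum of squared residuals, $\sum (y_i - \hat{y}_i)^2$. For SLR the solution has a closed form: $\hat{\beta}_1 = \frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum (x_i-\bar x)^2}$ and $\hat{\beta}_0 = \bar y - \hat{\beta}_1 \bar x$.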
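The closed-form slope can be verified by hand on a small dataset. Below is a minimal sketch in plain Python; the `hours` and `scores` values are made up for illustration, the helper name `fit_slr` is hypothetical, and the intercept uses the standard companion formula $\hat{\beta}_0 = \bar y - \hat{\beta}_1 \bar x$.

```python
# Closed-form simple linear regression on a tiny, made-up dataset.
# The data values are illustrative assumptions, not assignment data.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]


def fit_slr(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the closed-form slope estimate.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


beta0, beta1 = fit_slr(hours, scores)
print(f"intercept={beta0:.3f}, slope={beta1:.3f}")  # intercept=47.700, slope=4.100
```

Here $\bar x = 3$, $\bar y = 60$, the numerator is 41 and the denominator is 10, which gives a slope of 4.1 and an intercept of $60 - 4.1 \times 3 = 47.7$.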

## Q6 — Logistic vs Linear Regression

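Logistic regression is used for classification: it passes a linear combination of the inputs through a sigmoid to predict a probability between 0 and 1. Linear regression predicts an unbounded continuous target. The two are also fit differently: linear regression typically minimizes squared error, while logistic regression minimizes log loss (equivalently, maximizes the likelihood).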
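The difference in outputs and losses can be sketched in a few lines of plain Python. The coefficients `b0` and `b1` below are hypothetical values chosen for illustration, not fitted ones, and the function names are this sketch's own.

```python
import math


def linear_output(x, b0, b1):
    """Linear regression prediction: any real number."""
    return b0 + b1 * x


def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_output(x, b0, b1):
    """Logistic regression prediction: probability of the positive class."""
    return sigmoid(b0 + b1 * x)


def squared_error(y, y_hat):
    """Per-sample loss commonly used for linear regression."""
    return (y - y_hat) ** 2


def log_loss(y, p):
    """Per-sample loss for logistic regression, with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


b0, b1 = -1.0, 0.5  # hypothetical coefficients for illustration
for x in (-4.0, 0.0, 2.0, 10.0):
    print(f"x={x:5.1f}  linear={linear_output(x, b0, b1):6.2f}  "
          f"logistic={logistic_output(x, b0, b1):.3f}")
```

The linear output grows without bound as `x` grows, while the logistic output stays strictly between 0 and 1 and equals 0.5 exactly where the linear term is zero (here at `x = 2`).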

## Q7 — Three common evaluation metrics

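- **MSE** (mean squared error): $\frac{1}{n}\sum (y_i - \hat{y}_i)^2$. Penalizes large errors heavily; expressed in squared units of `y`.
- **RMSE** (root mean squared error): $\sqrt{\text{MSE}}$. Same emphasis on large errors, but in the same units as `y`, so it is easier to report.
- **MAE** (mean absolute error): $\frac{1}{n}\sum |y_i - \hat{y}_i|$. Weights all errors linearly, so it is less sensitive to outliers than MSE or RMSE.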
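As a worked example, the three metrics can be computed by hand on a few predictions. This is a minimal sketch in plain Python; the `y_true` and `y_pred` values are made up for illustration.

```python
import math

# Illustrative targets and predictions (assumed values, not assignment data).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 10.0]

residuals = [t - p for t, p in zip(y_true, y_pred)]
n = len(residuals)

mse = sum(r ** 2 for r in residuals) / n   # mean of squared residuals
rmse = math.sqrt(mse)                      # back in the units of y
mae = sum(abs(r) for r in residuals) / n   # mean of absolute residuals

print(f"MSE={mse:.4f}, RMSE={rmse:.4f}, MAE={mae:.4f}")
# MSE=0.3750, RMSE=0.6124, MAE=0.5000
```

The single residual of size 1 contributes two thirds of the MSE but only half of the MAE, which is why MSE and RMSE react more strongly to large errors.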

## Q8 — Purpose of R-squared

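R² is the proportion of the variance in `y` explained by the model: $R^2 = 1 - \frac{RSS}{TSS}$, where RSS is the residual sum of squares and TSS is the total sum of squares around the mean of `y`. Use adjusted R² when comparing models with different numbers of predictors, because plain R² on the training data never decreases when a predictor is added.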
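The formula can be checked numerically. Below is a minimal sketch in plain Python with made-up values; the adjusted R² line uses the usual definition $1 - (1 - R^2)\frac{n-1}{n-p-1}$ with `p` predictors, which is stated here as an addition to the text above.

```python
# Illustrative targets and predictions (assumed values, not assignment data).
y_true = [2.0, 4.0, 6.0, 8.0, 10.0]
y_pred = [2.5, 3.5, 6.0, 8.5, 9.5]

n = len(y_true)
p = 1  # number of predictors in simple linear regression
y_mean = sum(y_true) / n

rss = sum((t - yhat) ** 2 for t, yhat in zip(y_true, y_pred))  # unexplained variation
tss = sum((t - y_mean) ** 2 for t in y_true)                   # total variation

r2 = 1 - rss / tss
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"RSS={rss:.2f}, TSS={tss:.2f}, R^2={r2:.4f}, adjusted R^2={adjusted_r2:.4f}")
# RSS=1.00, TSS=40.00, R^2=0.9750, adjusted R^2=0.9667
```

Adjusted R² is always at most R², and the gap widens as more predictors are added relative to the number of samples.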

## Q9 — Executed linear regression example

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Recreate the synthetic data (seeded for reproducibility)
rng = np.random.RandomState(42)
X = 2.5 * rng.rand(100, 1)  # 100 samples, one feature, uniform on [0, 2.5)
true_slope = 4.2
true_intercept = 1.5
noise = rng.normal(scale=0.8, size=(100, 1))
y = (true_intercept + true_slope * X + noise).ravel()  # 1-D target, shape (100,)

# Fit ordinary least squares and predict on the training inputs
model = LinearRegression().fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_
preds = model.predict(X)

# Training-set evaluation metrics
r2 = r2_score(y, preds)
mse = mean_squared_error(y, preds)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y, preds)

print(f"Learned slope (beta1): {slope}")
print(f"Learned intercept (beta0): {intercept}")
print(f"R^2 on training data: {r2:.5f}")
print(f"MSE: {mse:.5f}, RMSE: {rmse:.5f}, MAE: {mae:.5f}")
print('\nSample predictions:')
X_new = np.array([[0.0], [1.0], [2.0]])
for xi, p in zip(X_new.ravel(), model.predict(X_new)):
    print(f"X={xi:.2f} => predicted y={p:.12f}")
```

Output:

```text
Learned slope (beta1): 4.0528725673206285
Learned intercept (beta0): 1.6720769260373993
R^2 on training data: 0.94572
MSE: 0.51621, RMSE: 0.71848, MAE: 0.56083

Sample predictions:
X=0.00 => predicted y=1.672076926037
X=1.00 => predicted y=5.724949493358
X=2.00 => predicted y=9.777822060679
```


## Q10 — How to interpret coefficients

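- **Intercept (β0):** expected `y` when `x=0`. In the fitted example above, the learned intercept is about 1.67, against a true value of 1.5.
- **Slope (β1):** expected change in `y` for a one-unit increase in `x`. In the fitted example above, the learned slope of about 4.05 means `y` rises by roughly 4.05 units per unit of `x`, close to the true slope of 4.2 used to generate the data.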

**Caveats:** association ≠ causation; consider confidence intervals and whether `x=0` is meaningful in your problem.
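The caveat about confidence intervals can be made concrete with a rough interval for the slope. The sketch below uses plain Python and assumes synthetic data with the same generating parameters as the example (intercept 1.5, slope 4.2, noise standard deviation 0.8); the seed is an arbitrary choice, so the numbers will not match the scikit-learn run. It also uses the normal quantile (about 1.96) as a large-sample approximation, whereas an exact interval would use the t distribution with `n - 2` degrees of freedom.

```python
import math
import random
from statistics import NormalDist, fmean

# Synthetic data; the seed and the use of Python's `random` are assumptions,
# so values differ from the NumPy-generated data in the executed example.
rng = random.Random(42)
n = 100
x = [rng.uniform(0.0, 2.5) for _ in range(n)]
y = [1.5 + 4.2 * xi + rng.gauss(0.0, 0.8) for xi in x]

# Closed-form least-squares fit.
x_bar, y_bar = fmean(x), fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Standard error of the slope: sqrt(residual variance / Sxx).
rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = rss / (n - 2)  # unbiased residual variance estimate
se_slope = math.sqrt(sigma2_hat / sxx)

# Approximate 95% interval using the normal quantile (large-sample shortcut).
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = slope - z * se_slope, slope + z * se_slope

print(f"slope = {slope:.3f}, approx. 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

A narrow interval that excludes 0 supports a real linear association in the data, but it still says nothing about causation.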

## How to run

1. Ensure scikit-learn and NumPy are available (`pip install scikit-learn numpy`).
2. Open the notebook in Google Colab or Jupyter.
3. Run all cells in order. The Q9 cell is saved with its executed output, so a fresh run can be compared against the embedded results.

---

*Notebook generated programmatically; contains no personal identifiers.*