1. What is Simple Linear Regression?

Simple Linear Regression models the relationship between one predictor X and one outcome Y using a straight line:

Y = mX + c + ε

Example: Predict salary from years of experience.

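import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated data: base salary 30000 plus 5000 per year of experience, with random noise
# (no seed is set, so the printed estimates vary slightly from run to run)
X = np.array([0, 1, 2, 3, 4, 5]).reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = 30000 + 5000 * X.ravel() + np.random.normal(0, 3000, size=X.shape[0])

# Fit the line and report the estimated slope (m) and intercept (c)
model = LinearRegression().fit(X, y)
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)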

2. Key assumptions of Simple Linear Regression

Linearity

Independence of residuals

Homoscedasticity

Normal distribution of residuals

No major outliers

Example: Predicting house price from size assumes that price changes linearly with size.

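import numpy as np
import matplotlib.pyplot as plt
# Simulated data: price rises by about 2000 per square foot, plus random noise
X = np.linspace(500, 2500, 60).reshape(-1, 1)
y = 2000 * X.ravel() + 100000 + np.random.normal(0, 80000, size=X.shape[0])
# A roughly straight band of points is consistent with the linearity assumption
plt.scatter(X, y)
plt.xlabel("Size (sq ft)")
plt.ylabel("Price")
plt.show()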

3. Coefficient m in Y = mX + c

Slope m is the expected change in Y for a one-unit increase in X. Example: A slope of 2000 means each extra square foot adds ₹2000 to the predicted house price.

4. Intercept c in Y = mX + c

Intercept c = expected Y when X=0. Example: Starting salary at 0 years experience.

5. Calculating slope m

m = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
c = Ȳ - mX̄

Example: Study hours vs exam score.
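
As a rough illustration, the two formulas above can be evaluated directly with NumPy. The study-hours and exam-score values are invented for this sketch, and np.polyfit is used only as a cross-check.

```python
import numpy as np

# Hypothetical data (invented for illustration): hours studied and exam scores
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

x_bar, y_bar = hours.mean(), scores.mean()

# m = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
m = np.sum((hours - x_bar) * (scores - y_bar)) / np.sum((hours - x_bar) ** 2)
# c = Ȳ - mX̄
c = y_bar - m * x_bar
print("Slope m:", m, "Intercept c:", c)

# Cross-check against NumPy's degree-1 least-squares fit
m_np, c_np = np.polyfit(hours, scores, deg=1)
print("polyfit slope:", m_np, "polyfit intercept:", c_np)
```

For these five points the formulas give m = 4.1 and c = 47.7 (up to floating-point rounding): each extra study hour adds about 4.1 points to the predicted score.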

6. Purpose of least squares method

Minimizes squared residuals to find best-fit line. Example: Fit line for ad spend vs sales.
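
A small sketch of what "best fit" means here: the least-squares line has a smaller sum of squared residuals than any nearby line. The ad-spend and sales figures are invented, and the helper name sum_squared_residuals is hypothetical.

```python
import numpy as np

# Hypothetical data (invented): ad spend in ₹1000 and resulting sales in units
ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sales = np.array([12.0, 15.0, 21.0, 24.0, 31.0, 33.0])

def sum_squared_residuals(slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    residuals = sales - (slope * ad_spend + intercept)
    return np.sum(residuals ** 2)

# Least-squares estimates of slope and intercept
best_slope, best_intercept = np.polyfit(ad_spend, sales, deg=1)
best_ssr = sum_squared_residuals(best_slope, best_intercept)

# Shift the slope or the intercept a little and recompute the criterion
shifts = [(0.5, 0.0), (-0.5, 0.0), (0.0, 2.0), (0.0, -2.0)]
shifted_ssrs = [
    sum_squared_residuals(best_slope + d_slope, best_intercept + d_intercept)
    for d_slope, d_intercept in shifts
]
for (d_slope, d_intercept), ssr in zip(shifts, shifted_ssrs):
    print(f"shift=({d_slope:+.1f}, {d_intercept:+.1f})  SSR={ssr:.2f}  best={best_ssr:.2f}")
```

Every shifted line reports a larger SSR than the fitted one, which is the property the least squares method optimizes.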

7. R² interpretation

Proportion of the variance in Y that is explained by X. Example: R² = 0.85 means 85% of the variation in salary is explained by experience.
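
One way to see this definition in action is to compute R² as 1 - SS_res / SS_tot by hand. The experience and salary values are invented for this sketch.

```python
import numpy as np

# Hypothetical data (invented): years of experience and salary
experience = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([31000.0, 34000.0, 41000.0, 44000.0, 52000.0, 54000.0])

# Fit the simple regression line and get predictions
slope, intercept = np.polyfit(experience, salary, deg=1)
predicted = slope * experience + intercept

ss_res = np.sum((salary - predicted) ** 2)       # variation left unexplained
ss_tot = np.sum((salary - salary.mean()) ** 2)   # total variation in Y
r_squared = 1 - ss_res / ss_tot
print("R²:", round(r_squared, 4))
```

In simple linear regression this value equals the squared Pearson correlation between X and Y, which makes a convenient sanity check.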

8. Multiple Linear Regression

Models Y using multiple predictors.

Y = β0 + β1X1 + β2X2 + ... + βpXp + ε

Example: House price ~ size + bedrooms + location.
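
A minimal sketch with two numeric predictors, using the same scikit-learn estimator as in section 1. The data is invented and noise-free so the recovered coefficients are easy to check; location is left out because it is categorical and would need encoding first.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data generated from:
# price = 50000 + 2000 * size + 100000 * bedrooms
size = np.array([800, 1000, 1200, 1500, 1800, 2000], dtype=float)
bedrooms = np.array([2, 2, 3, 3, 4, 3], dtype=float)
price = 50000 + 2000 * size + 100000 * bedrooms

# One column per predictor
X = np.column_stack([size, bedrooms])
model = LinearRegression().fit(X, price)

print("β0 (intercept):", model.intercept_)
print("β1 (size), β2 (bedrooms):", model.coef_)
```

Each coefficient is read as the change in Y per unit of that predictor while the other predictor is held fixed.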

9. Difference between Simple and Multiple Linear Regression

Simple: one predictor

Multiple: two or more predictors

10. Key assumptions of Multiple Linear Regression

Linearity

Independence

Homoscedasticity

Normal residuals

No multicollinearity

11. Heteroscedasticity

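Residuals have unequal variance across the range of fitted values. The coefficient estimates remain unbiased, but their standard errors (and therefore t-tests and confidence intervals) become unreliable. Example: Income variance increases with age.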

12. Reducing multicollinearity

Remove correlated predictors

Use Ridge/Lasso

Apply PCA
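
Before removing anything, it helps to find which predictors are correlated. The sketch below uses the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. VIF is a common diagnostic that these notes do not name, so treat it as an assumption; the data is simulated and the helper name vif is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1 (collinear)
x3 = rng.normal(size=n)                  # unrelated to the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress it on the remaining columns plus an intercept."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    r2 = 1 - residuals.var() / target.var()
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"VIF({name}) = {value:.1f}")
```

Here x1 and x2 get very large VIFs while x3 stays near 1, suggesting that one of x1 or x2 could be dropped. Rules of thumb such as "VIF above 5 or 10" are conventions, not hard limits.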

13. Transforming categorical variables

One-hot encoding

Ordinal encoding

Target encoding
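
A short sketch of the first two options using pandas. The column names and category values are invented for illustration, and target encoding is not shown.

```python
import pandas as pd

# Hypothetical housing data (invented) with two categorical columns
df = pd.DataFrame({
    "location": ["city", "suburb", "rural", "city"],
    "condition": ["poor", "good", "fair", "good"],
})

# Ordinal encoding: suitable when categories have a natural order
condition_order = {"poor": 0, "fair": 1, "good": 2}
df["condition_code"] = df["condition"].map(condition_order)

# One-hot encoding: one indicator column per category; drop_first=True drops
# one level so the indicators are not perfectly collinear with the intercept
encoded = pd.get_dummies(df, columns=["location"], drop_first=True)
print(encoded)
```

Dropping one level matters for linear regression: with an intercept in the model, a full set of indicator columns sums to 1 in every row and creates perfect multicollinearity.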

14. Interaction terms

Capture combined effects, where the effect of one predictor depends on the level of another.

Y = β0 + β1X1 + β2X2 + β3(X1X2) + ε

Example: Study hours effect depends on sleep quality.
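
A minimal sketch of fitting an interaction term by adding the product X1·X2 as an extra column. The data is invented and noise-free, generated from score = 40 + 5·hours + 2·sleep + 1.5·hours·sleep, so the fit should recover those values.

```python
import numpy as np

# Hypothetical noise-free data on a grid of study hours and sleep-quality scores
hours, sleep = np.meshgrid([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
hours, sleep = hours.ravel(), sleep.ravel()
score = 40 + 5 * hours + 2 * sleep + 1.5 * hours * sleep

# Design matrix: intercept, X1, X2, and the product X1*X2
design = np.column_stack([np.ones_like(hours), hours, sleep, hours * sleep])
beta, *_ = np.linalg.lstsq(design, score, rcond=None)
print("β0, β1, β2, β3:", beta.round(4))

# With an interaction, the effect of one extra study hour is β1 + β3 * sleep
for s in (1.0, 3.0):
    print(f"sleep={s}: extra score per study hour = {beta[1] + beta[3] * s:.2f}")
```

The per-hour effect is 6.50 at sleep quality 1 and 9.50 at sleep quality 3, which is exactly the "depends on" relationship the interaction coefficient β3 encodes.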

15. Intercept interpretation differences

Simple: Y when X=0

Multiple: Y when all predictors=0

16. Significance of slope

Shows how much Y changes per unit of X. Example: Advertising spend slope = extra sales per ₹1000.

17. Intercept context

Baseline value of Y when predictors=0. Example: Exam score at 0 study hours.

18. Limitations of R²

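Never decreases when predictors are added, even irrelevant ones

Says nothing about whether the model assumptions hold (residual diagnostics are still needed)

Doesn’t measure predictive accuracy on new data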

19. Large standard error

A large standard error means the coefficient estimate is imprecise. Example: If the education-level coefficient in a salary model has a large standard error, its true effect on salary is unclear.
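
To make "uncertain" concrete, the sketch below computes the slope's standard error in simple linear regression, SE(m) = sqrt(s² / Σ(Xi - X̄)²) with s² = SS_res / (n - 2), for a low-noise and a high-noise dataset. The data is simulated and the helper name slope_standard_error is hypothetical.

```python
import numpy as np

def slope_standard_error(x, y):
    """Standard error of the least-squares slope in simple linear regression."""
    n = len(x)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    residual_variance = np.sum(residuals ** 2) / (n - 2)  # s²
    return np.sqrt(residual_variance / np.sum((x - x.mean()) ** 2))

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y_low_noise = 3 * x + rng.normal(0, 1, size=x.size)    # tight around the line
y_high_noise = 3 * x + rng.normal(0, 10, size=x.size)  # widely scattered

se_low = slope_standard_error(x, y_low_noise)
se_high = slope_standard_error(x, y_high_noise)
print("SE of slope, low noise: ", round(se_low, 3))
print("SE of slope, high noise:", round(se_high, 3))
```

The same true slope is estimated far less precisely from the noisier data, which is what a large standard error signals.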

20. Identifying heteroscedasticity

A residuals-vs-fitted plot shows a funnel shape. Example: Car price residuals spread wider for expensive cars.
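
The funnel can also be checked numerically. This sketch simulates data whose noise grows with X, then compares the residual spread in the lower and upper halves of the fitted values. It is a rough check under simulated data, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
# Noise standard deviation grows with x, which produces the funnel shape
y = 5 + 2 * x + rng.normal(0, 0.5 * x)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept
residuals = y - fitted

# Split observations into low and high fitted values and compare spreads
order = np.argsort(fitted)
half = len(order) // 2
spread_low = residuals[order[:half]].std()
spread_high = residuals[order[half:]].std()
print("Residual std, low fitted values: ", round(spread_low, 2))
print("Residual std, high fitted values:", round(spread_high, 2))
```

A clearly larger spread in the upper half is the numerical counterpart of the funnel seen in the plot.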

21. High R² but low adjusted R²

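Extra predictors inflate R² without adding real explanatory value; adjusted R² penalizes each added predictor, so it drops when the new predictors do not help.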
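
The penalty can be seen from the standard formula, adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The numbers below are illustrative, and the function name adjusted_r2 is hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for a model with p predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R² = 0.85 on n = 30 observations, with 1 versus 10 predictors
few_predictors = adjusted_r2(0.85, n=30, p=1)
many_predictors = adjusted_r2(0.85, n=30, p=10)
print("Adjusted R², 1 predictor:  ", round(few_predictors, 4))
print("Adjusted R², 10 predictors:", round(many_predictors, 4))
```

With one predictor the adjusted value is about 0.8446, while with ten predictors it falls to about 0.7711, even though R² is identical.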

22. Importance of scaling

Numerical stability

Comparability

Needed for Ridge/Lasso
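
A minimal sketch of scaling before a penalized model, chaining StandardScaler and Ridge with scikit-learn's make_pipeline. The data is simulated, and alpha=1.0 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
size = rng.uniform(500, 2500, 100)                # values in the thousands
bedrooms = rng.integers(1, 6, 100).astype(float)  # values from 1 to 5
price = 100000 + 2000 * size + 50000 * bedrooms + rng.normal(0, 50000, 100)
X = np.column_stack([size, bedrooms])

# Scale each predictor to mean 0 and standard deviation 1, then fit Ridge
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, price)

X_scaled = model.named_steps["standardscaler"].transform(X)
print("Column means after scaling:", X_scaled.mean(axis=0).round(6))
print("Column std devs after scaling:", X_scaled.std(axis=0).round(6))
print("Ridge coefficients (per standard deviation):", model.named_steps["ridge"].coef_)
```

Without scaling, the Ridge penalty would treat the size and bedrooms coefficients very differently simply because the two predictors are measured on different scales.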

23. Polynomial regression

Regression with powers of predictors.

Y = β0 + β1X + β2X² + ... + βdX^d + ε

24. Difference from linear regression

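Linear regression fits a straight line; polynomial regression fits a curve. Polynomial regression is still linear in the coefficients, so it is fitted with the same least squares method.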

25. When polynomial regression is used

When data shows nonlinear patterns. Example: Fertilizer vs crop yield.

26. General equation

Y = β0 + Σ βkX^k + ε, where the sum runs over k = 1, ..., d

27. Polynomial regression with multiple variables

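Yes. With several predictors, include powers of each predictor and their interaction terms (for degree 2 with two predictors: X1, X2, X1², X1X2, X2²).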

28. Limitations of polynomial regression

Overfitting

Poor extrapolation

Hard to interpret coefficients

29. Evaluating polynomial degree

Cross-validation

Adjusted R², AIC, BIC

Residual plots
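
A sketch of the cross-validation option: fit degrees 1 to 6 on noisy sine data (similar to the example in section 31) and compare 5-fold mean squared error. The degree range, fold count, and seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 6, 50).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 50)

# Shuffle before splitting because X is sorted
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for degree in range(1, 7):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    # scikit-learn reports negated MSE so that higher is better; flip the sign
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()
    print(f"degree={degree}  CV MSE={cv_mse[degree]:.4f}")

best_degree = min(cv_mse, key=cv_mse.get)
print("Degree with lowest CV error:", best_degree)
```

The degree with the lowest held-out error is a reasonable starting point; when several degrees score similarly, the lower degree is usually the safer pick.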

30. Importance of visualization

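Plotting the fitted curve over the data shows how well it fits, helps detect overfitting (for example, a curve that swings sharply between points), and communicates results.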

31. Polynomial regression in Python

Use PolynomialFeatures followed by LinearRegression, chained in a Pipeline.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

# Noisy samples of a sine curve: a nonlinear pattern a straight line cannot capture
X = np.linspace(0, 6, 50).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.2, 50)

# Expand X into [X, X², X³], then fit ordinary least squares on those features
model = Pipeline([
    ('poly', PolynomialFeatures(degree=3, include_bias=False)),
    ('lin', LinearRegression())
])
model.fit(X, y)

# Evaluate the fitted curve on a dense grid and plot it over the data
X_plot = np.linspace(0, 6, 200).reshape(-1, 1)
y_pred = model.predict(X_plot)
plt.scatter(X, y)
plt.plot(X_plot, y_pred, color='red')
plt.show()