# Analytical Quiz: Linear Regression

### Questions

1. You fit a linear regression model and get a low R² score. What could cause this, and how would you investigate each possible cause?

2. Explain the trade-off between underfitting and overfitting in the context of linear regression.

3. Suppose a model gives a very high R² value on training data but performs poorly on unseen data. What might be going wrong?

4. How would you determine whether a linear model is appropriate for your data?

5. You're given a dataset with highly correlated features. How can this affect your linear regression model, and what would you do about it?
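
One way to make Questions 2 and 3 concrete is to compare training and held-out R² for a straight line and a high-degree polynomial fitted to the same small, truly linear dataset. This is a minimal sketch that assumes scikit-learn's `PolynomialFeatures` and `make_pipeline`; the helper `make_data`, the noise level, the sample sizes and the degree 9 are arbitrary illustrative choices. The flexible model typically scores higher on the training set but lower on held-out data, which is the overfitting pattern Question 3 describes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical helper: y = 3x + Gaussian noise (std 0.5), x uniform on [0, 1]."""
    x = rng.uniform(0, 1, size=(n, 1))
    y = 3 * x[:, 0] + rng.normal(scale=0.5, size=n)
    return x, y

x_train, y_train = make_data(12)   # deliberately tiny training set
x_test, y_test = make_data(200)    # held-out data the model never sees

results = {}
for degree in (1, 9):
    # Polynomial feature expansion followed by ordinary least squares
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
    model.fit(x_train, y_train)
    train_r2 = r2_score(y_train, model.predict(x_train))
    test_r2 = r2_score(y_test, model.predict(x_test))
    results[degree] = (train_r2, test_r2)
    print(f"degree={degree}: train R²={train_r2:.3f}, test R²={test_r2:.3f}")
```

Because the degree-9 model contains the straight line as a special case, its training R² can never be lower; only the held-out score reveals whether the extra flexibility helped or hurt.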

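For Question 5, a common diagnostic is the variance inflation factor, `VIF_j = 1 / (1 - R_j^2)`, where `R_j^2` comes from regressing feature `j` on all the other features. The sketch below assumes scikit-learn; the synthetic data, the bootstrap loop and `alpha=10.0` for `Ridge` are illustrative choices, not prescribed values. It computes VIFs for two near-duplicate features and one independent feature, then compares how much the coefficient of `x1` moves across bootstrap resamples under ordinary least squares and under ridge regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)                   # independent feature
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 2 * x2 + x3 + rng.normal(size=n)

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R²) from regressing it on the other columns."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print("VIFs:", np.round(vifs, 1))

# Refit on bootstrap resamples to see how unstable the x1 coefficient is
ols_coefs, ridge_coefs = [], []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    ols_coefs.append(LinearRegression().fit(X[idx], y[idx]).coef_[0])
    ridge_coefs.append(Ridge(alpha=10.0).fit(X[idx], y[idx]).coef_[0])

print("std of x1 coefficient, OLS:  ", np.std(ols_coefs))
print("std of x1 coefficient, Ridge:", np.std(ridge_coefs))
```

A frequently quoted rule of thumb treats VIFs above roughly 5 to 10 as a warning sign. Typical remedies are dropping or combining redundant features, or adding regularization such as ridge, which trades a little bias for much lower coefficient variance.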

# Assignment: Understanding and Applying Linear Regression

### Task 1: Simulate and Visualize

- Use NumPy to simulate a dataset of 100 data points where `y = 3x + noise`.
- Use Matplotlib to plot the data.
- Based on the plot, describe whether a linear model seems appropriate.


In [None]:
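# Simulate 100 points with a true slope of 3 and standard normal noise
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)                    # fix the seed so the results are reproducible
x = np.random.rand(100, 1) * 10      # x uniform on [0, 10); shape (100, 1) as scikit-learn expects
noise = np.random.randn(100, 1)      # Gaussian noise with mean 0 and std 1
y = 3 * x + noise                    # y = 3x + noise

plt.scatter(x, y, alpha=0.7)
plt.title("Simulated Linear Data")
plt.xlabel("x")
plt.ylabel("y")
plt.show()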
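
The visual impression can be backed by two numbers: the Pearson correlation between `x` and `y`, and the closed-form least-squares slope and intercept for a single feature (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)). This sketch regenerates the Task 1 data with the same seed and call order so it runs on its own. With the true model `y = 3x + noise`, the slope should land close to 3 and the intercept close to 0, and both match what `LinearRegression` reports in Task 2, since they solve the same least-squares problem.

```python
import numpy as np

# Recreate the Task 1 data (same seed and same sequence of random calls)
np.random.seed(0)
x = np.random.rand(100, 1) * 10
y = 3 * x + np.random.randn(100, 1)

xf, yf = x.ravel(), y.ravel()

# Pearson correlation: values near ±1 indicate a strong linear association
r = np.corrcoef(xf, yf)[0, 1]

# Closed-form simple linear regression: slope = cov(x, y) / var(x)
slope = np.cov(xf, yf)[0, 1] / np.var(xf, ddof=1)
intercept = yf.mean() - slope * xf.mean()

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A high correlation alone does not prove the relationship is linear (a curved trend can also produce one), so the scatter plot remains the primary check.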

### Task 2: Train a Model and Interpret

- Fit a linear regression model using `sklearn.linear_model.LinearRegression`.
- Report the learned coefficients.
- Plot the regression line along with the data.
- Comment on how well the line fits the data visually and numerically (e.g., using R²).


In [None]:
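# Fit ordinary least squares and compare the learned parameters with the true ones (slope 3, intercept 0)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

model = LinearRegression()
model.fit(x, y)
y_pred = model.predict(x)

plt.scatter(x, y, alpha=0.7, label="data")
plt.plot(x, y_pred, color='red', label="fitted line")  # predictions lie on one line, so x need not be sorted
plt.title("Linear Regression Fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

# y is 2-D, so coef_ has shape (1, 1) and intercept_ has shape (1,)
print("Coefficient:", model.coef_[0, 0])
print("Intercept:", model.intercept_[0])
print("R² Score:", r2_score(y, y_pred))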
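
To make the numerical part of Task 2 concrete, R² can be computed directly from its definition, R² = 1 - SS_res / SS_tot, and checked against `r2_score`. This sketch regenerates the Task 1 data so it runs on its own. The residual standard deviation is added as an assumed-useful second diagnostic: if the model is appropriate, it should be close to the noise level used in the simulation (a standard deviation of 1).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Recreate the Task 1 data and the Task 2 fit
np.random.seed(0)
x = np.random.rand(100, 1) * 10
y = 3 * x + np.random.randn(100, 1)
model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# R² = 1 - SS_res / SS_tot: the share of the variance in y explained by the model
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The spread of the residuals estimates the noise level (std 1 in the simulation)
residual_std = np.std(y - y_pred, ddof=2)   # ddof=2 accounts for the two fitted parameters

print(f"manual R² = {r2_manual:.4f}, sklearn R² = {r2_score(y, y_pred):.4f}")
print(f"residual std = {residual_std:.3f}")
```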

### Task 3: When Linear Regression Fails

- Modify the data so that the relationship is non-linear (e.g., `y = x² + noise`).
- Fit a linear regression model to this new data.
- Plot and interpret the results. What does this tell you about model choice?


In [None]:
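# Same x range, but y now depends on x quadratically
x_nonlin = np.random.rand(100, 1) * 10
y_nonlin = x_nonlin**2 + np.random.randn(100, 1)

# Fit a straight line to curved data
model_nonlin = LinearRegression()
model_nonlin.fit(x_nonlin, y_nonlin)
y_pred_nonlin = model_nonlin.predict(x_nonlin)

plt.scatter(x_nonlin, y_nonlin, alpha=0.7, label="data")
plt.plot(x_nonlin, y_pred_nonlin, color='red', label="linear fit")
plt.title("Fitting Linear Regression on Nonlinear Data")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

# A fairly high R² here does not mean the model is appropriate; look at the plot for systematic misfit
print("R² Score:", r2_score(y_nonlin, y_pred_nonlin))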
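
A straight line can still earn a fairly high R² on this curved data, so the residuals are often the clearer signal: a systematic pattern (positive at both ends of the range, negative in the middle) means the model is missing structure. This sketch regenerates data of the same form using NumPy's `default_rng` (an assumption; the seed and variable names `x_demo`, `y_demo` differ from the notebook's), then compares the straight line with a quadratic model built from scikit-learn's `PolynomialFeatures`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical reproduction of the Task 3 setup (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=(100, 1))
y_demo = x_demo[:, 0] ** 2 + rng.normal(size=100)

# Straight-line fit, as in Task 3
linear = LinearRegression().fit(x_demo, y_demo)
residuals = y_demo - linear.predict(x_demo)

# Systematic residual pattern: positive at both ends, negative in the middle
xs = x_demo[:, 0]
middle = (xs > 3.5) & (xs < 6.5)
ends = (xs < 1.5) | (xs > 8.5)
print("mean residual, middle of range:", residuals[middle].mean())
print("mean residual, ends of range:  ", residuals[ends].mean())

# Adding an x² feature keeps the model linear in its coefficients but lets the fit bend
quadratic = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
quadratic.fit(x_demo, y_demo)

r2_linear = r2_score(y_demo, linear.predict(x_demo))
r2_quad = r2_score(y_demo, quadratic.predict(x_demo))
print(f"linear R² = {r2_linear:.3f}, quadratic R² = {r2_quad:.4f}")
```

The quadratic model is still linear regression, because it is linear in its coefficients; only the input features change. This is one answer to the model-choice question: the right fix is often better features rather than a different fitting method.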