# Module 2: Linear Regression 1 Practice

## Introduction
In this notebook, you'll learn how to implement a simple linear regression model with scikit-learn on a univariate (single-feature) dataset.

## Initial Knowledge Check
1. What is the goal of linear regression?
2. Explain the difference between the slope and intercept in a linear model.
3. What is the meaning of Mean Squared Error (MSE)?

## 1. Load the Data
Load the dataset with pandas and separate the feature column from the target.

In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv('./data/linear_regression_1.csv')
X = df[['feature']]  # double brackets keep X 2-D, as scikit-learn expects
y = df['target']     # 1-D target Series
df.head()
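
If the CSV file is not at hand, a synthetic stand-in can be used to follow along. The sketch below is an assumption for illustration only: the names `df_demo`, `X_demo`, and `y_demo`, the true slope (3.0), the intercept (2.0), and the noise level are all hypothetical and say nothing about the contents of the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course CSV; all parameters are illustrative.
rng = np.random.default_rng(0)
feature = rng.uniform(0, 10, size=100)
target = 2.0 + 3.0 * feature + rng.normal(0, 1.0, size=100)

df_demo = pd.DataFrame({"feature": feature, "target": target})
X_demo = df_demo[["feature"]]  # 2-D: (n_samples, n_features)
y_demo = df_demo["target"]     # 1-D: (n_samples,)

# Quick sanity checks before modelling: shapes and missing values
print(X_demo.shape, y_demo.shape)
print(df_demo.isna().sum())
```

Checking shapes and missing values up front catches the most common causes of errors in `fit`, such as a 1-D feature array or NaN entries.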

## 2. Exploratory Data Analysis
Visualize the relationship between the feature and the target with a scatter plot.


In [None]:
import matplotlib.pyplot as plt

plt.scatter(X, y, edgecolor='k')
plt.title('Scatter Plot: Feature vs Target')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.show()
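
A scatter plot shows the relationship visually; the Pearson correlation coefficient puts a number on how linear it is. The following is a minimal sketch on synthetic data (the arrays `x` and `y` and their generating parameters are assumptions, not the course dataset).

```python
import numpy as np

# Synthetic data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=100)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r: {r:.3f}")
```

A value of r close to +1 or -1 suggests a straight line is a reasonable first model, while a value near 0 suggests little linear relationship.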

## 3. Train a Linear Regression Model
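We'll fit a `LinearRegression` model and evaluate it with MSE and R². Both metrics here are computed on the same data used for fitting, so they describe how well the line fits this data rather than how well it generalizes.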


In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Instantiate and train
lr = LinearRegression()
lr.fit(X, y)

# Predict and evaluate
y_pred = lr.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f"Intercept: {lr.intercept_:.2f}")
print(f"Coefficient: {lr.coef_[0]:.2f}")
print(f"MSE: {mse:.2f}")
print(f"R²: {r2:.2f}")
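
To see what these numbers mean, it can help to compute them by hand. For a single feature, ordinary least squares has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below uses synthetic data (an assumption, not the course dataset) and compares the manual results with scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=100)

# Closed-form ordinary least squares for one feature
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
y_hat = intercept + slope * x

# MSE: average squared residual
mse_manual = np.mean((y - y_hat) ** 2)
# R²: 1 minus (residual sum of squares / total sum of squares)
r2_manual = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Same fit with scikit-learn for comparison
model = LinearRegression().fit(x.reshape(-1, 1), y)
y_sk = model.predict(x.reshape(-1, 1))

print(np.isclose(slope, model.coef_[0]), np.isclose(intercept, model.intercept_))
print(np.isclose(mse_manual, mean_squared_error(y, y_sk)))
print(np.isclose(r2_manual, r2_score(y, y_sk)))
```

MSE is in squared units of the target, so it is hard to compare across datasets, while R² is unitless and reports the fraction of the target's variance that the line explains.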

## 4. Exercise for the Student
**Task:**  
1. Split the data into training and test sets (80/20).  
2. Train the model on the training set and evaluate on both training and test sets (report MSE and R²).  
3. Plot the regression line over the scatter plot for the test set.  
4. **Bonus:** Try adding noise or outliers to the data and observe the effect on the model.


## 5. Solution
Below is one possible solution, including train/test split and plotting.


In [None]:
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train
lr2 = LinearRegression()
lr2.fit(X_train, y_train)

# Predict
y_train_pred = lr2.predict(X_train)
y_test_pred = lr2.predict(X_test)

# Metrics
print(f"Train MSE: {mean_squared_error(y_train, y_train_pred):.2f}")
print(f"Train R²: {r2_score(y_train, y_train_pred):.2f}")
print(f"Test MSE: {mean_squared_error(y_test, y_test_pred):.2f}")
print(f"Test R²: {r2_score(y_test, y_test_pred):.2f}")

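# Plot the fitted line over the test points
plt.scatter(X_test, y_test, label='True', edgecolor='k')
# A contrasting color keeps the line distinguishable from the points
plt.plot(X_test, y_test_pred, color='tab:red', label='Predicted', linewidth=2)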
plt.title('Test Set: True vs Predicted')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.show()
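
For the bonus task, one possible approach is to append a few extreme points and refit. The sketch below is only an illustration on synthetic data: the generating parameters and the five injected outliers (large feature values with far too small targets) are assumptions chosen to make the effect easy to see.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only (true slope 3.0, intercept 2.0)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=100)

clean = LinearRegression().fit(x.reshape(-1, 1), y)

# Inject five hypothetical outliers at the high end of the feature range
x_out = np.append(x, [9.0, 9.2, 9.5, 9.7, 10.0])
y_out = np.append(y, [-40.0] * 5)
contaminated = LinearRegression().fit(x_out.reshape(-1, 1), y_out)

print(f"Slope without outliers: {clean.coef_[0]:.2f}")
print(f"Slope with outliers:    {contaminated.coef_[0]:.2f}")
```

Because least squares penalizes squared residuals, a handful of points far from the trend, especially at the edges of the feature range, can pull the fitted line noticeably away from the bulk of the data.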

---
### Next Steps
- Experiment with polynomial features to capture non-linear relationships.
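- Review the assumptions of linear regression (linearity, independent errors, constant error variance, normally distributed residuals) and the diagnostics that check them, such as residual plots.
- Prepare for Linear Regression 2 by considering datasets with multiple features.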
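
As a starting point for the first item, here is a minimal sketch of polynomial features with scikit-learn's `PolynomialFeatures` in a pipeline. The quadratic synthetic data is an assumption chosen so that a straight line fits poorly; it is not the course dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200).reshape(-1, 1)
y = 1.0 + 0.5 * x[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

# Plain straight-line fit
linear = LinearRegression().fit(x, y)

# Same linear model, but fitted on the expanded features [x, x^2]
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(x, y)

r2_linear = r2_score(y, linear.predict(x))
r2_quadratic = r2_score(y, quadratic.predict(x))
print(f"R² linear:    {r2_linear:.2f}")
print(f"R² quadratic: {r2_quadratic:.2f}")
```

The model is still linear in its coefficients; only the inputs are transformed, which is why `LinearRegression` can be reused unchanged.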
