# Feature Engineering

This notebook is a **companion to `04_feature_engineering.md`**.

Purpose:
- Demonstrate a common feature transformation (log-transforming the input)
- Show its impact on model performance (test MSE of a linear regression)
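
Common feature transformations include log, log1p, square root, standardization, and polynomial expansion. The following is a minimal sketch of each in NumPy and scikit-learn. It uses a hypothetical toy column `x`, not the notebook's dataset:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical toy feature column (one feature, four rows)
x = np.array([[1.0], [4.0], [9.0], [16.0]])

log_x = np.log(x)        # compresses large values; requires x > 0
log1p_x = np.log1p(x)    # log(1 + x); also defined at x = 0
sqrt_x = np.sqrt(x)      # milder compression than log; requires x >= 0

# Standardization: rescale each column to zero mean and unit variance
scaled_x = StandardScaler().fit_transform(x)

# Polynomial expansion: columns [1, x, x^2] (bias column included by default)
poly_x = PolynomialFeatures(degree=2).fit_transform(x)

print(sqrt_x.ravel())
print(poly_x[1])
```

Each transform changes the shape of the relationship a linear model sees, which is why the choice matters for model performance.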

---

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(42)

## Synthetic Dataset

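We create a dataset in which the target is the natural logarithm of the input plus Gaussian noise with standard deviation 0.1. A linear model fit on `log(x)` can represent this relationship exactly, while one fit on the raw `x` cannot, so the log transform should lower the test error.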
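
Because the data-generating process is known, we can work out the best achievable error in advance. The following sketch assumes independent noise with mean zero:

$$
y = \ln x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0,\ 0.1^2), \qquad x \in [1, 10]
$$

For any predictor $f$, the noise $\varepsilon$ has mean zero and is independent of $x$, so

$$
\mathbb{E}\big[(y - f(x))^2\big] = \sigma^2 + \mathbb{E}\big[(\ln x - f(x))^2\big], \qquad \sigma^2 = 0.1^2 = 0.01 .
$$

A linear model on $z = \ln x$ can represent $f(x) = 1 \cdot z + 0$ exactly, so its expected test MSE can approach the noise floor $\sigma^2 = 0.01$. A straight line in $x$ cannot match the curvature of $\ln x$, so the second term stays positive for the baseline.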

---

In [None]:
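# 200 evenly spaced inputs on [1, 10]; the target is log(x) plus Gaussian noise (std 0.1)
X = np.linspace(1, 10, 200)
y = np.log(X) + np.random.normal(0, 0.1, size=len(X))

df = pd.DataFrame({'x': X, 'y': y})

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    df[['x']], df['y'], test_size=0.3, random_state=42
)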

## Baseline Model (No Feature Engineering)

---

In [None]:
# Fit y ≈ w * x + b directly on the raw feature
baseline = LinearRegression()
baseline.fit(X_train, y_train)

# Mean squared error on the held-out rows
baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
baseline_mse
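
The baseline underfits because $\ln x$ is concave. The best straight line therefore sits below the curve in the middle of $[1, 10]$ and above it at both ends. The following minimal sketch checks this by averaging the baseline's residuals over slices of the sorted inputs. It regenerates the data with the same recipe, with two assumptions. It uses `np.random.default_rng(0)` instead of the notebook's global seed. It also fits on all 200 points. As a result, the exact numbers will differ from the baseline cell:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same recipe as the notebook; assumed seed, fitted on all points for simplicity
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200).reshape(-1, 1)
y = np.log(x).ravel() + rng.normal(0, 0.1, size=200)

model = LinearRegression().fit(x, y)
residuals = y - model.predict(x)

# Inputs are sorted by x, so these slices pick out the low end, middle, and high end
low_end = residuals[:10].mean()     # x in roughly [1.0, 1.4]
middle = residuals[90:110].mean()   # x around 5.5
high_end = residuals[-10:].mean()   # x in roughly [9.6, 10.0]

print(f"low: {low_end:.3f}, middle: {middle:.3f}, high: {high_end:.3f}")
```

Negative mean residuals at both ends with positive ones in the middle are the signature of a missing nonlinear term. The log feature supplies that term.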

## Log-Transformed Feature

---

In [None]:
# Replace x with log(x), applying the same transform to train and test inputs
X_train_fe = np.log(X_train)
X_test_fe = np.log(X_test)

fe_model = LinearRegression()
fe_model.fit(X_train_fe, y_train)

fe_mse = mean_squared_error(y_test, fe_model.predict(X_test_fe))
fe_mse
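
The log-feature cell transforms the train and test inputs by hand. The following minimal sketch expresses the same idea with scikit-learn's `FunctionTransformer` inside a pipeline. This keeps the transform attached to the model, so it cannot be skipped at prediction time. The sketch regenerates the data with `np.random.default_rng(42)`, an assumption that means the draws differ from the notebook's. It also inspects the fitted coefficients, which should land near the true values (slope 1, intercept 0):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Same data recipe as the notebook, with an assumed seed
rng = np.random.default_rng(42)
X = np.linspace(1, 10, 200).reshape(-1, 1)
y = np.log(X).ravel() + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The log transform is a pipeline step, so fit() and predict() both apply it
log_model = make_pipeline(FunctionTransformer(np.log), LinearRegression())
log_model.fit(X_train, y_train)

# make_pipeline names each step after its class, lowercased
lin = log_model.named_steps["linearregression"]
print("slope:", lin.coef_[0], "intercept:", lin.intercept_)
print("test MSE:", mean_squared_error(y_test, log_model.predict(X_test)))
```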

## Comparison

---

In [None]:
pd.DataFrame({
    'Approach': ['Baseline', 'Log Feature'],
    'Test MSE': [baseline_mse, fe_mse]
})
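
A single 70/30 split gives only one noisy estimate of each model's error. The following sketch uses 5-fold cross-validation instead, which averages the test MSE over several splits. It assumes the same data recipe with `np.random.default_rng(42)`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Same data recipe as the notebook, with an assumed seed
rng = np.random.default_rng(42)
X = np.linspace(1, 10, 200).reshape(-1, 1)
y = np.log(X).ravel() + rng.normal(0, 0.1, size=200)

# Shuffle before splitting: X is sorted, so unshuffled folds would test extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "Baseline": LinearRegression(),
    "Log Feature": make_pipeline(FunctionTransformer(np.log), LinearRegression()),
}

# scikit-learn scorers are "higher is better", so MSE is returned negated
cv_mse = {
    name: -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error").mean()
    for name, model in models.items()
}
print(cv_mse)
```

With this data-generating process, the log-feature model's cross-validated MSE should sit near the noise variance of 0.01. The baseline's should stay noticeably higher.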