# 📅 Day 7: Data Splitting and Feature Engineering

## 🎯 Objective
Learn how to split a dataset into training and test sets and apply essential feature engineering techniques before modeling.

## 🔧 What is Feature Engineering?
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to a predictive model. Done well, it often improves model accuracy.

### 🛠 Key Concepts
- Train-Test Split
- One-Hot Encoding
- Feature Scaling (Standardization)
- Polynomial Features
- Feature Selection
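
One-hot encoding is listed above but is not needed for California Housing, whose features are all numeric. As a minimal sketch of the idea, the toy DataFrame below (its column names and values are made up for illustration, not taken from the dataset) is encoded with `pd.get_dummies`, which replaces a categorical column with one 0/1 indicator column per category.

```python
import pandas as pd

# Hypothetical toy data: these columns and values are invented for illustration.
toy = pd.DataFrame({
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "NEAR OCEAN"],
    "rooms": [5, 3, 4, 6],
})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(toy, columns=["ocean_proximity"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

In a train/test workflow, scikit-learn's `OneHotEncoder` fitted on the training set is usually the safer choice, because it remembers the categories it saw and can be told how to handle unseen ones in the test set.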

## 📘 Dataset: California Housing Dataset
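We will use this dataset for a regression task. It has 20,640 rows and 8 numeric features describing California census block groups, and the target is the median house value in units of $100,000.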

In [None]:
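from sklearn.datasets import fetch_california_housing
import pandas as pd

# Load the dataset (downloaded and cached on first use)
data = fetch_california_housing()

# Build a DataFrame of the features and append the target column
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
df.head()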

## ✂️ Step 1: Splitting the Data

In [None]:
from sklearn.model_selection import train_test_split

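# Separate the features (X) from the target (y)
X = df.drop('target', axis=1)
y = df['target']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)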
print("Training set:", X_train.shape)
print("Testing set:", X_test.shape)
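
To see what `test_size` and `random_state` do in isolation, here is a small sketch on a made-up 10-row array (the `X_demo` and `y_demo` names are hypothetical and exist only for this illustration). It checks that 20% of the rows land in the test set and that repeating the call with the same seed gives the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Two splits with the same seed
a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

print(a_train.shape, a_test.shape)      # 8 training rows, 2 test rows
print(np.array_equal(a_test, b_test))   # same seed, same rows
```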

## 🔄 Step 2: Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler

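scaler = StandardScaler()
# Fit on the training set only, so no test-set statistics leak into preprocessing
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training mean and standard deviation to transform the test set
X_test_scaled = scaler.transform(X_test)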
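
Standardization rescales each feature to $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are that feature's mean and standard deviation on the training set. The sketch below, using small made-up arrays, reproduces `StandardScaler` by hand to make that explicit; the `train`, `test`, and `scaler_demo` names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: 3 training rows and 1 test row, 2 features each
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
test = np.array([[4.0, 30.0]])

scaler_demo = StandardScaler().fit(train)

# Manual z-score using training statistics only
mu = train.mean(axis=0)
sigma = train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses
manual_test = (test - mu) / sigma

# The manual result should match the scaler's output
print(np.allclose(manual_test, scaler_demo.transform(test)))
```

Note that the scaled test set will generally not have mean 0 and standard deviation 1, because it is scaled with the training set's statistics. That is expected.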

## ➕ Step 3: Polynomial Features

In [None]:
from sklearn.preprocessing import PolynomialFeatures

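# degree=2 adds squares and pairwise products; include_bias=False omits the constant column
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train_scaled)
X_test_poly = poly.transform(X_test_scaled)

# 8 input features expand to 44: 8 linear + 8 squared + 28 pairwise products
print("Original shape:", X_train.shape)
print("Transformed shape:", X_train_poly.shape)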

## 📉 Step 4: Feature Selection (Optional Bonus)

In [None]:
from sklearn.feature_selection import SelectKBest, f_regression

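# Keep the 5 features with the highest univariate F-scores against the target
selector = SelectKBest(score_func=f_regression, k=5)
# Scores are computed from the training data only
X_train_selected = selector.fit_transform(X_train_scaled, y_train)
X_test_selected = selector.transform(X_test_scaled)

# get_support() returns a boolean mask over the original columns
selected_features = X.columns[selector.get_support()]
print("Top selected features:", selected_features.tolist())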
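
One way to keep these steps together is a scikit-learn `Pipeline`, which fits every transformer on the training data only and replays the same transformations on the test data. The sketch below is an illustration under stated assumptions, not part of the notebook's workflow: it uses synthetic data from `make_regression` so it runs without downloading anything, it adds a `LinearRegression` model that the steps above do not train, and it applies feature selection after the polynomial expansion rather than to the scaled features directly.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the housing data: 200 rows, 8 features
X_syn, y_syn = make_regression(
    n_samples=200, n_features=8, n_informative=5, noise=10.0, random_state=0
)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

# Scale -> expand -> select -> fit, all learned from the training split only
pipe = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    SelectKBest(score_func=f_regression, k=5),
    LinearRegression(),
)
pipe.fit(Xs_train, ys_train)

# R^2 on the held-out split
r2 = pipe.score(Xs_test, ys_test)
print(f"Test R^2: {r2:.3f}")
```

Wrapping the steps this way also makes it harder to accidentally call `fit` on the test set.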

## ✅ Summary
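- You’ve learned how to split data, scale features, generate polynomial features, and select the top features.
- Each transformer was fit on the training set only and then applied to the test set, which keeps test data out of preprocessing.
- These are essential preprocessing steps before training your ML models.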