# Notebook 07: Custom Feature Transformations
📁 File name: 07_custom_transformations.ipynb

This notebook teaches how to apply custom logic to your data using:

- `FunctionTransformer`: for quick functions
- Custom classes: for reusable, modular transformations

It's ideal for when built-in transformers don't meet your specific needs.

📒 Notebook Sections
1. Title & Intro
2. When Custom Transformations Are Needed
3. FunctionTransformer (Lambda-style)
4. Custom Transformer (Class-based)
5. Use in sklearn Pipeline
6. Summary & What's Next

## 1. Title & Introduction (Markdown)
### 07: Custom Feature Transformations

In this notebook, we'll learn how to build **custom transformers** that apply your own logic during preprocessing.

We'll use:

- `FunctionTransformer` for simple functions  
- Custom Python classes for more flexible logic  
- Integration with `Pipeline` to keep things modular

## 2. When Do You Need Custom Transforms? (Markdown)
### Why Use Custom Transformations?

Built-in tools like `StandardScaler` or `PolynomialFeatures` are great, but sometimes you need:

- Domain-specific rules (e.g. log1p of skewed features)
- External feature mapping (e.g. map zip codes to regions)
- Combined transformations across multiple columns

That's when custom transformers shine.
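As a rough illustration of the "external feature mapping" case, the sketch below wraps a plain function in `FunctionTransformer` and passes a lookup table through `kw_args`. The `ZipCode` column, the `add_region` helper, and the `ZIP_TO_REGION` table are all hypothetical names made up for this example; they are not part of the sample dataset.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Hypothetical lookup table; a real project would load this from a reference file
ZIP_TO_REGION = {"10001": "Northeast", "94105": "West", "60601": "Midwest"}

def add_region(X, mapping, default="Unknown"):
    """Return a copy of X with a Region column derived from ZipCode."""
    X = X.copy()
    X["Region"] = X["ZipCode"].map(mapping).fillna(default)
    return X

# kw_args forwards extra keyword arguments to the wrapped function
region_mapper = FunctionTransformer(add_region, kw_args={"mapping": ZIP_TO_REGION})

# Toy data (illustrative values only); "99999" is deliberately missing from the table
toy_zips = pd.DataFrame({"ZipCode": ["10001", "94105", "99999"]})
with_region = region_mapper.fit_transform(toy_zips)
print(with_region)
```

Unmapped zip codes fall back to `"Unknown"` here; whether that is the right default depends on the downstream model.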


## 3. FunctionTransformer: Quick Example

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Load dataset
df = pd.read_csv("../data/sample_data.csv")

# Apply log1p, i.e. log(1 + x), to the skewed numeric column.
# validate=True checks that the input is a 2D numeric array and returns a NumPy array.
log_transformer = FunctionTransformer(np.log1p, validate=True)

df["Income_log"] = log_transformer.fit_transform(df[["Income"]])
df[["Income", "Income_log"]].head()
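If you later need values back on the original scale (for example, to report predictions in currency units), `FunctionTransformer` also accepts an `inverse_func`. The following is a minimal sketch on made-up numbers, pairing `np.log1p` with its inverse `np.expm1`:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Pair the forward function with its mathematical inverse
log_tf = FunctionTransformer(np.log1p, inverse_func=np.expm1, validate=True)

# Toy income values (illustrative only), shaped as a single-column 2D array
incomes = np.array([[0.0], [1_000.0], [250_000.0]])

logged = log_tf.fit_transform(incomes)
restored = log_tf.inverse_transform(logged)

# The round trip should recover the inputs up to floating-point error
print(np.allclose(restored, incomes))
```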

## 4. Build a Custom Transformer Class

In [None]:
from sklearn.base import BaseEstimator, TransformerMixin

class ColumnDifference(BaseEstimator, TransformerMixin):
    """
    Custom transformer to compute the difference between two columns
    """
    def __init__(self, col1, col2, new_col_name="diff"):
        self.col1 = col1
        self.col2 = col2
        self.new_col_name = new_col_name

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_copy = X.copy()
        X_copy[self.new_col_name] = X_copy[self.col1] - X_copy[self.col2]
        return X_copy

In [None]:
# Example: Income - Expenses → Net Income
custom_diff = ColumnDifference(col1="Income", col2="Expenses", new_col_name="NetIncome")
df_transformed = custom_diff.fit_transform(df)

df_transformed[["Income", "Expenses", "NetIncome"]].head()
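The two base classes are what make this small class behave like a built-in transformer: `BaseEstimator` supplies `get_params` / `set_params` (which `clone` relies on), and `TransformerMixin` supplies `fit_transform`. The sketch below shows this on a toy DataFrame with made-up values; the class is repeated only so the cell runs on its own.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone

class ColumnDifference(BaseEstimator, TransformerMixin):
    """Same transformer as defined earlier, repeated so this cell runs on its own."""
    def __init__(self, col1, col2, new_col_name="diff"):
        self.col1 = col1
        self.col2 = col2
        self.new_col_name = new_col_name

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        X_copy = X.copy()
        X_copy[self.new_col_name] = X_copy[self.col1] - X_copy[self.col2]
        return X_copy

# Toy data (illustrative values only)
toy = pd.DataFrame({"Income": [5000, 4200], "Expenses": [3000, 4500]})

diff = ColumnDifference(col1="Income", col2="Expenses", new_col_name="NetIncome")
params = diff.get_params()     # provided by BaseEstimator
net = diff.fit_transform(toy)  # provided by TransformerMixin

# clone() builds an unfitted copy from the constructor parameters
gap = clone(diff).set_params(new_col_name="Gap").fit_transform(toy)

print(params)
print(net)
print(gap.columns.tolist())
```

This works because `__init__` stores each argument unchanged under an attribute of the same name, which is the convention scikit-learn expects.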

## 5. Use in Pipeline

In [None]:
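from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# FunctionTransformer applies its function to every column it receives, so wrap it
# in a ColumnTransformer to restrict log1p to Income and pass the rest through.
log_income = ColumnTransformer(
    [("log", FunctionTransformer(np.log1p, feature_names_out="one-to-one"), ["Income"])],
    remainder="passthrough",
    verbose_feature_names_out=False,
).set_output(transform="pandas")  # DataFrame output; requires scikit-learn >= 1.2

# Combine custom difference + log transform.
# NetIncome is computed first so it uses the raw, untransformed values.
pipeline = Pipeline([
    ("net_income", ColumnDifference(col1="Income", col2="Expenses", new_col_name="NetIncome")),
    ("log_income", log_income),
])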

## 6. Summary & What's Next (Markdown)
### Summary

In this notebook, we:

- Used `FunctionTransformer` for quick transformations like `log1p`
- Built a custom class to compute column differences
- Prepared these for integration into pipelines

**Next Up**: `08_dimensionality_reduction.ipynb`  
We'll explore how to reduce the feature count using **PCA** and **t-SNE**.
