# `Pipeline` Example

This notebook demonstrates how to chain multiple transformers together using the `Pipeline` class to create a complete data preprocessing workflow.

In [None]:
import sys
sys.path.insert(0, '..')

import pandas as pd
import numpy as np

from transfory import Pipeline, MissingValueHandler, Encoder, Scaler

## 1. Create Sample Data

Let's create a DataFrame with missing values and mixed data types to simulate a real-world dataset.

In [2]:
df = pd.DataFrame({
    "Age": [22, 35, np.nan, 19, 40],
    "Income": [40000, 52000, 60000, np.nan, 48000],
    "Gender": ["Male", "Female", "Female", np.nan, "Male"]
})

print("Original DataFrame:")
df

Original DataFrame:


|   | Age  | Income  | Gender |
|---|------|---------|--------|
| 0 | 22.0 | 40000.0 | Male   |
| 1 | 35.0 | 52000.0 | Female |
| 2 | NaN  | 60000.0 | Female |
| 3 | 19.0 | NaN     | NaN    |
| 4 | 40.0 | 48000.0 | Male   |
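
Before building the pipeline, it can help to confirm where the gaps are. The following is a minimal sketch using only pandas and NumPy (it does not use `transfory`); `missing_per_column` is an illustrative name, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same sample data as in the cell above.
df = pd.DataFrame({
    "Age": [22, 35, np.nan, 19, 40],
    "Income": [40000, 52000, 60000, np.nan, 48000],
    "Gender": ["Male", "Female", "Female", np.nan, "Male"],
})

# Count the missing entries in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# The dtypes show which columns are numeric and which are not.
print(df.dtypes)
```

Each column has exactly one missing entry, and row 3 is missing both `Income` and `Gender`.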


## 2. Define and Run the Pipeline

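We'll create a `Pipeline` with three steps: a `MissingValueHandler` that imputes missing values with `strategy="mean"`, an `Encoder` that one-hot encodes the `Gender` column, and a `Scaler` that applies z-score scaling. As the output shows, the scaler also standardizes the one-hot columns, and row 3 ends up with neither `Gender` indicator set, so the missing `Gender` value does not appear to have been filled with a mode.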

In [3]:
pipeline = Pipeline([
    ("imputer", MissingValueHandler(strategy="mean")),  # fill numeric gaps with the column mean
    ("encoder", Encoder(method="onehot")),
    ("scaler", Scaler(method="zscore"))
])

processed_df = pipeline.fit_transform(df)

print("\nProcessed DataFrame:")
processed_df


Processed DataFrame:


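|   | Age       | Income    | Gender_Male | Gender_Female |
|---|-----------|-----------|-------------|---------------|
| 0 | -0.894792 | -1.550434 | 1.224745    | -0.816497     |
| 1 | 0.766965  | 0.310087  | -0.816497   | 1.224745      |
| 2 | 0.0       | 1.550434  | -0.816497   | 1.224745      |
| 3 | -1.278275 | 0.0       | -0.816497   | -0.816497     |
| 4 | 1.406102  | -0.310087 | 1.224745    | -0.816497     |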
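
To make the three steps concrete, here is a sketch that reproduces the numbers above with plain pandas rather than `transfory`. It rests on assumptions inferred from the output, not from the library's source: numeric gaps are filled with the column mean, the missing `Gender` entry gets no indicator, and the z-score uses the population standard deviation (`ddof=0`). The names `numeric_cols`, `imputed`, `encoded`, and `manual` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [22, 35, np.nan, 19, 40],
    "Income": [40000, 52000, 60000, np.nan, 48000],
    "Gender": ["Male", "Female", "Female", np.nan, "Male"],
})
numeric_cols = ["Age", "Income"]

# Step 1 (assumed behavior): fill numeric gaps with each column's mean.
imputed = df.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].mean())

# Step 2: one-hot encode Gender. A missing value yields zeros in both
# indicator columns, matching row 3 of the processed output.
encoded = pd.get_dummies(imputed, columns=["Gender"], dtype=float)

# Step 3 (assumed behavior): z-score every column using the population
# standard deviation (ddof=0).
manual = (encoded - encoded.mean()) / encoded.std(ddof=0)

print(manual.round(6))
```

`pd.get_dummies` orders the indicator columns alphabetically, so `Gender_Female` comes before `Gender_Male` here; the values match the pipeline output column for column.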
