# Demo 1: Basic Sequential Pipeline

This demo uses seaborn's 'titanic' dataset to build a standard, sequential cleaning pipeline. It introduces the two core concepts of transfory: `Pipeline`, which chains transformation steps, and `InsightReporter`, which records what each step did.

In [1]:
import sys
import os
# In a Jupyter notebook, __file__ is not defined, so we add the project root via a relative path.
# This assumes the kernel runs from the 'demo' folder and that 'transfory' sits in its parent directory.
project_root = os.path.abspath('..')
if project_root not in sys.path:
    sys.path.insert(0, project_root)


import pandas as pd
import seaborn as sns

from transfory import Pipeline
from transfory import MissingValueHandler
from transfory import Encoder
from transfory import Scaler
from transfory import InsightReporter
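`os.path.abspath('..')` resolves against the current working directory, not against the notebook file, so the setup cell above works only when the kernel starts in 'demo' (Jupyter usually starts it in the notebook's own folder). A `pathlib` version of the same idea, sketched below, makes that assumption explicit; `add_project_root` is a hypothetical helper, not part of transfory.

```python
import sys
from pathlib import Path


def add_project_root(notebook_dir: Path) -> Path:
    """Prepend the parent of notebook_dir to sys.path (assumes the notebook lives in demo/)."""
    root = notebook_dir.resolve().parent
    if str(root) not in sys.path:
        # Insert at the front so the local 'transfory' wins over any installed copy
        sys.path.insert(0, str(root))
    return root


# Path.cwd() is the kernel's working directory, assumed here to be demo/
root = add_project_root(Path.cwd())
print(root)
```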

### 1. Load Data

First, we load the 'titanic' dataset and keep five columns: 'age', 'fare', 'embarked', 'sex', and 'pclass'. We also create the `InsightReporter`, which will capture every transformation event the pipeline emits.

In [2]:
df = sns.load_dataset('titanic')
reporter = InsightReporter()

# We only need a subset of columns for this demo
df_subset = df[['age', 'fare', 'embarked', 'sex', 'pclass']].copy()

print("Original Data (first 5 rows):")
display(df_subset.head())
print(f"\nOriginal data has missing values: {df_subset.isnull().values.any()}")

Original Data (first 5 rows):


|   | age | fare | embarked | sex | pclass |
|---|---|---|---|---|---|
| 0 | 22.0 | 7.25 | S | male | 3 |
| 1 | 38.0 | 71.2833 | C | female | 1 |
| 2 | 26.0 | 7.925 | S | female | 3 |
| 3 | 35.0 | 53.1 | S | female | 1 |
| 4 | 35.0 | 8.05 | S | male | 3 |



Original data has missing values: True
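`isnull().values.any()` only reports that *some* value is missing. A per-column count is more informative; on seaborn's copy of titanic it shows gaps in both 'age' and 'embarked'. The sketch below uses a small hypothetical stand-in frame (`toy`, not real titanic rows) so it runs without downloading the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows shaped like df_subset, with deliberate gaps
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "embarked": ["S", "C", None, "S", "S"],
    "sex": ["male", "female", "female", "female", "male"],
    "pclass": [3, 1, 3, 1, 3],
})

# Number of missing entries in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# The single yes/no check used in the cell above
print(toy.isnull().values.any())
```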


### 2. Define and Run the Pipeline

Here, we define a `Pipeline` with three sequential steps:
1.  `MissingValueHandler`: Fills missing 'age' values with the column mean.
2.  `Encoder`: One-hot encodes the 'sex' and 'embarked' columns.
3.  `Scaler`: Applies z-score scaling to every numeric column in the result, including the new one-hot columns.
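For intuition, the three steps can be roughly approximated in plain pandas. This is an assumption-based sketch, not transfory's implementation: `manual_pipeline` is a hypothetical helper, the population standard deviation (`ddof=0`) is chosen because it appears consistent with the scaled values shown in the output below, and `pd.get_dummies` orders the new columns alphabetically, so column order will differ from the `Encoder` output.

```python
import numpy as np
import pandas as pd


def manual_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical plain-pandas approximation of the imputer -> encoder -> scaler steps."""
    out = df.copy()
    # 1. Mean imputation, numeric columns only ('embarked' and 'sex' are left untouched)
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    # 2. One-hot encoding; a row with a missing category gets 0 in every dummy column
    out = pd.get_dummies(out, columns=["embarked", "sex"], dtype=float)
    # 3. Z-score every column (all numeric now), using the population std (ddof=0)
    return (out - out.mean()) / out.std(ddof=0)


# Hypothetical toy rows standing in for df_subset
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "embarked": ["S", "C", None, "S", "S"],
    "sex": ["male", "female", "female", "female", "male"],
    "pclass": [3, 1, 3, 1, 3],
})
result = manual_pipeline(toy)
print(result.round(3))
```

Each step consumes the previous step's output, which is exactly what "sequential" means for the `Pipeline` above.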

In [3]:
pipeline = Pipeline(
    steps=[
        ("imputer", MissingValueHandler(strategy="mean")), # Fills missing 'age'
        ("encoder", Encoder(method="onehot")),           # Encodes 'sex' and 'embarked'
        ("scaler", Scaler(method="zscore"))              # Scales all numeric columns, one-hot columns included
    ],
    logging_callback=reporter.get_callback()
)

# Fit and transform the data
transformed_df = pipeline.fit_transform(df_subset)

print("Transformed Data (first 5 rows):")
display(transformed_df.head())
print(f"\nTransformed data has missing values: {transformed_df.isnull().values.any()}")

Transformed Data (first 5 rows):


|   | age | fare | pclass | embarked_S | embarked_C | embarked_Q | sex_male | sex_female |
|---|---|---|---|---|---|---|---|---|
| 0 | -0.592481 | -0.502445 | 0.827377 | 0.619306 | -0.482043 | -0.307562 | 0.737695 | -0.737695 |
| 1 | 0.638789 | 0.786845 | -1.566107 | -1.61471 | 2.074505 | -0.307562 | -1.355574 | 1.355574 |
| 2 | -0.284663 | -0.488854 | 0.827377 | 0.619306 | -0.482043 | -0.307562 | -1.355574 | 1.355574 |
| 3 | 0.407926 | 0.42073 | -1.566107 | 0.619306 | -0.482043 | -0.307562 | -1.355574 | 1.355574 |
| 4 | 0.407926 | -0.486337 | 0.827377 | 0.619306 | -0.482043 | -0.307562 | 0.737695 | -0.737695 |



Transformed data has missing values: False
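The one-hot columns in this output are no longer 0/1 because the `Scaler` standardized them too. For a 0/1 column whose share of ones is p, the population z-score of a one is (1 - p) / sqrt(p(1 - p)) and that of a zero is -p / sqrt(p(1 - p)). The sketch below checks that arithmetic against the table, assuming that 644 of the 891 passengers embarked at 'S' and 577 are male; `indicator_zscores` is a hypothetical helper.

```python
import math


def indicator_zscores(ones: int, n: int) -> tuple[float, float]:
    """Population z-scores of the 1 and 0 entries in a 0/1 column with `ones` ones out of n rows."""
    p = ones / n
    std = math.sqrt(p * (1 - p))  # population std of a 0/1 column
    return (1 - p) / std, -p / std


embarked_s = indicator_zscores(644, 891)  # compare with the embarked_S column
sex_male = indicator_zscores(577, 891)    # compare with the sex_male column
print([round(v, 6) for v in embarked_s])
print([round(v, 6) for v in sex_male])
```

Whether indicator columns should be standardized at all is a modeling choice; this demo scales every numeric column.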


### 3. Review the Insight Report

Finally, we print the `InsightReporter` summary. It is a human-readable, timestamped log of every event the pipeline emitted: what each step learned while fitting (for example, the mean used to impute 'age') and what it changed when transforming the data.
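The `logging_callback=reporter.get_callback()` argument suggests a simple observer pattern: the pipeline calls a function after each event, and the reporter stores what it receives. The sketch below illustrates that pattern with hypothetical `MiniPipeline` and `MiniReporter` classes; it is not transfory's implementation, and the event fields (`step`, `message`, `time`) are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable

Event = dict  # hypothetical event shape: {"step": ..., "message": ...}


@dataclass
class MiniReporter:
    events: list = field(default_factory=list)

    def get_callback(self) -> Callable[[Event], None]:
        # The returned closure timestamps each event and stores it on the reporter
        def callback(event: Event) -> None:
            self.events.append({**event, "time": datetime.now()})
        return callback

    def summary(self) -> str:
        lines = [f"Total steps logged: {len(self.events)}"]
        lines += [f"[{e['step']}] {e['message']}" for e in self.events]
        return "\n".join(lines)


class MiniPipeline:
    def __init__(self, steps, logging_callback=None):
        self.steps = steps
        self.log = logging_callback or (lambda event: None)

    def fit_transform(self, data):
        for name, func in self.steps:
            data = func(data)  # each step's output feeds the next step
            self.log({"step": name, "message": "completed a 'fit_transform_step' event"})
        return data


reporter = MiniReporter()
pipe = MiniPipeline(
    steps=[("double", lambda xs: [2 * x for x in xs]), ("shift", lambda xs: [x + 1 for x in xs])],
    logging_callback=reporter.get_callback(),
)
result = pipe.fit_transform([1, 2, 3])
print(result)
print(reporter.summary())
```

Passing a callback keeps the pipeline unaware of how events are stored, so a different reporter can be swapped in without touching the steps.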

In [4]:
print(reporter.summary())

=== Transfory Insight Report ===
Session started: 2025-12-09 22:04:20
Total steps logged: 13

[2025-12-09 22:04:20] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-09 22:04:20] [imputer] Step 'MissingValueHandler' (MissingValueHandler) learned imputation values using 'mean' for 1 column(s). Values: age: 29.70.
[2025-12-09 22:04:20] [imputer] Step 'MissingValueHandler' (MissingValueHandler) applied imputation to the data.
[2025-12-09 22:04:20] Step 'Pipeline' completed a 'fit_transform_done' event.
[2025-12-09 22:04:20] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-09 22:04:20] [encoder] Step 'Encoder' (Encoder) fitted for 'onehot' encoding on 2 column(s). This will create 5 new columns.
[2025-12-09 22:04:20] [encoder] Step 'Encoder' (Encoder) applied 'onehot' encoding, creating 5 new columns and removing originals.
[2025-12-09 22:04:20] [encoder] Step 'Encoder' completed a 'transform' event.
[2025-12-09 22:04:20] Step 'Pipeline' completed a 'fit_tran