# Transfory Interactive Test Notebook

Welcome! This notebook allows you to test the `Transfory` library with your own data.

**Instructions:**
1.  **Install Dependencies**: Make sure you have run `pip install -e .` in your terminal from the project root.
2.  **Define Your Data**: Go to the "Step 1" cell and create or load your pandas DataFrame.
3.  **Configure Pipeline**: In "Step 2", you can customize the pipeline by adding, removing, or reordering transformers.
4.  **Run All Cells**: Run the entire notebook to see the transformed data and the insight report.

In [1]:
import pandas as pd
import numpy as np

# Import the Transfory components used in this notebook
from transfory.pipeline import Pipeline
from transfory.missing import MissingValueHandler
from transfory.encoder import Encoder
from transfory.featuregen import FeatureGenerator
from transfory.scaler import Scaler
from transfory.insight import InsightReporter

print("✅ Transfory components imported successfully!")

✅ Transfory components imported successfully!


## Step 1: Define Your DataFrame

Create your pandas DataFrame in the cell below. A sample messy DataFrame is provided for you to get started.

In [2]:
# === YOUR DATAFRAME GOES HERE ===
raw_df = pd.DataFrame({
    "age": [20, 25, 30, np.nan, 22],
    "income": [50000, 60000, np.nan, 55000, 52000],
    "city": ["Manila", "Cebu", "Manila", "Davao", None],
    "gender": ["M", "F", "F", None, "M"],
    "registration_date": pd.to_datetime(["2022-01-01", "2022-02-15", "2022-01-20", "2022-03-10", "2022-02-05"])
})

print("Original DataFrame:")
raw_df

Original DataFrame:


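|   | age | income | city | gender | registration_date |
|---|---|---|---|---|---|
| 0 | 20.0 | 50000.0 | Manila | M | 2022-01-01 |
| 1 | 25.0 | 60000.0 | Cebu | F | 2022-02-15 |
| 2 | 30.0 | NaN | Manila | F | 2022-01-20 |
| 3 | NaN | 55000.0 | Davao | None | 2022-03-10 |
| 4 | 22.0 | 52000.0 | None | M | 2022-02-05 |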
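
Before configuring the pipeline, it can help to check where values are missing and which columns are numeric. The sketch below uses only pandas on a copy of the sample data; `sample_df`, `missing_counts`, and `numeric_cols` are names introduced here for illustration and are not part of Transfory.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data so this cell runs on its own
sample_df = pd.DataFrame({
    "age": [20, 25, 30, np.nan, 22],
    "income": [50000, 60000, np.nan, 55000, 52000],
    "city": ["Manila", "Cebu", "Manila", "Davao", None],
    "gender": ["M", "F", "F", None, "M"],
    "registration_date": pd.to_datetime(
        ["2022-01-01", "2022-02-15", "2022-01-20", "2022-03-10", "2022-02-05"]
    ),
})

# Count missing values per column
missing_counts = sample_df.isna().sum()

# Numeric columns are the ones a mean or median strategy can apply to
numeric_cols = sample_df.select_dtypes(include="number").columns.tolist()

print(missing_counts)
print("Numeric columns:", numeric_cols)
```

In the sample data, two of the four columns with gaps are categorical, which is worth keeping in mind when picking an imputation strategy in Step 2.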


## Step 2: Define the Transformation Pipeline

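This cell creates an `InsightReporter` to record pipeline events and defines a `Pipeline` with four transformation steps. You can comment out or reorder steps, but the order matters: in the configuration below the encoder runs before the feature generator, so the one-hot columns also receive squared and interaction features.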

In [3]:
# Create an InsightReporter to capture all events
reporter = InsightReporter()

# Define the full pipeline
full_pipeline = Pipeline([
    # Step 1: Handle missing values
    ("imputer", MissingValueHandler(strategy="mean")),  # Use 'mean', 'median', or 'mode'

    # Step 2: Convert categorical columns to numbers
    ("encoder", Encoder(method="onehot")),  # Use 'label' or 'onehot'

    # Step 3: Generate squared and interaction features from numeric columns
    # (this includes the one-hot columns created by the encoder)
    ("feature_generator", FeatureGenerator(degree=2, include_interactions=True)),

    # Step 4: Scale all numeric features, generated ones included
    ("scaler", Scaler(method="zscore")),  # Use 'minmax' or 'zscore'
], logging_callback=reporter.get_callback())  # Attach the reporter to the pipeline

print("Pipeline defined:")
full_pipeline

Pipeline defined:


<Pipeline (4 steps): imputer → encoder → feature_generator → scaler>
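
To get a feel for how the column count grows through the encoder and feature generator, here is a rough plain-pandas approximation. It is not Transfory's implementation: `pd.get_dummies` stands in for the encoder, and the assumption that the generator adds one square per numeric column and one product per pair of numeric columns is inferred from the `^p2` and `_x_` column names in the Step 3 output. Under those assumptions the counts agree with the `(5, 8)` and `(5, 36)` shapes in the Step 4 log.

```python
from itertools import combinations

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [20, 25, 30, np.nan, 22],
    "income": [50000, 60000, np.nan, 55000, 52000],
    "city": ["Manila", "Cebu", "Manila", "Davao", None],
    "gender": ["M", "F", "F", None, "M"],
    "registration_date": pd.to_datetime(
        ["2022-01-01", "2022-02-15", "2022-01-20", "2022-03-10", "2022-02-05"]
    ),
})

# pandas stand-in for the encoder step. With the default dummy_na=False,
# a row whose category is missing gets all-zero indicator columns.
encoded = pd.get_dummies(df, columns=["city", "gender"], dtype=int)

# Numeric columns after encoding (the datetime column is excluded)
numeric_cols = encoded.select_dtypes(include="number").columns.tolist()

# Assumed degree-2 features: one square per column, one product per pair
squares = [f"{c}^p2" for c in numeric_cols]
interactions = [f"{a}_x_{b}" for a, b in combinations(numeric_cols, 2)]

total_columns = encoded.shape[1] + len(squares) + len(interactions)
print(encoded.shape[1], len(squares), len(interactions), total_columns)
```

Note that `pd.get_dummies` orders the indicator columns alphabetically, so the column order differs from the Transfory output even though the count is the same.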

## Step 3: Run the Pipeline

This cell calls `fit_transform` on your data, which runs the steps one after another in the order defined above.

In [4]:
# Fit the pipeline to the data and transform it
transformed_df = full_pipeline.fit_transform(raw_df)

print("Transformed DataFrame (first 5 rows):")
transformed_df.head()

Transformed DataFrame (first 5 rows):


|   | age | income | registration_date | city_Manila | city_Cebu | city_Davao | gender_M | gender_F | age^p2 | income^p2 | ... | city_Manila_x_city_Cebu | city_Manila_x_city_Davao | city_Manila_x_gender_M | city_Manila_x_gender_F | city_Cebu_x_city_Davao | city_Cebu_x_gender_M | city_Cebu_x_gender_F | city_Davao_x_gender_M | city_Davao_x_gender_F | gender_M_x_gender_F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.261511 | -1.261511 | 2022-01-01 | 1.224745 | -0.5 | -0.5 | 1.224745 | -0.816497 | -1.174146 | -1.222271 | ... | 0.0 | 0.0 | 2.0 | -0.5 | 0.0 | 0.0 | -0.5 | 0.0 | 0.0 | 0.0 |
| 1 | 0.22262 | 1.70675 | 2022-02-15 | -0.816497 | 2.0 | -0.5 | -0.816497 | 1.224745 | 0.15066 | 1.73649 | ... | 0.0 | 0.0 | -0.5 | -0.5 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.70675 | 0.0 | 2022-01-20 | 1.224745 | -0.5 | -0.5 | -0.816497 | 1.224745 | 1.769866 | -0.030529 | ... | 0.0 | 0.0 | -0.5 | 2.0 | 0.0 | 0.0 | -0.5 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.22262 | 2022-03-10 | -0.816497 | -0.5 | 2.0 | -0.816497 | -0.816497 | -0.066829 | 0.189865 | ... | 0.0 | 0.0 | -0.5 | -0.5 | 0.0 | 0.0 | -0.5 | 0.0 | 0.0 | 0.0 |
| 4 | -0.667859 | -0.667859 | 2022-02-05 | -0.816497 | -0.5 | -0.5 | 1.224745 | -0.816497 | -0.679552 | -0.673555 | ... | 0.0 | 0.0 | -0.5 | -0.5 | 0.0 | 0.0 | -0.5 | 0.0 | 0.0 | 0.0 |
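
The scaled `age` values above can be checked by hand. The sketch below uses plain pandas, not Transfory, and assumes mean imputation followed by a z-score computed with the population standard deviation (`ddof=0`). That choice of `ddof` is inferred from the numbers in the output, not taken from the library's source.

```python
import numpy as np
import pandas as pd

age = pd.Series([20, 25, 30, np.nan, 22], name="age")

# Mean imputation: fill the gap with the mean of the observed values (24.25)
age_filled = age.fillna(age.mean())

# Z-score with the population standard deviation (ddof=0).
# pandas defaults to ddof=1, which would give slightly smaller magnitudes.
age_z = (age_filled - age_filled.mean()) / age_filled.std(ddof=0)

print(age_z.round(6).tolist())
```

The imputed row lands exactly on the column mean, which is why its scaled value is `0.0` in the transformed output.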


## Step 4: View the Insight Report

The `InsightReporter` provides a human-readable log of the events the pipeline emitted. In this run it records a `fit_transform_step` and a `fit_transform_done` event for each of the four steps, eight entries in total.

In [5]:
# Print the summary from the reporter
print(reporter.summary())

=== Transfory Insight Report ===
Session started: 2025-12-07 23:15:54
Total steps logged: 8

[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_done' event.
[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_done' event.
[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_done' event.
[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-07 23:15:54] Step 'Pipeline' completed a 'fit_transform_done' event.


You can also view the log as a DataFrame, where the `details` column shows which pipeline step each event belongs to and its input or output shape.

In [6]:
# Return the logged events as a pandas DataFrame
reporter.summary(as_dataframe=True)

|   | timestamp | step | event | details |
|---|---|---|---|---|
| 0 | 2025-12-07 23:15:54 | Pipeline | fit_transform_step | {'step': 'imputer', 'input_shape': (5, 5)} |
| 1 | 2025-12-07 23:15:54 | Pipeline | fit_transform_done | {'step': 'imputer', 'output_shape': (5, 5)} |
| 2 | 2025-12-07 23:15:54 | Pipeline | fit_transform_step | {'step': 'encoder', 'input_shape': (5, 5)} |
| 3 | 2025-12-07 23:15:54 | Pipeline | fit_transform_done | {'step': 'encoder', 'output_shape': (5, 8)} |
| 4 | 2025-12-07 23:15:54 | Pipeline | fit_transform_step | {'step': 'feature_generator', 'input_shape': (... |
| 5 | 2025-12-07 23:15:54 | Pipeline | fit_transform_done | {'step': 'feature_generator', 'output_shape': ... |
| 6 | 2025-12-07 23:15:54 | Pipeline | fit_transform_step | {'step': 'scaler', 'input_shape': (5, 36)} |
| 7 | 2025-12-07 23:15:54 | Pipeline | fit_transform_done | {'step': 'scaler', 'output_shape': (5, 36)} |
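
If you want one row per pipeline step with its input and output shape side by side, the `details` column can be expanded with pandas. The sketch below is a stand-in, not a call into Transfory: `log_df` is a hypothetical copy of a few untruncated rows from the table above, and it assumes `details` holds Python dicts and not their string representation.

```python
import pandas as pd

# Hypothetical stand-in for reporter.summary(as_dataframe=True),
# built from the imputer and encoder rows shown above.
log_df = pd.DataFrame({
    "step": ["Pipeline"] * 4,
    "event": ["fit_transform_step", "fit_transform_done",
              "fit_transform_step", "fit_transform_done"],
    "details": [
        {"step": "imputer", "input_shape": (5, 5)},
        {"step": "imputer", "output_shape": (5, 5)},
        {"step": "encoder", "input_shape": (5, 5)},
        {"step": "encoder", "output_shape": (5, 8)},
    ],
})

# Expand each dict into columns: step, input_shape, output_shape
details = pd.DataFrame(log_df["details"].tolist())

# Pair up the input and output shape recorded for each step
inputs = details.dropna(subset=["input_shape"]).set_index("step")["input_shape"]
outputs = details.dropna(subset=["output_shape"]).set_index("step")["output_shape"]
shapes = pd.concat([inputs, outputs], axis=1)

print(shapes)
```

A table like `shapes` makes it easy to spot which step adds columns, here the encoder going from 5 to 8.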
