# Demo 3: Feature Engineering

This demo uses seaborn's 'flights' dataset to show how `DatetimeFeatureExtractor` and `FeatureGenerator` derive new features from existing columns.

In [1]:
import sys
import os
# In a Jupyter notebook, __file__ is not defined. We can use a relative path to add the project root.
# This assumes the notebook is in the 'demo' folder, and 'transfory' is in the parent directory.
project_root = os.path.abspath('..')
if project_root not in sys.path:
    sys.path.insert(0, project_root)

import pandas as pd
import seaborn as sns

from transfory.pipeline import Pipeline
from transfory.datetime import DatetimeFeatureExtractor
from transfory.featuregen import FeatureGenerator
from transfory.scaler import Scaler
from transfory.insight import InsightReporter

### 1. Load and Prepare Data

The 'flights' dataset has 'year' and 'month' as separate columns. We'll combine them into a single `datetime` column to properly demonstrate the `DatetimeFeatureExtractor`.

In [2]:
df = sns.load_dataset('flights')
reporter = InsightReporter()

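# Create a proper datetime column. In current seaborn releases 'month' holds
# abbreviations such as 'Jan', so pass an explicit format rather than letting
# pandas infer one element by element.
df['date'] = pd.to_datetime(df['year'].astype(str) + '-' + df['month'].astype(str), format='%Y-%b')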
df = df[['date', 'passengers']] # Keep only the relevant columns

print("Original Data (first 5 rows):")
display(df.head())

Original Data (first 5 rows):




|   | date | passengers |
|---|------------|-----|
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |


### 2. Define and Run the Feature Engineering Pipeline

This pipeline will:
1.  Extract 'month' and 'year' from the `date` column, replacing it with `date_month` and `date_year`.
2.  Generate degree-2 polynomial features (e.g., `passengers^2`) and pairwise interaction features (e.g., `passengers_x_date_month`) from all numeric columns.
3.  Scale all the final features to a 0-1 range using `Scaler(method="minmax")`.
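To make the three steps concrete, here is a minimal plain-pandas sketch of roughly equivalent logic. It is not transfory's implementation: the `min_max` helper and its handling of constant columns are assumptions made for illustration, and the five input rows are copied from the output shown earlier.

```python
import pandas as pd

# Five rows copied from the "Original Data" output.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01", "1949-05-01"]
    ),
    "passengers": [112, 118, 132, 129, 121],
})

# Step 1: extract month and year; the original datetime column is not kept.
feats = pd.DataFrame({
    "passengers": df["passengers"],
    "date_month": df["date"].dt.month,
    "date_year": df["date"].dt.year,
})

# Step 2: degree-2 polynomial terms, then pairwise interaction terms.
base_cols = list(feats.columns)
for col in base_cols:
    feats[f"{col}^2"] = feats[col] ** 2
for i, a in enumerate(base_cols):
    for b in base_cols[i + 1:]:
        feats[f"{a}_x_{b}"] = feats[a] * feats[b]


# Step 3: min-max scale every column to [0, 1].
# Assumption: a constant column (zero range) is mapped to 0.0.
def min_max(col: pd.Series) -> pd.Series:
    span = col.max() - col.min()
    if span == 0:
        return pd.Series(0.0, index=col.index)
    return (col - col.min()) / span


scaled = feats.apply(min_max)
print(list(scaled.columns))
```

Because the sketch scales against the minimum and maximum of only five rows, its values differ from those of the full 144-row pipeline run; the column names and the order of operations are what it is meant to illustrate.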

In [3]:
pipeline = Pipeline(
    steps=[
        ("date_extractor", DatetimeFeatureExtractor(features=['month', 'year'])),
        ("poly_features", FeatureGenerator(degree=2, include_interactions=True)),
        ("scaler", Scaler(method="minmax"))
    ],
    logging_callback=reporter.get_callback()
)

# Fit and transform the data
transformed_df = pipeline.fit_transform(df)

print("Transformed Data with New Features (first 5 rows):")
display(transformed_df.head())

Transformed Data with New Features (first 5 rows):


|   | passengers | date_month | date_year | passengers^2 | date_month^2 | date_year^2 | passengers_x_date_month | passengers_x_date_year | date_month_x_date_year |
|---|----------|----------|-----|----------|----------|-----|----------|----------|----------|
| 0 | 0.015444 | 0.0 | 0.0 | 0.004595 | 0.0 | 0.0 | 0.0 | 0.01534 | 0.0 |
| 1 | 0.027027 | 0.090909 | 0.0 | 0.008264 | 0.020979 | 0.0 | 0.024448 | 0.026845 | 0.090353 |
| 2 | 0.054054 | 0.181818 | 0.0 | 0.017571 | 0.055944 | 0.0 | 0.055994 | 0.05369 | 0.180706 |
| 3 | 0.048263 | 0.272727 | 0.0 | 0.015489 | 0.104895 | 0.0 | 0.079653 | 0.047938 | 0.271058 |
| 4 | 0.032819 | 0.363636 | 0.0 | 0.010171 | 0.167832 | 0.0 | 0.0972 | 0.032598 | 0.361411 |
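The scaled values can be checked by hand. The sketch below assumes the full dataset's passenger counts range from 104 to 622 (these extremes are not shown in the output above) and applies the usual min-max formula `(x - min) / (max - min)`. The `min_max` helper is illustrative and not part of transfory.

```python
def min_max(value: float, lo: float, hi: float) -> float:
    """Scale value to [0, 1] given the column minimum and maximum."""
    return (value - lo) / (hi - lo)


# Assumed full-dataset extremes for 'passengers'.
P_MIN, P_MAX = 104, 622

# Row 0: 112 passengers in January (month 1).
passengers_0 = min_max(112, P_MIN, P_MAX)
# Squaring is monotonic for positive values, so the extremes of
# 'passengers^2' are the squares of the extremes of 'passengers'.
passengers_sq_0 = min_max(112 ** 2, P_MIN ** 2, P_MAX ** 2)
# Row 1: February (month 2); months run from 1 to 12.
month_1 = min_max(2, 1, 12)
month_sq_1 = min_max(2 ** 2, 1 ** 2, 12 ** 2)

print(round(passengers_0, 6))     # 0.015444
print(round(passengers_sq_0, 6))  # 0.004595
print(round(month_1, 6))          # 0.090909
print(round(month_sq_1, 6))       # 0.020979
```

Note that `passengers^2` (0.004595) is not the square of the scaled `passengers` value (0.015444² ≈ 0.000239): the features are generated from the raw values first, and each resulting column is then scaled independently. `date_year` is 0.0 in all five rows because 1949 is the earliest year in the data.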


### 3. Review the Insight Report

The report records what each step did: which datetime features were extracted, how many numeric columns fed the polynomial and interaction features, and how the final dataset was scaled.

In [4]:
print(reporter.summary())

=== Transfory Insight Report ===
Session started: 2025-12-09 06:46:54
Total steps logged: 14

[2025-12-09 06:46:54] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-09 06:46:54] [date_extractor] Step 'DatetimeFeatureExtractor' (DatetimeFeatureExtractor) identified 1 datetime column(s) to process: ['date']. It will extract 2 features: ['month', 'year'].
[2025-12-09 06:46:54] [date_extractor] Step 'DatetimeFeatureExtractor' (DatetimeFeatureExtractor) extracted features from 1 column(s) and dropped the originals.
[2025-12-09 06:46:54] [date_extractor] Step 'DatetimeFeatureExtractor' (DatetimeFeatureExtractor) extracted features from 0 column(s) and dropped the originals.
[2025-12-09 06:46:54] Step 'Pipeline' completed a 'fit_transform_done' event.
[2025-12-09 06:46:54] Step 'Pipeline' completed a 'fit_transform_step' event.
[2025-12-09 06:46:54] [poly_features] Step 'FeatureGenerator' (FeatureGenerator) fitted. It will generate features from 3 numeric column(s).
[2025-12-0