# Pipeline Examples

This notebook demonstrates how to build and customize pipelines. A pipeline runs a sequence of steps against a shared `State` and provides built-in event hooks, conditional execution, and fitting support for ML workflows.
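The core idea, steps reading from and writing to one shared state, can be sketched without the library. In this minimal sketch the state is a plain `dict` and each step is a function returning a dict of updates. `load`, `total`, and `run_pipeline` are hypothetical names for illustration, not the idspy API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: a "step" is any callable that reads the current
# state and returns a dict of updates. This is not the idspy API.
StepFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def load(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"data": [1, 2, 3]}


def total(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"sum": sum(state["data"])}


def run_pipeline(steps: List[StepFn], state: Dict[str, Any]) -> Dict[str, Any]:
    # Run steps in order, merging each step's outputs into the shared state
    for step in steps:
        state.update(step(state))
    return state


state = run_pipeline([load, total], {})
print(state)  # {'data': [1, 2, 3], 'sum': 6}
```

Because later steps see everything earlier steps wrote, the order of the step list is what wires outputs to inputs.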

## Features Demonstrated

- **Basic Pipeline**: Step orchestration with automatic state management
- **Event Hooks**: Custom methods responding to pipeline lifecycle events
- **Priority System**: Multi-priority hook execution with configurable order
- **Fit-Aware Pipelines**: Automatic fitting lifecycle for ML workflows
- **Refit Control**: Preserving or updating learned parameters across runs

## Setup

In [1]:
import sys
import os

# Add the project root (parent of the notebook's working directory) to the Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

In [2]:
from src.idspy.core.state import State
from src.idspy.core.step import Step, FitAwareStep
from src.idspy.core.pipeline import Pipeline, FitAwarePipeline, PipelineEvent

## Example Steps

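Each step declares the state keys it reads with `@Step.requires` and the keys it writes with `@Step.provides`, together with their expected types, so the pipeline can validate inputs and outputs. `MeanCenter` is a `FitAwareStep`: its `fit_impl` learns the mean that `run` later subtracts.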

In [3]:
class Load(Step):
    @Step.provides(data=list)
    def run(self, state: State):
        return {"data": [1, 2, 3]}



class Sum(Step):
    @Step.requires(data=list)
    @Step.provides(sum=(int, float))
    def run(self, state: State, data: list):
        return {"sum": sum(data)}



class MeanCenter(FitAwareStep):
    def __init__(self):
        super().__init__()
        self.mean = None

    @Step.requires(data=list)
    def fit_impl(self, state: State, data: list):
        self.mean = sum(data) / len(data)

    @Step.requires(data=list)
    @Step.provides(data=list)
    def run(self, state: State, data: list):
        centered = [x - self.mean for x in data]
        return {"data": centered}



class Accumulate(Step):
    # Accept floats too, since Sum declares sum=(int, float)
    @Step.requires(sum=(int, float), tot=(int, float))
    @Step.provides(tot=(int, float))
    def run(self, state: State, sum: float, tot: float = 0):
        tot += sum
        return {"tot": tot}
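To show what `requires`/`provides` declarations make possible, here is a minimal sketch with hypothetical decorators that attach metadata to plain functions. A `validate` helper then checks, before anything runs, that every required key is either present up front or provided by an earlier step, and `run` type-checks each declared output. None of these names (`requires`, `provides`, `validate`, `run`) are the idspy API; they only illustrate the technique.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple, Type, Union

# Hypothetical stand-ins for declaration decorators; not the idspy API.
TypeSpec = Union[Type, Tuple[Type, ...]]


def requires(**types: TypeSpec) -> Callable:
    """Record the state keys (and types) a step reads."""
    def mark(fn: Callable) -> Callable:
        fn.requires = types
        return fn
    return mark


def provides(**types: TypeSpec) -> Callable:
    """Record the state keys (and types) a step writes."""
    def mark(fn: Callable) -> Callable:
        fn.provides = types
        return fn
    return mark


@provides(data=list)
def load(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"data": [1, 2, 3]}


@requires(data=list)
@provides(sum=(int, float))
def total(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"sum": sum(state["data"])}


def validate(steps: List[Callable], initial: Iterable[str] = ()) -> None:
    # Static check: each required key must exist up front or come from an earlier step
    available = set(initial)
    for index, step in enumerate(steps):
        missing = set(getattr(step, "requires", {})) - available
        if missing:
            raise ValueError(f"step {index} ({step.__name__}) is missing {sorted(missing)}")
        available |= set(getattr(step, "provides", {}))


def run(steps: List[Callable], state: Dict[str, Any]) -> Dict[str, Any]:
    validate(steps, state.keys())
    for step in steps:
        out = step(state)
        # Runtime check: each declared output must have the declared type
        for key, expected in getattr(step, "provides", {}).items():
            if not isinstance(out.get(key), expected):
                raise TypeError(f"{step.__name__}: {key!r} is not {expected}")
        state.update(out)
    return state


print(run([load, total], {}))  # {'data': [1, 2, 3], 'sum': 6}
```

Reversing the order, as in `validate([total, load])`, raises `ValueError: step 0 (total) is missing ['data']` before any step executes, which is the main payoff of declaring inputs and outputs explicitly.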

## Custom Pipeline with Event Hooks

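Subclass `Pipeline` and decorate methods with `@Pipeline.hook(event)` to run custom code at lifecycle events. `PIPELINE_START` and `PIPELINE_END` handlers receive the state once per run, while `BEFORE_STEP` and `AFTER_STEP` handlers fire around each step and receive the step, the state, and the step index.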

In [4]:
class MyPipeline(Pipeline):
    @Pipeline.hook(PipelineEvent.PIPELINE_START)
    def _start(self, state: State) -> None:
        print("[pipeline] start")

    @Pipeline.hook(PipelineEvent.BEFORE_STEP)
    def _before(self, step: Step, state: State, index: int) -> None:
        print(f"[pipeline] before {index}: {step.name}")

    @Pipeline.hook(PipelineEvent.AFTER_STEP)
    def _after(self, step: Step, state: State, index: int) -> None:
        print(f"[pipeline] after {index}:  {step.name}")

    @Pipeline.hook(PipelineEvent.PIPELINE_END)
    def _end(self, state: State) -> None:
        print("[pipeline] end")


s = State()
p = MyPipeline([Load(), Sum()], name="Plain")
p(s)
print(s.as_dict())
# [pipeline] start
# [pipeline] before 0: Load
# [pipeline] after 0:  Load
# [pipeline] before 1: Sum
# [pipeline] after 1:  Sum
# [pipeline] end
# {'data': [1, 2, 3], 'sum': 6}

[pipeline] start
[pipeline] before 0: Load
[pipeline] after 0:  Load
[pipeline] before 1: Sum
[pipeline] after 1:  Sum
[pipeline] end
{'data': [1, 2, 3], 'sum': 6}
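One way such a hook mechanism can work is sketched below, under the assumption that a decorator tags methods with an event and the base class collects the tagged methods at construction time. `Event`, `hook`, `HookedPipeline`, and `LoggingPipeline` are hypothetical names, not the idspy implementation; the handlers append to a log instead of printing so the call order is easy to inspect.

```python
from enum import Enum, auto
from typing import Any, Callable, Dict, List


# Hypothetical event enum and hook decorator; not the idspy API.
class Event(Enum):
    PIPELINE_START = auto()
    BEFORE_STEP = auto()
    AFTER_STEP = auto()
    PIPELINE_END = auto()


def hook(event: Event) -> Callable:
    """Tag a method as a handler for `event`."""
    def mark(fn: Callable) -> Callable:
        fn._hook_event = event
        return fn
    return mark


class HookedPipeline:
    def __init__(self, steps: List[Callable]) -> None:
        self.steps = steps
        # Collect tagged methods once, grouped by event
        # (dir() is alphabetical, so same-event handlers run in name order)
        self._hooks: Dict[Event, List[Callable]] = {event: [] for event in Event}
        for name in dir(type(self)):
            event = getattr(getattr(type(self), name), "_hook_event", None)
            if event is not None:
                self._hooks[event].append(getattr(self, name))

    def _emit(self, event: Event, *args: Any) -> None:
        for handler in self._hooks[event]:
            handler(*args)

    def __call__(self, state: Dict[str, Any]) -> Dict[str, Any]:
        self._emit(Event.PIPELINE_START, state)
        for index, step in enumerate(self.steps):
            self._emit(Event.BEFORE_STEP, step, state, index)
            state.update(step(state))
            self._emit(Event.AFTER_STEP, step, state, index)
        self._emit(Event.PIPELINE_END, state)
        return state


class LoggingPipeline(HookedPipeline):
    def __init__(self, steps: List[Callable]) -> None:
        super().__init__(steps)
        self.log: List[str] = []

    @hook(Event.PIPELINE_START)
    def _start(self, state):
        self.log.append("start")

    @hook(Event.BEFORE_STEP)
    def _before(self, step, state, index):
        self.log.append(f"before {index}: {step.__name__}")

    @hook(Event.AFTER_STEP)
    def _after(self, step, state, index):
        self.log.append(f"after {index}: {step.__name__}")

    @hook(Event.PIPELINE_END)
    def _end(self, state):
        self.log.append("end")


def load(state):
    return {"data": [1, 2, 3]}


def total(state):
    return {"sum": sum(state["data"])}


p = LoggingPipeline([load, total])
result = p({})
print(p.log)
# ['start', 'before 0: load', 'after 0: load', 'before 1: total', 'after 1: total', 'end']
```

Collecting handlers once in `__init__` keeps the per-step cost to a dictionary lookup, at the price of fixing the handler set when the pipeline is constructed.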


### Multi-Priority Hook System

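Multiple hooks for the same event execute in ascending priority order: lower numbers run first. As the output below shows, the `priority=50` hook runs before the `priority=100` hook for every step.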

In [5]:
class MultiPriorityPipeline(Pipeline):
    @Pipeline.hook(PipelineEvent.BEFORE_STEP, priority=100)
    def high_priority_hook(self, step: Step, state: State, index: int):
        print(f"HIGH: [id: {index}]")

    @Pipeline.hook(PipelineEvent.BEFORE_STEP, priority=50)
    def medium_priority_hook(self, step: Step, state: State, index: int):
        print(f"MEDIUM: [id: {index}]")

s = State()
p = MultiPriorityPipeline([Load(), Sum()], name="Plain")
p(s)
print(s.as_dict())
# MEDIUM: [id: 0]
# HIGH: [id: 0]
# MEDIUM: [id: 1]
# HIGH: [id: 1]
# {'data': [1, 2, 3], 'sum': 6}

MEDIUM: [id: 0]
HIGH: [id: 0]
MEDIUM: [id: 1]
HIGH: [id: 1]
{'data': [1, 2, 3], 'sum': 6}
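The output above shows the `priority=50` hook running before the `priority=100` hook. The following is a minimal sketch of that ordering rule, assuming handlers for one event are kept as hypothetical `(priority, handler)` pairs and sorted ascending before dispatch; it is an illustration, not the idspy internals.

```python
from typing import Callable, List, Tuple

# Hypothetical (priority, handler) registry for a single event; not the idspy API.
Handler = Callable[[int], None]


def dispatch(handlers: List[Tuple[int, Handler]], index: int) -> None:
    # Ascending sort: lower priority numbers run first.
    # sorted() is stable, so equal priorities keep registration order.
    for _priority, handler in sorted(handlers, key=lambda pair: pair[0]):
        handler(index)


calls: List[str] = []
handlers: List[Tuple[int, Handler]] = [
    (100, lambda i: calls.append(f"HIGH: [id: {i}]")),
    (50, lambda i: calls.append(f"MEDIUM: [id: {i}]")),
]
for index in range(2):  # two steps, as in the Load/Sum pipeline
    dispatch(handlers, index)
print(calls)
# ['MEDIUM: [id: 0]', 'HIGH: [id: 0]', 'MEDIUM: [id: 1]', 'HIGH: [id: 1]']
```

Passing `reverse=True` to `sorted()` would give the opposite, higher-first order; Python's sort stays stable under `reverse=True`, so ties would still keep registration order.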


## Fit-Aware Pipelines

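`FitAwarePipeline` fits each `FitAwareStep` before running it. The `refit` flag controls later runs: with `refit=False`, learned parameters are preserved, so the second run below reuses the mean learned on the first run; with `refit=True`, steps are fit on the incoming data.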

In [9]:
s = State({"data": [1.0, 2.0, 3.0]})
fp = FitAwarePipeline([MeanCenter(), Sum()], name="FitPipe", refit=False)
fp(s)
print("After first run:", s.as_dict())
print("Mean learned:", fp.steps[0].mean)
# After first run: {'data': [-1.0, 0.0, 1.0], 'sum': 0.0}
# Mean learned: 2.0

# Second run without refit (uses same mean=2.0)
s = State({"data": [2.0, 4.0, 6.0]})
fp(s)
print("Second run (no refit):", s.as_dict())
print("Mean still:", fp.steps[0].mean)
# Second run (no refit): {'data': [0.0, 2.0, 4.0], 'sum': 6.0}
# Mean still: 2.0

# New pipeline with refit=True, fit on this data (learns mean=4.0)
s2 = State({"data": [2.0, 4.0, 6.0]})
fp_refit = FitAwarePipeline([MeanCenter(), Sum()], name="FitPipeRefit", refit=True)
fp_refit(s2)
print("With refit=True:", s2.as_dict())
print("New mean learned:", fp_refit.steps[0].mean)
# With refit=True: {'data': [-2.0, 0.0, 2.0], 'sum': 0.0}
# New mean learned: 4.0

After first run: {'data': [-1.0, 0.0, 1.0], 'sum': 0.0}
Mean learned: 2.0
Second run (no refit): {'data': [0.0, 2.0, 4.0], 'sum': 6.0}
Mean still: 2.0
With refit=True: {'data': [-2.0, 0.0, 2.0], 'sum': 0.0}
New mean learned: 4.0
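In the cell above, the `refit=True` pipeline is new, so it would learn a mean of 4.0 on its first fit under either setting. The sketch below reuses one runner across two inputs so the difference between the settings is visible. It assumes `refit=True` means "fit on every call", which the name suggests but this notebook does not show directly. `MeanCenterSketch` and `FitOnceOrRefit` are hypothetical classes, not the idspy API.

```python
from typing import List, Optional


class MeanCenterSketch:
    """Hypothetical fit-aware step: learns a mean, then subtracts it."""

    def __init__(self) -> None:
        self.mean: Optional[float] = None

    @property
    def is_fitted(self) -> bool:
        return self.mean is not None

    def fit(self, data: List[float]) -> None:
        self.mean = sum(data) / len(data)

    def transform(self, data: List[float]) -> List[float]:
        return [x - self.mean for x in data]


class FitOnceOrRefit:
    """Hypothetical runner: fit when unfitted, or on every call if refit=True."""

    def __init__(self, step: MeanCenterSketch, refit: bool = False) -> None:
        self.step = step
        self.refit = refit

    def __call__(self, data: List[float]) -> List[float]:
        if self.refit or not self.step.is_fitted:
            self.step.fit(data)
        return self.step.transform(data)


keep = FitOnceOrRefit(MeanCenterSketch(), refit=False)
print(keep([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
print(keep([2.0, 4.0, 6.0]))  # [0.0, 2.0, 4.0]  (mean stays 2.0)

again = FitOnceOrRefit(MeanCenterSketch(), refit=True)
print(again([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
print(again([2.0, 4.0, 6.0]))  # [-2.0, 0.0, 2.0]  (mean refit to 4.0)
```

Keeping `refit=False` is the usual choice when the second input is held-out data that should be transformed with parameters learned on training data only.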
