# Step Examples

This notebook demonstrates the `Step` API with decorator-based configuration and pure run methods. Steps separate business logic from state management for better testability and composability.

## Features Demonstrated

- **Basic Steps**: Decorator-based input/output declarations with `@Step.requires` and `@Step.provides`
- **Validation**: Automatic validation of required outputs and type safety
- **Conditional Steps**: Steps that execute only when specific conditions are met
- **Data-Driven Logic**: Decision making based on actual data characteristics
- **Fit-Aware Steps**: Two-phase ML workflow with separate fitting and execution phases
- **Error Handling**: Proper exception handling for unfitted steps and missing outputs
- **Scoped Steps**: Steps initialized with `in_scope` and `out_scope` so that they read from and write to a restricted view of the state
- **Repeat Decorator**: Repeatable execution with `@Step.repeat` supporting count and early stopping predicates
- **Contextual Steps**: Steps that execute within a context manager for resource management and cleanup

## Setup

In [1]:
import sys
import os

# Add the project root to Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)

In [2]:
from src.idspy.core.state import State
from src.idspy.core.step import Step, ConditionalStep, FitAwareStep

In [3]:
# Reload the module to pick up the updated version with the @Step.repeat decorator
import importlib
import src.idspy.core.step
importlib.reload(src.idspy.core.step)

from src.idspy.core.state import State
from src.idspy.core.step import Step, ConditionalStep, FitAwareStep

## Basic Step with Decorators

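`@Step.requires` declares the state keys (and their types) that a step reads, and `@Step.provides` declares the keys it writes. The `run` method stays pure: it receives the required values as keyword arguments and returns a dict of outputs, which the framework validates and stores in the state.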

In [4]:
class MakeSum(Step):
    @Step.requires(data=list)
    @Step.provides(sum=int)
    def run(self, state: State, data=None):
        return {"sum": sum(data)}

s = State({"data": [1, 2, 3]})
MakeSum()(s)
s

State(keys_and_types=[('data', 'list'), ('sum', 'int')])
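To make the mechanism concrete, here is a minimal standalone sketch of how decorators like these could inject inputs and validate outputs. It does not use idspy and is not its implementation: `requires`, `provides`, `SumSketch`, `ForgetfulSketch`, and the plain-dict state are hypothetical stand-ins, and the error wording is only modeled on the messages shown in this notebook.

```python
import functools


def requires(**spec):
    """Hypothetical stand-in for @Step.requires: inject typed inputs from the state."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, state, **kwargs):
            for key, expected in spec.items():
                if key not in state:
                    raise KeyError(f"Missing key '{key}'")
                if not isinstance(state[key], expected):
                    raise TypeError(f"'{key}' must be {expected.__name__}")
                kwargs[key] = state[key]
            return fn(self, state, **kwargs)
        return wrapper
    return decorator


def provides(**spec):
    """Hypothetical stand-in for @Step.provides: validate outputs, then store them."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, state, **kwargs):
            out = fn(self, state, **kwargs) or {}
            name = type(self).__name__
            for key, expected in spec.items():
                if key not in out:
                    raise KeyError(f"Returned dict from step '{name}' is missing key '{key}'")
                if not isinstance(out[key], expected):
                    raise TypeError(f"'{key}' must be {expected.__name__}")
            state.update(out)  # written back only after validation passes
            return out
        return wrapper
    return decorator


class SumSketch:
    @requires(data=list)
    @provides(sum=int)
    def run(self, state, data=None):
        return {"sum": sum(data)}


class ForgetfulSketch:
    @provides(x=int)
    def run(self, state):
        return {}  # declared output 'x' is never returned


sum_state = {"data": [1, 2, 3]}
SumSketch().run(sum_state)
print(sum_state)  # {'data': [1, 2, 3], 'sum': 6}

try:
    ForgetfulSketch().run({})
except KeyError as e:
    missing_error = e.args[0]
    print(missing_error)
```

Because `run` only sees its declared inputs and returns a dict, it can be unit-tested by calling it with plain values, without building a full pipeline state.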

### Validation: Missing Required Outputs

Automatic validation ensures steps produce all declared outputs.

In [5]:
class NoopProvides(Step):
    @Step.provides(x=int)
    def run(self, state: State):
        # forgets to return {"x": some_value}
        return {}


s = State()
try:
    NoopProvides()(s)
except KeyError as e:
    print(e)  # -> "Returned dict from step 'NoopProvides' is missing key 'x'"

"Returned dict from step 'NoopProvides' is missing key 'x'"


## Conditional Steps

A `ConditionalStep` executes `run()` only when its `should_run()` method returns `True`. Otherwise `on_skip()` is called and the state is left unchanged.

In [6]:
class MaybeNormalize(ConditionalStep):

    @Step.requires(normalize=bool)
    def should_run(self, state: State, normalize: bool) -> bool:
        return normalize

    @Step.requires(data=list)
    @Step.provides(data=list)
    def run(self, state: State, data: list) -> dict:
        m = sum(data) / len(data)
        normalized = [x - m for x in data]
        return {"data": normalized}

    def on_skip(self, state: State) -> None:
        print(f"[skip] {self.name} because normalize flag is False")


s = State({"data": [1, 2, 3], "normalize": False})
MaybeNormalize()(s)  # skipped → prints message
s.read_only_view()
# {'data': [1, 2, 3], 'normalize': False}

[skip] MaybeNormalize because normalize flag is False


mappingproxy({'data': [1, 2, 3], 'normalize': False})

In [7]:
s = State({"data": [1, 2, 3], "normalize": True})
MaybeNormalize()(s)  # runs
s.read_only_view()
# {'data': [-1.0, 0.0, 1.0], 'normalize': True}

mappingproxy({'data': [-1.0, 0.0, 1.0], 'normalize': True})

### Data-Driven Conditional Steps

`should_run()` can also inspect the data itself rather than a flag. In this example, training runs only when the dataset contains at least `min_len` items.

In [8]:
class TrainIfEnoughData(ConditionalStep):
    def __init__(self, min_len: int = 3):
        super().__init__()
        self.min_len = min_len

    @Step.requires(data=list)
    def should_run(self, state: State, data: list) -> bool:
        return len(data) >= self.min_len

    @Step.requires(data=list)
    @Step.provides(trained=bool)
    def run(self, state: State, data: list) -> dict:
        # pretend training...
        return {"trained": True}

    def on_skip(self, state: State) -> None:
        print(f"[skip] {self.name} because not enough data (need at least {self.min_len})")


s = State({"data": [1, 2]})
TrainIfEnoughData(min_len=3)(s)  # skipped

try:
    s.get("trained", None)
except KeyError as e:
    print(e)  # -> "Missing key 'trained'"

s.set("data", [1, 2, 3, 4], list)
TrainIfEnoughData(min_len=3)(s)  # runs
s.get("trained", bool)
# True

[skip] TrainIfEnoughData because not enough data (need at least 3)
"Missing key 'trained'"


True
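The control flow behind a conditional step can be sketched as a small template method. This is a standalone illustration under assumptions, not idspy's `ConditionalStep`: `ConditionalSketch`, `TrainIfEnoughDataSketch`, the `skipped` attribute, and the dict-based state are hypothetical.

```python
class ConditionalSketch:
    """Hypothetical base class, not idspy's ConditionalStep."""

    def should_run(self, state: dict) -> bool:
        return True

    def on_skip(self, state: dict) -> None:
        pass

    def run(self, state: dict) -> dict:
        raise NotImplementedError

    def __call__(self, state: dict) -> None:
        if not self.should_run(state):
            self.on_skip(state)  # hook for logging; the state is not modified
            return
        state.update(self.run(state) or {})


class TrainIfEnoughDataSketch(ConditionalSketch):
    def __init__(self, min_len: int = 3):
        self.min_len = min_len
        self.skipped = False

    def should_run(self, state: dict) -> bool:
        return len(state["data"]) >= self.min_len

    def run(self, state: dict) -> dict:
        return {"trained": True}  # pretend training

    def on_skip(self, state: dict) -> None:
        self.skipped = True


small = {"data": [1, 2]}
small_step = TrainIfEnoughDataSketch(min_len=3)
small_step(small)
print(small_step.skipped, "trained" in small)  # True False

large = {"data": [1, 2, 3, 4]}
TrainIfEnoughDataSketch(min_len=3)(large)
print(large["trained"])  # True
```

Keeping the decision in `should_run()` and the work in `run()` means each can be tested on its own.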

## Fit-Aware Steps

Two-phase ML workflow: `fit_impl()` learns parameters (it is invoked through `fit()`), and `run()` applies the learned parameters.

In [9]:
class MeanCenter(FitAwareStep):
    def __init__(self):
        super().__init__()
        self.mean = None

    @Step.requires(data=list)
    def fit_impl(self, state: State, data: list) -> None:
        self.mean = sum(data) / len(data)

    @Step.requires(data=list)
    @Step.provides(data=list)
    def run(self, state: State, data: list) -> dict:
        centered = [x - self.mean for x in data]
        return {"data": centered}

### Error Handling: Unfitted Steps

Calling a `FitAwareStep` before `fit()` raises a `RuntimeError` that names the unfitted step.

In [10]:
s = State({"data": [1.0, 2.0, 3.0]})
step = MeanCenter()

try:
    step(s)
except RuntimeError as e:
    print(e)  # 'MeanCenter' is not fitted.

'MeanCenter' is not fitted.


### Proper Usage: Fit Then Run

Fitted steps store learned parameters internally, keeping pipeline state clean.

In [11]:
step = MeanCenter()
s = State({"data": [1.0, 2.0, 3.0]})

step.fit(s)  # computes and stores mean internally
print(f"Mean computed: {step.mean}")  # Mean computed: 2.0

step(s)  # apply centering
print(s.read_only_view())  # {'data': [-1.0, 0.0, 1.0]}

Mean computed: 2.0
{'data': [-1.0, 0.0, 1.0]}
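The main payoff of the two-phase design is that parameters learned on one state can be applied to another, for example fitting on training data and transforming held-out data. The following standalone sketch illustrates the idea. It is not idspy's `FitAwareStep`: `FitAwareSketch`, `MeanCenterSketch`, the `_fitted` flag, and the dict-based state are hypothetical.

```python
class FitAwareSketch:
    """Hypothetical base class, not idspy's FitAwareStep."""

    def __init__(self):
        self._fitted = False

    def fit_impl(self, state: dict) -> None:
        raise NotImplementedError

    def run(self, state: dict) -> dict:
        raise NotImplementedError

    def fit(self, state: dict) -> None:
        self.fit_impl(state)
        self._fitted = True  # set only if fit_impl did not raise

    def __call__(self, state: dict) -> None:
        if not self._fitted:
            raise RuntimeError(f"'{type(self).__name__}' is not fitted.")
        state.update(self.run(state))


class MeanCenterSketch(FitAwareSketch):
    def __init__(self):
        super().__init__()
        self.mean = None

    def fit_impl(self, state: dict) -> None:
        self.mean = sum(state["data"]) / len(state["data"])

    def run(self, state: dict) -> dict:
        return {"data": [x - self.mean for x in state["data"]]}


train = {"data": [1.0, 2.0, 3.0]}
holdout = {"data": [2.0, 4.0]}

center = MeanCenterSketch()
center.fit(train)  # mean is learned from the training data only
center(holdout)    # the same mean is applied to unseen data
print(center.mean, holdout["data"])  # 2.0 [0.0, 2.0]
```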


## Scoped Step

By scoping a step, you ensure that it only reads from and writes to the portion of the state relevant to its context, avoiding conflicts between similarly named data in different namespaces.

In [12]:
class Sort(Step):
    @Step.requires(data=list)
    @Step.provides(data=list)
    def run(self, state: State, data: list) -> dict:
        return {"data": sorted(data)}

s = State({
    "data": [3.0, 1.0, 2.0],
    "user.data": [3.0, 1.0, 2.0],
    "company.data": [3.0, 1.0, 2.0]
})

Sort(in_scope="user", out_scope="user")(s)
print(s.read_only_view())

Sort()(s)
print(s.read_only_view())

{'data': [3.0, 1.0, 2.0], 'user.data': [1.0, 2.0, 3.0], 'company.data': [3.0, 1.0, 2.0]}
{'data': [1.0, 2.0, 3.0], 'user.data': [1.0, 2.0, 3.0], 'company.data': [3.0, 1.0, 2.0]}
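One way to picture scoping is as key prefixing: the step asks for `data`, and the scope decides which namespaced key that resolves to. The sketch below assumes a `scope.key` naming convention, which matches the keys in the output above, but how idspy resolves scopes internally is not shown in this notebook. `scoped_key` and `sort_data` are hypothetical helpers operating on a plain dict.

```python
from typing import Optional


def scoped_key(key: str, scope: Optional[str] = None) -> str:
    """Prefix the key with 'scope.' when a scope is given (assumed convention)."""
    return f"{scope}.{key}" if scope else key


def sort_data(state: dict,
              in_scope: Optional[str] = None,
              out_scope: Optional[str] = None) -> None:
    """Read 'data' from the input scope and write the sorted list to the output scope."""
    values = state[scoped_key("data", in_scope)]
    state[scoped_key("data", out_scope)] = sorted(values)


scoped_state = {
    "data": [3.0, 1.0, 2.0],
    "user.data": [3.0, 1.0, 2.0],
    "company.data": [3.0, 1.0, 2.0],
}

sort_data(scoped_state, in_scope="user", out_scope="user")
print(scoped_state["user.data"])     # [1.0, 2.0, 3.0]
print(scoped_state["company.data"])  # [3.0, 1.0, 2.0]
print(scoped_state["data"])          # [3.0, 1.0, 2.0]
```

The step logic itself never mentions `user` or `company`, which is what lets the same step be reused across namespaces.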


## Repeat Decorator

The `@Step.repeat` decorator allows you to repeat the execution of a `run` method multiple times, with support for early stopping conditions.

In [13]:
class Add(Step):
    @Step.repeat(count=3)
    @Step.requires(a=int, b=int)
    @Step.provides(a=int)
    def run(self, state: State, a: int, b: int) -> dict:
        print(f"{a} + {b} = {a + b}")
        return {"a": a + b}


s = State({"a": 1, "b": 2})
Add()(s)

1 + 2 = 3
3 + 2 = 5
5 + 2 = 7


### Repeat Decorator with Predicate

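Instead of a fixed `count`, `@Step.repeat` accepts a `predicate` that is evaluated against the state, and repetition stops once the predicate returns `True`. In the example below, the step keeps adding `b` to `a` until `a` exceeds 10.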

In [14]:
class Add(Step):
    @Step.repeat(predicate=lambda state: state.get("a", int) > 10)
    @Step.requires(a=int, b=int)
    @Step.provides(a=int)
    def run(self, state: State, a: int, b: int) -> dict:
        print(f"{a} < 10")
        return {"a": a + b}


s = State({"a": 1, "b": 2})
Add()(s)

1 < 10
3 < 10
5 < 10
7 < 10
9 < 10
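A repeat decorator with both a count and an early-stopping predicate can be sketched in a few lines. This is a standalone illustration, not idspy's `@Step.repeat`: the `repeat` function here wraps plain functions over a dict state, and it checks the predicate before each iteration, which is an assumption (the notebook output is consistent with checking either before or after).

```python
import functools
from typing import Callable, Optional


def repeat(count: Optional[int] = None,
           predicate: Optional[Callable[[dict], bool]] = None):
    """Hypothetical repeat decorator: run up to `count` times, stop early on `predicate`."""
    if count is None and predicate is None:
        raise ValueError("provide count, predicate, or both")  # avoid an endless loop

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            runs = 0
            while count is None or runs < count:
                if predicate is not None and predicate(state):
                    break  # early stop: condition already satisfied
                state.update(fn(state))
                runs += 1
            return state
        return wrapper
    return decorator


@repeat(count=3)
def add_three_times(state: dict) -> dict:
    return {"a": state["a"] + state["b"]}


@repeat(predicate=lambda state: state["a"] > 10)
def add_until_over_ten(state: dict) -> dict:
    return {"a": state["a"] + state["b"]}


print(add_three_times({"a": 1, "b": 2})["a"])     # 7
print(add_until_over_ten({"a": 1, "b": 2})["a"])  # 11
```

Passing both arguments in this sketch gives a bounded loop with an early exit, which guards against a predicate that never becomes true.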


## Contextual Steps

Steps that execute within a context manager for automatic resource management and cleanup. The `ContextualStep` wraps another step and provides a context manager that is available as a `context` parameter during execution.

In [15]:
from src.idspy.core.step import ContextualStep
from contextlib import contextmanager
import tempfile
import os

class FileWriterStep(Step):
    @Step.requires(filename=str, content=str)
    def run(self, state: State, filename: str, content: str, context=None) -> None:
        """Write content to a file. Uses context if provided."""
        file_path = filename
        if context and hasattr(context, 'name'):
            # Use temporary directory if provided by context
            file_path = os.path.join(context.name, filename)

        with open(file_path, 'w') as f:
            f.write(content)

        print(f"File written to: {file_path}")
        return None

class TempDirContextualStep(ContextualStep):
    """Contextual step that provides a temporary directory."""

    @contextmanager
    def context(self, state: State):
        """Create and cleanup a temporary directory."""
        with tempfile.TemporaryDirectory() as temp_dir:
            print(f"Created temporary directory: {temp_dir}")
            yield type('TempDir', (), {'name': temp_dir})()
            print(f"Cleaned up temporary directory: {temp_dir}")

# Example usage
s = State({
    "filename": "example.txt",
    "content": "Hello from contextual step!"
})

# Create a contextual step that wraps our file writer
writer_step = FileWriterStep()
contextual_writer = TempDirContextualStep(writer_step)

contextual_writer(s)

Created temporary directory: /var/folders/9q/5fdsccl51c34cl9rkc28dfx40000gn/T/tmpm8_agmo_
File written to: /var/folders/9q/5fdsccl51c34cl9rkc28dfx40000gn/T/tmpm8_agmo_/example.txt
Cleaned up temporary directory: /var/folders/9q/5fdsccl51c34cl9rkc28dfx40000gn/T/tmpm8_agmo_
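The wrapping pattern can be reduced to a few lines: enter the context, pass whatever it yields to the inner step, and let the `with` block handle cleanup even if the step raises. The sketch below is standalone and is not idspy's `ContextualStep`: `ContextualSketch`, `TempDirSketch`, `write_file`, and the extra `written_to` and `existed_during_run` keys are hypothetical, and the inner step is a plain function over a dict state.

```python
import os
import tempfile
from contextlib import contextmanager


class ContextualSketch:
    """Hypothetical wrapper, not idspy's ContextualStep."""

    def __init__(self, inner):
        self.inner = inner

    @contextmanager
    def context(self, state: dict):
        yield None  # default: no resource

    def __call__(self, state: dict):
        # The inner step runs entirely inside the context, so cleanup
        # happens on exit whether or not the step raises.
        with self.context(state) as ctx:
            return self.inner(state, context=ctx)


class TempDirSketch(ContextualSketch):
    @contextmanager
    def context(self, state: dict):
        with tempfile.TemporaryDirectory() as temp_dir:
            yield temp_dir  # directory is deleted when the with-block exits


def write_file(state: dict, context=None) -> None:
    """Write state['content'] into the context directory (or the current one)."""
    path = os.path.join(context or ".", state["filename"])
    with open(path, "w") as f:
        f.write(state["content"])
    state["written_to"] = path
    state["existed_during_run"] = os.path.exists(path)


ctx_state = {"filename": "example.txt", "content": "Hello from contextual step!"}
TempDirSketch(write_file)(ctx_state)

print(ctx_state["existed_during_run"])            # True
print(os.path.exists(ctx_state["written_to"]))    # False
```

The file exists while the step runs and is gone afterwards, which is the behavior the log lines above show for the temporary directory.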
