braided

A typed-dict pipeline library for Python. Define transformation DAGs in code or in YAML/JSON, and get automatic type checking, lazy evaluation, and optional caching.

Install

pip install braided

Optional extras:

| Extra | What it unlocks |
| --- | --- |
| `braided[datasets]` | HuggingFace Datasets map/cache backend (`braided.integrations.hf_datasets`) |
| `braided[torch]` | Tensor-safe pickle serialization in `Cache` |
| `braided[jaxtyping]` | Array dtype/shape awareness in the type checker |
| `braided[dev]` | pytest, pyright, and build tools |

Quickstart

from typing import Iterator, TypedDict

from braided import Node, NodeSpec, SequenceInput, execute_pipeline, strand

class Record(TypedDict):
    x: int

@strand
def double(item: Record) -> Record:
    return Record(x=item["x"] * 2)

@strand.one_to_many
def up_to(item: Record) -> Iterator[Record]:
    for i in range(item["x"]):
        yield Record(x=i)

nodes: NodeSpec[Record] = {
    "out": Node(function=up_to, args=["doubled"]),
    "doubled": Node(function=double, args=["seed"]),
}
result = execute_pipeline(nodes, {"seed": SequenceInput[Record]([Record(x=3)])})
print(list(result))
# [{"x": 0}, {"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}, {"x": 5}]
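Nodes name their upstream dependencies in `args`, and the example above lists "out" before the node it depends on. As a rough illustration of how such a name-based graph can be put in evaluation order (not a description of braided's internals), the standard-library `graphlib` module can sort the same dependency structure. The `deps` mapping is a hand-written stand-in for the `args` lists.

```python
from graphlib import TopologicalSorter

# Hand-written stand-in for the quickstart graph: node name -> names it reads from.
deps = {
    "out": ["doubled"],
    "doubled": ["seed"],
}

# static_order() yields each name only after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['seed', 'doubled', 'out']
```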

Strand kinds

| Decorator | Input → Output | Use for |
| --- | --- | --- |
| `@strand` | `T → T'` | one-to-one row transforms |
| `@strand.one_to_many` | `T → Iterator[T']` | splitting or expanding rows |
| `@strand.many_to_many` | `Sequence[T] → Iterator[T']` | aggregations, joins, reordering |

A strand can take multiple input sequences by declaring multiple parameters. For `@strand` and `@strand.one_to_many`, the inputs are aligned by position (zipped); for `@strand.many_to_many`, they are passed as separate sequences:

class Pair(TypedDict):
    a: int
    b: int

@strand
def zip_add(left: Record, right: Record) -> Pair:
    return Pair(a=left["x"], b=right["x"])

nodes: NodeSpec[Pair] = {
    "out": Node(function=zip_add, args=["left", "right"]),
}
result = execute_pipeline(nodes, {
    "left": SequenceInput[Record]([Record(x=1), Record(x=2)]),
    "right": SequenceInput[Record]([Record(x=10), Record(x=20)]),
})
print(list(result))  # [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
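In plain-Python terms, the three kinds correspond roughly to map, flat-map, and whole-sequence functions. The sketch below illustrates those semantics with ordinary iterators. The helper names (`run_one_to_one` and so on) are hypothetical and are not part of braided.

```python
from collections.abc import Callable, Iterable, Iterator, Sequence
from itertools import chain


def run_one_to_one(fn: Callable, *inputs: Iterable) -> Iterator:
    # Inputs are zipped by position; fn sees one row from each input.
    return (fn(*rows) for rows in zip(*inputs))


def run_one_to_many(fn: Callable, *inputs: Iterable) -> Iterator:
    # Same alignment, but each call may yield zero or more rows.
    return chain.from_iterable(fn(*rows) for rows in zip(*inputs))


def run_many_to_many(fn: Callable, *inputs: Sequence) -> Iterator:
    # fn receives the whole sequences and decides how to combine them.
    return iter(fn(*inputs))


left = [{"x": 1}, {"x": 2}]
right = [{"x": 10}, {"x": 20}]

pairs = list(run_one_to_one(lambda a, b: {"a": a["x"], "b": b["x"]}, left, right))
expanded = list(run_one_to_many(lambda r: ({"x": i} for i in range(r["x"])), left))
total = list(run_many_to_many(lambda rows: [{"x": sum(r["x"] for r in rows)}], left))
```

Here `pairs` matches the `zip_add` output above, `expanded` contains one row for `x=1` followed by two rows for `x=2`, and `total` is a single aggregated row.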

Class-based strands inherit from `Strand[T].OneToOne()`, `.OneToMany()`, or `.ManyToMany()`. They can take constructor parameters:

from braided import Strand

class Scale(Strand[Record].OneToOne()):
    def __init__(self, factor: int) -> None:
        self.factor = factor

    def __call__(self, item: Record) -> Record:
        return Record(x=item["x"] * self.factor)

Custom inputs

Pipelines receive data through `PipelineInput` subclasses. `SequenceInput` wraps an in-memory list. For other data sources (files, databases, streaming APIs), subclass `PipelineInput` directly:

from collections.abc import Iterator, Sequence
from typing import overload

from braided import PipelineInput

class CSVInput(PipelineInput[Record]):
    def __init__(self, path: str) -> None:
        import csv
        with open(path) as f:
            self._rows = [Record(x=int(r["x"])) for r in csv.DictReader(f)]

    def __len__(self) -> int:
        return len(self._rows)

    def __iter__(self) -> Iterator[Record]:
        return iter(self._rows)

    @overload
    def __getitem__(self, index: int) -> Record: ...
    @overload
    def __getitem__(self, index: slice) -> Sequence[Record]: ...
    def __getitem__(self, index: int | slice) -> Record | Sequence[Record]:
        return self._rows[index]

Pass it like any other input:

result = execute_pipeline(nodes, {"seed": CSVInput("data.csv")})

Custom inputs can also be instantiated from YAML config (see YAML / JSON config) as long as their constructor arguments use concrete types that jsonargparse can resolve.
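The three methods `CSVInput` defines (`__len__`, `__iter__`, `__getitem__`) are the usual read-only sequence protocol. The self-contained sketch below exercises the same pattern without braided: `RowsFromCSV` is a hypothetical stand-in that subclasses `collections.abc.Sequence` rather than `PipelineInput`, and it reads a temporary file created just for the demonstration.

```python
import csv
import os
import tempfile
from collections.abc import Sequence


class RowsFromCSV(Sequence):
    """Hypothetical stand-in for CSVInput; not a braided class."""

    def __init__(self, path: str) -> None:
        with open(path, newline="") as f:
            self._rows = [{"x": int(r["x"])} for r in csv.DictReader(f)]

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index):
        # Works for both integer indices and slices.
        return self._rows[index]


# Write a small CSV file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("x\n1\n2\n3\n")

rows = RowsFromCSV(tmp.name)
os.unlink(tmp.name)  # rows were loaded eagerly, so the file can be removed

first, count, tail = rows[0], len(rows), rows[1:]
```

`Sequence` supplies `__iter__` from `__len__` and `__getitem__`, which is why the stand-in does not define it explicitly.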

YAML / JSON config

Pipelines can be defined in YAML or JSON and loaded at runtime. The `function` field accepts either a dotted import path or, for class-based strands, a dict with `class_path` and `init_args`.

# pipeline.yaml
nodes:
  out:
    function:
      class_path: mypackage.Scale
      init_args:
        factor: 10
    args: [doubled]
  doubled:
    function: mypackage.double
    args: [seed]

inputs:
  seed:
    class_path: mypackage.CSVInput
    init_args:
      path: data.csv

from braided import execute_pipeline_from_config

result = list(execute_pipeline_from_config("pipeline.yaml"))
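To make the two forms of the `function` field concrete, here is a minimal sketch of how a dotted path and a `class_path` + `init_args` mapping can be turned into Python objects with `importlib`. It is an illustration only: braided's actual loader may behave differently, the `resolve` helper is hypothetical, and standard-library targets stand in for `mypackage`.

```python
import importlib
from typing import Any


def resolve(spec: Any) -> Any:
    """Hypothetical helper: turn a config entry into a Python object."""
    if isinstance(spec, str):
        # "package.module.name" -> getattr(import_module("package.module"), "name")
        module_name, _, attr = spec.rpartition(".")
        return getattr(importlib.import_module(module_name), attr)
    # Mapping form: import the class, then call it with init_args as keywords.
    cls = resolve(spec["class_path"])
    return cls(**spec.get("init_args", {}))


sqrt = resolve("math.sqrt")
third = resolve({
    "class_path": "fractions.Fraction",
    "init_args": {"numerator": 1, "denominator": 3},
})
```

The string form yields the callable itself, while the mapping form yields an instance, which mirrors the difference between `mypackage.double` and `mypackage.Scale` in the YAML above.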

Built-in strands

`Cache`

`Cache` is a pass-through strand that persists its input sequence to disk on the first run and reloads it on subsequent runs, skipping upstream computation:

from braided import Cache, Node, NodeSpec, SequenceInput, execute_pipeline

nodes: NodeSpec[Record] = {
    "out": Node(function=Cache[Record]("/tmp/my_cache"), args=["source"]),
    "source": Node(function=double, args=["seed"]),
}
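# First run: computes and saves to disk. Wrapping in list() consumes the
# result, in case the pipeline is only evaluated when iterated.
list(execute_pipeline(nodes, {"seed": SequenceInput[Record]([Record(x=1), Record(x=2)])}))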
# Second run: loads from disk; "source" is never evaluated.
result = list(execute_pipeline(nodes, {"seed": SequenceInput[Record]([])}))
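The compute-once-then-reload behaviour can be sketched with `pickle` and a file-existence check. This is an illustration of the idea under simple assumptions, not braided's `Cache` implementation; the `cached` helper and the single-file layout are hypothetical.

```python
import os
import pickle
import tempfile
from collections.abc import Callable


def cached(path: str, compute: Callable[[], list]) -> list:
    """Hypothetical helper: return rows from `path`, computing them only if missing."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    rows = compute()
    with open(path, "wb") as f:
        pickle.dump(rows, f)
    return rows


calls = []


def upstream() -> list:
    calls.append(1)  # record that the expensive step ran
    return [{"x": 2}, {"x": 4}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.pkl")
    first = cached(path, upstream)   # computes and writes the file
    second = cached(path, upstream)  # reads the file; upstream is not called
```

After both calls, `calls` has a single entry, which is the same effect as "source" never being evaluated on the second run.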

`join` / `Join`

Inner join on a shared key column:

from braided import Join, Node, NodeSpec, SequenceInput, execute_pipeline

nodes: NodeSpec[dict] = {
    "out": Node(function=Join[dict]("id"), args=["left", "right"]),
}
result = execute_pipeline(nodes, {
    "left": SequenceInput[dict]([{"id": 1, "val": "a"}]),
    "right": SequenceInput[dict]([{"id": 1, "score": 42}]),
})
print(list(result))  # [{"id": 1, "val": "a", "score": 42}]
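An inner join keeps only the keys present on both sides and merges the matching rows. The function below is a plain-Python sketch of that operation, with `inner_join` a hypothetical name. How braided's `Join` treats duplicate keys or conflicting columns is not stated here, so the choices noted in the comments are assumptions.

```python
from collections import defaultdict
from collections.abc import Iterator, Sequence


def inner_join(key: str, left: Sequence[dict], right: Sequence[dict]) -> Iterator[dict]:
    # Index the right side by key so each left row is matched with one lookup.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    for lrow in left:
        # Assumption: duplicate keys produce one output row per matching pair,
        # and right-hand values win when both rows share a non-key column.
        for rrow in index.get(lrow[key], []):
            yield {**lrow, **rrow}


left = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
right = [{"id": 1, "score": 42}, {"id": 3, "score": 7}]
joined = list(inner_join("id", left, right))
```

Rows with `id` 2 and 3 are dropped because each appears on only one side.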

Custom execution backends

Pass custom `map`, `flat_map`, or `many_to_many` callables to `execute_pipeline` to control how sequences are materialized. For example, to use the HuggingFace Datasets backend:

from braided.integrations.hf_datasets import hf_map_funcs

result = execute_pipeline(nodes, inputs, **hf_map_funcs())
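The callable signatures that `execute_pipeline` expects are not documented in this section. Purely as an illustration of what controlling materialization can mean, the sketch below contrasts an eager, thread-pooled map with Python's built-in lazy `map`. The `(fn, *sequences)` signature and the name `threaded_map` are assumptions, not braided's API.

```python
from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor


def threaded_map(fn: Callable, *sequences: Sequence, workers: int = 4) -> list:
    # Eager: every row is computed up front, in parallel, and order is preserved.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, *sequences))


def double_row(row: dict) -> dict:
    return {"x": row["x"] * 2}


rows = [{"x": 1}, {"x": 2}, {"x": 3}]

eager = threaded_map(double_row, rows)  # a fully materialized list
lazy = map(double_row, rows)            # nothing runs until it is iterated
```

Both produce the same rows; they differ only in when and where the work happens, which is the kind of decision a custom backend takes over.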
