GCLNS/datautil-transform

datautil-transform

Composable data transformation pipelines. Build reusable chains of key-mapping, value-transform, and filtering operations over lists of records (dicts).

Installation

pip install datautil-transform

With dev tools:

pip install "datautil-transform[dev]"

Quick Start

from datautil_transform import (
    Pipeline, map_keys, map_values, rename_keys, where, reject, unique_by,
)

# Build a text-processing pipeline
clean = Pipeline().then(str.strip).then(str.lower).then(str.title)
clean("  hello world  ")
# 'Hello World'

# Rename keys across records
data = [{"first_name": "Alice", "age": 30}, {"first_name": "Bob", "age": 25}]
rename_keys({"first_name": "name"}, data)
# [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]

# Uppercase all keys
map_keys(str.upper, data)
# [{'FIRST_NAME': 'Alice', 'AGE': 30}, ...]

# Filter records
where(lambda r: r["age"] >= 30, data)
# [{'first_name': 'Alice', 'age': 30}]

# Deduplicate by a field
records = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
unique_by(lambda r: r["id"], records)
# [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'c'}]

API Reference

Pipeline

Pipeline()

Create an empty pipeline. Chain steps with .then(fn), then execute with .run(data) or by calling the pipeline directly. Steps run in the order they were added, each receiving the previous step's output.

p = Pipeline().then(int).then(lambda x: x * 2)
p("21")
# 42
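To make the chaining behaviour concrete, here is a minimal sketch of how a pipeline like this could work. PipelineSketch is a hypothetical stand-in, not the library's implementation; in particular, whether .then() returns a new pipeline or mutates the existing one is an assumption here.

```python
from typing import Any, Callable


class PipelineSketch:
    """Hypothetical stand-in for Pipeline; not the library's own code."""

    def __init__(self, steps: tuple[Callable[[Any], Any], ...] = ()) -> None:
        self._steps = steps

    def then(self, fn: Callable[[Any], Any]) -> "PipelineSketch":
        # Assumption: return a new pipeline instead of mutating this one,
        # so a partially built pipeline can be reused safely.
        return PipelineSketch(self._steps + (fn,))

    def run(self, data: Any) -> Any:
        # Feed the output of each step into the next, in insertion order.
        for step in self._steps:
            data = step(data)
        return data

    # Calling the pipeline is equivalent to calling .run().
    __call__ = run


p = PipelineSketch().then(int).then(lambda x: x * 2)
result = p("21")  # int("21") gives 21, then 21 * 2 gives 42
```

Under this design an empty pipeline is the identity function: it returns its input unchanged.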

Mappers

map_keys(fn, records)

Apply a function to every key in each dict.
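The behaviour described above can be sketched with a dict comprehension. map_keys_sketch is a hypothetical stand-in written for illustration; how the real function handles key collisions or whether it returns a list is not stated in this README, so treat those details as assumptions.

```python
from typing import Any, Callable, Hashable, Iterable


def map_keys_sketch(
    fn: Callable[[Any], Hashable],
    records: Iterable[dict],
) -> list[dict]:
    """Hypothetical stand-in: build new dicts with fn applied to each key."""
    return [{fn(k): v for k, v in rec.items()} for rec in records]


data = [{"first_name": "Alice", "age": 30}]
upper = map_keys_sketch(str.upper, data)
# upper == [{'FIRST_NAME': 'Alice', 'AGE': 30}]

# In this sketch, keys that map to the same result collide and the
# later value overwrites the earlier one.
collided = map_keys_sketch(str.lower, [{"A": 1, "a": 2}])
# collided == [{'a': 2}]
```

The sketch builds new dicts and leaves the input records untouched.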

map_values(fn, records)

Apply a function to every value in each dict.
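Because the function is applied to every value, it has to cope with every value type present in the records. The sketch below uses a hypothetical stand-in, map_values_sketch, to show one way to guard a string-only transform; it is an illustration under that assumption, not the library's code.

```python
from typing import Any, Callable, Iterable


def map_values_sketch(
    fn: Callable[[Any], Any],
    records: Iterable[dict],
) -> list[dict]:
    """Hypothetical stand-in: build new dicts with fn applied to each value."""
    return [{k: fn(v) for k, v in rec.items()} for rec in records]


def strip_text(value: Any) -> Any:
    # Only strings have .strip(); pass other types through unchanged.
    return value.strip() if isinstance(value, str) else value


rows = [{"first_name": " Alice ", "age": 30}]
cleaned = map_values_sketch(strip_text, rows)
# cleaned == [{'first_name': 'Alice', 'age': 30}]
```

Passing str.strip directly would raise a TypeError on the integer age, which is why the guard is there.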

rename_keys(mapping, records)

Rename keys according to a mapping of old names to new names. Keys that do not appear in the mapping are kept unchanged.
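One way to picture this is a lookup with a fallback to the original key. rename_keys_sketch below is a hypothetical stand-in for illustration only, reusing the Quick Start data.

```python
from typing import Hashable, Iterable, Mapping


def rename_keys_sketch(
    mapping: Mapping[Hashable, Hashable],
    records: Iterable[dict],
) -> list[dict]:
    """Hypothetical stand-in: rename mapped keys, keep the rest as they are."""
    # mapping.get(k, k) falls back to the original key when k is unmapped.
    return [{mapping.get(k, k): v for k, v in rec.items()} for rec in records]


data = [{"first_name": "Alice", "age": 30}, {"first_name": "Bob", "age": 25}]
renamed = rename_keys_sketch({"first_name": "name"}, data)
# renamed == [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```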

Filters

where(predicate, records)

Keep records matching the predicate.
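A short sketch of the filtering behaviour, using a hypothetical stand-in named where_sketch rather than the library function. It also shows a predicate that tolerates records with a missing field, which is a common need with heterogeneous data.

```python
from typing import Callable, Iterable


def where_sketch(
    predicate: Callable[[dict], bool],
    records: Iterable[dict],
) -> list[dict]:
    """Hypothetical stand-in: keep records for which predicate is truthy."""
    return [rec for rec in records if predicate(rec)]


people = [{"name": "Alice", "age": 30}, {"name": "Bob"}]

# r["age"] would raise KeyError for Bob; .get() supplies a default instead.
adults = where_sketch(lambda r: r.get("age", 0) >= 30, people)
# adults == [{'name': 'Alice', 'age': 30}]
```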

reject(predicate, records)

Remove records matching the predicate.
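As described, this is the complement of a keep-if-matching filter. The sketch below uses a hypothetical stand-in, reject_sketch, to show that the rejected and kept sets together account for every input record; it is not the library's implementation.

```python
from typing import Callable, Iterable


def reject_sketch(
    predicate: Callable[[dict], bool],
    records: Iterable[dict],
) -> list[dict]:
    """Hypothetical stand-in: drop records for which predicate is truthy."""
    return [rec for rec in records if not predicate(rec)]


def is_thirty_plus(rec: dict) -> bool:
    return rec["age"] >= 30


data = [{"first_name": "Alice", "age": 30}, {"first_name": "Bob", "age": 25}]

remaining = reject_sketch(is_thirty_plus, data)
# remaining == [{'first_name': 'Bob', 'age': 25}]

# The records that match the predicate are exactly the ones removed.
removed = [rec for rec in data if is_thirty_plus(rec)]
```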

unique_by(key, records)

Deduplicate by a key function, keeping the first occurrence of each key.
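First-occurrence deduplication is commonly done with a set of keys already seen. The sketch below follows that approach with a hypothetical stand-in, unique_by_sketch; it assumes the key function returns hashable values, which this README does not state for the real function.

```python
from typing import Callable, Hashable, Iterable


def unique_by_sketch(
    key: Callable[[dict], Hashable],
    records: Iterable[dict],
) -> list[dict]:
    """Hypothetical stand-in: keep the first record seen for each key."""
    seen: set[Hashable] = set()
    result: list[dict] = []
    for rec in records:
        k = key(rec)  # assumption: k is hashable
        if k not in seen:
            seen.add(k)
            result.append(rec)
    return result


records = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
deduped = unique_by_sketch(lambda r: r["id"], records)
# deduped == [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'c'}]
```

The sketch preserves input order and runs in a single pass over the records.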

Development

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

pytest
mypy src/
ruff check src/ tests/

Project Structure

src/datautil_transform/
├── __init__.py     # Public API re-exports
├── pipeline.py     # Composable transformation pipeline
├── mapper.py       # Key and value mapping utilities
└── filter.py       # Filtering and deduplication

Requirements

  • Python 3.10+
  • No runtime dependencies
