GCLNS/datautil-helpers

datautil-helpers


Lightweight utilities for flattening, chunking, and converting data structures. The package has no runtime dependencies, just small, composable functions for everyday data wrangling.

Installation

pip install datautil-helpers

With dev tools:

pip install "datautil-helpers[dev]"

Quick Start

from datautil_helpers import (
    flatten, flatten_keys,
    chunk, sliding_window,
    to_csv, to_jsonl, from_jsonl,
)

# Flatten nested lists
flatten([1, [2, [3, 4]], 5])
# [1, 2, 3, 4, 5]

# Flatten nested dicts to dot-separated keys
flatten_keys({"a": {"b": 1, "c": {"d": 2}}})
# {'a.b': 1, 'a.c.d': 2}

# Split into fixed-size chunks
chunk(range(10), 3)
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

# Overlapping sliding windows
sliding_window(range(6), width=3, step=2)
# [[0, 1, 2], [2, 3, 4]]

# Convert dicts to CSV
to_csv([{"name": "Alice", "score": "92"}, {"name": "Bob", "score": "87"}])
# 'name,score\nAlice,92\nBob,87\n'

# Write and read JSONL
to_jsonl([{"x": 1}, {"x": 2}], path="data.jsonl")
records = from_jsonl("data.jsonl")

API Reference

Flatten

flatten(iterable, depth=-1)

Recursively flatten nested lists and tuples. Pass depth to limit how many levels are unwrapped (-1 for unlimited).

flatten([1, [2, [3]]], depth=1)
# [1, 2, [3]]
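
To make the depth semantics concrete, here is a minimal sketch of how a depth-limited flatten could work. It is not the library's actual implementation: the name flatten_sketch is hypothetical, and treating depth=0 as "unwrap nothing" is an assumption.

```python
from typing import Any, Iterable


def flatten_sketch(iterable: Iterable[Any], depth: int = -1) -> list[Any]:
    """Illustrative stand-in for flatten(); not the library's code."""
    result: list[Any] = []
    for item in iterable:
        # Only lists and tuples are unwrapped; strings and other
        # iterables are kept as single elements.
        if isinstance(item, (list, tuple)) and depth != 0:
            # A negative depth never reaches 0 while decreasing,
            # so -1 behaves as "unlimited".
            result.extend(flatten_sketch(item, depth - 1))
        else:
            result.append(item)
    return result


print(flatten_sketch([1, [2, [3, 4]], 5]))     # [1, 2, 3, 4, 5]
print(flatten_sketch([1, [2, [3]]], depth=1))  # [1, 2, [3]]
```

Each recursive call spends one level of depth, which is why depth=1 leaves the innermost [3] intact.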

flatten_keys(mapping, sep=".", prefix="")

Flatten a nested dictionary into a single-level dict with compound keys.

flatten_keys({"db": {"host": "localhost", "port": 5432}})
# {'db.host': 'localhost', 'db.port': 5432}
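
The sep and prefix parameters are not demonstrated above, so the following sketch shows one plausible reading of them. The function name flatten_keys_sketch is hypothetical, and two behaviours are assumptions rather than documented facts: that prefix is joined to the first key with sep, and that an empty nested dict is kept as a value.

```python
from typing import Any, Mapping


def flatten_keys_sketch(
    mapping: Mapping[str, Any], sep: str = ".", prefix: str = ""
) -> dict[str, Any]:
    """Illustrative stand-in for flatten_keys(); not the library's code."""
    flat: dict[str, Any] = {}
    for key, value in mapping.items():
        # Assumption: a non-empty prefix is joined to the key with sep.
        compound = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, Mapping) and value:
            # Recurse, carrying the compound key down as the new prefix.
            flat.update(flatten_keys_sketch(value, sep=sep, prefix=compound))
        else:
            # Leaf values (and, by assumption, empty dicts) are stored as-is.
            flat[compound] = value
    return flat


config = {"db": {"host": "localhost", "port": 5432}}
print(flatten_keys_sketch(config))
# {'db.host': 'localhost', 'db.port': 5432}
print(flatten_keys_sketch(config, sep="/", prefix="cfg"))
# {'cfg/db/host': 'localhost', 'cfg/db/port': 5432}
```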

Chunking

chunk(seq, size)

Split a sequence into sub-lists of at most size elements. The last chunk may be shorter.

chunk("abcdefg", 3)
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
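
As an illustration of the chunking behaviour described above, the sketch below reproduces it with itertools.islice. The name chunk_sketch is hypothetical, and rejecting a non-positive size with ValueError is an assumption; the library may handle that case differently.

```python
from itertools import islice
from typing import Iterable, TypeVar

T = TypeVar("T")


def chunk_sketch(seq: Iterable[T], size: int) -> list[list[T]]:
    """Illustrative stand-in for chunk(); not the library's code."""
    if size < 1:
        # Assumption: a non-positive size is treated as an error.
        raise ValueError("size must be a positive integer")
    iterator = iter(seq)
    chunks: list[list[T]] = []
    # Pull up to `size` items at a time until the iterator is exhausted.
    while piece := list(islice(iterator, size)):
        chunks.append(piece)
    return chunks


print(chunk_sketch(range(10), 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(chunk_sketch("abcdefg", 3))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

Consuming through an iterator means the same code handles ranges, strings, and lists without slicing support.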

sliding_window(seq, width, step=1)

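Return windows of width elements, with consecutive windows starting step elements apart (they overlap when step is smaller than width). Only full windows are returned: in the Quick Start example, range(6) with width=3 and step=2 yields two windows, and the partial window [4, 5] is dropped.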

sliding_window([1, 2, 3, 4, 5], width=3, step=1)
# [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
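
The windowing rule can be sketched with plain slicing. This is an illustration under assumptions, not the library's implementation: the name sliding_window_sketch is hypothetical, and raising ValueError for a non-positive width or step is a guess at reasonable behaviour.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def sliding_window_sketch(seq: Iterable[T], width: int, step: int = 1) -> list[list[T]]:
    """Illustrative stand-in for sliding_window(); not the library's code."""
    if width < 1 or step < 1:
        # Assumption: non-positive arguments are treated as errors.
        raise ValueError("width and step must be positive integers")
    items = list(seq)
    # The last valid start index is len(items) - width, so partial
    # windows at the end are never produced.
    last_start = len(items) - width
    return [items[i:i + width] for i in range(0, last_start + 1, step)]


print(sliding_window_sketch(range(6), width=3, step=2))
# [[0, 1, 2], [2, 3, 4]]
print(sliding_window_sketch([1, 2, 3, 4, 5], width=3))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```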

Conversion

to_csv(rows, path=None)

Convert a list of flat dicts to a CSV string. All dicts should share the same keys. Optionally writes to path.

to_csv([{"a": "1", "b": "2"}], path="out.csv")
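
For readers curious how this kind of conversion can be done with the standard library alone, here is a rough sketch built on csv.DictWriter. The name to_csv_sketch is hypothetical. Taking the column order from the first row, using "\n" line endings (to match the Quick Start output), and returning an empty string for empty input are all assumptions.

```python
import csv
import io
from typing import Optional


def to_csv_sketch(rows: list[dict[str, str]], path: Optional[str] = None) -> str:
    """Illustrative stand-in for to_csv(); not the library's code."""
    if not rows:
        return ""  # Assumption: no rows means no header either.
    buffer = io.StringIO()
    # Assumption: column order follows the keys of the first row.
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    text = buffer.getvalue()
    if path is not None:
        # newline="" stops the platform from translating line endings.
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write(text)
    return text


rows = [{"name": "Alice", "score": "92"}, {"name": "Bob", "score": "87"}]
print(repr(to_csv_sketch(rows)))  # 'name,score\nAlice,92\nBob,87\n'
```

With csv.DictWriter's defaults, a row containing a key that is not in the header raises ValueError, while a missing key is written as an empty field. That is one reason to keep the keys consistent across rows.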

to_jsonl(records, path=None)

Serialize a list of dicts as newline-delimited JSON. Optionally writes to path.
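
No example is given for to_jsonl on its own, so the sketch below shows what the serialization step might look like. The name to_jsonl_sketch is hypothetical, and returning the serialized text whether or not path is given is an assumption; the source does not state the return value.

```python
import json
from typing import Any, Optional


def to_jsonl_sketch(records: list[dict[str, Any]], path: Optional[str] = None) -> str:
    """Illustrative stand-in for to_jsonl(); not the library's code."""
    # One compact JSON document per line, each terminated by a newline.
    text = "".join(json.dumps(record) + "\n" for record in records)
    if path is not None:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    # Assumption: the serialized text is returned in both cases.
    return text


print(repr(to_jsonl_sketch([{"x": 1}, {"x": 2}])))
# '{"x": 1}\n{"x": 2}\n'
```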

from_jsonl(source)

Read JSONL from a file path or a raw string. Returns a list of dicts.

from_jsonl('{"x":1}\n{"x":2}\n')
# [{'x': 1}, {'x': 2}]
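
Because from_jsonl accepts either a path or raw text, it has to tell the two apart somehow. The source does not say how, so the sketch below uses one possible heuristic, labelled as an assumption: a single-line string that names an existing file is read as a path, and anything else is parsed as JSONL text. The name from_jsonl_sketch is hypothetical.

```python
import json
import os
from typing import Any


def from_jsonl_sketch(source: str) -> list[dict[str, Any]]:
    """Illustrative stand-in for from_jsonl(); not the library's code."""
    # Heuristic (assumption): only a single-line string that points at
    # an existing file is treated as a path.
    if "\n" not in source and os.path.isfile(source):
        with open(source, encoding="utf-8") as handle:
            text = handle.read()
    else:
        text = source
    # Blank lines are skipped so a trailing newline does not break parsing.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


print(from_jsonl_sketch('{"x":1}\n{"x":2}\n'))  # [{'x': 1}, {'x': 2}]
```

A heuristic like this is ambiguous when a one-line JSON string happens to match a filename, so separate entry points for paths and strings would be a stricter design.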

Development

# Create virtual environment and install dev dependencies
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# Run a single test
pytest tests/test_flatten.py::test_flatten_keys -v

# Type checking (strict mode)
mypy src/

# Lint
ruff check src/ tests/

Project Structure

src/datautil_helpers/
├── __init__.py     # Public API re-exports
├── flatten.py      # Nested structure flattening
├── chunk.py        # Sequence splitting and windowing
└── convert.py      # CSV and JSONL conversion

Requirements

  • Python 3.10+
  • No runtime dependencies
