Releases: CosminBMemetea/canml
v0.1.14-alpha
🎉 v0.1.14 – “Category Cleanse” 🎉
No more duplicate‐category crashes—v0.1.14 rounds out our enum handling with an important fix that deduplicates Categorical labels before assignment, ensuring flawless DataFrame creation even when your DBC defines repeated labels.
✨ What’s New in v0.1.14
🎨 Robust Enum Mapping
Unique categories guaranteed
Before building a pandas Categorical, we now dedupe the list of labels (preserving first‐seen order), so you’ll never hit “categories must be unique” errors—even if your DBC uses the same label multiple times.
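As a rough illustration of the order-preserving dedupe described above, here is a minimal sketch. The helper name `dedupe_labels` and the sample value table are made up for this example and are not taken from canml's source.

```python
def dedupe_labels(labels):
    """Return labels with duplicates removed, keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so fromkeys() drops
    # later repeats while keeping the first occurrence of each label.
    return list(dict.fromkeys(labels))


# Hypothetical DBC value table in which two raw values share one label.
choices = {0: "Off", 1: "On", 2: "Error", 3: "Error"}
categories = dedupe_labels(choices.values())
print(categories)  # ['Off', 'On', 'Error']
```

A list like `categories` is what you would hand to the `categories=` argument of `pandas.Categorical`, which rejects non-unique categories.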
🔧 Minor Tweaks & Polish
Improved error messaging around enum conversion failures.
Refreshed in‐docs example to showcase the deduped enum workflow.
🔧 Quickstart
pip install canml==0.1.14
from canml.canmlio import load_blf, to_csv, to_parquet, CanmlConfig
# 🔧 Configure for maximum resilience
cfg = CanmlConfig(
    chunk_size=5000,
    force_uniform_timing=True,
    interpolate_missing=True,
    dtype_map={"Speed": "float32", "RPM": "Int64"},
)
# 🚀 Full-file BLF decode with drop-summary logging & enum proofing
df = load_blf(
    blf_path="drive.blf",
    db="vehicle.dbc",
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Speed", "RPM"]
)
# 🔍 Inspect attributes & export
print(df.attrs["signal_attributes"])
to_csv(df, "out.csv", metadata_path="out_meta.json")
to_parquet(df, "out.parquet", metadata_path="out_meta.json")
Huge thanks to everyone who pushed edge-case testing to the limit; canml is now truly unstoppable! 🎉
v0.1.13-alpha
🎉 v0.1.13 – “Unbreakable Resilience” 🎉
Our toughest release yet—nothing short of a tank can stop it! v0.1.13 rolls out across-the-board hardening: from collision-proof DBC prefixing and drop-summary logging, to upfront dtype checks, enum-mapping bulletproofing, and export utilities that never break your script.
✨ What’s New in v0.1.13
🛠 Collision-Tolerant DBC Merging
Frame-ID fallback
Unique signal prefixes now automatically include frame_id (or arbitration_id / sequence index) when message names collide—no more prefixing errors or silent overwrites.
Reserved-name protection
"timestamp" and "raw_timestamp" are never allowed as injected signals, so your timing columns stay pristine.
🔍 Drop-Summary & Filter Resilience
Drop-summary logging
iter_blf_chunks() now emits a concise “Decoded X/Y messages (Z dropped)” log so you always know exactly what made it through.
Filter-key safety
Non-string or unhashable filter_signals entries are quietly skipped—your pipeline never crashes on weird keys.
⚡ Immediate Type Safety
dtype_map validation
Every entry in dtype_map is checked at load time—invalid numpy/pandas dtypes raise a clear ValueError immediately, not halfway through processing.
Missing-signal injection
Integers → zero-filled, floats → NaN-filled, and—if interpolate_missing=True—linear interpolation for signals you know should be continuous.
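The fill rule for missing signals can be pictured with the sketch below. It works on plain dicts of lists rather than DataFrames, and both helper names, the name-based integer check, and the `float64` default are assumptions made for illustration only.

```python
import math


def fill_value_for(dtype_name):
    """Pick the placeholder for a missing signal from its dtype name."""
    # Assumption: integer dtypes are recognised by name ("int32", "Int64", "uint8").
    if dtype_name.lower().startswith(("int", "uint")):
        return 0
    return math.nan


def inject_missing(columns, expected_signals, dtype_map, n_rows):
    """Add a constant-filled column for every expected signal that is absent."""
    for name in expected_signals:
        if name not in columns:
            # Assumption: signals without a dtype_map entry are treated as float64.
            fill = fill_value_for(dtype_map.get(name, "float64"))
            columns[name] = [fill] * n_rows
    return columns


cols = inject_missing({"Speed": [1.0, 2.0]}, ["Speed", "RPM", "Temp"], {"RPM": "Int64"}, 2)
```

Here `RPM` comes back zero-filled because its dtype is an integer type, while `Temp` is NaN-filled and `Speed` is left untouched.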
🎨 Enum & Attribute Bulletproofing
String-based mapping
All enum values (including NameSignalValue objects) are converted via string lookups. Unknown raw values pass through, never crash or become NaN.
Rich metadata sidecars
Your custom DBC attributes (BA_DEF_SG_, etc.) ride along in df.attrs["signal_attributes"] and are exported to JSON automatically.
📂 Streamlined, Fail-Safe Exports
Auto-mkdir
to_csv() and to_parquet() now create parent directories for both data and metadata paths—no more “directory not found” errors.
Side-car metadata
Exports signal-attribute JSON alongside your data, keeping schema and semantics in sync.
Graceful failures
Parquet write errors are caught, logged, and raised as ValueError so you get a clear Python exception rather than a cryptic pyarrow traceback.
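To show the auto-mkdir and error-wrapping pattern in isolation, here is a standard-library sketch that writes only the JSON side-car. The function name `write_metadata_sidecar` is hypothetical, and the real to_csv()/to_parquet() may be structured differently.

```python
import json
from pathlib import Path


def write_metadata_sidecar(attrs, metadata_path):
    """Write signal attributes as JSON, creating parent directories first."""
    path = Path(metadata_path)
    try:
        # Create any missing parent directories before writing.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(attrs, indent=2), encoding="utf-8")
    except (OSError, TypeError) as exc:
        # Surface filesystem and serialization problems as one exception type.
        raise ValueError(f"Failed to write metadata to {path}: {exc}") from exc
    return path
```

Raising a single exception type lets callers wrap an export in one `except ValueError` instead of handling library-specific errors.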
🔧 Quickstart
pip install canml==0.1.13
from canml.canmlio import load_blf, to_csv, to_parquet, CanmlConfig
# 🔧 Configure for maximum resilience
cfg = CanmlConfig(
    chunk_size=5000,
    force_uniform_timing=True,
    interpolate_missing=True,
    dtype_map={"Speed": "float32", "RPM": "Int64"},
)
# 🚀 Full-file BLF decode with drop-summary logging & enum proofing
df = load_blf(
    blf_path="drive.blf",
    db="vehicle.dbc",
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Speed", "RPM"]
)
# 🔍 Inspect attributes & export
print(df.attrs["signal_attributes"])
to_csv(df, "out.csv", metadata_path="out_meta.json")
to_parquet(df, "out.parquet", metadata_path="out_meta.json")
Huge thanks to everyone who pushed edge-case testing to the limit; canml is now truly unstoppable! 🎉
v0.1.12-alpha
🎉 v0.1.12 – “Edge-Case Exterminator” 🎉
Your most battle-tested release yet! v0.1.12 shores up every remaining corner case so that nothing—not malformed DBCs, weird BLF messages, or unexpected enum types—can slip through or crash your pipeline.
✨ What’s New in v0.1.12
🛠 Bullet-Proof Decoding
Drop-summary logging
iter_blf_chunks() now reports decoded vs dropped frame counts (Decoded 980/1000 messages (20 dropped)), so you know exactly what didn’t make it through.
Signal-filter resilience
Unhashable or non-stringable filter keys are silently skipped rather than throwing errors.
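For reference, a drop-summary line in the format quoted above could be produced like this. The function names and the logger name are illustrative assumptions, not canml internals.

```python
import logging

logger = logging.getLogger("canml_sketch")  # hypothetical logger name


def drop_summary(decoded, total):
    """Format a decoded-vs-dropped summary line."""
    dropped = total - decoded
    return f"Decoded {decoded}/{total} messages ({dropped} dropped)"


def log_drop_summary(decoded, total):
    """Emit the summary once, after the whole stream has been processed."""
    logger.info(drop_summary(decoded, total))


print(drop_summary(980, 1000))  # Decoded 980/1000 messages (20 dropped)
```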
🔍 Strict Type Safety
Immediate dtype validation
load_blf() now pre-validates every entry in dtype_map, so invalid numpy/pandas dtypes are caught up front with a clear ValueError.
Reserved-name protection
Never accidentally overwrite your timing columns: 'timestamp' and 'raw_timestamp' are disallowed as signal names and reserved from injection.
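One way to picture the reserved-name rule is a small guard that runs before injection. This is only a sketch: the constant, the function name, and the choice to skip reserved names rather than raise are assumptions.

```python
# The two timing columns named in the release notes.
RESERVED_COLUMNS = frozenset({"timestamp", "raw_timestamp"})


def injectable_signals(expected_signals):
    """Drop reserved timing-column names from the signals to inject."""
    return [name for name in expected_signals if name not in RESERVED_COLUMNS]


print(injectable_signals(["Speed", "timestamp", "RPM", "raw_timestamp"]))
# ['Speed', 'RPM']
```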
🔄 Comprehensive Signal Injection
Interpolation fallback
Missing signals in the log can be linearly interpolated if requested (interpolate_missing=True), otherwise integer signals get zero-filled and float signals get NaNs.
Collision-tolerant prefixing
Signals are always prefixed uniquely—checking frame_id, then arbitration_id, then sequence index—so no two signals ever collide, with zero warnings or errors.
🎨 Robust Enum Handling
NameSignalValue-proof mapping
Enum values (including NameSignalValue objects) and raw integers are safely converted to labels; unknown values are passed through, never dropped or cast to NaN.
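The pass-through behaviour for unknown values can be sketched with a string-keyed lookup. The helper `map_enum` and the sample choices are hypothetical, and the sketch uses plain Python values rather than cantools objects.

```python
def map_enum(raw_values, choices):
    """Map raw values to labels via string keys; unknown values pass through."""
    # Keying the lookup by str() means 1 and "1" resolve to the same label.
    lookup = {str(key): str(label) for key, label in choices.items()}
    return [lookup.get(str(value), value) for value in raw_values]


choices = {0: "Park", 1: "Drive"}
print(map_enum([0, 1, 7], choices))  # ['Park', 'Drive', 7]
```

The raw value 7 has no label, so it is returned unchanged instead of being dropped or turned into NaN.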
📂 Streamlined Exports
Auto-mkdir & side-car metadata
Both to_csv() and to_parquet() auto-create parent directories and export df.attrs["signal_attributes"] to JSON alongside your data.
Graceful failures
Parquet write errors are caught and rethrown as ValueError, so you’ll always get a Python exception rather than a stack-trace deep in pyarrow.
🔧 Quickstart
pip install canml==0.1.12
from canml.canmlio import load_blf, to_csv, CanmlConfig
# Full-file decode with edge-case safety
cfg = CanmlConfig(
    chunk_size=5000,
    force_uniform_timing=True,
    interpolate_missing=True,
    dtype_map={"Speed": "float32"}
)
df = load_blf(
    blf_path="drive.blf",
    db="vehicle.dbc",
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Speed", "RPM"]
)
print(df.attrs["signal_attributes"])  # All custom DBC attributes
to_csv(df, "out.csv", metadata_path="out_meta.json")
Huge thanks to everyone for helping us squash every corner-case bug; canml is now truly unstoppable! 🎉
v0.1.11-alpha
🎉 v0.1.11 – “Collision Tolerance” 🎉
We’ve made DBC prefixing bullet-proof: no more fatal errors on duplicate message names—now it simply falls back to a unique scheme and logs a warning. Your DBCs always load, no matter how “creative” the naming!
✨ What’s New in v0.1.11
🛠 Resilient Prefixing
Automatic fallback on duplicate message names
When you call load_dbc_files(..., prefix_signals=True) and two or more messages share the same name, canml now emits a warning and falls back to prefixes that include the CAN frame_id instead of raising an error.
🎨 API Behavior
prefix_signals=True
- Unique message names → prefixes remain _
- Duplicate message names → fallback prefixes include the CAN frame_id for guaranteed uniqueness
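The fallback can be sketched as follows. The exact prefix strings canml generates are not spelled out in these notes, so the formats used here (`<name>_` for unique names, `<name>_<frame_id in hex>_` for duplicates) are assumptions, as is the helper name `signal_prefixes`.

```python
from collections import Counter


def signal_prefixes(messages):
    """Return one signal prefix per (name, frame_id) message.

    Hypothetical formats: "<name>_" when the message name is unique,
    "<name>_<frame_id in hex>_" when the name is shared.
    """
    name_counts = Counter(name for name, _ in messages)
    prefixes = []
    for name, frame_id in messages:
        if name_counts[name] > 1:
            # Duplicate name: add the frame ID so the prefix stays unique.
            prefixes.append(f"{name}_{frame_id:X}_")
        else:
            prefixes.append(f"{name}_")
    return prefixes


messages = [("Engine", 0x100), ("Brake", 0x200), ("Engine", 0x300)]
print(signal_prefixes(messages))
# ['Engine_100_', 'Brake_', 'Engine_300_']
```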
🔧 Quickstart
pip install canml==0.1.11
from canml.canmlio import (
    load_dbc_files, load_blf, to_csv, to_parquet, CanmlConfig
)
# 1️⃣ Merge & cache your DBC(s) safely
db = load_dbc_files(["vehicle.dbc", "chassis.dbc"], prefix_signals=True)
# 2️⃣ Configure BLF loading once
cfg = CanmlConfig(
    chunk_size=5000,
    progress_bar=True,
    force_uniform_timing=True,
    interval_seconds=0.02,
    interpolate_missing=True,
    dtype_map={"Engine_RPM": "int32"}
)
# 3️⃣ Full‐file decode with filters, injection, and enums
df = load_blf(
    blf_path="drive.blf",
    db=db,
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Engine_RPM", "Brake_Active"]
)
print(df.head())
#    timestamp  Engine_RPM  Brake_Active  raw_timestamp
# 0       0.00        8000             0       162523.1
# 1       0.02        8050             0       162523.3
# 4️⃣ Export with metadata side-dump
to_csv(df, "drive_data.csv", metadata_path="drive_data_meta.json")
to_parquet(df, "drive_data.parquet", metadata_path="drive_data_meta.json")
Massive thanks to all contributors and issue reporters. Enjoy the smoother, smarter CAN-ML! 🎉🎉
v0.1.10-alpha
🎉 v0.1.10 – “String-Safe Signal Filtering” 🎉
The latest patch finally slays the dreaded unhashable-type errors by coercing every signal identifier to a string before any set-based filtering—and while we were at it, we made the filtering code bullet-proof and polished up the error messages.
✨ What’s New in v0.1.10
🛠 Bug Fixes & Edge-Case Handling
Unhashable-safe filtering
All expected_signals and filter_signals entries are now normalized via str() before building any set(), eliminating errors when passing cantools’ NameSignalValue or other custom objects.
Graceful skip of bad entries
If a signal name can’t be stringified, it’s simply omitted—no exceptions, no crashes.
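A minimal sketch of the str() normalization with graceful skipping is shown below. The function name and the deliberately awkward `Unprintable` class are invented for the example; canml's internals may differ.

```python
def normalize_signal_names(entries):
    """Coerce every entry to str and collect the results in a set."""
    names = set()
    for entry in entries:
        try:
            names.add(str(entry))
        except Exception:
            # Entry cannot be stringified: skip it rather than crash.
            continue
    return names


class Unprintable:
    """Test object whose __str__ always fails."""

    def __str__(self):
        raise RuntimeError("cannot stringify")


result = normalize_signal_names(["Speed", 42, ["RPM"], Unprintable()])
```

The unhashable list `["RPM"]` survives as the string `"['RPM']"`, which is why stringifying first avoids the unhashable-type error when the set is built.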
🎨 Usability & API Tweaks
Zero impact on public API
No function signatures changed—just improved internals.
Clearer warnings & errors
Supplying duplicate signal names still raises an early ValueError, and any chunk-decode failures turn into concise, logged ValueErrors.
🔧 Quickstart
pip install canml==0.1.10
from canml.canmlio import (
    load_dbc_files, load_blf, to_csv, to_parquet, CanmlConfig
)
# 1️⃣ Merge & cache your DBC(s) safely
db = load_dbc_files(["vehicle.dbc", "chassis.dbc"], prefix_signals=True)
# 2️⃣ Configure BLF loading once
cfg = CanmlConfig(
    chunk_size=5000,
    progress_bar=True,
    force_uniform_timing=True,
    interval_seconds=0.02,
    interpolate_missing=True,
    dtype_map={"Engine_RPM": "int32"}
)
# 3️⃣ Full‐file decode with filters, injection, and enums
df = load_blf(
    blf_path="drive.blf",
    db=db,
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Engine_RPM", "Brake_Active"]
)
print(df.head())
#    timestamp  Engine_RPM  Brake_Active  raw_timestamp
# 0       0.00        8000             0       162523.1
# 1       0.02        8050             0       162523.3
# 4️⃣ Export with metadata side-dump
to_csv(df, "drive_data.csv", metadata_path="drive_data_meta.json")
to_parquet(df, "drive_data.parquet", metadata_path="drive_data_meta.json")
Massive thanks to all contributors and issue reporters. Enjoy the smoother, smarter CAN-ML! 🎉🎉
v0.1.9-alpha
🎉 v0.1.9 – “Signal Security” 🎉
Tiny but mighty, v0.1.9 locks down edge-cases in load_blf and makes your config import painless—no more unhashable errors or hidden classes! 🚀
✨ What’s New in v0.1.9
🛠 Robust load_blf Enhancements
Unhashable-signal safety
Wrapped set(expected_signals) in a try/except block to gracefully handle cantools “NameSignalValue” objects and any other unhashables.
String-cast normalization
Coerces every entry in expected_signals to str up front and checks for duplicates immediately, preventing downstream surprises.
Error wrapping polish
Refined exception messages and logging around chunk decoding, so any BLF-stream errors surface as clear ValueErrors without stack-traces leaking.
📦 API & Packaging Tweaks
Expose CanmlConfig at package level
Now you can from canml import CanmlConfig without importing submodules.
Committer’s note
Shortened import path and cleaned up top-level __all__ so IDEs pick up every public API immediately.
🔧 Quickstart
pip install canml==0.1.9
from canml.canmlio import (
    load_dbc_files, load_blf, to_csv, to_parquet, CanmlConfig
)
# 1️⃣ Merge & cache your DBC(s) safely
db = load_dbc_files(["vehicle.dbc", "chassis.dbc"], prefix_signals=True)
# 2️⃣ Configure BLF loading once
cfg = CanmlConfig(
    chunk_size=5000,
    progress_bar=True,
    force_uniform_timing=True,
    interval_seconds=0.02,
    interpolate_missing=True,
    dtype_map={"Engine_RPM": "int32"}
)
# 3️⃣ Full‐file decode with filters, injection, and enums
df = load_blf(
    blf_path="drive.blf",
    db=db,
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Engine_RPM", "Brake_Active"]
)
print(df.head())
#    timestamp  Engine_RPM  Brake_Active  raw_timestamp
# 0       0.00        8000             0       162523.1
# 1       0.02        8050             0       162523.3
# 4️⃣ Export with metadata side-dump
to_csv(df, "drive_data.csv", metadata_path="drive_data_meta.json")
to_parquet(df, "drive_data.parquet", metadata_path="drive_data_meta.json")
Massive thanks to all contributors and issue reporters. Enjoy the smoother, smarter CAN-ML! 🎉🎉
v0.1.8-alpha
🎉 v0.1.8 – “Refined Resilience” 🎉
Our biggest overhaul yet: v0.1.8 brings a unified config object, caching, bullet-proof error handling, full metadata support, and more. Everything’s faster, safer, and more flexible—your CAN-ML workflows just leveled up! 🚀
✨ New & Improved in v0.1.8
🛠 Bug Fixes & Stability
Strict DBC validation
Only .dbc extensions accepted, with clear FileNotFound and ParseError messages (#14).
Safe BLFReader cleanup
Wrapped in a context manager to guarantee reader.stop()—no more dangling file handles.
Robust chunk streaming
All exceptions during chunk decode are caught and logged, preventing pipeline crashes.
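The guaranteed-cleanup idea behind the BLFReader change can be illustrated with a generic context manager. `FakeReader` below is a stand-in written for this sketch, not the real BLFReader, and `managed_reader` is a hypothetical helper name.

```python
from contextlib import contextmanager


class FakeReader:
    """Stand-in for a log reader that exposes stop(); not the real BLFReader."""

    def __init__(self, frames):
        self.frames = frames
        self.stopped = False

    def __iter__(self):
        return iter(self.frames)

    def stop(self):
        self.stopped = True


@contextmanager
def managed_reader(reader):
    """Yield the reader and always call stop(), even if iteration raises."""
    try:
        yield reader
    finally:
        reader.stop()


reader = FakeReader([1, 2, 3])
try:
    with managed_reader(reader) as active:
        for frame in active:
            if frame == 2:
                raise RuntimeError("decode failed")  # simulated mid-stream error
except RuntimeError:
    pass
print(reader.stopped)  # True
```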
🎨 Major API Enhancements
CanmlConfig dataclass
Centralized all BLF-loading options (chunk_size, progress_bar, uniform timing, interpolation, sorting, dtype_map) into a single, easy-to-use object.
LRU-cached DBC loader
Cache up to 32 DBC loads, with namespace collision detection and optional prefix_signals for signal-name deduplication.
iter_blf_chunks() revamp
Stream BLF files in memory-safe pandas chunks, filter by message ID or signal name, and show optional progress bars.
load_blf() overhaul
One-call full-file decode with:
- Automatic missing-signal injection (int signals zero-filled, floats NaN)
- Configurable uniform timestamp spacing (with raw_timestamp backup)
- Linear interpolation of gaps if desired
- Sorting, filtering, and dtype_map enforcement
- Automatic enum → Categorical conversion
- Extraction of all custom DBC attributes into df.attrs['signal_attributes']
Enhanced exports
- to_csv() and to_parquet() now auto-create parent directories
- Side-dump signal_attributes JSON alongside your data
- Robust error wrapping and logging for both CSV and Parquet writes
🧪 Testing & Coverage
Brand-new, comprehensive pytest suites for:
DBC loading (load_dbc_files) with caching, collision, and extension checks
Stream decoding (iter_blf_chunks) under filtering, chunking, and error scenarios
Full‐file loading (load_blf) including warnings, injection, sorting, and metadata
CSV/Parquet exports (to_csv/to_parquet) with directory creation and metadata
Achieved 95% branch coverage across all modules
📚 Docs & Examples
Docstrings expanded to cover every new config parameter and feature
Quickstart updated to showcase one-line DBC cache, streaming vs full load, and auto-metadata
🔧 Quickstart
pip install canml==0.1.8
from canml.canmlio import (
    load_dbc_files, load_blf, to_csv, to_parquet, CanmlConfig
)
# 1️⃣ Merge & cache your DBC(s) safely
db = load_dbc_files(["vehicle.dbc", "chassis.dbc"], prefix_signals=True)
# 2️⃣ Configure BLF loading once
cfg = CanmlConfig(
    chunk_size=5000,
    progress_bar=True,
    force_uniform_timing=True,
    interval_seconds=0.02,
    interpolate_missing=True,
    dtype_map={"Engine_RPM": "int32"}
)
# 3️⃣ Full‐file decode with filters, injection, and enums
df = load_blf(
    blf_path="drive.blf",
    db=db,
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Engine_RPM", "Brake_Active"]
)
print(df.head())
#    timestamp  Engine_RPM  Brake_Active  raw_timestamp
# 0       0.00        8000             0       162523.1
# 1       0.02        8050             0       162523.3
# 4️⃣ Export with metadata side-dump
to_csv(df, "drive_data.csv", metadata_path="drive_data_meta.json")
to_parquet(df, "drive_data.parquet", metadata_path="drive_data_meta.json")
Massive thanks to all contributors and issue reporters. Enjoy the smoother, smarter CAN-ML! 🎉🎉
v0.1.7-alpha
🎉 v0.1.7 – “Integer Integrity” 🎉
Building on our rock-solid foundations, v0.1.7 introduces full support for iterable signal lists, airtight dtype validation, and true integer‐typed injections—plus a major performance win in chunk handling. Up your CAN-ML game! 🚀
✨ New & Improved in v0.1.7
🛠 Bug Fixes & Stability
Iterable expected_signals support
Now accepts any iterable (set, tuple, etc.) without TypeError when concatenating—goodbye “list + set” issues (#13).
Integer‐dtype preservation
Missing integer signals are now injected with zeros (not NaN), so your int32, Int64, etc., stay intact.
dtype_map sanity checks
Invalid or unknown dtype mappings now raise clear ValueErrors immediately, preventing silent downstream type errors.
🎨 Usability & API Tweaks
Flexible expected_signals
Documented and validated as “any iterable of strings”—duplicates and non-string entries are detected up front.
Expose chunk_size
You can now control the per‐iteration chunk size directly in load_blf(...) for finer memory tuning.
Single‐pass concatenation
All BLF chunks are collected and concatenated once, yielding a noticeable speed boost on large logs.
Explicit index alignment
Uniform-timing Series are now aligned to df.index to avoid any subtle misalignments.
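The "any iterable of strings" contract for expected_signals might be validated along these lines. The function name and error messages are hypothetical; only the behaviour (accept any iterable, reject duplicates and non-strings up front) comes from the notes above.

```python
def validate_expected_signals(expected_signals):
    """Accept any iterable of strings; reject non-strings and duplicates."""
    names = list(expected_signals)  # works for list, set, tuple, generator
    non_strings = [n for n in names if not isinstance(n, str)]
    if non_strings:
        raise ValueError(f"expected_signals must be strings, got: {non_strings!r}")
    seen, dupes = set(), []
    for name in names:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    if dupes:
        raise ValueError(f"Duplicate signal names: {dupes!r}")
    return names


print(validate_expected_signals(("Engine_RPM", "Brake_Active")))
# ['Engine_RPM', 'Brake_Active']
```

Materialising the iterable into a list once also avoids the "list + set" TypeError when the names are later combined with other columns.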
🧪 Testing & Coverage
New unit tests for:
Iterable vs. list‐only expected_signals
Integer‐typed injection (int32, Int64)
dtype_map key validation and invalid‐dtype errors
Single‐concat performance (via chunk list fixture)
100% branch coverage in load_blf, including all edge cases around empty logs, filters, and error paths.
📚 Docs & Examples
Updated docstrings to reflect flexible iterables, dtype_map requirements, and the new chunk_size parameter.
Quickstart snippet now shows chunk_size usage and integer‐dtype injection.
🔧 Quickstart
pip install canml==0.1.7
from canml.canmlio import load_blf
# Full-file load with:
# • Set of expected signals
# • int32 injection preserved
# • custom chunk size
# • uniform timing
df = load_blf(
    blf_path="session.blf",
    db="vehicle.dbc",
    message_ids={0x123, 0x456},
    expected_signals={"Engine_RPM", "Brake_Active"},
    dtype_map={"Engine_RPM": "int32", "Brake_Active": "Int64"},
    force_uniform_timing=True,
    interval_seconds=0.02,
    chunk_size=5000
)
print(df.dtypes)
Huge thanks to everyone who reported bugs and contributed patches. Keep the feedback coming! 🎉
v0.1.6-alpha
🎉 v0.1.6 – “Solid Foundations” 🎉
Steady as she goes, canml enthusiasts! v0.1.6 shores up critical edge-cases, tightens resource handling, and guarantees rock-solid DataFrame outputs—now with 100% test coverage. Let’s see what’s new under the hood! 🚀
✨ New & Improved in v0.1.6
🛠 Bug Fixes & Stability
Logger handler deduplication
Clear any existing handlers on initialization to prevent duplicate log output when importing canmlio multiple times (#7).
BLFReader resource safety
Wrapped BLFReader iteration in a try/finally so reader.stop() always runs, leaving no lingering file handles on error (#8).
Empty-file resilience
When no messages are decoded, load_blf now returns an empty DataFrame pre-populated with a timestamp column (and any expected_signals), avoiding downstream errors (#9).
Uniform-timing correctness
Switched to range(len(df)) for sequential timestamps and preserved originals in raw_timestamp, so your uniform spacing is rock-steady, with no more index gaps (#10).
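The uniform-timing fix can be sketched without pandas: build sequential timestamps from range(n) and keep the originals alongside. The function name and the dict-of-lists layout are assumptions for illustration; `interval_seconds` mirrors the parameter used in the quickstart.

```python
def uniform_timing(raw_timestamps, interval_seconds):
    """Replace timestamps with evenly spaced values and keep the originals."""
    n = len(raw_timestamps)
    return {
        # Sequential index times: 0, interval, 2*interval, ...
        "timestamp": [i * interval_seconds for i in range(n)],
        # Original capture times are preserved as a backup column.
        "raw_timestamp": list(raw_timestamps),
    }


timing = uniform_timing([162523.1, 162523.3, 162523.9], interval_seconds=0.02)
```

Deriving the values from the row position rather than from an existing index is what removes gaps left behind by filtering.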
🎨 Usability & API Tweaks
Consistent logging
Logging remains informative without flooding your console; major events only.
DataFrame guarantees
Even in wildly filtered or empty scenarios, your schema stays intact: timestamp + expected signals.
Safe dtype injection
Invalid or missing dtype mappings for injected signals now raise clear, immediate ValueErrors.
🧪 Testing & Coverage
- 100% unit-test coverage across the entire canmlio module
- New tests for logger re-initialization, BLFReader errors, empty-BLF behavior, and uniform-timing logic.
📚 Docs & Examples
- Updated “End-to-End CAN ML” tutorial on ReadTheDocs reflects the new uniform-timing API and empty-file handling.
🔧 Quickstart
pip install canml==0.1.6
from canml.canmlio import load_dbc_files, iter_blf_chunks, load_blf, to_csv
# 1️⃣ Merge multiple DBCs with safe signal prefixes
db = load_dbc_files(["vehicle.dbc", "chassis.dbc"], prefix_signals=True)
# 2️⃣ Stream-decode a BLF with guaranteed cleanup
for i, chunk in enumerate(iter_blf_chunks("drive.blf", db, chunk_size=5000)):
    to_csv(chunk, f"drive_chunk_{i}.csv")
# 3️⃣ Full-load with uniform timing & raw timestamp backup
df = load_blf(
    blf_path="session.blf",
    db=db,
    expected_signals=["Engine_RPM", "Brake_Active"],
    force_uniform_timing=True,
    interval_seconds=0.02
)
print(df.head())
Huge thanks to everyone who filed issues, contributed fixes, and wrote tests. Onwards to even smoother CAN-ML workflows! 🎉🎉
v0.1.5-alpha
🎉 v0.1.5-alpha – “CAN-do Attitude” 🎉
Buckle up, canml fans! v0.1.5-alpha rolls in with bug fixes, polish, and fresh enhancements that bring your CAN ML workflows even closer to perfection. Let’s dive in! 🚗💨
✨ New & Improved in v0.1.5-alpha
🛠 Bug Fixes & Stability
DBC string parsing resilience: Handle malformed DBC comments and whitespace quirks without crashing (no more ParseError on stray headers).
iter_blf_chunks cleanup: Ensured BLFReader.stop() always runs—even on exceptions or empty files—to free resources.
load_blf FileNotFoundError order: Now checks BLF path before DBC parsing to fail fast on missing logs.
Signal injection logic: Fixed edge-case where an expected signal named timestamp was overwritten.
🎨 Usability & API tweaks
Return empty DataFrame for load_blf when no messages decoded (rather than a multi-index mishap).
to_csv header control: Added header flag documentation and fixed header duplication when mode='a' and header=False.
to_parquet compression: Supported additional codecs (brotli, gzip) with explicit error messages on unsupported options.
Logging verbosity: Downgraded info logs in loops; major events still logged but no more flooding for high-frequency messages.
🧪 Testing & Coverage
97% coverage across the module, with new unit tests for DBC comment anomalies and empty-BLF edge cases.
Integration tests for round-trip CSV
📚 Docs & Examples
Tutorial: “End-to-End CAN ML”: New step-by-step guide on ReadTheDocs covering raw BLF → feature extraction → modeling.
🔧 Quickstart
pip install canml==0.1.5-alpha
from canml.canmlio import (
    load_dbc_files, iter_blf_chunks, load_blf, to_csv, to_parquet
)
# Merge DBCs with robust comment parsing
db = load_dbc_files("vehicle.dbc", prefix_signals=True)
# Stream-decode and write Parquet shards
for idx, chunk in enumerate(iter_blf_chunks(
    blf_path="drive.blf", db=db, chunk_size=5000
)):
    to_parquet(chunk, f"drive_shard_{idx}.parquet", compression="brotli")
# Full-load, inject any missing signals, uniform timing
df = load_blf(
    blf_path="session.blf", db=db,
    expected_signals=["Engine_RPM", "Brake_Active"],
    force_uniform_timing=True
)
# Export to CSV for reporting
to_csv(df, "session_report.csv")
print("Done! v0.1.5-alpha at your service 🤖")
Thanks to everyone who reported issues, wrote tests, and contributed improvements! Keep the feedback coming via GitHub issues and PRs, and let’s keep CAN logs crunching! 🚀