feat: add date time serialization and scientific notation suppression #2

lurenss · 2025-11-11T20:07:02Z

Summary

This PR implements 8 features to achieve complete feature parity between the Python and TypeScript versions of the TOON library. All features have been thoroughly tested with 41 new tests added, bringing the total test suite to 87 passing tests.

Features Implemented

1. DateTime Serialization

✅ Added support for datetime and date objects
✅ Converts to ISO 8601 format (YYYY-MM-DD for dates, ISO string for datetimes)
✅ Updated is_primitive() utility to include datetime types
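A minimal sketch of the conversion described above, assuming the encoder relies on the standard library's `isoformat()`. The helper name `encode_temporal` is hypothetical and is not taken from the PR.

```python
from datetime import date, datetime


def encode_temporal(value):
    """Hypothetical helper: render a date or datetime as an ISO 8601 string."""
    # datetime is a subclass of date, so one isinstance check covers both.
    # date.isoformat() yields YYYY-MM-DD; datetime.isoformat() adds the time part.
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"not a date or datetime: {type(value).__name__}")


print(encode_temporal(date(2025, 11, 11)))
print(encode_temporal(datetime(2025, 11, 11, 20, 7, 2)))
```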

2. Scientific Notation Suppression

✅ Added format_float() utility to suppress unnecessary scientific notation
✅ Applied to all float encoding paths (primitive values, arrays, tabular data)
✅ Improves readability for numbers in human-readable ranges
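One way to suppress exponent notation is to route the float through `decimal.Decimal`. The sketch below is an assumption about the approach, not the PR's actual `format_float()`; the name `format_float_sketch` is hypothetical, and it always expands the exponent, whereas the PR only suppresses notation it considers unnecessary (the thresholds are not stated here).

```python
import math
from decimal import Decimal


def format_float_sketch(value: float) -> str:
    """Hypothetical helper: render a finite float without exponent notation."""
    if not math.isfinite(value):
        raise ValueError("expected a finite float")
    # repr() gives the shortest string that round-trips the float;
    # Decimal re-renders it in fixed-point form via the "f" format spec.
    return format(Decimal(repr(value)), "f")


print(format_float_sketch(1e-7))
print(format_float_sketch(1.5))
```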

3. Delimiter Indicators in Array Headers

✅ Added delimiter indicators: [N\t] for tab, [N|] for pipe
✅ Comma remains default with no indicator needed
✅ Applied to both root-level and nested tabular arrays
✅ Decoder auto-detects delimiter from header indicator
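The header rule above can be sketched as follows. The function name `array_header` is hypothetical, and the sketch assumes the tab indicator is the literal tab character placed inside the brackets.

```python
def array_header(count: int, delimiter: str = ",") -> str:
    """Hypothetical helper: build the bracketed length marker for an array."""
    # Comma is the default and gets no indicator; tab and pipe are appended
    # after the count (assumption: the tab is written as a literal tab).
    indicator = "" if delimiter == "," else delimiter
    return f"[{count}{indicator}]"


print(array_header(3))
print(array_header(3, "|"))
```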

4. Smart Delimiter Detection in Decoder

✅ Decoder checks header indicator first ([N\t] or [N|])
✅ Falls back to detecting delimiter from first data row
✅ Defaults to comma if no delimiter detected
✅ Ensures consistent round-trip encoding/decoding
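The three-step lookup above might look like the sketch below. The name `detect_delimiter`, the regular expression, and the order in which the fallback tries tab before pipe are all assumptions for illustration.

```python
import re

# Matches a length marker such as [3], [3|] or [3<TAB>] (hypothetical pattern).
HEADER_RE = re.compile(r"\[(\d+)([\t|]?)\]")


def detect_delimiter(header: str, first_row: str = "") -> str:
    """Hypothetical helper: pick the delimiter for a tabular array."""
    # Step 1: trust an explicit indicator in the header.
    match = HEADER_RE.search(header)
    if match and match.group(2):
        return match.group(2)
    # Step 2: otherwise look for a tab or pipe in the first data row.
    for candidate in ("\t", "|"):
        if candidate in first_row:
            return candidate
    # Step 3: default to comma.
    return ","


print(repr(detect_delimiter("[2|]")))
print(repr(detect_delimiter("[2]", "1\t2")))
```

Note that this naive fallback would be misled by a quoted value that contains a pipe or tab in the first row; a real decoder would need to account for quoting.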

5. Dash Markers in List Arrays

✅ Added "- " prefix to list array items for visual clarity
✅ Decoder strips dash markers when parsing
✅ Improves readability of non-uniform arrays
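On the decoding side, stripping the marker can be as small as the sketch below. The helper name `strip_dash_marker` is hypothetical; the sketch only removes a leading "- " (dash plus space), so a negative number such as -5 is left alone.

```python
def strip_dash_marker(line: str) -> str:
    """Hypothetical helper: drop indentation and a leading "- " list marker."""
    stripped = line.lstrip()
    if stripped.startswith("- "):
        return stripped[2:]
    return stripped


print(strip_dash_marker("  - apple"))
print(strip_dash_marker("  -5"))
```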

6. Delimiter-Aware Quoting

✅ Quote strings containing delimiter characters (tab, pipe, comma)
✅ Added delimiter characters to needs_quoting() checks
✅ Prevents parsing ambiguity in tabular data
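A rough sketch of the quoting rule, covering only the delimiter check described above. The names `contains_delimiter` and `quote_if_needed` are hypothetical, and the escaping of backslashes and double quotes is an assumption rather than the library's documented behavior.

```python
DELIMITER_CHARS = (",", "\t", "|")


def contains_delimiter(value: str) -> bool:
    """Hypothetical check: does the string contain any delimiter character?"""
    return any(ch in value for ch in DELIMITER_CHARS)


def quote_if_needed(value: str) -> str:
    """Hypothetical helper: wrap the value in double quotes when required."""
    if contains_delimiter(value):
        # Assumed escaping: backslash first, then the quote character.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return value


print(quote_if_needed("a|b"))
print(quote_if_needed("plain"))
```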

7. Indent Auto-Detection

✅ Added detect_indent() function to analyze TOON strings
✅ Auto-detects 2-space, 4-space, or tab indentation
✅ Supports explicit indent override via options
✅ Threaded indent_size parameter through all parsing functions
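Indent detection can be approximated by inspecting the first indented line. The sketch below is not the PR's `detect_indent()`; the name `detect_indent_sketch`, the choice to return the indent unit as a string, and the two-space fallback are assumptions.

```python
def detect_indent_sketch(text: str) -> str:
    """Hypothetical helper: return the indent unit used by the first nested line."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        if leading:
            # A tab-indented document uses one tab per level; otherwise the
            # first indented line's run of spaces is taken as the unit.
            return "\t" if leading[0] == "\t" else leading
    return "  "  # assumed default when nothing is indented


print(repr(detect_indent_sketch("user:\n    name: Ada")))
```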

8. Strict Array Length Validation

✅ Validate declared array count matches actual items in strict mode
✅ Applied to both tabular and list arrays
✅ Raises descriptive ValueError on mismatch
✅ Non-strict mode allows flexible array lengths
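The strict-mode check can be sketched as below. The function name `check_array_length`, its parameters, and the error message wording are hypothetical.

```python
def check_array_length(declared: int, items: list, strict: bool = True) -> None:
    """Hypothetical helper: enforce the declared array count in strict mode."""
    if strict and declared != len(items):
        raise ValueError(
            f"array length mismatch: header declares {declared}, found {len(items)}"
        )


check_array_length(2, ["a", "b"])                    # passes
check_array_length(3, ["a", "b"], strict=False)      # tolerated in non-strict mode
```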

Test Coverage

41 new tests added:
- 18 encoder tests
- 15 decoder tests
- 8 roundtrip tests
- All 87 tests passing (up from 46)
- Comprehensive coverage of edge cases and error conditions

Files Changed

toon/encoder.py - DateTime support, scientific notation suppression, delimiter indicators, dash markers, key folding
toon/decoder.py - Smart delimiter detection, indent auto-detection, strict validation, dash marker parsing
toon/utils.py - format_float(), delimiter-aware quoting, updated is_primitive()
tests/test_encoder.py - 18 new encoder tests
tests/test_decoder.py - 15 new decoder tests
tests/test_roundtrip.py - 8 new roundtrip tests

Test Plan

All existing tests pass
All new tests pass
Round-trip encoding/decoding works correctly
Edge cases handled properly (empty arrays, special float values, etc.)
Error handling works as expected (strict mode validation)

🤖 Generated with Claude Code

Implemented 8 features to match TypeScript implementation:

**Feature 1: DateTime Serialization**
- Added datetime and date support to encoder
- Convert to ISO 8601 format (YYYY-MM-DD for dates, ISO string for datetimes)
- Added type checking in is_primitive() utility

**Feature 2: Scientific Notation Suppression**
- Added format_float() utility to suppress unnecessary scientific notation
- Applied to all float encoding paths (primitive values, arrays, tabular data)
- Maintains readability for human-readable ranges

**Feature 3: Delimiter Indicators in Array Headers**
- Added delimiter indicators: [N\t] for tab, [N|] for pipe
- Comma remains default with no indicator
- Applied to both root-level and nested tabular arrays
- Decoder auto-detects delimiter from header indicator

**Feature 4: Smart Delimiter Detection in Decoder**
- Decoder checks header indicator first ([N\t] or [N|])
- Falls back to detecting delimiter from first data row
- Default to comma if no delimiter detected

**Feature 5: Dash Markers in List Arrays**
- Added "- " prefix to list array items for visual clarity
- Decoder strips dash markers when parsing
- Improves readability of non-uniform arrays

**Feature 6: Delimiter-Aware Quoting**
- Quote strings containing delimiter characters (tab, pipe, comma)
- Added delimiter characters to needs_quoting() checks
- Prevents parsing ambiguity in tabular data

**Feature 7: Indent Auto-Detection**
- Added detect_indent() function to analyze TOON strings
- Auto-detects 2-space, 4-space, or tab indentation
- Supports explicit indent override via options
- Threaded indent_size parameter through all parsing functions

**Feature 8: Strict Array Length Validation**
- Validate declared array count matches actual items in strict mode
- Applied to both tabular and list arrays
- Raises descriptive ValueError on mismatch
- Non-strict mode allows flexible array lengths

Test Coverage:
- Added 41 new tests (18 encoder, 15 decoder, 8 roundtrip)
- All 87 tests passing
- Comprehensive coverage of edge cases and error conditions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot

Pull Request Overview

This PR implements 8 features to achieve complete feature parity between the Python and TypeScript TOON library implementations. The changes enable datetime serialization, scientific notation suppression, delimiter indicators in array headers, smart delimiter detection, dash markers in list arrays, delimiter-aware quoting, indent auto-detection, and strict array validation.

Key changes:

Enhanced type support with datetime/date serialization to ISO 8601 format
Improved float formatting to suppress unnecessary scientific notation for better readability
Added delimiter indicators ([N\t], [N|]) in array headers with smart detection during decoding
Implemented dash markers (- ) for list array items to improve visual clarity

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| toon/utils.py | Added datetime/date to primitive types, implemented `format_float()` for scientific notation suppression |
| toon/encoder.py | Implemented datetime serialization, delimiter indicators in headers, dash markers for list arrays, scientific notation suppression |
| toon/decoder.py | Added `detect_indent()` function, smart delimiter detection from headers, strict array validation, dash marker parsing |
| tests/test_encoder.py | Added 18 new tests covering datetime encoding, delimiter indicators, dash markers, scientific notation, and root arrays |
| tests/test_decoder.py | Added 15 new tests covering delimiter detection, dash markers, indent detection, strict mode validation |
| tests/test_roundtrip.py | Added 8 new roundtrip tests ensuring encoding/decoding consistency for all new features |

Copilot · 2025-11-11T20:09:56Z

toon/encoder.py

-            if isinstance(item, float) and (item != item or item == float('inf') or item == float('-inf')):
-                encoded_values.append('null')
+            if isinstance(item, float):
+                if item != item or item == float('inf') or item == float('-inf'):


Comparison of identical values; use cmath.isnan() if testing for not-a-number.

Copilot · 2025-11-11T20:09:56Z

toon/encoder.py

-        if isinstance(value, float) and (value != value or value == float('inf') or value == float('-inf')):
-            return 'null'
+        if isinstance(value, float):
+            if value != value or value == float('inf') or value == float('-inf'):


Comparison of identical values; use cmath.isnan() if testing for not-a-number.
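One possible way to address this comment for plain floats is the `math` module: `math.isnan()` tests for not-a-number, and `math.isfinite()` covers NaN and both infinities in a single call. The helper name `is_non_finite` is hypothetical and is not part of the PR.

```python
import math


def is_non_finite(value) -> bool:
    """Hypothetical helper: True for float NaN, +inf and -inf."""
    # math.isfinite() returns False for NaN and for either infinity, so it
    # replaces the `x != x or x == inf or x == -inf` chain in one call.
    return isinstance(value, float) and not math.isfinite(value)


# The encoder in the diff emits 'null' for these values.
for sample in (float("nan"), float("inf"), float("-inf"), 1.5):
    print(sample, "null" if is_non_finite(sample) else "kept")
```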

Copilot · 2025-11-11T20:09:57Z

toon/decoder.py

+                    # Normalize indentation: subtract indent_size spaces to align with first field
+                    # Original: '    name: extra field' (4 spaces with indent_size=2)
+                    # Becomes:  '  name: extra field' (2 spaces)
+                    normalized_line = ' ' * expected_indent + next_line.lstrip()[0:] if next_indent >= expected_indent + indent_size else next_line


This assignment to 'normalized_line' is unnecessary as it is redefined before this value is used.

Suggested change

normalized_line = ' ' * expected_indent + next_line.lstrip()[0:] if next_indent >= expected_indent + indent_size else next_line

github-actions · 2025-11-12T21:04:45Z

🎉 This PR is included in version 1.3.0 🎉

The release is available on:

v1.3.0
GitHub release

Your semantic-release bot 📦🚀

lurenss requested a review from Copilot November 11, 2025 20:07

Copilot started reviewing on behalf of lurenss November 11, 2025 20:07

Copilot finished reviewing on behalf of lurenss November 11, 2025 20:09

Copilot AI reviewed Nov 11, 2025

VinciGit00 changed the title ~~feat: Achieve full feature parity with TypeScript TOON library~~ feat: add date time serialization and scientific notation suppression Nov 12, 2025

lurenss merged commit 2d70fdf into main Nov 12, 2025
7 checks passed

github-actions bot added the released label Nov 12, 2025

lurenss deleted the feature/typescript-parity branch November 12, 2025 23:54
