Skip to content

fix: accept ISO 8601 'T' separator for naive datetimes#8

Merged
jqnatividad merged 3 commits into
mainfrom
fix/iso-t-naive-datetime
May 18, 2026
Merged

fix: accept ISO 8601 'T' separator for naive datetimes#8
jqnatividad merged 3 commits into
mainfrom
fix/iso-t-naive-datetime

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

Summary

  • parse_with_preference(\"2020-01-15T08:00:00\", _) and other T-separated naive datetimes (with or without fractional seconds) returned Err. The space-separated form and tz-bearing T-forms (Z, +00:00) already worked — only the T + no-timezone intersection was broken.
  • Root cause: ymd_hms's regex required \s+ between date and time, and rfc3339 requires an offset suffix, so T+no-tz fell into the gap between them.
  • Fix: extend ymd_hms's regex to (?:T|\s+) and dispatch the chrono format-string family on input.as_bytes()[10]. The trial-parse chain stays the same length for the common space-separated case — no extra cost on the hot path.

This is the form Python's datetime.isoformat() emits without .astimezone(), so consumers like qsv stats --infer-dates were misclassifying these columns as String. Downstream qsv commands (synthesize, schema, describegpt) all depend on that type inference. Origin handoff: opened from a qsv 20.1.0 release-prep session; the qsv side already landed workaround commit 4e143476b for one specific failure mode, but the proper fix lives here.

Test plan

  • cargo test --workspace --all-features — new tests pass; one pre-existing parse_unambiguous_dmy failure also fails on main without my changes (host-timezone/time-of-day-dependent; unrelated, worth a separate look).
  • cargo fmt --all -- --check clean.
  • cargo clippy --workspace --tests --all-features -- -D warnings clean.
  • tests::ymd_hms extended with T-separator cases (no-seconds, seconds, ms/us/ns fractional) and a T-vs-space equivalence assertion.
  • New tests::parse_iso_t_no_tz exercises the public parse_with_preference API and guards that existing tz-bearing T-forms (Z, +00:00) still parse.
  • /bench-compare compare to confirm no perf regression on the space-separated hot path (recommended before merge).

Out of scope

T-separator with a named timezone (e.g. 2020-01-15T08:00:00 UTC) is still rejected — that requires a sibling change to ymd_hms_z's pre-filter and regex. Easy follow-up if needed; flagged in the commit message.

🤖 Generated with Claude Code

jqnatividad and others added 3 commits May 17, 2026 21:44
`parse_with_preference("2020-01-15T08:00:00", _)` returned Err because
ymd_hms's regex required `\s+` between the date and time, and rfc3339
requires an offset suffix — so the T+no-tz intersection fell into the
gap between them. This is the form Python's `datetime.isoformat()`
emits without `.astimezone()`, so downstream consumers (qsv
`stats --infer-dates` and everything that depends on its type
inference) were misclassifying these columns as String.

Extend ymd_hms to accept either separator: regex changed from `\s+` to
`(?:T|\s+)`, then dispatch the chrono format-string family on
`input.as_bytes()[10]` so the trial-parse chain stays the same length
for the common space-separated case (no perf regression on the hot
path).

Tests:
- `tests::ymd_hms` extended with T-separator cases (no-seconds,
  seconds, ms/us/ns fractional) and a T-vs-space equivalence check.
- New `tests::parse_iso_t_no_tz` integration test exercising the
  public `parse_with_preference` API, with a regression guard that
  existing tz-bearing T-forms (`Z`, `+00:00`) still parse.

README's accepted-formats list updated to document the T-separator
form.

T-separator with named timezone (e.g. `2020-01-15T08:00:00 UTC`) is
still rejected by `ymd_hms_z`; that's a sibling change that can be
done separately if needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace `(?:T|\s+)` with `[T\s]+`. Functionally equivalent for our
purposes — any extra inputs the character class accepts at the regex
layer (`TT`, `T `, ` T`) get rejected by the chrono format strings
anyway — but compiles to a tighter DFA loop than an alternation.

Bench-compare against the 0.14.0 baseline shows every codepath that
actually touches `ymd_hms` is now stable to improved across three
runs:
- `2021-04-30 21:14:10` (ymd_hms success): −1.4%
- `2017-11-25 13:31:15 PST` (ymd_hms_z): −3.0%
- `2019-11-29 08:08:05-08` (ymd_hms_z): −3.1%
- `2021-02-21 PST` (ymd_z): −0.5%

`memory_usage` dropped from a borderline +3.2% with the alternation
form to a noise-band +1.1%.

The remaining persistent regressions in the bench (`08/21/71`,
`03/19/2012 10:11:59`) are on slash_* codepaths that never enter
`ymd_hms`; they appear identically across all three runs regardless of
the regex form here, indicating they're system-noise / icache effects
unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first assertion called `super::parse("31/3/22").unwrap().date()`
and compared against `Utc.ymd(2022, 3, 31)`. `parse()` uses Local
timezone and pads date-only inputs with the current time of day
(`Utc::now().time()`); the resulting UTC date can roll by ±1 day
depending on host TZ and the moment the test runs. The other two
assertions in this test use `parse_with_preference` (Utc + midnight)
and are unaffected.

Reproduction: on a host in PST during the late-evening local window
(early-morning UTC), the assembled `2022-03-31 ${local-time} PST`
converts to `2022-04-01 ${early}Z` and `.date()` returns
`2022-04-01Z`, which the assertion rejects.

Fix: compare the *Local* date (what `parse()` actually models for a
date-only input) by chaining `.with_timezone(&Local).date()` and
comparing against `Local.ymd(2022, 3, 31)`. The library's behavior
is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 1f79121 into main May 18, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant