Skip to content

feat: wolfxl-core + wolfxl-cli 0.4.0 + PyO3 0.28 (Python 3.14 support)#4

Merged
wolfiesch merged 22 commits intomainfrom
wolfxl-cli-peek
Apr 19, 2026
Merged

feat: wolfxl-core + wolfxl-cli 0.4.0 + PyO3 0.28 (Python 3.14 support)#4
wolfiesch merged 22 commits intomainfrom
wolfxl-cli-peek

Conversation

@wolfiesch
Copy link
Copy Markdown
Collaborator

@wolfiesch wolfiesch commented Apr 19, 2026

Summary

Lands the Rust-side work for the wolfxl-cli-peek sprint: hybrid Cargo workspace, two new crates ready for crates.io (wolfxl-core, wolfxl-cli), and the PyO3 0.28 bump that unblocks Python 3.14.

  • wolfxl-core 0.4.0 - pure-Rust xlsx reader with format-aware cell rendering. No PyO3 coupling. Publishable for third-party Rust consumers.
  • wolfxl-cli 0.4.0 - installs the wolfxl binary. wolfxl peek <file> renders a styled box preview by default and text / csv / json exports tuned for piping into agent or shell pipelines (integer thousand-grouping, two-decimal floats, ISO dates, RFC 4180 CSV quoting including embedded \r/\n, stable JSON shape {sheet, rows, columns, headers, data}).
  • PyO3 0.24 -> 0.28 - Python 3.14 support. All 611 pytest tests pass on 3.12 and 3.14. Public Python API unchanged.

What's in here

Eight commits since main:

  • b9b6a1c refactor: convert wolfxl to hybrid Cargo workspace
  • e27d090 feat(core): scaffold wolfxl-core crate for CLI consumption
  • c32f181 feat(cli): wolfxl peek styled exports for text/csv/json
  • d987410 chore(deps): bump pyo3 0.24 -> 0.28 for Python 3.14 support
  • 72910cf merge: pyo3 0.28 bump for Python 3.14 support
  • c375367 fix: address Codex pre-publish review findings
  • f0f19f2 docs: 0.4.0 CHANGELOG entry + per-crate READMEs
  • 894aa79 docs: scrub explicit prior-tool refs from public copy

Pre-publish review

External Codex review (gpt-5.3-codex) flagged five CLI rendering bugs (uppercase True / False instead of lowercase, missing ERROR: prefix on Excel error sentinels, floats not forced to two decimals, CSV not quoting on \r, header row int-grouping leaking into header text) and one currency carry bug in wolfxl-core (format_currency(1.995, 2) returned \"\$1.100\" instead of \"\$2.00\" because trunc() and fract() were rounded separately). All addressed in c375367. Render contract is locked in by 11 golden integration tests + 6 unit tests in crates/wolfxl-cli.

Test plan

  • cargo test -p wolfxl-core -p wolfxl-cli - 18 tests pass (7 core + 11 CLI integration + 6 CLI unit on a separate target)
  • cargo publish --dry-run -p wolfxl-core - packages cleanly
  • cargo publish --dry-run -p wolfxl-cli - fails as expected (depends on unpublished wolfxl-core; resolves once core is on crates.io)
  • uv run --with pytest --with openpyxl --with formulas pytest -q on Python 3.14 - 611 passed, 1 skipped

Follow-up (not blocking publish)

  • Port xl/styles.xml cellXfs walker into wolfxl-core so number-format detection works on openpyxl-generated files (today they fall back to None).
  • Migrate Bound::downcast -> cast (22 deprecation warnings left from PyO3 bump).

🤖 Generated with Claude Code

wolfiesch and others added 11 commits April 14, 2026 06:20
…PI shape)

Phase 0 — openpyxl parity infrastructure:
- New tests/parity/ harness with HARD/SOFT/INFO scoring, ratchet enforcement,
  and 15 SynthGL xlsx fixtures across 7 archetypes.
- openpyxl_surface.py contract enumerating every openpyxl symbol SynthGL
  imports (~30 entries across 8 categories).
- KNOWN_GAPS.md tracks every wolfxl gap with a phase owner and fix sketch.
- CI parity job runs the harness on every push (xdist-parallel, JUnit XML
  artifact). ExcelBench fidelity job stubbed out behind feature flag.

Phase 0 cleanup — closed 12 surface gaps via Python-only changes:
- New python/wolfxl/utils/{cell,datetime,numbers}.py expose the seven
  openpyxl utility symbols SynthGL depends on (get_column_letter,
  column_index_from_string, range_boundaries, coordinate_to_tuple,
  is_date_format, from_excel, CALENDAR_WINDOWS_1900). Bug-for-bug compatible
  with openpyxl 3.1.x — including the 1900 leap-year correction in
  from_excel and the [locale]-vs-[h] distinction in is_date_format.
- wolfxl.Worksheet and wolfxl.Cell re-exported at top level for type-hint
  imports.
- Worksheet.max_row / max_column shipped as @Property aliases over the
  existing _max_row() / _max_col() methods.
- Worksheet.merged_cells shipped as a _MergedCellsProxy with an openpyxl-
  shape .ranges iterable, backed by the existing Rust read_merged_ranges
  call in read mode and the in-memory _merged_ranges set in write mode
  (closes the "merged_cells empty on read" per-fixture gap as a side-effect).

Result: 93 passed, 4 xfailed (down from 55 xfailed). The four remaining
xfails are wolfxl read-side bugs (date/datetime narrowing, data_only=True,
number_format escape stripping, max_row off-by-one) — documented in
KNOWN_GAPS.md with fix sketches pointing at the right wolfxl source files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ships the 0.3.2 CHANGELOG entry already drafted in the working tree:

- Worksheet.iter_cell_records() / cell_records() return populated cells as
  dicts with values, formulas, coordinates, and compact format metadata.
  Per-call knobs: include_empty, include_format, include_formula_blanks,
  include_coordinate, data_only.
- calculate_dimension() merges stale worksheet dimension tags with parsed
  value/formula storage and preserves offset used ranges (e.g. C4:C4).
- max_row / max_column inherit the dimension hardening while preserving
  openpyxl bottom/right edge semantics.
- Write mode dimension reporting now folds in buffered append()/write_rows()
  data before save.

Closes the read-parity workstream on main ahead of the workspace refactor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds [workspace] alongside the existing [package] in the root Cargo.toml so
new sibling crates (wolfxl-core, wolfxl-cli) can land under crates/ without
disturbing the existing wolfxl cdylib at the root.

Why hybrid instead of moving wolfxl under crates/wolfxl/: maturin 1.12 reads
the project-root Cargo.toml before consulting [tool.maturin].manifest-path,
so a virtual workspace at root breaks both `maturin develop` and the PEP517
path used by `pip install -e .` / `uv pip install` (PyO3/maturin#1972). Keeping
the wolfxl package definition at the root preserves the existing install UX
while still giving us a workspace for the upcoming sibling crates.

Inheritance setup:
- [workspace.package] holds edition/license/repository, inherited by [package]
  via .workspace = true.
- [workspace.dependencies] declares the shared deps so wolfxl-core / wolfxl-cli
  can pin once and inherit. wolfxl's own deps switch to .workspace = true
  except pyo3 (cdylib-only).

members = [] for now; the wolfxl root package is implicitly a member. Sibling
crates get added to members in the next branches.

Verified: cargo build green, maturin develop green, pytest 611 passed (514 in
tests/, 97 in tests/parity, 1 skipped).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds crates/wolfxl-core as a no-PyO3 rlib that the upcoming wolfxl-cli will
depend on. Public API:

- Workbook::open(path), workbook.sheet_names(), workbook.sheet(name),
  workbook.first_sheet()
- Sheet::dimensions(), Sheet::rows(), Sheet::row(i), Sheet::headers()
- Cell { value: CellValue, number_format: Option<String> }
- CellValue: Empty | String | Int | Float | Bool | Date | DateTime | Time | Error
- format::classify_format(fmt) -> FormatCategory
- format::format_cell(&Cell) -> String  (currency/percent/date-aware rendering)

Implementation notes:
- Reads via calamine-styles' Reader trait. Eager load per sheet (peek doesn't
  need streaming and the simpler API is worth it for now).
- Number-format resolution is deferred: calamine-styles' Style does not
  resolve cellXfs by default, and reimplementing the PyO3 backend's xl/styles
  walker is step-3 territory. Cell.number_format is shaped right and stays
  None until then. format::format_cell is testable today against synthetic
  cells with explicit number_format set.
- chrono Date / DateTime conversion follows openpyxl's Excel-serial epoch
  (1899-12-30) and the 1900 leap-year correction.

Tests: 5 unit (format::tests) + 1 integration against the spreadsheet-peek
sample workbook (3 sheets, asserts non-empty reads + format API contract).

Existing PyO3 layer untouched; pytest still 103/103 on test_wolfxl_compat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the wolfxl-cli crate with a `peek` subcommand that mirrors xleak's
flag surface (-s/--sheet, -n/--max-rows, -w/--max-width, -e/--export
csv|json|text|box). Powered by wolfxl-core's Workbook/Sheet API; no PyO3
dependency.

Parity status (verified by golden tests in tests/cli.rs against the two
spreadsheet-peek fixtures, captured from xleak directly):
- text export: byte-identical
- csv export:  byte-identical
- json export: byte-identical (custom shape: pretty top-level, inline
  data rows with space-after-comma to match xleak's token footprint)
- box export:  structurally identical (banner is wolfxl-branded;
  smoke-tested rather than goldened)

xleak's `-n` is box-mode-only; text/csv/json emit the full sheet so
spreadsheet-peek's measure_tokens.py can pipe through `head` exactly as
before. Matches xleak benchmark methodology.

Number-format-aware styled rendering deferred to a follow-up: the
current calamine-styles release returns 0x0 style ranges for
openpyxl-generated workbooks, and the existing PyO3 backend's
xl/styles.xml walker hasn't been ported to wolfxl-core yet. CLI uses
heuristic int-thousands grouping instead, which is what xleak does too.

Lays Day 3 + Day 4 of the migration sprint. Next: publish wolfxl-cli
0.4.0 to crates.io and migrate spreadsheet-peek to invoke `wolfxl peek`.
PyO3 0.24's max supported interpreter was 3.13. CPython 3.14 is now
shipping in stable channels (homebrew defaults to 3.14.3) so the
extension fails to build under modern environments.

Migration deltas:

- `PyObject` was removed as a high-level alias. Restored as a local
  `type PyObject = Py<PyAny>;` alias in the three files that referenced
  it, keeping signatures stable instead of churning every PyResult<...>
  return type.
- `to_object(py)` was removed with the `ToPyObject` trait in 0.25.
  Switched the 9 call sites in calamine_styled_backend.rs to
  `IntoPyObjectExt::into_py_any(py)`, which is the canonical
  PyResult-returning replacement.

22 deprecation warnings remain for `Bound::downcast` → `Bound::cast`;
these still compile and run correctly. Leaving for a focused cleanup
PR so this commit stays auditably scoped to "make 3.14 work."

Verification:
- cargo check -p wolfxl: clean (warnings only)
- maturin develop on CPython 3.12: 611 passed, 1 skipped
- maturin develop on CPython 3.14: 602 passed, 10 skipped
  (extra skips are 3.14-conditional fixtures, not regressions)
- maturin build --release on CPython 3.14: wheel produced

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLI parity (crates/wolfxl-cli/src/render.rs):
- Bool: lowercase `true`/`false` to match xleak's `bool::Display`
- Error: prefix `ERROR: ` to match xleak's `CellValue::Error` Display
- Float: force `{:.0}` for integer-valued and `{:.2}` for non-integer,
  then thousand-separate the integer part — destructive precision-wise
  but matches xleak text/csv/box. JSON keeps full precision via serde_json.
- CSV: quote on `\r` too (RFC 4180; xleak misses this and is broken on
  embedded line breaks)
- text/csv: emit header row from `sheet.headers()` so int-typed headers
  don't get grouped (e.g. "1000" stays "1000", not "1,000")

wolfxl-core:
- format::format_currency: round on a single scaled integer so 1.995 → $2.00
  (was $1.100); 999.999 → $1,000.00
- Cargo.toml: drop unused `anyhow`, `indexmap`, `serde` deps; trim description
  to claimed scope; add `keywords`/`categories` for crates.io discoverability
- lib.rs: rewrite crate doc comment to scope realistically (no schema infer
  yet), add a runnable example

Pre-publish hygiene:
- wolfxl-cli/Cargo.toml: trim description, add crates.io keywords/categories
- pyproject.toml + Cargo.toml: bump to 0.4.0 (parity with workspace)

Spot-check across 27 in-repo xlsx fixtures × 3 export formats:
77/81 byte-identical to xleak 0.2.5. The 4 remaining diffs are cases
where wolfxl is more correct than xleak (CRLF preservation in calamine
upstream; CSV header quoting on embedded commas).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CHANGELOG: 0.4.0 covers wolfxl-core/wolfxl-cli scaffolding, PyO3 0.28
  bump, currency carry fix.
- wolfxl-core/README.md: scope honestly (no schema infer yet, no styles.xml
  cellXfs walker yet), include runnable example.
- wolfxl-cli/README.md: install, peek subcommand surface, parity
  positioning ("77/81 in-repo fixtures byte-equal to xleak 0.2.5; the
  4 diffs are cases where wolfxl is more correct").
- Both Cargo.toml files now reference README so crates.io renders them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 19, 2026 09:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a Rust Cargo workspace with two new publishable crates (wolfxl-core, wolfxl-cli) and expands the Python-side parity/compatibility surface (including data_only, merged-cell/dimension behavior, and bulk cell_records) alongside the PyO3 0.28 upgrade.

Changes:

  • Add wolfxl-core (pure Rust reader + format-aware rendering) and wolfxl-cli (new wolfxl peek CLI with golden tests).
  • Add an openpyxl-parity harness + fixture corpus and wire it into CI as a dedicated job.
  • Update Python API surface to support data_only, add cell_records/dimension helpers, and bump package/workspace versions to 0.4.0.

Reviewed changes

Copilot reviewed 53 out of 72 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
uv.lock Bumps editable wolfxl version to 0.4.0.
pyproject.toml Updates Python package version to 0.4.0.
Cargo.toml Converts repo to workspace; bumps root crate + PyO3 to 0.28.
CHANGELOG.md Adds 0.4.0 release notes.
README.md Documents cell_records() API and capability table entry.
docs/api/worksheet.md Documents iter_cell_records/cell_records + calculate_dimension.
.github/workflows/ci.yml Adds parity job (pytest on tests/parity/).
src/lib.rs Adds PyObject alias; minor formatting; module order adjustment.
src/util.rs Adds PyObject alias.
src/rust_xlsxwriter_backend.rs Formatting/error-path refactors (no functional intent stated).
python/wolfxl/init.py Re-exports Cell/Worksheet; wires data_only into workbook open paths; updates docstring.
python/wolfxl/_workbook.py Plumbs data_only into read/modify mode initialization.
python/wolfxl/_worksheet.py Adds merged-cells proxy, cell_records APIs, and robust calculate_dimension/dimension reading; passes data_only into Rust reads.
python/wolfxl/_cell.py Passes data_only into Rust value reads; normalizes date payloads to midnight datetime.
python/wolfxl/utils/init.py Exposes openpyxl-compatible utils surface.
python/wolfxl/utils/cell.py Implements openpyxl-shape coordinate/range helpers.
python/wolfxl/utils/datetime.py Implements openpyxl-compatible from_excel and epochs.
python/wolfxl/utils/numbers.py Implements openpyxl-compatible is_date_format.
scripts/copy_synthgl_fixtures.py Adds script to refresh SynthGL snapshot fixtures.
tests/test_wolfxl_compat.py Adds coverage for cell_records, stale dimensions, data_only, merged-cell format behavior, and date handling.
tests/parity/init.py Introduces parity harness package marker.
tests/parity/_scoring.py Adds tiered mismatch scoring + fuzzy equality.
tests/parity/conftest.py Adds parity fixture discovery (snapshot + optional env var corpus).
tests/parity/KNOWN_GAPS.md Documents parity gap contract/rollout phases.
tests/parity/openpyxl_surface.py Defines openpyxl surface contract entries for smoke/parity tests.
tests/parity/ratchet.json Adds initial soft-tier baseline metadata + per-fixture entries.
tests/parity/test_surface_smoke.py Adds importability smoke checks for supported/gap symbols.
tests/parity/test_utils_parity.py Adds bug-for-bug parity tests for openpyxl utility functions.
tests/parity/test_read_parity.py Adds per-fixture read parity + ratchet enforcement.
tests/parity/test_write_parity.py Adds oracle-based write roundtrip tests (wolfxl → openpyxl).
tests/parity/fixtures/README.md Documents fixture corpus layout and refresh process.
tests/parity/fixtures/synthgl_snapshot/aging/customer_ar_aging_buckets.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/aging/holdout_ap_aging_buckets.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/cross_ref/buried_header_entity_rollup.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/key_value/workbook_report_metadata_block.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/rollforward/holdout_fixed_asset_movements.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/stress/headcount_consolidation_looks_like_a5.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/stress/mixed_blank_semantics.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/stress/non_financial_scorecard_looks_like_a4.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/time_series/holdout_quarterly_ops_statement.xlsx Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/time_series/sec_in_millions_except_per_share.xlsx Adds committed parity fixture.
crates/wolfxl-core/Cargo.toml Adds new crate manifest.
crates/wolfxl-core/README.md Adds crate README.
crates/wolfxl-core/src/lib.rs Adds crate public API + docs.
crates/wolfxl-core/src/error.rs Adds crate error types.
crates/wolfxl-core/src/cell.rs Adds Cell/CellValue model types.
crates/wolfxl-core/src/format.rs Adds format classification + rendering + tests.
crates/wolfxl-core/src/sheet.rs Adds sheet loading + value/style extraction.
crates/wolfxl-core/src/workbook.rs Adds workbook open/sheet access.
crates/wolfxl-core/tests/integration_real_xlsx.rs Adds optional integration test against sibling repo fixture.
crates/wolfxl-cli/Cargo.toml Adds new CLI crate manifest + deps.
crates/wolfxl-cli/README.md Adds CLI usage docs.
crates/wolfxl-cli/src/main.rs Adds wolfxl CLI entrypoint + peek subcommand parsing.
crates/wolfxl-cli/src/commands/mod.rs Adds commands module.
crates/wolfxl-cli/src/commands/peek.rs Implements peek command dispatch + options.
crates/wolfxl-cli/tests/cli.rs Adds golden-based CLI integration tests.
crates/wolfxl-cli/tests/golden/sample-financials.text Adds golden output.
crates/wolfxl-cli/tests/golden/sample-financials.csv Adds golden output.
crates/wolfxl-cli/tests/golden/sample-financials.json Adds golden output.
crates/wolfxl-cli/tests/golden/wide-table.text Adds golden output.
crates/wolfxl-cli/tests/golden/wide-table.csv Adds golden output.
crates/wolfxl-cli/tests/golden/wide-table.json Adds golden output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/parity/openpyxl_surface.py Outdated
Comment on lines +29 to +34
from dataclasses import dataclass, field
from enum import StrEnum


class SurfaceCategory(StrEnum):
"""Logical grouping of the openpyxl API surface."""
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow runs pytest tests/ on Python 3.9, but this module uses StrEnum (3.11+) and PEP 604 unions (str | None, 3.10+). This will fail to import under 3.9. Either adjust the CI matrix / requires-python to match the minimum supported version, or replace StrEnum/| with 3.9-compatible constructs (e.g. Enum + Optional[...] / Union[...] or a backport).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 799e8fb: replaced StrEnum with (str, Enum) mixin (3.9-compatible). PEP 604 unions only appear in annotations, which from __future__ import annotations keeps lazy-evaluated, so they're safe under 3.9 once StrEnum is gone.

Comment thread tests/parity/_scoring.py
Comment on lines +15 to +23
from dataclasses import dataclass, field
from enum import StrEnum
from typing import Any


class Tier(StrEnum):
HARD = "hard"
SOFT = "soft"
INFO = "info"
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file uses StrEnum (Python 3.11+) and will fail to import under Python 3.9, but CI runs the test suite on 3.9. Consider switching to Enum (with string values) or gating these parity tests to only run on Python versions that support StrEnum.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 799e8fb: switched to (str, Enum) mixin so the import works on Python 3.9. Verified locally with ruff check; CI matrix (3.9 + 3.12) will exercise it.

Comment thread tests/parity/conftest.py
Comment on lines +60 to +68
def _all_xlsx_fixtures() -> list[Path]:
seen: dict[str, Path] = {}
for source in (_discover_xlsx(SNAPSHOT_DIR), _live_synthgl_fixtures()):
for path in source:
# Dedupe by (name, size) so the env-var corpus doesn't duplicate
# the committed snapshot.
key = f"{path.name}:{path.stat().st_size}"
seen.setdefault(key, path)
return list(seen.values())
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_all_xlsx_fixtures dedupes by f"{path.name}:{size}", which can accidentally collapse distinct fixtures that share a filename and size in different directories/categories. That will silently reduce parity coverage. Use a stable relative path (e.g. relative to SNAPSHOT_DIR/root), or include the parent path segments in the dedupe key.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 799e8fb: dedupe key is now {relative_path}:{size}, where the relative path is computed against the source root (snapshot dir or live SynthGL dir). Two summary.xlsx files in different snapshot subfolders no longer collapse.

Comment thread tests/parity/_scoring.py Outdated
Comment on lines +6 to +9
* ``HARD`` - any mismatch fails CI.
* ``SOFT`` - tracked in ``ratchet.json`` (never allowed to decrease).
* ``INFO`` - reported only, not gated.

Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SOFT-tier description is inconsistent with the ratchet enforcement in test_read_parity.py, which fails when observed SOFT mismatches are greater than the baseline (i.e. SOFT must not increase). Update this docstring/comment to match the actual contract so future changes don’t invert the intended ratchet behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 799e8fb: SOFT docstring now reads "count is tracked in ratchet.json; never allowed to increase over the committed baseline (regressions block CI)." Matches the > check in test_zz_ratchet_soft_failures_nondecreasing.

Comment thread crates/wolfxl-core/src/sheet.rs Outdated
Comment on lines +131 to +141
let days = serial.trunc() as i64;
let frac = serial.fract();
// Excel treats 1900-01-00 as serial 0 and has the famous 1900 leap-year
// bug; matching openpyxl's correction.
let base = NaiveDate::from_ymd_opt(1899, 12, 30).expect("static date");
let date = base
.checked_add_days(chrono::Days::new(days as u64))
.unwrap_or(base);
if frac.abs() < f64::EPSILON {
return CellValue::Date(date);
}
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excel_serial_to_datetime claims to match Excel/openpyxl’s 1900 leap-year correction, but it currently just adds days to the 1899-12-30 epoch with no adjustment for serials in (0, 60). That will produce off-by-one dates for early-1900 serials. Also, casting a negative days to u64 can wrap and yield incorrect results. Consider implementing the same (0 < serial < 60) adjustment used elsewhere and handle negative serials explicitly.

Suggested change
let days = serial.trunc() as i64;
let frac = serial.fract();
// Excel treats 1900-01-00 as serial 0 and has the famous 1900 leap-year
// bug; matching openpyxl's correction.
let base = NaiveDate::from_ymd_opt(1899, 12, 30).expect("static date");
let date = base
.checked_add_days(chrono::Days::new(days as u64))
.unwrap_or(base);
if frac.abs() < f64::EPSILON {
return CellValue::Date(date);
}
let mut days = serial.floor() as i64;
let frac = serial - (days as f64);
// Excel treats 1900-01-00 as serial 0 and has the famous 1900 leap-year
// bug; matching openpyxl's correction for serials before the fake
// 1900-02-29 entry.
if serial > 0.0 && serial < 60.0 {
days += 1;
}
let base = NaiveDate::from_ymd_opt(1899, 12, 30).expect("static date");
let date = if days >= 0 {
base.checked_add_days(chrono::Days::new(days as u64))
} else {
base.checked_sub_days(chrono::Days::new((-days) as u64))
}
.unwrap_or(base);
if frac.abs() < f64::EPSILON {
return CellValue::Date(date);
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 799e8fb: added the openpyxl +1 correction for serials in (0, 60). Serial 1 now maps to 1900-01-01 and serial 59 to 1900-02-28; serial 61 stays 1900-03-01 (skipping the fake serial-60 leap day). Negative serials now use checked_sub_days instead of letting the as u64 cast wrap. New tests excel_serial_matches_openpyxl_for_pre_leap_serials and excel_serial_negative_does_not_wrap lock the boundaries.

Comment thread tests/parity/test_read_parity.py Outdated
Comment on lines +218 to +225
def test_ratchet_soft_failures_nondecreasing() -> None:
"""SOFT mismatch count must never rise above the committed baseline.

This runs after all ``test_read_parity`` cases collected observations,
so pytest's ordering (tests collected alphabetically) means this test's
``test_`` name must sort AFTER the read tests. It does:
``test_ratchet_...`` > ``test_read_parity``.
"""
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ratchet_soft_failures_nondecreasing relies on pytest collecting tests alphabetically, but test_ratchet_... sorts before test_read_parity, so _OBSERVATIONS will still be empty and the ratchet check will always skip. Rename the ratchet test to sort after the read tests (e.g. prefix with zz_) or enforce this check via a session-level hook/fixture finalizer that runs after all parametrized cases.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 799e8fb: renamed to test_zz_ratchet_soft_failures_nondecreasing. The zz_ prefix sorts last under any plain alphabetical ordering, so _OBSERVATIONS is populated by the time the ratchet check runs. Robust to pytest-xdist / pytest-randomly reordering without needing a session finalizer.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f0f19f2c21

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/calamine_styled_backend.rs Outdated
Comment on lines +904 to +907
let has_formula_backing_entry = value.is_some();
let should_emit_formula = formula.is_some()
&& !data_only
&& (include_formula_blanks || has_formula_backing_entry);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor include_formula_blanks for uncached formulas

read_sheet_records() currently treats any present cell payload as a backing value (value.is_some()), so formula cells whose cached value is just the formula text still count as “backed” and are emitted even when include_formula_blanks=False. This makes the flag ineffective for template-only formulas in the common data_only=False path and returns records callers explicitly asked to suppress.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged - tracked as a follow-up. include_formula_blanks for uncached formulas is a real semantics gap, but it's in the legacy single-crate src/calamine_styled_backend.rs that this PR doesn't touch. The next sprint's wolfxl-core extraction (planned ~0.5.0) will need to either honor the flag at the new Sheet::load boundary or document the divergence. Filing as an issue rather than expanding the diff here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked as #5 with the acceptance criteria spelled out, so it doesn't get lost when wolfxl-core grows the options surface in 0.5.0.

Comment on lines +67 to +68
let has_date = lower.contains('y') || lower.contains('d') || lower.contains("mmm");
let has_time = lower.contains('h') || lower.contains(":mm") || lower.contains(':');
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Strip bracketed tags before date-token classification

The date/time heuristic checks raw substrings (contains('d'), etc.) on the full format code, so non-date numeric formats with color tags like #,##0_);[Red](#,##0) are misclassified as Date because of the d in Red. This will mislabel numeric columns in any workflow consuming classify_format output.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 799e8fb: classify_format now strips [...] segments before the date/time substring scan via a new strip_bracketed_tags helper. The currency check still runs on raw fmt to preserve [$-409] detection. Locked in by bracketed_tags_dont_trigger_date_or_time_heuristic covering [Red], [Red]-, and [h]:mm:ss (the legitimate elapsed-hour case still classifies as Time).

The CLI's public-facing copy (READMEs, CHANGELOG, doc comments, test
names) leaned on a "byte-parity port of <prior-tool>" framing. That
framing leaks the migration motivation into the published artifact and
positions the crate as a derivative rather than as its own thing.

Rewrite the affected text to describe what the renderers do (integer
thousand-grouping, two-decimal floats, ISO dates, RFC 4180 CSV quoting,
stable JSON shape) and why the contract is locked in (golden tests in
tests/cli.rs back the token-cost figures the spreadsheet-peek skill
publishes). The WHY-context behind each format choice survives; the
prior tool's name doesn't appear in any committed artifact.

Tests stay green: 6 unit + 11 integration tests pass after the rename
trim_float_matches_xleak_display -> trim_float_matches_format_contract
and the `*_matches_xleak_*` -> `*_matches_golden_*` renames.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wolfiesch wolfiesch changed the title feat: wolfxl-core + wolfxl-cli (0.4.0) — Rust workspace, peek CLI at xleak parity, PyO3 0.28 feat: wolfxl-core + wolfxl-cli 0.4.0 + PyO3 0.28 (Python 3.14 support) Apr 19, 2026
Copilot:
- _scoring.py, openpyxl_surface.py: replace StrEnum (3.11+) with
  (str, Enum) mixin so the parity harness imports on Python 3.9-3.10
  (CI matrix includes 3.9).
- conftest.py: dedupe fixtures on relative-path + size, not just
  filename + size; same-named files in different snapshot subfolders
  no longer silently collapse and shrink coverage.
- _scoring.py: doc-clarify the SOFT tier - count is ratcheted, never
  allowed to *increase* over baseline (regressions block CI).
- test_read_parity.py: rename ratchet test to test_zz_ so plain
  alphabetical collection order keeps it last (depends on per-fixture
  observations).

Codex P2:
- format.rs: strip [...] segments before the date/time substring scan.
  [Red] / [h] etc previously misclassified plain numeric formats like
  '#,##0_);[Red](#,##0)' as Date. Currency check still runs on raw fmt
  to preserve [$-409] detection. Locked in by new test.
- sheet.rs: Excel 1900 leap-year correction. openpyxl uses base
  1899-12-30 to dodge the fake serial-60 bug for serials >= 60, but
  that leaves serials 1..59 off by one day. Added +1 correction so
  serial 1 -> 1900-01-01 (matches openpyxl). Also switched negative
  serials to checked_sub_days to avoid u64 wrap producing far-future
  dates. Two new tests pin the boundaries (1, 59, 61, 44197, -100).

Lint:
- python/wolfxl/utils/cell.py: ruff UP033 - lru_cache(maxsize=None) -> cache.
- test_utils_parity.py: ruff I001 import sort.
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review — addressed all 8 review findings in 799e8fb. Summary:

  • 6 Copilot findings: StrEnum(str, Enum) mixin (3.9-compat) in _scoring.py + openpyxl_surface.py; conftest dedupe key now uses relative path (no silent fixture collapse); SOFT docstring corrected to match the ratchet > check; ratchet test renamed test_zz_ for stable post-observation collection order; Excel 1900 leap-year correction (+1 for serials in 0..60) + negative-serial guard in sheet.rs.
  • 1 Codex P2: format.rs strips bracketed tags before date/time substring scan via new strip_bracketed_tags helper; currency check still runs raw to keep [$-409] detection.
  • 1 Codex P2 deferred + tracked as Honor include_formula_blanks for uncached formulas in wolfxl-core #5 (include_formula_blanks for uncached formulas — lives in legacy backend that this PR doesn't touch; will be addressed at the wolfxl-core options surface in 0.5.0).

CI: 9/9 green (lint, parity, 6× pytest matrix on ubuntu/macos/windows × 3.9/3.12).

@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 799e8fbf00

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/calamine_styled_backend.rs Outdated
Comment on lines +904 to +907
let has_formula_backing_entry = value.is_some();
let should_emit_formula = formula.is_some()
&& !data_only
&& (include_formula_blanks || has_formula_backing_entry);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Respect include_formula_blanks for empty formula results

When include_formula_blanks=False, template formulas without cached values should be omitted, but has_formula_backing_entry is currently computed as value.is_some(). Inside a rectangular range, calamine returns Some(Data::Empty) for blank cells, so this flag becomes true and should_emit_formula still emits the formula record. In sheets where uncached formulas sit between populated cells, callers cannot suppress these placeholder formulas even though they explicitly opted out.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69d0ae7.

Root cause was deeper than described: calamine returns Some(Data::String("")) for the empty <v/> element on a formula cell (not Some(Data::Empty) as suggested), and Some(Data::Empty) only for blanks inside the rectangular bounding box. Both forms are semantically "the formula has no cached result."

Fix hoists value_is_formula_placeholder (using data_is_formula_text) and applies it to both has_value and has_formula_backing_entry, so:

  • include_formula_blanks=false actually suppresses the cell (it no longer falls through has_value via the empty string)
  • data_only=true still skips it via value_is_uncached_formula

Locked in by a new pytest in tests/test_wolfxl_compat.py::test_cell_records_include_formula_blanks_false_skips_uncached_inside_range that constructs the exact B2 scenario via openpyxl. 506 passed locally.

Comment thread crates/wolfxl-core/src/sheet.rs Outdated
Comment on lines +152 to +156
let secs = (frac * 86_400.0).round() as u32;
let h = secs / 3600;
let m = (secs % 3600) / 60;
let s = secs % 60;
let time = NaiveTime::from_hms_opt(h.min(23), m.min(59), s.min(59))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Carry rounded day-fraction overflow before building time

frac * 86_400 is rounded to whole seconds, so fractions very close to 1.0 can become 86400. This path then clamps h with h.min(23) instead of carrying into the next day, producing an incorrect timestamp (for example a near-midnight value can become 23:00:00 on the same date). This creates deterministic off-by-one-day/time errors for valid serials near day boundaries.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69d0ae7.

Replaced the h.min(23) clamp with a proper day-carry: when (frac * 86_400).round() lands on 86_400 (a full day), advance the date by one and reset secs to 0, so 44197 + 0.99999999 now returns 2021-01-02 00:00:00 instead of 2021-01-01 23:00:00.

Locked in by excel_serial_carries_near_midnight_fraction_to_next_day in crates/wolfxl-core/src/sheet.rs tests, plus a code comment explaining what the prior clamp was doing wrong. All 12 wolfxl-core tests + doctest still pass.

P1 (calamine_styled_backend.rs): include_formula_blanks=False was a
no-op for template formulas. Two storage cases both meant "no cached
result":
  - Some(Data::Empty)         - blank inside the rectangular bounding box
  - Some(Data::String(""))    - empty <v/> child of a formula cell
Old `value.is_some()` flagged both as backing entries, and `has_value`
also stayed true for the empty-string case, so the cell either emitted
as data_type=formula (when include_formula_blanks=True) or as
data_type=string with value="" (when False). Neither matched user
intent.

Hoist the "looks like the formula text" heuristic out of
`value_is_uncached_formula` (data_only-only) into a
`value_is_formula_placeholder` that also runs in `data_only=False`
mode, then use it to gate both `has_formula_backing_entry` and
`has_value`. Result: a formula whose calamine value is Empty, "", or
the formula text itself is treated uniformly as uncached - emit when
include_formula_blanks=True, suppress when False. Locked in by new
test `test_cell_records_include_formula_blanks_false_skips_uncached_inside_range`.

P2 (sheet.rs): excel_serial_to_datetime clamped `h.min(23)` instead
of carrying when `frac * 86_400` rounded up to exactly 86_400 (e.g.
44197 + 0.99999999). That produced 23:00:00 on the wrong date for
fractions arbitrarily close to 1.0. Carry into next day via
checked_add_days, drop the defensive clamps (which were hiding the
rollover bug). New test
`excel_serial_carries_near_midnight_fraction_to_next_day` pins
44197.99999999 -> 2021-01-02 00:00:00.

Closes the P2 (now P1) escalation Codex flagged on issue #5.
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69d0ae7543

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +649 to +650
def max_row(self) -> int:
return self._max_row()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Include buffered writes in max_row/max_column

The newly exposed max_row/max_column properties delegate to _max_row()/_max_col(), but in write mode those helpers only inspect _cells and ignore _append_buffer/_bulk_writes. As a result, after append() or write_rows() (before materialization/save), these properties under-report sheet size (often staying at 1), which can cause downstream range calculations and iteration bounds to miss newly added data.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31c4259.

Extracted _pending_writes_bounds() as the single source of truth over _cells + _append_buffer + _bulk_writes, and folded it into _max_row() / _max_col() so write-mode append() and write_rows() are reflected before save. Read-mode path still calls _read_dimensions() (preserving <dimension>-tag parity with openpyxl — discovered during the refactor that routing everything through read_sheet_bounds breaks tests/parity/test_read_parity.py on workbooks where calamine's parsed range is narrower than the XML tag).

Locked in by tests/test_wolfxl_compat.py::test_max_row_max_column_reflect_pending_writes_in_write_mode — two append() calls of widths 3 and 4 expects max_row == 2, max_column == 4.

Comment on lines +587 to +591
if wb._rust_reader is not None: # noqa: SLF001
bounds = wb._rust_reader.read_sheet_bounds(self._title) # noqa: SLF001
if isinstance(bounds, tuple) and len(bounds) == 4:
return tuple(int(value) for value in bounds) # type: ignore[return-value]
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Account for unsaved edits in calculate_dimension

_read_dimension_bounds() returns Rust-reader bounds whenever _rust_reader exists, but modify mode also has _rust_reader while pending edits live in Python-side buffers/cells. This means calculate_dimension() can return stale pre-edit bounds in the same session after users modify cells, so callers that derive ranges from calculate_dimension() may omit unsaved writes.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31c4259.

_read_dimension_bounds() now unions on-disk Rust bounds with _pending_writes_bounds() instead of short-circuiting to one or the other. So in modify mode, calculate_dimension() reflects the union of <dimension> extents and pending Python-side cell edits / append buffers / bulk writes.

Locked in by tests/test_wolfxl_compat.py::test_calculate_dimension_includes_modify_mode_pending_edits — opens an A1:B2 workbook in modify mode, writes ws['E10'], asserts calculate_dimension() == 'A1:E10'.

Comment thread crates/wolfxl-core/src/sheet.rs Outdated
/// Excel serial date → chrono. Sub-day fractions become Time; ≥1.0 with no
/// fractional part becomes Date; otherwise DateTime.
fn excel_serial_to_datetime(serial: f64) -> CellValue {
if serial < 1.0 && serial > 0.0 {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle Excel serial 0 as midnight time

The time-only branch excludes serial == 0 by checking serial > 0.0, so a zero serial falls through to the date path and is returned as 1899-12-30 instead of 00:00:00. For time-formatted cells containing midnight, this produces incorrect values in wolfxl-core (and therefore CLI output built on it).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31c4259.

Loosened the time-only branch from serial > 0.0 to serial >= 0.0, so excel_serial_to_datetime(0.0) now returns Time(0, 0, 0) instead of falling through to Date(1899-12-30). Matches openpyxl's from_excel(0) behavior for time-formatted cells.

Locked in by excel_serial_zero_returns_midnight_time in crates/wolfxl-core/src/sheet.rs.

P1 (python/wolfxl/_worksheet.py:650, comment 3106693767): `max_row` /
`max_column` previously delegated to `_max_row()` / `_max_col()`, which
in pure write mode looked only at `_cells` and ignored the
`_append_buffer` / `_bulk_writes` fast paths. After `ws.append([...])`
or `ws.write_rows(...)`, callers got 1 instead of the buffered extent
until materialization, so anything deriving ranges from `max_row` /
`max_column` (e.g. `iter_rows` bounds) silently missed appended data.

P2 (python/wolfxl/_worksheet.py:591, comment 3106693768):
`_read_dimension_bounds()` short-circuited to Rust-reader bounds whenever
`_rust_reader` existed, which is also the case in modify mode where
pending edits live in Python-side buffers. `calculate_dimension()` thus
returned the on-disk extents even after `ws['E10'] = 'x'`, and any range
derived from it omitted unsaved cells.

Fix:
- Extract `_pending_writes_bounds()` — single-source-of-truth scan over
  `_cells` + `_append_buffer` + `_bulk_writes`, returning
  `(min_row, min_col, max_row, max_col)` or `None`.
- `_read_dimension_bounds()` unions Rust bounds with pending writes
  instead of replacing one with the other.
- `_max_row()` / `_max_col()` keep calling `_read_dimensions()` (which
  honors the on-disk `<dimension>` XML tag — required for openpyxl
  parity, see `tests/parity/test_read_parity.py` regression caught
  during refactor) and additionally union with pending writes.

P2 (crates/wolfxl-core/src/sheet.rs:122, comment 3106693771):
`excel_serial_to_datetime(0.0)` fell through the `serial > 0.0` time
guard and returned `Date(1899-12-30)` instead of `Time(00:00:00)`.
Time-formatted cells containing midnight were emitted as the Excel
epoch date by the CLI. Loosened the guard to `serial >= 0.0` so 0.0
maps to `Time(0, 0, 0)`, matching openpyxl.

Lock-in tests:
- `test_max_row_max_column_reflect_pending_writes_in_write_mode` —
  pure write mode, two `append()` calls of widths 3 and 4 → expects
  `max_row == 2` and `max_column == 4` before save.
- `test_calculate_dimension_includes_modify_mode_pending_edits` —
  modify mode, on-disk `A1:B2` plus pending `ws['E10'] = ...` → expects
  `calculate_dimension() == 'A1:E10'`.
- `excel_serial_zero_returns_midnight_time` — asserts
  `excel_serial_to_datetime(0.0) == Time(00:00:00)`.

605 Python passed, 11 wolfxl-core unit + 11 wolfxl-cli unit + 1
integration + 1 doctest all green locally.
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 31c4259ce7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/wolfxl-core/src/sheet.rs Outdated
Comment on lines +125 to +131
let secs = (serial * 86_400.0).round() as u32;
let h = secs / 3600;
let m = (secs % 3600) / 60;
let s = secs % 60;
return NaiveTime::from_hms_opt(h, m, s)
.map(CellValue::Time)
.unwrap_or_else(|| CellValue::Float(serial));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle sub-day serial rounding overflow in time conversion

For serial values in [0, 1), this branch rounds to whole seconds and immediately builds a NaiveTime. Inputs near 1.0 (for example 0.99999999) round to 86400, which makes h == 24, so from_hms_opt fails and the code falls back to CellValue::Float(serial). That silently changes a valid time-like cell into a numeric float instead of a time value, unlike the date+time branch below which already carries >= 86400 seconds safely.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3667122. The sub-day branch now carries seconds==86_400 down to 0 before calling NaiveTime::from_hms_opt, mirroring the day-fraction carry already present in the date+time branch. Added unit test excel_serial_sub_day_near_midnight_carries_to_zero_time to lock it in. Without the carry, 0.99999999 silently demoted a time-typed cell to CellValue::Float.

Comment on lines +493 to +494
if self._workbook._rust_reader is None: # noqa: SLF001
yield from self._iter_cell_records_python(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid stale record reads after unsaved edits in modify mode

This condition routes any worksheet with a Rust reader through the bulk Rust path, which means modify mode (_rust_reader + pending Python-side edits) reads from on-disk data only. As a result, iter_cell_records() / cell_records() can return stale values after in-memory edits (ws['A1']=..., append(), write_rows()) until save() runs, which is inconsistent with the current worksheet state.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3667122. iter_cell_records now collects a pending overlay (_dirty + _append_buffer + _bulk_writes) and patches Rust-returned records plus yields any pending cells outside the original disk range that fall within the requested bounds. Pure read mode is unchanged - the overlay dict is empty and we skip it. Added test_cell_records_overlays_modify_mode_pending_edits to lock in the modify+overlay behavior.

…de overlay)

P2 (crates/wolfxl-core/src/sheet.rs:131, comment 3106711010):
For ``serial`` in ``[0, 1)``, the time-only branch rounded
``(serial * 86_400).round()`` to ``86_400`` for inputs near 1.0 (e.g.
``0.99999999``), making ``h == 24`` and ``NaiveTime::from_hms_opt``
return ``None``. The fallback path silently demoted a time-typed cell to
``CellValue::Float(serial)`` — wrong both in shape and meaning.

Fix mirrors the day-carry the date+time branch already uses below: when
``secs >= 86_400``, subtract one day's worth so the value lands on
``Time(00:00:00)`` (the "next midnight" of a sub-day-only value, with
no date to carry into).

P2 (python/wolfxl/_worksheet.py:494, comment 3106711012):
``iter_cell_records()`` routed through the Rust reader whenever
``_rust_reader`` existed. In modify mode that meant on-disk values
masked Python-side pending edits — ``ws['A1'] = 'new'`` followed by
``ws.cell_records()`` returned the pre-edit value, so anything
ingesting via this API saw stale state until ``save()``.

Fix overlays pending edits on top of the Rust record stream:
- ``_collect_pending_overlay()`` returns ``{(row, col): value}`` for
  ``_dirty`` cells + ``_append_buffer`` + ``_bulk_writes``. Empty when
  nothing is pending — pure read mode pays no extra cost (the hot
  ``yield from records`` path is unchanged).
- For each Rust record, swap in the overlay value (and recompute
  ``data_type``) when the cell is dirty.
- After streaming Rust records, yield overlay entries that fell inside
  the requested range but weren't returned by Rust (e.g. empty-on-disk
  cells the user just set).

Lock-in tests:
- ``excel_serial_sub_day_near_midnight_carries_to_zero_time`` —
  ``excel_serial_to_datetime(0.99999999) == Time(00:00:00)``.
- ``test_cell_records_overlays_modify_mode_pending_edits`` — opens
  workbook with on-disk ``A1='old-a1'``, ``B2='old-b2'``; in modify
  mode overwrites A1, adds E10 outside on-disk range; asserts
  cell_records sees ``A1='new-a1'``, ``B2='old-b2'``, ``E10='added'``.

606 Python passed, 12 wolfxl-core + 11 wolfxl-cli + 1 integration + 1
doctest all green locally.
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3667122670

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +624 to +625
if !data_only {
// Check the fast formula map (parsed from worksheet XML in a single pass).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Treat uncached formula placeholders as blank in data_only reads

When data_only=True, this branch skips formula handling and falls through to data_to_py, but formula cells without cached results are represented by calamine as placeholder values (e.g. empty string). That makes read_cell_value (and the bulk readers with the same pattern) return a non-blank value instead of the openpyxl-compatible blank/None result for uncached formulas, which can corrupt downstream ingestion in template-heavy sheets. This file already contains explicit placeholder handling in read_sheet_records, so the data-only readers should apply the same normalization.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 383b986. Added a shared is_uncached_formula_value helper and applied it in read_cell_value, read_sheet_values, and read_sheet_values_plain so all three readers normalize calamine's Some(Data::String("")) / Some(Data::Empty) placeholders to blank/None when data_only=true, matching what read_sheet_records already does. Tests test_read_cell_value_data_only_blanks_uncached_formula and test_iter_rows_values_only_blanks_uncached_formulas lock in the per-cell and bulk paths respectively.

Comment thread python/wolfxl/_worksheet.py Outdated
Comment on lines +543 to +545
patched["data_type"] = (
"blank" if new_value is None else type(new_value).__name__
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve canonical data_type labels in overlay records

The overlay path assigns data_type from Python runtime type names (str, int, bool, etc.), while non-overlay records from Rust use the API’s canonical labels (string, number, boolean, ...). In modify mode with pending edits, this produces mixed schemas in a single cell_records() result and breaks consumers that filter/group by the documented canonical tokens.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 383b986. Added _canonical_data_type helper that mirrors Rust's data_type_name closed set (string / number / boolean / date / datetime / blank). Routed both overlay emission paths (patched records and extra-overlay records) plus the python-side fallback iterator through it. Test test_cell_records_overlay_uses_canonical_data_type_labels covers all four primitive types including the bool-before-int subclass check.

…al labels)

P1 (3106730333) — Treat uncached formula placeholders as blank in data_only
reads. read_sheet_records already normalizes calamine's
Some(Data::String("")) and Some(Data::Empty) placeholders for formula cells
with no cached <v/> result; per-cell read_cell_value, read_sheet_values, and
read_sheet_values_plain now apply the same check via a shared
is_uncached_formula_value helper. Without this, data_only reads on
template-heavy sheets returned empty strings or zero-equivalents instead of
the openpyxl-compatible blank/None.

P2 (3106730334) — Preserve canonical data_type labels in overlay records.
Rust's read_sheet_records emits closed-set labels (string / number / boolean
/ date / datetime / blank). Overlay records previously used Python runtime
type names (str / int / bool), producing mixed schemas in modify mode that
broke consumers filtering on the documented tokens. Routed both overlay
emission paths plus the python-side fallback iterator through a new
_canonical_data_type helper that mirrors Rust's data_type_name closed set.

Tests added (all green):
- test_read_cell_value_data_only_blanks_uncached_formula
- test_iter_rows_values_only_blanks_uncached_formulas
- test_cell_records_overlay_uses_canonical_data_type_labels

Verification: 609 Python passed (was 606, +3 new), 12 wolfxl-core unit + 1
doctest, 11 wolfxl-cli unit + 1 integration, 97 parity passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 383b986118

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread python/wolfxl/_worksheet.py Outdated
Comment on lines +717 to +721
cells = self._cells
buf = self._append_buffer
bulk = self._bulk_writes
if not cells and not buf and not bulk:
return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Exclude read-only cached cells from pending bounds

_pending_writes_bounds() treats every entry in _cells as a pending write, but _cells is populated by read access (ws['A1'], non-values_only iteration) even when nothing was modified. Because _max_row, _max_col, and calculate_dimension() now union against this helper, merely touching a far cell can inflate worksheet bounds (for example, reading Z999 makes bounds look like row 999/col 26), which can trigger oversized scans and incorrect range calculations in downstream callers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4dea6a7. Switched _pending_writes_bounds to iterate self._dirty (the set of actually-modified cell keys) instead of self._cells (the read cache). Test test_max_row_does_not_inflate_from_read_only_cell_access verifies that reading ws['Z999'].value leaves max_row/max_column/calculate_dimension untouched, then confirms a real assignment grows them as expected.

Comment thread python/wolfxl/_worksheet.py Outdated
Comment on lines +29 to +30
if isinstance(value, str):
return "string"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve formula data_type in overlay record patches

Overlay records in iter_cell_records() rely on _canonical_data_type(), which classifies all strings as "string". This mislabels unsaved formula edits (e.g. values starting with =) as plain strings, so pending formula changes lose data_type="formula" semantics compared with Rust-backed records. Any consumer that filters or counts formula records will miss modified-but-unsaved formulas.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4dea6a7. _canonical_data_type now classifies string values starting with '=' as 'formula' to match openpyxl's convention and Rust's formula_map_cache path. Test test_cell_records_overlay_labels_pending_formulas_as_formula confirms a pending ws['B2'] = '=SUM(A1:A1)' shows up as data_type='formula' in cell_records().

P1 (3106743052) — Exclude read-only cached cells from pending bounds.
`_pending_writes_bounds` previously iterated `self._cells`, which is
populated by mere read access (`ws['Z999'].value` materializes a Cell
without dirtying it). Because `_max_row`/`_max_col`/`calculate_dimension`
union against this helper, reading a far cell silently inflated worksheet
bounds and triggered oversized scans in downstream callers. Switched the
iteration target to `self._dirty` (the set of actually-modified cell
keys), which is the same source `_collect_pending_overlay` already uses.

P2 (3106743053) — Preserve formula data_type in overlay record patches.
`_canonical_data_type` previously labeled all strings as "string", so a
pending edit like `ws["B2"] = "=SUM(A1:A1)"` showed up in
`cell_records()` as `data_type="string"` while a saved-and-reloaded copy
showed `"formula"` (the Rust formula_map_cache path classifies it as such).
Mixed schemas in a single result broke any consumer counting/filtering
formula records. The helper now mirrors openpyxl's convention: a string
beginning with `=` is `"formula"`.

Tests added (both green):
- test_max_row_does_not_inflate_from_read_only_cell_access
- test_cell_records_overlay_labels_pending_formulas_as_formula

Verification: 611 Python passed (was 609, +2 new), 97 parity passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dea6a729e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +575 to +577
patched = dict(record)
patched["value"] = new_value
patched["data_type"] = _canonical_data_type(new_value)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recompute formula metadata when overlaying edited records

When cell_records() overlays pending modify-mode edits, it only updates value and data_type on top of the on-disk record. If a user replaces an existing formula cell with a literal (or the reverse), the stale formula field from the original record is left behind, so downstream consumers can see contradictory metadata for the same cell and mis-handle formula detection.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in da9431f. The overlay patch now refreshes the formula field instead of leaving the on-disk text behind: a literal-overwrites-formula edit pops the key, and a formula-overwrites-anything edit sets it from the new value (leading = stripped to match the Rust reader's convention). Lock-in tests test_cell_records_overlay_clears_stale_formula_metadata_on_literal_swap and test_cell_records_overlay_replaces_formula_metadata_on_formula_swap cover both directions.

Comment on lines +529 to +533
min_row=min_row,
max_row=max_row,
min_col=min_col,
max_col=max_col,
include_empty=include_empty,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Honor include_coordinate in write-mode cell_records fallback

In write mode (_rust_reader is None), iter_cell_records() forwards only include_empty to the Python fallback and drops include_coordinate, even though the API docs promise include_coordinate=False avoids A1 allocation. This makes write-mode output/schema diverge from read mode and forces unnecessary coordinate generation for large in-memory sheets.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in da9431f. iter_cell_records now forwards include_coordinate to _iter_cell_records_python, and the fallback only sets the coordinate key when requested. Lock-in test test_iter_cell_records_write_mode_honors_include_coordinate constructs a fresh Workbook() (no Rust reader → fallback path) and asserts the key is present/absent as the flag dictates.

…oordinate in write fallback

Two issues from Codex round-6 review on PR #4 commit 4dea6a7:

P2 (cell_records overlay leaked stale formula text). When a pending edit
replaced an on-disk formula cell with a literal, only `value`/`data_type`
were patched — the original `formula` field stayed on the record, so a
consumer would see `data_type='number'` next to a `formula='SUM(...)'`
field for the same cell. The reverse (literal → formula) failed to set
the new formula text. Now: literal overwrites pop the `formula` key;
formula overwrites set it from the new value (leading `=` stripped to
match Rust's reader convention).

P3 (write-mode fallback dropped include_coordinate). `iter_cell_records`
forwarded only `include_empty` to `_iter_cell_records_python`, even though
the API docs promise `include_coordinate=False` skips A1 string allocation.
Write mode emitted a `coordinate` key unconditionally, diverging from
read mode and forcing wasted work on large in-memory sheets.

Lock-in tests:
- test_cell_records_overlay_clears_stale_formula_metadata_on_literal_swap
- test_cell_records_overlay_replaces_formula_metadata_on_formula_swap
- test_iter_cell_records_write_mode_honors_include_coordinate

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

The new write-mode coordinate test never saves a file, so requesting the
`tmp_path` pytest fixture and binding `path = tmp_path / "..."` was dead
weight that ruff F841 caught on CI lint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da9431ff39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +600 to +604
extra: dict[str, Any] = {
"row": row,
"column": col,
"value": value,
"data_type": _canonical_data_type(value),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Include formula text in overlay-only formula records

When iter_cell_records() overlays pending edits in modify mode, the branch that emits records not returned by Rust only sets row, column, value, and data_type. If a pending edit is a formula string (for example a new "=SUM(...)" in an on-disk empty cell), the emitted record becomes data_type="formula" but has no formula field, unlike formula records from the Rust path. This drops expression metadata for unsaved edits and breaks consumers that read record["formula"] for formula cells.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5f6cc0c. The extra-overlay branch now emits the formula key alongside data_type='formula' when the pending value is a "=..." string, with the same leading-= strip applied for Rust-style consistency. Lock-in test test_cell_records_extra_overlay_emits_formula_field_for_pending_formula writes a formula into a previously empty cell (so the Rust reader returns no record for it) and asserts the resulting overlay record has both data_type='formula' and the expected formula field.

Comment thread python/wolfxl/_worksheet.py Outdated
if buf:
start = self._append_buffer_start
buf_max_r = start + len(buf) - 1
buf_max_c = max(len(row) for row in buf)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prevent pending bounds from producing column index 0

_pending_writes_bounds() computes appended-row width with max(len(row) for row in buf). If buffered appends include an empty row (e.g. append([])), this yields 0, which can propagate to _max_col()==0; downstream 1-based coordinate helpers (like rowcol_to_a1 via dimension/reporting paths) then see an invalid column index. Clamp this width to at least 1 (or ignore zero-width rows) so pending bounds never generate column 0.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5f6cc0c. _pending_writes_bounds() now uses max((len(row) for row in buf), default=0) and only contributes to the column-bounds branch when the width is >0. The same guard is applied to the bulk-write branch. An empty appended row still consumes a row index (so max_r grows) but leaves max_c alone, so _max_col() can never collapse to 0 from a zero-width append. Lock-in test test_pending_writes_bounds_skips_zero_width_appended_rows invokes ws.append([]) and asserts calculate_dimension() doesn't leak A0 into the result.

…bounds against zero-width rows

Two findings from Codex round-7 review on PR #4 commit caa57ad:

P1 (overlay-only formula records dropped expression metadata). Round 6
fixed the patched-overlay branch (where a Rust-returned record gets a
field swap) to refresh `formula` on edits. The parallel "extra-overlay"
branch — for cells the Rust reader didn't return at all — built a fresh
record dict that omitted `formula` entirely, even when the new value
was a `"=..."` formula string. So a `data_type='formula'` record with
no expression leaked out for any pending formula written into a
previously empty cell. Fix: mirror the same `formula = value[1:]`
derivation in the extra-overlay constructor.

P2 (zero-width appended row could collapse max_col to 0). `_pending_writes_bounds()`
computed `max(len(row) for row in buf)` for the append buffer's
column extent. An `ws.append([])` (or a bulk grid where every row is
empty) yields 0 columns, but the code still set `min_c = 1`, leaving
the bounds as `min_c=1 / max_c=0` — invalid. Downstream
`rowcol_to_a1(_, 0)` would either raise or produce garbage A1.
Fix: switch to `max(..., default=0)` and gate the column-bounds
contribution on `>0`. Same guard added to the bulk-write branch.

Lock-in tests:
- test_cell_records_extra_overlay_emits_formula_field_for_pending_formula
- test_pending_writes_bounds_skips_zero_width_appended_rows

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f6cc0c123

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread python/wolfxl/_worksheet.py Outdated
Comment on lines +39 to +40
if isinstance(value, _dt.date):
return "date"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align overlay date labels with Rust record schema

iter_cell_records() overlays pending Python edits using _canonical_data_type(), which maps datetime.date to "date", but the Rust read_sheet_records() path emits temporal cells as "datetime" (via data_type_name). In modify mode, editing a date cell can therefore produce mixed data_type tokens in one result set, which breaks consumers that group/filter on the documented canonical labels. Either map Python date to "datetime" here or make the Rust path emit "date" consistently.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5d5fd84. _canonical_data_type now collapses all temporal types (datetime / date / time) to "datetime" to match Rust's data_type_name(), which emits a single "datetime" label for both Data::DateTime and Data::DateTimeIso. Updated the docstring to drop the now-absent "date" token. Lock-in test test_canonical_data_type_collapses_temporals_to_datetime asserts all three branches return "datetime".

Comment thread crates/wolfxl-core/src/sheet.rs Outdated
if frac.abs() < f64::EPSILON {
return CellValue::Date(date);
}
let secs = (frac * 86_400.0).round() as u32;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle negative fractional Excel serials without u32 wrap

For negative serials with fractional parts (for example -0.5), secs is computed as (frac * 86_400.0).round() as u32, and casting a negative value to u32 wraps to a huge positive number. The downstream carry logic then yields incorrect date/time results instead of a valid pre-epoch datetime, so pre-1900 timestamp cells can be corrupted. Keep this arithmetic signed until normalization (borrow/carry) is finished.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5d5fd84. The day-fraction arithmetic now uses signed i64; when secs_signed < 0 we borrow whole days off days (mirroring the existing >= 86_400 carry), then cast to u32 only after the value lands in [0, 86_400). Pulled days → NaiveDate into a days_to_date_from_excel_base helper since the borrow runs before date materialization now. Lock-in test excel_serial_negative_fractional_borrows_into_prior_day checks serial -0.5 lands on 1899-12-29 12:00:00.

…tional Excel serials

Two findings from Codex round-8 review on PR #4 commit 5f6cc0c:

P2 (overlay date label diverged from Rust). `_canonical_data_type()`
mapped `datetime.date` to `"date"`, but Rust's `data_type_name()` emits
`"datetime"` for both `Data::DateTime` and `Data::DateTimeIso`. In
modify mode, an overlay edit to a date cell would mix `"date"` and
`"datetime"` tokens inside one `cell_records()` result, breaking
consumers that group/filter by the documented vocabulary. Collapse all
temporal types (`datetime` / `date` / `time`) to `"datetime"` and update
the docstring to remove the now-absent `"date"` token.

P2 (negative fractional Excel serial wrapped to garbage). The day-fraction
arithmetic computed `(frac * 86_400.0).round() as u32`, which wraps a
negative f64 to a huge positive u32. For a serial like `-0.5` (12:00 the
day before the 1899-12-30 base), the wrap then drove the carry-forward
branch into a corrupted far-future datetime. Switched to signed i64
arithmetic, with a borrow when `secs < 0` mirroring the existing
`secs >= 86_400` carry. Pulled the `days → NaiveDate` step into a
helper since it now needs to run after the borrow/carry adjusts `days`.

Lock-in tests:
- test_canonical_data_type_collapses_temporals_to_datetime (Python)
- excel_serial_negative_fractional_borrows_into_prior_day (Rust)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d5fd844d5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +27 to +31
for r in 0..h {
let mut row: Vec<Cell> = Vec::with_capacity(w);
for c in 0..w {
let value = value_range
.get((r, c))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Offset cell iteration by the range origin

worksheet_range() can return a range whose origin is not (0, 0) (for example when the first populated cell is C5). This loop indexes value_range and style_range with raw 0..h / 0..w coordinates, but Range::get expects absolute sheet coordinates, so non-A1 sheets are read as blank or misaligned and their formats are attached to the wrong cells. Add the range start offsets when computing (row, col) before calling get.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging - I dug into the calamine-styles source before changing this. The claim that Range::get expects absolute coordinates is reversed: it's get_value that takes absolute coords; get takes relative ones. From calamine-styles 0.1.0/src/lib.rs:

// line 1099
pub fn get_value(&self, absolute_position: (u32, u32)) -> Option<&T> { ... }

// line 1140
pub fn get(&self, relative_position: (usize, usize)) -> Option<&T> {
    let (row, col) = relative_position;
    let width = self.width();
    if row >= self.height() || col >= width { None }
    else { self.inner.get(row * width + col) }
}

So value_range.get((r, c)) with r, c ∈ 0..h × 0..w (where (h, w) = value_range.get_size()) reads every cell in the range correctly, regardless of whether the range starts at A1 or C5. Cells are not "blank or misaligned" and number formats are attached to the right cells inside the range.

There is a legitimate but distinct concern in the neighborhood: Sheet's internal Vec<Vec<Cell>> doesn't preserve the absolute origin of the range, so sheet.rows()[0][0] represents the first populated cell rather than literal A1. The current public API (headers, rows, row, dimensions) exposes no absolute coordinates and the only consumer (crates/wolfxl-cli/src/render.rs) renders the populated block as a tabular preview — which matches xleak's behavior on non-A1 sheets — so v0.4.0 peek output is unaffected.

Tracking the absolute-origin work as a follow-up rather than blocking this PR: when we add --map / --cell A1 / citation features in sprint 2, Sheet will need a (start_row, start_col) field plus absolute-position accessors. Filing a follow-up issue now.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as #6 - Sheet absolute-origin work tracked there for sprint 2.

@wolfiesch
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Ok(Event::Empty(ref e)) => {
// Handle self-closing <c .../> (cells with no children — no formula).
if e.local_name().as_ref() == b"c" {
// No formula possible in an empty element, skip.
}

P1 Badge Handle shared-formula <f/> entries in XML parser

parse_formulas_from_sheet_xml only captures formulas when <f> contains text and the Event::Empty branch ignores empty formula elements, but shared-formula follower cells are commonly encoded as self-closing <f t="shared" si="..."/>. Those cells never reach formula_map_cache, so read_cell_value, read_sheet_values, and read_sheet_records in data_only=false mode can return cached/literal values instead of formula metadata for drag-filled formula ranges. This causes incorrect read semantics on many real XLSX files that use shared formulas.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@wolfiesch wolfiesch merged commit 303a162 into main Apr 19, 2026
9 checks passed
@wolfiesch wolfiesch deleted the wolfxl-cli-peek branch April 19, 2026 12:32
wolfiesch added a commit that referenced this pull request Apr 28, 2026
… §6 unit tests

Single-pass streaming merger with quick_xml::Reader + Writer. State:
BTreeMap<u32, Vec<SheetBlock>> by ECMA position (Vec to handle slot 17's
0..N conditionalFormatting), `replace_names: Vec<&[u8]>` for skip-on-match,
`cf_replace: bool` for the §5.5 special case, `needs_rel_ns: bool` for
the §8 risk #4 namespace injection.

Edge cases handled (RFC §8):
- risk #1 (extlst byte-preservation): Event::Start(e.borrow()) walks the
  source byte slice directly — quick-xml does not re-encode attributes.
  Headline test extlst_is_byte_preserved passes; the §8 byte-span fallback
  is unneeded.
- risk #2 (comments/PIs): flow through at source position via the catch-all
  `other` arm of the event match.
- risk #3 (self-closing <worksheet/>): Event::Empty(b"worksheet") path
  expands to <worksheet>...</worksheet> and flushes all pending blocks.
- risk #4 (xmlns:r missing): ensure_rel_namespace() detects any existing
  prefix bound to the relationships URI; only injects when actually needed.

Tests (20 total, 15 added in this commit):
- RFC §6 #1-#12 (test #1 already present from commit 1):
  insert_hyperlinks_into_minimal_sheet, replace_existing_hyperlinks,
  insert_into_correct_ecma_position, extlst_is_byte_preserved (HEADLINE),
  unknown_element_passthrough, multiple_conditionalformatting_blocks,
  conditionalformatting_replaces_all_existing, block_inserted_when_no_neighbors,
  tableparts_after_extlst_is_wrong_and_we_fix_it,
  large_sheet_streaming_memory_bounded,
  byte_identical_when_block_already_present_and_unchanged.
- Risk-fallback coverage: comments_pass_through_at_source_position,
  self_closing_root_expands_and_flushes, rel_namespace_injected_when_missing,
  rel_namespace_not_duplicated_when_already_present.

RFC-011 commit 2 of 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wolfiesch added a commit that referenced this pull request Apr 28, 2026
Adds the pure-Rust sheet-copy planner per RFC-035 §4.2 / §5. The planner
takes the source sheet's parsed rels, source workbook.xml, pre-loaded
ZIP parts, a shared PartIdAllocator, and the workbook-wide table-name
set; it returns SheetCopyMutations describing every byte the patcher
must add or rewrite to materialize the copy.

Implementation lives in crates/wolfxl-structural/src/sheet_copy.rs
(PyO3-free, so the 17 unit tests run via cargo test -p wolfxl-structural
without hitting the cdylib's macOS-arm64 link issue). The spec-mandated
location src/wolfxl/sheet_copy.rs is a thin re-export.

Highlights per the RFC:
* Single rid_remap pass over the cloned sheet XML (§8 risk #4 mitigation).
* Image aliasing for drawings -> images (§5.3 / §8 risk #2): drawing rels
  file is cloned with fresh rIds but image targets stay verbatim.
* Table name dedup uses f"{base}_{N}" starting at N=2 (OQ-b default).
* Sheet-scoped defined-names cloned with re-pointed localSheetId
  (OQ-c default).
* Validation: errors on duplicate dst_title, missing src_title, missing
  source bytes.
* Allocator-driven table/comments/vmlDrawing/drawing/sheet suffixes
  ensure no collisions with parallel subsystems running in the same save.

17 unit tests cover: basic clone, single-table dedup, multi-table dedup,
external hyperlink alias, comments+VML clone, drawing+image alias,
sheet-scoped defined-name clone, no defined-names case, three validation
errors, single-pass rid remap, dedup helper, placeholder rid consistency,
allocator advance across calls, missing-defined-names path, unknown rel
type aliasing.

cargo test -p wolfxl-structural: 99 -> 116 (+17). Workspace total green.
Pytest: 164 passed + 3 skipped (matches baseline).

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
wolfiesch added a commit that referenced this pull request Apr 28, 2026
…Pod-δ

Pod-δ closed bugs #1 (test_i), #2 (test_j), #3 (test_k), #5 (test_q)
from the "RFC-035 cross-RFC composition gaps" section. Move them to a
new "Fixed in 1.1 (Sprint Ζ Pod-δ)" subsection (closed bullets).

Bugs #4 (self-closing <sheets/>) and #6 (CDATA/PI fakeout) remain
deferred to 1.2 — both are only reachable through synthesized fixtures
or naive-splice fakeouts, no real Excel-emitted workbook hits them.

Also: close the stale "Sheet/column/row structural mutations | Raises"
row in the T1.5 table — RFC-030/031/034/036 all shipped in 1.1.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
wolfiesch added a commit that referenced this pull request Apr 28, 2026
Adds a §8.5 "Sprint Θ deliverables (1.2)" subsection to RFC-035
capturing the five Sprint Θ pods landing in parallel:

- Pod-A: bug #4 (test_p_self_closing_sheets_block) — Rust-level
  splice unit test for self-closing <sheets/>.
- Pod-B: bug #6 (test_r_cdata_pi_fuzz_fakeout) — XML-aware splice
  replacing the byte-level locator.
- Pod-C1: write-mode copy_worksheet (lifts §3 OQ-a NotImplementedError).
- Pod-C2: image deep-clone (CopyOptions(deep_clone_images=True);
  additive, default False to preserve 1.1 byte-stability).
- Pod-C3: calcChain rebuild (replaces "Excel rebuilds on next open"
  with a wolfxl-side rebuild post-Phase 2.7).

Also updates the Status header from "Shipped" to "Shipped (1.1
modify-mode core; 1.2 Sprint Θ closes follow-ups — see §8.5)".

TBD markers placeholder per-pod commit SHAs to be filled in by the
final Sprint Θ integration pass once Pods A/B/C land.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
wolfiesch added a commit that referenced this pull request Apr 28, 2026
…ing <sheets/>

Sprint Θ Pod-A — adds an opt-in loader fallback that unblocks workbooks
whose xl/workbook.xml has an empty (self-closing) <sheets/> block while
the workbook rels graph still references one or more worksheet parts.

Surface:
- wolfxl.load_workbook(path, ..., permissive=True)
- Workbook._from_reader / _from_patcher gain a permissive kwarg
- CalamineStyledBook.open and XlsxPatcher.open accept a permissive flag

Behavior when permissive=True AND the parsed <sheets> block is empty:
- Each worksheet relationship target in xl/_rels/workbook.xml.rels is
  registered under a synthesized title (Sheet1, Sheet2, ...).
- The in-memory xl/workbook.xml is rewritten to expose the synthesized
  <sheet> entries so downstream phases (Phase 2.7 splice, defined-names
  merger, ...) see a well-formed workbook. The rewrite lives in a
  load-time-seeded file_patches map drained by do_save.

Default is False; well-formed inputs are unaffected.

Closes RFC-035 cross-RFC composition bug #4 — flips
test_copy_worksheet_modify.py::test_p_self_closing_sheets_block from
xfail → PASS end-to-end (load → copy_worksheet → save → reload via
openpyxl). Full tests/test_copy_worksheet_modify.py: 18 passed, 1 xfail
(unrelated bug #6). KNOWN_GAPS.md updated to move bug #4 to the
"Fixed in 1.2 (Sprint Θ Pod-A)" section.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
wolfiesch added a commit that referenced this pull request Apr 28, 2026
… mode

Closes RFC-035 KNOWN_GAPS bug #4 (test_p_self_closing_sheets_block).

When permissive=True is passed to load_workbook, the loader handles
self-closing <sheets/> workbook.xml fixtures by scanning the rels graph
for worksheet targets and synthesizing sheet entries (Sheet1, Sheet2, ...)
backed by a load-time workbook.xml rewrite that re-populates the <sheets>
block.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants