feat: wolfxl-core + wolfxl-cli 0.4.0 + PyO3 0.28 (Python 3.14 support) by wolfiesch · Pull Request #4 · SynthGL/wolfxl

wolfiesch · 2026-04-19T09:20:49Z

Summary

Lands the Rust-side work for the wolfxl-cli-peek sprint: hybrid Cargo workspace, two new crates ready for crates.io (wolfxl-core, wolfxl-cli), and the PyO3 0.28 bump that unblocks Python 3.14.

wolfxl-core 0.4.0 - pure-Rust xlsx reader with format-aware cell rendering. No PyO3 coupling. Publishable for third-party Rust consumers.
wolfxl-cli 0.4.0 - installs the wolfxl binary. wolfxl peek <file> renders a styled box preview by default and text / csv / json exports tuned for piping into agent or shell pipelines (integer thousand-grouping, two-decimal floats, ISO dates, RFC 4180 CSV quoting including embedded \r/\n, stable JSON shape {sheet, rows, columns, headers, data}).
PyO3 0.24 -> 0.28 - Python 3.14 support. All 611 pytest tests pass on 3.12 and 3.14. Public Python API unchanged.

What's in here

Eight commits since main:

b9b6a1c refactor: convert wolfxl to hybrid Cargo workspace
e27d090 feat(core): scaffold wolfxl-core crate for CLI consumption
c32f181 feat(cli): wolfxl peek styled exports for text/csv/json
d987410 chore(deps): bump pyo3 0.24 -> 0.28 for Python 3.14 support
72910cf merge: pyo3 0.28 bump for Python 3.14 support
c375367 fix: address Codex pre-publish review findings
f0f19f2 docs: 0.4.0 CHANGELOG entry + per-crate READMEs
894aa79 docs: scrub explicit prior-tool refs from public copy

Pre-publish review

External Codex review (gpt-5.3-codex) flagged five CLI rendering bugs (uppercase True / False instead of lowercase, missing ERROR: prefix on Excel error sentinels, floats not forced to two decimals, CSV not quoting on \r, header row int-grouping leaking into header text) and one currency carry bug in wolfxl-core (format_currency(1.995, 2) returned \"\$1.100\" instead of \"\$2.00\" because trunc() and fract() were rounded separately). All addressed in c375367. Render contract is locked in by 11 golden integration tests + 6 unit tests in crates/wolfxl-cli.

Test plan

cargo test -p wolfxl-core -p wolfxl-cli - 18 tests pass (7 core + 11 CLI integration + 6 CLI unit on a separate target)
cargo publish --dry-run -p wolfxl-core - packages cleanly
cargo publish --dry-run -p wolfxl-cli - fails as expected (depends on unpublished wolfxl-core; resolves once core is on crates.io)
uv run --with pytest --with openpyxl --with formulas pytest -q on Python 3.14 - 611 passed, 1 skipped

Follow-up (not blocking publish)

Port xl/styles.xml cellXfs walker into wolfxl-core so number-format detection works on openpyxl-generated files (today they fall back to None).
Migrate Bound::downcast -> cast (22 deprecation warnings left from PyO3 bump).

🤖 Generated with Claude Code

…PI shape) Phase 0 — openpyxl parity infrastructure: - New tests/parity/ harness with HARD/SOFT/INFO scoring, ratchet enforcement, and 15 SynthGL xlsx fixtures across 7 archetypes. - openpyxl_surface.py contract enumerating every openpyxl symbol SynthGL imports (~30 entries across 8 categories). - KNOWN_GAPS.md tracks every wolfxl gap with a phase owner and fix sketch. - CI parity job runs the harness on every push (xdist-parallel, JUnit XML artifact). ExcelBench fidelity job stubbed out behind feature flag. Phase 0 cleanup — closed 12 surface gaps via Python-only changes: - New python/wolfxl/utils/{cell,datetime,numbers}.py expose the seven openpyxl utility symbols SynthGL depends on (get_column_letter, column_index_from_string, range_boundaries, coordinate_to_tuple, is_date_format, from_excel, CALENDAR_WINDOWS_1900). Bug-for-bug compatible with openpyxl 3.1.x — including the 1900 leap-year correction in from_excel and the [locale]-vs-[h] distinction in is_date_format. - wolfxl.Worksheet and wolfxl.Cell re-exported at top level for type-hint imports. - Worksheet.max_row / max_column shipped as @Property aliases over the existing _max_row() / _max_col() methods. - Worksheet.merged_cells shipped as a _MergedCellsProxy with an openpyxl- shape .ranges iterable, backed by the existing Rust read_merged_ranges call in read mode and the in-memory _merged_ranges set in write mode (closes the "merged_cells empty on read" per-fixture gap as a side-effect). Result: 93 passed, 4 xfailed (down from 55 xfailed). The four remaining xfails are wolfxl read-side bugs (date/datetime narrowing, data_only=True, number_format escape stripping, max_row off-by-one) — documented in KNOWN_GAPS.md with fix sketches pointing at the right wolfxl source files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ships the 0.3.2 CHANGELOG entry already drafted in the working tree: - Worksheet.iter_cell_records() / cell_records() return populated cells as dicts with values, formulas, coordinates, and compact format metadata. Per-call knobs: include_empty, include_format, include_formula_blanks, include_coordinate, data_only. - calculate_dimension() merges stale worksheet dimension tags with parsed value/formula storage and preserves offset used ranges (e.g. C4:C4). - max_row / max_column inherit the dimension hardening while preserving openpyxl bottom/right edge semantics. - Write mode dimension reporting now folds in buffered append()/write_rows() data before save. Closes the read-parity workstream on main ahead of the workspace refactor. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds [workspace] alongside the existing [package] in the root Cargo.toml so new sibling crates (wolfxl-core, wolfxl-cli) can land under crates/ without disturbing the existing wolfxl cdylib at the root. Why hybrid instead of moving wolfxl under crates/wolfxl/: maturin 1.12 reads the project-root Cargo.toml before consulting [tool.maturin].manifest-path, so a virtual workspace at root breaks both `maturin develop` and the PEP517 path used by `pip install -e .` / `uv pip install` (PyO3/maturin#1972). Keeping the wolfxl package definition at the root preserves the existing install UX while still giving us a workspace for the upcoming sibling crates. Inheritance setup: - [workspace.package] holds edition/license/repository, inherited by [package] via .workspace = true. - [workspace.dependencies] declares the shared deps so wolfxl-core / wolfxl-cli can pin once and inherit. wolfxl's own deps switch to .workspace = true except pyo3 (cdylib-only). members = [] for now; the wolfxl root package is implicitly a member. Sibling crates get added to members in the next branches. Verified: cargo build green, maturin develop green, pytest 611 passed (514 in tests/, 97 in tests/parity, 1 skipped). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds crates/wolfxl-core as a no-PyO3 rlib that the upcoming wolfxl-cli will depend on. Public API: - Workbook::open(path), workbook.sheet_names(), workbook.sheet(name), workbook.first_sheet() - Sheet::dimensions(), Sheet::rows(), Sheet::row(i), Sheet::headers() - Cell { value: CellValue, number_format: Option<String> } - CellValue: Empty | String | Int | Float | Bool | Date | DateTime | Time | Error - format::classify_format(fmt) -> FormatCategory - format::format_cell(&Cell) -> String (currency/percent/date-aware rendering) Implementation notes: - Reads via calamine-styles' Reader trait. Eager load per sheet (peek doesn't need streaming and the simpler API is worth it for now). - Number-format resolution is deferred: calamine-styles' Style does not resolve cellXfs by default, and reimplementing the PyO3 backend's xl/styles walker is step-3 territory. Cell.number_format is shaped right and stays None until then. format::format_cell is testable today against synthetic cells with explicit number_format set. - chrono Date / DateTime conversion follows openpyxl's Excel-serial epoch (1899-12-30) and the 1900 leap-year correction. Tests: 5 unit (format::tests) + 1 integration against the spreadsheet-peek sample workbook (3 sheets, asserts non-empty reads + format API contract). Existing PyO3 layer untouched; pytest still 103/103 on test_wolfxl_compat. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds the wolfxl-cli crate with a `peek` subcommand that mirrors xleak's flag surface (-s/--sheet, -n/--max-rows, -w/--max-width, -e/--export csv|json|text|box). Powered by wolfxl-core's Workbook/Sheet API; no PyO3 dependency. Parity status (verified by golden tests in tests/cli.rs against the two spreadsheet-peek fixtures, captured from xleak directly): - text export: byte-identical - csv export: byte-identical - json export: byte-identical (custom shape: pretty top-level, inline data rows with space-after-comma to match xleak's token footprint) - box export: structurally identical (banner is wolfxl-branded; smoke-tested rather than goldened) xleak's `-n` is box-mode-only; text/csv/json emit the full sheet so spreadsheet-peek's measure_tokens.py can pipe through `head` exactly as before. Matches xleak benchmark methodology. Number-format-aware styled rendering deferred to a follow-up: the current calamine-styles release returns 0x0 style ranges for openpyxl-generated workbooks, and the existing PyO3 backend's xl/styles.xml walker hasn't been ported to wolfxl-core yet. CLI uses heuristic int-thousands grouping instead, which is what xleak does too. Lays Day 3 + Day 4 of the migration sprint. Next: publish wolfxl-cli 0.4.0 to crates.io and migrate spreadsheet-peek to invoke `wolfxl peek`.

PyO3 0.24's max supported interpreter was 3.13. CPython 3.14 is now shipping in stable channels (homebrew defaults to 3.14.3) so the extension fails to build under modern environments. Migration deltas: - `PyObject` was removed as a high-level alias. Restored as a local `type PyObject = Py<PyAny>;` alias in the three files that referenced it, keeping signatures stable instead of churning every PyResult<...> return type. - `to_object(py)` was removed with the `ToPyObject` trait in 0.25. Switched the 9 call sites in calamine_styled_backend.rs to `IntoPyObjectExt::into_py_any(py)`, which is the canonical PyResult-returning replacement. 22 deprecation warnings remain for `Bound::downcast` → `Bound::cast`; these still compile and run correctly. Leaving for a focused cleanup PR so this commit stays auditably scoped to "make 3.14 work." Verification: - cargo check -p wolfxl: clean (warnings only) - maturin develop on CPython 3.12: 611 passed, 1 skipped - maturin develop on CPython 3.14: 602 passed, 10 skipped (extra skips are 3.14-conditional fixtures, not regressions) - maturin build --release on CPython 3.14: wheel produced Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# Conflicts: # Cargo.lock

CLI parity (crates/wolfxl-cli/src/render.rs): - Bool: lowercase `true`/`false` to match xleak's `bool::Display` - Error: prefix `ERROR: ` to match xleak's `CellValue::Error` Display - Float: force `{:.0}` for integer-valued and `{:.2}` for non-integer, then thousand-separate the integer part — destructive precision-wise but matches xleak text/csv/box. JSON keeps full precision via serde_json. - CSV: quote on `\r` too (RFC 4180; xleak misses this and is broken on embedded line breaks) - text/csv: emit header row from `sheet.headers()` so int-typed headers don't get grouped (e.g. "1000" stays "1000", not "1,000") wolfxl-core: - format::format_currency: round on a single scaled integer so 1.995 → $2.00 (was $1.100); 999.999 → $1,000.00 - Cargo.toml: drop unused `anyhow`, `indexmap`, `serde` deps; trim description to claimed scope; add `keywords`/`categories` for crates.io discoverability - lib.rs: rewrite crate doc comment to scope realistically (no schema infer yet), add a runnable example Pre-publish hygiene: - wolfxl-cli/Cargo.toml: trim description, add crates.io keywords/categories - pyproject.toml + Cargo.toml: bump to 0.4.0 (parity with workspace) Spot-check across 27 in-repo xlsx fixtures × 3 export formats: 77/81 byte-identical to xleak 0.2.5. The 4 remaining diffs are cases where wolfxl is more correct than xleak (CRLF preservation in calamine upstream; CSV header quoting on embedded commas). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- CHANGELOG: 0.4.0 covers wolfxl-core/wolfxl-cli scaffolding, PyO3 0.28 bump, currency carry fix. - wolfxl-core/README.md: scope honestly (no schema infer yet, no styles.xml cellXfs walker yet), include runnable example. - wolfxl-cli/README.md: install, peek subcommand surface, parity positioning ("77/81 in-repo fixtures byte-equal to xleak 0.2.5; the 4 diffs are cases where wolfxl is more correct"). - Both Cargo.toml files now reference README so crates.io renders them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Introduces a Rust Cargo workspace with two new publishable crates (wolfxl-core, wolfxl-cli) and expands the Python-side parity/compatibility surface (including data_only, merged-cell/dimension behavior, and bulk cell_records) alongside the PyO3 0.28 upgrade.

Changes:

Add wolfxl-core (pure Rust reader + format-aware rendering) and wolfxl-cli (new wolfxl peek CLI with golden tests).
Add an openpyxl-parity harness + fixture corpus and wire it into CI as a dedicated job.
Update Python API surface to support data_only, add cell_records/dimension helpers, and bump package/workspace versions to 0.4.0.

Reviewed changes

Copilot reviewed 53 out of 72 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
uv.lock	Bumps editable `wolfxl` version to 0.4.0.
pyproject.toml	Updates Python package version to 0.4.0.
Cargo.toml	Converts repo to workspace; bumps root crate + PyO3 to 0.28.
CHANGELOG.md	Adds 0.4.0 release notes.
README.md	Documents `cell_records()` API and capability table entry.
docs/api/worksheet.md	Documents `iter_cell_records`/`cell_records` + `calculate_dimension`.
.github/workflows/ci.yml	Adds parity job (pytest on `tests/parity/`).
src/lib.rs	Adds `PyObject` alias; minor formatting; module order adjustment.
src/util.rs	Adds `PyObject` alias.
src/rust_xlsxwriter_backend.rs	Formatting/error-path refactors (no functional intent stated).
python/wolfxl/init.py	Re-exports `Cell`/`Worksheet`; wires `data_only` into workbook open paths; updates docstring.
python/wolfxl/_workbook.py	Plumbs `data_only` into read/modify mode initialization.
python/wolfxl/_worksheet.py	Adds merged-cells proxy, `cell_records` APIs, and robust `calculate_dimension`/dimension reading; passes `data_only` into Rust reads.
python/wolfxl/_cell.py	Passes `data_only` into Rust value reads; normalizes date payloads to midnight `datetime`.
python/wolfxl/utils/init.py	Exposes openpyxl-compatible utils surface.
python/wolfxl/utils/cell.py	Implements openpyxl-shape coordinate/range helpers.
python/wolfxl/utils/datetime.py	Implements openpyxl-compatible `from_excel` and epochs.
python/wolfxl/utils/numbers.py	Implements openpyxl-compatible `is_date_format`.
scripts/copy_synthgl_fixtures.py	Adds script to refresh SynthGL snapshot fixtures.
tests/test_wolfxl_compat.py	Adds coverage for `cell_records`, stale dimensions, `data_only`, merged-cell format behavior, and date handling.
tests/parity/init.py	Introduces parity harness package marker.
tests/parity/_scoring.py	Adds tiered mismatch scoring + fuzzy equality.
tests/parity/conftest.py	Adds parity fixture discovery (snapshot + optional env var corpus).
tests/parity/KNOWN_GAPS.md	Documents parity gap contract/rollout phases.
tests/parity/openpyxl_surface.py	Defines openpyxl surface contract entries for smoke/parity tests.
tests/parity/ratchet.json	Adds initial soft-tier baseline metadata + per-fixture entries.
tests/parity/test_surface_smoke.py	Adds importability smoke checks for supported/gap symbols.
tests/parity/test_utils_parity.py	Adds bug-for-bug parity tests for openpyxl utility functions.
tests/parity/test_read_parity.py	Adds per-fixture read parity + ratchet enforcement.
tests/parity/test_write_parity.py	Adds oracle-based write roundtrip tests (wolfxl → openpyxl).
tests/parity/fixtures/README.md	Documents fixture corpus layout and refresh process.
tests/parity/fixtures/synthgl_snapshot/aging/customer_ar_aging_buckets.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/aging/holdout_ap_aging_buckets.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/cross_ref/buried_header_entity_rollup.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/key_value/workbook_report_metadata_block.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/rollforward/holdout_fixed_asset_movements.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/stress/headcount_consolidation_looks_like_a5.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/stress/mixed_blank_semantics.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/stress/non_financial_scorecard_looks_like_a4.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/time_series/holdout_quarterly_ops_statement.xlsx	Adds committed parity fixture.
tests/parity/fixtures/synthgl_snapshot/time_series/sec_in_millions_except_per_share.xlsx	Adds committed parity fixture.
crates/wolfxl-core/Cargo.toml	Adds new crate manifest.
crates/wolfxl-core/README.md	Adds crate README.
crates/wolfxl-core/src/lib.rs	Adds crate public API + docs.
crates/wolfxl-core/src/error.rs	Adds crate error types.
crates/wolfxl-core/src/cell.rs	Adds `Cell`/`CellValue` model types.
crates/wolfxl-core/src/format.rs	Adds format classification + rendering + tests.
crates/wolfxl-core/src/sheet.rs	Adds sheet loading + value/style extraction.
crates/wolfxl-core/src/workbook.rs	Adds workbook open/sheet access.
crates/wolfxl-core/tests/integration_real_xlsx.rs	Adds optional integration test against sibling repo fixture.
crates/wolfxl-cli/Cargo.toml	Adds new CLI crate manifest + deps.
crates/wolfxl-cli/README.md	Adds CLI usage docs.
crates/wolfxl-cli/src/main.rs	Adds `wolfxl` CLI entrypoint + `peek` subcommand parsing.
crates/wolfxl-cli/src/commands/mod.rs	Adds commands module.
crates/wolfxl-cli/src/commands/peek.rs	Implements `peek` command dispatch + options.
crates/wolfxl-cli/tests/cli.rs	Adds golden-based CLI integration tests.
crates/wolfxl-cli/tests/golden/sample-financials.text	Adds golden output.
crates/wolfxl-cli/tests/golden/sample-financials.csv	Adds golden output.
crates/wolfxl-cli/tests/golden/sample-financials.json	Adds golden output.
crates/wolfxl-cli/tests/golden/wide-table.text	Adds golden output.
crates/wolfxl-cli/tests/golden/wide-table.csv	Adds golden output.
crates/wolfxl-cli/tests/golden/wide-table.json	Adds golden output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-19T09:26:08Z

+from dataclasses import dataclass, field
+from enum import StrEnum
+
+
+class SurfaceCategory(StrEnum):
+    """Logical grouping of the openpyxl API surface."""


The workflow runs pytest tests/ on Python 3.9, but this module uses StrEnum (3.11+) and PEP 604 unions (str | None, 3.10+). This will fail to import under 3.9. Either adjust the CI matrix / requires-python to match the minimum supported version, or replace StrEnum/| with 3.9-compatible constructs (e.g. Enum + Optional[...] / Union[...] or a backport).

Fixed in 799e8fb: replaced StrEnum with (str, Enum) mixin (3.9-compatible). PEP 604 unions only appear in annotations, which from __future__ import annotations keeps lazy-evaluated, so they're safe under 3.9 once StrEnum is gone.

Copilot · 2026-04-19T09:26:08Z

+from dataclasses import dataclass, field
+from enum import StrEnum
+from typing import Any
+
+
+class Tier(StrEnum):
+    HARD = "hard"
+    SOFT = "soft"
+    INFO = "info"


This file uses StrEnum (Python 3.11+) and will fail to import under Python 3.9, but CI runs the test suite on 3.9. Consider switching to Enum (with string values) or gating these parity tests to only run on Python versions that support StrEnum.

Fixed in 799e8fb: switched to (str, Enum) mixin so the import works on Python 3.9. Verified locally with ruff check; CI matrix (3.9 + 3.12) will exercise it.

Copilot · 2026-04-19T09:26:08Z

+def _all_xlsx_fixtures() -> list[Path]:
+    seen: dict[str, Path] = {}
+    for source in (_discover_xlsx(SNAPSHOT_DIR), _live_synthgl_fixtures()):
+        for path in source:
+            # Dedupe by (name, size) so the env-var corpus doesn't duplicate
+            # the committed snapshot.
+            key = f"{path.name}:{path.stat().st_size}"
+            seen.setdefault(key, path)
+    return list(seen.values())


_all_xlsx_fixtures dedupes by f"{path.name}:{size}", which can accidentally collapse distinct fixtures that share a filename and size in different directories/categories. That will silently reduce parity coverage. Use a stable relative path (e.g. relative to SNAPSHOT_DIR/root), or include the parent path segments in the dedupe key.

Fixed in 799e8fb: dedupe key is now {relative_path}:{size}, where the relative path is computed against the source root (snapshot dir or live SynthGL dir). Two summary.xlsx files in different snapshot subfolders no longer collapse.

Copilot · 2026-04-19T09:26:08Z

+* ``HARD`` - any mismatch fails CI.
+* ``SOFT`` - tracked in ``ratchet.json`` (never allowed to decrease).
+* ``INFO`` - reported only, not gated.
+


The SOFT-tier description is inconsistent with the ratchet enforcement in test_read_parity.py, which fails when observed SOFT mismatches are greater than the baseline (i.e. SOFT must not increase). Update this docstring/comment to match the actual contract so future changes don’t invert the intended ratchet behavior.

Fixed in 799e8fb: SOFT docstring now reads "count is tracked in ratchet.json; never allowed to increase over the committed baseline (regressions block CI)." Matches the > check in test_zz_ratchet_soft_failures_nondecreasing.

Copilot · 2026-04-19T09:26:09Z

+    let days = serial.trunc() as i64;
+    let frac = serial.fract();
+    // Excel treats 1900-01-00 as serial 0 and has the famous 1900 leap-year
+    // bug; matching openpyxl's correction.
+    let base = NaiveDate::from_ymd_opt(1899, 12, 30).expect("static date");
+    let date = base
+        .checked_add_days(chrono::Days::new(days as u64))
+        .unwrap_or(base);
+    if frac.abs() < f64::EPSILON {
+        return CellValue::Date(date);
+    }


excel_serial_to_datetime claims to match Excel/openpyxl’s 1900 leap-year correction, but it currently just adds days to the 1899-12-30 epoch with no adjustment for serials in (0, 60). That will produce off-by-one dates for early-1900 serials. Also, casting a negative days to u64 can wrap and yield incorrect results. Consider implementing the same (0 < serial < 60) adjustment used elsewhere and handle negative serials explicitly.

Suggested change

let days = serial.trunc() as i64;

let frac = serial.fract();

// Excel treats 1900-01-00 as serial 0 and has the famous 1900 leap-year

// bug; matching openpyxl's correction.

let base = NaiveDate::from_ymd_opt(1899, 12, 30).expect("static date");

let date = base

.checked_add_days(chrono::Days::new(days as u64))

.unwrap_or(base);

if frac.abs() < f64::EPSILON {

return CellValue::Date(date);

}

let mut days = serial.floor() as i64;

let frac = serial - (days as f64);

// Excel treats 1900-01-00 as serial 0 and has the famous 1900 leap-year

// bug; matching openpyxl's correction for serials before the fake

// 1900-02-29 entry.

if serial > 0.0 && serial < 60.0 {

days += 1;

}

let base = NaiveDate::from_ymd_opt(1899, 12, 30).expect("static date");

let date = if days >= 0 {

base.checked_add_days(chrono::Days::new(days as u64))

} else {

base.checked_sub_days(chrono::Days::new((-days) as u64))

}

.unwrap_or(base);

if frac.abs() < f64::EPSILON {

return CellValue::Date(date);

}

Fixed in 799e8fb: added the openpyxl +1 correction for serials in (0, 60). Serial 1 now maps to 1900-01-01 and serial 59 to 1900-02-28; serial 61 stays 1900-03-01 (skipping the fake serial-60 leap day). Negative serials now use checked_sub_days instead of letting the as u64 cast wrap. New tests excel_serial_matches_openpyxl_for_pre_leap_serials and excel_serial_negative_does_not_wrap lock the boundaries.

Copilot · 2026-04-19T09:26:09Z

+def test_ratchet_soft_failures_nondecreasing() -> None:
+    """SOFT mismatch count must never rise above the committed baseline.
+
+    This runs after all ``test_read_parity`` cases collected observations,
+    so pytest's ordering (tests collected alphabetically) means this test's
+    ``test_`` name must sort AFTER the read tests. It does:
+    ``test_ratchet_...`` > ``test_read_parity``.
+    """


test_ratchet_soft_failures_nondecreasing relies on pytest collecting tests alphabetically, but test_ratchet_... sorts before test_read_parity, so _OBSERVATIONS will still be empty and the ratchet check will always skip. Rename the ratchet test to sort after the read tests (e.g. prefix with zz_) or enforce this check via a session-level hook/fixture finalizer that runs after all parametrized cases.

Fixed in 799e8fb: renamed to test_zz_ratchet_soft_failures_nondecreasing. The zz_ prefix sorts last under any plain alphabetical ordering, so _OBSERVATIONS is populated by the time the ratchet check runs. Robust to pytest-xdist / pytest-randomly reordering without needing a session finalizer.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f0f19f2c21

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T09:28:48Z

+                let has_formula_backing_entry = value.is_some();
+                let should_emit_formula = formula.is_some()
+                    && !data_only
+                    && (include_formula_blanks || has_formula_backing_entry);


Honor include_formula_blanks for uncached formulas

read_sheet_records() currently treats any present cell payload as a backing value (value.is_some()), so formula cells whose cached value is just the formula text still count as “backed” and are emitted even when include_formula_blanks=False. This makes the flag ineffective for template-only formulas in the common data_only=False path and returns records callers explicitly asked to suppress.

Useful? React with 👍 / 👎.

Acknowledged - tracked as a follow-up. include_formula_blanks for uncached formulas is a real semantics gap, but it's in the legacy single-crate src/calamine_styled_backend.rs that this PR doesn't touch. The next sprint's wolfxl-core extraction (planned ~0.5.0) will need to either honor the flag at the new Sheet::load boundary or document the divergence. Filing as an issue rather than expanding the diff here.

Tracked as #5 with the acceptance criteria spelled out, so it doesn't get lost when wolfxl-core grows the options surface in 0.5.0.

chatgpt-codex-connector · 2026-04-19T09:28:48Z

+    let has_date = lower.contains('y') || lower.contains('d') || lower.contains("mmm");
+    let has_time = lower.contains('h') || lower.contains(":mm") || lower.contains(':');


Strip bracketed tags before date-token classification

The date/time heuristic checks raw substrings (contains('d'), etc.) on the full format code, so non-date numeric formats with color tags like #,##0_);[Red](#,##0) are misclassified as Date because of the d in Red. This will mislabel numeric columns in any workflow consuming classify_format output.

Useful? React with 👍 / 👎.

Fixed in 799e8fb: classify_format now strips [...] segments before the date/time substring scan via a new strip_bracketed_tags helper. The currency check still runs on raw fmt to preserve [$-409] detection. Locked in by bracketed_tags_dont_trigger_date_or_time_heuristic covering [Red], [Red]-, and [h]:mm:ss (the legitimate elapsed-hour case still classifies as Time).

The CLI's public-facing copy (READMEs, CHANGELOG, doc comments, test names) leaned on a "byte-parity port of <prior-tool>" framing. That framing leaks the migration motivation into the published artifact and positions the crate as a derivative rather than as its own thing. Rewrite the affected text to describe what the renderers do (integer thousand-grouping, two-decimal floats, ISO dates, RFC 4180 CSV quoting, stable JSON shape) and why the contract is locked in (golden tests in tests/cli.rs back the token-cost figures the spreadsheet-peek skill publishes). The WHY-context behind each format choice survives; the prior tool's name doesn't appear in any committed artifact. Tests stay green: 6 unit + 11 integration tests pass after the rename trim_float_matches_xleak_display -> trim_float_matches_format_contract and the `*_matches_xleak_*` -> `*_matches_golden_*` renames. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot: - _scoring.py, openpyxl_surface.py: replace StrEnum (3.11+) with (str, Enum) mixin so the parity harness imports on Python 3.9-3.10 (CI matrix includes 3.9). - conftest.py: dedupe fixtures on relative-path + size, not just filename + size; same-named files in different snapshot subfolders no longer silently collapse and shrink coverage. - _scoring.py: doc-clarify the SOFT tier - count is ratcheted, never allowed to *increase* over baseline (regressions block CI). - test_read_parity.py: rename ratchet test to test_zz_ so plain alphabetical collection order keeps it last (depends on per-fixture observations). Codex P2: - format.rs: strip [...] segments before the date/time substring scan. [Red] / [h] etc previously misclassified plain numeric formats like '#,##0_);[Red](#,##0)' as Date. Currency check still runs on raw fmt to preserve [$-409] detection. Locked in by new test. - sheet.rs: Excel 1900 leap-year correction. openpyxl uses base 1899-12-30 to dodge the fake serial-60 bug for serials >= 60, but that leaves serials 1..59 off by one day. Added +1 correction so serial 1 -> 1900-01-01 (matches openpyxl). Also switched negative serials to checked_sub_days to avoid u64 wrap producing far-future dates. Two new tests pin the boundaries (1, 59, 61, 44197, -100). Lint: - python/wolfxl/utils/cell.py: ruff UP033 - lru_cache(maxsize=None) -> cache. - test_utils_parity.py: ruff I001 import sort.

wolfiesch · 2026-04-19T09:55:46Z

@codex review — addressed all 8 review findings in 799e8fb. Summary:

6 Copilot findings: StrEnum → (str, Enum) mixin (3.9-compat) in _scoring.py + openpyxl_surface.py; conftest dedupe key now uses relative path (no silent fixture collapse); SOFT docstring corrected to match the ratchet > check; ratchet test renamed test_zz_ for stable post-observation collection order; Excel 1900 leap-year correction (+1 for serials in 0..60) + negative-serial guard in sheet.rs.
1 Codex P2: format.rs strips bracketed tags before date/time substring scan via new strip_bracketed_tags helper; currency check still runs raw to keep [$-409] detection.
1 Codex P2 deferred + tracked as Honor include_formula_blanks for uncached formulas in wolfxl-core #5 (include_formula_blanks for uncached formulas — lives in legacy backend that this PR doesn't touch; will be addressed at the wolfxl-core options surface in 0.5.0).

CI: 9/9 green (lint, parity, 6× pytest matrix on ubuntu/macos/windows × 3.9/3.12).

wolfiesch · 2026-04-19T10:16:10Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 799e8fbf00

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T10:22:06Z

+                let has_formula_backing_entry = value.is_some();
+                let should_emit_formula = formula.is_some()
+                    && !data_only
+                    && (include_formula_blanks || has_formula_backing_entry);


Respect include_formula_blanks for empty formula results

When include_formula_blanks=False, template formulas without cached values should be omitted, but has_formula_backing_entry is currently computed as value.is_some(). Inside a rectangular range, calamine returns Some(Data::Empty) for blank cells, so this flag becomes true and should_emit_formula still emits the formula record. In sheets where uncached formulas sit between populated cells, callers cannot suppress these placeholder formulas even though they explicitly opted out.

Useful? React with 👍 / 👎.

Fixed in 69d0ae7.

Root cause was deeper than described: calamine returns Some(Data::String("")) for the empty <v/> element on a formula cell (not Some(Data::Empty) as suggested), and Some(Data::Empty) only for blanks inside the rectangular bounding box. Both forms are semantically "the formula has no cached result."

Fix hoists value_is_formula_placeholder (using data_is_formula_text) and applies it to both has_value and has_formula_backing_entry, so:

include_formula_blanks=false actually suppresses the cell (it no longer falls through has_value via the empty string)

data_only=true still skips it via value_is_uncached_formula

Locked in by a new pytest in tests/test_wolfxl_compat.py::test_cell_records_include_formula_blanks_false_skips_uncached_inside_range that constructs the exact B2 scenario via openpyxl. 506 passed locally.

chatgpt-codex-connector · 2026-04-19T10:22:06Z

+    let secs = (frac * 86_400.0).round() as u32;
+    let h = secs / 3600;
+    let m = (secs % 3600) / 60;
+    let s = secs % 60;
+    let time = NaiveTime::from_hms_opt(h.min(23), m.min(59), s.min(59))


Carry rounded day-fraction overflow before building time

frac * 86_400 is rounded to whole seconds, so fractions very close to 1.0 can become 86400. This path then clamps h with h.min(23) instead of carrying into the next day, producing an incorrect timestamp (for example a near-midnight value can become 23:00:00 on the same date). This creates deterministic off-by-one-day/time errors for valid serials near day boundaries.

Useful? React with 👍 / 👎.

Fixed in 69d0ae7.

Replaced the h.min(23) clamp with a proper day-carry: when (frac * 86_400).round() lands on 86_400 (a full day), advance the date by one and reset secs to 0, so 44197 + 0.99999999 now returns 2021-01-02 00:00:00 instead of 2021-01-01 23:00:00.

Locked in by excel_serial_carries_near_midnight_fraction_to_next_day in crates/wolfxl-core/src/sheet.rs tests, plus a code comment explaining what the prior clamp was doing wrong. All 12 wolfxl-core tests + doctest still pass.

P1 (calamine_styled_backend.rs): include_formula_blanks=False was a no-op for template formulas. Two storage cases both meant "no cached result": - Some(Data::Empty) - blank inside the rectangular bounding box - Some(Data::String("")) - empty <v/> child of a formula cell Old `value.is_some()` flagged both as backing entries, and `has_value` also stayed true for the empty-string case, so the cell either emitted as data_type=formula (when include_formula_blanks=True) or as data_type=string with value="" (when False). Neither matched user intent. Hoist the "looks like the formula text" heuristic out of `value_is_uncached_formula` (data_only-only) into a `value_is_formula_placeholder` that also runs in `data_only=False` mode, then use it to gate both `has_formula_backing_entry` and `has_value`. Result: a formula whose calamine value is Empty, "", or the formula text itself is treated uniformly as uncached - emit when include_formula_blanks=True, suppress when False. Locked in by new test `test_cell_records_include_formula_blanks_false_skips_uncached_inside_range`. P2 (sheet.rs): excel_serial_to_datetime clamped `h.min(23)` instead of carrying when `frac * 86_400` rounded up to exactly 86_400 (e.g. 44197 + 0.99999999). That produced 23:00:00 on the wrong date for fractions arbitrarily close to 1.0. Carry into next day via checked_add_days, drop the defensive clamps (which were hiding the rollover bug). New test `excel_serial_carries_near_midnight_fraction_to_next_day` pins 44197.99999999 -> 2021-01-02 00:00:00. Closes the P2 (now P1) escalation Codex flagged on issue #5.

wolfiesch · 2026-04-19T10:34:15Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69d0ae7543

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T10:41:18Z

+    def max_row(self) -> int:
+        return self._max_row()


Include buffered writes in max_row/max_column

The newly exposed max_row/max_column properties delegate to _max_row()/_max_col(), but in write mode those helpers only inspect _cells and ignore _append_buffer/_bulk_writes. As a result, after append() or write_rows() (before materialization/save), these properties under-report sheet size (often staying at 1), which can cause downstream range calculations and iteration bounds to miss newly added data.

Useful? React with 👍 / 👎.

Fixed in 31c4259.

Extracted _pending_writes_bounds() as the single source of truth over _cells + _append_buffer + _bulk_writes, and folded it into _max_row() / _max_col() so write-mode append() and write_rows() are reflected before save. Read-mode path still calls _read_dimensions() (preserving <dimension>-tag parity with openpyxl — discovered during the refactor that routing everything through read_sheet_bounds breaks tests/parity/test_read_parity.py on workbooks where calamine's parsed range is narrower than the XML tag).

Locked in by tests/test_wolfxl_compat.py::test_max_row_max_column_reflect_pending_writes_in_write_mode — two append() calls of widths 3 and 4 expects max_row == 2, max_column == 4.

chatgpt-codex-connector · 2026-04-19T10:41:18Z

+        if wb._rust_reader is not None:  # noqa: SLF001
+            bounds = wb._rust_reader.read_sheet_bounds(self._title)  # noqa: SLF001
+            if isinstance(bounds, tuple) and len(bounds) == 4:
+                return tuple(int(value) for value in bounds)  # type: ignore[return-value]
+            return None


Account for unsaved edits in calculate_dimension

_read_dimension_bounds() returns Rust-reader bounds whenever _rust_reader exists, but modify mode also has _rust_reader while pending edits live in Python-side buffers/cells. This means calculate_dimension() can return stale pre-edit bounds in the same session after users modify cells, so callers that derive ranges from calculate_dimension() may omit unsaved writes.

Useful? React with 👍 / 👎.

Fixed in 31c4259.

_read_dimension_bounds() now unions on-disk Rust bounds with _pending_writes_bounds() instead of short-circuiting to one or the other. So in modify mode, calculate_dimension() reflects the union of <dimension> extents and pending Python-side cell edits / append buffers / bulk writes.

Locked in by tests/test_wolfxl_compat.py::test_calculate_dimension_includes_modify_mode_pending_edits — opens an A1:B2 workbook in modify mode, writes ws['E10'], asserts calculate_dimension() == 'A1:E10'.

chatgpt-codex-connector · 2026-04-19T10:41:18Z

+/// Excel serial date → chrono. Sub-day fractions become Time; ≥1.0 with no
+/// fractional part becomes Date; otherwise DateTime.
+fn excel_serial_to_datetime(serial: f64) -> CellValue {
+    if serial < 1.0 && serial > 0.0 {


Handle Excel serial 0 as midnight time

The time-only branch excludes serial == 0 by checking serial > 0.0, so a zero serial falls through to the date path and is returned as 1899-12-30 instead of 00:00:00. For time-formatted cells containing midnight, this produces incorrect values in wolfxl-core (and therefore CLI output built on it).

Useful? React with 👍 / 👎.

Fixed in 31c4259.

Loosened the time-only branch from serial > 0.0 to serial >= 0.0, so excel_serial_to_datetime(0.0) now returns Time(0, 0, 0) instead of falling through to Date(1899-12-30). Matches openpyxl's from_excel(0) behavior for time-formatted cells.

Locked in by excel_serial_zero_returns_midnight_time in crates/wolfxl-core/src/sheet.rs.

P1 (python/wolfxl/_worksheet.py:650, comment 3106693767): `max_row` / `max_column` previously delegated to `_max_row()` / `_max_col()`, which in pure write mode looked only at `_cells` and ignored the `_append_buffer` / `_bulk_writes` fast paths. After `ws.append([...])` or `ws.write_rows(...)`, callers got 1 instead of the buffered extent until materialization, so anything deriving ranges from `max_row` / `max_column` (e.g. `iter_rows` bounds) silently missed appended data. P2 (python/wolfxl/_worksheet.py:591, comment 3106693768): `_read_dimension_bounds()` short-circuited to Rust-reader bounds whenever `_rust_reader` existed, which is also the case in modify mode where pending edits live in Python-side buffers. `calculate_dimension()` thus returned the on-disk extents even after `ws['E10'] = 'x'`, and any range derived from it omitted unsaved cells. Fix: - Extract `_pending_writes_bounds()` — single-source-of-truth scan over `_cells` + `_append_buffer` + `_bulk_writes`, returning `(min_row, min_col, max_row, max_col)` or `None`. - `_read_dimension_bounds()` unions Rust bounds with pending writes instead of replacing one with the other. - `_max_row()` / `_max_col()` keep calling `_read_dimensions()` (which honors the on-disk `<dimension>` XML tag — required for openpyxl parity, see `tests/parity/test_read_parity.py` regression caught during refactor) and additionally union with pending writes. P2 (crates/wolfxl-core/src/sheet.rs:122, comment 3106693771): `excel_serial_to_datetime(0.0)` fell through the `serial > 0.0` time guard and returned `Date(1899-12-30)` instead of `Time(00:00:00)`. Time-formatted cells containing midnight were emitted as the Excel epoch date by the CLI. Loosened the guard to `serial >= 0.0` so 0.0 maps to `Time(0, 0, 0)`, matching openpyxl. Lock-in tests: - `test_max_row_max_column_reflect_pending_writes_in_write_mode` — pure write mode, two `append()` calls of widths 3 and 4 → expects `max_row == 2` and `max_column == 4` before save. - `test_calculate_dimension_includes_modify_mode_pending_edits` — modify mode, on-disk `A1:B2` plus pending `ws['E10'] = ...` → expects `calculate_dimension() == 'A1:E10'`. - `excel_serial_zero_returns_midnight_time` — asserts `excel_serial_to_datetime(0.0) == Time(00:00:00)`. 605 Python passed, 11 wolfxl-core unit + 11 wolfxl-cli unit + 1 integration + 1 doctest all green locally.

wolfiesch · 2026-04-19T10:48:42Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 31c4259ce7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T10:56:58Z

+        let secs = (serial * 86_400.0).round() as u32;
+        let h = secs / 3600;
+        let m = (secs % 3600) / 60;
+        let s = secs % 60;
+        return NaiveTime::from_hms_opt(h, m, s)
+            .map(CellValue::Time)
+            .unwrap_or_else(|| CellValue::Float(serial));


Handle sub-day serial rounding overflow in time conversion

For serial values in [0, 1), this branch rounds to whole seconds and immediately builds a NaiveTime. Inputs near 1.0 (for example 0.99999999) round to 86400, which makes h == 24, so from_hms_opt fails and the code falls back to CellValue::Float(serial). That silently changes a valid time-like cell into a numeric float instead of a time value, unlike the date+time branch below which already carries >= 86400 seconds safely.

Useful? React with 👍 / 👎.

Fixed in 3667122. The sub-day branch now carries seconds==86_400 down to 0 before calling NaiveTime::from_hms_opt, mirroring the day-fraction carry already present in the date+time branch. Added unit test excel_serial_sub_day_near_midnight_carries_to_zero_time to lock it in. Without the carry, 0.99999999 silently demoted a time-typed cell to CellValue::Float.

chatgpt-codex-connector · 2026-04-19T10:56:59Z

+        if self._workbook._rust_reader is None:  # noqa: SLF001
+            yield from self._iter_cell_records_python(


Avoid stale record reads after unsaved edits in modify mode

This condition routes any worksheet with a Rust reader through the bulk Rust path, which means modify mode (_rust_reader + pending Python-side edits) reads from on-disk data only. As a result, iter_cell_records() / cell_records() can return stale values after in-memory edits (ws['A1']=..., append(), write_rows()) until save() runs, which is inconsistent with the current worksheet state.

Useful? React with 👍 / 👎.

Fixed in 3667122. iter_cell_records now collects a pending overlay (_dirty + _append_buffer + _bulk_writes) and patches Rust-returned records plus yields any pending cells outside the original disk range that fall within the requested bounds. Pure read mode is unchanged - the overlay dict is empty and we skip it. Added test_cell_records_overlays_modify_mode_pending_edits to lock in the modify+overlay behavior.

…de overlay) P2 (crates/wolfxl-core/src/sheet.rs:131, comment 3106711010): For ``serial`` in ``[0, 1)``, the time-only branch rounded ``(serial * 86_400).round()`` to ``86_400`` for inputs near 1.0 (e.g. ``0.99999999``), making ``h == 24`` and ``NaiveTime::from_hms_opt`` return ``None``. The fallback path silently demoted a time-typed cell to ``CellValue::Float(serial)`` — wrong both in shape and meaning. Fix mirrors the day-carry the date+time branch already uses below: when ``secs >= 86_400``, subtract one day's worth so the value lands on ``Time(00:00:00)`` (the "next midnight" of a sub-day-only value, with no date to carry into). P2 (python/wolfxl/_worksheet.py:494, comment 3106711012): ``iter_cell_records()`` routed through the Rust reader whenever ``_rust_reader`` existed. In modify mode that meant on-disk values masked Python-side pending edits — ``ws['A1'] = 'new'`` followed by ``ws.cell_records()`` returned the pre-edit value, so anything ingesting via this API saw stale state until ``save()``. Fix overlays pending edits on top of the Rust record stream: - ``_collect_pending_overlay()`` returns ``{(row, col): value}`` for ``_dirty`` cells + ``_append_buffer`` + ``_bulk_writes``. Empty when nothing is pending — pure read mode pays no extra cost (the hot ``yield from records`` path is unchanged). - For each Rust record, swap in the overlay value (and recompute ``data_type``) when the cell is dirty. - After streaming Rust records, yield overlay entries that fell inside the requested range but weren't returned by Rust (e.g. empty-on-disk cells the user just set). Lock-in tests: - ``excel_serial_sub_day_near_midnight_carries_to_zero_time`` — ``excel_serial_to_datetime(0.99999999) == Time(00:00:00)``. - ``test_cell_records_overlays_modify_mode_pending_edits`` — opens workbook with on-disk ``A1='old-a1'``, ``B2='old-b2'``; in modify mode overwrites A1, adds E10 outside on-disk range; asserts cell_records sees ``A1='new-a1'``, ``B2='old-b2'``, ``E10='added'``. 606 Python passed, 12 wolfxl-core + 11 wolfxl-cli + 1 integration + 1 doctest all green locally.

wolfiesch · 2026-04-19T11:03:23Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3667122670

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T11:14:55Z

+        if !data_only {
+            // Check the fast formula map (parsed from worksheet XML in a single pass).


Treat uncached formula placeholders as blank in data_only reads

When data_only=True, this branch skips formula handling and falls through to data_to_py, but formula cells without cached results are represented by calamine as placeholder values (e.g. empty string). That makes read_cell_value (and the bulk readers with the same pattern) return a non-blank value instead of the openpyxl-compatible blank/None result for uncached formulas, which can corrupt downstream ingestion in template-heavy sheets. This file already contains explicit placeholder handling in read_sheet_records, so the data-only readers should apply the same normalization.

Useful? React with 👍 / 👎.

Fixed in 383b986. Added a shared is_uncached_formula_value helper and applied it in read_cell_value, read_sheet_values, and read_sheet_values_plain so all three readers normalize calamine's Some(Data::String("")) / Some(Data::Empty) placeholders to blank/None when data_only=true, matching what read_sheet_records already does. Tests test_read_cell_value_data_only_blanks_uncached_formula and test_iter_rows_values_only_blanks_uncached_formulas lock in the per-cell and bulk paths respectively.

chatgpt-codex-connector · 2026-04-19T11:14:56Z

+            patched["data_type"] = (
+                "blank" if new_value is None else type(new_value).__name__
+            )


Preserve canonical data_type labels in overlay records

The overlay path assigns data_type from Python runtime type names (str, int, bool, etc.), while non-overlay records from Rust use the API’s canonical labels (string, number, boolean, ...). In modify mode with pending edits, this produces mixed schemas in a single cell_records() result and breaks consumers that filter/group by the documented canonical tokens.

Useful? React with 👍 / 👎.

Fixed in 383b986. Added _canonical_data_type helper that mirrors Rust's data_type_name closed set (string / number / boolean / date / datetime / blank). Routed both overlay emission paths (patched records and extra-overlay records) plus the python-side fallback iterator through it. Test test_cell_records_overlay_uses_canonical_data_type_labels covers all four primitive types including the bool-before-int subclass check.

…al labels) P1 (3106730333) — Treat uncached formula placeholders as blank in data_only reads. read_sheet_records already normalizes calamine's Some(Data::String("")) and Some(Data::Empty) placeholders for formula cells with no cached <v/> result; per-cell read_cell_value, read_sheet_values, and read_sheet_values_plain now apply the same check via a shared is_uncached_formula_value helper. Without this, data_only reads on template-heavy sheets returned empty strings or zero-equivalents instead of the openpyxl-compatible blank/None. P2 (3106730334) — Preserve canonical data_type labels in overlay records. Rust's read_sheet_records emits closed-set labels (string / number / boolean / date / datetime / blank). Overlay records previously used Python runtime type names (str / int / bool), producing mixed schemas in modify mode that broke consumers filtering on the documented tokens. Routed both overlay emission paths plus the python-side fallback iterator through a new _canonical_data_type helper that mirrors Rust's data_type_name closed set. Tests added (all green): - test_read_cell_value_data_only_blanks_uncached_formula - test_iter_rows_values_only_blanks_uncached_formulas - test_cell_records_overlay_uses_canonical_data_type_labels Verification: 609 Python passed (was 606, +3 new), 12 wolfxl-core unit + 1 doctest, 11 wolfxl-cli unit + 1 integration, 97 parity passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wolfiesch · 2026-04-19T11:21:21Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 383b986118

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T11:26:23Z

+        cells = self._cells
+        buf = self._append_buffer
+        bulk = self._bulk_writes
+        if not cells and not buf and not bulk:
+            return None


Exclude read-only cached cells from pending bounds

_pending_writes_bounds() treats every entry in _cells as a pending write, but _cells is populated by read access (ws['A1'], non-values_only iteration) even when nothing was modified. Because _max_row, _max_col, and calculate_dimension() now union against this helper, merely touching a far cell can inflate worksheet bounds (for example, reading Z999 makes bounds look like row 999/col 26), which can trigger oversized scans and incorrect range calculations in downstream callers.

Useful? React with 👍 / 👎.

Fixed in 4dea6a7. Switched _pending_writes_bounds to iterate self._dirty (the set of actually-modified cell keys) instead of self._cells (the read cache). Test test_max_row_does_not_inflate_from_read_only_cell_access verifies that reading ws['Z999'].value leaves max_row/max_column/calculate_dimension untouched, then confirms a real assignment grows them as expected.

chatgpt-codex-connector · 2026-04-19T11:26:23Z

+    if isinstance(value, str):
+        return "string"


Preserve formula data_type in overlay record patches

Overlay records in iter_cell_records() rely on _canonical_data_type(), which classifies all strings as "string". This mislabels unsaved formula edits (e.g. values starting with =) as plain strings, so pending formula changes lose data_type="formula" semantics compared with Rust-backed records. Any consumer that filters or counts formula records will miss modified-but-unsaved formulas.

Useful? React with 👍 / 👎.

Fixed in 4dea6a7. _canonical_data_type now classifies string values starting with '=' as 'formula' to match openpyxl's convention and Rust's formula_map_cache path. Test test_cell_records_overlay_labels_pending_formulas_as_formula confirms a pending ws['B2'] = '=SUM(A1:A1)' shows up as data_type='formula' in cell_records().

P1 (3106743052) — Exclude read-only cached cells from pending bounds. `_pending_writes_bounds` previously iterated `self._cells`, which is populated by mere read access (`ws['Z999'].value` materializes a Cell without dirtying it). Because `_max_row`/`_max_col`/`calculate_dimension` union against this helper, reading a far cell silently inflated worksheet bounds and triggered oversized scans in downstream callers. Switched the iteration target to `self._dirty` (the set of actually-modified cell keys), which is the same source `_collect_pending_overlay` already uses. P2 (3106743053) — Preserve formula data_type in overlay record patches. `_canonical_data_type` previously labeled all strings as "string", so a pending edit like `ws["B2"] = "=SUM(A1:A1)"` showed up in `cell_records()` as `data_type="string"` while a saved-and-reloaded copy showed `"formula"` (the Rust formula_map_cache path classifies it as such). Mixed schemas in a single result broke any consumer counting/filtering formula records. The helper now mirrors openpyxl's convention: a string beginning with `=` is `"formula"`. Tests added (both green): - test_max_row_does_not_inflate_from_read_only_cell_access - test_cell_records_overlay_labels_pending_formulas_as_formula Verification: 611 Python passed (was 609, +2 new), 97 parity passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wolfiesch · 2026-04-19T11:29:27Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dea6a729e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T11:36:51Z

+            patched = dict(record)
+            patched["value"] = new_value
+            patched["data_type"] = _canonical_data_type(new_value)


Recompute formula metadata when overlaying edited records

When cell_records() overlays pending modify-mode edits, it only updates value and data_type on top of the on-disk record. If a user replaces an existing formula cell with a literal (or the reverse), the stale formula field from the original record is left behind, so downstream consumers can see contradictory metadata for the same cell and mis-handle formula detection.

Useful? React with 👍 / 👎.

Fixed in da9431f. The overlay patch now refreshes the formula field instead of leaving the on-disk text behind: a literal-overwrites-formula edit pops the key, and a formula-overwrites-anything edit sets it from the new value (leading = stripped to match the Rust reader's convention). Lock-in tests test_cell_records_overlay_clears_stale_formula_metadata_on_literal_swap and test_cell_records_overlay_replaces_formula_metadata_on_formula_swap cover both directions.

chatgpt-codex-connector · 2026-04-19T11:36:51Z

+                min_row=min_row,
+                max_row=max_row,
+                min_col=min_col,
+                max_col=max_col,
+                include_empty=include_empty,


Honor include_coordinate in write-mode cell_records fallback

In write mode (_rust_reader is None), iter_cell_records() forwards only include_empty to the Python fallback and drops include_coordinate, even though the API docs promise include_coordinate=False avoids A1 allocation. This makes write-mode output/schema diverge from read mode and forces unnecessary coordinate generation for large in-memory sheets.

Useful? React with 👍 / 👎.

Fixed in da9431f. iter_cell_records now forwards include_coordinate to _iter_cell_records_python, and the fallback only sets the coordinate key when requested. Lock-in test test_iter_cell_records_write_mode_honors_include_coordinate constructs a fresh Workbook() (no Rust reader → fallback path) and asserts the key is present/absent as the flag dictates.

…oordinate in write fallback Two issues from Codex round-6 review on PR #4 commit 4dea6a7: P2 (cell_records overlay leaked stale formula text). When a pending edit replaced an on-disk formula cell with a literal, only `value`/`data_type` were patched — the original `formula` field stayed on the record, so a consumer would see `data_type='number'` next to a `formula='SUM(...)'` field for the same cell. The reverse (literal → formula) failed to set the new formula text. Now: literal overwrites pop the `formula` key; formula overwrites set it from the new value (leading `=` stripped to match Rust's reader convention). P3 (write-mode fallback dropped include_coordinate). `iter_cell_records` forwarded only `include_empty` to `_iter_cell_records_python`, even though the API docs promise `include_coordinate=False` skips A1 string allocation. Write mode emitted a `coordinate` key unconditionally, diverging from read mode and forcing wasted work on large in-memory sheets. Lock-in tests: - test_cell_records_overlay_clears_stale_formula_metadata_on_literal_swap - test_cell_records_overlay_replaces_formula_metadata_on_formula_swap - test_iter_cell_records_write_mode_honors_include_coordinate Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wolfiesch · 2026-04-19T11:42:22Z

@codex review

The new write-mode coordinate test never saves a file, so requesting the `tmp_path` pytest fixture and binding `path = tmp_path / "..."` was dead weight that ruff F841 caught on CI lint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da9431ff39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T11:51:13Z

+            extra: dict[str, Any] = {
+                "row": row,
+                "column": col,
+                "value": value,
+                "data_type": _canonical_data_type(value),


Include formula text in overlay-only formula records

When iter_cell_records() overlays pending edits in modify mode, the branch that emits records not returned by Rust only sets row, column, value, and data_type. If a pending edit is a formula string (for example a new "=SUM(...)" in an on-disk empty cell), the emitted record becomes data_type="formula" but has no formula field, unlike formula records from the Rust path. This drops expression metadata for unsaved edits and breaks consumers that read record["formula"] for formula cells.

Useful? React with 👍 / 👎.

Fixed in 5f6cc0c. The extra-overlay branch now emits the formula key alongside data_type='formula' when the pending value is a "=..." string, with the same leading-= strip applied for Rust-style consistency. Lock-in test test_cell_records_extra_overlay_emits_formula_field_for_pending_formula writes a formula into a previously empty cell (so the Rust reader returns no record for it) and asserts the resulting overlay record has both data_type='formula' and the expected formula field.

chatgpt-codex-connector · 2026-04-19T11:51:13Z

+        if buf:
+            start = self._append_buffer_start
+            buf_max_r = start + len(buf) - 1
+            buf_max_c = max(len(row) for row in buf)


Prevent pending bounds from producing column index 0

_pending_writes_bounds() computes appended-row width with max(len(row) for row in buf). If buffered appends include an empty row (e.g. append([])), this yields 0, which can propagate to _max_col()==0; downstream 1-based coordinate helpers (like rowcol_to_a1 via dimension/reporting paths) then see an invalid column index. Clamp this width to at least 1 (or ignore zero-width rows) so pending bounds never generate column 0.

Useful? React with 👍 / 👎.

Fixed in 5f6cc0c. _pending_writes_bounds() now uses max((len(row) for row in buf), default=0) and only contributes to the column-bounds branch when the width is >0. The same guard is applied to the bulk-write branch. An empty appended row still consumes a row index (so max_r grows) but leaves max_c alone, so _max_col() can never collapse to 0 from a zero-width append. Lock-in test test_pending_writes_bounds_skips_zero_width_appended_rows invokes ws.append([]) and asserts calculate_dimension() doesn't leak A0 into the result.

…bounds against zero-width rows Two findings from Codex round-7 review on PR #4 commit caa57ad: P1 (overlay-only formula records dropped expression metadata). Round 6 fixed the patched-overlay branch (where a Rust-returned record gets a field swap) to refresh `formula` on edits. The parallel "extra-overlay" branch — for cells the Rust reader didn't return at all — built a fresh record dict that omitted `formula` entirely, even when the new value was a `"=..."` formula string. So a `data_type='formula'` record with no expression leaked out for any pending formula written into a previously empty cell. Fix: mirror the same `formula = value[1:]` derivation in the extra-overlay constructor. P2 (zero-width appended row could collapse max_col to 0). `_pending_writes_bounds()` computed `max(len(row) for row in buf)` for the append buffer's column extent. An `ws.append([])` (or a bulk grid where every row is empty) yields 0 columns, but the code still set `min_c = 1`, leaving the bounds as `min_c=1 / max_c=0` — invalid. Downstream `rowcol_to_a1(_, 0)` would either raise or produce garbage A1. Fix: switch to `max(..., default=0)` and gate the column-bounds contribution on `>0`. Same guard added to the bulk-write branch. Lock-in tests: - test_cell_records_extra_overlay_emits_formula_field_for_pending_formula - test_pending_writes_bounds_skips_zero_width_appended_rows Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wolfiesch · 2026-04-19T11:54:03Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f6cc0c123

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T12:02:45Z

+    if isinstance(value, _dt.date):
+        return "date"


Align overlay date labels with Rust record schema

iter_cell_records() overlays pending Python edits using _canonical_data_type(), which maps datetime.date to "date", but the Rust read_sheet_records() path emits temporal cells as "datetime" (via data_type_name). In modify mode, editing a date cell can therefore produce mixed data_type tokens in one result set, which breaks consumers that group/filter on the documented canonical labels. Either map Python date to "datetime" here or make the Rust path emit "date" consistently.

Useful? React with 👍 / 👎.

Fixed in 5d5fd84. _canonical_data_type now collapses all temporal types (datetime / date / time) to "datetime" to match Rust's data_type_name(), which emits a single "datetime" label for both Data::DateTime and Data::DateTimeIso. Updated the docstring to drop the now-absent "date" token. Lock-in test test_canonical_data_type_collapses_temporals_to_datetime asserts all three branches return "datetime".

chatgpt-codex-connector · 2026-04-19T12:02:45Z

+    if frac.abs() < f64::EPSILON {
+        return CellValue::Date(date);
+    }
+    let secs = (frac * 86_400.0).round() as u32;


Handle negative fractional Excel serials without u32 wrap

For negative serials with fractional parts (for example -0.5), secs is computed as (frac * 86_400.0).round() as u32, and casting a negative value to u32 wraps to a huge positive number. The downstream carry logic then yields incorrect date/time results instead of a valid pre-epoch datetime, so pre-1900 timestamp cells can be corrupted. Keep this arithmetic signed until normalization (borrow/carry) is finished.

Useful? React with 👍 / 👎.

Fixed in 5d5fd84. The day-fraction arithmetic now uses signed i64; when secs_signed < 0 we borrow whole days off days (mirroring the existing >= 86_400 carry), then cast to u32 only after the value lands in [0, 86_400). Pulled days → NaiveDate into a days_to_date_from_excel_base helper since the borrow runs before date materialization now. Lock-in test excel_serial_negative_fractional_borrows_into_prior_day checks serial -0.5 lands on 1899-12-29 12:00:00.

…tional Excel serials Two findings from Codex round-8 review on PR #4 commit 5f6cc0c: P2 (overlay date label diverged from Rust). `_canonical_data_type()` mapped `datetime.date` to `"date"`, but Rust's `data_type_name()` emits `"datetime"` for both `Data::DateTime` and `Data::DateTimeIso`. In modify mode, an overlay edit to a date cell would mix `"date"` and `"datetime"` tokens inside one `cell_records()` result, breaking consumers that group/filter by the documented vocabulary. Collapse all temporal types (`datetime` / `date` / `time`) to `"datetime"` and update the docstring to remove the now-absent `"date"` token. P2 (negative fractional Excel serial wrapped to garbage). The day-fraction arithmetic computed `(frac * 86_400.0).round() as u32`, which wraps a negative f64 to a huge positive u32. For a serial like `-0.5` (12:00 the day before the 1899-12-30 base), the wrap then drove the carry-forward branch into a corrupted far-future datetime. Switched to signed i64 arithmetic, with a borrow when `secs < 0` mirroring the existing `secs >= 86_400` carry. Pulled the `days → NaiveDate` step into a helper since it now needs to run after the borrow/carry adjusts `days`. Lock-in tests: - test_canonical_data_type_collapses_temporals_to_datetime (Python) - excel_serial_negative_fractional_borrows_into_prior_day (Rust) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

wolfiesch · 2026-04-19T12:07:02Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d5fd844d5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-19T12:14:00Z

+        for r in 0..h {
+            let mut row: Vec<Cell> = Vec::with_capacity(w);
+            for c in 0..w {
+                let value = value_range
+                    .get((r, c))


Offset cell iteration by the range origin

worksheet_range() can return a range whose origin is not (0, 0) (for example when the first populated cell is C5). This loop indexes value_range and style_range with raw 0..h / 0..w coordinates, but Range::get expects absolute sheet coordinates, so non-A1 sheets are read as blank or misaligned and their formats are attached to the wrong cells. Add the range start offsets when computing (row, col) before calling get.

Useful? React with 👍 / 👎.

Thanks for flagging - I dug into the calamine-styles source before changing this. The claim that Range::get expects absolute coordinates is reversed: it's get_value that takes absolute coords; get takes relative ones. From calamine-styles 0.1.0/src/lib.rs:

// line 1099 pub fn get_value(&self, absolute_position: (u32, u32)) -> Option<&T> { ... } // line 1140 pub fn get(&self, relative_position: (usize, usize)) -> Option<&T> { let (row, col) = relative_position; let width = self.width(); if row >= self.height() || col >= width { None } else { self.inner.get(row * width + col) } }

So value_range.get((r, c)) with r, c ∈ 0..h × 0..w (where (h, w) = value_range.get_size()) reads every cell in the range correctly, regardless of whether the range starts at A1 or C5. Cells are not "blank or misaligned" and number formats are attached to the right cells inside the range.

There is a legitimate but distinct concern in the neighborhood: Sheet's internal Vec<Vec<Cell>> doesn't preserve the absolute origin of the range, so sheet.rows()[0][0] represents the first populated cell rather than literal A1. The current public API (headers, rows, row, dimensions) exposes no absolute coordinates and the only consumer (crates/wolfxl-cli/src/render.rs) renders the populated block as a tabular preview — which matches xleak's behavior on non-A1 sheets — so v0.4.0 peek output is unaffected.

Tracking the absolute-origin work as a follow-up rather than blocking this PR: when we add --map / --cell A1 / citation features in sprint 2, Sheet will need a (start_row, start_col) field plus absolute-position accessors. Filing a follow-up issue now.

Filed as #6 - Sheet absolute-origin work tracked there for sprint 2.

wolfiesch · 2026-04-19T12:21:49Z

@codex review

chatgpt-codex-connector

💡 Codex Review

wolfxl/src/calamine_styled_backend.rs

Lines 1974 to 1978 in 5d5fd84

    
           Ok(Event::Empty(ref e)) => { 
        
               // Handle self-closing <c .../> (cells with no children — no formula). 
        
               if e.local_name().as_ref() == b"c" { 
        
                   // No formula possible in an empty element, skip. 
        
               }

Handle shared-formula <f/> entries in XML parser

parse_formulas_from_sheet_xml only captures formulas when <f> contains text and the Event::Empty branch ignores empty formula elements, but shared-formula follower cells are commonly encoded as self-closing <f t="shared" si="..."/>. Those cells never reach formula_map_cache, so read_cell_value, read_sheet_values, and read_sheet_records in data_only=false mode can return cached/literal values instead of formula metadata for drag-filled formula ranges. This causes incorrect read semantics on many real XLSX files that use shared formulas.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… §6 unit tests Single-pass streaming merger with quick_xml::Reader + Writer. State: BTreeMap<u32, Vec<SheetBlock>> by ECMA position (Vec to handle slot 17's 0..N conditionalFormatting), `replace_names: Vec<&[u8]>` for skip-on-match, `cf_replace: bool` for the §5.5 special case, `needs_rel_ns: bool` for the §8 risk #4 namespace injection. Edge cases handled (RFC §8): - risk #1 (extlst byte-preservation): Event::Start(e.borrow()) walks the source byte slice directly — quick-xml does not re-encode attributes. Headline test extlst_is_byte_preserved passes; the §8 byte-span fallback is unneeded. - risk #2 (comments/PIs): flow through at source position via the catch-all `other` arm of the event match. - risk #3 (self-closing <worksheet/>): Event::Empty(b"worksheet") path expands to <worksheet>...</worksheet> and flushes all pending blocks. - risk #4 (xmlns:r missing): ensure_rel_namespace() detects any existing prefix bound to the relationships URI; only injects when actually needed. Tests (20 total, 15 added in this commit): - RFC §6 #1-#12 (test #1 already present from commit 1): insert_hyperlinks_into_minimal_sheet, replace_existing_hyperlinks, insert_into_correct_ecma_position, extlst_is_byte_preserved (HEADLINE), unknown_element_passthrough, multiple_conditionalformatting_blocks, conditionalformatting_replaces_all_existing, block_inserted_when_no_neighbors, tableparts_after_extlst_is_wrong_and_we_fix_it, large_sheet_streaming_memory_bounded, byte_identical_when_block_already_present_and_unchanged. - Risk-fallback coverage: comments_pass_through_at_source_position, self_closing_root_expands_and_flushes, rel_namespace_injected_when_missing, rel_namespace_not_duplicated_when_already_present. RFC-011 commit 2 of 6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the pure-Rust sheet-copy planner per RFC-035 §4.2 / §5. The planner takes the source sheet's parsed rels, source workbook.xml, pre-loaded ZIP parts, a shared PartIdAllocator, and the workbook-wide table-name set; it returns SheetCopyMutations describing every byte the patcher must add or rewrite to materialize the copy. Implementation lives in crates/wolfxl-structural/src/sheet_copy.rs (PyO3-free, so the 17 unit tests run via cargo test -p wolfxl-structural without hitting the cdylib's macOS-arm64 link issue). The spec-mandated location src/wolfxl/sheet_copy.rs is a thin re-export. Highlights per the RFC: * Single rid_remap pass over the cloned sheet XML (§8 risk #4 mitigation). * Image aliasing for drawings -> images (§5.3 / §8 risk #2): drawing rels file is cloned with fresh rIds but image targets stay verbatim. * Table name dedup uses f"{base}_{N}" starting at N=2 (OQ-b default). * Sheet-scoped defined-names cloned with re-pointed localSheetId (OQ-c default). * Validation: errors on duplicate dst_title, missing src_title, missing source bytes. * Allocator-driven table/comments/vmlDrawing/drawing/sheet suffixes ensure no collisions with parallel subsystems running in the same save. 17 unit tests cover: basic clone, single-table dedup, multi-table dedup, external hyperlink alias, comments+VML clone, drawing+image alias, sheet-scoped defined-name clone, no defined-names case, three validation errors, single-pass rid remap, dedup helper, placeholder rid consistency, allocator advance across calls, missing-defined-names path, unknown rel type aliasing. cargo test -p wolfxl-structural: 99 -> 116 (+17). Workspace total green. Pytest: 164 passed + 3 skipped (matches baseline). Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

…Pod-δ Pod-δ closed bugs #1 (test_i), #2 (test_j), #3 (test_k), #5 (test_q) from the "RFC-035 cross-RFC composition gaps" section. Move them to a new "Fixed in 1.1 (Sprint Ζ Pod-δ)" subsection (closed bullets). Bugs #4 (self-closing <sheets/>) and #6 (CDATA/PI fakeout) remain deferred to 1.2 — both are only reachable through synthesized fixtures or naive-splice fakeouts, no real Excel-emitted workbook hits them. Also: close the stale "Sheet/column/row structural mutations | Raises" row in the T1.5 table — RFC-030/031/034/036 all shipped in 1.1. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Adds a §8.5 "Sprint Θ deliverables (1.2)" subsection to RFC-035 capturing the five Sprint Θ pods landing in parallel: - Pod-A: bug #4 (test_p_self_closing_sheets_block) — Rust-level splice unit test for self-closing <sheets/>. - Pod-B: bug #6 (test_r_cdata_pi_fuzz_fakeout) — XML-aware splice replacing the byte-level locator. - Pod-C1: write-mode copy_worksheet (lifts §3 OQ-a NotImplementedError). - Pod-C2: image deep-clone (CopyOptions(deep_clone_images=True); additive, default False to preserve 1.1 byte-stability). - Pod-C3: calcChain rebuild (replaces "Excel rebuilds on next open" with a wolfxl-side rebuild post-Phase 2.7). Also updates the Status header from "Shipped" to "Shipped (1.1 modify-mode core; 1.2 Sprint Θ closes follow-ups — see §8.5)". TBD markers placeholder per-pod commit SHAs to be filled in by the final Sprint Θ integration pass once Pods A/B/C land. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

…ing <sheets/> Sprint Θ Pod-A — adds an opt-in loader fallback that unblocks workbooks whose xl/workbook.xml has an empty (self-closing) <sheets/> block while the workbook rels graph still references one or more worksheet parts. Surface: - wolfxl.load_workbook(path, ..., permissive=True) - Workbook._from_reader / _from_patcher gain a permissive kwarg - CalamineStyledBook.open and XlsxPatcher.open accept a permissive flag Behavior when permissive=True AND the parsed <sheets> block is empty: - Each worksheet relationship target in xl/_rels/workbook.xml.rels is registered under a synthesized title (Sheet1, Sheet2, ...). - The in-memory xl/workbook.xml is rewritten to expose the synthesized <sheet> entries so downstream phases (Phase 2.7 splice, defined-names merger, ...) see a well-formed workbook. The rewrite lives in a load-time-seeded file_patches map drained by do_save. Default is False; well-formed inputs are unaffected. Closes RFC-035 cross-RFC composition bug #4 — flips test_copy_worksheet_modify.py::test_p_self_closing_sheets_block from xfail → PASS end-to-end (load → copy_worksheet → save → reload via openpyxl). Full tests/test_copy_worksheet_modify.py: 18 passed, 1 xfail (unrelated bug #6). KNOWN_GAPS.md updated to move bug #4 to the "Fixed in 1.2 (Sprint Θ Pod-A)" section. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

… mode Closes RFC-035 KNOWN_GAPS bug #4 (test_p_self_closing_sheets_block). When permissive=True is passed to load_workbook, the loader handles self-closing <sheets/> workbook.xml fixtures by scanning the rels graph for worksheet targets and synthesizing sheet entries (Sheet1, Sheet2, ...) backed by a load-time workbook.xml rewrite that re-populates the <sheets> block. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

wolfiesch and others added 11 commits April 14, 2026 06:20

Fix WolfXL read parity for data-only and dimensions

c4f5f42

Finish WolfXL read parity cleanup

20fb8b5

merge: pyo3 0.28 bump for Python 3.14 support

72910cf

# Conflicts: # Cargo.lock

Copilot AI review requested due to automatic review settings April 19, 2026 09:20

Copilot started reviewing on behalf of wolfiesch April 19, 2026 09:21 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 19, 2026

View reviewed changes

wolfiesch changed the title ~~feat: wolfxl-core + wolfxl-cli (0.4.0) — Rust workspace, peek CLI at xleak parity, PyO3 0.28~~ feat: wolfxl-core + wolfxl-cli 0.4.0 + PyO3 0.28 (Python 3.14 support) Apr 19, 2026

wolfiesch mentioned this pull request Apr 19, 2026

Honor include_formula_blanks for uncached formulas in wolfxl-core #5

Open

chatgpt-codex-connector Bot reviewed Apr 19, 2026

View reviewed changes

wolfiesch mentioned this pull request Apr 19, 2026

Sheet should preserve absolute range origin for non-A1 sheets #6

Open

chatgpt-codex-connector Bot reviewed Apr 19, 2026

View reviewed changes

wolfiesch merged commit 303a162 into main Apr 19, 2026
9 checks passed

wolfiesch deleted the wolfxl-cli-peek branch April 19, 2026 12:32

		let has_date = lower.contains('y') \|\| lower.contains('d') \|\| lower.contains("mmm");
		let has_time = lower.contains('h') \|\| lower.contains(":mm") \|\| lower.contains(':');

		if self._workbook._rust_reader is None: # noqa: SLF001
		yield from self._iter_cell_records_python(

		if !data_only {
		// Check the fast formula map (parsed from worksheet XML in a single pass).

	Ok(Event::Empty(ref e)) => {
	// Handle self-closing <c .../> (cells with no children — no formula).
	if e.local_name().as_ref() == b"c" {
	// No formula possible in an empty element, skip.
	}

Conversation

wolfiesch commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in here

Pre-publish review

Test plan

Follow-up (not blocking publish)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wolfiesch commented Apr 19, 2026

Uh oh!

wolfiesch commented Apr 19, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wolfiesch commented Apr 19, 2026

Uh oh!

wolfiesch commented Apr 19, 2026 •

edited

Loading