Skip to content

[codex] Fix Windows SIGALRM markdown indexing#22

Merged
torresmateo merged 6 commits into
mainfrom
codex/fix-windows-sigalrm-indexing
May 13, 2026
Merged

[codex] Fix Windows SIGALRM markdown indexing#22
torresmateo merged 6 commits into
mainfrom
codex/fix-windows-sigalrm-indexing

Conversation

@jottakka
Copy link
Copy Markdown
Contributor

Summary

Fixes #21

This fixes Windows Markdown indexing by avoiding unconditional use of POSIX-only signal.SIGALRM / signal.alarm in parser file reads. It also makes libr add <directory> return a non-zero exit code when the directory indexing result contains errors, so automation can detect partial indexing failures.

Root Cause

safe_read_text() installed a timeout with signal.SIGALRM before reading every text file. Windows Python does not expose SIGALRM, so Markdown parsing failed before the file content could be read. Directory indexing collected that failure in errors, but the CLI still printed a success panel and exited with status code 0.

Implementation Approach

  • Added a small signal-timeout availability guard in librarian.processing.parsers.base.
  • Preserved the existing signal.alarm timeout path on platforms that provide it.
  • Used the normal read path without POSIX signal timeout where SIGALRM/alarm are unavailable.
  • Applied the same compatibility guard to safe_read_bytes() because it used the same POSIX-only mechanism.
  • Updated libr add <directory> to print an error status and raise typer.Exit(1) when indexing returns errors.

Tests Added

  • Regression tests for safe_read_text() and safe_read_bytes() with signal.SIGALRM removed from the runtime.
  • A CLI regression test confirming libr add <directory> exits non-zero when indexing reports errors.

Validation

  • uv run --extra dev pytest -> 75 passed, 8 skipped.
  • uv run --extra dev ruff check --diff librarian/processing/parsers/base.py librarian/cli.py tests/test_file_read_errors.py tests/test_cli.py -> no diffs.
  • uv run --extra dev mypy librarian/processing/parsers/base.py -> success.
  • Windows-compatible controlled smoke: removed signal.SIGALRM at runtime, patched a fake embedder, and confirmed index_directory_to_library() indexed one Markdown file with indexed: 1 and errors: [].

I also tried a raw CLI smoke with the real local embedding model in an isolated temporary HOME. It reached the indexing pipeline and correctly exited non-zero on failure, but the run was blocked by Hugging Face model cache setup in the temporary HOME rather than by Markdown parsing.

Remaining Limitations

On platforms without SIGALRM, file reads no longer have the POSIX alarm timeout. This preserves Windows compatibility without introducing a broader threaded/process timeout refactor.

@jottakka jottakka force-pushed the codex/fix-windows-sigalrm-indexing branch from 30de204 to ab2bafc Compare May 11, 2026 03:35
jottakka and others added 2 commits May 11, 2026 01:06
Wrap the POSIX SIGALRM/alarm setup in a single `_read_timeout` context
manager that no-ops on platforms (Windows) where those signals are
unavailable. This removes the duplicated error-handling branches in
`safe_read_text` / `safe_read_bytes` and drops the `_handle_read_error`
helper, the unreachable trailing `raise` after it, and the
`vars(signal)` + `cast` workarounds previously needed to keep mypy
happy on Windows.

Also make the new CLI regression test independent of terminal width by
pinning COLUMNS so Rich does not wrap the error line off-screen, and
annotate `monkeypatch` with `pytest.MonkeyPatch` for consistency with
the rest of the suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a Windows smoke run (Python 3.11) to the test matrix so that
POSIX-only regressions like the SIGALRM markdown-indexing bug fixed
in this PR get caught at PR time instead of in production. Ubuntu
keeps the full 3.10/3.11/3.12 matrix; Codecov upload stays restricted
to the single ubuntu/3.11 job. `fail-fast: false` so a Windows failure
does not cancel the Linux matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jottakka jottakka self-assigned this May 11, 2026
@jottakka jottakka requested a review from torresmateo May 11, 2026 15:08
jottakka and others added 3 commits May 11, 2026 12:09
Add a macOS smoke run (Python 3.11) alongside the new Windows job.
macOS is a primary developer platform for this project, so catching
Darwin-specific quirks (BSD libc differences, case-insensitive
filesystem, signal behavior) at PR time is worth the extra runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The macOS test job was failing with:

    AttributeError: 'sqlite3.Connection' object has no attribute
    'enable_load_extension'

librarian's storage layer calls `conn.enable_load_extension(True)` to
load the sqlite-vec extension. That method only exists when CPython was
compiled with `--enable-loadable-sqlite-extensions`. The python.org
build that `actions/setup-python` ships on macOS is compiled without
that flag, so any test path that touches the database fails — 24
unrelated failures, all the same root cause.

Switch the test job to uv-managed Python (python-build-standalone),
which enables loadable sqlite extensions on every platform. This is a
single `python-version` input on `astral-sh/setup-uv`, so it also
removes the now-redundant `actions/setup-python` step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous attempt removed actions/setup-python but uv still resolved
the runner's pre-installed Python (/usr/local/bin/python3.11) because
the version matched the matrix entry. That Python is the python.org
build, compiled without --enable-loadable-sqlite-extensions, so
sqlite-vec failed to load and 24 tests crashed on macOS.

Add --python-preference=only-managed to the install step so uv
downloads a python-build-standalone interpreter instead. Those builds
enable loadable sqlite extensions on every platform.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@torresmateo torresmateo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only question is why are we using 3.11 in some of these?

Comment thread .github/workflows/ci.yml
include:
# Smoke-test on Windows to catch POSIX-only regressions (e.g. SIGALRM).
- os: windows-latest
python-version: "3.11"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 3.11 here?

@Spartee Spartee self-requested a review May 13, 2026 13:37
@torresmateo torresmateo merged commit 6aaef3c into main May 13, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows Markdown indexing fails because parser uses POSIX SIGALRM

3 participants