Feat/input callbacks #200
Merged
New `io` module that lifts libxml2's `xmlRegisterInputCallbacks` into
a closure-friendly Rust API:
```rust
pub fn register_input_callback<M, O>(match_url: M, open: O)
where
    M: Fn(&str) -> bool + Send + Sync + 'static,
    O: Fn(&str) -> Option<Vec<u8>> + Send + Sync + 'static;
```
`match_url` claims a URL; `open` returns the bytes (or `None` to defer
back through the callback chain). The C trampolines are registered
with libxml2 exactly once per process; subsequent calls just append
to a `Mutex<Vec<Callback>>` registry that the trampolines walk on each
URL load. Both closures must be `Send + Sync` because libxml2 may
dispatch from any thread.
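A minimal pure-Rust sketch of the registry design described above, written so it runs without libxml2. The `Callback` struct, the `MatchFn`/`OpenFn` aliases, `snapshot()`, and the hypothetical `any_match`/`open_newest` helpers (stand-ins for what the match and open trampolines would compute) are assumptions for illustration, not the crate's actual code; the one-time `xmlRegisterInputCallbacks` call is reduced to an empty `Once` body. Entries are scanned newest-first, and this sketch stops at the newest matching registration rather than falling through to older ones when `open` returns `None`.

```rust
use std::sync::{Arc, Mutex, Once};

// Assumed shapes for the PR's MatchFn/OpenFn aliases.
type MatchFn = dyn Fn(&str) -> bool + Send + Sync;
type OpenFn = dyn Fn(&str) -> Option<Vec<u8>> + Send + Sync;

// One registration: a URL matcher plus a byte loader.
struct Callback {
    match_url: Box<MatchFn>,
    open: Box<OpenFn>,
}

// Process-wide registry; libxml2 offers no per-handler unregistration,
// so entries are only ever appended.
static REGISTRY: Mutex<Vec<Arc<Callback>>> = Mutex::new(Vec::new());
// Guards the one-time hand-off of the C trampolines to libxml2.
static TRAMPOLINES: Once = Once::new();

pub fn register_input_callback<M, O>(match_url: M, open: O)
where
    M: Fn(&str) -> bool + Send + Sync + 'static,
    O: Fn(&str) -> Option<Vec<u8>> + Send + Sync + 'static,
{
    TRAMPOLINES.call_once(|| {
        // Placeholder: the real module would call xmlRegisterInputCallbacks
        // here with its extern "C" trampolines. Omitted so this runs alone.
    });
    REGISTRY.lock().unwrap().push(Arc::new(Callback {
        match_url: Box::new(match_url),
        open: Box::new(open),
    }));
}

// Copy the Arcs out; the guard is dropped before any closure runs.
fn snapshot() -> Vec<Arc<Callback>> {
    REGISTRY.lock().unwrap().clone()
}

// Hypothetical: what the match trampoline reports (does anyone claim it?).
fn any_match(url: &str) -> bool {
    snapshot().iter().any(|cb| (cb.match_url)(url))
}

// Hypothetical: what the open trampoline returns (newest match wins).
fn open_newest(url: &str) -> Option<Vec<u8>> {
    snapshot()
        .iter()
        .rev()
        .find(|cb| (cb.match_url)(url))
        .and_then(|cb| (cb.open)(url))
}
```

Because `snapshot()` only clones `Arc`s under the lock, the critical section stays as short as the copy itself, and every user closure runs with the registry unlocked.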
## Motivating use case
A single-binary CLI bundles its XSLT stylesheets / RNG schemas via
`include_bytes!` and serves them through a synthetic URL scheme
(e.g. `embed:///LaTeXML-html5.xsl`). The main stylesheet is parsed
from memory via `libxslt::parser::parse_bytes(bytes, "embed:///main.xsl")`,
which sets the doc's base URI. Inside libxslt, `xsl:import href="…"`
resolves the href against that base to an absolute URL, then calls
`xmlReadFile`, which walks libxml2's input-callback table and finds
ours. No disk extraction is needed.
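A sketch of the two closures such a CLI might register, assuming a hypothetical `EMBEDDED` table and hypothetical `embed_match`/`embed_open` names. In a real binary each table entry would come from `include_bytes!`; placeholder byte strings stand in here so the example runs on its own.

```rust
// Hypothetical asset table; a real binary would use include_bytes!("...")
// for each entry instead of these placeholder byte strings.
const EMBEDDED: &[(&str, &[u8])] = &[
    ("main.xsl", b"<!-- main.xsl placeholder -->"),
    ("LaTeXML-html5.xsl", b"<!-- LaTeXML-html5.xsl placeholder -->"),
];

// Synthetic scheme prefix taken from the example URLs.
const SCHEME: &str = "embed:///";

// match_url: claim every URL under the embed:/// scheme.
fn embed_match(url: &str) -> bool {
    url.starts_with(SCHEME)
}

// open: strip the scheme and look the remaining path up in the table.
// Unknown names (and foreign schemes) yield None instead of guessed bytes.
fn embed_open(url: &str) -> Option<Vec<u8>> {
    let name = url.strip_prefix(SCHEME)?;
    EMBEDDED
        .iter()
        .find(|(path, _)| *path == name)
        .map(|(_, bytes)| bytes.to_vec())
}
```

Both are plain `fn` items satisfying the `Fn(&str) -> …` bounds in the signature above, so they could be passed directly as `register_input_callback(embed_match, embed_open)`. A relative `xsl:import` resolved against `embed:///main.xsl` reaches `embed_open` as a full `embed:///` URL, which is why the lookup strips the scheme prefix first.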
The same trick handles RelaxNG `<include>` resolution from
`xmlRelaxNGParse`, DTD external subsets, and any other libxml2-side
URL load.
## Why not `Parser::parse_file`
The existing `Parser::parse_file` reads the file via Rust I/O
(`std::fs::File::open` + `xmlReadIO`) and bypasses libxml2's URL
machinery entirely. The doctest example is marked `no_run` and notes
that the callback fires from libxslt / xmlReadFile contexts, not
from the library's own `parse_file` surface.
## Tests
Three unit tests against `xmlReadFile` (the libxml2 entry point that
actually exercises the callback chain):
* `callback_serves_registered_url` — registered URL parses through
the callback (round-trip via xmlReadFile -> trampoline_open ->
Rust closure -> trampoline_read -> libxml2 parse).
* `callback_can_decline_via_none` — open returning None fails the
load rather than returning phantom data.
* `non_matching_url_defers_to_default_handlers` — match returning
false leaves the default file/HTTP loaders intact (verified by
a /nonexistent file:// URL failing through the default chain).
All 105 pre-existing tests still pass; full sweep clean.
## Notes
* libxml2 has no per-handler unregistration API (only
`xmlCleanupInputCallbacks` which wipes the whole chain including
the defaults), so the trampolines and the Rust registry live for
the process lifetime. Reasonable for the embedded-asset use case;
documented in the module docs.
* The registry `Mutex` is locked only briefly during the registry walk
on each URL load; no closure that could re-enter libxml2 runs while
the lock is held.
* Callback ordering is last-registered-first, matching libxml2's own
convention. Stacking multiple registrations for the same scheme is
supported.
Version: 0.3.11 -> 0.3.12.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three `#[test]`s deadlocked under cargo's parallel runner on libxml2 2.12.9 (a pre-2.13 thread-safety bug in the input-callback / global error path); merge them into one `#[test]` so the scenarios run sequentially.

Drive-bys from the same review:

* Drop redundant function-pointer aliases (4 non_camel_case warnings); `Some(trampoline_*)` already coerces to the bindgen `Option<extern "C" fn>` alias.
* Extract `MatchFn`/`OpenFn` (`clippy::type_complexity` on the `Box<dyn Fn>`).
* Iterate the registry newest-first in `trampoline_open` to match the module doc's "most recent wins" and libxml2's own callback-table semantics.
* Store entries as `Arc<Callback>` and snapshot the `Vec` before invoking a closure, so an `open()` that re-enters libxml2 via `xmlReadFile` doesn't self-deadlock on the non-reentrant registry `Mutex`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
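A standalone illustration of the self-deadlock that snapshotting avoids. `std::sync::Mutex` is not reentrant, so a closure that re-enters the loader (as an `open()` calling `xmlReadFile` would) cannot reacquire a lock its caller still holds. To show this without actually hanging, the hypothetical `reentrant_probe` uses `try_lock` in place of a real re-entrant `lock()`; `REGISTRY`, `dispatch_holding_lock`, and `dispatch_with_snapshot` are illustrative names, not the crate's code.

```rust
use std::sync::{Arc, Mutex};

// Illustrative loader type; the PR's registry stores Arc<Callback>.
type OpenFn = dyn Fn(&str) -> Option<Vec<u8>> + Send + Sync;

static REGISTRY: Mutex<Vec<Arc<OpenFn>>> = Mutex::new(Vec::new());

// Stands in for an open() that re-enters libxml2 and would lock REGISTRY
// again. try_lock reports contention instead of blocking, so the deadlock
// case shows up as None rather than a hang.
fn reentrant_probe(_url: &str) -> Option<Vec<u8>> {
    match REGISTRY.try_lock() {
        Ok(_guard) => Some(b"registry free".to_vec()),
        Err(_) => None,
    }
}

// Problematic shape: closures run while the guard is still alive.
fn dispatch_holding_lock(url: &str) -> Option<Vec<u8>> {
    let guard = REGISTRY.lock().unwrap();
    guard.iter().rev().find_map(|f| f(url))
}

// Fixed shape: clone the Arcs, release the lock, then call newest-first.
fn dispatch_with_snapshot(url: &str) -> Option<Vec<u8>> {
    let snapshot: Vec<Arc<OpenFn>> = REGISTRY.lock().unwrap().clone();
    snapshot.iter().rev().find_map(|f| f(url))
}
```

With the guard alive, `try_lock` sees the mutex as busy (where a blocking `lock()` would hang); after the snapshot the mutex is free, so the nested acquisition succeeds.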
Align `trampoline_match`'s iteration order with `trampoline_open`'s (newest-first), `.unwrap()` the registry mutex in `snapshot()` to match `register_input_callback`, and add a fifth scenario asserting the documented "most recent registration wins" semantics with atomic counters. Comments and CHANGELOG entry compacted; behaviour unchanged for correct callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the archived actions-rs/* and ryankurte/action-apt with their current-standard equivalents so the workflows run on Node 24 ahead of GitHub's June 2026 forced migration:

* actions-rs/toolchain@v1 -> dtolnay/rust-toolchain (@stable, plus @master + `toolchain:`/`targets:` for the mingw windows-gnu job)
* actions-rs/cargo@v1 -> plain `run: cargo test|doc`
* ryankurte/action-apt -> plain `run: apt-get update && install`
* actions/checkout@v2/@v4 -> @v6

Also add least-privilege `permissions:` blocks (`contents: read` for the CI/test workflows; `contents: write` for gh-pages, which pushes rendered docs to the gh-pages branch).

CHANGELOG: date 0.3.12 (2026-05-23) and open a 0.3.13 in-development section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The mingw64 job sets `defaults.run.shell: msys2 {0}`, so the converted
`run: cargo test` step executed inside the msys2 login shell. With
`path-type: minimal`, msys2 strips cargo (installed by rustup to the
Windows user profile) from PATH, so the step failed with exit 127.
The previous actions-rs/cargo@v1 step was a JS action that ran in the
runner's Windows context, never in msys2, so it always found cargo.
Restore that behavior by pinning the test step to `shell: pwsh`.
mingw64/bin is already on PATH from the prior step, so pkg-config, gcc,
and the libxml2 DLLs still resolve for the windows-gnu build.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author (repository member):

I updated the CI workflows to silence the recent GHA warnings and prepped a v0.3.12 release. Merging and releasing.
Adds the ability to register libxml2 input callbacks, motivated by in-memory libxslt use.
As a motivating example:
The PR code is AI-generated, so I'll let it stew until I do a proper review pass. Tested and works with my main use case with libxslt, so there is a baseline of "it works".