feat(mediainfo): add MediaInfoContext for reusable library loading #1
Merged
# Why
- Every v0.1.0 parse call paid a full dlopen, 10-symbol resolve, handle
creation, version probe, and dlclose cycle. For batch workloads
(asset scanners, directory walks, import pipelines) the load
overhead was pure waste and dominated the work.
- A reusable context lets callers amortize the one-time library load
cost across many parses while keeping every existing public API
signature and the global parse lock intact.
# What
- Add `MediaInfoContext` public type that loads the MediaInfo shared
library once and reuses it across all parse entry points (path,
reader, URL, pre-built `MediaInfoInput`, structured + raw-text
output). `Clone + Send + Sync`; wrap in `Arc` to share across
worker pools.
- Constructors: `new`, `with_library_file`, `with_library_search_dir`.
Accessors: `library_version`, `library_version_string`,
`library_file`, `can_parse`.
- Refactor the parse pipeline: introduce private `LoadedLibrary<'a>`
and split `load_library` into `load_library_full` so the
`*_internal_unlocked` functions accept an already-loaded library
instead of re-loading on every call.
- Route free `MediaInfo::parse*` functions through a lazily
initialized process-wide default context
(`OnceLock<RwLock<Option<Arc<MediaInfoContext>>>>`) whenever no
`library_file`/`library_search_dir` override is set. Callers that
pin an explicit path still go through the per-call load path,
preserving v0.1.0 behavior verbatim.
- Add `MediaInfo::reset_default_context()` as an escape hatch to drop
the cached context and force a fresh load on the next parse.
- Add `MediaInfoError::LibraryMismatch { context, requested }` +
`library_mismatch` constructor, returned when a context parse call
is given a conflicting `library_file`/`library_search_dir`
override.
- Ship 20 new tests: 14 in `tests/context_tests.rs` covering reuse,
thread safety, mismatch guard, reader/JSON/custom-options/reset
paths; 2 in `tests/end_to_end_tests.rs` (URL via context, 100-thread
shared-context stress); 2 in `tests/error_unit_tests.rs`; 2 new
doctests. Total test count: 203 -> 223.
- Add `benches/parse_overhead.rs` with criterion benchmarks: free
function, context, and a forced-uncached variant that pins
`library_file` to reproduce the v0.1.0 fresh-load path without
needing a git worktree.
- Add `examples/batch_parse.rs` mirroring the existing example style.
- Wire a new `semver` job (`cargo semver-checks -p rsmediainfo`) into
the CI pipeline and the release gate so future version bumps are
gated on API compatibility.
- Bump crate version 0.1.0 -> 0.2.0 in `Cargo.toml`, `README.md`
install snippet, and the bug-report issue template placeholder.
Update `CHANGELOG.md` with the 0.2.0 entry.
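
The lazily initialized default context can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: `SketchContext`, `load`, `LOADS`, and the `fail` flag are hypothetical stand-ins, and only the `OnceLock<RwLock<Option<Arc<..>>>>` shape and the reset function name come from the description above. A failed load returns before the slot is filled, so nothing about the failure is stored and the next call simply tries again.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical stand-in for `MediaInfoContext`.
#[derive(Debug)]
pub struct SketchContext {
    pub version: String,
}

/// Counts simulated library loads so reuse can be observed.
pub static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the expensive library load; `fail` simulates a missing library.
fn load(fail: bool) -> Result<SketchContext, String> {
    if fail {
        return Err("library not found".to_string());
    }
    LOADS.fetch_add(1, Ordering::SeqCst);
    Ok(SketchContext { version: "sketch".to_string() })
}

static DEFAULT_CONTEXT: OnceLock<RwLock<Option<Arc<SketchContext>>>> = OnceLock::new();

/// The slot itself is created once; its contents can be filled and cleared.
fn slot() -> &'static RwLock<Option<Arc<SketchContext>>> {
    DEFAULT_CONTEXT.get_or_init(|| RwLock::new(None))
}

/// Returns the cached context, loading it on first use.
pub fn default_context(fail: bool) -> Result<Arc<SketchContext>, String> {
    // Fast path: shared read lock, released at the end of this statement.
    let cached = slot().read().unwrap().clone();
    if let Some(ctx) = cached {
        return Ok(ctx);
    }
    let mut guard = slot().write().unwrap();
    // Re-check: another thread may have filled the slot in the meantime.
    if let Some(ctx) = &*guard {
        return Ok(Arc::clone(ctx));
    }
    // On error we return here, leaving the slot empty (failure not cached).
    let ctx = Arc::new(load(fail)?);
    *guard = Some(Arc::clone(&ctx));
    Ok(ctx)
}

/// Drops the cached context; the next call to `default_context` reloads.
pub fn reset_default_context() {
    *slot().write().unwrap() = None;
}
```

Contexts handed out before a reset stay valid because callers hold their own `Arc`; only the cached slot is cleared.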
# Notes
- Additive only. No existing public API signature changed; every
existing test passes unmodified.
- Observable behavior change: the library now stays mapped into the
process after the first successful parse instead of being dlclosed
at the end of every call. This matches other language wrappers and
is the whole reason the reuse path is faster. The OS reclaims the
mapping on process exit.
- Default-context error handling: load failures are not cached, so a
retry after fixing the environment will work without needing
`reset_default_context()`. This sidesteps the `std::io::Error`
not-Clone problem entirely.
- Windows benchmark numbers are within noise because the OS loader
keeps `MediaInfo.dll` mapped across `LoadLibrary`/`FreeLibrary`
pairs; on Linux, where `dlclose` is eager, the context path is
expected to be significantly faster than the uncached path.
- `cargo-semver-checks` classifies 0.1.0 -> 0.2.0 as a major change
(0.x semver) and skips all 252 lint checks, which is the expected
"no further update required" outcome.
# Why
- `MediaInfo_New` / `MediaInfo_Delete` are not thread-safe on every
libmediainfo build, so the throwaway probe handle used by
`load_library_full` could race against an in-flight parse on another
thread.
# What
- Take the global parse lock inside `load_library_full` for the
duration of the version probe so both handle creation and destruction
happen in the locked region.
- Reorder `parse_to_string` / `parse_to_string_from_url` /
`parse_reader_to_string` so the library is resolved before the parse
lock is acquired, preventing the inner probe from re-entering and
deadlocking the same lock.
- Document the new locking contract on `load_library_full`.
# Notes
- No API changes. Pure concurrency fix.
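
The lock ordering in this fix can be illustrated with a small sketch. It assumes a non-reentrant `std::sync::Mutex` as the parse lock; `PARSE_LOCK`, `SketchLibrary`, and the `*_sketch` functions are hypothetical names for illustration, not the crate's internals. The loader takes the lock only for its probe and releases it before returning, and the parse function resolves the library first and takes the lock second, so the same thread never tries to lock twice.

```rust
use std::sync::{Mutex, MutexGuard};

/// Stand-in for the process-wide parse lock (hypothetical name).
static PARSE_LOCK: Mutex<()> = Mutex::new(());

fn lock_parse() -> MutexGuard<'static, ()> {
    // Recover the guard even if another thread panicked while holding it.
    PARSE_LOCK.lock().unwrap_or_else(|e| e.into_inner())
}

/// Hypothetical result of a library load plus version probe.
pub struct SketchLibrary {
    pub version: String,
}

/// Locking contract: takes the parse lock internally for the probe.
/// Callers must NOT already hold the lock when calling this.
pub fn load_library_full_sketch() -> SketchLibrary {
    let _guard = lock_parse();
    // Probe handle creation, version query, and handle destruction would
    // all happen here, inside the locked region.
    SketchLibrary { version: "probe-ok".to_string() }
    // `_guard` is dropped on return, releasing the lock.
}

/// Resolves the library first, then takes the lock for the parse itself.
pub fn parse_to_string_sketch(input: &str) -> String {
    // Step 1: the loader acquires and releases the lock on its own.
    let lib = load_library_full_sketch();
    // Step 2: only now take the lock for the actual parse.
    let _guard = lock_parse();
    format!("{} parsed with {}", input, lib.version)
}
```

Swapping the two steps in `parse_to_string_sketch` would make the loader try to lock a mutex the current thread already holds, which for `std::sync::Mutex` either deadlocks or panics.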
# Why
Every v0.1.0 parse call paid a full `dlopen` → 10-symbol resolve → `MediaInfo_New` → version probe → configure → parse → `MediaInfo_Delete` → `dlclose` cycle. For single-file use this is invisible; for batch workloads (asset scanners, directory walks, import pipelines) the load overhead dominates and is pure waste.

This PR adds a reusable `MediaInfoContext` that loads the MediaInfo shared library once and reuses it across many parse calls. Existing free functions are transparently routed through a lazily initialized process-wide default context, so current users get the performance win without a single line of code change. The feature is purely additive: no public API signature changed and every existing test passes unmodified.

# What
## [core] `src/mediainfo.rs`
- `pub struct MediaInfoContext` (`Clone + Send + Sync`) with the full 16-method parse surface mirrored from `MediaInfo` (`parse`, `parse_media_info`, `parse_path`, `parse_from_reader`, `parse_input`, their `_with_options` variants, plus the two `parse_to_string` / `parse_reader_to_string` raw-text helpers).
- Constructors: `new()` (default search order), `with_library_file(path)`, `with_library_search_dir(dir)`.
- Accessors: `library_version() -> LibVersion`, `library_version_string() -> &str`, `library_file() -> Option<&Path>`, `can_parse() -> bool`.
- Private `struct LoadedLibrary<'a>` carrying a borrowed `&'a Arc<MediaInfoLib>` + cached version info.
- `load_library` split into `load_library_full(library_file, library_search_dir) -> Result<(Arc<MediaInfoLib>, String, LibVersion)>`.
- `parse_to_string_internal_unlocked`, `parse_to_string_from_url_unlocked`, `parse_reader_to_string_internal_unlocked`, and `parse_url_via_http` now take `&LoadedLibrary<'_>` as their first parameter instead of loading the library themselves.
- `static DEFAULT_CONTEXT: OnceLock<RwLock<Option<Arc<MediaInfoContext>>>>`, lazy-initialized on first use, resettable.
- `pub fn MediaInfo::reset_default_context()` escape hatch to drop and rebuild the cached context.
- Each entry point (`parse_to_string_internal`, `parse_to_string_from_url`, `parse_reader_to_string_internal`) now checks whether `options.library_file` / `options.library_search_dir` is set:
  - Unset: the call goes through the default context.
  - Set: the call goes through a per-call `load_library_full` (v0.1.0 path preserved verbatim).
- `ParseOptions::mediainfo_options` rustdoc gained a "Thread safety" subsection explaining how custom options interact with the process-wide parse lock.

## [errors] `src/error.rs`
- `MediaInfoError::LibraryMismatch { context: PathBuf, requested: PathBuf }` plus a `library_mismatch(context, requested)` constructor helper.
- Returned when a context parse call is given a `library_file` / `library_search_dir` override that would require a different library than the one the context already loaded. Matching `library_file` is allowed (tautology); `library_search_dir` always triggers a mismatch because a re-resolve is conceptually a different load.

## [api] `src/lib.rs`
- `MediaInfoContext` is exported from the crate root alongside the existing types.
- Added a `MediaInfoContext` feature bullet and a pointer to the new `batch_parse` example.

## [tests] 20 new tests, total count 203 → 223
- `tests/context_tests.rs` (new, 14 tests): basic reuse, equivalence with the free function path, version cache, shared-across-threads stress (16 workers × 5 parses), library mismatch (file + search-dir), matching-file tautology, reader input, raw JSON output, custom options isolation between calls, default context reuse, `reset_default_context` recovery, free function with explicit library override, and pre-built `MediaInfoInput` dispatch.
- `tests/end_to_end_tests.rs`: adds `test_context_url_parse` (URL via shared context using the existing tiny_http mock) and `test_thread_safety_context` (100-thread stress with a shared `Arc<MediaInfoContext>`). Original `test_thread_safety` is untouched.
- `tests/error_unit_tests.rs`: 2 new tests covering the `LibraryMismatch` `Display` format and variant matching.
- 2 new doctests on `MediaInfoContext` and `MediaInfoError::library_mismatch`.

## [bench] `benches/parse_overhead.rs` (new)
- `free_fn/parse_media_info_path`: exercises the default-context fast path.
- `context/parse_media_info_path`: explicit `MediaInfoContext` reuse.
- `free_fn/parse_media_info_path_uncached`: pins `library_file` in `ParseOptions` to bypass the default context and reproduce the v0.1.0 fresh-load path, giving a proper A/B baseline without needing a git worktree.
- `criterion = "0.5"` added to `[dev-dependencies]`; new `[[bench]]` entry in `Cargo.toml`.

## [examples] `examples/batch_parse.rs` (new)
- Mirrors the existing `parse_path.rs` example. Shows `MediaInfoContext::new()?` + loop, printing the loaded library version and a per-file track count.

## [docs] `README.md`, `CHANGELOG.md`
- `README.md`: new "Reusable context" bullet in the features list, examples pointer updated to mention batch parsing, install snippet bumped `"0.1.0"` → `"0.2.0"`.
- `CHANGELOG.md`: new `0.2.0` entry with Added / Changed sections.

## [ci] `.github/workflows/ci.yml`
- New `semver` job that installs `cargo-semver-checks --locked` and runs `cargo semver-checks -p rsmediainfo`. Sets `RS_MEDIAINFO_SKIP_DOWNLOAD=1` so it does not fetch `MediaInfo.dll` during the check.
- `semver` added to the `release` job's `needs:` list so a semver regression blocks a release.

## [version] `Cargo.toml`, `README.md`, `.github/ISSUE_TEMPLATE/bug_report.yml`
- `0.1.0` → `0.2.0` in `Cargo.toml`.
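
The `LibraryMismatch` rule from the errors section (a matching `library_file` is accepted, any `library_search_dir` is rejected) can be sketched as a small guard. This is an illustration only: `SketchError`, `Overrides`, and `check_overrides` are hypothetical names, the `Display` wording is made up, and the sketch assumes the context knows the path of the library it loaded.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the crate's error type; only the
/// mismatch variant is modeled.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    LibraryMismatch { context: PathBuf, requested: PathBuf },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::LibraryMismatch { context, requested } => write!(
                f,
                "context loaded {} but call requested {}",
                context.display(),
                requested.display()
            ),
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical per-call overrides, named after the two options.
#[derive(Default)]
pub struct Overrides {
    pub library_file: Option<PathBuf>,
    pub library_search_dir: Option<PathBuf>,
}

/// Returns `Ok(())` when the overrides are compatible with the library
/// the context already loaded.
pub fn check_overrides(loaded: &Path, o: &Overrides) -> Result<(), SketchError> {
    // A search dir implies re-resolving the library, so it is always rejected.
    if let Some(dir) = &o.library_search_dir {
        return Err(SketchError::LibraryMismatch {
            context: loaded.to_path_buf(),
            requested: dir.clone(),
        });
    }
    match &o.library_file {
        // A different file would require a different library.
        Some(file) if file.as_path() != loaded => Err(SketchError::LibraryMismatch {
            context: loaded.to_path_buf(),
            requested: file.clone(),
        }),
        // No override, or the same file (tautology): fine.
        _ => Ok(()),
    }
}
```

Comparing paths verbatim keeps the guard cheap; whether the real check canonicalizes paths first is not stated in this PR.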