feat(core): support offline validation and HTTP cache management #165
kikkomep wants to merge 25 commits
Extend HttpRequester with `offline` and `no_cache` options, plus fetch_fresh/has_cached/clear_cache/cache_info/reset helpers. Add an OfflineCacheMissError and an _OfflineFallbackSession that yields a 504 when requests_cache is unavailable, mirroring the only-if-cached behavior. Emit standardized `CachedHttpRequester:` log lines describing each request's cache outcome.
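To illustrate the 504 fallback described above, here is a minimal stdlib-only sketch. `StubResponse` and `fetch_offline` are hypothetical names introduced for illustration, and the real `_OfflineFallbackSession` may expose a different interface; only the "answer 504 on every request while offline" behavior comes from the commit message.

```python
from dataclasses import dataclass


class OfflineCacheMissError(RuntimeError):
    """Raised when offline mode needs a resource that was never cached."""


@dataclass
class StubResponse:
    # Hypothetical stand-in for a requests.Response; illustration only.
    status_code: int
    url: str
    content: bytes = b""


class OfflineFallbackSession:
    """Sketch of a session used when no cache backend is available.

    Every request is answered with 504, the status an HTTP cache
    returns for an `only-if-cached` request it cannot satisfy.
    """

    def get(self, url: str, **kwargs) -> StubResponse:
        return StubResponse(status_code=504, url=url)


def fetch_offline(session, url: str) -> StubResponse:
    # Hypothetical helper: turn the 504 into a clear, typed error.
    response = session.get(url)
    if response.status_code == 504:
        raise OfflineCacheMissError(f"offline and not cached: {url}")
    return response
```

Mapping the 504 to a dedicated exception keeps the "never fetched" case distinguishable from ordinary server errors.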
Introduce `get_user_cache_dir()` (honoring `XDG_CACHE_HOME`, falling back to `~/.cache/rocrate-validator`) and `get_default_http_cache_path()`, plus the `USER_CACHE_DIR_NAME` / `USER_CACHE_FILE_NAME` constants, so the HTTP cache can be located under a stable, user-level directory instead of the previous `/tmp` prefix.
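The lookup order above could be sketched as follows. The constant values are assumptions: `rocrate-validator` is taken from the fallback path in the commit message, while the cache file name `http_cache` is purely hypothetical. Appending the directory name under `XDG_CACHE_HOME` is also an assumption.

```python
import os
from pathlib import Path

USER_CACHE_DIR_NAME = "rocrate-validator"  # from the fallback path in the commit message
USER_CACHE_FILE_NAME = "http_cache"        # hypothetical value, not confirmed by the PR


def get_user_cache_dir() -> Path:
    """Return the user-level cache directory.

    Uses $XDG_CACHE_HOME when it is set and non-empty,
    otherwise falls back to ~/.cache.
    """
    xdg = os.environ.get("XDG_CACHE_HOME")
    root = Path(xdg) if xdg else Path.home() / ".cache"
    return root / USER_CACHE_DIR_NAME


def get_default_http_cache_path() -> Path:
    """Return the default location of the HTTP cache file."""
    return get_user_cache_dir() / USER_CACHE_FILE_NAME
```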
Cover the offline/no-cache flags, fetch_fresh, has_cached, clear_cache, cache_info, reset and the _OfflineFallbackSession 504 behavior, plus the standardized cache-outcome log messages.
Introduce `cache_warmup` helpers that discover external artifacts declared by profile descriptors via `prof:hasResource`/`prof:hasArtifact` and prefetch them so subsequent offline runs resolve every required resource from the local HTTP cache. Add the `ROCRATE_VALIDATOR_AUTO_WARM` environment variable to toggle automatic warm-up.
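A possible shape for the `ROCRATE_VALIDATOR_AUTO_WARM` toggle is sketched below. The helper name `auto_warm_enabled`, the set of accepted "off" values, and the enabled-by-default behavior are all assumptions; the PR only states that the variable toggles automatic warm-up and that tests set it to `0`.

```python
import os

# Assumed spellings for "disabled"; the PR only shows the value "0".
_FALSE_VALUES = {"0", "false", "no", "off"}


def auto_warm_enabled(environ=os.environ) -> bool:
    """Return True when automatic cache warm-up should run."""
    raw = environ.get("ROCRATE_VALIDATOR_AUTO_WARM")
    if raw is None:
        return True  # assumed default: warm-up is on unless disabled
    return raw.strip().lower() not in _FALSE_VALUES
```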
Introduce `install_document_loader()` that patches rdflib's `source_to_json` so remote `@context` resolution goes through HttpRequester, benefiting from the HTTP cache and honoring offline mode (raising OfflineCacheMissError on offline cache misses). The install is idempotent and reversible via `uninstall_document_loader()` for tests.
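The idempotent, reversible patching pattern can be sketched without rdflib. Here `fake_jsonld_util` is a stand-in namespace for the rdflib module that owns `source_to_json`, and the `fetch` parameter is a hypothetical hook standing in for HttpRequester; the real functions may take different arguments.

```python
import types

# Stand-in for the rdflib module being patched; illustration only.
fake_jsonld_util = types.SimpleNamespace(
    source_to_json=lambda source: f"network:{source}"
)

_original_loader = None  # set while the patch is installed


def install_document_loader(fetch) -> None:
    """Route remote sources through `fetch`; calling this twice is a no-op."""
    global _original_loader
    if _original_loader is not None:
        return  # already installed
    original = fake_jsonld_util.source_to_json
    _original_loader = original

    def patched(source):
        # Only remote URLs go through the cache-aware fetcher.
        if isinstance(source, str) and source.startswith(("http://", "https://")):
            return fetch(source)
        return original(source)

    fake_jsonld_util.source_to_json = patched


def uninstall_document_loader() -> None:
    """Restore the original loader, if the patch is installed."""
    global _original_loader
    if _original_loader is None:
        return
    fake_jsonld_util.source_to_json = _original_loader
    _original_loader = None
```

Keeping the original callable in a module-level slot is what makes both the idempotency check and the uninstall possible.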
Expose `offline` and `no_cache` flags on ValidationSettings and default `cache_path` to the persistent user HTTP cache so consecutive online/offline runs share the same store. Validate that `offline` and `no_cache` are mutually exclusive. Install the JSON-LD document loader so rdflib's remote `@context` resolution goes through the cache.
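The mutual-exclusion check might look like the sketch below. `SettingsSketch` is a hypothetical, reduced stand-in for ValidationSettings, and the choice of `ValueError` is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SettingsSketch:
    # Hypothetical subset of ValidationSettings, for illustration only.
    offline: bool = False
    no_cache: bool = False
    cache_path: Optional[Path] = None

    def __post_init__(self) -> None:
        # Offline mode can only be served from the cache, so disabling
        # the cache at the same time cannot work.
        if self.offline and self.no_cache:
            raise ValueError("`offline` and `no_cache` are mutually exclusive")
```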
…e `validate` command
Introduce `rocrate-validator cache` with `info`, `reset` and `warm` subcommands to inspect, clear and pre-populate the persistent HTTP cache used by offline validation. `warm` discovers cacheable URLs from profile descriptors and can also prefetch remote RO-Crates.
…m-up and cache CLI
Redirect XDG_CACHE_HOME to a session-scoped tmp dir so tests never touch the developer's real ~/.cache, and default ROCRATE_VALIDATOR_AUTO_WARM=0 per test to prevent unintended network calls. Tests that need warm-up opt in explicitly.
Switch DEFAULT_HTTP_CACHE_MAX_AGE from 300s to -1 so cached HTTP resources (JSON-LD contexts, profile artifacts, etc.) persist indefinitely by default. The `-1` sentinel is already supported throughout the cache stack and is the value used by `cache warm`. Users can still opt into a finite TTL via `--cache-max-age`. Note: this does not affect remote RO-Crates downloaded for validation, which are always re-fetched online via `fetch_fresh` (and the cached copy overwritten) so that subsequent offline runs validate against the latest known remote state. The `max_age` setting only governs the regular cached session used for other HTTP-backed resources.
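To make the `-1` sentinel concrete, here is a small sketch of an expiry check. `is_expired` is a hypothetical helper; the actual cache stack may delegate this decision to its backend.

```python
DEFAULT_HTTP_CACHE_MAX_AGE = -1  # new default from this commit: never expire


def is_expired(age_seconds: float, max_age: int = DEFAULT_HTTP_CACHE_MAX_AGE) -> bool:
    """Return True when a cached entry is older than `max_age` seconds.

    A negative `max_age` is treated as "keep forever".
    """
    if max_age < 0:
        return False
    return age_seconds > max_age
```

With the previous 300s default, an entry fetched six minutes ago would count as stale; with `-1` it stays valid until the cache is reset.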
I've just tested this PR and it works well offline! A few comments after my experiments (none are blocking, just quality-of-life stuff):
It would be useful if either
It would be great if there was an easy way to provide a non-crate URL to cache.
Mirror the fallback used by `validate`: when -p is given a token with no exact identifier match, pick the highest-version profile sharing that token. Emit a note listing the chosen identifier and available alternatives only when more than one version matched, so the common single-version case stays quiet.
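The fallback rule above can be sketched as follows. The `available` mapping (identifier to `(token, version)`), the dotted-integer version format, and the `resolve_profile` name are assumptions made for illustration; the validator's own profile model is likely richer.

```python
from typing import Dict, List, Tuple


def _version_key(version: str) -> Tuple[int, ...]:
    # Compare versions numerically so that 1.10 sorts above 1.9.
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def resolve_profile(
    requested: str, available: Dict[str, Tuple[str, str]]
) -> Tuple[str, List[str]]:
    """Return (chosen identifier, alternatives worth mentioning in a note).

    An exact identifier match wins. Otherwise the highest-version profile
    sharing the requested token is chosen; alternatives are non-empty only
    when more than one version matched.
    """
    if requested in available:
        return requested, []
    matches = [
        identifier
        for identifier, (token, _version) in available.items()
        if token == requested
    ]
    if not matches:
        raise KeyError(f"no profile matches {requested!r}")
    matches.sort(key=lambda identifier: _version_key(available[identifier][1]), reverse=True)
    return matches[0], matches[1:]
```

Returning an empty alternatives list in the single-version case is what keeps the common path quiet.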
Changes in the latest few commits:
Also added:
@elichad let me know if it all looks good.
Summary
Introduce an offline validation mode that lets users validate RO-Crates without network access, relying on a persistent local HTTP cache for every resource the validator needs to resolve.
What changes
- Offline mode (`--offline` on the CLI) serves every HTTP-backed resource from a local cache, failing clearly when a resource has never been fetched.
- `@context` fetches performed by rdflib, which previously bypassed the cache, now go through it.
- A `rocrate-validator cache` command group lets users inspect, reset and pre-populate the cache.
User-facing changes
- `validate --offline` / `validate --no-cache` flags.
- `rocrate-validator cache info|reset|warm` subcommand.
[fix #114 #115]