[Web][COS] Persist URL→hash mapping across page loads by tomayac · Pull Request #19569 · apache/tvm

tomayac · 2026-05-15T19:26:09Z

The CrossOriginStorage class was storing the URL→hash map only in the module-level GLOBAL_HASH_CACHE. After a page reload that cache is empty, and getFileHash() can only recover hashes for HuggingFace LFS files (URLs containing /resolve/). This left several resource categories uncacheable across sessions:

- JSON files not stored in LFS (mlc-chat-config.json, tokenizer.json, tensor-cache.json): getFileHash returns null for their /resolve/ URLs because the raw pointer is the actual file content, not an LFS pointer.
- .wasm files from GitHub raw URLs: no /resolve/ pattern at all.
- Any file whose hash was computed from blob content via getBlobHash.
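
For the last category, a blob-derived hash can be computed with the Web Crypto API. The following is a minimal sketch, assuming a hex-encoded SHA-256 digest; `sha256HexOfBlob` is a hypothetical stand-in and not necessarily how getBlobHash is implemented in this PR.

```typescript
// Hypothetical stand-in for getBlobHash: returns the hex-encoded SHA-256
// digest of a Blob's bytes. The hex encoding is an assumption of this sketch.
async function sha256HexOfBlob(blob: Blob): Promise<string> {
  // Read the whole blob into memory, then hash it with Web Crypto.
  const bytes = await blob.arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  // Convert each byte of the digest to two lowercase hex characters.
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Because this hash depends only on the downloaded bytes, it cannot be recovered from the URL after a reload unless the URL→hash pair is stored somewhere persistent.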

Additionally, even for genuine LFS model shards, each page load was re-fetching every shard's LFS pointer file over the network just to re-derive the SHA-256 hash.
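
To illustrate why only genuine LFS files yield a hash this way, here is a hedged sketch of extracting the SHA-256 from Git LFS pointer text. `parseLfsPointerHash` is a hypothetical helper, and the sketch assumes the pointer text has already been fetched; it does not reproduce the actual getFileHash() logic.

```typescript
// Hypothetical helper: extracts the SHA-256 from Git LFS pointer text,
// or returns null when the text is not an LFS pointer (e.g. a plain JSON file).
function parseLfsPointerHash(text: string): string | null {
  // A Git LFS pointer is a small text file whose first line names the spec version.
  if (!text.startsWith("version https://git-lfs.github.com/spec/v1")) {
    return null;
  }
  // The object ID line has the form "oid sha256:<64 hex characters>".
  const match = text.match(/^oid sha256:([0-9a-f]{64})$/m);
  return match ? match[1] : null;
}

// Example inputs: a synthetic pointer and a non-LFS JSON file.
const examplePointer = [
  "version https://git-lfs.github.com/spec/v1",
  "oid sha256:" + "ab".repeat(32),
  "size 12345",
  "",
].join("\n");
console.log(parseLfsPointerHash(examplePointer));
console.log(parseLfsPointerHash('{"model_type": "llama"}'));
```

The second call returns null, which mirrors the situation described above for JSON files that are served as their real content rather than as a pointer.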

Fix: persist the URL→hash mapping to a dedicated Cache API store (tvmjs-cos-hash-meta). Two write sites:

1. put(): after a file is stored in COS, persist its blob-derived hash. This covers all non-LFS files and non-HuggingFace URLs.
2. resolveHashDescriptor(): after getFileHash() resolves a hash from the LFS pointer, persist it immediately. This eliminates repeated pointer-file network requests for model shards on subsequent visits.

Both write sites use a best-effort try/catch so storage quota errors are silently ignored. loadPersistedHashEntry() similarly swallows errors. The typeof caches === "undefined" guard keeps the code safe in Node.js test environments.
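
The persistence described above could look roughly like the sketch below. The function names and the store name come from this PR, but the bodies are assumptions: in particular, storing the hash as the plain-text body of a `Response` keyed by the resource URL is a guess at one workable layout, not a copy of the merged code.

```typescript
// Store name taken from the PR description.
const HASH_META_CACHE = "tvmjs-cos-hash-meta";

// Best-effort write of a URL→hash pair to the Cache API.
async function persistHashEntry(url: string, hash: string): Promise<void> {
  // Guard for environments without the Cache API, such as Node.js tests.
  if (typeof caches === "undefined") return;
  try {
    const cache = await caches.open(HASH_META_CACHE);
    // Assumption of this sketch: the hash is stored as a plain-text body.
    await cache.put(url, new Response(hash));
  } catch {
    // Quota and other storage errors are ignored; the hash can be re-derived later.
  }
}

// Best-effort read; returns null when nothing is stored or on any error.
async function loadPersistedHashEntry(url: string): Promise<string | null> {
  if (typeof caches === "undefined") return null;
  try {
    const cache = await caches.open(HASH_META_CACHE);
    const response = await cache.match(url);
    return response ? await response.text() : null;
  } catch {
    return null;
  }
}
```

A caller would typically check the in-memory map first, then `loadPersistedHashEntry(url)`, and only fall back to the network or to hashing the blob when both miss.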

gemini-code-assist

Code Review

This pull request implements persistent caching for hash descriptors in CrossOriginStorage using the browser's Cache API. It introduces persistHashEntry and loadPersistedHashEntry methods to store and retrieve hashes, reducing redundant network requests for non-LFS files and specific URLs on subsequent visits. I have no feedback to provide as there were no review comments.

tomayac · 2026-05-15T19:28:32Z

Most likely @CharlieFRuan, @guan404ming, and @akaashrp would be good reviewers for this refinement.

akaashrp · 2026-05-16T22:58:50Z

Thanks for the ping @tomayac. Will take a look soon.

akaashrp · 2026-05-29T06:51:16Z

@tomayac Apologies for the delay. This looks good to me. Could you rebase so that CI checks pass?

tomayac · 2026-05-29T08:26:04Z

> @tomayac Apologies for the delay. This looks good to me. Could you rebase so that CI checks pass?

Thanks so much! Can you try again now?

guan404ming

Thanks!

akaashrp · 2026-05-31T23:27:49Z

@tomayac you might need to commit again to retrigger CI, not sure what's up with it

tomayac · 2026-06-01T08:51:02Z

> @tomayac you might need to commit again to retrigger CI, not sure what's up with it

Made a commit, and CI seems to be running now.

tomayac · 2026-06-01T12:34:28Z

⏳ Oh, please wait before merging. I want to test one more thing before.

tomayac · 2026-06-01T14:07:07Z

> ⏳ Oh, please wait before merging. I want to test one more thing before.

OK, we should be good. All resources that we now additionally cache here use some sort of versioning string in the path of their URLs (e.g., https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/web-llm-models/v0_2_83/base/Llama-3.2-3B-Instruct-q4f32_1_cs1k-webgpu.wasm), so we can indeed use the URL as the cache key (await this.persistHashEntry(url, hash)) without getting stuck on an outdated version should the resource change. Ready to merge.

gemini-code-assist Bot reviewed May 15, 2026

akaashrp approved these changes May 29, 2026

Merge branch 'apache:main' into cos-refinement

469a88c

guan404ming approved these changes May 29, 2026

tomayac commented Jun 1, 2026

Comment thread web/src/artifact_cache.ts Outdated

Apply suggestion from @tomayac

a4eace8

akaashrp merged commit 8039963 into apache:main Jun 1, 2026
9 checks passed

akaashrp merged 3 commits into apache:main from tomayac:cos-refinement
