
[Web] Separate parallel shard download and iterative shard loading #16650

Merged (8 commits, Mar 14, 2024)

Conversation

@DiegoCao (Contributor) commented Feb 27, 2024

This PR addresses the issue in mlc-ai/web-llm#313. We make the following changes:

  • Separate downloading shards from loading them into the NDArray cache, where the former is done with parallel downloads and the latter is purely sequential (see the sketch below)
  • Limit the maximum number of concurrent downloads to 4 by launching 4 parallel loops
  • Add a try-catch when loading shards into the NDArray cache

Separately, we add and export an initial IndexedDB implementation.
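
As an illustration of the approach, here is a minimal sketch; it is not the actual tvmjs implementation, and `loadShardToCache` is a hypothetical stand-in for the real NDArray-cache loading call:

```typescript
// Hypothetical stand-in for the real NDArray-cache loading call.
declare function loadShardToCache(buffer: ArrayBuffer): void;

async function downloadAndLoadShards(shardUrls: string[]): Promise<void> {
  const downloaded: ArrayBuffer[] = new Array(shardUrls.length);

  // Phase 1: parallel download, capped at 4 concurrent requests.
  // Each worker walks the shard list with stride 4, so at most
  // 4 requests are in flight at any time.
  const numWorkers = 4;
  const workers: Promise<void>[] = [];
  for (let w = 0; w < numWorkers; ++w) {
    workers.push(
      (async () => {
        for (let i = w; i < shardUrls.length; i += numWorkers) {
          const resp = await fetch(shardUrls[i]);
          downloaded[i] = await resp.arrayBuffer();
        }
      })()
    );
  }
  await Promise.all(workers);

  // Phase 2: purely sequential loading into the NDArray cache,
  // wrapped in try-catch so a failing shard can be reported.
  for (let i = 0; i < downloaded.length; ++i) {
    try {
      loadShardToCache(downloaded[i]);
    } catch (err) {
      throw new Error(`Failed to load shard ${i} into the NDArray cache: ${err}`);
    }
  }
}
```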

@DavidGOrtega (Contributor)

@DiegoCao It's not the real solution.
As I stated, the cache can fail even when shards are fetched one by one; I have run into that as well.

The HF CDN is definitely not great, but it should allow parallel requests without major problems.

Real Solution:

  1. Implement a retry mechanism (e.g. 3 retries)
  2. Parallelise n files at a time (5 by default) instead of all of them (a sketch follows below)
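
For illustration, a minimal sketch of this proposal, assuming the defaults suggested above (3 retries, batches of 5); the helper names are hypothetical, not part of tvmjs:

```typescript
// Retry a download up to `retries` extra times (1 initial attempt + 3 retries).
async function fetchWithRetry(url: string, retries = 3): Promise<ArrayBuffer> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; ++attempt) {
    try {
      const resp = await fetch(url);
      if (!resp.ok) throw new Error(`HTTP ${resp.status} for ${url}`);
      return await resp.arrayBuffer();
    } catch (err) {
      lastErr = err; // remember the error and retry
    }
  }
  throw lastErr;
}

// Download at most `batchSize` files in parallel, batch by batch,
// instead of launching all requests at once.
async function downloadInBatches(
  urls: string[],
  batchSize = 5
): Promise<ArrayBuffer[]> {
  const results: ArrayBuffer[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map((u) => fetchWithRetry(u)))));
  }
  return results;
}
```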

@DavidGOrtega (Contributor)

I can give it a shot tomorrow

@DiegoCao DiegoCao marked this pull request as draft March 1, 2024 17:48
@DiegoCao DiegoCao marked this pull request as ready for review March 1, 2024 17:48
@CharlieFRuan (Contributor) commented Mar 2, 2024

@DavidGOrtega Thanks for offering to help! We found that the issue was probably not due to downloading the shards in parallel but to processing them in parallel. We managed to keep the parallel downloads. If download issues persist, we'll add batched parallel downloads as you suggested.

CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 2, 2024
The new version includes 2 changes:
- Include cache deletion API via #314
- Fix model download/caching issue on the TVMjs side via apache/tvm#16650
@tqchen (Member) commented Mar 12, 2024

need rebase

CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 12, 2024
Another minor follow-up to version 0.2.24 (hence for 0.2.25). This PR adds a `try-catch` when loading the **_already-downloaded_** weights, attempting to provide more information for the `exit(1)` error in #322.

The only change is TVMjs's commit apache/tvm@b193cbb from apache/tvm#16650.
@DiegoCao DiegoCao changed the title [Web] Revert back to the non-parallel version to avoid cache.add() error [Web] Separate parallel shard download and iterative shard loading Mar 12, 2024
@tqchen tqchen merged commit 939b8b9 into apache:main Mar 14, 2024
15 checks passed
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 14, 2024
Changes in WebLLM:
- Stateful chat completion: #330
- OpenAI's `logit_bias`: #331
- OpenAI's `logprobs` and `top_logprobs`: #333

Changes in TVMjs:
- apache/tvm#16650
  - Fix param download issues (already reflected in 0.2.26, but at the time this PR was not merged yet)
  - Expose `sampleTopPFromProb` to support `logprobs` (new in 0.2.27)
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024
[Web] Separate parallel shard download and iterative shard loading (apache#16650)

* Fix parallel download issue by separating the downloading from the serialization process

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>

* Fix callback display

* [Web] Support IndexedDB caching

* Limit max concurrent downloads to 4 shards

* Try to catch errors when loading the model into the NDArray cache


---------

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>