
[Web] Separate parallel shard download and iterative shard loading #16650

Merged (8 commits, Mar 14, 2024)

Conversation

@DiegoCao (Contributor) commented Feb 27, 2024

This PR addresses the issue in mlc-ai/web-llm#313. We make the following changes:

  • Separate downloading shards from loading them into the NDArray cache, where the former is done with parallel downloads and the latter is purely sequential (see the sketch below)
  • Limit the maximum number of concurrent downloads to 4 by launching 4 parallel loops
  • Add a try-catch when loading shards into the NDArray cache

Separately, we add and export an initial IndexedDB implementation.
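
As an illustration of the approach, here is a minimal sketch; it is not the actual tvmjs implementation, and `loadShardToCache` is a hypothetical stand-in for the real NDArray-cache loading call:

```typescript
// Hypothetical stand-in for the real NDArray-cache loading call.
declare function loadShardToCache(buffer: ArrayBuffer): void;

async function downloadAndLoadShards(shardUrls: string[]): Promise<void> {
  const downloaded: ArrayBuffer[] = new Array(shardUrls.length);

  // Phase 1: parallel download, capped at 4 concurrent requests.
  // Each worker walks the shard list with stride 4, so at most
  // 4 requests are in flight at any time.
  const numWorkers = 4;
  const workers: Promise<void>[] = [];
  for (let w = 0; w < numWorkers; ++w) {
    workers.push(
      (async () => {
        for (let i = w; i < shardUrls.length; i += numWorkers) {
          const resp = await fetch(shardUrls[i]);
          downloaded[i] = await resp.arrayBuffer();
        }
      })()
    );
  }
  await Promise.all(workers);

  // Phase 2: purely sequential loading into the NDArray cache,
  // wrapped in try-catch so a failing shard can be reported.
  for (let i = 0; i < downloaded.length; ++i) {
    try {
      loadShardToCache(downloaded[i]);
    } catch (err) {
      throw new Error(`Failed to load shard ${i} into the NDArray cache: ${err}`);
    }
  }
}
```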

@DavidGOrtega (Contributor)

@DiegoCao It's not the real solution.
As I stated, the cache can fail even when shards are fetched one by one; I have run into that as well.

The HF CDN is definitely not great, but it should allow parallel requests without major problems.

Real Solution:

  1. Implement a retry mechanism (e.g. 3 retries)
  2. Parallelise n files at a time (5 by default) instead of all of them (a sketch follows below)
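
For illustration, a minimal sketch of this proposal, assuming the defaults suggested above (3 retries, batches of 5); the helper names are hypothetical, not part of tvmjs:

```typescript
// Retry a download up to `retries` extra times (1 initial attempt + 3 retries).
async function fetchWithRetry(url: string, retries = 3): Promise<ArrayBuffer> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; ++attempt) {
    try {
      const resp = await fetch(url);
      if (!resp.ok) throw new Error(`HTTP ${resp.status} for ${url}`);
      return await resp.arrayBuffer();
    } catch (err) {
      lastErr = err; // remember the error and retry
    }
  }
  throw lastErr;
}

// Download at most `batchSize` files in parallel, batch by batch,
// instead of launching all requests at once.
async function downloadInBatches(
  urls: string[],
  batchSize = 5
): Promise<ArrayBuffer[]> {
  const results: ArrayBuffer[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map((u) => fetchWithRetry(u)))));
  }
  return results;
}
```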

@DavidGOrtega (Contributor)

I can give it a shot tomorrow

@DiegoCao DiegoCao marked this pull request as draft March 1, 2024 17:48
@DiegoCao DiegoCao marked this pull request as ready for review March 1, 2024 17:48
@CharlieFRuan (Contributor) commented Mar 2, 2024

@DavidGOrtega Thanks for offering to help! We found that the issue was probably not due to downloading the shards in parallel but to processing them in parallel. We managed to keep the parallel downloads. If download issues persist, we'll add batched parallel downloads as you suggested.

CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 2, 2024
The new version includes 2 changes:
- Include cache deletion API via #314
- Fix model download/caching issue on the TVMjs side via apache/tvm#16650
@tqchen (Member) commented Mar 12, 2024

need rebase

CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 12, 2024
Another minor follow-up to version 0.2.24 (hence for 0.2.25). This PR adds a `try-catch` when loading the **_already-downloaded_** weights, attempting to provide more information for the `exit(1)` error in #322.

The only change is TVMjs's commit apache/tvm@b193cbb from apache/tvm#16650.
@DiegoCao DiegoCao changed the title [Web] Revert back to the non-parallel version to avoid cache.add() error [Web] Separate parallel shard download and iterative shard loading Mar 12, 2024
@tqchen tqchen merged commit 939b8b9 into apache:main Mar 14, 2024
15 checks passed
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 14, 2024
Changes in WebLLM:
- Stateful chat completion: #330
- OpenAI's `logit_bias`: #331
- OpenAI's `logprobs` and `top_logprobs`: #333

Changes in TVMjs:
- apache/tvm#16650
  - Fix param download issues (already reflected in 0.2.26, but at the time this PR was not merged yet)
  - Expose `sampleTopPFromProb` to support `logprobs` (new in 0.2.27)
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024
[Web] Separate parallel shard download and iterative shard loading (apache#16650)

* Fix parallel download issue by separating the downloading from the serialization process

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>

* Fix callback display

* [Web] Support IndexedDB caching

* Limit max concurrent downloads to 4 shards

* Try to catch errors when loading the model into the NDArray cache


---------

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>