[Web] Separate parallel shard download and iterative shard loading #16650
Conversation
@DiegoCao It's not the real solution. The HF CDN is definitely not great, but it should allow parallel requests without major problems. Real solution:
I can give it a shot tomorrow
@DavidGOrtega Thanks for offering to help! We found that the issue was likely not due to downloading the shards in parallel but due to processing the shards in parallel. We managed to keep the parallel downloads. If download issues persist, we'll add batched parallel download as you suggested.
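The separation described above (parallel download, sequential processing) can be sketched as follows. This is a minimal illustration, assuming hypothetical `download` and `process` callbacks rather than the actual TVMjs functions:

```typescript
// Minimal sketch: start all shard downloads in parallel, but process
// (deserialize) the results one at a time. `download` and `process`
// are hypothetical placeholders, not the actual TVMjs API.
async function downloadThenProcess<T, R>(
  urls: string[],
  download: (url: string) => Promise<T>,
  process: (buf: T) => R,
): Promise<R[]> {
  // All downloads are kicked off at once and run concurrently.
  const buffers = await Promise.all(urls.map(download));
  // Processing stays strictly sequential, avoiding the contention
  // that parallel processing of shards can cause.
  return buffers.map(process);
}
```

The key point is that the `await Promise.all(...)` barrier sits between the two phases: no shard is processed until the downloads have been issued, and processing itself never overlaps.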
The new version includes two changes:
- Include cache deletion API via #314
- Fix model download/caching issue on the TVMjs side via apache/tvm#16650
needs rebase
Another minor follow-up to version 0.2.24 (and hence to 0.2.25). This PR adds a `try-catch` when loading the **_already-downloaded_** weights, attempting to provide more information for the `exit(1)` error in #322. The only change is TVMJS's commit apache/tvm@b193cbb from apache/tvm#16650
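A hedged sketch of what such a `try-catch` around shard loading can look like; `loadShard` and the loop structure are illustrative placeholders, not TVMJS's actual code:

```typescript
// Sketch: wrap the load of each already-downloaded shard in try-catch so
// the failing shard is identified before the process exits, instead of a
// bare exit(1). `loadShard` is a hypothetical callback, not the TVMjs API.
function loadAllShards(
  shards: ArrayBuffer[],
  loadShard: (buf: ArrayBuffer, index: number) => void,
): void {
  for (let i = 0; i < shards.length; i++) {
    try {
      loadShard(shards[i], i);
    } catch (err) {
      // Re-throw with the shard index attached for easier diagnosis.
      throw new Error(`Error loading shard ${i}: ${err}`);
    }
  }
}
```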
…ization process Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>
This reverts commit 74dcddd.
Changes in WebLLM:
- Stateful chat completion: #330
- OpenAI's `logit_bias`: #331
- OpenAI's `logprobs` and `top_logprobs`: #333

Changes in TVMjs (apache/tvm#16650):
- Fix param download issues (already reflected in 0.2.26, but at the time this PR was not merged yet)
- Expose `sampleTopPFromProb` to support `logprobs` (new in 0.2.27)
…pache#16650)

* Fix parallel download issue by separating the downloading from the serialization process

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>

* Fix callback display
* [Web] Support IndexedDB caching
* Limit max concurrent downloads to 4 shards
* Try to catch error when loading model to ndarray cache

---------

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>
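The "limit max concurrent downloads to 4 shards" item can be sketched with a small worker pool; this is one common way to cap concurrency in JavaScript, not necessarily how TVMjs implements it, and `download` is a hypothetical callback:

```typescript
// Sketch: download shards with at most `limit` requests in flight.
// Each worker repeatedly claims the next unclaimed URL; since JS is
// single-threaded, the `next++` claim cannot race.
async function downloadLimited<T>(
  urls: string[],
  download: (url: string) => Promise<T>,
  limit = 4,
): Promise<T[]> {
  const results: T[] = new Array(urls.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++; // claim the next URL for this worker
      results[i] = await download(urls[i]);
    }
  }
  // Start at most `limit` workers; they drain the URL list together.
  await Promise.all(
    Array.from({ length: Math.min(limit, urls.length) }, worker),
  );
  return results;
}
```

Results land at their original indices, so shard order is preserved even though completion order varies.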
This PR addresses the issue in mlc-ai/web-llm#313. We make the following changes:
- Add a `try-catch` when loading shards onto the ndarray cache.
- Separately, add and export an initial IndexedDB implementation.
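As a rough illustration of the caching behavior the IndexedDB addition enables (check the cache before fetching, store on miss, allow deletion): an in-memory `Map` stands in here for the IndexedDB store so the sketch stays self-contained, and all names are hypothetical, not the exported TVMjs API:

```typescript
// Illustrative shard cache. In the real feature the backing store is
// IndexedDB; a Map stands in here so the sketch runs anywhere.
// Names are hypothetical, not the TVMjs API.
class ShardCache {
  private store = new Map<string, ArrayBuffer>();

  async fetchWithCache(
    url: string,
    fetcher: (url: string) => Promise<ArrayBuffer>,
  ): Promise<ArrayBuffer> {
    const hit = this.store.get(url);
    if (hit !== undefined) return hit; // cache hit: skip the network
    const buf = await fetcher(url);
    this.store.set(url, buf); // cache miss: fetch and remember
    return buf;
  }

  // cf. the cache deletion API mentioned for #314
  delete(url: string): boolean {
    return this.store.delete(url);
  }
}
```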