Issues search results · repo:AI-Hypercomputer/jetstream-pytorch language:Python
14 results
Currently, sampling params such as temperature are set as command-line flags when the server starts.
It would be nice for each request to pass in the sampling params instead.
qihqi
- 3 comments
- Opened on Sep 24, 2024
- #185
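A minimal sketch of what per-request sampling params could look like. Everything here is illustrative: `SamplingParams`, `DecodeRequest`, and the fallback logic are hypothetical names, not the actual jetstream-pytorch API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types for illustration only -- SamplingParams and
# DecodeRequest are not the actual jetstream-pytorch API.
@dataclass
class SamplingParams:
    temperature: float = 1.0
    top_k: Optional[int] = None

@dataclass
class DecodeRequest:
    prompt: str
    sampling: Optional[SamplingParams] = None

# Server-wide defaults, standing in for today's command-line flags.
SERVER_DEFAULTS = SamplingParams(temperature=0.7)

def effective_params(req: DecodeRequest) -> SamplingParams:
    # Per-request params win; otherwise fall back to the server flags.
    return req.sampling if req.sampling is not None else SERVER_DEFAULTS
```

The design point is that requests omitting the params keep today's behavior, so the change would be backward compatible.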
qihqi
- Opened on Sep 11, 2024
- #183
Recently we added a new CLI, jpt (https://github.com/google/jetstream-pytorch/pull/178), that massively simplified the
command-line args the user needs to specify. However, there are other command-line args ...
qihqi
- Opened on Sep 10, 2024
- #182
As reported by @tengomucho
Currently there are a few issues with the prefill / generate implementation:
1. Prefill does not use self._sample to do sampling.
2. Prefill returns a token, so the first time generate ...
qihqi
- Opened on Aug 21, 2024
- #173
I'm receiving an error when attempting to run:
ray job submit -- python run_ray_serve_interleave.py --tpu_chips=4 --num_hosts=1 --size=8B --model_name=llama-3 --batch_size=8 --max_cache_length=2048 --tokenizer_path=$tokenizer_path ...
ryanaoleary
- Opened on Aug 7, 2024
- #169
Prefill_ray() now returns a [result, first_token] tuple, where first_token contains a JAX array. This will cause a crash
when attempting to fetch the Ray results remotely:
job_id:06000000
:actor_name:ServeReplica:default:JetStreamDeployment ...
richardsliu
- 1 comment
- Opened on Jul 16, 2024
- #150
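One plausible workaround sketch, assuming the crash comes from shipping a device-backed array through Ray's object store: copy the first token to a plain host NumPy array before returning it. `to_host` and `prefill_ray_result` are illustrative helpers, not functions from the repo.

```python
import numpy as np

def to_host(first_token):
    # Force a host copy so the value pickles cleanly through Ray.
    # With a real JAX array, jax.device_get would work equally well.
    return np.asarray(first_token)

def prefill_ray_result(result, first_token):
    # Return a (result, first_token) tuple that is safe to fetch remotely.
    return result, to_host(first_token)
```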
When sending multiple prompts to the server, only the first prompt returns any results. Requests after the first
one return only an empty response.
I've tried 3 different ways to bring up the ...
richardsliu
- 2 comments
- Opened on Jun 27, 2024
- #137
The checkpoint conversion script breaks for https://huggingface.co/meta-llama/Llama-2-7b, because it does not have
safetensors files. But when running the script, we set --from_hf=True since the checkpoint ...
vivianrwu
- Opened on Jun 24, 2024
- #135
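A small sketch of one way to avoid this class of failure: check whether the checkpoint directory actually contains safetensors files before choosing a load path, instead of relying on a --from_hf style flag alone. `has_safetensors` is a hypothetical helper, not part of the actual conversion script.

```python
import os

def has_safetensors(ckpt_dir: str) -> bool:
    # True only if at least one *.safetensors file is present, so the
    # caller can fall back to the PyTorch .bin/.pth load path otherwise.
    return any(f.endswith(".safetensors") for f in os.listdir(ckpt_dir))
```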
I get this error
Loading checkpoint files from /home/yeandy/llama/llama-2-13b.
Loading checkpoints takes 9.128946957000153 seconds
Starting to merge weights.
Merging weights across 2 shards (shape = torch.Size([32000, ...
yeandy
- 2 comments
- Opened on Jun 4, 2024
- #115
Right now, the ray engine returns an interleave engine and a tuple separately. In the end, we would like to return a stable
Tuple list for both of them.
FanhaiLu1
- Opened on May 29, 2024
- #107
