Issues search results · repo:AI-Hypercomputer/jetstream-pytorch language:Python
14 results
Currently, sampling params such as temperature are set as command-line flags when the server starts.
It would be nice for each request to pass in the sampling params instead.
qihqi
- 3 comments
- Opened on Sep 24, 2024
- #185
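A minimal sketch of what per-request sampling params could look like. Everything here is illustrative: `SamplingParams`, `DecodeRequest`, and the fallback logic are hypothetical names, not the actual jetstream-pytorch API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types for illustration only -- SamplingParams and
# DecodeRequest are not the actual jetstream-pytorch API.
@dataclass
class SamplingParams:
    temperature: float = 1.0
    top_k: Optional[int] = None

@dataclass
class DecodeRequest:
    prompt: str
    sampling: Optional[SamplingParams] = None

# Server-wide defaults, standing in for today's command-line flags.
SERVER_DEFAULTS = SamplingParams(temperature=0.7)

def effective_params(req: DecodeRequest) -> SamplingParams:
    # Per-request params win; otherwise fall back to the server flags.
    return req.sampling if req.sampling is not None else SERVER_DEFAULTS
```

The design point is that requests omitting the params keep today's behavior, so the change would be backward compatible.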
qihqi
- Opened on Sep 11, 2024
- #183
Recently we added a new CLI, jpt (https://github.com/google/jetstream-pytorch/pull/178), that massively simplified the
command-line args the user needs to specify. However, there are other command-line args ...
qihqi
- Opened on Sep 10, 2024
- #182
As reported by @tengomucho
Currently there are a few issues with the prefill / generate implementation:
1. Prefill does not use self._sample to do sampling.
2. Prefill returns a token, so the first time generate ...
qihqi
- Opened on Aug 21, 2024
- #173
I'm receiving an error when attempting to run:
ray job submit -- python run_ray_serve_interleave.py --tpu_chips=4 --num_hosts=1 --size=8B --model_name=llama-3 --batch_size=8 --max_cache_length=2048 --tokenizer_path=$tokenizer_path ...
ryanaoleary
- Opened on Aug 7, 2024
- #169
Prefill_ray() now returns a [result, first_token] tuple, where first_token contains a JAX array. This will cause a crash
when attempting to fetch the Ray results remotely:
job_id:06000000
:actor_name:ServeReplica:default:JetStreamDeployment ...
richardsliu
- 1 comment
- Opened on Jul 16, 2024
- #150
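One plausible workaround sketch, assuming the crash comes from shipping a device-backed array through Ray's object store: copy the first token to a plain host NumPy array before returning it. `to_host` and `prefill_ray_result` are illustrative helpers, not functions from the repo.

```python
import numpy as np

def to_host(first_token):
    # Force a host copy so the value pickles cleanly through Ray.
    # With a real JAX array, jax.device_get would work equally well.
    return np.asarray(first_token)

def prefill_ray_result(result, first_token):
    # Return a (result, first_token) tuple that is safe to fetch remotely.
    return result, to_host(first_token)
```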
When sending multiple prompts to the server, only the first prompt returns any results. Requests after the first
one return only an empty response.
I've tried 3 different ways to bring up the ...
richardsliu
- 2 comments
- Opened on Jun 27, 2024
- #137
The checkpoint conversion script breaks for https://huggingface.co/meta-llama/Llama-2-7b, because it does not have
safetensors files. But when running the script, we set --from_hf=True since the checkpoint ...
vivianrwu
- Opened on Jun 24, 2024
- #135
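A small sketch of one way to avoid this class of failure: check whether the checkpoint directory actually contains safetensors files before choosing a load path, instead of relying on a --from_hf style flag alone. `has_safetensors` is a hypothetical helper, not part of the actual conversion script.

```python
import os

def has_safetensors(ckpt_dir: str) -> bool:
    # True only if at least one *.safetensors file is present, so the
    # caller can fall back to the PyTorch .bin/.pth load path otherwise.
    return any(f.endswith(".safetensors") for f in os.listdir(ckpt_dir))
```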
I get this error
Loading checkpoint files from /home/yeandy/llama/llama-2-13b.
Loading checkpoints takes 9.128946957000153 seconds
Starting to merge weights.
Merging weights across 2 shards (shape = torch.Size([32000, ...
yeandy
- 2 comments
- Opened on Jun 4, 2024
- #115
Right now, the ray engine returns an interleave engine and a tuple separately. In the end, we would like to return a stable
Tuple list for both of them.
FanhaiLu1
- Opened on May 29, 2024
- #107
