
Repeated token generation for GPU-based Gemma models on device #5414

Closed
@adityak6798

Description


Have I written custom code (as opposed to using a stock example script provided in MediaPipe)?

No

OS Platform and Distribution

Windows 11, Android 14

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

LLMInference

Programming Language and version (e.g. C++, Python, Java)

Kotlin/Java

Describe the actual behavior

The GPU model generates garbage output (the same token repeated until the max-token limit).

Describe the expected behavior

A meaningful response from the model

Standalone code/steps you may have used to try to get what you need

I followed the LLM Inference guide to get the code working for LLM Inference with Gemma 1.1 2B (gemma-1.1-2b-it-gpu-int4.bin).
When using the CPU variant of the model and prompting it to output only a single integer value, I got the expected result (e.g. 3).
When I used the GPU variant, I got garbage responses (nothing else changed in the codebase, just the model file).

Other info / Complete Logs

Bad output examples:
1. <bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos> (repeated until max tokens)
(Expected output: "3", achieved with the CPU variant of the same model)
2. KíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKích disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre (repeated until max tokens)
(Expected output: "2", achieved with the CPU variant of the same model)

Labels

os:windows (MediaPipe issues on Windows)
platform:android (Issues with Android as Platform)
task:LLM inference (Issues related to MediaPipe LLM Inference Gen AI setup)
type:support (General questions)
