
Repeated token generation for GPU-based Gemma models on device #5414

Closed
@adityak6798

Description


Have I written custom code (as opposed to using a stock example script provided in MediaPipe)?

No

OS Platform and Distribution

Windows 11, Android 14

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

LLMInference

Programming Language and version (e.g. C++, Python, Java)

Kotlin/Java

Describe the actual behavior

The GPU model generates garbage output (the same token repeated until the max-token limit).

Describe the expected behavior

A meaningful response from the model

Standalone code/steps you may have used to try to get what you need

I followed the LLM Inference guide to get the code working for LLM Inference with Gemma 1.1 2B (gemma-1.1-2b-it-gpu-int4.bin).
When using the CPU variant of the model and prompting it to output only a single integer value, I got the expected result (e.g. 3).
When I used the GPU variant, I got garbage responses (nothing else changed in the codebase, just the model file).

Other info / Complete Logs

Bad output examples:
1. <bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos> (repeated until max tokens)
(Expected output: "3", achieved with the CPU variant of the same model)
2. KíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKích disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre (repeated until max tokens)
(Expected output: "2", achieved with the CPU variant of the same model)

Labels

os:windows (MediaPipe issues on Windows)
platform:android (Issues with Android as Platform)
task:LLM inference (Issues related to MediaPipe LLM Inference Gen AI setup)
type:support (General questions)
