Description
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
Windows 11, Android 14
MediaPipe Tasks SDK version
No response
Task name (e.g. Image classification, Gesture recognition etc.)
LLMInference
Programming Language and version (e.g. C++, Python, Java)
Kotlin/Java
Describe the actual behavior
The GPU variant of the model generates garbage output (the same token repeated until the max-token limit is reached)
Describe the expected behaviour
A meaningful response from the model
Standalone code/steps you may have used to try to get what you need
I followed the LLM Inference guide to get LlmInference working with Gemma 1.1 2B (gemma-1.1-2b-it-gpu-int4.bin).
With the CPU variant of the model, prompting it to output only a single integer value gave me what I expected (e.g. "3").
When I use the GPU variant, I get garbage responses. Nothing else changed in the codebase, only the model file. A minimal sketch of the setup is shown below.
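For reference, a minimal sketch of the setup, following the LLM Inference guide (`LlmInference.LlmInferenceOptions`, `createFromOptions`, `generateResponse`). The model path, `maxTokens` value, and prompt text are placeholders, not the exact values from my app:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal repro sketch based on the LLM Inference guide.
// Only the model file name differs between the CPU and GPU runs.
fun runInference(context: Context): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        // CPU variant (works as expected): gemma-1.1-2b-it-cpu-int4.bin
        // GPU variant (garbage output):    gemma-1.1-2b-it-gpu-int4.bin
        .setModelPath("/data/local/tmp/llm/gemma-1.1-2b-it-gpu-int4.bin")
        .setMaxTokens(512)
        .build()

    val llmInference = LlmInference.createFromOptions(context, options)

    // Placeholder prompt asking for a single integer. The CPU model
    // returns e.g. "3"; the GPU model repeats one token until maxTokens.
    return llmInference.generateResponse(
        "Answer with a single integer: how many sides does a triangle have?"
    )
}
```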
Other info / Complete Logs
Bad output examples:
1. <bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos> (repeated until max tokens)
(Expected output: "3", achieved when using the CPU variant of the same model)
2. KíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKích disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre (repeated until max tokens)
(Expected output: "2", achieved when using the CPU variant of the same model)