
Repeated token generation for gpu-based gemma models on device #5414

Open
adityak6798 opened this issue May 16, 2024 · 4 comments
Assignees: kuaashish
Labels: os:windows, platform:android, stat:awaiting response, task:LLM inference, type:support

Comments

@adityak6798

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Windows 11, Android 14

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

LLMInference

Programming Language and version (e.g. C++, Python, Java)

Kotlin/Java

Describe the actual behavior

GPU Model generates garbage output (same token repeated indefinitely)

Describe the expected behaviour

A meaningful response from the model

Standalone code/steps you may have used to try to get what you need

I used the LLM Inference guide to get the code working for LLM inference with Gemma 1.1 2B (gemma-1.1-2b-it-gpu-int4.bin).
Using the CPU variant of the model and prompting it to output only one integer value, I got what I expected (e.g. 3).
When I used the GPU variant, I got garbage responses (nothing else changed in the codebase, just the model).
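
For context, a minimal sketch of the setup path described above, assuming the standard LLM Inference Android API; the prompt, option values, and on-device path are illustrative, and the only change between the CPU and GPU runs is the model file:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

fun runGemma(context: Context): String {
    // Assumed on-device location of the downloaded model; swapping this path between
    // the CPU and GPU .bin files is the only difference between the two runs.
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-1.1-2b-it-gpu-int4.bin")
        .setMaxTokens(256) // illustrative value
        .build()

    val llmInference = LlmInference.createFromOptions(context, options)

    // With the CPU model this returns a short answer such as "3"; with the GPU model
    // the observed output repeats a single token until the max token limit is hit.
    return llmInference.generateResponse("How many sides does a triangle have? Answer with one integer.")
}
```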

Other info / Complete Logs

Bad output examples:
1. <bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos><bos> (repeated until max tokens)
(Expected output: "3", achieved when using the CPU variant of the same model)
2. KíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKíchKích disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre disagre (repeated until max tokens)
(Expected output: "2", achieved when using the CPU variant of the same model)
@adityak6798 (Author)

I tried downloading the same Gemma model (non-quantized) from Hugging Face using the scripts given in the "Model Conversion Colab" (model: Gemma 2B for GPU). I see the same kind of error: garbage output.
Output sample: "VentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajasVentajas"

@kuaashish kuaashish self-assigned this May 20, 2024
@kuaashish kuaashish added the os:windows, task:LLM inference, platform:android, and type:support labels May 20, 2024
@kuaashish (Collaborator) commented May 21, 2024

Hi @adityak6798,

Could you please provide the complete steps you are following from our documentation? Additionally, please specify whether you are using an emulator or a real device, along with the device's complete configuration and the MediaPipe version in use. This will help us understand the issue better and reproduce it accurately.

Thank you!!

@kuaashish kuaashish added the stat:awaiting response label May 21, 2024
@adityak6798 (Author)

Using a real device: Samsung Galaxy S23 Ultra

Steps followed for Kaggle model: (from here)

  1. Added dependency (com.google.mediapipe:tasks-genai:0.10.14; see the Gradle sketch after this list)
  2. Downloaded Gemma model from Kaggle - gemma-2b-it-gpu-int4
  3. Created the task (LLMInference object) with the right arguments
  4. Ran inference as specified.
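
Roughly, step 1 corresponds to the Gradle sketch below (Kotlin DSL form, using the version quoted in this thread):

```kotlin
// build.gradle.kts (app module), step 1: add the MediaPipe GenAI tasks dependency
dependencies {
    implementation("com.google.mediapipe:tasks-genai:0.10.14")
}
```

Step 3 then mirrors the LlmInference.createFromOptions(...) sketch in the issue description above, with setModelPath(...) pointing at the downloaded gemma-2b-it-gpu-int4 file.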

Steps followed for non-Kaggle model:

  1. Added dependency as earlier
  2. Downloaded Gemma-2b using the model conversion colab (GPU Variant)
  3. Created the task as earlier (model path updated as needed)
  4. Ran inference as earlier

Device config: Android 14
MediaPipe version: com.google.mediapipe:tasks-genai:0.10.14

@google-ml-butler google-ml-butler bot removed the stat:awaiting response label May 28, 2024
@kuaashish (Collaborator)

Hi @adityak6798,

Could you please try the example available in this repository and let us know if you continue to experience the same behavior.

Thank you!!

@kuaashish kuaashish added the stat:awaiting response label May 30, 2024