System role not supported Gemma 2 #1580

Open
2 tasks
rzafiamy opened this issue Jul 7, 2024 · 2 comments

Comments

rzafiamy commented Jul 7, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I tried a Gemma 2 GGUF from Hugging Face. It works fine with the llama.cpp CLI on my PC.
However, I ran into issues when running it through the llama-cpp-python package, as follows:

from llama_cpp import Llama

self.llama = Llama(model_path=model_path,
                   n_gpu_layers=n_gpu_layers,
                   n_ctx=n_ctx,
                   echo=echo,
                   n_batch=n_batch,
                   flash_attn=True)

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant who perfectly replies to user requests on the topic of the Mysterious Dragon"
    },
    {
        "role": "user",
        "content": "Write a beautiful story for kids"
    }
]

output = self.llama.create_chat_completion(messages,
                                           temperature=temperature,
                                           max_tokens=max_tokens,
                                           top_p=1,
                                           frequency_penalty=1.5,
                                           presence_penalty=1.5)

Current Behavior

I got the following error:
......................python3.11/site-packages/llama_cpp/llama_chat_format.py", line 213, in raise_exception
raise ValueError(message)
ValueError: System role not supported
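
For context: when no chat_format is given, llama-cpp-python renders the Jinja chat template embedded in the GGUF, and Gemma 2's template calls raise_exception for any system message, which appears to be exactly the error above. A minimal sketch to confirm this (assuming a llama-cpp-python build that exposes Llama.metadata, and reusing model_path from the snippet above; n_ctx is arbitrary here):

from llama_cpp import Llama

llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)

# "tokenizer.chat_template" is the standard GGUF metadata key for the embedded
# Jinja template; for Gemma 2 it contains a branch that raises
# "System role not supported" when a system message is present.
print(llm.metadata.get("tokenizer.chat_template"))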

Environment and Context

I use Python 3.11 on Ubuntu Linux with CUDA enabled, > 24 GB of GPU RAM.

rzafiamy commented Jul 7, 2024

Solved it by adding chat_format:

self.llama = Llama(model_path=model_path,
                   n_gpu_layers=n_gpu_layers,
                   n_ctx=n_ctx,
                   echo=echo,
                   n_batch=n_batch,
                   flash_attn=True,
                   chat_format="gemma")

rzafiamy closed this as completed on Jul 7, 2024
rzafiamy reopened this on Jul 8, 2024

rzafiamy commented Jul 8, 2024

I think this should be reworked after all. Stop tokens are not handled well, and the system role does not work at all.
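
In the meantime, a workaround sketch (not an official fix: folding the system prompt into the first user turn and the explicit "<end_of_turn>" stop sequence are my own assumptions, based on Gemma's template having no system turn; model_path, n_gpu_layers and n_ctx are as above):

from llama_cpp import Llama

llm = Llama(model_path=model_path,
            n_gpu_layers=n_gpu_layers,
            n_ctx=n_ctx,
            chat_format="gemma")

system_prompt = "You are a helpful assistant."
user_prompt = "Write a beautiful story for kids."

output = llm.create_chat_completion(
    # Gemma's chat template has no system turn, so prepend the system text
    # to the first user message instead of using a "system" role.
    messages=[{"role": "user", "content": system_prompt + "\n\n" + user_prompt}],
    max_tokens=512,
    # "<end_of_turn>" is Gemma's turn terminator; passing it as an explicit
    # stop sequence keeps generation from running past the end of the reply.
    stop=["<end_of_turn>"],
)
print(output["choices"][0]["message"]["content"])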
