explicit cast messages to string for RAG purposes #1080
base: main
Conversation
Hey @nivibilla, can you provide a log of the messages being sent that are causing this issue? The type hints there should be correct, so any need to cast to str is likely another bug (probably in my code, but I'm curious where it originates).
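For reference, the OpenAI-style chat API allows a message's content to be either a plain string or a list of content parts, so a list reaching the formatter is valid input on the wire. A rough sketch of the two shapes, using illustrative type names rather than llama-cpp-python's actual hints:

from typing import List, TypedDict, Union

# Illustrative shapes only -- these are not llama-cpp-python's actual type names.
class TextPart(TypedDict):
    type: str  # "text"
    text: str

class Message(TypedDict):
    role: str
    # OpenAI-style requests allow either a plain string or a list of parts here.
    content: Union[str, List[TextPart]]

# Both are valid requests, but only the first can be concatenated straight
# into a prompt string by a chat template:
msg_str: Message = {"role": "user", "content": "What does the image say?"}
msg_parts: Message = {"role": "user", "content": [{"type": "text", "text": "What does the image say?"}]}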
In my case:
I have the same problem. I started the server with the following command:

python3 -m llama_cpp.server --model /app/vlm_weights/MiniCPM-V-2_6-gguf/ggml-model-Q2_K.gguf --n_gpu_layers -1

Then I used the example client code:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1/", api_key="llama.cpp")
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png",
                    },
                },
                {
                    "type": "text",
                    "text": "What does the image say. Format your response as a json object with a single 'text' key.",
                },
            ],
        }
    ],
    response_format={
        "type": "json_object",
        "schema": {"type": "object", "properties": {"text": {"type": "string"}}},
    },
)

import json
print(json.loads(response.choices[0].message.content))

In my server terminal I get:

INFO: Started server process [526705]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
Exception: can only concatenate str (not "list") to str
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
response = await original_route_handler(request)
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/server/app.py", line 513, in create_chat_completion
] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/starlette/concurrency.py", line 39, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 1898, in create_chat_completion
return handler(
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama_chat_format.py", line 564, in chat_completion_handler
result = chat_formatter(
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama_chat_format.py", line 229, in __call__
prompt = self._environment.render(
File "/usr/local/lib/python3.10/dist-packages/jinja2/environment.py", line 1304, in render
self.environment.handle_exception()
File "/usr/local/lib/python3.10/dist-packages/jinja2/environment.py", line 939, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 4, in top-level template code
TypeError: can only concatenate str (not "list") to str
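The traceback bottoms out in plain string concatenation inside the rendered chat template. A minimal, server-independent reproduction of the same failure (the exact template line is an assumption, but the operation is the same):

# The chat template builds the prompt by concatenating role and content as
# strings, which breaks when content is a list of parts instead of a string.
message = {
    "role": "user",
    "content": [{"type": "text", "text": "What does the image say?"}],
}

try:
    prompt = message["role"] + ": " + message["content"]
except TypeError as e:
    print(e)  # can only concatenate str (not "list") to str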
Setting the chat_format explicitly when creating the Llama instance:

llm = Llama(
    # ...
    chat_format="chatml",
    # ...
)
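If the server is launched from the command line rather than through the Llama constructor, the same setting should be reachable via the server options; a minimal sketch, assuming the --chat_format flag of the installed version:

from llama_cpp import Llama

# Pick the chat handler explicitly instead of relying on the template shipped
# in the GGUF metadata:
llm = Llama(
    model_path="/app/vlm_weights/MiniCPM-V-2_6-gguf/ggml-model-Q2_K.gguf",
    n_gpu_layers=-1,
    chat_format="chatml",
)

# Equivalent when launching the OpenAI-compatible server (flag name assumed):
#   python3 -m llama_cpp.server --model <model path> --n_gpu_layers -1 --chat_format chatml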
Hi,
I've been testing the OpenAI-compatible server with Ollama-webui, and when using the RAG pipeline the message content comes through as a list rather than a string, so I get this error from the llama-cpp-python server.
Simply force-casting the messages to string before applying the chat format fixes the issue. If this is not the right way, please let me know what I can do to solve it.
Thanks
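For context, here is a minimal sketch of the kind of coercion the description refers to. It is not the actual diff in this PR, the helper name is hypothetical, and it flattens text parts rather than calling str() on the whole message:

def coerce_content_to_str(content):
    """Hypothetical helper: flatten OpenAI-style list content into a plain
    string so that string-based chat templates can concatenate it."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # List of content parts, e.g. [{"type": "text", "text": "..."}, ...]
    parts = []
    for part in content:
        if isinstance(part, dict) and part.get("type") == "text":
            parts.append(part.get("text", ""))
        else:
            parts.append(str(part))
    return "\n".join(parts)

# Applied to each message before the chat format is rendered:
messages = [{"role": "user", "content": [{"type": "text", "text": "hello"}]}]
messages = [{**m, "content": coerce_content_to_str(m.get("content"))} for m in messages]
print(messages)  # [{'role': 'user', 'content': 'hello'}]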