[Bug]: Outlines json guided decoding #353
Comments
If useful for someone with the same issue, workaround: it's a privilege issue, fixed by running chmod -R 777 * inside /app/aphrodite-engine/.cache
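A minimal sketch of that workaround as applied before launching the engine. The cache path comes from the error message in this thread; the wide-open 777 mode mirrors the reporter's command, though a narrower mode scoped to the engine's user would be safer in production.

```shell
# Workaround from this thread: recreate the cache directory the engine
# expects and open its permissions before launch.
CACHE_DIR=/app/aphrodite-engine/.cache
mkdir -p "$CACHE_DIR"
chmod -R 777 "$CACHE_DIR"
```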
please 🙏, it's a simple bug but fixing it adds a lot of value; it practically kills all the previous effort on json guided decoding @AlpinDale 🙏 and in every production release we have to enter the container to fix it =(
Hi, sorry, I totally missed this issue! Can you run the Docker container in privileged mode?
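A sketch of that suggestion. Note the image name, tag, and port mapping below are placeholders for illustration, not taken from this thread; privileged mode lifts the container's permission restrictions, which is why it sidesteps the .cache creation failure.

```shell
# Hypothetical: run the engine container in privileged mode so it can
# create /app/aphrodite-engine/.cache itself. Image/tag/port are placeholders.
docker run --privileged -p 3000:3000 \
    aphrodite-engine:latest
```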
that's a good idea; I cannot run it permanently in that mode, but I can handle that for a while, thanks!
We already resolved a similar issue related to Triton - it should be fixed in the latest Docker image. Have you tried it?
I'm trying the latest official image and it still has the problem, and now I got another problem with MoEs on the latest version, so I have created another issue for it. (Sorry for reporting so many bugs, I use your engine a lot.)
The problem indeed still exists. One solution to this is to mount a host folder with
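A sketch of that kind of workaround, assuming the idea is to bind-mount a writable host directory over the engine's cache path. The container path comes from the error in this report; the host path and image name are placeholders.

```shell
# Hypothetical: give the container a writable cache by bind-mounting a
# host directory over the path the engine fails to create.
mkdir -p "$HOME/aphrodite-cache"
docker run \
    -v "$HOME/aphrodite-cache:/app/aphrodite-engine/.cache" \
    aphrodite-engine:latest
```

Because the mounted directory is owned by the host user, the process inside the container can write to it without needing privileges to create it.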
EDIT: I found the problem; it is described in the last comment.
Your current environment
🐛 Describe the bug
When trying to generate guided output (with a Pydantic JSON schema), it throws an exception that "/app/aphrodite-engine/.cache/" was not found, and for some reason the engine has no privileges to create that directory. I entered the container and created the directory myself. Tried again and then got a strange SQLite3 exception.
I also get this message: "Token indices sequence length is longer than the specified maximum sequence length for this model (2023 > 1024)" - but I have configured the environment for a 12k length.
Engine run parameters:
+ exec python3 -m aphrodite.endpoints.openai.api_server \
    --host 0.0.0.0 --port 7860 \
    --download-dir /data/hub \
    --model LoneStriker/Smaug-34B-v0.1-GPTQ \
    --dtype float16 --kv-cache-dtype fp8_e5m2 \
    --max-model-len 12000 --tensor-parallel-size 2 \
    --gpu-memory-utilization .97 --enforce-eager \
    --disable-log-stats --api-keys 123 \
    --block-size 8 --max-paddings 512 --port 3000 \
    --swap-space 10 \
    --chat-template /home/workspace/chat_templates/gorilla_v2__fc.jinja \
    --served-model-name dolf --max-context-len-to-capture 512 \
    --max-num-batched-tokens 32000 --max-num-seqs 62 \
    --quantization gptq
Used JSON schema:
{
  "description": "Useful to return the text summarization task",
  "properties": {
    "summary": {
      "description": "The resulting summary of the provided text.",
      "title": "Summary",
      "type": "string"
    }
  },
  "required": ["summary"],
  "title": "result",
  "type": "object"
}
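For reference, a small stdlib-only sketch that loads this schema and checks a sample guided-decoding response against its required keys. The sample response text is invented for illustration; the schema is copied verbatim from this report.

```python
import json

# Schema copied verbatim from this report.
schema = json.loads(
    '{"description": "Useful to return the text summarization task", '
    '"properties": {"summary": {"description": "The resulting summary of '
    'the provided text.", "title": "Summary", "type": "string"}}, '
    '"required": ["summary"], "title": "result", "type": "object"}'
)

# Hypothetical model output that guided decoding should produce.
response = json.loads('{"summary": "The text describes a bug report."}')

# Every required property must be present with the declared type.
for key in schema["required"]:
    assert key in response, f"missing required key: {key}"
    assert isinstance(response[key], str)  # "type": "string" in the schema
print("response matches schema requirements")
```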
Using the official OpenAI client with: