Description
Request body
{
  "prompt": "\n\n### Instructions:\nwrite a story\n\n### Response:\n",
  "stop": [
    "\n",
    "###"
  ]
}
Response body
{
  "id": "cmpl-f729eb01-6691-47c3-99f0-f30d6ab62f25",
  "object": "text_completion",
  "created": 1699867559,
  "model": "llama-2-13b-chat.Q4_0.gguf",
  "choices": [
    {
      "text": "Once upon a time, in a far-off land, there was a mag",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 16,
    "total_tokens": 36
  }
}
As the response shows, only 16 completion_tokens are generated every time. How can this problem be solved?
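The finish_reason of "length" confirms that generation stopped at the token cap rather than at a stop sequence. Passing max_tokens explicitly in each request should lift the cap. A minimal sketch with curl, assuming the server started by the command below is reachable on localhost:8000 (512 is again an arbitrary value):

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "\n\n### Instructions:\nwrite a story\n\n### Response:\n",
    "max_tokens": 512,
    "stop": ["\n", "###"]
  }'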
Terminal input
python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 1 --host 0.0.0.0 --port 8000 --n_ctx=2048
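Note that --n_ctx sets the model's context window (the total token budget for prompt plus generation); raising it does not change the per-request output limit, which is controlled only by max_tokens. As a quick sanity check that the server is up before issuing completions (assuming the host and port above), the OpenAI-style models listing can be queried:

curl http://localhost:8000/v1/models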