Model generates only 16 Tokens always #903

@shrijayan

Description

Request body
{
  "prompt": "\n\n### Instructions:\nwrite a story\n\n### Response:\n",
  "stop": ["\n", "###"]
}

Response Body
{ "id": "cmpl-f729eb01-6691-47c3-99f0-f30d6ab62f25",
"object": "text_completion", "created": 1699867559,
"model": "llama-2-13b-chat.Q4_0.gguf",
"choices": [
{ "text": "Once upon a time, in a far-off land, there was a mag",
"index": 0,
"logprobs": null,
"finish_reason": "length"
} ],

"usage":
{ "prompt_tokens": 20,
"completion_tokens": 16,
"total_tokens": 36
} }

Here we can see that only 16 completion_tokens are generated every time. How can I solve this problem?

Terminal Input
python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 1 --host 0.0.0.0 --port 8000 --n_ctx=2048
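
Note that the request body above does not set max_tokens. On the OpenAI-style completion API this field usually defaults to 16 when omitted, which lines up with the finish_reason of "length" in the response. A minimal sketch of the same request with max_tokens set explicitly (the value 512 is only illustrative, not from the original report):

{
  "prompt": "\n\n### Instructions:\nwrite a story\n\n### Response:\n",
  "max_tokens": 512,
  "stop": ["\n", "###"]
}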
