Model generates only 16 Tokens always #903

@shrijayan

Description

Request body
{
  "prompt": "\n\n### Instructions:\nwrite a story\n\n### Response:\n",
  "stop": ["\n", "###"]
}

Response Body
{ "id": "cmpl-f729eb01-6691-47c3-99f0-f30d6ab62f25",
"object": "text_completion", "created": 1699867559,
"model": "llama-2-13b-chat.Q4_0.gguf",
"choices": [
{ "text": "Once upon a time, in a far-off land, there was a mag",
"index": 0,
"logprobs": null,
"finish_reason": "length"
} ],

"usage":
{ "prompt_tokens": 20,
"completion_tokens": 16,
"total_tokens": 36
} }

Here we can see that only 16 completion_tokens are generated every time. How can I solve this problem?

Terminal Input
python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 1 --host 0.0.0.0 --port 8000 --n_ctx=2048
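
Note that the request body above does not set max_tokens. On the OpenAI-style completion API this field usually defaults to 16 when omitted, which lines up with the finish_reason of "length" in the response. A minimal sketch of the same request with max_tokens set explicitly (the value 512 is only illustrative, not from the original report):

{
  "prompt": "\n\n### Instructions:\nwrite a story\n\n### Response:\n",
  "max_tokens": 512,
  "stop": ["\n", "###"]
}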
