[Question]: Very short responses when getting completions from llama-cpp-python #873
-
Contact Details
No response

What is your question?
I'm using LibreChat with the OpenAI endpoint, but instead of actual OpenAI it is pointed at llama-cpp-python, and the responses come back very short. Our previous chat UI was able to display messages of various lengths. Is there a setting I could change in order to allow longer responses?

More Details
Deployed using docker compose from the git repo. Docker version 24.0.5, from Ubuntu repositories.

What is the main subject of your question?
No response

Screenshots
No response
Replies: 2 comments 12 replies
-
Interesting, maybe max_tokens needs to be sent with the request. It looks like the default for that project is 16 (which is incredibly low), see abetlen/llama-cpp-python#542.

Add a line in api\app\clients\OpenAIClient.js after line 64:

```js
if (!this.modelOptions) {
  this.modelOptions = {
    ...modelOptions,
    model: modelOptions.model || 'gpt-3.5-turbo',
    temperature:
      typeof modelOptions.temperature === 'undefined' ? 0.8 : modelOptions.temperature,
    top_p: typeof modelOptions.top_p === 'undefined' ? 1 : modelOptions.top_p,
    presence_penalty:
      typeof modelOptions.presence_penalty === 'undefined' ? 1 : modelOptions.presence_penalty,
    stop: modelOptions.stop,
  };
}
this.modelOptions.max_tokens = 2000; // new line
```
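If you'd rather not hard-code 2000, a minimal variation of the same patch could read the limit from an environment variable instead. Note this is only a sketch: OPENAI_MAX_TOKENS is a hypothetical variable name, not an existing LibreChat setting.

```js
// Same spot in api\app\clients\OpenAIClient.js, right after the modelOptions block.
// OPENAI_MAX_TOKENS is a made-up name for illustration; fall back to 2000 when it
// is unset or not a valid number.
const envMaxTokens = Number.parseInt(process.env.OPENAI_MAX_TOKENS ?? '', 10);
this.modelOptions.max_tokens = Number.isNaN(envMaxTokens) ? 2000 : envMaxTokens;
```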
-
You can also try editing this line in llama-cpp-python.
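Independently of either patch, you can confirm whether the server honors a larger limit when the client sends one by calling llama-cpp-python's OpenAI-compatible endpoint directly with max_tokens in the request body. A rough sketch, assuming the server is on its default localhost:8000, Node 18+ for the built-in fetch, and a placeholder model name:

```js
// Quick check against the llama-cpp-python server: send max_tokens explicitly
// and compare the reply length with a request that omits it.
async function testMaxTokens() {
  const res = await fetch('http://localhost:8000/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'local-model', // placeholder; depends on how the server was started
      messages: [{ role: 'user', content: 'Explain what a context window is.' }],
      max_tokens: 512, // without this, the server's low default truncates the answer
    }),
  });
  const data = await res.json();
  console.log(data.choices[0].message.content);
}

testMaxTokens().catch(console.error);
```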