Hello Onyx Support Team,
I’m currently using Onyx in a self-hosted setup, with BookStack successfully indexed as a connector and GPT-3.5 Turbo connected via the official OpenAI integration. I’ve encountered the following issues:
🔹 1. LLM responses are extremely short and incomplete
Despite detailed instructions being documented in BookStack, responses from the AI are overly brief and often omit important parts of the documentation. The assistant fails to summarize full sections, even when they are clearly relevant. If I then ask it specifically about a missing part, it answers correctly.
The AI consistently identifies and pulls the correct BookStack article, which confirms that retrieval is working properly. The articles themselves are well-written, with clear headings, structure, and paragraph breaks. However, the assistant typically only returns a small portion of the process (e.g., 1–2 steps) instead of a complete overview or step-by-step answer.
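For what it's worth, this truncation looks like what happens when the completion call runs with a low output-token cap. Below is a minimal sketch using the openai Python package (my own illustration, not Onyx's actual code; the model name and the max_tokens value are assumptions) showing how to detect it: a finish_reason of "length" means the answer was cut off rather than finished.

```python
# Minimal sketch (not Onyx's code): a low max_tokens cap truncates answers.
# Assumes the `openai` Python package (v1 client) and OPENAI_API_KEY in the env.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the full onboarding process."}],
    max_tokens=100,  # hypothetical low cap for illustration; cuts answers short
)

choice = resp.choices[0]
print(choice.message.content)
# finish_reason == "length" means the model hit the cap mid-answer, which
# would look exactly like "1-2 steps instead of the full process".
print("finish_reason:", choice.finish_reason)
```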
🔹 2. Very high token usage for simple queries
Basic questions have resulted in unexpectedly high token usage: in one short chat, over 42,000 tokens were used for just a few prompts. I suspect the system is retrieving too many chunks or repeating requests unnecessarily.
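As a rough sanity check on where those tokens could be going, here is a back-of-the-envelope sketch (my own estimate, not Onyx internals; the chunk count, chunk size, and per-turn numbers are all assumptions). Because each prompt in a retrieval-augmented chat typically resends the system prompt, the retrieved chunks, and the full history, a few turns add up fast:

```python
# Back-of-the-envelope token estimate for a RAG chat (illustrative numbers only).
# Each turn typically resends: system prompt + retrieved chunks + full history.
CHUNKS_PER_QUERY = 10   # assumption: retrieval depth per query
TOKENS_PER_CHUNK = 512  # assumption: typical chunk size
SYSTEM_PROMPT = 400     # assumption: instructions + persona
TURN_IO = 300           # assumption: user question + model answer per turn

history = 0
total = 0
for turn in range(1, 5):  # just four prompts
    prompt_tokens = SYSTEM_PROMPT + CHUNKS_PER_QUERY * TOKENS_PER_CHUNK + history
    total += prompt_tokens + TURN_IO
    history += TURN_IO    # the whole conversation is resent next turn
    print(f"turn {turn}: ~{prompt_tokens + TURN_IO} tokens (running total ~{total})")
# With these assumptions, four prompts already cost ~25,000 tokens; a larger
# chunk count or longer answers reach 42,000 quickly.
```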
🔹 3. Search must be forced manually every time
Whenever I open a new chat and ask a question, the system responds with:
"The AI decided this query didn't need a search."
This happens even when the query clearly should trigger a BookStack index search. I have to manually click “Force Search” every time to get relevant results, which breaks the workflow.
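For context, this matches a common RAG pattern where a cheap LLM call first classifies whether the query needs retrieval and skips the search on a negative answer. The sketch below is my own simplified illustration, not Onyx's actual implementation, and every function name in it is hypothetical; it only shows why an over-conservative classifier silently skips the index and why a force flag bypasses it:

```python
# Simplified illustration of the "should we search?" gate in a RAG pipeline.
# This is NOT Onyx's actual code; all names here are hypothetical stubs.
from typing import List

def llm_says_search_needed(query: str) -> bool:
    # Stand-in for the cheap LLM classification call that decides whether
    # the query needs a document search. If this step is too conservative,
    # it returns False even for queries that clearly need the index.
    return False  # mimics the behavior reported above

def retrieve_from_index(query: str) -> List[str]:
    return ["chunk from the matching BookStack article"]  # stub

def generate_answer(query: str, chunks: List[str]) -> str:
    if not chunks:
        return "(answer from model weights only, no documents)"
    return f"(answer grounded in {len(chunks)} retrieved chunk(s))"

def answer(query: str, force_search: bool = False) -> str:
    # A force flag bypasses the classifier, which is effectively what
    # clicking "Force Search" in the UI does.
    if force_search or llm_says_search_needed(query):
        return generate_answer(query, retrieve_from_index(query))
    return generate_answer(query, chunks=[])

print(answer("How do I onboard a new user?"))                     # skips search
print(answer("How do I onboard a new user?", force_search=True))  # searches
```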
I would appreciate any help in resolving these issues, especially in ensuring that:
- The assistant returns more complete answers
- The system limits token usage more effectively
- Searches against the index are triggered reliably by default
I'm really looking forward to getting this system running.
Thank you so much
Patrick