
v1.1.3.0

@c0sogi c0sogi released this 01 Jun 03:08
· 35 commits to master since this release
  1. The chat message list is now loaded from Redis lazily instead of eagerly. The app first loads all of a user's chat profiles, then loads a chat's messages only when the user enters that chat. This dramatically reduces initial loading time for users with a large message history.
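The eager-to-lazy change described above can be sketched as follows. This is a minimal illustration using an in-memory dict as a stand-in for Redis; all names here (`STORE`, `load_profiles`, `load_messages`) are hypothetical and not taken from the actual codebase.

```python
# In-memory stand-in for Redis keys.
STORE = {
    "user:1:profiles": ["chat-a", "chat-b"],
    "chat:chat-a:messages": ["hi", "hello"],
    "chat:chat-b:messages": ["yo"],
}

def load_profiles(user_id: str) -> list[str]:
    # Loaded eagerly at login: cheap, just the list of chat profiles.
    return STORE[f"user:{user_id}:profiles"]

def load_messages(chat_id: str) -> list[str]:
    # Loaded lazily: fetched only when the user actually enters this chat.
    return STORE[f"chat:{chat_id}:messages"]

profiles = load_profiles("1")          # fast initial load
messages = load_messages(profiles[0])  # deferred until the chat is opened
```

The key point is that the expensive per-chat message fetch is deferred, so login cost no longer grows with total message count.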

  2. You can now set the User role, AI role, and System role names for each LLM. OpenAI's ChatGPT uses `user`, `assistant`, and `system` by default; for LLaMA-based models you can configure different role names, which helps the model recognize each participant's role in the conversation.
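A per-model role configuration like the one described above might look like this. The structure and the LLaMA-style role strings are illustrative assumptions, not the project's actual config schema.

```python
from dataclasses import dataclass

@dataclass
class RoleConfig:
    user: str
    ai: str
    system: str

# ChatGPT defaults vs. a hypothetical LLaMA-style prompt format.
OPENAI_ROLES = RoleConfig(user="user", ai="assistant", system="system")
LLAMA_ROLES = RoleConfig(user="### Human", ai="### Assistant", system="### System")

def format_turn(roles: RoleConfig, who: str, text: str) -> str:
    # Prefix the message with whichever role name this model expects.
    role = getattr(roles, who)
    return f"{role}: {text}"

print(format_turn(LLAMA_ROLES, "user", "Hello"))  # → "### Human: Hello"
```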

  3. Auto summarization is now applied. By default, when you send or receive a long message of 512 tokens or more, a summarization background task runs for that message and, when it finishes, quietly saves the summary to the message list. The summary is invisible to the user, but it is passed to the LLM in place of the original message, which can yield large savings in token usage (and cost).
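The threshold-triggered flow above can be sketched like this. Everything here is a simplified assumption: `count_tokens` is a crude word counter standing in for a real tokenizer, and `summarize` stands in for the actual LLM summarization call.

```python
import asyncio

SUMMARY_THRESHOLD = 512  # tokens; matches the default described above

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace word.
    return len(text.split())

async def summarize(text: str) -> str:
    # Stand-in for the real background LLM summarization task.
    await asyncio.sleep(0)
    return text[:50] + "..."

async def on_message(message: str, record: dict) -> None:
    record["content"] = message
    if count_tokens(message) >= SUMMARY_THRESHOLD:
        # The summary is stored alongside the message; the user never sees it.
        record["summary"] = await summarize(message)

def payload_for_llm(record: dict) -> str:
    # When a summary exists, it replaces the full text in the LLM prompt.
    return record.get("summary", record["content"])
```

Short messages are passed through untouched; only messages at or above the threshold pay the one-time summarization cost.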

  4. To overcome the performance limitations of the Redis vectorstore (single-threaded) and to replace its inaccurate KNN similarity search with cosine similarity search, the Qdrant vectorstore has been introduced. It enables fast asynchronous vector queries in microseconds via gRPC, a low-level API.
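The cosine-similarity scoring that the new vectorstore relies on can be illustrated in pure Python. This sketch does not involve the Qdrant client at all; it only shows the metric itself, with toy two-dimensional vectors instead of real embeddings.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Score in [-1, 1]; 1.0 means the vectors point the same direction,
    # independent of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0]
docs = {"doc-a": [0.9, 0.1], "doc-b": [0.0, 1.0]}

# Rank stored vectors by similarity to the query, as a vectorstore would.
best = max(docs, key=lambda k: cosine_similarity(query, docs[k]))
```

In production the ranking runs inside Qdrant over gRPC rather than in Python, but the scoring function it applies is this one.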