
chore: some optimization #178

Merged
iziang merged 0 commits into main from feature/bge-embedding on Sep 5, 2023

@iziang (Contributor) commented Aug 24, 2023

  1. Support BGE and BERT embeddings.
  2. Optimize the source module so that it is self-contained and no longer relies on KubeChat, which makes it easier to call externally.
  3. Optimize the document QA module by separating the embedding and LLM query processes from the WebSocket consumer, so that both can be called and tested independently.
  4. Optimize the synchronization module by moving collection-creation initialization into the synchronization process, which makes the code more concise.
  5. Support comparing the performance of different embedding models on the same dataset.
  6. Add a unique constraint on document names within a collection.
  7. Add the fields "peer_type" and "peer_id" to the chat table. Create separate chat records for Feishu one-on-one chats and group chats, so that Feishu chat records can be stored on the KubeChat side.
  8. Support exporting Feishu documents to PDF or Word format.
  9. Ignore Feishu document pages that contain the string "子页面目录" ("subpage directory"); such a page is only a table of contents and carries no useful information.
  10. Ignore small documents that have only one line and a total size of less than 30.
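For readers unfamiliar with item 6, here is a minimal sketch of a per-collection uniqueness rule on document names. It uses the standard-library `sqlite3` module purely for illustration; the table name `document`, the columns `collection_id` and `name`, and the helper functions are assumptions, not the schema or code from this PR.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: the same name may appear in different collections,
    # but only once inside any single collection.
    conn.execute(
        """
        CREATE TABLE document (
            id INTEGER PRIMARY KEY,
            collection_id INTEGER NOT NULL,
            name TEXT NOT NULL,
            UNIQUE (collection_id, name)
        )
        """
    )


def add_document(conn: sqlite3.Connection, collection_id: int, name: str) -> bool:
    """Insert a document; return False if the name already exists in the collection."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO document (collection_id, name) VALUES (?, ?)",
                (collection_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, a duplicate upload is rejected by the database itself rather than by application-level checks, so concurrent writers cannot both succeed.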

@apecloud-bot added the size/L label (denotes a PR that changes 100-499 lines) on Aug 24, 2023
@iziang changed the title from "chore: support bge embedding" to "chore: some optimization" on Aug 24, 2023
@apecloud-bot added the size/XL (500-999 lines) and size/XXL (1000+ lines) labels and removed the size/L (100-499 lines) and size/XL (500-999 lines) labels on Aug 24, 2023
@iziang (Contributor, Author) commented Aug 29, 2023

Ignore Feishu document pages that contain the string "子页面目录" ("subpage directory"); such a page is only a table of contents and carries no useful information. For example:

# 5.2.1 Python SDK
# 子页面目录
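A minimal sketch of this rule, assuming each page is available as plain text after export. The function names are hypothetical and are not taken from this PR; only the marker string comes from the description above.

```python
# Literal marker quoted in this PR; it means "subpage directory".
SUBPAGE_DIRECTORY_MARKER = "子页面目录"


def is_subpage_directory(page_text: str) -> bool:
    """Return True if the page text contains the directory marker."""
    return SUBPAGE_DIRECTORY_MARKER in page_text


def drop_directory_pages(pages: list[str]) -> list[str]:
    """Keep only the pages that are not subpage directories."""
    return [page for page in pages if not is_subpage_directory(page)]
```

A plain substring check is the simplest reading of "contains the string"; it would also skip a content page that merely mentions the marker, which may or may not match the behaviour in the merged code.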


@iziang (Contributor, Author) commented Aug 29, 2023

Ignore small documents that have only one line and a total size of less than 30. For example:

# 5.3.1 TODO
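A minimal sketch of this check. The PR does not state the unit of "size", so this assumes characters of stripped text, and it assumes blank lines do not count toward the line total. The names below are hypothetical.

```python
# Threshold from the PR description; the unit (characters) is an assumption.
SMALL_DOCUMENT_SIZE = 30


def is_small_document(text: str) -> bool:
    """Return True for a document with exactly one non-blank line and size under the threshold."""
    non_blank_lines = [line for line in text.splitlines() if line.strip()]
    return len(non_blank_lines) == 1 and len(text.strip()) < SMALL_DOCUMENT_SIZE
```

Both conditions must hold, so a single long line or a short multi-line note is still kept.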

@iziang merged this pull request into main on Sep 5, 2023
@iziang deleted the feature/bge-embedding branch on September 5, 2023 03:42