update document preprocessor #9
Pull request overview
This PR introduces document preprocessing capabilities to enhance the enVector MCP server's insert utilities. The changes enable automatic preprocessing, chunking, and embedding of documents from file paths or text inputs, making document insertion more reliable and user-friendly.
Key Changes:
- Added `DocumentPreprocessingAdapter` for document loading, chunking, and preprocessing
- Introduced two new MCP tools: `insert_documents_from_path` and `insert_documents_from_text`
- Migrated from `es2` to the `pyenvector` SDK throughout the codebase
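The chunking step mentioned above can be illustrated with a minimal, standard-library sketch. This is not the PR's implementation (the adapter reportedly uses LangChain for chunking); `chunk_text`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration only.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters.

    Hypothetical helper for illustration; not part of the enVector MCP server.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps some shared context between neighbouring chunks, which is a common reason to prefer overlapping windows when the chunks are embedded and retrieved independently.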
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| srcs/adapter/document_preprocess.py | New adapter implementing document loading, language detection, and text chunking using LangChain |
| srcs/server.py | Added document preprocessor initialization and two new document insertion tools |
| srcs/adapter/envector_sdk.py | Updated import from es2 to pyenvector (ev) |
| srcs/adapter/__init__.py | Exported the new DocumentPreprocessingAdapter class |
Pull Request Template (Highly Recommended)
CheckList
- Fill in Summary
- Fill in Background
- Fill in Changes
- Add tests to `test_server.py` (when adding a new feature)
- Run `pytest -q` (when changing internal logic)
Summary of Pull Request
This PR adds a document preprocessor to enhance the insert utilities.
To use it, a new tool named `insert_document` is added. This is intended to ensure reliable document retrieval and answer quality without requiring extra user effort.
Background of Pull Request
Users generally send documents in various formats and of varying quality, so RAG requires automatic, reliable preprocessing.
In practice, we also needed to use the enVector SDK to insert in bulk.
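As a rough illustration of bulk insertion, the sketch below groups preprocessed chunks into batches before handing them to an insert call. It makes no claim about the enVector SDK's API: `insert_batch` is a hypothetical callable standing in for whatever the SDK exposes, and `batched`, `insert_in_bulk`, and `batch_size` are likewise illustrative names.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def insert_in_bulk(
    chunks: Sequence[str],
    insert_batch: Callable[[List[str]], None],  # hypothetical stand-in for an SDK call
    batch_size: int = 64,
) -> int:
    """Send `chunks` to `insert_batch` one batch at a time; return the number of calls made."""
    calls = 0
    for batch in batched(chunks, batch_size):
        insert_batch(batch)
        calls += 1
    return calls


if __name__ == "__main__":
    received: List[List[str]] = []
    # list.append serves as a fake insert function so the sketch runs without any SDK.
    n_calls = insert_in_bulk(["a", "b", "c", "d", "e"], received.append, batch_size=2)
    print(n_calls, received)
```

Passing the insert function in as a parameter keeps the batching logic testable without a running vector store.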
What Pull Request Changes
New Files:
- `srcs/adapter/document_preprocess.py`
Updated Files:
- `srcs/server.py`
- `srcs/adapter/__init__.py`
Thanks for your contribution!