@euphoria0-0 euphoria0-0 commented Dec 2, 2025

Pull Request Template (Highly Recommended)

CheckList

  • Fill in Summary
  • Fill in Background
  • Record additions/changes
  • Add tests to test_server.py (when adding a new feature)
  • Run pytest -q (when internal logic changes)
  • Pre-commit

Summary of Pull Request

This PR adds a Document Preprocessor to enhance the insert utilities.
A new tool named insert_document is added to expose it.
This should make document retrieval and answer quality more reliable without requiring extra user effort.

Background of Pull Request

Users generally send documents in various formats and of varying quality, so RAG requires automatic, reliable preprocessing.
In practice, we also needed to use the enVector SDK to insert in bulk.

What Pull Request Changes

New Files:

srcs/adapter/document_preprocess.py

Updated Files:

srcs/server.py
srcs/adapter/__init__.py

Thanks for your contribution!

Base automatically changed from feat/add-embeddings to main December 4, 2025 01:28
@euphoria0-0 euphoria0-0 marked this pull request as ready for review December 7, 2025 12:04
@euphoria0-0 euphoria0-0 self-assigned this Dec 7, 2025
Copilot AI left a comment

Pull request overview

This PR introduces document preprocessing capabilities to enhance the enVector MCP server's insert utilities. The changes enable automatic preprocessing, chunking, and embedding of documents from file paths or text inputs, making document insertion more reliable and user-friendly.

Key Changes:

  • Added DocumentPreprocessingAdapter for document loading, chunking, and preprocessing
  • Introduced two new MCP tools: insert_documents_from_path and insert_documents_from_text
  • Migrated from es2 to pyenvector SDK throughout the codebase

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| srcs/adapter/document_preprocess.py | New adapter implementing document loading, language detection, and text chunking using LangChain |
| srcs/server.py | Added document preprocessor initialization and two new document insertion tools |
| srcs/adapter/envector_sdk.py | Updated import from es2 to pyenvector (ev) |
| srcs/adapter/__init__.py | Exported the new DocumentPreprocessingAdapter class |
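
For readers unfamiliar with the chunking step mentioned in this overview, here is a minimal sketch of overlapping fixed-size chunking. It is not the code from this PR: the adapter is described as using LangChain, while this stand-in uses only the standard library, and the names `Chunk` and `chunk_text` as well as the `chunk_size` and `overlap` defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One piece of a document, ready to be embedded and inserted."""
    index: int
    text: str


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; the defaults are assumptions,
    not values taken from the PR.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Collapse runs of whitespace so chunk boundaries are not wasted on padding.
    normalized = " ".join(text.split())
    step = chunk_size - overlap

    chunks: list[Chunk] = []
    for start in range(0, len(normalized), step):
        chunks.append(Chunk(index=len(chunks), text=normalized[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(normalized):
            break
    return chunks
```

The overlap means that a sentence cut at a chunk boundary still appears whole in the neighbouring chunk, which tends to help retrieval. Each resulting chunk would then be embedded and passed to the insert call; that part depends on the enVector SDK and is not shown here.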

@jyjoo-cryptolab jyjoo-cryptolab merged commit eca1957 into main Dec 9, 2025
@jyjoo-cryptolab jyjoo-cryptolab deleted the feat/add-insert-tool branch December 9, 2025 01:52
3 participants