update document preprocessor #9

euphoria0-0 · 2025-12-02T14:56:42Z

Pull Request Template (Highly Recommended)

CheckList

Fill in Summary
Fill in Background
Record additions and changes
Add tests to test_server.py (when adding a new feature)
Run pytest -q (when internal logic changes)
Pre-commit

Summary of Pull Request

This PR adds a document preprocessor to improve the insert utilities.
It is exposed through a new tool named insert_document.
The goal is more reliable document retrieval and better answer quality without extra effort from the user.

Background of Pull Request

Users send documents in a variety of formats and levels of quality, so RAG needs automatic, reliable preprocessing.
In practice, we also needed to use the enVector SDK to insert documents in bulk.
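
As a rough illustration of the bulk-insert motivation, the sketch below groups preprocessed items into fixed-size batches. The helper name `batched` and the batch size are hypothetical and are not taken from this PR; the actual enVector SDK insert call is deliberately left out because its signature is not shown here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries.

    Hypothetical helper for illustration only; it is not part of this PR.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    chunks = ["chunk-a", "chunk-b", "chunk-c", "chunk-d", "chunk-e"]
    for batch in batched(chunks, batch_size=2):
        # Each batch would be handed to the SDK's insert call here (not shown).
        print(batch)
```

Batching keeps each request bounded in size, which is one common way to make bulk insertion predictable; whether the adapter in this PR batches at all is an assumption.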

What Pull Request Changes

New Files:

srcs/adapter/document_preprocess.py

Updated Files:

srcs/server.py
srcs/adapter/__init__.py

Thanks for your contribution!

Copilot

Pull request overview

This PR introduces document preprocessing capabilities to enhance the enVector MCP server's insert utilities. The changes enable automatic preprocessing, chunking, and embedding of documents from file paths or text inputs, making document insertion more reliable and user-friendly.

Key Changes:

Added DocumentPreprocessingAdapter for document loading, chunking, and preprocessing
Introduced two new MCP tools: insert_documents_from_path and insert_documents_from_text
Migrated from es2 to pyenvector SDK throughout the codebase
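
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a stand-in for illustration: per the file table in this review, the actual adapter relies on LangChain, and the function name `chunk_text` and its default sizes are assumptions rather than values from the PR.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split `text` into character windows of at most `chunk_size`.

    Consecutive windows share `overlap` characters so that content cut at a
    boundary still appears intact in one of the neighbouring chunks.
    Hypothetical helper; not the implementation used in this PR.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap trades some storage for recall: a query that matches text near a chunk boundary can still hit a chunk that contains the full passage.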

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

File	Description
srcs/adapter/document_preprocess.py	New adapter implementing document loading, language detection, and text chunking using LangChain
srcs/server.py	Added document preprocessor initialization and two new document insertion tools
srcs/adapter/envector_sdk.py	Updated import from `es2` to `pyenvector` (ev)
srcs/adapter/__init__.py	Exported the new `DocumentPreprocessingAdapter` class

update document preprocessor

4644e59

Base automatically changed from feat/add-embeddings to main December 4, 2025 01:28

euphoria0-0 added 3 commits December 7, 2025 14:23

split docs text and path

010ab98

load pdf

6af3d13

fix args

1663cee

euphoria0-0 marked this pull request as ready for review December 7, 2025 12:04

Merge branch 'main' into feat/add-insert-tool

da52fce

euphoria0-0 self-assigned this Dec 7, 2025

jyjoo-cryptolab requested a review from Copilot December 8, 2025 00:19

Merge branch 'main' into feat/add-insert-tool

ea32085

Copilot AI reviewed Dec 8, 2025

jyjoo-cryptolab approved these changes Dec 8, 2025

jyjoo-cryptolab merged commit eca1957 into main Dec 9, 2025

jyjoo-cryptolab deleted the feat/add-insert-tool branch December 9, 2025 01:52

3 participants
