upgrade #3

Merged
CocoRoF merged 2 commits into deploy from main
Jan 20, 2026
@CocoRoF (Owner) commented on Jan 20, 2026

No description provided.

…gging

- Introduced PageTagProcessor to manage page, slide, and sheet markers across document handlers.
- Updated DocumentProcessor to initialize and propagate PageTagProcessor to all handlers.
- Refactored existing handlers (PDF, DOCX, DOC, PPT, Excel) to utilize the new tagging system.
- Replaced hardcoded page/slide/sheet markers with dynamic tags generated by PageTagProcessor.
- Enhanced ChunkResult to support position metadata for extracted chunks.
- Cleaned up code and improved documentation for better clarity and maintainability.
- Updated DocumentProcessor to invalidate the handler registry when the OCR engine changes.
- Modified text extraction to support custom image patterns for OCR processing.
- Added ImageFileHandler for standalone image file processing with OCR support.
- Introduced BedrockOCR class for AWS Bedrock Vision model integration.
- Enhanced ImageProcessor with regex pattern generation for image tag matching.
- Updated image tag extraction to allow custom regex patterns.
- Improved documentation and examples across modules for clarity.
- Added langchain-aws dependency for BedrockOCR functionality.
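
The tagging change described above (dynamic page/slide/sheet markers plus a regex that finds them again) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `PageTagConfig`, `PageTagSketch`, `create_tag`, `pattern`, and `find_positions`, as well as the default tag format, are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageTagConfig:
    # Hypothetical settings; the real PageTagProcessor options are not shown in this PR.
    prefix: str = "[Page Number: "
    suffix: str = "]"


class PageTagSketch:
    """Illustrative stand-in for PageTagProcessor, not the project's actual API."""

    def __init__(self, config: Optional[PageTagConfig] = None):
        self.config = config or PageTagConfig()

    def create_tag(self, number: int) -> str:
        # Build the marker from the configured prefix/suffix instead of a hardcoded string.
        return f"{self.config.prefix}{number}{self.config.suffix}"

    def pattern(self) -> "re.Pattern[str]":
        # Escape the configured delimiters so custom tags containing regex
        # metacharacters (such as "[" or "]") still match literally.
        return re.compile(
            re.escape(self.config.prefix) + r"(\d+)" + re.escape(self.config.suffix)
        )

    def find_positions(self, text: str) -> List[Tuple[int, int]]:
        # Return (page_number, character_offset) for each tag found in the text.
        return [(int(m.group(1)), m.start()) for m in self.pattern().finditer(text)]


# Usage: one tagger instance shared by every handler keeps the marker format consistent.
tagger = PageTagSketch(PageTagConfig(prefix="<slide ", suffix=">"))
text = tagger.create_tag(1) + "intro" + tagger.create_tag(2) + "body"
positions = tagger.find_positions(text)
```

Deriving the matching regex from the same configuration that produces the tags is one way to keep tag generation and tag detection in sync when a custom format is used; whether the PR does it this way is not visible from the commit message.
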
@CocoRoF merged commit ba7b6dd into deploy on Jan 20, 2026
1 check passed