diff --git a/Process Logic.md b/Process Logic.md new file mode 100644 index 0000000..0c54aa3 --- /dev/null +++ b/Process Logic.md @@ -0,0 +1,558 @@ +# Contextify Processing Flow + +--- + +## Main Flow + +``` +User calls: processor.extract_chunks(file_path) + │ + ▼ + DocumentProcessor.extract_chunks() + │ + ├─► extract_text() + │ │ + │ ├─► _create_current_file(file_path) + │ ├─► _get_handler(extension) + │ ├─► handler.extract_text(current_file) + │ └─► OCR processing (optional) + │ + └─► chunk_text() + │ + └─► create_chunks() +``` + +--- + +## PDF Handler Flow + +``` +PDFHandler.extract_text(current_file) + │ + ├─► file_converter.convert(file_data) [INTERFACE: PDFFileConverter] + │ └─► Binary → fitz.Document + │ + ├─► preprocessor.preprocess(doc) [INTERFACE: PDFPreprocessor] + │ └─► Pass-through (returns PreprocessedData with doc unchanged) + │ + ├─► metadata_extractor.extract() [INTERFACE: PDFMetadataExtractor] + │ + ├─► _extract_all_tables(doc, file_path) [INTERNAL] + │ + └─► For each page: + │ + ├─► ComplexityAnalyzer.analyze() [CLASS: pdf_complexity_analyzer] + │ └─► Returns PageComplexity with recommended_strategy + │ + ├─► Branch by strategy: + │ │ + │ ├─► FULL_PAGE_OCR: + │ │ └─► _process_page_full_ocr() + │ │ + │ ├─► BLOCK_IMAGE_OCR: + │ │ └─► _process_page_block_ocr() + │ │ + │ ├─► HYBRID: + │ │ └─► _process_page_hybrid() + │ │ + │ └─► TEXT_EXTRACTION (default): + │ └─► _process_page_text_extraction() + │ │ + │ ├─► VectorTextOCREngine.detect_and_extract() + │ ├─► extract_text_blocks() [FUNCTION] + │ ├─► format_image_processor methods [INTERFACE: PDFImageProcessor] + │ └─► merge_page_elements() [FUNCTION] + │ + └─► page_tag_processor.create_page_tag() [INTERFACE: PageTagProcessor] +``` + +--- + +## DOCX Handler Flow + +``` +DOCXHandler.extract_text(current_file) + │ + ├─► file_converter.validate(file_data) [INTERFACE: DOCXFileConverter] + │ └─► Check if valid ZIP with [Content_Types].xml + │ + ├─► If not valid DOCX: + │ └─► 
_extract_with_doc_handler_fallback() [INTERNAL] + │ └─► DOCHandler.extract_text() [DELEGATION] + │ + ├─► file_converter.convert(file_data) [INTERFACE: DOCXFileConverter] + │ └─► Binary → docx.Document + │ + ├─► preprocessor.preprocess(doc) [INTERFACE: DOCXPreprocessor] + │ └─► Returns PreprocessedData (doc in extracted_resources) + │ + ├─► chart_extractor.extract_all_from_file() [INTERFACE: DOCXChartExtractor] + │ └─► Pre-extract all charts (callback pattern) + │ + ├─► metadata_extractor.extract() [INTERFACE: DOCXMetadataExtractor] + │ + └─► For each element in doc.element.body: + │ + ├─► If paragraph ('p'): + │ └─► process_paragraph_element() [FUNCTION: docx_helper] + │ ├─► format_image_processor.process_drawing_element() + │ ├─► format_image_processor.extract_from_pict() + │ └─► get_next_chart() callback for charts + │ + └─► If table ('tbl'): + └─► process_table_element() [FUNCTION: docx_helper] +``` + +--- + +## DOC Handler Flow + +``` +DOCHandler.extract_text(current_file) + │ + ├─► file_converter.convert() [INTERFACE: DOCFileConverter] + │ │ + │ ├─► _detect_format() → DocFormat (RTF/OLE/HTML/DOCX) + │ │ + │ ├─► RTF: file_data (bytes) 반환 [Pass-through] + │ ├─► OLE: _convert_ole() → olefile.OleFileIO + │ ├─► HTML: _convert_html() → BeautifulSoup + │ └─► DOCX: _convert_docx() → docx.Document + │ + ├─► preprocessor.preprocess(converted_obj) [INTERFACE: DOCPreprocessor] + │ └─► Returns PreprocessedData (converted_obj in extracted_resources) + │ + ├─► RTF format detected: + │ └─► _delegate_to_rtf_handler() [DELEGATION] + │ └─► RTFHandler.extract_text(current_file) + │ + ├─► OLE format detected: + │ └─► _extract_from_ole_obj() [INTERNAL] + │ ├─► _extract_ole_metadata() + │ ├─► _extract_ole_text() + │ └─► _extract_ole_images() + │ + ├─► HTML format detected: + │ └─► _extract_from_html_obj() [INTERNAL] + │ ├─► _extract_html_metadata() + │ └─► BeautifulSoup parsing + │ + └─► DOCX format detected: + └─► _extract_from_docx_obj() [INTERNAL] + └─► docx.Document 
paragraph/table extraction +``` + +--- + +## RTF Handler Flow + +**구조**: Converter는 pass-through, Preprocessor에서 binary 처리, Handler에서 순차적 처리. + +``` +RTFHandler.extract_text(current_file) + │ + ├─► file_converter.convert() [INTERFACE: RTFFileConverter] + │ └─► Pass-through (returns raw bytes) + │ + ├─► preprocessor.preprocess() [INTERFACE: RTFPreprocessor] + │ │ + │ ├─► \binN tag processing (skip binary data) + │ ├─► \pict group image extraction + │ └─► Returns PreprocessedData (clean_content, image_tags, encoding) + │ + ├─► decode_content() [FUNCTION: rtf_decoder] + │ └─► bytes → string with detected encoding + │ + ├─► Build RTFConvertedData [DATACLASS] + │ + └─► _extract_from_converted() [INTERNAL] + │ + ├─► metadata_extractor.extract() [INTERFACE: RTFMetadataExtractor] + ├─► metadata_extractor.format() + │ + ├─► extract_tables_with_positions() [FUNCTION: rtf_table_extractor] + │ + ├─► extract_inline_content() [FUNCTION: rtf_content_extractor] + │ + └─► Build result string +``` + +--- + +## Excel Handler Flow (XLSX) + +``` +ExcelHandler.extract_text(current_file) [XLSX] + │ + ├─► file_converter.convert(file_data, extension='xlsx') [INTERFACE: ExcelFileConverter] + │ └─► Binary → openpyxl.Workbook + │ + ├─► preprocessor.preprocess(wb) [INTERFACE: ExcelPreprocessor] + │ └─► Returns PreprocessedData (wb in extracted_resources) + │ + ├─► _preload_xlsx_data() [INTERNAL] + │ ├─► metadata_extractor.extract() [INTERFACE: XLSXMetadataExtractor] + │ ├─► chart_extractor.extract_all_from_file() [INTERFACE: ExcelChartExtractor] + │ └─► format_image_processor.extract_images() [INTERFACE: ExcelImageProcessor] + │ + └─► For each sheet: + │ + ├─► _process_xlsx_sheet() [INTERNAL] + │ ├─► page_tag_processor.create_sheet_tag() [INTERFACE: PageTagProcessor] + │ ├─► extract_textboxes_from_xlsx() [FUNCTION] + │ ├─► convert_xlsx_sheet_to_table() [FUNCTION] + │ └─► convert_xlsx_objects_to_tables()[FUNCTION] + │ + └─► format_image_processor.get_sheet_images() [INTERFACE: 
ExcelImageProcessor] +``` + +--- + +## Excel Handler Flow (XLS) + +``` +ExcelHandler.extract_text(current_file) [XLS] + │ + ├─► file_converter.convert(file_data, extension='xls') [INTERFACE: ExcelFileConverter] + │ └─► Binary → xlrd.Book + │ + ├─► preprocessor.preprocess(wb) [INTERFACE: ExcelPreprocessor] + │ └─► Returns PreprocessedData (wb in extracted_resources) + │ + ├─► _get_xls_metadata_extractor().extract_and_format() [INTERFACE: XLSMetadataExtractor] + │ + └─► For each sheet: + │ + ├─► page_tag_processor.create_sheet_tag() [INTERFACE: PageTagProcessor] + │ + ├─► convert_xls_sheet_to_table() [FUNCTION] + │ + └─► convert_xls_objects_to_tables() [FUNCTION] +``` + +--- + +## PPT Handler Flow + +``` +PPTHandler.extract_text(current_file) + │ + ├─► file_converter.convert(file_data, file_stream) [INTERFACE: PPTFileConverter] + │ └─► Binary → pptx.Presentation + │ + ├─► preprocessor.preprocess(prs) [INTERFACE: PPTPreprocessor] + │ └─► Returns PreprocessedData (prs in extracted_resources) + │ + ├─► chart_extractor.extract_all_from_file() [INTERFACE: PPTChartExtractor] + │ └─► Pre-extract all charts (callback pattern) + │ + ├─► metadata_extractor.extract() [INTERFACE: PPTMetadataExtractor] + ├─► metadata_extractor.format() [INTERFACE: PPTMetadataExtractor] + │ + └─► For each slide: + │ + ├─► page_tag_processor.create_slide_tag() [INTERFACE: PageTagProcessor] + │ + └─► For each shape: + │ + ├─► If table: convert_table_to_html() [FUNCTION] + ├─► If chart: get_next_chart() callback [Pre-extracted] + ├─► If picture: process_image_shape() [FUNCTION] + ├─► If group: process_group_shape() [FUNCTION] + └─► If text: extract_text_with_bullets() [FUNCTION] +``` + +--- + +## HWP Handler Flow + +``` +HWPHandler.extract_text(current_file) + │ + ├─► file_converter.validate(file_data) [INTERFACE: HWPFileConverter] + │ └─► Check if OLE file (magic number check) + │ + ├─► If not OLE file: + │ └─► _handle_non_ole_file() [INTERNAL] + │ ├─► ZIP detected → HWPXHandler delegation + │ └─► 
HWP 3.0 → Not supported + │ + ├─► chart_extractor.extract_all_from_file() [INTERFACE: HWPChartExtractor] + │ + ├─► file_converter.convert() [INTERFACE: HWPFileConverter] + │ └─► Binary → olefile.OleFileIO + │ + ├─► preprocessor.preprocess(ole) [INTERFACE: HWPPreprocessor] + │ └─► Returns PreprocessedData (ole in extracted_resources) + │ + ├─► metadata_extractor.extract() [INTERFACE: HWPMetadataExtractor] + ├─► metadata_extractor.format() [INTERFACE: HWPMetadataExtractor] + │ + ├─► _parse_docinfo(ole) [INTERNAL] + │ └─► parse_doc_info() [FUNCTION] + │ + ├─► _extract_body_text(ole) [INTERNAL] + │ │ + │ └─► For each section: + │ ├─► decompress_section() [FUNCTION] + │ └─► _parse_section() [INTERNAL] + │ └─► _process_picture() [INTERNAL - format_image_processor 사용] + │ + ├─► format_image_processor.process_images_from_bindata() [INTERFACE: HWPImageProcessor] + │ + └─► file_converter.close(ole) [INTERFACE: HWPFileConverter] +``` + +--- + +## HWPX Handler Flow + +``` +HWPXHandler.extract_text(current_file) + │ + ├─► get_file_stream(current_file) [INHERITED: BaseHandler] + │ └─► BytesIO(file_data) + │ + ├─► _is_valid_zip(file_stream) [INTERNAL] + │ + ├─► chart_extractor.extract_all_from_file() [INTERFACE: HWPXChartExtractor] + │ + ├─► zipfile.ZipFile(file_stream) [EXTERNAL LIBRARY] + │ + ├─► preprocessor.preprocess(zf) [INTERFACE: HWPXPreprocessor] + │ └─► Returns PreprocessedData (extracted_resources available) + │ + ├─► metadata_extractor.extract() [INTERFACE: HWPXMetadataExtractor] + ├─► metadata_extractor.format() [INTERFACE: HWPXMetadataExtractor] + │ + ├─► parse_bin_item_map(zf) [FUNCTION] + │ + ├─► For each section: + │ │ + │ └─► parse_hwpx_section() [FUNCTION] + │ │ + │ ├─► format_image_processor.process_images() [INTERFACE: HWPXImageProcessor] + │ │ + │ └─► parse_hwpx_table() [FUNCTION] + │ + └─► format_image_processor.get_remaining_images() [INTERFACE: HWPXImageProcessor] + format_image_processor.process_images() [INTERFACE: HWPXImageProcessor] +``` + +--- + +## 
CSV Handler Flow + +``` +CSVHandler.extract_text(current_file) + │ + ├─► file_converter.convert(file_data, encoding) [INTERFACE: CSVFileConverter] + │ └─► Binary → Text (with encoding detection) + │ + ├─► preprocessor.preprocess(content) [INTERFACE: CSVPreprocessor] + │ └─► Returns PreprocessedData (content in clean_content) + │ + ├─► detect_delimiter(content) [FUNCTION] + │ + ├─► parse_csv_content(content, delimiter) [FUNCTION] + │ + ├─► detect_header(rows) [FUNCTION] + │ + ├─► metadata_extractor.extract(source_info) [INTERFACE: CSVMetadataExtractor] + │ └─► CSVSourceInfo contains: file_path, encoding, delimiter, rows, has_header + │ + └─► convert_rows_to_table(rows, has_header) [FUNCTION] + └─► Returns HTML table +``` + +--- + +## Text Handler Flow + +``` +TextHandler.extract_text(current_file) + │ + ├─► preprocessor.preprocess(file_data) [INTERFACE: TextPreprocessor] + │ └─► Returns PreprocessedData (file_data in clean_content) + │ + ├─► file_data.decode(encoding) [DIRECT: No FileConverter used] + │ └─► Try encodings: utf-8, utf-8-sig, cp949, euc-kr, latin-1, ascii + │ + └─► clean_text() / clean_code_text() [FUNCTION: utils.py] +``` + +Note: TextHandler는 file_converter를 사용하지 않고 직접 decode합니다. + +--- + +## HTML Handler Flow + +``` +HTMLReprocessor (Utility - NOT a BaseHandler subclass) + │ + ├─► clean_html_file(html_content) [FUNCTION] + │ │ + │ ├─► BeautifulSoup parsing + │ ├─► Remove unwanted tags (script, style, etc.) + │ ├─► Remove style attributes + │ ├─► _process_table_merged_cells() + │ └─► Return cleaned HTML string + │ + └─► Used by DOCHandler when HTML format detected +``` + +Note: HTML은 별도의 BaseHandler 서브클래스가 없습니다. + DOCHandler가 HTML 형식을 감지하면 내부적으로 BeautifulSoup으로 처리합니다. 
+ +--- + +## Image File Handler Flow + +``` +ImageFileHandler.extract_text(current_file) + │ + ├─► preprocessor.preprocess(file_data) [INTERFACE: ImageFilePreprocessor] + │ └─► Returns PreprocessedData (file_data in clean_content) + │ + ├─► Validate file extension [INTERNAL] + │ └─► SUPPORTED_IMAGE_EXTENSIONS: jpg, jpeg, png, gif, bmp, webp + │ + ├─► If OCR engine is None: + │ └─► _build_image_tag(file_path) [INTERNAL] + │ └─► Return [image:path] tag + │ + └─► If OCR engine available: + └─► _ocr_engine.extract_text() [INTERFACE: BaseOCR] + └─► Image → Text via OCR +``` + +Note: ImageFileHandler는 OCR 엔진이 설정된 경우에만 실제 텍스트 추출이 가능합니다. + +--- + +## Chunking Flow + +``` +chunk_text(text, chunk_size, chunk_overlap) + │ + └─► create_chunks() [FUNCTION] + │ + ├─► _extract_document_metadata() [FUNCTION] + │ + ├─► Detect file type: + │ │ + │ ├─► Table-based (xlsx, xls, csv): + │ │ └─► chunk_multi_sheet_content() [FUNCTION] + │ │ + │ ├─► Text with page markers: + │ │ └─► chunk_by_pages() [FUNCTION] + │ │ + │ └─► Plain text: + │ └─► chunk_plain_text() [FUNCTION] + │ + └─► _prepend_metadata_to_chunks() [FUNCTION] +``` + +--- + +## Interface Integration Summary + +``` +┌─────────────┬─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┐ +│ Handler │ FileConverter │ Preprocessor │ MetadataExtractor │ ChartExtractor │ FormatImageProcessor│ +├─────────────┼─────────────────────┼─────────────────────┼─────────────────────┼─────────────────────┼─────────────────────┤ +│ PDF │ ✅ PDFFileConverter │ ✅ PDFPreprocessor │ ✅ PDFMetadata │ ❌ NullChart │ ✅ PDFImage │ +│ DOCX │ ✅ DOCXFileConverter │ ✅ DOCXPreprocessor │ ✅ DOCXMetadata │ ✅ DOCXChart │ ✅ DOCXImage │ +│ DOC │ ✅ DOCFileConverter │ ✅ DOCPreprocessor │ ❌ NullMetadata │ ❌ NullChart │ ✅ DOCImage │ +│ RTF │ ✅ RTFFileConverter │ ✅ RTFPreprocessor* │ ✅ RTFMetadata │ ❌ NullChart │ ❌ Uses base │ +│ XLSX │ ✅ ExcelFileConverter│ ✅ ExcelPreprocessor │ ✅ XLSXMetadata │ ✅ ExcelChart │ ✅ 
ExcelImage │ +│ XLS │ ✅ ExcelFileConverter│ ✅ ExcelPreprocessor │ ✅ XLSMetadata │ ✅ ExcelChart │ ✅ ExcelImage │ +│ PPT/PPTX │ ✅ PPTFileConverter │ ✅ PPTPreprocessor │ ✅ PPTMetadata │ ✅ PPTChart │ ✅ PPTImage │ +│ HWP │ ✅ HWPFileConverter │ ✅ HWPPreprocessor │ ✅ HWPMetadata │ ✅ HWPChart │ ✅ HWPImage │ +│ HWPX │ ❌ None (직접 ZIP) │ ✅ HWPXPreprocessor │ ✅ HWPXMetadata │ ✅ HWPXChart │ ✅ HWPXImage │ +│ CSV │ ✅ CSVFileConverter │ ✅ CSVPreprocessor │ ✅ CSVMetadata │ ❌ NullChart │ ✅ CSVImage │ +│ TXT/MD/JSON │ ❌ None (직접 decode)│ ✅ TextPreprocessor │ ❌ NullMetadata │ ❌ NullChart │ ✅ TextImage │ +│ HTML │ ❌ N/A (유틸리티) │ ❌ N/A │ ❌ N/A │ ❌ N/A │ ❌ N/A │ +│ Image Files │ ✅ ImageFileConverter│ ✅ ImagePreprocessor │ ❌ NullMetadata │ ❌ NullChart │ ✅ ImageFileImage │ +└─────────────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┘ + +✅ = Interface implemented +❌ = Not applicable / NullExtractor / Not used +* = RTFPreprocessor has actual processing logic (image extraction, binary cleanup) +``` + +--- + +## Handler Processing Pipeline + +모든 핸들러는 동일한 처리 파이프라인을 따릅니다: + +``` +┌──────────────────────────────────────────────────────────────────────────────────┐ +│ Handler Processing Pipeline │ +├──────────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ 1. FileConverter.convert() Binary → Format-specific object │ +│ │ (fitz.Document, docx.Document, olefile, etc.) │ +│ ▼ │ +│ 2. Preprocessor.preprocess() Process/clean the converted data │ +│ │ (image extraction, binary cleanup, encoding) │ +│ ▼ │ +│ 3. MetadataExtractor.extract() Extract document metadata │ +│ │ (title, author, created date, etc.) │ +│ ▼ │ +│ 4. Content Extraction Format-specific content extraction │ +│ │ (text, tables, images, charts) │ +│ ▼ │ +│ 5. 
Result Assembly Build final result string │ +│ │ +└──────────────────────────────────────────────────────────────────────────────────┘ + +Note: 대부분의 핸들러에서 Preprocessor는 pass-through (NullPreprocessor). + RTF는 예외로, RTFPreprocessor에서 실제 바이너리 처리가 이루어짐. +``` + +--- + +## Remaining Function-Based Components + +``` +┌─────────────┬────────────────────────────────────────────────────────────┐ +│ Handler │ Function-Based Components │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ PDF │ extract_text_blocks(), merge_page_elements(), │ +│ │ ComplexityAnalyzer, VectorTextOCREngine, │ +│ │ BlockImageEngine │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ DOCX │ process_paragraph_element(), process_table_element() │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ DOC │ Format detection, OLE/HTML/DOCX internal processing │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ RTF │ decode_content() (rtf_decoder.py) │ +│ │ extract_tables_with_positions() (rtf_table_extractor.py) │ +│ │ extract_inline_content() (rtf_content_extractor.py) │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ Excel │ extract_textboxes_from_xlsx(), convert_xlsx_sheet_to_table│ +│ │ convert_xls_sheet_to_table(), convert_*_objects_to_tables │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ PPT │ extract_text_with_bullets(), convert_table_to_html(), │ +│ │ process_image_shape(), process_group_shape() │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ HWP │ parse_doc_info(), decompress_section() │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ HWPX │ parse_bin_item_map(), parse_hwpx_section() │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ CSV │ detect_delimiter(), parse_csv_content(), detect_header(), │ +│ │ 
convert_rows_to_table() │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ Text │ clean_text(), clean_code_text() (utils.py) │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ HTML │ clean_html_file(), _process_table_merged_cells() │ +│ │ (html_reprocessor.py - utility, not handler) │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ Image │ OCR engine integration (BaseOCR subclass) │ +├─────────────┼────────────────────────────────────────────────────────────┤ +│ Chunking │ create_chunks(), chunk_by_pages(), chunk_plain_text(), │ +│ │ chunk_multi_sheet_content(), chunk_large_table() │ +└─────────────┴────────────────────────────────────────────────────────────┘ +``` diff --git a/contextifier/core/document_processor.py b/contextifier/core/document_processor.py index c262971..809997f 100644 --- a/contextifier/core/document_processor.py +++ b/contextifier/core/document_processor.py @@ -263,8 +263,8 @@ class DocumentProcessor: """ # === Supported File Type Classifications === - DOCUMENT_TYPES = frozenset(['pdf', 'docx', 'doc', 'pptx', 'ppt', 'hwp', 'hwpx']) - TEXT_TYPES = frozenset(['txt', 'md', 'markdown', 'rtf']) + DOCUMENT_TYPES = frozenset(['pdf', 'docx', 'doc', 'rtf', 'pptx', 'ppt', 'hwp', 'hwpx']) + TEXT_TYPES = frozenset(['txt', 'md', 'markdown']) CODE_TYPES = frozenset([ 'py', 'js', 'ts', 'java', 'cpp', 'c', 'h', 'cs', 'go', 'rs', 'php', 'rb', 'swift', 'kt', 'scala', 'dart', 'r', 'sql', @@ -291,6 +291,8 @@ def __init__( slide_tag_suffix: Optional[str] = None, chart_tag_prefix: Optional[str] = None, chart_tag_suffix: Optional[str] = None, + metadata_tag_prefix: Optional[str] = None, + metadata_tag_suffix: Optional[str] = None, **kwargs ): """ @@ -328,6 +330,12 @@ def __init__( chart_tag_suffix: Suffix for chart tags in extracted text - Default: "[/chart]" - Example: "" for XML format + metadata_tag_prefix: Opening tag for metadata section + - Default: "" + - Example: 
"" for custom format + metadata_tag_suffix: Closing tag for metadata section + - Default: "" + - Example: "" for custom format **kwargs: Additional configuration options Example: @@ -342,7 +350,9 @@ def __init__( ... page_tag_prefix="", ... page_tag_suffix="", ... chart_tag_prefix="", - ... chart_tag_suffix="" + ... chart_tag_suffix="", + ... metadata_tag_prefix="", + ... metadata_tag_suffix="" ... ) >>> # Markdown format @@ -359,6 +369,10 @@ def __init__( self._ocr_engine = ocr_engine self._kwargs = kwargs self._supported_extensions: Optional[List[str]] = None + + # Store metadata tag settings + self._metadata_tag_prefix = metadata_tag_prefix + self._metadata_tag_suffix = metadata_tag_suffix # Logger setup self._logger = logging.getLogger("contextify.processor") @@ -389,12 +403,19 @@ def __init__( chart_tag_prefix=chart_tag_prefix, chart_tag_suffix=chart_tag_suffix ) + + # Create instance-specific MetadataFormatter + self._metadata_formatter = self._create_metadata_formatter( + metadata_tag_prefix=metadata_tag_prefix, + metadata_tag_suffix=metadata_tag_suffix + ) # Add processors to config for handlers to access if isinstance(self._config, dict): self._config["image_processor"] = self._image_processor self._config["page_tag_processor"] = self._page_tag_processor self._config["chart_processor"] = self._chart_processor + self._config["metadata_formatter"] = self._metadata_formatter # ========================================================================= # Public Properties @@ -484,6 +505,26 @@ def chart_processor(self) -> Any: """Current ChartProcessor instance for this DocumentProcessor.""" return self._chart_processor + @property + def metadata_tag_config(self) -> Dict[str, Any]: + """ + Current metadata formatter configuration. 
+ + Returns: + Dictionary containing: + - metadata_tag_prefix: Opening tag for metadata section + - metadata_tag_suffix: Closing tag for metadata section + """ + return { + "metadata_tag_prefix": self._metadata_formatter.metadata_tag_prefix, + "metadata_tag_suffix": self._metadata_formatter.metadata_tag_suffix, + } + + @property + def metadata_formatter(self) -> Any: + """Current MetadataFormatter instance for this DocumentProcessor.""" + return self._metadata_formatter + @property def ocr_engine(self) -> Optional[Any]: """Current OCR engine instance.""" @@ -875,6 +916,34 @@ def _create_chart_processor( tag_suffix=chart_tag_suffix ) + def _create_metadata_formatter( + self, + metadata_tag_prefix: Optional[str] = None, + metadata_tag_suffix: Optional[str] = None + ) -> Any: + """ + Create a MetadataFormatter instance for this DocumentProcessor. + + This creates an instance-specific MetadataFormatter that will be + passed to handlers via config. + + Args: + metadata_tag_prefix: Opening tag (default: "") + metadata_tag_suffix: Closing tag (default: "") + + Returns: + MetadataFormatter instance + """ + from contextifier.core.functions.metadata_extractor import MetadataFormatter + + kwargs = {} + if metadata_tag_prefix is not None: + kwargs["metadata_tag_prefix"] = metadata_tag_prefix + if metadata_tag_suffix is not None: + kwargs["metadata_tag_suffix"] = metadata_tag_suffix + + return MetadataFormatter(**kwargs) + def _build_supported_extensions(self) -> List[str]: """Build list of supported extensions.""" extensions = list( @@ -940,6 +1009,19 @@ def _get_handler_registry(self) -> Dict[str, Callable]: except ImportError as e: self._logger.warning(f"DOC handler not available: {e}") + # RTF handler + try: + from contextifier.core.processor.rtf_handler import RTFHandler + rtf_handler = RTFHandler( + config=self._config, + image_processor=self._image_processor, + page_tag_processor=self._page_tag_processor, + chart_processor=self._chart_processor + ) + 
self._handler_registry['rtf'] = rtf_handler.extract_text + except ImportError as e: + self._logger.warning(f"RTF handler not available: {e}") + # PPT/PPTX handler try: from contextifier.core.processor.ppt_handler import PPTHandler @@ -997,7 +1079,7 @@ def _get_handler_registry(self) -> Dict[str, Callable]: # HWPX handler try: - from contextifier.core.processor.hwps_handler import HWPXHandler + from contextifier.core.processor.hwpx_handler import HWPXHandler hwpx_handler = HWPXHandler( config=self._config, image_processor=self._image_processor, diff --git a/contextifier/core/functions/__init__.py b/contextifier/core/functions/__init__.py index 905bb4e..07d125c 100644 --- a/contextifier/core/functions/__init__.py +++ b/contextifier/core/functions/__init__.py @@ -1,17 +1,19 @@ # libs/core/functions/__init__.py """ -Functions - 공통 유틸리티 함수 모듈 +Functions - Common Utility Functions Module -문서 처리에 사용되는 공통 유틸리티 함수들을 제공합니다. +Provides common utility functions used in document processing. -모듈 구성: -- utils: 텍스트 정리, 코드 정리, JSON 정리 등 유틸리티 함수 -- img_processor: 이미지 처리 및 저장 (ImageProcessor 클래스) -- ppt2pdf: PPT를 PDF로 변환하는 함수 +Module Components: +- utils: Text cleaning, code cleaning, JSON sanitization utilities +- img_processor: Image processing and storage (ImageProcessor class) +- storage_backend: Storage backend implementations (Local, MinIO, S3) +- metadata_extractor: Document metadata extraction interface -사용 예시: +Usage Example: from contextifier.core.functions import clean_text, clean_code_text from contextifier.core.functions import ImageProcessor, save_image_to_file + from contextifier.core.functions.storage_backend import LocalStorageBackend from contextifier.core.functions.utils import sanitize_text_for_json """ @@ -21,7 +23,18 @@ sanitize_text_for_json, ) -# 이미지 처리 모듈 +# Storage backend module +from contextifier.core.functions.storage_backend import ( + StorageType, + BaseStorageBackend, + LocalStorageBackend, + MinIOStorageBackend, + S3StorageBackend, + 
create_storage_backend, + get_default_backend, +) + +# Image processor module from contextifier.core.functions.img_processor import ( ImageProcessor, ImageProcessorConfig, @@ -29,18 +42,43 @@ NamingStrategy, save_image_to_file, create_image_processor, + DEFAULT_IMAGE_CONFIG, +) + +# Metadata extraction module +from contextifier.core.functions.metadata_extractor import ( + MetadataField, + DocumentMetadata, + MetadataFormatter, + BaseMetadataExtractor, + format_metadata, ) __all__ = [ - # 텍스트 유틸리티 + # Text utilities "clean_text", "clean_code_text", "sanitize_text_for_json", - # 이미지 처리 + # Storage backends + "StorageType", + "BaseStorageBackend", + "LocalStorageBackend", + "MinIOStorageBackend", + "S3StorageBackend", + "create_storage_backend", + "get_default_backend", + # Image processor (base class for all format-specific processors) "ImageProcessor", "ImageProcessorConfig", "ImageFormat", "NamingStrategy", "save_image_to_file", "create_image_processor", + "DEFAULT_IMAGE_CONFIG", + # Metadata extraction + "MetadataField", + "DocumentMetadata", + "MetadataFormatter", + "BaseMetadataExtractor", + "format_metadata", ] diff --git a/contextifier/core/functions/file_converter.py b/contextifier/core/functions/file_converter.py new file mode 100644 index 0000000..74c8d55 --- /dev/null +++ b/contextifier/core/functions/file_converter.py @@ -0,0 +1,219 @@ +# libs/core/functions/file_converter.py +""" +BaseFileConverter - Abstract base class for file format conversion + +Defines the interface for converting binary file data to a workable format. +Each handler can optionally implement a format-specific converter. + +The converter's job is to transform raw binary data into a format-specific +object that the handler can work with (e.g., Document, Workbook, OLE file). 
+ +This is the FIRST step in the processing pipeline: + Binary Data → FileConverter → Workable Object → Handler Processing + +Usage: + class PDFFileConverter(BaseFileConverter): + def convert(self, file_data: bytes, file_stream: BinaryIO) -> Any: + import fitz + return fitz.open(stream=file_data, filetype="pdf") + + def get_format_name(self) -> str: + return "PDF Document" +""" +from abc import ABC, abstractmethod +from io import BytesIO +from typing import Any, Optional, Union, BinaryIO + + +class BaseFileConverter(ABC): + """ + Abstract base class for file format converters. + + Converts raw binary file data into a format-specific workable object. + This is the first processing step before text extraction. + + Subclasses must implement: + - convert(): Convert binary data to workable format + - get_format_name(): Return human-readable format name + """ + + @abstractmethod + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> Any: + """ + Convert binary file data to a workable format. + + Args: + file_data: Raw binary file data + file_stream: Optional file stream (BytesIO) for libraries that prefer streams + **kwargs: Additional format-specific options + + Returns: + Format-specific object (Document, Workbook, OLE file, etc.) + + Raises: + ConversionError: If conversion fails + """ + pass + + @abstractmethod + def get_format_name(self) -> str: + """ + Return human-readable format name. + + Returns: + Format name string (e.g., "PDF Document", "DOCX Document") + """ + pass + + def validate(self, file_data: bytes) -> bool: + """ + Validate if the file data can be converted by this converter. + + Override this method to add format-specific validation. + Default implementation returns True. + + Args: + file_data: Raw binary file data + + Returns: + True if file can be converted, False otherwise + """ + return True + + def close(self, converted_object: Any) -> None: + """ + Close/cleanup the converted object if needed. 
+ + Override this method if the converted object needs explicit cleanup. + Default implementation does nothing. + + Args: + converted_object: The object returned by convert() + """ + pass + + +class NullFileConverter(BaseFileConverter): + """ + Null implementation of file converter. + + Used as default when no conversion is needed. + Returns the original file data unchanged. + """ + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> bytes: + """Return file data unchanged.""" + return file_data + + def get_format_name(self) -> str: + """Return generic format name.""" + return "Raw Binary" + + +class PassThroughConverter(BaseFileConverter): + """ + Pass-through converter that returns file stream. + + Used for handlers that work directly with BytesIO streams. + """ + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> BinaryIO: + """Return BytesIO stream of file data.""" + if file_stream is not None: + file_stream.seek(0) + return file_stream + return BytesIO(file_data) + + def get_format_name(self) -> str: + """Return format name.""" + return "Binary Stream" + + +class TextFileConverter(BaseFileConverter): + """ + Converter for text-based files. + + Decodes binary data to text string using encoding detection. + """ + + DEFAULT_ENCODINGS = ['utf-8', 'utf-8-sig', 'cp949', 'euc-kr', 'latin-1', 'ascii'] + + def __init__(self, encodings: Optional[list] = None): + """ + Initialize TextFileConverter. + + Args: + encodings: List of encodings to try (default: common encodings) + """ + self._encodings = encodings or self.DEFAULT_ENCODINGS + self._detected_encoding: Optional[str] = None + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + encoding: Optional[str] = None, + **kwargs + ) -> str: + """ + Convert binary data to text string. 
+ + Args: + file_data: Raw binary file data + file_stream: Ignored for text conversion + encoding: Specific encoding to use (None for auto-detect) + **kwargs: Additional options + + Returns: + Decoded text string + + Raises: + UnicodeDecodeError: If decoding fails with all encodings + """ + # Try specified encoding first + if encoding: + try: + result = file_data.decode(encoding) + self._detected_encoding = encoding + return result + except UnicodeDecodeError: + pass + + # Try each encoding in order + for enc in self._encodings: + try: + result = file_data.decode(enc) + self._detected_encoding = enc + return result + except UnicodeDecodeError: + continue + + # Fallback: decode with errors='replace' + self._detected_encoding = 'utf-8' + return file_data.decode('utf-8', errors='replace') + + def get_format_name(self) -> str: + """Return format name with detected encoding.""" + if self._detected_encoding: + return f"Text ({self._detected_encoding})" + return "Text" + + @property + def detected_encoding(self) -> Optional[str]: + """Return the encoding detected during last conversion.""" + return self._detected_encoding diff --git a/contextifier/core/functions/img_processor.py b/contextifier/core/functions/img_processor.py index 3b3d755..a594ff1 100644 --- a/contextifier/core/functions/img_processor.py +++ b/contextifier/core/functions/img_processor.py @@ -1,25 +1,39 @@ -# libs/core/functions/img_processor.py +# contextifier/core/functions/img_processor.py """ Image Processing Module -Provides functionality to save image data to the local file system and convert to tag format. -A general-purpose image processing module that replaces the existing image upload functions. +Provides functionality to save image data to various storage backends +and convert to tag format. Uses Strategy pattern for storage backends. + +This is the BASE class for all image processors. +Format-specific processors (PDFImageProcessor, DOCXImageProcessor, etc.) 
+should inherit from ImageProcessor and override process_image() method. Main Features: -- Save image data to a specified directory +- Base ImageProcessor class with pluggable storage backend +- Save image data to specified storage (Local, MinIO, S3, etc.) - Return saved path in custom tag format - Duplicate image detection and handling - Support for various image formats +- Extensible for format-specific processing Usage Example: from contextifier.core.functions.img_processor import ImageProcessor + from contextifier.core.functions.storage_backend import ( + LocalStorageBackend, + MinIOStorageBackend, + ) - # Use with default settings + # Use with default settings (local storage) processor = ImageProcessor() tag = processor.save_image(image_bytes) - # Result: "[Image:temp/abc123.png]" + # Result: "[Image:temp/images/abc123.png]" - # Custom settings + # Use with MinIO storage (when implemented) + minio_backend = MinIOStorageBackend(endpoint="localhost:9000", bucket="images") + processor = ImageProcessor(storage_backend=minio_backend) + + # Custom tag format processor = ImageProcessor( directory_path="output/images", tag_prefix="" + + # Inherit for format-specific processing + class PDFImageProcessor(ImageProcessor): + def process_image(self, image_data: bytes, **kwargs) -> Optional[str]: + xref = kwargs.get('xref') + custom_name = f"pdf_xref_{xref}" if xref else None + return self.save_image(image_data, custom_name=custom_name) """ import hashlib import io import logging import os -import tempfile import uuid from dataclasses import dataclass, field from enum import Enum from pathlib import Path -from typing import Any, Callable, Dict, List, Optional, Set, Tuple, Union +from typing import Any, Dict, List, Optional, Set, Union -logger = logging.getLogger("document-processor") +from contextifier.core.functions.storage_backend import ( + BaseStorageBackend, + LocalStorageBackend, + StorageType, + get_default_backend, +) + +logger = 
logging.getLogger("contextify.image_processor") class ImageFormat(Enum): - """Supported image formats""" + """Supported image formats.""" PNG = "png" JPEG = "jpeg" JPG = "jpg" @@ -55,7 +82,7 @@ class ImageFormat(Enum): class NamingStrategy(Enum): - """Image file naming strategies""" + """Image file naming strategies.""" HASH = "hash" # Content-based hash (prevents duplicates) UUID = "uuid" # Unique UUID SEQUENTIAL = "sequential" # Sequential numbering @@ -65,10 +92,10 @@ class NamingStrategy(Enum): @dataclass class ImageProcessorConfig: """ - ImageProcessor Configuration + ImageProcessor Configuration. Attributes: - directory_path: Directory path to save images + directory_path: Directory path or bucket prefix for saving images tag_prefix: Tag prefix (e.g., "[Image:") tag_suffix: Tag suffix (e.g., "]") naming_strategy: File naming strategy @@ -78,7 +105,7 @@ class ImageProcessorConfig: hash_algorithm: Hash algorithm (for hash strategy) max_filename_length: Maximum filename length """ - directory_path: str = "temp" + directory_path: str = "temp/images" tag_prefix: str = "[Image:" tag_suffix: str = "]" naming_strategy: NamingStrategy = NamingStrategy.HASH @@ -91,23 +118,29 @@ class ImageProcessorConfig: class ImageProcessor: """ - Image Processing Class - - Saves image data to the local file system and returns + Base Image Processing Class. + + Saves image data using a pluggable storage backend and returns the saved path in the specified tag format. - + + This is the BASE CLASS for all format-specific image processors. + Subclasses should override process_image() for format-specific handling. 
+ Args: - directory_path: Image save directory (default: "temp") + directory_path: Image save directory (default: "temp/images") tag_prefix: Tag prefix (default: "[Image:") tag_suffix: Tag suffix (default: "]") naming_strategy: File naming strategy (default: HASH) - config: ImageProcessorConfig object (takes precedence over individual parameters) - + storage_backend: Storage backend instance (default: LocalStorageBackend) + config: ImageProcessorConfig object (takes precedence) + Examples: + >>> # Default usage (local storage) >>> processor = ImageProcessor() >>> tag = processor.save_image(image_bytes) - "[Image:temp/a1b2c3d4.png]" - + "[Image:temp/images/a1b2c3d4.png]" + + >>> # Custom directory and tags >>> processor = ImageProcessor( ... directory_path="images", ... tag_prefix="![image](", @@ -115,56 +148,90 @@ class ImageProcessor: ... ) >>> tag = processor.save_image(image_bytes) "![image](images/a1b2c3d4.png)" + + >>> # Subclass for format-specific processing + >>> class PDFImageProcessor(ImageProcessor): + ... def process_image(self, image_data, **kwargs): + ... xref = kwargs.get('xref') + ... 
return self.save_image(image_data, custom_name=f"pdf_{xref}") """ - + def __init__( self, - directory_path: str = "temp", + directory_path: str = "temp/images", tag_prefix: str = "[Image:", tag_suffix: str = "]", naming_strategy: Union[NamingStrategy, str] = NamingStrategy.HASH, + storage_backend: Optional[BaseStorageBackend] = None, config: Optional[ImageProcessorConfig] = None, ): + # Set config if config: self.config = config else: - # Convert string to Enum if needed if isinstance(naming_strategy, str): naming_strategy = NamingStrategy(naming_strategy.lower()) - + self.config = ImageProcessorConfig( directory_path=directory_path, tag_prefix=tag_prefix, tag_suffix=tag_suffix, naming_strategy=naming_strategy, ) - + + # Set storage backend (default: local) + self._storage_backend = storage_backend or get_default_backend() + # Track processed image hashes (for duplicate prevention) self._processed_hashes: Dict[str, str] = {} - + # Sequential counter (for sequential strategy) self._sequential_counter: int = 0 - - # Create directory + + # Logger + self._logger = logging.getLogger("contextify.image_processor.ImageProcessor") + + # Create directory if using local storage if self.config.create_directory: - self._ensure_directory_exists() - - def _ensure_directory_exists(self) -> None: - """Check if directory exists and create if not""" - path = Path(self.config.directory_path) - if not path.exists(): - path.mkdir(parents=True, exist_ok=True) - logger.debug(f"Created directory: {path}") - + self._ensure_storage_ready() + + @property + def storage_backend(self) -> BaseStorageBackend: + """Get the current storage backend.""" + return self._storage_backend + + @storage_backend.setter + def storage_backend(self, backend: BaseStorageBackend) -> None: + """ + Set storage backend. 
+ + Args: + backend: New storage backend instance + """ + self._storage_backend = backend + if self.config.create_directory: + self._ensure_storage_ready() + + @property + def storage_type(self) -> StorageType: + """Get the current storage type.""" + return self._storage_backend.storage_type + + def _ensure_storage_ready(self) -> None: + """Ensure storage is ready.""" + self._storage_backend.ensure_ready(self.config.directory_path) + def _compute_hash(self, data: bytes) -> str: - """Compute hash of image data""" + """Compute hash of image data.""" hasher = hashlib.new(self.config.hash_algorithm) hasher.update(data) - return hasher.hexdigest()[:32] # Use first 32 characters - + return hasher.hexdigest()[:32] + def _detect_format(self, data: bytes) -> ImageFormat: - """Detect format from image data""" - # Detect format using magic bytes + """Detect format from image data using magic bytes.""" + if len(data) < 12: + return ImageFormat.UNKNOWN + if data[:8] == b'\x89PNG\r\n\x1a\n': return ImageFormat.PNG elif data[:2] == b'\xff\xd8': @@ -179,25 +246,27 @@ def _detect_format(self, data: bytes) -> ImageFormat: return ImageFormat.TIFF else: return ImageFormat.UNKNOWN - + def _generate_filename( self, data: bytes, image_format: ImageFormat, custom_name: Optional[str] = None ) -> str: - """Generate filename""" + """Generate filename based on naming strategy.""" if custom_name: - # Add extension - if not any(custom_name.lower().endswith(f".{fmt.value}") for fmt in ImageFormat if fmt != ImageFormat.UNKNOWN): - ext = image_format.value if image_format != ImageFormat.UNKNOWN else self.config.default_format.value + if not any(custom_name.lower().endswith(f".{fmt.value}") + for fmt in ImageFormat if fmt != ImageFormat.UNKNOWN): + ext = (image_format.value if image_format != ImageFormat.UNKNOWN + else self.config.default_format.value) return f"{custom_name}.{ext}" return custom_name - - ext = image_format.value if image_format != ImageFormat.UNKNOWN else 
self.config.default_format.value - + + ext = (image_format.value if image_format != ImageFormat.UNKNOWN + else self.config.default_format.value) + strategy = self.config.naming_strategy - + if strategy == NamingStrategy.HASH: base = self._compute_hash(data) elif strategy == NamingStrategy.UUID: @@ -210,28 +279,29 @@ def _generate_filename( base = f"img_{int(time.time() * 1000)}" else: base = self._compute_hash(data) - + filename = f"{base}.{ext}" - - # Limit filename length + if len(filename) > self.config.max_filename_length: max_base_len = self.config.max_filename_length - len(ext) - 1 filename = f"{base[:max_base_len]}.{ext}" - + return filename - + + def _build_file_path(self, filename: str) -> str: + """Build full file path from filename.""" + return os.path.join(self.config.directory_path, filename) + def _build_tag(self, file_path: str) -> str: - """Build tag from saved file path""" + """Build tag from file path.""" if self.config.use_absolute_path: path_str = str(Path(file_path).absolute()) else: - path_str = file_path - - # Normalize path separators (Windows -> Unix style) + path_str = self._storage_backend.build_url(file_path) + path_str = path_str.replace("\\", "/") - return f"{self.config.tag_prefix}{path_str}{self.config.tag_suffix}" - + def save_image( self, image_data: bytes, @@ -240,72 +310,148 @@ def save_image( skip_duplicate: bool = True, ) -> Optional[str]: """ - Save image data to file and return tag. - + Save image data and return tag. 
+ Args: image_data: Image binary data custom_name: Custom filename (extension optional) processed_images: Set of processed image paths (for external duplicate tracking) - skip_duplicate: If True, skip saving duplicate images (return existing path) - + skip_duplicate: If True, skip saving duplicate images + Returns: Image tag string, or None on failure - + Examples: >>> processor = ImageProcessor() >>> tag = processor.save_image(png_bytes) - "[Image:temp/abc123.png]" + "[Image:temp/images/abc123.png]" """ if not image_data: - logger.warning("Empty image data provided") + self._logger.warning("Empty image data provided") return None - + try: # Detect image format image_format = self._detect_format(image_data) - - # Compute hash (for duplicate check) + + # Compute hash image_hash = self._compute_hash(image_data) - + # Check for duplicates if skip_duplicate and image_hash in self._processed_hashes: existing_path = self._processed_hashes[image_hash] - logger.debug(f"Duplicate image detected, returning existing: {existing_path}") + self._logger.debug(f"Duplicate image detected: {existing_path}") return self._build_tag(existing_path) - + # Generate filename filename = self._generate_filename(image_data, image_format, custom_name) - - # Full path - file_path = os.path.join(self.config.directory_path, filename) - + file_path = self._build_file_path(filename) + # Check external duplicate tracking if processed_images is not None and file_path in processed_images: - logger.debug(f"Image already processed externally: {file_path}") + self._logger.debug(f"Image already processed: {file_path}") return self._build_tag(file_path) - - # Ensure directory exists - self._ensure_directory_exists() - - # Save file - with open(file_path, 'wb') as f: - f.write(image_data) - - logger.debug(f"Image saved: {file_path}") - - # Update internal duplicate tracking + + # Ensure storage is ready + self._ensure_storage_ready() + + # Save using storage backend + if not 
self._storage_backend.save(image_data, file_path): + return None + + self._logger.debug(f"Image saved: {file_path}") + + # Update tracking self._processed_hashes[image_hash] = file_path - - # Update external duplicate tracking if processed_images is not None: processed_images.add(file_path) - + return self._build_tag(file_path) - + except Exception as e: - logger.error(f"Failed to save image: {e}") + self._logger.error(f"Failed to save image: {e}") return None - + + def process_image( + self, + image_data: bytes, + **kwargs + ) -> Optional[str]: + """ + Process and save image data. + + This is the main method for format-specific image processing. + Subclasses should override this method to provide format-specific + processing logic before saving. + + Default implementation simply saves the image. + + Args: + image_data: Raw image binary data + **kwargs: Format-specific options (e.g., xref, page_num, sheet_name) + + Returns: + Image tag string, or None on failure + + Examples: + >>> processor = ImageProcessor() + >>> tag = processor.process_image(png_bytes) + "[Image:temp/images/abc123.png]" + + >>> # Subclass example + >>> class PDFImageProcessor(ImageProcessor): + ... def process_image(self, image_data, **kwargs): + ... xref = kwargs.get('xref') + ... custom_name = f"pdf_xref_{xref}" if xref else None + ... return self.save_image(image_data, custom_name=custom_name) + """ + custom_name = kwargs.get('custom_name') + return self.save_image(image_data, custom_name=custom_name) + + def process_embedded_image( + self, + image_data: bytes, + image_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process embedded image from document. + + Override in subclasses for format-specific embedded image handling. + Default implementation just saves the image. 
+ + Args: + image_data: Image binary data + image_name: Original image name in document + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + return self.save_image(image_data, custom_name=image_name) + + def process_chart_image( + self, + chart_data: bytes, + chart_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process chart as image. + + Override in subclasses for format-specific chart image handling. + Default implementation just saves the image. + + Args: + chart_data: Chart image binary data + chart_name: Chart name + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + return self.save_image(chart_data, custom_name=chart_name) + def save_image_from_pil( self, pil_image, @@ -315,128 +461,104 @@ def save_image_from_pil( quality: int = 95, ) -> Optional[str]: """ - Save PIL Image object to file and return tag. - + Save PIL Image object and return tag. + Args: pil_image: PIL Image object - image_format: Image format to save (None keeps original or uses default) + image_format: Image format to save custom_name: Custom filename processed_images: Set of processed image paths quality: JPEG quality (1-100) - + Returns: Image tag string, or None on failure """ try: from PIL import Image - + if not isinstance(pil_image, Image.Image): - logger.error("Invalid PIL Image object") + self._logger.error("Invalid PIL Image object") return None - - # Determine format + fmt = image_format or ImageFormat.PNG if fmt == ImageFormat.UNKNOWN: fmt = self.config.default_format - - # Convert to bytes + buffer = io.BytesIO() save_format = fmt.value.upper() if save_format == "JPG": save_format = "JPEG" - + save_kwargs = {} if save_format == "JPEG": save_kwargs["quality"] = quality elif save_format == "PNG": save_kwargs["compress_level"] = 6 - + pil_image.save(buffer, format=save_format, **save_kwargs) image_data = buffer.getvalue() - + return self.save_image(image_data, custom_name, 
processed_images) - + except Exception as e: - logger.error(f"Failed to save PIL image: {e}") + self._logger.error(f"Failed to save PIL image: {e}") return None - + def get_processed_count(self) -> int: - """Return number of processed images""" + """Return number of processed images.""" return len(self._processed_hashes) - + def get_processed_paths(self) -> List[str]: - """Return all processed image paths""" + """Return all processed image paths.""" return list(self._processed_hashes.values()) - + def clear_cache(self) -> None: - """Clear internal duplicate tracking cache""" + """Clear internal duplicate tracking cache.""" self._processed_hashes.clear() self._sequential_counter = 0 - + def cleanup(self, delete_files: bool = False) -> int: """ Clean up resources. - + Args: - delete_files: If True, also delete saved files - + delete_files: If True, delete saved files + Returns: Number of deleted files """ deleted = 0 - if delete_files: for path in self._processed_hashes.values(): - try: - if os.path.exists(path): - os.remove(path) - deleted += 1 - except Exception as e: - logger.warning(f"Failed to delete file {path}: {e}") - + if self._storage_backend.delete(path): + deleted += 1 self.clear_cache() return deleted - + def get_pattern_string(self) -> str: """ Get regex pattern string for matching image tags. - - Returns a regex pattern that matches the image tag format used by this processor. - The pattern captures the image path as group 1. - + Returns: - Regex pattern string for matching image tags - - Examples: - >>> processor = ImageProcessor() # default: [Image:...] 
- >>> processor.get_pattern_string() - '\\[Image:([^\\]]+)\\]' - - >>> processor = ImageProcessor(tag_prefix="") - >>> processor.get_pattern_string() - "" + Regex pattern string """ import re prefix = re.escape(self.config.tag_prefix) suffix = re.escape(self.config.tag_suffix) - - # Determine the capture group pattern based on suffix - # If suffix is empty, capture everything until whitespace or end + if not self.config.tag_suffix: capture = r'(\S+)' else: - # Use negated character class based on first char of suffix first_char = self.config.tag_suffix[0] capture = f'([^{re.escape(first_char)}]+)' - + return f'{prefix}{capture}{suffix}' # ============================================================================ -# Config-based ImageProcessor Access +# Default Configuration # ============================================================================ -# Default configuration values DEFAULT_IMAGE_CONFIG = { "directory_path": "temp/images", "tag_prefix": "[Image:", @@ -445,39 +567,39 @@ def get_pattern_string(self) -> str: } +# ============================================================================ +# Factory Function +# ============================================================================ + def create_image_processor( directory_path: Optional[str] = None, tag_prefix: Optional[str] = None, tag_suffix: Optional[str] = None, naming_strategy: Optional[Union[NamingStrategy, str]] = None, + storage_backend: Optional[BaseStorageBackend] = None, ) -> ImageProcessor: """ Create a new ImageProcessor instance. 
- + Args: - directory_path: Image save directory (default: "temp/images") - tag_prefix: Tag prefix (default: "[Image:") - tag_suffix: Tag suffix (default: "]") - naming_strategy: File naming strategy (default: HASH) - + directory_path: Image save directory + tag_prefix: Tag prefix + tag_suffix: Tag suffix + naming_strategy: File naming strategy + storage_backend: Storage backend instance + Returns: - New ImageProcessor instance - - Examples: - >>> processor = create_image_processor( - ... directory_path="output/images", - ... tag_prefix="" - ... ) + ImageProcessor instance """ if naming_strategy is not None and isinstance(naming_strategy, str): naming_strategy = NamingStrategy(naming_strategy.lower()) - + return ImageProcessor( directory_path=directory_path or DEFAULT_IMAGE_CONFIG["directory_path"], tag_prefix=tag_prefix or DEFAULT_IMAGE_CONFIG["tag_prefix"], tag_suffix=tag_suffix or DEFAULT_IMAGE_CONFIG["tag_suffix"], naming_strategy=naming_strategy or DEFAULT_IMAGE_CONFIG["naming_strategy"], + storage_backend=storage_backend, ) @@ -490,44 +612,33 @@ def save_image_to_file( ) -> Optional[str]: """ Save image to file and return tag. - - A simple function that replaces the existing image upload functions. - + + Convenience function for quick image saving using local storage. + Args: image_data: Image binary data directory_path: Save directory tag_prefix: Tag prefix tag_suffix: Tag suffix processed_images: Set for duplicate tracking - + Returns: Image tag string, or None on failure - - Examples: - >>> tag = save_image_to_file(image_bytes) - "[Image:temp/abc123.png]" - - >>> tag = save_image_to_file( - ... image_bytes, - ... directory_path="output", - ... tag_prefix="" - ... 
) - "" """ processor = ImageProcessor( directory_path=directory_path, tag_prefix=tag_prefix, tag_suffix=tag_suffix, ) - return processor.save_image(image_data, processed_images=processed_images) __all__ = [ - # Classes + # Main class "ImageProcessor", + # Config "ImageProcessorConfig", + # Enums "ImageFormat", "NamingStrategy", # Factory function diff --git a/contextifier/core/functions/metadata_extractor.py b/contextifier/core/functions/metadata_extractor.py new file mode 100644 index 0000000..d47a84b --- /dev/null +++ b/contextifier/core/functions/metadata_extractor.py @@ -0,0 +1,542 @@ +# contextifier/core/functions/metadata_extractor.py +""" +Metadata Extractor Interface + +Provides abstract base class and common utilities for document metadata extraction. +Each handler's helper module should implement a concrete extractor inheriting from +BaseMetadataExtractor. + +This module defines: +- DocumentMetadata: Standardized metadata container dataclass +- MetadataField: Enum for standard metadata field names +- BaseMetadataExtractor: Abstract base class for metadata extractors +- MetadataFormatter: Shared formatter for consistent metadata output + +Usage Example: + from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, + MetadataFormatter, + ) + + class PDFMetadataExtractor(BaseMetadataExtractor): + def extract(self, source: Any) -> DocumentMetadata: + # PDF-specific extraction logic + ... +""" +import logging +from abc import ABC, abstractmethod +from dataclasses import dataclass, field +from datetime import datetime +from enum import Enum +from typing import Any, Dict, Optional + +logger = logging.getLogger("contextify.metadata") + + +class MetadataField(str, Enum): + """ + Standard metadata field names. + + These field names are used consistently across all document formats + to ensure uniform metadata handling. 
+ """ + TITLE = "title" + SUBJECT = "subject" + AUTHOR = "author" + KEYWORDS = "keywords" + COMMENTS = "comments" + LAST_SAVED_BY = "last_saved_by" + CREATE_TIME = "create_time" + LAST_SAVED_TIME = "last_saved_time" + + # Additional fields for specific formats + VERSION = "version" + CATEGORY = "category" + COMPANY = "company" + MANAGER = "manager" + + # File-level metadata (for CSV, etc.) + FILE_NAME = "file_name" + FILE_SIZE = "file_size" + ENCODING = "encoding" + ROW_COUNT = "row_count" + COL_COUNT = "col_count" + + +@dataclass +class DocumentMetadata: + """ + Standardized metadata container for all document types. + + This dataclass provides a unified structure for storing document metadata + across all supported file formats. It includes common fields and allows + for format-specific custom fields. + + Attributes: + title: Document title + subject: Document subject + author: Document author/creator + keywords: Document keywords + comments: Document comments/description + last_saved_by: Last person who saved the document + create_time: Document creation timestamp + last_saved_time: Last modification timestamp + custom: Dictionary for format-specific additional fields + + Example: + >>> metadata = DocumentMetadata( + ... title="Annual Report", + ... author="John Doe", + ... create_time=datetime.now() + ... ) + >>> metadata.to_dict() + {'title': 'Annual Report', 'author': 'John Doe', ...} + """ + title: Optional[str] = None + subject: Optional[str] = None + author: Optional[str] = None + keywords: Optional[str] = None + comments: Optional[str] = None + last_saved_by: Optional[str] = None + create_time: Optional[datetime] = None + last_saved_time: Optional[datetime] = None + custom: Dict[str, Any] = field(default_factory=dict) + + def to_dict(self) -> Dict[str, Any]: + """ + Convert metadata to dictionary. + + Returns: + Dictionary containing all non-None metadata fields. 
+ """ + result = {} + + if self.title: + result[MetadataField.TITLE.value] = self.title + if self.subject: + result[MetadataField.SUBJECT.value] = self.subject + if self.author: + result[MetadataField.AUTHOR.value] = self.author + if self.keywords: + result[MetadataField.KEYWORDS.value] = self.keywords + if self.comments: + result[MetadataField.COMMENTS.value] = self.comments + if self.last_saved_by: + result[MetadataField.LAST_SAVED_BY.value] = self.last_saved_by + if self.create_time: + result[MetadataField.CREATE_TIME.value] = self.create_time + if self.last_saved_time: + result[MetadataField.LAST_SAVED_TIME.value] = self.last_saved_time + + # Add custom fields + result.update(self.custom) + + return result + + @classmethod + def from_dict(cls, data: Dict[str, Any]) -> "DocumentMetadata": + """ + Create DocumentMetadata from dictionary. + + Standard fields are extracted into their respective attributes, + while non-standard fields go into the custom dictionary. + + Args: + data: Dictionary containing metadata fields. + + Returns: + DocumentMetadata instance. + """ + standard_fields = { + MetadataField.TITLE.value, + MetadataField.SUBJECT.value, + MetadataField.AUTHOR.value, + MetadataField.KEYWORDS.value, + MetadataField.COMMENTS.value, + MetadataField.LAST_SAVED_BY.value, + MetadataField.CREATE_TIME.value, + MetadataField.LAST_SAVED_TIME.value, + } + + custom = {k: v for k, v in data.items() if k not in standard_fields} + + return cls( + title=data.get(MetadataField.TITLE.value), + subject=data.get(MetadataField.SUBJECT.value), + author=data.get(MetadataField.AUTHOR.value), + keywords=data.get(MetadataField.KEYWORDS.value), + comments=data.get(MetadataField.COMMENTS.value), + last_saved_by=data.get(MetadataField.LAST_SAVED_BY.value), + create_time=data.get(MetadataField.CREATE_TIME.value), + last_saved_time=data.get(MetadataField.LAST_SAVED_TIME.value), + custom=custom, + ) + + def is_empty(self) -> bool: + """ + Check if metadata is empty (no fields set). 
+ + Returns: + True if no metadata fields are set. + """ + return not self.to_dict() + + def __bool__(self) -> bool: + """Return True if metadata has any fields set.""" + return not self.is_empty() + + +class MetadataFormatter: + """ + Shared formatter for consistent metadata output. + + This class provides a unified way to format DocumentMetadata objects + as strings for inclusion in extracted text output. + + Attributes: + metadata_tag_prefix: Opening tag for metadata section (default: "") + metadata_tag_suffix: Closing tag for metadata section (default: "") + field_labels: Dictionary mapping field names to display labels + date_format: Date/time format string + language: Output language ('ko' for Korean, 'en' for English) + + Example: + >>> formatter = MetadataFormatter(language='en') + >>> text = formatter.format(metadata) + >>> print(text) + + Title: Annual Report + Author: John Doe + + """ + + # Field labels in Korean + LABELS_KO = { + MetadataField.TITLE.value: "제목", + MetadataField.SUBJECT.value: "주제", + MetadataField.AUTHOR.value: "작성자", + MetadataField.KEYWORDS.value: "키워드", + MetadataField.COMMENTS.value: "설명", + MetadataField.LAST_SAVED_BY.value: "마지막 저장자", + MetadataField.CREATE_TIME.value: "작성일", + MetadataField.LAST_SAVED_TIME.value: "수정일", + # Additional fields + MetadataField.VERSION.value: "버전", + MetadataField.CATEGORY.value: "범주", + MetadataField.COMPANY.value: "회사", + MetadataField.MANAGER.value: "관리자", + MetadataField.FILE_NAME.value: "파일명", + MetadataField.FILE_SIZE.value: "파일 크기", + MetadataField.ENCODING.value: "인코딩", + MetadataField.ROW_COUNT.value: "행 수", + MetadataField.COL_COUNT.value: "열 수", + } + + # Field labels in English + LABELS_EN = { + MetadataField.TITLE.value: "Title", + MetadataField.SUBJECT.value: "Subject", + MetadataField.AUTHOR.value: "Author", + MetadataField.KEYWORDS.value: "Keywords", + MetadataField.COMMENTS.value: "Comments", + MetadataField.LAST_SAVED_BY.value: "Last Saved By", + MetadataField.CREATE_TIME.value: 
"Created", + MetadataField.LAST_SAVED_TIME.value: "Last Modified", + # Additional fields + MetadataField.VERSION.value: "Version", + MetadataField.CATEGORY.value: "Category", + MetadataField.COMPANY.value: "Company", + MetadataField.MANAGER.value: "Manager", + MetadataField.FILE_NAME.value: "File Name", + MetadataField.FILE_SIZE.value: "File Size", + MetadataField.ENCODING.value: "Encoding", + MetadataField.ROW_COUNT.value: "Row Count", + MetadataField.COL_COUNT.value: "Column Count", + } + + # Standard field order for output + FIELD_ORDER = [ + MetadataField.TITLE.value, + MetadataField.SUBJECT.value, + MetadataField.AUTHOR.value, + MetadataField.KEYWORDS.value, + MetadataField.COMMENTS.value, + MetadataField.LAST_SAVED_BY.value, + MetadataField.CREATE_TIME.value, + MetadataField.LAST_SAVED_TIME.value, + ] + + def __init__( + self, + metadata_tag_prefix: str = "", + metadata_tag_suffix: str = "", + date_format: str = "%Y-%m-%d %H:%M:%S", + language: str = "ko", + indent: str = " ", + ): + """ + Initialize MetadataFormatter. + + Args: + metadata_tag_prefix: Opening tag for metadata section + metadata_tag_suffix: Closing tag for metadata section + date_format: strftime format for datetime values + language: Output language ('ko' or 'en') + indent: Indentation string for each field + """ + self.metadata_tag_prefix = metadata_tag_prefix + self.metadata_tag_suffix = metadata_tag_suffix + self.date_format = date_format + self.language = language + self.indent = indent + + # Select labels based on language + self.field_labels = self.LABELS_KO if language == "ko" else self.LABELS_EN + + def format(self, metadata: DocumentMetadata) -> str: + """ + Format DocumentMetadata as a string. + + Args: + metadata: DocumentMetadata instance to format. + + Returns: + Formatted metadata string, or empty string if metadata is empty. 
+ """ + if not metadata: + return "" + + data = metadata.to_dict() + if not data: + return "" + + lines = [self.metadata_tag_prefix] + + # Output standard fields in order + for field_name in self.FIELD_ORDER: + if field_name in data: + value = data.pop(field_name) + formatted_line = self._format_field(field_name, value) + if formatted_line: + lines.append(formatted_line) + + # Output remaining custom fields + for field_name, value in data.items(): + formatted_line = self._format_field(field_name, value) + if formatted_line: + lines.append(formatted_line) + + lines.append(self.metadata_tag_suffix) + + return "\n".join(lines) + + def format_dict(self, metadata_dict: Dict[str, Any]) -> str: + """ + Format metadata dictionary as a string. + + Convenience method for formatting raw dictionaries without + first converting to DocumentMetadata. + + Args: + metadata_dict: Dictionary containing metadata fields. + + Returns: + Formatted metadata string. + """ + if not metadata_dict: + return "" + + return self.format(DocumentMetadata.from_dict(metadata_dict)) + + def _format_field(self, field_name: str, value: Any) -> Optional[str]: + """ + Format a single metadata field. + + Args: + field_name: Field name + value: Field value + + Returns: + Formatted field string, or None if value is empty. + """ + if value is None: + return None + + # Format datetime values + if isinstance(value, datetime): + value = value.strftime(self.date_format) + + # Get label (use field name as fallback) + label = self.field_labels.get(field_name, field_name.replace("_", " ").title()) + + return f"{self.indent}{label}: {value}" + + def get_label(self, field_name: str) -> str: + """ + Get display label for a field name. + + Args: + field_name: Field name + + Returns: + Display label for the field. + """ + return self.field_labels.get(field_name, field_name.replace("_", " ").title()) + + +class BaseMetadataExtractor(ABC): + """ + Abstract base class for metadata extractors. 
+ + Each document format should implement a concrete extractor + that inherits from this class and provides format-specific + extraction logic. + + Subclasses must implement: + - extract(): Extract metadata from format-specific source object + + Subclasses may optionally override: + - format(): Customize metadata formatting + - get_formatter(): Provide custom formatter instance + + Attributes: + formatter: MetadataFormatter instance for output formatting + logger: Logger instance for this extractor + + Example: + class PDFMetadataExtractor(BaseMetadataExtractor): + def extract(self, doc) -> DocumentMetadata: + # Extract from PyMuPDF document object + pdf_meta = doc.metadata + return DocumentMetadata( + title=pdf_meta.get('title'), + author=pdf_meta.get('author'), + ... + ) + """ + + def __init__( + self, + formatter: Optional[MetadataFormatter] = None, + language: str = "ko", + ): + """ + Initialize BaseMetadataExtractor. + + Args: + formatter: Custom MetadataFormatter instance (optional) + language: Default language for formatter if not provided + """ + self._formatter = formatter or MetadataFormatter(language=language) + self._logger = logging.getLogger( + f"contextify.metadata.{self.__class__.__name__}" + ) + + @property + def formatter(self) -> MetadataFormatter: + """Get the metadata formatter instance.""" + return self._formatter + + @property + def logger(self) -> logging.Logger: + """Get the logger instance.""" + return self._logger + + @abstractmethod + def extract(self, source: Any) -> DocumentMetadata: + """ + Extract metadata from source object. + + This method must be implemented by subclasses to provide + format-specific metadata extraction logic. + + Args: + source: Format-specific source object (e.g., PyMuPDF doc, + python-docx Document, openpyxl Workbook, etc.) + + Returns: + DocumentMetadata instance containing extracted metadata. + """ + pass + + def format(self, metadata: DocumentMetadata) -> str: + """ + Format metadata as a string. 
+ + Uses the formatter to convert DocumentMetadata to a string. + Can be overridden by subclasses for custom formatting. + + Args: + metadata: DocumentMetadata instance to format. + + Returns: + Formatted metadata string. + """ + return self._formatter.format(metadata) + + def extract_and_format(self, source: Any) -> str: + """ + Extract metadata and format as string in one step. + + Convenience method that combines extract() and format(). + + Args: + source: Format-specific source object. + + Returns: + Formatted metadata string. + """ + try: + metadata = self.extract(source) + return self.format(metadata) + except Exception as e: + self._logger.warning(f"Failed to extract metadata: {e}") + return "" + + def extract_to_dict(self, source: Any) -> Dict[str, Any]: + """ + Extract metadata and return as dictionary. + + Convenience method that extracts metadata and converts to dict. + + Args: + source: Format-specific source object. + + Returns: + Dictionary containing metadata fields. + """ + try: + metadata = self.extract(source) + return metadata.to_dict() + except Exception as e: + self._logger.warning(f"Failed to extract metadata: {e}") + return {} + + +# Default formatter instance (Korean) +_default_formatter = MetadataFormatter(language="ko") + + +def format_metadata(metadata: Dict[str, Any]) -> str: + """ + Format metadata dictionary as a string. + + Convenience function using default formatter for backward compatibility. + + Args: + metadata: Dictionary containing metadata fields. + + Returns: + Formatted metadata string. 
+ """ + return _default_formatter.format_dict(metadata) + + +__all__ = [ + "MetadataField", + "DocumentMetadata", + "MetadataFormatter", + "BaseMetadataExtractor", + "format_metadata", +] diff --git a/contextifier/core/functions/preprocessor.py b/contextifier/core/functions/preprocessor.py new file mode 100644 index 0000000..cbf471d --- /dev/null +++ b/contextifier/core/functions/preprocessor.py @@ -0,0 +1,161 @@ +# libs/core/functions/preprocessor.py +""" +BasePreprocessor - Abstract base class for data preprocessing + +Defines the interface for preprocessing data after file conversion. +Used when converted data needs special handling before content extraction. + +The preprocessor's job is to: +1. Clean/normalize converted data +2. Extract embedded resources (images, etc.) +3. Detect encoding information +4. Return preprocessed data ready for further processing + +Processing Pipeline Position: + 1. FileConverter.convert() → Format-specific object + 2. Preprocessor.preprocess() → Cleaned/processed data (THIS STEP) + 3. MetadataExtractor.extract() → Metadata + 4. Content extraction + +Usage: + class PDFPreprocessor(BasePreprocessor): + def preprocess(self, converted_data: Any, **kwargs) -> PreprocessedData: + # Process the fitz.Document, normalize pages, etc. + return PreprocessedData( + clean_content=b"", + encoding="utf-8", + extracted_resources={"document": converted_data} + ) + + def get_format_name(self) -> str: + return "PDF Preprocessor" +""" +from abc import ABC, abstractmethod +from dataclasses import dataclass, field +from typing import Any, Dict + + +@dataclass +class PreprocessedData: + """ + Result of preprocessing operation. + + Contains cleaned content and any extracted resources. + + Attributes: + raw_content: Original input data (for reference) + clean_content: Processed content ready for use - THIS IS THE TRUE SOURCE + Can be any type: bytes, str, Document, Workbook, OleFileIO, etc. 
+ encoding: Detected or default encoding (for text-based content) + extracted_resources: Dict of extracted resources (images, etc.) + metadata: Any metadata discovered during preprocessing + """ + raw_content: Any = None + clean_content: Any = None # TRUE SOURCE - The processed result + encoding: str = "utf-8" + extracted_resources: Dict[str, Any] = field(default_factory=dict) + metadata: Dict[str, Any] = field(default_factory=dict) + + +class BasePreprocessor(ABC): + """ + Abstract base class for data preprocessors. + + Preprocesses converted data after FileConverter.convert(). + Used when converted data needs normalization or special handling + before content extraction. + + Processing Pipeline: + 1. FileConverter.convert() → Format-specific object + 2. Preprocessor.preprocess() → Cleaned/processed data (THIS STEP) + 3. MetadataExtractor.extract() → Metadata + 4. Content extraction + + Subclasses must implement: + - preprocess(): Process converted data and return PreprocessedData + - get_format_name(): Return human-readable format name + """ + + @abstractmethod + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess converted data. + + Args: + converted_data: Data from FileConverter.convert() + (format-specific object, bytes, or other type) + **kwargs: Additional format-specific options + + Returns: + PreprocessedData containing cleaned content and extracted resources + + Raises: + PreprocessingError: If preprocessing fails + """ + pass + + @abstractmethod + def get_format_name(self) -> str: + """ + Return human-readable format name. + + Returns: + Format name string (e.g., "PDF Preprocessor") + """ + pass + + def validate(self, data: Any) -> bool: + """ + Validate if the data can be preprocessed by this preprocessor. + + Override this method to add format-specific validation. + Default implementation returns True. 
+ + Args: + data: Data to validate (converted data or raw bytes) + + Returns: + True if data can be preprocessed, False otherwise + """ + _ = data # Suppress unused argument warning + return True + + +class NullPreprocessor(BasePreprocessor): + """ + Null preprocessor that passes data through unchanged. + + Used as default when no preprocessing is needed. + clean_content always contains the processed result (same as input for pass-through). + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """Pass data through unchanged. clean_content = converted_data.""" + encoding = kwargs.get("encoding", "utf-8") + + # clean_content is ALWAYS the True Source - contains the processed result + # For pass-through, it's the same as the input + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE + encoding=encoding, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "Null Preprocessor (pass-through)" + + +__all__ = [ + 'BasePreprocessor', + 'NullPreprocessor', + 'PreprocessedData', +] diff --git a/contextifier/core/functions/storage_backend.py b/contextifier/core/functions/storage_backend.py new file mode 100644 index 0000000..2118594 --- /dev/null +++ b/contextifier/core/functions/storage_backend.py @@ -0,0 +1,381 @@ +# contextifier/core/functions/storage_backend.py +""" +Storage Backend Module + +Provides abstract base class and implementations for image storage backends. +ImageProcessor uses these backends to save images to different storage systems. 
+ +Storage Backends: +- LocalStorageBackend: Save to local file system +- MinIOStorageBackend: Save to MinIO object storage (stub) +- S3StorageBackend: Save to AWS S3 (stub) + +Usage Example: + from contextifier.core.functions.storage_backend import ( + LocalStorageBackend, + MinIOStorageBackend, + ) + from contextifier.core.functions.img_processor import ImageProcessor + + # Use local storage (default) + processor = ImageProcessor() + + # Use MinIO storage + minio_backend = MinIOStorageBackend( + endpoint="localhost:9000", + bucket="images" + ) + processor = ImageProcessor(storage_backend=minio_backend) +""" +import logging +import os +from abc import ABC, abstractmethod +from enum import Enum +from pathlib import Path +from typing import Any, Dict, Optional + +logger = logging.getLogger("contextify.storage") + + +class StorageType(Enum): + """Storage backend types.""" + LOCAL = "local" + MINIO = "minio" + S3 = "s3" + AZURE_BLOB = "azure_blob" + GCS = "gcs" # Google Cloud Storage + + +class BaseStorageBackend(ABC): + """ + Abstract base class for storage backends. + + Each storage type implements this interface to provide + storage-specific save/delete logic. + + Subclasses must implement: + - save(): Save data to storage + - delete(): Delete file from storage + - exists(): Check if file exists + - ensure_ready(): Prepare storage (create dirs, validate connection) + """ + + def __init__(self, storage_type: StorageType): + self._storage_type = storage_type + self._logger = logging.getLogger( + f"contextify.storage.{self.__class__.__name__}" + ) + + @property + def storage_type(self) -> StorageType: + """Get storage type.""" + return self._storage_type + + @property + def logger(self) -> logging.Logger: + """Get logger.""" + return self._logger + + @abstractmethod + def save(self, data: bytes, file_path: str) -> bool: + """ + Save data to storage. 
+ + Args: + data: Binary data to save + file_path: Target file path or key + + Returns: + True if successful, False otherwise + """ + pass + + @abstractmethod + def delete(self, file_path: str) -> bool: + """ + Delete file from storage. + + Args: + file_path: File path or key to delete + + Returns: + True if successful, False otherwise + """ + pass + + @abstractmethod + def exists(self, file_path: str) -> bool: + """ + Check if file exists in storage. + + Args: + file_path: File path or key to check + + Returns: + True if file exists + """ + pass + + @abstractmethod + def ensure_ready(self, directory_path: str) -> None: + """ + Ensure storage is ready (create directory, validate connection, etc.). + + Args: + directory_path: Base directory or bucket path + """ + pass + + def build_url(self, file_path: str) -> str: + """ + Build URL or path for the saved file. + + Override in subclasses for storage-specific URL formats. + + Args: + file_path: File path or key + + Returns: + URL or path string + """ + return file_path.replace("\\", "/") + + +class LocalStorageBackend(BaseStorageBackend): + """ + Local file system storage backend. + + Saves files to the local file system. 
+ """ + + def __init__(self): + super().__init__(StorageType.LOCAL) + + def save(self, data: bytes, file_path: str) -> bool: + """Save data to local file.""" + try: + with open(file_path, 'wb') as f: + f.write(data) + return True + except Exception as e: + self._logger.error(f"Failed to save file {file_path}: {e}") + return False + + def delete(self, file_path: str) -> bool: + """Delete local file.""" + try: + if os.path.exists(file_path): + os.remove(file_path) + return True + return False + except Exception as e: + self._logger.warning(f"Failed to delete file {file_path}: {e}") + return False + + def exists(self, file_path: str) -> bool: + """Check if local file exists.""" + return os.path.exists(file_path) + + def ensure_ready(self, directory_path: str) -> None: + """Create directory if it doesn't exist.""" + path = Path(directory_path) + if not path.exists(): + path.mkdir(parents=True, exist_ok=True) + self._logger.debug(f"Created directory: {path}") + + +class MinIOStorageBackend(BaseStorageBackend): + """ + MinIO object storage backend (STUB - Not Implemented). + + This is a placeholder for MinIO integration. + Requires minio package to be installed. + + Args: + endpoint: MinIO server endpoint + access_key: MinIO access key + secret_key: MinIO secret key + bucket: Target bucket name + secure: Use HTTPS (default: True) + """ + + def __init__( + self, + endpoint: str = "localhost:9000", + access_key: str = "", + secret_key: str = "", + bucket: str = "images", + secure: bool = True, + ): + super().__init__(StorageType.MINIO) + self._endpoint = endpoint + self._access_key = access_key + self._secret_key = secret_key + self._bucket = bucket + self._secure = secure + self._client = None + + self._logger.warning( + "MinIOStorageBackend is a stub implementation. " + "Full implementation is pending." 
+ ) + + @property + def bucket(self) -> str: + """Get bucket name.""" + return self._bucket + + @property + def endpoint(self) -> str: + """Get endpoint.""" + return self._endpoint + + def save(self, data: bytes, file_path: str) -> bool: + """Upload data to MinIO bucket.""" + raise NotImplementedError( + "MinIOStorageBackend.save() is not yet implemented. " + "Use LocalStorageBackend for now." + ) + + def delete(self, file_path: str) -> bool: + """Delete object from MinIO bucket.""" + raise NotImplementedError( + "MinIOStorageBackend.delete() is not yet implemented." + ) + + def exists(self, file_path: str) -> bool: + """Check if object exists in MinIO bucket.""" + raise NotImplementedError( + "MinIOStorageBackend.exists() is not yet implemented." + ) + + def ensure_ready(self, directory_path: str) -> None: + """Initialize MinIO client and ensure bucket exists.""" + raise NotImplementedError( + "MinIOStorageBackend.ensure_ready() is not yet implemented." + ) + + def build_url(self, file_path: str) -> str: + """Build MinIO URL for the file.""" + # Would return presigned URL or object path + protocol = "https" if self._secure else "http" + return f"{protocol}://{self._endpoint}/{self._bucket}/{file_path}" + + +class S3StorageBackend(BaseStorageBackend): + """ + AWS S3 storage backend (STUB - Not Implemented). + + This is a placeholder for AWS S3 integration. + Requires boto3 package to be installed. + + Args: + bucket: S3 bucket name + region: AWS region (default: "us-east-1") + prefix: Key prefix for uploaded objects + """ + + def __init__( + self, + bucket: str = "", + region: str = "us-east-1", + prefix: str = "", + ): + super().__init__(StorageType.S3) + self._bucket = bucket + self._region = region + self._prefix = prefix + self._client = None + + self._logger.warning( + "S3StorageBackend is a stub implementation. " + "Full implementation is pending." 
+ ) + + @property + def bucket(self) -> str: + """Get bucket name.""" + return self._bucket + + @property + def region(self) -> str: + """Get region.""" + return self._region + + def save(self, data: bytes, file_path: str) -> bool: + """Upload data to S3 bucket.""" + raise NotImplementedError( + "S3StorageBackend.save() is not yet implemented. " + "Use LocalStorageBackend for now." + ) + + def delete(self, file_path: str) -> bool: + """Delete object from S3 bucket.""" + raise NotImplementedError( + "S3StorageBackend.delete() is not yet implemented." + ) + + def exists(self, file_path: str) -> bool: + """Check if object exists in S3 bucket.""" + raise NotImplementedError( + "S3StorageBackend.exists() is not yet implemented." + ) + + def ensure_ready(self, directory_path: str) -> None: + """Initialize S3 client and verify bucket access.""" + raise NotImplementedError( + "S3StorageBackend.ensure_ready() is not yet implemented." + ) + + def build_url(self, file_path: str) -> str: + """Build S3 URL for the file.""" + # Would return S3 URI or presigned URL + return f"s3://{self._bucket}/{file_path}" + + +# Default backend instance +_default_backend = LocalStorageBackend() + + +def get_default_backend() -> BaseStorageBackend: + """Get the default storage backend (local).""" + return _default_backend + + +def create_storage_backend( + storage_type: StorageType = StorageType.LOCAL, + **kwargs +) -> BaseStorageBackend: + """ + Factory function to create a storage backend. 
+ + Args: + storage_type: Type of storage backend + **kwargs: Storage-specific options + + Returns: + BaseStorageBackend instance + """ + if storage_type == StorageType.LOCAL: + return LocalStorageBackend() + elif storage_type == StorageType.MINIO: + return MinIOStorageBackend(**kwargs) + elif storage_type == StorageType.S3: + return S3StorageBackend(**kwargs) + else: + raise ValueError(f"Unsupported storage type: {storage_type}") + + +__all__ = [ + # Enum + "StorageType", + # Base class + "BaseStorageBackend", + # Implementations + "LocalStorageBackend", + "MinIOStorageBackend", + "S3StorageBackend", + # Factory + "create_storage_backend", + "get_default_backend", +] diff --git a/contextifier/core/processor/__init__.py b/contextifier/core/processor/__init__.py index 35f83ed..6543ba0 100644 --- a/contextifier/core/processor/__init__.py +++ b/contextifier/core/processor/__init__.py @@ -7,7 +7,8 @@ Handler List: - pdf_handler: PDF document processing (adaptive complexity-based) - docx_handler: DOCX document processing -- doc_handler: DOC document processing (including RTF) +- doc_handler: DOC document processing (OLE, HTML, misnamed DOCX) +- rtf_handler: RTF document processing - ppt_handler: PPT/PPTX document processing - excel_handler: Excel (XLSX/XLS) document processing - hwp_processor: HWP document processing @@ -19,7 +20,8 @@ Helper Modules (subdirectories): - csv_helper/: CSV processing helper - docx_helper/: DOCX processing helper -- doc_helpers/: DOC/RTF processing helper +- doc_helpers/: DOC processing helper +- rtf_helper/: RTF processing helper - excel_helper/: Excel processing helper - hwp_helper/: HWP processing helper - hwpx_helper/: HWPX processing helper @@ -29,6 +31,7 @@ Usage Example: from contextifier.core.processor import PDFHandler from contextifier.core.processor import DOCXHandler + from contextifier.core.processor import RTFHandler from contextifier.core.processor.pdf_helpers import extract_pdf_metadata """ @@ -38,6 +41,7 @@ # === Document 
Handlers === from contextifier.core.processor.docx_handler import DOCXHandler from contextifier.core.processor.doc_handler import DOCHandler +from contextifier.core.processor.rtf_handler import RTFHandler from contextifier.core.processor.ppt_handler import PPTHandler # === Data Handlers === @@ -47,19 +51,21 @@ # === HWP Handlers === from contextifier.core.processor.hwp_handler import HWPHandler -from contextifier.core.processor.hwps_handler import HWPXHandler +from contextifier.core.processor.hwpx_handler import HWPXHandler # === Other Processors === # from contextifier.core.processor.html_reprocessor import ... # HTML reprocessing # === Helper Modules (subpackages) === from contextifier.core.processor import csv_helper +from contextifier.core.processor import doc_helpers from contextifier.core.processor import docx_helper from contextifier.core.processor import excel_helper from contextifier.core.processor import hwp_helper from contextifier.core.processor import hwpx_helper from contextifier.core.processor import pdf_helpers from contextifier.core.processor import ppt_helper +from contextifier.core.processor import rtf_helper __all__ = [ # PDF Handler @@ -67,6 +73,7 @@ # Document Handlers "DOCXHandler", "DOCHandler", + "RTFHandler", "PPTHandler", # Data Handlers "ExcelHandler", @@ -77,10 +84,12 @@ "HWPXHandler", # Helper subpackages "csv_helper", + "doc_helpers", "docx_helper", "excel_helper", "hwp_helper", "hwpx_helper", "pdf_helpers", "ppt_helper", + "rtf_helper", ] diff --git a/contextifier/core/processor/base_handler.py b/contextifier/core/processor/base_handler.py index 6b8d42f..14f5b7c 100644 --- a/contextifier/core/processor/base_handler.py +++ b/contextifier/core/processor/base_handler.py @@ -3,28 +3,69 @@ BaseHandler - Abstract base class for document processing handlers Defines the base interface for all document handlers. 
-Manages config, ImageProcessor, PageTagProcessor, and ChartProcessor passed from -DocumentProcessor at instance level for reuse by internal methods. +Manages config, ImageProcessor, PageTagProcessor, ChartProcessor, MetadataExtractor, +Preprocessor, and format-specific ImageProcessor passed from DocumentProcessor at +instance level for reuse by internal methods. -Each handler should override _create_chart_extractor() to provide a format-specific -chart extractor implementation. +Each handler should override: +- _create_file_converter(): Provide format-specific file converter +- _create_preprocessor(): Provide format-specific preprocessor +- _create_chart_extractor(): Provide format-specific chart extractor +- _create_metadata_extractor(): Provide format-specific metadata extractor +- _create_format_image_processor(): Provide format-specific image processor + +Processing Pipeline: + 1. file_converter.convert() - Binary → Format-specific object (e.g., bytes → fitz.Document) + 2. preprocessor.preprocess() - Process/clean the converted data + 3. metadata_extractor.extract() - Extract document metadata + 4. 
Format-specific content extraction (text, images, charts, tables) Usage Example: class PDFHandler(BaseHandler): + def _create_file_converter(self): + return PDFFileConverter() + + def _create_preprocessor(self): + return PDFPreprocessor() # Or NullPreprocessor() if no preprocessing needed + + def _create_metadata_extractor(self): + return PDFMetadataExtractor() + + def _create_format_image_processor(self): + return PDFImageProcessor(image_processor=self._image_processor) + def extract_text(self, current_file: CurrentFile, extract_metadata: bool = True) -> str: - # Access self.config, self.image_processor, self.page_tag_processor - # Use self.chart_extractor.process(chart_element) for chart extraction + # Step 1: Convert binary to format-specific object + doc = self.convert_file(current_file) + # Step 2: Preprocess the converted object + preprocessed = self.preprocess(doc) + # Step 3: Extract metadata + metadata = self.extract_metadata(doc) + # Step 4: Process content ... """ import io import logging from abc import ABC, abstractmethod -from typing import Any, Dict, List, Optional, TYPE_CHECKING +from typing import Any, Dict, Optional, TYPE_CHECKING from contextifier.core.functions.img_processor import ImageProcessor from contextifier.core.functions.page_tag_processor import PageTagProcessor from contextifier.core.functions.chart_processor import ChartProcessor from contextifier.core.functions.chart_extractor import BaseChartExtractor, NullChartExtractor +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) +from contextifier.core.functions.file_converter import ( + BaseFileConverter, + NullFileConverter, +) +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + NullPreprocessor, + PreprocessedData, +) if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -32,27 +73,56 @@ def extract_text(self, current_file: CurrentFile, extract_metadata: bool = True) logger = 
logging.getLogger("document-processor") +class NullMetadataExtractor(BaseMetadataExtractor): + """ + Null implementation of metadata extractor. + + Used as default when no format-specific extractor is provided. + Always returns empty metadata. + """ + + def extract(self, source: Any) -> DocumentMetadata: + """Return empty metadata.""" + return DocumentMetadata() + + class BaseHandler(ABC): """ Abstract base class for document handlers. - + All handlers inherit from this class. - config, image_processor, page_tag_processor, and chart_processor are passed - at creation and stored as instance variables. - - Each handler should override _create_chart_extractor() to provide a - format-specific chart extractor. The chart_extractor is lazy-initialized - on first access. - + config, image_processor, page_tag_processor, chart_processor, metadata_extractor, + preprocessor, and format_image_processor are passed at creation and stored as + instance variables. + + Each handler should override: + - _create_file_converter(): Provide format-specific file converter + - _create_preprocessor(): Provide format-specific preprocessor + - _create_chart_extractor(): Provide format-specific chart extractor + - _create_metadata_extractor(): Provide format-specific metadata extractor + - _create_format_image_processor(): Provide format-specific image processor + + All are lazy-initialized on first access. + + Processing Pipeline: + 1. file_converter.convert() - Binary → Format-specific object + 2. preprocessor.preprocess() - Process/clean the converted data + 3. metadata_extractor.extract() - Extract document metadata + 4. 
Format-specific content extraction + Attributes: config: Configuration dictionary passed from DocumentProcessor - image_processor: ImageProcessor instance passed from DocumentProcessor + image_processor: Core ImageProcessor instance passed from DocumentProcessor + format_image_processor: Format-specific image processor (lazy-initialized) page_tag_processor: PageTagProcessor instance passed from DocumentProcessor chart_processor: ChartProcessor instance passed from DocumentProcessor chart_extractor: Format-specific chart extractor instance + preprocessor: Format-specific preprocessor instance + metadata_extractor: Format-specific metadata extractor instance + file_converter: Format-specific file converter instance logger: Logging instance """ - + def __init__( self, config: Optional[Dict[str, Any]] = None, @@ -62,7 +132,7 @@ def __init__( ): """ Initialize BaseHandler. - + Args: config: Configuration dictionary (passed from DocumentProcessor) image_processor: ImageProcessor instance (passed from DocumentProcessor) @@ -74,68 +144,194 @@ def __init__( self._page_tag_processor = page_tag_processor or self._get_page_tag_processor_from_config() self._chart_processor = chart_processor or self._get_chart_processor_from_config() self._chart_extractor: Optional[BaseChartExtractor] = None + self._metadata_extractor: Optional[BaseMetadataExtractor] = None + self._file_converter: Optional[BaseFileConverter] = None + self._preprocessor: Optional[BasePreprocessor] = None + self._format_image_processor: Optional[ImageProcessor] = None self._logger = logging.getLogger(f"document-processor.{self.__class__.__name__}") - + def _get_page_tag_processor_from_config(self) -> PageTagProcessor: """Get PageTagProcessor from config or create default.""" if self._config and "page_tag_processor" in self._config: return self._config["page_tag_processor"] return PageTagProcessor() - + def _get_chart_processor_from_config(self) -> ChartProcessor: """Get ChartProcessor from config or create 
default.""" if self._config and "chart_processor" in self._config: return self._config["chart_processor"] return ChartProcessor() - + def _create_chart_extractor(self) -> BaseChartExtractor: """ Create format-specific chart extractor. - + Override this method in subclasses to provide the appropriate chart extractor for the file format. - + Returns: BaseChartExtractor subclass instance """ return NullChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self) -> BaseMetadataExtractor: + """ + Create format-specific metadata extractor. + + Override this method in subclasses to provide the appropriate + metadata extractor for the file format. + + Returns: + BaseMetadataExtractor subclass instance + """ + return NullMetadataExtractor() + + def _create_format_image_processor(self) -> ImageProcessor: + """ + Create format-specific image processor. + + Override this method in subclasses to provide the appropriate + image processor for the file format. + + Returns: + ImageProcessor subclass instance + """ + return self._image_processor + + def _create_file_converter(self) -> BaseFileConverter: + """ + Create format-specific file converter. + + Override this method in subclasses to provide the appropriate + file converter for the file format. + + The file converter transforms raw binary data into a workable + format-specific object (e.g., Document, Workbook, OLE file). + + Returns: + BaseFileConverter subclass instance + """ + return NullFileConverter() + + def _create_preprocessor(self) -> BasePreprocessor: + """ + Create format-specific preprocessor. + + Override this method in subclasses to provide the appropriate + preprocessor for the file format. + + The preprocessor processes/cleans the converted data before + further extraction. This is the SECOND step in the pipeline, + after file_converter.convert(). + + Pipeline: + 1. file_converter.convert() → Format-specific object + 2. preprocessor.preprocess() → Cleaned/processed data + 3. 
metadata_extractor.extract() → Metadata + 4. Content extraction + + Returns: + BasePreprocessor subclass instance (NullPreprocessor if no preprocessing needed) + """ + return NullPreprocessor() + @property def config(self) -> Dict[str, Any]: """Configuration dictionary.""" return self._config - + @property def image_processor(self) -> ImageProcessor: """ImageProcessor instance.""" return self._image_processor - + @property def page_tag_processor(self) -> PageTagProcessor: """PageTagProcessor instance.""" return self._page_tag_processor - + @property def chart_processor(self) -> ChartProcessor: """ChartProcessor instance.""" return self._chart_processor - + @property def chart_extractor(self) -> BaseChartExtractor: """ Format-specific chart extractor (lazy-initialized). - + Returns the chart extractor for this handler's file format. """ if self._chart_extractor is None: self._chart_extractor = self._create_chart_extractor() return self._chart_extractor - + + @property + def metadata_extractor(self) -> BaseMetadataExtractor: + """ + Format-specific metadata extractor (lazy-initialized). + + Returns the metadata extractor for this handler's file format. + """ + if self._metadata_extractor is None: + extractor = self._create_metadata_extractor() + # If subclass returns None, use NullMetadataExtractor + self._metadata_extractor = extractor if extractor is not None else NullMetadataExtractor() + return self._metadata_extractor + + @property + def format_image_processor(self) -> ImageProcessor: + """ + Format-specific image processor (lazy-initialized). + + Returns the image processor for this handler's file format. + Each handler should override _create_format_image_processor() to provide + format-specific image handling capabilities. 
+ """ + if self._format_image_processor is None: + processor = self._create_format_image_processor() + # If subclass returns None, use default image_processor + self._format_image_processor = processor if processor is not None else self._image_processor + return self._format_image_processor + + @property + def file_converter(self) -> BaseFileConverter: + """ + Format-specific file converter (lazy-initialized). + + Returns the file converter for this handler's file format. + Each handler should override _create_file_converter() to provide + format-specific binary-to-object conversion. + """ + if self._file_converter is None: + converter = self._create_file_converter() + # If subclass returns None, use NullFileConverter + self._file_converter = converter if converter is not None else NullFileConverter() + return self._file_converter + + @property + def preprocessor(self) -> BasePreprocessor: + """ + Format-specific preprocessor (lazy-initialized). + + Returns the preprocessor for this handler's file format. + Each handler should override _create_preprocessor() to provide + format-specific data preprocessing after conversion. + + This is called AFTER file_converter.convert() to process/clean + the converted data before content extraction. + """ + if self._preprocessor is None: + preprocessor = self._create_preprocessor() + # If subclass returns None, use NullPreprocessor + self._preprocessor = preprocessor if preprocessor is not None else NullPreprocessor() + return self._preprocessor + @property def logger(self) -> logging.Logger: """Logger instance.""" return self._logger - + @abstractmethod def extract_text( self, @@ -145,26 +341,115 @@ def extract_text( ) -> str: """ Extract text from file. 
- + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata **kwargs: Additional options - + Returns: Extracted text """ pass - + + def extract_metadata(self, source: Any) -> DocumentMetadata: + """ + Extract metadata from source using format-specific extractor. + + Convenience method that wraps self.metadata_extractor.extract(). + + Args: + source: Format-specific source object + + Returns: + DocumentMetadata instance + """ + return self.metadata_extractor.extract(source) + + def format_metadata(self, metadata: DocumentMetadata) -> str: + """ + Format metadata as string. + + Convenience method that wraps self.metadata_extractor.format(). + + Args: + metadata: DocumentMetadata instance + + Returns: + Formatted metadata string + """ + return self.metadata_extractor.format(metadata) + + def extract_and_format_metadata(self, source: Any) -> str: + """ + Extract and format metadata in one step. + + Convenience method that combines extract and format. + + Args: + source: Format-specific source object + + Returns: + Formatted metadata string + """ + return self.metadata_extractor.extract_and_format(source) + + def convert_file(self, current_file: "CurrentFile", **kwargs) -> Any: + """ + Convert binary file data to workable format. + + Convenience method that wraps self.file_converter.convert(). + + This is the first step in the processing pipeline: + Binary Data → FileConverter → Workable Object + + Args: + current_file: CurrentFile dict containing file info and binary data + **kwargs: Additional format-specific options + + Returns: + Format-specific workable object (Document, Workbook, OLE file, etc.) + """ + file_data = current_file.get("file_data", b"") + file_stream = self.get_file_stream(current_file) + return self.file_converter.convert(file_data, file_stream, **kwargs) + + def preprocess(self, converted_data: Any, **kwargs) -> PreprocessedData: + """ + Preprocess the converted data. 
+ + Convenience method that wraps self.preprocessor.preprocess(). + + This is the SECOND step in the processing pipeline: + 1. file_converter.convert() → Format-specific object + 2. preprocessor.preprocess() → Cleaned/processed data (THIS STEP) + 3. metadata_extractor.extract() → Metadata + 4. Content extraction + + Args: + converted_data: The data returned from file_converter.convert() + **kwargs: Additional format-specific options + + Returns: + PreprocessedData containing cleaned content and extracted resources + """ + # If converted_data is bytes, pass it directly + if isinstance(converted_data, bytes): + return self.preprocessor.preprocess(converted_data, **kwargs) + + # For other types, the preprocessor should handle it + # (e.g., Document object preprocessing) + return self.preprocessor.preprocess(converted_data, **kwargs) + def get_file_stream(self, current_file: "CurrentFile") -> io.BytesIO: """ Get a fresh BytesIO stream from current_file. - + Resets the stream position to the beginning for reuse. - + Args: current_file: CurrentFile dict - + Returns: BytesIO stream ready for reading """ @@ -174,17 +459,17 @@ def get_file_stream(self, current_file: "CurrentFile") -> io.BytesIO: return stream # Fallback: create new stream from file_data return io.BytesIO(current_file.get("file_data", b"")) - + def save_image(self, image_data: bytes, processed_images: Optional[set] = None) -> Optional[str]: """ Save image and return tag. - + Convenience method that wraps self.image_processor.save_image(). - + Args: image_data: Image binary data processed_images: Set of processed image hashes (for deduplication) - + Returns: Image tag string or None """ @@ -193,12 +478,12 @@ def save_image(self, image_data: bytes, processed_images: Optional[set] = None) def create_page_tag(self, page_number: int) -> str: """ Create a page number tag. - + Convenience method that wraps self.page_tag_processor.create_page_tag(). 
- + Args: page_number: Page number - + Returns: Page tag string (e.g., "[Page Number: 1]") """ @@ -207,12 +492,12 @@ def create_page_tag(self, page_number: int) -> str: def create_slide_tag(self, slide_number: int) -> str: """ Create a slide number tag. - + Convenience method that wraps self.page_tag_processor.create_slide_tag(). - + Args: slide_number: Slide number - + Returns: Slide tag string (e.g., "[Slide Number: 1]") """ @@ -221,12 +506,12 @@ def create_slide_tag(self, slide_number: int) -> str: def create_sheet_tag(self, sheet_name: str) -> str: """ Create a sheet name tag. - + Convenience method that wraps self.page_tag_processor.create_sheet_tag(). - + Args: sheet_name: Sheet name - + Returns: Sheet tag string (e.g., "[Sheet: Sheet1]") """ @@ -235,18 +520,24 @@ def create_sheet_tag(self, sheet_name: str) -> str: def process_chart(self, chart_element: Any) -> str: """ Process chart element using the format-specific chart extractor. - + This is the main method for chart processing. It uses the chart_extractor to extract data from the format-specific chart element and formats it using ChartProcessor. 
- + Args: chart_element: Format-specific chart object/element - + Returns: Formatted chart text with tags """ return self.chart_extractor.process(chart_element) -__all__ = ["BaseHandler"] +__all__ = [ + "BaseHandler", + "NullMetadataExtractor", + "BasePreprocessor", + "NullPreprocessor", + "PreprocessedData", +] diff --git a/contextifier/core/processor/csv_handler.py b/contextifier/core/processor/csv_handler.py index dbd8b2e..f7d52df 100644 --- a/contextifier/core/processor/csv_handler.py +++ b/contextifier/core/processor/csv_handler.py @@ -11,15 +11,15 @@ from contextifier.core.processor.base_handler import BaseHandler from contextifier.core.functions.chart_extractor import BaseChartExtractor, NullChartExtractor from contextifier.core.processor.csv_helper import ( - CSVMetadata, - extract_csv_metadata, - format_metadata, detect_bom, detect_delimiter, parse_csv_content, detect_header, convert_rows_to_table, ) +from contextifier.core.processor.csv_helper.csv_metadata import CSVMetadataExtractor, CSVSourceInfo +from contextifier.core.processor.csv_helper.csv_image_processor import CSVImageProcessor +from contextifier.core.functions.img_processor import ImageProcessor if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -32,11 +32,29 @@ class CSVHandler(BaseHandler): """CSV/TSV File Processing Handler Class""" - + + def _create_file_converter(self): + """Create CSV-specific file converter.""" + from contextifier.core.processor.csv_helper.csv_file_converter import CSVFileConverter + return CSVFileConverter() + + def _create_preprocessor(self): + """Create CSV-specific preprocessor.""" + from contextifier.core.processor.csv_helper.csv_preprocessor import CSVPreprocessor + return CSVPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """CSV files do not contain charts. 
Return NullChartExtractor.""" return NullChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Create CSV-specific metadata extractor.""" + return CSVMetadataExtractor() + + def _create_format_image_processor(self) -> ImageProcessor: + """Create CSV-specific image processor.""" + return CSVImageProcessor() + def extract_text( self, current_file: "CurrentFile", @@ -47,113 +65,70 @@ def extract_text( ) -> str: """ Extract text from CSV/TSV file. - + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata encoding: Encoding (None for auto-detect) delimiter: Delimiter (None for auto-detect) **kwargs: Additional options - + Returns: Extracted text """ file_path = current_file.get("file_path", "unknown") ext = current_file.get("file_extension", os.path.splitext(file_path)[1]).lower() self.logger.info(f"CSV processing: {file_path}, ext: {ext}") - + if ext == '.tsv' and delimiter is None: delimiter = '\t' - + try: result_parts = [] - - # Decode file_data with encoding detection + + # Step 1: Decode file_data using file_converter file_data = current_file.get("file_data", b"") - content, detected_encoding = self._decode_with_encoding(file_data, encoding) - + content, detected_encoding = self.file_converter.convert(file_data, encoding=encoding) + + # Step 2: Preprocess - clean_content is the TRUE SOURCE + preprocessed = self.preprocess(content) + content = preprocessed.clean_content # TRUE SOURCE + if delimiter is None: delimiter = detect_delimiter(content) - + self.logger.info(f"CSV: encoding={detected_encoding}, delimiter={repr(delimiter)}") - + rows = parse_csv_content(content, delimiter) - + if not rows: return "" - + has_header = detect_header(rows) - + if extract_metadata: - metadata = extract_csv_metadata(file_path, detected_encoding, delimiter, rows, has_header) - metadata_str = format_metadata(metadata) + source_info = CSVSourceInfo( + file_path=file_path, + 
encoding=detected_encoding, + delimiter=delimiter, + rows=rows, + has_header=has_header + ) + metadata_str = self.extract_and_format_metadata(source_info) if metadata_str: result_parts.append(metadata_str + "\n\n") - + table = convert_rows_to_table(rows, has_header) if table: result_parts.append(table) - + result = "".join(result_parts) self.logger.info(f"CSV processing completed: {len(rows)} rows") - + return result - + except Exception as e: self.logger.error(f"Error extracting text from CSV {file_path}: {e}") import traceback self.logger.debug(traceback.format_exc()) raise - - def _decode_with_encoding( - self, - file_data: bytes, - preferred_encoding: Optional[str] = None - ) -> Tuple[str, str]: - """ - Decode bytes with encoding detection. - - Args: - file_data: Raw bytes data - preferred_encoding: Preferred encoding (None for auto-detect) - - Returns: - Tuple of (decoded content, detected encoding) - """ - # BOM detection - bom_encoding = detect_bom(file_data) - if bom_encoding: - try: - return file_data.decode(bom_encoding), bom_encoding - except UnicodeDecodeError: - pass - - # Try preferred encoding - if preferred_encoding: - try: - return file_data.decode(preferred_encoding), preferred_encoding - except UnicodeDecodeError: - self.logger.debug(f"Preferred encoding {preferred_encoding} failed") - - # Try chardet if available - try: - import chardet - detected = chardet.detect(file_data) - if detected and detected.get('encoding'): - enc = detected['encoding'] - try: - return file_data.decode(enc), enc - except UnicodeDecodeError: - pass - except ImportError: - pass - - # Try encoding candidates - for enc in ENCODING_CANDIDATES: - try: - return file_data.decode(enc), enc - except UnicodeDecodeError: - continue - - # Fallback to latin-1 (always succeeds) - return file_data.decode('latin-1'), 'latin-1' diff --git a/contextifier/core/processor/csv_helper/__init__.py b/contextifier/core/processor/csv_helper/__init__.py index a9da879..3c4e2e4 100644 --- 
a/contextifier/core/processor/csv_helper/__init__.py +++ b/contextifier/core/processor/csv_helper/__init__.py @@ -24,10 +24,13 @@ # Metadata from contextifier.core.processor.csv_helper.csv_metadata import ( - format_file_size, - get_delimiter_name, - extract_csv_metadata, - format_metadata, + CSVMetadataExtractor, + CSVSourceInfo, +) + +# Image Processor +from contextifier.core.processor.csv_helper.csv_image_processor import ( + CSVImageProcessor, ) # Encoding @@ -63,10 +66,11 @@ "MAX_COLS", "CSVMetadata", # Metadata - "format_file_size", - "get_delimiter_name", - "extract_csv_metadata", - "format_metadata", + "CSVMetadataExtractor", + "CSVSourceInfo", + # Image Processor + "CSVImageProcessor", + # Encoding # Encoding "detect_bom", "read_file_with_encoding", diff --git a/contextifier/core/processor/csv_helper/csv_constants.py b/contextifier/core/processor/csv_helper/csv_constants.py index 649d0a1..aa69a7e 100644 --- a/contextifier/core/processor/csv_helper/csv_constants.py +++ b/contextifier/core/processor/csv_helper/csv_constants.py @@ -27,10 +27,10 @@ # === 구분자 관련 상수 === -# CSV 구분자 후보 +# CSV delimiter candidates DELIMITER_CANDIDATES = [',', '\t', ';', '|'] -# 구분자 이름 매핑 +# Delimiter name mapping (Korean for output display) DELIMITER_NAMES = { ',': '쉼표 (,)', '\t': '탭 (\\t)', @@ -39,12 +39,12 @@ } -# === 처리 제한 상수 === +# === Processing limit constants === -# 최대 처리 행 수 (메모리 보호) +# Maximum rows to process (memory protection) MAX_ROWS = 100000 -# 최대 열 수 +# Maximum columns MAX_COLS = 1000 diff --git a/contextifier/core/processor/csv_helper/csv_file_converter.py b/contextifier/core/processor/csv_helper/csv_file_converter.py new file mode 100644 index 0000000..8c9f56f --- /dev/null +++ b/contextifier/core/processor/csv_helper/csv_file_converter.py @@ -0,0 +1,77 @@ +# libs/core/processor/csv_helper/csv_file_converter.py +""" +CSVFileConverter - CSV file format converter + +Converts binary CSV data to text string with encoding detection. 
+""" +from typing import Any, Optional, BinaryIO, Tuple + +from contextifier.core.functions.file_converter import TextFileConverter + + +class CSVFileConverter(TextFileConverter): + """ + CSV file converter. + + Converts binary CSV data to decoded text string. + Extends TextFileConverter with BOM detection. + """ + + # BOM markers + BOM_UTF8 = b'\xef\xbb\xbf' + BOM_UTF16_LE = b'\xff\xfe' + BOM_UTF16_BE = b'\xfe\xff' + + def __init__(self): + """Initialize CSVFileConverter.""" + super().__init__(encodings=['utf-8', 'utf-8-sig', 'cp949', 'euc-kr', 'iso-8859-1', 'latin-1']) + self._delimiter: Optional[str] = None + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + encoding: Optional[str] = None, + delimiter: Optional[str] = None, + **kwargs + ) -> Tuple[str, str]: + """ + Convert binary CSV data to text string. + + Args: + file_data: Raw binary CSV data + file_stream: Ignored + encoding: Specific encoding to use + delimiter: CSV delimiter (for reference) + **kwargs: Additional options + + Returns: + Tuple of (decoded text, detected encoding) + """ + self._delimiter = delimiter + + # Check for BOM + bom_encoding = self._detect_bom(file_data) + if bom_encoding: + text = file_data.decode(bom_encoding) + self._detected_encoding = bom_encoding + return text, bom_encoding + + # Use parent's convert logic + text = super().convert(file_data, file_stream, encoding, **kwargs) + return text, self._detected_encoding or 'utf-8' + + def _detect_bom(self, file_data: bytes) -> Optional[str]: + """Detect encoding from BOM.""" + if file_data.startswith(self.BOM_UTF8): + return 'utf-8-sig' + elif file_data.startswith(self.BOM_UTF16_LE): + return 'utf-16-le' + elif file_data.startswith(self.BOM_UTF16_BE): + return 'utf-16-be' + return None + + def get_format_name(self) -> str: + """Return format name.""" + enc = self._detected_encoding or 'unknown' + return f"CSV ({enc})" diff --git a/contextifier/core/processor/csv_helper/csv_image_processor.py 
b/contextifier/core/processor/csv_helper/csv_image_processor.py new file mode 100644 index 0000000..87b7032 --- /dev/null +++ b/contextifier/core/processor/csv_helper/csv_image_processor.py @@ -0,0 +1,75 @@ +# contextifier/core/processor/csv_helper/csv_image_processor.py +""" +CSV Image Processor + +Provides CSV-specific image processing that inherits from ImageProcessor. +CSV files do not contain embedded images, so this is a minimal implementation. +""" +import logging +from typing import Any, Optional + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +logger = logging.getLogger("contextify.image_processor.csv") + + +class CSVImageProcessor(ImageProcessor): + """ + CSV-specific image processor. + + Inherits from ImageProcessor and provides CSV-specific processing. + CSV files do not contain embedded images, so this processor + provides a consistent interface without additional functionality. + + This class exists to maintain interface consistency across all handlers. + + Example: + processor = CSVImageProcessor() + + # No images in CSV, but interface is consistent + tag = processor.process_image(image_data) # Falls back to base implementation + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize CSVImageProcessor. + + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + + def process_image( + self, + image_data: bytes, + **kwargs + ) -> Optional[str]: + """ + Process and save image data. 
+ + CSV files do not contain embedded images, so this method + delegates to the base implementation. + + Args: + image_data: Raw image binary data + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + return super().process_image(image_data, **kwargs) diff --git a/contextifier/core/processor/csv_helper/csv_metadata.py b/contextifier/core/processor/csv_helper/csv_metadata.py index 8229f48..ef493dc 100644 --- a/contextifier/core/processor/csv_helper/csv_metadata.py +++ b/contextifier/core/processor/csv_helper/csv_metadata.py @@ -1,14 +1,26 @@ -# csv_helper/csv_metadata.py +# contextifier/core/processor/csv_helper/csv_metadata.py """ -CSV 메타데이터 추출 및 포맷팅 +CSV Metadata Extraction Module -CSV 파일의 메타데이터를 추출하고 읽기 쉬운 형식으로 변환합니다. +Provides CSVMetadataExtractor class for extracting metadata from CSV files. +Implements BaseMetadataExtractor interface. + +CSV differs from regular documents - it provides file structure information as metadata: +- File name, file size, modification time +- Encoding, delimiter +- Row/column count, header information """ import logging import os +from dataclasses import dataclass from datetime import datetime -from typing import Any, Dict, List +from typing import Any, Dict, List, Optional +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, + MetadataFormatter, +) from contextifier.core.processor.csv_helper.csv_constants import DELIMITER_NAMES logger = logging.getLogger("document-processor") @@ -16,13 +28,13 @@ def format_file_size(size_bytes: int) -> str: """ - 파일 크기를 읽기 쉬운 형식으로 변환합니다. + Convert file size to human-readable format. 
Args: - size_bytes: 파일 크기 (바이트) + size_bytes: File size in bytes Returns: - 포맷된 파일 크기 문자열 (예: "1.5 MB") + Formatted file size string (e.g., "1.5 MB") """ if size_bytes < 1024: return f"{size_bytes} B" @@ -36,87 +48,57 @@ def format_file_size(size_bytes: int) -> str: def get_delimiter_name(delimiter: str) -> str: """ - 구분자를 읽기 쉬운 이름으로 변환합니다. + Convert delimiter to human-readable name. Args: - delimiter: 구분자 문자 + delimiter: Delimiter character Returns: - 구분자의 읽기 쉬운 이름 (예: "쉼표 (,)") + Human-readable delimiter name (e.g., "Comma (,)") """ return DELIMITER_NAMES.get(delimiter, repr(delimiter)) -def extract_csv_metadata( - file_path: str, - encoding: str, - delimiter: str, - rows: List[List[str]], - has_header: bool -) -> Dict[str, Any]: +@dataclass +class CSVSourceInfo: """ - CSV 파일에서 메타데이터를 추출합니다. - - Args: - file_path: 파일 경로 - encoding: 감지된 인코딩 - delimiter: 감지된 구분자 - rows: 파싱된 행 데이터 - has_header: 헤더 존재 여부 - - Returns: - 메타데이터 딕셔너리 + Source information for CSV metadata extraction. + + Container for data passed to CSVMetadataExtractor.extract(). 
""" - metadata = {} - - try: - # 파일 정보 - file_stat = os.stat(file_path) - file_name = os.path.basename(file_path) - - metadata['file_name'] = file_name - metadata['file_size'] = format_file_size(file_stat.st_size) - metadata['modified_time'] = datetime.fromtimestamp(file_stat.st_mtime) - - # CSV 구조 정보 - metadata['encoding'] = encoding - metadata['delimiter'] = get_delimiter_name(delimiter) - metadata['row_count'] = len(rows) - metadata['col_count'] = len(rows[0]) if rows else 0 - metadata['has_header'] = '예' if has_header else '아니오' - - # 헤더 정보 (있는 경우) - if has_header and rows: - headers = [h.strip() for h in rows[0] if h.strip()] - if headers: - metadata['columns'] = ', '.join(headers[:10]) # 최대 10개 - if len(rows[0]) > 10: - metadata['columns'] += f' 외 {len(rows[0]) - 10}개' - - logger.debug(f"Extracted CSV metadata: {list(metadata.keys())}") - - except Exception as e: - logger.warning(f"Failed to extract CSV metadata: {e}") - - return metadata + file_path: str + encoding: str + delimiter: str + rows: List[List[str]] + has_header: bool -def format_metadata(metadata: Dict[str, Any]) -> str: +class CSVMetadataExtractor(BaseMetadataExtractor): """ - 메타데이터 딕셔너리를 읽기 쉬운 문자열로 변환합니다. - - Args: - metadata: 메타데이터 딕셔너리 - - Returns: - 포맷된 메타데이터 문자열 ( 태그 형식) + CSV Metadata Extractor. + + CSV 파일의 구조 정보를 메타데이터로 추출합니다. 
+ + 지원 필드 (custom 필드에 저장): + - file_name, file_size, modified_time + - encoding, delimiter + - row_count, col_count, has_header, columns + + 사용법: + extractor = CSVMetadataExtractor() + source = CSVSourceInfo( + file_path="data.csv", + encoding="utf-8", + delimiter=",", + rows=parsed_rows, + has_header=True + ) + metadata = extractor.extract(source) + text = extractor.format(metadata) """ - if not metadata: - return "" - - lines = [""] - - field_names = { + + # CSV 특화 필드 라벨 + CSV_FIELD_LABELS = { 'file_name': '파일명', 'file_size': '파일 크기', 'modified_time': '수정일', @@ -127,17 +109,60 @@ def format_metadata(metadata: Dict[str, Any]) -> str: 'has_header': '헤더 존재', 'columns': '컬럼 목록', } - - for key, label in field_names.items(): - if key in metadata and metadata[key] is not None: - value = metadata[key] - - # datetime 객체 포맷팅 - if isinstance(value, datetime): - value = value.strftime('%Y-%m-%d %H:%M:%S') - - lines.append(f" {label}: {value}") - - lines.append("") - - return "\n".join(lines) + + def __init__(self, **kwargs): + super().__init__(**kwargs) + # CSV용 커스텀 포맷터 설정 + self._formatter.field_labels.update(self.CSV_FIELD_LABELS) + + def extract(self, source: CSVSourceInfo) -> DocumentMetadata: + """ + CSV 파일에서 메타데이터를 추출합니다. 
+ + Args: + source: CSVSourceInfo 객체 (파일 경로, 인코딩, 구분자, 행 데이터, 헤더 여부) + + Returns: + 추출된 메타데이터가 담긴 DocumentMetadata 인스턴스 + """ + custom_fields: Dict[str, Any] = {} + + try: + # 파일 정보 + file_stat = os.stat(source.file_path) + file_name = os.path.basename(source.file_path) + + custom_fields['file_name'] = file_name + custom_fields['file_size'] = format_file_size(file_stat.st_size) + custom_fields['modified_time'] = datetime.fromtimestamp(file_stat.st_mtime) + + # CSV 구조 정보 + custom_fields['encoding'] = source.encoding + custom_fields['delimiter'] = get_delimiter_name(source.delimiter) + custom_fields['row_count'] = len(source.rows) + custom_fields['col_count'] = len(source.rows[0]) if source.rows else 0 + custom_fields['has_header'] = '예' if source.has_header else '아니오' + + # 헤더 정보 (있는 경우) + if source.has_header and source.rows: + headers = [h.strip() for h in source.rows[0] if h.strip()] + if headers: + custom_fields['columns'] = ', '.join(headers[:10]) # 최대 10개 + if len(source.rows[0]) > 10: + custom_fields['columns'] += f' 외 {len(source.rows[0]) - 10}개' + + self.logger.debug(f"Extracted CSV metadata: {list(custom_fields.keys())}") + + except Exception as e: + self.logger.warning(f"Failed to extract CSV metadata: {e}") + + # CSV는 표준 필드가 없고 모두 custom 필드 + return DocumentMetadata(custom=custom_fields) + + +__all__ = [ + 'CSVMetadataExtractor', + 'CSVSourceInfo', + 'format_file_size', + 'get_delimiter_name', +] diff --git a/contextifier/core/processor/csv_helper/csv_preprocessor.py b/contextifier/core/processor/csv_helper/csv_preprocessor.py new file mode 100644 index 0000000..0914754 --- /dev/null +++ b/contextifier/core/processor/csv_helper/csv_preprocessor.py @@ -0,0 +1,86 @@ +# contextifier/core/processor/csv_helper/csv_preprocessor.py +""" +CSV Preprocessor - Process CSV content after conversion. + +Processing Pipeline Position: + 1. CSVFileConverter.convert() → (content: str, encoding: str) + 2. CSVPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. 
CSVMetadataExtractor.extract() → DocumentMetadata + 4. Content extraction (rows, columns) + +Current Implementation: + - Pass-through (CSV uses decoded string content directly) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.csv.preprocessor") + + +class CSVPreprocessor(BasePreprocessor): + """ + CSV Content Preprocessor. + + Currently a pass-through implementation as CSV processing + is handled during the content extraction phase. + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted CSV content. + + Args: + converted_data: Tuple of (content: str, encoding: str) from CSVFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the content and encoding + """ + metadata: Dict[str, Any] = {} + + content = "" + encoding = "utf-8" + + # Handle tuple return from CSVFileConverter + if isinstance(converted_data, tuple) and len(converted_data) >= 2: + content, encoding = converted_data[0], converted_data[1] + metadata['detected_encoding'] = encoding + if content: + lines = content.split('\n') + metadata['line_count'] = len(lines) + elif isinstance(converted_data, str): + content = converted_data + metadata['line_count'] = len(content.split('\n')) + + logger.debug("CSV preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the processed string content + return PreprocessedData( + raw_content=content, + clean_content=content, # TRUE SOURCE - string content for CSV + encoding=encoding, + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "CSV Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is CSV content.""" + if isinstance(data, tuple) and len(data) >= 2: + return isinstance(data[0], str) 
+ return isinstance(data, str) + + +__all__ = ['CSVPreprocessor'] diff --git a/contextifier/core/processor/doc_handler.py b/contextifier/core/processor/doc_handler.py index f768e74..721f748 100644 --- a/contextifier/core/processor/doc_handler.py +++ b/contextifier/core/processor/doc_handler.py @@ -4,28 +4,25 @@ Class-based handler for DOC files inheriting from BaseHandler. Automatically detects file format (RTF, OLE, HTML, DOCX) and processes accordingly. +RTF processing is delegated to RTFHandler. """ import io import logging import os import re -import shutil -import tempfile import struct import base64 -from datetime import datetime from typing import Any, Dict, List, Optional, Set, TYPE_CHECKING from enum import Enum import zipfile import olefile from bs4 import BeautifulSoup -from striprtf.striprtf import rtf_to_text -from contextifier.core.processor.doc_helpers.rtf_parser import parse_rtf, RTFDocument from contextifier.core.processor.base_handler import BaseHandler from contextifier.core.functions.img_processor import ImageProcessor from contextifier.core.functions.chart_extractor import BaseChartExtractor, NullChartExtractor +from contextifier.core.processor.doc_helpers.doc_image_processor import DOCImageProcessor if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -48,25 +45,32 @@ class DocFormat(Enum): 'ZIP': b'PK\x03\x04', } -METADATA_FIELD_NAMES = { - 'title': '제목', - 'subject': '주제', - 'author': '작성자', - 'keywords': '키워드', - 'comments': '설명', - 'last_saved_by': '마지막 저장자', - 'create_time': '작성일', - 'last_saved_time': '수정일', -} - class DOCHandler(BaseHandler): """DOC file processing handler class.""" - + + def _create_file_converter(self): + """Create DOC-specific file converter.""" + from contextifier.core.processor.doc_helpers.doc_file_converter import DOCFileConverter + return DOCFileConverter() + + def _create_preprocessor(self): + """Create DOC-specific preprocessor.""" + from 
contextifier.core.processor.doc_helpers.doc_preprocessor import DOCPreprocessor + return DOCPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """DOC files chart extraction not yet implemented. Return NullChartExtractor.""" return NullChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """DOC metadata extraction not yet implemented. Return None to use NullMetadataExtractor.""" + return None + + def _create_format_image_processor(self) -> ImageProcessor: + """Create DOC-specific image processor.""" + return DOCImageProcessor() + def extract_text( self, current_file: "CurrentFile", @@ -76,222 +80,142 @@ def extract_text( """Extract text from DOC file.""" file_path = current_file.get("file_path", "unknown") file_data = current_file.get("file_data", b"") - + self.logger.info(f"DOC processing: {file_path}") - + if not file_data: self.logger.error(f"Empty file data: {file_path}") return f"[DOC file is empty: {file_path}]" - - doc_format = self._detect_format_from_bytes(file_data) - + try: + # Step 1: Use file_converter to detect format and convert + converted_obj, doc_format = self.file_converter.convert(file_data) + + # Step 2: Preprocess - may transform converted_obj in the future + preprocessed = self.preprocess(converted_obj) + converted_obj = preprocessed.clean_content # TRUE SOURCE + if doc_format == DocFormat.RTF: - return self._extract_from_rtf(current_file, extract_metadata) + # Delegate to RTFHandler for RTF processing + return self._delegate_to_rtf_handler(converted_obj, current_file, extract_metadata) elif doc_format == DocFormat.OLE: - return self._extract_from_ole(current_file, extract_metadata) + return self._extract_from_ole_obj(converted_obj, current_file, extract_metadata) elif doc_format == DocFormat.HTML: - return self._extract_from_html(current_file, extract_metadata) + return self._extract_from_html_obj(converted_obj, current_file, extract_metadata) elif doc_format == DocFormat.DOCX: - return 
self._extract_from_docx_misnamed(current_file, extract_metadata) + return self._extract_from_docx_obj(converted_obj, current_file, extract_metadata) else: - self.logger.warning(f"Unknown DOC format, trying OLE: {file_path}") + self.logger.warning(f"Unknown DOC format, trying OLE fallback: {file_path}") return self._extract_from_ole(current_file, extract_metadata) except Exception as e: self.logger.error(f"Error in DOC processing: {e}") return f"[DOC file processing failed: {str(e)}]" - - def _detect_format_from_bytes(self, file_data: bytes) -> DocFormat: - """Detect file format from binary data.""" - try: - header = file_data[:32] if len(file_data) >= 32 else file_data - - if not header: - return DocFormat.UNKNOWN - - if header.startswith(MAGIC_NUMBERS['RTF']): - return DocFormat.RTF - - if header.startswith(MAGIC_NUMBERS['OLE']): - return DocFormat.OLE - - if header.startswith(MAGIC_NUMBERS['ZIP']): - try: - file_stream = io.BytesIO(file_data) - with zipfile.ZipFile(file_stream, 'r') as zf: - if '[Content_Types].xml' in zf.namelist(): - return DocFormat.DOCX - except zipfile.BadZipFile: - pass - - header_lower = header.lower() - if header_lower.startswith(b' str: - """RTF file processing.""" + + def _delegate_to_rtf_handler(self, rtf_doc, current_file: "CurrentFile", extract_metadata: bool) -> str: + """ + Delegate RTF processing to RTFHandler. + + DOC 파일이 실제로는 RTF 형식인 경우, RTFHandler에 위임합니다. + RTFHandler.extract_text()는 raw bytes를 받으므로 current_file을 그대로 전달합니다. 
+ + Args: + rtf_doc: Pre-converted RTFDocument object (unused, for consistency) + current_file: CurrentFile dict containing original file_data + extract_metadata: Whether to extract metadata + + Returns: + Extracted text + """ + from contextifier.core.processor.rtf_handler import RTFHandler + + rtf_handler = RTFHandler( + config=self.config, + image_processor=self._image_processor, + page_tag_processor=self._page_tag_processor, + chart_processor=self._chart_processor + ) + + # RTFHandler.extract_text()는 current_file에서 file_data를 직접 읽어 처리 + return rtf_handler.extract_text(current_file, extract_metadata=extract_metadata) + + def _extract_from_ole_obj(self, ole, current_file: "CurrentFile", extract_metadata: bool) -> str: + """OLE Compound Document processing using pre-converted OLE object.""" file_path = current_file.get("file_path", "unknown") - file_data = current_file.get("file_data", b"") - - self.logger.info(f"Processing RTF: {file_path}") - + + self.logger.info(f"Processing OLE: {file_path}") + + result_parts = [] + processed_images: Set[str] = set() + try: - content = file_data - - processed_images: Set[str] = set() - doc = parse_rtf(content, processed_images=processed_images, image_processor=self.image_processor) - - result_parts = [] - + # Metadata extraction if extract_metadata: - metadata_str = self._format_metadata(doc.metadata) + metadata = self._extract_ole_metadata(ole) + metadata_str = self.extract_and_format_metadata(metadata) if metadata_str: result_parts.append(metadata_str + "\n\n") - + page_tag = self.create_page_tag(1) result_parts.append(f"{page_tag}\n") - - inline_content = doc.get_inline_content() - if inline_content: - result_parts.append(inline_content) - else: - if doc.text_content: - result_parts.append(doc.text_content) - - for table in doc.tables: - if not table.rows: - continue - if table.is_real_table(): - result_parts.append("\n" + table.to_html() + "\n") - else: - result_parts.append("\n" + table.to_text_list() + "\n") - - result = 
"\n".join(result_parts) - result = re.sub(r'\[image:[^\]]*uploads/\.[^\]]*\]', '', result) - - return result - + + # Extract text from WordDocument stream + text = self._extract_ole_text(ole) + if text: + result_parts.append(text) + + # Extract images + images = self._extract_ole_images(ole, processed_images) + for img_tag in images: + result_parts.append(img_tag) + except Exception as e: - self.logger.error(f"RTF processing error: {e}") - return self._extract_rtf_fallback(current_file, extract_metadata) - - def _extract_rtf_fallback(self, current_file: "CurrentFile", extract_metadata: bool) -> str: - """RTF fallback (striprtf).""" - file_data = current_file.get("file_data", b"") - - content = None - for encoding in ['utf-8', 'cp949', 'euc-kr', 'cp1252', 'latin-1']: - try: - content = file_data.decode(encoding) - break - except (UnicodeDecodeError, UnicodeError): - continue - - if content is None: - content = file_data.decode('cp1252', errors='replace') - - result_parts = [] - - if extract_metadata: - metadata = self._extract_rtf_metadata(content) - metadata_str = self._format_metadata(metadata) - if metadata_str: - result_parts.append(metadata_str + "\n\n") - - page_tag = self.create_page_tag(1) - result_parts.append(f"{page_tag}\n") - - try: - text = rtf_to_text(content) - except: - text = re.sub(r'\\[a-z]+\d*\s?', '', content) - text = re.sub(r"\\'[0-9a-fA-F]{2}", '', text) - text = re.sub(r'[{}]', '', text) - - if text: - text = re.sub(r'\n{3,}', '\n\n', text) - result_parts.append(text.strip()) - + self.logger.error(f"OLE processing error: {e}") + return f"[DOC file processing failed: {str(e)}]" + finally: + # Close the OLE object + self.file_converter.close(ole) + return "\n".join(result_parts) - - def _extract_rtf_metadata(self, content: str) -> Dict[str, Any]: - """RTF metadata extraction.""" - metadata = {} - patterns = { - 'title': r'\\title\s*\{([^}]*)\}', - 'subject': r'\\subject\s*\{([^}]*)\}', - 'author': r'\\author\s*\{([^}]*)\}', - 'keywords': 
r'\\keywords\s*\{([^}]*)\}', - 'comments': r'\\doccomm\s*\{([^}]*)\}', - 'last_saved_by': r'\\operator\s*\{([^}]*)\}', - } - - for key, pattern in patterns.items(): - match = re.search(pattern, content, re.IGNORECASE) - if match: - value = match.group(1).strip() - if value: - metadata[key] = value - - return metadata - + def _extract_from_ole(self, current_file: "CurrentFile", extract_metadata: bool) -> str: """OLE Compound Document processing - extract text directly from WordDocument stream.""" file_path = current_file.get("file_path", "unknown") file_data = current_file.get("file_data", b"") - + self.logger.info(f"Processing OLE: {file_path}") - + result_parts = [] processed_images: Set[str] = set() - + try: file_stream = io.BytesIO(file_data) with olefile.OleFileIO(file_stream) as ole: # Metadata extraction if extract_metadata: metadata = self._extract_ole_metadata(ole) - metadata_str = self._format_metadata(metadata) + metadata_str = self.extract_and_format_metadata(metadata) if metadata_str: result_parts.append(metadata_str + "\n\n") - + page_tag = self.create_page_tag(1) result_parts.append(f"{page_tag}\n") - + # Extract text from WordDocument stream text = self._extract_ole_text(ole) if text: result_parts.append(text) - + # Extract images images = self._extract_ole_images(ole, processed_images) for img_tag in images: result_parts.append(img_tag) - + except Exception as e: self.logger.error(f"OLE processing error: {e}") return f"[DOC file processing failed: {str(e)}]" - + return "\n".join(result_parts) - + def _extract_ole_metadata(self, ole: olefile.OleFileIO) -> Dict[str, Any]: """OLE 메타데이터 추출""" metadata = {} @@ -317,7 +241,7 @@ def _extract_ole_metadata(self, ole: olefile.OleFileIO) -> Dict[str, Any]: except Exception as e: self.logger.warning(f"Error extracting OLE metadata: {e}") return metadata - + def _decode_ole_string(self, value) -> str: """OLE 문자열 디코딩""" if value is None: @@ -332,7 +256,7 @@ def _decode_ole_string(self, value) -> str: continue 
return value.decode('utf-8', errors='replace').strip() return str(value).strip() - + def _extract_ole_images(self, ole: olefile.OleFileIO, processed_images: Set[str]) -> List[str]: """OLE에서 이미지 추출""" images = [] @@ -342,10 +266,10 @@ def _extract_ole_images(self, ole: olefile.OleFileIO, processed_images: Set[str] try: stream = ole.openstream(entry) data = stream.read() - + if data[:8] == b'\x89PNG\r\n\x1a\n' or data[:2] == b'\xff\xd8' or \ data[:6] in (b'GIF87a', b'GIF89a') or data[:2] == b'BM': - image_tag = self.image_processor.save_image(data) + image_tag = self.format_image_processor.save_image(data) if image_tag: images.append(f"\n{image_tag}\n") except: @@ -353,14 +277,108 @@ def _extract_ole_images(self, ole: olefile.OleFileIO, processed_images: Set[str] except Exception as e: self.logger.warning(f"Error extracting OLE images: {e}") return images - + + def _extract_from_html_obj(self, soup, current_file: "CurrentFile", extract_metadata: bool) -> str: + """HTML DOC processing using pre-converted BeautifulSoup object.""" + file_path = current_file.get("file_path", "unknown") + + self.logger.info(f"Processing HTML DOC: {file_path}") + + result_parts = [] + + if extract_metadata: + metadata = self._extract_html_metadata(soup) + metadata_str = self.extract_and_format_metadata(metadata) + if metadata_str: + result_parts.append(metadata_str + "\n\n") + + page_tag = self.create_page_tag(1) + result_parts.append(f"{page_tag}\n") + + # Copy soup to avoid modifying the original + soup_copy = BeautifulSoup(str(soup), 'html.parser') + + for tag in soup_copy(['script', 'style', 'meta', 'link', 'head']): + tag.decompose() + + text = soup_copy.get_text(separator='\n', strip=True) + text = re.sub(r'\n{3,}', '\n\n', text) + + if text: + result_parts.append(text) + + for table in soup_copy.find_all('table'): + table_html = str(table) + table_html = re.sub(r'\s+style="[^"]*"', '', table_html) + table_html = re.sub(r'\s+class="[^"]*"', '', table_html) + result_parts.append("\n" 
+ table_html + "\n") + + for img in soup_copy.find_all('img'): + src = img.get('src', '') + if src and src.startswith('data:image'): + try: + match = re.match(r'data:image/(\w+);base64,(.+)', src) + if match: + image_data = base64.b64decode(match.group(2)) + image_tag = self.format_image_processor.save_image(image_data) + if image_tag: + result_parts.append(f"\n{image_tag}\n") + except: + pass + + return "\n".join(result_parts) + + def _extract_from_docx_obj(self, doc, current_file: "CurrentFile", extract_metadata: bool) -> str: + """Extract from misnamed DOCX using pre-converted Document object.""" + file_path = current_file.get("file_path", "unknown") + + self.logger.info(f"Processing misnamed DOCX: {file_path}") + + try: + result_parts = [] + + if extract_metadata: + # Basic metadata from docx Document + if hasattr(doc, 'core_properties'): + metadata = { + 'title': doc.core_properties.title or '', + 'author': doc.core_properties.author or '', + 'subject': doc.core_properties.subject or '', + 'keywords': doc.core_properties.keywords or '', + } + metadata = {k: v for k, v in metadata.items() if v} + metadata_str = self.extract_and_format_metadata(metadata) + if metadata_str: + result_parts.append(metadata_str + "\n\n") + + page_tag = self.create_page_tag(1) + result_parts.append(f"{page_tag}\n") + + for para in doc.paragraphs: + if para.text.strip(): + result_parts.append(para.text) + + for table in doc.tables: + for row in table.rows: + row_texts = [] + for cell in row.cells: + row_texts.append(cell.text.strip()) + if any(t for t in row_texts): + result_parts.append(" | ".join(row_texts)) + + return "\n".join(result_parts) + + except Exception as e: + self.logger.error(f"Error processing misnamed DOCX: {e}") + return f"[DOCX processing failed: {str(e)}]" + def _extract_from_html(self, current_file: "CurrentFile", extract_metadata: bool) -> str: """HTML DOC processing.""" file_path = current_file.get("file_path", "unknown") file_data = 
current_file.get("file_data", b"") - + self.logger.info(f"Processing HTML DOC: {file_path}") - + content = None for encoding in ['utf-8', 'utf-8-sig', 'cp949', 'euc-kr', 'cp1252', 'latin-1']: try: @@ -368,37 +386,37 @@ def _extract_from_html(self, current_file: "CurrentFile", extract_metadata: bool break except (UnicodeDecodeError, UnicodeError): continue - + if content is None: content = file_data.decode('utf-8', errors='replace') - + result_parts = [] soup = BeautifulSoup(content, 'html.parser') - + if extract_metadata: metadata = self._extract_html_metadata(soup) - metadata_str = self._format_metadata(metadata) + metadata_str = self.extract_and_format_metadata(metadata) if metadata_str: result_parts.append(metadata_str + "\n\n") - + page_tag = self.create_page_tag(1) result_parts.append(f"{page_tag}\n") - + for tag in soup(['script', 'style', 'meta', 'link', 'head']): tag.decompose() - + text = soup.get_text(separator='\n', strip=True) text = re.sub(r'\n{3,}', '\n\n', text) - + if text: result_parts.append(text) - + for table in soup.find_all('table'): table_html = str(table) table_html = re.sub(r'\s+style="[^"]*"', '', table_html) table_html = re.sub(r'\s+class="[^"]*"', '', table_html) result_parts.append("\n" + table_html + "\n") - + for img in soup.find_all('img'): src = img.get('src', '') if src and src.startswith('data:image'): @@ -406,50 +424,50 @@ def _extract_from_html(self, current_file: "CurrentFile", extract_metadata: bool match = re.match(r'data:image/(\w+);base64,(.+)', src) if match: image_data = base64.b64decode(match.group(2)) - image_tag = self.image_processor.save_image(image_data) + image_tag = self.format_image_processor.save_image(image_data) if image_tag: result_parts.append(f"\n{image_tag}\n") except: pass - + return "\n".join(result_parts) - + def _extract_html_metadata(self, soup: BeautifulSoup) -> Dict[str, Any]: """HTML metadata extraction.""" metadata = {} title_tag = soup.find('title') if title_tag and title_tag.string: 
metadata['title'] = title_tag.string.strip() - + meta_mappings = { 'author': 'author', 'description': 'comments', 'keywords': 'keywords', 'subject': 'subject', 'creator': 'author', 'producer': 'last_saved_by', } - + for meta in soup.find_all('meta'): name = meta.get('name', '').lower() content = meta.get('content', '') if name in meta_mappings and content: metadata[meta_mappings[name]] = content.strip() - + return metadata - + def _extract_from_docx_misnamed(self, current_file: "CurrentFile", extract_metadata: bool) -> str: """Process misnamed DOCX file.""" file_path = current_file.get("file_path", "unknown") - + self.logger.info(f"Processing misnamed DOCX: {file_path}") - + try: from contextifier.core.processor.docx_handler import DOCXHandler - + # Pass current_file directly - DOCXHandler now accepts CurrentFile - docx_handler = DOCXHandler(config=self.config, image_processor=self.image_processor) + docx_handler = DOCXHandler(config=self.config, image_processor=self.format_image_processor) return docx_handler.extract_text(current_file, extract_metadata=extract_metadata) except Exception as e: self.logger.error(f"Error processing misnamed DOCX: {e}") return f"[DOC file processing failed: {str(e)}]" - + def _extract_ole_text(self, ole: olefile.OleFileIO) -> str: """Extract text from OLE WordDocument stream.""" try: @@ -457,47 +475,47 @@ def _extract_ole_text(self, ole: olefile.OleFileIO) -> str: if not ole.exists('WordDocument'): self.logger.warning("WordDocument stream not found") return "" - + # Read Word Document stream word_stream = ole.openstream('WordDocument') word_data = word_stream.read() - + if len(word_data) < 12: return "" - + # FIB (File Information Block) parsing # Check magic number (0xA5EC or 0xA5DC) magic = struct.unpack(' str: """Word 스트림에서 텍스트 추출 (휴리스틱 방식)""" text_parts = [] - + # 방법 1: UTF-16LE 유니코드 텍스트 추출 try: # 연속된 유니코드 문자열 찾기 @@ -511,7 +529,7 @@ def _extract_text_from_word_stream(self, data: bytes) -> str: while j < len(data) - 1: char = 
data[j] next_byte = data[j+1] - + # ASCII 범위 유니코드 문자 또는 한글 if next_byte == 0x00 and (0x20 <= char <= 0x7E or char in (0x0D, 0x0A, 0x09)): unicode_bytes.extend([char, next_byte]) @@ -524,7 +542,7 @@ def _extract_text_from_word_stream(self, data: bytes) -> str: j += 2 else: break - + if len(unicode_bytes) >= 8: # 최소 4자 이상 try: text = bytes(unicode_bytes).decode('utf-16-le', errors='ignore') @@ -542,7 +560,7 @@ def _extract_text_from_word_stream(self, data: bytes) -> str: i += 1 except Exception as e: self.logger.debug(f"Unicode extraction error: {e}") - + # 결과 정리 if text_parts: # 중복 제거 및 연결 @@ -552,26 +570,10 @@ def _extract_text_from_word_stream(self, data: bytes) -> str: if part not in seen and len(part) > 3: seen.add(part) unique_parts.append(part) - + result = '\n'.join(unique_parts) # 과도한 줄바꿈 정리 result = re.sub(r'\n{3,}', '\n\n', result) return result.strip() - + return "" - - def _format_metadata(self, metadata: Dict[str, Any]) -> str: - """메타데이터 포맷팅""" - if not metadata: - return "" - - lines = [""] - for key, label in METADATA_FIELD_NAMES.items(): - if key in metadata and metadata[key]: - value = metadata[key] - if isinstance(value, datetime): - value = value.strftime('%Y-%m-%d %H:%M:%S') - lines.append(f" {label}: {value}") - lines.append("") - - return "\n".join(lines) diff --git a/contextifier/core/processor/doc_helpers/__init__.py b/contextifier/core/processor/doc_helpers/__init__.py index 2f5b8d5..70b2e7e 100644 --- a/contextifier/core/processor/doc_helpers/__init__.py +++ b/contextifier/core/processor/doc_helpers/__init__.py @@ -1,48 +1,24 @@ # libs/core/processor/doc_helpers/__init__.py """ -DOC/RTF Helper 모듈 +DOC Helper 모듈 -DOC 및 RTF 문서 처리에 필요한 유틸리티를 제공합니다. +DOC 문서 처리에 필요한 유틸리티를 제공합니다. + +RTF 관련 모듈들은 rtf_helper로 이동했습니다. 
+RTF 처리가 필요한 경우 rtf_helper를 사용하세요: + from contextifier.core.processor import rtf_helper + from contextifier.core.processor.rtf_helper import RTFParser 모듈 구성: -- rtf_constants: RTF 관련 상수 정의 -- rtf_models: RTF 데이터 모델 -- rtf_parser: RTF 파싱 -- rtf_decoder: RTF 디코딩 -- rtf_content_extractor: RTF 콘텐츠 추출 -- rtf_table_extractor: RTF 테이블 추출 -- rtf_metadata_extractor: RTF 메타데이터 추출 -- rtf_region_finder: RTF 영역 탐색 -- rtf_text_cleaner: RTF 텍스트 정리 -- rtf_bin_processor: RTF 바이너리 처리 +- doc_file_converter: DOC 파일 변환기 +- doc_image_processor: DOC 이미지 처리기 """ -# Constants -from contextifier.core.processor.doc_helpers.rtf_constants import * - -# Models -from contextifier.core.processor.doc_helpers.rtf_models import * - -# Parser -from contextifier.core.processor.doc_helpers.rtf_parser import * - -# Decoder -from contextifier.core.processor.doc_helpers.rtf_decoder import * - -# Content Extractor -from contextifier.core.processor.doc_helpers.rtf_content_extractor import * - -# Table Extractor -from contextifier.core.processor.doc_helpers.rtf_table_extractor import * - -# Metadata Extractor -from contextifier.core.processor.doc_helpers.rtf_metadata_extractor import * - -# Region Finder -from contextifier.core.processor.doc_helpers.rtf_region_finder import * - -# Text Cleaner -from contextifier.core.processor.doc_helpers.rtf_text_cleaner import * +# DOC-specific components +from contextifier.core.processor.doc_helpers.doc_file_converter import DOCFileConverter +from contextifier.core.processor.doc_helpers.doc_image_processor import DOCImageProcessor -# Binary Processor -from contextifier.core.processor.doc_helpers.rtf_bin_processor import * +__all__ = [ + 'DOCFileConverter', + 'DOCImageProcessor', +] diff --git a/contextifier/core/processor/doc_helpers/doc_file_converter.py b/contextifier/core/processor/doc_helpers/doc_file_converter.py new file mode 100644 index 0000000..2b3b3c9 --- /dev/null +++ b/contextifier/core/processor/doc_helpers/doc_file_converter.py @@ -0,0 +1,159 @@ +# 
libs/core/processor/doc_helpers/doc_file_converter.py +""" +DOCFileConverter - DOC file format converter + +Converts binary DOC data to appropriate format based on detection. +Supports RTF, OLE, HTML, and misnamed DOCX files. +""" +from io import BytesIO +from typing import Any, Optional, BinaryIO, Tuple +from enum import Enum +import zipfile + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class DocFormat(Enum): + """Detected DOC file format.""" + RTF = "rtf" + OLE = "ole" + HTML = "html" + DOCX = "docx" + UNKNOWN = "unknown" + + +class DOCFileConverter(BaseFileConverter): + """ + DOC file converter with format auto-detection. + + Detects actual format (RTF, OLE, HTML, DOCX) and converts accordingly. + """ + + # Magic numbers for format detection + MAGIC_RTF = b'{\\rtf' + MAGIC_OLE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1' + MAGIC_ZIP = b'PK\x03\x04' + + def __init__(self): + """Initialize DOCFileConverter.""" + self._detected_format: DocFormat = DocFormat.UNKNOWN + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> Tuple[Any, DocFormat]: + """ + Convert binary DOC data to appropriate format. 
+ + Args: + file_data: Raw binary DOC data + file_stream: Optional file stream + **kwargs: Additional options + + Returns: + Tuple of (converted object, detected format) + - RTF: (bytes, DocFormat.RTF) - 원본 바이너리 반환 (RTFHandler에서 처리) + - OLE: (olefile.OleFileIO, DocFormat.OLE) + - HTML: (BeautifulSoup, DocFormat.HTML) + - DOCX: (docx.Document, DocFormat.DOCX) + + Raises: + Exception: If conversion fails + """ + self._detected_format = self._detect_format(file_data) + + if self._detected_format == DocFormat.RTF: + # RTF는 원본 바이너리 반환 - RTFHandler.extract_text()에서 처리 + return file_data, self._detected_format + elif self._detected_format == DocFormat.OLE: + return self._convert_ole(file_data), self._detected_format + elif self._detected_format == DocFormat.HTML: + return self._convert_html(file_data), self._detected_format + elif self._detected_format == DocFormat.DOCX: + return self._convert_docx(file_data), self._detected_format + else: + # Try OLE as fallback + return self._convert_ole(file_data), DocFormat.OLE + + def _detect_format(self, file_data: bytes) -> DocFormat: + """Detect actual file format from binary data.""" + if not file_data: + return DocFormat.UNKNOWN + + header = file_data[:32] if len(file_data) >= 32 else file_data + + # Check RTF + if header.startswith(self.MAGIC_RTF): + return DocFormat.RTF + + # Check OLE + if header.startswith(self.MAGIC_OLE): + return DocFormat.OLE + + # Check ZIP (possible DOCX) + if header.startswith(self.MAGIC_ZIP): + try: + with zipfile.ZipFile(BytesIO(file_data), 'r') as zf: + if '[Content_Types].xml' in zf.namelist(): + return DocFormat.DOCX + except zipfile.BadZipFile: + pass + + # Check HTML + header_lower = header.lower() + if (header_lower.startswith(b' Any: + """Convert OLE data.""" + import olefile + return olefile.OleFileIO(BytesIO(file_data)) + + def _convert_html(self, file_data: bytes) -> Any: + """Convert HTML data.""" + from bs4 import BeautifulSoup + # Decode with fallback + try: + text = 
file_data.decode('utf-8') + except UnicodeDecodeError: + text = file_data.decode('cp949', errors='replace') + return BeautifulSoup(text, 'html.parser') + + def _convert_docx(self, file_data: bytes) -> Any: + """Convert misnamed DOCX data.""" + from docx import Document + return Document(BytesIO(file_data)) + + def get_format_name(self) -> str: + """Return detected format name.""" + format_names = { + DocFormat.RTF: "RTF Document", + DocFormat.OLE: "OLE Document (DOC)", + DocFormat.HTML: "HTML Document", + DocFormat.DOCX: "DOCX Document (misnamed)", + DocFormat.UNKNOWN: "Unknown DOC Format", + } + return format_names.get(self._detected_format, "Unknown") + + @property + def detected_format(self) -> DocFormat: + """Return detected format after conversion.""" + return self._detected_format + + def close(self, converted_object: Any) -> None: + """Close the converted object if needed.""" + if converted_object is not None: + if hasattr(converted_object, 'close'): + converted_object.close() diff --git a/contextifier/core/processor/doc_helpers/doc_image_processor.py b/contextifier/core/processor/doc_helpers/doc_image_processor.py new file mode 100644 index 0000000..c118c11 --- /dev/null +++ b/contextifier/core/processor/doc_helpers/doc_image_processor.py @@ -0,0 +1,179 @@ +# contextifier/core/processor/doc_helpers/doc_image_processor.py +""" +DOC Image Processor + +Provides DOC-specific image processing that inherits from ImageProcessor. +Handles images from RTF, OLE compound documents, and HTML-formatted DOC files. +""" +import logging +from typing import Any, Dict, Optional, Set + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +logger = logging.getLogger("contextify.image_processor.doc") + + +class DOCImageProcessor(ImageProcessor): + """ + DOC-specific image processor. + + Inherits from ImageProcessor and provides DOC-specific processing. 
+ + Handles: + - RTF embedded images (pict, shppict, blipuid) + - OLE compound document images (Pictures stream, embedded objects) + - HTML-format DOC images (base64 encoded) + - WMF/EMF metafiles + + Example: + processor = DOCImageProcessor() + + # Process RTF picture + tag = processor.process_image(image_data, source="rtf", blipuid="abc123") + + # Process OLE embedded image + tag = processor.process_ole_image(ole_data, stream_name="Pictures/image1.png") + + # Process HTML base64 image + tag = processor.process_html_image(base64_data, src_attr="data:image/png;base64,...") + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize DOCImageProcessor. + + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + self._processed_blipuids: Set[str] = set() + + def process_image( + self, + image_data: bytes, + source: Optional[str] = None, + blipuid: Optional[str] = None, + stream_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save DOC image data. 
+ + Args: + image_data: Raw image binary data + source: Image source type ("rtf", "ole", "html") + blipuid: RTF BLIP unique ID (for deduplication) + stream_name: OLE stream name + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + # Custom naming based on source + custom_name = None + + if source == "rtf" and blipuid: + # Use blipuid for RTF images (deduplication key) + if blipuid in self._processed_blipuids: + logger.debug(f"Skipping duplicate RTF image: {blipuid}") + return None + self._processed_blipuids.add(blipuid) + custom_name = f"rtf_{blipuid[:16]}" + elif source == "ole" and stream_name: + # Use stream name for OLE images + import os + custom_name = f"ole_{os.path.basename(stream_name).split('.')[0]}" + elif source == "html": + custom_name = None # Use hash-based naming + + return self.save_image(image_data, custom_name=custom_name) + + def process_ole_image( + self, + image_data: bytes, + stream_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process OLE compound document embedded image. + + Args: + image_data: Raw image binary data from OLE stream + stream_name: Name of the OLE stream + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + return self.process_image( + image_data, + source="ole", + stream_name=stream_name, + **kwargs + ) + + def process_rtf_image( + self, + image_data: bytes, + blipuid: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process RTF embedded image. + + Args: + image_data: Raw image binary data from RTF + blipuid: BLIP unique ID for deduplication + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + return self.process_image( + image_data, + source="rtf", + blipuid=blipuid, + **kwargs + ) + + def process_html_image( + self, + image_data: bytes, + src_attr: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process HTML-format DOC base64 image. 
+ + Args: + image_data: Decoded image binary data + src_attr: Original src attribute value + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + return self.process_image( + image_data, + source="html", + **kwargs + ) + + def reset_tracking(self) -> None: + """Reset processed image tracking for new document.""" + self._processed_blipuids.clear() diff --git a/contextifier/core/processor/doc_helpers/doc_preprocessor.py b/contextifier/core/processor/doc_helpers/doc_preprocessor.py new file mode 100644 index 0000000..cd2f5d3 --- /dev/null +++ b/contextifier/core/processor/doc_helpers/doc_preprocessor.py @@ -0,0 +1,83 @@ +# contextifier/core/processor/doc_helpers/doc_preprocessor.py +""" +DOC Preprocessor - Process DOC content after conversion. + +Processing Pipeline Position: + 1. DOCFileConverter.convert() → (converted_obj, DocFormat) + 2. DOCPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. Content extraction (depends on format: RTF, OLE, HTML, DOCX) + +Current Implementation: + - Pass-through (DOC delegates to format-specific handlers) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.doc.preprocessor") + + +class DOCPreprocessor(BasePreprocessor): + """ + DOC Document Preprocessor. + + Currently a pass-through implementation as DOC processing + delegates to format-specific handlers (RTF, OLE, HTML, DOCX). + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted DOC content. 
+ + Args: + converted_data: Tuple of (converted_obj, DocFormat) from DOCFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the converted object + """ + metadata: Dict[str, Any] = {} + + converted_obj = converted_data + doc_format = None + + # Handle tuple return from DOCFileConverter + if isinstance(converted_data, tuple) and len(converted_data) >= 2: + converted_obj, doc_format = converted_data[0], converted_data[1] + if hasattr(doc_format, 'value'): + metadata['detected_format'] = doc_format.value + else: + metadata['detected_format'] = str(doc_format) + + logger.debug("DOC preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the converted object + # For DOC, this is the format-specific object (OLE, BeautifulSoup, etc.) + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_obj, # TRUE SOURCE - the converted object + encoding="utf-8", + extracted_resources={"doc_format": doc_format}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "DOC Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is DOC conversion result.""" + if isinstance(data, tuple) and len(data) >= 2: + return True + return data is not None + + +__all__ = ['DOCPreprocessor'] diff --git a/contextifier/core/processor/doc_helpers/rtf_bin_processor.py b/contextifier/core/processor/doc_helpers/rtf_bin_processor.py deleted file mode 100644 index 316d556..0000000 --- a/contextifier/core/processor/doc_helpers/rtf_bin_processor.py +++ /dev/null @@ -1,537 +0,0 @@ -# service/document_processor/processor/doc_helpers/rtf_bin_processor.py -""" -RTF Binary Data Processor - RTF 파일의 바이너리 데이터 처리기 - -RTF 파일 내의 바이너리 이미지 데이터를 처리합니다: -- bin 태그: 직접 바이너리 데이터 (JPEG, PNG, WMF 등) -- pict 그룹: 16진수 인코딩 또는 바이너리 이미지 - -주요 기능: -1. \binN 태그 스킵 (N 바이트의 바이너리 데이터를 건너뜀) -2. \pict 그룹에서 이미지 추출 -3. 
이미지를 로컬에 저장하고 [image:path] 태그로 변환 - -RTF 스펙: -- \binN: N 바이트의 raw 바이너리 데이터가 뒤따름 -- \pict: 이미지 그룹 시작 -- \jpegblip: JPEG 형식 -- \pngblip: PNG 형식 -- \wmetafile: Windows Metafile -- \emfblip: Enhanced Metafile -""" -import logging -import re -import struct -from dataclasses import dataclass, field -from typing import Any, Dict, List, Optional, Set, Tuple - -from contextifier.core.functions.img_processor import ImageProcessor - -logger = logging.getLogger("document-processor") - - -# === 이미지 형식 상수 === - -# 매직 넘버로 이미지 형식 판별 -IMAGE_SIGNATURES = { - b'\xff\xd8\xff': 'jpeg', # JPEG - b'\x89PNG\r\n\x1a\n': 'png', # PNG - b'GIF87a': 'gif', # GIF87 - b'GIF89a': 'gif', # GIF89 - b'BM': 'bmp', # BMP - b'\xd7\xcd\xc6\x9a': 'wmf', # WMF (placeable) - b'\x01\x00\x09\x00': 'wmf', # WMF (standard) - b'\x01\x00\x00\x00': 'emf', # EMF -} - -# RTF 이미지 타입 매핑 -RTF_IMAGE_TYPES = { - 'jpegblip': 'jpeg', - 'pngblip': 'png', - 'wmetafile': 'wmf', - 'emfblip': 'emf', - 'dibitmap': 'bmp', - 'wbitmap': 'bmp', -} - - -@dataclass -class RTFBinaryRegion: - """RTF 바이너리 데이터 영역 정보""" - start_pos: int # 원본에서의 시작 위치 (바이트) - end_pos: int # 원본에서의 끝 위치 (바이트) - bin_type: str # "bin" 또는 "pict" - data_size: int # 바이너리 데이터 크기 - image_format: str = "" # 이미지 형식 (jpeg, png, wmf 등) - image_data: bytes = b"" # 추출된 이미지 데이터 - - -@dataclass -class RTFBinaryProcessResult: - """RTF 바이너리 처리 결과""" - clean_content: bytes # 바이너리가 제거/치환된 콘텐츠 - binary_regions: List[RTFBinaryRegion] = field(default_factory=list) - image_tags: Dict[int, str] = field(default_factory=dict) # 위치 -> 이미지 태그 - - -class RTFBinaryProcessor: - """ - RTF 바이너리 데이터 처리기 - - RTF 파일에서 바이너리 이미지 데이터를 추출하고, - 로컬에 저장하여 이미지 태그로 변환합니다. 
- """ - - def __init__( - self, - processed_images: Optional[Set[str]] = None, - image_processor: ImageProcessor = None - ): - """ - Args: - processed_images: 이미 처리된 이미지 해시 집합 (중복 방지) - image_processor: 이미지 처리기 - """ - self.processed_images = processed_images if processed_images is not None else set() - self.image_processor = image_processor - self.binary_regions: List[RTFBinaryRegion] = [] - self.image_tags: Dict[int, str] = {} - - def process(self, content: bytes) -> RTFBinaryProcessResult: - """ - RTF 바이너리 콘텐츠를 처리합니다. - - bin 태그의 바이너리 데이터를 스킵하고, - pict 그룹의 이미지를 추출하여 로컬에 저장합니다. - - Args: - content: RTF 파일 바이너리 콘텐츠 - - Returns: - 처리 결과 (정제된 콘텐츠, 바이너리 영역 정보, 이미지 태그) - """ - self.binary_regions = [] - self.image_tags = {} - - # 1단계: \bin 태그 위치 및 크기 파악 - bin_regions = self._find_bin_regions(content) - - # 2단계: \pict 그룹에서 이미지 추출 (bin 영역 외부) - pict_regions = self._find_pict_regions(content, bin_regions) - - # 3단계: 바이너리 영역 통합 및 정렬 - all_regions = bin_regions + pict_regions - all_regions.sort(key=lambda r: r.start_pos) - self.binary_regions = all_regions - - # 4단계: 이미지 추출 및 로컬 저장 - self._process_images() - - # 5단계: 바이너리 데이터를 제거한 콘텐츠 생성 - clean_content = self._remove_binary_data(content) - - return RTFBinaryProcessResult( - clean_content=clean_content, - binary_regions=self.binary_regions, - image_tags=self.image_tags - ) - - def _find_bin_regions(self, content: bytes) -> List[RTFBinaryRegion]: - """ - \binN 태그를 찾아 바이너리 영역을 식별합니다. - - RTF 스펙에서 binN은 N 바이트의 raw 바이너리 데이터가 뒤따름을 의미합니다. - 이 데이터는 문자열 디코딩 시 깨지므로 건너뛰어야 합니다. - - 중요: bin을 포함하는 상위 shppict 그룹 전체를 제거 영역으로 설정합니다. 
- - Args: - content: RTF 바이너리 콘텐츠 - - Returns: - 바이너리 영역 리스트 - """ - regions = [] - - # \bin 패턴 찾기: \binN (N은 바이트 수) - # RTF에서 \bin 다음의 숫자가 바이트 수를 나타냄 - pattern = rb'\\bin(\d+)' - - for match in re.finditer(pattern, content): - try: - bin_size = int(match.group(1)) - bin_tag_start = match.start() - bin_tag_end = match.end() - - # \bin 태그 다음에 공백이 있을 수 있음 - data_start = bin_tag_end - if data_start < len(content) and content[data_start:data_start+1] == b' ': - data_start += 1 - - data_end = data_start + bin_size - - if data_end <= len(content): - # 바이너리 데이터 추출 - binary_data = content[data_start:data_end] - - # 이미지 형식 감지 - image_format = self._detect_image_format(binary_data) - - # 상위 \shppict 그룹 찾기 - # \bin 위치에서 역방향으로 {\*\shppict 또는 {\\shppict 찾기 - group_start = bin_tag_start - group_end = data_end - - # 역방향으로 \shppict 검색 (최대 500바이트 뒤로) - search_start = max(0, bin_tag_start - 500) - search_area = content[search_start:bin_tag_start] - - # \shppict 찾기 - shppict_pos = search_area.rfind(b'\\shppict') - if shppict_pos != -1: - # 그룹 시작 { 찾기 - abs_pos = search_start + shppict_pos - brace_pos = abs_pos - while brace_pos > 0 and content[brace_pos:brace_pos+1] != b'{': - brace_pos -= 1 - group_start = brace_pos - - # 그룹 끝 } 찾기 (바이너리 데이터 이후) - depth = 1 - j = data_end - while j < len(content) and depth > 0: - if content[j:j+1] == b'{': - depth += 1 - elif content[j:j+1] == b'}': - depth -= 1 - j += 1 - group_end = j - - region = RTFBinaryRegion( - start_pos=group_start, - end_pos=group_end, - bin_type="bin", - data_size=bin_size, - image_format=image_format, - image_data=binary_data - ) - regions.append(region) - - logger.debug( - f"Found \\bin region: group_pos={group_start}-{group_end}, " - f"bin_pos={bin_tag_start}, size={bin_size}, " - f"format={image_format or 'unknown'}" - ) - - except (ValueError, IndexError) as e: - logger.debug(f"Error parsing \\bin tag: {e}") - continue - - logger.info(f"Found {len(regions)} \\bin regions in RTF") - return regions - - def 
_find_pict_regions( - self, - content: bytes, - exclude_regions: List[RTFBinaryRegion] - ) -> List[RTFBinaryRegion]: - """ - pict 그룹에서 16진수 인코딩된 이미지를 찾습니다. - - 주의: pict 그룹이 bin 태그를 포함하는 경우는 이미 _find_bin_regions에서 - 처리되었으므로 여기서는 스킵합니다. - - RTF 이미지 인코딩 방식: - 1. \bin 태그: 직접 바이너리 데이터 (이미 처리됨) - 2. 16진수: \pict ... [hex data] } 형태 - - Args: - content: RTF 바이너리 콘텐츠 - exclude_regions: 제외할 영역 (이미 처리된 \bin 영역) - - Returns: - pict 이미지 영역 리스트 (16진수 인코딩된 것만) - """ - regions = [] - - # \bin 태그 위치 집합 생성 (근처에 \bin이 있는 \pict는 스킵) - bin_tag_positions = set() - for region in exclude_regions: - if region.bin_type == "bin": - bin_tag_positions.add(region.start_pos) - - # 제외 영역을 빠르게 체크하기 위한 집합 생성 - excluded_ranges = [(r.start_pos, r.end_pos) for r in exclude_regions] - - def is_excluded(pos: int) -> bool: - """주어진 위치가 제외 영역에 포함되는지 확인""" - for start, end in excluded_ranges: - if start <= pos < end: - return True - return False - - def has_bin_nearby(pict_pos: int, search_range: int = 200) -> bool: - """ - pict 근처에 bin 태그가 있는지 확인. - pict 그룹이 bin 태그를 포함하면 True 반환. - """ - # \pict 위치부터 search_range 내에 \bin 태그가 있는지 확인 - for bin_pos in bin_tag_positions: - if pict_pos < bin_pos < pict_pos + search_range: - return True - return False - - try: - text_content = content.decode('cp1252', errors='replace') - - # \pict 그룹 찾기 - # 패턴: \pict\jpegblip... 
[hex data]} - pict_start_pattern = r'\\pict\s*((?:\\[a-zA-Z]+\d*\s*)*)' - - for match in re.finditer(pict_start_pattern, text_content): - start_pos = match.start() - - # 제외 영역인지 확인 - if is_excluded(start_pos): - continue - - # 근처에 \bin 태그가 있으면 스킵 (이미 처리됨) - if has_bin_nearby(start_pos): - logger.debug(f"Skipping \\pict at {start_pos} - has \\bin tag nearby") - continue - - attrs = match.group(1) - - # 이미지 타입 확인 - image_format = "" - for rtf_type, fmt in RTF_IMAGE_TYPES.items(): - if rtf_type in attrs: - image_format = fmt - break - - # 16진수 데이터 추출 - # \pict 속성들 다음에 16진수 데이터가 옴 - hex_start = match.end() - hex_data = [] - i = hex_start - - while i < len(text_content): - ch = text_content[i] - if ch in '0123456789abcdefABCDEF': - hex_data.append(ch) - elif ch in ' \t\r\n': - pass # 공백 무시 - elif ch == '}': - break # 그룹 끝 - elif ch == '\\': - # \bin 태그 확인 - if text_content[i:i+4] == '\\bin': - # \bin 태그가 있으면 이 \pict는 스킵 - logger.debug(f"Skipping \\pict at {start_pos} - contains \\bin tag") - hex_data = [] # 데이터 버리기 - break - # 다른 제어 워드까지 스킵 - while i < len(text_content) and text_content[i] not in ' \t\r\n}': - i += 1 - continue - else: - break - i += 1 - - hex_str = ''.join(hex_data) - - # 충분한 16진수 데이터가 있는 경우만 처리 - if len(hex_str) >= 32: # 최소 16바이트 이상 - try: - image_data = bytes.fromhex(hex_str) - - # 이미지 형식이 없으면 데이터에서 감지 - if not image_format: - image_format = self._detect_image_format(image_data) - - # 유효한 이미지인지 확인 - if image_format: - region = RTFBinaryRegion( - start_pos=start_pos, - end_pos=i, - bin_type="pict", - data_size=len(image_data), - image_format=image_format, - image_data=image_data - ) - regions.append(region) - - logger.debug( - f"Found \\pict region (hex): pos={start_pos}, " - f"hex_len={len(hex_str)}, format={image_format}" - ) - except ValueError as e: - logger.debug(f"Failed to decode hex data at {start_pos}: {e}") - - except Exception as e: - logger.warning(f"Error finding \\pict regions: {e}") - - logger.info(f"Found {len(regions)} \\pict regions 
(hex-encoded) in RTF") - return regions - - def _detect_image_format(self, data: bytes) -> str: - """ - 바이너리 데이터의 이미지 형식을 감지합니다. - - Args: - data: 이미지 바이너리 데이터 - - Returns: - 이미지 형식 문자열 (jpeg, png, wmf 등) 또는 빈 문자열 - """ - if not data or len(data) < 4: - return "" - - for signature, format_name in IMAGE_SIGNATURES.items(): - if data.startswith(signature): - return format_name - - # JPEG 확장 체크 (EXIF 헤더 등) - if len(data) >= 3: - if data[0:2] == b'\xff\xd8': - return 'jpeg' - - return "" - - def _process_images(self) -> None: - """ - 추출된 이미지를 로컬에 저장하고 태그를 생성합니다. - """ - for region in self.binary_regions: - if not region.image_data: - continue - - # 지원 가능한 이미지 형식인지 확인 - # WMF, EMF는 PIL에서 지원하지 않을 수 있음 - supported_formats = {'jpeg', 'png', 'gif', 'bmp'} - - if region.image_format in supported_formats: - image_tag = self.image_processor.save_image(region.image_data) - - if image_tag: - self.image_tags[region.start_pos] = f"\n{image_tag}\n" - logger.info( - f"Saved RTF image locally: {image_tag} " - f"(format={region.image_format}, size={region.data_size})" - ) - else: - # 저장 실패 시 빈 태그 (무시됨) - self.image_tags[region.start_pos] = "" - logger.warning(f"Image save failed, removing (pos={region.start_pos})") - else: - # WMF, EMF 등 미지원 형식은 플레이스홀더 - if region.image_format: - logger.debug( - f"Skipping unsupported image format: {region.image_format}" - ) - self.image_tags[region.start_pos] = "" # 빈 태그 (무시) - - def _remove_binary_data(self, content: bytes) -> bytes: - """ - 바이너리 데이터 영역을 제거한 콘텐츠를 생성합니다. - - \bin 태그와 바이너리 데이터를 이미지 태그로 치환하거나 제거합니다. 
- - Args: - content: 원본 RTF 바이너리 콘텐츠 - - Returns: - 정제된 콘텐츠 - """ - if not self.binary_regions: - return content - - # 영역을 역순으로 정렬하여 뒤에서부터 치환 (위치 변경 방지) - sorted_regions = sorted(self.binary_regions, key=lambda r: r.start_pos, reverse=True) - - result = bytearray(content) - - for region in sorted_regions: - # 해당 영역을 빈 바이트로 치환 (완전히 제거) - # 이미지 태그는 나중에 텍스트 레벨에서 삽입 - replacement = b'' - - # 이미지 태그가 있으면 마커 삽입 (나중에 텍스트 처리 시 사용) - if region.start_pos in self.image_tags: - tag = self.image_tags[region.start_pos] - if tag: - # 이미지 태그를 마커로 삽입 (ASCII 안전) - replacement = tag.encode('ascii', errors='replace') - - result[region.start_pos:region.end_pos] = replacement - - return bytes(result) - - def get_image_tag(self, position: int) -> str: - """ - 특정 위치의 이미지 태그를 반환합니다. - - Args: - position: RTF 내 위치 - - Returns: - 이미지 태그 문자열 또는 빈 문자열 - """ - return self.image_tags.get(position, "") - - -def preprocess_rtf_binary( - content: bytes, - processed_images: Optional[Set[str]] = None, - image_processor: ImageProcessor = None -) -> Tuple[bytes, Dict[int, str]]: - """ - RTF 콘텐츠에서 바이너리 데이터를 전처리합니다. - - \bin 태그의 바이너리 데이터를 제거하고, - 이미지는 로컬에 저장하여 태그로 변환합니다. - - 이 함수는 RTF 파서 전에 호출하여 바이너리 데이터로 인한 - 텍스트 깨짐을 방지합니다. - - Args: - content: RTF 파일 바이너리 콘텐츠 - processed_images: 처리된 이미지 해시 집합 (optional) - image_processor: 이미지 처리기 - - Returns: - (정제된 콘텐츠, 위치->이미지태그 딕셔너리) 튜플 - - Example: - >>> with open('file.rtf', 'rb') as f: - ... raw_content = f.read() - >>> clean_content, image_tags = preprocess_rtf_binary(raw_content) - >>> # 이후 RTF 파서에 clean_content 전달 - """ - processor = RTFBinaryProcessor(processed_images, image_processor) - result = processor.process(content) - return result.clean_content, result.image_tags - - -def extract_rtf_images( - content: bytes, - processed_images: Optional[Set[str]] = None, - image_processor: ImageProcessor = None -) -> List[str]: - """ - RTF 콘텐츠에서 모든 이미지를 추출하여 로컬에 저장합니다. 
- - Args: - content: RTF 파일 바이너리 콘텐츠 - processed_images: 처리된 이미지 해시 집합 (optional) - image_processor: 이미지 처리기 - - Returns: - 이미지 태그 리스트 (예: ["[image:bucket/uploads/hash.png]", ...]) - """ - processor = RTFBinaryProcessor(processed_images, image_processor) - result = processor.process(content) - - # 위치순으로 정렬된 이미지 태그 반환 - sorted_tags = sorted(result.image_tags.items(), key=lambda x: x[0]) - return [tag for pos, tag in sorted_tags if tag] diff --git a/contextifier/core/processor/doc_helpers/rtf_constants.py b/contextifier/core/processor/doc_helpers/rtf_constants.py deleted file mode 100644 index c7a7516..0000000 --- a/contextifier/core/processor/doc_helpers/rtf_constants.py +++ /dev/null @@ -1,60 +0,0 @@ -# service/document_processor/processor/doc_helpers/rtf_constants.py -""" -RTF Parser 상수 정의 - -RTF 파싱에 사용되는 상수들을 정의합니다. -""" - -# Shape 속성 이름들 (\sn으로 시작하는 속성들) - 텍스트에서 제거해야 함 -SHAPE_PROPERTY_NAMES = { - 'shapeType', 'fFlipH', 'fFlipV', 'txflTextFlow', 'fFilled', 'fLine', - 'dxTextLeft', 'dxTextRight', 'dyTextTop', 'dyTextBottom', - 'posrelh', 'posrelv', 'fBehindDocument', 'fLayoutInCell', 'fAllowOverlap', - 'fillColor', 'fillBackColor', 'fNoFillHitTest', 'lineColor', 'lineWidth', - 'posh', 'posv', 'fLockAnchor', 'fLockPosition', 'fLockAspectRatio', - 'fLockRotation', 'fLockCropping', 'fLockAgainstGrouping', 'fNoLineDrawDash', - 'wzName', 'wzDescription', 'pWrapPolygonVertices', 'dxWrapDistLeft', - 'dxWrapDistRight', 'dyWrapDistTop', 'dyWrapDistBottom', 'lidRegroup', - 'fEditedWrap', 'fBehindDocument', 'fOnDblClickNotify', 'fIsButton', - 'fOneD', 'fHidden', 'fPrint', 'geoLeft', 'geoTop', 'geoRight', 'geoBottom', - 'shapePath', 'pSegmentInfo', 'pVertices', 'fFillOK', 'fFillShadeShapeOK', - 'fGtextOK', 'fLineOK', 'f3DOK', 'fShadowOK', 'fArrowheadsOK', -} - -# 제외할 destination 키워드들 (본문이 아닌 영역) -EXCLUDE_DESTINATION_KEYWORDS = [ - r'\\header(?:f|l|r)?\b', # 헤더 - r'\\footer(?:f|l|r)?\b', # 푸터 - r'\\footnote\b', # 각주 - r'\\ftnsep\b', r'\\ftnsepc\b', # 각주 구분선 - r'\\aftncn\b', 
r'\\aftnsep\b', r'\\aftnsepc\b', # 미주 - r'\\pntext\b', r'\\pntxta\b', r'\\pntxtb\b', # 번호 매기기 -] - -# 제거할 destination 패턴들 -SKIP_DESTINATIONS = [ - 'themedata', 'colorschememapping', 'latentstyles', 'datastore', - 'xmlnstbl', 'wgrffmtfilter', 'generator', 'mmathPr', 'xmlopen', - 'background', 'pgptbl', 'listpicture', 'pnseclvl', 'revtbl', - 'bkmkstart', 'bkmkend', 'fldinst', 'objdata', 'objclass', - 'objemb', 'result', 'category', 'comment', 'company', 'creatim', - 'doccomm', 'hlinkbase', 'keywords', 'manager', 'operator', - 'revtim', 'subject', 'title', 'userprops', - 'nonshppict', 'blipuid', 'picprop', -] - -# 이미지 관련 destination -IMAGE_DESTINATIONS = ['shppict'] - -# 코드 페이지 -> 인코딩 매핑 -CODEPAGE_ENCODING_MAP = { - 949: 'cp949', - 932: 'cp932', - 936: 'gb2312', - 950: 'big5', - 1252: 'cp1252', - 65001: 'utf-8', -} - -# 기본 인코딩 시도 순서 -DEFAULT_ENCODINGS = ['cp949', 'utf-8', 'cp1252', 'latin-1'] diff --git a/contextifier/core/processor/doc_helpers/rtf_metadata_extractor.py b/contextifier/core/processor/doc_helpers/rtf_metadata_extractor.py deleted file mode 100644 index b456e8f..0000000 --- a/contextifier/core/processor/doc_helpers/rtf_metadata_extractor.py +++ /dev/null @@ -1,78 +0,0 @@ -# service/document_processor/processor/doc_helpers/rtf_metadata_extractor.py -""" -RTF 메타데이터 추출기 - -RTF 문서에서 메타데이터를 추출하는 기능을 제공합니다. -""" -import logging -import re -from datetime import datetime -from typing import Any, Dict - -from contextifier.core.processor.doc_helpers.rtf_decoder import ( - decode_hex_escapes, -) -from contextifier.core.processor.doc_helpers.rtf_text_cleaner import ( - clean_rtf_text, -) - -logger = logging.getLogger("document-processor") - - -def extract_metadata(content: str, encoding: str = "cp949") -> Dict[str, Any]: - """ - RTF 콘텐츠에서 메타데이터를 추출합니다. 
- - Args: - content: RTF 문자열 콘텐츠 - encoding: 사용할 인코딩 - - Returns: - 메타데이터 딕셔너리 - """ - metadata = {} - - # \info 그룹 찾기 - info_match = re.search(r'\\info\s*\{([^}]*(?:\{[^}]*\}[^}]*)*)\}', content) - if info_match: - info_content = info_match.group(1) - - # 각 메타데이터 필드 추출 - field_patterns = { - 'title': r'\\title\s*\{([^}]*)\}', - 'subject': r'\\subject\s*\{([^}]*)\}', - 'author': r'\\author\s*\{([^}]*)\}', - 'keywords': r'\\keywords\s*\{([^}]*)\}', - 'comments': r'\\doccomm\s*\{([^}]*)\}', - 'last_saved_by': r'\\operator\s*\{([^}]*)\}', - } - - for key, pattern in field_patterns.items(): - match = re.search(pattern, info_content) - if match: - value = decode_hex_escapes(match.group(1), encoding) - value = clean_rtf_text(value, encoding) - if value: - metadata[key] = value - - # 날짜 추출 - date_patterns = { - 'create_time': r'\\creatim\\yr(\d+)\\mo(\d+)\\dy(\d+)(?:\\hr(\d+))?(?:\\min(\d+))?', - 'last_saved_time': r'\\revtim\\yr(\d+)\\mo(\d+)\\dy(\d+)(?:\\hr(\d+))?(?:\\min(\d+))?', - } - - for key, pattern in date_patterns.items(): - match = re.search(pattern, content) - if match: - try: - year = int(match.group(1)) - month = int(match.group(2)) - day = int(match.group(3)) - hour = int(match.group(4)) if match.group(4) else 0 - minute = int(match.group(5)) if match.group(5) else 0 - metadata[key] = datetime(year, month, day, hour, minute) - except (ValueError, TypeError): - pass - - logger.debug(f"Extracted metadata: {list(metadata.keys())}") - return metadata diff --git a/contextifier/core/processor/doc_helpers/rtf_models.py b/contextifier/core/processor/doc_helpers/rtf_models.py deleted file mode 100644 index 0a0ee86..0000000 --- a/contextifier/core/processor/doc_helpers/rtf_models.py +++ /dev/null @@ -1,364 +0,0 @@ -# service/document_processor/processor/doc_helpers/rtf_models.py -""" -RTF Parser 데이터 모델 - -RTF 파싱에 사용되는 데이터 클래스들을 정의합니다. 
-""" -import re -from dataclasses import dataclass, field -from typing import Any, Dict, List, Optional, NamedTuple, Tuple - - -class RTFCellInfo(NamedTuple): - """RTF 셀 정보 (병합 정보 포함)""" - text: str # 셀 텍스트 - h_merge_first: bool # 수평 병합 첫 번째 셀 (clmgf) - h_merge_cont: bool # 수평 병합 연속 셀 (clmrg) - v_merge_first: bool # 수직 병합 첫 번째 셀 (clvmgf) - v_merge_cont: bool # 수직 병합 연속 셀 (clvmrg) - right_boundary: int # 셀 오른쪽 경계 (twips) - - -@dataclass -class RTFTable: - """RTF 테이블 구조 (병합 셀 지원)""" - rows: List[List[RTFCellInfo]] = field(default_factory=list) - col_count: int = 0 - position: int = 0 # 문서 내 시작 위치 - end_position: int = 0 # 문서 내 종료 위치 - _logical_cells: List[List[Optional[RTFCellInfo]]] = field(default_factory=list, repr=False) - - def get_effective_col_count(self) -> int: - """ - 실제 유효한 열 수를 계산합니다. - 빈 셀만 있는 열은 제외합니다. - - Returns: - 실제 내용이 있는 최대 열 수 - """ - if not self.rows: - return 0 - - effective_counts = [] - for row in self.rows: - # 빈 셀과 병합된 셀을 제외한 유효 셀 수 계산 - non_empty_cells = [] - for i, cell in enumerate(row): - # 병합으로 건너뛰는 셀 제외 - if cell.h_merge_cont: - continue - # 내용이 있거나 수직 병합 시작인 경우 유효 - if cell.text.strip() or cell.v_merge_first: - non_empty_cells.append(i) - - if non_empty_cells: - # 마지막 유효 셀의 인덱스 + 1 - effective_counts.append(max(non_empty_cells) + 1) - - return max(effective_counts) if effective_counts else 0 - - def is_real_table(self) -> bool: - """ - 실제 테이블인지 판단합니다. - - n rows × 1 column 형태는 테이블이 아닌 단순 리스트로 간주합니다. - 빈 셀만 있는 열은 열 수에서 제외합니다. - - Returns: - True if 실제 테이블 (유효 열이 2개 이상), False otherwise - """ - if not self.rows: - return False - - # 유효 열 수로 판단 - effective_cols = self.get_effective_col_count() - return effective_cols >= 2 - - def _calculate_merge_info(self) -> List[List[Tuple[int, int]]]: - """ - 각 셀의 colspan, rowspan을 계산합니다. - - RTF의 병합 처리: - 1. 명시적 병합 플래그 (clmgf/clmrg, clvmgf/clvmrg) 사용 - 2. 
열 경계(cellx) 값을 기반으로 암시적 colspan 계산 - - 테이블 전체의 고유 열 경계를 수집 - - 각 행의 셀이 몇 개의 논리적 열을 차지하는지 계산 - - Returns: - 각 셀별 (colspan, rowspan) 정보 2D 리스트 - (0, 0)은 이 셀이 다른 셀에 병합되어 건너뛰어야 함을 의미 - """ - if not self.rows: - return [] - - num_rows = len(self.rows) - - # 1단계: 전체 테이블의 고유 열 경계 수집 - all_boundaries = set() - for row in self.rows: - for cell in row: - if cell.right_boundary > 0: - all_boundaries.add(cell.right_boundary) - - # 정렬된 열 경계 리스트 - sorted_boundaries = sorted(all_boundaries) - total_logical_cols = len(sorted_boundaries) - - if total_logical_cols == 0: - # 열 경계 정보가 없으면 기본 처리 - max_cols = max(len(row) for row in self.rows) if self.rows else 0 - return [[(1, 1) for _ in range(max_cols)] for _ in range(num_rows)] - - # 경계값 -> 논리적 열 인덱스 매핑 - boundary_to_col = {b: i for i, b in enumerate(sorted_boundaries)} - - # 2단계: 각 행별로 셀의 colspan 계산 - # merge_info[row][logical_col] = (colspan, rowspan) 또는 (0, 0) - merge_info = [[None for _ in range(total_logical_cols)] for _ in range(num_rows)] - - for row_idx, row in enumerate(self.rows): - prev_boundary = 0 - for cell in row: - if cell.right_boundary <= 0: - continue - - # 이 셀이 차지하는 논리적 열 범위 계산 - start_col = 0 - for i, b in enumerate(sorted_boundaries): - if b <= prev_boundary: - start_col = i + 1 - else: - break - - end_col = boundary_to_col[cell.right_boundary] - colspan = end_col - start_col + 1 - - if colspan <= 0: - colspan = 1 - - # 시작 열에 셀 정보 기록 - if start_col < total_logical_cols: - merge_info[row_idx][start_col] = (colspan, 1, cell) - # 병합된 열들은 (0, 0)으로 표시 - for col in range(start_col + 1, start_col + colspan): - if col < total_logical_cols: - merge_info[row_idx][col] = (0, 0, None) - - prev_boundary = cell.right_boundary - - # 3단계: 수직 병합 (rowspan) 처리 - for col_idx in range(total_logical_cols): - row_idx = 0 - while row_idx < num_rows: - info = merge_info[row_idx][col_idx] - if info is None or len(info) < 3 or info[2] is None: - row_idx += 1 - continue - - colspan, _, cell = info - if colspan == 0: - row_idx += 1 - 
continue - - if cell.v_merge_first: - # 수직 병합 시작 - rowspan = 1 - for next_row in range(row_idx + 1, num_rows): - next_info = merge_info[next_row][col_idx] - if next_info is None or len(next_info) < 3 or next_info[2] is None: - break - _, _, next_cell = next_info - if next_cell.v_merge_cont: - rowspan += 1 - merge_info[next_row][col_idx] = (0, 0, None) - else: - break - - merge_info[row_idx][col_idx] = (colspan, rowspan, cell) - row_idx += rowspan - elif cell.v_merge_cont: - merge_info[row_idx][col_idx] = (0, 0, None) - row_idx += 1 - else: - row_idx += 1 - - # 4단계: 최종 결과 (colspan, rowspan)만 반환 - result = [] - for row_idx in range(num_rows): - row_result = [] - for col_idx in range(total_logical_cols): - info = merge_info[row_idx][col_idx] - if info is None: - row_result.append((1, 1)) - elif len(info) >= 2: - row_result.append((info[0], info[1])) - else: - row_result.append((1, 1)) - result.append(row_result) - - # 실제 셀 데이터도 저장 (to_html에서 사용) - self._logical_cells = [] - for row_idx in range(num_rows): - row_cells = [] - for col_idx in range(total_logical_cols): - info = merge_info[row_idx][col_idx] - if info is not None and len(info) >= 3 and info[2] is not None: - row_cells.append(info[2]) - else: - row_cells.append(None) - self._logical_cells.append(row_cells) - - return result - - def to_html(self) -> str: - """테이블을 HTML로 변환 (병합 셀 지원)""" - if not self.rows: - return "" - - merge_info = self._calculate_merge_info() - - # _logical_cells가 없으면 기존 방식 사용 - if not hasattr(self, '_logical_cells') or not self._logical_cells: - return self._to_html_legacy(merge_info) - - html_parts = [''] - - for row_idx, row_merge in enumerate(merge_info): - html_parts.append('') - - for col_idx, (colspan, rowspan) in enumerate(row_merge): - if colspan == 0 or rowspan == 0: - continue - - cell = self._logical_cells[row_idx][col_idx] if col_idx < len(self._logical_cells[row_idx]) else None - cell_text = cell.text if cell and cell.text else '' - - attrs = [] - if colspan > 1: - 
attrs.append(f'colspan="{colspan}"') - if rowspan > 1: - attrs.append(f'rowspan="{rowspan}"') - - attr_str = ' ' + ' '.join(attrs) if attrs else '' - html_parts.append(f'{cell_text}') - - html_parts.append('') - - html_parts.append('
') - return '\n'.join(html_parts) - - def _to_html_legacy(self, merge_info: List[List[Tuple[int, int]]]) -> str: - """기존 HTML 변환 (열 경계 정보 없을 때)""" - html_parts = [''] - - for row_idx, row in enumerate(self.rows): - html_parts.append('') - - for col_idx, cell in enumerate(row): - # 병합 정보 확인 - if col_idx < len(merge_info[row_idx]) and merge_info[row_idx][col_idx]: - colspan, rowspan = merge_info[row_idx][col_idx] - - if colspan == 0 and rowspan == 0: - # 이 셀은 다른 셀에 병합됨 - 건너뜀 - continue - - # 셀 내용 정리 - cell_text = re.sub(r'\s+', ' ', cell.text).strip() - - # 속성 생성 - attrs = [] - if colspan > 1: - attrs.append(f'colspan="{colspan}"') - if rowspan > 1: - attrs.append(f'rowspan="{rowspan}"') - - attr_str = ' ' + ' '.join(attrs) if attrs else '' - html_parts.append(f'{cell_text}') - else: - # 병합 정보 없음 - 일반 셀 - cell_text = re.sub(r'\s+', ' ', cell.text).strip() - html_parts.append(f'') - - html_parts.append('') - - html_parts.append('
{cell_text}
') - return '\n'.join(html_parts) - - def to_text_list(self) -> str: - """ - 1열 테이블을 텍스트 리스트로 변환합니다. - - - 1×1 테이블: 셀 내용만 반환 (컨테이너 테이블) - - n×1 테이블: 각 행을 빈 줄로 구분하여 반환 - - Returns: - 텍스트 형식의 문자열 - """ - if not self.rows: - return "" - - # 1×1 테이블: 셀 내용만 반환 (컨테이너 테이블) - if len(self.rows) == 1 and len(self.rows[0]) == 1: - return self.rows[0][0].text - - lines = [] - for row in self.rows: - if row: - # 첫 번째 셀만 사용 (1열 테이블) - cell_text = row[0].text - if cell_text: - lines.append(cell_text) - - # 빈 줄로 구분 - return '\n\n'.join(lines) - - -@dataclass -class RTFContentPart: - """문서 내 콘텐츠 조각 (텍스트 또는 테이블)""" - content_type: str # "text" 또는 "table" - position: int # 원본 문서 내 위치 - text: str = "" # content_type이 "text"인 경우 - table: Optional['RTFTable'] = None # content_type이 "table"인 경우 - - -@dataclass -class RTFDocument: - """RTF 문서 구조""" - text_content: str = "" - tables: List[RTFTable] = field(default_factory=list) - metadata: Dict[str, Any] = field(default_factory=dict) - images: List[bytes] = field(default_factory=list) - image_tags: List[str] = field(default_factory=list) # v3: 로컬 저장된 이미지 태그 - encoding: str = "cp949" - # v2: 인라인 콘텐츠 - 원래 순서대로 정렬된 콘텐츠 조각들 - content_parts: List[RTFContentPart] = field(default_factory=list) - - def get_inline_content(self) -> str: - """ - 테이블이 원래 위치에 인라인으로 배치된 전체 콘텐츠를 반환합니다. 
- - Returns: - 인라인 배치된 전체 텍스트 - """ - if not self.content_parts: - # 호환성: content_parts가 없으면 기존 방식으로 반환 - return self.text_content - - # 위치순 정렬 - sorted_parts = sorted(self.content_parts, key=lambda p: p.position) - - result_parts = [] - for part in sorted_parts: - if part.content_type == "text" and part.text.strip(): - result_parts.append(part.text) - elif part.content_type == "table" and part.table: - if part.table.is_real_table(): - result_parts.append(part.table.to_html()) - else: - text_list = part.table.to_text_list() - if text_list: - result_parts.append(text_list) - - return '\n\n'.join(result_parts) diff --git a/contextifier/core/processor/doc_helpers/rtf_parser.py b/contextifier/core/processor/doc_helpers/rtf_parser.py deleted file mode 100644 index 56a047f..0000000 --- a/contextifier/core/processor/doc_helpers/rtf_parser.py +++ /dev/null @@ -1,200 +0,0 @@ -# service/document_processor/processor/doc_helpers/rtf_parser.py -""" -RTF Parser - RTF 파일 바이너리 직접 파싱 (리팩터링 버전) - -LibreOffice 없이 RTF 파일을 직접 분석하여: -- 텍스트 추출 (원래 위치 유지) -- 테이블을 HTML로 변환 (인라인 배치) -- 병합 셀 처리 (clmgf/clmrg/clvmgf/clvmrg) -- 메타데이터 추출 -- 이미지 추출 - -RTF 1.5+ 스펙 기반 구현 - -이 파일은 기능별로 분리된 모듈들을 조합하여 사용합니다: -- rtf_constants.py: 상수 정의 -- rtf_models.py: 데이터 모델 (RTFCellInfo, RTFTable, RTFContentPart, RTFDocument) -- rtf_decoder.py: 인코딩/디코딩 유틸리티 -- rtf_text_cleaner.py: 텍스트 정리 유틸리티 -- rtf_metadata_extractor.py: 메타데이터 추출 -- rtf_table_extractor.py: 테이블 추출/파싱 -- rtf_content_extractor.py: 인라인 콘텐츠 추출 -- rtf_region_finder.py: 제외 영역 탐색 -- rtf_bin_processor.py: 바이너리 전처리 -""" -import logging -from typing import Optional, Set - -from contextifier.core.functions.img_processor import ImageProcessor - -# 모델 임포트 (외부에서 사용할 수 있도록) -from contextifier.core.processor.doc_helpers.rtf_models import ( - RTFCellInfo, - RTFTable, - RTFContentPart, - RTFDocument, -) - -# 디코더 임포트 -from contextifier.core.processor.doc_helpers.rtf_decoder import ( - detect_encoding, - decode_content, - decode_hex_escapes, -) - -# 텍스트 클리너 임포트 -from 
contextifier.core.processor.doc_helpers.rtf_text_cleaner import ( - clean_rtf_text, - remove_shprslt_blocks, -) - -# 메타데이터 추출기 임포트 -from contextifier.core.processor.doc_helpers.rtf_metadata_extractor import ( - extract_metadata, -) - -# 테이블 추출기 임포트 -from contextifier.core.processor.doc_helpers.rtf_table_extractor import ( - extract_tables_with_positions, -) - -# 콘텐츠 추출기 임포트 -from contextifier.core.processor.doc_helpers.rtf_content_extractor import ( - extract_inline_content, - extract_text_legacy, -) - -# 바이너리 처리기 임포트 -from contextifier.core.processor.doc_helpers.rtf_bin_processor import ( - preprocess_rtf_binary, -) - -logger = logging.getLogger("document-processor") - - -class RTFParser: - """ - RTF 파일 파서 (리팩터링 버전) - - RTF 바이너리를 직접 파싱하여 텍스트, 테이블, 메타데이터를 추출합니다. - - 기능별로 분리된 모듈들을 조합하여 사용합니다. - """ - - def __init__( - self, - encoding: str = "cp949", - processed_images: Optional[Set[str]] = None, - image_processor: ImageProcessor = None - ): - """ - Args: - encoding: 기본 인코딩 (한글 문서는 보통 cp949) - processed_images: 처리된 이미지 해시 집합 (중복 방지) - image_processor: 이미지 처리기 - """ - self.encoding = encoding - self.processed_images = processed_images if processed_images is not None else set() - self.image_processor = image_processor - self.document = RTFDocument(encoding=encoding) - - # 파싱 상태 - self._content: str = "" - self._raw_content: bytes = b"" # 원본 바이너리 - self._image_tags = {} # 위치 -> 이미지 태그 - - def parse(self, content: bytes) -> RTFDocument: - """ - RTF 바이너리를 파싱합니다. - - Args: - content: RTF 파일 바이트 데이터 - - Returns: - 파싱된 RTFDocument 객체 - """ - self._raw_content = content - - # 바이너리 데이터 전처리 (\bin 태그 처리, 이미지 추출) - clean_content, self._image_tags = preprocess_rtf_binary( - content, - processed_images=self.processed_images, - image_processor=self.image_processor - ) - - # 이미지 태그를 문서에 저장 (유효한 태그만) - self.document.image_tags = [ - tag for tag in self._image_tags.values() - if tag and tag.strip() and '/uploads/.' 
not in tag - ] - - # 인코딩 감지 및 디코딩 - self.encoding = detect_encoding(clean_content, self.encoding) - self._content = decode_content(clean_content, self.encoding) - - # \shprslt 블록 제거 (중복 콘텐츠 방지) - self._content = remove_shprslt_blocks(self._content) - - # 메타데이터 추출 - self.document.metadata = extract_metadata(self._content, self.encoding) - - # 테이블 추출 (위치 정보 포함) - tables, table_regions = extract_tables_with_positions( - self._content, - self.encoding - ) - self.document.tables = tables - - # 인라인 콘텐츠 추출 (테이블 위치 유지) - self.document.content_parts = extract_inline_content( - self._content, - table_regions, - self.encoding - ) - - # 호환성을 위해 기존 text_content도 설정 - self.document.text_content = extract_text_legacy( - self._content, - self.encoding - ) - - return self.document - - -def parse_rtf( - content: bytes, - encoding: str = "cp949", - processed_images: Optional[Set[str]] = None, - image_processor: ImageProcessor = None -) -> RTFDocument: - """ - RTF 파일을 파싱합니다. - - 바이너리 이미지 데이터를 로컬에 저장하고 태그로 변환합니다. - - Args: - content: RTF 파일 바이트 데이터 - encoding: 기본 인코딩 - processed_images: 처리된 이미지 해시 집합 (중복 방지, optional) - image_processor: 이미지 처리기 - - Returns: - 파싱된 RTFDocument 객체 - """ - parser = RTFParser( - encoding=encoding, - processed_images=processed_images, - image_processor=image_processor - ) - return parser.parse(content) - - -# 하위 호환성을 위한 re-export -__all__ = [ - 'RTFParser', - 'RTFDocument', - 'RTFTable', - 'RTFCellInfo', - 'RTFContentPart', - 'parse_rtf', -] diff --git a/contextifier/core/processor/doc_helpers/rtf_region_finder.py b/contextifier/core/processor/doc_helpers/rtf_region_finder.py deleted file mode 100644 index 946ade0..0000000 --- a/contextifier/core/processor/doc_helpers/rtf_region_finder.py +++ /dev/null @@ -1,121 +0,0 @@ -# service/document_processor/processor/doc_helpers/rtf_region_finder.py -""" -RTF 영역 탐색기 - -RTF 문서에서 제외해야 할 영역(헤더, 푸터, 각주 등)을 찾는 기능을 제공합니다. 
-""" -import logging -import re -from typing import List, Tuple - -from contextifier.core.processor.doc_helpers.rtf_constants import ( - EXCLUDE_DESTINATION_KEYWORDS, -) - -logger = logging.getLogger("document-processor") - - -def find_excluded_regions(content: str) -> List[Tuple[int, int]]: - r""" - 문서 본문이 아닌 제외 영역을 찾습니다. - - RTF에서 \header, \footer, \footnote 등의 그룹은 본문이 아니므로 - 테이블 및 텍스트 추출에서 제외해야 합니다. - - 주의: RTF 테이블은 \trowd에서 시작하여 \row로 끝나는데, - footer/header 그룹이 \trowd만 포함하고 셀 내용과 \row는 그룹 밖에 - 있을 수 있습니다. 따라서 footer/header 그룹 안에서 시작하는 테이블의 - 전체 범위(\row까지)를 제외해야 합니다. - - 제외 대상: - - \header, \headerf, \headerl, \headerr (헤더) - - \footer, \footerf, \footerl, \footerr (푸터) - - \footnote, \ftnsep, \ftnsepc, \aftncn, \aftnsep, \aftnsepc (각주) - - \pntext, \pntxta, \pntxtb (번호 매기기 텍스트) - - 위 그룹 안에서 시작하는 테이블의 전체 범위 (\trowd ~ \row) - - Args: - content: RTF 콘텐츠 - - Returns: - 제외 영역 리스트 [(start, end), ...] - """ - excluded_regions = [] - - pattern = '|'.join(EXCLUDE_DESTINATION_KEYWORDS) - - for match in re.finditer(pattern, content): - keyword_start = match.start() - keyword_end = match.end() - - # 이 키워드가 속한 그룹의 시작점('{') 찾기 - group_start = keyword_start - search_back = min(keyword_start, 50) # 최대 50자 뒤로 검색 - for i in range(keyword_start - 1, keyword_start - search_back - 1, -1): - if i < 0: - break - if content[i] == '{': - group_start = i - break - elif content[i] == '}': - # 다른 그룹이 끝났으면 중단 - break - - # 그룹의 끝('}') 찾기 - 중첩 괄호 처리 - depth = 1 - i = keyword_end - while i < len(content) and depth > 0: - if content[i] == '{': - depth += 1 - elif content[i] == '}': - depth -= 1 - i += 1 - group_end = i - - # footer/header 그룹 안에 \trowd가 있으면, \row까지 확장 - group_content = content[group_start:group_end] - if '\\trowd' in group_content: - # 이 그룹 끝 이후에 매칭되는 \row 찾기 - row_match = re.search(r'\\row(?![a-z])', content[group_end:]) - if row_match: - # \row의 끝까지 제외 영역 확장 - extended_end = group_end + row_match.end() - group_end = extended_end - logger.debug(f"Extended excluded region to 
include table row: {group_start}~{group_end}") - - excluded_regions.append((group_start, group_end)) - - # 겹치는 영역 병합 및 정렬 - if not excluded_regions: - return [] - - excluded_regions.sort(key=lambda x: x[0]) - merged = [excluded_regions[0]] - - for start, end in excluded_regions[1:]: - last_start, last_end = merged[-1] - if start <= last_end: - # 겹치면 병합 - merged[-1] = (last_start, max(last_end, end)) - else: - merged.append((start, end)) - - logger.debug(f"Found {len(merged)} excluded regions (header/footer/footnote)") - return merged - - -def is_in_excluded_region(pos: int, excluded_regions: List[Tuple[int, int]]) -> bool: - """ - 주어진 위치가 제외 영역 안에 있는지 확인합니다. - - Args: - pos: 확인할 위치 - excluded_regions: 제외 영역 리스트 - - Returns: - 제외 영역 안에 있으면 True - """ - for start, end in excluded_regions: - if start <= pos < end: - return True - return False diff --git a/contextifier/core/processor/doc_helpers/rtf_table_extractor.py b/contextifier/core/processor/doc_helpers/rtf_table_extractor.py deleted file mode 100644 index 27de72d..0000000 --- a/contextifier/core/processor/doc_helpers/rtf_table_extractor.py +++ /dev/null @@ -1,307 +0,0 @@ -# service/document_processor/processor/doc_helpers/rtf_table_extractor.py -""" -RTF 테이블 추출기 - -RTF 문서에서 테이블을 추출하고 파싱하는 기능을 제공합니다. -""" -import logging -import re -from typing import List, Optional, Tuple - -from contextifier.core.processor.doc_helpers.rtf_models import ( - RTFCellInfo, - RTFTable, -) -from contextifier.core.processor.doc_helpers.rtf_decoder import ( - decode_hex_escapes, -) -from contextifier.core.processor.doc_helpers.rtf_text_cleaner import ( - clean_rtf_text, -) -from contextifier.core.processor.doc_helpers.rtf_region_finder import ( - find_excluded_regions, - is_in_excluded_region, -) - -logger = logging.getLogger("document-processor") - - -def extract_tables_with_positions( - content: str, - encoding: str = "cp949" -) -> Tuple[List[RTFTable], List[Tuple[int, int, RTFTable]]]: - r""" - RTF에서 테이블을 추출합니다 (위치 정보 포함). 
- - RTF 테이블 구조: - - \trowd: 테이블 행 시작 (row definition) - - \cellx: 셀 경계 위치 정의 - - \clmgf: 수평 병합 시작 - - \clmrg: 수평 병합 계속 - - \clvmgf: 수직 병합 시작 - - \clvmrg: 수직 병합 계속 - - \intbl: 셀 내 단락 - - \cell: 셀 끝 - - \row: 행 끝 - - Args: - content: RTF 문자열 콘텐츠 - encoding: 사용할 인코딩 - - Returns: - (테이블 리스트, 테이블 영역 리스트) 튜플 - """ - tables = [] - table_regions = [] - - # 제외 영역 찾기 (header, footer, footnote 등) - excluded_regions = find_excluded_regions(content) - - # 1단계: \row로 끝나는 모든 위치 찾기 - row_positions = [] - for match in re.finditer(r'\\row(?![a-z])', content): - row_positions.append(match.end()) - - if not row_positions: - return tables, table_regions - - # 2단계: 각 \row 전에 있는 \trowd 찾기 (해당 행의 시작) - all_rows = [] - for i, row_end in enumerate(row_positions): - # 이전 \row 위치 또는 시작점 - if i == 0: - search_start = 0 - else: - search_start = row_positions[i - 1] - - # 이 영역에서 첫 번째 \trowd 찾기 - segment = content[search_start:row_end] - trowd_match = re.search(r'\\trowd', segment) - - if trowd_match: - row_start = search_start + trowd_match.start() - - # 제외 영역(header/footer/footnote) 안에 있는 행은 무시 - if is_in_excluded_region(row_start, excluded_regions): - logger.debug(f"Skipping table row at {row_start} (in header/footer/footnote)") - continue - - row_text = content[row_start:row_end] - all_rows.append((row_start, row_end, row_text)) - - if not all_rows: - return tables, table_regions - - # 연속된 행들을 테이블로 그룹화 - table_groups = [] # [(start_pos, end_pos, [row_texts])] - current_table = [] - current_start = -1 - current_end = -1 - prev_end = -1 - - for row_start, row_end, row_text in all_rows: - # 이전 행과 150자 이내면 같은 테이블 - if prev_end == -1 or row_start - prev_end < 150: - if current_start == -1: - current_start = row_start - current_table.append(row_text) - current_end = row_end - else: - if current_table: - table_groups.append((current_start, current_end, current_table)) - current_table = [row_text] - current_start = row_start - current_end = row_end - prev_end = row_end - - if current_table: - 
table_groups.append((current_start, current_end, current_table)) - - logger.info(f"Found {len(table_groups)} table groups") - - # 각 테이블 그룹 파싱 - for start_pos, end_pos, table_rows in table_groups: - table = _parse_table_with_merge(table_rows, encoding) - if table and table.rows: - table.position = start_pos - table.end_position = end_pos - tables.append(table) - table_regions.append((start_pos, end_pos, table)) - - logger.info(f"Extracted {len(tables)} tables") - return tables, table_regions - - -def _parse_table_with_merge(rows: List[str], encoding: str = "cp949") -> Optional[RTFTable]: - """ - 테이블 행들을 파싱하여 RTFTable 객체로 변환 (병합 셀 지원) - - Args: - rows: 테이블 행 텍스트 리스트 - encoding: 사용할 인코딩 - - Returns: - RTFTable 객체 - """ - table = RTFTable() - - for row_text in rows: - cells = _extract_cells_with_merge(row_text, encoding) - if cells: - table.rows.append(cells) - if len(cells) > table.col_count: - table.col_count = len(cells) - - return table if table.rows else None - - -def _extract_cells_with_merge(row_text: str, encoding: str = "cp949") -> List[RTFCellInfo]: - """ - 테이블 행에서 셀 내용과 병합 정보를 추출합니다. 
- - Args: - row_text: 테이블 행 RTF 텍스트 - encoding: 사용할 인코딩 - - Returns: - RTFCellInfo 리스트 - """ - cells = [] - - # 1단계: 셀 정의 파싱 (cellx 전까지의 속성들) - cell_defs = [] - - # \cell 다음에 x가 오지 않는 첫 번째 \cell 찾기 - first_cell_idx = -1 - pos = 0 - while True: - idx = row_text.find('\\cell', pos) - if idx == -1: - first_cell_idx = len(row_text) - break - # \cell 다음이 x인지 확인 (\cellx는 건너뜀) - if idx + 5 < len(row_text) and row_text[idx + 5] == 'x': - pos = idx + 1 - continue - first_cell_idx = idx - break - - def_part = row_text[:first_cell_idx] - - current_def = { - 'h_merge_first': False, - 'h_merge_cont': False, - 'v_merge_first': False, - 'v_merge_cont': False, - 'right_boundary': 0 - } - - cell_def_pattern = r'\\cl(?:mgf|mrg|vmgf|vmrg)|\\cellx(-?\d+)' - - for match in re.finditer(cell_def_pattern, def_part): - token = match.group() - if token == '\\clmgf': - current_def['h_merge_first'] = True - elif token == '\\clmrg': - current_def['h_merge_cont'] = True - elif token == '\\clvmgf': - current_def['v_merge_first'] = True - elif token == '\\clvmrg': - current_def['v_merge_cont'] = True - elif token.startswith('\\cellx'): - if match.group(1): - current_def['right_boundary'] = int(match.group(1)) - cell_defs.append(current_def.copy()) - # 다음 셀을 위해 초기화 - current_def = { - 'h_merge_first': False, - 'h_merge_cont': False, - 'v_merge_first': False, - 'v_merge_cont': False, - 'right_boundary': 0 - } - - # 2단계: 셀 내용 추출 - cell_texts = _extract_cell_texts(row_text, encoding) - - # 3단계: 셀 정의와 내용 매칭 - for i, cell_text in enumerate(cell_texts): - if i < len(cell_defs): - cell_def = cell_defs[i] - else: - cell_def = { - 'h_merge_first': False, - 'h_merge_cont': False, - 'v_merge_first': False, - 'v_merge_cont': False, - 'right_boundary': 0 - } - - cells.append(RTFCellInfo( - text=cell_text, - h_merge_first=cell_def['h_merge_first'], - h_merge_cont=cell_def['h_merge_cont'], - v_merge_first=cell_def['v_merge_first'], - v_merge_cont=cell_def['v_merge_cont'], - 
right_boundary=cell_def['right_boundary'] - )) - - return cells - - -def _extract_cell_texts(row_text: str, encoding: str = "cp949") -> List[str]: - r""" - 행에서 셀 텍스트만 추출합니다. - - Args: - row_text: 테이블 행 RTF 텍스트 - encoding: 사용할 인코딩 - - Returns: - 셀 텍스트 리스트 - """ - cell_texts = [] - - # 1단계: 모든 \cell 위치 찾기 (cellx가 아닌 순수 \cell만) - cell_positions = [] - pos = 0 - while True: - idx = row_text.find('\\cell', pos) - if idx == -1: - break - # \cell 다음이 x인지 확인 - next_pos = idx + 5 - if next_pos < len(row_text) and row_text[next_pos] == 'x': - pos = idx + 1 - continue - cell_positions.append(idx) - pos = idx + 1 - - if not cell_positions: - return cell_texts - - # 2단계: 첫 번째 \cell 위치 이전에서 마지막 \cellx 찾기 - first_cell_pos = cell_positions[0] - def_part = row_text[:first_cell_pos] - - last_cellx_end = 0 - for match in re.finditer(r'\\cellx-?\d+', def_part): - last_cellx_end = match.end() - - if last_cellx_end == 0: - last_cellx_end = 0 - - # 3단계: 각 셀 내용 추출 - prev_end = last_cellx_end - for cell_end in cell_positions: - cell_content = row_text[prev_end:cell_end] - - # RTF 디코딩 및 클리닝 - decoded = decode_hex_escapes(cell_content, encoding) - clean = clean_rtf_text(decoded, encoding) - cell_texts.append(clean) - - # 다음 셀은 \cell 다음부터 - prev_end = cell_end + 5 # len('\\cell') = 5 - - return cell_texts diff --git a/contextifier/core/processor/docx_handler.py b/contextifier/core/processor/docx_handler.py index b83b885..9de1418 100644 --- a/contextifier/core/processor/docx_handler.py +++ b/contextifier/core/processor/docx_handler.py @@ -45,14 +45,13 @@ from contextifier.core.processor.docx_helper import ( # Constants ElementType, - # Metadata - extract_docx_metadata, - format_metadata, # Table process_table_element, # Paragraph process_paragraph_element, ) +from contextifier.core.processor.docx_helper.docx_metadata import DOCXMetadataExtractor +from contextifier.core.processor.docx_helper.docx_image_processor import DOCXImageProcessor logger = logging.getLogger("document-processor") @@ 
-64,24 +63,47 @@ class DOCXHandler(BaseHandler): """ DOCX Document Processing Handler - + Inherits from BaseHandler to manage config and image_processor at instance level. - + Fallback Chain: 1. Enhanced DOCX processing (python-docx with BytesIO stream) 2. DOCHandler fallback (for non-ZIP files: RTF, OLE, HTML, etc.) 3. Simple text extraction 4. Error message - + Usage: handler = DOCXHandler(config=config, image_processor=image_processor) text = handler.extract_text(current_file) """ - + + def _create_file_converter(self): + """Create DOCX-specific file converter.""" + from contextifier.core.processor.docx_helper.docx_file_converter import DOCXFileConverter + return DOCXFileConverter() + + def _create_preprocessor(self): + """Create DOCX-specific preprocessor.""" + from contextifier.core.processor.docx_helper.docx_preprocessor import DOCXPreprocessor + return DOCXPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """Create DOCX-specific chart extractor.""" return DOCXChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Create DOCX-specific metadata extractor.""" + return DOCXMetadataExtractor() + + def _create_format_image_processor(self): + """Create DOCX-specific image processor.""" + return DOCXImageProcessor( + directory_path=self._image_processor.config.directory_path, + tag_prefix=self._image_processor.config.tag_prefix, + tag_suffix=self._image_processor.config.tag_suffix, + storage_backend=self._image_processor.storage_backend, + ) + def extract_text( self, current_file: "CurrentFile", @@ -90,36 +112,27 @@ def extract_text( ) -> str: """ Extract text from DOCX file. 
- + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata **kwargs: Additional options - + Returns: Extracted text (with inline image tags, table HTML) """ file_path = current_file.get("file_path", "unknown") + file_data = current_file.get("file_data", b"") self.logger.info(f"DOCX processing: {file_path}") - - # Check if file is a valid ZIP (DOCX is a ZIP-based format) - if self._is_valid_zip(current_file): + + # Check if file is a valid DOCX using file_converter validation + if self.file_converter.validate(file_data): return self._extract_docx_enhanced(current_file, extract_metadata) else: - # Not a valid ZIP, try DOCHandler fallback - self.logger.warning(f"File is not a valid ZIP, trying DOCHandler fallback: {file_path}") + # Not a valid DOCX, try DOCHandler fallback + self.logger.warning(f"File is not a valid DOCX, trying DOCHandler fallback: {file_path}") return self._extract_with_doc_handler_fallback(current_file, extract_metadata) - - def _is_valid_zip(self, current_file: "CurrentFile") -> bool: - """Check if file is a valid ZIP archive.""" - try: - file_stream = self.get_file_stream(current_file) - with zipfile.ZipFile(file_stream, 'r') as zf: - # Check for DOCX-specific content - return '[Content_Types].xml' in zf.namelist() - except (zipfile.BadZipFile, Exception): - return False - + def _extract_with_doc_handler_fallback( self, current_file: "CurrentFile", @@ -127,41 +140,41 @@ def _extract_with_doc_handler_fallback( ) -> str: """ Fallback to DOCHandler for non-ZIP files. - + Handles RTF, OLE, HTML, and other formats that might be incorrectly named as .docx files. 
""" file_path = current_file.get("file_path", "unknown") - + try: from contextifier.core.processor.doc_handler import DOCHandler - + doc_handler = DOCHandler( config=self.config, - image_processor=self.image_processor + image_processor=self.format_image_processor ) - + # DOCHandler still uses file_path, so pass it directly result = doc_handler.extract_text(current_file, extract_metadata=extract_metadata) - + if result and not result.startswith("[DOC"): self.logger.info(f"DOCHandler fallback successful for: {file_path}") return result else: # DOCHandler also failed, try simple extraction return self._extract_simple_text_fallback(current_file) - + except Exception as e: self.logger.error(f"DOCHandler fallback failed: {e}") return self._extract_simple_text_fallback(current_file) - + def _extract_simple_text_fallback(self, current_file: "CurrentFile") -> str: """ Last resort: try to extract any readable text from the file. """ file_path = current_file.get("file_path", "unknown") file_data = current_file.get("file_data", b"") - + try: # Try different encodings for encoding in ['utf-8', 'cp949', 'euc-kr', 'latin-1']: @@ -171,20 +184,20 @@ def _extract_simple_text_fallback(self, current_file: "CurrentFile") -> str: import re text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]', '', text) text = text.strip() - + if text and len(text) > 50: # Must have meaningful content self.logger.info(f"Simple text extraction successful with {encoding}: {file_path}") return text except (UnicodeDecodeError, Exception): continue - + raise ValueError("Could not decode file with any known encoding") - + except Exception as e: self.logger.error(f"All extraction methods failed for: {file_path}") raise RuntimeError(f"DOCX file processing failed: {file_path}. " f"File is not a valid DOCX, DOC, RTF, or text file.") - + def _extract_docx_enhanced( self, current_file: "CurrentFile", @@ -192,7 +205,7 @@ def _extract_docx_enhanced( ) -> str: """ Enhanced DOCX processing. 
- + - Document order preservation (body element traversal) - Metadata extraction - Inline image extraction and local saving @@ -201,12 +214,17 @@ def _extract_docx_enhanced( - Page break handling """ file_path = current_file.get("file_path", "unknown") + file_data = current_file.get("file_data", b"") self.logger.info(f"Enhanced DOCX processing: {file_path}") try: - # Use BytesIO stream to avoid path encoding issues - file_stream = self.get_file_stream(current_file) - doc = Document(file_stream) + # Step 1: Use file_converter to convert binary to Document + doc = self.file_converter.convert(file_data) + + # Step 2: Preprocess - may transform doc in the future + preprocessed = self.preprocess(doc) + doc = preprocessed.clean_content # TRUE SOURCE + result_parts = [] processed_images: Set[str] = set() current_page = 1 @@ -215,10 +233,10 @@ def _extract_docx_enhanced( total_charts = 0 # Pre-extract all charts using ChartExtractor - file_stream.seek(0) + file_stream = self.get_file_stream(current_file) chart_data_list = self.chart_extractor.extract_all_from_file(file_stream) chart_idx = [0] # Mutable container for closure - + def get_next_chart() -> str: """Callback to get the next pre-extracted chart content.""" if chart_idx[0] < len(chart_data_list): @@ -229,11 +247,10 @@ def get_next_chart() -> str: # Metadata extraction if extract_metadata: - metadata = extract_docx_metadata(doc) - metadata_str = format_metadata(metadata) + metadata_str = self.extract_and_format_metadata(doc) if metadata_str: result_parts.append(metadata_str + "\n\n") - self.logger.info(f"DOCX metadata extracted: {list(metadata.keys())}") + self.logger.info(f"DOCX metadata extracted") # Start page 1 page_tag = self.create_page_tag(current_page) @@ -247,7 +264,7 @@ def get_next_chart() -> str: # Paragraph processing - pass chart_callback for pre-extracted charts content, has_page_break, img_count, chart_count = process_paragraph_element( body_elem, doc, processed_images, file_path, - 
image_processor=self.image_processor, + image_processor=self.format_image_processor, chart_callback=get_next_chart ) @@ -281,14 +298,14 @@ def get_next_chart() -> str: self.logger.error(f"Error in enhanced DOCX processing: {e}") self.logger.debug(traceback.format_exc()) return self._extract_docx_simple_text(current_file) - + def _format_chart_data(self, chart_data) -> str: """Format ChartData using ChartProcessor.""" from contextifier.core.functions.chart_extractor import ChartData - + if not isinstance(chart_data, ChartData): return "" - + if chart_data.has_data(): return self.chart_processor.format_chart_data( chart_type=chart_data.chart_type, @@ -301,12 +318,12 @@ def _format_chart_data(self, chart_data) -> str: chart_type=chart_data.chart_type, title=chart_data.title ) - + def _extract_docx_simple_text(self, current_file: "CurrentFile") -> str: """Simple text extraction (fallback).""" try: - file_stream = self.get_file_stream(current_file) - doc = Document(file_stream) + file_data = current_file.get("file_data", b"") + doc = self.file_converter.convert(file_data) result_parts = [] for para in doc.paragraphs: diff --git a/contextifier/core/processor/docx_helper/__init__.py b/contextifier/core/processor/docx_helper/__init__.py index e04ad87..e4a572a 100644 --- a/contextifier/core/processor/docx_helper/__init__.py +++ b/contextifier/core/processor/docx_helper/__init__.py @@ -1,17 +1,16 @@ -# service/document_processor/processor/docx_helper/__init__.py +# contextifier/core/processor/docx_helper/__init__.py """ -DOCX Helper 모듈 +DOCX Helper Module -DOCX 문서 처리에 필요한 유틸리티를 기능별로 분리한 모듈입니다. +Utility modules for DOCX document processing. 
-모듈 구성: -- docx_constants: 상수, Enum, 데이터클래스 (ElementType, NAMESPACES 등) -- docx_metadata: 메타데이터 추출 및 포맷팅 -- docx_chart_extractor: 차트 추출 (ChartExtractor) -- docx_image: 이미지 추출 및 업로드 -- docx_table: 테이블 HTML 변환 (rowspan/colspan 지원) -- docx_drawing: Drawing 요소 처리 (이미지/다이어그램) -- docx_paragraph: Paragraph 처리 및 페이지 브레이크 +Module structure: +- docx_constants: Constants, Enum, dataclasses (ElementType, NAMESPACES, etc.) +- docx_metadata: Metadata extraction (DOCXMetadataExtractor) +- docx_chart_extractor: Chart extraction (DOCXChartExtractor) +- docx_image_processor: Image/drawing processing (DOCXImageProcessor) +- docx_table: Table HTML conversion (rowspan/colspan support) +- docx_paragraph: Paragraph processing and page breaks """ # Constants @@ -24,8 +23,7 @@ # Metadata from contextifier.core.processor.docx_helper.docx_metadata import ( - extract_docx_metadata, - format_metadata, + DOCXMetadataExtractor, ) # Chart Extractor @@ -33,10 +31,9 @@ DOCXChartExtractor, ) -# Image -from contextifier.core.processor.docx_helper.docx_image import ( - extract_image_from_drawing, - process_pict_element, +# Image Processor (replaces docx_image.py utility functions) +from contextifier.core.processor.docx_helper.docx_image_processor import ( + DOCXImageProcessor, ) # Table @@ -49,11 +46,6 @@ extract_table_as_text, ) -# Drawing -from contextifier.core.processor.docx_helper.docx_drawing import ( - process_drawing_element, -) - # Paragraph from contextifier.core.processor.docx_helper.docx_paragraph import ( process_paragraph_element, @@ -68,13 +60,11 @@ 'NAMESPACES', 'CHART_TYPE_MAP', # Metadata - 'extract_docx_metadata', - 'format_metadata', + 'DOCXMetadataExtractor', # Chart Extractor 'DOCXChartExtractor', - # Image - 'extract_image_from_drawing', - 'process_pict_element', + # Image Processor + 'DOCXImageProcessor', # Table 'TableCellInfo', 'process_table_element', @@ -82,8 +72,6 @@ 'estimate_column_count', 'extract_cell_text', 'extract_table_as_text', - # Drawing - 
'process_drawing_element', # Paragraph 'process_paragraph_element', 'has_page_break_element', diff --git a/contextifier/core/processor/docx_helper/docx_drawing.py b/contextifier/core/processor/docx_helper/docx_drawing.py deleted file mode 100644 index 3fab728..0000000 --- a/contextifier/core/processor/docx_helper/docx_drawing.py +++ /dev/null @@ -1,121 +0,0 @@ -# service/document_processor/processor/docx_helper/docx_drawing.py -""" -DOCX Drawing Element Processing Utility - -Processes Drawing elements (images, charts, diagrams) in DOCX documents. -- process_drawing_element: Process Drawing element (branch to image/chart/diagram) -- extract_diagram_from_drawing: Extract diagram from Drawing - -Note: Chart extraction is handled separately by DOCXChartExtractor. - This module only detects chart presence for counting/positioning. -""" -import logging -from typing import Optional, Set, Tuple, Callable - -from docx import Document - -from contextifier.core.processor.docx_helper.docx_constants import ElementType, NAMESPACES -from contextifier.core.processor.docx_helper.docx_image import extract_image_from_drawing -from contextifier.core.functions.img_processor import ImageProcessor - -logger = logging.getLogger("document-processor") - - -def process_drawing_element( - drawing_elem, - doc: Document, - processed_images: Set[str], - file_path: str = None, - image_processor: Optional[ImageProcessor] = None, - chart_callback: Optional[Callable[[], str]] = None -) -> Tuple[str, Optional[ElementType]]: - """ - Process Drawing element (image, chart, diagram). - - Args: - drawing_elem: drawing XML element - doc: python-docx Document object - processed_images: Set of processed image paths (deduplication) - file_path: Original file path - image_processor: ImageProcessor instance - chart_callback: Callback function to get next chart content. - Called when chart is detected, should return formatted chart string. 
- - Returns: - (content, element_type) tuple - """ - try: - # Check inline or anchor - inline = drawing_elem.find('.//wp:inline', NAMESPACES) - anchor = drawing_elem.find('.//wp:anchor', NAMESPACES) - - container = inline if inline is not None else anchor - if container is None: - return "", None - - # Check graphic data - graphic = container.find('.//a:graphic', NAMESPACES) - if graphic is None: - return "", None - - graphic_data = graphic.find('a:graphicData', NAMESPACES) - if graphic_data is None: - return "", None - - uri = graphic_data.get('uri', '') - - # Image case - if 'picture' in uri.lower(): - return extract_image_from_drawing(graphic_data, doc, processed_images, image_processor) - - # Chart case - use callback to get pre-extracted chart content - if 'chart' in uri.lower(): - if chart_callback: - chart_content = chart_callback() - return chart_content, ElementType.CHART - return "", ElementType.CHART - - # Diagram case - if 'diagram' in uri.lower(): - return extract_diagram_from_drawing(graphic_data, doc) - - # Other drawing - return "", None - - except Exception as e: - logger.warning(f"Error processing drawing element: {e}") - return "", None - - -def extract_diagram_from_drawing(graphic_data, doc: Document) -> Tuple[str, Optional[ElementType]]: - """ - Extract diagram information from Drawing. 
- - Args: - graphic_data: graphicData XML element - doc: python-docx Document object - - Returns: - (content, element_type) tuple - """ - try: - # Try to extract text from diagram - texts = [] - for t_elem in graphic_data.findall('.//{http://schemas.openxmlformats.org/drawingml/2006/main}t'): - if t_elem.text: - texts.append(t_elem.text.strip()) - - if texts: - return f"[Diagram: {' / '.join(texts)}]", ElementType.DIAGRAM - - return "[Diagram]", ElementType.DIAGRAM - - except Exception as e: - logger.warning(f"Error extracting diagram from drawing: {e}") - return "[Diagram]", ElementType.DIAGRAM - - -__all__ = [ - 'process_drawing_element', - 'extract_diagram_from_drawing', -] diff --git a/contextifier/core/processor/docx_helper/docx_drawing_new.py b/contextifier/core/processor/docx_helper/docx_drawing_new.py deleted file mode 100644 index 3fab728..0000000 --- a/contextifier/core/processor/docx_helper/docx_drawing_new.py +++ /dev/null @@ -1,121 +0,0 @@ -# service/document_processor/processor/docx_helper/docx_drawing.py -""" -DOCX Drawing Element Processing Utility - -Processes Drawing elements (images, charts, diagrams) in DOCX documents. -- process_drawing_element: Process Drawing element (branch to image/chart/diagram) -- extract_diagram_from_drawing: Extract diagram from Drawing - -Note: Chart extraction is handled separately by DOCXChartExtractor. - This module only detects chart presence for counting/positioning. 
-""" -import logging -from typing import Optional, Set, Tuple, Callable - -from docx import Document - -from contextifier.core.processor.docx_helper.docx_constants import ElementType, NAMESPACES -from contextifier.core.processor.docx_helper.docx_image import extract_image_from_drawing -from contextifier.core.functions.img_processor import ImageProcessor - -logger = logging.getLogger("document-processor") - - -def process_drawing_element( - drawing_elem, - doc: Document, - processed_images: Set[str], - file_path: str = None, - image_processor: Optional[ImageProcessor] = None, - chart_callback: Optional[Callable[[], str]] = None -) -> Tuple[str, Optional[ElementType]]: - """ - Process Drawing element (image, chart, diagram). - - Args: - drawing_elem: drawing XML element - doc: python-docx Document object - processed_images: Set of processed image paths (deduplication) - file_path: Original file path - image_processor: ImageProcessor instance - chart_callback: Callback function to get next chart content. - Called when chart is detected, should return formatted chart string. 
- - Returns: - (content, element_type) tuple - """ - try: - # Check inline or anchor - inline = drawing_elem.find('.//wp:inline', NAMESPACES) - anchor = drawing_elem.find('.//wp:anchor', NAMESPACES) - - container = inline if inline is not None else anchor - if container is None: - return "", None - - # Check graphic data - graphic = container.find('.//a:graphic', NAMESPACES) - if graphic is None: - return "", None - - graphic_data = graphic.find('a:graphicData', NAMESPACES) - if graphic_data is None: - return "", None - - uri = graphic_data.get('uri', '') - - # Image case - if 'picture' in uri.lower(): - return extract_image_from_drawing(graphic_data, doc, processed_images, image_processor) - - # Chart case - use callback to get pre-extracted chart content - if 'chart' in uri.lower(): - if chart_callback: - chart_content = chart_callback() - return chart_content, ElementType.CHART - return "", ElementType.CHART - - # Diagram case - if 'diagram' in uri.lower(): - return extract_diagram_from_drawing(graphic_data, doc) - - # Other drawing - return "", None - - except Exception as e: - logger.warning(f"Error processing drawing element: {e}") - return "", None - - -def extract_diagram_from_drawing(graphic_data, doc: Document) -> Tuple[str, Optional[ElementType]]: - """ - Extract diagram information from Drawing. 
- - Args: - graphic_data: graphicData XML element - doc: python-docx Document object - - Returns: - (content, element_type) tuple - """ - try: - # Try to extract text from diagram - texts = [] - for t_elem in graphic_data.findall('.//{http://schemas.openxmlformats.org/drawingml/2006/main}t'): - if t_elem.text: - texts.append(t_elem.text.strip()) - - if texts: - return f"[Diagram: {' / '.join(texts)}]", ElementType.DIAGRAM - - return "[Diagram]", ElementType.DIAGRAM - - except Exception as e: - logger.warning(f"Error extracting diagram from drawing: {e}") - return "[Diagram]", ElementType.DIAGRAM - - -__all__ = [ - 'process_drawing_element', - 'extract_diagram_from_drawing', -] diff --git a/contextifier/core/processor/docx_helper/docx_file_converter.py b/contextifier/core/processor/docx_helper/docx_file_converter.py new file mode 100644 index 0000000..38e278d --- /dev/null +++ b/contextifier/core/processor/docx_helper/docx_file_converter.py @@ -0,0 +1,75 @@ +# libs/core/processor/docx_helper/docx_file_converter.py +""" +DOCXFileConverter - DOCX file format converter + +Converts binary DOCX data to python-docx Document object. +""" +from io import BytesIO +from typing import Any, Optional, BinaryIO +import zipfile + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class DOCXFileConverter(BaseFileConverter): + """ + DOCX file converter using python-docx. + + Converts binary DOCX data to Document object. + """ + + # ZIP magic number (DOCX is a ZIP file) + ZIP_MAGIC = b'PK\x03\x04' + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> Any: + """ + Convert binary DOCX data to Document object. 
+ + Args: + file_data: Raw binary DOCX data + file_stream: Optional file stream + **kwargs: Additional options + + Returns: + docx.Document object + + Raises: + Exception: If DOCX cannot be opened + """ + from docx import Document + + stream = file_stream if file_stream is not None else BytesIO(file_data) + stream.seek(0) + return Document(stream) + + def get_format_name(self) -> str: + """Return format name.""" + return "DOCX Document" + + def validate(self, file_data: bytes) -> bool: + """ + Validate if data is a valid DOCX (ZIP with specific structure). + + Args: + file_data: Raw binary file data + + Returns: + True if file appears to be a DOCX + """ + if not file_data or len(file_data) < 4: + return False + + if not file_data[:4] == self.ZIP_MAGIC: + return False + + # Check for DOCX-specific content + try: + with zipfile.ZipFile(BytesIO(file_data), 'r') as zf: + return '[Content_Types].xml' in zf.namelist() + except zipfile.BadZipFile: + return False diff --git a/contextifier/core/processor/docx_helper/docx_image.py b/contextifier/core/processor/docx_helper/docx_image.py index e066597..e515c59 100644 --- a/contextifier/core/processor/docx_helper/docx_image.py +++ b/contextifier/core/processor/docx_helper/docx_image.py @@ -5,15 +5,20 @@ DOCX 문서에서 이미지를 추출하고 로컬에 저장합니다. - extract_image_from_drawing: Drawing 요소에서 이미지 추출 - process_pict_element: 레거시 VML pict 요소 처리 + +Note: 이 함수들은 DOCXImageProcessor의 메서드를 호출하는 wrapper입니다. + 실제 로직은 DOCXImageProcessor에 통합되어 있습니다. 
""" import logging -from typing import Optional, Set, Tuple +from typing import Optional, Set, Tuple, TYPE_CHECKING from docx import Document -from docx.oxml.ns import qn -from contextifier.core.functions.img_processor import ImageProcessor -from contextifier.core.processor.docx_helper.docx_constants import ElementType, NAMESPACES +from contextifier.core.processor.docx_helper.docx_constants import ElementType + +if TYPE_CHECKING: + from contextifier.core.processor.docx_helper.docx_image_processor import DOCXImageProcessor + from contextifier.core.functions.img_processor import ImageProcessor logger = logging.getLogger("document-processor") @@ -22,7 +27,7 @@ def extract_image_from_drawing( graphic_data, doc: Document, processed_images: Set[str], - image_processor: ImageProcessor + image_processor: "ImageProcessor" ) -> Tuple[str, Optional[ElementType]]: """ Drawing에서 이미지를 추출합니다. @@ -31,39 +36,42 @@ def extract_image_from_drawing( graphic_data: graphicData XML 요소 doc: python-docx Document 객체 processed_images: 처리된 이미지 경로 집합 (중복 방지) - image_processor: ImageProcessor 인스턴스 + image_processor: ImageProcessor 인스턴스 (DOCXImageProcessor 권장) Returns: (content, element_type) 튜플 """ + # DOCXImageProcessor인 경우 통합된 메서드 사용 + if hasattr(image_processor, 'extract_from_drawing'): + content, is_image = image_processor.extract_from_drawing( + graphic_data, doc, processed_images + ) + return (content, ElementType.IMAGE) if is_image else ("", None) + + # Fallback: 기존 로직 (ImageProcessor 기본 클래스인 경우) + from docx.oxml.ns import qn + from contextifier.core.processor.docx_helper.docx_constants import NAMESPACES try: - # blip 요소 찾기 (이미지 참조) blip = graphic_data.find('.//a:blip', NAMESPACES) if blip is None: return "", None - # Relationship ID r_embed = blip.get(qn('r:embed')) r_link = blip.get(qn('r:link')) - rId = r_embed or r_link + if not rId: return "", None - # Relationship에서 이미지 파트 찾기 try: rel = doc.part.rels.get(rId) if rel is None: return "", None - # 이미지 데이터 추출 if hasattr(rel, 
'target_part') and hasattr(rel.target_part, 'blob'): image_data = rel.target_part.blob - - # 로컬에 저장 image_tag = image_processor.save_image(image_data, processed_images=processed_images) - if image_tag: return f"\n{image_tag}\n", ElementType.IMAGE @@ -82,7 +90,7 @@ def process_pict_element( pict_elem, doc: Document, processed_images: Set[str], - image_processor: ImageProcessor + image_processor: "ImageProcessor" ) -> str: """ 레거시 VML pict 요소를 처리합니다. @@ -91,14 +99,17 @@ def process_pict_element( pict_elem: pict XML 요소 doc: python-docx Document 객체 processed_images: 처리된 이미지 경로 집합 (중복 방지) - image_processor: ImageProcessor 인스턴스 + image_processor: ImageProcessor 인스턴스 (DOCXImageProcessor 권장) Returns: 이미지 마크업 문자열 """ + # DOCXImageProcessor인 경우 통합된 메서드 사용 + if hasattr(image_processor, 'extract_from_pict'): + return image_processor.extract_from_pict(pict_elem, doc, processed_images) + # Fallback: 기존 로직 (ImageProcessor 기본 클래스인 경우) try: - # VML imagedata 찾기 ns_v = 'urn:schemas-microsoft-com:vml' ns_r = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' diff --git a/contextifier/core/processor/docx_helper/docx_image_processor.py b/contextifier/core/processor/docx_helper/docx_image_processor.py new file mode 100644 index 0000000..a9966c9 --- /dev/null +++ b/contextifier/core/processor/docx_helper/docx_image_processor.py @@ -0,0 +1,410 @@ +# contextifier/core/processor/docx_helper/docx_image_processor.py +""" +DOCX Image Processor + +Provides DOCX-specific image processing that inherits from ImageProcessor. +Handles embedded images, drawing elements (image/diagram), and relationship-based images. 
+ +This class consolidates all DOCX image and drawing extraction logic including: +- Drawing/picture element extraction (blip) +- Diagram text extraction from drawings +- Legacy VML pict element processing +- Relationship-based image loading +""" +import logging +from typing import Any, Callable, Dict, List, Optional, Set, Tuple, TYPE_CHECKING + +from docx.oxml.ns import qn + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend +from contextifier.core.processor.docx_helper.docx_constants import ElementType + +if TYPE_CHECKING: + from docx import Document + from docx.opc.part import Part + +logger = logging.getLogger("contextify.image_processor.docx") + +# DOCX XML namespaces +NAMESPACES = { + 'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main', + 'wp': 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing', + 'a': 'http://schemas.openxmlformats.org/drawingml/2006/main', + 'r': 'http://schemas.openxmlformats.org/officeDocument/2006/relationships', + 'pic': 'http://schemas.openxmlformats.org/drawingml/2006/picture', +} + + +class DOCXImageProcessor(ImageProcessor): + """ + DOCX-specific image processor. + + Inherits from ImageProcessor and provides DOCX-specific processing. + + Handles: + - Embedded images via relationships + - Drawing/picture elements + - Inline images in runs + - Shape images + + Example: + processor = DOCXImageProcessor() + + # Process relationship-based image + tag = processor.process_image(image_data, rel_id="rId1") + + # Process from part + tag = processor.process_image_part(image_part) + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize DOCXImageProcessor. 
+ + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + + def process_image( + self, + image_data: bytes, + rel_id: Optional[str] = None, + image_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save DOCX image data. + + Args: + image_data: Raw image binary data + rel_id: Relationship ID (for naming) + image_name: Original image name + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = image_name + if custom_name is None and rel_id is not None: + custom_name = f"docx_{rel_id}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_image_part( + self, + image_part: "Part", + rel_id: Optional[str] = None, + ) -> Optional[str]: + """ + Process image from OOXML part. + + Args: + image_part: OOXML Part containing image data + rel_id: Relationship ID + + Returns: + Image tag string, or None on failure + """ + try: + image_data = image_part.blob + if not image_data: + return None + + # Try to get original filename + image_name = None + if hasattr(image_part, 'partname'): + partname = str(image_part.partname) + if '/' in partname: + image_name = partname.split('/')[-1] + + return self.process_image( + image_data, + rel_id=rel_id, + image_name=image_name + ) + + except Exception as e: + self._logger.warning(f"Failed to process image part: {e}") + return None + + def process_embedded_image( + self, + image_data: bytes, + image_name: Optional[str] = None, + embed_id: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process embedded DOCX image. 
+ + Args: + image_data: Image binary data + image_name: Original image filename + embed_id: Embed relationship ID + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = image_name + if custom_name is None and embed_id is not None: + custom_name = f"docx_embed_{embed_id}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_drawing_image( + self, + image_data: bytes, + drawing_id: Optional[str] = None, + description: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process DOCX drawing/picture element image. + + Args: + image_data: Image binary data + drawing_id: Drawing element ID + description: Image description/alt text + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = None + if drawing_id is not None: + custom_name = f"docx_drawing_{drawing_id}" + + return self.save_image(image_data, custom_name=custom_name) + + def extract_from_drawing( + self, + graphic_data, + doc: "Document", + processed_images: Set[str], + ) -> Tuple[str, bool]: + """ + Extract image from Drawing graphic data element. + + This is the core DOCX image extraction logic that was previously + in docx_image.py extract_image_from_drawing() function. + + Args: + graphic_data: graphicData XML element + doc: python-docx Document object + processed_images: Set of processed image paths (deduplication) + + Returns: + (image_tag, is_image) tuple. image_tag is the tag string or empty, + is_image indicates if an image was found. 
+ """ + try: + # Find blip element (image reference) + blip = graphic_data.find('.//a:blip', NAMESPACES) + if blip is None: + return "", False + + # Get relationship ID + r_embed = blip.get(qn('r:embed')) + r_link = blip.get(qn('r:link')) + rId = r_embed or r_link + + if not rId: + return "", False + + # Find image part from relationship + try: + rel = doc.part.rels.get(rId) + if rel is None: + return "", False + + # Extract image data + if hasattr(rel, 'target_part') and hasattr(rel.target_part, 'blob'): + image_data = rel.target_part.blob + + # Save using process_image with rel_id + image_tag = self.process_image( + image_data, + rel_id=rId, + processed_images=processed_images + ) + + if image_tag: + return f"\n{image_tag}\n", True + + return "[Unknown Image]", True + + except Exception as e: + logger.warning(f"Error extracting image from relationship: {e}") + return "[Unknown Image]", True + + except Exception as e: + logger.warning(f"Error extracting image from drawing: {e}") + return "", False + + def extract_from_pict( + self, + pict_elem, + doc: "Document", + processed_images: Set[str], + ) -> str: + """ + Extract image from legacy VML pict element. + + This is the core DOCX VML image extraction logic that was previously + in docx_image.py process_pict_element() function. 
+ + Args: + pict_elem: pict XML element + doc: python-docx Document object + processed_images: Set of processed image paths (deduplication) + + Returns: + Image tag string or placeholder + """ + try: + # Find VML imagedata + ns_v = 'urn:schemas-microsoft-com:vml' + ns_r = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' + + imagedata = pict_elem.find('.//{%s}imagedata' % ns_v) + if imagedata is None: + return "[Unknown Image]" + + rId = imagedata.get('{%s}id' % ns_r) + if not rId: + return "[Unknown Image]" + + try: + rel = doc.part.rels.get(rId) + if rel and hasattr(rel, 'target_part') and hasattr(rel.target_part, 'blob'): + image_data = rel.target_part.blob + image_tag = self.process_image( + image_data, + rel_id=rId, + processed_images=processed_images + ) + if image_tag: + return f"\n{image_tag}\n" + except Exception: + pass + + return "[Unknown Image]" + + except Exception as e: + logger.warning(f"Error processing pict element: {e}") + return "" + + def process_drawing_element( + self, + drawing_elem, + doc: "Document", + processed_images: Set[str], + chart_callback: Optional[Callable[[], str]] = None, + ) -> Tuple[str, Optional[ElementType]]: + """ + Process Drawing element (image, chart, diagram). + + Main entry point for handling all drawing elements in DOCX. + Branches to appropriate handler based on content type. 
+ + Args: + drawing_elem: drawing XML element + doc: python-docx Document object + processed_images: Set of processed image paths (deduplication) + chart_callback: Callback function to get next chart content + + Returns: + (content, element_type) tuple + """ + try: + # Check inline or anchor + inline = drawing_elem.find('.//wp:inline', NAMESPACES) + anchor = drawing_elem.find('.//wp:anchor', NAMESPACES) + + container = inline if inline is not None else anchor + if container is None: + return "", None + + # Check graphic data + graphic = container.find('.//a:graphic', NAMESPACES) + if graphic is None: + return "", None + + graphic_data = graphic.find('a:graphicData', NAMESPACES) + if graphic_data is None: + return "", None + + uri = graphic_data.get('uri', '') + + # Image case + if 'picture' in uri.lower(): + content, is_image = self.extract_from_drawing( + graphic_data, doc, processed_images + ) + return (content, ElementType.IMAGE) if is_image else ("", None) + + # Chart case - delegate to callback + if 'chart' in uri.lower(): + if chart_callback: + chart_content = chart_callback() + return chart_content, ElementType.CHART + return "", ElementType.CHART + + # Diagram case + if 'diagram' in uri.lower(): + return self.extract_diagram(graphic_data) + + return "", None + + except Exception as e: + logger.warning(f"Error processing drawing element: {e}") + return "", None + + def extract_diagram( + self, + graphic_data, + ) -> Tuple[str, Optional[ElementType]]: + """ + Extract diagram information from Drawing. 
+ + Args: + graphic_data: graphicData XML element + + Returns: + (content, element_type) tuple + """ + try: + texts = [] + ns_a = 'http://schemas.openxmlformats.org/drawingml/2006/main' + for t_elem in graphic_data.findall('.//{%s}t' % ns_a): + if t_elem.text: + texts.append(t_elem.text.strip()) + + if texts: + return f"[Diagram: {' / '.join(texts)}]", ElementType.DIAGRAM + + return "[Diagram]", ElementType.DIAGRAM + + except Exception as e: + logger.warning(f"Error extracting diagram: {e}") + return "[Diagram]", ElementType.DIAGRAM + + +__all__ = ["DOCXImageProcessor"] diff --git a/contextifier/core/processor/docx_helper/docx_metadata.py b/contextifier/core/processor/docx_helper/docx_metadata.py index 15bb127..a651734 100644 --- a/contextifier/core/processor/docx_helper/docx_metadata.py +++ b/contextifier/core/processor/docx_helper/docx_metadata.py @@ -1,112 +1,71 @@ -# service/document_processor/processor/docx_helper/docx_metadata.py +# contextifier/core/processor/docx_helper/docx_metadata.py """ -DOCX 메타데이터 추출 유틸리티 +DOCX Metadata Extraction Module -DOCX 문서의 core_properties에서 메타데이터를 추출하고 포맷팅합니다. -- extract_docx_metadata: 메타데이터 딕셔너리 추출 -- format_metadata: 메타데이터를 읽기 쉬운 문자열로 변환 +Provides DOCXMetadataExtractor class for extracting metadata from DOCX documents +using python-docx core_properties. Implements BaseMetadataExtractor interface. """ import logging -from datetime import datetime -from typing import Any, Dict +from typing import Any, Optional from docx import Document -logger = logging.getLogger("document-processor") - - -def extract_docx_metadata(doc: Document) -> Dict[str, Any]: - """ - DOCX 문서에서 메타데이터를 추출합니다. 
- - python-docx의 core_properties를 통해 다음 정보를 추출합니다: - - 제목 (title) - - 주제 (subject) - - 작성자 (author) - - 키워드 (keywords) - - 설명 (comments) - - 마지막 수정자 (last_modified_by) - - 작성일 (created) - - 수정일 (modified) - - Args: - doc: python-docx Document 객체 - - Returns: - 메타데이터 딕셔너리 - """ - metadata = {} - - try: - props = doc.core_properties - - if props.title: - metadata['title'] = props.title.strip() - if props.subject: - metadata['subject'] = props.subject.strip() - if props.author: - metadata['author'] = props.author.strip() - if props.keywords: - metadata['keywords'] = props.keywords.strip() - if props.comments: - metadata['comments'] = props.comments.strip() - if props.last_modified_by: - metadata['last_saved_by'] = props.last_modified_by.strip() - if props.created: - metadata['create_time'] = props.created - if props.modified: - metadata['last_saved_time'] = props.modified - - logger.debug(f"Extracted DOCX metadata: {list(metadata.keys())}") +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) - except Exception as e: - logger.warning(f"Failed to extract DOCX metadata: {e}") - - return metadata +logger = logging.getLogger("document-processor") -def format_metadata(metadata: Dict[str, Any]) -> str: +class DOCXMetadataExtractor(BaseMetadataExtractor): """ - 메타데이터 딕셔너리를 읽기 쉬운 문자열로 변환합니다. - - Args: - metadata: 메타데이터 딕셔너리 - - Returns: - 포맷된 메타데이터 문자열 + DOCX Metadata Extractor. + + Extracts metadata from python-docx Document objects. 
+ + Supported fields: + - title, subject, author, keywords, comments + - last_saved_by, create_time, last_saved_time + + Usage: + extractor = DOCXMetadataExtractor() + metadata = extractor.extract(docx_document) + text = extractor.format(metadata) """ - if not metadata: - return "" - - lines = [""] - - field_names = { - 'title': '제목', - 'subject': '주제', - 'author': '작성자', - 'keywords': '키워드', - 'comments': '설명', - 'last_saved_by': '마지막 저장자', - 'create_time': '작성일', - 'last_saved_time': '수정일', - } - - for key, label in field_names.items(): - if key in metadata and metadata[key]: - value = metadata[key] - - # datetime 객체 포맷팅 - if isinstance(value, datetime): - value = value.strftime('%Y-%m-%d %H:%M:%S') - - lines.append(f" {label}: {value}") - - lines.append("") - - return "\n".join(lines) + + def extract(self, source: Document) -> DocumentMetadata: + """ + Extract metadata from DOCX document. + + Args: + source: python-docx Document object + + Returns: + DocumentMetadata instance containing extracted metadata. 
+ """ + try: + props = source.core_properties + + return DocumentMetadata( + title=self._get_stripped(props.title), + subject=self._get_stripped(props.subject), + author=self._get_stripped(props.author), + keywords=self._get_stripped(props.keywords), + comments=self._get_stripped(props.comments), + last_saved_by=self._get_stripped(props.last_modified_by), + create_time=props.created, + last_saved_time=props.modified, + ) + except Exception as e: + self.logger.warning(f"Failed to extract DOCX metadata: {e}") + return DocumentMetadata() + + def _get_stripped(self, value: Optional[str]) -> Optional[str]: + """Return stripped string value, or None if empty.""" + return value.strip() if value else None __all__ = [ - 'extract_docx_metadata', - 'format_metadata', + 'DOCXMetadataExtractor', ] diff --git a/contextifier/core/processor/docx_helper/docx_paragraph.py b/contextifier/core/processor/docx_helper/docx_paragraph.py index 0c929fc..e3b842c 100644 --- a/contextifier/core/processor/docx_helper/docx_paragraph.py +++ b/contextifier/core/processor/docx_helper/docx_paragraph.py @@ -1,20 +1,22 @@ -# service/document_processor/processor/docx_helper/docx_paragraph.py +# contextifier/core/processor/docx_helper/docx_paragraph.py """ DOCX Paragraph Processing Utility Processes Paragraph elements in DOCX documents. - process_paragraph_element: Process Paragraph element - has_page_break_element: Check for page break + +Image and drawing extraction is handled by DOCXImageProcessor. 
""" import logging -from typing import Optional, Set, Tuple, Callable +from typing import Optional, Set, Tuple, Callable, TYPE_CHECKING from docx import Document from contextifier.core.processor.docx_helper.docx_constants import ElementType, NAMESPACES -from contextifier.core.processor.docx_helper.docx_drawing import process_drawing_element -from contextifier.core.processor.docx_helper.docx_image import process_pict_element -from contextifier.core.functions.img_processor import ImageProcessor + +if TYPE_CHECKING: + from contextifier.core.processor.docx_helper.docx_image_processor import DOCXImageProcessor logger = logging.getLogger("document-processor") @@ -24,7 +26,7 @@ def process_paragraph_element( doc: Document, processed_images: Set[str], file_path: str = None, - image_processor: Optional[ImageProcessor] = None, + image_processor: Optional["DOCXImageProcessor"] = None, chart_callback: Optional[Callable[[], str]] = None ) -> Tuple[str, bool, int, int]: """ @@ -37,7 +39,7 @@ def process_paragraph_element( doc: python-docx Document object processed_images: Set of processed image paths (deduplication) file_path: Original file path - image_processor: ImageProcessor instance + image_processor: DOCXImageProcessor instance chart_callback: Callback function to get next chart content Returns: @@ -59,13 +61,14 @@ def process_paragraph_element( if t_elem.text: content_parts.append(t_elem.text) - # Process Drawing (image/chart/diagram) + # Process Drawing (image/chart/diagram) via DOCXImageProcessor for drawing_elem in run_elem.findall('w:drawing', NAMESPACES): - drawing_content, drawing_type = process_drawing_element( - drawing_elem, doc, processed_images, file_path, - image_processor, - chart_callback=chart_callback - ) + if image_processor and hasattr(image_processor, 'process_drawing_element'): + drawing_content, drawing_type = image_processor.process_drawing_element( + drawing_elem, doc, processed_images, chart_callback=chart_callback + ) + else: + drawing_content, 
drawing_type = "", None if drawing_content: content_parts.append(drawing_content) if drawing_type == ElementType.IMAGE: @@ -73,9 +76,12 @@ def process_paragraph_element( elif drawing_type == ElementType.CHART: chart_count += 1 - # Process pict element (legacy VML image) + # Process pict element (legacy VML image) - use DOCXImageProcessor for pict_elem in run_elem.findall('w:pict', NAMESPACES): - pict_content = process_pict_element(pict_elem, doc, processed_images, image_processor) + if image_processor and hasattr(image_processor, 'extract_from_pict'): + pict_content = image_processor.extract_from_pict(pict_elem, doc, processed_images) + else: + pict_content = "[Unknown Image]" if pict_content: content_parts.append(pict_content) image_count += 1 diff --git a/contextifier/core/processor/docx_helper/docx_preprocessor.py b/contextifier/core/processor/docx_helper/docx_preprocessor.py new file mode 100644 index 0000000..00a896f --- /dev/null +++ b/contextifier/core/processor/docx_helper/docx_preprocessor.py @@ -0,0 +1,82 @@ +# contextifier/core/processor/docx_helper/docx_preprocessor.py +""" +DOCX Preprocessor - Process DOCX document after conversion. + +Processing Pipeline Position: + 1. DOCXFileConverter.convert() → docx.Document + 2. DOCXPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. DOCXMetadataExtractor.extract() → DocumentMetadata + 4. Content extraction (paragraphs, tables, images) + +Current Implementation: + - Pass-through (DOCX uses python-docx Document object directly) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.docx.preprocessor") + + +class DOCXPreprocessor(BasePreprocessor): + """ + DOCX Document Preprocessor. + + Currently a pass-through implementation as DOCX processing + is handled during the content extraction phase using python-docx. 
+ """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted DOCX document. + + Args: + converted_data: docx.Document object from DOCXFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the document and any extracted resources + """ + metadata: Dict[str, Any] = {} + + # Extract basic document info if available + if hasattr(converted_data, 'core_properties'): + props = converted_data.core_properties + if hasattr(props, 'title') and props.title: + metadata['title'] = props.title + + if hasattr(converted_data, 'paragraphs'): + metadata['paragraph_count'] = len(converted_data.paragraphs) + + if hasattr(converted_data, 'tables'): + metadata['table_count'] = len(converted_data.tables) + + logger.debug("DOCX preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the docx.Document + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - docx.Document + encoding="utf-8", + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "DOCX Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is a DOCX Document object.""" + return hasattr(data, 'paragraphs') and hasattr(data, 'tables') + + +__all__ = ['DOCXPreprocessor'] diff --git a/contextifier/core/processor/excel_handler.py b/contextifier/core/processor/excel_handler.py index becdb65..8af8e0b 100644 --- a/contextifier/core/processor/excel_handler.py +++ b/contextifier/core/processor/excel_handler.py @@ -7,7 +7,7 @@ - Text extraction (direct parsing via openpyxl/xlrd) - Table extraction (Markdown or HTML conversion based on merged cells) - Inline image extraction and local storage -- Chart processing (1st priority: convert to table, 2nd priority: matplotlib image) +- Chart processing (convert to table) - Multi-sheet support Class-based Handler: @@ 
-31,13 +31,6 @@ from contextifier.core.processor.excel_helper import ( # Textbox extract_textboxes_from_xlsx, - # Metadata - extract_xlsx_metadata, - extract_xls_metadata, - format_metadata, - # Image - extract_images_from_xlsx, - get_sheet_images, # Table convert_xlsx_sheet_to_table, convert_xls_sheet_to_table, @@ -45,9 +38,13 @@ convert_xlsx_objects_to_tables, convert_xls_objects_to_tables, ) - -import xlrd -from openpyxl import load_workbook +from contextifier.core.processor.excel_helper.excel_metadata import ( + XLSXMetadataExtractor, + XLSMetadataExtractor, +) +from contextifier.core.processor.excel_helper.excel_image_processor import ( + ExcelImageProcessor, +) logger = logging.getLogger("document-processor") @@ -59,18 +56,52 @@ class ExcelHandler(BaseHandler): """ Excel Document Handler (XLSX/XLS) - + Inherits from BaseHandler to manage config and image_processor at instance level. - + Usage: handler = ExcelHandler(config=config, image_processor=image_processor) text = handler.extract_text(current_file) """ - + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self._xlsx_metadata_extractor = None + self._xls_metadata_extractor = None + + def _create_file_converter(self): + """Create Excel-specific file converter.""" + from contextifier.core.processor.excel_helper.excel_file_converter import ExcelFileConverter + return ExcelFileConverter() + + def _create_preprocessor(self): + """Create Excel-specific preprocessor.""" + from contextifier.core.processor.excel_helper.excel_preprocessor import ExcelPreprocessor + return ExcelPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """Create Excel-specific chart extractor.""" return ExcelChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Create XLSX-specific metadata extractor (default).""" + return XLSXMetadataExtractor() + + def _create_format_image_processor(self): + """Create Excel-specific image processor.""" + return 
ExcelImageProcessor( + directory_path=self._image_processor.config.directory_path, + tag_prefix=self._image_processor.config.tag_prefix, + tag_suffix=self._image_processor.config.tag_suffix, + storage_backend=self._image_processor.storage_backend, + ) + + def _get_xls_metadata_extractor(self): + """Get XLS-specific metadata extractor.""" + if self._xls_metadata_extractor is None: + self._xls_metadata_extractor = XLSMetadataExtractor() + return self._xls_metadata_extractor + def extract_text( self, current_file: "CurrentFile", @@ -79,12 +110,12 @@ def extract_text( ) -> str: """ Extract text from Excel file. - + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata **kwargs: Additional options - + Returns: Extracted text """ @@ -100,7 +131,7 @@ def extract_text( return self._extract_xls(current_file, extract_metadata) else: raise ValueError(f"Unsupported Excel format: {ext}") - + def _extract_xlsx( self, current_file: "CurrentFile", @@ -111,9 +142,14 @@ def _extract_xlsx( self.logger.info(f"XLSX processing: {file_path}") try: - # Open from stream to avoid path encoding issues - file_stream = self.get_file_stream(current_file) - wb = load_workbook(file_stream, data_only=True) + # Step 1: Convert to Workbook using file_converter + file_data = current_file.get("file_data", b"") + wb = self.file_converter.convert(file_data, extension='xlsx') + + # Step 2: Preprocess - may transform wb in the future + preprocessed = self.preprocess(wb) + wb = preprocessed.clean_content # TRUE SOURCE + preload = self._preload_xlsx_data(current_file, wb, extract_metadata) result_parts = [preload["metadata_str"]] if preload["metadata_str"] else [] @@ -155,14 +191,19 @@ def _extract_xls( self.logger.info(f"XLS processing: {file_path}") try: - # xlrd can open from file_contents (bytes) + # Step 1: Convert to Workbook using file_converter file_data = current_file.get("file_data", b"") - wb = 
xlrd.open_workbook(file_contents=file_data, formatting_info=True) + wb = self.file_converter.convert(file_data, extension='xls') + + # Step 2: Preprocess - may transform wb in the future + preprocessed = self.preprocess(wb) + wb = preprocessed.clean_content # TRUE SOURCE + result_parts = [] if extract_metadata: - metadata = extract_xls_metadata(wb) - metadata_str = format_metadata(metadata) + xls_extractor = self._get_xls_metadata_extractor() + metadata_str = xls_extractor.extract_and_format(wb) if metadata_str: result_parts.append(metadata_str + "\n\n") @@ -195,7 +236,7 @@ def _preload_xlsx_data( """Extract preprocessing data from XLSX file.""" file_path = current_file.get("file_path", "unknown") file_stream = self.get_file_stream(current_file) - + result = { "metadata_str": "", "chart_data_list": [], # ChartData instances from extractor @@ -205,16 +246,19 @@ def _preload_xlsx_data( } if extract_metadata: - metadata = extract_xlsx_metadata(wb) - result["metadata_str"] = format_metadata(metadata) + result["metadata_str"] = self.extract_and_format_metadata(wb) if result["metadata_str"]: result["metadata_str"] += "\n\n" # Use ChartExtractor for chart extraction result["chart_data_list"] = self.chart_extractor.extract_all_from_file(file_stream) - - # NOTE: These helper functions still require file_path for now - result["images_data"] = extract_images_from_xlsx(file_path) + + # Use format_image_processor directly for image extraction + image_processor = self.format_image_processor + if hasattr(image_processor, 'extract_images_from_xlsx'): + result["images_data"] = image_processor.extract_images_from_xlsx(file_path) + else: + result["images_data"] = {} result["textboxes_by_sheet"] = extract_textboxes_from_xlsx(file_path) return result @@ -248,11 +292,15 @@ def _process_xlsx_sheet( stats["charts"] += 1 preload["chart_idx"] += 1 - # Image processing - sheet_images = get_sheet_images(ws, preload["images_data"], "") + # Image processing - use format_image_processor directly 
+ image_processor = self.format_image_processor + if hasattr(image_processor, 'get_sheet_images'): + sheet_images = image_processor.get_sheet_images(ws, preload["images_data"], "") + else: + sheet_images = [] for image_data, anchor in sheet_images: if image_data: - image_tag = self.image_processor.save_image(image_data) + image_tag = self.format_image_processor.save_image(image_data) if image_tag: parts.append(f"\n{image_tag}\n") stats["images"] += 1 @@ -265,14 +313,14 @@ def _process_xlsx_sheet( stats["textboxes"] += 1 return "".join(parts) - + def _format_chart_data(self, chart_data) -> str: """Format ChartData using ChartProcessor.""" from contextifier.core.functions.chart_extractor import ChartData - + if not isinstance(chart_data, ChartData): return "" - + if chart_data.has_data(): return self.chart_processor.format_chart_data( chart_type=chart_data.chart_type, diff --git a/contextifier/core/processor/excel_helper/__init__.py b/contextifier/core/processor/excel_helper/__init__.py index 2ae148c..f925618 100644 --- a/contextifier/core/processor/excel_helper/__init__.py +++ b/contextifier/core/processor/excel_helper/__init__.py @@ -1,19 +1,17 @@ """ -Excel Helper 모듈 +Excel Helper Module -XLSX/XLS 파일의 세부 요소(텍스트박스, 차트, 이미지, 테이블 등) 추출을 담당합니다. +Handles extraction of elements (textboxes, charts, images, tables, etc.) from XLSX/XLS files. 
-모듈 구성: -- excel_chart_constants: 차트 타입 맵핑 상수 -- excel_chart_parser: OOXML 차트 XML 파싱 -- excel_chart_formatter: 차트 데이터 테이블 포맷팅 -- excel_chart_renderer: matplotlib 이미지 렌더링 -- excel_chart_processor: 차트 처리 메인 (테이블/이미지 폴백) -- excel_table_xlsx: XLSX 테이블 변환 -- excel_table_xls: XLS 테이블 변환 -- textbox_extractor: 텍스트박스 추출 -- metadata_extractor: 메타데이터 추출 -- image_extractor: 이미지 추출 +Module Structure: +- excel_chart_constants: Chart type mapping constants +- excel_chart_extractor: Chart extraction (ChartExtractor) +- excel_table_xlsx: XLSX table conversion +- excel_table_xls: XLS table conversion +- excel_textbox: Textbox extraction +- excel_metadata: Metadata extraction +- excel_image: Image extraction +- excel_layout_detector: Layout detection """ # === Textbox === @@ -21,45 +19,20 @@ # === Metadata === from contextifier.core.processor.excel_helper.excel_metadata import ( - extract_xlsx_metadata, - extract_xls_metadata, - format_metadata, + ExcelMetadataExtractor, + XLSXMetadataExtractor, + XLSMetadataExtractor, ) -# === Chart Constants === -from contextifier.core.processor.excel_helper.excel_chart_constants import ( +# === Chart Extractor === +from contextifier.core.processor.excel_helper.excel_chart_extractor import ( + ExcelChartExtractor, CHART_TYPE_MAP, - CHART_NAMESPACES, ) -# === Chart Parser === -from contextifier.core.processor.excel_helper.excel_chart_parser import ( - extract_charts_from_xlsx, - parse_ooxml_chart_xml, - extract_chart_info_basic, -) - -# === Chart Formatter === -from contextifier.core.processor.excel_helper.excel_chart_formatter import ( - format_chart_data_as_table, - format_chart_fallback, -) - -# === Chart Renderer === -from contextifier.core.processor.excel_helper.excel_chart_renderer import ( - render_chart_to_image, -) - -# === Chart Processor === -from contextifier.core.processor.excel_helper.excel_chart_processor import ( - process_chart, -) - -# === Image === -from contextifier.core.processor.excel_helper.excel_image import ( - 
extract_images_from_xlsx, - get_sheet_images, - SUPPORTED_IMAGE_EXTENSIONS, +# === Image Processor (replaces excel_image.py utility functions) === +from contextifier.core.processor.excel_helper.excel_image_processor import ( + ExcelImageProcessor, ) # === Table XLSX === @@ -94,27 +67,15 @@ # Textbox 'extract_textboxes_from_xlsx', # Metadata - 'extract_xlsx_metadata', - 'extract_xls_metadata', - 'format_metadata', + 'ExcelMetadataExtractor', + 'XLSXMetadataExtractor', + 'XLSMetadataExtractor', # Chart Constants 'CHART_TYPE_MAP', - 'CHART_NAMESPACES', - # Chart Parser - 'extract_charts_from_xlsx', - 'parse_ooxml_chart_xml', - 'extract_chart_info_basic', - # Chart Formatter - 'format_chart_data_as_table', - 'format_chart_fallback', - # Chart Renderer - 'render_chart_to_image', - # Chart Processor - 'process_chart', - # Image - 'extract_images_from_xlsx', - 'get_sheet_images', - 'SUPPORTED_IMAGE_EXTENSIONS', + # Chart Extractor + 'ExcelChartExtractor', + # Image Processor + 'ExcelImageProcessor', # Table XLSX 'has_merged_cells_xlsx', 'convert_xlsx_sheet_to_table', diff --git a/contextifier/core/processor/excel_helper/excel_chart_constants.py b/contextifier/core/processor/excel_helper/excel_chart_constants.py deleted file mode 100644 index 233a58c..0000000 --- a/contextifier/core/processor/excel_helper/excel_chart_constants.py +++ /dev/null @@ -1,31 +0,0 @@ -""" -Excel 차트 상수 모듈 - -OOXML 차트 타입 맵핑 및 관련 상수 정의 -""" - -# OOXML 차트 타입 맵핑 -CHART_TYPE_MAP = { - 'barChart': '막대 차트', - 'bar3DChart': '3D 막대 차트', - 'lineChart': '선 차트', - 'line3DChart': '3D 선 차트', - 'pieChart': '파이 차트', - 'pie3DChart': '3D 파이 차트', - 'doughnutChart': '도넛 차트', - 'areaChart': '영역 차트', - 'area3DChart': '3D 영역 차트', - 'scatterChart': '분산형 차트', - 'radarChart': '방사형 차트', - 'bubbleChart': '거품형 차트', - 'stockChart': '주식형 차트', - 'surfaceChart': '표면 차트', - 'surface3DChart': '3D 표면 차트', - 'ofPieChart': '분리형 파이 차트', -} - -# OOXML 네임스페이스 -CHART_NAMESPACES = { - 'c': 
'http://schemas.openxmlformats.org/drawingml/2006/chart', - 'a': 'http://schemas.openxmlformats.org/drawingml/2006/main', -} diff --git a/contextifier/core/processor/excel_helper/excel_chart_formatter.py b/contextifier/core/processor/excel_helper/excel_chart_formatter.py deleted file mode 100644 index 907261f..0000000 --- a/contextifier/core/processor/excel_helper/excel_chart_formatter.py +++ /dev/null @@ -1,106 +0,0 @@ -""" -Excel 차트 데이터 포맷팅 모듈 - -차트 데이터를 Markdown 테이블 형식으로 변환합니다. -""" - -import logging -from typing import Any, Dict, Optional - -logger = logging.getLogger("document-processor") - - -def format_chart_data_as_table(chart_info: Dict[str, Any]) -> Optional[str]: - """ - 차트 데이터를 Markdown 테이블 형식으로 포맷합니다. - - 데이터가 충분하면 테이블 문자열 반환, 없으면 None 반환. - None 반환 시 이미지 폴백이 트리거됩니다. - - Args: - chart_info: 차트 정보 딕셔너리 - - Returns: - Markdown 테이블 형식의 문자열 또는 None - """ - if not chart_info: - return None - - categories = chart_info.get('categories', []) - series_list = chart_info.get('series', []) - - # 데이터가 없으면 None 반환 (이미지 폴백 필요) - if not series_list or all(len(s.get('values', [])) == 0 for s in series_list): - return None - - result_parts = ["[chart]"] - - if chart_info.get('title'): - result_parts.append(f"제목: {chart_info['title']}") - - if chart_info.get('chart_type'): - result_parts.append(f"유형: {chart_info['chart_type']}") - - result_parts.append("") - - # 테이블 헤더 생성 - header = ["카테고리"] + [s.get('name', f'시리즈 {i+1}') for i, s in enumerate(series_list)] - result_parts.append("| " + " | ".join(str(h) for h in header) + " |") - result_parts.append("| " + " | ".join(["---"] * len(header)) + " |") - - # 데이터 행 생성 - max_len = max( - len(categories), - max((len(s.get('values', [])) for s in series_list), default=0) - ) - - for i in range(max_len): - row = [] - - # 카테고리 - if i < len(categories): - row.append(str(categories[i])) - else: - row.append(f"항목 {i+1}") - - # 시리즈 값 - for series in series_list: - values = series.get('values', []) - if i < len(values): - val = 
values[i] - if isinstance(val, float): - row.append(f"{val:,.2f}") - elif val is not None: - row.append(str(val)) - else: - row.append("") - else: - row.append("") - - result_parts.append("| " + " | ".join(row) + " |") - - result_parts.append("[/chart]") - return "\n".join(result_parts) - - -def format_chart_fallback(chart_info: Dict[str, Any]) -> str: - """ - 차트 정보만 출력하는 폴백 포맷터. - - 테이블/이미지 변환 모두 실패 시 사용됩니다. - - Args: - chart_info: 차트 정보 딕셔너리 - - Returns: - [chart]...[/chart] 형태의 기본 문자열 - """ - result_parts = ["[chart]"] - - if chart_info and chart_info.get('title'): - result_parts.append(f"제목: {chart_info['title']}") - if chart_info and chart_info.get('chart_type'): - result_parts.append(f"유형: {chart_info['chart_type']}") - - result_parts.append("[/chart]") - return "\n".join(result_parts) diff --git a/contextifier/core/processor/excel_helper/excel_chart_parser.py b/contextifier/core/processor/excel_helper/excel_chart_parser.py deleted file mode 100644 index 217d819..0000000 --- a/contextifier/core/processor/excel_helper/excel_chart_parser.py +++ /dev/null @@ -1,381 +0,0 @@ -""" -Excel 차트 OOXML 파싱 모듈 - -XLSX 파일의 차트 XML을 파싱하여 데이터를 추출합니다. -""" - -import logging -import re -import zipfile -import xml.etree.ElementTree as ET -from typing import Any, Dict, List, Optional - -from contextifier.core.processor.excel_helper.excel_chart_constants import CHART_TYPE_MAP, CHART_NAMESPACES - -logger = logging.getLogger("document-processor") - - -def extract_charts_from_xlsx(file_path: str) -> List[Dict[str, Any]]: - """ - XLSX 파일에서 차트 데이터를 추출합니다. - - XLSX 차트는 xl/charts/ 폴더에 chart*.xml 파일로 저장됩니다. 
- - Args: - file_path: XLSX 파일 경로 - - Returns: - 차트 정보 딕셔너리 리스트 - """ - charts = [] - - try: - with zipfile.ZipFile(file_path, 'r') as zf: - for name in zf.namelist(): - if name.startswith('xl/charts/chart') and name.endswith('.xml'): - try: - chart_xml = zf.read(name) - chart_info = parse_ooxml_chart_xml(chart_xml) - if chart_info: - charts.append(chart_info) - except Exception as e: - logger.debug(f"Error parsing chart {name}: {e}") - - logger.info(f"Extracted {len(charts)} charts from XLSX") - - except Exception as e: - logger.warning(f"Error extracting charts from XLSX: {e}") - - return charts - - -def parse_ooxml_chart_xml(chart_xml: bytes) -> Optional[Dict[str, Any]]: - """ - OOXML 차트 XML을 파싱하여 차트 데이터를 추출합니다. - - Args: - chart_xml: 차트 XML 바이트 - - Returns: - 차트 데이터 딕셔너리 - """ - try: - ns = CHART_NAMESPACES - - try: - root = ET.fromstring(chart_xml) - except ET.ParseError: - try: - chart_str = chart_xml.decode('utf-8-sig', errors='ignore') - root = ET.fromstring(chart_str) - except: - return None - - chart_info = { - 'type': 'ooxml', - 'chart_type': None, - 'title': None, - 'series': [], - 'categories': [] - } - - # chart 요소 찾기 - chart_elem = root.find('.//c:chart', ns) - if chart_elem is None: - chart_elem = root.find('.//{http://schemas.openxmlformats.org/drawingml/2006/chart}chart') - - if chart_elem is None: - if root.tag.endswith('}chart') or root.tag == 'chart': - chart_elem = root - else: - return None - - # 차트 제목 추출 - title_elem = chart_elem.find('.//c:title//c:tx//c:rich//a:t', ns) - if title_elem is not None and title_elem.text: - chart_info['title'] = title_elem.text.strip() - else: - title_elem = chart_elem.find('.//{http://schemas.openxmlformats.org/drawingml/2006/chart}tx//{http://schemas.openxmlformats.org/drawingml/2006/main}t') - if title_elem is not None and title_elem.text: - chart_info['title'] = title_elem.text.strip() - - # 차트 유형 및 시리즈 데이터 추출 - plot_area = chart_elem.find('.//c:plotArea', ns) - if plot_area is None: - plot_area = 
chart_elem.find('.//{http://schemas.openxmlformats.org/drawingml/2006/chart}plotArea') - - if plot_area is not None: - for chart_tag, chart_name in CHART_TYPE_MAP.items(): - elem = plot_area.find(f'.//c:{chart_tag}', ns) - if elem is None: - elem = plot_area.find(f'.//{{{ns["c"]}}}{chart_tag}') - if elem is not None: - chart_info['chart_type'] = chart_name - _extract_chart_series(elem, chart_info, ns) - break - - return chart_info if chart_info['series'] else None - - except Exception as e: - logger.debug(f"Error parsing OOXML chart: {e}") - return None - - -def _extract_chart_series(chart_type_elem, chart_info: Dict[str, Any], ns: Dict[str, str]): - """ - 차트 요소에서 시리즈 데이터를 추출합니다. - - Args: - chart_type_elem: 차트 타입 XML 요소 - chart_info: 차트 정보 딕셔너리 (수정됨) - ns: XML 네임스페이스 딕셔너리 - """ - ns_c = ns.get('c', 'http://schemas.openxmlformats.org/drawingml/2006/chart') - - series_elements = chart_type_elem.findall('.//c:ser', ns) - if not series_elements: - series_elements = chart_type_elem.findall(f'.//{{{ns_c}}}ser') - - categories_extracted = False - - for ser_elem in series_elements: - series_data = { - 'name': None, - 'values': [], - } - - # 시리즈 이름 추출 - tx_elem = ser_elem.find('.//c:tx//c:v', ns) - if tx_elem is None: - tx_elem = ser_elem.find(f'.//{{{ns_c}}}tx//{{{ns_c}}}v') - if tx_elem is not None and tx_elem.text: - series_data['name'] = tx_elem.text.strip() - else: - str_ref = ser_elem.find('.//c:tx//c:strRef//c:strCache//c:pt//c:v', ns) - if str_ref is None: - str_ref = ser_elem.find(f'.//{{{ns_c}}}tx//{{{ns_c}}}strRef//{{{ns_c}}}strCache//{{{ns_c}}}pt//{{{ns_c}}}v') - if str_ref is not None and str_ref.text: - series_data['name'] = str_ref.text.strip() - - # 카테고리 레이블 추출 - if not categories_extracted: - _extract_categories(ser_elem, chart_info, ns, ns_c) - categories_extracted = True - - # 값 추출 - _extract_values(ser_elem, series_data, ns, ns_c) - - if series_data['values']: - chart_info['series'].append(series_data) - - -def _extract_categories(ser_elem, chart_info: 
Dict[str, Any], ns: Dict[str, str], ns_c: str): - """ - 시리즈 요소에서 카테고리 레이블을 추출합니다. - """ - cat_elem = ser_elem.find('.//c:cat', ns) - if cat_elem is None: - cat_elem = ser_elem.find(f'.//{{{ns_c}}}cat') - - if cat_elem is None: - return - - # strCache에서 추출 - str_cache = cat_elem.find('.//c:strCache', ns) - if str_cache is None: - str_cache = cat_elem.find(f'.//{{{ns_c}}}strCache') - - if str_cache is not None: - pts = str_cache.findall('.//c:pt', ns) - if not pts: - pts = str_cache.findall(f'.//{{{ns_c}}}pt') - - for pt in sorted(pts, key=lambda x: int(x.get('idx', 0))): - v_elem = pt.find('c:v', ns) - if v_elem is None: - v_elem = pt.find(f'{{{ns_c}}}v') - if v_elem is not None and v_elem.text: - chart_info['categories'].append(v_elem.text.strip()) - - # numCache에서 추출 (폴백) - if not chart_info['categories']: - num_cache = cat_elem.find('.//c:numCache', ns) - if num_cache is None: - num_cache = cat_elem.find(f'.//{{{ns_c}}}numCache') - - if num_cache is not None: - pts = num_cache.findall('.//c:pt', ns) - if not pts: - pts = num_cache.findall(f'.//{{{ns_c}}}pt') - - for pt in sorted(pts, key=lambda x: int(x.get('idx', 0))): - v_elem = pt.find('c:v', ns) - if v_elem is None: - v_elem = pt.find(f'{{{ns_c}}}v') - if v_elem is not None and v_elem.text: - chart_info['categories'].append(v_elem.text.strip()) - - -def _extract_values(ser_elem, series_data: Dict[str, Any], ns: Dict[str, str], ns_c: str): - """ - 시리즈 요소에서 값을 추출합니다. 
- """ - # val 요소에서 추출 - val_elem = ser_elem.find('.//c:val', ns) - if val_elem is None: - val_elem = ser_elem.find(f'.//{{{ns_c}}}val') - - if val_elem is not None: - _extract_num_cache_values(val_elem, series_data, ns, ns_c) - - # yVal 확인 (scatter/bubble 차트용) - if not series_data['values']: - yval_elem = ser_elem.find('.//c:yVal', ns) - if yval_elem is None: - yval_elem = ser_elem.find(f'.//{{{ns_c}}}yVal') - - if yval_elem is not None: - _extract_num_cache_values(yval_elem, series_data, ns, ns_c) - - -def _extract_num_cache_values(val_elem, series_data: Dict[str, Any], ns: Dict[str, str], ns_c: str): - """ - numCache에서 숫자 값을 추출합니다. - """ - num_cache = val_elem.find('.//c:numCache', ns) - if num_cache is None: - num_cache = val_elem.find(f'.//{{{ns_c}}}numCache') - - if num_cache is not None: - pts = num_cache.findall('.//c:pt', ns) - if not pts: - pts = num_cache.findall(f'.//{{{ns_c}}}pt') - - for pt in sorted(pts, key=lambda x: int(x.get('idx', 0))): - v_elem = pt.find('c:v', ns) - if v_elem is None: - v_elem = pt.find(f'{{{ns_c}}}v') - if v_elem is not None and v_elem.text: - try: - series_data['values'].append(float(v_elem.text)) - except ValueError: - series_data['values'].append(v_elem.text) - - -def extract_chart_info_basic(chart, ws) -> str: - """ - 차트 정보를 추출합니다 (openpyxl 객체에서 기본 추출). - OOXML 파싱 실패 시 폴백으로 사용됩니다. 
- - Args: - chart: openpyxl Chart 객체 - ws: openpyxl Worksheet 객체 - - Returns: - [chart]...[/chart] 형태의 문자열 - """ - try: - result_parts = ["[chart]"] - - # 차트 타입 - chart_type = type(chart).__name__ - result_parts.append(f"유형: {chart_type}") - - # 차트 제목 - if chart.title: - title_text = _extract_chart_title(chart.title) - if title_text: - result_parts.append(f"제목: {title_text}") - - # 시리즈 데이터 - if hasattr(chart, 'series'): - for i, series in enumerate(chart.series): - series_info = _extract_series_info(series, ws, i) - if series_info: - result_parts.append(series_info) - - result_parts.append("[/chart]") - return "\n".join(result_parts) - - except Exception as e: - logger.debug(f"Error extracting chart info: {e}") - return "[chart][/chart]" - - -def _extract_chart_title(title_obj) -> str: - """ - 차트 제목을 추출합니다. - """ - try: - if hasattr(title_obj, 'tx') and title_obj.tx: - if hasattr(title_obj.tx, 'rich') and title_obj.tx.rich: - # RichText에서 텍스트 추출 - texts = [] - if hasattr(title_obj.tx.rich, 'p'): - for p in title_obj.tx.rich.p: - if hasattr(p, 'r'): - for r in p.r: - if hasattr(r, 't') and r.t: - texts.append(r.t) - return "".join(texts) - return "" - except Exception: - return "" - - -def _extract_series_info(series, ws, index: int) -> str: - """ - 차트 시리즈 정보를 추출합니다. - """ - try: - parts = [f"시리즈 {index + 1}:"] - - # 시리즈 이름 - if hasattr(series, 'title') and series.title: - if hasattr(series.title, 'strRef') and series.title.strRef: - ref = series.title.strRef.f - parts.append(f" 이름 참조: {ref}") - - # 데이터 참조 - if hasattr(series, 'val') and series.val: - if hasattr(series.val, 'numRef') and series.val.numRef: - ref = series.val.numRef.f - parts.append(f" 데이터 참조: {ref}") - - # 실제 데이터 값 추출 시도 - try: - values = _get_range_values(ws, ref) - if values: - parts.append(f" 데이터: {values[:10]}{'...' 
if len(values) > 10 else ''}") - except Exception: - pass - - return "\n".join(parts) if len(parts) > 1 else "" - - except Exception: - return "" - - -def _get_range_values(ws, ref: str) -> List[Any]: - """ - 셀 범위 참조에서 값을 추출합니다. - """ - try: - # 참조 형식: 'Sheet1'!$A$1:$A$10 또는 Sheet1!A1:A10 - match = re.search(r"['\"]?([^'\"!]+)['\"]?!\$?([A-Z]+)\$?(\d+):\$?([A-Z]+)\$?(\d+)", ref) - if not match: - return [] - - _, start_col, start_row, end_col, end_row = match.groups() - start_row, end_row = int(start_row), int(end_row) - - values = [] - for row in range(start_row, end_row + 1): - cell = ws[f"{start_col}{row}"] - if cell.value is not None: - values.append(cell.value) - - return values - - except Exception: - return [] diff --git a/contextifier/core/processor/excel_helper/excel_chart_processor.py b/contextifier/core/processor/excel_helper/excel_chart_processor.py deleted file mode 100644 index 471123d..0000000 --- a/contextifier/core/processor/excel_helper/excel_chart_processor.py +++ /dev/null @@ -1,62 +0,0 @@ -""" -Excel Chart Processing Module - -Extracts chart data from Excel files and formats using ChartProcessor. -Output format: - {chart_prefix} - {chart_title} - {chart_type} - ...
- {chart_suffix} -""" - -import logging -from typing import Any, Callable, Dict, List, Optional, Set, TYPE_CHECKING - -if TYPE_CHECKING: - from contextifier.core.functions.chart_processor import ChartProcessor - -logger = logging.getLogger("document-processor") - - -def process_chart( - chart_info: Dict[str, Any], - chart_processor: "ChartProcessor" -) -> str: - """ - Process a chart using ChartProcessor. - - Args: - chart_info: Chart information dictionary containing: - - chart_type: Type of chart (bar, line, pie, etc.) - - title: Chart title (optional) - - categories: List of category labels - - series: List of series dicts with 'name' and 'values' - chart_processor: ChartProcessor instance for formatting - - Returns: - Formatted chart string with tags - """ - if not chart_info: - return chart_processor.format_chart_fallback(chart_type="Unknown") - - chart_type = chart_info.get('chart_type', 'Unknown') - title = chart_info.get('title') - categories = chart_info.get('categories', []) - series_list = chart_info.get('series', []) - - # Check if we have valid data - has_data = series_list and any(len(s.get('values', [])) > 0 for s in series_list) - - if has_data: - result = chart_processor.format_chart_data( - chart_type=chart_type, - series_data=series_list, - title=title, - categories=categories - ) - logger.debug(f"Chart '{title}' converted to table successfully") - return result - - # Fallback: no data available - return chart_processor.format_chart_fallback(chart_type=chart_type, title=title) diff --git a/contextifier/core/processor/excel_helper/excel_chart_renderer.py b/contextifier/core/processor/excel_helper/excel_chart_renderer.py deleted file mode 100644 index aaecd5a..0000000 --- a/contextifier/core/processor/excel_helper/excel_chart_renderer.py +++ /dev/null @@ -1,155 +0,0 @@ -""" -Excel 차트 이미지 렌더링 모듈 - -matplotlib를 사용하여 차트 데이터를 이미지로 렌더링합니다. -테이블 변환 실패 시 폴백으로 사용됩니다. 
-""" - -import io -import logging -from typing import Any, Dict, Optional, Set - -import matplotlib -matplotlib.use('Agg') # Non-GUI backend -import matplotlib.pyplot as plt - -logger = logging.getLogger("document-processor") - - -def render_chart_to_image( - chart_info: Dict[str, Any], - processed_images: Set[str] = None, - upload_func=None -) -> Optional[str]: - """ - 차트 데이터를 matplotlib로 이미지로 렌더링하고 로컬에 저장합니다. - - 테이블 변환 실패 시 폴백으로 사용됩니다. - - Args: - chart_info: 차트 정보 딕셔너리 - processed_images: 이미 처리된 이미지 해시 집합 - upload_func: 이미지 업로드 함수 - - Returns: - [chart] 태그로 감싸진 이미지 참조 문자열, 실패 시 None - """ - if not chart_info: - return None - - try: - categories = chart_info.get('categories', []) - series_list = chart_info.get('series', []) - chart_type = chart_info.get('chart_type', '') - title = chart_info.get('title', '차트') - - if not series_list: - return None - - # 그래프 생성 - fig, ax = plt.subplots(figsize=(10, 6)) - - # 차트 유형에 따른 렌더링 - if '파이' in chart_type or 'pie' in chart_type.lower(): - _render_pie_chart(ax, series_list, categories) - elif '선' in chart_type or 'line' in chart_type.lower(): - _render_line_chart(ax, series_list, categories) - elif '영역' in chart_type or 'area' in chart_type.lower(): - _render_area_chart(ax, series_list, categories) - else: - # 기본: 막대 차트 - _render_bar_chart(ax, series_list, categories) - - ax.set_title(title) - plt.tight_layout() - - # 이미지를 바이트로 저장 - img_buffer = io.BytesIO() - fig.savefig(img_buffer, format='png', dpi=150, bbox_inches='tight') - img_buffer.seek(0) - img_data = img_buffer.getvalue() - plt.close(fig) - - # 로컬에 저장 - if processed_images is None: - processed_images = set() - - if upload_func: - image_tag = upload_func(img_data) - - if image_tag: - result_parts = ["[chart]"] - if title: - result_parts.append(f"제목: {title}") - if chart_type: - result_parts.append(f"유형: {chart_type}") - result_parts.append(image_tag) - result_parts.append("[/chart]") - return "\n".join(result_parts) - - return None - - except Exception as e: - 
logger.warning(f"Error rendering chart to image: {e}") - if 'fig' in locals(): - plt.close(fig) - return None - - -def _render_pie_chart(ax, series_list, categories): - """ - 파이 차트를 렌더링합니다. - """ - if series_list and series_list[0].get('values'): - values = series_list[0]['values'] - labels = categories if categories else [f'항목 {i+1}' for i in range(len(values))] - ax.pie(values, labels=labels, autopct='%1.1f%%') - - -def _render_line_chart(ax, series_list, categories): - """ - 선 차트를 렌더링합니다. - """ - x = categories if categories else list(range(len(series_list[0].get('values', [])))) - for series in series_list: - values = series.get('values', []) - name = series.get('name', '시리즈') - if values: - ax.plot(x[:len(values)], values, marker='o', label=name) - ax.legend() - ax.grid(True, alpha=0.3) - - -def _render_area_chart(ax, series_list, categories): - """ - 영역 차트를 렌더링합니다. - """ - for series in series_list: - values = series.get('values', []) - name = series.get('name', '시리즈') - if values: - ax.fill_between(range(len(values)), values, alpha=0.5, label=name) - ax.plot(values, marker='o', label=f'{name} (선)') - ax.legend() - ax.grid(True, alpha=0.3) - - -def _render_bar_chart(ax, series_list, categories): - """ - 막대 차트를 렌더링합니다. 
- """ - x = categories if categories else [f'항목 {i+1}' for i in range(len(series_list[0].get('values', [])))] - width = 0.8 / len(series_list) if len(series_list) > 1 else 0.6 - - for idx, series in enumerate(series_list): - values = series.get('values', []) - name = series.get('name', f'시리즈 {idx+1}') - if values: - offset = (idx - len(series_list) / 2 + 0.5) * width - positions = [i + offset for i in range(len(values))] - ax.bar(positions, values, width=width, label=name) - - ax.set_xticks(range(len(x))) - ax.set_xticklabels(x, rotation=45, ha='right') - ax.legend() - ax.grid(True, alpha=0.3, axis='y') diff --git a/contextifier/core/processor/excel_helper/excel_file_converter.py b/contextifier/core/processor/excel_helper/excel_file_converter.py new file mode 100644 index 0000000..c845917 --- /dev/null +++ b/contextifier/core/processor/excel_helper/excel_file_converter.py @@ -0,0 +1,156 @@ +# libs/core/processor/excel_helper/excel_file_converter.py +""" +ExcelFileConverter - Excel file format converter + +Converts binary Excel data to Workbook object. +Supports both XLSX and XLS formats. +""" +from io import BytesIO +from typing import Any, Optional, BinaryIO, Union + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class XLSXFileConverter(BaseFileConverter): + """ + XLSX file converter using openpyxl. + + Converts binary XLSX data to openpyxl Workbook object. + """ + + # ZIP magic number (XLSX is a ZIP file) + ZIP_MAGIC = b'PK\x03\x04' + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + data_only: bool = True, + **kwargs + ) -> Any: + """ + Convert binary XLSX data to Workbook object. 
+ + Args: + file_data: Raw binary XLSX data + file_stream: Optional file stream + data_only: If True, return calculated values instead of formulas + **kwargs: Additional options + + Returns: + openpyxl.Workbook object + """ + from openpyxl import load_workbook + + stream = file_stream if file_stream is not None else BytesIO(file_data) + stream.seek(0) + return load_workbook(stream, data_only=data_only) + + def get_format_name(self) -> str: + """Return format name.""" + return "XLSX Workbook" + + def validate(self, file_data: bytes) -> bool: + """Validate if data is a valid XLSX.""" + if not file_data or len(file_data) < 4: + return False + return file_data[:4] == self.ZIP_MAGIC + + +class XLSFileConverter(BaseFileConverter): + """ + XLS file converter using xlrd. + + Converts binary XLS data to xlrd Workbook object. + """ + + # OLE magic number (XLS is an OLE file) + OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1' + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> Any: + """ + Convert binary XLS data to xlrd Workbook object. + + Args: + file_data: Raw binary XLS data + file_stream: Optional file stream (not used) + **kwargs: Additional options + + Returns: + xlrd.Book object + """ + import xlrd + return xlrd.open_workbook(file_contents=file_data) + + def get_format_name(self) -> str: + """Return format name.""" + return "XLS Workbook" + + def validate(self, file_data: bytes) -> bool: + """Validate if data is a valid XLS.""" + if not file_data or len(file_data) < 8: + return False + return file_data[:8] == self.OLE_MAGIC + + +class ExcelFileConverter(BaseFileConverter): + """ + Unified Excel file converter. + + Auto-detects format (XLSX/XLS) and uses appropriate converter. 
+ """ + + def __init__(self): + """Initialize with both converters.""" + self._xlsx_converter = XLSXFileConverter() + self._xls_converter = XLSFileConverter() + self._used_converter: Optional[BaseFileConverter] = None + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + extension: Optional[str] = None, + **kwargs + ) -> Any: + """ + Convert binary Excel data to Workbook object. + + Args: + file_data: Raw binary Excel data + file_stream: Optional file stream + extension: File extension hint ('xlsx' or 'xls') + **kwargs: Additional options + + Returns: + Workbook object (openpyxl or xlrd) + """ + # Determine format from extension or magic number + if extension: + ext = extension.lower().lstrip('.') + if ext == 'xlsx': + self._used_converter = self._xlsx_converter + elif ext == 'xls': + self._used_converter = self._xls_converter + else: + # Auto-detect + if self._xlsx_converter.validate(file_data): + self._used_converter = self._xlsx_converter + elif self._xls_converter.validate(file_data): + self._used_converter = self._xls_converter + else: + # Default to XLSX + self._used_converter = self._xlsx_converter + + return self._used_converter.convert(file_data, file_stream, **kwargs) + + def get_format_name(self) -> str: + """Return format name based on detected type.""" + if self._used_converter: + return self._used_converter.get_format_name() + return "Excel Workbook" diff --git a/contextifier/core/processor/excel_helper/excel_image.py b/contextifier/core/processor/excel_helper/excel_image.py deleted file mode 100644 index ea59829..0000000 --- a/contextifier/core/processor/excel_helper/excel_image.py +++ /dev/null @@ -1,88 +0,0 @@ -""" -XLSX 이미지 추출 모듈 - -Excel 파일에서 임베디드 이미지를 추출합니다. 
-""" - -import os -import logging -import zipfile -from typing import Dict, List, Tuple - -logger = logging.getLogger("document-processor") - -# PIL에서 지원하는 이미지 형식만 추출 -SUPPORTED_IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'] - -# 지원하지 않는 형식 (EMF, WMF 등) -UNSUPPORTED_IMAGE_EXTENSIONS = ['.emf', '.wmf'] - - -def extract_images_from_xlsx(file_path: str) -> Dict[str, bytes]: - """ - XLSX 파일에서 이미지를 추출합니다 (ZIP 직접 접근). - EMF, WMF 등 PIL에서 지원하지 않는 형식은 제외합니다. - - Args: - file_path: XLSX 파일 경로 - - Returns: - {이미지 경로: 이미지 바이트} 딕셔너리 - """ - images = {} - - try: - with zipfile.ZipFile(file_path, 'r') as zf: - for name in zf.namelist(): - if name.startswith('xl/media/'): - # 이미지 파일 - ext = os.path.splitext(name)[1].lower() - if ext in SUPPORTED_IMAGE_EXTENSIONS: - images[name] = zf.read(name) - elif ext in UNSUPPORTED_IMAGE_EXTENSIONS: - logger.debug(f"Skipping unsupported image format: {name}") - - return images - - except Exception as e: - logger.warning(f"Error extracting images from XLSX: {e}") - return {} - - -def get_sheet_images(ws, images_data: Dict[str, bytes], file_path: str) -> List[Tuple[bytes, str]]: - """ - 시트에 포함된 이미지를 가져옵니다. 
- - Args: - ws: openpyxl Worksheet 객체 - images_data: extract_images_from_xlsx에서 추출한 이미지 딕셔너리 - file_path: XLSX 파일 경로 - - Returns: - [(이미지 바이트, 앵커 정보)] 리스트 - """ - result = [] - - try: - # openpyxl의 _images 속성 사용 - if hasattr(ws, '_images') and ws._images: - for img in ws._images: - try: - # 이미지 데이터 접근 - if hasattr(img, '_data') and callable(img._data): - img_data = img._data() - anchor = str(img.anchor) if hasattr(img, 'anchor') else "" - result.append((img_data, anchor)) - except Exception as e: - logger.debug(f"Error accessing image data: {e}") - - # 직접 추출한 이미지 사용 (위에서 못 가져온 경우) - if not result and images_data: - for name, data in images_data.items(): - result.append((data, name)) - - return result - - except Exception as e: - logger.warning(f"Error getting sheet images: {e}") - return [] diff --git a/contextifier/core/processor/excel_helper/excel_image_processor.py b/contextifier/core/processor/excel_helper/excel_image_processor.py new file mode 100644 index 0000000..1f8f816 --- /dev/null +++ b/contextifier/core/processor/excel_helper/excel_image_processor.py @@ -0,0 +1,316 @@ +# contextifier/core/processor/excel_helper/excel_image_processor.py +""" +Excel Image Processor + +Provides Excel-specific image processing that inherits from ImageProcessor. +Handles embedded images, chart images, and drawing images for XLSX/XLS files. 
+ +This class consolidates all Excel image extraction logic including: +- XLSX ZIP-based image extraction +- openpyxl Image object processing +- Sheet image extraction +""" +import os +import logging +import zipfile +from typing import Any, Dict, List, Optional, Set, Tuple, TYPE_CHECKING + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +if TYPE_CHECKING: + from openpyxl.workbook import Workbook + from openpyxl.worksheet.worksheet import Worksheet + from openpyxl.drawing.image import Image + +logger = logging.getLogger("contextify.image_processor.excel") + +# Image formats supported by PIL +SUPPORTED_IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'] + +# Unsupported formats (EMF, WMF, etc.) +UNSUPPORTED_IMAGE_EXTENSIONS = ['.emf', '.wmf'] + + +class ExcelImageProcessor(ImageProcessor): + """ + Excel-specific image processor. + + Inherits from ImageProcessor and provides Excel-specific processing. + + Handles: + - Embedded worksheet images + - Drawing images + - Chart images + - Shape images + + Example: + processor = ExcelImageProcessor() + + # Process worksheet image + tag = processor.process_image(image_data, sheet_name="Sheet1") + + # Process from openpyxl Image object + tag = processor.process_openpyxl_image(image_obj) + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize ExcelImageProcessor. 
+ + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + + def process_image( + self, + image_data: bytes, + sheet_name: Optional[str] = None, + image_index: Optional[int] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save Excel image data. + + Args: + image_data: Raw image binary data + sheet_name: Source sheet name (for naming) + image_index: Image index in sheet (for naming) + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = None + if sheet_name is not None: + safe_sheet = sheet_name.replace(' ', '_').replace('/', '_') + if image_index is not None: + custom_name = f"excel_{safe_sheet}_{image_index}" + else: + custom_name = f"excel_{safe_sheet}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_openpyxl_image( + self, + image: "Image", + sheet_name: Optional[str] = None, + image_index: Optional[int] = None, + ) -> Optional[str]: + """ + Process openpyxl Image object. 
+ + Args: + image: openpyxl Image object + sheet_name: Source sheet name + image_index: Image index + + Returns: + Image tag string, or None on failure + """ + try: + # Get image data from openpyxl Image + if hasattr(image, '_data'): + image_data = image._data() + elif hasattr(image, 'ref'): + # For embedded images with reference + image_data = image.ref.blob + else: + self._logger.warning("Cannot extract data from openpyxl Image") + return None + + return self.process_image( + image_data, + sheet_name=sheet_name, + image_index=image_index + ) + + except Exception as e: + self._logger.warning(f"Failed to process openpyxl image: {e}") + return None + + def process_embedded_image( + self, + image_data: bytes, + image_name: Optional[str] = None, + sheet_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process embedded Excel image. + + Args: + image_data: Image binary data + image_name: Original image filename + sheet_name: Source sheet name + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = image_name + if custom_name is None and sheet_name is not None: + safe_sheet = sheet_name.replace(' ', '_').replace('/', '_') + custom_name = f"excel_embed_{safe_sheet}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_chart_image( + self, + chart_data: bytes, + chart_name: Optional[str] = None, + sheet_name: Optional[str] = None, + chart_index: Optional[int] = None, + **kwargs + ) -> Optional[str]: + """ + Process Excel chart as image. 
+ + Args: + chart_data: Chart image binary data + chart_name: Chart title/name + sheet_name: Source sheet name + chart_index: Chart index in sheet + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = chart_name + if custom_name is None: + if sheet_name is not None: + safe_sheet = sheet_name.replace(' ', '_').replace('/', '_') + if chart_index is not None: + custom_name = f"excel_chart_{safe_sheet}_{chart_index}" + else: + custom_name = f"excel_chart_{safe_sheet}" + elif chart_index is not None: + custom_name = f"excel_chart_{chart_index}" + + return self.save_image(chart_data, custom_name=custom_name) + + def extract_images_from_xlsx( + self, + file_path: str, + ) -> Dict[str, bytes]: + """ + Extract images from XLSX file (direct ZIP access). + Excludes formats not supported by PIL (EMF, WMF, etc.). + + Args: + file_path: Path to XLSX file + + Returns: + {image_path: image_bytes} dictionary + """ + images = {} + + try: + with zipfile.ZipFile(file_path, 'r') as zf: + for name in zf.namelist(): + if name.startswith('xl/media/'): + ext = os.path.splitext(name)[1].lower() + if ext in SUPPORTED_IMAGE_EXTENSIONS: + images[name] = zf.read(name) + elif ext in UNSUPPORTED_IMAGE_EXTENSIONS: + logger.debug(f"Skipping unsupported image format: {name}") + + return images + + except Exception as e: + logger.warning(f"Error extracting images from XLSX: {e}") + return {} + + def get_sheet_images( + self, + ws: "Worksheet", + images_data: Dict[str, bytes], + file_path: str, + ) -> List[Tuple[bytes, str]]: + """ + Get images contained in a sheet. 
+ + Args: + ws: openpyxl Worksheet object + images_data: Image dictionary from extract_images_from_xlsx + file_path: Path to XLSX file + + Returns: + [(image_bytes, anchor_info)] list + """ + result = [] + + try: + # Use openpyxl's _images attribute + if hasattr(ws, '_images') and ws._images: + for img in ws._images: + try: + if hasattr(img, '_data') and callable(img._data): + img_data = img._data() + anchor = str(img.anchor) if hasattr(img, 'anchor') else "" + result.append((img_data, anchor)) + except Exception as e: + logger.debug(f"Error accessing image data: {e}") + + # Use directly extracted images (if not obtained above) + if not result and images_data: + for name, data in images_data.items(): + result.append((data, name)) + + return result + + except Exception as e: + logger.warning(f"Error getting sheet images: {e}") + return [] + + def process_sheet_images( + self, + ws: "Worksheet", + sheet_name: str, + images_data: Optional[Dict[str, bytes]] = None, + file_path: Optional[str] = None, + ) -> str: + """ + Process all images in a sheet. 
+ + Args: + ws: openpyxl Worksheet object + sheet_name: Sheet name + images_data: Pre-extracted image dictionary + file_path: Path to XLSX file + + Returns: + Joined image tag strings + """ + results = [] + + if images_data is None and file_path: + images_data = self.extract_images_from_xlsx(file_path) + + images_data = images_data or {} + sheet_images = self.get_sheet_images(ws, images_data, file_path or "") + + for idx, (img_data, anchor) in enumerate(sheet_images): + tag = self.process_image(img_data, sheet_name=sheet_name, image_index=idx) + if tag: + results.append(tag) + + return "\n\n".join(results) + + +__all__ = ["ExcelImageProcessor"] diff --git a/contextifier/core/processor/excel_helper/excel_metadata.py b/contextifier/core/processor/excel_helper/excel_metadata.py index 2627e47..f8375e4 100644 --- a/contextifier/core/processor/excel_helper/excel_metadata.py +++ b/contextifier/core/processor/excel_helper/excel_metadata.py @@ -1,129 +1,145 @@ +# contextifier/core/processor/excel_helper/excel_metadata.py """ -XLSX/XLS 메타데이터 추출 모듈 +Excel Metadata Extraction Module -Excel 문서에서 메타데이터(제목, 작성자, 주제, 키워드, 작성일, 수정일 등)를 추출합니다. +Provides ExcelMetadataExtractor classes for extracting metadata from Excel documents. +Supports both XLSX (openpyxl) and XLS (xlrd) formats. +Implements BaseMetadataExtractor interface. """ - import logging -from datetime import datetime -from typing import Any, Dict +from typing import Any, Optional + +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) logger = logging.getLogger("document-processor") -def extract_xlsx_metadata(wb) -> Dict[str, Any]: +class XLSXMetadataExtractor(BaseMetadataExtractor): """ - XLSX 문서에서 메타데이터를 추출합니다. 
- - openpyxl의 properties를 통해 다음 정보를 추출합니다: - - 제목 (title) - - 주제 (subject) - - 작성자 (creator) - - 키워드 (keywords) - - 설명 (description) - - 마지막 수정자 (lastModifiedBy) - - 작성일 (created) - - 수정일 (modified) - - Args: - wb: openpyxl Workbook 객체 - - Returns: - 메타데이터 딕셔너리 + XLSX Metadata Extractor. + + Extracts metadata from openpyxl Workbook objects. + + Supported fields: + - title, subject, author (creator), keywords + - comments (description), last_saved_by + - create_time, last_saved_time + + Usage: + extractor = XLSXMetadataExtractor() + metadata = extractor.extract(workbook) + text = extractor.format(metadata) """ - metadata = {} - - try: - props = wb.properties - - if props.title: - metadata['title'] = props.title.strip() - if props.subject: - metadata['subject'] = props.subject.strip() - if props.creator: - metadata['author'] = props.creator.strip() - if props.keywords: - metadata['keywords'] = props.keywords.strip() - if props.description: - metadata['comments'] = props.description.strip() - if props.lastModifiedBy: - metadata['last_saved_by'] = props.lastModifiedBy.strip() - if props.created: - metadata['create_time'] = props.created - if props.modified: - metadata['last_saved_time'] = props.modified - - logger.debug(f"Extracted XLSX metadata: {list(metadata.keys())}") - - except Exception as e: - logger.warning(f"Failed to extract XLSX metadata: {e}") - - return metadata - - -def extract_xls_metadata(wb) -> Dict[str, Any]: + + def extract(self, source: Any) -> DocumentMetadata: + """ + Extract metadata from XLSX document. + + Args: + source: openpyxl Workbook object + + Returns: + DocumentMetadata instance containing extracted metadata. 
+ """ + try: + props = source.properties + + return DocumentMetadata( + title=self._get_stripped(props.title), + subject=self._get_stripped(props.subject), + author=self._get_stripped(props.creator), + keywords=self._get_stripped(props.keywords), + comments=self._get_stripped(props.description), + last_saved_by=self._get_stripped(props.lastModifiedBy), + create_time=props.created, + last_saved_time=props.modified, + ) + except Exception as e: + self.logger.warning(f"Failed to extract XLSX metadata: {e}") + return DocumentMetadata() + + def _get_stripped(self, value: Optional[str]) -> Optional[str]: + """Return stripped string value, or None if empty.""" + return value.strip() if value else None + + +class XLSMetadataExtractor(BaseMetadataExtractor): """ - XLS 문서에서 메타데이터를 추출합니다. - - xlrd는 제한된 메타데이터만 지원합니다. - - Args: - wb: xlrd Workbook 객체 - - Returns: - 메타데이터 딕셔너리 + XLS Metadata Extractor. + + Extracts metadata from xlrd Workbook objects. + Note: xlrd has limited metadata support. + + Supported fields: + - author (user_name) + + Usage: + extractor = XLSMetadataExtractor() + metadata = extractor.extract(workbook) + text = extractor.format(metadata) """ - metadata = {} - - try: - # xlrd는 제한된 메타데이터 접근만 가능 - if hasattr(wb, 'user_name') and wb.user_name: - metadata['author'] = wb.user_name - - logger.debug(f"Extracted XLS metadata: {list(metadata.keys())}") - - except Exception as e: - logger.warning(f"Failed to extract XLS metadata: {e}") - - return metadata - - -def format_metadata(metadata: Dict[str, Any]) -> str: + + def extract(self, source: Any) -> DocumentMetadata: + """ + Extract metadata from XLS document. + + Args: + source: xlrd Workbook object + + Returns: + DocumentMetadata instance containing extracted metadata. 
+ """ + try: + author = None + if hasattr(source, 'user_name') and source.user_name: + author = source.user_name + + return DocumentMetadata(author=author) + except Exception as e: + self.logger.warning(f"Failed to extract XLS metadata: {e}") + return DocumentMetadata() + + +class ExcelMetadataExtractor(BaseMetadataExtractor): """ - 메타데이터 딕셔너리를 읽기 쉬운 문자열로 변환합니다. - - Args: - metadata: 메타데이터 딕셔너리 - - Returns: - 포맷된 메타데이터 문자열 + Unified Excel Metadata Extractor. + + Selects appropriate extractor based on file format. + + Usage: + extractor = ExcelMetadataExtractor() + # For XLSX + metadata = extractor.extract(xlsx_workbook, file_type='xlsx') + # For XLS + metadata = extractor.extract(xls_workbook, file_type='xls') """ - if not metadata: - return "" - - lines = [""] - - field_names = { - 'title': '제목', - 'subject': '주제', - 'author': '작성자', - 'keywords': '키워드', - 'comments': '설명', - 'last_saved_by': '마지막 저장자', - 'create_time': '작성일', - 'last_saved_time': '수정일', - } - - for key, label in field_names.items(): - if key in metadata and metadata[key]: - value = metadata[key] - - # datetime 객체 포맷팅 - if isinstance(value, datetime): - value = value.strftime('%Y-%m-%d %H:%M:%S') - - lines.append(f" {label}: {value}") - - lines.append("") - - return "\n".join(lines) + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self._xlsx_extractor = XLSXMetadataExtractor(**kwargs) + self._xls_extractor = XLSMetadataExtractor(**kwargs) + + def extract(self, source: Any, file_type: str = 'xlsx') -> DocumentMetadata: + """ + Extract metadata from Excel document. + + Args: + source: openpyxl Workbook or xlrd Workbook object + file_type: File format ('xlsx' or 'xls') + + Returns: + DocumentMetadata instance containing extracted metadata. 
+ """ + if file_type.lower() == 'xls': + return self._xls_extractor.extract(source) + return self._xlsx_extractor.extract(source) + + +__all__ = [ + 'ExcelMetadataExtractor', + 'XLSXMetadataExtractor', + 'XLSMetadataExtractor', +] diff --git a/contextifier/core/processor/excel_helper/excel_preprocessor.py b/contextifier/core/processor/excel_helper/excel_preprocessor.py new file mode 100644 index 0000000..1ddead0 --- /dev/null +++ b/contextifier/core/processor/excel_helper/excel_preprocessor.py @@ -0,0 +1,83 @@ +# contextifier/core/processor/excel_helper/excel_preprocessor.py +""" +Excel Preprocessor - Process Excel workbook after conversion. + +Processing Pipeline Position: + 1. ExcelFileConverter.convert() → openpyxl.Workbook or xlrd.Book + 2. ExcelPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. ExcelMetadataExtractor.extract() → DocumentMetadata + 4. Content extraction (sheets, cells, images, charts) + +Current Implementation: + - Pass-through (Excel uses openpyxl/xlrd objects directly) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.excel.preprocessor") + + +class ExcelPreprocessor(BasePreprocessor): + """ + Excel Workbook Preprocessor. + + Currently a pass-through implementation as Excel processing + is handled during the content extraction phase using openpyxl/xlrd. + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted Excel workbook. 
+ + Args: + converted_data: openpyxl.Workbook or xlrd.Book from ExcelFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the workbook and any extracted resources + """ + metadata: Dict[str, Any] = {} + + # Detect workbook type and extract info + if hasattr(converted_data, 'sheetnames'): + # openpyxl Workbook + metadata['format'] = 'xlsx' + metadata['sheet_count'] = len(converted_data.sheetnames) + metadata['sheet_names'] = converted_data.sheetnames + elif hasattr(converted_data, 'sheet_names'): + # xlrd Book + metadata['format'] = 'xls' + metadata['sheet_count'] = converted_data.nsheets + metadata['sheet_names'] = converted_data.sheet_names() + + logger.debug("Excel preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the Workbook + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - openpyxl.Workbook or xlrd.Book + encoding="utf-8", + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "Excel Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is an Excel Workbook object.""" + # openpyxl or xlrd + return hasattr(data, 'sheetnames') or hasattr(data, 'sheet_names') + + +__all__ = ['ExcelPreprocessor'] diff --git a/contextifier/core/processor/html_helper/__init__.py b/contextifier/core/processor/html_helper/__init__.py new file mode 100644 index 0000000..9cf09be --- /dev/null +++ b/contextifier/core/processor/html_helper/__init__.py @@ -0,0 +1,6 @@ +# libs/core/processor/html_helper/__init__.py +"""HTML helper module for HTML file processing.""" + +from contextifier.core.processor.html_helper.html_file_converter import HTMLFileConverter + +__all__ = ['HTMLFileConverter'] diff --git a/contextifier/core/processor/html_helper/html_file_converter.py b/contextifier/core/processor/html_helper/html_file_converter.py new file mode 100644 index 
0000000..86a63e0 --- /dev/null +++ b/contextifier/core/processor/html_helper/html_file_converter.py @@ -0,0 +1,91 @@ +# libs/core/processor/html_helper/html_file_converter.py +""" +HTMLFileConverter - HTML file format converter + +Converts binary HTML data to BeautifulSoup object. +""" +from typing import Any, Optional, BinaryIO + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class HTMLFileConverter(BaseFileConverter): + """ + HTML file converter using BeautifulSoup. + + Converts binary HTML data to BeautifulSoup object. + """ + + DEFAULT_ENCODINGS = ['utf-8', 'utf-8-sig', 'cp949', 'euc-kr', 'latin-1'] + + def __init__(self, parser: str = 'html.parser'): + """ + Initialize HTMLFileConverter. + + Args: + parser: BeautifulSoup parser to use + """ + self._parser = parser + self._detected_encoding: Optional[str] = None + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + encoding: Optional[str] = None, + **kwargs + ) -> Any: + """ + Convert binary HTML data to BeautifulSoup object. 
+ + Args: + file_data: Raw binary HTML data + file_stream: Ignored + encoding: Specific encoding to use + **kwargs: Additional options + + Returns: + BeautifulSoup object + """ + from bs4 import BeautifulSoup + + # Decode to text first + text = self._decode(file_data, encoding) + return BeautifulSoup(text, self._parser) + + def _decode(self, file_data: bytes, encoding: Optional[str] = None) -> str: + """Decode bytes to string.""" + if encoding: + try: + self._detected_encoding = encoding + return file_data.decode(encoding) + except UnicodeDecodeError: + pass + + for enc in self.DEFAULT_ENCODINGS: + try: + self._detected_encoding = enc + return file_data.decode(enc) + except UnicodeDecodeError: + continue + + # Fallback + self._detected_encoding = 'utf-8' + return file_data.decode('utf-8', errors='replace') + + def get_format_name(self) -> str: + """Return format name.""" + return "HTML Document" + + def validate(self, file_data: bytes) -> bool: + """Validate if data appears to be HTML.""" + if not file_data: + return False + + header = file_data[:100].lower() + return ( + b' PreprocessedData: + """ + Preprocess the converted HTML content. 
+ + Args: + converted_data: BeautifulSoup object from HTMLFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the BeautifulSoup object + """ + metadata: Dict[str, Any] = {} + + if hasattr(converted_data, 'find_all'): + # Count some basic elements + metadata['table_count'] = len(converted_data.find_all('table')) + metadata['image_count'] = len(converted_data.find_all('img')) + metadata['link_count'] = len(converted_data.find_all('a')) + + logger.debug("HTML preprocessor: pass-through, metadata=%s", metadata) + + return PreprocessedData( + raw_content=b"", + clean_content=b"", + encoding="utf-8", + extracted_resources={"soup": converted_data}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "HTML Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is a BeautifulSoup object.""" + return hasattr(data, 'find_all') and hasattr(data, 'get_text') + + +__all__ = ['HTMLPreprocessor'] diff --git a/contextifier/core/processor/hwp_handler.py b/contextifier/core/processor/hwp_handler.py index aed0944..a33d6fa 100644 --- a/contextifier/core/processor/hwp_handler.py +++ b/contextifier/core/processor/hwp_handler.py @@ -24,12 +24,6 @@ HWPTAG_TABLE, HwpRecord, decompress_section, - extract_metadata, - format_metadata, - find_bindata_stream, - extract_bindata_index, - extract_and_upload_image, - process_images_from_bindata, parse_doc_info, parse_table, extract_text_from_stream_raw, @@ -38,6 +32,8 @@ check_file_signature, ) from contextifier.core.processor.hwp_helper.hwp_chart_extractor import HWPChartExtractor +from contextifier.core.processor.hwp_helper.hwp_metadata import HWPMetadataExtractor +from contextifier.core.processor.hwp_helper.hwp_image_processor import HWPImageProcessor if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -48,11 +44,34 @@ class HWPHandler(BaseHandler): """HWP 5.0 OLE Format File Processing Handler Class""" - + + def 
_create_file_converter(self): + """Create HWP-specific file converter.""" + from contextifier.core.processor.hwp_helper.hwp_file_converter import HWPFileConverter + return HWPFileConverter() + + def _create_preprocessor(self): + """Create HWP-specific preprocessor.""" + from contextifier.core.processor.hwp_helper.hwp_preprocessor import HWPPreprocessor + return HWPPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """Create HWP-specific chart extractor.""" return HWPChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Create HWP-specific metadata extractor.""" + return HWPMetadataExtractor() + + def _create_format_image_processor(self): + """Create HWP-specific image processor.""" + return HWPImageProcessor( + directory_path=self._image_processor.config.directory_path, + tag_prefix=self._image_processor.config.tag_prefix, + tag_suffix=self._image_processor.config.tag_suffix, + storage_backend=self._image_processor.storage_backend, + ) + def extract_text( self, current_file: "CurrentFile", @@ -61,69 +80,82 @@ def extract_text( ) -> str: """ Extract text from HWP file. 
- + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata **kwargs: Additional options - + Returns: Extracted text """ file_path = current_file.get("file_path", "unknown") file_data = current_file.get("file_data", b"") - - # Check if it's an OLE file using bytes - if not self._is_ole_file(file_data): + + # Check if it's an OLE file using file_converter.validate() + if not self.file_converter.validate(file_data): return self._handle_non_ole_file(current_file, extract_metadata) - + text_content = [] processed_images: Set[str] = set() - + try: - # Open OLE file from stream + # Step 1: Open OLE file using file_converter file_stream = self.get_file_stream(current_file) - + # Pre-extract all charts using ChartExtractor chart_data_list = self.chart_extractor.extract_all_from_file(file_stream) - - file_stream.seek(0) - - with olefile.OleFileIO(file_stream) as ole: + + # Convert binary to OLE object using file_converter + ole = self.file_converter.convert(file_data, file_stream) + + # Step 2: Preprocess - may transform ole in the future + preprocessed = self.preprocess(ole) + ole = preprocessed.clean_content # TRUE SOURCE + + try: if extract_metadata: metadata_text = self._extract_metadata(ole) if metadata_text: text_content.append(metadata_text) text_content.append("") - + bin_data_map = self._parse_docinfo(ole) section_texts = self._extract_body_text(ole, bin_data_map, processed_images) text_content.extend(section_texts) - - image_text = process_images_from_bindata(ole, processed_images=processed_images, image_processor=self.image_processor) + + # Use format_image_processor directly + image_processor = self.format_image_processor + if hasattr(image_processor, 'process_images_from_bindata'): + image_text = image_processor.process_images_from_bindata(ole, processed_images=processed_images) + else: + image_text = "" if image_text: text_content.append("\n\n=== Extracted Images (Not Inline) ===\n") 
text_content.append(image_text) - + # Add pre-extracted charts for chart_data in chart_data_list: chart_text = self._format_chart_data(chart_data) if chart_text: text_content.append(chart_text) - + finally: + # Close OLE object using file_converter + self.file_converter.close(ole) + except Exception as e: self.logger.error(f"Error processing HWP file: {e}") return f"Error processing HWP file: {str(e)}" - + return "\n".join(text_content) - + def _format_chart_data(self, chart_data: "ChartData") -> str: """Format ChartData using ChartProcessor.""" from contextifier.core.functions.chart_extractor import ChartData - + if not isinstance(chart_data, ChartData): return "" - + if chart_data.has_data(): return self.chart_processor.format_chart_data( chart_type=chart_data.chart_type, @@ -136,68 +168,61 @@ def _format_chart_data(self, chart_data: "ChartData") -> str: chart_type=chart_data.chart_type, title=chart_data.title ) - - def _is_ole_file(self, file_data: bytes) -> bool: - """Check if file data is OLE format.""" - # OLE file signature: D0 CF 11 E0 A1 B1 1A E1 - ole_signature = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1' - return file_data[:8] == ole_signature - + def _handle_non_ole_file(self, current_file: "CurrentFile", extract_metadata: bool) -> str: """Handle non-OLE file.""" file_path = current_file.get("file_path", "unknown") file_data = current_file.get("file_data", b"") - + # Check if it's a ZIP file (HWPX) if file_data[:4] == b'PK\x03\x04': self.logger.info(f"File {file_path} is a Zip file. 
Processing as HWPX.") - from contextifier.core.processor.hwps_handler import HWPXHandler - hwpx_handler = HWPXHandler(config=self.config, image_processor=self.image_processor) + from contextifier.core.processor.hwpx_handler import HWPXHandler + hwpx_handler = HWPXHandler(config=self.config, image_processor=self.format_image_processor) return hwpx_handler.extract_text(current_file, extract_metadata=extract_metadata) - + # Check HWP 3.0 format if b'HWP Document File' in file_data[:32]: return "[HWP 3.0 Format - Not Supported]" - + return self._process_corrupted_hwp(current_file) - + def _extract_metadata(self, ole: olefile.OleFileIO) -> str: """Extract metadata from OLE file.""" - metadata = extract_metadata(ole) - return format_metadata(metadata) - + return self.extract_and_format_metadata(ole) + def _parse_docinfo(self, ole: olefile.OleFileIO) -> Dict: """Parse DocInfo stream.""" bin_data_by_storage_id, bin_data_list = parse_doc_info(ole) return {'by_storage_id': bin_data_by_storage_id, 'by_index': bin_data_list} - + def _extract_body_text(self, ole: olefile.OleFileIO, bin_data_map: Dict, processed_images: Set[str]) -> List[str]: """Extract text from BodyText sections.""" text_content = [] - + body_text_sections = [ entry for entry in ole.listdir() if entry[0] == "BodyText" and entry[1].startswith("Section") ] body_text_sections.sort(key=lambda x: int(x[1].replace("Section", ""))) - + for section in body_text_sections: stream = ole.openstream(section) data = stream.read() - + decompressed_data, success = decompress_section(data) if not success: continue - + section_text = self._parse_section(decompressed_data, ole, bin_data_map, processed_images) - + if not section_text or not section_text.strip(): section_text = extract_text_from_stream_raw(decompressed_data) - + text_content.append(section_text) - + return text_content - + def _parse_section(self, data: bytes, ole=None, bin_data_map=None, processed_images=None) -> str: """Parse a section.""" try: @@ -206,49 
+231,49 @@ def _parse_section(self, data: bytes, ole=None, bin_data_map=None, processed_ima except Exception as e: self.logger.error(f"Error parsing HWP section: {e}") return "" - + def _traverse_tree(self, record: 'HwpRecord', ole=None, bin_data_map=None, processed_images=None) -> str: """Traverse record tree.""" parts = [] - + if record.tag_id == HWPTAG_PARA_HEADER: return self._process_paragraph(record, ole, bin_data_map, processed_images) - + if record.tag_id == HWPTAG_CTRL_HEADER: result = self._process_control(record, ole, bin_data_map, processed_images) if result: return result - + if record.tag_id == HWPTAG_SHAPE_COMPONENT_PICTURE: result = self._process_picture(record, ole, bin_data_map, processed_images) if result: return result - + if record.tag_id == HWPTAG_PARA_TEXT: text = record.get_text().replace('\x0b', '') if text: parts.append(text) - + for child in record.children: child_text = self._traverse_tree(child, ole, bin_data_map, processed_images) if child_text: parts.append(child_text) - + if record.tag_id == HWPTAG_PARA_HEADER: parts.append("\n") - + return "".join(parts) - + def _process_paragraph(self, record: 'HwpRecord', ole, bin_data_map, processed_images) -> str: """Process PARA_HEADER record.""" parts = [] - + text_rec = next((c for c in record.children if c.tag_id == HWPTAG_PARA_TEXT), None) text_content = text_rec.get_text() if text_rec else "" - + control_tags = [HWPTAG_CTRL_HEADER, HWPTAG_TABLE] controls = [c for c in record.children if c.tag_id in control_tags] - + if '\x0b' in text_content: segments = text_content.split('\x0b') for i, segment in enumerate(segments): @@ -261,25 +286,25 @@ def _process_paragraph(self, record: 'HwpRecord', ole, bin_data_map, processed_i parts.append(text_content) for c in controls: parts.append(self._traverse_tree(c, ole, bin_data_map, processed_images)) - + parts.append("\n") return "".join(parts) - + def _process_control(self, record: 'HwpRecord', ole, bin_data_map, processed_images) -> Optional[str]: 
"""Process CTRL_HEADER record.""" if len(record.payload) < 4: return None - + ctrl_id = record.payload[:4][::-1] - + if ctrl_id == b'tbl ': return parse_table(record, self._traverse_tree, ole, bin_data_map, processed_images) - + if ctrl_id == b'gso ': return self._process_gso(record, ole, bin_data_map, processed_images) - + return None - + def _process_gso(self, record: 'HwpRecord', ole, bin_data_map, processed_images) -> Optional[str]: """Process GSO (Graphic Shape Object) record.""" def find_pictures(rec): @@ -289,7 +314,7 @@ def find_pictures(rec): for child in rec.children: results.extend(find_pictures(child)) return results - + pictures = find_pictures(record) if pictures: image_parts = [] @@ -299,74 +324,77 @@ def find_pictures(rec): image_parts.append(img_result) if image_parts: return "".join(image_parts) - + return None - + def _process_picture(self, record: 'HwpRecord', ole, bin_data_map, processed_images) -> Optional[str]: """Process SHAPE_COMPONENT_PICTURE record.""" if not bin_data_map or not ole: return None - + bin_data_list = bin_data_map.get('by_index', []) if not bin_data_list: return None - - bindata_index = extract_bindata_index(record.payload, len(bin_data_list)) - + + image_processor = self.format_image_processor + + # Use image processor methods directly + bindata_index = image_processor.extract_bindata_index(record.payload, len(bin_data_list)) + if bindata_index and 0 < bindata_index <= len(bin_data_list): storage_id, ext = bin_data_list[bindata_index - 1] if storage_id > 0: - target_stream = find_bindata_stream(ole, storage_id, ext) + target_stream = image_processor.find_bindata_stream(ole, storage_id, ext) if target_stream: - return extract_and_upload_image(ole, target_stream, processed_images, image_processor=self.image_processor) - + return image_processor.extract_and_save_image(ole, target_stream, processed_images) + if len(bin_data_list) == 1: storage_id, ext = bin_data_list[0] if storage_id > 0: - target_stream = 
find_bindata_stream(ole, storage_id, ext) + target_stream = image_processor.find_bindata_stream(ole, storage_id, ext) if target_stream: - return extract_and_upload_image(ole, target_stream, processed_images, image_processor=self.image_processor) - + return image_processor.extract_and_save_image(ole, target_stream, processed_images) + return None - + def _process_corrupted_hwp(self, current_file: "CurrentFile") -> str: """Attempt forensic recovery of corrupted HWP file.""" file_path = current_file.get("file_path", "unknown") file_data = current_file.get("file_data", b"") - + self.logger.info(f"Starting forensic recovery for: {file_path}") text_content = [] - + try: raw_data = file_data - + file_type = check_file_signature(raw_data) if file_type == "HWP3.0": return "[HWP 3.0 Format - Not Supported]" - + zlib_chunks = find_zlib_streams(raw_data, min_size=50) - + for offset, decompressed in zlib_chunks: parsed_text = self._parse_section(decompressed) if not parsed_text or not parsed_text.strip(): parsed_text = extract_text_from_stream_raw(decompressed) if parsed_text and len(parsed_text.strip()) > 0: text_content.append(parsed_text) - + if not text_content: plain_text = extract_text_from_stream_raw(raw_data) if plain_text and len(plain_text) > 100: text_content.append(plain_text) - - image_text = recover_images_from_raw(raw_data, image_processor=self.image_processor) + + image_text = recover_images_from_raw(raw_data, image_processor=self.format_image_processor) if image_text: text_content.append(f"\n\n=== Recovered Images ===\n{image_text}") - + except Exception as e: self.logger.error(f"Forensic recovery failed: {e}") return f"Forensic recovery failed: {str(e)}" - + if not text_content: return "[Forensic Recovery: No text found]" - + return "\n".join(text_content) diff --git a/contextifier/core/processor/hwp_helper/__init__.py b/contextifier/core/processor/hwp_helper/__init__.py index 519af8a..6e44835 100644 --- a/contextifier/core/processor/hwp_helper/__init__.py +++ 
b/contextifier/core/processor/hwp_helper/__init__.py @@ -45,22 +45,12 @@ # Metadata from contextifier.core.processor.hwp_helper.hwp_metadata import ( - extract_metadata, + HWPMetadataExtractor, parse_hwp_summary_information, - format_metadata, - MetadataHelper, ) -# Image -from contextifier.core.processor.hwp_helper.hwp_image import ( - try_decompress_image, - save_image_to_local, - find_bindata_stream, - extract_bindata_index, - extract_and_upload_image, - process_images_from_bindata, - ImageHelper, -) +# Image Processor (replaces hwp_image.py utility functions) +from contextifier.core.processor.hwp_helper.hwp_image_processor import HWPImageProcessor # Chart Extractor from contextifier.core.processor.hwp_helper.hwp_chart_extractor import HWPChartExtractor @@ -109,18 +99,10 @@ 'decompress_stream', 'decompress_section', # Metadata - 'extract_metadata', + 'HWPMetadataExtractor', 'parse_hwp_summary_information', - 'format_metadata', - 'MetadataHelper', - # Image - 'try_decompress_image', - 'save_image_to_local', - 'find_bindata_stream', - 'extract_bindata_index', - 'extract_and_upload_image', - 'process_images_from_bindata', - 'ImageHelper', + # Image Processor + 'HWPImageProcessor', # Chart Extractor 'HWPChartExtractor', # DocInfo diff --git a/contextifier/core/processor/hwp_helper/hwp_file_converter.py b/contextifier/core/processor/hwp_helper/hwp_file_converter.py new file mode 100644 index 0000000..ed5059f --- /dev/null +++ b/contextifier/core/processor/hwp_helper/hwp_file_converter.py @@ -0,0 +1,59 @@ +# libs/core/processor/hwp_helper/hwp_file_converter.py +""" +HWPFileConverter - HWP file format converter + +Converts binary HWP data to OLE file object. +""" +from io import BytesIO +from typing import Any, Optional, BinaryIO + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class HWPFileConverter(BaseFileConverter): + """ + HWP file converter using olefile. + + Converts binary HWP (OLE format) data to OleFileIO object. 
+ """ + + # OLE magic number + OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1' + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> Any: + """ + Convert binary HWP data to OleFileIO object. + + Args: + file_data: Raw binary HWP data + file_stream: Optional file stream + **kwargs: Additional options + + Returns: + olefile.OleFileIO object + """ + import olefile + + stream = file_stream if file_stream is not None else BytesIO(file_data) + stream.seek(0) + return olefile.OleFileIO(stream) + + def get_format_name(self) -> str: + """Return format name.""" + return "HWP Document (OLE)" + + def validate(self, file_data: bytes) -> bool: + """Validate if data is a valid OLE file.""" + if not file_data or len(file_data) < 8: + return False + return file_data[:8] == self.OLE_MAGIC + + def close(self, converted_object: Any) -> None: + """Close the OLE file.""" + if converted_object is not None and hasattr(converted_object, 'close'): + converted_object.close() diff --git a/contextifier/core/processor/hwp_helper/hwp_image.py b/contextifier/core/processor/hwp_helper/hwp_image.py deleted file mode 100644 index 3b08cc5..0000000 --- a/contextifier/core/processor/hwp_helper/hwp_image.py +++ /dev/null @@ -1,293 +0,0 @@ -# libs/core/processor/hwp_helper/hwp_image.py -""" -HWP 이미지 처리 유틸리티 - -HWP 5.0 OLE 파일에서 이미지를 추출하고 로컬에 저장합니다. 
-- try_decompress_image: zlib 압축 이미지 해제 -- find_bindata_stream: BinData 스트림 경로 찾기 -- extract_bindata_index: SHAPE_COMPONENT_PICTURE에서 BinData 인덱스 추출 -- extract_and_upload_image: 이미지 추출 및 로컬 저장 -- process_images_from_bindata: BinData에서 모든 이미지 추출 -""" -import io -import os -import zlib -import struct -import logging -import traceback -from typing import Optional, List, Dict, Set - -import olefile -from PIL import Image - -from contextifier.core.functions.img_processor import ImageProcessor - -logger = logging.getLogger("document-processor") - - -def try_decompress_image(data: bytes) -> bytes: - """ - HWP 이미지 데이터 압축 해제를 시도합니다. - - HWP 파일에서 이미지가 zlib으로 압축되어 있을 수 있으므로, - 다양한 전략으로 압축 해제를 시도합니다. - - Args: - data: 원본 이미지 데이터 (압축되었을 수 있음) - - Returns: - 압축 해제된 이미지 데이터 (또는 원본 데이터) - """ - # 1. zlib 헤더가 있으면 zlib 압축 해제 시도 - if data.startswith(b'\x78'): - try: - return zlib.decompress(data) - except Exception: - pass - - # 2. 이미 유효한 이미지인지 확인 - try: - with Image.open(io.BytesIO(data)) as img: - img.verify() - return data # 유효한 이미지 - except Exception: - pass - - # 3. raw deflate (헤더 없음) 시도 - try: - return zlib.decompress(data, -15) - except Exception: - pass - - return data - - -def save_image_to_local( - image_data: bytes, - image_processor: ImageProcessor -) -> Optional[str]: - """ - 이미지를 로컬에 저장합니다. - - Args: - image_data: 이미지 바이너리 데이터 - image_processor: 이미지 프로세서 인스턴스 - - Returns: - 이미지 태그 문자열 또는 None - """ - return image_processor.save_image(image_data) - - -def find_bindata_stream(ole: olefile.OleFileIO, storage_id: int, ext: str) -> Optional[List[str]]: - """ - OLE 컨테이너에서 storage_id와 확장자로 BinData 스트림을 찾습니다. 
- - Args: - ole: OLE 파일 객체 - storage_id: BinData 스토리지 ID - ext: 파일 확장자 - - Returns: - 찾은 스트림 경로 또는 None - """ - ole_dirs = ole.listdir() - - candidates = [ - f"BinData/BIN{storage_id:04X}.{ext}", - f"BinData/BIN{storage_id:04x}.{ext}", - f"BinData/Bin{storage_id:04X}.{ext}", - f"BinData/Bin{storage_id:04x}.{ext}", - f"BinData/BIN{storage_id:04X}.{ext.lower()}", - f"BinData/BIN{storage_id:04x}.{ext.lower()}", - ] - - # 패턴 매칭으로 찾기 - for entry in ole_dirs: - if entry[0] == "BinData" and len(entry) > 1: - fname = entry[1].lower() - expected_patterns = [ - f"bin{storage_id:04x}", - f"bin{storage_id:04X}", - ] - for pattern in expected_patterns: - if pattern.lower() in fname.lower(): - logger.debug(f"Found stream by pattern match: {entry}") - return entry - - # 정확한 경로 매칭 - for candidate in candidates: - candidate_parts = candidate.split('/') - if candidate_parts in ole_dirs: - return candidate_parts - - # 대소문자 무시 매칭 - for entry in ole_dirs: - if entry[0] == "BinData" and len(entry) > 1: - fname = entry[1] - for candidate in candidates: - if fname.lower() == candidate.split('/')[-1].lower(): - return entry - - return None - - -def extract_bindata_index(payload: bytes, bin_data_list_len: int) -> Optional[int]: - """ - SHAPE_COMPONENT_PICTURE 레코드 payload에서 BinData 인덱스를 추출합니다. - - 여러 HWP 버전 호환을 위해 다양한 오프셋 전략을 시도합니다. - - Args: - payload: SHAPE_COMPONENT_PICTURE 레코드의 payload - bin_data_list_len: bin_data_list의 길이 (유효 범위 검증용) - - Returns: - BinData 인덱스 (1-based) 또는 None - """ - if bin_data_list_len == 0: - return None - - bindata_index = None - - # Strategy 1: 오프셋 79 (HWP 5.0.3.x+ 스펙) - if len(payload) >= 81: - test_id = struct.unpack('= 10: - test_id = struct.unpack('= offset + 2: - test_id = struct.unpack(' Optional[str]: - """ - OLE 스트림에서 이미지를 추출하여 로컬에 저장합니다. 
- - Args: - ole: OLE 파일 객체 - target_stream: 스트림 경로 - processed_images: 처리된 이미지 경로 집합 - image_processor: 이미지 프로세서 인스턴스 - - Returns: - 이미지 태그 문자열 또는 None - """ - try: - stream = ole.openstream(target_stream) - image_data = stream.read() - image_data = try_decompress_image(image_data) - - image_tag = save_image_to_local(image_data, image_processor) - if image_tag: - if processed_images is not None: - processed_images.add("/".join(target_stream)) - logger.info(f"Successfully extracted inline image: {image_tag}") - return f"\n{image_tag}\n" - except Exception as e: - logger.warning(f"Failed to process inline HWP image {target_stream}: {e}") - logger.debug(traceback.format_exc()) - - return None - - -def process_images_from_bindata( - ole: olefile.OleFileIO, - processed_images: Optional[Set[str]], - image_processor: ImageProcessor -) -> str: - """ - BinData 스토리지에서 이미지를 추출하여 로컬에 저장합니다. - - Args: - ole: OLE 파일 객체 - processed_images: 이미 처리된 이미지 경로 집합 (스킵용) - image_processor: 이미지 프로세서 인스턴스 - - Returns: - 이미지 태그들을 결합한 문자열 - """ - results = [] - - try: - bindata_streams = [ - entry for entry in ole.listdir() - if entry[0] == "BinData" - ] - - for stream_path in bindata_streams: - if processed_images and "/".join(stream_path) in processed_images: - continue - - stream_name = stream_path[-1] - ext = os.path.splitext(stream_name)[1].lower() - if ext in ['.jpg', '.jpeg', '.png', '.bmp', '.gif']: - stream = ole.openstream(stream_path) - image_data = stream.read() - image_data = try_decompress_image(image_data) - - image_tag = save_image_to_local(image_data, image_processor) - if image_tag: - results.append(image_tag) - - except Exception as e: - logger.warning(f"Error processing HWP images: {e}") - - return "\n\n".join(results) - - -class ImageHelper: - """HWP 이미지 처리 유틸리티""" - - @staticmethod - def try_decompress_image(data: bytes) -> bytes: - return try_decompress_image(data) - - @staticmethod - def save_image_to_local( - image_data: bytes, - image_processor: ImageProcessor - ) -> 
Optional[str]: - return save_image_to_local(image_data, image_processor) - - -__all__ = [ - 'try_decompress_image', - 'save_image_to_local', - 'find_bindata_stream', - 'extract_bindata_index', - 'extract_and_upload_image', - 'process_images_from_bindata', - 'ImageHelper', -] diff --git a/contextifier/core/processor/hwp_helper/hwp_image_processor.py b/contextifier/core/processor/hwp_helper/hwp_image_processor.py new file mode 100644 index 0000000..40c0f5a --- /dev/null +++ b/contextifier/core/processor/hwp_helper/hwp_image_processor.py @@ -0,0 +1,413 @@ +# contextifier/core/processor/hwp_helper/hwp_image_processor.py +""" +HWP Image Processor + +Provides HWP-specific image processing that inherits from ImageProcessor. +Handles BinData stream images and embedded images in HWP 5.0 OLE format. + +This class consolidates all HWP image extraction logic including: +- zlib decompression for compressed images +- BinData stream finding and extraction +- OLE storage image processing +""" +import io +import os +import zlib +import struct +import logging +from typing import Any, Dict, List, Optional, Set, TYPE_CHECKING + +from PIL import Image + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +if TYPE_CHECKING: + import olefile + +logger = logging.getLogger("contextify.image_processor.hwp") + + +class HWPImageProcessor(ImageProcessor): + """ + HWP-specific image processor. + + Inherits from ImageProcessor and provides HWP-specific processing. 
+ + Handles: + - BinData stream images + - Compressed images (zlib) + - Embedded OLE images + + Example: + processor = HWPImageProcessor() + + # Process BinData image + tag = processor.process_image(image_data, bindata_id="BIN0001") + + # Process from OLE stream + tag = processor.process_bindata_stream(ole, stream_path) + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize HWPImageProcessor. + + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + + def process_image( + self, + image_data: bytes, + bindata_id: Optional[str] = None, + image_index: Optional[int] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save HWP image data. + + Args: + image_data: Raw image binary data + bindata_id: BinData ID (e.g., "BIN0001") + image_index: Image index (for naming) + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = None + if bindata_id is not None: + custom_name = f"hwp_{bindata_id}" + elif image_index is not None: + custom_name = f"hwp_image_{image_index}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_bindata_stream( + self, + ole: "olefile.OleFileIO", + stream_path: str, + is_compressed: bool = True, + ) -> Optional[str]: + """ + Process image from HWP BinData OLE stream. 
+ + Args: + ole: OleFileIO object + stream_path: Path to BinData stream + is_compressed: Whether data is zlib compressed + + Returns: + Image tag string, or None on failure + """ + try: + import zlib + + stream_data = ole.openstream(stream_path).read() + + if is_compressed: + try: + image_data = zlib.decompress(stream_data, -15) + except zlib.error: + # Try without negative windowBits + try: + image_data = zlib.decompress(stream_data) + except zlib.error: + # Not compressed after all + image_data = stream_data + else: + image_data = stream_data + + # Extract bindata ID from path + bindata_id = stream_path.split('/')[-1] if '/' in stream_path else stream_path + + return self.process_image(image_data, bindata_id=bindata_id) + + except Exception as e: + self._logger.warning(f"Failed to process BinData stream {stream_path}: {e}") + return None + + def process_embedded_image( + self, + image_data: bytes, + image_name: Optional[str] = None, + bindata_id: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process embedded HWP image. + + Args: + image_data: Image binary data + image_name: Original image filename + bindata_id: BinData ID + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = image_name + if custom_name is None and bindata_id is not None: + custom_name = f"hwp_embed_{bindata_id}" + + return self.save_image(image_data, custom_name=custom_name) + + def decompress_and_process( + self, + compressed_data: bytes, + bindata_id: Optional[str] = None, + ) -> Optional[str]: + """ + Decompress and process zlib-compressed image data. + + Args: + compressed_data: zlib compressed image data + bindata_id: BinData ID + + Returns: + Image tag string, or None on failure + """ + image_data = self.try_decompress_image(compressed_data) + return self.process_image(image_data, bindata_id=bindata_id) + + @staticmethod + def try_decompress_image(data: bytes) -> bytes: + """ + Attempt to decompress HWP image data. 
+ + HWP files may contain zlib-compressed images, so this method + tries various decompression strategies. + + Args: + data: Original image data (possibly compressed) + + Returns: + Decompressed image data (or original if not compressed) + """ + # 1. Try zlib decompression if zlib header present + if data.startswith(b'\x78'): + try: + return zlib.decompress(data) + except Exception: + pass + + # 2. Check if already a valid image + try: + with Image.open(io.BytesIO(data)) as img: + img.verify() + return data # Valid image + except Exception: + pass + + # 3. Try raw deflate (no header) + try: + return zlib.decompress(data, -15) + except Exception: + pass + + return data + + @staticmethod + def find_bindata_stream(ole: "olefile.OleFileIO", storage_id: int, ext: str) -> Optional[List[str]]: + """ + Find BinData stream in OLE container by storage_id and extension. + + Args: + ole: OLE file object + storage_id: BinData storage ID + ext: File extension + + Returns: + Stream path if found, None otherwise + """ + ole_dirs = ole.listdir() + + candidates = [ + f"BinData/BIN{storage_id:04X}.{ext}", + f"BinData/BIN{storage_id:04x}.{ext}", + f"BinData/Bin{storage_id:04X}.{ext}", + f"BinData/Bin{storage_id:04x}.{ext}", + f"BinData/BIN{storage_id:04X}.{ext.lower()}", + f"BinData/BIN{storage_id:04x}.{ext.lower()}", + ] + + # Pattern matching + for entry in ole_dirs: + if entry[0] == "BinData" and len(entry) > 1: + fname = entry[1].lower() + expected_patterns = [ + f"bin{storage_id:04x}", + f"bin{storage_id:04X}", + ] + for pattern in expected_patterns: + if pattern.lower() in fname.lower(): + logger.debug(f"Found stream by pattern match: {entry}") + return entry + + # Exact path matching + for candidate in candidates: + candidate_parts = candidate.split('/') + if candidate_parts in ole_dirs: + return candidate_parts + + # Case-insensitive matching + for entry in ole_dirs: + if entry[0] == "BinData" and len(entry) > 1: + fname = entry[1] + for candidate in candidates: + if 
 fname.lower() == candidate.split('/')[-1].lower(): + return entry + + return None + + @staticmethod + def extract_bindata_index(payload: bytes, bin_data_list_len: int) -> Optional[int]: + """ + Extract BinData index from SHAPE_COMPONENT_PICTURE record payload. + + Tries various offset strategies for compatibility with different HWP versions. + + Args: + payload: SHAPE_COMPONENT_PICTURE record payload + bin_data_list_len: Length of bin_data_list (for validation) + + Returns: + BinData index (1-based) or None + """ + if bin_data_list_len == 0: + return None + + bindata_index = None + + # NOTE(review): the strategy blocks below were reconstructed from a corrupted + # diff (the '<H' struct format strings and adjacent lines were stripped); + # verify the exact offsets against the original source before applying. + # Strategy 1: offset 79 (HWP 5.0.3.x+ spec) + if len(payload) >= 81: + test_id = struct.unpack('<H', payload[79:81])[0] + if 1 <= test_id <= bin_data_list_len: + bindata_index = test_id + + # Strategy 2: trailing 2 bytes of the payload + if bindata_index is None and len(payload) >= 10: + test_id = struct.unpack('<H', payload[-2:])[0] + if 1 <= test_id <= bin_data_list_len: + bindata_index = test_id + + # Strategy 3: scan candidate offsets (older HWP layouts) + if bindata_index is None: + for offset in (69, 71, 73, 75, 77): + if len(payload) >= offset + 2: + test_id = struct.unpack('<H', payload[offset:offset + 2])[0] + if 1 <= test_id <= bin_data_list_len: + bindata_index = test_id + break + + return bindata_index + + def extract_and_upload_image( + self, + ole: "olefile.OleFileIO", + target_stream: List[str], + processed_images: Optional[Set[str]] = None, + ) -> Optional[str]: + """ + Extract image from OLE stream and save locally. + + Args: + ole: OLE file object + target_stream: Stream path + processed_images: Set of processed image paths + + Returns: + Image tag string or None + """ + try: + stream = ole.openstream(target_stream) + image_data = stream.read() + image_data = self.try_decompress_image(image_data) + + bindata_id = target_stream[-1] if target_stream else None + image_tag = self.process_image(image_data, bindata_id=bindata_id) + + if image_tag: + if processed_images is not None: + processed_images.add("/".join(target_stream)) + logger.info(f"Successfully extracted inline image: {image_tag}") + return f"\n{image_tag}\n" + except Exception as e: + logger.warning(f"Failed to process inline HWP image {target_stream}: {e}") + + return None + + def process_images_from_bindata( + self, + ole: "olefile.OleFileIO", + processed_images: Optional[Set[str]] = None, + ) -> str: + """ + Extract images from BinData storage and save locally. 
+ + Args: + ole: OLE file object + processed_images: Set of already processed image paths (to skip) + + Returns: + Joined image tag strings + """ + results = [] + + try: + bindata_streams = [ + entry for entry in ole.listdir() + if entry[0] == "BinData" + ] + + for stream_path in bindata_streams: + if processed_images and "/".join(stream_path) in processed_images: + continue + + stream_name = stream_path[-1] + ext = os.path.splitext(stream_name)[1].lower() + if ext in ['.jpg', '.jpeg', '.png', '.bmp', '.gif']: + stream = ole.openstream(stream_path) + image_data = stream.read() + image_data = self.try_decompress_image(image_data) + + bindata_id = stream_name + image_tag = self.process_image(image_data, bindata_id=bindata_id) + if image_tag: + results.append(image_tag) + + except Exception as e: + logger.warning(f"Error processing HWP images: {e}") + + return "\n\n".join(results) + + +__all__ = ["HWPImageProcessor"] diff --git a/contextifier/core/processor/hwp_helper/hwp_metadata.py b/contextifier/core/processor/hwp_helper/hwp_metadata.py index 938f270..438ae0d 100644 --- a/contextifier/core/processor/hwp_helper/hwp_metadata.py +++ b/contextifier/core/processor/hwp_helper/hwp_metadata.py @@ -1,99 +1,131 @@ -# service/document_processor/processor/hwp_helper/hwp_metadata.py +# contextifier/core/processor/hwp_helper/hwp_metadata.py """ -HWP 메타데이터 추출 유틸리티 +HWP Metadata Extraction Module -HWP 5.0 OLE 파일에서 메타데이터를 추출합니다. -- extract_metadata: OLE 표준 메타데이터 + HwpSummaryInformation 추출 -- parse_hwp_summary_information: HWP 고유 Property Set 파싱 -- format_metadata: 메타데이터를 문자열로 포맷팅 +Provides HWPMetadataExtractor class for extracting metadata from HWP 5.0 OLE files. +Implements BaseMetadataExtractor interface. + +Extraction methods: +1. olefile's get_metadata() - OLE standard metadata +2. HwpSummaryInformation stream direct parsing - HWP-specific metadata + +Note: HWP is a Korean-native document format, so Korean metadata labels +are preserved in output for proper display. 
""" import struct import logging from datetime import datetime -from typing import Dict, Any +from typing import Dict, Any, Optional import olefile +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) + logger = logging.getLogger("document-processor") -def extract_metadata(ole: olefile.OleFileIO) -> Dict[str, Any]: +class HWPMetadataExtractor(BaseMetadataExtractor): """ - HWP 파일의 메타데이터를 추출합니다. + HWP Metadata Extractor. - 두 가지 방법으로 메타데이터를 추출합니다: - 1. olefile의 get_metadata() - OLE 표준 메타데이터 - 2. HwpSummaryInformation 스트림 직접 파싱 - HWP 고유 메타데이터 + Extracts metadata from olefile OleFileIO objects. + Supports both OLE standard metadata and HWP-specific HwpSummaryInformation. - Args: - ole: OLE 파일 객체 - - Returns: - 추출된 메타데이터 딕셔너리 + Supported fields: + - title, subject, author, keywords, comments + - last_saved_by, create_time, last_saved_time + + Usage: + extractor = HWPMetadataExtractor() + metadata = extractor.extract(ole_file) + text = extractor.format(metadata) """ - metadata = {} - # Method 1: olefile의 get_metadata() 사용 - try: - ole_meta = ole.get_metadata() - - if ole_meta: - if ole_meta.title: - metadata['title'] = ole_meta.title - if ole_meta.subject: - metadata['subject'] = ole_meta.subject - if ole_meta.author: - metadata['author'] = ole_meta.author - if ole_meta.keywords: - metadata['keywords'] = ole_meta.keywords - if ole_meta.comments: - metadata['comments'] = ole_meta.comments - if ole_meta.last_saved_by: - metadata['last_saved_by'] = ole_meta.last_saved_by - if ole_meta.create_time: - metadata['create_time'] = ole_meta.create_time - if ole_meta.last_saved_time: - metadata['last_saved_time'] = ole_meta.last_saved_time + def extract(self, source: olefile.OleFileIO) -> DocumentMetadata: + """ + Extract metadata from HWP file. - logger.info(f"Extracted OLE metadata: {metadata}") + Args: + source: olefile OleFileIO object + + Returns: + DocumentMetadata instance containing extracted metadata. 
+ """ + metadata_dict: Dict[str, Any] = {} - except Exception as e: - logger.warning(f"Failed to extract OLE metadata: {e}") - - # Method 2: HwpSummaryInformation 스트림 직접 파싱 - try: - hwp_summary_stream = '\x05HwpSummaryInformation' - if ole.exists(hwp_summary_stream): - logger.debug("Found HwpSummaryInformation stream, attempting to parse...") - stream = ole.openstream(hwp_summary_stream) - data = stream.read() - hwp_meta = parse_hwp_summary_information(data) + # Method 1: Use olefile's get_metadata() + try: + ole_meta = source.get_metadata() - # HWP 특화 메타데이터가 우선 - for key, value in hwp_meta.items(): - if value: - metadata[key] = value - - except Exception as e: - logger.debug(f"Failed to parse HwpSummaryInformation: {e}") - - return metadata + if ole_meta: + if ole_meta.title: + metadata_dict['title'] = ole_meta.title + if ole_meta.subject: + metadata_dict['subject'] = ole_meta.subject + if ole_meta.author: + metadata_dict['author'] = ole_meta.author + if ole_meta.keywords: + metadata_dict['keywords'] = ole_meta.keywords + if ole_meta.comments: + metadata_dict['comments'] = ole_meta.comments + if ole_meta.last_saved_by: + metadata_dict['last_saved_by'] = ole_meta.last_saved_by + if ole_meta.create_time: + metadata_dict['create_time'] = ole_meta.create_time + if ole_meta.last_saved_time: + metadata_dict['last_saved_time'] = ole_meta.last_saved_time + + self.logger.debug(f"Extracted OLE metadata: {list(metadata_dict.keys())}") + + except Exception as e: + self.logger.warning(f"Failed to extract OLE metadata: {e}") + + # Method 2: Parse HwpSummaryInformation stream directly + try: + hwp_summary_stream = '\x05HwpSummaryInformation' + if source.exists(hwp_summary_stream): + self.logger.debug("Found HwpSummaryInformation stream, attempting to parse...") + stream = source.openstream(hwp_summary_stream) + data = stream.read() + hwp_meta = parse_hwp_summary_information(data) + + # HWP-specific metadata takes priority + for key, value in hwp_meta.items(): + if value: + 
metadata_dict[key] = value + + except Exception as e: + self.logger.debug(f"Failed to parse HwpSummaryInformation: {e}") + + return DocumentMetadata( + title=metadata_dict.get('title'), + subject=metadata_dict.get('subject'), + author=metadata_dict.get('author'), + keywords=metadata_dict.get('keywords'), + comments=metadata_dict.get('comments'), + last_saved_by=metadata_dict.get('last_saved_by'), + create_time=metadata_dict.get('create_time'), + last_saved_time=metadata_dict.get('last_saved_time'), + ) def parse_hwp_summary_information(data: bytes) -> Dict[str, Any]: """ - HwpSummaryInformation 스트림을 파싱합니다. (OLE Property Set 형식) + Parse HwpSummaryInformation stream (OLE Property Set format). - OLE Property Set 구조: + OLE Property Set structure: - Header (28 bytes) - Section(s) containing property ID/offset pairs - - Property values (string, datetime 등) + - Property values (string, datetime, etc.) Args: - data: HwpSummaryInformation 스트림 바이너리 데이터 + data: HwpSummaryInformation stream binary data Returns: - 파싱된 메타데이터 딕셔너리 + Dictionary containing parsed metadata. 
""" metadata = {} @@ -103,7 +135,7 @@ def parse_hwp_summary_information(data: bytes) -> Dict[str, Any]: pos = 0 _byte_order = struct.unpack(' Dict[str, Any]: if section_offset >= len(data): return metadata - # Section 파싱 + # Parse section pos = section_offset if len(data) < pos + 8: return metadata @@ -123,7 +155,7 @@ def parse_hwp_summary_information(data: bytes) -> Dict[str, Any]: num_properties = struct.unpack(' Dict[str, Any]: properties.append((prop_id, prop_offset)) pos += 8 - # Property 값 읽기 + # Read property values for prop_id, prop_offset in properties: abs_offset = section_offset + prop_offset if abs_offset + 4 >= len(data): @@ -173,7 +205,7 @@ def parse_hwp_summary_information(data: bytes) -> Dict[str, Any]: except Exception: pass - # Property ID 매핑 + # Property ID mapping if value: if prop_id == 0x02: metadata['title'] = value @@ -198,59 +230,7 @@ def parse_hwp_summary_information(data: bytes) -> Dict[str, Any]: return metadata -def format_metadata(metadata: Dict[str, Any]) -> str: - """ - 메타데이터 딕셔너리를 읽기 쉬운 문자열로 포맷팅합니다. 
- - Args: - metadata: 메타데이터 딕셔너리 - - Returns: - 포맷팅된 메타데이터 문자열 - """ - if not metadata: - return "" - - lines = [""] - - field_names = { - 'title': '제목', - 'subject': '주제', - 'author': '작성자', - 'keywords': '키워드', - 'comments': '설명', - 'last_saved_by': '마지막 저장자', - 'create_time': '작성일', - 'last_saved_time': '수정일', - } - - for key, label in field_names.items(): - if key in metadata and metadata[key]: - value = metadata[key] - - # Format datetime objects - if isinstance(value, datetime): - value = value.strftime('%Y-%m-%d %H:%M:%S') - - lines.append(f" {label}: {value}") - - lines.append("") - - return "\n".join(lines) - - -# 하위 호환성을 위한 클래스 래퍼 -class MetadataHelper: - """메타데이터 처리 관련 유틸리티 (하위 호환성)""" - - @staticmethod - def format_metadata(metadata: Dict[str, Any]) -> str: - return format_metadata(metadata) - - __all__ = [ - 'extract_metadata', + 'HWPMetadataExtractor', 'parse_hwp_summary_information', - 'format_metadata', - 'MetadataHelper', ] diff --git a/contextifier/core/processor/hwp_helper/hwp_preprocessor.py b/contextifier/core/processor/hwp_helper/hwp_preprocessor.py new file mode 100644 index 0000000..986ee2e --- /dev/null +++ b/contextifier/core/processor/hwp_helper/hwp_preprocessor.py @@ -0,0 +1,82 @@ +# contextifier/core/processor/hwp_helper/hwp_preprocessor.py +""" +HWP Preprocessor - Process HWP OLE document after conversion. + +Processing Pipeline Position: + 1. HWPFileConverter.convert() → olefile.OleFileIO + 2. HWPPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. HWPMetadataExtractor.extract() → DocumentMetadata + 4. Content extraction (body text, tables, images) + +Current Implementation: + - Pass-through (HWP uses olefile object directly) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.hwp.preprocessor") + + +class HWPPreprocessor(BasePreprocessor): + """ + HWP OLE Document Preprocessor. 
+ + Currently a pass-through implementation as HWP processing + is handled during the content extraction phase using olefile. + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted HWP OLE document. + + Args: + converted_data: olefile.OleFileIO object from HWPFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the OLE object and any extracted resources + """ + metadata: Dict[str, Any] = {} + + if hasattr(converted_data, 'listdir'): + try: + streams = converted_data.listdir() + metadata['stream_count'] = len(streams) + # Check for common HWP streams + has_body = any('BodyText' in '/'.join(s) for s in streams) + has_docinfo = any('DocInfo' in '/'.join(s) for s in streams) + metadata['has_body_text'] = has_body + metadata['has_doc_info'] = has_docinfo + except Exception: + pass + + logger.debug("HWP preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the OLE object + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - olefile.OleFileIO + encoding="utf-8", + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "HWP Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is an OLE file object.""" + return hasattr(data, 'listdir') and hasattr(data, 'openstream') + + +__all__ = ['HWPPreprocessor'] diff --git a/contextifier/core/processor/hwps_handler.py b/contextifier/core/processor/hwpx_handler.py similarity index 60% rename from contextifier/core/processor/hwps_handler.py rename to contextifier/core/processor/hwpx_handler.py index 5c4f7ea..a835d26 100644 --- a/contextifier/core/processor/hwps_handler.py +++ b/contextifier/core/processor/hwpx_handler.py @@ -5,21 +5,18 @@ Class-based handler for HWPX files inheriting from BaseHandler. 
""" import io -import zipfile import logging from typing import Dict, Any, Set, TYPE_CHECKING from contextifier.core.processor.base_handler import BaseHandler from contextifier.core.functions.chart_extractor import BaseChartExtractor -from contextifier.core.processor.hwp_helper import MetadataHelper from contextifier.core.processor.hwpx_helper import ( - extract_hwpx_metadata, parse_bin_item_map, parse_hwpx_section, - process_hwpx_images, - get_remaining_images, ) from contextifier.core.processor.hwpx_helper.hwpx_chart_extractor import HWPXChartExtractor +from contextifier.core.processor.hwpx_helper.hwpx_metadata import HWPXMetadataExtractor +from contextifier.core.processor.hwpx_helper.hwpx_image_processor import HWPXImageProcessor if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -30,11 +27,34 @@ class HWPXHandler(BaseHandler): """HWPX (ZIP/XML based Korean document) Processing Handler Class""" - + + def _create_file_converter(self): + """Create HWPX-specific file converter.""" + from contextifier.core.processor.hwpx_helper.hwpx_file_converter import HWPXFileConverter + return HWPXFileConverter() + + def _create_preprocessor(self): + """Create HWPX-specific preprocessor.""" + from contextifier.core.processor.hwpx_helper.hwpx_preprocessor import HWPXPreprocessor + return HWPXPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """Create HWPX-specific chart extractor.""" return HWPXChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Create HWPX-specific metadata extractor.""" + return HWPXMetadataExtractor() + + def _create_format_image_processor(self): + """Create HWPX-specific image processor.""" + return HWPXImageProcessor( + directory_path=self._image_processor.config.directory_path, + tag_prefix=self._image_processor.config.tag_prefix, + tag_suffix=self._image_processor.config.tag_suffix, + storage_backend=self._image_processor.storage_backend, + ) + def extract_text( self, 
current_file: "CurrentFile", @@ -43,34 +63,32 @@ def extract_text( ) -> str: """ Extract text from HWPX file. - + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata **kwargs: Additional options - + Returns: Extracted text """ file_path = current_file.get("file_path", "unknown") + file_data = current_file.get("file_data", b"") text_content = [] - + + # Check if it's a valid ZIP file using file_converter.validate() + if not self.file_converter.validate(file_data): + self.logger.error("Not a valid Zip file: %s", file_path) + return "" + try: - # Open ZIP from stream + # Get file stream file_stream = self.get_file_stream(current_file) - - # Check if valid ZIP - if not self._is_valid_zip(file_stream): - self.logger.error(f"Not a valid Zip file: {file_path}") - return "" - - # Reset stream position - file_stream.seek(0) - + # Pre-extract all charts using ChartExtractor chart_data_list = self.chart_extractor.extract_all_from_file(file_stream) chart_idx = [0] # Mutable container for closure - + def get_next_chart() -> str: """Callback to get the next pre-extracted chart content.""" if chart_idx[0] < len(chart_data_list): @@ -78,52 +96,62 @@ def get_next_chart() -> str: chart_idx[0] += 1 return self._format_chart_data(chart_data) return "" - - file_stream.seek(0) - - with zipfile.ZipFile(file_stream, 'r') as zf: + + # Step 1: Convert binary to ZipFile using file_converter + zf = self.file_converter.convert(file_data, file_stream) + + # Step 2: Preprocess - clean_content is the TRUE SOURCE + preprocessed = self.preprocess(zf) + zf = preprocessed.clean_content # TRUE SOURCE + + try: if extract_metadata: - metadata = extract_hwpx_metadata(zf) - metadata_text = MetadataHelper.format_metadata(metadata) + metadata_text = self.extract_and_format_metadata(zf) if metadata_text: text_content.append(metadata_text) text_content.append("") - + bin_item_map = parse_bin_item_map(zf) - + section_files = [ - f for f in 
zf.namelist() + f for f in zf.namelist() if f.startswith("Contents/section") and f.endswith(".xml") ] section_files.sort(key=lambda x: int(x.replace("Contents/section", "").replace(".xml", ""))) - + processed_images: Set[str] = set() - + for sec_file in section_files: with zf.open(sec_file) as f: xml_content = f.read() - section_text = parse_hwpx_section(xml_content, zf, bin_item_map, processed_images, image_processor=self.image_processor) + section_text = parse_hwpx_section(xml_content, zf, bin_item_map, processed_images, image_processor=self.format_image_processor) text_content.append(section_text) - - remaining_images = get_remaining_images(zf, processed_images) - if remaining_images: - image_text = process_hwpx_images(zf, remaining_images, image_processor=self.image_processor) - if image_text: - text_content.append("\n\n=== Extracted Images (Not Inline) ===\n") - text_content.append(image_text) - + + # Use format_image_processor directly + image_processor = self.format_image_processor + if hasattr(image_processor, 'get_remaining_images'): + remaining_images = image_processor.get_remaining_images(zf, processed_images) + if remaining_images and hasattr(image_processor, 'process_images'): + image_text = image_processor.process_images(zf, remaining_images) + if image_text: + text_content.append("\n\n=== Extracted Images (Not Inline) ===\n") + text_content.append(image_text) + # Add pre-extracted charts while chart_idx[0] < len(chart_data_list): chart_text = get_next_chart() if chart_text: text_content.append(chart_text) - - except Exception as e: - self.logger.error(f"Error processing HWPX file: {e}") + finally: + # Close ZipFile using file_converter + self.file_converter.close(zf) + + except Exception as e: # noqa: BLE001 + self.logger.error("Error processing HWPX file: %s", e) return f"Error processing HWPX file: {str(e)}" - + return "\n".join(text_content) - + def _is_valid_zip(self, file_stream: io.BytesIO) -> bool: """Check if stream is a valid ZIP file.""" 
try: @@ -131,16 +159,16 @@ def _is_valid_zip(self, file_stream: io.BytesIO) -> bool: header = file_stream.read(4) file_stream.seek(0) return header == b'PK\x03\x04' - except: + except Exception: # noqa: BLE001 return False - + def _format_chart_data(self, chart_data: "ChartData") -> str: """Format ChartData using ChartProcessor.""" from contextifier.core.functions.chart_extractor import ChartData - + if not isinstance(chart_data, ChartData): return "" - + if chart_data.has_data(): return self.chart_processor.format_chart_data( chart_type=chart_data.chart_type, diff --git a/contextifier/core/processor/hwpx_helper/__init__.py b/contextifier/core/processor/hwpx_helper/__init__.py index faf2642..d3b64b5 100644 --- a/contextifier/core/processor/hwpx_helper/__init__.py +++ b/contextifier/core/processor/hwpx_helper/__init__.py @@ -25,7 +25,7 @@ # Metadata from contextifier.core.processor.hwpx_helper.hwpx_metadata import ( - extract_hwpx_metadata, + HWPXMetadataExtractor, parse_bin_item_map, ) @@ -40,10 +40,9 @@ parse_hwpx_section, ) -# Image -from contextifier.core.processor.hwpx_helper.hwpx_image import ( - process_hwpx_images, - get_remaining_images, +# Image Processor (replaces hwpx_image.py utility functions) +from contextifier.core.processor.hwpx_helper.hwpx_image_processor import ( + HWPXImageProcessor, ) # Chart Extractor @@ -60,16 +59,15 @@ "HEADER_FILE_PATHS", "HPF_PATH", # Metadata - "extract_hwpx_metadata", + "HWPXMetadataExtractor", "parse_bin_item_map", # Table "parse_hwpx_table", "extract_cell_content", # Section "parse_hwpx_section", - # Image - "process_hwpx_images", - "get_remaining_images", + # Image Processor + "HWPXImageProcessor", # Chart Extractor "HWPXChartExtractor", ] diff --git a/contextifier/core/processor/hwpx_helper/hwpx_file_converter.py b/contextifier/core/processor/hwpx_helper/hwpx_file_converter.py new file mode 100644 index 0000000..e404434 --- /dev/null +++ b/contextifier/core/processor/hwpx_helper/hwpx_file_converter.py @@ -0,0 +1,69 @@ 
+# libs/core/processor/hwpx_helper/hwpx_file_converter.py +""" +HWPXFileConverter - HWPX file format converter + +Converts binary HWPX data to ZipFile object. +""" +from io import BytesIO +from typing import Any, Optional, BinaryIO +import zipfile + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class HWPXFileConverter(BaseFileConverter): + """ + HWPX file converter. + + Converts binary HWPX (ZIP format) data to ZipFile object. + """ + + # ZIP magic number + ZIP_MAGIC = b'PK\x03\x04' + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> zipfile.ZipFile: + """ + Convert binary HWPX data to ZipFile object. + + Args: + file_data: Raw binary HWPX data + file_stream: Optional file stream + **kwargs: Additional options + + Returns: + zipfile.ZipFile object + """ + stream = file_stream if file_stream is not None else BytesIO(file_data) + stream.seek(0) + return zipfile.ZipFile(stream, 'r') + + def get_format_name(self) -> str: + """Return format name.""" + return "HWPX Document (ZIP/XML)" + + def validate(self, file_data: bytes) -> bool: + """Validate if data is a valid ZIP file.""" + if not file_data or len(file_data) < 4: + return False + + if file_data[:4] != self.ZIP_MAGIC: + return False + + # Verify it's a valid ZIP + try: + with zipfile.ZipFile(BytesIO(file_data), 'r') as zf: + # HWPX should have specific structure + namelist = zf.namelist() + return len(namelist) > 0 + except zipfile.BadZipFile: + return False + + def close(self, converted_object: Any) -> None: + """Close the ZipFile.""" + if converted_object is not None and hasattr(converted_object, 'close'): + converted_object.close() diff --git a/contextifier/core/processor/hwpx_helper/hwpx_image.py b/contextifier/core/processor/hwpx_helper/hwpx_image.py deleted file mode 100644 index 867244c..0000000 --- a/contextifier/core/processor/hwpx_helper/hwpx_image.py +++ /dev/null @@ -1,77 +0,0 @@ -# hwpx_helper/hwpx_image.py -""" -HWPX 
이미지 처리 - -HWPX 문서의 이미지를 추출하고 로컬에 저장합니다. -""" -import logging -import os -import zipfile -from typing import List, Optional - -from contextifier.core.processor.hwpx_helper.hwpx_constants import SUPPORTED_IMAGE_EXTENSIONS -from contextifier.core.functions.img_processor import ImageProcessor - -logger = logging.getLogger("document-processor") - - -def process_hwpx_images( - zf: zipfile.ZipFile, - image_files: List[str], - image_processor: ImageProcessor -) -> str: - """ - HWPX zip에서 이미지를 추출하여 로컬에 저장합니다. - - Args: - zf: 열린 ZipFile 객체 - image_files: 처리할 이미지 파일 경로 목록 - image_processor: 이미지 프로세서 인스턴스 - - Returns: - 이미지 태그 문자열들을 줄바꿈으로 연결한 결과 - """ - results = [] - - for img_path in image_files: - ext = os.path.splitext(img_path)[1].lower() - if ext in SUPPORTED_IMAGE_EXTENSIONS: - try: - with zf.open(img_path) as f: - image_data = f.read() - - image_tag = image_processor.save_image(image_data) - if image_tag: - results.append(image_tag) - - except Exception as e: - logger.warning(f"Error processing HWPX image {img_path}: {e}") - - return "\n\n".join(results) - - -def get_remaining_images( - zf: zipfile.ZipFile, - processed_images: set -) -> List[str]: - """ - 아직 처리되지 않은 이미지 파일 목록을 반환합니다. 
- - Args: - zf: 열린 ZipFile 객체 - processed_images: 이미 처리된 이미지 경로 집합 - - Returns: - 처리되지 않은 이미지 파일 경로 목록 - """ - image_files = [ - f for f in zf.namelist() - if f.startswith("BinData/") and not f.endswith("/") - ] - - remaining_images = [] - for img in image_files: - if img not in processed_images: - remaining_images.append(img) - - return remaining_images diff --git a/contextifier/core/processor/hwpx_helper/hwpx_image_processor.py b/contextifier/core/processor/hwpx_helper/hwpx_image_processor.py new file mode 100644 index 0000000..7d326b4 --- /dev/null +++ b/contextifier/core/processor/hwpx_helper/hwpx_image_processor.py @@ -0,0 +1,258 @@ +# contextifier/core/processor/hwpx_helper/hwpx_image_processor.py +""" +HWPX Image Processor + +Provides HWPX-specific image processing that inherits from ImageProcessor. +Handles images in HWPX (ZIP/XML based) Korean document format. + +This class consolidates all HWPX image extraction logic including: +- BinData images extraction from ZIP +- Remaining images processing +- Image filtering by extension +""" +import logging +import os +from typing import Any, Dict, List, Optional, Set, TYPE_CHECKING +import zipfile + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +logger = logging.getLogger("contextify.image_processor.hwpx") + +# Supported image extensions +SUPPORTED_IMAGE_EXTENSIONS = frozenset(['.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff']) + + +class HWPXImageProcessor(ImageProcessor): + """ + HWPX-specific image processor. + + Inherits from ImageProcessor and provides HWPX-specific processing. 
+ + Handles: + - BinData images in HWPX ZIP structure + - Embedded images + - Referenced images via bin_item_map + + Example: + processor = HWPXImageProcessor() + + # Process image from ZIP + with zipfile.ZipFile(file_stream, 'r') as zf: + tag = processor.process_from_zip(zf, "BinData/image1.png") + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize HWPXImageProcessor. + + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + + def process_image( + self, + image_data: bytes, + bin_item_id: Optional[str] = None, + image_path: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save HWPX image data. + + Args: + image_data: Raw image binary data + bin_item_id: BinItem ID from HWPX + image_path: Original path in ZIP (for naming) + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = None + if bin_item_id is not None: + custom_name = f"hwpx_{bin_item_id}" + elif image_path is not None: + # Extract filename from path + filename = image_path.split('/')[-1] if '/' in image_path else image_path + # Remove extension and sanitize + name_base = filename.rsplit('.', 1)[0] if '.' in filename else filename + custom_name = f"hwpx_{name_base}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_from_zip( + self, + zf: zipfile.ZipFile, + image_path: str, + bin_item_id: Optional[str] = None, + ) -> Optional[str]: + """ + Process image from HWPX ZIP archive. 
+ + Args: + zf: ZipFile object + image_path: Path to image in ZIP + bin_item_id: BinItem ID + + Returns: + Image tag string, or None on failure + """ + try: + with zf.open(image_path) as f: + image_data = f.read() + + return self.process_image( + image_data, + bin_item_id=bin_item_id, + image_path=image_path + ) + + except Exception as e: + self._logger.warning(f"Failed to process image from ZIP {image_path}: {e}") + return None + + def process_embedded_image( + self, + image_data: bytes, + image_name: Optional[str] = None, + bin_item_id: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process embedded HWPX image. + + Args: + image_data: Image binary data + image_name: Original image filename + bin_item_id: BinItem ID + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = image_name + if custom_name is None and bin_item_id is not None: + custom_name = f"hwpx_embed_{bin_item_id}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_bindata_images( + self, + zf: zipfile.ZipFile, + bin_item_map: Dict[str, str], + exclude_processed: Optional[Set[str]] = None, + ) -> Dict[str, str]: + """ + Process all BinData images from HWPX. + + Args: + zf: ZipFile object + bin_item_map: Mapping of bin_item_id to path + exclude_processed: Set of already processed IDs to skip + + Returns: + Dictionary mapping bin_item_id to image tag + """ + exclude = exclude_processed or set() + result = {} + + for bin_id, image_path in bin_item_map.items(): + if bin_id in exclude: + continue + + tag = self.process_from_zip(zf, image_path, bin_item_id=bin_id) + if tag: + result[bin_id] = tag + + return result + + def process_images( + self, + zf: zipfile.ZipFile, + image_files: List[str], + ) -> str: + """ + Extract images from HWPX zip and save locally. 
+ + Args: + zf: Open ZipFile object + image_files: List of image file paths to process + + Returns: + Image tag strings joined by newlines + """ + results = [] + + for img_path in image_files: + ext = os.path.splitext(img_path)[1].lower() + if ext in SUPPORTED_IMAGE_EXTENSIONS: + tag = self.process_from_zip(zf, img_path) + if tag: + results.append(tag) + + return "\n\n".join(results) + + def get_remaining_images( + self, + zf: zipfile.ZipFile, + processed_images: Set[str], + ) -> List[str]: + """ + Return list of image files not yet processed. + + Args: + zf: Open ZipFile object + processed_images: Set of already processed image paths + + Returns: + List of unprocessed image file paths + """ + image_files = [ + f for f in zf.namelist() + if f.startswith("BinData/") and not f.endswith("/") + ] + + remaining_images = [] + for img in image_files: + if img not in processed_images: + remaining_images.append(img) + + return remaining_images + + def process_remaining_images( + self, + zf: zipfile.ZipFile, + processed_images: Set[str], + ) -> str: + """ + Process all images not yet processed. + + Args: + zf: Open ZipFile object + processed_images: Set of already processed image paths + + Returns: + Image tag strings joined by newlines + """ + remaining = self.get_remaining_images(zf, processed_images) + return self.process_images(zf, remaining) + + +__all__ = ["HWPXImageProcessor"] diff --git a/contextifier/core/processor/hwpx_helper/hwpx_metadata.py b/contextifier/core/processor/hwpx_helper/hwpx_metadata.py index 4bc26c9..57e1b1e 100644 --- a/contextifier/core/processor/hwpx_helper/hwpx_metadata.py +++ b/contextifier/core/processor/hwpx_helper/hwpx_metadata.py @@ -1,104 +1,139 @@ -# hwpx_helper/hwpx_metadata.py +# contextifier/core/processor/hwpx_helper/hwpx_metadata.py """ -HWPX 메타데이터 추출 +HWPX Metadata Extraction Module -HWPX 파일에서 메타데이터를 추출합니다. 
-메타데이터는 다음 파일에 저장됩니다: -- version.xml: 문서 버전 정보 -- META-INF/container.xml: 컨테이너 정보 -- Contents/header.xml: 문서 속성 (작성자, 날짜 등) +Provides HWPXMetadataExtractor class for extracting metadata from HWPX files. +Implements BaseMetadataExtractor interface. + +Metadata locations in HWPX: +- version.xml: Document version information +- META-INF/container.xml: Container information +- Contents/header.xml: Document properties (author, date, etc.) + +Note: HWPX is a Korean-native document format, so Korean metadata labels +are preserved in output for proper display. """ import logging import xml.etree.ElementTree as ET import zipfile from typing import Any, Dict +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) from contextifier.core.processor.hwpx_helper.hwpx_constants import HWPX_NAMESPACES, HEADER_FILE_PATHS logger = logging.getLogger("document-processor") -def extract_hwpx_metadata(zf: zipfile.ZipFile) -> Dict[str, Any]: +class HWPXMetadataExtractor(BaseMetadataExtractor): """ - HWPX 파일에서 메타데이터를 추출합니다. - - HWPX stores metadata in: - - version.xml: Document version info - - META-INF/container.xml: Container info - - Contents/header.xml: Document properties (작성자, 날짜 등) - - Args: - zf: 열린 ZipFile 객체 - - Returns: - 추출된 메타데이터 딕셔너리 + HWPX Metadata Extractor. + + Extracts HWPX metadata from zipfile.ZipFile objects. + + Supported fields: + - Standard fields: title, subject, author, keywords, comments, etc. + - HWPX-specific: version, media_type, etc. 
(stored in custom fields) + + Usage: + extractor = HWPXMetadataExtractor() + metadata = extractor.extract(zip_file) + text = extractor.format(metadata) """ - metadata = {} - - try: - # Try to read header.xml for document properties - for header_path in HEADER_FILE_PATHS: - if header_path in zf.namelist(): - with zf.open(header_path) as f: - header_content = f.read() - header_root = ET.fromstring(header_content) - - # Try to find document properties - # contains metadata - doc_info = header_root.find('.//hh:docInfo', HWPX_NAMESPACES) - if doc_info is not None: - # Get properties - for prop in doc_info: - tag = prop.tag.split('}')[-1] if '}' in prop.tag else prop.tag - if prop.text: - metadata[tag.lower()] = prop.text - break - - # Try to read version.xml - if 'version.xml' in zf.namelist(): - with zf.open('version.xml') as f: - version_content = f.read() - version_root = ET.fromstring(version_content) - - # Get version info - if version_root.text: - metadata['version'] = version_root.text - for attr in version_root.attrib: - metadata[f'version_{attr}'] = version_root.get(attr) - - # Try to read META-INF/manifest.xml for additional info - if 'META-INF/manifest.xml' in zf.namelist(): - with zf.open('META-INF/manifest.xml') as f: - manifest_content = f.read() - manifest_root = ET.fromstring(manifest_content) - - # Get mimetype and other info - for child in manifest_root: - tag = child.tag.split('}')[-1] if '}' in child.tag else child.tag - if tag == 'file-entry': - full_path = child.get('full-path', child.get('{urn:oasis:names:tc:opendocument:xmlns:manifest:1.0}full-path', '')) - if full_path == '/': - media_type = child.get('media-type', child.get('{urn:oasis:names:tc:opendocument:xmlns:manifest:1.0}media-type', '')) - if media_type: - metadata['media_type'] = media_type - - logger.info(f"Extracted HWPX metadata: {metadata}") - - except Exception as e: - logger.warning(f"Failed to extract HWPX metadata: {e}") - - return metadata + + def extract(self, source: 
zipfile.ZipFile) -> DocumentMetadata: + """ + Extract metadata from HWPX file. + + Args: + source: Open zipfile.ZipFile object + + Returns: + DocumentMetadata instance containing extracted metadata. + """ + raw_metadata: Dict[str, Any] = {} + + try: + # Try to read header.xml for document properties + for header_path in HEADER_FILE_PATHS: + if header_path in source.namelist(): + with source.open(header_path) as f: + header_content = f.read() + header_root = ET.fromstring(header_content) + + # Try to find document properties + # contains metadata + doc_info = header_root.find('.//hh:docInfo', HWPX_NAMESPACES) + if doc_info is not None: + # Get properties + for prop in doc_info: + tag = prop.tag.split('}')[-1] if '}' in prop.tag else prop.tag + if prop.text: + raw_metadata[tag.lower()] = prop.text + break + + # Try to read version.xml + if 'version.xml' in source.namelist(): + with source.open('version.xml') as f: + version_content = f.read() + version_root = ET.fromstring(version_content) + + # Get version info + if version_root.text: + raw_metadata['version'] = version_root.text + for attr in version_root.attrib: + raw_metadata[f'version_{attr}'] = version_root.get(attr) + + # Try to read META-INF/manifest.xml for additional info + if 'META-INF/manifest.xml' in source.namelist(): + with source.open('META-INF/manifest.xml') as f: + manifest_content = f.read() + manifest_root = ET.fromstring(manifest_content) + + # Get mimetype and other info + for child in manifest_root: + tag = child.tag.split('}')[-1] if '}' in child.tag else child.tag + if tag == 'file-entry': + full_path = child.get('full-path', child.get('{urn:oasis:names:tc:opendocument:xmlns:manifest:1.0}full-path', '')) + if full_path == '/': + media_type = child.get('media-type', child.get('{urn:oasis:names:tc:opendocument:xmlns:manifest:1.0}media-type', '')) + if media_type: + raw_metadata['media_type'] = media_type + + self.logger.debug(f"Extracted HWPX metadata: {list(raw_metadata.keys())}") + + except 
Exception as e: + self.logger.warning(f"Failed to extract HWPX metadata: {e}") + + # Separate standard fields from custom fields + standard_fields = {'title', 'subject', 'author', 'keywords', 'comments', + 'last_saved_by', 'create_time', 'last_saved_time'} + custom_fields = {k: v for k, v in raw_metadata.items() if k not in standard_fields} + + return DocumentMetadata( + title=raw_metadata.get('title'), + subject=raw_metadata.get('subject'), + author=raw_metadata.get('author'), + keywords=raw_metadata.get('keywords'), + comments=raw_metadata.get('comments'), + last_saved_by=raw_metadata.get('last_saved_by'), + create_time=raw_metadata.get('create_time'), + last_saved_time=raw_metadata.get('last_saved_time'), + custom=custom_fields, + ) def parse_bin_item_map(zf: zipfile.ZipFile) -> Dict[str, str]: """ - content.hpf 파일을 파싱하여 BinItem ID와 파일 경로 매핑을 생성합니다. + Parse content.hpf file to create BinItem ID to file path mapping. Args: - zf: 열린 ZipFile 객체 + zf: Open ZipFile object Returns: - BinItem ID -> 파일 경로 매핑 딕셔너리 + Dictionary mapping BinItem ID to file path. """ from .hwpx_constants import HPF_PATH, OPF_NAMESPACES @@ -120,3 +155,9 @@ def parse_bin_item_map(zf: zipfile.ZipFile) -> Dict[str, str]: logger.warning(f"Failed to parse content.hpf: {e}") return bin_item_map + + +__all__ = [ + 'HWPXMetadataExtractor', + 'parse_bin_item_map', +] diff --git a/contextifier/core/processor/hwpx_helper/hwpx_preprocessor.py b/contextifier/core/processor/hwpx_helper/hwpx_preprocessor.py new file mode 100644 index 0000000..7433097 --- /dev/null +++ b/contextifier/core/processor/hwpx_helper/hwpx_preprocessor.py @@ -0,0 +1,80 @@ +# contextifier/core/processor/hwpx_helper/hwpx_preprocessor.py +""" +HWPX Preprocessor - Process HWPX ZIP document after conversion. + +Processing Pipeline Position: + 1. HWPXFileConverter.convert() → zipfile.ZipFile + 2. HWPXPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. HWPXMetadataExtractor.extract() → DocumentMetadata + 4. 
Content extraction (sections, tables, images) + +Current Implementation: + - Pass-through (HWPX uses zipfile object directly) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.hwpx.preprocessor") + + +class HWPXPreprocessor(BasePreprocessor): + """ + HWPX ZIP Document Preprocessor. + + Currently a pass-through implementation as HWPX processing + is handled during the content extraction phase. + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted HWPX ZIP document. + + Args: + converted_data: zipfile.ZipFile object from HWPXFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the ZIP object and any extracted resources + """ + metadata: Dict[str, Any] = {} + + if hasattr(converted_data, 'namelist'): + try: + files = converted_data.namelist() + metadata['file_count'] = len(files) + # Check for section files + sections = [f for f in files if 'section' in f.lower() and f.endswith('.xml')] + metadata['section_count'] = len(sections) + except Exception: # noqa: BLE001 + pass + + logger.debug("HWPX preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the ZipFile + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - zipfile.ZipFile + encoding="utf-8", + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "HWPX Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is a ZipFile object.""" + return hasattr(data, 'namelist') and hasattr(data, 'open') + + +__all__ = ['HWPXPreprocessor'] diff --git a/contextifier/core/processor/image_file_handler.py b/contextifier/core/processor/image_file_handler.py index b9cf4ba..6f0ca33 100644 --- 
a/contextifier/core/processor/image_file_handler.py +++ b/contextifier/core/processor/image_file_handler.py @@ -12,6 +12,8 @@ from contextifier.core.processor.base_handler import BaseHandler from contextifier.core.functions.chart_extractor import BaseChartExtractor, NullChartExtractor +from contextifier.core.processor.image_file_helper.image_file_image_processor import ImageFileImageProcessor +from contextifier.core.functions.img_processor import ImageProcessor if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -27,61 +29,82 @@ class ImageFileHandler(BaseHandler): """ Image File Processing Handler Class. - + Processes standalone image files by converting them to text using OCR. Requires an OCR engine to be provided for actual text extraction. - + Args: config: Configuration dictionary (passed from DocumentProcessor) image_processor: ImageProcessor instance (passed from DocumentProcessor) page_tag_processor: PageTagProcessor instance (passed from DocumentProcessor) ocr_engine: OCR engine instance (BaseOCR subclass) for image-to-text conversion - + Example: >>> from contextifier.ocr.ocr_engine import OpenAIOCR >>> ocr = OpenAIOCR(api_key="sk-...", model="gpt-4o") >>> handler = ImageFileHandler(ocr_engine=ocr) >>> text = handler.extract_text(current_file) """ - + + def _create_file_converter(self): + """Create image-file-specific file converter.""" + from contextifier.core.processor.image_file_helper.image_file_converter import ImageFileConverter + return ImageFileConverter() + + def _create_preprocessor(self): + """Create image-file-specific preprocessor.""" + from contextifier.core.processor.image_file_helper.image_file_preprocessor import ImageFilePreprocessor + return ImageFilePreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """Image files do not contain charts. 
Return NullChartExtractor.""" return NullChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Image files do not have document metadata. Return None (uses NullMetadataExtractor).""" + return None + + def _create_format_image_processor(self) -> ImageProcessor: + """Create image-file-specific image processor.""" + return ImageFileImageProcessor() + def __init__( self, config: Optional[dict] = None, image_processor: Optional[Any] = None, page_tag_processor: Optional[Any] = None, + chart_processor: Optional[Any] = None, ocr_engine: Optional["BaseOCR"] = None ): """ Initialize ImageFileHandler. - + Args: config: Configuration dictionary (passed from DocumentProcessor) image_processor: ImageProcessor instance (passed from DocumentProcessor) page_tag_processor: PageTagProcessor instance (passed from DocumentProcessor) + chart_processor: ChartProcessor instance (passed from DocumentProcessor) ocr_engine: OCR engine instance (BaseOCR subclass) for image-to-text conversion. If None, images cannot be converted to text. """ super().__init__( config=config, image_processor=image_processor, - page_tag_processor=page_tag_processor + page_tag_processor=page_tag_processor, + chart_processor=chart_processor ) self._ocr_engine = ocr_engine - + @property def ocr_engine(self) -> Optional["BaseOCR"]: """Current OCR engine instance.""" return self._ocr_engine - + @ocr_engine.setter def ocr_engine(self, engine: Optional["BaseOCR"]) -> None: """Set OCR engine instance.""" self._ocr_engine = engine - + def extract_text( self, current_file: "CurrentFile", @@ -90,66 +113,72 @@ def extract_text( ) -> str: """ Extract text from image file using OCR. - + Converts the image file to text using the configured OCR engine. If no OCR engine is available, returns an error message. 
- + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata (not used for images) **kwargs: Additional options (not used) - + Returns: Extracted text from image, or error message if OCR is not available - + Raises: ValueError: If OCR engine is not configured """ file_path = current_file.get("file_path", "unknown") file_name = current_file.get("file_name", "unknown") file_extension = current_file.get("file_extension", "").lower() - + file_data = current_file.get("file_data", b"") + self.logger.info(f"Processing image file: {file_name}") - + + # Step 1: No file_converter for image files (direct processing) + # Step 2: Preprocess - clean_content is the TRUE SOURCE + preprocessed = self.preprocess(file_data) + file_data = preprocessed.clean_content # TRUE SOURCE + # Validate file extension if file_extension not in SUPPORTED_IMAGE_EXTENSIONS: self.logger.warning(f"Unsupported image extension: {file_extension}") return f"[Unsupported image format: {file_extension}]" - + # If OCR engine is not available, return image tag format # This allows the image to be processed later when OCR is available if self._ocr_engine is None: self.logger.debug(f"OCR engine not available, returning image tag: {file_name}") # Use ImageProcessor's tag format (e.g., [Image:path] or custom format) return self._build_image_tag(file_path) - + # Use OCR engine to convert image to text try: # Use the file path directly for OCR conversion result = self._ocr_engine.convert_image_to_text(file_path) - + if result is None: self.logger.error(f"OCR returned None for image: {file_name}") return f"[Image OCR failed: {file_name}]" - + if result.startswith("[Image conversion error:"): self.logger.error(f"OCR error for image {file_name}: {result}") return result - + self.logger.info(f"Successfully extracted text from image: {file_name}") return result - + except Exception as e: self.logger.error(f"Error processing image {file_name}: {e}") return 
f"[Image processing error: {str(e)}]" - + def is_supported(self, file_extension: str) -> bool: """ Check if file extension is supported. - + Args: file_extension: File extension (with or without dot) - + Returns: True if extension is supported, False otherwise """ @@ -159,23 +188,23 @@ def is_supported(self, file_extension: str) -> bool: def _build_image_tag(self, file_path: str) -> str: """ Build image tag using ImageProcessor's tag format. - + Uses the configured tag_prefix and tag_suffix from ImageProcessor to create a consistent image tag format. - + Args: file_path: Path to the image file - + Returns: Image tag string (e.g., "[Image:path]" or custom format) """ # Normalize path separators (Windows -> Unix style) path_str = file_path.replace("\\", "/") - + # Use ImageProcessor's tag format prefix = self.image_processor.config.tag_prefix suffix = self.image_processor.config.tag_suffix - + return f"{prefix}{path_str}{suffix}" diff --git a/contextifier/core/processor/image_file_helper/__init__.py b/contextifier/core/processor/image_file_helper/__init__.py new file mode 100644 index 0000000..e6bbb1b --- /dev/null +++ b/contextifier/core/processor/image_file_helper/__init__.py @@ -0,0 +1,17 @@ +# contextifier/core/processor/image_file_helper/__init__.py +""" +Image File Helper 모듈 + +이미지 파일 처리에 필요한 유틸리티를 제공합니다. 
+
+모듈 구성:
+- image_file_image_processor: 이미지 파일용 이미지 프로세서
+"""
+
+from contextifier.core.processor.image_file_helper.image_file_image_processor import (
+    ImageFileImageProcessor,
+)
+
+__all__ = [
+    "ImageFileImageProcessor",
+]
diff --git a/contextifier/core/processor/image_file_helper/image_file_converter.py b/contextifier/core/processor/image_file_helper/image_file_converter.py
new file mode 100644
index 0000000..c9ea732
--- /dev/null
+++ b/contextifier/core/processor/image_file_helper/image_file_converter.py
@@ -0,0 +1,68 @@
+# contextifier/core/processor/image_file_helper/image_file_converter.py
+"""
+ImageFileConverter - Image file format converter
+
+Pass-through converter for image files.
+Images are kept as binary data.
+"""
+from typing import Any, Optional, BinaryIO
+
+from contextifier.core.functions.file_converter import NullFileConverter
+
+
+class ImageFileConverter(NullFileConverter):
+    """
+    Image file converter.
+
+    Images don't need conversion - returns raw bytes.
+    This is a pass-through converter.
+    """
+
+    # Common image magic numbers
+    MAGIC_JPEG = b'\xff\xd8\xff'
+    MAGIC_PNG = b'\x89PNG\r\n\x1a\n'
+    MAGIC_GIF = b'GIF8'
+    MAGIC_BMP = b'BM'
+    MAGIC_WEBP = b'RIFF'
+
+    def get_format_name(self) -> str:
+        """Return format name."""
+        return "Image File"
+
+    def validate(self, file_data: bytes) -> bool:
+        """Validate if data is an image."""
+        if not file_data or len(file_data) < 4:
+            return False
+
+        return (
+            file_data[:3] == self.MAGIC_JPEG or
+            file_data[:8] == self.MAGIC_PNG or
+            file_data[:4] == self.MAGIC_GIF or
+            file_data[:2] == self.MAGIC_BMP or
+            file_data[:4] == self.MAGIC_WEBP
+        )
+
+    def detect_image_type(self, file_data: bytes) -> Optional[str]:
+        """
+        Detect image type from binary data.
+ + Args: + file_data: Raw binary image data + + Returns: + Image type string (jpeg, png, gif, bmp, webp) or None + """ + if not file_data or len(file_data) < 8: + return None + + if file_data[:3] == self.MAGIC_JPEG: + return "jpeg" + elif file_data[:8] == self.MAGIC_PNG: + return "png" + elif file_data[:4] == self.MAGIC_GIF: + return "gif" + elif file_data[:2] == self.MAGIC_BMP: + return "bmp" + elif file_data[:4] == self.MAGIC_WEBP: + return "webp" + return None diff --git a/contextifier/core/processor/image_file_helper/image_file_image_processor.py b/contextifier/core/processor/image_file_helper/image_file_image_processor.py new file mode 100644 index 0000000..b465f5f --- /dev/null +++ b/contextifier/core/processor/image_file_helper/image_file_image_processor.py @@ -0,0 +1,123 @@ +# contextifier/core/processor/image_file_helper/image_file_image_processor.py +""" +Image File Image Processor + +Provides image-file-specific processing that inherits from ImageProcessor. +Handles standalone image files (jpg, png, gif, bmp, webp, etc.). +""" +import logging +from typing import Any, Optional + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +logger = logging.getLogger("contextify.image_processor.image_file") + + +class ImageFileImageProcessor(ImageProcessor): + """ + Image file-specific image processor. + + Inherits from ImageProcessor and provides image file-specific processing. + Handles standalone image files that are the document themselves. 
+ + Handles: + - Standalone image files (jpg, jpeg, png, gif, bmp, webp) + - Image saving with metadata preservation + - Format conversion if needed + + Example: + processor = ImageFileImageProcessor() + + # Process standalone image + tag = processor.process_image(image_data, source_path="/path/to/image.png") + + # Process with original filename + tag = processor.process_standalone_image(image_data, original_name="photo.jpg") + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + preserve_original_name: bool = False, + ): + """ + Initialize ImageFileImageProcessor. + + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + preserve_original_name: Whether to preserve original filename + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + self._preserve_original_name = preserve_original_name + + @property + def preserve_original_name(self) -> bool: + """Whether to preserve original filename.""" + return self._preserve_original_name + + def process_image( + self, + image_data: bytes, + source_path: Optional[str] = None, + original_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save image file data. 
+ + Args: + image_data: Raw image binary data + source_path: Original file path + original_name: Original filename + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + # Use original name if preserve option is set + custom_name = None + if self._preserve_original_name and original_name: + import os + custom_name = os.path.splitext(original_name)[0] + elif source_path: + import os + custom_name = os.path.splitext(os.path.basename(source_path))[0] + + return self.save_image(image_data, custom_name=custom_name) + + def process_standalone_image( + self, + image_data: bytes, + original_name: Optional[str] = None, + **kwargs + ) -> Optional[str]: + """ + Process standalone image file. + + Specialized method for processing image files that are the document. + + Args: + image_data: Raw image binary data + original_name: Original filename + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + return self.process_image( + image_data, + original_name=original_name, + **kwargs + ) diff --git a/contextifier/core/processor/image_file_helper/image_file_preprocessor.py b/contextifier/core/processor/image_file_helper/image_file_preprocessor.py new file mode 100644 index 0000000..531758d --- /dev/null +++ b/contextifier/core/processor/image_file_helper/image_file_preprocessor.py @@ -0,0 +1,84 @@ +# contextifier/core/processor/image_file_helper/image_file_preprocessor.py +""" +Image File Preprocessor - Process image file after conversion. + +Processing Pipeline Position: + 1. ImageFileConverter.convert() → bytes (raw image data) + 2. ImageFilePreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. ImageFileMetadataExtractor.extract() → DocumentMetadata + 4. 
OCR processing (if OCR engine available) + +Current Implementation: + - Pass-through (Image uses raw bytes directly for OCR) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.image_file.preprocessor") + + +class ImageFilePreprocessor(BasePreprocessor): + """ + Image File Preprocessor. + + Currently a pass-through implementation as image processing + is handled by the OCR engine. + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted image data. + + Args: + converted_data: Image bytes from ImageFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the image data + """ + metadata: Dict[str, Any] = {} + + if isinstance(converted_data, bytes): + metadata['size_bytes'] = len(converted_data) + # Try to detect image format from magic bytes + if converted_data.startswith(b'\xff\xd8\xff'): + metadata['format'] = 'jpeg' + elif converted_data.startswith(b'\x89PNG'): + metadata['format'] = 'png' + elif converted_data.startswith(b'GIF'): + metadata['format'] = 'gif' + elif converted_data.startswith(b'BM'): + metadata['format'] = 'bmp' + elif converted_data.startswith(b'RIFF') and b'WEBP' in converted_data[:12]: + metadata['format'] = 'webp' + + logger.debug("Image file preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the image bytes + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - image bytes + encoding="binary", + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "Image File Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is image bytes.""" + return isinstance(data, bytes) and len(data) > 0 + + +__all__ = 
['ImageFilePreprocessor'] diff --git a/contextifier/core/processor/pdf_handler.py b/contextifier/core/processor/pdf_handler.py index 81b4490..66ca487 100644 --- a/contextifier/core/processor/pdf_handler.py +++ b/contextifier/core/processor/pdf_handler.py @@ -57,15 +57,14 @@ # Import from new modular helpers from contextifier.core.processor.pdf_helpers.pdf_metadata import ( - extract_pdf_metadata, - format_metadata, + PDFMetadataExtractor, +) +from contextifier.core.processor.pdf_helpers.pdf_image_processor import ( + PDFImageProcessor, ) from contextifier.core.processor.pdf_helpers.pdf_utils import ( bbox_overlaps, ) -from contextifier.core.processor.pdf_helpers.pdf_image import ( - extract_images_from_page, -) from contextifier.core.processor.pdf_helpers.pdf_text_extractor import ( extract_text_blocks, ) @@ -124,20 +123,43 @@ class PDFHandler(BaseHandler): """ PDF Document Handler - + Inherits from BaseHandler to manage config and image_processor at instance level. All internal methods access these via self.config, self.image_processor. - + Usage: handler = PDFHandler(config=config, image_processor=image_processor) text = handler.extract_text(current_file) """ - + + def _create_file_converter(self): + """Create PDF-specific file converter.""" + from contextifier.core.processor.pdf_helpers.pdf_file_converter import PDFFileConverter + return PDFFileConverter() + + def _create_preprocessor(self): + """Create PDF-specific preprocessor.""" + from contextifier.core.processor.pdf_helpers.pdf_preprocessor import PDFPreprocessor + return PDFPreprocessor() + def _create_chart_extractor(self): """PDF chart extraction not yet implemented. 
Return NullChartExtractor.""" from contextifier.core.functions.chart_extractor import NullChartExtractor return NullChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Create PDF-specific metadata extractor.""" + return PDFMetadataExtractor() + + def _create_format_image_processor(self): + """Create PDF-specific image processor.""" + return PDFImageProcessor( + directory_path=self._image_processor.config.directory_path, + tag_prefix=self._image_processor.config.tag_prefix, + tag_suffix=self._image_processor.config.tag_suffix, + storage_backend=self._image_processor.storage_backend, + ) + def extract_text( self, current_file: "CurrentFile", @@ -146,19 +168,19 @@ def extract_text( ) -> str: """ Extract text from PDF file. - + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata **kwargs: Additional options - + Returns: Extracted text """ file_path = current_file.get("file_path", "unknown") self.logger.info(f"[PDF] Processing: {file_path}") return self._extract_pdf(current_file, extract_metadata) - + def _extract_pdf( self, current_file: "CurrentFile", @@ -166,27 +188,31 @@ def _extract_pdf( ) -> str: """ Enhanced PDF processing - adaptive complexity-based. 
- + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata - + Returns: Extracted text """ file_path = current_file.get("file_path", "unknown") - + file_data = current_file.get("file_data", b"") + try: - # Open PDF from stream to avoid path encoding issues - file_stream = self.get_file_stream(current_file) - doc = fitz.open(stream=file_stream, filetype="pdf") + # Step 1: Use FileConverter to convert binary to fitz.Document + doc = self.file_converter.convert(file_data) + + # Step 2: Preprocess - may transform doc in the future + preprocessed = self.preprocess(doc) + doc = preprocessed.clean_content # TRUE SOURCE + all_pages_text = [] processed_images: Set[int] = set() # Extract metadata if extract_metadata: - metadata = extract_pdf_metadata(doc) - metadata_text = format_metadata(metadata) + metadata_text = self.extract_and_format_metadata(doc) if metadata_text: all_pages_text.append(metadata_text) @@ -326,7 +352,7 @@ def _process_page_hybrid( page_elements.append(elem) if complex_bboxes: - block_engine = BlockImageEngine(page, page_num, image_processor=self.image_processor) + block_engine = BlockImageEngine(page, page_num, image_processor=self.format_image_processor) for complex_bbox in complex_bboxes: result = block_engine.process_region(complex_bbox, region_type="complex_region") @@ -361,7 +387,7 @@ def _process_page_block_ocr( table_bboxes = [elem.bbox for elem in page_tables] if complex_regions: - block_engine = BlockImageEngine(page, page_num, image_processor=self.image_processor) + block_engine = BlockImageEngine(page, page_num, image_processor=self.format_image_processor) for complex_bbox in complex_regions: if any(bbox_overlaps(complex_bbox, tb) for tb in table_bboxes): @@ -443,7 +469,7 @@ def _process_page_full_ocr( return merge_page_elements(page_elements) # Smart block processing - block_engine = BlockImageEngine(page, page_num, image_processor=self.image_processor) + block_engine = 
BlockImageEngine(page, page_num, image_processor=self.format_image_processor) multi_result: MultiBlockResult = block_engine.process_page_smart() if multi_result.success and multi_result.block_results: @@ -501,12 +527,25 @@ def _extract_images_from_page( min_image_size: int = 50, min_image_area: int = 2500 ) -> List[PageElement]: - """Extract images from page using instance's image_processor.""" - return extract_images_from_page( - page, page_num, doc, processed_images, table_bboxes, - image_processor=self.image_processor, - min_image_size=min_image_size, min_image_area=min_image_area - ) + """Extract images from page using instance's format_image_processor.""" + # Use PDFImageProcessor's integrated method + image_processor = self.format_image_processor + if hasattr(image_processor, 'extract_images_from_page'): + elements_dicts = image_processor.extract_images_from_page( + page, page_num, doc, processed_images, table_bboxes, + min_image_size=min_image_size, min_image_area=min_image_area + ) + # Convert dicts to PageElement + return [ + PageElement( + element_type=ElementType.IMAGE, + content=e['content'], + bbox=e['bbox'], + page_num=e['page_num'] + ) + for e in elements_dicts + ] + return [] # ============================================================================ @@ -520,7 +559,7 @@ def extract_text_from_pdf( ) -> str: """ PDF text extraction (legacy function interface). - + This function creates a PDFHandler instance and delegates to it. For new code, consider using PDFHandler class directly. 
@@ -534,13 +573,13 @@ def extract_text_from_pdf( """ if current_config is None: current_config = {} - + # Extract image_processor from config if available image_processor = current_config.get("image_processor") - + # Create handler instance with config and image_processor handler = PDFHandler(config=current_config, image_processor=image_processor) - + return handler.extract_text(file_path, extract_metadata=extract_default_metadata) @@ -555,4 +594,3 @@ def _extract_pdf( ) -> str: """Deprecated: Use PDFHandler.extract_text() instead.""" return extract_text_from_pdf(file_path, current_config, extract_default_metadata) - diff --git a/contextifier/core/processor/pdf_helpers/__init__.py b/contextifier/core/processor/pdf_helpers/__init__.py index 22cfef3..ea8db7d 100644 --- a/contextifier/core/processor/pdf_helpers/__init__.py +++ b/contextifier/core/processor/pdf_helpers/__init__.py @@ -4,10 +4,9 @@ Contains helper modules for PDF processing. """ -# Backward compatibility - import from new modules +# Metadata - class-based extractor from contextifier.core.processor.pdf_helpers.pdf_metadata import ( - extract_pdf_metadata, - format_metadata, + PDFMetadataExtractor, parse_pdf_date, ) @@ -20,8 +19,9 @@ bbox_overlaps, ) -from contextifier.core.processor.pdf_helpers.pdf_image import ( - extract_images_from_page, +# Image Processor (replaces pdf_image.py utility functions) +from contextifier.core.processor.pdf_helpers.pdf_image_processor import ( + PDFImageProcessor, ) from contextifier.core.processor.pdf_helpers.pdf_text_extractor import ( @@ -201,8 +201,8 @@ 'find_image_position', 'get_text_lines_with_positions', 'bbox_overlaps', - # pdf_image - 'extract_images_from_page', + # Image Processor + 'PDFImageProcessor', # pdf_text_extractor 'extract_text_blocks', 'split_ocr_text_to_blocks', diff --git a/contextifier/core/processor/pdf_helpers/pdf_file_converter.py b/contextifier/core/processor/pdf_helpers/pdf_file_converter.py new file mode 100644 index 0000000..ca45542 --- 
/dev/null +++ b/contextifier/core/processor/pdf_helpers/pdf_file_converter.py @@ -0,0 +1,71 @@ +# contextifier/core/processor/pdf_helpers/pdf_file_converter.py +""" +PDFFileConverter - PDF file format converter + +Converts binary PDF data to fitz.Document object using PyMuPDF. +""" +from typing import Any, Optional, BinaryIO + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class PDFFileConverter(BaseFileConverter): + """ + PDF file converter using PyMuPDF (fitz). + + Converts binary PDF data to fitz.Document object. + """ + + # PDF magic number + PDF_MAGIC = b'%PDF' + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> Any: + """ + Convert binary PDF data to fitz.Document. + + Args: + file_data: Raw binary PDF data + file_stream: Optional file stream (not used, fitz prefers bytes) + **kwargs: Additional options + + Returns: + fitz.Document object + + Raises: + RuntimeError: If PDF cannot be opened + """ + import fitz + return fitz.open(stream=file_data, filetype="pdf") + + def get_format_name(self) -> str: + """Return format name.""" + return "PDF Document" + + def validate(self, file_data: bytes) -> bool: + """ + Validate if data is a valid PDF. + + Args: + file_data: Raw binary file data + + Returns: + True if file appears to be a PDF + """ + if not file_data or len(file_data) < 4: + return False + return file_data[:4] == self.PDF_MAGIC + + def close(self, converted_object: Any) -> None: + """ + Close the fitz.Document. 
+ + Args: + converted_object: fitz.Document to close + """ + if converted_object is not None and hasattr(converted_object, 'close'): + converted_object.close() diff --git a/contextifier/core/processor/pdf_helpers/pdf_image.py b/contextifier/core/processor/pdf_helpers/pdf_image.py deleted file mode 100644 index e401a9f..0000000 --- a/contextifier/core/processor/pdf_helpers/pdf_image.py +++ /dev/null @@ -1,100 +0,0 @@ -# libs/core/processor/pdf_helpers/pdf_image.py -""" -PDF Image Extraction Module - -Provides functions for extracting images from PDF pages. -""" -import logging -from typing import Any, Dict, List, Optional, Set, Tuple - -from contextifier.core.processor.pdf_helpers.types import ( - ElementType, - PageElement, -) -from contextifier.core.processor.pdf_helpers.pdf_utils import ( - find_image_position, - is_inside_any_bbox, -) -from contextifier.core.functions.img_processor import ImageProcessor - -logger = logging.getLogger("document-processor") - - -def extract_images_from_page( - page, - page_num: int, - doc, - processed_images: Set[int], - table_bboxes: List[Tuple[float, float, float, float]], - image_processor: ImageProcessor, - min_image_size: int = 50, - min_image_area: int = 2500 -) -> List[PageElement]: - """ - Extract images from page and save locally. 
- - Args: - page: PyMuPDF page object - page_num: Page number (0-indexed) - doc: PyMuPDF document object - processed_images: Set of already processed image xrefs - table_bboxes: List of table bounding boxes to exclude - image_processor: ImageProcessor instance for saving images - min_image_size: Minimum image dimension (width/height) - min_image_area: Minimum image area - - Returns: - List of PageElement for extracted images - """ - elements = [] - - try: - image_list = page.get_images() - - for img_info in image_list: - xref = img_info[0] - - if xref in processed_images: - continue - - try: - base_image = doc.extract_image(xref) - if not base_image: - continue - - image_bytes = base_image.get("image") - width = base_image.get("width", 0) - height = base_image.get("height", 0) - - if width < min_image_size or height < min_image_size: - continue - if width * height < min_image_area: - continue - - img_bbox = find_image_position(page, xref) - if img_bbox is None: - continue - - if is_inside_any_bbox(img_bbox, table_bboxes, threshold=0.7): - continue - - image_tag = image_processor.save_image(image_bytes) - - if image_tag: - processed_images.add(xref) - - elements.append(PageElement( - element_type=ElementType.IMAGE, - content=f'\n{image_tag}\n', - bbox=img_bbox, - page_num=page_num - )) - - except Exception as e: - logger.debug(f"[PDF] Error extracting image xref={xref}: {e}") - continue - - except Exception as e: - logger.warning(f"[PDF] Error extracting images: {e}") - - return elements diff --git a/contextifier/core/processor/pdf_helpers/pdf_image_processor.py b/contextifier/core/processor/pdf_helpers/pdf_image_processor.py new file mode 100644 index 0000000..32c20b8 --- /dev/null +++ b/contextifier/core/processor/pdf_helpers/pdf_image_processor.py @@ -0,0 +1,321 @@ +# contextifier/core/processor/pdf_helpers/pdf_image_processor.py +""" +PDF Image Processor + +Provides PDF-specific image processing that inherits from ImageProcessor. 
+Handles XRef images, inline images, and page rendering for complex regions. + +This class consolidates all PDF image extraction logic including: +- XRef-based image extraction +- Page region rendering +- Image filtering by size/position +""" +import logging +from typing import Any, Dict, List, Optional, Set, Tuple, TYPE_CHECKING + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +if TYPE_CHECKING: + import fitz + +logger = logging.getLogger("contextify.image_processor.pdf") + + +class PDFImageProcessor(ImageProcessor): + """ + PDF-specific image processor. + + Inherits from ImageProcessor and provides PDF-specific processing. + + Handles: + - XRef images (embedded images with XRef references) + - Inline images + - Page region rendering for complex areas + - Image extraction from PyMuPDF objects + + Example: + processor = PDFImageProcessor() + + # Process XRef image + tag = processor.process_image(image_data, xref=123) + + # Process page region + tag = processor.process_page_region(page, rect) + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + dpi: int = 150, + ): + """ + Initialize PDFImageProcessor. 
+ + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + dpi: DPI for page rendering + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + self._dpi = dpi + + @property + def dpi(self) -> int: + """DPI for page rendering.""" + return self._dpi + + @dpi.setter + def dpi(self, value: int) -> None: + """Set DPI for page rendering.""" + self._dpi = value + + def process_image( + self, + image_data: bytes, + xref: Optional[int] = None, + page_num: Optional[int] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save PDF image data. + + Args: + image_data: Raw image binary data + xref: Image XRef number (for naming) + page_num: Page number (for naming) + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + # Generate custom name based on XRef or page + custom_name = None + if xref is not None: + custom_name = f"pdf_xref_{xref}" + elif page_num is not None: + custom_name = f"pdf_page_{page_num}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_xref_image( + self, + doc: "fitz.Document", + xref: int, + ) -> Optional[str]: + """ + Extract and save image by XRef number. 
+ + Args: + doc: PyMuPDF document object + xref: Image XRef number + + Returns: + Image tag string, or None on failure + """ + try: + import fitz + + image_dict = doc.extract_image(xref) + if not image_dict: + return None + + image_data = image_dict.get("image") + if not image_data: + return None + + return self.process_image(image_data, xref=xref) + + except Exception as e: + self._logger.warning(f"Failed to extract XRef image {xref}: {e}") + return None + + def process_page_region( + self, + page: "fitz.Page", + rect: "fitz.Rect", + region_name: Optional[str] = None, + ) -> Optional[str]: + """ + Render and save a page region as image. + + Used for complex regions that can't be represented as text. + + Args: + page: PyMuPDF page object + rect: Region rectangle to render + region_name: Optional name for the region + + Returns: + Image tag string, or None on failure + """ + try: + import fitz + + # Calculate zoom for DPI + zoom = self._dpi / 72.0 + mat = fitz.Matrix(zoom, zoom) + + # Clip to region + clip = rect + pix = page.get_pixmap(matrix=mat, clip=clip, alpha=False) + image_data = pix.tobytes("png") + + custom_name = region_name or f"pdf_page{page.number}_region" + return self.save_image(image_data, custom_name=custom_name) + + except Exception as e: + self._logger.warning(f"Failed to render page region: {e}") + return None + + def process_embedded_image( + self, + image_data: bytes, + image_name: Optional[str] = None, + xref: Optional[int] = None, + **kwargs + ) -> Optional[str]: + """ + Process embedded PDF image. 
+ + Args: + image_data: Image binary data + image_name: Original image name + xref: Image XRef number + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = image_name + if custom_name is None and xref is not None: + custom_name = f"pdf_embedded_{xref}" + + return self.save_image(image_data, custom_name=custom_name) + + def render_page( + self, + page: "fitz.Page", + alpha: bool = False, + ) -> Optional[str]: + """ + Render entire page as image. + + Args: + page: PyMuPDF page object + alpha: Include alpha channel + + Returns: + Image tag string, or None on failure + """ + try: + import fitz + + zoom = self._dpi / 72.0 + mat = fitz.Matrix(zoom, zoom) + pix = page.get_pixmap(matrix=mat, alpha=alpha) + image_data = pix.tobytes("png") + + custom_name = f"pdf_page_{page.number + 1}_full" + return self.save_image(image_data, custom_name=custom_name) + + except Exception as e: + self._logger.warning(f"Failed to render page: {e}") + return None + + def extract_images_from_page( + self, + page: "fitz.Page", + page_num: int, + doc: "fitz.Document", + processed_images: Set[int], + table_bboxes: List[Tuple[float, float, float, float]], + min_image_size: int = 50, + min_image_area: int = 2500 + ) -> List[Dict[str, Any]]: + """ + Extract images from PDF page. + + This consolidates the logic from pdf_image.py extract_images_from_page(). 
+ + Args: + page: PyMuPDF page object + page_num: Page number (0-indexed) + doc: PyMuPDF document object + processed_images: Set of already processed image xrefs + table_bboxes: List of table bounding boxes to exclude + min_image_size: Minimum image dimension + min_image_area: Minimum image area + + Returns: + List of dicts with 'content', 'bbox', 'page_num' keys + """ + from contextifier.core.processor.pdf_helpers.pdf_utils import ( + find_image_position, + is_inside_any_bbox, + ) + + elements = [] + + try: + image_list = page.get_images() + + for img_info in image_list: + xref = img_info[0] + + if xref in processed_images: + continue + + try: + base_image = doc.extract_image(xref) + if not base_image: + continue + + image_bytes = base_image.get("image") + width = base_image.get("width", 0) + height = base_image.get("height", 0) + + if width < min_image_size or height < min_image_size: + continue + if width * height < min_image_area: + continue + + img_bbox = find_image_position(page, xref) + if img_bbox is None: + continue + + if is_inside_any_bbox(img_bbox, table_bboxes, threshold=0.7): + continue + + # Use format-specific process_image method + image_tag = self.process_image(image_bytes, xref=xref, page_num=page_num) + + if image_tag: + processed_images.add(xref) + elements.append({ + 'content': f'\n{image_tag}\n', + 'bbox': img_bbox, + 'page_num': page_num + }) + + except Exception as e: + logger.debug(f"[PDF] Error extracting image xref={xref}: {e}") + continue + + except Exception as e: + logger.warning(f"[PDF] Error extracting images: {e}") + + return elements + + +__all__ = ["PDFImageProcessor"] diff --git a/contextifier/core/processor/pdf_helpers/pdf_metadata.py b/contextifier/core/processor/pdf_helpers/pdf_metadata.py index a226e98..ad1693d 100644 --- a/contextifier/core/processor/pdf_helpers/pdf_metadata.py +++ b/contextifier/core/processor/pdf_helpers/pdf_metadata.py @@ -2,61 +2,71 @@ """ PDF Metadata Extraction Module -Provides functions for 
extracting and formatting PDF document metadata. +Provides PDFMetadataExtractor class for extracting and formatting PDF document metadata. +Implements BaseMetadataExtractor interface from contextifier.core.functions. """ import logging from datetime import datetime from typing import Any, Dict, Optional +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) + logger = logging.getLogger("document-processor") -def extract_pdf_metadata(doc) -> Dict[str, Any]: +class PDFMetadataExtractor(BaseMetadataExtractor): """ - Extract metadata from a PDF document. - - Args: - doc: PyMuPDF document object - - Returns: - Metadata dictionary + PDF Metadata Extractor. + + Extracts metadata from PyMuPDF (fitz) document objects. + + Supported fields: + - title, subject, author, keywords + - create_time, last_saved_time + + Usage: + extractor = PDFMetadataExtractor() + metadata = extractor.extract(pdf_doc) + text = extractor.format(metadata) """ - metadata = {} - - try: - pdf_meta = doc.metadata - if not pdf_meta: - return metadata - - if pdf_meta.get('title'): - metadata['title'] = pdf_meta['title'].strip() - - if pdf_meta.get('subject'): - metadata['subject'] = pdf_meta['subject'].strip() - - if pdf_meta.get('author'): - metadata['author'] = pdf_meta['author'].strip() - - if pdf_meta.get('keywords'): - metadata['keywords'] = pdf_meta['keywords'].strip() - - if pdf_meta.get('creationDate'): - create_time = parse_pdf_date(pdf_meta['creationDate']) - if create_time: - metadata['create_time'] = create_time - - if pdf_meta.get('modDate'): - mod_time = parse_pdf_date(pdf_meta['modDate']) - if mod_time: - metadata['last_saved_time'] = mod_time - - except Exception as e: - logger.debug(f"[PDF] Error extracting metadata: {e}") - - return metadata - - -def parse_pdf_date(date_str: str) -> Optional[datetime]: + + def extract(self, source: Any) -> DocumentMetadata: + """ + Extract metadata from PDF document. 
+ + Args: + source: PyMuPDF document object (fitz.Document) + + Returns: + DocumentMetadata instance containing extracted metadata. + """ + try: + pdf_meta = source.metadata + if not pdf_meta: + return DocumentMetadata() + + return DocumentMetadata( + title=self._get_stripped(pdf_meta, 'title'), + subject=self._get_stripped(pdf_meta, 'subject'), + author=self._get_stripped(pdf_meta, 'author'), + keywords=self._get_stripped(pdf_meta, 'keywords'), + create_time=parse_pdf_date(pdf_meta.get('creationDate')), + last_saved_time=parse_pdf_date(pdf_meta.get('modDate')), + ) + except Exception as e: + self.logger.debug(f"[PDF] Error extracting metadata: {e}") + return DocumentMetadata() + + def _get_stripped(self, meta: Dict[str, Any], key: str) -> Optional[str]: + """Get stripped string value from metadata dict.""" + value = meta.get(key) + return value.strip() if value else None + + +def parse_pdf_date(date_str: Optional[str]) -> Optional[datetime]: """ Convert a PDF date string to datetime. @@ -84,37 +94,7 @@ def parse_pdf_date(date_str: str) -> Optional[datetime]: return None -def format_metadata(metadata: Dict[str, Any]) -> str: - """ - Format metadata as a string. 
- - Args: - metadata: Metadata dictionary - - Returns: - Formatted metadata string - """ - if not metadata: - return "" - - lines = [""] - - field_names = { - 'title': 'Title', - 'subject': 'Subject', - 'author': 'Author', - 'keywords': 'Keywords', - 'create_time': 'Created', - 'last_saved_time': 'Last Modified' - } - - for key, label in field_names.items(): - value = metadata.get(key) - if value: - if isinstance(value, datetime): - value = value.strftime("%Y-%m-%d %H:%M:%S") - lines.append(f" {label}: {value}") - - lines.append("\n") - - return "\n".join(lines) +__all__ = [ + "PDFMetadataExtractor", + "parse_pdf_date", +] diff --git a/contextifier/core/processor/pdf_helpers/pdf_preprocessor.py b/contextifier/core/processor/pdf_helpers/pdf_preprocessor.py new file mode 100644 index 0000000..8198617 --- /dev/null +++ b/contextifier/core/processor/pdf_helpers/pdf_preprocessor.py @@ -0,0 +1,106 @@ +# contextifier/core/processor/pdf_helpers/pdf_preprocessor.py +""" +PDF Preprocessor - Process PDF document after conversion. + +This preprocessor handles PDF-specific processing after the document +has been converted from binary to fitz.Document. + +Processing Pipeline Position: + 1. PDFFileConverter.convert() → fitz.Document + 2. PDFPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. PDFMetadataExtractor.extract() → DocumentMetadata + 4. Content extraction (text, images, tables) + +Current Implementation: + - Pass-through (no special preprocessing needed for PDF) + - PDF processing is done during content extraction phase + +Future Enhancements: + - Page rotation normalization + - Damaged page recovery + - Font embedding analysis + - Document structure analysis +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.pdf.preprocessor") + + +class PDFPreprocessor(BasePreprocessor): + """ + PDF Document Preprocessor. 
+ + Currently a pass-through implementation as PDF processing + is handled during the content extraction phase. + + The fitz.Document object from PDFFileConverter already provides + a clean interface for accessing pages, text, and images. + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted PDF document. + + Args: + converted_data: fitz.Document object from PDFFileConverter + **kwargs: Additional options + - analyze_structure: Whether to analyze document structure + - normalize_rotation: Whether to normalize page rotation + + Returns: + PreprocessedData with the document and any extracted resources + """ + # For now, PDF preprocessing is a pass-through + # The fitz.Document is already in a workable state + + # Store the document reference for downstream processing + metadata: Dict[str, Any] = {} + + # If it's a fitz.Document, extract some basic info + if hasattr(converted_data, 'page_count'): + metadata['page_count'] = converted_data.page_count + metadata['is_encrypted'] = getattr(converted_data, 'is_encrypted', False) + metadata['is_pdf'] = getattr(converted_data, 'is_pdf', True) + + logger.debug("PDF preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the fitz.Document + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - fitz.Document + encoding="binary", + extracted_resources={"document": converted_data}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "PDF Preprocessor" + + def validate(self, data: Any) -> bool: + """ + Validate if the data can be preprocessed. 
+ + Args: + data: fitz.Document object or bytes + + Returns: + True if valid PDF document + """ + # Check if it's a fitz.Document + if hasattr(data, 'page_count') and hasattr(data, 'load_page'): + return True + return False + + +__all__ = ['PDFPreprocessor'] diff --git a/contextifier/core/processor/ppt_handler.py b/contextifier/core/processor/ppt_handler.py index de1890c..a1a22a8 100644 --- a/contextifier/core/processor/ppt_handler.py +++ b/contextifier/core/processor/ppt_handler.py @@ -7,15 +7,11 @@ import logging from typing import Any, Dict, List, Optional, Set, TYPE_CHECKING -from pptx import Presentation - from contextifier.core.processor.base_handler import BaseHandler from contextifier.core.functions.chart_extractor import BaseChartExtractor from contextifier.core.processor.ppt_helper import ( ElementType, SlideElement, - extract_ppt_metadata, - format_metadata, extract_text_with_bullets, is_simple_table, extract_simple_table_as_text, @@ -29,6 +25,8 @@ merge_slide_elements, ) from contextifier.core.processor.ppt_helper.ppt_chart_extractor import PPTChartExtractor +from contextifier.core.processor.ppt_helper.ppt_metadata import PPTMetadataExtractor +from contextifier.core.processor.ppt_helper.ppt_image_processor import PPTImageProcessor if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -39,11 +37,34 @@ class PPTHandler(BaseHandler): """PPT/PPTX File Processing Handler Class""" - + + def _create_file_converter(self): + """Create PPT-specific file converter.""" + from contextifier.core.processor.ppt_helper.ppt_file_converter import PPTFileConverter + return PPTFileConverter() + + def _create_preprocessor(self): + """Create PPT-specific preprocessor.""" + from contextifier.core.processor.ppt_helper.ppt_preprocessor import PPTPreprocessor + return PPTPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """Create PPT-specific chart extractor.""" return PPTChartExtractor(self._chart_processor) - + + def 
_create_metadata_extractor(self): + """Create PPT-specific metadata extractor.""" + return PPTMetadataExtractor() + + def _create_format_image_processor(self): + """Create PPT-specific image processor.""" + return PPTImageProcessor( + directory_path=self._image_processor.config.directory_path, + tag_prefix=self._image_processor.config.tag_prefix, + tag_suffix=self._image_processor.config.tag_suffix, + storage_backend=self._image_processor.storage_backend, + ) + def extract_text( self, current_file: "CurrentFile", @@ -52,39 +73,45 @@ def extract_text( ) -> str: """ Extract text from PPT/PPTX file. - + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata **kwargs: Additional options - + Returns: Extracted text """ file_path = current_file.get("file_path", "unknown") self.logger.info(f"PPT processing: {file_path}") return self._extract_ppt_enhanced(current_file, extract_metadata) - + def _extract_ppt_enhanced(self, current_file: "CurrentFile", extract_metadata: bool = True) -> str: """Enhanced PPT processing with pre-extracted charts.""" file_path = current_file.get("file_path", "unknown") self.logger.info(f"Enhanced PPT processing: {file_path}") - + try: - # Open from stream to avoid path encoding issues + # Step 1: Convert to Presentation using file_converter + file_data = current_file.get("file_data", b"") file_stream = self.get_file_stream(current_file) - prs = Presentation(file_stream) + prs = self.file_converter.convert(file_data, file_stream) + + # Step 2: Preprocess - may transform prs in the future + preprocessed = self.preprocess(prs) + prs = preprocessed.clean_content # TRUE SOURCE + result_parts = [] processed_images: Set[str] = set() total_tables = 0 total_images = 0 total_charts = 0 - + # Pre-extract all charts using ChartExtractor file_stream.seek(0) chart_data_list = self.chart_extractor.extract_all_from_file(file_stream) chart_idx = [0] # Mutable container for closure - + def 
get_next_chart() -> str: """Callback to get the next pre-extracted chart content.""" if chart_idx[0] < len(chart_data_list): @@ -92,25 +119,24 @@ def get_next_chart() -> str: chart_idx[0] += 1 return self._format_chart_data(chart_data) return "" - + if extract_metadata: - metadata = extract_ppt_metadata(prs) - metadata_text = format_metadata(metadata) + metadata_text = self.extract_and_format_metadata(prs) if metadata_text: result_parts.append(metadata_text) result_parts.append("") - + for slide_idx, slide in enumerate(prs.slides): slide_tag = self.create_slide_tag(slide_idx + 1) result_parts.append(f"\n{slide_tag}\n") - + elements: List[SlideElement] = [] - + for shape in slide.shapes: try: position = get_shape_position(shape) shape_id = shape.shape_id if hasattr(shape, 'shape_id') else id(shape) - + if shape.has_table: if is_simple_table(shape.table): simple_text = extract_simple_table_as_text(shape.table) @@ -131,9 +157,9 @@ def get_next_chart() -> str: position=position, shape_id=shape_id )) - + elif is_picture_shape(shape): - image_tag = process_image_shape(shape, processed_images, self.image_processor) + image_tag = process_image_shape(shape, processed_images, self.format_image_processor) if image_tag: total_images += 1 elements.append(SlideElement( @@ -142,7 +168,7 @@ def get_next_chart() -> str: position=position, shape_id=shape_id )) - + elif shape.has_chart: # Use pre-extracted chart via callback chart_text = get_next_chart() @@ -154,7 +180,7 @@ def get_next_chart() -> str: position=position, shape_id=shape_id )) - + elif hasattr(shape, "text_frame") and shape.text_frame: text_content = extract_text_with_bullets(shape.text_frame) if text_content: @@ -164,7 +190,7 @@ def get_next_chart() -> str: position=position, shape_id=shape_id )) - + elif hasattr(shape, "text") and shape.text.strip(): elements.append(SlideElement( element_type=ElementType.TEXT, @@ -172,46 +198,46 @@ def get_next_chart() -> str: position=position, shape_id=shape_id )) - + elif 
hasattr(shape, "shapes"): - group_elements = process_group_shape(shape, processed_images, self.image_processor) + group_elements = process_group_shape(shape, processed_images, self.format_image_processor) elements.extend(group_elements) - + except Exception as shape_e: self.logger.warning(f"Error processing shape in slide {slide_idx + 1}: {shape_e}") continue - + elements.sort(key=lambda e: e.sort_key) slide_content = merge_slide_elements(elements) - + if slide_content.strip(): result_parts.append(slide_content) else: result_parts.append("[Empty Slide]\n") - + notes_text = extract_slide_notes(slide) if notes_text: result_parts.append(f"\n[Slide Notes]\n{notes_text}\n") - + result = "".join(result_parts) self.logger.info(f"Enhanced PPT: {len(prs.slides)} slides, {total_tables} tables, " f"{total_images} images, {total_charts} charts") - + return result - + except Exception as e: self.logger.error(f"Error in enhanced PPT processing: {e}") import traceback self.logger.debug(traceback.format_exc()) return self._extract_ppt_simple(current_file) - + def _format_chart_data(self, chart_data: "ChartData") -> str: """Format ChartData using ChartProcessor.""" from contextifier.core.functions.chart_extractor import ChartData - + if not isinstance(chart_data, ChartData): return "" - + if chart_data.has_data(): return self.chart_processor.format_chart_data( chart_type=chart_data.chart_type, @@ -224,18 +250,19 @@ def _format_chart_data(self, chart_data: "ChartData") -> str: chart_type=chart_data.chart_type, title=chart_data.title ) - + def _extract_ppt_simple(self, current_file: "CurrentFile") -> str: """Simple text extraction (fallback).""" try: + file_data = current_file.get("file_data", b"") file_stream = self.get_file_stream(current_file) - prs = Presentation(file_stream) + prs = self.file_converter.convert(file_data, file_stream) result_parts = [] - + for slide_idx, slide in enumerate(prs.slides): slide_tag = self.create_slide_tag(slide_idx + 1) 
result_parts.append(f"\n{slide_tag}\n") - + slide_texts = [] for shape in slide.shapes: try: @@ -247,14 +274,14 @@ def _extract_ppt_simple(self, current_file: "CurrentFile") -> str: slide_texts.append(table_text) except: continue - + if slide_texts: result_parts.append("\n".join(slide_texts) + "\n") else: result_parts.append("[Empty Slide]\n") - + return "".join(result_parts) - + except Exception as e: self.logger.error(f"Error in simple PPT extraction: {e}") return f"[PPT file processing failed: {str(e)}]" diff --git a/contextifier/core/processor/ppt_helper/__init__.py b/contextifier/core/processor/ppt_helper/__init__.py index 75c07fb..4612d62 100644 --- a/contextifier/core/processor/ppt_helper/__init__.py +++ b/contextifier/core/processor/ppt_helper/__init__.py @@ -24,8 +24,7 @@ # === Metadata === from contextifier.core.processor.ppt_helper.ppt_metadata import ( - extract_ppt_metadata, - format_metadata, + PPTMetadataExtractor, ) # === Bullet/Numbering === diff --git a/contextifier/core/processor/ppt_helper/ppt_file_converter.py b/contextifier/core/processor/ppt_helper/ppt_file_converter.py new file mode 100644 index 0000000..d7246a7 --- /dev/null +++ b/contextifier/core/processor/ppt_helper/ppt_file_converter.py @@ -0,0 +1,54 @@ +# libs/core/processor/ppt_helper/ppt_file_converter.py +""" +PPTFileConverter - PPT/PPTX file format converter + +Converts binary PPT/PPTX data to python-pptx Presentation object. +""" +from io import BytesIO +from typing import Any, Optional, BinaryIO + +from contextifier.core.functions.file_converter import BaseFileConverter + + +class PPTFileConverter(BaseFileConverter): + """ + PPT/PPTX file converter using python-pptx. + + Converts binary PPT/PPTX data to Presentation object. + """ + + # ZIP magic number (PPTX is a ZIP file) + ZIP_MAGIC = b'PK\x03\x04' + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> Any: + """ + Convert binary PPT/PPTX data to Presentation object. 
+ + Args: + file_data: Raw binary PPT/PPTX data + file_stream: Optional file stream + **kwargs: Additional options + + Returns: + pptx.Presentation object + """ + from pptx import Presentation + + stream = file_stream if file_stream is not None else BytesIO(file_data) + stream.seek(0) + return Presentation(stream) + + def get_format_name(self) -> str: + """Return format name.""" + return "PPT/PPTX Presentation" + + def validate(self, file_data: bytes) -> bool: + """Validate if data is a valid PPTX.""" + if not file_data or len(file_data) < 4: + return False + return file_data[:4] == self.ZIP_MAGIC diff --git a/contextifier/core/processor/ppt_helper/ppt_image_processor.py b/contextifier/core/processor/ppt_helper/ppt_image_processor.py new file mode 100644 index 0000000..af05972 --- /dev/null +++ b/contextifier/core/processor/ppt_helper/ppt_image_processor.py @@ -0,0 +1,196 @@ +# contextifier/core/processor/ppt_helper/ppt_image_processor.py +""" +PPT Image Processor + +Provides PPT/PPTX-specific image processing that inherits from ImageProcessor. +Handles slide images, shape images, and embedded pictures. +""" +import logging +from typing import Any, Dict, Optional, Set, TYPE_CHECKING + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +if TYPE_CHECKING: + from pptx import Presentation + from pptx.slide import Slide + from pptx.shapes.base import BaseShape + +logger = logging.getLogger("contextify.image_processor.ppt") + + +class PPTImageProcessor(ImageProcessor): + """ + PPT/PPTX-specific image processor. + + Inherits from ImageProcessor and provides PPT-specific processing. 
+ + Handles: + - Picture shapes + - Embedded images + - Group shape images + - Background images + + Example: + processor = PPTImageProcessor() + + # Process slide image + tag = processor.process_image(image_data, slide_num=1) + + # Process from shape + tag = processor.process_picture_shape(shape) + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize PPTImageProcessor. + + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + + def process_image( + self, + image_data: bytes, + slide_num: Optional[int] = None, + shape_id: Optional[int] = None, + **kwargs + ) -> Optional[str]: + """ + Process and save PPT image data. + + Args: + image_data: Raw image binary data + slide_num: Source slide number (for naming) + shape_id: Shape ID (for naming) + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = None + if slide_num is not None: + if shape_id is not None: + custom_name = f"ppt_slide{slide_num}_shape{shape_id}" + else: + custom_name = f"ppt_slide{slide_num}" + elif shape_id is not None: + custom_name = f"ppt_shape{shape_id}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_picture_shape( + self, + shape: "BaseShape", + slide_num: Optional[int] = None, + ) -> Optional[str]: + """ + Process python-pptx picture shape. 
+ + Args: + shape: Picture shape object + slide_num: Source slide number + + Returns: + Image tag string, or None on failure + """ + try: + if not hasattr(shape, 'image'): + return None + + image = shape.image + image_data = image.blob + + if not image_data: + return None + + shape_id = shape.shape_id if hasattr(shape, 'shape_id') else None + + return self.process_image( + image_data, + slide_num=slide_num, + shape_id=shape_id + ) + + except Exception as e: + self._logger.warning(f"Failed to process picture shape: {e}") + return None + + def process_embedded_image( + self, + image_data: bytes, + image_name: Optional[str] = None, + slide_num: Optional[int] = None, + **kwargs + ) -> Optional[str]: + """ + Process embedded PPT image. + + Args: + image_data: Image binary data + image_name: Original image filename + slide_num: Source slide number + **kwargs: Additional options + + Returns: + Image tag string, or None on failure + """ + custom_name = image_name + if custom_name is None and slide_num is not None: + custom_name = f"ppt_embed_slide{slide_num}" + + return self.save_image(image_data, custom_name=custom_name) + + def process_group_shape_images( + self, + group_shape: "BaseShape", + slide_num: Optional[int] = None, + ) -> list: + """ + Process all images in a group shape. 
+ + Args: + group_shape: Group shape containing other shapes + slide_num: Source slide number + + Returns: + List of image tags + """ + tags = [] + + try: + if not hasattr(group_shape, 'shapes'): + return tags + + for shape in group_shape.shapes: + if hasattr(shape, 'image'): + tag = self.process_picture_shape(shape, slide_num) + if tag: + tags.append(tag) + elif hasattr(shape, 'shapes'): + # Nested group + nested_tags = self.process_group_shape_images(shape, slide_num) + tags.extend(nested_tags) + + except Exception as e: + self._logger.warning(f"Failed to process group shape: {e}") + + return tags + + +__all__ = ["PPTImageProcessor"] diff --git a/contextifier/core/processor/ppt_helper/ppt_metadata.py b/contextifier/core/processor/ppt_helper/ppt_metadata.py index 9d8f487..ed12d94 100644 --- a/contextifier/core/processor/ppt_helper/ppt_metadata.py +++ b/contextifier/core/processor/ppt_helper/ppt_metadata.py @@ -1,105 +1,71 @@ +# contextifier/core/processor/ppt_helper/ppt_metadata.py """ -PPT 메타데이터 추출 모듈 +PPT Metadata Extraction Module -포함 함수: -- extract_ppt_metadata(): PPT에서 메타데이터 추출 -- format_metadata(): 메타데이터를 읽기 쉬운 문자열로 변환 +Provides PPTMetadataExtractor class for extracting metadata from PowerPoint documents. +Implements BaseMetadataExtractor interface. """ import logging -from datetime import datetime -from typing import Any, Dict +from typing import Any, Optional from pptx import Presentation -logger = logging.getLogger("document-processor") - - -def extract_ppt_metadata(prs: Presentation) -> Dict[str, Any]: - """ - PPT 파일에서 메타데이터를 추출합니다. 
- - python-pptx의 core_properties를 통해 다음 정보를 추출합니다: - - 제목 (title) - - 주제 (subject) - - 작성자 (author) - - 키워드 (keywords) - - 설명 (comments) - - 마지막 수정자 (last_modified_by) - - 작성일 (created) - - 수정일 (modified) - - Args: - prs: python-pptx Presentation 객체 - - Returns: - 메타데이터 딕셔너리 - """ - metadata = {} - - try: - props = prs.core_properties - - if props.title: - metadata['title'] = props.title - if props.subject: - metadata['subject'] = props.subject - if props.author: - metadata['author'] = props.author - if props.keywords: - metadata['keywords'] = props.keywords - if props.comments: - metadata['comments'] = props.comments - if props.last_modified_by: - metadata['last_saved_by'] = props.last_modified_by - if props.created: - metadata['create_time'] = props.created - if props.modified: - metadata['last_saved_time'] = props.modified - - logger.info(f"Extracted PPT metadata: {metadata}") +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) - except Exception as e: - logger.warning(f"Failed to extract PPT metadata: {e}") - - return metadata +logger = logging.getLogger("document-processor") -def format_metadata(metadata: Dict[str, Any]) -> str: +class PPTMetadataExtractor(BaseMetadataExtractor): """ - 메타데이터 딕셔너리를 읽기 쉬운 문자열로 변환합니다. - - Args: - metadata: 메타데이터 딕셔너리 - - Returns: - 포맷팅된 메타데이터 문자열 + PPT/PPTX Metadata Extractor. + + Extracts metadata from python-pptx Presentation objects. 
+ + Supported fields: + - title, subject, author, keywords, comments + - last_saved_by, create_time, last_saved_time + + Usage: + extractor = PPTMetadataExtractor() + metadata = extractor.extract(presentation) + text = extractor.format(metadata) """ - if not metadata: - return "" - - lines = [""] - - field_names = { - 'title': '제목', - 'subject': '주제', - 'author': '작성자', - 'keywords': '키워드', - 'comments': '설명', - 'last_saved_by': '마지막 저장자', - 'create_time': '작성일', - 'last_saved_time': '수정일', - } - - for key, label in field_names.items(): - if key in metadata and metadata[key]: - value = metadata[key] - - # datetime 객체 포맷팅 - if isinstance(value, datetime): - value = value.strftime('%Y-%m-%d %H:%M:%S') - - lines.append(f" {label}: {value}") - - lines.append("") - - return "\n".join(lines) + + def extract(self, source: Presentation) -> DocumentMetadata: + """ + Extract metadata from PPT document. + + Args: + source: python-pptx Presentation object + + Returns: + DocumentMetadata instance containing extracted metadata. 
+ """ + try: + props = source.core_properties + + return DocumentMetadata( + title=self._get_value(props.title), + subject=self._get_value(props.subject), + author=self._get_value(props.author), + keywords=self._get_value(props.keywords), + comments=self._get_value(props.comments), + last_saved_by=self._get_value(props.last_modified_by), + create_time=props.created, + last_saved_time=props.modified, + ) + except Exception as e: + self.logger.warning(f"Failed to extract PPT metadata: {e}") + return DocumentMetadata() + + def _get_value(self, value: Optional[str]) -> Optional[str]: + """Return value if present, None otherwise.""" + return value if value else None + + +__all__ = [ + 'PPTMetadataExtractor', +] diff --git a/contextifier/core/processor/ppt_helper/ppt_preprocessor.py b/contextifier/core/processor/ppt_helper/ppt_preprocessor.py new file mode 100644 index 0000000..4a28b0d --- /dev/null +++ b/contextifier/core/processor/ppt_helper/ppt_preprocessor.py @@ -0,0 +1,77 @@ +# contextifier/core/processor/ppt_helper/ppt_preprocessor.py +""" +PPT Preprocessor - Process PPT/PPTX presentation after conversion. + +Processing Pipeline Position: + 1. PPTFileConverter.convert() → pptx.Presentation + 2. PPTPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. PPTMetadataExtractor.extract() → DocumentMetadata + 4. Content extraction (slides, shapes, images, charts) + +Current Implementation: + - Pass-through (PPT uses python-pptx Presentation object directly) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.ppt.preprocessor") + + +class PPTPreprocessor(BasePreprocessor): + """ + PPT/PPTX Presentation Preprocessor. + + Currently a pass-through implementation as PPT processing + is handled during the content extraction phase using python-pptx. 
+ """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted PPT presentation. + + Args: + converted_data: pptx.Presentation object from PPTFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the presentation and any extracted resources + """ + metadata: Dict[str, Any] = {} + + if hasattr(converted_data, 'slides'): + metadata['slide_count'] = len(converted_data.slides) + + if hasattr(converted_data, 'slide_width'): + metadata['slide_width'] = converted_data.slide_width + metadata['slide_height'] = converted_data.slide_height + + logger.debug("PPT preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the Presentation + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - pptx.Presentation + encoding="utf-8", + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "PPT Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is a PPT Presentation object.""" + return hasattr(data, 'slides') and hasattr(data, 'slide_layouts') + + +__all__ = ['PPTPreprocessor'] diff --git a/contextifier/core/processor/rtf_handler.py b/contextifier/core/processor/rtf_handler.py new file mode 100644 index 0000000..5cd8420 --- /dev/null +++ b/contextifier/core/processor/rtf_handler.py @@ -0,0 +1,290 @@ +# contextifier/core/processor/rtf_handler.py +""" +RTF Handler + +Class-based handler for RTF files. +Follows the correct architecture: +1. Converter: Pass through (RTF uses raw binary) +2. Preprocessor: Binary preprocessing (image extraction, \\bin removal) +3. 
Handler: Sequential processing (metadata → tables → content → result) +""" +import logging +import re +from pathlib import Path +from typing import Any, Dict, Optional, TYPE_CHECKING + +from striprtf.striprtf import rtf_to_text + +from contextifier.core.processor.base_handler import BaseHandler +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.chart_extractor import BaseChartExtractor, NullChartExtractor + +# Import from rtf_helper +from contextifier.core.processor.rtf_helper import ( + RTFFileConverter, + RTFConvertedData, + RTFMetadataExtractor, + RTFSourceInfo, + RTFPreprocessor, + extract_tables_with_positions, + extract_inline_content, + extract_text_only, + decode_content, + detect_encoding, +) + +if TYPE_CHECKING: + from contextifier.core.document_processor import CurrentFile + +logger = logging.getLogger("contextify.rtf.handler") + + +class RTFHandler(BaseHandler): + """ + RTF Document Processing Handler. + + Processing flow: + 1. file_converter.convert() → bytes (pass through) + 2. preprocessor.preprocess() → PreprocessedData (image extraction, binary cleanup) + 3. decode content → string + 4. metadata_extractor.extract() → DocumentMetadata + 5. extract_tables_with_positions() → List[RTFTable] + 6. extract_inline_content() → str + 7. Build result string + """ + + def _create_file_converter(self) -> RTFFileConverter: + """Create RTF-specific file converter.""" + return RTFFileConverter() + + def _create_preprocessor(self) -> RTFPreprocessor: + """Create RTF-specific preprocessor.""" + return RTFPreprocessor() + + def _create_chart_extractor(self) -> BaseChartExtractor: + """RTF files do not contain charts. 
Return NullChartExtractor.""" + return NullChartExtractor(self._chart_processor) + + def _create_metadata_extractor(self) -> RTFMetadataExtractor: + """Create RTF-specific metadata extractor.""" + return RTFMetadataExtractor() + + def _create_format_image_processor(self) -> ImageProcessor: + """Create RTF-specific image processor (use base for now).""" + return self._image_processor + + def extract_text( + self, + current_file: "CurrentFile", + extract_metadata: bool = True, + **kwargs + ) -> str: + """ + Extract text from RTF file. + + Args: + current_file: CurrentFile dict containing file info and binary data + extract_metadata: Whether to extract metadata + **kwargs: Additional options + + Returns: + Extracted text + """ + file_path = current_file.get("file_path", "unknown") + file_data = current_file.get("file_data", b"") + + self.logger.info(f"RTF processing: {file_path}") + + if not file_data: + self.logger.error(f"Empty file data: {file_path}") + return f"[RTF file is empty: {file_path}]" + + # Validate RTF format + if not file_data.strip().startswith(b'{\\rtf'): + self.logger.warning(f"Invalid RTF format: {file_path}") + return self._extract_fallback(file_data, extract_metadata) + + try: + # Step 1: Converter - pass through (RTF uses raw binary) + raw_data: bytes = self.file_converter.convert(file_data) + + # Step 2: Preprocessor - extract images, remove binary data + output_dir = self._get_output_dir(file_path) + doc_name = Path(file_path).stem if file_path != "unknown" else "document" + + preprocessed = self.preprocessor.preprocess( + raw_data, + output_dir=output_dir, + doc_name=doc_name, + ) + + clean_content = preprocessed.clean_content + image_tags = preprocessed.extracted_resources.get("image_tags", []) + encoding = preprocessed.encoding or "cp949" + + # Step 3: Decode to string if still bytes + if isinstance(clean_content, bytes): + encoding = detect_encoding(clean_content) or encoding + content = decode_content(clean_content, encoding) + else: + 
content = clean_content + + # Build RTFConvertedData for downstream processing + converted = RTFConvertedData( + content=content, + encoding=encoding, + image_tags=image_tags, + original_size=len(file_data), + ) + + self.logger.debug( + f"RTF preprocessed: encoding={encoding}, " + f"images={len(image_tags)}, size={len(file_data)}" + ) + + # Step 4: Extract content + return self._extract_from_converted( + converted, + current_file, + extract_metadata, + ) + + except Exception as e: + self.logger.error(f"Error in RTF processing: {e}", exc_info=True) + return self._extract_fallback(file_data, extract_metadata) + + def _extract_from_converted( + self, + converted: RTFConvertedData, + current_file: "CurrentFile", + extract_metadata: bool, + ) -> str: + """ + Internal method to extract content from RTFConvertedData. + + Args: + converted: RTFConvertedData object + current_file: CurrentFile dict + extract_metadata: Whether to extract metadata + + Returns: + Extracted text + """ + content = converted.content + encoding = converted.encoding + + result_parts = [] + + # Step 2: Extract metadata + if extract_metadata: + source = RTFSourceInfo(content=content, encoding=encoding) + metadata = self.metadata_extractor.extract(source) + metadata_str = self.metadata_extractor.format(metadata) + if metadata_str: + result_parts.append(metadata_str + "\n\n") + + # Add page tag + page_tag = self.create_page_tag(1) + result_parts.append(f"{page_tag}\n") + + # Step 3: Extract tables with positions + tables, table_regions = extract_tables_with_positions(content, encoding) + + # Step 4: Extract inline content (preserves table positions) + inline_content = extract_inline_content(content, table_regions, encoding) + + if inline_content: + result_parts.append(inline_content) + else: + # Fallback: separate text and tables + text_only = extract_text_only(content, encoding) + if text_only: + result_parts.append(text_only) + + for table in tables: + if not table.rows: + continue + if 
table.is_real_table(): + result_parts.append("\n" + table.to_html() + "\n") + else: + result_parts.append("\n" + table.to_text_list() + "\n") + + # Step 5: Add image tags + if converted.image_tags: + result_parts.append("\n") + for tag in converted.image_tags: + result_parts.append(tag + "\n") + + result = "\n".join(result_parts) + + # Clean up invalid image tags + result = re.sub(r'\[image:[^\]]*uploads/\.[^\]]*\]', '', result) + + return result + + def _extract_fallback( + self, + file_data: bytes, + extract_metadata: bool, + ) -> str: + """ + Fallback extraction using striprtf library. + + Args: + file_data: Raw binary data + extract_metadata: Whether to extract metadata + + Returns: + Extracted text + """ + # Try different encodings + content = None + for encoding in ['utf-8', 'cp949', 'euc-kr', 'cp1252', 'latin-1']: + try: + content = file_data.decode(encoding) + break + except (UnicodeDecodeError, UnicodeError): + continue + + if content is None: + content = file_data.decode('cp1252', errors='replace') + + result_parts = [] + + # Extract metadata from raw content + if extract_metadata: + source = RTFSourceInfo(content=content, encoding='cp1252') + metadata = self.metadata_extractor.extract(source) + metadata_str = self.extract_and_format_metadata(metadata) + if metadata_str: + result_parts.append(metadata_str + "\n\n") + + # Add page tag + page_tag = self.create_page_tag(1) + result_parts.append(f"{page_tag}\n") + + # Extract text using striprtf + try: + text = rtf_to_text(content) + except Exception: + # Manual cleanup + text = re.sub(r'\\[a-z]+\d*\s?', '', content) + text = re.sub(r"\\'[0-9a-fA-F]{2}", '', text) + text = re.sub(r'[{}]', '', text) + + if text: + text = re.sub(r'\n{3,}', '\n\n', text) + result_parts.append(text.strip()) + + return "\n".join(result_parts) + + def _get_output_dir(self, file_path: str) -> Optional[Path]: + """Get output directory for images.""" + if hasattr(self._image_processor, 'config'): + dir_path = 
self._image_processor.config.directory_path + if dir_path: + return Path(dir_path) + return None + + +__all__ = ['RTFHandler'] diff --git a/contextifier/core/processor/rtf_helper/__init__.py b/contextifier/core/processor/rtf_helper/__init__.py new file mode 100644 index 0000000..6a558a9 --- /dev/null +++ b/contextifier/core/processor/rtf_helper/__init__.py @@ -0,0 +1,128 @@ +# contextifier/core/processor/rtf_helper/__init__.py +""" +RTF Helper Module + +Provides RTF parsing and extraction utilities with proper interface separation. + +Architecture: + - RTFPreprocessor: Binary preprocessing (image extraction, \\bin handling) + - RTFFileConverter: Pass through (RTF uses raw binary) + - RTFMetadataExtractor: Metadata extraction + - Table extraction: extract_tables_with_positions() + - Content extraction: extract_inline_content(), extract_text_only() + +Usage: + from contextifier.core.processor.rtf_helper import ( + RTFFileConverter, + RTFConvertedData, + RTFPreprocessor, + RTFMetadataExtractor, + RTFSourceInfo, + extract_tables_with_positions, + extract_inline_content, + extract_text_only, + ) +""" + +# Converter +from contextifier.core.processor.rtf_helper.rtf_file_converter import ( + RTFFileConverter, + RTFConvertedData, +) + +# Preprocessor +from contextifier.core.processor.rtf_helper.rtf_preprocessor import ( + RTFPreprocessor, +) + +# Metadata +from contextifier.core.processor.rtf_helper.rtf_metadata_extractor import ( + RTFMetadataExtractor, + RTFSourceInfo, +) + +# Table extraction +from contextifier.core.processor.rtf_helper.rtf_table_extractor import ( + RTFCellInfo, + RTFTable, + extract_tables_with_positions, +) + +# Content extraction +from contextifier.core.processor.rtf_helper.rtf_content_extractor import ( + extract_inline_content, + extract_text_only, +) + +# Decoder utilities +from contextifier.core.processor.rtf_helper.rtf_decoder import ( + detect_encoding, + decode_content, + decode_bytes, + decode_hex_escapes, +) + +# Text cleaning utilities 
+from contextifier.core.processor.rtf_helper.rtf_text_cleaner import ( + clean_rtf_text, + remove_destination_groups, + remove_shape_groups, + remove_shape_property_groups, + remove_shprslt_blocks, +) + +# Region finder utilities +from contextifier.core.processor.rtf_helper.rtf_region_finder import ( + find_excluded_regions, + is_in_excluded_region, +) + +# Constants +from contextifier.core.processor.rtf_helper.rtf_constants import ( + SHAPE_PROPERTY_NAMES, + SKIP_DESTINATIONS, + EXCLUDE_DESTINATION_KEYWORDS, + IMAGE_DESTINATIONS, + CODEPAGE_ENCODING_MAP, + DEFAULT_ENCODINGS, +) + + +__all__ = [ + # Converter + 'RTFFileConverter', + 'RTFConvertedData', + # Preprocessor + 'RTFPreprocessor', + # Metadata + 'RTFMetadataExtractor', + 'RTFSourceInfo', + # Table + 'RTFCellInfo', + 'RTFTable', + 'extract_tables_with_positions', + # Content + 'extract_inline_content', + 'extract_text_only', + # Decoder + 'detect_encoding', + 'decode_content', + 'decode_bytes', + 'decode_hex_escapes', + # Text cleaner + 'clean_rtf_text', + 'remove_destination_groups', + 'remove_shape_groups', + 'remove_shape_property_groups', + 'remove_shprslt_blocks', + # Region finder + 'find_excluded_regions', + 'is_in_excluded_region', + # Constants + 'SHAPE_PROPERTY_NAMES', + 'SKIP_DESTINATIONS', + 'EXCLUDE_DESTINATION_KEYWORDS', + 'IMAGE_DESTINATIONS', + 'CODEPAGE_ENCODING_MAP', + 'DEFAULT_ENCODINGS', +] diff --git a/contextifier/core/processor/rtf_helper/rtf_constants.py b/contextifier/core/processor/rtf_helper/rtf_constants.py new file mode 100644 index 0000000..a9121a2 --- /dev/null +++ b/contextifier/core/processor/rtf_helper/rtf_constants.py @@ -0,0 +1,94 @@ +# contextifier/core/processor/rtf_helper/rtf_constants.py +""" +RTF Constants + +Constants used for RTF parsing. 
+""" + +# Shape property names (to be removed) +SHAPE_PROPERTY_NAMES = [ + 'shapeType', 'fFlipH', 'fFlipV', 'rotation', + 'posh', 'posrelh', 'posv', 'posrelv', + 'fLayoutInCell', 'fAllowOverlap', 'fBehindDocument', + 'fPseudoInline', 'fLockAnchor', 'fLockPosition', + 'fLockAspectRatio', 'fLockRotation', 'fLockAgainstSelect', + 'fLockCropping', 'fLockVerticies', 'fLockText', + 'fLockAdjustHandles', 'fLockAgainstGrouping', + 'geoLeft', 'geoTop', 'geoRight', 'geoBottom', + 'shapePath', 'pWrapPolygonVertices', 'dxWrapDistLeft', + 'dyWrapDistTop', 'dxWrapDistRight', 'dyWrapDistBottom', + 'fLine', 'fFilled', 'fillType', 'fillColor', + 'fillOpacity', 'fillBackColor', 'fillBackOpacity', + 'lineColor', 'lineOpacity', 'lineWidth', 'lineStyle', + 'lineDashing', 'lineStartArrowhead', 'lineStartArrowWidth', + 'lineStartArrowLength', 'lineEndArrowhead', 'lineEndArrowWidth', + 'lineEndArrowLength', 'shadowType', 'shadowColor', + 'shadowOpacity', 'shadowOffsetX', 'shadowOffsetY', +] + +# RTF destination 키워드 (제외 대상) +EXCLUDE_DESTINATION_KEYWORDS = [ + 'fonttbl', 'colortbl', 'stylesheet', 'listtable', + 'listoverridetable', 'revtbl', 'rsidtbl', 'generator', + 'info', 'xmlnstbl', 'mmathPr', 'themedata', 'colorschememapping', + 'datastore', 'latentstyles', 'pgptbl', 'protusertbl', +] + +# RTF skip destinations +SKIP_DESTINATIONS = { + 'fonttbl', 'colortbl', 'stylesheet', 'listtable', + 'listoverridetable', 'revtbl', 'rsidtbl', 'generator', + 'xmlnstbl', 'mmathPr', 'themedata', 'colorschememapping', + 'datastore', 'latentstyles', 'pgptbl', 'protusertbl', + 'bookmarkstart', 'bookmarkend', 'bkmkstart', 'bkmkend', + 'fldinst', 'fldrslt', # field instructions and results +} + +# Image-related destinations +IMAGE_DESTINATIONS = { + 'pict', 'shppict', 'nonshppict', 'blipuid', +} + +# Codepage to encoding mapping +CODEPAGE_ENCODING_MAP = { + 437: 'cp437', + 850: 'cp850', + 852: 'cp852', + 855: 'cp855', + 857: 'cp857', + 860: 'cp860', + 861: 'cp861', + 863: 'cp863', + 865: 'cp865', + 866: 
'cp866', + 869: 'cp869', + 874: 'cp874', + 932: 'cp932', # Japanese + 936: 'gb2312', # Simplified Chinese + 949: 'cp949', # Korean + 950: 'big5', # Traditional Chinese + 1250: 'cp1250', # Central European + 1251: 'cp1251', # Cyrillic + 1252: 'cp1252', # Western European + 1253: 'cp1253', # Greek + 1254: 'cp1254', # Turkish + 1255: 'cp1255', # Hebrew + 1256: 'cp1256', # Arabic + 1257: 'cp1257', # Baltic + 1258: 'cp1258', # Vietnamese + 10000: 'mac_roman', + 65001: 'utf-8', +} + +# Default encodings to try +DEFAULT_ENCODINGS = ['utf-8', 'cp949', 'euc-kr', 'cp1252', 'latin-1'] + + +__all__ = [ + 'SHAPE_PROPERTY_NAMES', + 'EXCLUDE_DESTINATION_KEYWORDS', + 'SKIP_DESTINATIONS', + 'IMAGE_DESTINATIONS', + 'CODEPAGE_ENCODING_MAP', + 'DEFAULT_ENCODINGS', +] diff --git a/contextifier/core/processor/doc_helpers/rtf_content_extractor.py b/contextifier/core/processor/rtf_helper/rtf_content_extractor.py similarity index 55% rename from contextifier/core/processor/doc_helpers/rtf_content_extractor.py rename to contextifier/core/processor/rtf_helper/rtf_content_extractor.py index 3b02262..f1d1e7f 100644 --- a/contextifier/core/processor/doc_helpers/rtf_content_extractor.py +++ b/contextifier/core/processor/rtf_helper/rtf_content_extractor.py @@ -1,89 +1,77 @@ -# service/document_processor/processor/doc_helpers/rtf_content_extractor.py +# contextifier/core/processor/rtf_helper/rtf_content_extractor.py """ -RTF 콘텐츠 추출기 +RTF Content Extractor -RTF 문서에서 인라인 콘텐츠(텍스트 + 테이블)를 추출하는 기능을 제공합니다. +Extracts inline content (text + tables) from RTF documents. 
""" import logging import re from typing import List, Tuple -from contextifier.core.processor.doc_helpers.rtf_models import ( - RTFTable, - RTFContentPart, -) -from contextifier.core.processor.doc_helpers.rtf_decoder import ( +from contextifier.core.processor.rtf_helper.rtf_decoder import ( decode_hex_escapes, ) -from contextifier.core.processor.doc_helpers.rtf_text_cleaner import ( +from contextifier.core.processor.rtf_helper.rtf_text_cleaner import ( clean_rtf_text, remove_destination_groups, remove_shape_groups, remove_shape_property_groups, ) -from contextifier.core.processor.doc_helpers.rtf_region_finder import ( +from contextifier.core.processor.rtf_helper.rtf_region_finder import ( find_excluded_regions, ) +from contextifier.core.processor.rtf_helper.rtf_table_extractor import ( + RTFTable, +) -logger = logging.getLogger("document-processor") +logger = logging.getLogger("contextify.rtf.content") def extract_inline_content( content: str, table_regions: List[Tuple[int, int, RTFTable]], encoding: str = "cp949" -) -> List[RTFContentPart]: +) -> str: """ - RTF에서 인라인 콘텐츠를 추출합니다. - - 테이블은 원래 위치에 배치됩니다. - + Extract inline content from RTF with tables in original positions. + Args: - content: RTF 문자열 콘텐츠 - table_regions: 테이블 영역 리스트 [(start, end, table), ...] - encoding: 사용할 인코딩 - + content: RTF string content + table_regions: Table region list [(start, end, table), ...] + encoding: Encoding to use + Returns: - 콘텐츠 파트 리스트 + Content string with tables inline """ - content_parts = [] - - # 헤더 영역 제거 (fonttbl, colortbl, stylesheet, info 등) - # 첫 번째 \pard 이전은 헤더로 간주 + # Find header end (before first \pard) header_end = 0 pard_match = re.search(r'\\pard\b', content) if pard_match: header_end = pard_match.start() - - # 제외 영역 찾기 (header, footer, footnote 등) + + # Find excluded regions (header, footer, footnote, etc.) 
excluded_regions = find_excluded_regions(content) - + def clean_segment(segment: str, start_pos: int) -> str: - """세그먼트를 정리하되 제외 영역은 건너뜁니다.""" + """Clean a segment while respecting excluded regions.""" if not excluded_regions: - # 제외 영역이 없으면 전체 정리 segment = remove_destination_groups(segment) decoded = decode_hex_escapes(segment, encoding) return clean_rtf_text(decoded, encoding) - - # 세그먼트 내에서 제외 영역을 마스킹 + result_parts = [] seg_pos = 0 - + for excl_start, excl_end in excluded_regions: - # 세그먼트 기준 상대 위치로 변환 rel_start = excl_start - start_pos rel_end = excl_end - start_pos - - # 세그먼트 범위 내에 있는지 확인 + if rel_end <= 0 or rel_start >= len(segment): - continue # 범위 밖 - - # 범위 조정 + continue + rel_start = max(0, rel_start) rel_end = min(len(segment), rel_end) - - # 제외 영역 전 텍스트 처리 + if rel_start > seg_pos: part = segment[seg_pos:rel_start] part = remove_destination_groups(part) @@ -91,10 +79,9 @@ def clean_segment(segment: str, start_pos: int) -> str: clean = clean_rtf_text(decoded, encoding) if clean.strip(): result_parts.append(clean) - + seg_pos = rel_end - - # 마지막 제외 영역 이후 텍스트 + if seg_pos < len(segment): part = segment[seg_pos:] part = remove_destination_groups(part) @@ -102,110 +89,100 @@ def clean_segment(segment: str, start_pos: int) -> str: clean = clean_rtf_text(decoded, encoding) if clean.strip(): result_parts.append(clean) - + return ' '.join(result_parts) - - # 테이블 영역이 없으면 전체 텍스트만 추출 + + result_parts = [] + + # No tables - just extract text if not table_regions: clean = clean_segment(content[header_end:], header_end) if clean.strip(): - content_parts.append(RTFContentPart( - content_type="text", - position=0, - text=clean - )) - return content_parts - - # 헤더 오프셋 적용 + result_parts.append(clean) + return '\n\n'.join(result_parts) + + # Adjust regions for header offset adjusted_regions = [] for start_pos, end_pos, table in table_regions: - # 헤더 이후 영역만 처리 if end_pos > header_end: adj_start = max(start_pos, header_end) adjusted_regions.append((adj_start, end_pos, 
table)) - - # 콘텐츠 파트 생성 + + # Build content parts last_end = header_end - + for start_pos, end_pos, table in adjusted_regions: - # 테이블 전 텍스트 + # Text before table if start_pos > last_end: segment = content[last_end:start_pos] clean = clean_segment(segment, last_end) if clean.strip(): - content_parts.append(RTFContentPart( - content_type="text", - position=last_end, - text=clean - )) - - # 테이블 - content_parts.append(RTFContentPart( - content_type="table", - position=start_pos, - table=table - )) - + result_parts.append(clean) + + # Table + if table.is_real_table(): + result_parts.append(table.to_html()) + else: + text_list = table.to_text_list() + if text_list: + result_parts.append(text_list) + last_end = end_pos - - # 마지막 부분 (테이블 이후 텍스트) + + # Text after last table if last_end < len(content): segment = content[last_end:] clean = clean_segment(segment, last_end) if clean.strip(): - content_parts.append(RTFContentPart( - content_type="text", - position=last_end, - text=clean - )) + result_parts.append(clean) + + return '\n\n'.join(result_parts) - return content_parts - -def extract_text_legacy(content: str, encoding: str = "cp949") -> str: +def extract_text_only(content: str, encoding: str = "cp949") -> str: """ - RTF에서 일반 텍스트를 추출합니다. - 테이블 영역은 제외하고 추출합니다. - (레거시 호환성을 위해 유지) - + Extract only text from RTF (exclude tables). + + Legacy compatibility function. + Args: - content: RTF 문자열 콘텐츠 - encoding: 사용할 인코딩 - + content: RTF string content + encoding: Encoding to use + Returns: - 추출된 텍스트 + Extracted text """ - # 헤더 영역 제거 (fonttbl, colortbl, stylesheet 등) + # Remove header (fonttbl, colortbl, stylesheet, etc.) 
pard_match = re.search(r'\\pard\b', content) if pard_match: content = content[pard_match.start():] - - # destination 그룹 제거 (latentstyles, themedata 등) + + # Remove destination groups content = remove_destination_groups(content) - - # Shape 그룹 처리 (shptxt 내용은 보존) + + # Handle shape groups (preserve shptxt content) content = remove_shape_groups(content) - - # Shape 속성 그룹 제거 + + # Remove shape property groups content = remove_shape_property_groups(content) - - # 테이블 영역 찾기 및 마킹 + + # Find table regions table_regions = [] for match in re.finditer(r'\\trowd.*?\\row', content, re.DOTALL): table_regions.append((match.start(), match.end())) - - # 테이블 영역을 병합 (인접한 테이블들) + + # Merge adjacent tables merged_regions = [] for start, end in table_regions: if merged_regions and start - merged_regions[-1][1] < 100: merged_regions[-1] = (merged_regions[-1][0], end) else: merged_regions.append((start, end)) - - # 테이블 영역을 제외한 텍스트 추출 + + # Extract text excluding table regions text_parts = [] last_end = 0 - + for start, end in merged_regions: if start > last_end: segment = content[last_end:start] @@ -214,17 +191,21 @@ def extract_text_legacy(content: str, encoding: str = "cp949") -> str: if clean: text_parts.append(clean) last_end = end - - # 마지막 부분 + if last_end < len(content): segment = content[last_end:] decoded = decode_hex_escapes(segment, encoding) clean = clean_rtf_text(decoded, encoding) if clean: text_parts.append(clean) - - # 연속된 빈 줄 정리 + text = '\n'.join(text_parts) text = re.sub(r'\n{3,}', '\n\n', text) - + return text.strip() + + +__all__ = [ + 'extract_inline_content', + 'extract_text_only', +] diff --git a/contextifier/core/processor/doc_helpers/rtf_decoder.py b/contextifier/core/processor/rtf_helper/rtf_decoder.py similarity index 51% rename from contextifier/core/processor/doc_helpers/rtf_decoder.py rename to contextifier/core/processor/rtf_helper/rtf_decoder.py index 4cd8bad..259825f 100644 --- a/contextifier/core/processor/doc_helpers/rtf_decoder.py +++ 
b/contextifier/core/processor/rtf_helper/rtf_decoder.py @@ -1,36 +1,37 @@ -# service/document_processor/processor/doc_helpers/rtf_decoder.py +# contextifier/core/processor/rtf_helper/rtf_decoder.py """ -RTF 디코딩 유틸리티 +RTF Decoding Utilities -RTF 인코딩 감지 및 디코딩 관련 함수들을 제공합니다. +Encoding detection and decoding functions for RTF content. """ import logging import re -from typing import List, Tuple +from typing import List -from contextifier.core.processor.doc_helpers.rtf_constants import ( +from contextifier.core.processor.rtf_helper.rtf_constants import ( CODEPAGE_ENCODING_MAP, DEFAULT_ENCODINGS, ) -logger = logging.getLogger("document-processor") +logger = logging.getLogger("contextify.rtf.decoder") def detect_encoding(content: bytes, default_encoding: str = "cp949") -> str: """ - RTF 콘텐츠에서 인코딩을 감지합니다. - + Detect encoding from RTF content. + + Looks for \\ansicpgXXXX pattern in the header. + Args: - content: RTF 바이트 데이터 - default_encoding: 기본 인코딩 - + content: RTF binary data + default_encoding: Fallback encoding + Returns: - 감지된 인코딩 문자열 + Detected encoding string """ try: text = content[:1000].decode('ascii', errors='ignore') - - # \ansicpgXXXX 패턴 찾기 + match = re.search(r'\\ansicpg(\d+)', text) if match: codepage = int(match.group(1)) @@ -39,44 +40,44 @@ def detect_encoding(content: bytes, default_encoding: str = "cp949") -> str: return encoding except Exception as e: logger.debug(f"Encoding detection failed: {e}") - + return default_encoding def decode_content(content: bytes, encoding: str = "cp949") -> str: """ - RTF 바이너리를 문자열로 디코딩합니다. - - 여러 인코딩을 시도하여 성공하는 첫 번째 결과를 반환합니다. - + Decode RTF binary to string. + + Tries multiple encodings and returns first successful result. 
+ Args: - content: RTF 바이트 데이터 - encoding: 우선 시도할 인코딩 - + content: RTF binary data + encoding: Preferred encoding to try first + Returns: - 디코딩된 문자열 + Decoded string """ encodings = [encoding] + [e for e in DEFAULT_ENCODINGS if e != encoding] - + for enc in encodings: try: return content.decode(enc) except (UnicodeDecodeError, LookupError): continue - + return content.decode('cp1252', errors='replace') def decode_bytes(byte_list: List[int], encoding: str = "cp949") -> str: """ - 바이트 리스트를 문자열로 디코딩합니다. - + Decode byte list to string. + Args: - byte_list: 바이트 값 리스트 - encoding: 사용할 인코딩 - + byte_list: List of byte values + encoding: Encoding to use + Returns: - 디코딩된 문자열 + Decoded string """ try: return bytes(byte_list).decode(encoding) @@ -89,44 +90,52 @@ def decode_bytes(byte_list: List[int], encoding: str = "cp949") -> str: def decode_hex_escapes(text: str, encoding: str = "cp949") -> str: """ - RTF hex escape (\'XX) 시퀀스를 디코딩합니다. - + Decode RTF hex escape sequences (\\'XX). + Args: - text: RTF 텍스트 - encoding: 사용할 인코딩 - + text: RTF text with hex escapes + encoding: Encoding for decoding + Returns: - 디코딩된 텍스트 + Decoded text """ + if "\\'" not in text: + return text + result = [] byte_buffer = [] i = 0 - - while i < len(text): - if text[i:i+2] == "\\'": - # hex escape 발견 + n = len(text) + + while i < n: + if i + 3 < n and text[i:i+2] == "\\'": try: hex_val = text[i+2:i+4] byte_val = int(hex_val, 16) byte_buffer.append(byte_val) i += 4 - except (ValueError, IndexError): - # 잘못된 escape, 그대로 추가 - if byte_buffer: - result.append(decode_bytes(byte_buffer, encoding)) - byte_buffer = [] - result.append(text[i]) - i += 1 - else: - # 일반 문자 - if byte_buffer: - result.append(decode_bytes(byte_buffer, encoding)) - byte_buffer = [] - result.append(text[i]) - i += 1 - - # 남은 바이트 처리 + continue + except ValueError: + pass + + # Flush byte buffer + if byte_buffer: + result.append(decode_bytes(byte_buffer, encoding)) + byte_buffer = [] + + result.append(text[i]) + i += 1 + + # Flush 
remaining bytes if byte_buffer: result.append(decode_bytes(byte_buffer, encoding)) - + return ''.join(result) + + +__all__ = [ + 'detect_encoding', + 'decode_content', + 'decode_bytes', + 'decode_hex_escapes', +] diff --git a/contextifier/core/processor/rtf_helper/rtf_file_converter.py b/contextifier/core/processor/rtf_helper/rtf_file_converter.py new file mode 100644 index 0000000..fecd7b5 --- /dev/null +++ b/contextifier/core/processor/rtf_helper/rtf_file_converter.py @@ -0,0 +1,87 @@ +# contextifier/core/processor/rtf_helper/rtf_file_converter.py +""" +RTF File Converter + +RTF uses raw binary directly, so converter just passes through. +All actual processing is done by Preprocessor in Handler. +""" +import logging +from dataclasses import dataclass, field +from typing import Any, BinaryIO, List, Optional + +from contextifier.core.functions.file_converter import BaseFileConverter + +logger = logging.getLogger("contextify.rtf.converter") + + +@dataclass +class RTFConvertedData: + """ + RTF converted data container. + + Attributes: + content: RTF content string (after preprocessing) + encoding: Detected encoding + image_tags: List of image tags from preprocessing + original_size: Original binary data size + has_images: Whether images were extracted + """ + content: str + encoding: str = "cp949" + image_tags: List[str] = field(default_factory=list) + original_size: int = 0 + has_images: bool = False + + def __post_init__(self): + """Set has_images based on image_tags.""" + if self.image_tags: + self.has_images = True + + +class RTFFileConverter(BaseFileConverter): + """ + RTF file converter. + + RTF uses raw binary directly, so this converter just passes through. + All actual processing (image extraction, binary removal, decoding) + is done by RTFPreprocessor called from Handler. 
+ """ + + def __init__(self): + """Initialize RTFFileConverter.""" + self.logger = logger + + def convert( + self, + file_data: bytes, + file_stream: Optional[BinaryIO] = None, + **kwargs + ) -> bytes: + """ + Pass through binary data. + + RTF processing uses raw binary, so just return as-is. + + Args: + file_data: Raw binary RTF data + file_stream: Optional file stream (not used) + **kwargs: Not used + + Returns: + Original bytes (pass through) + """ + return file_data + + def get_format_name(self) -> str: + """Return format name.""" + return "RTF Document" + + def close(self, converted_object: Any) -> None: + """Nothing to close.""" + pass + + +__all__ = [ + 'RTFFileConverter', + 'RTFConvertedData', +] diff --git a/contextifier/core/processor/rtf_helper/rtf_metadata_extractor.py b/contextifier/core/processor/rtf_helper/rtf_metadata_extractor.py new file mode 100644 index 0000000..633632a --- /dev/null +++ b/contextifier/core/processor/rtf_helper/rtf_metadata_extractor.py @@ -0,0 +1,179 @@ +# contextifier/core/processor/rtf_helper/rtf_metadata_extractor.py +""" +RTF Metadata Extractor + +Extracts metadata from RTF content. +Implements BaseMetadataExtractor interface. +""" +import logging +import re +from dataclasses import dataclass +from datetime import datetime +from typing import Any, Dict, Optional, Union + +from contextifier.core.functions.metadata_extractor import ( + BaseMetadataExtractor, + DocumentMetadata, +) +from contextifier.core.processor.rtf_helper.rtf_decoder import ( + decode_hex_escapes, +) +from contextifier.core.processor.rtf_helper.rtf_text_cleaner import ( + clean_rtf_text, +) + +logger = logging.getLogger("contextify.rtf.metadata") + + +@dataclass +class RTFSourceInfo: + """ + Source information for RTF metadata extraction. + + Container for data passed to RTFMetadataExtractor.extract(). + """ + content: str + encoding: str = "cp949" + + +class RTFMetadataExtractor(BaseMetadataExtractor): + """ + RTF Metadata Extractor. 
+ + Extracts metadata from RTF content. + + Supported fields: + - title, subject, author, keywords, comments + - last_saved_by, create_time, last_saved_time + + Usage: + extractor = RTFMetadataExtractor() + source = RTFSourceInfo(content=rtf_content, encoding="cp949") + metadata = extractor.extract(source) + text = extractor.format(metadata) + """ + + def extract(self, source: Union[RTFSourceInfo, Dict[str, Any]]) -> DocumentMetadata: + """ + Extract metadata from RTF content. + + Args: + source: RTFSourceInfo object (content string and encoding) + OR Dict[str, Any] (pre-parsed metadata) + + Returns: + DocumentMetadata instance + """ + if isinstance(source, dict): + return self._from_dict(source) + + content = source.content + encoding = source.encoding + + title = None + subject = None + author = None + keywords = None + comments = None + last_saved_by = None + create_time = None + last_saved_time = None + + # Find \info group + info_match = re.search(r'\\info\s*\{([^}]*(?:\{[^}]*\}[^}]*)*)\}', content) + if info_match: + info_content = info_match.group(1) + + # Extract each metadata field + field_patterns = { + 'title': r'\\title\s*\{([^}]*)\}', + 'subject': r'\\subject\s*\{([^}]*)\}', + 'author': r'\\author\s*\{([^}]*)\}', + 'keywords': r'\\keywords\s*\{([^}]*)\}', + 'comments': r'\\doccomm\s*\{([^}]*)\}', + 'last_saved_by': r'\\operator\s*\{([^}]*)\}', + } + + for key, pattern in field_patterns.items(): + match = re.search(pattern, info_content) + if match: + value = decode_hex_escapes(match.group(1), encoding) + value = clean_rtf_text(value, encoding) + if value: + if key == 'title': + title = value + elif key == 'subject': + subject = value + elif key == 'author': + author = value + elif key == 'keywords': + keywords = value + elif key == 'comments': + comments = value + elif key == 'last_saved_by': + last_saved_by = value + + # Extract dates + create_time = self._extract_date( + content, + r'\\creatim\\yr(\d+)\\mo(\d+)\\dy(\d+)(?:\\hr(\d+))?(?:\\min(\d+))?' 
+ ) + last_saved_time = self._extract_date( + content, + r'\\revtim\\yr(\d+)\\mo(\d+)\\dy(\d+)(?:\\hr(\d+))?(?:\\min(\d+))?' + ) + + self.logger.debug("Extracted RTF metadata fields") + + return DocumentMetadata( + title=title, + subject=subject, + author=author, + keywords=keywords, + comments=comments, + last_saved_by=last_saved_by, + create_time=create_time, + last_saved_time=last_saved_time, + ) + + def _extract_date(self, content: str, pattern: str) -> Optional[datetime]: + """Extract datetime from RTF date pattern.""" + match = re.search(pattern, content) + if match: + try: + year = int(match.group(1)) + month = int(match.group(2)) + day = int(match.group(3)) + hour = int(match.group(4)) if match.group(4) else 0 + minute = int(match.group(5)) if match.group(5) else 0 + return datetime(year, month, day, hour, minute) + except (ValueError, TypeError): + pass + return None + + def _from_dict(self, metadata: Dict[str, Any]) -> DocumentMetadata: + """ + Convert pre-parsed metadata dict to DocumentMetadata. 
+ + Args: + metadata: Pre-parsed metadata dict + + Returns: + DocumentMetadata instance + """ + return DocumentMetadata( + title=metadata.get('title'), + subject=metadata.get('subject'), + author=metadata.get('author'), + keywords=metadata.get('keywords'), + comments=metadata.get('comments'), + last_saved_by=metadata.get('last_saved_by'), + create_time=metadata.get('create_time'), + last_saved_time=metadata.get('last_saved_time'), + ) + + +__all__ = [ + 'RTFMetadataExtractor', + 'RTFSourceInfo', +] diff --git a/contextifier/core/processor/rtf_helper/rtf_preprocessor.py b/contextifier/core/processor/rtf_helper/rtf_preprocessor.py new file mode 100644 index 0000000..2c7ebdf --- /dev/null +++ b/contextifier/core/processor/rtf_helper/rtf_preprocessor.py @@ -0,0 +1,426 @@ +# contextifier/core/processor/rtf_helper/rtf_preprocessor.py +""" +RTF Preprocessor + +Preprocesses RTF binary data before conversion: +- \\binN tag processing (skip N bytes of raw binary data) +- \\pict group image extraction +- Image saving and tag generation +- Encoding detection + +Implements BasePreprocessor interface. 
+""" +import hashlib +import logging +import re +from dataclasses import dataclass, field +from typing import Any, Dict, List, Optional, Set, Tuple + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend +from contextifier.core.processor.rtf_helper.rtf_decoder import ( + detect_encoding, +) + +logger = logging.getLogger("contextify.rtf.preprocessor") + + +# Image format magic numbers +IMAGE_SIGNATURES = { + b'\xff\xd8\xff': 'jpeg', + b'\x89PNG\r\n\x1a\n': 'png', + b'GIF87a': 'gif', + b'GIF89a': 'gif', + b'BM': 'bmp', + b'\xd7\xcd\xc6\x9a': 'wmf', + b'\x01\x00\x09\x00': 'wmf', + b'\x01\x00\x00\x00': 'emf', +} + +# RTF image type mapping +RTF_IMAGE_TYPES = { + 'jpegblip': 'jpeg', + 'pngblip': 'png', + 'wmetafile': 'wmf', + 'emfblip': 'emf', + 'dibitmap': 'bmp', + 'wbitmap': 'bmp', +} + +# Supported image formats for saving +SUPPORTED_IMAGE_FORMATS = {'jpeg', 'png', 'gif', 'bmp'} + + +@dataclass +class RTFBinaryRegion: + """RTF binary data region information.""" + start_pos: int + end_pos: int + bin_type: str # "bin" or "pict" + data_size: int + image_format: str = "" + image_data: bytes = b"" + + +class RTFPreprocessor(BasePreprocessor): + """ + RTF-specific preprocessor. 
+ + Handles RTF binary preprocessing: + - Removes \\bin tag binary data + - Extracts embedded images + - Detects encoding + - Returns clean content ready for parsing + + Usage: + preprocessor = RTFPreprocessor(image_processor=img_proc) + result = preprocessor.preprocess(rtf_bytes) + + # result.clean_content - bytes ready for parsing + # result.encoding - detected encoding + # result.extracted_resources["image_tags"] - list of image tags + """ + + RTF_MAGIC = b'{\\rtf' + + def __init__( + self, + image_processor: Optional[ImageProcessor] = None, + processed_images: Optional[Set[str]] = None, + ): + """ + Initialize RTFPreprocessor. + + Args: + image_processor: Image processor for saving images + processed_images: Set of already processed image hashes + """ + self._image_processor = image_processor + self._processed_images = processed_images if processed_images is not None else set() + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess RTF data. + + For RTF, the converter returns raw bytes (pass-through), + so converted_data is the original RTF binary data. 
+ + Args: + converted_data: RTF binary data (bytes) from RTFFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with clean content, encoding, and image tags + """ + # Handle bytes input + if isinstance(converted_data, bytes): + file_data = converted_data + elif hasattr(converted_data, 'read'): + # Handle file-like objects + file_data = converted_data.read() + else: + return PreprocessedData( + raw_content=b"", + clean_content=b"", + encoding="cp949", + ) + + if not file_data: + return PreprocessedData( + raw_content=b"", + clean_content=b"", + encoding="cp949", + ) + + # Get options from kwargs + image_processor = kwargs.get('image_processor', self._image_processor) + processed_images = kwargs.get('processed_images', self._processed_images) + + # Detect encoding + detected_encoding = detect_encoding(file_data, "cp949") + + # Process binary data (extract images, clean content) + clean_content, image_tags = self._process_binary_content( + file_data, + image_processor, + processed_images + ) + + # Filter valid image tags + valid_tags = [ + tag for tag in image_tags + if tag and tag.strip() and '/uploads/.' not in tag + ] + + return PreprocessedData( + raw_content=file_data, + clean_content=clean_content, + encoding=detected_encoding, + extracted_resources={ + "image_tags": valid_tags, + } + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "RTF Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is valid RTF content.""" + if isinstance(data, bytes): + if len(data) < 5: + return False + return data[:5] == self.RTF_MAGIC + return False + + def _process_binary_content( + self, + content: bytes, + image_processor: Optional[ImageProcessor], + processed_images: Set[str] + ) -> Tuple[bytes, List[str]]: + """ + Process RTF binary content. 
+ + Args: + content: RTF binary content + image_processor: Image processor instance + processed_images: Set of processed image hashes + + Returns: + Tuple of (clean_content, list of image tags) + """ + image_tags: Dict[int, str] = {} + + # Find \bin tag regions + bin_regions = self._find_bin_regions(content) + + # Find \pict regions (excluding bin regions) + pict_regions = self._find_pict_regions(content, bin_regions) + + # Merge and sort all regions + all_regions = bin_regions + pict_regions + all_regions.sort(key=lambda r: r.start_pos) + + # Process images and generate tags + for region in all_regions: + if not region.image_data: + continue + + # Check for duplicates + image_hash = hashlib.md5(region.image_data).hexdigest() + if image_hash in processed_images: + image_tags[region.start_pos] = "" + continue + + processed_images.add(image_hash) + + if region.image_format in SUPPORTED_IMAGE_FORMATS and image_processor: + tag = image_processor.save_image(region.image_data) + if tag: + image_tags[region.start_pos] = f"\n{tag}\n" + logger.info( + f"Saved RTF image: {tag} " + f"(format={region.image_format}, size={region.data_size})" + ) + else: + image_tags[region.start_pos] = "" + else: + image_tags[region.start_pos] = "" + + # Remove binary data from content + clean_content = self._remove_binary_data(content, all_regions, image_tags) + + # Collect all image tags as list + tag_list = [tag for tag in image_tags.values() if tag and tag.strip()] + + return clean_content, tag_list + + def _find_bin_regions(self, content: bytes) -> List[RTFBinaryRegion]: + """Find \\binN tags and identify binary regions.""" + regions = [] + pattern = rb'\\bin(\d+)' + + for match in re.finditer(pattern, content): + try: + bin_size = int(match.group(1)) + bin_tag_start = match.start() + bin_tag_end = match.end() + + data_start = bin_tag_end + if data_start < len(content) and content[data_start:data_start+1] == b' ': + data_start += 1 + + data_end = data_start + bin_size + + if data_end <= 
len(content): + binary_data = content[data_start:data_end] + image_format = self._detect_image_format(binary_data) + + # Find parent \shppict group + group_start = bin_tag_start + group_end = data_end + + search_start = max(0, bin_tag_start - 500) + search_area = content[search_start:bin_tag_start] + + shppict_pos = search_area.rfind(b'\\shppict') + if shppict_pos != -1: + abs_pos = search_start + shppict_pos + brace_pos = abs_pos + while brace_pos > 0 and content[brace_pos:brace_pos+1] != b'{': + brace_pos -= 1 + group_start = brace_pos + + depth = 1 + j = data_end + while j < len(content) and depth > 0: + if content[j:j+1] == b'{': + depth += 1 + elif content[j:j+1] == b'}': + depth -= 1 + j += 1 + group_end = j + + regions.append(RTFBinaryRegion( + start_pos=group_start, + end_pos=group_end, + bin_type="bin", + data_size=bin_size, + image_format=image_format, + image_data=binary_data + )) + except (ValueError, IndexError): + continue + + return regions + + def _find_pict_regions( + self, + content: bytes, + exclude_regions: List[RTFBinaryRegion] + ) -> List[RTFBinaryRegion]: + """Find hex-encoded \\pict regions.""" + regions = [] + + bin_tag_positions = {r.start_pos for r in exclude_regions if r.bin_type == "bin"} + excluded_ranges = [(r.start_pos, r.end_pos) for r in exclude_regions] + + def is_excluded(pos: int) -> bool: + return any(start <= pos < end for start, end in excluded_ranges) + + def has_bin_nearby(pict_pos: int) -> bool: + return any(pict_pos < bp < pict_pos + 200 for bp in bin_tag_positions) + + try: + text_content = content.decode('cp1252', errors='replace') + pict_pattern = r'\\pict\s*((?:\\[a-zA-Z]+\d*\s*)*)' + + for match in re.finditer(pict_pattern, text_content): + start_pos = match.start() + + if is_excluded(start_pos) or has_bin_nearby(start_pos): + continue + + attrs = match.group(1) + image_format = "" + for rtf_type, fmt in RTF_IMAGE_TYPES.items(): + if rtf_type in attrs: + image_format = fmt + break + + # Extract hex data + hex_start = 
match.end() + hex_data = [] + i = hex_start + + while i < len(text_content): + ch = text_content[i] + if ch in '0123456789abcdefABCDEF': + hex_data.append(ch) + elif ch in ' \t\r\n': + pass + elif ch == '}': + break + elif ch == '\\': + if text_content[i:i+4] == '\\bin': + hex_data = [] + break + while i < len(text_content) and text_content[i] not in ' \t\r\n}': + i += 1 + continue + else: + break + i += 1 + + hex_str = ''.join(hex_data) + + if len(hex_str) >= 32: + try: + image_data = bytes.fromhex(hex_str) + if not image_format: + image_format = self._detect_image_format(image_data) + + if image_format: + regions.append(RTFBinaryRegion( + start_pos=start_pos, + end_pos=i, + bin_type="pict", + data_size=len(image_data), + image_format=image_format, + image_data=image_data + )) + except ValueError: + continue + except Exception as e: + logger.warning(f"Error finding pict regions: {e}") + + return regions + + def _detect_image_format(self, data: bytes) -> str: + """Detect image format from binary data.""" + if not data or len(data) < 4: + return "" + + for signature, format_name in IMAGE_SIGNATURES.items(): + if data.startswith(signature): + return format_name + + if len(data) >= 2 and data[0:2] == b'\xff\xd8': + return 'jpeg' + + return "" + + def _remove_binary_data( + self, + content: bytes, + regions: List[RTFBinaryRegion], + image_tags: Dict[int, str] + ) -> bytes: + """Remove binary data regions from content.""" + if not regions: + return content + + sorted_regions = sorted(regions, key=lambda r: r.start_pos, reverse=True) + result = bytearray(content) + + for region in sorted_regions: + replacement = b'' + if region.start_pos in image_tags: + tag = image_tags[region.start_pos] + if tag: + replacement = tag.encode('ascii', errors='replace') + result[region.start_pos:region.end_pos] = replacement + + return bytes(result) + + +__all__ = ['RTFPreprocessor', 'RTFBinaryRegion'] diff --git a/contextifier/core/processor/rtf_helper/rtf_region_finder.py 
b/contextifier/core/processor/rtf_helper/rtf_region_finder.py new file mode 100644 index 0000000..b508962 --- /dev/null +++ b/contextifier/core/processor/rtf_helper/rtf_region_finder.py @@ -0,0 +1,91 @@ +# contextifier/core/processor/rtf_helper/rtf_region_finder.py +""" +RTF Region Finder + +Functions for finding excluded regions (header, footer, footnote, etc.) in RTF. +""" +import re +from typing import List, Tuple + + +def find_excluded_regions(content: str) -> List[Tuple[int, int]]: + """ + Find regions to exclude from content extraction. + + Finds header, footer, footnote, and other special regions + that should not be part of main content. + + Args: + content: RTF content string + + Returns: + List of (start, end) position tuples + """ + regions = [] + + # Header/footer patterns + patterns = [ + (r'\\header[lrf]?\b', r'\\par\s*\}'), # Headers + (r'\\footer[lrf]?\b', r'\\par\s*\}'), # Footers + (r'\\footnote\b', r'\}'), # Footnotes + (r'\\annotation\b', r'\}'), # Annotations + (r'\{\\headerf', r'\}'), # First page header + (r'\{\\footerf', r'\}'), # First page footer + ] + + for start_pattern, end_pattern in patterns: + for match in re.finditer(start_pattern, content): + start_pos = match.start() + + # Find matching closing brace + depth = 0 + i = start_pos + found_start = False + + while i < len(content): + if content[i] == '{': + if not found_start: + found_start = True + depth += 1 + elif content[i] == '}': + depth -= 1 + if found_start and depth == 0: + regions.append((start_pos, i + 1)) + break + i += 1 + + # Merge overlapping regions + if regions: + regions.sort(key=lambda x: x[0]) + merged = [regions[0]] + for start, end in regions[1:]: + if start <= merged[-1][1]: + merged[-1] = (merged[-1][0], max(merged[-1][1], end)) + else: + merged.append((start, end)) + return merged + + return regions + + +def is_in_excluded_region(position: int, regions: List[Tuple[int, int]]) -> bool: + """ + Check if a position is within an excluded region. 
+ + Args: + position: Position to check + regions: List of (start, end) tuples + + Returns: + True if position is in an excluded region + """ + for start, end in regions: + if start <= position < end: + return True + return False + + +__all__ = [ + 'find_excluded_regions', + 'is_in_excluded_region', +] diff --git a/contextifier/core/processor/rtf_helper/rtf_table_extractor.py b/contextifier/core/processor/rtf_helper/rtf_table_extractor.py new file mode 100644 index 0000000..51f4e61 --- /dev/null +++ b/contextifier/core/processor/rtf_helper/rtf_table_extractor.py @@ -0,0 +1,482 @@ +# contextifier/core/processor/rtf_helper/rtf_table_extractor.py +""" +RTF Table Extractor + +Extracts and parses tables from RTF content. +Includes RTFCellInfo and RTFTable data models. +""" +import logging +import re +from dataclasses import dataclass, field +from typing import List, NamedTuple, Optional, Tuple + +from contextifier.core.processor.rtf_helper.rtf_decoder import ( + decode_hex_escapes, +) +from contextifier.core.processor.rtf_helper.rtf_text_cleaner import ( + clean_rtf_text, +) +from contextifier.core.processor.rtf_helper.rtf_region_finder import ( + find_excluded_regions, + is_in_excluded_region, +) + +logger = logging.getLogger("contextify.rtf.table") + + +# ============================================================================= +# Data Models +# ============================================================================= + +class RTFCellInfo(NamedTuple): + """RTF cell information with merge info.""" + text: str # Cell text content + h_merge_first: bool # Horizontal merge start (clmgf) + h_merge_cont: bool # Horizontal merge continue (clmrg) + v_merge_first: bool # Vertical merge start (clvmgf) + v_merge_cont: bool # Vertical merge continue (clvmrg) + right_boundary: int # Cell right boundary (twips) + + +@dataclass +class RTFTable: + """RTF table structure with merge cell support.""" + rows: List[List[RTFCellInfo]] = field(default_factory=list) + col_count: int = 
0 + position: int = 0 # Start position in document + end_position: int = 0 # End position in document + + def is_real_table(self) -> bool: + """ + Determine if this is a real table. + + n rows x 1 column is considered a list, not a table. + """ + if not self.rows: + return False + + effective_cols = self._get_effective_col_count() + return effective_cols >= 2 + + def _get_effective_col_count(self) -> int: + """Calculate effective column count (excluding empty columns).""" + if not self.rows: + return 0 + + effective_counts = [] + for row in self.rows: + non_empty_cells = [] + for i, cell in enumerate(row): + if cell.h_merge_cont: + continue + if cell.text.strip() or cell.v_merge_first: + non_empty_cells.append(i) + + if non_empty_cells: + effective_counts.append(max(non_empty_cells) + 1) + + return max(effective_counts) if effective_counts else 0 + + def to_html(self) -> str: + """Convert table to HTML with merge cell support.""" + if not self.rows: + return "" + + merge_info = self._calculate_merge_info() + html_parts = [''] + + for row_idx, row in enumerate(self.rows): + html_parts.append('') + + for col_idx, cell in enumerate(row): + if col_idx < len(merge_info[row_idx]): + colspan, rowspan = merge_info[row_idx][col_idx] + + if colspan == 0 or rowspan == 0: + continue + + cell_text = re.sub(r'\s+', ' ', cell.text).strip() + + attrs = [] + if colspan > 1: + attrs.append(f'colspan="{colspan}"') + if rowspan > 1: + attrs.append(f'rowspan="{rowspan}"') + + attr_str = ' ' + ' '.join(attrs) if attrs else '' + html_parts.append(f'{cell_text}') + else: + cell_text = re.sub(r'\s+', ' ', cell.text).strip() + html_parts.append(f'') + + html_parts.append('') + + html_parts.append('
{cell_text}
') + return '\n'.join(html_parts) + + def to_text_list(self) -> str: + """ + Convert 1-column table to text list. + + - 1x1 table: Return cell content only (container table) + - nx1 table: Return rows separated by blank lines + """ + if not self.rows: + return "" + + if len(self.rows) == 1 and len(self.rows[0]) == 1: + return self.rows[0][0].text + + lines = [] + for row in self.rows: + if row: + cell_text = row[0].text + if cell_text: + lines.append(cell_text) + + return '\n\n'.join(lines) + + def _calculate_merge_info(self) -> List[List[tuple]]: + """Calculate colspan and rowspan for each cell.""" + if not self.rows: + return [] + + num_rows = len(self.rows) + max_cols = max(len(row) for row in self.rows) if self.rows else 0 + + if max_cols == 0: + return [] + + # Initialize with (1, 1) for all cells + merge_info = [[(1, 1) for _ in range(max_cols)] for _ in range(num_rows)] + + # Process horizontal merges + for row_idx, row in enumerate(self.rows): + col_idx = 0 + while col_idx < len(row): + cell = row[col_idx] + + if cell.h_merge_first: + colspan = 1 + for next_col in range(col_idx + 1, len(row)): + if row[next_col].h_merge_cont: + colspan += 1 + merge_info[row_idx][next_col] = (0, 0) + else: + break + merge_info[row_idx][col_idx] = (colspan, 1) + + col_idx += 1 + + # Process vertical merges + for col_idx in range(max_cols): + row_idx = 0 + while row_idx < num_rows: + if col_idx >= len(self.rows[row_idx]): + row_idx += 1 + continue + + cell = self.rows[row_idx][col_idx] + + if cell.v_merge_first: + rowspan = 1 + for next_row in range(row_idx + 1, num_rows): + if col_idx < len(self.rows[next_row]) and self.rows[next_row][col_idx].v_merge_cont: + rowspan += 1 + merge_info[next_row][col_idx] = (0, 0) + else: + break + + current_colspan = merge_info[row_idx][col_idx][0] + merge_info[row_idx][col_idx] = (current_colspan, rowspan) + row_idx += rowspan + elif cell.v_merge_cont: + merge_info[row_idx][col_idx] = (0, 0) + row_idx += 1 + else: + row_idx += 1 + + return 
merge_info + + +# ============================================================================= +# Table Extraction Functions +# ============================================================================= + +def extract_tables_with_positions( + content: str, + encoding: str = "cp949" +) -> Tuple[List[RTFTable], List[Tuple[int, int, RTFTable]]]: + """ + Extract tables from RTF content with position information. + + RTF table structure: + - \\trowd: Table row start (row definition) + - \\cellxN: Cell boundary position + - \\clmgf: Horizontal merge start + - \\clmrg: Horizontal merge continue + - \\clvmgf: Vertical merge start + - \\clvmrg: Vertical merge continue + - \\intbl: Paragraph in cell + - \\cell: Cell end + - \\row: Row end + + Args: + content: RTF string content + encoding: Encoding to use + + Returns: + Tuple of (table list, table region list [(start, end, table), ...]) + """ + tables = [] + table_regions = [] + + # Find excluded regions (header, footer, footnote, etc.) + excluded_regions = find_excluded_regions(content) + + # Step 1: Find all \row positions + row_positions = [] + for match in re.finditer(r'\\row(?![a-z])', content): + row_positions.append(match.end()) + + if not row_positions: + return tables, table_regions + + # Step 2: Find \trowd before each \row + all_rows = [] + for i, row_end in enumerate(row_positions): + if i == 0: + search_start = 0 + else: + search_start = row_positions[i - 1] + + segment = content[search_start:row_end] + trowd_match = re.search(r'\\trowd', segment) + + if trowd_match: + row_start = search_start + trowd_match.start() + + # Skip rows in excluded regions + if is_in_excluded_region(row_start, excluded_regions): + logger.debug(f"Skipping table row at {row_start} (in header/footer/footnote)") + continue + + row_text = content[row_start:row_end] + all_rows.append((row_start, row_end, row_text)) + + if not all_rows: + return tables, table_regions + + # Group consecutive rows into tables + table_groups = [] + 
current_table = [] + current_start = -1 + current_end = -1 + prev_end = -1 + + for row_start, row_end, row_text in all_rows: + # Rows within 150 chars are same table + if prev_end == -1 or row_start - prev_end < 150: + if current_start == -1: + current_start = row_start + current_table.append(row_text) + current_end = row_end + else: + if current_table: + table_groups.append((current_start, current_end, current_table)) + current_table = [row_text] + current_start = row_start + current_end = row_end + prev_end = row_end + + if current_table: + table_groups.append((current_start, current_end, current_table)) + + logger.info(f"Found {len(table_groups)} table groups") + + # Parse each table group + for start_pos, end_pos, table_rows in table_groups: + table = _parse_table_with_merge(table_rows, encoding) + if table and table.rows: + table.position = start_pos + table.end_position = end_pos + tables.append(table) + table_regions.append((start_pos, end_pos, table)) + + logger.info(f"Extracted {len(tables)} tables") + return tables, table_regions + + +def _parse_table_with_merge(rows: List[str], encoding: str = "cp949") -> Optional[RTFTable]: + """ + Parse table rows to RTFTable object with merge support. + + Args: + rows: Table row text list + encoding: Encoding to use + + Returns: + RTFTable object + """ + table = RTFTable() + + for row_text in rows: + cells = _extract_cells_with_merge(row_text, encoding) + if cells: + table.rows.append(cells) + if len(cells) > table.col_count: + table.col_count = len(cells) + + return table if table.rows else None + + +def _extract_cells_with_merge(row_text: str, encoding: str = "cp949") -> List[RTFCellInfo]: + """ + Extract cell content and merge information from table row. 
+ + Args: + row_text: Table row RTF text + encoding: Encoding to use + + Returns: + List of RTFCellInfo + """ + cells = [] + + # Step 1: Parse cell definitions (attributes before cellx) + cell_defs = [] + + # Find first \cell that is not \cellx + first_cell_idx = -1 + pos = 0 + while True: + idx = row_text.find('\\cell', pos) + if idx == -1: + first_cell_idx = len(row_text) + break + if idx + 5 < len(row_text) and row_text[idx + 5] == 'x': + pos = idx + 1 + continue + first_cell_idx = idx + break + + def_part = row_text[:first_cell_idx] + + current_def = { + 'h_merge_first': False, + 'h_merge_cont': False, + 'v_merge_first': False, + 'v_merge_cont': False, + 'right_boundary': 0 + } + + cell_def_pattern = r'\\cl(?:mgf|mrg|vmgf|vmrg)|\\cellx(-?\d+)' + + for match in re.finditer(cell_def_pattern, def_part): + token = match.group() + if token == '\\clmgf': + current_def['h_merge_first'] = True + elif token == '\\clmrg': + current_def['h_merge_cont'] = True + elif token == '\\clvmgf': + current_def['v_merge_first'] = True + elif token == '\\clvmrg': + current_def['v_merge_cont'] = True + elif token.startswith('\\cellx'): + if match.group(1): + current_def['right_boundary'] = int(match.group(1)) + cell_defs.append(current_def.copy()) + current_def = { + 'h_merge_first': False, + 'h_merge_cont': False, + 'v_merge_first': False, + 'v_merge_cont': False, + 'right_boundary': 0 + } + + # Step 2: Extract cell texts + cell_texts = _extract_cell_texts(row_text, encoding) + + # Step 3: Match cell definitions with content + for i, cell_text in enumerate(cell_texts): + if i < len(cell_defs): + cell_def = cell_defs[i] + else: + cell_def = { + 'h_merge_first': False, + 'h_merge_cont': False, + 'v_merge_first': False, + 'v_merge_cont': False, + 'right_boundary': 0 + } + + cells.append(RTFCellInfo( + text=cell_text, + h_merge_first=cell_def['h_merge_first'], + h_merge_cont=cell_def['h_merge_cont'], + v_merge_first=cell_def['v_merge_first'], + v_merge_cont=cell_def['v_merge_cont'], + 
right_boundary=cell_def['right_boundary'] + )) + + return cells + + +def _extract_cell_texts(row_text: str, encoding: str = "cp949") -> List[str]: + """ + Extract cell texts from row. + + Args: + row_text: Table row RTF text + encoding: Encoding to use + + Returns: + List of cell texts + """ + cell_texts = [] + + # Step 1: Find all \cell positions (not \cellx) + cell_positions = [] + pos = 0 + while True: + idx = row_text.find('\\cell', pos) + if idx == -1: + break + next_pos = idx + 5 + if next_pos < len(row_text) and row_text[next_pos] == 'x': + pos = idx + 1 + continue + cell_positions.append(idx) + pos = idx + 1 + + if not cell_positions: + return cell_texts + + # Step 2: Find last \cellx before first \cell + first_cell_pos = cell_positions[0] + def_part = row_text[:first_cell_pos] + + last_cellx_end = 0 + for match in re.finditer(r'\\cellx-?\d+', def_part): + last_cellx_end = match.end() + + # Step 3: Extract each cell content + prev_end = last_cellx_end + for cell_end in cell_positions: + cell_content = row_text[prev_end:cell_end] + + # RTF decoding and cleaning + decoded = decode_hex_escapes(cell_content, encoding) + clean = clean_rtf_text(decoded, encoding) + cell_texts.append(clean) + + prev_end = cell_end + 5 # len('\\cell') = 5 + + return cell_texts + + +__all__ = [ + 'RTFCellInfo', + 'RTFTable', + 'extract_tables_with_positions', +] diff --git a/contextifier/core/processor/doc_helpers/rtf_text_cleaner.py b/contextifier/core/processor/rtf_helper/rtf_text_cleaner.py similarity index 66% rename from contextifier/core/processor/doc_helpers/rtf_text_cleaner.py rename to contextifier/core/processor/rtf_helper/rtf_text_cleaner.py index ebeb61d..5aff9c4 100644 --- a/contextifier/core/processor/doc_helpers/rtf_text_cleaner.py +++ b/contextifier/core/processor/rtf_helper/rtf_text_cleaner.py @@ -1,63 +1,60 @@ -# service/document_processor/processor/doc_helpers/rtf_text_cleaner.py +# contextifier/core/processor/rtf_helper/rtf_text_cleaner.py """ -RTF 텍스트 정리 유틸리티 
+RTF Text Cleaner -RTF 제어 코드 제거 및 텍스트 정리 관련 함수들을 제공합니다. +Functions for removing RTF control codes and cleaning text. """ import re from typing import List -from contextifier.core.processor.doc_helpers.rtf_constants import ( +from contextifier.core.processor.rtf_helper.rtf_constants import ( SHAPE_PROPERTY_NAMES, + SKIP_DESTINATIONS, + IMAGE_DESTINATIONS, ) -from contextifier.core.processor.doc_helpers.rtf_decoder import ( +from contextifier.core.processor.rtf_helper.rtf_decoder import ( decode_bytes, ) def clean_rtf_text(text: str, encoding: str = "cp949") -> str: """ - RTF 제어 코드를 안전하게 제거하고 순수 텍스트만 추출합니다. - - 토큰 기반 파싱으로 내용 유실을 방지합니다. - + Remove RTF control codes and extract pure text. + + Uses token-based parsing to prevent content loss. + Args: - text: RTF 텍스트 - encoding: 사용할 인코딩 - + text: RTF text + encoding: Encoding for decoding + Returns: - 정리된 텍스트 + Cleaned text """ if not text: return "" - - # 전처리: 이미지 태그 보호 (임시 마커로 치환) + + # Protect image tags (replace with temporary markers) image_tags = [] def save_image_tag(m): image_tags.append(m.group()) return f'\x00IMG{len(image_tags)-1}\x00' - + text = re.sub(r'\[image:[^\]]+\]', save_image_tag, text) - - # 전처리: Shape 속성 제거 ({\sp{\sn name}{\sv value}} 형식) + + # Remove shape properties text = re.sub(r'\{\\sp\{\\sn\s*\w+\}\{\\sv\s*[^}]*\}\}', '', text) - - # Shape 속성이 직접 출력된 경우도 제거 (shapeType202fFlipH0... 
형태) text = re.sub(r'shapeType\d+[a-zA-Z0-9]+(?:posrelh\d+posrelv\d+)?', '', text) - - # \shp 관련 제어 워드 제거 text = re.sub(r'\\shp(?:inst|txt|left|right|top|bottom|bx\w+|by\w+|wr\d+|fblwtxt\d+|z\d+|lid\d+)\b\d*', '', text) - + result = [] i = 0 n = len(text) - + while i < n: ch = text[i] - - # 이미지 태그 마커 복원 + + # Restore image tag markers if ch == '\x00' and i + 3 < n and text[i+1:i+4] == 'IMG': - # \x00IMGn\x00 패턴 찾기 end_idx = text.find('\x00', i + 4) if end_idx != -1: try: @@ -67,13 +64,12 @@ def save_image_tag(m): continue except (ValueError, IndexError): pass - + if ch == '\\': - # 제어 워드 또는 제어 기호 if i + 1 < n: next_ch = text[i + 1] - - # 특수 이스케이프 처리 + + # Special escapes if next_ch == '\\': result.append('\\') i += 2 @@ -99,12 +95,11 @@ def save_image_tag(m): i += 2 continue elif next_ch == "'": - # hex escape \'XX + # Hex escape \'XX if i + 3 < n: try: hex_val = text[i+2:i+4] byte_val = int(hex_val, 16) - # 단일 바이트 디코딩 시도 try: result.append(bytes([byte_val]).decode(encoding)) except: @@ -119,33 +114,32 @@ def save_image_tag(m): i += 1 continue elif next_ch == '*': - # \* - destination 마커, 건너뛰기 + # \* destination marker, skip i += 2 continue elif next_ch.isalpha(): - # 제어 워드: \word[N][delimiter] + # Control word: \word[N][delimiter] j = i + 1 while j < n and text[j].isalpha(): j += 1 - + control_word = text[i+1:j] - - # 숫자 파라미터 스킵 + + # Skip numeric parameter while j < n and (text[j].isdigit() or text[j] == '-'): j += 1 - - # 구분자 처리 (공백은 제어 워드의 일부) + + # Handle delimiter (space is part of control word) if j < n and text[j] == ' ': j += 1 - - # 특별 처리가 필요한 제어 워드 + + # Special control words if control_word in ('par', 'line'): result.append('\n') elif control_word == 'tab': result.append('\t') elif control_word == 'u': - # 유니코드: \uN? - # 이미 파라미터를 스킵했으므로 다시 파싱 + # Unicode: \uN? 
um = re.match(r'\\u(-?\d+)\??', text[i:]) if um: try: @@ -156,65 +150,55 @@ def save_image_tag(m): except: pass j = i + um.end() - # 다른 제어 워드는 무시 - + i = j continue - + i += 1 elif ch == '{' or ch == '}': - # 중괄호는 건너뛰기 i += 1 elif ch == '\r' or ch == '\n': - # RTF에서 줄바꿈 문자는 무시 (\par가 실제 줄바꿈) i += 1 else: - # 일반 텍스트 result.append(ch) i += 1 - - # 최종 정리 + text_result = ''.join(result) - - # Shape 속성 이름 제거 + + # Remove shape property names shape_name_pattern = r'\b(' + '|'.join(SHAPE_PROPERTY_NAMES) + r')\b' text_result = re.sub(shape_name_pattern, '', text_result) - - # 숫자만 있는 쓰레기 제거 (예: -231, -1, -5 등) + + # Remove garbage numbers text_result = re.sub(r'\s*-\d+\s*', ' ', text_result) - - # Binary/Hex 데이터 제거 + + # Remove hex data outside image tags text_result = _remove_hex_outside_image_tags(text_result) - - # 여러 공백을 하나로 + + # Normalize whitespace text_result = re.sub(r'\s+', ' ', text_result) - + return text_result.strip() def _remove_hex_outside_image_tags(text: str) -> str: - """이미지 태그 외부의 긴 hex 문자열만 제거""" - # 이미지 태그 위치 찾기 + """Remove long hex strings outside image tags.""" protected_ranges = [] for m in re.finditer(r'\[image:[^\]]+\]', text): protected_ranges.append((m.start(), m.end())) - + if not protected_ranges: - # 이미지 태그가 없으면 그냥 제거 return re.sub(r'(? str: def remove_destination_groups(content: str) -> str: - r""" - RTF destination 그룹 {\*\destination...}을 제거합니다. - - 문서 끝에 나타나는 themedata, colorschememapping, latentstyles, datastore 등을 - 제거하여 메타데이터가 텍스트로 추출되는 것을 방지합니다. - + """ + Remove RTF destination groups {\\*\\destination...}. + + Removes themedata, colorschememapping, latentstyles, datastore, etc. + to prevent metadata from being extracted as text. 
+ Args: - content: RTF 콘텐츠 - + content: RTF content + Returns: - destination 그룹이 제거된 콘텐츠 + Content with destination groups removed """ - from contextifier.core.processor.doc_helpers.rtf_constants import ( - SKIP_DESTINATIONS, - IMAGE_DESTINATIONS, - ) - result = [] i = 0 n = len(content) - + while i < n: - # {\* 패턴 감지 if content[i:i+3] == '{\\*': - # destination 이름 추출 j = i + 3 while j < n and content[j] in ' \t\r\n': j += 1 - + if j < n and content[j] == '\\': - # 제어 워드 추출 k = j + 1 while k < n and content[k].isalpha(): k += 1 ctrl_word = content[j+1:k] - + if ctrl_word in SKIP_DESTINATIONS: - # 이 그룹 전체를 건너뛰기 depth = 1 - i += 1 # '{' 다음으로 + i += 1 while i < n and depth > 0: if content[i] == '{': depth += 1 @@ -269,70 +244,62 @@ def remove_destination_groups(content: str) -> str: depth -= 1 i += 1 continue - + if ctrl_word in IMAGE_DESTINATIONS: - # 이미지 태그는 보존하면서 그룹 제거 depth = 1 group_start = i - i += 1 # '{' 다음으로 + i += 1 while i < n and depth > 0: if content[i] == '{': depth += 1 elif content[i] == '}': depth -= 1 i += 1 - - # 그룹 내에서 유효한 이미지 태그만 추출 + group_content = content[group_start:i] image_tag_match = re.search(r'\[image:[^\]]+\]', group_content) if image_tag_match: tag = image_tag_match.group() - # 유효한 태그인지 확인 if '/uploads/.' not in tag and 'uploads/.' not in tag: result.append(tag) continue - + result.append(content[i]) i += 1 - + return ''.join(result) def remove_shape_groups(content: str) -> str: """ - Shape 그룹을 제거하되, shptxt 내의 텍스트는 보존합니다. - - RTF Shape 구조: - {\\shp{\\*\\shpinst...{\\sp{\\sn xxx}{\\sv yyy}}...{\\shptxt 실제텍스트}}} - + Remove shape groups but preserve text in shptxt. 
+ + RTF Shape structure: + {\\shp{\\*\\shpinst...{\\sp{\\sn xxx}{\\sv yyy}}...{\\shptxt actual_text}}} + Args: - content: RTF 콘텐츠 - + content: RTF content + Returns: - Shape 그룹이 정리된 콘텐츠 + Content with shape groups cleaned """ result = [] i = 0 - + while i < len(content): - # \shp 시작 감지 if content[i:i+5] == '{\\shp' or content[i:i+10] == '{\\*\\shpinst': - # Shape 그룹 시작 - # shptxt 내용만 추출하고 나머지는 건너뛰기 depth = 1 - start = i i += 1 shptxt_content = [] in_shptxt = False shptxt_depth = 0 - + while i < len(content) and depth > 0: if content[i] == '{': - # \shptxt 시작 확인 if content[i:i+8] == '{\\shptxt': in_shptxt = True shptxt_depth = depth + 1 - i += 8 # '{\\shptxt' 건너뛰기 + i += 8 continue depth += 1 elif content[i] == '}': @@ -342,78 +309,63 @@ def remove_shape_groups(content: str) -> str: elif in_shptxt: shptxt_content.append(content[i]) i += 1 - - # shptxt 내용이 있으면 추가 + if shptxt_content: - shptxt_text = ''.join(shptxt_content) - result.append(shptxt_text) + result.append(''.join(shptxt_content)) else: result.append(content[i]) i += 1 - + return ''.join(result) def remove_shape_property_groups(content: str) -> str: """ - Shape 속성 그룹 {\\sp{\\sn xxx}{\\sv yyy}}를 제거합니다. - + Remove shape property groups {\\sp{\\sn xxx}{\\sv yyy}}. + Args: - content: RTF 콘텐츠 - + content: RTF content + Returns: - Shape 속성이 제거된 콘텐츠 + Content with shape properties removed """ - # {\\sp{\\sn ...}{\\sv ...}} 패턴 제거 content = re.sub(r'\{\\sp\{\\sn\s*[^}]*\}\{\\sv\s*[^}]*\}\}', '', content) - - # 개별 {\\sp ...} 패턴도 제거 content = re.sub(r'\{\\sp\s*[^}]*\}', '', content) - - # {\\sn ...} 패턴 제거 content = re.sub(r'\{\\sn\s*[^}]*\}', '', content) - - # {\\sv ...} 패턴 제거 content = re.sub(r'\{\\sv\s*[^}]*\}', '', content) - return content def remove_shprslt_blocks(content: str) -> str: - r""" - \shprslt{...} 블록을 제거합니다. - - Word는 Shape (도형/테이블)를 \shp 블록으로 저장하고, - 이전 버전 호환성을 위해 \shprslt 블록에 동일한 내용을 중복 저장합니다. - + """ + Remove \\shprslt{...} blocks. 
+ + Word saves Shape (drawing/table) in \\shp block and duplicates + the same content in \\shprslt block for backward compatibility. + Args: - content: RTF 콘텐츠 - + content: RTF content + Returns: - \shprslt 블록이 제거된 콘텐츠 + Content with \\shprslt blocks removed """ result = [] i = 0 pattern = '\\shprslt' - + while i < len(content): - # \shprslt 찾기 idx = content.find(pattern, i) if idx == -1: result.append(content[i:]) break - - # \shprslt 전까지 추가 + result.append(content[i:idx]) - - # \shprslt{ 다음의 중괄호 블록 건너뛰기 + brace_start = content.find('{', idx) if brace_start == -1: - # 중괄호가 없으면 \shprslt만 건너뛰기 i = idx + len(pattern) continue - - # 매칭되는 닫는 중괄호 찾기 + depth = 1 j = brace_start + 1 while j < len(content) and depth > 0: @@ -422,7 +374,16 @@ def remove_shprslt_blocks(content: str) -> str: elif content[j] == '}': depth -= 1 j += 1 - + i = j - + return ''.join(result) + + +__all__ = [ + 'clean_rtf_text', + 'remove_destination_groups', + 'remove_shape_groups', + 'remove_shape_property_groups', + 'remove_shprslt_blocks', +] diff --git a/contextifier/core/processor/text_handler.py b/contextifier/core/processor/text_handler.py index 28e2e27..0393c0a 100644 --- a/contextifier/core/processor/text_handler.py +++ b/contextifier/core/processor/text_handler.py @@ -10,6 +10,8 @@ from contextifier.core.processor.base_handler import BaseHandler from contextifier.core.functions.utils import clean_text, clean_code_text from contextifier.core.functions.chart_extractor import BaseChartExtractor, NullChartExtractor +from contextifier.core.processor.text_helper.text_image_processor import TextImageProcessor +from contextifier.core.functions.img_processor import ImageProcessor if TYPE_CHECKING: from contextifier.core.document_processor import CurrentFile @@ -22,11 +24,29 @@ class TextHandler(BaseHandler): """Text File Processing Handler Class""" - + + def _create_file_converter(self): + """Create text-specific file converter.""" + from contextifier.core.processor.text_helper.text_file_converter 
import TextFileConverter + return TextFileConverter() + + def _create_preprocessor(self): + """Create text-specific preprocessor.""" + from contextifier.core.processor.text_helper.text_preprocessor import TextPreprocessor + return TextPreprocessor() + def _create_chart_extractor(self) -> BaseChartExtractor: """Text files do not contain charts. Return NullChartExtractor.""" return NullChartExtractor(self._chart_processor) - + + def _create_metadata_extractor(self): + """Text files do not have embedded metadata. Return None (uses NullMetadataExtractor).""" + return None + + def _create_format_image_processor(self) -> ImageProcessor: + """Create text-specific image processor.""" + return TextImageProcessor() + def extract_text( self, current_file: "CurrentFile", @@ -38,7 +58,7 @@ def extract_text( ) -> str: """ Extract text from text file. - + Args: current_file: CurrentFile dict containing file info and binary data extract_metadata: Whether to extract metadata (ignored for text files) @@ -46,14 +66,19 @@ def extract_text( encodings: List of encodings to try is_code: Whether this is a code file **kwargs: Additional options - + Returns: Extracted text """ file_path = current_file.get("file_path", "unknown") file_data = current_file.get("file_data", b"") enc = encodings or DEFAULT_ENCODINGS - + + # Step 1: No file_converter for text files (direct decode) + # Step 2: Preprocess - clean_content is the TRUE SOURCE + preprocessed = self.preprocess(file_data) + file_data = preprocessed.clean_content # TRUE SOURCE + for e in enc: try: text = file_data.decode(e) @@ -65,5 +90,5 @@ def extract_text( except Exception as ex: self.logger.error(f"Error decoding file {file_path} with {e}: {ex}") continue - + raise Exception(f"Could not decode file {file_path} with any supported encoding") diff --git a/contextifier/core/processor/text_helper/__init__.py b/contextifier/core/processor/text_helper/__init__.py new file mode 100644 index 0000000..f0723e1 --- /dev/null +++ 
b/contextifier/core/processor/text_helper/__init__.py @@ -0,0 +1,17 @@ +# contextifier/core/processor/text_helper/__init__.py +""" +Text Helper 모듈 + +텍스트 파일 처리에 필요한 유틸리티를 제공합니다. + +모듈 구성: +- text_image_processor: 텍스트 파일용 이미지 프로세서 +""" + +from contextifier.core.processor.text_helper.text_image_processor import ( + TextImageProcessor, +) + +__all__ = [ + "TextImageProcessor", +] diff --git a/contextifier/core/processor/text_helper/text_file_converter.py b/contextifier/core/processor/text_helper/text_file_converter.py new file mode 100644 index 0000000..165c133 --- /dev/null +++ b/contextifier/core/processor/text_helper/text_file_converter.py @@ -0,0 +1,27 @@ +# libs/core/processor/text_helper/text_file_converter.py +""" +TextFileConverter - Text file format converter + +Converts binary text data to string with encoding detection. +""" +from typing import Optional, BinaryIO + +from contextifier.core.functions.file_converter import TextFileConverter as BaseTextFileConverter + + +class TextFileConverter(BaseTextFileConverter): + """ + Text file converter. + + Converts binary text data to decoded string. + Inherits from base TextFileConverter. + """ + + def __init__(self): + """Initialize with common text encodings.""" + super().__init__(encodings=['utf-8', 'utf-8-sig', 'cp949', 'euc-kr', 'latin-1', 'ascii']) + + def get_format_name(self) -> str: + """Return format name.""" + enc = self._detected_encoding or 'unknown' + return f"Text File ({enc})" diff --git a/contextifier/core/processor/text_helper/text_image_processor.py b/contextifier/core/processor/text_helper/text_image_processor.py new file mode 100644 index 0000000..e6498d3 --- /dev/null +++ b/contextifier/core/processor/text_helper/text_image_processor.py @@ -0,0 +1,75 @@ +# contextifier/core/processor/text_helper/text_image_processor.py +""" +Text Image Processor + +Provides text-specific image processing that inherits from ImageProcessor. 
+Text files do not contain embedded images, so this is a minimal implementation. +""" +import logging +from typing import Any, Optional + +from contextifier.core.functions.img_processor import ImageProcessor +from contextifier.core.functions.storage_backend import BaseStorageBackend + +logger = logging.getLogger("contextify.image_processor.text") + + +class TextImageProcessor(ImageProcessor): + """ + Text-specific image processor. + + Inherits from ImageProcessor and provides text-specific processing. + Text files do not contain embedded images, so this processor + provides a consistent interface without additional functionality. + + This class exists to maintain interface consistency across all handlers. + + Example: + processor = TextImageProcessor() + + # No images in text files, but interface is consistent + tag = processor.process_image(image_data) # Falls back to base implementation + """ + + def __init__( + self, + directory_path: str = "temp/images", + tag_prefix: str = "[Image:", + tag_suffix: str = "]", + storage_backend: Optional[BaseStorageBackend] = None, + ): + """ + Initialize TextImageProcessor. + + Args: + directory_path: Image save directory + tag_prefix: Tag prefix for image references + tag_suffix: Tag suffix for image references + storage_backend: Storage backend for saving images + """ + super().__init__( + directory_path=directory_path, + tag_prefix=tag_prefix, + tag_suffix=tag_suffix, + storage_backend=storage_backend, + ) + + def process_image( + self, + image_data: bytes, + **kwargs + ) -> Optional[str]: + """ + Process and save image data. + + Text files do not contain embedded images, so this method + delegates to the base implementation. 
+ + Args: + image_data: Raw image binary data + **kwargs: Additional options + + Returns: + Image tag string or None if processing failed + """ + return super().process_image(image_data, **kwargs) diff --git a/contextifier/core/processor/text_helper/text_preprocessor.py b/contextifier/core/processor/text_helper/text_preprocessor.py new file mode 100644 index 0000000..961077f --- /dev/null +++ b/contextifier/core/processor/text_helper/text_preprocessor.py @@ -0,0 +1,82 @@ +# contextifier/core/processor/text_helper/text_preprocessor.py +""" +Text Preprocessor - Process text content after conversion. + +Processing Pipeline Position: + 1. TextFileConverter.convert() → str + 2. TextPreprocessor.preprocess() → PreprocessedData (THIS STEP) + 3. TextMetadataExtractor.extract() → DocumentMetadata (if any) + 4. Content extraction + +Current Implementation: + - Pass-through (Text uses decoded string content directly) +""" +import logging +from typing import Any, Dict + +from contextifier.core.functions.preprocessor import ( + BasePreprocessor, + PreprocessedData, +) + +logger = logging.getLogger("contextify.text.preprocessor") + + +class TextPreprocessor(BasePreprocessor): + """ + Text Content Preprocessor. + + Currently a pass-through implementation as text processing + is straightforward. + """ + + def preprocess( + self, + converted_data: Any, + **kwargs + ) -> PreprocessedData: + """ + Preprocess the converted text content. 
+ + Args: + converted_data: Text string from TextFileConverter + **kwargs: Additional options + + Returns: + PreprocessedData with the content + """ + metadata: Dict[str, Any] = {} + + content = "" + encoding = kwargs.get("encoding", "utf-8") + + if isinstance(converted_data, str): + content = converted_data + metadata['char_count'] = len(content) + metadata['line_count'] = len(content.split('\n')) + elif isinstance(converted_data, bytes): + content = converted_data.decode(encoding, errors='replace') + metadata['char_count'] = len(content) + metadata['line_count'] = len(content.split('\n')) + + logger.debug("Text preprocessor: pass-through, metadata=%s", metadata) + + # clean_content is the TRUE SOURCE - contains the processed text/bytes + return PreprocessedData( + raw_content=converted_data, + clean_content=converted_data, # TRUE SOURCE - bytes or str + encoding=encoding, + extracted_resources={}, + metadata=metadata, + ) + + def get_format_name(self) -> str: + """Return format name.""" + return "Text Preprocessor" + + def validate(self, data: Any) -> bool: + """Validate if data is text content.""" + return isinstance(data, (str, bytes)) + + +__all__ = ['TextPreprocessor'] diff --git a/pyproject.toml b/pyproject.toml index 568e3ea..81698c0 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "contextifier" -version = "0.1.6" +version = "0.2.0" description = "Convert raw documents into AI-understandable context with intelligent text extraction, table detection, and semantic chunking" readme = "README.md" requires-python = ">=3.12" @@ -75,7 +75,6 @@ dependencies = [ "pdf2image==1.17.0", "pytesseract==0.3.13", "striprtf==0.0.29", - "matplotlib==3.10.8", "cachetools==6.2.4", ] diff --git a/uv.lock b/uv.lock index a766275..1d59561 100644 --- a/uv.lock +++ b/uv.lock @@ -373,7 +373,7 @@ wheels = [ [[package]] name = "contextifier" -version = "0.1.5" +version = "0.2.0" source = { editable = "." 
} dependencies = [ { name = "beautifulsoup4" }, @@ -390,7 +390,6 @@ dependencies = [ { name = "langchain-text-splitters" }, { name = "langgraph" }, { name = "langsmith" }, - { name = "matplotlib" }, { name = "olefile" }, { name = "openpyxl" }, { name = "orjson" }, @@ -430,7 +429,6 @@ requires-dist = [ { name = "langchain-text-splitters", specifier = "==1.1.0" }, { name = "langgraph", specifier = "==1.0.5" }, { name = "langsmith", specifier = "==0.6.2" }, - { name = "matplotlib", specifier = "==3.10.8" }, { name = "olefile", specifier = "==0.47" }, { name = "openpyxl", specifier = "==3.1.5" }, { name = "orjson", specifier = "==3.11.5" }, @@ -454,72 +452,6 @@ requires-dist = [ { name = "xlrd", specifier = "==2.0.2" }, ] -[[package]] -name = "contourpy" -version = "1.3.3" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "numpy" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/58/01/1253e6698a07380cd31a736d248a3f2a50a7c88779a1813da27503cadc2a/contourpy-1.3.3.tar.gz", hash = "sha256:083e12155b210502d0bca491432bb04d56dc3432f95a979b429f2848c3dbe880", size = 13466174, upload-time = "2025-07-26T12:03:12.549Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/be/45/adfee365d9ea3d853550b2e735f9d66366701c65db7855cd07621732ccfc/contourpy-1.3.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b08a32ea2f8e42cf1d4be3169a98dd4be32bafe4f22b6c4cb4ba810fa9e5d2cb", size = 293419, upload-time = "2025-07-26T12:01:21.16Z" }, - { url = "https://files.pythonhosted.org/packages/53/3e/405b59cfa13021a56bba395a6b3aca8cec012b45bf177b0eaf7a202cde2c/contourpy-1.3.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:556dba8fb6f5d8742f2923fe9457dbdd51e1049c4a43fd3986a0b14a1d815fc6", size = 273979, upload-time = "2025-07-26T12:01:22.448Z" }, - { url = "https://files.pythonhosted.org/packages/d4/1c/a12359b9b2ca3a845e8f7f9ac08bdf776114eb931392fcad91743e2ea17b/contourpy-1.3.3-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", 
hash = "sha256:92d9abc807cf7d0e047b95ca5d957cf4792fcd04e920ca70d48add15c1a90ea7", size = 332653, upload-time = "2025-07-26T12:01:24.155Z" }, - { url = "https://files.pythonhosted.org/packages/63/12/897aeebfb475b7748ea67b61e045accdfcf0d971f8a588b67108ed7f5512/contourpy-1.3.3-cp312-cp312-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:b2e8faa0ed68cb29af51edd8e24798bb661eac3bd9f65420c1887b6ca89987c8", size = 379536, upload-time = "2025-07-26T12:01:25.91Z" }, - { url = "https://files.pythonhosted.org/packages/43/8a/a8c584b82deb248930ce069e71576fc09bd7174bbd35183b7943fb1064fd/contourpy-1.3.3-cp312-cp312-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:626d60935cf668e70a5ce6ff184fd713e9683fb458898e4249b63be9e28286ea", size = 384397, upload-time = "2025-07-26T12:01:27.152Z" }, - { url = "https://files.pythonhosted.org/packages/cc/8f/ec6289987824b29529d0dfda0d74a07cec60e54b9c92f3c9da4c0ac732de/contourpy-1.3.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4d00e655fcef08aba35ec9610536bfe90267d7ab5ba944f7032549c55a146da1", size = 362601, upload-time = "2025-07-26T12:01:28.808Z" }, - { url = "https://files.pythonhosted.org/packages/05/0a/a3fe3be3ee2dceb3e615ebb4df97ae6f3828aa915d3e10549ce016302bd1/contourpy-1.3.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:451e71b5a7d597379ef572de31eeb909a87246974d960049a9848c3bc6c41bf7", size = 1331288, upload-time = "2025-07-26T12:01:31.198Z" }, - { url = "https://files.pythonhosted.org/packages/33/1d/acad9bd4e97f13f3e2b18a3977fe1b4a37ecf3d38d815333980c6c72e963/contourpy-1.3.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:459c1f020cd59fcfe6650180678a9993932d80d44ccde1fa1868977438f0b411", size = 1403386, upload-time = "2025-07-26T12:01:33.947Z" }, - { url = "https://files.pythonhosted.org/packages/cf/8f/5847f44a7fddf859704217a99a23a4f6417b10e5ab1256a179264561540e/contourpy-1.3.3-cp312-cp312-win32.whl", hash = 
"sha256:023b44101dfe49d7d53932be418477dba359649246075c996866106da069af69", size = 185018, upload-time = "2025-07-26T12:01:35.64Z" }, - { url = "https://files.pythonhosted.org/packages/19/e8/6026ed58a64563186a9ee3f29f41261fd1828f527dd93d33b60feca63352/contourpy-1.3.3-cp312-cp312-win_amd64.whl", hash = "sha256:8153b8bfc11e1e4d75bcb0bff1db232f9e10b274e0929de9d608027e0d34ff8b", size = 226567, upload-time = "2025-07-26T12:01:36.804Z" }, - { url = "https://files.pythonhosted.org/packages/d1/e2/f05240d2c39a1ed228d8328a78b6f44cd695f7ef47beb3e684cf93604f86/contourpy-1.3.3-cp312-cp312-win_arm64.whl", hash = "sha256:07ce5ed73ecdc4a03ffe3e1b3e3c1166db35ae7584be76f65dbbe28a7791b0cc", size = 193655, upload-time = "2025-07-26T12:01:37.999Z" }, - { url = "https://files.pythonhosted.org/packages/68/35/0167aad910bbdb9599272bd96d01a9ec6852f36b9455cf2ca67bd4cc2d23/contourpy-1.3.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:177fb367556747a686509d6fef71d221a4b198a3905fe824430e5ea0fda54eb5", size = 293257, upload-time = "2025-07-26T12:01:39.367Z" }, - { url = "https://files.pythonhosted.org/packages/96/e4/7adcd9c8362745b2210728f209bfbcf7d91ba868a2c5f40d8b58f54c509b/contourpy-1.3.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:d002b6f00d73d69333dac9d0b8d5e84d9724ff9ef044fd63c5986e62b7c9e1b1", size = 274034, upload-time = "2025-07-26T12:01:40.645Z" }, - { url = "https://files.pythonhosted.org/packages/73/23/90e31ceeed1de63058a02cb04b12f2de4b40e3bef5e082a7c18d9c8ae281/contourpy-1.3.3-cp313-cp313-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:348ac1f5d4f1d66d3322420f01d42e43122f43616e0f194fc1c9f5d830c5b286", size = 334672, upload-time = "2025-07-26T12:01:41.942Z" }, - { url = "https://files.pythonhosted.org/packages/ed/93/b43d8acbe67392e659e1d984700e79eb67e2acb2bd7f62012b583a7f1b55/contourpy-1.3.3-cp313-cp313-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:655456777ff65c2c548b7c454af9c6f33f16c8884f11083244b5819cc214f1b5", size = 381234, 
upload-time = "2025-07-26T12:01:43.499Z" }, - { url = "https://files.pythonhosted.org/packages/46/3b/bec82a3ea06f66711520f75a40c8fc0b113b2a75edb36aa633eb11c4f50f/contourpy-1.3.3-cp313-cp313-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:644a6853d15b2512d67881586bd03f462c7ab755db95f16f14d7e238f2852c67", size = 385169, upload-time = "2025-07-26T12:01:45.219Z" }, - { url = "https://files.pythonhosted.org/packages/4b/32/e0f13a1c5b0f8572d0ec6ae2f6c677b7991fafd95da523159c19eff0696a/contourpy-1.3.3-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4debd64f124ca62069f313a9cb86656ff087786016d76927ae2cf37846b006c9", size = 362859, upload-time = "2025-07-26T12:01:46.519Z" }, - { url = "https://files.pythonhosted.org/packages/33/71/e2a7945b7de4e58af42d708a219f3b2f4cff7386e6b6ab0a0fa0033c49a9/contourpy-1.3.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:a15459b0f4615b00bbd1e91f1b9e19b7e63aea7483d03d804186f278c0af2659", size = 1332062, upload-time = "2025-07-26T12:01:48.964Z" }, - { url = "https://files.pythonhosted.org/packages/12/fc/4e87ac754220ccc0e807284f88e943d6d43b43843614f0a8afa469801db0/contourpy-1.3.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:ca0fdcd73925568ca027e0b17ab07aad764be4706d0a925b89227e447d9737b7", size = 1403932, upload-time = "2025-07-26T12:01:51.979Z" }, - { url = "https://files.pythonhosted.org/packages/a6/2e/adc197a37443f934594112222ac1aa7dc9a98faf9c3842884df9a9d8751d/contourpy-1.3.3-cp313-cp313-win32.whl", hash = "sha256:b20c7c9a3bf701366556e1b1984ed2d0cedf999903c51311417cf5f591d8c78d", size = 185024, upload-time = "2025-07-26T12:01:53.245Z" }, - { url = "https://files.pythonhosted.org/packages/18/0b/0098c214843213759692cc638fce7de5c289200a830e5035d1791d7a2338/contourpy-1.3.3-cp313-cp313-win_amd64.whl", hash = "sha256:1cadd8b8969f060ba45ed7c1b714fe69185812ab43bd6b86a9123fe8f99c3263", size = 226578, upload-time = "2025-07-26T12:01:54.422Z" }, - { url = 
"https://files.pythonhosted.org/packages/8a/9a/2f6024a0c5995243cd63afdeb3651c984f0d2bc727fd98066d40e141ad73/contourpy-1.3.3-cp313-cp313-win_arm64.whl", hash = "sha256:fd914713266421b7536de2bfa8181aa8c699432b6763a0ea64195ebe28bff6a9", size = 193524, upload-time = "2025-07-26T12:01:55.73Z" }, - { url = "https://files.pythonhosted.org/packages/c0/b3/f8a1a86bd3298513f500e5b1f5fd92b69896449f6cab6a146a5d52715479/contourpy-1.3.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:88df9880d507169449d434c293467418b9f6cbe82edd19284aa0409e7fdb933d", size = 306730, upload-time = "2025-07-26T12:01:57.051Z" }, - { url = "https://files.pythonhosted.org/packages/3f/11/4780db94ae62fc0c2053909b65dc3246bd7cecfc4f8a20d957ad43aa4ad8/contourpy-1.3.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:d06bb1f751ba5d417047db62bca3c8fde202b8c11fb50742ab3ab962c81e8216", size = 287897, upload-time = "2025-07-26T12:01:58.663Z" }, - { url = "https://files.pythonhosted.org/packages/ae/15/e59f5f3ffdd6f3d4daa3e47114c53daabcb18574a26c21f03dc9e4e42ff0/contourpy-1.3.3-cp313-cp313t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e4e6b05a45525357e382909a4c1600444e2a45b4795163d3b22669285591c1ae", size = 326751, upload-time = "2025-07-26T12:02:00.343Z" }, - { url = "https://files.pythonhosted.org/packages/0f/81/03b45cfad088e4770b1dcf72ea78d3802d04200009fb364d18a493857210/contourpy-1.3.3-cp313-cp313t-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:ab3074b48c4e2cf1a960e6bbeb7f04566bf36b1861d5c9d4d8ac04b82e38ba20", size = 375486, upload-time = "2025-07-26T12:02:02.128Z" }, - { url = "https://files.pythonhosted.org/packages/0c/ba/49923366492ffbdd4486e970d421b289a670ae8cf539c1ea9a09822b371a/contourpy-1.3.3-cp313-cp313t-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:6c3d53c796f8647d6deb1abe867daeb66dcc8a97e8455efa729516b997b8ed99", size = 388106, upload-time = "2025-07-26T12:02:03.615Z" }, - { url = 
"https://files.pythonhosted.org/packages/9f/52/5b00ea89525f8f143651f9f03a0df371d3cbd2fccd21ca9b768c7a6500c2/contourpy-1.3.3-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:50ed930df7289ff2a8d7afeb9603f8289e5704755c7e5c3bbd929c90c817164b", size = 352548, upload-time = "2025-07-26T12:02:05.165Z" }, - { url = "https://files.pythonhosted.org/packages/32/1d/a209ec1a3a3452d490f6b14dd92e72280c99ae3d1e73da74f8277d4ee08f/contourpy-1.3.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:4feffb6537d64b84877da813a5c30f1422ea5739566abf0bd18065ac040e120a", size = 1322297, upload-time = "2025-07-26T12:02:07.379Z" }, - { url = "https://files.pythonhosted.org/packages/bc/9e/46f0e8ebdd884ca0e8877e46a3f4e633f6c9c8c4f3f6e72be3fe075994aa/contourpy-1.3.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:2b7e9480ffe2b0cd2e787e4df64270e3a0440d9db8dc823312e2c940c167df7e", size = 1391023, upload-time = "2025-07-26T12:02:10.171Z" }, - { url = "https://files.pythonhosted.org/packages/b9/70/f308384a3ae9cd2209e0849f33c913f658d3326900d0ff5d378d6a1422d2/contourpy-1.3.3-cp313-cp313t-win32.whl", hash = "sha256:283edd842a01e3dcd435b1c5116798d661378d83d36d337b8dde1d16a5fc9ba3", size = 196157, upload-time = "2025-07-26T12:02:11.488Z" }, - { url = "https://files.pythonhosted.org/packages/b2/dd/880f890a6663b84d9e34a6f88cded89d78f0091e0045a284427cb6b18521/contourpy-1.3.3-cp313-cp313t-win_amd64.whl", hash = "sha256:87acf5963fc2b34825e5b6b048f40e3635dd547f590b04d2ab317c2619ef7ae8", size = 240570, upload-time = "2025-07-26T12:02:12.754Z" }, - { url = "https://files.pythonhosted.org/packages/80/99/2adc7d8ffead633234817ef8e9a87115c8a11927a94478f6bb3d3f4d4f7d/contourpy-1.3.3-cp313-cp313t-win_arm64.whl", hash = "sha256:3c30273eb2a55024ff31ba7d052dde990d7d8e5450f4bbb6e913558b3d6c2301", size = 199713, upload-time = "2025-07-26T12:02:14.4Z" }, - { url = 
"https://files.pythonhosted.org/packages/72/8b/4546f3ab60f78c514ffb7d01a0bd743f90de36f0019d1be84d0a708a580a/contourpy-1.3.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:fde6c716d51c04b1c25d0b90364d0be954624a0ee9d60e23e850e8d48353d07a", size = 292189, upload-time = "2025-07-26T12:02:16.095Z" }, - { url = "https://files.pythonhosted.org/packages/fd/e1/3542a9cb596cadd76fcef413f19c79216e002623158befe6daa03dbfa88c/contourpy-1.3.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:cbedb772ed74ff5be440fa8eee9bd49f64f6e3fc09436d9c7d8f1c287b121d77", size = 273251, upload-time = "2025-07-26T12:02:17.524Z" }, - { url = "https://files.pythonhosted.org/packages/b1/71/f93e1e9471d189f79d0ce2497007731c1e6bf9ef6d1d61b911430c3db4e5/contourpy-1.3.3-cp314-cp314-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:22e9b1bd7a9b1d652cd77388465dc358dafcd2e217d35552424aa4f996f524f5", size = 335810, upload-time = "2025-07-26T12:02:18.9Z" }, - { url = "https://files.pythonhosted.org/packages/91/f9/e35f4c1c93f9275d4e38681a80506b5510e9327350c51f8d4a5a724d178c/contourpy-1.3.3-cp314-cp314-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a22738912262aa3e254e4f3cb079a95a67132fc5a063890e224393596902f5a4", size = 382871, upload-time = "2025-07-26T12:02:20.418Z" }, - { url = "https://files.pythonhosted.org/packages/b5/71/47b512f936f66a0a900d81c396a7e60d73419868fba959c61efed7a8ab46/contourpy-1.3.3-cp314-cp314-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:afe5a512f31ee6bd7d0dda52ec9864c984ca3d66664444f2d72e0dc4eb832e36", size = 386264, upload-time = "2025-07-26T12:02:21.916Z" }, - { url = "https://files.pythonhosted.org/packages/04/5f/9ff93450ba96b09c7c2b3f81c94de31c89f92292f1380261bd7195bea4ea/contourpy-1.3.3-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f64836de09927cba6f79dcd00fdd7d5329f3fccc633468507079c829ca4db4e3", size = 363819, upload-time = "2025-07-26T12:02:23.759Z" }, - { url = 
"https://files.pythonhosted.org/packages/3e/a6/0b185d4cc480ee494945cde102cb0149ae830b5fa17bf855b95f2e70ad13/contourpy-1.3.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:1fd43c3be4c8e5fd6e4f2baeae35ae18176cf2e5cced681cca908addf1cdd53b", size = 1333650, upload-time = "2025-07-26T12:02:26.181Z" }, - { url = "https://files.pythonhosted.org/packages/43/d7/afdc95580ca56f30fbcd3060250f66cedbde69b4547028863abd8aa3b47e/contourpy-1.3.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:6afc576f7b33cf00996e5c1102dc2a8f7cc89e39c0b55df93a0b78c1bd992b36", size = 1404833, upload-time = "2025-07-26T12:02:28.782Z" }, - { url = "https://files.pythonhosted.org/packages/e2/e2/366af18a6d386f41132a48f033cbd2102e9b0cf6345d35ff0826cd984566/contourpy-1.3.3-cp314-cp314-win32.whl", hash = "sha256:66c8a43a4f7b8df8b71ee1840e4211a3c8d93b214b213f590e18a1beca458f7d", size = 189692, upload-time = "2025-07-26T12:02:30.128Z" }, - { url = "https://files.pythonhosted.org/packages/7d/c2/57f54b03d0f22d4044b8afb9ca0e184f8b1afd57b4f735c2fa70883dc601/contourpy-1.3.3-cp314-cp314-win_amd64.whl", hash = "sha256:cf9022ef053f2694e31d630feaacb21ea24224be1c3ad0520b13d844274614fd", size = 232424, upload-time = "2025-07-26T12:02:31.395Z" }, - { url = "https://files.pythonhosted.org/packages/18/79/a9416650df9b525737ab521aa181ccc42d56016d2123ddcb7b58e926a42c/contourpy-1.3.3-cp314-cp314-win_arm64.whl", hash = "sha256:95b181891b4c71de4bb404c6621e7e2390745f887f2a026b2d99e92c17892339", size = 198300, upload-time = "2025-07-26T12:02:32.956Z" }, - { url = "https://files.pythonhosted.org/packages/1f/42/38c159a7d0f2b7b9c04c64ab317042bb6952b713ba875c1681529a2932fe/contourpy-1.3.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:33c82d0138c0a062380332c861387650c82e4cf1747aaa6938b9b6516762e772", size = 306769, upload-time = "2025-07-26T12:02:34.2Z" }, - { url = 
"https://files.pythonhosted.org/packages/c3/6c/26a8205f24bca10974e77460de68d3d7c63e282e23782f1239f226fcae6f/contourpy-1.3.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:ea37e7b45949df430fe649e5de8351c423430046a2af20b1c1961cae3afcda77", size = 287892, upload-time = "2025-07-26T12:02:35.807Z" }, - { url = "https://files.pythonhosted.org/packages/66/06/8a475c8ab718ebfd7925661747dbb3c3ee9c82ac834ccb3570be49d129f4/contourpy-1.3.3-cp314-cp314t-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d304906ecc71672e9c89e87c4675dc5c2645e1f4269a5063b99b0bb29f232d13", size = 326748, upload-time = "2025-07-26T12:02:37.193Z" }, - { url = "https://files.pythonhosted.org/packages/b4/a3/c5ca9f010a44c223f098fccd8b158bb1cb287378a31ac141f04730dc49be/contourpy-1.3.3-cp314-cp314t-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:ca658cd1a680a5c9ea96dc61cdbae1e85c8f25849843aa799dfd3cb370ad4fbe", size = 375554, upload-time = "2025-07-26T12:02:38.894Z" }, - { url = "https://files.pythonhosted.org/packages/80/5b/68bd33ae63fac658a4145088c1e894405e07584a316738710b636c6d0333/contourpy-1.3.3-cp314-cp314t-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:ab2fd90904c503739a75b7c8c5c01160130ba67944a7b77bbf36ef8054576e7f", size = 388118, upload-time = "2025-07-26T12:02:40.642Z" }, - { url = "https://files.pythonhosted.org/packages/40/52/4c285a6435940ae25d7410a6c36bda5145839bc3f0beb20c707cda18b9d2/contourpy-1.3.3-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b7301b89040075c30e5768810bc96a8e8d78085b47d8be6e4c3f5a0b4ed478a0", size = 352555, upload-time = "2025-07-26T12:02:42.25Z" }, - { url = "https://files.pythonhosted.org/packages/24/ee/3e81e1dd174f5c7fefe50e85d0892de05ca4e26ef1c9a59c2a57e43b865a/contourpy-1.3.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:2a2a8b627d5cc6b7c41a4beff6c5ad5eb848c88255fda4a8745f7e901b32d8e4", size = 1322295, upload-time = "2025-07-26T12:02:44.668Z" }, - { url = 
"https://files.pythonhosted.org/packages/3c/b2/6d913d4d04e14379de429057cd169e5e00f6c2af3bb13e1710bcbdb5da12/contourpy-1.3.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:fd6ec6be509c787f1caf6b247f0b1ca598bef13f4ddeaa126b7658215529ba0f", size = 1391027, upload-time = "2025-07-26T12:02:47.09Z" }, - { url = "https://files.pythonhosted.org/packages/93/8a/68a4ec5c55a2971213d29a9374913f7e9f18581945a7a31d1a39b5d2dfe5/contourpy-1.3.3-cp314-cp314t-win32.whl", hash = "sha256:e74a9a0f5e3fff48fb5a7f2fd2b9b70a3fe014a67522f79b7cca4c0c7e43c9ae", size = 202428, upload-time = "2025-07-26T12:02:48.691Z" }, - { url = "https://files.pythonhosted.org/packages/fa/96/fd9f641ffedc4fa3ace923af73b9d07e869496c9cc7a459103e6e978992f/contourpy-1.3.3-cp314-cp314t-win_amd64.whl", hash = "sha256:13b68d6a62db8eafaebb8039218921399baf6e47bf85006fd8529f2a08ef33fc", size = 250331, upload-time = "2025-07-26T12:02:50.137Z" }, - { url = "https://files.pythonhosted.org/packages/ae/8c/469afb6465b853afff216f9528ffda78a915ff880ed58813ba4faf4ba0b6/contourpy-1.3.3-cp314-cp314t-win_arm64.whl", hash = "sha256:b7448cb5a725bb1e35ce88771b86fba35ef418952474492cf7c764059933ff8b", size = 203831, upload-time = "2025-07-26T12:02:51.449Z" }, -] - [[package]] name = "cryptography" version = "46.0.3" @@ -576,15 +508,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e8/cb/2da4cc83f5edb9c3257d09e1e7ab7b23f049c7962cae8d842bbef0a9cec9/cryptography-46.0.3-cp38-abi3-win_arm64.whl", hash = "sha256:d89c3468de4cdc4f08a57e214384d0471911a3830fcdaf7a8cc587e42a866372", size = 2918740, upload-time = "2025-10-15T23:18:12.277Z" }, ] -[[package]] -name = "cycler" -version = "0.12.1" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/a9/95/a3dbbb5028f35eafb79008e7522a75244477d2838f38cbb722248dabc2a8/cycler-0.12.1.tar.gz", hash = "sha256:88bb128f02ba341da8ef447245a9e138fae777f6a23943da4540077d3601eb1c", size = 7615, upload-time = "2023-10-07T05:32:18.335Z" } 
-wheels = [ - { url = "https://files.pythonhosted.org/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl", hash = "sha256:85cef7cff222d8644161529808465972e51340599459b8ac3ccbac5a854e0d30", size = 8321, upload-time = "2023-10-07T05:32:16.783Z" }, -] - [[package]] name = "dataclasses-json" version = "0.6.7" @@ -648,47 +571,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/18/79/1b8fa1bb3568781e84c9200f951c735f3f157429f44be0495da55894d620/filetype-1.2.0-py2.py3-none-any.whl", hash = "sha256:7ce71b6880181241cf7ac8697a2f1eb6a8bd9b429f7ad6d27b8db9ba5f1c2d25", size = 19970, upload-time = "2022-11-02T17:34:01.425Z" }, ] -[[package]] -name = "fonttools" -version = "4.61.1" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/ec/ca/cf17b88a8df95691275a3d77dc0a5ad9907f328ae53acbe6795da1b2f5ed/fonttools-4.61.1.tar.gz", hash = "sha256:6675329885c44657f826ef01d9e4fb33b9158e9d93c537d84ad8399539bc6f69", size = 3565756, upload-time = "2025-12-12T17:31:24.246Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/6f/16/7decaa24a1bd3a70c607b2e29f0adc6159f36a7e40eaba59846414765fd4/fonttools-4.61.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:f3cb4a569029b9f291f88aafc927dd53683757e640081ca8c412781ea144565e", size = 2851593, upload-time = "2025-12-12T17:30:04.225Z" }, - { url = "https://files.pythonhosted.org/packages/94/98/3c4cb97c64713a8cf499b3245c3bf9a2b8fd16a3e375feff2aed78f96259/fonttools-4.61.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:41a7170d042e8c0024703ed13b71893519a1a6d6e18e933e3ec7507a2c26a4b2", size = 2400231, upload-time = "2025-12-12T17:30:06.47Z" }, - { url = "https://files.pythonhosted.org/packages/b7/37/82dbef0f6342eb01f54bca073ac1498433d6ce71e50c3c3282b655733b31/fonttools-4.61.1-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = 
"sha256:10d88e55330e092940584774ee5e8a6971b01fc2f4d3466a1d6c158230880796", size = 4954103, upload-time = "2025-12-12T17:30:08.432Z" }, - { url = "https://files.pythonhosted.org/packages/6c/44/f3aeac0fa98e7ad527f479e161aca6c3a1e47bb6996b053d45226fe37bf2/fonttools-4.61.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:15acc09befd16a0fb8a8f62bc147e1a82817542d72184acca9ce6e0aeda9fa6d", size = 5004295, upload-time = "2025-12-12T17:30:10.56Z" }, - { url = "https://files.pythonhosted.org/packages/14/e8/7424ced75473983b964d09f6747fa09f054a6d656f60e9ac9324cf40c743/fonttools-4.61.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e6bcdf33aec38d16508ce61fd81838f24c83c90a1d1b8c68982857038673d6b8", size = 4944109, upload-time = "2025-12-12T17:30:12.874Z" }, - { url = "https://files.pythonhosted.org/packages/c8/8b/6391b257fa3d0b553d73e778f953a2f0154292a7a7a085e2374b111e5410/fonttools-4.61.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:5fade934607a523614726119164ff621e8c30e8fa1ffffbbd358662056ba69f0", size = 5093598, upload-time = "2025-12-12T17:30:15.79Z" }, - { url = "https://files.pythonhosted.org/packages/d9/71/fd2ea96cdc512d92da5678a1c98c267ddd4d8c5130b76d0f7a80f9a9fde8/fonttools-4.61.1-cp312-cp312-win32.whl", hash = "sha256:75da8f28eff26defba42c52986de97b22106cb8f26515b7c22443ebc9c2d3261", size = 2269060, upload-time = "2025-12-12T17:30:18.058Z" }, - { url = "https://files.pythonhosted.org/packages/80/3b/a3e81b71aed5a688e89dfe0e2694b26b78c7d7f39a5ffd8a7d75f54a12a8/fonttools-4.61.1-cp312-cp312-win_amd64.whl", hash = "sha256:497c31ce314219888c0e2fce5ad9178ca83fe5230b01a5006726cdf3ac9f24d9", size = 2319078, upload-time = "2025-12-12T17:30:22.862Z" }, - { url = "https://files.pythonhosted.org/packages/4b/cf/00ba28b0990982530addb8dc3e9e6f2fa9cb5c20df2abdda7baa755e8fe1/fonttools-4.61.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:8c56c488ab471628ff3bfa80964372fc13504ece601e0d97a78ee74126b2045c", size 
= 2846454, upload-time = "2025-12-12T17:30:24.938Z" }, - { url = "https://files.pythonhosted.org/packages/5a/ca/468c9a8446a2103ae645d14fee3f610567b7042aba85031c1c65e3ef7471/fonttools-4.61.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:dc492779501fa723b04d0ab1f5be046797fee17d27700476edc7ee9ae535a61e", size = 2398191, upload-time = "2025-12-12T17:30:27.343Z" }, - { url = "https://files.pythonhosted.org/packages/a3/4b/d67eedaed19def5967fade3297fed8161b25ba94699efc124b14fb68cdbc/fonttools-4.61.1-cp313-cp313-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:64102ca87e84261419c3747a0d20f396eb024bdbeb04c2bfb37e2891f5fadcb5", size = 4928410, upload-time = "2025-12-12T17:30:29.771Z" }, - { url = "https://files.pythonhosted.org/packages/b0/8d/6fb3494dfe61a46258cd93d979cf4725ded4eb46c2a4ca35e4490d84daea/fonttools-4.61.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4c1b526c8d3f615a7b1867f38a9410849c8f4aef078535742198e942fba0e9bd", size = 4984460, upload-time = "2025-12-12T17:30:32.073Z" }, - { url = "https://files.pythonhosted.org/packages/f7/f1/a47f1d30b3dc00d75e7af762652d4cbc3dff5c2697a0dbd5203c81afd9c3/fonttools-4.61.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:41ed4b5ec103bd306bb68f81dc166e77409e5209443e5773cb4ed837bcc9b0d3", size = 4925800, upload-time = "2025-12-12T17:30:34.339Z" }, - { url = "https://files.pythonhosted.org/packages/a7/01/e6ae64a0981076e8a66906fab01539799546181e32a37a0257b77e4aa88b/fonttools-4.61.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:b501c862d4901792adaec7c25b1ecc749e2662543f68bb194c42ba18d6eec98d", size = 5067859, upload-time = "2025-12-12T17:30:36.593Z" }, - { url = "https://files.pythonhosted.org/packages/73/aa/28e40b8d6809a9b5075350a86779163f074d2b617c15d22343fce81918db/fonttools-4.61.1-cp313-cp313-win32.whl", hash = "sha256:4d7092bb38c53bbc78e9255a59158b150bcdc115a1e3b3ce0b5f267dc35dd63c", size = 2267821, 
upload-time = "2025-12-12T17:30:38.478Z" }, - { url = "https://files.pythonhosted.org/packages/1a/59/453c06d1d83dc0951b69ef692d6b9f1846680342927df54e9a1ca91c6f90/fonttools-4.61.1-cp313-cp313-win_amd64.whl", hash = "sha256:21e7c8d76f62ab13c9472ccf74515ca5b9a761d1bde3265152a6dc58700d895b", size = 2318169, upload-time = "2025-12-12T17:30:40.951Z" }, - { url = "https://files.pythonhosted.org/packages/32/8f/4e7bf82c0cbb738d3c2206c920ca34ca74ef9dabde779030145d28665104/fonttools-4.61.1-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:fff4f534200a04b4a36e7ae3cb74493afe807b517a09e99cb4faa89a34ed6ecd", size = 2846094, upload-time = "2025-12-12T17:30:43.511Z" }, - { url = "https://files.pythonhosted.org/packages/71/09/d44e45d0a4f3a651f23a1e9d42de43bc643cce2971b19e784cc67d823676/fonttools-4.61.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:d9203500f7c63545b4ce3799319fe4d9feb1a1b89b28d3cb5abd11b9dd64147e", size = 2396589, upload-time = "2025-12-12T17:30:45.681Z" }, - { url = "https://files.pythonhosted.org/packages/89/18/58c64cafcf8eb677a99ef593121f719e6dcbdb7d1c594ae5a10d4997ca8a/fonttools-4.61.1-cp314-cp314-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:fa646ecec9528bef693415c79a86e733c70a4965dd938e9a226b0fc64c9d2e6c", size = 4877892, upload-time = "2025-12-12T17:30:47.709Z" }, - { url = "https://files.pythonhosted.org/packages/8a/ec/9e6b38c7ba1e09eb51db849d5450f4c05b7e78481f662c3b79dbde6f3d04/fonttools-4.61.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:11f35ad7805edba3aac1a3710d104592df59f4b957e30108ae0ba6c10b11dd75", size = 4972884, upload-time = "2025-12-12T17:30:49.656Z" }, - { url = "https://files.pythonhosted.org/packages/5e/87/b5339da8e0256734ba0dbbf5b6cdebb1dd79b01dc8c270989b7bcd465541/fonttools-4.61.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:b931ae8f62db78861b0ff1ac017851764602288575d65b8e8ff1963fed419063", size = 4924405, 
upload-time = "2025-12-12T17:30:51.735Z" }, - { url = "https://files.pythonhosted.org/packages/0b/47/e3409f1e1e69c073a3a6fd8cb886eb18c0bae0ee13db2c8d5e7f8495e8b7/fonttools-4.61.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:b148b56f5de675ee16d45e769e69f87623a4944f7443850bf9a9376e628a89d2", size = 5035553, upload-time = "2025-12-12T17:30:54.823Z" }, - { url = "https://files.pythonhosted.org/packages/bf/b6/1f6600161b1073a984294c6c031e1a56ebf95b6164249eecf30012bb2e38/fonttools-4.61.1-cp314-cp314-win32.whl", hash = "sha256:9b666a475a65f4e839d3d10473fad6d47e0a9db14a2f4a224029c5bfde58ad2c", size = 2271915, upload-time = "2025-12-12T17:30:57.913Z" }, - { url = "https://files.pythonhosted.org/packages/52/7b/91e7b01e37cc8eb0e1f770d08305b3655e4f002fc160fb82b3390eabacf5/fonttools-4.61.1-cp314-cp314-win_amd64.whl", hash = "sha256:4f5686e1fe5fce75d82d93c47a438a25bf0d1319d2843a926f741140b2b16e0c", size = 2323487, upload-time = "2025-12-12T17:30:59.804Z" }, - { url = "https://files.pythonhosted.org/packages/39/5c/908ad78e46c61c3e3ed70c3b58ff82ab48437faf84ec84f109592cabbd9f/fonttools-4.61.1-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:e76ce097e3c57c4bcb67c5aa24a0ecdbd9f74ea9219997a707a4061fbe2707aa", size = 2929571, upload-time = "2025-12-12T17:31:02.574Z" }, - { url = "https://files.pythonhosted.org/packages/bd/41/975804132c6dea64cdbfbaa59f3518a21c137a10cccf962805b301ac6ab2/fonttools-4.61.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:9cfef3ab326780c04d6646f68d4b4742aae222e8b8ea1d627c74e38afcbc9d91", size = 2435317, upload-time = "2025-12-12T17:31:04.974Z" }, - { url = "https://files.pythonhosted.org/packages/b0/5a/aef2a0a8daf1ebaae4cfd83f84186d4a72ee08fd6a8451289fcd03ffa8a4/fonttools-4.61.1-cp314-cp314t-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:a75c301f96db737e1c5ed5fd7d77d9c34466de16095a266509e13da09751bd19", size = 4882124, upload-time = "2025-12-12T17:31:07.456Z" }, - { url = 
"https://files.pythonhosted.org/packages/80/33/d6db3485b645b81cea538c9d1c9219d5805f0877fda18777add4671c5240/fonttools-4.61.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:91669ccac46bbc1d09e9273546181919064e8df73488ea087dcac3e2968df9ba", size = 5100391, upload-time = "2025-12-12T17:31:09.732Z" }, - { url = "https://files.pythonhosted.org/packages/6c/d6/675ba631454043c75fcf76f0ca5463eac8eb0666ea1d7badae5fea001155/fonttools-4.61.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:c33ab3ca9d3ccd581d58e989d67554e42d8d4ded94ab3ade3508455fe70e65f7", size = 4978800, upload-time = "2025-12-12T17:31:11.681Z" }, - { url = "https://files.pythonhosted.org/packages/7f/33/d3ec753d547a8d2bdaedd390d4a814e8d5b45a093d558f025c6b990b554c/fonttools-4.61.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:664c5a68ec406f6b1547946683008576ef8b38275608e1cee6c061828171c118", size = 5006426, upload-time = "2025-12-12T17:31:13.764Z" }, - { url = "https://files.pythonhosted.org/packages/b4/40/cc11f378b561a67bea850ab50063366a0d1dd3f6d0a30ce0f874b0ad5664/fonttools-4.61.1-cp314-cp314t-win32.whl", hash = "sha256:aed04cabe26f30c1647ef0e8fbb207516fd40fe9472e9439695f5c6998e60ac5", size = 2335377, upload-time = "2025-12-12T17:31:16.49Z" }, - { url = "https://files.pythonhosted.org/packages/e4/ff/c9a2b66b39f8628531ea58b320d66d951267c98c6a38684daa8f50fb02f8/fonttools-4.61.1-cp314-cp314t-win_amd64.whl", hash = "sha256:2180f14c141d2f0f3da43f3a81bc8aa4684860f6b0e6f9e165a4831f24e6a23b", size = 2400613, upload-time = "2025-12-12T17:31:18.769Z" }, - { url = "https://files.pythonhosted.org/packages/c7/4e/ce75a57ff3aebf6fc1f4e9d508b8e5810618a33d900ad6c19eb30b290b97/fonttools-4.61.1-py3-none-any.whl", hash = "sha256:17d2bf5d541add43822bcf0c43d7d847b160c9bb01d15d5007d84e2217aaa371", size = 1148996, upload-time = "2025-12-12T17:31:21.03Z" }, -] - [[package]] name = "frozenlist" version = "1.8.0" @@ -1005,78 +887,6 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/71/92/5e77f98553e9e75130c78900d000368476aed74276eb8ae8796f65f00918/jsonpointer-3.0.0-py2.py3-none-any.whl", hash = "sha256:13e088adc14fca8b6aa8177c044e12701e6ad4b28ff10e65f2267a90109c9942", size = 7595, upload-time = "2024-06-10T19:24:40.698Z" }, ] -[[package]] -name = "kiwisolver" -version = "1.4.9" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/5c/3c/85844f1b0feb11ee581ac23fe5fce65cd049a200c1446708cc1b7f922875/kiwisolver-1.4.9.tar.gz", hash = "sha256:c3b22c26c6fd6811b0ae8363b95ca8ce4ea3c202d3d0975b2914310ceb1bcc4d", size = 97564, upload-time = "2025-08-10T21:27:49.279Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/86/c9/13573a747838aeb1c76e3267620daa054f4152444d1f3d1a2324b78255b5/kiwisolver-1.4.9-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:ac5a486ac389dddcc5bef4f365b6ae3ffff2c433324fb38dd35e3fab7c957999", size = 123686, upload-time = "2025-08-10T21:26:10.034Z" }, - { url = "https://files.pythonhosted.org/packages/51/ea/2ecf727927f103ffd1739271ca19c424d0e65ea473fbaeea1c014aea93f6/kiwisolver-1.4.9-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:f2ba92255faa7309d06fe44c3a4a97efe1c8d640c2a79a5ef728b685762a6fd2", size = 66460, upload-time = "2025-08-10T21:26:11.083Z" }, - { url = "https://files.pythonhosted.org/packages/5b/5a/51f5464373ce2aeb5194508298a508b6f21d3867f499556263c64c621914/kiwisolver-1.4.9-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4a2899935e724dd1074cb568ce7ac0dce28b2cd6ab539c8e001a8578eb106d14", size = 64952, upload-time = "2025-08-10T21:26:12.058Z" }, - { url = "https://files.pythonhosted.org/packages/70/90/6d240beb0f24b74371762873e9b7f499f1e02166a2d9c5801f4dbf8fa12e/kiwisolver-1.4.9-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f6008a4919fdbc0b0097089f67a1eb55d950ed7e90ce2cc3e640abadd2757a04", size = 1474756, upload-time = "2025-08-10T21:26:13.096Z" }, - { url = 
"https://files.pythonhosted.org/packages/12/42/f36816eaf465220f683fb711efdd1bbf7a7005a2473d0e4ed421389bd26c/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:67bb8b474b4181770f926f7b7d2f8c0248cbcb78b660fdd41a47054b28d2a752", size = 1276404, upload-time = "2025-08-10T21:26:14.457Z" }, - { url = "https://files.pythonhosted.org/packages/2e/64/bc2de94800adc830c476dce44e9b40fd0809cddeef1fde9fcf0f73da301f/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2327a4a30d3ee07d2fbe2e7933e8a37c591663b96ce42a00bc67461a87d7df77", size = 1294410, upload-time = "2025-08-10T21:26:15.73Z" }, - { url = "https://files.pythonhosted.org/packages/5f/42/2dc82330a70aa8e55b6d395b11018045e58d0bb00834502bf11509f79091/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:7a08b491ec91b1d5053ac177afe5290adacf1f0f6307d771ccac5de30592d198", size = 1343631, upload-time = "2025-08-10T21:26:17.045Z" }, - { url = "https://files.pythonhosted.org/packages/22/fd/f4c67a6ed1aab149ec5a8a401c323cee7a1cbe364381bb6c9c0d564e0e20/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:d8fc5c867c22b828001b6a38d2eaeb88160bf5783c6cb4a5e440efc981ce286d", size = 2224963, upload-time = "2025-08-10T21:26:18.737Z" }, - { url = "https://files.pythonhosted.org/packages/45/aa/76720bd4cb3713314677d9ec94dcc21ced3f1baf4830adde5bb9b2430a5f/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:3b3115b2581ea35bb6d1f24a4c90af37e5d9b49dcff267eeed14c3893c5b86ab", size = 2321295, upload-time = "2025-08-10T21:26:20.11Z" }, - { url = "https://files.pythonhosted.org/packages/80/19/d3ec0d9ab711242f56ae0dc2fc5d70e298bb4a1f9dfab44c027668c673a1/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:858e4c22fb075920b96a291928cb7dea5644e94c0ee4fcd5af7e865655e4ccf2", size = 2487987, upload-time = "2025-08-10T21:26:21.49Z" }, - { url = 
"https://files.pythonhosted.org/packages/39/e9/61e4813b2c97e86b6fdbd4dd824bf72d28bcd8d4849b8084a357bc0dd64d/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ed0fecd28cc62c54b262e3736f8bb2512d8dcfdc2bcf08be5f47f96bf405b145", size = 2291817, upload-time = "2025-08-10T21:26:22.812Z" }, - { url = "https://files.pythonhosted.org/packages/a0/41/85d82b0291db7504da3c2defe35c9a8a5c9803a730f297bd823d11d5fb77/kiwisolver-1.4.9-cp312-cp312-win_amd64.whl", hash = "sha256:f68208a520c3d86ea51acf688a3e3002615a7f0238002cccc17affecc86a8a54", size = 73895, upload-time = "2025-08-10T21:26:24.37Z" }, - { url = "https://files.pythonhosted.org/packages/e2/92/5f3068cf15ee5cb624a0c7596e67e2a0bb2adee33f71c379054a491d07da/kiwisolver-1.4.9-cp312-cp312-win_arm64.whl", hash = "sha256:2c1a4f57df73965f3f14df20b80ee29e6a7930a57d2d9e8491a25f676e197c60", size = 64992, upload-time = "2025-08-10T21:26:25.732Z" }, - { url = "https://files.pythonhosted.org/packages/31/c1/c2686cda909742ab66c7388e9a1a8521a59eb89f8bcfbee28fc980d07e24/kiwisolver-1.4.9-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:a5d0432ccf1c7ab14f9949eec60c5d1f924f17c037e9f8b33352fa05799359b8", size = 123681, upload-time = "2025-08-10T21:26:26.725Z" }, - { url = "https://files.pythonhosted.org/packages/ca/f0/f44f50c9f5b1a1860261092e3bc91ecdc9acda848a8b8c6abfda4a24dd5c/kiwisolver-1.4.9-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:efb3a45b35622bb6c16dbfab491a8f5a391fe0e9d45ef32f4df85658232ca0e2", size = 66464, upload-time = "2025-08-10T21:26:27.733Z" }, - { url = "https://files.pythonhosted.org/packages/2d/7a/9d90a151f558e29c3936b8a47ac770235f436f2120aca41a6d5f3d62ae8d/kiwisolver-1.4.9-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:1a12cf6398e8a0a001a059747a1cbf24705e18fe413bc22de7b3d15c67cffe3f", size = 64961, upload-time = "2025-08-10T21:26:28.729Z" }, - { url = 
"https://files.pythonhosted.org/packages/e9/e9/f218a2cb3a9ffbe324ca29a9e399fa2d2866d7f348ec3a88df87fc248fc5/kiwisolver-1.4.9-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:b67e6efbf68e077dd71d1a6b37e43e1a99d0bff1a3d51867d45ee8908b931098", size = 1474607, upload-time = "2025-08-10T21:26:29.798Z" }, - { url = "https://files.pythonhosted.org/packages/d9/28/aac26d4c882f14de59041636292bc838db8961373825df23b8eeb807e198/kiwisolver-1.4.9-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5656aa670507437af0207645273ccdfee4f14bacd7f7c67a4306d0dcaeaf6eed", size = 1276546, upload-time = "2025-08-10T21:26:31.401Z" }, - { url = "https://files.pythonhosted.org/packages/8b/ad/8bfc1c93d4cc565e5069162f610ba2f48ff39b7de4b5b8d93f69f30c4bed/kiwisolver-1.4.9-cp313-cp313-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:bfc08add558155345129c7803b3671cf195e6a56e7a12f3dde7c57d9b417f525", size = 1294482, upload-time = "2025-08-10T21:26:32.721Z" }, - { url = "https://files.pythonhosted.org/packages/da/f1/6aca55ff798901d8ce403206d00e033191f63d82dd708a186e0ed2067e9c/kiwisolver-1.4.9-cp313-cp313-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:40092754720b174e6ccf9e845d0d8c7d8e12c3d71e7fc35f55f3813e96376f78", size = 1343720, upload-time = "2025-08-10T21:26:34.032Z" }, - { url = "https://files.pythonhosted.org/packages/d1/91/eed031876c595c81d90d0f6fc681ece250e14bf6998c3d7c419466b523b7/kiwisolver-1.4.9-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:497d05f29a1300d14e02e6441cf0f5ee81c1ff5a304b0d9fb77423974684e08b", size = 2224907, upload-time = "2025-08-10T21:26:35.824Z" }, - { url = "https://files.pythonhosted.org/packages/e9/ec/4d1925f2e49617b9cca9c34bfa11adefad49d00db038e692a559454dfb2e/kiwisolver-1.4.9-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:bdd1a81a1860476eb41ac4bc1e07b3f07259e6d55bbf739b79c8aaedcf512799", size = 2321334, upload-time = "2025-08-10T21:26:37.534Z" }, - { url = 
"https://files.pythonhosted.org/packages/43/cb/450cd4499356f68802750c6ddc18647b8ea01ffa28f50d20598e0befe6e9/kiwisolver-1.4.9-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:e6b93f13371d341afee3be9f7c5964e3fe61d5fa30f6a30eb49856935dfe4fc3", size = 2488313, upload-time = "2025-08-10T21:26:39.191Z" }, - { url = "https://files.pythonhosted.org/packages/71/67/fc76242bd99f885651128a5d4fa6083e5524694b7c88b489b1b55fdc491d/kiwisolver-1.4.9-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:d75aa530ccfaa593da12834b86a0724f58bff12706659baa9227c2ccaa06264c", size = 2291970, upload-time = "2025-08-10T21:26:40.828Z" }, - { url = "https://files.pythonhosted.org/packages/75/bd/f1a5d894000941739f2ae1b65a32892349423ad49c2e6d0771d0bad3fae4/kiwisolver-1.4.9-cp313-cp313-win_amd64.whl", hash = "sha256:dd0a578400839256df88c16abddf9ba14813ec5f21362e1fe65022e00c883d4d", size = 73894, upload-time = "2025-08-10T21:26:42.33Z" }, - { url = "https://files.pythonhosted.org/packages/95/38/dce480814d25b99a391abbddadc78f7c117c6da34be68ca8b02d5848b424/kiwisolver-1.4.9-cp313-cp313-win_arm64.whl", hash = "sha256:d4188e73af84ca82468f09cadc5ac4db578109e52acb4518d8154698d3a87ca2", size = 64995, upload-time = "2025-08-10T21:26:43.889Z" }, - { url = "https://files.pythonhosted.org/packages/e2/37/7d218ce5d92dadc5ebdd9070d903e0c7cf7edfe03f179433ac4d13ce659c/kiwisolver-1.4.9-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:5a0f2724dfd4e3b3ac5a82436a8e6fd16baa7d507117e4279b660fe8ca38a3a1", size = 126510, upload-time = "2025-08-10T21:26:44.915Z" }, - { url = "https://files.pythonhosted.org/packages/23/b0/e85a2b48233daef4b648fb657ebbb6f8367696a2d9548a00b4ee0eb67803/kiwisolver-1.4.9-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:1b11d6a633e4ed84fc0ddafd4ebfd8ea49b3f25082c04ad12b8315c11d504dc1", size = 67903, upload-time = "2025-08-10T21:26:45.934Z" }, - { url = 
"https://files.pythonhosted.org/packages/44/98/f2425bc0113ad7de24da6bb4dae1343476e95e1d738be7c04d31a5d037fd/kiwisolver-1.4.9-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61874cdb0a36016354853593cffc38e56fc9ca5aa97d2c05d3dcf6922cd55a11", size = 66402, upload-time = "2025-08-10T21:26:47.101Z" }, - { url = "https://files.pythonhosted.org/packages/98/d8/594657886df9f34c4177cc353cc28ca7e6e5eb562d37ccc233bff43bbe2a/kiwisolver-1.4.9-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:60c439763a969a6af93b4881db0eed8fadf93ee98e18cbc35bc8da868d0c4f0c", size = 1582135, upload-time = "2025-08-10T21:26:48.665Z" }, - { url = "https://files.pythonhosted.org/packages/5c/c6/38a115b7170f8b306fc929e166340c24958347308ea3012c2b44e7e295db/kiwisolver-1.4.9-cp313-cp313t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:92a2f997387a1b79a75e7803aa7ded2cfbe2823852ccf1ba3bcf613b62ae3197", size = 1389409, upload-time = "2025-08-10T21:26:50.335Z" }, - { url = "https://files.pythonhosted.org/packages/bf/3b/e04883dace81f24a568bcee6eb3001da4ba05114afa622ec9b6fafdc1f5e/kiwisolver-1.4.9-cp313-cp313t-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a31d512c812daea6d8b3be3b2bfcbeb091dbb09177706569bcfc6240dcf8b41c", size = 1401763, upload-time = "2025-08-10T21:26:51.867Z" }, - { url = "https://files.pythonhosted.org/packages/9f/80/20ace48e33408947af49d7d15c341eaee69e4e0304aab4b7660e234d6288/kiwisolver-1.4.9-cp313-cp313t-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:52a15b0f35dad39862d376df10c5230155243a2c1a436e39eb55623ccbd68185", size = 1453643, upload-time = "2025-08-10T21:26:53.592Z" }, - { url = "https://files.pythonhosted.org/packages/64/31/6ce4380a4cd1f515bdda976a1e90e547ccd47b67a1546d63884463c92ca9/kiwisolver-1.4.9-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:a30fd6fdef1430fd9e1ba7b3398b5ee4e2887783917a687d86ba69985fb08748", size = 2330818, upload-time = "2025-08-10T21:26:55.051Z" }, - { url = 
"https://files.pythonhosted.org/packages/fa/e9/3f3fcba3bcc7432c795b82646306e822f3fd74df0ee81f0fa067a1f95668/kiwisolver-1.4.9-cp313-cp313t-musllinux_1_2_ppc64le.whl", hash = "sha256:cc9617b46837c6468197b5945e196ee9ca43057bb7d9d1ae688101e4e1dddf64", size = 2419963, upload-time = "2025-08-10T21:26:56.421Z" }, - { url = "https://files.pythonhosted.org/packages/99/43/7320c50e4133575c66e9f7dadead35ab22d7c012a3b09bb35647792b2a6d/kiwisolver-1.4.9-cp313-cp313t-musllinux_1_2_s390x.whl", hash = "sha256:0ab74e19f6a2b027ea4f845a78827969af45ce790e6cb3e1ebab71bdf9f215ff", size = 2594639, upload-time = "2025-08-10T21:26:57.882Z" }, - { url = "https://files.pythonhosted.org/packages/65/d6/17ae4a270d4a987ef8a385b906d2bdfc9fce502d6dc0d3aea865b47f548c/kiwisolver-1.4.9-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:dba5ee5d3981160c28d5490f0d1b7ed730c22470ff7f6cc26cfcfaacb9896a07", size = 2391741, upload-time = "2025-08-10T21:26:59.237Z" }, - { url = "https://files.pythonhosted.org/packages/2a/8f/8f6f491d595a9e5912971f3f863d81baddccc8a4d0c3749d6a0dd9ffc9df/kiwisolver-1.4.9-cp313-cp313t-win_arm64.whl", hash = "sha256:0749fd8f4218ad2e851e11cc4dc05c7cbc0cbc4267bdfdb31782e65aace4ee9c", size = 68646, upload-time = "2025-08-10T21:27:00.52Z" }, - { url = "https://files.pythonhosted.org/packages/6b/32/6cc0fbc9c54d06c2969faa9c1d29f5751a2e51809dd55c69055e62d9b426/kiwisolver-1.4.9-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:9928fe1eb816d11ae170885a74d074f57af3a0d65777ca47e9aeb854a1fba386", size = 123806, upload-time = "2025-08-10T21:27:01.537Z" }, - { url = "https://files.pythonhosted.org/packages/b2/dd/2bfb1d4a4823d92e8cbb420fe024b8d2167f72079b3bb941207c42570bdf/kiwisolver-1.4.9-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:d0005b053977e7b43388ddec89fa567f43d4f6d5c2c0affe57de5ebf290dc552", size = 66605, upload-time = "2025-08-10T21:27:03.335Z" }, - { url = 
"https://files.pythonhosted.org/packages/f7/69/00aafdb4e4509c2ca6064646cba9cd4b37933898f426756adb2cb92ebbed/kiwisolver-1.4.9-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:2635d352d67458b66fd0667c14cb1d4145e9560d503219034a18a87e971ce4f3", size = 64925, upload-time = "2025-08-10T21:27:04.339Z" }, - { url = "https://files.pythonhosted.org/packages/43/dc/51acc6791aa14e5cb6d8a2e28cefb0dc2886d8862795449d021334c0df20/kiwisolver-1.4.9-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:767c23ad1c58c9e827b649a9ab7809fd5fd9db266a9cf02b0e926ddc2c680d58", size = 1472414, upload-time = "2025-08-10T21:27:05.437Z" }, - { url = "https://files.pythonhosted.org/packages/3d/bb/93fa64a81db304ac8a246f834d5094fae4b13baf53c839d6bb6e81177129/kiwisolver-1.4.9-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:72d0eb9fba308b8311685c2268cf7d0a0639a6cd027d8128659f72bdd8a024b4", size = 1281272, upload-time = "2025-08-10T21:27:07.063Z" }, - { url = "https://files.pythonhosted.org/packages/70/e6/6df102916960fb8d05069d4bd92d6d9a8202d5a3e2444494e7cd50f65b7a/kiwisolver-1.4.9-cp314-cp314-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f68e4f3eeca8fb22cc3d731f9715a13b652795ef657a13df1ad0c7dc0e9731df", size = 1298578, upload-time = "2025-08-10T21:27:08.452Z" }, - { url = "https://files.pythonhosted.org/packages/7c/47/e142aaa612f5343736b087864dbaebc53ea8831453fb47e7521fa8658f30/kiwisolver-1.4.9-cp314-cp314-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d84cd4061ae292d8ac367b2c3fa3aad11cb8625a95d135fe93f286f914f3f5a6", size = 1345607, upload-time = "2025-08-10T21:27:10.125Z" }, - { url = "https://files.pythonhosted.org/packages/54/89/d641a746194a0f4d1a3670fb900d0dbaa786fb98341056814bc3f058fa52/kiwisolver-1.4.9-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:a60ea74330b91bd22a29638940d115df9dc00af5035a9a2a6ad9399ffb4ceca5", size = 2230150, upload-time = "2025-08-10T21:27:11.484Z" }, - { url = 
"https://files.pythonhosted.org/packages/aa/6b/5ee1207198febdf16ac11f78c5ae40861b809cbe0e6d2a8d5b0b3044b199/kiwisolver-1.4.9-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:ce6a3a4e106cf35c2d9c4fa17c05ce0b180db622736845d4315519397a77beaf", size = 2325979, upload-time = "2025-08-10T21:27:12.917Z" }, - { url = "https://files.pythonhosted.org/packages/fc/ff/b269eefd90f4ae14dcc74973d5a0f6d28d3b9bb1afd8c0340513afe6b39a/kiwisolver-1.4.9-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:77937e5e2a38a7b48eef0585114fe7930346993a88060d0bf886086d2aa49ef5", size = 2491456, upload-time = "2025-08-10T21:27:14.353Z" }, - { url = "https://files.pythonhosted.org/packages/fc/d4/10303190bd4d30de547534601e259a4fbf014eed94aae3e5521129215086/kiwisolver-1.4.9-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:24c175051354f4a28c5d6a31c93906dc653e2bf234e8a4bbfb964892078898ce", size = 2294621, upload-time = "2025-08-10T21:27:15.808Z" }, - { url = "https://files.pythonhosted.org/packages/28/e0/a9a90416fce5c0be25742729c2ea52105d62eda6c4be4d803c2a7be1fa50/kiwisolver-1.4.9-cp314-cp314-win_amd64.whl", hash = "sha256:0763515d4df10edf6d06a3c19734e2566368980d21ebec439f33f9eb936c07b7", size = 75417, upload-time = "2025-08-10T21:27:17.436Z" }, - { url = "https://files.pythonhosted.org/packages/1f/10/6949958215b7a9a264299a7db195564e87900f709db9245e4ebdd3c70779/kiwisolver-1.4.9-cp314-cp314-win_arm64.whl", hash = "sha256:0e4e2bf29574a6a7b7f6cb5fa69293b9f96c928949ac4a53ba3f525dffb87f9c", size = 66582, upload-time = "2025-08-10T21:27:18.436Z" }, - { url = "https://files.pythonhosted.org/packages/ec/79/60e53067903d3bc5469b369fe0dfc6b3482e2133e85dae9daa9527535991/kiwisolver-1.4.9-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:d976bbb382b202f71c67f77b0ac11244021cfa3f7dfd9e562eefcea2df711548", size = 126514, upload-time = "2025-08-10T21:27:19.465Z" }, - { url = 
"https://files.pythonhosted.org/packages/25/d1/4843d3e8d46b072c12a38c97c57fab4608d36e13fe47d47ee96b4d61ba6f/kiwisolver-1.4.9-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:2489e4e5d7ef9a1c300a5e0196e43d9c739f066ef23270607d45aba368b91f2d", size = 67905, upload-time = "2025-08-10T21:27:20.51Z" }, - { url = "https://files.pythonhosted.org/packages/8c/ae/29ffcbd239aea8b93108de1278271ae764dfc0d803a5693914975f200596/kiwisolver-1.4.9-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:e2ea9f7ab7fbf18fffb1b5434ce7c69a07582f7acc7717720f1d69f3e806f90c", size = 66399, upload-time = "2025-08-10T21:27:21.496Z" }, - { url = "https://files.pythonhosted.org/packages/a1/ae/d7ba902aa604152c2ceba5d352d7b62106bedbccc8e95c3934d94472bfa3/kiwisolver-1.4.9-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:b34e51affded8faee0dfdb705416153819d8ea9250bbbf7ea1b249bdeb5f1122", size = 1582197, upload-time = "2025-08-10T21:27:22.604Z" }, - { url = "https://files.pythonhosted.org/packages/f2/41/27c70d427eddb8bc7e4f16420a20fefc6f480312122a59a959fdfe0445ad/kiwisolver-1.4.9-cp314-cp314t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d8aacd3d4b33b772542b2e01beb50187536967b514b00003bdda7589722d2a64", size = 1390125, upload-time = "2025-08-10T21:27:24.036Z" }, - { url = "https://files.pythonhosted.org/packages/41/42/b3799a12bafc76d962ad69083f8b43b12bf4fe78b097b12e105d75c9b8f1/kiwisolver-1.4.9-cp314-cp314t-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:7cf974dd4e35fa315563ac99d6287a1024e4dc2077b8a7d7cd3d2fb65d283134", size = 1402612, upload-time = "2025-08-10T21:27:25.773Z" }, - { url = "https://files.pythonhosted.org/packages/d2/b5/a210ea073ea1cfaca1bb5c55a62307d8252f531beb364e18aa1e0888b5a0/kiwisolver-1.4.9-cp314-cp314t-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:85bd218b5ecfbee8c8a82e121802dcb519a86044c9c3b2e4aef02fa05c6da370", size = 1453990, upload-time = "2025-08-10T21:27:27.089Z" }, - { url = 
"https://files.pythonhosted.org/packages/5f/ce/a829eb8c033e977d7ea03ed32fb3c1781b4fa0433fbadfff29e39c676f32/kiwisolver-1.4.9-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:0856e241c2d3df4efef7c04a1e46b1936b6120c9bcf36dd216e3acd84bc4fb21", size = 2331601, upload-time = "2025-08-10T21:27:29.343Z" }, - { url = "https://files.pythonhosted.org/packages/e0/4b/b5e97eb142eb9cd0072dacfcdcd31b1c66dc7352b0f7c7255d339c0edf00/kiwisolver-1.4.9-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:9af39d6551f97d31a4deebeac6f45b156f9755ddc59c07b402c148f5dbb6482a", size = 2422041, upload-time = "2025-08-10T21:27:30.754Z" }, - { url = "https://files.pythonhosted.org/packages/40/be/8eb4cd53e1b85ba4edc3a9321666f12b83113a178845593307a3e7891f44/kiwisolver-1.4.9-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:bb4ae2b57fc1d8cbd1cf7b1d9913803681ffa903e7488012be5b76dedf49297f", size = 2594897, upload-time = "2025-08-10T21:27:32.803Z" }, - { url = "https://files.pythonhosted.org/packages/99/dd/841e9a66c4715477ea0abc78da039832fbb09dac5c35c58dc4c41a407b8a/kiwisolver-1.4.9-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:aedff62918805fb62d43a4aa2ecd4482c380dc76cd31bd7c8878588a61bd0369", size = 2391835, upload-time = "2025-08-10T21:27:34.23Z" }, - { url = "https://files.pythonhosted.org/packages/0c/28/4b2e5c47a0da96896fdfdb006340ade064afa1e63675d01ea5ac222b6d52/kiwisolver-1.4.9-cp314-cp314t-win_amd64.whl", hash = "sha256:1fa333e8b2ce4d9660f2cda9c0e1b6bafcfb2457a9d259faa82289e73ec24891", size = 79988, upload-time = "2025-08-10T21:27:35.587Z" }, - { url = "https://files.pythonhosted.org/packages/80/be/3578e8afd18c88cdf9cb4cffde75a96d2be38c5a903f1ed0ceec061bd09e/kiwisolver-1.4.9-cp314-cp314t-win_arm64.whl", hash = "sha256:4a48a2ce79d65d363597ef7b567ce3d14d68783d2b2263d98db3d9477805ba32", size = 70260, upload-time = "2025-08-10T21:27:36.606Z" }, -] - [[package]] name = "langchain" version = "1.2.3" @@ -1388,60 +1198,6 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/be/2f/5108cb3ee4ba6501748c4908b908e55f42a5b66245b4cfe0c99326e1ef6e/marshmallow-3.26.2-py3-none-any.whl", hash = "sha256:013fa8a3c4c276c24d26d84ce934dc964e2aa794345a0f8c7e5a7191482c8a73", size = 50964, upload-time = "2025-12-22T06:53:51.801Z" }, ] -[[package]] -name = "matplotlib" -version = "3.10.8" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "contourpy" }, - { name = "cycler" }, - { name = "fonttools" }, - { name = "kiwisolver" }, - { name = "numpy" }, - { name = "packaging" }, - { name = "pillow" }, - { name = "pyparsing" }, - { name = "python-dateutil" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/8a/76/d3c6e3a13fe484ebe7718d14e269c9569c4eb0020a968a327acb3b9a8fe6/matplotlib-3.10.8.tar.gz", hash = "sha256:2299372c19d56bcd35cf05a2738308758d32b9eaed2371898d8f5bd33f084aa3", size = 34806269, upload-time = "2025-12-10T22:56:51.155Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/9e/67/f997cdcbb514012eb0d10cd2b4b332667997fb5ebe26b8d41d04962fa0e6/matplotlib-3.10.8-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:64fcc24778ca0404ce0cb7b6b77ae1f4c7231cdd60e6778f999ee05cbd581b9a", size = 8260453, upload-time = "2025-12-10T22:55:30.709Z" }, - { url = "https://files.pythonhosted.org/packages/7e/65/07d5f5c7f7c994f12c768708bd2e17a4f01a2b0f44a1c9eccad872433e2e/matplotlib-3.10.8-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:b9a5ca4ac220a0cdd1ba6bcba3608547117d30468fefce49bb26f55c1a3d5c58", size = 8148321, upload-time = "2025-12-10T22:55:33.265Z" }, - { url = "https://files.pythonhosted.org/packages/3e/f3/c5195b1ae57ef85339fd7285dfb603b22c8b4e79114bae5f4f0fcf688677/matplotlib-3.10.8-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3ab4aabc72de4ff77b3ec33a6d78a68227bf1123465887f9905ba79184a1cc04", size = 8716944, upload-time = "2025-12-10T22:55:34.922Z" }, - { url = 
"https://files.pythonhosted.org/packages/00/f9/7638f5cc82ec8a7aa005de48622eecc3ed7c9854b96ba15bd76b7fd27574/matplotlib-3.10.8-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:24d50994d8c5816ddc35411e50a86ab05f575e2530c02752e02538122613371f", size = 9550099, upload-time = "2025-12-10T22:55:36.789Z" }, - { url = "https://files.pythonhosted.org/packages/57/61/78cd5920d35b29fd2a0fe894de8adf672ff52939d2e9b43cb83cd5ce1bc7/matplotlib-3.10.8-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:99eefd13c0dc3b3c1b4d561c1169e65fe47aab7b8158754d7c084088e2329466", size = 9613040, upload-time = "2025-12-10T22:55:38.715Z" }, - { url = "https://files.pythonhosted.org/packages/30/4e/c10f171b6e2f44d9e3a2b96efa38b1677439d79c99357600a62cc1e9594e/matplotlib-3.10.8-cp312-cp312-win_amd64.whl", hash = "sha256:dd80ecb295460a5d9d260df63c43f4afbdd832d725a531f008dad1664f458adf", size = 8142717, upload-time = "2025-12-10T22:55:41.103Z" }, - { url = "https://files.pythonhosted.org/packages/f1/76/934db220026b5fef85f45d51a738b91dea7d70207581063cd9bd8fafcf74/matplotlib-3.10.8-cp312-cp312-win_arm64.whl", hash = "sha256:3c624e43ed56313651bc18a47f838b60d7b8032ed348911c54906b130b20071b", size = 8012751, upload-time = "2025-12-10T22:55:42.684Z" }, - { url = "https://files.pythonhosted.org/packages/3d/b9/15fd5541ef4f5b9a17eefd379356cf12175fe577424e7b1d80676516031a/matplotlib-3.10.8-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:3f2e409836d7f5ac2f1c013110a4d50b9f7edc26328c108915f9075d7d7a91b6", size = 8261076, upload-time = "2025-12-10T22:55:44.648Z" }, - { url = "https://files.pythonhosted.org/packages/8d/a0/2ba3473c1b66b9c74dc7107c67e9008cb1782edbe896d4c899d39ae9cf78/matplotlib-3.10.8-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:56271f3dac49a88d7fca5060f004d9d22b865f743a12a23b1e937a0be4818ee1", size = 8148794, upload-time = "2025-12-10T22:55:46.252Z" }, - { url = 
"https://files.pythonhosted.org/packages/75/97/a471f1c3eb1fd6f6c24a31a5858f443891d5127e63a7788678d14e249aea/matplotlib-3.10.8-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:a0a7f52498f72f13d4a25ea70f35f4cb60642b466cbb0a9be951b5bc3f45a486", size = 8718474, upload-time = "2025-12-10T22:55:47.864Z" }, - { url = "https://files.pythonhosted.org/packages/01/be/cd478f4b66f48256f42927d0acbcd63a26a893136456cd079c0cc24fbabf/matplotlib-3.10.8-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:646d95230efb9ca614a7a594d4fcacde0ac61d25e37dd51710b36477594963ce", size = 9549637, upload-time = "2025-12-10T22:55:50.048Z" }, - { url = "https://files.pythonhosted.org/packages/5d/7c/8dc289776eae5109e268c4fb92baf870678dc048a25d4ac903683b86d5bf/matplotlib-3.10.8-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:f89c151aab2e2e23cb3fe0acad1e8b82841fd265379c4cecd0f3fcb34c15e0f6", size = 9613678, upload-time = "2025-12-10T22:55:52.21Z" }, - { url = "https://files.pythonhosted.org/packages/64/40/37612487cc8a437d4dd261b32ca21fe2d79510fe74af74e1f42becb1bdb8/matplotlib-3.10.8-cp313-cp313-win_amd64.whl", hash = "sha256:e8ea3e2d4066083e264e75c829078f9e149fa119d27e19acd503de65e0b13149", size = 8142686, upload-time = "2025-12-10T22:55:54.253Z" }, - { url = "https://files.pythonhosted.org/packages/66/52/8d8a8730e968185514680c2a6625943f70269509c3dcfc0dcf7d75928cb8/matplotlib-3.10.8-cp313-cp313-win_arm64.whl", hash = "sha256:c108a1d6fa78a50646029cb6d49808ff0fc1330fda87fa6f6250c6b5369b6645", size = 8012917, upload-time = "2025-12-10T22:55:56.268Z" }, - { url = "https://files.pythonhosted.org/packages/b5/27/51fe26e1062f298af5ef66343d8ef460e090a27fea73036c76c35821df04/matplotlib-3.10.8-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:ad3d9833a64cf48cc4300f2b406c3d0f4f4724a91c0bd5640678a6ba7c102077", size = 8305679, upload-time = "2025-12-10T22:55:57.856Z" }, - { url = 
"https://files.pythonhosted.org/packages/2c/1e/4de865bc591ac8e3062e835f42dd7fe7a93168d519557837f0e37513f629/matplotlib-3.10.8-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:eb3823f11823deade26ce3b9f40dcb4a213da7a670013929f31d5f5ed1055b22", size = 8198336, upload-time = "2025-12-10T22:55:59.371Z" }, - { url = "https://files.pythonhosted.org/packages/c6/cb/2f7b6e75fb4dce87ef91f60cac4f6e34f4c145ab036a22318ec837971300/matplotlib-3.10.8-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:d9050fee89a89ed57b4fb2c1bfac9a3d0c57a0d55aed95949eedbc42070fea39", size = 8731653, upload-time = "2025-12-10T22:56:01.032Z" }, - { url = "https://files.pythonhosted.org/packages/46/b3/bd9c57d6ba670a37ab31fb87ec3e8691b947134b201f881665b28cc039ff/matplotlib-3.10.8-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b44d07310e404ba95f8c25aa5536f154c0a8ec473303535949e52eb71d0a1565", size = 9561356, upload-time = "2025-12-10T22:56:02.95Z" }, - { url = "https://files.pythonhosted.org/packages/c0/3d/8b94a481456dfc9dfe6e39e93b5ab376e50998cddfd23f4ae3b431708f16/matplotlib-3.10.8-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:0a33deb84c15ede243aead39f77e990469fff93ad1521163305095b77b72ce4a", size = 9614000, upload-time = "2025-12-10T22:56:05.411Z" }, - { url = "https://files.pythonhosted.org/packages/bd/cd/bc06149fe5585ba800b189a6a654a75f1f127e8aab02fd2be10df7fa500c/matplotlib-3.10.8-cp313-cp313t-win_amd64.whl", hash = "sha256:3a48a78d2786784cc2413e57397981fb45c79e968d99656706018d6e62e57958", size = 8220043, upload-time = "2025-12-10T22:56:07.551Z" }, - { url = "https://files.pythonhosted.org/packages/e3/de/b22cf255abec916562cc04eef457c13e58a1990048de0c0c3604d082355e/matplotlib-3.10.8-cp313-cp313t-win_arm64.whl", hash = "sha256:15d30132718972c2c074cd14638c7f4592bd98719e2308bccea40e0538bc0cb5", size = 8062075, upload-time = "2025-12-10T22:56:09.178Z" }, - { url = 
"https://files.pythonhosted.org/packages/3c/43/9c0ff7a2f11615e516c3b058e1e6e8f9614ddeca53faca06da267c48345d/matplotlib-3.10.8-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:b53285e65d4fa4c86399979e956235deb900be5baa7fc1218ea67fbfaeaadd6f", size = 8262481, upload-time = "2025-12-10T22:56:10.885Z" }, - { url = "https://files.pythonhosted.org/packages/6f/ca/e8ae28649fcdf039fda5ef554b40a95f50592a3c47e6f7270c9561c12b07/matplotlib-3.10.8-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:32f8dce744be5569bebe789e46727946041199030db8aeb2954d26013a0eb26b", size = 8151473, upload-time = "2025-12-10T22:56:12.377Z" }, - { url = "https://files.pythonhosted.org/packages/f1/6f/009d129ae70b75e88cbe7e503a12a4c0670e08ed748a902c2568909e9eb5/matplotlib-3.10.8-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4cf267add95b1c88300d96ca837833d4112756045364f5c734a2276038dae27d", size = 9553896, upload-time = "2025-12-10T22:56:14.432Z" }, - { url = "https://files.pythonhosted.org/packages/f5/26/4221a741eb97967bc1fd5e4c52b9aa5a91b2f4ec05b59f6def4d820f9df9/matplotlib-3.10.8-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2cf5bd12cecf46908f286d7838b2abc6c91cda506c0445b8223a7c19a00df008", size = 9824193, upload-time = "2025-12-10T22:56:16.29Z" }, - { url = "https://files.pythonhosted.org/packages/1f/f3/3abf75f38605772cf48a9daf5821cd4f563472f38b4b828c6fba6fa6d06e/matplotlib-3.10.8-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:41703cc95688f2516b480f7f339d8851a6035f18e100ee6a32bc0b8536a12a9c", size = 9615444, upload-time = "2025-12-10T22:56:18.155Z" }, - { url = "https://files.pythonhosted.org/packages/93/a5/de89ac80f10b8dc615807ee1133cd99ac74082581196d4d9590bea10690d/matplotlib-3.10.8-cp314-cp314-win_amd64.whl", hash = "sha256:83d282364ea9f3e52363da262ce32a09dfe241e4080dcedda3c0db059d3c1f11", size = 8272719, upload-time = "2025-12-10T22:56:20.366Z" }, - { url = 
"https://files.pythonhosted.org/packages/69/ce/b006495c19ccc0a137b48083168a37bd056392dee02f87dba0472f2797fe/matplotlib-3.10.8-cp314-cp314-win_arm64.whl", hash = "sha256:2c1998e92cd5999e295a731bcb2911c75f597d937341f3030cc24ef2733d78a8", size = 8144205, upload-time = "2025-12-10T22:56:22.239Z" }, - { url = "https://files.pythonhosted.org/packages/68/d9/b31116a3a855bd313c6fcdb7226926d59b041f26061c6c5b1be66a08c826/matplotlib-3.10.8-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:b5a2b97dbdc7d4f353ebf343744f1d1f1cca8aa8bfddb4262fcf4306c3761d50", size = 8305785, upload-time = "2025-12-10T22:56:24.218Z" }, - { url = "https://files.pythonhosted.org/packages/1e/90/6effe8103f0272685767ba5f094f453784057072f49b393e3ea178fe70a5/matplotlib-3.10.8-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:3f5c3e4da343bba819f0234186b9004faba952cc420fbc522dc4e103c1985908", size = 8198361, upload-time = "2025-12-10T22:56:26.787Z" }, - { url = "https://files.pythonhosted.org/packages/d7/65/a73188711bea603615fc0baecca1061429ac16940e2385433cc778a9d8e7/matplotlib-3.10.8-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5f62550b9a30afde8c1c3ae450e5eb547d579dd69b25c2fc7a1c67f934c1717a", size = 9561357, upload-time = "2025-12-10T22:56:28.953Z" }, - { url = "https://files.pythonhosted.org/packages/f4/3d/b5c5d5d5be8ce63292567f0e2c43dde9953d3ed86ac2de0a72e93c8f07a1/matplotlib-3.10.8-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:495672de149445ec1b772ff2c9ede9b769e3cb4f0d0aa7fa730d7f59e2d4e1c1", size = 9823610, upload-time = "2025-12-10T22:56:31.455Z" }, - { url = "https://files.pythonhosted.org/packages/4d/4b/e7beb6bbd49f6bae727a12b270a2654d13c397576d25bd6786e47033300f/matplotlib-3.10.8-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:595ba4d8fe983b88f0eec8c26a241e16d6376fe1979086232f481f8f3f67494c", size = 9614011, upload-time = "2025-12-10T22:56:33.85Z" }, - { url = 
"https://files.pythonhosted.org/packages/7c/e6/76f2813d31f032e65f6f797e3f2f6e4aab95b65015924b1c51370395c28a/matplotlib-3.10.8-cp314-cp314t-win_amd64.whl", hash = "sha256:25d380fe8b1dc32cf8f0b1b448470a77afb195438bafdf1d858bfb876f3edf7b", size = 8362801, upload-time = "2025-12-10T22:56:36.107Z" }, - { url = "https://files.pythonhosted.org/packages/5d/49/d651878698a0b67f23aa28e17f45a6d6dd3d3f933fa29087fa4ce5947b5a/matplotlib-3.10.8-cp314-cp314t-win_arm64.whl", hash = "sha256:113bb52413ea508ce954a02c10ffd0d565f9c3bc7f2eddc27dfe1731e71c7b5f", size = 8192560, upload-time = "2025-12-10T22:56:38.008Z" }, -] - [[package]] name = "multidict" version = "6.7.0" @@ -2193,15 +1949,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c6/96/fd59c1532891762ea4815e73956c532053d5e26d56969e1e5d1e4ca4b207/pymupdf-1.26.5-cp39-abi3-win_amd64.whl", hash = "sha256:39a6fb58182b27b51ea8150a0cd2e4ee7e0cf71e9d6723978f28699b42ee61ae", size = 18747258, upload-time = "2025-10-10T14:01:37.346Z" }, ] -[[package]] -name = "pyparsing" -version = "3.3.1" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/33/c1/1d9de9aeaa1b89b0186e5fe23294ff6517fce1bc69149185577cd31016b2/pyparsing-3.3.1.tar.gz", hash = "sha256:47fad0f17ac1e2cad3de3b458570fbc9b03560aa029ed5e16ee5554da9a2251c", size = 1550512, upload-time = "2025-12-23T03:14:04.391Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/8b/40/2614036cdd416452f5bf98ec037f38a1afb17f327cb8e6b652d4729e0af8/pyparsing-3.3.1-py3-none-any.whl", hash = "sha256:023b5e7e5520ad96642e2c6db4cb683d3970bd640cdf7115049a6e9c3682df82", size = 121793, upload-time = "2025-12-23T03:14:02.103Z" }, -] - [[package]] name = "pypdfium2" version = "5.3.0"