Description
Add a DocxLoader that reads .docx files using python-docx and returns a Document per file (concatenating all paragraphs).
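A minimal sketch of what such a loader could look like, offered as a starting point rather than the final implementation. `LoaderError` and `Document` are defined here as stand-ins so the snippet runs on its own; the project's real classes, the `content` and `metadata` field names, the `load()` method name, the newline separator, and the choice to drop empty paragraphs are all assumptions. The real class would presumably subclass the project's `DocumentLoader`.

```python
from dataclasses import dataclass, field
from pathlib import Path


class LoaderError(Exception):
    """Stand-in for the project's LoaderError (real definition not shown here)."""


@dataclass
class Document:
    """Stand-in for the project's Document; field names are assumptions."""
    content: str
    metadata: dict = field(default_factory=dict)


def join_paragraphs(texts):
    """Join paragraph texts with newlines, skipping blank ones (an assumption)."""
    return "\n".join(t for t in texts if t.strip())


class DocxLoader:
    """Sketch only: the real class would subclass the project's DocumentLoader."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        # Import lazily so the framework still works without the optional extra.
        try:
            import docx  # provided by the python-docx package
        except ImportError as exc:
            raise LoaderError(
                "python-docx is required to load .docx files; "
                "install the [docx] extra."
            ) from exc

        # Wrap any parsing failure (missing file, corrupt archive) in LoaderError.
        try:
            parsed = docx.Document(str(self.path))
            text = join_paragraphs(p.text for p in parsed.paragraphs)
        except Exception as exc:
            raise LoaderError(f"Failed to load {self.path}: {exc}") from exc

        return Document(content=text, metadata={"source": str(self.path)})
```

With python-docx installed, `DocxLoader("report.docx").load()` would return a single `Document` whose `content` holds the paragraph text; whether the real loader returns one object or a list should follow the existing loaders.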
Motivation
Word documents are ubiquitous in business workflows. Supporting DOCX opens the framework to HR, legal, and operational use cases.
Acceptance criteria
- Implement DocxLoader in ragframework/document/loaders.py
- Subclass DocumentLoader from ragframework/base.py
- Use python-docx: add it to the [docx] extra in pyproject.toml
- Raise LoaderError on failure; import guarded with helpful message
- Tests in tests/test_document/test_loaders.py
- Exported from ragframework/document/__init__.py
- CHANGELOG.md updated under [Unreleased]
Files to touch
- ragframework/document/loaders.py: add DocxLoader
- ragframework/document/__init__.py: export it
- pyproject.toml: add python-docx to [docx] extra
- tests/test_document/test_loaders.py: add tests
Resources
- python-docx docs
- Existing loaders (TextFileLoader, MarkdownLoader) in ragframework/document/loaders.py as reference