## Word Document Processing

In [1]:
import os
from langchain_community.document_loaders import Docx2txtLoader, UnstructuredWordDocumentLoader



In [2]:
# Method 1: Using Docx2txtLoader
print("Using Docx2txtLoader")
try:
    docx_loader = Docx2txtLoader("data/market_analysis_report.docx")
    docs = docx_loader.load()
    print(f"✅ Loaded {len(docs)} document(s) using Docx2txtLoader.")
    print(f"Content Preview: {docs[0].page_content[:200]}...")
    print(f"Metadata: {docs[0].metadata}")

except Exception as e:
    print(f"Error loading DOCX with Docx2txtLoader: {e}")

Using Docx2txtLoader
✅ Loaded 1 document(s) using Docx2txtLoader.
Content Preview: Market Analysis Report - 2025

This report provides an overview of the market trends observed in the first quarter of 2025. The technology sector continues to show robust growth, particularly in AI-dr...
Metadata: {'source': 'data/market_analysis_report.docx'}
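Docx2txtLoader returns the whole file as a single Document whose metadata holds only the `source` path, which is why the count above is 1. A .docx file is a ZIP archive whose body text lives in `word/document.xml`. The sketch below illustrates that idea with the standard library only. It is not how docx2txt (the package behind Docx2txtLoader) is implemented. The helpers `docx_to_text` and `make_minimal_docx` are hypothetical, the sketch ignores headers, footers, and images, and the in-memory archive contains only `word/document.xml`: enough for this example, but not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the <w:...> tags in word/document.xml (WordprocessingML)
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source) -> str:
    """Extract paragraph text from a .docx given a path or binary file object.

    Headers, footers, and images are ignored; paragraphs (including those
    inside table cells) are separated by a blank line.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):
        # Word can split one paragraph's text across several <w:t> runs
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{W}t")))
    return "\n\n".join(paragraphs)


def make_minimal_docx(paragraphs) -> io.BytesIO:
    """Hypothetical helper: build an in-memory archive holding only
    word/document.xml, enough for docx_to_text but not a complete .docx."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)  # rewind so the archive can be read back
    return buf


sample = make_minimal_docx(["Market Analysis Report - 2025", "Q1 & Q2 <draft>"])
print(docx_to_text(sample))
```

Separating paragraphs with a blank line is a design choice here; it happens to match the spacing between the title and first paragraph in the preview above.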


In [3]:
# Method 2: Using UnstructuredWordDocumentLoader
print("\nUsing UnstructuredWordDocumentLoader")
try:
    unstructured_loader = UnstructuredWordDocumentLoader("data/market_analysis_report.docx", mode="elements")
    unstructured_docs = unstructured_loader.load()
    print(f"✅ Loaded {len(unstructured_docs)} elements using UnstructuredWordDocumentLoader.")
    for i, doc in enumerate(unstructured_docs):
        print(f"\n--- Element {i+1} ---")
        print(f"Type: {doc.metadata.get('category', 'unknown')}")
        print(f"Content Preview: {doc.page_content[:100]}...")

except Exception as e:
    print(f"Error loading DOCX with UnstructuredWordDocumentLoader: {e}")


Using UnstructuredWordDocumentLoader
✅ Loaded 5 elements using UnstructuredWordDocumentLoader.

--- Element 1 ---
Type: Title
Content Preview: Market Analysis Report - 2025...

--- Element 2 ---
Type: NarrativeText
Content Preview: This report provides an overview of the market trends observed in the first quarter of 2025. The tec...

--- Element 3 ---
Type: NarrativeText
Content Preview: The following table summarizes the key financial indicators of major tech companies during Q1 2025....

--- Element 4 ---
Type: Table
Content Preview: Company Revenue (in $M) Profit Margin (%) Market Share (%) TechNova 1200 15 10 InnovateX 950 12 8 Cl...

--- Element 5 ---
Type: NarrativeText
Content Preview: In conclusion, the market outlook remains positive with continued opportunities in AI, data analytic...
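With `mode="elements"`, each element comes back as its own Document whose `metadata["category"]` records the element type (Title, NarrativeText, and Table above). One way to use that metadata is to route elements by category, for example keeping the table apart from the prose before chunking. The sketch below relies only on the `page_content` and `metadata` attributes already used in the cell above. `Element` is a hypothetical stand-in dataclass so the example runs without LangChain, and apart from the title the element texts are placeholders; in the notebook you would pass `unstructured_docs` instead.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_category(elements) -> dict:
    """Map each category (Title, NarrativeText, Table, ...) to its texts,
    in order of first appearance; missing categories become 'unknown'."""
    groups = defaultdict(list)
    for el in elements:
        groups[el.metadata.get("category", "unknown")].append(el.page_content)
    return dict(groups)


def narrative_only(elements) -> list:
    """Keep only NarrativeText elements, e.g. prose to be chunked for embedding."""
    return [el for el in elements if el.metadata.get("category") == "NarrativeText"]


# Placeholder elements mirroring the categories printed above
elements = [
    Element("Market Analysis Report - 2025", {"category": "Title"}),
    Element("(intro paragraph)", {"category": "NarrativeText"}),
    Element("(financial table)", {"category": "Table"}),
    Element("(conclusion paragraph)", {"category": "NarrativeText"}),
]
for category, texts in group_by_category(elements).items():
    print(f"{category}: {len(texts)}")  # Title: 1, NarrativeText: 2, Table: 1
```

Falling back to `"unknown"` mirrors the `metadata.get('category', 'unknown')` call in the loader cell, so elements without a category are grouped rather than dropped.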
