# Word File Field Extraction

This notebook shows how to extract structured data from a Microsoft Word `.docx` file using:

- `python-docx` for reading Word documents
- `pandas` for saving structured data to Excel

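It assumes each paragraph in the Word file has the format:
`<Field Name>: <Value>`

The notebook reads every top-level body paragraph, skips those without a colon, splits the rest at the first colon, and stores the stripped key-value pairs in a dictionary. A value may therefore contain colons of its own, and a repeated field name overwrites the earlier value. Text inside tables is not read, because `doc.paragraphs` does not include it.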
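
The parsing rule can be tried on plain strings, without a Word file. The following is a minimal sketch: the helper name `parse_fields` and the sample lines are hypothetical and exist only for illustration, while the splitting logic mirrors the loop used later in this notebook.

```python
def parse_fields(lines):
    """Collect `<Field Name>: <Value>` pairs from an iterable of strings."""
    fields = {}
    for raw in lines:
        line = raw.strip()
        if ":" not in line:
            continue  # headings, blank lines, and free text are skipped
        key, value = line.split(":", 1)  # split at the first colon only
        fields[key.strip()] = value.strip()  # a repeated key overwrites the earlier value
    return fields


# Hypothetical sample lines, not taken from any real document
sample = [
    "Policy Summary",
    "Policy Number: ABC-123",
    "Start Time: 09:30",
    "",
    "Policy Number: XYZ-789",
]
print(parse_fields(sample))
```

Because the split happens only at the first colon, `09:30` survives intact as a value, and the second `Policy Number` line replaces the first.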

Output:

- Extracted fields saved to an Excel file, `word_extracted_output.xlsx`, as a single row with one column per field
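
The saved sheet holds a single row because the notebook wraps the dictionary in a list before building the DataFrame. A small sketch of that layout, using made-up field names and values (assumptions, not data from `sample_policy.docx`):

```python
import pandas as pd

# Hypothetical extracted fields, for illustration only
data = {"Policy Number": "ABC-123", "Holder": "Jane Doe"}

# Wrapping the dict in a list yields one row; the keys become column names
df = pd.DataFrame([data])
print(df.shape)
print(list(df.columns))
```

Each additional field found in the document adds a column, not a row.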


In [1]:
from docx import Document
import pandas as pd

In [2]:
# Load the Word file
doc = Document("sample_policy.docx")

In [3]:
# Keep paragraphs that contain a colon and split each at the first colon
data = {}
for para in doc.paragraphs:
    line = para.text.strip()
    if ":" in line:
        key, value = line.split(":", 1)
        data[key.strip()] = value.strip()

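# Save to Excel as a single row, one column per field
# (pandas needs an Excel writer engine such as openpyxl installed)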
df = pd.DataFrame([data])
df.to_excel("word_extracted_output.xlsx", index=False)

print("Word fields extracted and saved to Excel.")


Word fields extracted and saved to Excel.
