In [1]:
import imp
from docx import Document
from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Working with Word documents in Python

Output for the RID project needs to be presented in a series of Word tables. Examples from previous years can be found here:

K:\Avdeling\Vass\316_Miljøinformatikk\Prosjekter\RID\2016\utsendt

Tore has lots of code in Visual Studio for generating these tables, but I'm no expert in VB.NET and getting the extensions working in Visual Studio sounds time-consuming. Surprisingly, it seems relatively straightforward to manipulate Word files using Python's [python-docx](https://python-docx.readthedocs.io/en/latest/) module. The code below illustrates some key features, which I think will make it fairly easy to generate the Word tables without Visual Studio if necessary.

The first step is to update one set of Tore's original tables to the latest Word format (`.docx`). This is because python-docx can only work with `.docx` files (the Open XML format introduced with Word 2007), not the older binary `.doc` format. Some test files are saved locally here:

C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet\Results\Word_Tables\2016

Having done this, I can use Tore's Word documents from last year as templates and modify the values as necessary using Python.

In [2]:
# Try "TABLE3" as an example
in_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Word_Tables\2016\RID_2015_PARTB_TABLE3_07102016.docx')

out_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
            r'\Results\Word_Tables\2016\table3_test.docx')

# Open the document
doc = Document(in_docx)

# List the tables
doc.tables

[<docx.table.Table at 0x4240550>,
 <docx.table.Table at 0x4240588>,
 <docx.table.Table at 0x42405c0>,
 <docx.table.Table at 0x42405f8>,
 <docx.table.Table at 0x4240630>]

So this Word document contains 5 tables. Let's try modifying the first one.

In [3]:
# Get the first table
tab = doc.tables[0]

# Extract text to index rows
row_dict = {}
for idx, cell in enumerate(tab.column_cells(0)):
    for paragraph in cell.paragraphs:
        row_dict[paragraph.text] = idx 

# Extract text to index cols
col_dict = {}
for idx, cell in enumerate(tab.row_cells(1)):
    for paragraph in cell.paragraphs:
        col_dict[paragraph.text] = idx 

In [4]:
# Change the value for "PO4-P" and "Fish Farming" to -9999
# Get row and col indices
col = col_dict['PO4-P']
row = row_dict['Fish Farming']

# Get cell
cell = tab.cell(row, col)

# Modify value (this replaces the cell content with a single paragraph and run)
cell.text = '-9999'

# Set font and size
p = cell.paragraphs[0]
run = p.runs[0]
run.font.size = Pt(8)
run.font.name = 'Times New Roman'

# Align right
p.alignment = WD_ALIGN_PARAGRAPH.RIGHT

# Save new file
doc.save(out_docx)

All this can be wrapped in a function to make it easy to update cells in the summary tables. The new function is called `update_word_table`.

In [5]:
# Import custom RID functions
rid_func_path = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
                 r'\Python\rid\useful_rid_code.py')

rid = imp.load_source('useful_rid_code', rid_func_path)

In [6]:
# Modify a value in 2nd table of file "TABLE1"
in_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Word_Tables\2016\RID_2015_PARTB_TABLE1_07102016.docx')

out_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
            r'\Results\Word_Tables\2016\table1_test.docx')

# Open the document
doc = Document(in_docx)

# Update the table
rid.update_word_table(doc, '-9999', tab_id=1,
                      row='Maximum', col='TOC',
                      row_idx=1, col_idx=0)

# Save new file
doc.save(out_docx)

In [7]:
# Modify a value in "TABLE2"
in_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Word_Tables\2016\RID_2015_PARTB_TABLE2_07102016.docx')

out_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
            r'\Results\Word_Tables\2016\table2_test.docx')

# Open the document
doc = Document(in_docx)

# Update the table
rid.update_word_table(doc, '-9999', tab_id=0,
                      row='Saudaelva', col='SiO2',
                      row_idx=0, col_idx=0)

# Save new file
doc.save(out_docx)

In [8]:
# Modify a value in the 3rd table of file "TABLE3"
in_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
           r'\Results\Word_Tables\2016\RID_2015_PARTB_TABLE3_07102016.docx')

out_docx = (r'C:\Data\James_Work\Staff\Oeyvind_K\Elveovervakingsprogrammet'
            r'\Results\Word_Tables\2016\table3_test.docx')

# Open the document
doc = Document(in_docx)

# Update the table
rid.update_word_table(doc, '-9999', tab_id=2,
                      row='Fish Farming', col='PO4-P',
                      row_idx=1, col_idx=0)

# Save new file
doc.save(out_docx)