set_cell loses font/size formatting in DOCX and PPTX adapters #1

@SonAIengine

Summary

DocxAdapter.set_cell() and PptxAdapter.set_cell() use cell.text = value to replace cell content. In both python-docx and python-pptx, this setter discards the cell's existing paragraphs and runs and writes the text into a new run with no font properties of its own, so any run-level formatting on the original cell (font family, size, bold, color, etc.) is lost.

HwpxAdapter is not affected — it already uses paragraphs[0].text = value which preserves run structure.

Affected code

document_adapter/docx_adapter.py

def set_cell(self, table_index: int, row: int, col: int, value: str) -> str:
    cell = self._doc.tables[table_index].rows[row].cells[col]
    old = cell.text
    cell.text = value   # ← drops all runs, resets formatting
    return old

document_adapter/pptx_adapter.py

def set_cell(self, table_index: int, row: int, col: int, value: str) -> str:
    table = self._get_table(table_index)
    cell = table.cell(row, col)
    old = cell.text
    cell.text = value   # ← same issue via python-pptx
    return old

DocxAdapter.append_row() has the same issue — it calls new_row.cells[i].text = v on each cell, so newly added rows ignore any run-level template formatting.

Reproduction

from docx import Document
from docx.shared import Pt

doc = Document()
table = doc.add_table(rows=1, cols=1)
cell = table.cell(0, 0)
run = cell.paragraphs[0].add_run('original')
run.font.name = 'Malgun Gothic'
run.font.size = Pt(18)
run.bold = True

# simulating our adapter
cell.text = 'replaced'

new_run = cell.paragraphs[0].runs[0]
print(new_run.font.name, new_run.font.size, new_run.bold)
# → None None None   (formatting gone)

Same behavior in python-pptx with shape.table.cell(r, c).text = value.

Impact

Any downstream application that needs to preserve the visual style of a template (fonts for Korean/CJK text, bold headers, numeric alignment, brand colors, etc.) gets a degraded result after set_cell or append_row. This is especially visible with real-world .docx / .pptx office templates, where the original cell runs carry non-default fonts.

Currently in xgen-workflow we work around this with a monkey patch on the installed document_adapter package; the fix belongs upstream.

Proposed fix

Mirror HwpxAdapter.set_cell's run-preserving strategy.

DOCX

def set_cell(self, table_index: int, row: int, col: int, value: str) -> str:
    cell = self._doc.tables[table_index].rows[row].cells[col]
    old = cell.text

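    paragraphs = cell.paragraphs
    if not paragraphs or not paragraphs[0].runs:
        # no run to reuse in the first paragraph: fall back to the default setter
        cell.text = value
        return old

    # reuse the first run so its font properties carry over to the new text
    first_para = paragraphs[0]
    first_run = first_para.runs[0]
    first_run.text = value
    # blank every other run; emptied paragraphs stay in the cell, so
    # cell.text ends with one "\n" per leftover paragraph
    for run in first_para.runs[1:]:
        run.text = ""
    for para in paragraphs[1:]:
        for run in para.runs:
            run.text = ""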
    return old

PPTX

Same strategy against cell.text_frame.paragraphs[0].runs[0].
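
One way to share the strategy between both adapters is a small duck-typed helper that only relies on `.runs` and `.text`. This is a sketch, not existing code: the name `replace_text_keep_first_run` and its boolean return convention are hypothetical and not part of document_adapter.

```python
from typing import Any, Sequence


def replace_text_keep_first_run(paragraphs: Sequence[Any], value: str) -> bool:
    """Write value into the first run of the first paragraph and blank the rest.

    Returns False without modifying anything when there is no run to reuse,
    so the caller can fall back to ``cell.text = value``.
    """
    if not paragraphs:
        return False
    first_runs = list(paragraphs[0].runs)
    if not first_runs:
        return False

    # The first run keeps its font properties; only its text changes.
    first_runs[0].text = value
    for run in first_runs[1:]:
        run.text = ""
    for para in paragraphs[1:]:
        for run in para.runs:
            run.text = ""
    return True


# Hypothetical wiring inside the adapters:
#   DOCX:  if not replace_text_keep_first_run(cell.paragraphs, value):
#              cell.text = value
#   PPTX:  if not replace_text_keep_first_run(cell.text_frame.paragraphs, value):
#              cell.text = value
```

Returning a flag instead of calling the setter inside the helper keeps it independent of either library, which also makes it testable with plain stub objects.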

append_row

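Replace per-cell cell.text = v with the same run-preserving helper. This only preserves formatting when the new row's cells already contain runs, for example when the row is created by copying a template row. python-docx's own Table.add_row() creates cells with a single empty paragraph and no runs, so for such rows the helper falls back to the default setter.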

Acceptance criteria

  • DocxAdapter.set_cell preserves font name/size/bold/italic/color of the first run when the cell had existing runs
  • PptxAdapter.set_cell preserves the same run-level attributes
  • DocxAdapter.append_row does not drop formatting on newly added rows
  • Empty-cell behavior unchanged (falls back to default run)
  • New smoke test in tests/test_smoke.py asserting preserved font.size / font.name after set_cell

Related

  • HwpxAdapter.set_cell — already uses the correct pattern, keep as the reference implementation.


Labels: bug, formatting
