Skip to content

feat(slides): Slide.duplicate — slide CRUD Phase 2 (closes upstream #132 in fork)#34

Merged
MHoroszowski merged 1 commit into
feature/slide-crud-phase1from
feature/slide-crud-phase2
May 7, 2026
Merged

feat(slides): Slide.duplicate — slide CRUD Phase 2 (closes upstream #132 in fork)#34
MHoroszowski merged 1 commit into
feature/slide-crud-phase1from
feature/slide-crud-phase2

Conversation

@MHoroszowski
Copy link
Copy Markdown
Owner

Summary

Phase 2 of slide CRUD epic (#11)Slide.duplicate / Slides.duplicate. Closes the 12-year-old most-commented upstream feature request scanny/python-pptx#132 in this fork.

Stacked on top of #33. This PR's base branch is feature/slide-crud-phase1. After #33 merges to master, GitHub will auto-update this PR's diff to show only the Phase 2 changes against master, and merging then becomes a straight fast-forward.

API

  • Slides.duplicate(slide, index=None) -> Slide — collection-level operation. index=None defaults to source_index + 1 (immediately after the source — duplicates are most often siblings, not tail-appends). Raises ValueError if slide is not a member of the collection; IndexError if index is out of range.
  • Slide.duplicate(index=None) -> Slide — convenience alias delegating to Slides.duplicate(self, index). Mirrors Slide.delete()Slides.remove(self) from Phase 1.

Design — three load-bearing decisions

1. Namespace-filtered rId substitution (not local-name enumeration)

After copy.deepcopy of <p:sld>, the duplicate still references the source's rIds. The remap pass walks every descendant via lxml.iter() and substitutes any attribute whose name is in the OOXML relationships namespace (http://schemas.openxmlformats.org/officeDocument/2006/relationships). One pass catches r:id, r:embed, r:link, r:pict, r:href — and any future attribute Microsoft adds to the same namespace.

Filtering by local-name (the approach in the widely-cited Stack Overflow recipe) silently misses r:link on linked images — the most common real-world failure mode of community recipes.

A defensive test (DescribeSlideDuplicate_RIdRemap::it_resolves_every_rId_reference_in_the_duplicate_xml) asserts that every rels-namespace attribute on the duplicate resolves to a rel on the new slide part. Cheap, catches every "I missed an attribute" regression.

2. Embedded xlsx is binary, not XML

ChartPart deep-copy uses copy.deepcopy on the chart _element, but the embedded Microsoft_Excel_Worksheet1.xlsx (reached via RT.PACKAGE) is itself an OPC package — opaque bytes from python-pptx's perspective. It must be blob-copied, not XML-deepcopied. The chart <c:numCache> values mirror what's in the xlsx; copy chart XML as-is, copy the xlsx blob as-is, remap the rId between them — done.

The helper _duplicate_blob_part encapsulates this for any binary part (embeddedpackage, oleObject).

3. Notes-slide back-rel rewire (community gotcha)

A NotesSlidePart back-references its parent slide via RT.SLIDE. If you blindly carry that rel through, the duplicate's notes-slide back-references the source slide, not the duplicate — PowerPoint accepts the file but Save-Notes-As-PDF and outline-view actions misbehave. This is the root cause of upstream issue scanny/python-pptx#961.

The implementation drops RT.NOTES_SLIDE from the slide-rels walk and creates a fresh NotesSlidePart at the Slides.duplicate level (helper duplicate_notes_slide_for) with the back-rel rewired to point at the new slide.

What's shared vs deep-copied

Reltype Treatment Why
RT.IMAGE, RT.MEDIA, RT.VIDEO, RT.AUDIO Shared via package-level dedup Same picture on two slides was always one part — duplicate respects that
RT.SLIDE_LAYOUT, RT.SLIDE_MASTER, RT.THEME Shared Layout/master are owned by the deck, not the slide
RT.HYPERLINK (external) Shared target_ref External URLs are just strings
RT.CHART Deep-copied Chart XML + embedded xlsx are slide-private payload
RT.OLE_OBJECT, RT.PACKAGE Deep-copied (blob-copy) Embedded files are slide-private payload
RT.NOTES_SLIDE Deep-copied with back-rel rewire Per gotcha scanny#961
RT.COMMENTS, RT.COMMENT_AUTHORS Dropped Out of scope for Phase 2; explicit test documents the drop

Out of scope (deferred per epic split)

  • Presentation.append_from(other_pres, slide_indexes=None) — Phase 3 (cross-deck rels resolution)
  • Presentation.sections (<p14:sectionLst>) — Phase 4
  • Append-mode open — overlaps with Performance & Scale epic
  • Comments parts (RT.COMMENTS / RT.COMMENT_AUTHORS) — explicitly dropped by _DUP_DROP_RELTYPES_SLIDE filter; revisit when the comments API surface lands

Tests

  • 39 new pytest cases (35 declared functions; 4 added by parametrize) in tests/test_slide_duplicate.py across 7 describe classes:
    • DescribeSlides_Duplicate — API surface, signature, raises, index variants
    • DescribeSlide_Duplicate — convenience alias delegation
    • DescribeSlideDuplicate_PartGraph — new partname, new rId, layout shared, modification isolation, XML equivalence
    • DescribeSlideDuplicate_ImageDedup — package-level dedup invariant + round-trip with picture
    • DescribeSlideDuplicate_NotesSlide — own copy, content carried, edits isolated, no-notes case, back-rel rewired, round-trip
    • DescribeSlideDuplicate_RIdRemap — defensive XPath check on every rels-namespace attribute
    • DescribeSlideDuplicate_RoundTrip — open → duplicate → save → reopen across all paths
  • 4 new behave scenarios in features/sld-duplicate.feature — default-index, explicit-index, alias, IndexError.
$ python3 -m pytest tests/ -q
3125 passed in 4.02s

$ python3 -m ruff format src tests
203 files left unchanged

$ python3 -m ruff check src tests
All checks passed!

$ python3 -m behave features/ --no-color
56 features passed, 0 failed, 0 skipped
990 scenarios passed, 0 failed, 0 skipped

UAT

uat_slide_duplicate.py at the repo root builds a 3-slide deck with a textbox, a picture, and speaker notes; exercises both Slides.duplicate (default + explicit index) and Slide.duplicate (alias); round-trips through save/reopen; prints the structural summary. Output shows image_part_count = 2 before AND after duplicate (dedup invariant holds), slide_part_count = 6 (3 originals + 3 duplicates), notes correctly carried only to duplicates of slides that originally had them. Maintainer signoff received.

Known coverage gaps (deferred)

ISC-19 (chart copy live probe) and ISC-20 (OLE-object copy live probe) are marked [DEFERRED-VERIFY] in the working ISA — the code paths exist (_duplicate_chart_part, _duplicate_blob_part) and are exercised by the rels walk, but no Phase 2 fixture seeds a slide with a chart shape or an OLE object. Follow-up: add tests/test_slide_duplicate_chart.py when a fixture exercising add_chart(...) ships, or as part of Phase 3 / a dedicated 2.5 PR.

Closes the duplicate sub-task of #11. Refs #11.

…e 2)

Closes the Phase 2 sub-task of the slide-CRUD epic (#11) and the
12-year-old most-commented upstream feature request scanny#132
in this fork.

API
---

- `Slides.duplicate(slide, index=None)` — collection-level operation,
  returns a `Slide`. `index=None` defaults to `source_index + 1`
  (immediately after the source). Raises `ValueError` if `slide` is
  not a member, `IndexError` if `index` is out of range.
- `Slide.duplicate(index=None)` — convenience alias delegating to
  `Slides.duplicate(self, index)`. Mirrors `Slide.delete()` →
  `Slides.remove(self)` from Phase 1.

Design
------

The implementation walks the source slide's rels and decides per
reltype whether to share the target part (image, media, slide-layout,
slide-master, theme — package-level dedup) or deep-copy it (chart,
OLE-object, embedded-package, notes-slide). For chart parts the chart
XML is `copy.deepcopy`'d but the embedded xlsx workbook is *blob-
copied* — the workbook is the chart's data and `<c:numCache>` values
in the chart XML mirror it; deepcopy of XML would not capture the
binary payload.

After deepcopy, the duplicated `<p:sld>` still references the source's
`rId`s. We walk every descendant element via `lxml.iter()` and
substitute any attribute whose name is in the OOXML relationships
namespace — catches `r:id`, `r:embed`, `r:link`, `r:pict`, `r:href`
in one pass without enumerating local-names (the local-name approach
silently misses `r:link` on linked images, which is the most common
real-world failure mode of community recipes).

Notes-slide handling addresses upstream gotcha scanny#961: a NotesSlidePart
back-references its parent slide via `RT.SLIDE`. The duplicate's
notes-slide MUST point at the new slide, not the source. The new
helper `duplicate_notes_slide_for(src, new)` rewires the back-rel
and is invoked AFTER the new slide part is registered in presentation
rels.

Comments parts (`RT.COMMENTS` / `RT.COMMENT_AUTHORS`) are dropped on
duplicate by an explicit reltype filter — comments support is out of
scope for Phase 2 of #11. A test documents the drop so the policy
doesn't silently change.

Test coverage
-------------

- `tests/test_slide_duplicate.py` — 39 new tests across 7 describe
  classes covering the API surface (existence, signature, raises,
  index variants, alias delegation), the part-graph invariants (new
  unique partname / rId / sldId, layout-part shared with source,
  modification isolation, image-dedup invariant unchanged), the
  notes-slide path (own copy, back-ref rewired, isolation), the
  defensive XPath `r:*` check (no unmapped rIds), and round-trip
  through save/reopen.
- `features/sld-duplicate.feature` + 7 new step impls — 4 acceptance
  scenarios for default-index, explicit-index, alias, and IndexError
  paths.

Verification
------------

```
$ python3 -m pytest tests/ -q
3125 passed in 4.02s
$ python3 -m ruff format src tests
203 files left unchanged
$ python3 -m ruff check src tests
All checks passed!
$ python3 -m behave features/ --no-color
990 scenarios passed, 0 failed, 0 skipped
```

UAT script `uat_slide_duplicate.py` (untracked at repo root per
CLAUDE.md §6): builds a 3-slide deck with a textbox, a picture, and
speaker notes; exercises both `Slides.duplicate` and
`Slide.duplicate`; round-trips; prints structural summary
(image_part_count = 2 before AND after — dedup invariant holds).

Out of scope (deferred per epic split):
- `Presentation.append_from` — Phase 3 (cross-deck rels).
- `Presentation.sections` — Phase 4 (`<p14:sectionLst>`).

Refs #11.
@MHoroszowski MHoroszowski merged commit bcb004e into feature/slide-crud-phase1 May 7, 2026
6 checks passed
@MHoroszowski MHoroszowski deleted the feature/slide-crud-phase2 branch May 7, 2026 22:34
MHoroszowski added a commit that referenced this pull request May 7, 2026
* feat(slides): add slide CRUD Phase 1 — remove, move, indexed add_slide

First slice of issue #11 (slide CRUD epic). Lands the in-deck CRUD
operations users have been asking for since 2014; full duplicate(),
cross-deck append_from(), and Sections support stay deferred to
Phases 2–4 in the epic.

New public API
--------------
- `Slides.add_slide(slide_layout, index=None)` — `index=None`
  preserves existing append behavior (backwards compatible). Integer
  `index` inserts at that zero-based position; `index=len(self)`
  is a valid append, negatives raise `IndexError`.
- `Slides.move(slide, new_index)` — reorder a slide in the
  presentation's slide sequence. Raises `ValueError` for foreign
  slides, `IndexError` for out-of-range `new_index`.
- `Slides.remove(slide)` — drop a slide from the collection. The
  presentation→slide relationship is removed; the underlying part
  falls out on save. Image and other media parts shared with
  surviving slides are preserved (covered by integration test).
- `Slide.delete()` — convenience alias for
  `prs.slides.remove(self)`.

API shape mirrors the existing `SlideLayouts.remove(slide_layout)`
shape rather than upstream PR scanny#1029's
`remove_slide(slide_id)` — instance-keyed is more Pythonic and
matches the rest of the library.

Internal additions
------------------
- `CT_SlideIdList.insert_sldId_at(rId, idx)` — index-aware sldId
  insertion, with `idx == len` allowed.
- `CT_SlideIdList.move_sldId_to(sldId, new_idx)` — reposition an
  existing sldId; no-op when the index already matches.
- `CT_SlideIdList.remove_sldId(sldId)` — bounded remove with a
  parent-of check.

Test coverage
-------------
- 32 new unit tests in `tests/test_slide_crud.py` covering oxml
  helpers, the Slides CRUD methods, the Slide.delete alias, and four
  full-pipeline round-trip integration tests (add-at-index, remove,
  delete, move). Plus a shared-image-preservation round-trip and a
  `index=len(slides)` boundary test added per advisor review.
- 5 new behave scenarios in `features/sld-crud.feature` exercising
  add-at-head, add-in-middle, move, remove, and Slide.delete.
- New UAT script (untracked) at `uat_slide_crud.py` builds a deck
  exercising every new entry-point, saves, reopens, and prints a
  per-slide read-back so the maintainer can compare in PowerPoint /
  Keynote per CLAUDE.md §6.

Tangential test-isolation fix
-----------------------------
`tests/opc/test_package.py:DescribePartFactory` mutates
`PartFactory.part_type_for[CT.PML_SLIDE]` to a class_mock and
never reverts it. The mock leaked into any later test that loaded
slides through `Presentation()` — caught here by the new
round-trip integration tests. Restored via a request finalizer;
isolated to that one test, no behavior change for the production
code path.

Verification
------------
```
$ python3 -m pytest tests/ -q | tail -3
3086 passed in 3.76s

$ ruff check src tests | tail -3
All checks passed!

$ python3 -m behave features/ --no-color | tail -3
986 scenarios passed, 0 failed, 0 skipped
2952 steps passed, 0 failed, 0 skipped
```

Refs #11

* feat(slides): add Slide.duplicate / Slides.duplicate (slide-CRUD Phase 2) (#34)

Closes the Phase 2 sub-task of the slide-CRUD epic (#11) and the
12-year-old most-commented upstream feature request scanny#132
in this fork.

API
---

- `Slides.duplicate(slide, index=None)` — collection-level operation,
  returns a `Slide`. `index=None` defaults to `source_index + 1`
  (immediately after the source). Raises `ValueError` if `slide` is
  not a member, `IndexError` if `index` is out of range.
- `Slide.duplicate(index=None)` — convenience alias delegating to
  `Slides.duplicate(self, index)`. Mirrors `Slide.delete()` →
  `Slides.remove(self)` from Phase 1.

Design
------

The implementation walks the source slide's rels and decides per
reltype whether to share the target part (image, media, slide-layout,
slide-master, theme — package-level dedup) or deep-copy it (chart,
OLE-object, embedded-package, notes-slide). For chart parts the chart
XML is `copy.deepcopy`'d but the embedded xlsx workbook is *blob-
copied* — the workbook is the chart's data and `<c:numCache>` values
in the chart XML mirror it; deepcopy of XML would not capture the
binary payload.

After deepcopy, the duplicated `<p:sld>` still references the source's
`rId`s. We walk every descendant element via `lxml.iter()` and
substitute any attribute whose name is in the OOXML relationships
namespace — catches `r:id`, `r:embed`, `r:link`, `r:pict`, `r:href`
in one pass without enumerating local-names (the local-name approach
silently misses `r:link` on linked images, which is the most common
real-world failure mode of community recipes).

Notes-slide handling addresses upstream gotcha scanny#961: a NotesSlidePart
back-references its parent slide via `RT.SLIDE`. The duplicate's
notes-slide MUST point at the new slide, not the source. The new
helper `duplicate_notes_slide_for(src, new)` rewires the back-rel
and is invoked AFTER the new slide part is registered in presentation
rels.

Comments parts (`RT.COMMENTS` / `RT.COMMENT_AUTHORS`) are dropped on
duplicate by an explicit reltype filter — comments support is out of
scope for Phase 2 of #11. A test documents the drop so the policy
doesn't silently change.

Test coverage
-------------

- `tests/test_slide_duplicate.py` — 39 new tests across 7 describe
  classes covering the API surface (existence, signature, raises,
  index variants, alias delegation), the part-graph invariants (new
  unique partname / rId / sldId, layout-part shared with source,
  modification isolation, image-dedup invariant unchanged), the
  notes-slide path (own copy, back-ref rewired, isolation), the
  defensive XPath `r:*` check (no unmapped rIds), and round-trip
  through save/reopen.
- `features/sld-duplicate.feature` + 7 new step impls — 4 acceptance
  scenarios for default-index, explicit-index, alias, and IndexError
  paths.

Verification
------------

```
$ python3 -m pytest tests/ -q
3125 passed in 4.02s
$ python3 -m ruff format src tests
203 files left unchanged
$ python3 -m ruff check src tests
All checks passed!
$ python3 -m behave features/ --no-color
990 scenarios passed, 0 failed, 0 skipped
```

UAT script `uat_slide_duplicate.py` (untracked at repo root per
CLAUDE.md §6): builds a 3-slide deck with a textbox, a picture, and
speaker notes; exercises both `Slides.duplicate` and
`Slide.duplicate`; round-trips; prints structural summary
(image_part_count = 2 before AND after — dedup invariant holds).

Out of scope (deferred per epic split):
- `Presentation.append_from` — Phase 3 (cross-deck rels).
- `Presentation.sections` — Phase 4 (`<p14:sectionLst>`).

Refs #11.

Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>

---------

Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>
MHoroszowski added a commit that referenced this pull request May 8, 2026
) (#36)

Phase 4 of issue #11 (slide CRUD epic). Implements PowerPoint Sections
support — read, write, and round-trip the
`p:extLst/p:ext{uri=521415D9-…}/p14:sectionLst` extension that
PowerPoint uses to organize slides into named groups in the slide
pane. The fork's slide CRUD epic is now complete: Phase 1 (#33),
Phase 2 (#34), Phase 3 (#35), and Phase 4 (this PR) collectively
shipped duplicate / delete / reorder / copy-between-decks / sections
across the originally listed sub-features.

New public API
--------------
- `Presentation.sections` → `_Sections` collection.
- `_Sections` is sequence-like: `__len__`, `__iter__`,
  `__getitem__`, `index()`. Adds:
  * `add_section(name, after=None) -> Section` — append, or insert
    immediately after an existing section when `after` is given.
  * `remove(section)` — drop section; cleans up the wrapping
    `p14:sectionLst` / `p:ext` / `p:extLst` chain when the last
    section goes.
- `Section` exposes:
  * `name` (read/write str)
  * `id` (read-only GUID-with-braces, e.g. `{ABC-…}`)
  * `slides` (tuple of |Slide|, in section order)
  * `add_slide(slide)` — assign or move slide into this section
    (a slide can belong to at most one section)
  * `remove_slide(slide)` — drop a slide's section assignment;
    slide remains in the presentation.

Membership invariants
---------------------
Section membership references slides by their stable
`p:sldId/@id` integer (NOT by `r:id` and NOT by position), so
`Slides.move(...)`, indexed `add_slide(...)`, and `Slides.remove(...)`
preserve assignment without any extra plumbing. Removed slides
become orphan ids in the section's `p14:sldIdLst`; `Section.slides`
silently skips them on read but the XML round-trips them untouched
(deliberate — matching python-pptx's "preserve foreign data"
doctrine).

PowerPoint compatibility
------------------------
- Empty sections emit `<p14:sldIdLst/>` so all PowerPoint versions
  treat them consistently (some interpret an omitted `sldIdLst`
  as "all unsectioned slides," which is not what we mean).
- Section ids generated as `{<UPPER-CASE-GUID>}` matching
  PowerPoint's wire shape.
- Foreign `p:extLst/p:ext` siblings (`p15:*`, modification
  tracking, etc.) round-trip untouched — the section ext lives
  alongside, not in place of.

Internal additions
------------------
- New `pptx/sections.py` module hosting `Section` and
  `_Sections` proxy classes.
- `pptx/oxml/presentation.py` extended with
  `CT_PresentationExtensionList`, `CT_PresentationExtension`,
  `CT_SectionList`, `CT_Section`, `CT_SectionSlideIdList`,
  `CT_SectionSlideId`, plus `SECTION_LIST_EXT_URI` constant and
  `CT_Presentation.{section_list, get_or_add_section_list,
  remove_section_list}` helpers.
- `pptx/oxml/ns.py` registers the `p14` prefix
  (`http://schemas.microsoft.com/office/powerpoint/2010/main`).
- `pptx/oxml/__init__.py` registers the new element classes.
- `p:extLst` added as a successor entry on the existing
  ZeroOrOne `sldMasterIdLst`/`sldIdLst`/`sldSz` slots so
  insert ordering is correct.

Tangential test-infra fix
-------------------------
`tests/unitutil/cxml.py` namespace-prefix grammar widened from
`Word(alphas)` to `Word(alphas, alphanums)` so test fixtures
can address `p14:`, `w14:`, `o15:`, etc. Local-name grammar
already supported alphanums.

Test coverage
-------------
- 56 new unit tests in `tests/test_sections.py` covering oxml
  helpers, the `Section`/`_Sections` proxies, and 8 round-trip
  integration tests (no-sections baseline, names + membership,
  GUID preservation, membership-survives-move, removal pruning,
  orphan preservation, empty-section, unicode/XML-special name,
  unsectioned-on-section-remove, sibling-ext preservation).
- 5 new behave scenarios in `features/sld-sections.feature`
  (default empty, add, slide membership, move-preserves-membership,
  remove cleans up extLst).
- New `uat_slide_sections.py` (untracked per repo §6) builds a
  6-slide / 3-section deck, demonstrates membership preservation
  through a slide move, and prints a structural read-back.

Verification
------------
```
$ python3 -m pytest tests/ -q | tail -3
3222 passed in 4.33s

$ ruff check src tests | tail -3
All checks passed!

$ python3 -m behave features/ --no-color | tail -3
999 scenarios passed, 0 failed, 0 skipped
3000 steps passed, 0 failed, 0 skipped
```

Closes #11

Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant