
Removing video results in corrupted file. #974

Open
shoang22 opened this issue May 8, 2024 · 6 comments

@shoang22

shoang22 commented May 8, 2024

Hello,

I'm trying to remove all movies on each slide with the following:

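import pptx
from pptx.shapes.picture import Movie


def remove_movie(file_path: str) -> None:
    prs = pptx.Presentation(file_path)
    for slide in prs.slides:
        for shape in slide.shapes:
            if isinstance(shape, Movie):
                # Removes only the movie's <p:pic> element; the slide's
                # relationships to the media file are left in place.
                vid = shape._element
                vid.getparent().remove(vid)
    prs.save(file_path.rpartition(".")[0] + "_no_movies.pptx")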

The code executes successfully, but when I try to open the output file, I get the following error:

PowerPoint found a problem with content in blank_presentation_no_movies.pptx.
PowerPoint can attempt to repair the presentation.

If you trust the source of this presentation, click Repair.

Is there something that I'm doing wrong?

@MartinPacker

There's probably rather more to deleting a movie than removing a chunk of XML.

@scanny
Owner

scanny commented May 8, 2024

@shoang22 you're going to want to remove the relationship from the slide (package) part to the part containing the movie (a Media part, maybe?) as well. Otherwise I expect PowerPoint isn't going to like seeing the orphaned movie. I'm not sure that's the whole problem; unfortunately the repair error doesn't give us any idea of what it considers a "problem with content".
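To make the "slide part to media part" direction concrete: in an OPC package, each part's outgoing relationships live in a .rels file beside it, and each relationship Target is resolved relative to the source part. A minimal stdlib sketch, assuming a slide at the hypothetical partname /ppt/slides/slide1.xml (both helper names are made up for illustration, not python-pptx APIs):

```python
import posixpath


def rels_partname_for(partname: str) -> str:
    """Partname of the .rels file that holds partname's outgoing relationships."""
    directory, filename = posixpath.split(partname)
    return posixpath.join(directory, "_rels", filename + ".rels")


def resolve_target(source_partname: str, target: str) -> str:
    """Resolve a relative relationship Target against the source part's directory."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_partname), target))


slide = "/ppt/slides/slide1.xml"  # hypothetical slide partname
print(rels_partname_for(slide))                      # /ppt/slides/_rels/slide1.xml.rels
print(resolve_target(slide, "../media/media1.mp4"))  # /ppt/media/media1.mp4
```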

@MartinPacker

MartinPacker commented May 8, 2024

And what would be the strategy for doing this in Python code? I ask, @scanny, because this logic probably applies to other kinds of removal too.

@scanny
Owner

scanny commented May 8, 2024

Basically, dig out the relationship and delete it.

The relationship(s) would be identified by an r:embed or r:link attribute whose value is "rId{N}", I believe; dumping the XML for the movie shape would give you an idea.
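A rough way to see which rIds a movie shape uses is to collect every attribute in the r: namespace from its XML. The sample below is an abbreviated, hand-written approximation of a video p:pic element (the rId values are made up), and referenced_rids() is a hypothetical helper, not a python-pptx API:

```python
import xml.etree.ElementTree as ET

# Namespace bound to the r: prefix (r:embed, r:link, r:id, ...).
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

SAMPLE_MOVIE_XML = """<p:pic xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
       xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
       xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
       xmlns:p14="http://schemas.microsoft.com/office/powerpoint/2010/main">
  <p:nvPicPr>
    <p:cNvPr id="4" name="media1">
      <a:hlinkClick r:id="" action="ppaction://media"/>
    </p:cNvPr>
    <p:cNvPicPr/>
    <p:nvPr>
      <a:videoFile r:link="rId2"/>
      <p:extLst>
        <p:ext uri="{DAA4B4D4-6D71-4841-9C94-3DE7FCFB9230}">
          <p14:media r:embed="rId1"/>
        </p:ext>
      </p:extLst>
    </p:nvPr>
  </p:nvPicPr>
  <p:blipFill><a:blip r:embed="rId5"/></p:blipFill>
</p:pic>"""


def referenced_rids(shape_xml: str) -> set:
    """Collect every non-empty r:* attribute value (i.e. rIds) in a shape's XML."""
    prefix = "{" + R_NS + "}"  # ElementTree spells namespaced names as "{uri}local"
    root = ET.fromstring(shape_xml)
    return {
        value
        for elem in root.iter()
        for name, value in elem.attrib.items()
        if name.startswith(prefix) and value  # skip empty values such as r:id=""
    }


print(sorted(referenced_rids(SAMPLE_MOVIE_XML)))  # ['rId1', 'rId2', 'rId5']
```

With python-pptx, something like referenced_rids(etree.tostring(shape._element).decode()) (using lxml's etree) should feed in a real shape; that call is untested here.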

Then you need to get to the slide part because that's the source side of the relationship, so something like:

slide_part = slide.part
# in python-pptx, drop_rel() is a method of the part itself, not of part.rels
slide_part.drop_rel("rIdN")

Somebody can dig through and refine that with actual code if they have a mind to :)
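The idea behind dropping the relationship can be sketched without python-pptx by working on raw XML strings: remove a Relationship from the slide's .rels only when no r:* attribute left in the slide XML still uses that rId. The function names and sample XML are hypothetical; this is a stdlib illustration, not python-pptx's own implementation:

```python
import xml.etree.ElementTree as ET

# r:* attributes in slide XML, and the default namespace of .rels files.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def ref_count(part_xml: str, rid: str) -> int:
    """Count r:* attributes (r:id, r:embed, r:link, ...) in part_xml whose value is rid."""
    prefix = "{" + R_NS + "}"
    root = ET.fromstring(part_xml)
    return sum(
        1
        for elem in root.iter()
        for name, value in elem.attrib.items()
        if name.startswith(prefix) and value == rid
    )


def drop_rel_if_unreferenced(part_xml: str, rels_xml: str, rid: str) -> str:
    """Return rels_xml without relationship rid, unless part_xml still references it.

    part_xml is assumed to be the slide XML *after* the movie element was removed.
    """
    if ref_count(part_xml, rid) > 0:
        return rels_xml  # something else on the slide still uses it; keep it
    ET.register_namespace("", PKG_REL_NS)  # serialize as xmlns="...", not ns0:
    rels = ET.fromstring(rels_xml)
    for rel in list(rels):
        if rel.get("Id") == rid:
            rels.remove(rel)
    return ET.tostring(rels, encoding="unicode")


# Hypothetical slide XML after the movie's <p:pic> was removed: only an image remains.
SLIDE_XML = (
    '<p:sld xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"'
    ' xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"'
    ' xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    '<p:cSld><p:spTree><p:pic><p:blipFill><a:blip r:embed="rId6"/></p:blipFill></p:pic>'
    "</p:spTree></p:cSld></p:sld>"
)
RELS_XML = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/video" Target="../media/media1.mp4"/>'
    '<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image2.jpeg"/>'
    "</Relationships>"
)

print(drop_rel_if_unreferenced(SLIDE_XML, RELS_XML, "rId2"))  # rId2 removed
print(drop_rel_if_unreferenced(SLIDE_XML, RELS_XML, "rId6"))  # unchanged: still referenced
```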

@shoang22
Author

shoang22 commented May 9, 2024

Somebody can dig through and refine that with actual code if they have a mind to :)

Something like this?

import pptx
from lxml import etree
from pptx.shapes.picture import Movie


def remove_movie(file_path: str) -> None:
    prs = pptx.Presentation(file_path)
    for slide in prs.slides:
        for shape in slide.shapes:
            if isinstance(shape, Movie):
                p = slide.part
                # Show the slide's relationships before the change.
                before = etree.tostring(etree.fromstring(p.rels.xml), pretty_print=True)
                print(before.decode())
                # Remove the movie's <p:pic> element from the shape tree.
                vid = shape.element
                vid.getparent().remove(vid)
                # Drop the video relationship; "rId2" is hard-coded for this particular file.
                p.rels.pop("rId2")
                # Show the slide's relationships after the change.
                after = etree.tostring(etree.fromstring(p.rels.xml), pretty_print=True)
                print(after.decode())

    prs.save(file_path.rpartition(".")[0] + "_no_movies.pptx")

Prints:

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2007/relationships/media" Target="../media/media1.mp4"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/video" Target="../media/media1.mp4"/>
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout1.xml"/>
  <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesSlide" Target="../notesSlides/notesSlide1.xml"/>
  <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.png"/>
  <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image2.jpeg"/>
</Relationships>

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2007/relationships/media" Target="../media/media1.mp4"/>
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout1.xml"/>
  <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesSlide" Target="../notesSlides/notesSlide1.xml"/>
  <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.png"/>
  <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image2.jpeg"/>
</Relationships>

But I'm still getting the same error when I attempt to open the file.
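Comparing the two listings, only rId2 (the video relationship) was removed; rId1, whose Type is http://schemas.microsoft.com/office/2007/relationships/media, still targets ../media/media1.mp4. This thread doesn't establish whether that leftover is what PowerPoint objects to, but it is easy to check for. A stdlib sketch (rels_targeting() is a hypothetical helper) that lists every relationship still pointing at the movie file, using lines from the "after" output above:

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for elements in a .rels file.
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def rels_targeting(rels_xml: str, target: str) -> list:
    """Return the Id of every relationship whose Target equals target."""
    root = ET.fromstring(rels_xml)
    return [
        rel.get("Id")
        for rel in root.iter(REL + "Relationship")
        if rel.get("Target") == target
    ]


# Part of the "after" listing printed above.
AFTER = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2007/relationships/media" Target="../media/media1.mp4"/>
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout1.xml"/>
  <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.png"/>
</Relationships>"""

print(rels_targeting(AFTER, "../media/media1.mp4"))  # ['rId1']
```

An empty result would mean nothing in this slide's .rels still refers to the movie file. Note also that the hard-coded "rId2" only matches files numbered the way this one is.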

@scanny
Owner

scanny commented May 9, 2024

Okay, so a couple of possible approaches:

  1. Do the repair and save it to a separate file. Then compare the XML from the original with the repaired version to see how PowerPoint "fixes" the presentation.
  2. Extract the original PowerPoint file to a directory ($ unzip original.pptx). Then make the changes by hand, re-zip the presentation into a PPTX file, and keep trying things until it works.
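Approach 1 can be partly scripted with the standard library: open both files as zip archives, list parts that exist in only one of them, and diff the pretty-printed XML of parts that differ. This is a rough stand-in for opc-diag's diff subcommand, not its actual behavior or output format; diff_pptx() is a hypothetical helper:

```python
import difflib
import zipfile
from xml.dom import minidom


def _pretty_lines(data: bytes) -> list:
    """Pretty-print one XML part and split it into lines for diffing."""
    return minidom.parseString(data).toprettyxml(indent="  ").splitlines()


def diff_pptx(original: str, repaired: str) -> list:
    """Report parts present in only one package, plus unified diffs of changed XML parts."""
    with zipfile.ZipFile(original) as a, zipfile.ZipFile(repaired) as b:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        report = [f"only in original: {n}" for n in sorted(names_a - names_b)]
        report += [f"only in repaired: {n}" for n in sorted(names_b - names_a)]
        for name in sorted(names_a & names_b):
            if not name.endswith((".xml", ".rels")):
                continue  # binary parts (media, images) are not diffed
            left, right = a.read(name), b.read(name)
            if left == right:
                continue
            report += difflib.unified_diff(
                _pretty_lines(left),
                _pretty_lines(right),
                fromfile=f"original/{name}",
                tofile=f"repaired/{name}",
                lineterm="",
            )
    return report
```

For example, print("\n".join(diff_pptx("blank_presentation_no_movies.pptx", "repaired.pptx"))) would show what PowerPoint changed during repair, assuming the repaired copy was saved as repaired.pptx.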

The opc-diag tool was built for this kind of exploration:

  • you'll need to install from the develop branch on GitHub for it to work with Python 3: https://github.com/python-openxml/opc-diag/commits/develop/. Pretty sure it's something like: pip install -U git+https://github.com/python-openxml/opc-diag.git@develop
  • documentation is here: https://opc-diag.readthedocs.io/en/latest/index.html
  • The diff, extract, and repackage subcommands are the most useful for this work. In particular, just unzipping a PPTX leaves each XML file's content on a single line, which of course is hard to edit; opc-diag automatically reformats it nicely for you.
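For the edit, repackage, try cycle without opc-diag, a similar loop can be sketched with zipfile and minidom. Assumptions: pretty-printing is acceptable for the parts you edit (it adds whitespace between elements and can alter whitespace in mixed content, so check text-heavy parts), and [Content_Types].xml is written first purely as a precaution. extract() and repackage() are hypothetical helpers, not opc-diag's commands:

```python
import os
import zipfile
from xml.dom import minidom


def extract(pptx_path: str, out_dir: str) -> None:
    """Unzip a PPTX into out_dir, pretty-printing XML parts so they can be hand-edited."""
    with zipfile.ZipFile(pptx_path) as z:
        for name in z.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to write
            dest = os.path.join(out_dir, *name.split("/"))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            data = z.read(name)
            if name.endswith((".xml", ".rels")):
                data = minidom.parseString(data).toprettyxml(indent="  ", encoding="UTF-8")
            with open(dest, "wb") as f:
                f.write(data)


def repackage(src_dir: str, pptx_path: str) -> None:
    """Zip an extracted package back up, with member names relative to src_dir."""
    names = []
    for root, _dirs, files in os.walk(src_dir):
        for filename in files:
            rel = os.path.relpath(os.path.join(root, filename), src_dir)
            names.append(rel.replace(os.sep, "/"))  # zip member names use "/"
    # [Content_Types].xml first (a precaution), then everything else alphabetically.
    names.sort(key=lambda n: (n != "[Content_Types].xml", n))
    with zipfile.ZipFile(pptx_path, "w", zipfile.ZIP_DEFLATED) as z:
        for name in names:
            z.write(os.path.join(src_dir, *name.split("/")), arcname=name)


# Hypothetical round trip:
# extract("original.pptx", "original_x")
# ... edit files under original_x/ by hand ...
# repackage("original_x", "edited.pptx")
```

repackage() writes member names relative to the extracted directory on purpose: zipping the directory itself puts every part under an extra top-level folder, so [Content_Types].xml is no longer at the package root and the result is not a valid package.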

You might want to mix these two approaches. The diff approach is good when you have no idea what changes are required; the edit->repackage->try cycle is best when you have a pretty good idea what changes to try.


3 participants