This repository has been archived by the owner on Dec 25, 2023. It is now read-only.

Suggestion: If No Invalid XML is Found, Try Opening the Non-Opening DOCX in WordPad #5

Closed
socrtwo opened this issue Aug 12, 2017 · 12 comments

@socrtwo

socrtwo commented Aug 12, 2017

After the zip repair, if Word Corrupt Doc Checker does not find any invalid XML, the tool could attempt to open the file in WordPad. I have found that, surprisingly, once the zip structure is fixed, many if not most repairable DOCX files will open in WordPad without any further repair. I know this is not a prestigious solution, but sometimes simple is best.

desjarlais self-assigned this Aug 12, 2017
@desjarlais
Owner

I've seen WordPad handle Word documents that Word doesn't. I think that's because WordPad is a limited client/reader of DOCX files: it ignores many aspects of the file, displays only the simpler data, and so gets to skip certain content.

Is the idea here to handle scenarios where no corruption is technically found but Word still won't open the file? That is, would offering the user a client that can at least read some of the data be better than leaving them with the impression that the file has no issues?

@socrtwo
Author

socrtwo commented Aug 13, 2017 via email

@desjarlais
Owner

I've already started implementing the Open XML SDK, and so far it works fairly well, so I should be able to get this type of behavior into the program. I have some additional testing and verification to work through, and then I'll push the changes to GitHub. Thanks for the report, but I'm closing this for now.

@socrtwo
Author

socrtwo commented Aug 14, 2017 via email

@desjarlais
Owner

I downloaded the SDK and added a reference to it. I then use it to open the file; if the open fails, there are still corrupt tags. The SDK works better than automating the client application for a scenario like this.

@socrtwo
Author

socrtwo commented Aug 14, 2017 via email

@desjarlais
Owner

I'll keep looking into this, because I see the same behavior: it still flagged the file as being correct. I thought the SDK validated the XML on open, so I'll need to do some research.

desjarlais reopened this Aug 14, 2017
@desjarlais
Owner

I see what I did wrong: I forgot to try pulling the actual contents from the document.xml file. I was just opening the zip container, which is going to work regardless. Pulling the content from document.xml is what tells us whether it still has bad tags. Fixed and pushed those changes.
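
For anyone following along, here is a rough sketch of that distinction. It uses Python's standard library rather than the Open XML SDK the tool actually uses, and the helper names `document_xml_is_well_formed` and `make_package` are made up for illustration. The point is only that a readable zip container says nothing about whether `word/document.xml` parses.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical helpers for illustration; not part of the tool in this thread.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_xml_is_well_formed(docx_bytes: bytes) -> bool:
    """Return True only if word/document.xml can actually be parsed."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        # Getting this far only proves the zip container is readable.
        raw = package.read("word/document.xml")
    try:
        ET.fromstring(raw)  # The real check: parse the XML part itself.
    except ET.ParseError:
        return False
    return True


def make_package(document_xml: str) -> bytes:
    """Build a tiny in-memory zip that mimics the DOCX layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


good = make_package(f'<w:document xmlns:w="{W}"><w:body/></w:document>')
# The closing tag for w:body is missing, so this part is malformed XML.
bad = make_package(f'<w:document xmlns:w="{W}"><w:body></w:document>')

print(document_xml_is_well_formed(good))
print(document_xml_is_well_formed(bad))
```

Both packages open as zip files without complaint; only the parse step separates them.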

@socrtwo
Author

socrtwo commented Aug 15, 2017 via email

@socrtwo
Author

socrtwo commented Aug 15, 2017 via email

@desjarlais
Owner

I don't think it removes any actual content. The XML elements in question are the AlternateContent (AC) blocks. Each AC block has multiple representations of the content, including a fallback. It is up to the reader/client to choose which version of the AC block to read. Removing the fallback just removes one "version" of the content.

The caveat here would be a file that had an AC block and ONLY a fallback. In that case, yes, the content would probably be deleted as well, but I have yet to see a corrupt file that had a bad fallback AND no other options in the AC block.
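
To make that concrete, here is a minimal sketch in Python (again, not the tool's actual code). The function name `strip_fallbacks` and the sample markup are assumptions for illustration; the `mc` namespace is the standard markup-compatibility one. It drops `mc:Fallback` only when the same AC block also has at least one `mc:Choice`, which sidesteps the fallback-only caveat.

```python
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def strip_fallbacks(xml_text: str) -> str:
    """Remove mc:Fallback from AC blocks that still have an mc:Choice."""
    root = ET.fromstring(xml_text)
    # Materialize the list first so removals do not disturb the traversal.
    for block in list(root.iter(f"{{{MC}}}AlternateContent")):
        if not block.findall(f"{{{MC}}}Choice"):
            # Fallback is the only copy of the content here, so keep it.
            continue
        for fallback in block.findall(f"{{{MC}}}Fallback"):
            block.remove(fallback)
    return ET.tostring(root, encoding="unicode")


# Illustrative markup only; real document.xml content is far more involved.
sample = (
    f'<root xmlns:mc="{MC}">'
    "<mc:AlternateContent>"
    '<mc:Choice Requires="wps"><new>shape</new></mc:Choice>'
    "<mc:Fallback><old>picture</old></mc:Fallback>"
    "</mc:AlternateContent>"
    "<mc:AlternateContent>"
    "<mc:Fallback><old>only copy</old></mc:Fallback>"
    "</mc:AlternateContent>"
    "</root>"
)

cleaned = strip_fallbacks(sample)
print(cleaned)
```

In the first block the Choice content survives and the fallback is dropped; in the second block, where the fallback is the only version, nothing is removed.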

@socrtwo
Author

socrtwo commented Aug 16, 2017 via email
