Suggestion: If No Invalid XML is Found, Try Opening the Non-Opening DOCX in WordPad #5

socrtwo · 2017-08-12T13:10:36Z

After zip repair, if Word Corrupt Doc Checker does not find any invalid XML, suggest that the user try opening the file in WordPad. Surprisingly, I have found that once the zip structure is fixed, many if not most repairable DOCX files open in WordPad without any further repair. I know this is not a prestigious solution, but sometimes simple is best.

desjarlais · 2017-08-12T23:01:43Z

I've seen WordPad open Word documents that Word can't. I think what sometimes happens is that WordPad is a limited DOCX reader: it ignores many parts of the file, displays only the simpler data, and skips content it doesn't support.

Is the idea to handle scenarios where no corruption is technically found but Word still won't open the file? In that case, giving the user the option to pass the file to a client that can read at least some of the data is better than leaving them with the impression that the file has no issues.

socrtwo · 2017-08-13T06:27:26Z

Yes, that's the idea. I suppose you could add a disclaimer to the message text that WordPad ignores some of the more complicated formatting Word supports. WordPad itself warns users when it is ignoring complex content. You could also note that the results are not ideal and that, for better results, users can try a commercial program, contact someone who repairs files manually (like you or me), or ask on a Microsoft forum (I have some links to threads where Word MVPs fix corrupt DOCX files manually for free). Best Wishes, Paul D Pruitt socrtwo@s2services.com (301) 493-4982 9006 Friars Rd. Bethesda, MD 20817-3320 - Have a manuscript lying around gathering dust? Let me help you self-publish it <socrtwo@s2services.com>.

desjarlais · 2017-08-14T04:12:34Z

I've already started integrating the Open XML SDK, and so far it works fairly well, so I should be able to get this type of behavior into the program. I have some additional testing and verification to work through, then I'll push the changes to GitHub. Thanks for the report; closing this for now.

socrtwo · 2017-08-14T04:58:42Z

OK, no problem. I'm not clear on how you are going to use the Open XML SDK. Can you explain a little further?

desjarlais · 2017-08-14T06:04:37Z

I downloaded the SDK and added a reference to it. I then use it to open the file; if opening fails, there are still corrupt tags. The SDK works better than automating the client application for a scenario like this.

socrtwo · 2017-08-14T07:59:34Z

Yes, I agree. I was checking my corrupt files with the Open XML SDK Tool from version 2.5, and it gave very detailed, useful information on about half of my fixable corrupt documents. Can you check with this file <https://drive.google.com/file/d/0B4rG1uoXTSmyZUQwemRFY2RRNEl3VzZLWV91RUtMaGlwR2Vj/view?usp=sharing> to make sure your changes are working correctly? I rebuilt the exe using your frmMain.cs, AssemblyInfo.cs and DocCorruptionChecker.cproj from 2 hours ago, with Open XML SDK 2.5 as a reference. It reported that the file was fixed correctly, but the file still won't open. It does open if the fallback tag removal box is checked.

desjarlais · 2017-08-14T15:26:15Z

I'll keep looking into this, because I see the same behavior: it still flagged the file as correct. I thought the SDK validated the XML on open, so I'll need to do some research.

desjarlais · 2017-08-14T15:38:42Z

I see what I did wrong: I forgot to actually read the contents of the document.xml part. I was only opening the zip container, which will succeed. Reading the content of document.xml is what tells us whether it still has bad tags. I fixed that and pushed the changes.
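
For anyone following along, here is a rough Python sketch of the same two-step idea (the program itself is C# on the Open XML SDK, so this is not its code). `check_docx` and the test helper `make_docx` are hypothetical names. Note it only checks that document.xml is well-formed XML; it does not do schema validation the way an Open XML validator would, and `make_docx` builds a bare zip that is not a complete DOCX (no [Content_Types].xml or relationships).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def check_docx(data: bytes) -> str:
    """Two-step check: open the zip container, then parse word/document.xml."""
    # Step 1: opening the container only proves the zip structure is readable.
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return "zip container is damaged"
    with archive:
        # Step 2: actually read and parse the main document part;
        # malformed tags only surface here.
        try:
            xml_bytes = archive.read("word/document.xml")
        except KeyError:
            return "word/document.xml is missing"
        try:
            ET.fromstring(xml_bytes)
        except ET.ParseError as err:
            return f"document.xml is not well-formed: {err}"
    return "no invalid XML found"


def make_docx(document_xml: str) -> bytes:
    """Hypothetical test helper: an in-memory zip holding only word/document.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", document_xml)
    return buf.getvalue()


good = make_docx(f'<w:document xmlns:w="{W_NS}"><w:body/></w:document>')
bad = make_docx(f'<w:document xmlns:w="{W_NS}"><w:body></w:document>')
print(check_docx(good))  # no invalid XML found
print(check_docx(bad))   # document.xml is not well-formed: ...
```

The `bad` case opens as a zip without complaint and only fails at the parse step, which is the behavior described above.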

socrtwo · 2017-08-15T01:07:24Z

OK, that worked. Cool.

socrtwo · 2017-08-15T01:19:15Z

Removing fallback tags removes some content, right? Would it be a good idea to reproduce the text and images removed by a fallback removal operation? For instance, with EXPO SISTEMAS.docx, aren't we removing an image and some text found in bad text box code? Perhaps the program could save the removed image to a path that is reported via lstOutput.Items.Add, and also output the removed text via lstOutput.Items.Add (without the XML tags, of course). You could then advise the user to re-add the content if desired, and explain how to do it without causing the same error.
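
A minimal Python sketch of the text half of this suggestion, assuming the removed content sits under `mc:Fallback` elements and its visible text is in `w:t` elements. `fallback_text` is a hypothetical name and `print` stands in for `lstOutput.Items.Add`; saving the images would additionally require resolving relationship IDs to the image parts, which is not shown.

```python
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def fallback_text(document_xml: str) -> list:
    """Collect the plain text inside each mc:Fallback block, without XML tags."""
    root = ET.fromstring(document_xml)
    texts = []
    for fallback in root.iter(f"{{{MC_NS}}}Fallback"):
        # w:t elements hold the visible text runs; join them in document order.
        joined = "".join(t.text or "" for t in fallback.iter(f"{{{W_NS}}}t")).strip()
        if joined:
            texts.append(joined)
    return texts


sample = (
    f'<w:document xmlns:w="{W_NS}" xmlns:mc="{MC_NS}"><w:body><w:p>'
    '<mc:AlternateContent>'
    '<mc:Choice Requires="wps"><w:r><w:t>Hello world</w:t></w:r></mc:Choice>'
    '<mc:Fallback><w:r><w:t>Hello </w:t><w:t>world</w:t></w:r></mc:Fallback>'
    '</mc:AlternateContent>'
    '</w:p></w:body></w:document>'
)
for line in fallback_text(sample):
    print("Removed text:", line)  # hypothetical stand-in for lstOutput.Items.Add
```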

desjarlais · 2017-08-15T05:16:30Z

I don't think it removes any actual content. The XML elements in question are the AlternateContent (AC) blocks. Each AC block holds multiple representations of the content, including a fallback. It is up to the reader/client to choose which version in the AC block to read. Removing the fallback removes just one "version" of the content.

The caveat would be a file with an AC block that has ONLY a fallback. In that case, yes, the content would probably be deleted as well, but I have yet to see a corrupt file that had a bad fallback AND no other options in the AC block.
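
A minimal Python sketch of that caveat, using `xml.etree.ElementTree` rather than the Open XML SDK the program uses: strip `mc:Fallback` only from AlternateContent blocks that still offer an `mc:Choice`, and leave fallback-only blocks alone so their content is not lost. `remove_fallbacks` is a hypothetical name, not the program's actual routine.

```python
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
AC = f"{{{MC_NS}}}AlternateContent"
CHOICE = f"{{{MC_NS}}}Choice"
FALLBACK = f"{{{MC_NS}}}Fallback"


def remove_fallbacks(elem: ET.Element) -> tuple:
    """Strip mc:Fallback only where the AlternateContent block still has an mc:Choice.

    Returns (removed, kept): fallback-only blocks are left alone, since removing
    their fallback would delete the content outright.
    """
    removed = kept = 0
    for child in list(elem):
        if child.tag == AC:
            has_choice = child.find(CHOICE) is not None
            for fb in child.findall(FALLBACK):
                if has_choice:
                    child.remove(fb)
                    removed += 1
                else:
                    kept += 1
        # Recurse into whatever remains (nested AlternateContent is possible).
        r, k = remove_fallbacks(child)
        removed += r
        kept += k
    return removed, kept


doc = (
    f'<root xmlns:mc="{MC_NS}">'
    '<mc:AlternateContent><mc:Choice Requires="x"><a/></mc:Choice>'
    '<mc:Fallback><b/></mc:Fallback></mc:AlternateContent>'
    '<mc:AlternateContent><mc:Fallback><c/></mc:Fallback></mc:AlternateContent>'
    '</root>'
)
root = ET.fromstring(doc)
print(remove_fallbacks(root))  # (1, 1)
```

Reporting the `kept` count gives the program a way to tell the user when a fallback was deliberately left in place instead of silently dropping content.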

socrtwo · 2017-08-16T20:46:51Z

OK, that's interesting. I didn't know how that worked. Thanks.

desjarlais self-assigned this Aug 12, 2017

desjarlais added the enhancement label Aug 12, 2017

desjarlais closed this as completed Aug 14, 2017

desjarlais reopened this Aug 14, 2017

desjarlais closed this as completed Aug 14, 2017
