Suggestion: If No Invalid XML is Found, Try Opening the Non-Opening DOCX in WordPad #5
I've seen WordPad open Word documents that Word itself won't. I think what sometimes happens is that WordPad is a limited DOCX reader: it ignores many aspects of the file, displays only the simpler content, and skips the rest. Is the idea here to handle scenarios where no corruption is technically found but Word still won't open the file? In other words, is giving the user the option to hand the file off to a client that will at least read some of the data better than leaving them with the impression that the file has no issues? |
Yes, that's the idea. I suppose you could add a disclaimer to the message text saying that WordPad will ignore some of the more complicated formatting that Word supports. WordPad itself warns users when it is ignoring complex content.
You could also point out that the results are not ideal and that, for better results, the user should try a commercial program, contact someone who does manual repairs (like you or me), or ask in a Microsoft forum. I have some links to threads where Word MVPs fix corrupt DOCX files manually for free.
Best Wishes,
Paul D Pruitt
socrtwo@s2services.com
(301) 493-4982
9006 Friars Rd.
Bethesda, MD 20817-3320
- Have a manuscript lying around gathering dust? Let me help you
self-publish it <socrtwo@s2services.com>.
|
I've already started integrating the Open XML SDK, and so far it works fairly well, so I should be able to get this type of behavior into the program. I have some additional testing and verification to work through, then I'll push the changes to GitHub. Thanks for the report, but I'm closing this for now. |
OK, no problem. I'm not clear on how you are going to use the Open XML SDK. Can you explain a little further?
Best Wishes,
Paul D Pruitt
|
I downloaded the SDK and added a reference to it. I then use it to open the file, and if that fails, there are still corrupt tags. The SDK works better than automating the client application for a scenario like this. |
Yes, I agree. I was checking my corrupt files with the Open XML SDK 2.5 tool, and it gave very detailed, useful information on about half of my fixable corrupt documents.
Can you check with this file
<https://drive.google.com/file/d/0B4rG1uoXTSmyZUQwemRFY2RRNEl3VzZLWV91RUtMaGlwR2Vj/view?usp=sharing>
to make sure your changes are working correctly? I rebuilt the exe using your frmMain.cs, AssemblyInfo.cs and DocCorruptionChecker.csproj from 2 hours ago, with Open XML SDK 2.5 as a reference. It reported that the file was fixed, but the file still won't open. It does open if the fallback tag removal box is checked.
Best Wishes,
Paul D Pruitt
|
I'll keep looking into this, because I see the same behavior: it still flags the file as correct. I thought the SDK validated the XML on open, so I'll need to do some research. |
I see what I did wrong: I forgot to try reading the actual contents of the document.xml file. I was only opening the zip container, which will succeed regardless. Reading the content of document.xml is what tells us whether it still has bad tags. Fixed and pushed those changes. |
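The difference between merely opening the zip container and actually reading document.xml can be sketched as follows. The program itself does this in C# with the Open XML SDK; this is only a rough Python analogue that uses the standard library (`zipfile` and `xml.etree.ElementTree`) in place of the SDK, and the function names are hypothetical. It detects malformed XML (broken or mismatched tags) only, not the schema-level problems the SDK's validator can report.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DOCUMENT_PART = "word/document.xml"  # main document part of a DOCX package


def container_opens(data: bytes) -> bool:
    """Checks only the zip container. This can succeed even when the
    document.xml inside it is broken (the 'just opening the zip' step)."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None  # None means every member passed its CRC check
    except zipfile.BadZipFile:
        return False


def document_xml_parses(data: bytes) -> bool:
    """Actually reads word/document.xml and tries to parse it."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            ET.fromstring(zf.read(DOCUMENT_PART))
        return True
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        return False  # not a zip, part missing, or malformed XML


def make_package(document_xml: str) -> bytes:
    """Builds a minimal in-memory package for trying the checks out."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(DOCUMENT_PART, document_xml)
    return buf.getvalue()
```

For a package whose document.xml contains `<w:body></w:document>` (mismatched tags), `container_opens` returns True while `document_xml_parses` returns False, which is exactly why the zip-only check kept reporting the file as fixed.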
OK, that worked. Cool.
Best Wishes,
Paul D Pruitt
|
Removing fallback tags removes some content, right? Do you think it would be a good idea to reproduce the text and images removed by a fallback removal operation?
For instance, with EXPO SISTEMAS.docx, aren't we removing an image and some text found in bad text box markup? Perhaps the program could save the removed image to a path that is reported with lstOutput.Items.Add, and also output the removed text with lstOutput.Items.Add (without the XML tags, of course).
You could then advise the user to re-add the content if desired, and tell them how to do so without causing the same error.
Best Wishes,
Paul D Pruitt
|
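A minimal sketch of the reporting idea in the previous comment, assuming the content being removed lives in `mc:Fallback` elements of word/document.xml: before removal, collect the plain text (`w:t` runs) and any relationship IDs (`r:id` or `r:embed`, which is how images are referenced) from each Fallback. This is Python rather than the tool's C#, and `describe_fallback_content` is a hypothetical helper, not part of the program.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace URIs used in WordprocessingML (ECMA-376).
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

REL_ATTRS = {f"{{{R}}}id", f"{{{R}}}embed"}


def describe_fallback_content(document_xml: bytes) -> List[Dict[str, object]]:
    """For each mc:Fallback element, return its plain text (without XML tags)
    and the relationship IDs it references, e.g. for images."""
    root = ET.fromstring(document_xml)
    report = []
    for fallback in root.iter(f"{{{MC}}}Fallback"):
        # Concatenate every w:t text run inside this Fallback.
        text = "".join(t.text or "" for t in fallback.iter(f"{{{W}}}t"))
        # Collect r:id / r:embed values (VML and DrawingML image references).
        rel_ids = [
            value
            for el in fallback.iter()
            for name, value in el.attrib.items()
            if name in REL_ATTRS
        ]
        report.append({"text": text, "relationship_ids": rel_ids})
    return report
```

Turning a relationship ID such as `rId5` into an actual image file would additionally require looking it up in word/_rels/document.xml.rels, which this sketch does not do.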
I don't think it removes any actual content. The XML elements in question are the AlternateContent (AC) blocks. Each AC block can hold multiple representations of the same content (one or more Choice elements plus an optional Fallback), and it is up to the reader/client to choose which version to use. Removing the Fallback just removes one "version" of the content. The caveat would be a file with an AC block that has ONLY a Fallback. In that case, yes, the content would probably be deleted as well, but I have yet to see a corrupt file with a bad Fallback AND no other options in the AC block. |
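The rule described in that comment (drop the Fallback only when the AlternateContent block also offers a Choice, and leave Fallback-only blocks alone because there the Fallback is the content) could look roughly like this. It is a Python sketch with a hypothetical `remove_fallbacks` helper, not the program's actual C# code.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"
AC_TAG = f"{{{MC}}}AlternateContent"
CHOICE_TAG = f"{{{MC}}}Choice"
FALLBACK_TAG = f"{{{MC}}}Fallback"


def remove_fallbacks(element: ET.Element) -> Tuple[int, int]:
    """Remove mc:Fallback from AlternateContent blocks that also have an
    mc:Choice; skip Fallback-only blocks so their content is not lost.
    Returns (fallbacks_removed, fallbacks_skipped)."""
    removed = skipped = 0
    for child in element:
        if child.tag == AC_TAG:
            fallbacks = child.findall(FALLBACK_TAG)
            if child.find(CHOICE_TAG) is None:
                # Caveat case: the Fallback is the only version of the content.
                skipped += len(fallbacks)
            else:
                for fb in fallbacks:
                    child.remove(fb)
                removed += len(fallbacks)
        # Recurse into whatever remains (handles nested AlternateContent).
        r, s = remove_fallbacks(child)
        removed += r
        skipped += s
    return removed, skipped
```

One caution if the modified tree is written back with ElementTree: it can rename or drop namespace prefixes, and `mc:Ignorable` refers to prefixes by name, so a real tool needs a serializer that preserves them (such as the Open XML SDK the project already uses).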
OK, that's interesting. I didn't know how that worked. Thanks.
Best Wishes,
Paul D Pruitt
|
After zip repair, if Word Corrupt Doc Checker does not find any invalid XML, attempt to open the file in WordPad. Surprisingly, I have found that once the zip structure is fixed, many if not most repairable DOCX files will open in WordPad without any further repair. I know this is not a sophisticated solution, but sometimes simple is best.
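The suggested flow (zip repair first, then offer WordPad only if no invalid XML is found) might be sketched like this. Assumptions: a parse of word/document.xml stands in for the checker's own invalid-XML scan, `write.exe` is used as the launcher because it starts WordPad on Windows versions that still include WordPad, and `wordpad_fallback_command` is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Optional

# Assumption: write.exe starts WordPad on Windows versions that still ship it.
WORDPAD_LAUNCHER = "write.exe"
DOCUMENT_PART = "word/document.xml"


def wordpad_fallback_command(docx_path: str) -> Optional[List[str]]:
    """Return a command that opens the file in WordPad, or None if the
    file should not be handed to WordPad yet."""
    # Step 1: the zip container must be valid (i.e. zip repair succeeded).
    if not zipfile.is_zipfile(docx_path):
        return None
    with zipfile.ZipFile(docx_path) as zf:
        if zf.testzip() is not None:  # a member failed its CRC check
            return None
        # Step 2: stand-in for the checker's invalid-XML scan.
        try:
            ET.fromstring(zf.read(DOCUMENT_PART))
        except (KeyError, ET.ParseError):
            return None  # missing or malformed document.xml: repair that first
    # Step 3: no invalid XML found, but Word may still refuse the file,
    # so offer WordPad as a limited reader.
    return [WORDPAD_LAUNCHER, docx_path]
```

On Windows, the returned list could be passed to `subprocess.Popen`; the function itself does not launch anything, which keeps the decision logic easy to test.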