This repository has been archived by the owner on Feb 5, 2019. It is now read-only.

Using MarkupSimplifier.SimplifyMarkup on Word 2013/2016 files generates an invalid styles.xml file #102

Closed
ManoShu opened this issue Dec 6, 2016 · 3 comments

Comments

@ManoShu

ManoShu commented Dec 6, 2016

Hello,

I am reposting here the issue I posted on Eric's forum, in case someone else has had the same problem.

When I open a Word document created in Word 2013 or 2016, run the MarkupSimplifier.SimplifyMarkup method on it with NormalizeXml set to true, save it as-is, and then open it in Word itself, the following error is shown:

The XML data is invalid according to the schema
Location: Part: /word/styles.xml, Line: 0, Column: 0

This prevents the document from being opened normally via interop unless the OpenAndRepair parameter is set to true, which is not a desirable approach.

Digging into the styles.xml file of both the original document and the "corrupt" processed document, I found that the root elements of the two files carry different attributes:

Original document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<w:styles xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" 
xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" 
xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" 
mc:Ignorable="w14 w15 w16se"> 
[...]

Processed document:

<?xml version="1.0" encoding="utf-8"?> 
<w:styles mc:Ignorable="w14 w15 w16se" 
xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" 
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
xmlns:o="urn:schemas-microsoft-com:office:office" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
xmlns:v="urn:schemas-microsoft-com:vml" 
xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" 
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" 
xmlns:w10="urn:schemas-microsoft-com:office:word" 
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" 
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" 
xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" 
xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"> 
[...]

Everything below those tags looks much the same; mainly the order of the attributes has changed, and I don't believe that would cause the error. A complete WinMerge HTML report can be found here; use the Google Drive viewer, or download it and open it in a browser.
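
One difference does stand out in the two root elements above: the processed file still lists w15 and w16se in mc:Ignorable but no longer declares those prefixes. The following is a minimal Python sketch that flags such prefixes, on the assumption (not confirmed in this thread) that undeclared mc:Ignorable prefixes are what Word rejects. The function name `undeclared_ignorable_prefixes` and the shortened `SAMPLE` root element are illustrative only and are not part of Open-XML-PowerTools.

```python
import io
import xml.etree.ElementTree as ET

# Markup Compatibility namespace, as it appears in both root elements above.
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def undeclared_ignorable_prefixes(xml_bytes):
    """Return the prefixes listed in the root mc:Ignorable attribute
    that the root element does not declare with xmlns:prefix."""
    declared = set()
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            # Namespace declarations are reported before the element that carries them.
            prefix, _uri = payload
            declared.add(prefix)
        else:
            # The first "start" event is the root element; stop there.
            ignorable = payload.get("{%s}Ignorable" % MC_NS, "")
            return sorted(p for p in ignorable.split() if p not in declared)
    return []


# Shortened version of the processed root element (assumption: only the
# declarations relevant to mc:Ignorable are kept here).
SAMPLE = b"""<w:styles mc:Ignorable="w14 w15 w16se"
 xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
 xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"/>"""

print(undeclared_ignorable_prefixes(SAMPLE))  # ['w15', 'w16se']
```

To check a real file, the bytes could be read from the package with `zipfile.ZipFile(path).read("word/styles.xml")` and passed to the same function.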

Below are the files and project needed to test it. Drag and drop any docx file onto the executable and it will generate a {FILENAME}_processed file. Open that file to check whether the format is correct.

CompiledProject.zip
DocxFiles.zip
DocxTestProject.zip

Thanks in advance.

EDIT: Open-XML-SDK and Open-XML-PowerTools assemblies used were cloned and compiled on 2016-12-06

@ManoShu
Author

ManoShu commented Dec 6, 2016

I added the missing namespaces from the newer versions of Word to each namespace attribute list that I located. Please review it and check whether there is anything wrong with this change.

@quails4Eva

This is the same issue as #48.
@ManoShu Thanks for submitting the fix, it looks more complete than the temporary workaround I used when dealing with this issue. Hopefully it gets merged soon and I won't have to keep using my custom build.

@tomjebo

tomjebo commented Feb 1, 2019

Closing all issues as this repo is being archived and will no longer be maintained by Microsoft. The project is licensed for continued use and development by forking to your own repo.

@tomjebo tomjebo closed this as completed Feb 1, 2019