Strict http://purl.oclc.org/ooxml/spreadsheetml/main tables fail validation #393

Open
ghost opened this Issue Jan 26, 2018 · 1 comment

ghost commented Jan 26, 2018

Description

Strict namespace spreadsheetml tables throw an exception in OpenXmlValidator.

Information

  • .NET Target: whatever the default is in VS
  • DocumentFormat.OpenXml Version: 2.5.0, 2.8.1

Repro

Trivial spreadsheet that triggers the problem: test.xlsx

_rels/worksheet.xml.rels has <Relationship Id='rId1' Type='http://purl.oclc.org/ooxml/officeDocument/relationships/table' Target='table.xml' /> which should, I think, enable StrictTranslation.

table.xml has <table xmlns='http://purl.oclc.org/ooxml/spreadsheetml/main' id='1' displayName='Data' ref='A1:B3'>

Both URLs are straight from ECMA-376, Part 1, 12.3.21 Table Definition Part.

Additionally, both LibreOffice and Office Online are able to load the table definition, which arguably might not signify anything.

SpreadsheetDocument document = SpreadsheetDocument.Open("test.xlsx", false);
OpenXmlValidator validator = new OpenXmlValidator(FileFormatVersions.Office2010);
validator.Validate(document);

Observed

System.IO.InvalidDataException
  HResult=0x80131501
  Message=Cannot load the root element from the part. The part contains invalid data.
  Source=DocumentFormat.OpenXml
  StackTrace:
   at DocumentFormat.OpenXml.Packaging.OpenXmlPart.LoadDomTree[T]()
   at DocumentFormat.OpenXml.Packaging.TableDefinitionPart.get_PartRootElement()
   at DocumentFormat.OpenXml.Validation.DocumentValidator.ValidatePart(OpenXmlPart part)
   at DocumentFormat.OpenXml.Validation.DocumentValidator.Validate(OpenXmlPackage document)
   at DocumentFormat.OpenXml.Validation.OpenXmlValidator.Validate(OpenXmlPackage openXmlPackage)
   at ConsoleApp1.Program.Main(String[] args) in C:\Users\IEUser\source\repos\ConsoleApp1\ConsoleApp1\Program.cs:line 18

Inner Exception 1:
InvalidDataException: The root XML element "http://purl.oclc.org/ooxml/spreadsheetml/main:table" in the part is incorrect. The expected root XML element is: "http://schemas.openxmlformats.org/spreadsheetml/2006/main:table".

Expected

No exception should be thrown and no errors returned.

@ghost ghost changed the title from Strict http://purl.oclc.org/ooxml/spreadsheetml/main fails validation to Strict http://purl.oclc.org/ooxml/spreadsheetml/main tables fail validation Jan 26, 2018

ghost commented Feb 1, 2018

I've found that the file actually has mixed transitional/strict URLs. Only parts defined by the OPC may use the schemas.openxmlformats.org domain in a purely strict document.

Once I moved everything to purl.oclc.org where applicable, the error disappeared, so I'm closing the issue. I don't know whether mixed documents should be valid, but the idea hardly seems worth pursuing.
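For anyone hitting the same thing, here is a rough sketch of how the mix could be spotted without the SDK. It opens the package as a zip and lists the root namespace of every XML part plus the Type of every relationship. NamespaceAudit, Classify and Audit are names made up for this example, and the prefix check is only a heuristic based on the URLs quoted in this issue, not a full Strict conformance test.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of DocumentFormat.OpenXml.
public static class NamespaceAudit
{
    // Heuristic: classify a namespace or relationship-type URI by its prefix.
    public static string Classify(string uri)
    {
        // OPC-defined URIs keep the schemas.openxmlformats.org domain even in Strict files.
        if (uri.StartsWith("http://schemas.openxmlformats.org/package/", StringComparison.Ordinal))
            return "opc";
        if (uri.StartsWith("http://purl.oclc.org/ooxml/", StringComparison.Ordinal))
            return "strict";
        if (uri.StartsWith("http://schemas.openxmlformats.org/", StringComparison.Ordinal))
            return "transitional";
        return "other";
    }

    // Returns one tab-separated line per URI found: kind, entry name, URI.
    public static List<string> Audit(Stream package)
    {
        var findings = new List<string>();
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                bool isRels = entry.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase);
                bool isXml = entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase);
                if (!isRels && !isXml) continue;

                var uris = new List<string>();
                using (Stream s = entry.Open())
                {
                    XElement root = XDocument.Load(s).Root;
                    // Namespace of the part's root element.
                    uris.Add(root.Name.NamespaceName);
                    if (isRels)
                    {
                        // Relationship types carry their own strict/transitional URIs.
                        uris.AddRange(root.Elements()
                            .Select(e => (string)e.Attribute("Type"))
                            .Where(t => t != null));
                    }
                }

                foreach (string uri in uris)
                    findings.Add($"{Classify(uri)}\t{entry.FullName}\t{uri}");
            }
        }
        return findings;
    }
}
```

Calling NamespaceAudit.Audit(File.OpenRead("test.xlsx")) and printing the lines should show at a glance whether "strict" and "transitional" entries appear in the same package.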

One remaining point is that OpenXmlValidator.Validate should probably not throw an exception for a problem like this, given that it already has a way to return errors.
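Until that changes, a caller can fold the exception into the results itself. The following is only a sketch of such a workaround: SafeValidation is a hypothetical wrapper, it catches only the InvalidDataException seen in the stack trace above, and the output format is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

// Hypothetical wrapper, not part of DocumentFormat.OpenXml.
public static class SafeValidation
{
    // Returns readable findings; a part that fails to load becomes a finding
    // instead of an exception that escapes to the caller.
    public static List<string> Validate(string path)
    {
        var findings = new List<string>();
        try
        {
            using (SpreadsheetDocument document = SpreadsheetDocument.Open(path, false))
            {
                var validator = new OpenXmlValidator(FileFormatVersions.Office2010);
                // Enumerate inside the try block in case errors are produced lazily.
                foreach (ValidationErrorInfo error in validator.Validate(document))
                {
                    findings.Add($"{error.Part?.Uri}: {error.Description}");
                }
            }
        }
        catch (InvalidDataException ex)
        {
            // The inner exception names the unexpected root element.
            findings.Add($"load failure: {ex.Message} {ex.InnerException?.Message}");
        }
        return findings;
    }
}
```

With the test.xlsx from this issue, the expectation is that the root element mismatch would show up as a "load failure" entry rather than terminating the program, though I have not checked this against every SDK version.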

@ghost ghost closed this Feb 1, 2018

@twsouthwick twsouthwick reopened this Feb 2, 2018

@twsouthwick twsouthwick added the schema label Feb 2, 2018
