Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"No value associated with key CodingKeys(stringValue: \"cols\", intValue: nil) #12

Closed
maxvol opened this issue Nov 22, 2018 · 27 comments
Closed
Assignees

Comments

@maxvol
Copy link

maxvol commented Nov 22, 2018

(lldb) po file.parseWorksheetPaths()
▿ 3 elements

  • 0 : "xl/worksheets/sheet3.xml"
  • 1 : "xl/worksheets/sheet2.xml"
  • 2 : "xl/worksheets/sheet1.xml"

(lldb) po file.parseWorksheet(at: "xl/worksheets/sheet3.xml")
▿ DecodingError
▿ keyNotFound : 2 elements
- .0 : CodingKeys(stringValue: "cols", intValue: nil)
▿ .1 : Context
- codingPath : 0 elements
- debugDescription : "No value associated with key CodingKeys(stringValue: "cols", intValue: nil) ("cols")."
- underlyingError : nil

(lldb) po file.parseWorksheet(at: "xl/worksheets/sheet2.xml")
▿ DecodingError
▿ keyNotFound : 2 elements
- .0 : CodingKeys(stringValue: "cols", intValue: nil)
▿ .1 : Context
- codingPath : 0 elements
- debugDescription : "No value associated with key CodingKeys(stringValue: "cols", intValue: nil) ("cols")."
- underlyingError : nil

(lldb) po file.parseWorksheet(at: "xl/worksheets/sheet1.xml")
Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
▿ DecodingError
▿ dataCorrupted : Context
- codingPath : 0 elements
- debugDescription : "The given data was not valid XML."
▿ underlyingError : Optional
- some : Error Domain=NSXMLParserErrorDomain Code=111 "(null)"

@MaxDesiatov
Copy link
Collaborator

Hi @maxvol, many thanks for reporting this. I hope that issues with sheet3.xml and sheet2.xml are fixed in #14. As for sheet1.xml, could you please share that sheet1.xml file or .xlsx archive itself? You can also share it privately to this email.

MaxDesiatov added a commit that referenced this issue Nov 23, 2018
As reported in #12, there are worksheets that apparently don't have `cols` node, which means that `columns` property on `Worksheet` should be optional.
@maxvol
Copy link
Author

maxvol commented Nov 23, 2018

Regarding sheet1.xml - my guess it has something to do with encoding. I can unpack it and get proper XML string from data with .utf8 encoding. Not sure which encoding XMLCoder is using internally.

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}

Apparently because quotes are escaped like: ="

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\" xmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\"><dimension ref=\"A1:E128\"/><sheetViews><sheetView tabSelected=\"1\" workbookViewId=\"0\"/></sheetViews><sheetFormatPr defaultRowHeight=\"15\" x14ac:dyDescent=\"0.25\"/><cols><col min=\"1\" max=\"1\" width=\"26.42578125\" customWidth=\"1\"/><col min=\"2\" max=\"2\" width=\"16.7109375\" customWidth=\"1\"/><col min=\"3\" max=\"3\" width=\"9.140625\" style=\"1\"/><col min=\"4\" max=\"4\" width=\"82\" customWidth=\"1\"/><col min=\"6\" max=\"6\" width=\"16\" customWidth=\"1\"/></cols><sheetData> ...```

@MaxDesiatov
Copy link
Collaborator

XMLCoder internally uses XMLParser from Foundation, which I guess picks up encoding from a standard XML header like this <?xml version="1.0" encoding="UTF-8"?>. Quotes could be an issue, but really interesting what could generate these quotes. Could you please share the source XML? I'm not sure I'd be able to investigate without having the actual XML at hand to investigate and test.

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

Here it is (without rows within sheetData, due to sensitive data) -

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\" xmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\"><dimension ref=\"A1:E128\"/><sheetViews><sheetView tabSelected=\"1\" workbookViewId=\"0\"/></sheetViews><sheetFormatPr defaultRowHeight=\"15\" x14ac:dyDescent=\"0.25\"/><cols><col min=\"1\" max=\"1\" width=\"26.42578125\" customWidth=\"1\"/><col min=\"2\" max=\"2\" width=\"16.7109375\" customWidth=\"1\"/><col min=\"3\" max=\"3\" width=\"9.140625\" style=\"1\"/><col min=\"4\" max=\"4\" width=\"82\" customWidth=\"1\"/><col min=\"6\" max=\"6\" width=\"16\" customWidth=\"1\"/></cols><sheetData></sheetData><conditionalFormatting sqref=\"C56:C1048576 C1:C54\"><cfRule type=\"duplicateValues\" dxfId=\"2\" priority=\"48\"/></conditionalFormatting><conditionalFormatting sqref=\"C1:C1048576\"><cfRule type=\"duplicateValues\" dxfId=\"1\" priority=\"1\"/></conditionalFormatting><conditionalFormatting sqref=\"C2:C54\"><cfRule type=\"duplicateValues\" dxfId=\"0\" priority=\"359\"/></conditionalFormatting><pageMargins left=\"0.7\" right=\"0.7\" top=\"0.75\" bottom=\"0.75\" header=\"0.3\" footer=\"0.3\"/></worksheet>

@MaxDesiatov
Copy link
Collaborator

Sorry, I'm a bit confused, is that the whole XML without sheetData, just two symbols \r\n?

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

Huh? Apparently you cannot see it for some reason, I will check formatting in my previous post.
P.S. Done! I forgot to add triple-quote-xml. Pls have a look at the previous post again.

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Nov 24, 2018

Thanks, can see it now, will have a closer look. BTW, if this string was printed from the debugger, the quotes are escaped by default, to print an unescaped string you can use something like po NSString(string: worksheetString). By using that printing command we could verify if the original XML had quotes escaped or not.

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

Aha! Here it is -

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:E128"/><sheetViews><sheetView tabSelected="1" workbookViewId="0"/></sheetViews><sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/><cols><col min="1" max="1" width="26.42578125" customWidth="1"/><col min="2" max="2" width="16.7109375" customWidth="1"/><col min="3" max="3" width="9.140625" style="1"/><col min="4" max="4" width="82" customWidth="1"/><col min="6" max="6" width="16" customWidth="1"/></cols><sheetData>
</sheetData><conditionalFormatting sqref="C56:C1048576 C1:C54"><cfRule type="duplicateValues" dxfId="2" priority="48"/></conditionalFormatting><conditionalFormatting sqref="C1:C1048576"><cfRule type="duplicateValues" dxfId="1" priority="1"/></conditionalFormatting><conditionalFormatting sqref="C2:C54"><cfRule type="duplicateValues" dxfId="0" priority="359"/></conditionalFormatting><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/></worksheet>

P.S. The entire document printed with NSString(string: is deemed valid by XML editor. So perhaps that NSString(string: is not being called before feeding the worksheet to XMLCoder?

@MaxDesiatov
Copy link
Collaborator

Many thanks for this @maxvol! I think I've found a source of the issue and this can be parsed now in a separate branch that I'm preparing as a PR. Would you mind if I add this XML as a unit-test to CoreXLSX source code?

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

Sure, please do!

MaxDesiatov added a commit that referenced this issue Nov 24, 2018
A worksheet provided in #12 has a `Column` value without `customWidth` attribute. This attribute is now optional, also types of properties on `Column` were updated according to [documentation from Microsoft](https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.column?view=openxml-2.8.1). A few other types and names were updated accordingly with deprecation warnings added where possible. Worksheet XML that couldn't be parsed previously has been added to `WorksheetTests`.

* Update names and types of properties on Worksheet
* Update types on Column model and related tests
* Increase test coverage for new/deprecated props
@MaxDesiatov
Copy link
Collaborator

Great, thanks again. I've just merged #18 to master, could you please try master then to parse your file? The test with that XML is now passing both for me locally and on CI, so unless there's some more breaking stuff in sheetData node, I hope this should work.

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

Still crashing with the same error. Why not make most attributes optional? Here's a few more lines from the same sheet -

        <row r="1" spans="1:5" s="2" customFormat="1" ht="15.75" thickBot="1" x14ac:dyDescent="0.3">
            <c r="A1" s="3" t="s">

        <row r="3" spans="1:5" x14ac:dyDescent="0.25">
            <c r="A3" t="s">

BTW is it possible to modify/save XLSX?

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Nov 24, 2018

I've just added these rows and cells to the test and it passes with no modifications in CoreXLSX itself, I highly doubt it has something to do with optionality at this point. When you're saying "with the same error" do you mean this one?

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
▿ DecodingError
▿ dataCorrupted : Context
- codingPath : 0 elements
- debugDescription : "The given data was not valid XML."
▿ underlyingError : Optional
- some : Error Domain=NSXMLParserErrorDomain Code=111 "(null)"

If so, that's most probably caused by unscaped quotes somewhere in sheetData, but very hard to say without having the complete file to test. Maybe you could mangle the cell values in that file or isolate the error to only a few cells that could be shared publicly or privately to an email that I shared here before?

Currently there's no API in CoreXLSX for writing to an XLSX file, but in principle it wouldn't be very hard to add as XMLCoder supports both decoding and encoding. We'd only need to make sure to write correct model types to correct file names and create a zip archive with those file before writing a resulting output.

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

Unlikely, XML is valid when pasted into validating editor. Can it be missing 's' and 't' attributes in 'c' element? xml <c r="E2">

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}

@MaxDesiatov
Copy link
Collaborator

Just to be completely sure, I've added this cell with a single r attribute as well to the test, but it still passes.

It looks like an error is thrown from XMLParser in Foundation module, as neither CoreXLSX nor XMLCoder use NSXMLParserErrorDomain for their error reporting. This might mean that Apple's XMLParser is not able to parse this XML at all.

Looking at XMLParser docs, I'm not sure we could get more info out of the debugger directly than we currently have:

When this method is invoked, parsing is stopped. For further information about the error, you can query parseError or you can send the parser a parserError message. You can also send the parser lineNumber and columnNumber messages to further isolate where the error occurred. Typically you implement this method to display information about the error to the user.

These line and column numbers are already printed in the error. Here it looks like the line number is 2 (I assume line 1 is the XML header) and the column is 16328. I don't know an easy way to get a substring from a string by line and column within the debugger, but you could unzip the XLSX file (e.g. with unzip in command-line), open sheet1.xml and check that line/column in a text editor that's able to navigate to the column.

Another approach could to be to reformat sheet1.xml to contain one node (or even better one attribute) per line. If you use VSCode, there's an XML formatter plugin available here.
This way, you could unzip the XLSX file (e.g. with unzip in command-line), open sheet1.xml, reformat it and zip those files again into a single archive and debug again. This would probably take more time though as it requires archiving everything back and repeating the debugging cycle.

@maxvol
Copy link
Author

maxvol commented Nov 24, 2018

(attributeNotStartedError = 39)

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
["NSUnderlyingError": Error Domain=NSXMLParserErrorDomain Code=111 "(null)", "NSCodingPath": [], "NSDebugDescription": "The given data was not valid XML."]

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Nov 24, 2018

Thanks, this gives the same info though, the error is coming from Apple's XMLParser due to use of NSXMLParserErrorDomain and the offending character is at column 16328 line 2. I'd appreciate if you shared at least a single XML node with attributes around that column 16328 at line 2. Thinking more about it, you wouldn't even need to decompress the XLSX as you already have access to that raw XML in the debugger as you've shared snippets from it previously. You can just paste the whole XML to vim and navigate to line 2 and column 16328 as described here or any other editor of your choice that allows navigating to a specific column (I've also found a tip for Sublime Text here).

@maxvol
Copy link
Author

maxvol commented Nov 25, 2018

Aha, the reason is whitespace characters (newline, spaces) between equal sign and first quote:

</row><row r=
    "78" spans="1:5" x14ac:dyDescent="0.25">

@maxvol
Copy link
Author

maxvol commented Nov 25, 2018

So, do you think you can fix it? Because that break occurs automatically, I have no control over it.

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Nov 25, 2018

I've added this snippet and a few other similar variations to the test suite with a newline and also \r\n combination between equal sign and first quote here. CoreXLSX master branch without any changes is able to pass these tests as is and these snippets are parsed without a problem, which means the cause is probably different. Maybe quotes are different or some other (hidden?) symbols were lost before sharing on GitHub?

If you're getting these files some other system or app, would it be possible to generate a file with fake data that could be shared so that I would be able to reproduce it on my side? Or maybe you could delete all cells except the offending cell, which I hope wouldn't leave any sensitive data and it could be shared? That would tremendously help in diagnosing the issue. Thanks!

@maxvol
Copy link
Author

maxvol commented Nov 25, 2018

Still the same error. Here's the snippet including the offending row -

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:E128"/><sheetViews><sheetView tabSelected="1" workbookViewId="0"/></sheetViews><sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/><cols><col min="1" max="1" width="26.42578125" customWidth="1"/><col min="2" max="2" width="16.7109375" customWidth="1"/><col min="3" max="3" width="9.140625" style="1"/><col min="4" max="4" width="82" customWidth="1"/><col min="6" max="6" width="16" customWidth="1"/></cols><sheetData><row r=
    "78" spans="1:5" x14ac:dyDescent="0.25"><c r="A78" t="s"><v>3</v></c><c r="B78" t="s"><v>4</v></c><c r="C78" s="1"><v>421</v></c><c r="D78" t="s"><v>56</v></c><c r="E78"><v>57</v></c></row></sheetData><conditionalFormatting sqref="C56:C1048576 C1:C54"><cfRule type="duplicateValues" dxfId="2" priority="48"/></conditionalFormatting><conditionalFormatting sqref="C1:C1048576"><cfRule type="duplicateValues" dxfId="1" priority="1"/></conditionalFormatting><conditionalFormatting sqref="C2:C54"><cfRule type="duplicateValues" dxfId="0" priority="359"/></conditionalFormatting><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/></worksheet>

@MaxDesiatov
Copy link
Collaborator

I've added the complete snippet to the test suite here, it is parsed successfully with all cells. If you unzip the file, remove all cells except the offending row and zip it back again, is the error still reproducible? Or maybe you could open the file in Microsoft Excel or Apple Numbers, delete all rows except row 78 and save a new version, would that version cause any issues in CoreXLSX? I suspect there might be something going on with encoding or compression that's lost when testing isolated XML snippets, so I hope testing a complete XLSX file end-to-end on my side would help.

@nIgect
Copy link

nIgect commented Nov 29, 2018

Hi @MaxDesiatov , I have a similar problem
Error Domain=NSXMLParserErrorDomain Code=41 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=Specification mandates value for attribute s

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Nov 29, 2018

Hi @nIgect, could you please share the file that causes this issue? This would allow me to reproduce the problem on my side, find the cause and fix it. Thank you!

MaxDesiatov pushed a commit to CoreOffice/XMLCoder that referenced this issue Dec 30, 2018
As reported in CoreOffice/CoreXLSX#12 there are multiple issues where it's hard to understand why parsing of a certain XML has failed. `XMLCoder` should be able to report a snippet of a given XML of specified length around the position where the error has happened. This should be an property of type `UInt` named `errorContextLength` on `XMLDecoder` instance. An error thrown from XML parser that has line/column location should be repackaged having a context around that location of specified length.

For example, if an error was thrown indicating that there's an unexpected character at line 3, column 15 with `errorContextLength` set to `10`, a new error type should be rethrown containing 5 characters before column 15 and 5 characters after, all on line 3. Line wrapping should be handled correctly too as the context can span more than a few lines.

Also, currently there's a single use of `print` in the codebase for printing an error in `XMLStackParser.swift`. This has been removed.

* Add ErrorHandleTest
* Fix ErrorContextTest and formatting
@MaxDesiatov
Copy link
Collaborator

Hi folks, sorry for the delay. I hope you could try the latest 0.4.0 version and pass a sufficiently large value (10, maybe 20 for example) as errorContextLength argument to XLSXFile initializer. This would give more debug info and provide a snippet of XML that failed to parse, which would help us to debug the issue. Thanks!

@MaxDesiatov
Copy link
Collaborator

Hi everyone, if you've previously had issues, please try the latest 0.4.0 version or if those persist – the master branch and post the details here or create new issues. Otherwise if no updates are posted here I'll have to close this issue as outdated.

Thanks for understanding!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants