Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XMLParser fixes (SR-13546, SR-2301) #2920

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

dhoepfl
Copy link
Contributor

@dhoepfl dhoepfl commented Nov 10, 2020

XMLParser on linux was not reentrance safe.

While fixing this, I found that XMLParser also never calls several other delegate methods. (see SR-2301)

Still unfixed: Internal and unparsed external entities might not work as XML expects them to work. Unfortunately I could not find any reference (asides from the XML spec) how entities are actually expected to work.

macOS’s NSXMLParser does not seem to support them at all.

       These whitespaces were reported as normal character content.
       foundElementDeclaration delegate calls were not set.
One case of reentrance is reading an external entity.
This was not working so this change implements that, too.

The CFXMLInterface callback's parameters mostly missed nullability info.
This lead to unexpected crashes in optimised builds even when the parameter
was correctly checked for nil.
saxHandler->processingInstruction = (processingInstructionSAXFunc)__CFSwiftXMLParserBridge.processingInstruction;
saxHandler->comment = (commentSAXFunc)__CFSwiftXMLParserBridge.comment;
// saxHandler->warning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to clean up these commented lines or add more context around them please? When the final code is reviewed without any context of this PR, it won't be clear why they are commented out and what the implications of that are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do so.

I sorted the fields as they are in _xmlSAXHandler and added all unused fields as comments to make it more obvious what is (not) used.
Do you think it is better to remove the unused fields or to add a short explanation above the block with “currently not used by CFXMLInterface” at these fields?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@millenomi WDYT? What kind of clarification in code comments would you prefer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a minor update that adds some comments.

Comment makes clear why some libxml callbacks are commented out.
@spevans
Copy link
Collaborator

spevans commented Jan 18, 2021

@swift-ci test

3 similar comments
@dhoepfl
Copy link
Contributor Author

dhoepfl commented Jan 18, 2021

@swift-ci test

@spevans
Copy link
Collaborator

spevans commented Jan 18, 2021

@swift-ci test

@spevans
Copy link
Collaborator

spevans commented Jan 19, 2021

@swift-ci test

@dhoepfl
Copy link
Contributor Author

dhoepfl commented Jan 20, 2021

Can you trigger it once again?

@MaxDesiatov
Copy link
Member

@swift-ci please test

@dhoepfl
Copy link
Contributor Author

dhoepfl commented Feb 18, 2021

I’m about to give up on this. 😟

@MaxDesiatov
Copy link
Member

@swift-ci please test

@dhoepfl
Copy link
Contributor Author

dhoepfl commented Feb 19, 2021

Does Build #2526 count for macOS?

If so, both platforms have a successful build now.

@MaxDesiatov
Copy link
Member

@swift-ci please test

@dhoepfl
Copy link
Contributor Author

dhoepfl commented Feb 22, 2021

“Swift Test macOS Platform” finished a build yesterday, another try?

@MaxDesiatov
Copy link
Member

@swift-ci please test macOS platform

@spevans
Copy link
Collaborator

spevans commented Feb 24, 2021

I ran the two testcases on Big Sur usingDarwinCompatibilityTests and they both failed.
test_sr13546_stackedParsers failed with NSXMLParser does not support reentrant parsing. (NSInternalConsistencyException) so it looks like Darwin Foundation explicitly blocks reentrancy
(The equivalent on Linux would actually be to call fatalError())

test_entities() also had test failures with the expected entities, the Darwin version doesnt emit the .foundIgnoreableWhitespace event and some of the declarations are missing entities. I suspect this could be due to different versions of libxml2 on Darwin v Linux - I have seen different events emitted in other tests when comparing Darwin to Linux.

I think for the first test re-entrancy will need to be blocked to match Darwin. I dont think there is much that can be done with the 2nd one and probably underlies libxml2 differences. It might be worth at some point building swift-corelibs-foundation against the version that Darwin uses (https://opensource.apple.com/source/libxml2/libxml2-34.9/) and seeing if it fixes any of these minor bugs (and in other tests that fail on Darwin as well)

@dhoepfl
Copy link
Contributor Author

dhoepfl commented Mar 1, 2021

You are right, the implementation does not mimic Apple/Darwin’s implementation in all aspects.

I do not see how XMLParser could support (external) entities without being re-entrance-tolerant. Maybe that’s the reason Apple’s implementation does not support entities?

I consider entity handling in NSXMLParser broken:

let xml = """
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE root [
<!ENTITY internalEntity "Internal">
<!ELEMENT root ANY >
]>
<root><foo>&internalEntity;</foo></root>
"""

class MyXMLParserDelegate: NSObject, XMLParserDelegate {
   func parser(_ parser: XMLParser, foundCharacters string: String) {
      print("characters: \(string)")
   }
}

let xmlData = xml.data(using: .utf8)!
let parser = XMLParser(data: xmlData)
let del = MyXMLParserDelegate()
parser.delegate = del
_ = parser.parse()

Never calls func parser(_ parser: XMLParser, foundCharacters string: String) on macOS, silently loosing the content of foo. The pull-version correctly prints:

characters: Internal

The difference in .foundIgnoreableWhitespace is easy to explain: If you do not set the ignorableWhitespace-callback for SAX, all spaces are reported as normal characters (see parser.c:2831). That’s what NSXMLParser seems to do. libxml isn’t good in regards to ignorable whitespaces, it currently uses a heuristic to detect them instead of checking the schema (see parser.c:2863).

One could argue if calling parser(_:foundIgnorableWhitespace:) for some ignorable whitespaces (based on an heuristic!) is a good idea. On the other hand, it being called is documented in the API contract.

What do you think about the following aproach: If the delegate implements parser(_:foundIgnorableWhitespace:), we call it (as libxml suggests), hoping for libxml getting better one day. If the delegate does not implement the callback, we handle it as is, forwarding the call to parser(_:foundCharacters:) for all whitespaces.

@dhoepfl
Copy link
Contributor Author

dhoepfl commented Oct 24, 2021

Any change this gets merged?

Regarding the conflict: Am I supposed to keep the pull request up to date? How is this typically handled?
Resolving the conflict is easy: Accepting both changes will probably work (WASI should be before SR-13546).

That WASI-changeset seems rather unfinished:

  • It always fails on init?(contentsOf url: URL).
  • It completely removes init(stream: InputStream) (which is part of the public API) when WASI is defined.

@MaxDesiatov
Copy link
Member

The WASI API is partial on purpose, this platform doesn't support streams, multithreading, and file access.

As for merging, I assume this needs an approval from @millenomi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants