Continue parsing even with illegal characters #15

Closed
jakeri opened this Issue · 4 comments

2 participants

@jakeri

I know this is not correct, but I am trying to parse some sources that have illegal characters in the XML feed.
Would it be possible to let the parser continue parsing without throwing an exception, and log a warning instead?

com.fasterxml.aalto.WFCException: Illegal XML character ((CTRL-CHAR, code 11))
 at [row,col {unknown-source}]: [29273,485]
    at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1335) ~[aalto-xml-0.9.8.jar:na]
    at com.fasterxml.aalto.in.XmlScanner.throwInvalidXmlChar(XmlScanner.java:1525) ~[aalto-xml-0.9.8.jar:na]
    at com.fasterxml.aalto.async.AsyncUtfScanner.skipCharacters(AsyncUtfScanner.java:755) ~[aalto-xml-0.9.8.jar:na]
    at com.fasterxml.aalto.async.AsyncByteScanner.nextFromTree(AsyncByteScanner.java:572) ~[aalto-xml-0.9.8.jar:na]
    at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:720) ~[aalto-xml-0.9.8.jar:na]
    ...

and

com.fasterxml.aalto.WFCException: Illegal XML character ((CTRL-CHAR, code 22))
 at [row,col {unknown-source}]: [8982,3]
    at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1335)
    at com.fasterxml.aalto.in.XmlScanner.throwInvalidXmlChar(XmlScanner.java:1525)
    at com.fasterxml.aalto.async.AsyncUtfScanner.skipCharacters(AsyncUtfScanner.java:755)
    at com.fasterxml.aalto.async.AsyncByteScanner.nextFromTree(AsyncByteScanner.java:572)
    at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:720)
    ...
@cowtowncoder

I think what can be done is to allow a configurable handler for illegal characters (Woodstox does this); you could then define a dummy handler that just returns the character as is. This way one could also "fix" these characters, for example by converting them to spaces.
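
To illustrate the handler idea, here is a minimal sketch that works outside the parser, assuming the input can be pre-filtered as text before it reaches Aalto. It checks each code point against the XML 1.0 `Char` production and passes illegal ones to a pluggable handler. The names `IllegalXmlCharFilter`, `isLegalXml10`, and `sanitize` are hypothetical and are not part of the Aalto or Woodstox API.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical helper: not part of Aalto or Woodstox. */
public final class IllegalXmlCharFilter {
    private IllegalXmlCharFilter() {}

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Copies the input, passing every illegal code point to the handler
     * and appending whatever code point the handler returns instead.
     */
    static String sanitize(String input, IntUnaryOperator handler) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            out.appendCodePoint(isLegalXml10(cp) ? cp : handler.applyAsInt(cp));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Control characters 11 and 22, as in the reported exceptions.
        String broken = "<a>one\u000Btwo\u0016three</a>";
        // "Fixing" handler: replace each illegal character with a space.
        System.out.println(sanitize(broken, cp -> ' '));
    }
}
```

A handler of `cp -> cp` corresponds to the "dummy" variant that returns the character unchanged. Replacing one character with one space keeps the row and column positions of the rest of the document intact, which is why a space is used here rather than dropping the character.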

@jakeri

I did a small hack to be able to replace illegal characters. It is not the prettiest solution, and I am not too familiar with the way properties are handled. jakeri@1efc473

@cowtowncoder
Owner

Your patch looks good -- could you send me a pull request, so I could have a closer look, and if all goes well, merge it?

@cowtowncoder
Owner

Never mind -- saw the pull request, thanks!
