MXParser error when parsing an UTF-8 file without BOM and ISO-8859-1 element encoding #242

Closed
belingueres opened this issue Mar 12, 2023 · 3 comments

Comments

@belingueres
Contributor

While investigating mojohaus/build-helper-maven-plugin#168, a new regression was found: parsing a file that is encoded as UTF-8 with no BOM, and whose declared encoding is ISO-8859-1, throws an exception with this message:

"UTF-8 BOM plus xml decl of ISO-8859-1 is incompatible ..."

belingueres added a commit to belingueres/plexus-utils that referenced this issue Mar 12, 2023
…us-plexus#242)

* Deleted most code handling encoding (leaving that job to the XmlReader)
* Fixed tests exercising encoding checks. Unsupported tests were skipped
* Simplified test-encoding-ISO-8859-1.xml test file
belingueres added a commit to belingueres/plexus-utils that referenced this issue Mar 13, 2023
…us-plexus#242)

* Deleted most code handling encoding (leaving that job to the XmlReader)
* Fixed tests exercising encoding checks. Unsupported tests were skipped
* Simplified test-encoding-ISO-8859-1.xml test file

Skipped even more tests that pass on Linux but fail on Windows.

elharo commented Apr 6, 2023

Specifically, the message comes from:

    if ( "UTF8".equals( fileEncoding ) && inputEncoding.toUpperCase().startsWith( "ISO-" ) )
    {
        throw new XmlPullParserException( "UTF-8 BOM plus xml decl of " + inputEncoding + " is incompatible",
                                          this, null );
    }

The error message is incorrect. The check only compares the reader's encoding with the declared encoding, so this exception can also be thrown when an ISO-8859-1 XML file that has no BOM is parsed through a UTF-8 reader. See MPMD-369.
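To illustrate why the message is misleading, here is a minimal sketch using only the JDK. The class `BomCheckSketch` and its methods are hypothetical names made up for this example; they are not part of plexus-utils. It shows that a reader opened as UTF-8 reports the historical name "UTF8" whether or not the underlying bytes start with a BOM, so a check based on that name alone cannot tell the two cases apart.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not plexus-utils code.
final class BomCheckSketch {

    /** Returns true only if the bytes start with the UTF-8 BOM (EF BB BF). */
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Returns the encoding name reported by a reader opened as UTF-8. */
    static String readerEncoding(byte[] bytes) {
        // A ByteArrayInputStream holds no system resources, so the reader
        // is not closed here; getEncoding() returns null once it is closed.
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        // UTF-8 bytes, no BOM, declaration says ISO-8859-1 (the case in this issue).
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(readerEncoding(doc)); // the reader's encoding name
        System.out.println(hasUtf8Bom(doc));     // whether a BOM is really present
    }
}
```

Run as written, this prints `UTF8` followed by `false`: the condition quoted above would still match, even though no BOM exists in the input.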


elharo commented Apr 6, 2023

What should happen:

  1. Fix the error message.
  2. Deprecate the methods that read from a reader. Read from an input stream instead.
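The second point can be sketched with the JDK's own StAX parser, which is used here as a stand-in for MXParser and is an assumption of this example, not the fix that was applied. The class `StreamVersusReader` and its method names are hypothetical. Given an input stream, the parser sees the raw bytes and can honor the declared encoding; given a reader, the bytes have already been decoded with whatever charset the caller chose.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; uses JDK StAX rather than MXParser.
final class StreamVersusReader {

    /** An ISO-8859-1 encoded document with a matching declaration and no BOM. */
    static byte[] latin1Document() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Parses from raw bytes, so the parser picks the encoding itself. */
    static String textFromStream(byte[] bytes) {
        try {
            XMLStreamReader xml = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new ByteArrayInputStream(bytes));
            return firstElementText(xml);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses from a reader that the caller (wrongly) opened as UTF-8. */
    static String textFromUtf8Reader(byte[] bytes) {
        try {
            Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
            return firstElementText(XMLInputFactory.newFactory().createXMLStreamReader(reader));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the text content of the first element encountered. */
    private static String firstElementText(XMLStreamReader xml) throws XMLStreamException {
        try {
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                    return xml.getElementText();
                }
            }
            throw new IllegalStateException("no element found");
        } finally {
            xml.close();
        }
    }

    public static void main(String[] args) {
        byte[] doc = latin1Document();
        System.out.println("stream ok: " + "caf\u00e9".equals(textFromStream(doc)));
        System.out.println("reader ok: " + "caf\u00e9".equals(textFromUtf8Reader(doc)));
    }
}
```

The stream-based call recovers the accented character because the parser decodes the bytes according to the declaration. The reader-based call cannot, since the character was lost during the caller's UTF-8 decoding before the parser ever saw it.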

@belingueres
Contributor Author

Fixed in PR #243

gnodet referenced this issue in codehaus-plexus/plexus-xml Apr 7, 2023
* Deleted most code handling encoding (leaving that job to the XmlReader)
* Fixed tests exercising encoding checks. Unsupported tests were skipped
* Simplified test-encoding-ISO-8859-1.xml test file

Skipped even more tests that pass on Linux but fail on Windows.
gnodet closed this as completed Apr 19, 2023