Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: encoding error when parsing a ISO-8859-1 pom with Maven Resolver #163

Closed
tmortagne opened this issue Aug 9, 2021 · 11 comments
Closed
Labels

Comments

@tmortagne
Copy link

tmortagne commented Aug 9, 2021

Maven Resolver (at least the 3.6.0 one) model processor uses ReaderFactory#newXmlReader to create a XmlStreamReader and then pass it to MXParser.

Problem is that since 1e18ddc MXParser assumes the input encoding is UTF8 and only changes that if the input reader is a InputStreamReader (which is not the case here since it's a XmlStreamReader). So we end up with exception:

UTF-8 BOM plus xml decl of ISO-8859-1 is incompatible

This whole check is not really needed here since XmlStreamReader is taking care of potential encoding issues already from what I understand.

@michael-o
Copy link
Member

michael-o commented Aug 9, 2021

Will take a look. Is your POM public?

@tmortagne
Copy link
Author

tmortagne commented Aug 9, 2021

Yes, in my case the problem was with https://repo1.maven.org/maven2/org/apache/commons/commons-parent/39/commons-parent-39.pom (when resolving the parents of commons-collections:commons-collections:3.2.2 to build its Model).

@michael-o
Copy link
Member

michael-o commented Aug 9, 2021

@belingueres Can you have a look at this?

@tmortagne
Copy link
Author

tmortagne commented Aug 9, 2021

I feel like MXParser#fileEncoding should be null by default instead of the arbitrary "UTF8" and the check should be done only if it's not null.

@michael-o
Copy link
Member

michael-o commented Aug 9, 2021

I feel like MXParser#fileEncoding should be null by default instead of the arbitrary "UTF8" and the check should be done only if it's not null.

So it should autodiscover the encoding itself?

@tmortagne
Copy link
Author

tmortagne commented Aug 9, 2021

So it should autodiscover the encoding itself?

No, just not compare the encoding with the declaration when it does not know what encoding the reader uses, same as before this change.

@michael-o
Copy link
Member

michael-o commented Aug 9, 2021

So it should autodiscover the encoding itself?

No, just not compare the encoding with the declaration when it does not know what encoding the reader uses, same as before this change.

Makes sense.

@belingueres
Copy link
Contributor

belingueres commented Aug 9, 2021

I'll take a look at it.

belingueres added a commit to belingueres/plexus-utils that referenced this issue Aug 9, 2021
…ISO-8859-1 xml

- Fixed code.
- Added tests.
@haster
Copy link

haster commented Aug 26, 2021

We just ran into the same issue and after a bit of searching found the bug ... only to then see this issue that perfectly captures what's going on 😄

Thanks for raising it @tmortagne and for the quick fix @belingueres , can we expect a patch soon?

@olamy
Copy link
Member

olamy commented Aug 26, 2021

closes by belingueres@f999bdb

@olamy olamy closed this as completed Aug 26, 2021
@olamy
Copy link
Member

olamy commented Aug 26, 2021

releasing today

@olamy olamy added the bug label Aug 26, 2021
olamy pushed a commit that referenced this issue Aug 26, 2021
belingueres added a commit to belingueres/plexus-utils that referenced this issue Apr 5, 2022
* codehaus-plexus#163 - new case:  Don't assume UTF8 as default, to allow parsing from String.
* codehaus-plexus#194 - Incorrect getText() after parsing the DOCDECL section.
belingueres added a commit to belingueres/plexus-utils that referenced this issue Apr 6, 2022
* codehaus-plexus#163 - new case:  Don't assume UTF8 as default, to allow parsing from String.
* codehaus-plexus#194 - Incorrect getText() after parsing the DOCDECL section.
belingueres added a commit to belingueres/plexus-utils that referenced this issue Apr 14, 2022
* codehaus-plexus#163 - new case:  Don't assume UTF8 as default, to allow parsing from String.
* codehaus-plexus#194 - Incorrect getText() after parsing the DOCDECL section.
* Added tests exercising other regressions exposed while fixing this issues.
belingueres added a commit to belingueres/plexus-utils that referenced this issue Apr 15, 2022
* codehaus-plexus#163 - new case:  Don't assume UTF8 as default, to allow parsing from String.
* codehaus-plexus#194 - Incorrect getText() after parsing the DOCDECL section.
* Added tests exercising other regressions exposed while fixing this issues.
michael-o pushed a commit that referenced this issue Apr 16, 2022
* #163 - new case:  Don't assume UTF8 as default, to allow parsing from String.
* #194 - Incorrect getText() after parsing the DOCDECL section.
* Added tests exercising other regressions exposed while fixing this issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants