Tests fail when locale is not UTF-8 #136
Comments
This is a locale error. It assumes you use |
sh-4.2# locale |
This will not work. We need to test for Can you disable this test and see whether the rest works? |
I can reproduce with
on
|
plexus-archiver
on ppc64le platform
These tests need to be reviewed to see whether they work as intended:
|
I tried to set the locale using
|
Hi @sarveshtamba, this is a very interesting issue. Thanks for reporting it. It seems that there is a bug in Plexus Archiver when working with files containing Unicode characters when the system locale is not UTF-8. I've set my locale to
When I clone Plexus Archiver the files are cloned:
$ ls src/test/resources/miscUtf8/
aFileWithA#.html
'aPi'$'\303\261''ata.txt'
'an'$'\303\274''mlaut.txt'
''$'\342\202\254''uro.txt'
Although
$ mvn clean verify
$ ls target/output/unzip/utf8
aFileWithA#.html
The generated zip file also includes a single file:
$ unzip -l target/output/unzip/utf8-default.zip
Archive: target/output/unzip/utf8-default.zip
Length Date Time Name
--------- ---------- ----- ----
20 2020-04-17 10:06 aFileWithA#.html
--------- -------
20 1 file
If I use
When I change the locale to
$ LC_ALL=en_US.UTF-8 mvn verify
$ ls target/output/unzip/utf8
aFileWithA#.html
'aPi'$'\303\261''ata.txt'
'an'$'\303\274''mlaut.txt'
''$'\342\202\254''uro.txt'
$ unzip -l target/output/unzip/utf8-default.zip
Archive: target/output/unzip/utf8-default.zip
Length Date Time Name
--------- ---------- ----- ----
31 2020-04-17 10:06 €uro.txt
20 2020-04-17 10:06 aFileWithA#.html
39 2020-04-17 10:06 anümlaut.txt
29 2020-04-17 10:06 aPiñata.txt
--------- -------
119 4 files
P.S. I'm testing on Ubuntu, ext4 file system, with locale set to |
The problem is that Java cannot properly map bytes to characters when the encoding is wrong. Unix filesystems are not charset-aware. They simply store bytes, not code points. |
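To illustrate the point about bytes versus characters, here is a minimal sketch. It is not Plexus Archiver code, and the class and method names are made up for this example. It takes the UTF-8 bytes of `aPiñata.txt` (the `\303\261` in the `ls` output above) and decodes them with US-ASCII, which is assumed here to stand in for the charset of a non-UTF-8 locale. The decoded name cannot be turned back into the original bytes, which is one plausible reason such files get skipped.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Plexus Archiver.
class FileNameDecodingDemo {

    // Decode raw file-name bytes with a given charset, roughly what a JVM
    // has to do with the charset derived from the OS locale.
    static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    // True if the decoded name can be encoded back to the very same bytes.
    static boolean roundTrips(byte[] rawName, Charset charset) {
        byte[] again = decode(rawName, charset).getBytes(charset);
        return Arrays.equals(rawName, again);
    }

    public static void main(String[] args) {
        // The name as a UTF-8 system stores it: the n-tilde is 0xC3 0xB1.
        byte[] raw = "aPi\u00f1ata.txt".getBytes(StandardCharsets.UTF_8);

        System.out.println("UTF-8 round trip:    " + roundTrips(raw, StandardCharsets.UTF_8));
        System.out.println("US-ASCII round trip: " + roundTrips(raw, StandardCharsets.US_ASCII));
    }
}
```

With US-ASCII each of the two non-ASCII bytes is replaced by U+FFFD during decoding, so the original bytes are lost.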
Thanks for the inputs @michael-o |
@michael-o thanks for the tip. It really looks like the character encoding is the problem. Still it looks like if Path and URI are used Java can work with such files as expected. The URI has the bytes properly escaped. As now Java 7 is required maybe we can look into those "new" APIs in order to better support use cases as the one reported here. @sarveshtamba thanks. I think I understood where the issue is, so no need to verify it. |
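As a rough illustration of what "bytes properly escaped" looks like, here is a small sketch using only `java.net.URI`. It does not go through `Path` or the file system, so it shows the escaping format rather than the locale-dependent behaviour discussed here; the class name, method name and the `/miscUtf8/` path are just assumptions for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Plexus Archiver.
class UriEscapingDemo {

    // Build a file: URI for an absolute path and return its ASCII-only form.
    // URI.toASCIIString() percent-encodes non-ASCII characters as UTF-8 bytes.
    static String toAsciiFileUri(String absolutePath) throws URISyntaxException {
        return new URI("file", null, absolutePath, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // \u00f1 is the n-tilde, \u20ac is the euro sign.
        System.out.println(toAsciiFileUri("/miscUtf8/aPi\u00f1ata.txt"));
        System.out.println(toAsciiFileUri("/miscUtf8/\u20acuro.txt"));
    }
}
```

The escaped sequences `%C3%B1` and `%E2%82%AC` are the same bytes that `ls` prints as `\303\261` and `\342\202\254`.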
@plamentotev @michael-o thanks for the inputs. |
This is not an issue in plexus-archiver; it's how Java works. Java uses the locale from the operating system, so if the OS is configured with a non-UTF-8 locale, Java will use that, and not even the new Java 7 APIs will help here: Accented or extended UTF-8 characters cause "Malformed input or input contains unmappable characters" error.
Java 11 won't support setting So the only possible and real solution is to use the correct locale, at least So, as this is a Java thing, even trying to clean the project with |
|
Well, just showing a warning message that a UTF-8 locale is required works, but it only hides the real issue. |
Correct, but unfortunately I don't see a better portable way on POSIX-like systems. |
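For reference, a warning like the one discussed could be sketched roughly as follows. This is only an assumption about how such a check might look, not something that exists in Plexus Archiver. `LocaleCheck` and `isUtf8` are hypothetical names, and `sun.jnu.encoding` is an internal, undocumented JDK system property, so relying on it is an assumption as well.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Plexus Archiver.
class LocaleCheck {

    // True if the given charset name (or one of its aliases) denotes UTF-8.
    static boolean isUtf8(String charsetName) {
        if (charsetName == null) {
            return false;
        }
        try {
            return Charset.forName(charsetName).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: treat as "not UTF-8".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: sun.jnu.encoding reflects the charset the JVM uses for
        // file names; fall back to the default charset if it is not set.
        String encoding = System.getProperty("sun.jnu.encoding",
                Charset.defaultCharset().name());
        if (!isUtf8(encoding)) {
            System.err.println("WARNING: file name encoding is " + encoding
                    + "; tests with non-ASCII file names need a UTF-8 locale.");
        }
    }
}
```

As said above, this only tells the user what is wrong. It does not make the files readable under a non-UTF-8 locale.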
Trying to build plexus-archiver v3.7.0 and v4.2.2 on the ppc64le platform, but facing the following test case error: