utf-8 output on Java >= 18 is not shown correctly in Console #530

Closed
chris21k opened this issue Mar 22, 2023 · 7 comments · Fixed by eclipse-jdt/eclipse.jdt.debug#246

@chris21k

chris21k commented Mar 22, 2023

See https://bugs.eclipse.org/bugs/show_bug.cgi?id=579383.

A simple UTF-8 encoded Java program that produces UTF-8 output with German umlauts.

Eclipse 4.23 with the Java 18 patch/P-Build from 03/21/22 and OpenJDK 18 under German Win11 and, thus, system default encoding CP1252/ISO-8859-1. Preferences > General > Workspace > Text file encoding > Default (UTF-8) is set and the test project inherits this setting, since Java 18 compiles source files as UTF-8 by default. So far so good.

However, a simple Java program which outputs German umlauts (see attachment for the full running example)
System.out.println("äöüß Test");
when started from Eclipse, produces
���� Test
as output in the Console window. The Java source file is correctly encoded as UTF-8, and the run configuration has Common > Encoding > Default - inherited (UTF-8) set. Redirecting the output to a text file by setting an "Output File" in the run configuration produces a correctly UTF-8 encoded result file containing
äöüß Test

Using Java 17 instead of 18 in Eclipse, or running java from JDK 18 on the command line, does not show the problem. The former shows that this is not a font problem. Also, merely switching from Common > Encoding > Default - inherited (UTF-8) to "Other ISO-8859-1" in the run configuration, with the same Java 18 UTF-8 Eclipse setup, shows the correct output in the Console. So from the outside these two options seem to be exactly interchanged...
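The symptom described above is consistent with the launched JVM writing CP1252 bytes while the Console decodes them as UTF-8. The following is only a minimal sketch of that assumed mismatch, not code from Eclipse or the JDK; the class name `ConsoleMismatchDemo` and the method `roundTrip` are made up for illustration. The sample text is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: simulates a producer (the launched JVM) and a consumer
// (the Console) that disagree about the byte encoding of the same text.
class ConsoleMismatchDemo {

    // Encodes text the way the producer would, then decodes the bytes
    // the way the console would.
    static String roundTrip(String text, Charset producer, Charset console) {
        byte[] bytes = text.getBytes(producer);
        return new String(bytes, console);
    }

    public static void main(String[] args) {
        // The umlaut test string from the report, as Unicode escapes.
        String text = "\u00E4\u00F6\u00FC\u00DF Test";
        Charset cp1252 = Charset.forName("windows-1252");

        // Assumed failing case: CP1252 bytes read as UTF-8. Each umlaut byte
        // is malformed UTF-8 and becomes the replacement character U+FFFD.
        String broken = roundTrip(text, cp1252, StandardCharsets.UTF_8);
        // Workaround case from the report: console switched to ISO-8859-1.
        String fixed = roundTrip(text, cp1252, StandardCharsets.ISO_8859_1);

        System.out.println("read as UTF-8 equals original:      " + broken.equals(text));
        System.out.println("read as ISO-8859-1 equals original: " + fixed.equals(text));
    }
}
```

Under this assumption, the "interchanged" behaviour follows naturally: the four umlaut bytes happen to decode identically in CP1252 and ISO-8859-1, so selecting ISO-8859-1 for the Console accidentally matches what the JVM actually wrote.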

Chris

@chris21k chris21k changed the title encoding proeblem: utf-8 output of Java >= 18 not shown correctly in Console encoding problem: utf-8 output of Java >= 18 not shown correctly in Console Mar 22, 2023
@iloveeclipse iloveeclipse transferred this issue from eclipse-platform/eclipse.platform Mar 22, 2023
@iloveeclipse
Member

See especially my comment

JEP-400 changed everything to use UTF-8 except the system in/out, and it also seems to "mostly" ignore -Dfile.encoding for the system in/out streams.

So omitting -Dfile.encoding has the same effect as -Dfile.encoding=ANY_NON_UTF-8, because anything except UTF-8 is ignored.

With the new "Use system encoding" option in launch config one could avoid this confusion: https://www.eclipse.org/eclipse/news/4.25/platform.php#debug-system-encoding.

But still, on Linux if LANG is set to C, Java 18 produces garbage.

The only solution for Linux (without modifying LANG) is to specify (-Dsun.stdout.encoding=UTF-8 JVM argument OR LANG=en_US.UTF-8 environment) AND set console encoding (-Dfile.encoding) to UTF-8.

On Windows one needs either "Use system encoding" for the console (which unsets -Dfile.encoding), or both -Dsun.stdout.encoding=UTF-8 AND the console encoding (-Dfile.encoding) set to UTF-8.

No idea what happens on Mac.

A solution that would work on both Linux and Windows is simply to specify both -Dsun.stdout.encoding=UTF-8 and -Dfile.encoding=UTF-8.
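To check which of these settings a launched JVM actually picked up, a small diagnostic can be run with different launch configurations. This is only a sketch; the class name `EncodingReport` is made up. Besides the properties discussed above it also reads `stdout.encoding`, `stderr.encoding` (JDK 19 and later) and `native.encoding` (JDK 17 and later), which are not mentioned in this comment; on JVMs that do not define a property the value is shown as "null".

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: reports the encoding-related system properties
// that the running JVM sees, plus the default charset.
class EncodingReport {

    static final String[] KEYS = {
        "file.encoding",        // console encoding passed by the launch config
        "sun.stdout.encoding",  // internal property referred to above
        "stdout.encoding",      // JDK 19+ (assumption: not set on older JVMs)
        "stderr.encoding",      // JDK 19+
        "native.encoding"       // JDK 17+
    };

    // Collects the properties in a stable order; unset ones map to "null".
    static Map<String, String> snapshot() {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : KEYS) {
            result.put(key, String.valueOf(System.getProperty(key)));
        }
        result.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        return result;
    }

    public static void main(String[] args) {
        snapshot().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running it once with and once without the -D arguments listed above should show whether the stdout-related properties follow -Dfile.encoding on a given platform.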

Contributions are welcome!

@iloveeclipse iloveeclipse added the help wanted Extra attention is needed label Mar 22, 2023
@iloveeclipse
Member

@trancexpress : this is probably interesting for us in context of transition to Java 21. Not urgent however.

@iloveeclipse iloveeclipse changed the title encoding problem: utf-8 output of Java >= 18 not shown correctly in Console utf-8 output on Java >= 18 is not shown correctly in Console Mar 22, 2023
@trancexpress
Contributor

> @trancexpress : this is probably interesting for us in context of transition to Java 21. Not urgent however.

I thought we set UTF-8 in LANG?

> But still, on Linux if LANG is set to C, Java 18 produces garbage.

@iloveeclipse
Member

I meant "probably" :-) because we can't foresee now what will happen on RHEL 9 and Java 21, and whatever we will have in the environment at that time.

@jukzi
Contributor

jukzi commented May 23, 2023

I can reproduce this on Windows 10 with JDK 19. I will investigate.

@jukzi jukzi self-assigned this May 23, 2023
@jukzi
Contributor

jukzi commented May 23, 2023

Note that JDK 19 (https://bugs.openjdk.org/browse/JDK-8283620) changed
-Dsun.stdout.encoding=UTF-8 to
-Dstdout.encoding=UTF-8
and also documented the new option in java.lang.System.out.
I think we can skip JDK 18 support and use the official -Dstdout.encoding.

There are two possibilities:
a) two distinct configurations for -Dfile.encoding and -Dstdout.encoding or
b) fill both with the same value configured in the existing launch option

Since I personally have never needed to configure them independently, I suggest keeping the UI as is and also filling -Dstdout.encoding with the selected encoding.
WDYT?
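Option b) could look roughly like the sketch below. It is not the actual change made in eclipse.jdt.debug; the class `LaunchEncodingArgs` and the method `forEncoding` are hypothetical names, and the sketch simply assumes that one selected encoding is copied into all relevant -D arguments. It also fills `stderr.encoding`, the JDK 19 counterpart of `stdout.encoding` for the error stream.

```java
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper: derives the JVM arguments for option b) from the
// single encoding selected in the launch configuration.
class LaunchEncodingArgs {

    static List<String> forEncoding(String encoding) {
        // Charset.forName throws for unknown or illegal names, so a typo in
        // the configuration fails early; name() yields the canonical name.
        String name = Charset.forName(encoding).name();
        return List.of(
            "-Dfile.encoding=" + name,
            "-Dstdout.encoding=" + name,   // official property since JDK 19
            "-Dstderr.encoding=" + name);  // same value for the error stream
    }

    public static void main(String[] args) {
        forEncoding("UTF-8").forEach(System.out::println);
    }
}
```

Keeping all values derived from one setting avoids adding UI, at the cost of not being able to configure the streams independently, which is the trade-off raised above.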

@jukzi
Contributor

jukzi commented May 23, 2023

Also note that https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/System.html#stdout.encoding
defines any value other than UTF-8 as undefined behavior; however, it works on Windows with -Dstdout.encoding=windows-1252 as well.
An equivalent option, -Dstderr.encoding, is available for the error stream, which I would also set to the same value.
For testing purposes it's good to use characters where windows-1252 and ISO-8859-1 don't match, for example the Euro sign '€' (https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html#compare).
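Why the Euro sign is a better test character than the umlauts can be shown with a short sketch (the class name `EuroProbe` is made up): the umlaut ä has the same byte value in both charsets, whereas '€' is 0x80 in windows-1252 and has no mapping at all in ISO-8859-1, where `String.getBytes` substitutes '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical probe: prints the encoded bytes of a string as hex so the
// difference between windows-1252 and ISO-8859-1 becomes visible.
class EuroProbe {

    // Returns the encoded bytes as space-separated upper-case hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String euro = "\u20AC";    // Euro sign, as a Unicode escape
        String aUmlaut = "\u00E4"; // a with diaeresis

        System.out.println("umlaut windows-1252: " + hex(aUmlaut, cp1252));
        System.out.println("umlaut ISO-8859-1:   " + hex(aUmlaut, StandardCharsets.ISO_8859_1));
        System.out.println("euro   windows-1252: " + hex(euro, cp1252));
        System.out.println("euro   ISO-8859-1:   " + hex(euro, StandardCharsets.ISO_8859_1));
        System.out.println("euro   UTF-8:        " + hex(euro, StandardCharsets.UTF_8));
    }
}
```

A test that only prints umlauts therefore cannot distinguish the two charsets, while one that prints '€' can.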
