Default encoding is ANSI_X3.4-1968 instead of UTF-8 #739

Closed
jennyowen opened this issue Mar 17, 2023 · 5 comments
Labels
not a jvm bug question Further information is requested

Comments

@jennyowen
jennyowen commented Mar 17, 2023

Please provide a brief summary of the bug

It seems that in situations where no locale has been set (e.g. in Docker containers), Temurin Java defaults to the file encoding ANSI_X3.4-1968. However, UTF-8 is the de facto standard, so I would expect that to be the default.

Please provide steps to reproduce where possible

Dockerfile:

FROM eclipse-temurin:17-jdk-jammy as jdk17
FROM ubuntu:22.04

ENV JAVA_HOME_17=/usr/local/openjdk17
ENV PATH=${JAVA_HOME_17}/bin:$PATH

COPY --from=jdk17 /opt/java/openjdk ${JAVA_HOME_17}

Then, to build and run:

docker build -t test - < Dockerfile
docker run -it --rm test java  -XshowSettings:properties -version

Expected Results

I expect the Java properties file.encoding, sun.stdout.encoding, sun.stderr.encoding, etc. to show some form of UTF-8 encoding.
This is true if I run the java -XshowSettings:properties -version command in a Temurin Java 17 Docker container, which is why I'm creating the issue here rather than in the Docker source repository.
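The same properties can also be read from inside a running application, which helps when -XshowSettings is not convenient (for example, when the JVM is started by a wrapper script). The following is a minimal sketch, not part of the original report: the class name EncodingReport is made up for illustration, and the property names are the ones that appear in the -XshowSettings:properties output in this issue. Some of them may be unset on a given JDK version or when output is redirected, so the sketch prints a placeholder in that case.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the issue): prints the encoding-related
// system properties that the report above is concerned with.
public class EncodingReport {

    // Property names as they appear in the -XshowSettings:properties output.
    static final String[] KEYS = {
        "file.encoding",
        "native.encoding",
        "sun.jnu.encoding",
        "sun.stdout.encoding",
        "sun.stderr.encoding"
    };

    // Collects the properties in a stable order; unset ones get a placeholder.
    static Map<String, String> collect() {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : KEYS) {
            out.put(key, System.getProperty(key, "(not set)"));
        }
        // The charset that APIs without an explicit charset argument fall back to.
        out.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        return out;
    }

    public static void main(String[] args) {
        collect().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running this once with and once without a UTF-8 locale in the environment should make the difference between the two containers visible from application code.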

Actual Results

Property settings:
    file.encoding = ANSI_X3.4-1968
    file.separator = /
    java.class.path = 
    java.class.version = 61.0
    java.home = /usr/local/openjdk17
    java.io.tmpdir = /tmp
    java.library.path = /usr/java/packages/lib
        /usr/lib64
        /lib64
        /lib
        /usr/lib
    java.runtime.name = OpenJDK Runtime Environment
    java.runtime.version = 17.0.6+10
    java.specification.name = Java Platform API Specification
    java.specification.vendor = Oracle Corporation
    java.specification.version = 17
    java.vendor = Eclipse Adoptium
    java.vendor.url = https://adoptium.net/
    java.vendor.url.bug = https://github.com/adoptium/adoptium-support/issues
    java.vendor.version = Temurin-17.0.6+10
    java.version = 17.0.6
    java.version.date = 2023-01-17
    java.vm.compressedOopsMode = Zero based
    java.vm.info = mixed mode, sharing
    java.vm.name = OpenJDK 64-Bit Server VM
    java.vm.specification.name = Java Virtual Machine Specification
    java.vm.specification.vendor = Oracle Corporation
    java.vm.specification.version = 17
    java.vm.vendor = Eclipse Adoptium
    java.vm.version = 17.0.6+10
    jdk.debug = release
    line.separator = \n 
    native.encoding = ANSI_X3.4-1968
    os.arch = amd64
    os.name = Linux
    os.version = 5.14.0-1057-oem
    path.separator = :
    sun.arch.data.model = 64
    sun.boot.library.path = /usr/local/openjdk17/lib
    sun.cpu.endian = little
    sun.io.unicode.encoding = UnicodeLittle
    sun.java.launcher = SUN_STANDARD
    sun.jnu.encoding = ANSI_X3.4-1968
    sun.management.compiler = HotSpot 64-Bit Tiered Compilers
    sun.stderr.encoding = ANSI_X3.4-1968
    sun.stdout.encoding = ANSI_X3.4-1968
    user.country = US
    user.dir = /
    user.home = /root
    user.language = en
    user.name = root

openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)

What Java Version are you using?

openjdk version "17.0.6" 2023-01-17
OpenJDK Runtime Environment Temurin-17.0.6+10 (build 17.0.6+10)
OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (build 17.0.6+10, mixed mode, sharing)

What is your operating system and platform?

Ubuntu 20.04 on amd64 architecture.

Docker version:

$ docker version
Client: Docker Engine - Community
 Cloud integration: v1.0.31
 Version:           23.0.1
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        a5ee5b1
 Built:             Thu Feb  9 19:46:56 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       bc3805a
  Built:            Thu Feb  9 19:46:56 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.18
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

How did you install Java?

In Docker, by copying from the official image (see reproduction steps).

Did it work before?

No response

Did you test with the latest update version?

No response

Did you test with other Java versions?

Same result if I use Java 11 instead of Java 17. I didn't try any other versions.

Relevant log output

No response

@jennyowen jennyowen added the bug Something isn't working label Mar 17, 2023
@jerboaa
jerboaa commented Mar 17, 2023

See https://openjdk.org/jeps/400. So you might want to try JDK 19 (or soon 20). In particular, see this passage from the Motivation section of the JEP:

If a charset argument is not passed, then standard Java APIs typically use the default charset. The JDK chooses the default charset at startup based upon the run-time environment: the operating system, the user's locale, and other factors.

Setting environment LANG=C.utf8 should change what you are seeing.
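Independently of how the JDK picks its default, application code can avoid depending on it for file contents by always passing a charset. A minimal sketch of that idea, assuming UTF-8 is the intended encoding (the class and method names here are made up for illustration; note that this only covers file contents, not how file names are encoded):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example: reads and writes text with an explicit charset so the
// result does not depend on the default charset chosen at JVM startup.
public class ExplicitCharset {

    // Writes the text as UTF-8 bytes to a temp file and decodes it again as UTF-8.
    static String roundTrip(String text) {
        try {
            // ASCII-only temp file name, so file-name encoding plays no role here.
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                // getBytes() and new String(bytes) without a charset argument
                // would use the default charset; passing one removes that dependency.
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself ASCII-only.
        String sample = "Gr\u00fc\u00dfe";
        System.out.println(roundTrip(sample).equals(sample));
    }
}
```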

@jerboaa jerboaa added question Further information is requested not a jvm bug and removed bug Something isn't working labels Mar 17, 2023
@jerboaa jerboaa closed this as completed Mar 17, 2023
@btbouwens
> See https://openjdk.org/jeps/400. So you might want to try JDK 19 (or soon 20). In particular this section from Motivation section of the JEP:
>
> If a charset argument is not passed, then standard Java APIs typically use the default charset. The JDK chooses the default charset at startup based upon the run-time environment: the operating system, the user's locale, and other factors.
>
> Setting environment LANG=C.utf8 should change what you are seeing.

Frankly, this is totally wrong.

I know this is an old issue, but I'm still running into it:

% LC_ALL=en_US.UTF-8 /usr/lib/jvm/java-11-openjdk/bin/java -XshowSettings:all 2>&1 | grep jnu.encod
sun.jnu.encoding = UTF-8
% LC_CTYPE=en_US.UTF-8 /usr/lib/jvm/java-11-openjdk/bin/java -XshowSettings:all 2>&1 | grep jnu.encod
sun.jnu.encoding = ANSI_X3.4-1968
% LC_PAPER=en_US.UTF-8 /usr/lib/jvm/java-11-openjdk/bin/java -XshowSettings:all 2>&1 | grep jnu.encod
sun.jnu.encoding = UTF-8
% LC_PAPER=a4.UTF-8 /usr/lib/jvm/java-11-openjdk/bin/java -XshowSettings:all 2>&1 | grep jnu.encod
sun.jnu.encoding = ANSI_X3.4-1968
% LANG=C.utf8 /usr/lib/jvm/java-11-openjdk/bin/java -XshowSettings:all 2>&1 | grep jnu.encod
sun.jnu.encoding = ANSI_X3.4-1968
% LANG=C.utf8 /usr/lib/jvm/java-17-openjdk/bin/java -XshowSettings:all 2>&1 | grep jnu.encod
sun.jnu.encoding = ANSI_X3.4-1968

Note that setting LC_ALL is bad practice, and setting LC_PAPER to en_US.UTF-8 is ridiculous too.

So this is supposedly fixed?

% LANG=C.utf8 /usr/lib/jvm/java-21-openjdk/bin/java -XshowSettings:all 2>&1 | grep jnu.encod
sun.jnu.encoding = ANSI_X3.4-1968

... nope!

@AlexanderSchuetz97
AlexanderSchuetz97 commented Mar 28, 2025

This is still a problem. I have tried setting all the variables mentioned to en_US.UTF-8, C.utf8 and all variations thereof.
sun.jnu.encoding remains ANSI_X3.4-1968, and everyone is mad that the application cannot open files with non-ASCII characters in the file name (also using Java 17).
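To check whether a particular file name is even representable under the reported encoding, something like the following can be used. It is only a sketch: FileNameCheck and encodable are hypothetical names, and it assumes that sun.jnu.encoding (a JDK-internal property) names the charset applied to file names, as this thread suggests.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic: tests whether a file name can be encoded in the
// charset named by sun.jnu.encoding.
public class FileNameCheck {

    // True if every character of the name can be represented in the charset.
    static boolean encodable(String fileName, Charset charset) {
        return charset.newEncoder().canEncode(fileName);
    }

    public static void main(String[] args) {
        // Assumption: fall back to UTF-8 if the internal property is missing.
        String jnu = System.getProperty("sun.jnu.encoding", "UTF-8");
        Charset charset = Charset.forName(jnu);
        // Default sample name contains non-ASCII characters (written as escapes).
        String name = args.length > 0 ? args[0] : "r\u00e9sum\u00e9.txt";
        System.out.println("sun.jnu.encoding = " + jnu);
        System.out.println("encodable = " + encodable(name, charset));
    }
}
```

With an ASCII-only charset, the check returns false for any name containing accented or non-Latin characters, which matches the symptom described here.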

@jerboaa
jerboaa commented Mar 28, 2025

Before:

$ podman run --rm -ti localhost/ubuntu-24.04-jdk17-temurin:before java  -XshowSettings:properties -version | grep sun.jnu.encoding
    sun.jnu.encoding = ANSI_X3.4-1968

Fixed container file:

$ cat Dockerfile
FROM eclipse-temurin:17-jdk-jammy as jdk17
FROM ubuntu:24.04

ENV JAVA_HOME_17=/usr/local/openjdk17
ENV PATH=${JAVA_HOME_17}/bin:$PATH

ENV LANG=C.UTF-8
COPY --from=jdk17 /opt/java/openjdk ${JAVA_HOME_17}

Build it with:

podman build -t ubuntu-24.04-jdk17-temurin:after .

Check results:

podman run --rm -ti localhost/ubuntu-24.04-jdk17-temurin:after java  -XshowSettings:properties -version | grep sun.jnu.encoding
    sun.jnu.encoding = UTF-8

What's the difference (in container files)?

ENV LANG=C.UTF-8

@AlexanderSchuetz97
I don't use Adoptium's Docker containers for this. The app I have issues with is shipped as an archive that contains an Adoptium JDK and start scripts. After trial and error I found out that it must be capital UTF-8, so "export LANG=C.UTF-8"; I was trying with lowercase utf-8. I managed to get it to work, finally.

The main problem I had was that this appeared to be platform specific, i.e. it worked on my machine but failed on someone else's Linux box. Issues such as these always surface in production, and that makes them nasty surprises. Anyhow, I am sorted out now. Thanks for taking the time to respond to this.
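One way to make this kind of surprise show up at startup rather than in production is a small guard that warns when the file-name encoding is not UTF-8. This is a sketch under assumptions, not something taken from the thread: EncodingGuard and isUtf8 are hypothetical names, and sun.jnu.encoding is a JDK-internal property that may not be present on every runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical startup guard: warns early if the JVM did not pick UTF-8
// for file names, instead of failing later on a non-ASCII path.
public class EncodingGuard {

    // True only if the given charset name resolves to UTF-8.
    static boolean isUtf8(String charsetName) {
        if (charsetName == null) {
            return false;
        }
        try {
            return StandardCharsets.UTF_8.equals(Charset.forName(charsetName));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return false;
        }
    }

    public static void main(String[] args) {
        String jnu = System.getProperty("sun.jnu.encoding");
        if (!isUtf8(jnu)) {
            System.err.println("Warning: sun.jnu.encoding is " + jnu
                + "; non-ASCII file names may not work."
                + " Check LANG (for example LANG=C.UTF-8).");
        }
    }
}
```

Calling such a check from the application's start script or main method would turn a machine-specific locale difference into a visible warning.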
