@philwalk
Contributor

This PR modifies generate-native-image.sh so that, when running on Windows, scala-cli.exe is generated with the code page set to 65001:

  • save current code page
  • set code page to 65001
  • generate scala-cli.exe
  • restore the original code page at exit
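
The four steps above can be sketched roughly as follows. This is not the PR's actual diff: the helper names `parse_code_page` and `with_utf8_code_page` and the placeholder build command are hypothetical, and the sketch assumes `chcp.com` is reachable from the shell on Windows.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the placeholder command are hypothetical.

# Extract the numeric code page from chcp output such as "Active code page: 437".
# The last run of digits is used so that localized messages also work.
parse_code_page() {
  printf '%s\n' "$1" | tr -d '\r' | grep -oE '[0-9]+' | tail -n 1
}

# Run the given command with code page 65001 active, restoring the
# previous code page when the shell exits.
with_utf8_code_page() {
  if ! command -v chcp.com >/dev/null 2>&1; then
    "$@"   # no chcp.com (not Windows): nothing to switch
    return
  fi
  local saved
  saved="$(parse_code_page "$(chcp.com)")"      # 1. save current code page
  if [ -n "$saved" ]; then
    trap "chcp.com $saved >/dev/null" EXIT      # 4. restore it at exit
  fi
  chcp.com 65001 >/dev/null                     # 2. switch to UTF-8
  "$@"                                          # 3. run the build
}

# Placeholder for the real native-image build step.
with_utf8_code_page echo "native-image build would run here"
```

Registering the restore step with `trap ... EXIT` means the original code page comes back even if the build step fails partway through.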

Using a test script named /opt/ue/jsrc/løp.sc:

#!/usr/bin/env -S scala-cli shebang
printf("Hello %s\n", scriptPath)
printf("Hello %s\n", sys.props("scala.sources"))
printf("Hello %s\n", sys.props("scala.source.names"))

and with the PATH set to use the generated scala-cli.exe:

PATH="out/cli/3.3.7/base-image/nativeImage.dest:$PATH"

The manual verification below shows that both the bug and the fix depend on the code page that is active during the GraalVM native-image build.

Verify the bug

Create a buggy scala-cli.exe by running generate-native-image.sh, temporarily modified to set code page 437.
Verify that the bug manifests when attempting to execute the script:

# /opt/ue/jsrc/løp.sc
[error]  File not found: C:\opt\ue\jsrc\løp.sc

Verify the fix

Create a fixed scala-cli.exe by running generate-native-image.sh as proposed in this PR (set code page 65001).
Verify that the bug is fixed:

# /opt/ue/jsrc/løp.sc
Hello /opt/ue/jsrc/løp.sc
Hello C:/opt/ue/jsrc/løp.sc
Hello løp.sc
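
As background for why the code page can matter here, the sketch below dumps the UTF-8 bytes of the test file name. The helper name `utf8_hex` is hypothetical, and the sketch assumes the script itself is saved as UTF-8. The thread does not show where in the toolchain the name gets decoded, so this is only an illustration: `ø` takes two bytes in UTF-8, and any single-byte code page would read those two bytes as two unrelated characters.

```shell
#!/usr/bin/env bash
# Print the bytes of a string as lowercase hex (assumes this file is UTF-8).
utf8_hex() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

utf8_hex 'løp.sc'; echo   # prints 6cc3b8702e7363 (7 bytes for 6 characters)
utf8_hex 'ø'; echo        # prints c3b8
```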

@philwalk
Contributor Author

The error messages for the failing tests are the result of having Windows code page 437 active.
I assume that (at least as of jdk17) we need to set the code page to 65001 in Windows before running this test.

The failure is that a script named TestÅÄÖåäö.sc with contents:

object TestÅÄÖåäö {
  def main(args: Array[String]): Unit = {
    println("Hello from TestÅÄÖåäö")
  }
}

failed with this error message:

[950] Error: Could not find or load main class TestÅÄÖåäö
[950] Caused by: java.lang.ClassNotFoundException: TestÅÄÖåäö

Full output of the test run:

# ./mill integration.test.native 'scala.cli.integration.RunTestsDefault.UTF-8'
[949] Produced artifacts:
[949]  C:\Users\philwalk\workspace\scala-cli-3307\out\cli\3.3.7\base-image\nativeImage.dest\scala-cli.build_artifacts.txt (txt)
[949]  C:\Users\philwalk\workspace\scala-cli-3307\out\cli\3.3.7\base-image\nativeImage.dest\scala-cli.exe (executable)
[949] ========================================================================================================================
[949] Finished generating 'scala-cli' in 3m 49s.
[950/950] integration.test.native
[950] >==== Running 'UTF-8' from RunTestDefinitions
[950] Running warmup testà
[950] Compiling project (Scala 3.7.3, JVM (17))
[950] Compiled project (Scala 3.7.3, JVM (17))
[950] Done running warmup test.
[950] Compiling project (Scala 3.7.3, JVM (17))
[950] Compiled project (Scala 3.7.3, JVM (17))
[950] Error: Could not find or load main class TestÅÄÖåäö
[950] Caused by: java.lang.ClassNotFoundException: TestÅÄÖåäö
[950] X==== Finishing 'UTF-8' from RunTestDefinitions
[950] scala.cli.integration.RunTestsDefault:
[950] ==> X scala.cli.integration.RunTestsDefault.UTF-8  2.017s os.SubprocessException: Result of C:\Users\philwalk\workspace\scala-cli-3307\out\cli\3.3.7\base-image\nativeImage.dest\scala-cli.exeà: 1
[950]
[950]     at os.proc.call(ProcessOps.scala:232)
[950]     at scala.cli.integration.RunTestDefinitions.$anonfun$new$117(RunTestDefinitions.scala:1080)
[950]     at scala.cli.integration.RunTestDefinitions.$anonfun$new$117$adapted(RunTestDefinitions.scala:1072)
[950]     at scala.cli.integration.TestInputs.$anonfun$fromRoot$1(TestInputs.scala:35)
[950]     at scala.cli.integration.TestInputs$.scala$cli$integration$TestInputs$$withTmpDir(TestInputs.scala:95)
[950]     at scala.cli.integration.TestInputs.fromRoot(TestInputs.scala:33)
[950]     at scala.cli.integration.RunTestDefinitions.$anonfun$new$116(RunTestDefinitions.scala:1072)
[950]     at scala.cli.integration.WithWarmUpScalaCliSuite.$anonfun$test$1(WithWarmUpScalaCliSuite.scala:34)
[950]     at scala.cli.integration.WithWarmUpScalaCliSuite.$anonfun$test$2(WithWarmUpScalaCliSuite.scala:37)
[950/950, 1 failed] ============================== integration.test.native scala.cli.integration.RunTestsDefault.UTF-8 ============================== 265s
1 tasks failed
integration.test.native 1 tests failed:
  scala.cli.integration.RunTestsDefault.UTF-8 scala.cli.integration.RunTestsDefault.UTF-8
philwalk@d5 MSYS ~/workspace/scala-cli-3307

@Gedochao
Contributor

> I assume that (at least as of jdk17) we need to set the code page to 65001 in Windows before running this test.

Do you maybe know how to do it? Can't say I'm familiar with code pages in Windows, myself.
The fix looks good otherwise, we just need to get the tests fixed.

@Gedochao Gedochao linked an issue Oct 27, 2025 that may be closed by this pull request
@philwalk

This comment was marked as outdated.

@Gedochao
Contributor

> One possible fix might be to run this one test with jdk18 or later (not sure how to do that, if it's possible).

Depends what "running with JDK18" means here.
If it's the JDK used for user code and/or Bloop, then just check how we do it in other tests, for example here:

test(s"correct JVM is picked up by $launcherString when JAVA_HOME set to $index") {
  TestUtil.retryOnCi() {
    TestInputs(
      os.rel / "check_java_home.sc" ->
        s"""assert(
           | System.getProperty("java.version").startsWith("$javaVersion") ||
           | System.getProperty("java.version").startsWith("1.$javaVersion")
           |)
           |println(System.getProperty("java.home"))""".stripMargin
    ).fromRoot { root =>
      val javaHome =
        os.Path(
          os.proc(TestUtil.cs, "java-home", "--jvm", index).call().out.trim(),
          os.pwd
        )
      withLauncher(root) { launcher =>
        val res = os.proc(launcher, "run", ".", extraOptions)
          .call(cwd = root, env = Map("JAVA_HOME" -> javaHome.toString))
        expect(res.out.trim().contains(javaHome.toString))
      }
    }
  }
}

-Dfile.encoding=UTF-8 should be possible to pass directly to the scala-cli native launcher (before the sub-command):

scala-cli -Dfile.encoding=UTF-8 compile (...)

If, however, this would mean the scala-cli launcher would require JDK 18, that's a tougher cookie... I don't think we'd be bumping the required Java for the time being (not before the Scala 3 compiler does it, and the upcoming 3.9 LTS will require JDK 17).

@philwalk

This comment was marked as outdated.

@philwalk
Contributor Author

It turns out that when os.proc(...).call(...) is run from mill, the spawned process inherits two system properties that cause scripts with non-ASCII names to fail. They can be set to UTF-8 in the JVM running the integration test, but the spawned scala-cli script still reports these values:

sun.jnu.encoding = Cp1252
native.encoding = Cp1252

If I duplicate the test in a scala-cli script and run it with scala-cli version 1.9.1, the bad encoding values are present and the script test also fails. But if I run the script with a new GraalVM scala-cli.exe (compiled after setting the code page to 65001), these values are inherited (apparently from scala-cli.exe) and the test succeeds:

sun.jnu.encoding = UTF-8
native.encoding = UTF-8
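
For comparison, the same properties can be read from a plain JVM with a small sketch like the one below. The helper name `filter_encoding_props` is hypothetical. Note that this reports the values for the `java` found on PATH, not the values seen inside a process spawned by scala-cli.exe, so it is only a baseline to compare against.

```shell
#!/usr/bin/env bash
# Keep only the encoding-related lines from a "key = value" property listing.
filter_encoding_props() {
  grep -E '^[[:space:]]*(file|native|sun\.jnu)\.encoding[[:space:]]*=' \
    | sed -E 's/^[[:space:]]+//'
}

if command -v java >/dev/null 2>&1; then
  # -XshowSettings:properties writes the JVM's system properties to stderr.
  java -XshowSettings:properties -version 2>&1 | filter_encoding_props
else
  echo "java not found on PATH; skipping" >&2
fi
```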

The same is true of the launcher out/cli/3.3.7/standaloneLauncher.dest/launcher.bat, as demonstrated by the bash script below: the launcher succeeds or fails depending on whether the Windows code page is 65001. This suggests that whether these two tests pass or fail depends on whether mill is running code compiled with the modified code page.

# time ./mill integration.test.jvm 'scala.cli.integration.RunTestsDefault.UTF-8' 
# time ./mill integration.test.native 'scala.cli.integration.RunTestsDefault.UTF-8' 

Here's the demo shell script. With the current release in your PATH it fails; with scala-cli from this PR build it succeeds.

#!/bin/bash

# Write the script to the same path (with the .sc extension) that FNAME refers to below.
cat > "/tmp/testÅÄÖåäö.sc" <<'EOF'
#!/usr/bin/env -S scala-cli shebang
//> using dep com.lihaoyi::os-lib:0.11.6
object TestÅÄÖåäö {
  import java.nio.charset.Charset
  def showEncoding(fileOpt: Option[os.Path] = None): Unit = {
    printf("======= jvm encoding configuration:\n")
    printf("%s\n", s"sun.jnu.encoding = ${System.getProperty("sun.jnu.encoding")}")
    printf("%s\n", s"file.encoding = ${System.getProperty("file.encoding")}")
    printf("%s\n", s"Charset.defaultCharset = ${Charset.defaultCharset()}")
    printf("%s\n", s"Class name = ${getClass.getName}")
    printf("fileName[%s]\n", sys.props("scala.source.names"))
    fileOpt.foreach { file =>
      printf("======= filename and file contents encoding:\n")
      printf("### [%s]\n", file)
      if (!os.exists(file)) {
        printf("########### not found: [%s]\n", file)
        printf("nio: [%s]\n", file.toNIO.toString)
      } else {
        val str = os.read(file)
        val nonAscii = str.replaceAll("[\\x00-\\x7F]", "")
        printf("####### str[%s]\nnonAscii[%s]\nnonAscii.getBytes.length[%d]\n", str, nonAscii, nonAscii.getBytes.length)
      }
    }
  }
  val filename = "testÅÄÖåäö.sc"
  val scriptPath = sys.props("scala.sources")
  def main(args: Array[String]): Unit = {
    showEncoding(Some(os.Path(scriptPath)))
    println(s"Hello from $filename")
    println(s"Hello from $scriptPath")
  }
}
EOF

FNAME=/tmp/testÅÄÖåäö.sc
scala-cli.exe run "$FNAME" # okay
echo "######################################" 1>&2
java -Xmx512m -Xms128m -jar 'out/cli/3.3.7/standaloneLauncher.dest/launcher.bat' "$FNAME"

Here's a self-contained scala-cli script that also demonstrates the problem:

#!/usr/bin/env -S scala-cli shebang

//> using dep com.lihaoyi::os-lib:0.11.6

object TestÅÄÖåäö {
  import java.nio.charset.Charset
  // with version 1.9.1 in the path, the `os.proc` call fails, but succeeds with the new version.
  var launcherName = "scala-cli.exe" 
  val filename = "testÅÄÖåäö.sc"
  def showEncoding(fileOpt: Option[os.Path] = None): Unit = {
    printf("======= jvm encoding configuration:\n")
    import scala.jdk.CollectionConverters.*
    import scala.sys.process.*
    System.err.printf("code-page: [%s]\n", ("chcp.com".!!).trim)
    System.err.printf("JAVA_TOOL_OPTIONS[%s]\n", System.getenv("JAVA_TOOL_OPTIONS"))
    System.err.printf("native.encoding = %s\n",  System.getProperty("native.encoding"))
    System.err.printf("sun.jnu.encoding = %s\n", System.getProperty("sun.jnu.encoding"))
    System.err.printf("file.encoding = %s\n",    System.getProperty("file.encoding"))
    System.err.printf("Charset.defaultCharset = %s\n", java.nio.charset.Charset.defaultCharset())
    System.err.printf("Class name = %s\n", this.getClass.getName)
    printf("fileName[%s]\n", sys.props("scala.source.names"))
  }
  def main(args: Array[String]): Unit = {
    showEncoding(Some(os.Path(scriptPath)))
    println(s"Hello from $filename")
    println(s"Hello from $scriptPath")
    printf("launcher: %s\n", launcher)
  
    val endOfCopy = "=== script cloning ends here ==="
    //
    val content = os.read(os.Path(scriptPath))
    val topPart = content.split("[\r\n]+").takeWhile(!_.contains(endOfCopy)).filter(!_.contains("launcher")).mkString("\n")+"\n  }\n}"

    val filepath = s"/tmp/$filename"
    os.write.over(os.Path(filepath), topPart)

    val extraArgs = Seq("--bloop-startup-timeout", "2min", "--bloop-bsp-timeout", "1min")

    printf("\nos.proc.call cloned copy: filepath [%s]\n", filepath)
    val res = os.proc(
      launcher,
      filepath
    )
      .call()
    val outstr = res.out.text(scala.io.Codec.UTF8).trim
    printf("outstr[%s]\n", outstr)
  }
  import scala.sys.process.*
  lazy val launcher = Seq("where.exe", launcherName).!!.trim.replace('\\', '/').split("[\r\n]+").toSeq.head
}

Development

Successfully merging this pull request may close these issues.

cannot find the path specified with UTF-8 characters on Windows
