-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve configuration of native image compilation #455
Conversation
pkl-cli/pkl-cli.gradle.kts
Outdated
// disable automatic support for JVM CLI options (puts our main class in full control of argument parsing) | ||
,"-H:-ParseRuntimeOptions" | ||
// quick build mode: 40% faster compilation, 20% smaller (but presumably also slower) executable | ||
,"-Ob" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good addition! Let's use buildInfo.isReleaseBuild
to toggle this off for our actual releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Note that binary size keeps increasing (now at 120 MiB for amd64). I think it's worth trying to switch to --initialize-at-runtime
for most things other than the Truffle interpreter. I also tried switching to the G1 garbage collector, but this added another 20 MiB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to Truffle, we also want to initialize the standard library at build time, to cut down the overhead of Pkl execution.
There's certainly some tuning that we can do here, though.
pkl-cli/pkl-cli.gradle.kts
Outdated
// Tested on ZEN 2. | ||
// Feature list: CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT, | ||
// LZCNT, AVX, AVX2, AES, CLMUL, BMI1, BMI2, FMA, AMD_3DNOW_PREFETCH, ADX, FLUSHOPT | ||
, if (buildInfo.arch == "amd64") "-march=skylake" else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do? Does this mean native executables are not compatible with Intel CPUs older than 2015, and AMD CPUs older than 2017? And if so, what do we get from this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this raises the baseline for supported CPUs from "x64-v3" (default) to "skylake". In return, the native image compiler can use many additional CPU features/instructions, resulting in faster and more efficient assembly code.
I arrived at this by checking the output of -march=list
. "skylake" seemed to be a good middle ground between supporting a wide range of CPUs and leveraging many CPU features.
For a native Java app that runs on known (cloud) hardware, it would make sense to be more aggressive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much perf do we gain from targeting skylake? Can you enable the native CLI builds to CI so we can compare it to the default architecture? Or, if you're on an x86 machine, feel free to just run some tests locally and report the results. I'm on an M1 machine, so I can't really test this myself alas.
To run tests, you can ./gradlew pkl-core:testLinuxExecutableAmd64
assuming you're driving this through WSL.
These CLIs are libraries, and have a broad set of use-cases. For that reason, I'm thinking that we should probably be conservative here. We probably shouldn't just bump the version, but if there's a significant perf gain, we can consider adding this as an additional architecture.
CC @holzensp @stackoverflow for comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Dan that we should be conservative here, even though skylake is almost 9 years old. Specially without any performance data to compare. If the performance bump is noticeable we can create a new architecture release or just drop old CPUs in the new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @bioball and @stackoverflow. We're also considering spending a version-cycle on GraalVM/Truffle/native-image updates and perf (including binary size). For now, let's be quite conservative.
f7b28a0
to
cb539d6
Compare
Does Pkl have a comprehensive benchmark suite? If you want proof, that’s probably the only way.
Your call. I feel that “skylake” is fairly conservative yet enables most CPU features. “x64-v3” is an arbitrary GraalVM default—we should figure out what’s right for Pkl. Also, this is easy to undo depending on feedback. |
Not yet. We have some very lightweight benchmark tests, but we don't have an environment (yet) that can run workloads with bare metal isolation. Until then, the best we can do is run benchmarks locally then compare. |
But what benchmarks do you run? |
There is an in-language stdlib module that we use to write one-off benchmarks when we want to test specific things when iterating locally. There's also the The suggestion of running |
Here's a quick benchmark that might be a good starting point for comparison. Save this to a file somewhere, then amends "pkl:Benchmark"
import "package://pkg.pkl-lang.org/pkl-pantry/org.json_schema.contrib@1.0.5#/internal/ModulesGenerator.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/org.json_schema@1.0.2#/Parser.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/pkl.experimental.uri@1.0.1#/URI.pkl"
local githubActionJson = read("https://json.schemastore.org/github-action.json")
microbenchmarks {
["json schema generator"] {
expression =
let (parsed = Parser.parse(githubActionJson))
new ModulesGenerator {
rootSchema = parsed
baseUri = URI.parse("file:///foo")!!
}
.modules
.map((it) -> it.moduleNode.output.text)
}
} |
.circleci/config.yml
Outdated
@@ -753,6 +753,9 @@ workflows: | |||
- gradle-check-jdk21: | |||
requires: | |||
- hold | |||
- pkl-cli-linux-alpine-amd64-snapshot: | |||
requires: | |||
- hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you change config.pkl
and forgot to commit, or someone else forgot to commit their changes to config.pkl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This commit belongs to a different PR, and I don't know how it ended up here. Removed.
pkl-cli/pkl-cli.gradle.kts
Outdated
// Tested on ZEN 2. | ||
// Feature list: CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT, | ||
// LZCNT, AVX, AVX2, AES, CLMUL, BMI1, BMI2, FMA, AMD_3DNOW_PREFETCH, ADX, FLUSHOPT | ||
, if (buildInfo.arch == "amd64") "-march=skylake" else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Dan that we should be conservative here, even though skylake is almost 9 years old. Specially without any performance data to compare. If the performance bump is noticeable we can create a new architecture release or just drop old CPUs in the new version.
cb539d6
to
561589c
Compare
On my laptop, the variance of this benchmark is too high to draw any conclusions about
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks for running that benchmark for us!
The one remaining request is to undo the change to --add-opens
here.
I’m still working on that. I’m not convinced that --add-opens is required. After all, it isn’t required when running on the JVM. |
Ready to review from my side. See commit messages for details. The "Improve BinaryEvaluatorSnippetTests(Engine)" refactoring isn't essential because I ultimately decided to add |
Look like you're right; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
I think let's keep the language snippet stuff the way it is right now (see comment). But the other changes here look good to me.
// replace line number with equivalent number of 'x' characters to keep formatting intact | ||
(result.groups[1]!!.value) + "x".repeat(result.groups[3]!!.value.length) + " |" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this change improves our code; this moves some fields that are only relevant to the language snippet test engine (e.g. the package server, and also selection
, which is a quality of life thing when working on snippet tests for pkl-core.
I think let's keep it the way it was. Also, we might move the binary evaluator into pkl-core, which means the binary evaluator snippet tests will change then, making this abstraction for naught.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fixes dozens of IntelliJ warnings and eliminates code duplication. PackageServer is defined in commons-test and only started if needed.
I agree that binary evaluator belongs into pkl-core. Ideally, every interaction with the evaluator, even within the same JVM, would use the binary protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just suppress those warnings for now, then; once we move it into pkl-core, this will all change again, and we'd probably end up removing this abstraction layer if it exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactoring makes binary evaluator snippet tests more similar to snippet tests in pkl-core and eliminates code duplication between the two. It also generates output files with suppress warnings comments. I don't understand why you want to discard it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The abstraction might create more work for us later on if we eliminate BinaryEvaluatorSnippetTests
later on if we move the binary evaluator into pkl-core.
Also, this would give us: LanguageSnippetTestsEngine
-> AbstractLanguageSnippetTestsEngine
-> AbstractSnippetTestsEngine
-> InputOutputTestEngine
-> HierarchicalTestEngine
. That's just a lot of complexity that adds cognitive burden when trying to understand our test code.
It also moves properties that either shouldn't be moved, or don't have any meaning for the binary evaluator snippet test (val selection: String = ""
is a property that exists for the purpose of working on the snippet tests in pkl-core).
To reduce the friction, I'd be willing to accept this PR if we moved these members into AbstractLanguageSnippetTestsEngine
:
packageServer
selection
includedTests
excludedTests
Path.getProjectDir()
val firstExisting: Path | ||
get() = existing.first() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val firstExisting: Path | |
get() = existing.first() | |
val firstExisting: Path | |
get() { | |
require(existing.isNotEmpty()) { | |
"Native executable not found on system. Create one with `./gradlew buildNative`." | |
} | |
return existing.first() | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a slightly modified version that's similar to existing code
client.close() | ||
server.destroy() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Great addition!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just hope that nativeTest.dependsOn(":pkl-cli:assembleNative")
is OK from a build cache perspective. (assembleNative
is a lifecycle task that doesn't declare outputs.)
- Solve msgpack issue with `--initialize-at-run-time`. - Use quick build mode for non-release builds: 40% faster compilation, 20% smaller executable. - Remove options that were commented out.
- Extract AbstractSnippetTestsEngine from AbstractLanguageSnippetTestsEngine and reuse it for BinaryEvaluatorSnippetTestEngine. - Rename BinaryEvaluatorSnippetTestEngine to BinaryEvaluatorSnippetTestsEngine. - Rename expected output files to end in ".yml.pkl" to satisfy expectation of AbstractSnippetTestsEngine. - Suppress bogus IntelliJ warnings in expected output files. - Add trailing newline to expected output files.
Motivation: - improve test coverage of server mode - verify that "--initialize-at-run-time=org.msgpack.core.buffer.DirectBufferAccess" works fine and opening JDK internals with "--add-opens" isn't required Changes: - split ServerTest into JvmServerTest and NativeServerTest - extract native executable paths to PklExecutablePaths
This reverts commit b4f4348.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just some minor comments.
.withPathSensitivity(PathSensitivity.RELATIVE) | ||
} | ||
|
||
private fun Test.configureExecutableTest(engineName: String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why testWindowsExecutableAmd64
doesn't call this function, just mac and linux?
} | ||
|
||
private val transports: Pair<MessageTransport, MessageTransport> = run { | ||
if (USE_DIRECT_TRANSPORT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is always false
, why have an if
?
--initialize-at-run-time
.