fix: source jvm.config from peon.sh for K8s peon containers#19364
Conversation
| * {@code /status/properties} on the peon pod. | ||
| */ | ||
| @Disabled("requires charts.datainfra.io chart, see https://github.com/apache/druid/pull/19047") | ||
| public class KubernetesPeonJvmConfigDockerTest extends BaseKubernetesTaskRunnerDockerTest |
There was a problem hiding this comment.
P2 Regression coverage is permanently disabled
The new regression test that verifies peon.sh sources jvm.config is annotated with @disabled, so the fix has no active automated coverage. Since this bug is specifically in container startup behavior, leaving the only regression test disabled means the same breakage can return without CI noticing. Please either make this test runnable in the existing embedded test setup or add a narrower active test around the script/config generation path.
There was a problem hiding this comment.
Just pushed something to cover this
|
I see a CI failure but it also seems to be failing on |
FrankChen021
left a comment
There was a problem hiding this comment.
Handled the follow-up on the disabled regression test. The newly added PeonShellJvmConfigTest provides active coverage for the peon.sh jvm.config sourcing behavior, so I have no further findings on this thread.
This is an automated review by Codex GPT-5
This is an automated review by Codex GPT-5
Fixes #18791.
Description
distribution/docker/peon.shnever sourced$SERVICE_CONF_DIR/jvm.config. As a result, JVM options declared in ajvm.configConfigMap, which is documented atdocs/development/extensions-core/k8s-jobs.mdunder the Custom Template Pod Adapter are silently ignored by peon containers.Peons started with whatever
$JAVA_OPTSthe container happened to inherit and then go back to container-aware JDK defaults (MaxRAMPercentage=25%,MaxDirectMemorySize=Xmx). This causesOutOfMemoryErrorwith a mysterious cause.The bug affects all three K8s pod adapters equally at the shell level.
peon.shis the entry point for every K8s-launched peon but is most impactful forcustomTemplateAdapterusers.K8sTaskAdapter(used byoverlordSingleContainer/overlordMultiContainer) additionally injects aJAVA_OPTSenv var fromdruid.indexer.runner.javaOptsArray, so its users had a working configuration path even withoutjvm.configsourcing.PodTemplateTaskAdapter#getEnvdoes not injectJAVA_OPTS, leavingcustomTemplateAdapterusers with only the documented-but-brokenjvm.configpath and the workaround of hardcodingJAVA_OPTSinto the pod template'senv:block.Fix: source jvm.config from peon.sh
Mirror the behaviour that
distribution/docker/druid.shalready has at line 188:The
-fguard is the only deviation fromdruid.sh. It preserves existing behaviour for clusters that don't mount ajvm.configat all, avoiding a newcat: No such file or directorystderr line on every peon startup.Options in
JAVA_OPTStake precedence over those injvm.configunder OpenJDK (same caveat asdruid.sh), so adapters that injectJAVA_OPTSviajavaOptsArraycontinue to win duplicates.Docs update
Added a bit of info to
docs/development/extensions-core/k8s-jobs.mdunder the Custom Template Pod Adapter section, making the mechanism explicit:peon.shsourcesjvm.configfrom the directory mounted asnodetype-config-volume, prepends it toJAVA_OPTS, andJAVA_OPTSwins for duplicates. The existing example ConfigMap with-Xmx1024M -XX:MaxDirectMemorySize=1000Mnow behaves as documented.Test added
Added
KubernetesPeonJvmConfigDockerTestunderembedded-tests/.../k8s/. It uses a new operator manifest variant that injects-Ddruid.test.peon.jvmconfig.marker=trueinto the cluster-leveljvm.options(written by the Druid operator into each node'sjvm.config), submits a long-runningNoopTask, port-forwards to the resulting peon pod via the fabric8 client, and asserts the marker appears in the peon JVM's/status/properties. The test is disabled though, like the existingKubernetesTaskRunnerDirectModeDockerTest/KubernetesTaskRunnerCachingModeDockerTestpending resolution of thecharts.datainfra.ioavailability issue (#19047).Small harness refactor to support this:
BaseKubernetesTaskRunnerDockerTestnow exposesgetManifestTemplate()as an overridable hook and promotes the localk3sClustervariable to a protected field so subclasses can reach into the cluster.K3sClusterResourcegains a publicgetKubernetesClient()getter returning the fabric8 client.Release note
Fixed: JVM options declared in a
jvm.configConfigMap are now honoured by peons launched via the Kubernetes overlord extension (including when using thecustomTemplateAdapter). Previously,peon.shsilently ignoredjvm.config, causing the peon JVM to fall back to container-aware JDK defaults.Key changed/added files in this PR
distribution/docker/peon.shdocs/development/extensions-core/k8s-jobs.mdembedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/KubernetesPeonJvmConfigDockerTest.java(new)embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/BaseKubernetesTaskRunnerDockerTest.javaembedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/K3sClusterResource.javaembedded-tests/src/test/resources/manifests/druid-service-with-operator-peonjvmconfig.yaml(new)This PR has: