Jaeger Operator reverting Kafka custom resource to default/initial config #5278
mrocheleau started this conversation in General
Hi there. Hopefully this diagnosis is accurate, but all signs are pointing to it.
We have a Jaeger/Kafka/Strimzi setup in Kubernetes, currently on EKS 1.28. In the past our Kafka brokers ran out of space, so I increased the PVC size by changing `.spec.kafka.storage.volumes.size` in the `kafka` resource from 100Gi to 125Gi. I also added a `log.retention.hours: "96"` config setting to override the default 168-hour retention. The PVCs got resized automatically, the Kafka pods restarted and came up with space available, and the log retention change caused some old logs to be pruned out. Great, job done.
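For reference, this is roughly the shape of the change I made. It's a minimal sketch of the Strimzi `Kafka` CR assuming a single-volume JBOD layout, and the cluster name is just illustrative; our actual resource has more in it.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: jaeger-kafka              # hypothetical cluster name
spec:
  kafka:
    config:
      log.retention.hours: "96"   # added to override the 168-hour default
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 125Gi             # bumped from 100Gi
          deleteClaim: false
```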
Then some time after this I noticed the PVC was getting full again. I checked, and my log retention setting was missing, the `kafka` resource was showing 100Gi for the disk size again, and the `generation` had increased by 1. I can guarantee no one is touching it; it seems to happen in the middle of the night. Any time I adjust it, it gets overwritten with the initial settings the next (early) morning.
In the Jaeger Operator pod logs I see a block right around the time of the revert with (trimmed down for brevity):
It's all within the same second, so I hadn't assumed it was touching anything, but now I'm not so sure. The timing lines up with Kafka pod restarts (which would occur if the `kafka` CR was updated).

We're using Jaeger Operator 1.44.0. Is there any suggestion for how to troubleshoot this? Unfortunately this is a setup I've inherited support for, and otherwise I only have a basic idea of its purpose, though I'm slowly piecing together how it works.

My idea for a band-aid, if I can't find a way to stop the changes from happening, is to find where it's pulling these initial/default values from and update them to the intended values. That way, when it does this each night, it won't actually change anything from how we want it.
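One check that might help confirm whether the Jaeger Operator is the thing reconciling the `kafka` resource: look at the CR's metadata for owner references pointing back at a Jaeger instance. This is only a sketch of what that could look like (the Jaeger instance name here is made up); if something like it is present, the operator owns the CR and will keep overwriting direct edits, so the values would need to be changed wherever the owner gets them from.

```yaml
# Relevant part of: kubectl get kafka <cluster-name> -n <namespace> -o yaml
# (names are illustrative)
metadata:
  ownerReferences:
    - apiVersion: jaegertracing.io/v1
      kind: Jaeger
      name: jaeger-production   # hypothetical Jaeger instance name
      controller: true
```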
The issue with all this is that our logs sometimes grow too large, but also that it's trying to revert the PVC resize, which isn't possible anyway, so it leaves the `kafka` resource in a broken state while it tries to make a pending invalid change.