-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent db: make rejecting ooo samples configurable #14094
base: main
Are you sure you want to change the base?
agent db: make rejecting ooo samples configurable #14094
Conversation
2f991c2
to
323b88f
Compare
Maybe I'm missing something, but isn't it already possible to disable OOO ingestion with the configuration file? From my understanding if |
Maybe I miss something, but I don't see how this applies to the agent db, which, as I understand, is just the TSDB WAL. For context, the use-case here is to avoid remote writing ooo samples (e.g. when the sink would reject them anyhow). |
When agents ingest out-of-order samples and and remote write them to external stores, it can be extremely expensive on the other end. We run a setup with prometheus agents writing to thanos receivers, and out of order samples generate severe amounts of load. |
Yeah, I'm reading the code now and it looks like the agent indeed doesn't take this into consideration. I think that creating yet another configuration option for out of order would be confusing though. What do you think about re-using the already existing |
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
26aca44
to
157ec2e
Compare
I think it makes sense and I changed it and added some test cases for edge cases. Is this what you had in mind? It might be confusing now that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It LGTM, just a few nitpicks
I'll try to ping maintainers to take a look at this as well
I'm also realizing that changing the default behavior now (always accept OOO -> have to configure window to accept) might be a breaking change. Since agent-mode is a feature flag and OOO is marked as experimental in our docs, I think the change should be fine, but let's see what others think |
Good point 😬, I don't remember if the configuration documentation is auto-generated... I'm 90% sure it's not. So it should be fine to just add a statement here clarifying that it is also applied for agent-mode? |
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
I addressed the nits and thx! |
cc @rfratto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I think rejecting OOO doing remote write is a fair choice regardless or whether or not the backend can ingest it. But its even a more valid choice given today's context where OOO is still not enabled by default.
Left a small nit that I'd like to see fixed for consistency 👍
Nice work 💪
tsdb/agent/db.go
Outdated
|
||
// mintTs returns the minimum timestamp that a sample can have | ||
// and is needed for preventing underflow. | ||
func (a *appender) minTs(lastTs int64) int64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rename this method to minValidTime
or appendable()
to make it consistent with oither parts of the codebase
prometheus/tsdb/head_append.go
Lines 453 to 455 in d699dc3
func (s *memSeries) appendable(t int64, v float64, headMaxt, minValidTime, oooTimeWindow int64) (isOOO bool, oooDelta int64, err error) { | |
// Check if we can append in the in-order chunk. | |
if t >= minValidTime { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, renamed to minValidTime
.
Holding off merge for a @rfratto review since he maintains the agent 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're reusing an existing Prometheus config field, should we update docs for out_of_order_time_window
around https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tsdb to mention this dual use-case?
I added it to prometheus/docs/configuration/configuration.md Lines 3805 to 3807 in 017b43e
WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the next release process starts next week, I'd merge this even without @rfratto's approval 😬.
@rabenhorst, it seems like you forgot to sign your lastest commit. Could you fix that please? 🙂
Add
storage.agent.reject-out-of-order-samples
flag and make rejecting out-of-order samples in agent's db configurable. This allows disabling remote writing out-of-order samples from agent, which was introduced in #12897.