
Out of order sample from remote write #564

Closed
nishant-dash opened this issue Jan 24, 2024 · 3 comments

Comments

@nishant-dash

Bug Description

On 1 out of the 3 kubernetes-control-plane units, the grafana-agent is unresponsive and shows up as a down target in Prometheus.

Logs from both Prometheus and the grafana-agent for this one problematic unit show signs of out-of-order sampling.

A cursory glance at the logs suggests that the timestamps are okay and there is nothing wrong with NTP, which points to duplicate labels as the most likely source of the issue.
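
(Not part of the original report, just a possible check.) If duplicate label sets are the suspicion, one quick way to test it is to list Prometheus's active scrape targets over its HTTP API and look for targets whose post-relabelling label sets collide; samples from such targets end up in the same series and can be rejected as out of order. A minimal sketch, assuming the Prometheus API is reachable (the URL below is a placeholder):

```python
#!/usr/bin/env python3
"""Rough sketch: look for scrape targets that end up with identical label sets,
which would make their samples collide into a single series in Prometheus.
The Prometheus URL below is a placeholder; adjust it for your deployment."""
import json
from collections import defaultdict
from urllib.request import urlopen

PROM_URL = "http://<prometheus-host>/cos-prometheus-0"  # placeholder

with urlopen(f"{PROM_URL}/api/v1/targets") as resp:
    targets = json.load(resp)["data"]["activeTargets"]

seen = defaultdict(list)
for t in targets:
    # `labels` is the post-relabelling label set attached to every scraped sample
    key = tuple(sorted(t["labels"].items()))
    seen[key].append(t["scrapeUrl"])

for labels, urls in seen.items():
    if len(urls) > 1:
        print("duplicate label set:", dict(labels))
        for u in urls:
            print("  scraped from:", u)
```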

Some related topics:

To Reproduce

N/A

Environment

alertmanager       0.25.0   active      1  alertmanager-k8s       stable          96       
blackbox-exporter  0.24.0   active      1  blackbox-exporter-k8s  latest/edge      7       
catalogue                   active      1  catalogue-k8s          stable          33       
cos-configuration  3.5.0    active      1  cos-configuration-k8s  latest/stable   42       
grafana            9.2.1    active      1  grafana-k8s            stable          93       
loki               2.7.4    active      1  loki-k8s               stable         105       
prometheus         2.47.2   active      1  prometheus-k8s         stable         159       
traefik            2.10.4   active      1  traefik-k8s            stable         166 
kubernetes-control-plane            1.28.5        active      3  kubernetes-control-plane  1.29/beta    377  no       

kubernetes-control-plane:cos-agent                 grafana-agent-control:cos-agent                   cos_agent                      subordinate  

Relevant log output

nation=\"org.freedesktop.systemd1\" (uid=0 pid=1 comm=\"/lib/systemd/systemd --system --deserialize 52 \" label=\"unconfined\")"
Jan 23 13:04:03 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:04:03.978815932Z caller=collector.go:169 level=error integration=node_exporter msg="collector failed" name=processes duration_seconds=0.051447686 err="unable to retrieve limit number of threads: open /proc/sys/kernel/threads-max: permission denied"
Jan 23 13:04:20 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:04:20.068317073Z caller=dedupe.go:112 agent=prometheus instance=6f90f69dff6964c270f367fabe08620c component=remote level=error remote_name=6f90f6-1e0ab3 url=http://<REDACTED>/cos-prometheus-0/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of bounds"
Jan 23 13:05:03 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:05:03.931641241Z caller=collector.go:169 level=error integration=node_exporter msg="collector failed" name=logind duration_seconds=0.003326897 err="unable to get seats: An AppArmor policy prevents this sender from sending this message to this recipient; type=\"method_call\", sender=\":1.23430\" (uid=0 pid=342616 comm=\"/snap/grafana-agent/16/agent -config.expand-env -c\" label=\"snap.grafana-agent.grafana-agent (enforce)\") interface=\"org.freedesktop.login1.Manager\" member=\"ListSeats\" error name=\"(unset)\" requested_reply=\"0\" destination=\"org.freedesktop.login1\" (uid=0 pid=1099 comm=\"/lib/systemd/systemd-logind \" label=\"unconfined\")"
Jan 23 13:05:03 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:05:03.933868492Z caller=collector.go:169 level=error integration=node_exporter msg="collector failed" name=systemd duration_seconds=0.005736097 err="couldn't get units: An AppArmor policy prevents this sender from sending this message to this recipient; type=\"method_call\", sender=\":1.23429\" (uid=0 pid=342616 comm=\"/snap/grafana-agent/16/agent -config.expand-env -c\" label=\"snap.grafana-agent.grafana-agent (enforce)\") interface=\"org.freedesktop.systemd1.Manager\" member=\"ListUnits\" error name=\"(unset)\" requested_reply=\"0\" destination=\"org.freedesktop.systemd1\" (uid=0 pid=1 comm=\"/lib/systemd/systemd --system --deserialize 52 \" label=\"unconfined\")"
Jan 23 13:05:03 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:05:03.978810771Z caller=collector.go:169 level=error integration=node_exporter msg="collector failed" name=processes duration_seconds=0.052494298 err="unable to retrieve limit number of threads: open /proc/sys/kernel/threads-max: permission denied"
Jan 23 13:06:03 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:06:03.929074718Z caller=collector.go:169 level=error integration=node_exporter msg="collector failed" name=logind duration_seconds=0.003438216 err="unable to get seats: An AppArmor policy prevents this sender from sending this message to this recipient; type=\"method_call\", sender=\":1.23445\" (uid=0 pid=342616 comm=\"/snap/grafana-agent/16/agent -config.expand-env -c\" label=\"snap.grafana-agent.grafana-agent (enforce)\") interface=\"org.freedesktop.login1.Manager\" member=\"ListSeats\" error name=\"(unset)\" requested_reply=\"0\" destination=\"org.freedesktop.login1\" (uid=0 pid=1099 comm=\"/lib/systemd/systemd-logind \" label=\"unconfined\")"
Jan 23 13:06:03 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:06:03.932572398Z caller=collector.go:169 level=error integration=node_exporter msg="collector failed" name=systemd duration_seconds=0.006726803 err="couldn't get units: An AppArmor policy prevents this sender from sending this message to this recipient; type=\"method_call\", sender=\":1.23446\" (uid=0 pid=342616 comm=\"/snap/grafana-agent/16/agent -config.expand-env -c\" label=\"snap.grafana-agent.grafana-agent (enforce)\") interface=\"org.freedesktop.systemd1.Manager\" member=\"ListUnits\" error name=\"(unset)\" requested_reply=\"0\" destination=\"org.freedesktop.systemd1\" (uid=0 pid=1 comm=\"/lib/systemd/systemd --system --deserialize 52 \" label=\"unconfined\")"
Jan 23 13:06:03 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:06:03.977970293Z caller=collector.go:169 level=error integration=node_exporter msg="collector failed" name=processes duration_seconds=0.050143604 err="unable to retrieve limit number of threads: open /proc/sys/kernel/threads-max: permission denied"
Jan 23 13:06:20 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:06:20.069994249Z caller=dedupe.go:112 agent=prometheus instance=6f90f69dff6964c270f367fabe08620c component=remote level=error remote_name=6f90f6-1e0ab3 url=http://<REDACTED>/cos-prometheus-0/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of bounds"
...
2024-01-23T13:08:32.032Z [prometheus] ts=2024-01-23T13:08:32.032Z caller=write_handler.go:132 level=error component=web msg="Out of order sample from remote write" err="out of bounds" series="{__name__=\"container_start_time_seconds\", cluster=\"kubernetes-klstkonreqwlxbxqmivt28fj4w0gthlo\", container=\"calico-node\", instance=\"k8s-shared_6f6605ff-d721-47e5-8fb2-e21941fdad14_kubernetes-worker_kubernetes-worker/2\", job=\"kubelet\", juju_application=\"kubernetes-worker\", juju_model=\"k8s-shared\", juju_model_uuid=\"6f6605ff-d721-47e5-8fb2-e21941fdad14\", juju_unit=\"kubernetes-worker/2\", metrics_path=\"/metrics/resource\", namespace=\"kube-system\", node=\"juju-fdad14-5\", pod=\"calico-node-g2vns\"}" timestamp=1705939768833

And from the grafana agent:
Jan 23 13:06:20 juju-fdad14-0 grafana-agent.grafana-agent[342616]: ts=2024-01-23T13:06:20.069994249Z caller=dedupe.go:112 agent=prometheus instance=6f90f69dff6964c270f367fabe08620c component=remote level=error remote_name=6f90f6-1e0ab3 url=http://172.25.106.11/cos-prometheus-0/api/v1/write msg="non-recoverable error" count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: out of bounds"
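
(An aside, not part of the original report.) The err="out of bounds" response generally means the sample's timestamp falls outside the window the Prometheus TSDB head will currently accept, as opposed to colliding with an existing sample of the same series. A quick sanity check is to decode the timestamp= field of the rejected series (epoch milliseconds) and compare it with the ts= of the rejection itself; a minimal sketch using the values from the write_handler log line above:

```python
#!/usr/bin/env python3
"""Decode the `timestamp=` value from the "Out of order sample from remote write"
log line (epoch milliseconds) and compare it with the time Prometheus logged the
rejection. A large gap would point at stale/buffered samples rather than duplicates."""
from datetime import datetime, timezone

rejected_ms = 1705939768833  # timestamp= from the write_handler.go log line above
logged_at = datetime(2024, 1, 23, 13, 8, 32, tzinfo=timezone.utc)  # ts= of the same line

rejected_at = datetime.fromtimestamp(rejected_ms / 1000, tz=timezone.utc)
print("sample timestamp :", rejected_at.isoformat())
print("rejection logged :", logged_at.isoformat())
print("sample is behind by", logged_at - rejected_at)
```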

Additional context

No response

@simskij
Member

simskij commented Jan 29, 2024

@dstathis
Contributor

Can you give us steps to reproduce this? We don't understand how the charms are related.

@IbraAoad
Contributor

IbraAoad commented Mar 7, 2024

We can't reproduce this, so we'll close the issue for now. If this appears again, please re-open the issue with reproduction steps.

IbraAoad closed this as completed Mar 7, 2024