Coroot UI fails to see one of postgres instances. #15

Open
Compunctus opened this issue Jan 9, 2023 · 4 comments
Comments

@Compunctus

Hi. Coroot doesn't show one of the replicas in my 3-server Postgres setup as a Postgres node. It also fails to detect replication roles and doesn't recognize other deployments (GitLab, Nextcloud) as Postgres.

image

The relevant metrics are present in Prometheus:
image

image

image

Could you please advise on how to handle this? I get the feeling that Coroot doesn't like low-query situations or my labels.

@def
Member

def commented Jan 10, 2023

Is psqlha-postgresql-ha-postgresql-2 displayed on the Instances tab?
Can you also check Coroot's log for errors?

Replication roles are currently discovered only for clusters managed by the k8s operators for Postgres (Zalando, CrunchyData, Percona).

@Compunctus
Author

Yep, it's displayed on the Instances tab. All three are green on the Instances tab, but now it is psqlha-postgresql-ha-postgresql-1 that is not recognised as a Postgres service.

I do see new errors in the logs, namely:

couldn't find actual instance for "postgres", initial instance is "10.1.18.172:80" (map[])

That IP (10.1.18.172) does belong to the pod psqlha-postgresql-ha-postgresql-1.
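
For anyone triaging the same error, here is a minimal sketch of pulling the application name and address out of that log line so they can be matched against pod IPs. The regular expression is inferred only from the single line quoted above, and the function name `parse_unresolved_instance` is hypothetical; this is not Coroot's own parsing logic.

```python
import re

# The log line quoted above, verbatim.
LOG_LINE = (
    'couldn\'t find actual instance for "postgres", '
    'initial instance is "10.1.18.172:80" (map[])'
)

# Assumption: the message format is stable and matches this one sample.
PATTERN = re.compile(
    r'couldn\'t find actual instance for "(?P<app>[^"]+)", '
    r'initial instance is "(?P<addr>[^"]+)"'
)


def parse_unresolved_instance(line):
    """Return the app name, IP and port from the error line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # Split on the last colon so the port is separated from the IP.
    ip, _, port = match.group("addr").rpartition(":")
    return {"app": match.group("app"), "ip": ip, "port": int(port)}
```

The extracted IP can then be compared with the pod IPs reported by the cluster, and the full address reused in a `listen_addr` label matcher.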

@def
Member

def commented Jan 12, 2023

Please show the output of the following Prometheus queries:

  • container_net_tcp_listen_info{listen_addr="10.1.18.172:80"}
  • container_net_tcp_listen_info{container_id=~".*psqlha-postgresql-ha-postgresql-1.*"}
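
If the Prometheus UI is not handy, the two queries above can also be run against the Prometheus HTTP API (`/api/v1/query`). The following is a rough sketch using only the Python standard library; the base URL `http://localhost:9090` is an assumption (for example, a port-forward to the Prometheus service), and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here, e.g. through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"

# The two expressions requested above, verbatim.
QUERIES = [
    'container_net_tcp_listen_info{listen_addr="10.1.18.172:80"}',
    'container_net_tcp_listen_info{container_id=~".*psqlha-postgresql-ha-postgresql-1.*"}',
]


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def run_instant_query(base_url, expr, timeout=10.0):
    """Run one instant query and return the list of matching series."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %r" % (payload,))
    return payload["data"]["result"]


def main():
    """Print how many series each query returns, plus their labels."""
    for expr in QUERIES:
        series = run_instant_query(PROMETHEUS_URL, expr)
        print("%s: %d series" % (expr, len(series)))
        for item in series:
            print("  ", item["metric"])
```

Calling `main()` prints the number of series per query; an empty result means the selector matched no series at the time of the query.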

@Compunctus
Author

Compunctus commented Jan 12, 2023

Both queries returned empty results. I also noticed that the metric was simply wrong for most IP addresses, showing ports that could never exist in a pod. I restarted the DaemonSet, and now those metrics seem valid. Waiting 10 minutes to let Coroot settle.

No noticeable errors were found in the node-agent logs prior to restarting the DaemonSet.

P.S. psqlha-postgresql-ha-postgresql-1 was deployed on a node which recently had its network connectivity abruptly cut off. The node was then rebooted without restoring networking first. That could help in narrowing it down, I guess...
