Datadog CA cert not available #245

Closed
slapcat opened this issue Apr 26, 2024 · 4 comments


slapcat commented Apr 26, 2024

Bug Description

When using the datadog receiver, it returns an error about unrecognized certificate authority:

2024-04-26T08:06:37.170Z [alertmanager] ts=2024-04-26T08:06:37.170Z caller=notify.go:732 level=warn component=dispatcher receiver=datadog integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://app.datadoghq.eu/intake/webhook/prometheus?api_key=<API_KEY>\": x509: certificate signed by unknown authority"

To Reproduce

  1. juju deploy cos-lite --trust
  2. Create config file:
cat > /home/ubuntu/alertmanager.yml <<EOF
receivers:
- name: datadog
  webhook_configs:
  - send_resolved: true
    url: https://app.datadoghq.eu/intake/webhook/prometheus?api_key=<API_KEY>
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  receiver: datadog
  repeat_interval: 3h
EOF
  3. juju config alertmanager config_file="@/home/ubuntu/alertmanager.yml"
  4. kubectl logs -n cos alertmanager-0 -c alertmanager
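
To make the last step easier to check, here is a minimal sketch of a filter that counts how many log lines carry the unknown-CA error. The function name `count_unknown_ca_failures` is hypothetical; the matched string is copied from the log line in this report.

```shell
# count_unknown_ca_failures (hypothetical helper): read alertmanager log lines
# on stdin and print how many contain the unknown-CA x509 error.
count_unknown_ca_failures() {
  # grep -c prints the count but exits non-zero when the count is 0,
  # so the exit status is masked to keep the function usable under `set -e`.
  grep -c 'x509: certificate signed by unknown authority' || true
}

# Usage, assuming the same namespace and pod names as in the steps above:
#   kubectl logs -n cos alertmanager-0 -c alertmanager | count_unknown_ca_failures
```

A non-zero count after step 4 would indicate the notification attempts are still failing certificate verification.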

Environment

Model  Controller           Cloud/Region                      Version  SLA          Timestamp
cos    microk8s-controller  snapped-microk8s_cloud/localhost  3.3.4    unsupported  12:26:11Z

App                              Version  Status  Scale  Charm                         Channel        Rev  Address         Exposed  Message
alertmanager                     0.25.0   active      1  alertmanager-k8s              latest/stable   96  10.152.183.148  no
ca                                        active      1  self-signed-certificates      latest/edge     80  10.152.183.230  no
catalogue                                 active      1  catalogue-k8s                 latest/edge     34  10.152.183.20   no
external-ca                               active      1  self-signed-certificates      latest/edge     80  10.152.183.74   no
grafana                          9.5.3    active      1  grafana-k8s                   latest/stable  105  10.152.183.226  no
loki                             2.9.4    active      1  loki-k8s                      latest/edge    123  10.152.183.232  no
prometheus                       2.49.1   active      1  prometheus-k8s                latest/stable  170  10.152.183.37   no
scrape-interval-config-metrics   n/a      active      1  prometheus-scrape-config-k8s  latest/stable   44  10.152.183.251  no
scrape-interval-config-monitors  n/a      active      1  prometheus-scrape-config-k8s  latest/stable   44  10.152.183.146  no
traefik                          2.10.5   active      1  traefik-k8s                   latest/stable  170  10.4.26.228     no

Unit                                Workload  Agent      Address      Ports  Message
alertmanager/0*                     active    idle       10.1.35.180
ca/0*                               active    idle       10.1.151.77
catalogue/0*                        active    idle       10.1.35.179
external-ca/0*                      active    idle       10.1.151.80
grafana/0*                          active    idle       10.1.35.183
loki/0*                             active    idle       10.1.151.82
prometheus/0*                       active    executing  10.1.151.84
scrape-interval-config-metrics/0*   active    idle       10.1.35.181
scrape-interval-config-monitors/0*  active    idle       10.1.35.182
traefik/0*                          active    idle       10.1.35.184

Offer                            Application                      Charm                         Rev  Connected  Endpoint                  Interface                Role
alertmanager                     alertmanager                     alertmanager-k8s              96   0/0        karma-dashboard           karma_dashboard          provider
grafana                          grafana                          grafana-k8s                   105  6/6        grafana-dashboard         grafana_dashboard        requirer
loki                             loki                             loki-k8s                      123  5/5        logging                   loki_push_api            provider
prometheus                       prometheus                       prometheus-k8s                170  6/6        metrics-endpoint          prometheus_scrape        requirer
                                                                                                                receive-remote-write      prometheus_remote_write  provider
scrape-interval-config-metrics   scrape-interval-config-metrics   prometheus-scrape-config-k8s  44   1/1        configurable-scrape-jobs  prometheus_scrape        requirer
scrape-interval-config-monitors  scrape-interval-config-monitors  prometheus-scrape-config-k8s  44   1/1        configurable-scrape-jobs  prometheus_scrape        requirer

Relevant log output

2024-04-26T08:06:37.170Z [alertmanager] ts=2024-04-26T08:06:37.170Z caller=notify.go:732 level=warn component=dispatcher receiver=datadog integration=webhook[0] msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://app.datadoghq.eu/intake/webhook/prometheus?api_key=<API_KEY>\": x509: certificate signed by unknown authority"

Additional context

No response

Contributor

sed-i commented Apr 26, 2024

At first glance this is odd, because the alertmanager rock has root certs.

After installing curl in the alertmanager workload container, curl https://charmhub.io (an HTTPS endpoint) works fine without --insecure.

Both of the following also pass verification from within the workload container:

echo | openssl s_client -strict -verify_return_error -connect charmhub.io:443 || echo "failed"
echo | openssl s_client -strict -verify_return_error -connect app.datadoghq.eu:443 || echo "failed"

According to user accounts (1, 2), alertmanager should be able to talk over TLS.

@slapcat, would you be able to confirm that:

  1. The image in use indeed has certs in place?
$ juju ssh --container alertmanager am/0 ls -1 /etc/ssl/certs/ | wc -l
275
  2. Cert validation works from within the workload container?
$ juju ssh --container alertmanager am/0 bash -c "echo | openssl s_client -strict -verify_return_error -connect app.datadoghq.eu:443" | grep -i verif
verify return:1
verify return:1
verify return:1
Verification: OK
Verify return code: 0 (ok)
  3. Which revision of alertmanager is in use? juju status --format=json | jq '.applications.am."charm-rev"'
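
For check 2, a small sketch of turning the openssl output into a pass/fail result rather than reading the grep output by eye. The helper name `verify_ok` is hypothetical; it only looks for the `Verify return code: 0 (ok)` line shown in the sample output above.

```shell
# verify_ok (hypothetical helper): read `openssl s_client` output on stdin and
# succeed only if it reports a verify return code of 0.
verify_ok() {
  grep -q 'Verify return code: 0 (ok)'
}

# Usage, assuming the same application name (am) as in the commands above:
#   juju ssh --container alertmanager am/0 bash -c \
#     "echo | openssl s_client -strict -verify_return_error -connect app.datadoghq.eu:443" \
#     | verify_ok && echo "chain verified" || echo "verification failed"
```

This only confirms what openssl sees inside the container; alertmanager itself may still use a different trust store, which is why the revision in check 3 matters.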


sed-i commented Apr 26, 2024

OK, from your env I see alertmanager 0.25, charm-rev 96.

@slapcat would you be able to try with a newer revision? The current stable is rev106 and should include the certs fix.
@lucabello will soon start the charm promotion train so there should be an even newer stable soon.
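
A rough sketch of the revision check implied here, assuming rev106 is the first stable revision carrying the certs fix (that threshold comes only from this comment). `MIN_FIXED_REV` and `needs_refresh` are hypothetical names.

```shell
# Assumption from this thread: rev106 is the first stable revision expected
# to include the certs fix.
MIN_FIXED_REV=106

# needs_refresh REV (hypothetical helper): succeed when REV is older than
# MIN_FIXED_REV.
needs_refresh() {
  [ "$1" -lt "$MIN_FIXED_REV" ]
}

rev=96  # the charm revision reported in this issue
if needs_refresh "$rev"; then
  echo "rev $rev predates rev $MIN_FIXED_REV; consider: juju refresh alertmanager"
fi
```

The `juju refresh alertmanager` hint assumes the application is deployed under the name `alertmanager`, as in the status output above, and tracks a channel where a newer revision is published.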


sed-i commented May 1, 2024

Closing for now. Feel free to reopen if this shows up in rev106 or newer!

sed-i closed this as completed May 1, 2024

slapcat commented May 1, 2024

That fixed it, thanks!
