Allow grafana to use Prometheus #1403

merged 4 commits into master from fix-grafana-prometheus on Jan 8, 2021



@richardTowers richardTowers commented Jan 8, 2021

See also (which needs to be merged first).

This creates a new internal load balancer for Prometheus, and gives Grafana permission to access it. My previous attempt to allow Grafana to access Prometheus through its external load balancer (0ac9a0d) didn't work, so this PR does it "properly" (i.e. consistently with all the other applications).

To deploy this, we will need to apply three separate Terraform projects in each of three environments:

  • integration / infra-security-groups
  • integration / app-prometheus
  • integration / infra-public-services (changes in
  • staging / infra-security-groups
  • staging / app-prometheus
  • staging / infra-public-services
  • production / infra-security-groups
  • production / app-prometheus
  • production / infra-public-services

I love to apply terraform manually 9 times 😍

Terraform Plans in Integration


  + aws_security_group.prometheus_internal_elb
      id:                       <computed>
      arn:                      <computed>
      description:              "Prometheus Internal LB"
      egress.#:                 <computed>
      ingress.#:                <computed>
      name:                     "govuk_prometheus_internal_elb"
      owner_id:                 <computed>
      revoke_rules_on_delete:   "false"
      tags.%:                   "1"
      tags.Name:                "govuk_prometheus_internal_elb"
      vpc_id:                   "vpc-53cd2235"

  - aws_security_group_rule.prometheus-elb_ingress_grafana_https

  + aws_security_group_rule.prometheus-internal-elb_egress_prometheus_http
      id:                       <computed>
      from_port:                "80"
      protocol:                 "tcp"
      security_group_id:        "${}"
      self:                     "false"
      source_security_group_id: "sg-04b7ada0fe6b498c0"
      to_port:                  "80"
      type:                     "egress"

  + aws_security_group_rule.prometheus-internal-elb_ingress_grafana_https
      id:                       <computed>
      from_port:                "443"
      protocol:                 "tcp"
      security_group_id:        "${}"
      self:                     "false"
      source_security_group_id: "sg-e7a2909c"
      to_port:                  "443"
      type:                     "ingress"

  + aws_security_group_rule.prometheus-internal-elb_ingress_prometheus_http
      id:                       <computed>
      from_port:                "80"
      protocol:                 "tcp"
      security_group_id:        "sg-04b7ada0fe6b498c0"
      self:                     "false"
      source_security_group_id: "${}"
      to_port:                  "80"
      type:                     "ingress"

Plan: 4 to add, 0 to change, 1 to destroy.
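
To map the plan above back to configuration, the four additions could come from Terraform roughly like the following. This is a sketch in Terraform 0.11-style syntax, not the PR's actual code: the resource names, ports, protocol and description are taken from the plan, while every `var.*` name is a hypothetical stand-in for a value the plan only shows as a resolved ID or name.

```hcl
# Hypothetical inputs; the plan only shows the resolved values.
variable "stackname" {}          # plan shows the prefix "govuk"
variable "vpc_id" {}             # plan shows "vpc-53cd2235"
variable "grafana_sg_id" {}      # plan shows "sg-e7a2909c"
variable "prometheus_sg_id" {}   # plan shows "sg-04b7ada0fe6b498c0"

# Security group attached to the new internal load balancer.
resource "aws_security_group" "prometheus_internal_elb" {
  name        = "${var.stackname}_prometheus_internal_elb"
  vpc_id      = "${var.vpc_id}"
  description = "Prometheus Internal LB"

  tags = {
    Name = "${var.stackname}_prometheus_internal_elb"
  }
}

# Grafana -> internal load balancer, HTTPS.
resource "aws_security_group_rule" "prometheus-internal-elb_ingress_grafana_https" {
  type      = "ingress"
  from_port = 443
  to_port   = 443
  protocol  = "tcp"

  security_group_id        = "${aws_security_group.prometheus_internal_elb.id}"
  source_security_group_id = "${var.grafana_sg_id}"
}

# Internal load balancer -> Prometheus, HTTP (outbound side, on the LB's group).
resource "aws_security_group_rule" "prometheus-internal-elb_egress_prometheus_http" {
  type      = "egress"
  from_port = 80
  to_port   = 80
  protocol  = "tcp"

  security_group_id        = "${aws_security_group.prometheus_internal_elb.id}"
  source_security_group_id = "${var.prometheus_sg_id}"
}

# Internal load balancer -> Prometheus, HTTP (inbound side, on Prometheus's group).
resource "aws_security_group_rule" "prometheus-internal-elb_ingress_prometheus_http" {
  type      = "ingress"
  from_port = 80
  to_port   = 80
  protocol  = "tcp"

  security_group_id        = "${var.prometheus_sg_id}"
  source_security_group_id = "${aws_security_group.prometheus_internal_elb.id}"
}
```

For an `egress` rule, `source_security_group_id` names the destination group, which is why the Prometheus group appears there in the plan.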


(requires infra-security-groups to be applied before plan can be run)
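
One plausible reason for this ordering is that app-prometheus looks up the new security group's ID from the state written by infra-security-groups, so the lookup fails until that project has been applied. A minimal sketch of such a lookup in Terraform 0.11-style syntax follows; the backend settings, variable names and the output name `sg_prometheus_internal_elb_id` are assumptions for illustration, not taken from this PR.

```hcl
# Hypothetical inputs describing where infra-security-groups keeps its state.
variable "remote_state_bucket" {}
variable "remote_state_infra_security_groups_key" {}
variable "aws_region" {}

# Read the outputs of the already-applied infra-security-groups project.
data "terraform_remote_state" "infra_security_groups" {
  backend = "s3"

  config {
    bucket = "${var.remote_state_bucket}"
    key    = "${var.remote_state_infra_security_groups_key}"
    region = "${var.aws_region}"
  }
}

# Assumed output name; it only exists in the remote state after
# infra-security-groups has been applied with the new security group.
locals {
  prometheus_internal_elb_sg_id = "${data.terraform_remote_state.infra_security_groups.sg_prometheus_internal_elb_id}"
}
```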


Terraform will perform the following actions:

  ~ aws_lambda_function.aws_waf_log_trimmer
      filename:           "/var/lib/jenkins/workspace/Deploy_Terraform_GOVUK_AWS/terraform/projects/infra-public-services/../../lambda/WAFLogTrimmer/" => "/Users/richardtowers/govuk/govuk-aws/terraform/projects/infra-public-services/../../lambda/WAFLogTrimmer/"
      last_modified:      "2020-12-11T14:27:37.333+0000" => <computed>

  + aws_route53_record.prometheus_internal_service_names
      id:                 <computed>
      allow_overwrite:    <computed>
      fqdn:               <computed>
      name:               ""
      records.#:          "1"
      records.3840481468: ""
      ttl:                "300"
      type:               "CNAME"
      zone_id:            "Z15X3KXNVBQPDX"

Plan: 1 to add, 1 to change, 0 to destroy.
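
The Route 53 addition in this plan is a single CNAME with a 300 second TTL. A rough sketch of a resource that would produce it is below, in Terraform 0.11-style syntax. The record name and target are blank in the plan output, so they are left as hypothetical variables rather than guessed, and the real resource may well use `count` over a list of service names, which this sketch omits.

```hcl
# Hypothetical inputs; the plan shows the zone as "Z15X3KXNVBQPDX"
# and leaves the record name and target blank.
variable "internal_zone_id" {}
variable "prometheus_internal_service_name" {}
variable "prometheus_internal_lb_dns_name" {}

# Internal DNS name pointing at the new internal load balancer.
resource "aws_route53_record" "prometheus_internal_service_names" {
  zone_id = "${var.internal_zone_id}"
  name    = "${var.prometheus_internal_service_name}"
  type    = "CNAME"
  ttl     = 300
  records = ["${var.prometheus_internal_lb_dns_name}"]
}
```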

This reverts commit 7087386.

I'd tested this in integration, but there was a snowflaked change in
that environment which meant it worked there, but not in staging / prod.

We'll have to do this by giving Prometheus an internal load balancer.

Prometheus needs an internal load balancer, so internal services such as
Grafana can reach it.

Two new Security Group Rules allow the internal load balancer to make
requests to Prometheus, and allow Grafana to make requests to the load
@richardTowers richardTowers marked this pull request as ready for review January 8, 2021 12:28

@sengi sengi left a comment

LGTM. (Security group rules make sense to me, healthcheck looks good.) Go for it - only affects the experimental Prometheus so pretty limited blast radius.

In case anyone's wondering, the diff about waf-logtrimmer is really a no-op, nothing to worry about (just a spurious diff because of the way we're inadvisedly deploying lambda code using TF).

@richardTowers richardTowers merged commit 8d0a663 into master Jan 8, 2021
@richardTowers richardTowers deleted the fix-grafana-prometheus branch January 8, 2021 17:25

2 participants