This repository has been archived by the owner on Jan 11, 2023. It is now read-only.
Addresses Issue #27 #32
Merged
22 commits
0115c8c
New recipe
032b43b
image
9d24de3
New recipe
8e6f672
adding arch diagram
393376c
Create amp-alertmanager-terraform.md
awsvikram 7135029
Update amp-alertmanager-terraform.md
awsvikram d111cca
Merge branch 'aws-observability:main' into main
awsvikram 2773b58
Issue 27
fdbd9af
issue 27
2870a65
Merge branch 'main' of https://github.com/saaish/aws-o11y-recipes
324c357
Update amp-alertmanager-terraform.md
awsvikram 15af3a2
Update amp-alertmanager-terraform.md
awsvikram 048c658
Update amp-alertmanager-terraform.md
awsvikram ddfc939
Create main.tf
awsvikram 23bb437
Update amp-alertmanager-terraform.md
awsvikram c153785
Update amp.md
awsvikram da73916
Update amp-alertmanager-terraform.md
awsvikram 9bdef82
Update amp-alertmanager-terraform.md
awsvikram 5252b06
Update amp-alertmanager-terraform.md
awsvikram daeeb29
Update amp-alertmanager-terraform.md
awsvikram cb64e16
Update main.tf
awsvikram 64aeb8d
Update amp-alertmanager-terraform.md
awsvikram
@@ -0,0 +1,127 @@
# Using Terraform as Infrastructure as Code to deploy Amazon Managed Service for Prometheus and configure the alert manager

In this recipe, we demonstrate how to use [Terraform](https://www.terraform.io/) to provision [Amazon Managed Service for Prometheus](https://aws.amazon.com/prometheus/) and to configure rules management and the alert manager so that a notification is sent to an [SNS](https://docs.aws.amazon.com/sns/) topic when a certain condition is met.

!!! note
    This guide will take approximately 30 minutes to complete.

## Prerequisites

You will need the following to complete the setup:

* [Amazon EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html)
* [AWS CLI version 2](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
* [Terraform CLI](https://www.terraform.io/downloads)
* [AWS Distro for OpenTelemetry (ADOT)](https://aws-otel.github.io/)
* [eksctl](https://eksctl.io/)
* [kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)
* [jq](https://stedolan.github.io/jq/download/)
* [helm](https://helm.sh/)
* [SNS topic](https://docs.aws.amazon.com/sns/latest/dg/sns-create-topic.html)
* [awscurl](https://github.com/okigan/awscurl)

In this recipe, we use a sample application to demonstrate how ADOT scrapes metrics and remote-writes them to the Amazon Managed Service for Prometheus workspace. Fork and clone the sample app from the [aws-otel-community](https://github.com/aws-observability/aws-otel-community) repository.

The Prometheus sample app generates all four Prometheus metric types (counter, gauge, histogram, summary) and exposes them at the `/metrics` endpoint.

A health check endpoint also exists at `/`.

The sample app accepts the following optional command line flags:

Review comment: Turn into a proper list, turn flags into code (for example

* `listen_address` (default: `0.0.0.0:8080`): the address and port on which the sample app is exposed. This is primarily to conform with the test framework requirements.
* `metric_count` (default: `1`): the number of metrics of each type to generate. The same number of metrics is always generated per metric type.
* `label_count` (default: `1`): the number of labels to generate per metric.
* `datapoint_count` (default: `1`): the number of data points to generate per metric.

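One way to confirm which metric types the app exposes is to summarize the `# TYPE` comments in the text returned by the `/metrics` endpoint. The helper below is a minimal sketch and not part of the sample app: the function name `count_metric_types` is hypothetical, and the usage line assumes the app is running locally and reachable at `localhost:8080`, which matches the default `listen_address`.

```shell
# Summarize a Prometheus text exposition read from stdin.
# Prints one "<type> <count>" line per metric type found in "# TYPE" comments.
count_metric_types() {
  awk '$1 == "#" && $2 == "TYPE" { n[$4]++ } END { for (t in n) print t, n[t] }' | sort
}

# Usage (assumption: the sample app is reachable on localhost:8080):
#   curl -s http://localhost:8080/metrics | count_metric_types
```

Each line of output names one metric type and how many metrics of that type were declared, so a missing type is easy to spot.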
### Enabling metric collection using AWS Distro for OpenTelemetry

1. Fork and clone the sample app from the [aws-otel-community](https://github.com/aws-observability/aws-otel-community) repository, then build the container image with the following commands.

```
cd ./sample-apps/prometheus
docker build . -t prometheus-sample-app:latest
```

2. Push this image to a registry such as Amazon ECR. The following command creates a new ECR repository in your account. Replace `<YOUR_REGION>` with your Region.

```
aws ecr create-repository \
    --repository-name prometheus-sample-app \
    --image-scanning-configuration scanOnPush=true \
    --region <YOUR_REGION>
```

3. Download the Kubernetes configuration for the sample app, replace `PUBLIC_SAMPLE_APP_IMAGE` in `prometheus-sample-app.yaml` with the image that you just pushed, and then apply the configuration to deploy the sample app in the cluster.

```
curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/examples/eks/aws-prometheus/prometheus-sample-app.yaml -o prometheus-sample-app.yaml
# Replace PUBLIC_SAMPLE_APP_IMAGE in prometheus-sample-app.yaml before applying it.
kubectl apply -f prometheus-sample-app.yaml
```

4. Start a default instance of the ADOT Collector. First, download the Kubernetes configuration for the ADOT Collector.

```
curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/examples/eks/aws-prometheus/prometheus-daemonset.yaml -o prometheus-daemonset.yaml
```

Then edit the downloaded file: replace `YOUR_ENDPOINT` with the remote_write endpoint of your Amazon Managed Service for Prometheus workspace and `YOUR_REGION` with your Region.
The remote_write endpoint is displayed in the Amazon Managed Service for Prometheus console in the workspace details. If the workspace does not exist yet, provision it first as described in the Terraform section of this recipe.
You also need to replace `YOUR_ACCOUNT_ID` in the service account section of the Kubernetes configuration with your AWS account ID.

In this recipe, the ADOT Collector configuration uses an annotation (`scrape=true`) to select which target endpoints to scrape. This allows the ADOT Collector to distinguish the sample app endpoint from the kube-system endpoints in your cluster. You can remove this annotation check from the relabel configurations if you want to scrape a different sample app.

5. Enter the following command to deploy the ADOT Collector.

```
kubectl apply -f prometheus-daemonset.yaml
```
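
The manual edits in the steps above can also be scripted. The following is a minimal sketch, assuming the downloaded manifests contain the placeholders `YOUR_ENDPOINT`, `YOUR_REGION`, and `YOUR_ACCOUNT_ID` exactly as written and that your account is in the standard `aws` partition. The function names, the `REMOTE_WRITE_URL` variable, and the example account ID are hypothetical.

```shell
# Fill the placeholders in the downloaded ADOT Collector manifest.
# Arguments: <template file> <remote_write endpoint> <region> <account id>
# Prints the rendered manifest on stdout; the template file is left untouched.
# "|" is used as the sed delimiter because the endpoint URL contains "/".
render_collector_manifest() {
  sed -e "s|YOUR_ENDPOINT|$2|g" \
      -e "s|YOUR_REGION|$3|g" \
      -e "s|YOUR_ACCOUNT_ID|$4|g" "$1"
}

# Compose the ECR image URI that replaces PUBLIC_SAMPLE_APP_IMAGE.
# Arguments: <account id> <region> [repository] [tag]
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' \
    "$1" "$2" "${3:-prometheus-sample-app}" "${4:-latest}"
}

# Usage (example values are placeholders, not real resources):
#   IMAGE=$(ecr_image_uri 123456789012 us-east-1)
#   aws ecr get-login-password --region us-east-1 |
#     docker login --username AWS --password-stdin "${IMAGE%%/*}"
#   docker tag prometheus-sample-app:latest "$IMAGE" && docker push "$IMAGE"
#   render_collector_manifest prometheus-daemonset.yaml "$REMOTE_WRITE_URL" \
#     us-east-1 123456789012 > rendered-daemonset.yaml
```

Writing the rendered manifest to a separate file keeps the downloaded template reusable. If you take this route, apply the rendered file instead of the template.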

### Configure workspace with Terraform

Now we provision an Amazon Managed Service for Prometheus workspace and define an alerting rule that causes the alert manager to send a notification if a certain condition (defined in `expr`) holds true for a specified time period (`for`). Code in the Terraform language is stored in plain text files with the `.tf` file extension. There is also a JSON-based variant of the language that uses the `.tf.json` file extension.

We use [main.tf](./amp-alertmanager-terraform/main.tf) to deploy the resources with Terraform. Before running the Terraform commands, export the `region` and `sns_topic` variables.

```
export TF_VAR_region=<your region>
export TF_VAR_sns_topic=<ARN of the SNS topic used by the SNS receiver>
```
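
Terraform reads any environment variable named `TF_VAR_<name>` as the value of the input variable `<name>`, which is why the two exports above are sufficient. A small pre-flight check, sketched below, can catch a missing or malformed value before `terraform plan` runs. The function name `check_tf_vars` is hypothetical, and the ARN test is only a loose shape check, not a guarantee that the topic exists.

```shell
# Return 0 when both Terraform input variables are exported and the topic ARN
# has the general shape arn:<partition>:sns:<region>:<account>:<name>.
check_tf_vars() {
  if [ -z "${TF_VAR_region:-}" ]; then
    echo "TF_VAR_region is not set" >&2
    return 1
  fi
  case "${TF_VAR_sns_topic:-}" in
    arn:aws*:sns:*:*:?*) ;;  # looks like an SNS topic ARN
    *)
      echo "TF_VAR_sns_topic does not look like an SNS topic ARN" >&2
      return 1
      ;;
  esac
}

# Usage:
#   check_tf_vars && terraform plan
```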

Next, run the following commands to provision the workspace:

```
terraform init
terraform plan
terraform apply
```

Once the above steps are complete, verify the setup end-to-end by querying the endpoint with awscurl. Replace `WORKSPACE_ID` with the ID of your Amazon Managed Service for Prometheus workspace, and replace `us-east-1` in the URL if your workspace is in a different Region.

Run the following command and look for the metric `metric:recording_rule` in the output. If it is present, the recording rule was created successfully:

```
awscurl https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/rules --service="aps"
```

Sample output:

```
{"status":"success","data":{"groups":[{"name":"alert-test","file":"rules","rules":[{"state":"firing","name":"metric:alerting_rule","query":"rate(adot_test_counter0[5m]) \u003e 5","duration":0,"labels":{},"annotations":{},"alerts":[{"labels":{"alertname":"metric:alerting_rule"},"annotations":{},"state":"firing","activeAt":"2021-09-16T13:20:35.9664022Z","value":"6.96890019778219e+01"}],"health":"ok","lastError":"","type":"alerting","lastEvaluation":"2021-09-16T18:41:35.967122005Z","evaluationTime":0.018121408}],"interval":60,"lastEvaluation":"2021-09-16T18:41:35.967104769Z","evaluationTime":0.018142997},{"name":"test","file":"rules","rules":[{"name":"metric:recording_rule","query":"rate(adot_test_counter0[5m])","labels":{},"health":"ok","lastError":"","type":"recording","lastEvaluation":"2021-09-16T18:40:44.650001548Z","evaluationTime":0.018381387}],"interval":60,"lastEvaluation":"2021-09-16T18:40:44.649986468Z","evaluationTime":0.018400463}]},"errorType":"","error":""}
```
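
The check above can be scripted rather than read by eye. The sketch below builds the rules URL for any Region, following the URL pattern shown above, and tests whether a named rule appears in the response. The function names are hypothetical, `has_rule` assumes the compact JSON formatting seen in the sample output (no space after the colon), and `lookup_workspace_id` assumes working AWS credentials and the alias `amp-terraform-ws` from `main.tf`.

```shell
# Build the rules API URL for a workspace.
# Arguments: <region> <workspace id>
amp_rules_url() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/rules\n' "$1" "$2"
}

# Succeed when the JSON on stdin contains a rule with the given name.
# Argument: <rule name>
has_rule() {
  grep -F -q "\"name\":\"$1\""
}

# Look up the workspace ID by alias with the AWS CLI (needs AWS credentials).
# Argument: <alias>
lookup_workspace_id() {
  aws amp list-workspaces --alias "$1" \
    --query 'workspaces[0].workspaceId' --output text
}

# Usage:
#   WORKSPACE_ID=$(lookup_workspace_id amp-terraform-ws)
#   awscurl "$(amp_rules_url "$TF_VAR_region" "$WORKSPACE_ID")" \
#     --service="aps" --region "$TF_VAR_region" |
#     has_rule metric:recording_rule && echo "recording rule found"
```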

We can further query the alert manager endpoint to confirm that the alert is active:

```
awscurl https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/alertmanager/api/v2/alerts --service="aps" -H "Content-Type: application/json"
```

Sample output:

```
[{"annotations":{},"endsAt":"2021-09-16T18:48:35.966Z","fingerprint":"114212a24ca97549","receivers":[{"name":"default"}],"startsAt":"2021-09-16T13:20:35.966Z","status":{"inhibitedBy":[],"silencedBy":[],"state":"active"},"updatedAt":"2021-09-16T18:44:35.984Z","generatorURL":"/graph?g0.expr=sum%28rate%28envoy_http_downstream_rq_time_bucket%5B1m%5D%29%29+%3E+5\u0026g0.tab=1","labels":{"alertname":"metric:alerting_rule"}}]
```

This confirms that the alert was triggered and sent to SNS via the SNS receiver.

## Clean up

Review comment: Remove this section, it's a dupe (see below)

Reply: Cleared the section

Run the following command to terminate the Amazon Managed Service for Prometheus workspace. Make sure you also delete the EKS cluster that was created:

```
terraform destroy
```
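
If you also want to remove the other resources used in this recipe, the sketch below lists the commands involved. It only prints them unless `DRY_RUN=0` is set. The function names and the cluster name argument are hypothetical, the file names assume the manifests downloaded earlier are still in the current directory, and `terraform destroy` must run in the directory that holds `main.tf`.

```shell
# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Remove the collector, the sample app, the workspace, the ECR repository,
# and the EKS cluster. Arguments: <region> <cluster name>
cleanup() {
  run_or_print kubectl delete -f prometheus-daemonset.yaml
  run_or_print kubectl delete -f prometheus-sample-app.yaml
  run_or_print terraform destroy
  run_or_print aws ecr delete-repository --repository-name prometheus-sample-app --force --region "$1"
  run_or_print eksctl delete cluster --name "$2" --region "$1"
}

# Usage:
#   cleanup us-east-1 my-cluster              # dry run: prints the commands
#   DRY_RUN=0 cleanup us-east-1 my-cluster    # actually deletes the resources
```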
@@ -0,0 +1,46 @@
provider "aws" {
  profile = "default"
  # Use the exported TF_VAR_region instead of an unquoted literal,
  # which is not valid Terraform syntax.
  region = var.region
}

# Region of the workspace; set with TF_VAR_region.
variable "region" {
}

# ARN of the SNS topic used by the SNS receiver; set with TF_VAR_sns_topic.
variable "sns_topic" {
}

resource "aws_prometheus_workspace" "amp-terraform-ws" {
  alias = "amp-terraform-ws"
}

# One recording rule and one alerting rule on the sample app counter.
resource "aws_prometheus_rule_group_namespace" "amp-terraform-ws" {
  name         = "rules"
  workspace_id = aws_prometheus_workspace.amp-terraform-ws.id
  data         = <<EOF
groups:
  - name: test
    rules:
    - record: metric:recording_rule
      expr: rate(adot_test_counter0[5m])
  - name: alert-test
    rules:
    - alert: metric:alerting_rule
      expr: rate(adot_test_counter0[5m]) > 0.014
      for: 5m
EOF
}

# Route every alert to a single SNS receiver.
resource "aws_prometheus_alert_manager_definition" "amp-terraform-ws" {
  workspace_id = aws_prometheus_workspace.amp-terraform-ws.id
  definition   = <<EOF
alertmanager_config: |
  route:
    receiver: 'default'
  receivers:
    - name: 'default'
      sns_configs:
      - topic_arn: ${var.sns_topic}
        sigv4:
          region: ${var.region}
        attributes:
          key: severity
          value: SEV2
EOF
}
Review comment: 4 -> four and `/metrics`