add detectors for cloud functions
ndo77 committed May 28, 2024
1 parent 93dd417 commit efd4a18
Showing 13 changed files with 307 additions and 0 deletions.
8 changes: 8 additions & 0 deletions docs/severity.md
@@ -66,6 +66,7 @@
- [integration_azure-virtual-machine-scaleset](#integration_azure-virtual-machine-scaleset)
- [integration_azure-virtual-machine](#integration_azure-virtual-machine)
- [integration_gcp-bigquery](#integration_gcp-bigquery)
- [integration_gcp-cloud-functions](#integration_gcp-cloud-functions)
- [integration_gcp-cloud-sql-common](#integration_gcp-cloud-sql-common)
- [integration_gcp-cloud-sql-failover](#integration_gcp-cloud-sql-failover)
- [integration_gcp-cloud-sql-mysql](#integration_gcp-cloud-sql-mysql)
@@ -737,6 +738,13 @@
|GCP BigQuery uploaded bytes billed|X|X|-|-|-|


## integration_gcp-cloud-functions

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|GCP Cloud Functions pending|-|-|X|X|-|


## integration_gcp-cloud-sql-common

|Detector|Critical|Major|Minor|Warning|Info|
122 changes: 122 additions & 0 deletions modules/integration_gcp-cloud-functions/README.md
@@ -0,0 +1,122 @@
# GCP-CLOUD-FUNCTIONS SignalFx detectors

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
:link: **Contents**

- [How to use this module?](#how-to-use-this-module)
- [What are the available detectors in this module?](#what-are-the-available-detectors-in-this-module)
- [How to collect required metrics?](#how-to-collect-required-metrics)
- [Metrics](#metrics)
- [Notes](#notes)
- [Related documentation](#related-documentation)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## How to use this module?

This directory defines a [Terraform](https://www.terraform.io/)
[module](https://www.terraform.io/language/modules/syntax) you can use in your
existing [stack](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#stack) by adding a
`module` configuration and setting its `source` parameter to the URL of this folder:

```hcl
module "signalfx-detectors-integration-gcp-cloud-functions" {
  source        = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_gcp-cloud-functions?ref={revision}"
  environment   = var.environment
  notifications = local.notifications
}
```

Note the following parameters:

* `source`: Use this parameter to specify the URL of the module. The double slash (`//`) is intentional and required.
Terraform uses it to specify subfolders within a Git repo (see [module
sources](https://www.terraform.io/language/modules/sources)). The `ref` parameter specifies a specific Git tag in
this repository. It is recommended to use the latest "pinned" version in place of `{revision}`. Avoid using a branch
like `master` except for testing purposes. Note that all modules in this repository are available on the Terraform
[registry](https://registry.terraform.io/modules/claranet/detectors/signalfx) and we recommend using it as the source
instead of `git`, which is more flexible but less future-proof.

* `environment`: Use this parameter to specify the
[environment](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#environment) used by this
instance of the module.
Its value will be added to the `prefixes` list at the start of the [detector
name](https://github.com/claranet/terraform-signalfx-detectors/wiki/Templating#example).
In general, it will also be used in the `filtering` internal sub-module to [apply
filters](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance#filtering) based on our default
[tagging convention](https://github.com/claranet/terraform-signalfx-detectors/wiki/Tagging-convention).

* `notifications`: Use this parameter to define where alerts should be sent depending on their severity. It consists
of a Terraform [object](https://www.terraform.io/language/expressions/type-constraints#object) where each key represents an available
[detector rule severity](https://docs.splunk.com/observability/alerts-detectors-notifications/create-detectors-for-alerts.html#severity)
and its value is a list of recipients. Every recipient must respect the [detector notification
format](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#notification-format).
Check the [notification binding](https://github.com/claranet/terraform-signalfx-detectors/wiki/Notifications-binding)
documentation to understand the recommended role of each severity.
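
As a rough sketch of the registry-based alternative mentioned for `source`, the block below uses the registry address implied by the registry link and builds the `notifications` object in a `locals` block. The version string, credential IDs, channel name and email address are placeholders (assumptions, not values from this repository); the recipient strings follow the provider's documented notification format.

```hcl
locals {
  # Hypothetical recipients: replace credential IDs, channel and address with your own.
  notifications = {
    critical = ["PagerDuty,<pagerduty_credential_id>"]
    major    = ["PagerDuty,<pagerduty_credential_id>"]
    minor    = ["Slack,<slack_credential_id>,alerts-channel"]
    warning  = ["Slack,<slack_credential_id>,alerts-channel"]
    info     = ["Email,team@example.com"]
  }
}

module "signalfx-detectors-integration-gcp-cloud-functions" {
  # Registry source: the `//` still selects the sub-folder inside the module package.
  source  = "claranet/detectors/signalfx//modules/integration_gcp-cloud-functions"
  version = "x.y.z" # placeholder: pin a released version here

  environment   = var.environment
  notifications = local.notifications
}
```

With a registry source, the release is pinned through the `version` argument rather than a `?ref=` suffix.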

These 3 parameters along with all variables defined in [common-variables.tf](common-variables.tf) are common to all
[modules](../) in this repository. Other variables, specific to this module, are available in
[variables-gen.tf](variables-gen.tf).
In general, the default configuration "works" but all of these Terraform
[variables](https://www.terraform.io/language/values/variables) make it possible to
customize the detectors' behavior to better fit your needs.

Most of them represent usual tips and rules detailed in the
[guidance](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance) documentation and listed in the
common [variables](https://github.com/claranet/terraform-signalfx-detectors/wiki/Variables) dedicated documentation.
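
For instance, the module-specific variables declared in `variables-gen.tf` can be set directly in the `module` block. The following is a minimal sketch; the values are arbitrary illustrations, not recommendations.

```hcl
module "signalfx-detectors-integration-gcp-cloud-functions" {
  source        = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_gcp-cloud-functions?ref={revision}"
  environment   = var.environment
  notifications = local.notifications

  # Raise the minor threshold (default 10) and require it to hold longer (default "5m").
  pending_threshold_minor        = 15
  pending_lasting_duration_minor = "10m"

  # Turn off the warning rule only; the minor rule stays enabled.
  pending_disabled_warning = true

  # Illustrative text shown with the alert to guide the first response.
  pending_tip = "Check the function logs for recent errors"
}
```

Variables left unset keep the defaults defined by the module.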

Feel free to explore the [wiki](https://github.com/claranet/terraform-signalfx-detectors/wiki) for more information about
general usage of this repository.

## What are the available detectors in this module?

This module creates the following SignalFx detectors, each of which can contain one or more alerting rules:

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|GCP Cloud Functions pending|-|-|X|X|-|

## How to collect required metrics?

This module deploys detectors using metrics reported by the
[GCP integration](https://docs.splunk.com/observability/en/gdi/get-data-in/connect/gcp/gcp-metrics.html) configurable
with [this Terraform module](https://github.com/claranet/terraform-signalfx-integrations/tree/master/cloud/gcp).


Check the [Related documentation](#related-documentation) section for more detailed and specific information about this module's dependencies.



### Metrics


Here is the list of required metrics for detectors in this module.

* `function/execution_times`


## Notes

This detector counts the cloud function invocations that ended in error, i.e. with a `status` other than `ok`.
The example below limits it to a single function and environment:
```hcl
module "signalfx-detectors-cloud-gcp-cloud-functions" {
  source        = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_gcp-cloud-functions?ref={revision}"
  environment   = var.environment
  notifications = local.notifications
  # Replace the default filtering policy entirely instead of appending to it
  filtering_append = false
  # Only watch the given function in the given environment
  filtering_custom = "filter('function_name', '${var.function_id}') and filter('environment', '${var.env}')"
}
```


## Related documentation

* [Terraform SignalFx provider](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs)
* [Terraform SignalFx detector](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector)
* [Splunk Observability integrations](https://docs.splunk.com/Observability/gdi/get-data-in/integrations.html)
* [Stackdriver metrics](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-cloudfunctions)
1 change: 1 addition & 0 deletions modules/integration_gcp-cloud-functions/common-filters.tf
1 change: 1 addition & 0 deletions modules/integration_gcp-cloud-functions/common-locals.tf
1 change: 1 addition & 0 deletions modules/integration_gcp-cloud-functions/common-modules.tf
1 change: 1 addition & 0 deletions modules/integration_gcp-cloud-functions/common-versions.tf
@@ -0,0 +1,21 @@
module: GCP Cloud Functions
name: pending

transformation: false
aggregation: true

signals:
  signal:
    metric: "function/execution_times"
    filter: "not filter('status', 'ok')"
    extrapolation: zero
    rollup: sum
rules:
  minor:
    threshold: 10
    comparator: ">"
    lasting_duration: 5m
  warning:
    threshold: 20
    comparator: ">"
    lasting_duration: 5m
18 changes: 18 additions & 0 deletions modules/integration_gcp-cloud-functions/conf/readme.yaml
@@ -0,0 +1,18 @@
documentations:
  - name: Stackdriver metrics
    url: 'https://cloud.google.com/monitoring/api/metrics_gcp#gcp-cloudfunctions'

notes: |
  This detector counts the cloud function invocations that ended in error, i.e. with a `status` other than `ok`.
  The example below limits it to a single function and environment:
  ```hcl
  module "signalfx-detectors-cloud-gcp-cloud-functions" {
    source        = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_gcp-cloud-functions?ref={revision}"
    environment   = var.environment
    notifications = local.notifications
    # Replace the default filtering policy entirely instead of appending to it
    filtering_append = false
    # Only watch the given function in the given environment
    filtering_custom = "filter('function_name', '${var.function_id}') and filter('environment', '${var.env}')"
  }
  ```
40 changes: 40 additions & 0 deletions modules/integration_gcp-cloud-functions/detectors-gen.tf
@@ -0,0 +1,40 @@
resource "signalfx_detector" "pending" {
  name = format("%s %s", local.detector_name_prefix, "GCP Cloud Functions pending")

  authorized_writer_teams = var.authorized_writer_teams
  teams                   = try(coalescelist(var.teams, var.authorized_writer_teams), null)
  tags                    = compact(concat(local.common_tags, local.tags, var.extra_tags))

  program_text = <<-EOF
    signal = data('function/execution_times', filter=not filter('status', 'ok') and ${module.filtering.signalflow}, rollup='sum', extrapolation='zero')${var.pending_aggregation_function}.publish('signal')
    detect(when(signal > ${var.pending_threshold_minor}%{if var.pending_lasting_duration_minor != null}, lasting='${var.pending_lasting_duration_minor}', at_least=${var.pending_at_least_percentage_minor}%{endif})).publish('MINOR')
    detect(when(signal > ${var.pending_threshold_warning}%{if var.pending_lasting_duration_warning != null}, lasting='${var.pending_lasting_duration_warning}', at_least=${var.pending_at_least_percentage_warning}%{endif})).publish('WARN')
EOF

  rule {
    description           = "is too high > ${var.pending_threshold_minor}"
    severity              = "Minor"
    detect_label          = "MINOR"
    disabled              = coalesce(var.pending_disabled_minor, var.pending_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.pending_notifications, "minor", []), var.notifications.minor), null)
    runbook_url           = try(coalesce(var.pending_runbook_url, var.runbook_url), "")
    tip                   = var.pending_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  rule {
    description           = "is too high > ${var.pending_threshold_warning}"
    severity              = "Warning"
    detect_label          = "WARN"
    disabled              = coalesce(var.pending_disabled_warning, var.pending_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.pending_notifications, "warning", []), var.notifications.warning), null)
    runbook_url           = try(coalesce(var.pending_runbook_url, var.runbook_url), "")
    tip                   = var.pending_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  max_delay = var.pending_max_delay
}
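
Since `pending_aggregation_function` is interpolated right after the `data(...)` call in `program_text`, a caller can use it to group the signal. The sketch below assumes the metric carries a `function_name` dimension (the one used in the README filtering example); the `0.8` ratio is an arbitrary illustration.

```hcl
module "signalfx-detectors-integration-gcp-cloud-functions" {
  source        = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_gcp-cloud-functions?ref={revision}"
  environment   = var.environment
  notifications = local.notifications

  # Appended verbatim after data(...): yields one time series per function.
  pending_aggregation_function = ".sum(by=['function_name'])"

  # Passed as `at_least`: the condition must hold for 80% of the lasting duration.
  pending_at_least_percentage_minor   = 0.8
  pending_at_least_percentage_warning = 0.8
}
```

The `at_least` values only take effect while the matching `pending_lasting_duration_*` variable is set, because both arguments sit inside the same `%{if ...}` directive.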

5 changes: 5 additions & 0 deletions modules/integration_gcp-cloud-functions/outputs.tf
@@ -0,0 +1,5 @@
output "pending" {
  description = "Detector resource for pending"
  value       = signalfx_detector.pending
}
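
Because the output exposes the whole detector resource, a calling stack can read its attributes. A minimal sketch, assuming the module is instantiated under the name used in the README; the output names are hypothetical, while `id` and `url` are attributes exported by `signalfx_detector`.

```hcl
# Hypothetical outputs in the calling stack, re-exporting detector attributes.
output "cloud_functions_pending_detector_id" {
  description = "ID of the GCP Cloud Functions pending detector"
  value       = module.signalfx-detectors-integration-gcp-cloud-functions.pending.id
}

output "cloud_functions_pending_detector_url" {
  description = "URL of the GCP Cloud Functions pending detector"
  value       = module.signalfx-detectors-integration-gcp-cloud-functions.pending.url
}
```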

4 changes: 4 additions & 0 deletions modules/integration_gcp-cloud-functions/tags.tf
@@ -0,0 +1,4 @@
locals {
  tags = ["integration", "gcp-cloud-functions"]
}

84 changes: 84 additions & 0 deletions modules/integration_gcp-cloud-functions/variables-gen.tf
@@ -0,0 +1,84 @@
# pending detector

variable "pending_notifications" {
  description = "Notification recipients list per severity overridden for pending detector"
  type        = map(list(string))
  default     = {}
}

variable "pending_aggregation_function" {
  description = "Aggregation function and group by for pending detector (i.e. \".mean(by=['host'])\")"
  type        = string
  default     = ""
}

variable "pending_max_delay" {
  description = "Enforce max delay for pending detector (use \"0\" or \"null\" for \"Auto\")"
  type        = number
  default     = null
}

variable "pending_tip" {
  description = "Suggested first course of action or any note useful for incident handling"
  type        = string
  default     = ""
}

variable "pending_runbook_url" {
  description = "URL like SignalFx dashboard or wiki page which can help to troubleshoot the incident cause"
  type        = string
  default     = ""
}

variable "pending_disabled" {
  description = "Disable all alerting rules for pending detector"
  type        = bool
  default     = null
}

variable "pending_disabled_minor" {
  description = "Disable minor alerting rule for pending detector"
  type        = bool
  default     = null
}

variable "pending_disabled_warning" {
  description = "Disable warning alerting rule for pending detector"
  type        = bool
  default     = null
}

variable "pending_threshold_minor" {
  description = "Minor threshold for pending detector"
  type        = number
  default     = 10
}

variable "pending_lasting_duration_minor" {
  description = "Minimum duration that conditions must be true before raising alert"
  type        = string
  default     = "5m"
}

variable "pending_at_least_percentage_minor" {
  description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)"
  type        = number
  default     = 1
}
variable "pending_threshold_warning" {
  description = "Warning threshold for pending detector"
  type        = number
  default     = 20
}

variable "pending_lasting_duration_warning" {
  description = "Minimum duration that conditions must be true before raising alert"
  type        = string
  default     = "5m"
}

variable "pending_at_least_percentage_warning" {
  description = "Percentage of lasting that conditions must be true before raising alert (>= 0.0 and <= 1.0)"
  type        = number
  default     = 1
}
