Rework smart agent elasticsearch #542

Open
wants to merge 6 commits into
base: master
34 changes: 17 additions & 17 deletions docs/severity.md
Original file line number Diff line number Diff line change
@@ -975,25 +975,25 @@

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
-|ElasticSearch heartbeat|X|-|-|-|-|
+|Elasticsearch heartbeat|X|-|-|-|-|
|ElasticSearch cluster status|X|X|-|-|-|
|ElasticSearch cluster initializing shards|X|X|-|-|-|
-|ElasticSearch cluster relocating shards|X|X|-|-|-|
-|ElasticSearch Cluster unassigned shards|X|X|-|-|-|
-|ElasticSearch Pending tasks|X|X|-|-|-|
-|Elasticsearch CPU usage|X|X|-|-|-|
-|Elasticsearch file descriptors usage|X|X|-|-|-|
-|Elasticsearch JVM heap memory usage|X|X|-|-|-|
-|Elasticsearch JVM memory young usage|-|X|X|-|-|
-|Elasticsearch JVM memory old usage|-|X|X|-|-|
-|Elasticsearch old-generation garbage collections latency|-|X|X|-|-|
-|Elasticsearch young-generation garbage collections latency|-|X|X|-|-|
-|Elasticsearch indexing latency|-|X|X|-|-|
-|Elasticsearch index flushing to disk latency|-|X|X|-|-|
-|Elasticsearch search query latency|-|X|X|-|-|
-|Elasticsearch search fetch latency|-|X|X|-|-|
-|Elasticsearch fielddata cache evictions rate of change|-|X|X|-|-|
-|Elasticsearch max time spent by task in queue rate of change|-|X|X|-|-|
+|ElasticSearch cluster relocating shards|X|-|-|-|-|
+|ElasticSearch cluster unassigned shards|X|-|-|-|-|
+|ElasticSearch pending tasks|X|X|-|-|-|
+|ElasticSearch cpu usage|X|X|-|-|-|
+|ElasticSearch file descriptors|X|X|-|-|-|
+|ElasticSearch jvm heap memory usage|X|X|-|-|-|
+|ElasticSearch jvm memory young usage|-|X|X|-|-|
+|ElasticSearch jvm memory old usage|-|X|X|-|-|
+|ElasticSearch jvm gc old collection latency|-|X|X|-|-|
+|ElasticSearch jvm gc young collection latency|-|X|X|-|-|
+|ElasticSearch indexing latency|-|X|X|-|-|
+|ElasticSearch flush latency|-|X|X|-|-|
+|ElasticSearch search latency|-|X|X|-|-|
+|ElasticSearch fetch latency|-|X|X|-|-|
+|ElasticSearch field_data evictions change|-|X|X|-|-|
+|ElasticSearch task time in queue change|-|X|X|-|-|


## smart-agent_genericjmx
36 changes: 18 additions & 18 deletions modules/smart-agent_elasticsearch/README.md
Original file line number Diff line number Diff line change
@@ -59,7 +59,7 @@ Note the following parameters:

These 3 parameters along with all variables defined in [common-variables.tf](common-variables.tf) are common to all
[modules](../) in this repository. Other variables, specific to this module, are available in
-[variables.tf](variables.tf).
+[variables-gen.tf](variables-gen.tf).
In general, the default configuration "works" but all of these Terraform
[variables](https://www.terraform.io/language/values/variables) make it possible to
customize the detectors behavior to better fit your needs.
@@ -77,25 +77,25 @@ This module creates the following SignalFx detectors which could contain one or

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
-|ElasticSearch heartbeat|X|-|-|-|-|
+|Elasticsearch heartbeat|X|-|-|-|-|
|ElasticSearch cluster status|X|X|-|-|-|
|ElasticSearch cluster initializing shards|X|X|-|-|-|
-|ElasticSearch cluster relocating shards|X|X|-|-|-|
-|ElasticSearch Cluster unassigned shards|X|X|-|-|-|
-|ElasticSearch Pending tasks|X|X|-|-|-|
-|Elasticsearch CPU usage|X|X|-|-|-|
-|Elasticsearch file descriptors usage|X|X|-|-|-|
-|Elasticsearch JVM heap memory usage|X|X|-|-|-|
-|Elasticsearch JVM memory young usage|-|X|X|-|-|
-|Elasticsearch JVM memory old usage|-|X|X|-|-|
-|Elasticsearch old-generation garbage collections latency|-|X|X|-|-|
-|Elasticsearch young-generation garbage collections latency|-|X|X|-|-|
-|Elasticsearch indexing latency|-|X|X|-|-|
-|Elasticsearch index flushing to disk latency|-|X|X|-|-|
-|Elasticsearch search query latency|-|X|X|-|-|
-|Elasticsearch search fetch latency|-|X|X|-|-|
-|Elasticsearch fielddata cache evictions rate of change|-|X|X|-|-|
-|Elasticsearch max time spent by task in queue rate of change|-|X|X|-|-|
+|ElasticSearch cluster relocating shards|X|-|-|-|-|
+|ElasticSearch cluster unassigned shards|X|-|-|-|-|
+|ElasticSearch pending tasks|X|X|-|-|-|
+|ElasticSearch cpu usage|X|X|-|-|-|
+|ElasticSearch file descriptors|X|X|-|-|-|
+|ElasticSearch jvm heap memory usage|X|X|-|-|-|
+|ElasticSearch jvm memory young usage|-|X|X|-|-|
+|ElasticSearch jvm memory old usage|-|X|X|-|-|
+|ElasticSearch jvm gc old collection latency|-|X|X|-|-|
+|ElasticSearch jvm gc young collection latency|-|X|X|-|-|
+|ElasticSearch indexing latency|-|X|X|-|-|
+|ElasticSearch flush latency|-|X|X|-|-|
+|ElasticSearch search latency|-|X|X|-|-|
+|ElasticSearch fetch latency|-|X|X|-|-|
+|ElasticSearch field_data evictions change|-|X|X|-|-|
+|ElasticSearch task time in queue change|-|X|X|-|-|

## How to collect required metrics?

13 changes: 13 additions & 0 deletions modules/smart-agent_elasticsearch/conf/00-heartbeat.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
module: Elasticsearch
name: heartbeat

transformation: false
aggregation: ".mean(by=['cluster'])"
exclude_not_running_vm: true
filtering: "filter('plugin', 'elasticsearch')"

signals:
  signal:
    metric: elasticsearch.cluster.number-of-nodes
Contributor

This should use elasticsearch.process.cpu.percent, because the current metric is not reported when clusterHealthStatsMasterOnly: true is set.

rules:
  critical:
18 changes: 18 additions & 0 deletions modules/smart-agent_elasticsearch/conf/01-cluster-status.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
module: ElasticSearch
name: "cluster status"
aggregation: ".max(by=['cluster'])"
filtering: "filter('plugin', 'elasticsearch')"
signals:
  signal:
    metric: "elasticsearch.cluster.status"
rules:
  critical:
    threshold: 2
    comparator: ">="
    description: "is red"
    lasting_duration: '5m'
  major:
    threshold: 1
    comparator: ">="
    description: "is yellow"
    lasting_duration: '5m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
module: ElasticSearch
name: "cluster initializing shards"
aggregation: ".max(by=['cluster'])"
filtering: "filter('plugin', 'elasticsearch')"
signals:
  signal:
    metric: "elasticsearch.cluster.initializing-shards"
    rollup: average
rules:
  critical:
    threshold: 1
    comparator: ">="
    description: "is too high"
    lasting_duration: '15m'
  major:
    threshold: 0
    comparator: ">"
    dependency: critical
    description: "is too high"
    lasting_duration: '15m'
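
The `dependency: critical` key on the `major` rule suggests that the lower severity stays silent while the higher one is firing. The Python sketch below illustrates that reading for the thresholds above. The `Rule` class and the `firing_severities` helper are hypothetical names, the suppression semantics are an assumption that this diff does not confirm, and `lasting_duration` is ignored for simplicity.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

# Comparators as they appear in the conf files.
COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class Rule:
    threshold: float
    comparator: str
    # Assumption: name of a more severe rule that suppresses this one.
    dependency: Optional[str] = None


def firing_severities(value: float, rules: Dict[str, Rule]) -> List[str]:
    """Return the severities that fire for a single signal value."""
    # First evaluate every threshold on its own.
    raw = {name: COMPARATORS[rule.comparator](value, rule.threshold)
           for name, rule in rules.items()}
    # Then drop any rule whose dependency is itself above threshold.
    return [name for name, rule in rules.items()
            if raw[name] and not (rule.dependency and raw.get(rule.dependency, False))]


# Thresholds copied from the "cluster initializing shards" config.
initializing_shards = {
    "critical": Rule(threshold=1, comparator=">="),
    "major": Rule(threshold=0, comparator=">", dependency="critical"),
}

print(firing_severities(0, initializing_shards))    # []
print(firing_severities(0.5, initializing_shards))  # ['major']
print(firing_severities(3, initializing_shards))    # ['critical']
```

Under this reading, `major` can only fire alone for values strictly between 0 and 1, which the `average` rollup can produce.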
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
module: ElasticSearch
name: "cluster relocating shards"
aggregation: ".max(by=['cluster'])"
filtering: "filter('plugin', 'elasticsearch')"
signals:
  signal:
    metric: "elasticsearch.cluster.relocating-shards"
    rollup: average
rules:
  critical:
    threshold: 0
    comparator: ">"
    description: "is too high"
    lasting_duration: '15m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
module: ElasticSearch
name: "cluster unassigned shards"
aggregation: ".max(by=['cluster'])"
filtering: "filter('plugin', 'elasticsearch')"
signals:
  signal:
    metric: "elasticsearch.cluster.unassigned-shards"
    rollup: average
rules:
  critical:
    threshold: 0
    comparator: ">"
    description: "is too high"
    lasting_duration: '10m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
module: ElasticSearch
name: "pending tasks"
aggregation: ".max(by=['cluster'])"
filtering: "filter('plugin', 'elasticsearch')"
signals:
  signal:
    metric: "elasticsearch.cluster.pending-tasks"
    rollup: average
rules:
  critical:
    threshold: 5
    comparator: ">="
    description: "are too high"
    lasting_duration: '15m'
  major:
    threshold: 0
    comparator: ">"
    dependency: critical
    description: "are too high"
    lasting_duration: '15m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
module: ElasticSearch
name: "cpu usage"
aggregation: ".max(by=['cluster'])"
filtering: "filter('node_name', '*') and filter('plugin', 'elasticsearch')"
signals:
  signal:
    metric: "elasticsearch.process.cpu.percent"
    rollup: average
rules:
  critical:
    threshold: 95
    comparator: ">="
    description: "is too high"
    lasting_duration: '30m'
  major:
    threshold: 85
    comparator: ">"
    dependency: critical
    description: "is too high"
    lasting_duration: '30m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
module: ElasticSearch
name: "file descriptors"
aggregation: ".max(by=['cluster'])"
filtering: "filter('node_name', '*') and filter('plugin', 'elasticsearch')"
signals:
  A:
    metric: "elasticsearch.process.open_file_descriptors"
    rollup: average
  B:
    metric: "elasticsearch.process.max_file_descriptors"
    rollup: average
  signal:
    formula: "(A/B).scale(100)"
rules:
  critical:
    threshold: 95
    comparator: ">="
    description: "is too high"
    lasting_duration: '15m'
  major:
    threshold: 90
    comparator: ">"
    dependency: critical
    description: "is too high"
    lasting_duration: '15m'
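
To make the `(A/B).scale(100)` formula concrete, here is a minimal Python sketch of the same arithmetic: open file descriptors expressed as a percentage of the configured limit. The function name `fd_usage_percent` and the sample values are hypothetical, and the positive-limit check is an addition of this sketch, not something the config expresses.

```python
def fd_usage_percent(open_fds: float, max_fds: float) -> float:
    """Open file descriptors as a percentage of the limit, like "(A/B).scale(100)"."""
    if max_fds <= 0:
        # Guard added for the sketch only; the config has no equivalent.
        raise ValueError("max_fds must be positive")
    return 100.0 * open_fds / max_fds


# Hypothetical samples for open_file_descriptors (A) and max_file_descriptors (B).
print(fd_usage_percent(950, 1000))  # 95.0, meets the critical rule (>= 95)
print(fd_usage_percent(920, 1000))  # 92.0, meets only the major rule (> 90)
```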
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
module: ElasticSearch
name: "JVM heap memory usage"
aggregation: ".max(by=['cluster'])"
filtering: "filter('node_name', '*') and filter('plugin', 'elasticsearch')"
signals:
  signal:
    metric: "elasticsearch.jvm.mem.heap-used-percent"
    rollup: average
rules:
  critical:
    threshold: 90
    comparator: ">="
    description: "is too high"
    lasting_duration: '5m'
  major:
    threshold: 80
    comparator: ">"
    dependency: critical
    description: "is too high"
    lasting_duration: '5m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
module: ElasticSearch
name: "JVM memory young usage"
aggregation: ".max(by=['cluster'])"
filtering: "filter('node_name', '*') and filter('plugin', 'elasticsearch')"
signals:
  A:
    metric: "elasticsearch.jvm.mem.pools.young.used_in_bytes"
    rollup: average
  B:
    metric: "elasticsearch.jvm.mem.pools.young.max_in_bytes"
    rollup: average
  signal:
    formula: "(A/B).fill(0).scale(100)"
rules:
  major:
    threshold: 90
    comparator: ">="
    description: "is too high"
    lasting_duration: '10m'
  minor:
    threshold: 80
    comparator: ">"
    description: "is too high"
    dependency: major
    lasting_duration: '10m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
module: ElasticSearch
name: "JVM memory old usage"
aggregation: ".max(by=['cluster'])"
filtering: "filter('node_name', '*') and filter('plugin', 'elasticsearch')"
signals:
  A:
    metric: "elasticsearch.jvm.mem.pools.old.used_in_bytes"
    rollup: average
  B:
    metric: "elasticsearch.jvm.mem.pools.old.max_in_bytes"
    rollup: average
  signal:
    formula: "(A/B).fill(0).scale(100)"
rules:
  major:
    threshold: 90
    comparator: ">="
    description: "is too high"
    lasting_duration: '10m'
  minor:
    threshold: 80
    comparator: ">"
    description: "is too high"
    dependency: major
    lasting_duration: '10m'
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
module: ElasticSearch
name: "jvm gc old collection latency"
aggregation: ".max(by=['cluster'])"
filtering: "filter('node_name', '*') and filter('plugin', 'elasticsearch')"
signals:
  A:
    metric: "elasticsearch.jvm.gc.old-time"
    extrapolation: zero
    rollup: delta
  B:
    metric: "elasticsearch.jvm.gc.old-count"
    extrapolation: zero
    rollup: delta
  signal:
    formula: "(A/B).fill(0)"
rules:
  major:
    threshold: 300
    comparator: ">="
    description: "is too high"
    lasting_duration: '15m'
  minor:
    threshold: 200
    comparator: ">"
    description: "is too high"
    dependency: major
    lasting_duration: '15m'
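
The formula `(A/B).fill(0)` with `rollup: delta` on both signals reads as "time spent in old-generation collections during the interval, divided by the number of collections in that interval". The Python sketch below works through that arithmetic. It assumes both metrics are cumulative counters and that the time is reported in milliseconds, neither of which this diff states. The helper names and the sample values are hypothetical.

```python
from typing import List


def deltas(samples: List[float]) -> List[float]:
    """Differences between consecutive samples of a cumulative counter."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def gc_latency(time_delta_ms: float, count_delta: float) -> float:
    """Average time per collection over one interval, like "(A/B).fill(0)"."""
    if count_delta == 0:
        # No collection ran in the interval, so the ratio is undefined.
        # Falling back to 0 is the role fill(0) plays in the formula.
        return 0.0
    return time_delta_ms / count_delta


# Hypothetical cumulative samples for gc.old-time (A) and gc.old-count (B).
old_time_ms = [1000, 1000, 1900, 2400]
old_count = [10, 10, 13, 15]

latencies = [gc_latency(t, c)
             for t, c in zip(deltas(old_time_ms), deltas(old_count))]
print(latencies)  # [0.0, 300.0, 250.0]
```

With the thresholds above, 300 would meet the `major` rule (`>= 300`) and 250 only the `minor` rule (`> 200`), provided each value persisted for the 15-minute `lasting_duration`.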
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
module: ElasticSearch
name: "jvm gc young collection latency"
aggregation: ".max(by=['cluster'])"
filtering: "filter('node_name', '*') and filter('plugin', 'elasticsearch')"
signals:
  A:
    metric: "elasticsearch.jvm.gc.time"
    extrapolation: zero
    rollup: delta
  B:
    metric: "elasticsearch.jvm.gc.count"
    extrapolation: zero
    rollup: delta
  signal:
    formula: "(A/B).fill(0)"
rules:
  major:
    threshold: 40
    comparator: ">="
    description: "is too high"
    lasting_duration: '15m'
  minor:
    threshold: 20
    comparator: ">"
    description: "is too high"
    dependency: major
    lasting_duration: '15m'