From fcd35a0b6b8ead7cefbc7fd0d58320d65e694676 Mon Sep 17 00:00:00 2001
From: Chetan Choudhary
Date: Sun, 8 Dec 2024 13:43:31 +0530
Subject: [PATCH 1/9] SUMO-251499: Adding monitor's information to OTEL Apps Set1

---
 .../opentelemetry/rabbitmq-opentelemetry.md   | 22 +++++++++++++++
 .../opentelemetry/cassandra-opentelemetry.md  | 25 +++++++++++++++++
 .../opentelemetry/memcached-opentelemetry.md  | 24 +++++++++++++++-
 .../opentelemetry/mongodb-opentelemetry.md    | 28 +++++++++++++++++++
 .../opentelemetry/redis-opentelemetry.md      | 20 +++++++++++++
 .../opentelemetry/haproxy-opentelemetry.md    | 23 ++++++++++++++-
 6 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md b/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md
index 0ed4870aa5..ef6d00e062 100644
--- a/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md
+++ b/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md
@@ -15,6 +15,11 @@ RabbitMQ logs are sent to Sumo Logic through the OpenTelemetry [filelog receiver
 Schematics
+
+:::info
+This app includes [built-in monitors](#rabbitmq-alerts). For details on creating custom monitors, refer to [Create monitors for RabbitMQ app](#create-monitors-for-rabbitmq-app).
+:::
+
 ## Fields creation in Sumo Logic for RabbitMQ
 Following are the [Fields](/docs/manage/fields/) which will be created as part of RabbitMQ App install if not already present.
@@ -230,3 +235,20 @@ The **RabbitMQ - Logs** dashboard gives you an at-a-glance view of error message
 The **RabbitMQ - Metrics** dashboard gives you an at-a-glance view of your RabbitMQ deployment across brokers, queue, exchange, consumer, and messages.
 RabbitMQ Metrics dashboards
+
+## Create monitors for RabbitMQ app
+
+import CreateMonitors from '../../../reuse/apps/create-monitors.md';
+
+<CreateMonitors/>
+
+### RabbitMQ alerts
+
+| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
+|:---|:---|:---|:---|
+| `RabbitMQ - High Consumer Count` | This alert is triggered when consumers are higher than the given value (Default 10000) in a queue. | `>=` 10000 | `<` 10000 |
+| `RabbitMQ - High Message Queue Size` | This alert is triggered when the number of messages in a queue exceeds a given threshold (Default 10000), indicating potential consumer issues or message processing bottlenecks. | `>=` 10000 | `<` 10000 |
+| `RabbitMQ - High Messages Count` | This alert is triggered when messages are higher than the given value (Default 10000) in a queue. | `>=` 10000 | `<` 10000 |
+| `RabbitMQ - High Unacknowledged Messages` | This alert is triggered when there are too many unacknowledged messages (Default 5000), suggesting consumer processing issues. | `>=` 5000 | `<` 5000 |
+| `RabbitMQ - Node Down` | This alert is triggered when a node in the RabbitMQ cluster is down. | `>=` 1 | `<` 1 |
+| `RabbitMQ - Zero Consumers Alert` | This alert is triggered when a queue has no consumers, indicating potential service issues. | `<=` 0 | `>` 0 |

diff --git a/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md b/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md
index 1dbb93b407..b985759db9 100644
--- a/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md
@@ -19,6 +19,10 @@ The app supports Logs from the open-source version of Cassandra. The App is test
 Schematics
+
+:::info
+This app includes [built-in monitors](#cassandra-alerts). For details on creating custom monitors, refer to [Create monitors for Cassandra app](#create-monitors-for-cassandra-app).
+:::
+
 ## Fields creation in Sumo Logic for Cassandra
 Following are the [Fields](/docs/manage/fields/) which will be created as part of Cassandra App install if not already present:
@@ -279,3 +283,24 @@ The **Cassandra - Requests** dashboard provides insight into the number of reque
 The **Cassandra - Storage** dashboard provides insight into the current value of total hints of your Cassandra cluster along with storage managed by the cluster.
 Storage
+
+
+## Create monitors for Cassandra app
+
+import CreateMonitors from '../../../reuse/apps/create-monitors.md';
+
+<CreateMonitors/>
+
+### Cassandra alerts
+
+| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
+|:---|:---|:---|:---|
+| `Cassandra - Compaction Task Pending` | This alert is triggered when there are more than 15 pending Compaction tasks. | `>=` 15 | `<` 15 |
+| `Cassandra - High Hints Backlog` | This alert is triggered when the number of in-progress hints exceeds the given value (Default 5000) for 5 minutes. | `>=` 5000 | `<` 5000 |
+| `Cassandra - High Memory Usage` | This alert is triggered when memory used exceeds 85% of committed memory for more than 10 minutes. | `>=` 1 | `<` 1 |
+| `Cassandra - Node Down Alert` | This alert is triggered when a Cassandra node status changes to DOWN for more than 5 minutes. | `>=` 1 | `<` 1 |
+| `Cassandra - Operation Error Rate High` | This alert is triggered when the error rate of operations exceeds the given value (Default 5%) for 5 minutes. | `>` 5 | `<=` 5 |
+| `Cassandra - Range Query Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of range query latency exceeds the given value (Default 2 seconds) for 5 minutes. | `>=` 2000000 | `<` 2000000 |
+| `Cassandra - Read Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of read latency exceeds the given value (Default 500ms) for 5 minutes. | `>=` 500000 | `<` 500000 |
+| `Cassandra - Storage Growth Rate Abnormal` | This alert is triggered when the storage growth rate exceeds the given value (Default 25MB/minute) for 5 minutes. | `>=` 26214400 | `<` 26214400 |
+| `Cassandra - Write Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of write latency exceeds the given value (Default 200ms) for 5 minutes. | `>=` 200000 | `<` 200000 |

diff --git a/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md b/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md
index ade5501398..8a66b5f379 100644
--- a/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md
@@ -19,6 +19,11 @@ Memcached logs are sent to Sumo Logic through the OpenTelemetry [filelog receive
 Schematics
+
+:::info
+This app includes [built-in monitors](#memcached-alerts). For details on creating custom monitors, refer to [Create monitors for Memcached app](#create-monitors-for-memcached-app).
+:::
+
 ## Fields creation in Sumo Logic for Memcached
 Following are the [Fields](/docs/manage/fields/) which will be created as part of Memcached App install if not already present.
@@ -258,4 +263,21 @@ The **Memcached - Cache Information** dashboard provides insight into cache stat
 The **Memcached - Logs** dashboard helps you quickly analyze your Memcached error logs, commands executed, and objects stored.
-Memcached dashboards
\ No newline at end of file
+Memcached dashboards
+
+
+## Create monitors for Memcached app
+
+import CreateMonitors from '../../../reuse/apps/create-monitors.md';
+
+<CreateMonitors/>
+
+### Memcached alerts
+
+| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
+|:---|:---|:---|:---|
+| `Memcached - Cache Hit Ratio` | The hit rate is one of the most important indicators of Memcached performance. A high hit rate means faster responses to your users. If the hit rate is falling, you need quick visibility into why. This alert is triggered when the cache hit ratio is less than 50%. | `<=` 50 | `>` 50 |
+| `Memcached - Commands Error` | This alert is triggered when Memcached has error commands. | `>` 0 | `<=` 0 |
+| `Memcached - Current Connections` | This alert is triggered when current connections to Memcached are zero. | `<=` 0 | `>` 0 |
+| `Memcached - High Memory Usage` | This alert is triggered when Memcached memory usage exceeds the given threshold (in GB). | `>` 5 | `<=` 5 |
+| `Memcached - High Number of Connections` | This alert is triggered when the number of current connections for Memcached exceeds the given threshold. | `>=` 1000 | `<` 1000 |

diff --git a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
index 6102111565..ff3d38922b 100644
--- a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
@@ -23,6 +23,10 @@ This app supports logs and metrics for MongoDB instance. The MongoDB logs are ge
 The app supports metrics generated by the [MongoDB Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/mongodbreceiver/documentation.md).
+
+:::info
+This app includes [built-in monitors](#mongodb-alerts). For details on creating custom monitors, refer to [Create monitors for MongoDB app](#create-monitors-for-mongodb-app).
+:::
+
 ## Fields creation in Sumo Logic for MongoDB
 Following are the [Fields](/docs/manage/fields/) which will be created as part of MongoDB App install if not already present.
@@ -339,3 +343,27 @@ Use this dashboard to:
 * Operation count like query, insert, and delete.
 Replication
+
+
+## Create monitors for MongoDB app
+
+import CreateMonitors from '../../../reuse/apps/create-monitors.md';
+
+<CreateMonitors/>
+
+### MongoDB alerts
+
+| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
+|:---|:---|:---|:---|
+| `MongoDB - Instance Down` | This alert is triggered when we detect that the MongoDB instance is down. | Missing Data | Missing Data |
+| `MongoDB - Replication Error` | This alert is triggered when we detect errors in MongoDB replication operations. | `>` 0 | `<=` 0 |
+| `MongoDB - Replication Heartbeat Error` | This alert is triggered when we detect that the MongoDB Replication Heartbeat request has errors, which indicates replication is not working as expected. | `>` 0 | `<=` 0 |
+| `MongoDB - Secondary Node Replication Failure` | This alert is triggered when we detect that a MongoDB secondary node is out of sync for replication. | `>` 0 | `<=` 0 |
+| `MongoDB - Sharding Balancer Failure` | This alert is triggered when we detect that data balancing failed on a MongoDB Cluster with 1 mongos instance and 3 mongod instances. | `>` 0 | `<=` 0 |
+| `MongoDB - Sharding Chunk Split Failure` | This alert is triggered when we detect that a MongoDB chunk has not been split during sharding. | `>` 0 | `<=` 0 |
+| `MongoDB - Sharding Error` | This alert is triggered when we detect errors in MongoDB sharding operations. | `>` 0 | `<=` 0 |
+| `MongoDB - Sharding Warning` | This alert is triggered when we detect warnings in MongoDB sharding operations. | `>` 0 | `<=` 0 |
+| `MongoDB - Slow Queries` | This alert is triggered when we detect that a MongoDB cluster is executing slow queries. | `>` 0 | `<=` 0 |
+| `MongoDB - Too Many Connections` | This alert is triggered when we detect a given MongoDB server has too many connections (over 80% of capacity). | `>=` 80 | `<` 80 |
+| `MongoDB - Too Many Cursors Open` | This alert is triggered when we detect that there are too many cursors (>10K) opened by MongoDB. | `>=` 10000 | `<` 10000 |
+| `MongoDB - Too Many Cursors Timeouts` | This alert is triggered when we detect that there are too many cursors (100) timing out on a MongoDB server within a 5-minute time interval. | `>=` 100 | `<` 100 |

diff --git a/docs/integrations/databases/opentelemetry/redis-opentelemetry.md b/docs/integrations/databases/opentelemetry/redis-opentelemetry.md
index 84b7d5e6d5..acf9a664ef 100644
--- a/docs/integrations/databases/opentelemetry/redis-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/redis-opentelemetry.md
@@ -25,6 +25,10 @@ This app supports metrics and logs for Redis in Cluster mode or Standalone mode.
 The app supports metrics generated by the [Redis Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/redisreceiver/documentation.md).
+
+:::info
+This app includes [built-in monitors](#redis-alerts). For details on creating custom monitors, refer to [Create monitors for Redis app](#create-monitors-for-redis-app).
+:::
+
 ## Creating fields in Sumo Logic for Redis
 The following are [fields](/docs/manage/fields/) that will be created as part of the Redis App install if not already present.
@@ -243,3 +247,19 @@ Use this dashboard to review String Commands, Key Commands, Hash Commands, Sets
 Redis Command Stats Delta dashboards
+
+## Create monitors for Redis app
+
+import CreateMonitors from '../../../reuse/apps/create-monitors.md';
+
+<CreateMonitors/>
+
+### Redis alerts
+
+| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
+|:---|:---|:---|:---|
+| `Redis - High Memory Fragmentation Ratio` | This alert is triggered when the ratio of Redis memory usage to Linux virtual memory pages (mapped to physical memory chunks) is higher than 1.5. A high ratio will lead to swapping and can adversely affect performance. | `>` 1.5 | `<=` 1.5 |
+| `Redis - Instance Down` | This alert is triggered when we detect that the Redis instance is down for 5 minutes. | Missing Data | Missing Data |
+| `Redis - Rejected Connections` | This alert is triggered when we detect that some connections to a Redis cluster have been rejected. | `>` 0 | `<=` 0 |
+| `Redis - Replication Broken` | This alert is triggered when we detect that a Redis instance has lost all slaves. This will affect the redundancy of data stored in Redis. Please review how replication has been configured. | `<` 1 | `>=` 1 |
+| `Redis - Replication Offset` | This alert is triggered when the replication offset in a given Redis cluster is greater than 1 MB for the last five minutes. Please review how replication has been configured. | `>=` 1 | `<` 1 |
+| `Redis - Too Many Connections` | This alert is triggered when we detect a given Redis server has too many connections (default is 100). | `>` 100 | `<=` 100 |

diff --git a/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md b/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md
index 24c5df3504..6848b74074 100644
--- a/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md
+++ b/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md
@@ -23,6 +23,10 @@ The app supports Logs from the open source version of HAProxy. The App is tested
 The HAProxy logs are generated in files as configured in the configuration file `/etc/haproxy/haproxy.cfg` ([learn more](https://www.haproxy.com/blog/introduction-to-haproxy-logging/)).
+
+:::info
+This app includes [built-in monitors](#haproxy-alerts). For details on creating custom monitors, refer to [Create monitors for HAProxy app](#create-monitors-for-haproxy-app).
+:::
+
 ## Fields Create in Sumo Logic for HAProxy
 Following are the [Fields](/docs/manage/fields/) which will be created as part of HAProxy App install if not already present.
@@ -348,4 +352,21 @@ Use this dashboard to:
 - Gain insights into Client, Server Responses on HAProxy Server. This helps you identify errors in HAProxy Server.
 - To identify geo locations of all Client errors. This helps you identify client location causing errors and helps you to block client IPs.
-Web Server Operations
\ No newline at end of file
+Web Server Operations
+
+
+## Create monitors for HAProxy app
+
+import CreateMonitors from '../../../reuse/apps/create-monitors.md';
+
+<CreateMonitors/>
+
+### HAProxy alerts
+
+| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
+|:---|:---|:---|:---|
+| `HAProxy - Access from Highly Malicious Sources` | This alert is triggered when an HAProxy is accessed from highly malicious IP addresses. | `>` 0 | `<=` 0 |
+| `HAProxy - Backend Error` | This alert is triggered when we detect backend server errors. | `>` 0 | `<=` 0 |
+| `HAProxy - Backend Server Down` | This alert is triggered when we detect a backend server for a given HAProxy server is down. | `>` 0 | `<=` 0 |
+| `HAProxy - High Client (HTTP 4xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 4xx. | `>` 0 | `<=` 0 |
+| `HAProxy - High Server (HTTP 5xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 5xx. | `>` 0 | `<=` 0 |

From 3756f372260dc9298e1e279c7309ea019a2a40f9 Mon Sep 17 00:00:00 2001
From: Chetan Choudhary
Date: Mon, 9 Dec 2024 12:01:41 +0530
Subject: [PATCH 2/9] updating mongodb instance down recovery condition

---
 .../opentelemetry/mongodb-opentelemetry.md | 26 +++++++++----------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
index ff3d38922b..37412f565a 100644
--- a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
@@ -354,16 +354,16 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md';
### MongoDB alerts

| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `MongoDB - Instance Down` | This alert gets triggered when we detect that the MongoDB instance is down. | Missing Data | Data Found |
| `MongoDB - Replication Error` | This alert gets triggered when we detect errors in MongoDB replication operations. | `>` 0 | `<=` 0 |
| `MongoDB - Replication Heartbeat Error` | This alert gets triggered when we detect that the MongoDB Replication Heartbeat request has errors, which indicates replication is not working as expected. | `>` 0 | `<=` 0 |
| `MongoDB - Secondary Node Replication Failure` | This alert gets triggered when we detect that a MongoDB secondary node is out of sync for replication. | `>` 0 | `<=` 0 |
| `MongoDB - Sharding Balancer Failure` | This alert gets triggered when we detect that data balancing failed on a MongoDB Cluster with 1 mongos instance and 3 mongod instances. | `>` 0 | `<=` 0 |
| `MongoDB - Sharding Chunk Split Failure` | This alert gets triggered when we detect that a MongoDB chunk has not been split during sharding. | `>` 0 | `<=` 0 |
| `MongoDB - Sharding Error` | This alert gets triggered when we detect errors in MongoDB sharding operations. | `>` 0 | `<=` 0 |
| `MongoDB - Sharding Warning` | This alert gets triggered when we detect warnings in MongoDB sharding operations. | `>` 0 | `<=` 0 |
| `MongoDB - Slow Queries` | This alert gets triggered when we detect that a MongoDB cluster is executing slow queries. | `>` 0 | `<=` 0 |
| `MongoDB - Too Many Connections` | This alert gets triggered when we detect a given MongoDB server has too many connections (over 80% of capacity). | `>=` 80 | `<` 80 |
| `MongoDB - Too Many Cursors Open` | This alert gets triggered when we detect that there are too many cursors (>10K) opened by MongoDB. | `>=` 10000 | `<` 10000 |
| `MongoDB - Too Many Cursors Timeouts` | This alert gets triggered when we detect that there are too many cursors (100) timing out on a MongoDB server within a 5-minute time interval. | `>=` 100 | `<` 100 |

From 8a7e3029b4f2702dca2911b86dd8e1deec4a43b3 Mon Sep 17 00:00:00 2001
From: Jagadisha V <129049263+JV0812@users.noreply.github.com>
Date: Mon, 9 Dec 2024 13:44:18 +0530
Subject: [PATCH 3/9] Update rabbitmq-opentelemetry.md

---
 .../opentelemetry/rabbitmq-opentelemetry.md | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md b/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md
index ef6d00e062..e105a541ec 100644
--- a/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md
+++ b/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md
@@ -15,9 +15,8 @@ RabbitMQ logs are sent to Sumo Logic through the OpenTelemetry [filelog receiver
 Schematics
-
 :::info
-This app includes [built-in monitors](#rabbitmq-alerts). For details on creating custom monitors, refer to [Create monitors for RabbitMQ app](#create-monitors-for-rabbitmq-app).
+This app includes [built-in monitors](#rabbitmq-alerts). For details on creating custom monitors, refer to the [Create monitors for RabbitMQ app](#create-monitors-for-rabbitmq-app).
 :::

 ## Fields creation in Sumo Logic for RabbitMQ
@@ -244,11 +243,11 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md';

### RabbitMQ alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `RabbitMQ - High Consumer Count` | This alert is triggered when consumers are higher than the given value in a queue. | Count > = 10000 | Count < 10000 |
| `RabbitMQ - High Message Queue Size` | This alert is triggered when the number of messages in a queue exceeds a given threshold, indicating potential consumer issues or message processing bottlenecks. | Count > = 10000 | Count < 10000 |
| `RabbitMQ - High Messages Count` | This alert is triggered when messages are higher than the given value in a queue. | Count > = 10000 | Count < 10000 |
| Count > = 10000 | Count < 10000 | +| `RabbitMQ - High Unacknowledged Messages` | This alert is triggered when there are too many unacknowledged messages, suggesting consumer processing issues. | Count > = 5000 | Count < 5000 | +| `RabbitMQ - Node Down` | This alert is triggered when a node in the RabbitMQ cluster is down. | Count > = 1 | Count < 1 | +| `RabbitMQ - Zero Consumers Alert` | This alert is triggered when a queue has no consumers, indicating potential service issues. | Count < = 0 | Count > 0 | From bef13eedc37da255aa2a448e6fc1eda22f67b9cc Mon Sep 17 00:00:00 2001 From: Chetan Choudhary Date: Mon, 9 Dec 2024 14:09:37 +0530 Subject: [PATCH 4/9] Addressing feedbacks --- .../opentelemetry/rabbitmq-opentelemetry.md | 12 ++++---- .../opentelemetry/cassandra-opentelemetry.md | 24 ++++++++-------- .../opentelemetry/memcached-opentelemetry.md | 16 +++++------ .../opentelemetry/mongodb-opentelemetry.md | 28 +++++++++---------- .../opentelemetry/redis-opentelemetry.md | 18 ++++++------ .../opentelemetry/haproxy-opentelemetry.md | 16 +++++------ 6 files changed, 57 insertions(+), 57 deletions(-) diff --git a/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md b/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md index e105a541ec..ac87f0a4c9 100644 --- a/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md +++ b/docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md @@ -245,9 +245,9 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md'; | Name | Description | Alert Condition | Recover Condition | |:--|:--|:--|:--| -| `RabbitMQ - High Consumer Count` | This alert is triggered when consumers are higher than given value in a queue. | Count > = 10000 | Count < 10000 | -| `RabbitMQ - High Message Queue Size` | This alert is triggered when the number of messages in a queue exceeds a given threshold, indicating potential consumer issues or message processing bottlenecks. | Count > = 10000 | Count < 10000 | -| `RabbitMQ - High Messages Count` | This alert is triggered when messages are higher than given value in a queue. | Count > = 10000 | Count < 10000 | -| `RabbitMQ - High Unacknowledged Messages` | This alert is triggered when there are too many unacknowledged messages, suggesting consumer processing issues. | Count > = 5000 | Count < 5000 | -| `RabbitMQ - Node Down` | This alert is triggered when a node in the RabbitMQ cluster is down. | Count > = 1 | Count < 1 | -| `RabbitMQ - Zero Consumers Alert` | This alert is triggered when a queue has no consumers, indicating potential service issues. | Count < = 0 | Count > 0 | +| `RabbitMQ - High Consumer Count` | This alert is triggered when consumers are higher than given value (Default 10000) in a queue. | Count `>=` 10000 | Count `<` 10000 | +| `RabbitMQ - High Message Queue Size` | This alert is triggered when the number of messages in a queue exceeds a given threshold (Default 10000), indicating potential consumer issues or message processing bottlenecks. | Count `>=` 10000 | Count `<` 10000 | +| `RabbitMQ - High Messages Count` | This alert is triggered when messages are higher than given value (Default 10000) in a queue. | Count `>=` 10000 | Count `<` 10000 | +| `RabbitMQ - High Unacknowledged Messages` | This alert is triggered when there are too many unacknowledged messages (Default 5000), suggesting consumer processing issues. 
| Count `>=` 5000 | Count `<` 5000 | +| `RabbitMQ - Node Down` | This alert is triggered when a node in the RabbitMQ cluster is down. | Count `>=` 1 | Count `<` 1 | +| `RabbitMQ - Zero Consumers Alert` | This alert is triggered when a queue has no consumers, indicating potential service issues. | Count `<=` 0 | Count `>` 0 | diff --git a/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md b/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md index b985759db9..1076335c46 100644 --- a/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md +++ b/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md @@ -20,7 +20,7 @@ The app supports Logs from the open-source version of Cassandra. The App is test Schematics :::info -This app includes [built-in monitors](#cassandra-alerts). For details on creating custom monitors, refer to [Create monitors for Cassandra app](#create-monitors-for-cassandra-app). +This app includes [built-in monitors](#cassandra-alerts). For details on creating custom monitors, refer to the [Create monitors for Cassandra app](#create-monitors-for-cassandra-app). ::: ## Fields creation in Sumo Logic for Cassandra @@ -293,14 +293,14 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md'; ### Cassandra alerts -| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition | -|:---------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|:-------------------| -| `Cassandra - Compaction Task Pending` | This alert is triggered when there are more than 15 Compaction tasks which are pending. | `>=` 15 | `<` 15 | -| `Cassandra - High Hints Backlog` | This alert is triggered when the number of in-progress hints exceeds given value (Default 5000) for 5 minutes. | `>=` 5000 | `<` 5000 | -| `Cassandra - High Memory Usage` | This alert is triggered when memory used exceeds 85% of committed memory for more than 10 minutes. | `>=` 1 | `<` 1 | -| `Cassandra - Node Down Alert` | This alert is triggered when a Cassandra node status changes to DOWN for more than 5 minutes. | `>=` 1 | `<` 1 | -| `Cassandra - Operation Error Rate High` | This alert is triggered when the error rate of operations exceeds given value (Default 5%) for 5 minutes. | `>` 5 | `<=` 5 | -| `Cassandra - Range Query Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of range query latency exceeds the given value (Default 2 seconds) for 5 minutes. | `>=` 2000000 | `<` 2000000 | -| `Cassandra - Read Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of read latency exceeds given value (Default 500ms) for 5 minutes. | `>=` 500000 | `<` 500000 | -| `Cassandra - Storage Growth Rate Abnormal` | This alert is triggered when the storage growth rate exceeds given value (Default 25MB/minute) for 5 minutes. | `>=` 26214400 | `<` 26214400 | -| `Cassandra - Write Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of write latency exceeds given value (Default 200ms) for 5 minutes. | `>=` 200000 | `<` 200000 | +| Name | Description | Alert Condition | Recover Condition | +|:--|:--|:--|:--| +| `Cassandra - Compaction Task Pending` | This alert is triggered when there are more than 15 Compaction tasks which are pending. 
| Count `>=` 15 | Count `<` 15 | +| `Cassandra - High Hints Backlog` | This alert is triggered when the number of in-progress hints exceeds given value (Default 5000) for 5 minutes. | Count `>=` 5000 | Count `<` 5000 | +| `Cassandra - High Memory Usage` | This alert is triggered when memory used exceeds 85% of committed memory for more than 10 minutes. | Count `>=` 1 | Count `<` 1 | +| `Cassandra - Node Down Alert` | This alert is triggered when a Cassandra node status changes to DOWN for more than 5 minutes. | Count `>=` 1 | Count `<` 1 | +| `Cassandra - Operation Error Rate High` | This alert is triggered when the error rate of operations exceeds given value (Default 5%) for 5 minutes. | Count `>` 5 | Count `<=` 5 | +| `Cassandra - Range Query Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of range query latency exceeds the given value (Default 2 seconds) for 5 minutes. | Count `>=` 2000000 | Count `<` 2000000 | +| `Cassandra - Read Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of read latency exceeds given value (Default 500ms) for 5 minutes. | Count `>=` 500000 | Count `<` 500000 | +| `Cassandra - Storage Growth Rate Abnormal` | This alert is triggered when the storage growth rate exceeds given value (Default 25MB/minute) for 5 minutes. | Count `>=` 26214400 | Count `<` 26214400 | +| `Cassandra - Write Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of write latency exceeds given value (Default 200ms) for 5 minutes. | Count `>=` 200000 | Count `<` 200000 | diff --git a/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md b/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md index 8a66b5f379..111ee9d63b 100644 --- a/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md +++ b/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md @@ -21,7 +21,7 @@ Memcached logs are sent to Sumo Logic through the OpenTelemetry [filelog receive :::info -This app includes [built-in monitors](#memcached-alerts). For details on creating custom monitors, refer to [Create monitors for Memcached app](#create-monitors-for-memcached-app). +This app includes [built-in monitors](#memcached-alerts). For details on creating custom monitors, refer to the [Create monitors for Memcached app](#create-monitors-for-memcached-app). ::: ## Fields creation in Sumo Logic for Memcached @@ -274,10 +274,10 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md'; ### Memcached alerts -| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition | -|:---------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|:-------------------| -| `Memcached - Cache Hit Ratio` | The hit rate is one of the most important indicators of Memcached performance. A high hit rate means faster responses to your users. If the hit rate is falling, you need quick visibility into why. This alert is triggered when low cache hit ratio is less than 50% | `<=` 50 | `>` 50 | -| `Memcached - Commands Error` | This alert is triggered when Memcached has error commands. | `>` 0 | `<=` 0 | -| `Memcached - Current Connections` | This alert is triggered when current connections to Memcached are zero. 
| `<=` 0 | `>` 0 | -| `Memcached - High Memory Usage` | This alert is triggered when the memcached exceed given threshold memory usage (in GB) | `>` 5 | `<=` 5 | -| `Memcached - High Number of Connections` | This alert is triggered when the number of current connection for memcached exceed given threshold. | `>=` 1000 | `<` 1000 | +| Name | Description | Alert Condition | Recover Condition | +|:--|:--|:--|:--| +| `Memcached - Cache Hit Ratio` | The hit rate is one of the most important indicators of Memcached performance. A high hit rate means faster responses to your users. If the hit rate is falling, you need quick visibility into why. This alert is triggered when low cache hit ratio is less than 50% | Count `<=` 50 | Count `>` 50 | +| `Memcached - Commands Error` | This alert is triggered when Memcached has error commands. | Count `>` 0 | Count `<=` 0 | +| `Memcached - Current Connections` | This alert is triggered when current connections to Memcached are zero. | Count `<=` 0 | Count `>` 0 | +| `Memcached - High Memory Usage` | This alert is triggered when the memcached exceed given threshold memory usage (in GB) | Count `>` 5 | Count `<=` 5 | +| `Memcached - High Number of Connections` | This alert is triggered when the number of current connection for memcached exceed given threshold. | Count `>=` 1000 | Count `<` 1000 | diff --git a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md index 37412f565a..b98149a2b9 100644 --- a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md +++ b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md @@ -24,7 +24,7 @@ This app supports logs and metrics for MongoDB instance. The MongoDB logs are ge The app supports metrics generated by the [MongoDB Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/mongodbreceiver/documentation.md). :::info -This app includes [built-in monitors](#mongodb-alerts). For details on creating custom monitors, refer to [Create monitors for MongoDB app](#create-monitors-for-mongodb-app). +This app includes [built-in monitors](#mongodb-alerts). For details on creating custom monitors, refer to the [Create monitors for MongoDB app](#create-monitors-for-mongodb-app). ::: ## Fields creation in Sumo Logic for MongoDB @@ -353,17 +353,17 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md'; ### MongoDB alerts -| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition | +| Name | Description | Alert Condition | Recover Condition | |:--|:--|:--|:--| -| `MongoDB - Instance Down` | This alert gets triggered when we detect that the MongoDB instance is down. | Missing Data | Data Found | -| `MongoDB - Replication Error` | This alert gets triggered when we detect errors in MongoDB replication operations. | `>` 0 | `<=` 0 | -| `MongoDB - Replication Heartbeat Error` | This alert gets triggered when we detect that the MongoDB Replication Heartbeat request has errors, which indicates replication is not working as expected. | `>` 0 | `<=` 0 | -| `MongoDB - Secondary Node Replication Failure` | This alert gets triggered when we detect that a MongoDB secondary node is out of sync for replication. | `>` 0 | `<=` 0 | -| `MongoDB - Sharding Balancer Failure` | This alert gets triggered when we detect that data balancing failed on a MongoDB Cluster with 1 mongos instance and 3 mongod instances. 
| `>` 0 | `<=` 0 | -| `MongoDB - Sharding Chunk Split Failure` | This alert gets triggered when we detect that a MongoDB chunk not been split during sharding. | `>` 0 | `<=` 0 | -| `MongoDB - Sharding Error` | This alert gets triggered when we detect errors in MongoDB sharding operations. | `>` 0 | `<=` 0 | -| `MongoDB - Sharding Warning` | This alert gets triggered when we detect warnings in MongoDB sharding operations. | `>` 0 | `<=` 0 | -| `MongoDB - Slow Queries` | This alert gets triggered when we detect that a MongoDB cluster is executing slow queries. | `>` 0 | `<=` 0 | -| `MongoDB - Too Many Connections` | This alert gets triggered when we detect a given MongoDB server has too many connections (over 80% of capacity). | `>=` 80 | `<` 80 | -| `MongoDB - Too Many Cursors Open` | This alert gets triggered when we detect that there are too many cursors (>10K) opened by MongoDB. | `>=` 10000 | `<` 10000 | -| `MongoDB - Too Many Cursors Timeouts` | This alert gets triggered when we detect that there are too many cursors (100) timing out on a MongoDB server within a 5 minute time interval. | `>=` 100 | `<` 100 | +| `MongoDB - Instance Down` | This alert is triggered when we detect that the MongoDB instance is down. | Missing Data | Data Found | +| `MongoDB - Replication Error` | This alert is triggered when we detect errors in MongoDB replication operations. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Replication Heartbeat Error` | This alert is triggered when we detect that the MongoDB Replication Heartbeat request has errors, which indicates replication is not working as expected. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Secondary Node Replication Failure` | This alert is triggered when we detect that a MongoDB secondary node is out of sync for replication. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Sharding Balancer Failure` | This alert is triggered when we detect that data balancing failed on a MongoDB Cluster with 1 mongos instance and 3 mongod instances. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Sharding Chunk Split Failure` | This alert is triggered when we detect that a MongoDB chunk not been split during sharding. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Sharding Error` | This alert is triggered when we detect errors in MongoDB sharding operations. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Sharding Warning` | This alert is triggered when we detect warnings in MongoDB sharding operations. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Slow Queries` | This alert is triggered when we detect that a MongoDB cluster is executing slow queries. | Count `>` 0 | Count `<=` 0 | +| `MongoDB - Too Many Connections` | This alert is triggered when we detect a given MongoDB server has too many connections (over 80% of capacity). | Count `>=` 80 | Count `<` 80 | +| `MongoDB - Too Many Cursors Open` | This alert is triggered when we detect that there are too many cursors (>10K) opened by MongoDB. | Count `>=` 10000 | Count `<` 10000 | +| `MongoDB - Too Many Cursors Timeouts` | This alert is triggered when we detect that there are too many cursors (100) timing out on a MongoDB server within a 5 minute time interval. 
| Count `>=` 100 | Count `<` 100 | diff --git a/docs/integrations/databases/opentelemetry/redis-opentelemetry.md b/docs/integrations/databases/opentelemetry/redis-opentelemetry.md index acf9a664ef..3ea4e3b9db 100644 --- a/docs/integrations/databases/opentelemetry/redis-opentelemetry.md +++ b/docs/integrations/databases/opentelemetry/redis-opentelemetry.md @@ -26,7 +26,7 @@ This app supports metrics and logs for Redis in Cluster mode or Standalone mode. The app supports metrics generated by the [Redis Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/redisreceiver/documentation.md). :::info -This app includes [built-in monitors](#redis-alerts). For details on creating custom monitors, refer to [Create monitors for Redis app](#create-monitors-for-redis-app). +This app includes [built-in monitors](#redis-alerts). For details on creating custom monitors, refer to the [Create monitors for Redis app](#create-monitors-for-redis-app). ::: ## Creating fields in Sumo Logic for Redis @@ -255,11 +255,11 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md'; ### Redis alerts -| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition | -|:---------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|:-------------------| -| `Redis - High Memory Fragmentation Ratio` | This alert is triggered when the ration of Redis memory usage to Linux virtual memory pages (mapped to physical memory chunks) is higher than 1.5. A high ratio will lead to swapping and can adversely affect performance. | `>` 1.5 | `<=` 1.5 | -| `Redis - Instance Down` | This alert is triggered when we detect that the Redis instance is down for 5 minutes. | Missing Data | Missing Data | -| `Redis - Rejected Connections` | This alert is triggered when we detect that some connections to a Redis cluster have been rejected. | `>` 0 | `<=` 0 | -| `Redis - Replication Broken` | This alert is triggered when we detect that a Redis instance has lost all slaves. This will affect the redundancy of data stored in Redis. Please review how replication has been configured. | `<` 1 | `>=` 1 | -| `Redis - Replication Offset` | This alert is triggered when the replication offset in a given Redis cluster is greater than 1 MB for last five minutes. Please review how replication has been configured. | `>=` 1 | `<` 1 | -| `Redis - Too Many Connections` | This alert is triggered when we detect a given Redis server has too many connections (default is 100). | `>` 100 | `<=` 100 | +| Name | Description | Alert Condition | Recover Condition | +|:--|:--|:--|:--| +| `Redis - High Memory Fragmentation Ratio` | This alert is triggered when the ration of Redis memory usage to Linux virtual memory pages (mapped to physical memory chunks) is higher than 1.5. A high ratio will lead to swapping and can adversely affect performance. | Count `>` 1.5 | Count `<=` 1.5 | +| `Redis - Instance Down` | This alert is triggered when we detect that the Redis instance is down for 5 minutes. | Missing Data | Data Found | +| `Redis - Rejected Connections` | This alert is triggered when we detect that some connections to a Redis cluster have been rejected. | Count `>` 0 | Count `<=` 0 | +| `Redis - Replication Broken` | This alert is triggered when we detect that a Redis instance has lost all slaves. 
This will affect the redundancy of data stored in Redis. Please review how replication has been configured. | Count `<` 1 | Count `>=` 1 | +| `Redis - Replication Offset` | This alert is triggered when the replication offset in a given Redis cluster is greater than 1 MB for last five minutes. Please review how replication has been configured. | Count `>=` 1 | Count `<` 1 | +| `Redis - Too Many Connections` | This alert is triggered when we detect a given Redis server has too many connections (default is 100). | Count `>` 100 | Count `<=` 100 | diff --git a/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md b/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md index 6848b74074..3d87118445 100644 --- a/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md +++ b/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md @@ -24,7 +24,7 @@ The app supports Logs from the open source version of HAProxy. The App is tested The HAProxy logs are generated in files as configured in the configuration file `/etc/haproxy/haproxy.cfg` ([learn more](https://www.haproxy.com/blog/introduction-to-haproxy-logging/)). :::info -This app includes [built-in monitors](#haproxy-alerts). For details on creating custom monitors, refer to [Create monitors for HAProxy app](#create-monitors-for-haproxy-app). +This app includes [built-in monitors](#haproxy-alerts). For details on creating custom monitors, refer to the [Create monitors for HAProxy app](#create-monitors-for-haproxy-app). ::: ## Fields Create in Sumo Logic for HAProxy @@ -363,10 +363,10 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md'; ### HAProxy alerts -| Alert Name | Alert Description and conditions | Alert Condition | Recover Condition | -|:---------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|:-------------------| -| `HAProxy - Access from Highly Malicious Sources` | This alert is triggered when an HAProxy is accessed from highly malicious IP addresses. | `>` 0 | `<=` 0 | -| `HAProxy - Backend Error` | This alert is triggered when we detect backend server errors. | `>` 0 | `<=` 0 | -| `HAProxy - Backend Server Down` | This alert is triggered when we detect a backend server for a given HAProxy server is down. | `>` 0 | `<=` 0 | -| `HAProxy - High Client (HTTP 4xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 4xx. | `>` 0 | `<=` 0 | -| `HAProxy - High Server (HTTP 5xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 5xx. | `>` 0 | `<=` 0 | +| Name | Description | Alert Condition | Recover Condition | +|:--|:--|:--|:--| +| `HAProxy - Access from Highly Malicious Sources` | This alert is triggered when an HAProxy is accessed from highly malicious IP addresses. | Count `>` 0 | Count `<=` 0 | +| `HAProxy - Backend Error` | This alert is triggered when we detect backend server errors. | Count `>` 0 | Count `<=` 0 | +| `HAProxy - Backend Server Down` | This alert is triggered when we detect a backend server for a given HAProxy server is down. | Count `>` 0 | Count `<=` 0 | +| `HAProxy - High Client (HTTP 4xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 4xx. 
| Count `>` 0 | Count `<=` 0 | +| `HAProxy - High Server (HTTP 5xx) Error Rate` | This alert fires when there are too many HTTP requests (>5%) with a response status of 5xx. | Count `>` 0 | Count `<=` 0 | From ba035b09c5fe24b149f9e1279f7c1a039941c091 Mon Sep 17 00:00:00 2001 From: Jagadisha V <129049263+JV0812@users.noreply.github.com> Date: Mon, 9 Dec 2024 14:28:42 +0530 Subject: [PATCH 5/9] Update cassandra-opentelemetry.md --- .../opentelemetry/cassandra-opentelemetry.md | 63 ++++++++----------- 1 file changed, 26 insertions(+), 37 deletions(-) diff --git a/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md b/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md index 1076335c46..935ecbd51c 100644 --- a/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md +++ b/docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md @@ -2,7 +2,7 @@ id: cassandra-opentelemetry title: Cassandra - OpenTelemetry Collector sidebar_label: Cassandra - OTel Collector -description: Learn about the Sumo Logic OpenTelemetry App for Cassandra. +description: Learn about the Sumo Logic OpenTelemetry app for Cassandra. --- import useBaseUrl from '@docusaurus/useBaseUrl'; @@ -15,7 +15,7 @@ The [Cassandra](https://cassandra.apache.org/_/cassandra-basics.html) app is a l Cassandra logs are sent to Sumo Logic through OpenTelemetry [filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver) and cassandra metrics are sent to Sumo Logic using [JMX opentelemetry receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/jmxreceiver) with the `target_system` set as [`cassandra`](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/jmx-metrics/docs/target-systems/cassandra.md). -The app supports Logs from the open-source version of Cassandra. The App is tested on the 4.0.0 version of Cassandra. +The app supports logs from the open-source version of Cassandra. The app is tested on the 4.0.0 version of Cassandra. Schematics @@ -40,7 +40,6 @@ Following are the [Fields](/docs/manage/fields/) which will be created as part o JMX receiver collects Cassandra metrics from Cassandra server as part of the OpenTelemetry Collector (OTC). 1. Follow the instructions in [JMX - OpenTelemetry's prerequisites section](/docs/integrations/app-development/opentelemetry/jmx-opentelemetry/#prerequisites) to download the [JMX Metric Gatherer](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/jmx-metrics/README.md). This gatherer is used by the [JMX Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/jmxreceiver#details). - 2. Set the JMX port as part of `JAVA_OPTS` for Tomcat startup. Usually, it is set in the `/etc/systemd/system/cassandra.service` or `C:\Program Files\apache-tomcat\bin\tomcat.bat` file. ```json @@ -48,9 +47,10 @@ JMX receiver collects Cassandra metrics from Cassandra server as part of the Ope ``` #### For log collection -Cassandra has three main logs: system.log, debug.log, and gc.log which hold general logging messages, debugging logging messages, and java garbage collection logs respectively. -These logs by default live in `${CASSANDRA_HOME}/logs`, but most Linux distributions relocate logs to `/var/log/cassandra`. Operators can tune this location as well as what levels are logged using the provided logback.xml file. 
 #### For log collection

-Cassandra has three main logs: system.log, debug.log, and gc.log which hold general logging messages, debugging logging messages, and java garbage collection logs respectively.
-These logs by default live in `${CASSANDRA_HOME}/logs`, but most Linux distributions relocate logs to `/var/log/cassandra`. Operators can tune this location as well as what levels are logged using the provided logback.xml file. For more details on Cassandra logs, see[ this](https://cassandra.apache.org/doc/latest/troubleshooting/reading_logs.html) link.
+
+Cassandra has three main logs: `system.log`, `debug.log`, and `gc.log`, which hold general logging messages, debugging logging messages, and java garbage collection logs respectively.
+
+These logs by default live in `${CASSANDRA_HOME}/logs`, but most Linux distributions relocate logs to `/var/log/cassandra`. Operators can tune this location as well as what levels are logged using the provided logback.xml file. For more details on Cassandra logs, see [this](https://cassandra.apache.org/doc/latest/troubleshooting/reading_logs.html).
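 As a sketch only (the paths below assume the default `/var/log/cassandra` location mentioned above and are not part of this patch), a filelog receiver entry for these files could look like:

 ```yaml
 receivers:
   filelog/cassandra:
     include:
       # Adjust these paths if logback.xml relocates the logs
       - /var/log/cassandra/system.log
       - /var/log/cassandra/debug.log
       - /var/log/cassandra/gc.log
     # Assumption: read existing log content on first run (the default is `end`)
     start_at: beginning
 ```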
This can help you to determine whether resources are over-allocated or under-allocated.
 
 Resource Usage
 
 ### Compaction
 
 The **Cassandra - Compactions** dashboard provides insight into the completed and pending compaction tasks.
+
 Compaction
 
 ### Requests
 
 The **Cassandra - Requests** dashboard provides insight into the number of requests served, the number of error requests, and their distribution by status and operation. You can also monitor the read and write latency of the cluster instance using this dashboard.
+
 Requests
 
 ### Storage
@@ -284,7 +274,6 @@ The **Cassandra - Storage** dashboard provides insight into the current value of
 
 Storage
 
-
 ## Create monitors for Cassandra app
 
 import CreateMonitors from '../../../reuse/apps/create-monitors.md';
 
@@ -295,12 +284,12 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md';
 
 | Name | Description | Alert Condition | Recover Condition |
 |:--|:--|:--|:--|
-| `Cassandra - Compaction Task Pending` | This alert is triggered when there are more than 15 Compaction tasks which are pending. | Count `>=` 15 | Count `<` 15 |
-| `Cassandra - High Hints Backlog` | This alert is triggered when the number of in-progress hints exceeds given value (Default 5000) for 5 minutes. | Count `>=` 5000 | Count `<` 5000 |
-| `Cassandra - High Memory Usage` | This alert is triggered when memory used exceeds 85% of committed memory for more than 10 minutes. | Count `>=` 1 | Count `<` 1 |
-| `Cassandra - Node Down Alert` | This alert is triggered when a Cassandra node status changes to DOWN for more than 5 minutes. | Count `>=` 1 | Count `<` 1 |
-| `Cassandra - Operation Error Rate High` | This alert is triggered when the error rate of operations exceeds given value (Default 5%) for 5 minutes. | Count `>` 5 | Count `<=` 5 |
-| `Cassandra - Range Query Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of range query latency exceeds the given value (Default 2 seconds) for 5 minutes. | Count `>=` 2000000 | Count `<` 2000000 |
-| `Cassandra - Read Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of read latency exceeds given value (Default 500ms) for 5 minutes. | Count `>=` 500000 | Count `<` 500000 |
-| `Cassandra - Storage Growth Rate Abnormal` | This alert is triggered when the storage growth rate exceeds given value (Default 25MB/minute) for 5 minutes. | Count `>=` 26214400 | Count `<` 26214400 |
-| `Cassandra - Write Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of write latency exceeds given value (Default 200ms) for 5 minutes. | Count `>=` 200000 | Count `<` 200000 |
+| `Cassandra - Compaction Task Pending` | This alert is triggered when there are more than 15 pending compaction tasks. | Count >= 15 | Count < 15 |
+| `Cassandra - High Hints Backlog` | This alert is triggered when the number of in-progress hints exceeds the given value (Default 5000) for 5 minutes. | Count >= 5000 | Count < 5000 |
+| `Cassandra - High Memory Usage` | This alert is triggered when memory used exceeds 85% of committed memory for more than 10 minutes. | Count >= 1 | Count < 1 |
+| `Cassandra - Node Down Alert` | This alert is triggered when a Cassandra node status changes to DOWN for more than 5 minutes. | Count >= 1 | Count < 1 |
+| `Cassandra - Operation Error Rate High` | This alert is triggered when the error rate of operations exceeds the given value (Default 5%) for 5 minutes. 
| Count > 5 | Count <= 5 |
+| `Cassandra - Range Query Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of range query latency exceeds the given value (Default 2 seconds) for 5 minutes. | Count >= 2000000 | Count < 2000000 |
+| `Cassandra - Read Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of read latency exceeds the given value (Default 500ms) for 5 minutes. | Count >= 500000 | Count < 500000 |
+| `Cassandra - Storage Growth Rate Abnormal` | This alert is triggered when the storage growth rate exceeds the given value (Default 25MB/minute) for 5 minutes. | Count >= 26214400 | Count < 26214400 |
+| `Cassandra - Write Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of write latency exceeds the given value (Default 200ms) for 5 minutes. | Count >= 200000 | Count < 200000 |

From 25cb8107b9db19bd4bd50c8aac61676306d9938b Mon Sep 17 00:00:00 2001
From: Jagadisha V <129049263+JV0812@users.noreply.github.com>
Date: Mon, 9 Dec 2024 14:32:31 +0530
Subject: [PATCH 6/9] Update memcached-opentelemetry.md

---
 .../opentelemetry/memcached-opentelemetry.md | 24 ++++++++-----------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md b/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md
index 111ee9d63b..47944ec3b4 100644
--- a/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/memcached-opentelemetry.md
@@ -19,7 +19,6 @@ Memcached logs are sent to Sumo Logic through the OpenTelemetry [filelog receive
 
 Schematics
 
-
 :::info
 This app includes [built-in monitors](#memcached-alerts). For details on creating custom monitors, refer to the [Create monitors for Memcached app](#create-monitors-for-memcached-app).
 :::
 
@@ -29,12 +28,11 @@ This app includes [built-in monitors](#memcached-alerts). For details on creatin
 Following are the [Fields](/docs/manage/fields/) which will be created as part of Memcached App install if not already present.
 
 - **`sumo.datasource`**. Has a fixed value of **memcached**.
-- **`db.system`**. Has a fixed value of **memcached**
-- **`deployment.environment`**. User configured. This is the deployment environment where the Memcache cluster resides. For example: dev, prod or qa.
+- **`db.system`**. Has a fixed value of **memcached**.
+- **`deployment.environment`**. User configured. This is the deployment environment where the Memcached cluster resides. For example: dev, prod, or qa.
 - **`db.cluster.name`**. User configured. Enter a name to identify this Memcached cluster. This cluster name will be shown in the Sumo Logic dashboards.
 - **`db.node.name`**. This has the value of the FQDN of the machine where the OpenTelemetry collector is collecting logs and metrics from.
-
 ## Prerequisites
 
 1. Configure logging in Memcached: By default, the installation of Memcached will not write any request logs to disk. To add a log file for Memcached, you can use the following syntax:
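
The syntax block itself falls outside this hunk. As a rough sketch (the memory size, port, user, and log path here are assumptions, not values from the Memcached app docs), verbose request logging can be redirected to a file like this:

```bash
# Sketch: run memcached with verbose request logging (-vv) and append its
# output to a log file that the filelog receiver can tail.
# -m = memory limit in MB, -p = TCP port, -u = run-as user; adjust to taste.
memcached -m 64 -p 11211 -u memcache -vv >> /var/log/memcached/memcached.log 2>&1
```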

@@ -226,13 +224,12 @@ Following is the query from Errors panel of Memcached app's overview Dashboard:
 | sum(ERROR) as ERROR by _timeslice
 ```
 ## Sample metrics queries
-**Total Get**
-```
+```sql title="Total Get"
 sumo.datasource=memcached deployment.environment=* db.cluster.name=* db.node.name=* metric=memcached.commands command=get | sum
 ```
 
-## Viewing Memcached Dashboards
+## Viewing the Memcached dashboards
 
 ### Overview
 
@@ -242,7 +239,7 @@ The **Memcached - Overview** dashboard provides an at-a-glance view of the Memca
 
 ### Operations
 
-The **Memcached - Operations** Dashboard provides detailed analysis on connections, thread requested, network bytes, table size.
+The **Memcached - Operations** dashboard provides detailed analysis of connections, threads requested, network bytes, and table size.
 
 Memcached dashboards
 
@@ -252,7 +249,6 @@ The **Memcached - Command Stats** dashboard provides detailed insights into the
 
 Memcached dashboards
 
-
 ### Cache Information
 
 The **Memcached - Cache Information** dashboard provides insight into cache states, cache hit, and miss rate over time.
@@ -276,8 +272,8 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md';
 
 | Name | Description | Alert Condition | Recover Condition |
 |:--|:--|:--|:--|
-| `Memcached - Cache Hit Ratio` | The hit rate is one of the most important indicators of Memcached performance. A high hit rate means faster responses to your users. If the hit rate is falling, you need quick visibility into why. This alert is triggered when low cache hit ratio is less than 50% | Count `<=` 50 | Count `>` 50 |
-| `Memcached - Commands Error` | This alert is triggered when Memcached has error commands. | Count `>` 0 | Count `<=` 0 |
-| `Memcached - Current Connections` | This alert is triggered when current connections to Memcached are zero. | Count `<=` 0 | Count `>` 0 |
-| `Memcached - High Memory Usage` | This alert is triggered when the memcached exceed given threshold memory usage (in GB) | Count `>` 5 | Count `<=` 5 |
-| `Memcached - High Number of Connections` | This alert is triggered when the number of current connection for memcached exceed given threshold. | Count `>=` 1000 | Count `<` 1000 |
+| `Memcached - Cache Hit Ratio` | This alert is triggered when the cache hit ratio is less than 50%. The hit rate is one of the most important indicators of Memcached performance. A high hit rate means faster responses to your users. If the hit rate is falling, you need quick visibility into why. | Count <= 50 | Count > 50 |
+| `Memcached - Commands Error` | This alert is triggered when Memcached reports command errors. | Count > 0 | Count <= 0 |
+| `Memcached - Current Connections` | This alert is triggered when current connections to Memcached are zero. | Count <= 0 | Count > 0 |
+| `Memcached - High Memory Usage` | This alert is triggered when Memcached exceeds the given memory usage threshold (in GB). | Count > 5 | Count <= 5 |
+| `Memcached - High Number of Connections` | This alert is triggered when the number of current connections for Memcached exceeds the given threshold. 
| Count >= 1000 | Count < 1000 |

From 0c6ed42395f06491e90a5d427b6edd55a514353a Mon Sep 17 00:00:00 2001
From: Jagadisha V <129049263+JV0812@users.noreply.github.com>
Date: Mon, 9 Dec 2024 14:36:12 +0530
Subject: [PATCH 7/9] Update mongodb-opentelemetry.md

---
 .../opentelemetry/mongodb-opentelemetry.md | 32 ++++++++-----------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
index b98149a2b9..c1f85d3c94 100644
--- a/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/mongodb-opentelemetry.md
@@ -33,7 +33,7 @@ Following are the [Fields](/docs/manage/fields/) which will be created as part o
 
 - **`db.cluster.name`**. User configured. Enter a name to identify this MongoDB cluster. This cluster name will be shown in the Sumo Logic dashboards.
 - **`db.system`**. Has a fixed value of **mongodb**.
-- **`deployment.environment`**. User configured. This is the deployment environment where the Mongodb cluster resides. For example: dev, prod or qa.
+- **`deployment.environment`**. User configured. This is the deployment environment where the MongoDB cluster resides. For example: dev, prod, or qa.
 - **`sumo.datasource`**. Has a fixed value of **mongodb**.
 - **`db.node.name`**. Has the value of the host name of the machine which is being monitored.
 
@@ -242,7 +242,7 @@ deployment.environment=* db.cluster.name=* sumo.datasource=mongodb  | json "log
 | count by component
 ```
 
-## Viewing MongoDB dashboards
+## Viewing the MongoDB dashboards
 
 import ViewDashboards from '../../../reuse/apps/view-dashboards.md';
 
@@ -262,11 +262,7 @@ Use this dashboard to:
 
 ### Errors and Warnings
 
-The **MongoDB - Errors and Warnings** dashboard shows errors and warnings by the MongoDB component.
-
-Use this dashboard to:
-
-- Determine components producing multiple errors or warnings.
+The **MongoDB - Errors and Warnings** dashboard shows errors and warnings by MongoDB component. Use this dashboard to determine the components producing multiple errors or warnings.
 
 Access
 
@@ -356,14 +352,14 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md';
 
 | Name | Description | Alert Condition | Recover Condition |
 |:--|:--|:--|:--|
 | `MongoDB - Instance Down` | This alert is triggered when we detect that the MongoDB instance is down. | Missing Data | Data Found |
-| `MongoDB - Replication Error` | This alert is triggered when we detect errors in MongoDB replication operations. | Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Replication Heartbeat Error` | This alert is triggered when we detect that the MongoDB Replication Heartbeat request has errors, which indicates replication is not working as expected. | Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Secondary Node Replication Failure` | This alert is triggered when we detect that a MongoDB secondary node is out of sync for replication. | Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Sharding Balancer Failure` | This alert is triggered when we detect that data balancing failed on a MongoDB Cluster with 1 mongos instance and 3 mongod instances. | Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Sharding Chunk Split Failure` | This alert is triggered when we detect that a MongoDB chunk not been split during sharding. | Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Sharding Error` | This alert is triggered when we detect errors in MongoDB sharding operations. 
| Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Sharding Warning` | This alert is triggered when we detect warnings in MongoDB sharding operations. | Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Slow Queries` | This alert is triggered when we detect that a MongoDB cluster is executing slow queries. | Count `>` 0 | Count `<=` 0 |
-| `MongoDB - Too Many Connections` | This alert is triggered when we detect a given MongoDB server has too many connections (over 80% of capacity). | Count `>=` 80 | Count `<` 80 |
-| `MongoDB - Too Many Cursors Open` | This alert is triggered when we detect that there are too many cursors (>10K) opened by MongoDB. | Count `>=` 10000 | Count `<` 10000 |
-| `MongoDB - Too Many Cursors Timeouts` | This alert is triggered when we detect that there are too many cursors (100) timing out on a MongoDB server within a 5 minute time interval. | Count `>=` 100 | Count `<` 100 |
+| `MongoDB - Replication Error` | This alert is triggered when we detect errors in MongoDB replication operations. | Count > 0 | Count <= 0 |
+| `MongoDB - Replication Heartbeat Error` | This alert is triggered when we detect that the MongoDB Replication Heartbeat request has errors, which indicates replication is not working as expected. | Count > 0 | Count <= 0 |
+| `MongoDB - Secondary Node Replication Failure` | This alert is triggered when we detect that a MongoDB secondary node is out of sync for replication. | Count > 0 | Count <= 0 |
+| `MongoDB - Sharding Balancer Failure` | This alert is triggered when we detect that data balancing failed on a MongoDB cluster with 1 mongos instance and 3 mongod instances. | Count > 0 | Count <= 0 |
+| `MongoDB - Sharding Chunk Split Failure` | This alert is triggered when we detect that a MongoDB chunk has not been split during sharding. | Count > 0 | Count <= 0 |
+| `MongoDB - Sharding Error` | This alert is triggered when we detect errors in MongoDB sharding operations. | Count > 0 | Count <= 0 |
+| `MongoDB - Sharding Warning` | This alert is triggered when we detect warnings in MongoDB sharding operations. | Count > 0 | Count <= 0 |
+| `MongoDB - Slow Queries` | This alert is triggered when we detect that a MongoDB cluster is executing slow queries. | Count > 0 | Count <= 0 |
+| `MongoDB - Too Many Connections` | This alert is triggered when we detect that a given MongoDB server has too many connections (over 80% of capacity). | Count >= 80 | Count < 80 |
+| `MongoDB - Too Many Cursors Open` | This alert is triggered when we detect that there are too many cursors (>10K) opened by MongoDB. | Count >= 10000 | Count < 10000 |
+| `MongoDB - Too Many Cursors Timeouts` | This alert is triggered when we detect that there are too many cursors (100) timing out on a MongoDB server within a 5 minute time interval. 
| Count >= 100 | Count < 100 |

From 86c6b0a0dbf1c283c856d04455e63c37171a3180 Mon Sep 17 00:00:00 2001
From: Jagadisha V <129049263+JV0812@users.noreply.github.com>
Date: Mon, 9 Dec 2024 14:40:09 +0530
Subject: [PATCH 8/9] Update redis-opentelemetry.md

---
 .../databases/opentelemetry/redis-opentelemetry.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/integrations/databases/opentelemetry/redis-opentelemetry.md b/docs/integrations/databases/opentelemetry/redis-opentelemetry.md
index 3ea4e3b9db..64eac0af57 100644
--- a/docs/integrations/databases/opentelemetry/redis-opentelemetry.md
+++ b/docs/integrations/databases/opentelemetry/redis-opentelemetry.md
@@ -35,7 +35,7 @@ The following are [fields](/docs/manage/fields/) that will be created as part of
 
 * **`db.cluster.name`**. User configured. Enter a name to identify this Redis cluster. This cluster name will be shown in the Sumo Logic dashboards.
 * **`db.system`**. Has a fixed value of redis.
-* **`deployment.environment`**. User configured. This is the deployment environment where the Redis cluster resides. For example: dev, prod or qa.
+* **`deployment.environment`**. User configured. This is the deployment environment where the Redis cluster resides. For example: dev, prod, or qa.
 * **`db.node.name`**. Has the value of the host name of the machine which is being monitored.
 * **`sumo.datasource`**. Has a fixed value of redis.
 
@@ -257,9 +257,9 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md';
 
 | Name | Description | Alert Condition | Recover Condition |
 |:--|:--|:--|:--|
-| `Redis - High Memory Fragmentation Ratio` | This alert is triggered when the ration of Redis memory usage to Linux virtual memory pages (mapped to physical memory chunks) is higher than 1.5. A high ratio will lead to swapping and can adversely affect performance. | Count `>` 1.5 | Count `<=` 1.5 |
-| `Redis - Instance Down` | This alert is triggered when we detect that the Redis instance is down for 5 minutes. | Missing Data | Data Found |
-| `Redis - Rejected Connections` | This alert is triggered when we detect that some connections to a Redis cluster have been rejected. | Count `>` 0 | Count `<=` 0 |
-| `Redis - Replication Broken` | This alert is triggered when we detect that a Redis instance has lost all slaves. This will affect the redundancy of data stored in Redis. Please review how replication has been configured. | Count `<` 1 | Count `>=` 1 |
-| `Redis - Replication Offset` | This alert is triggered when the replication offset in a given Redis cluster is greater than 1 MB for last five minutes. Please review how replication has been configured. | Count `>=` 1 | Count `<` 1 |
-| `Redis - Too Many Connections` | This alert is triggered when we detect a given Redis server has too many connections (default is 100). | Count `>` 100 | Count `<=` 100 |
+| `Redis - High Memory Fragmentation Ratio` | This alert is triggered when the ratio of Redis memory usage to Linux virtual memory pages (mapped to physical memory chunks) is higher than 1.5. A high ratio will lead to swapping and can adversely affect performance. | Count > 1.5 | Count <= 1.5 |
+| `Redis - Instance Down` | This alert is triggered if the Redis instance is down for 5 minutes. | Missing Data | Data Found |
+| `Redis - Rejected Connections` | This alert is triggered when some connections to a Redis cluster have been rejected. | Count > 0 | Count <= 0 |
+| `Redis - Replication Broken` | This alert is triggered when a Redis instance has lost all slaves. 
This will affect the redundancy of data stored in Redis. We suggest you review how replication has been configured. | Count < 1 | Count >= 1 |
+| `Redis - Replication Offset` | This alert is triggered when the replication offset in a given Redis cluster is greater than 1 MB for the last 5 minutes. We suggest you review how replication has been configured. | Count >= 1 | Count < 1 |
+| `Redis - Too Many Connections` | This alert is triggered when a given Redis server has too many connections (default is 100). | Count > 100 | Count <= 100 |

From ef63d56058706d1d237ab82d7bb4d85b164465ce Mon Sep 17 00:00:00 2001
From: Jagadisha V <129049263+JV0812@users.noreply.github.com>
Date: Mon, 9 Dec 2024 14:42:52 +0530
Subject: [PATCH 9/9] Update haproxy-opentelemetry.md

---
 .../opentelemetry/haproxy-opentelemetry.md    | 50 +++++++++----------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md b/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md
index 3d87118445..cba874ef1c 100644
--- a/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md
+++ b/docs/integrations/web-servers/opentelemetry/haproxy-opentelemetry.md
@@ -19,7 +19,7 @@ The OpenTelemetry collector runs on the same host as HAProxy, where it uses the
 
 ## HAProxy log types
 
-The app supports Logs from the open source version of HAProxy. The App is tested on the 2.3.9 version of HAProxy.
+The app supports logs from the open-source version of HAProxy. This app is tested on the `2.3.9` version of HAProxy.
 
 The HAProxy logs are generated in files as configured in the configuration file `/etc/haproxy/haproxy.cfg` ([learn more](https://www.haproxy.com/blog/introduction-to-haproxy-logging/)). A minimal logging configuration sketch appears after the alerts table at the end of this section.
 
@@ -189,6 +189,24 @@ May 13 08:24:43 localhost haproxy[21813]: 27.2.81.92:64274 [13/May/2021:08:24:43.921] web-edupia.vn-4
 ```
 
+## Sample metrics
+
+```json
+{
+  "Query": "A",
+  "metric": "avg",
+  "haproxy.proxy_name": "stats",
+  "webengine.cluster.name": "haproxy_otel_cluster",
+  "webengine.node.name": "node1",
+  "min": 3385124.8,
+  "max": 3553632,
+  "latest": 3553632,
+  "avg": 3469494.86851211,
+  "sum": 1002684017.0,
+  "count": 289
+}
+```
+
 ## Sample queries
 
 ### Logs
@@ -224,25 +242,7 @@ sumo.datasource=haproxy metric=haproxy.requests.total status_code=* haproxy.serv
 
 | avg by webengine.cluster.name,webengine.node.name,haproxy.proxy_name,code
 ```
 
-## Sample metrics
-
-```json
-{
-  "Query": "A",
-  "metric": "avg",
-  "haproxy.proxy_name": "stats",
-  "webengine.cluster.name": "haproxy_otel_cluster",
-  "webengine.node.name": "node1",
-  "min": 3385124.8,
-  "max": 3553632,
-  "latest": 3553632,
-  "avg": 3469494.86851211,
-  "sum": 1002684017.0,
-  "count": 289,
-}
-```
-
-## Viewing HAProxy dashboards
+## Viewing the HAProxy dashboards
 
 ### Overview
 
@@ -365,8 +365,8 @@ import CreateMonitors from '../../../reuse/apps/create-monitors.md';
 
 | Name | Description | Alert Condition | Recover Condition |
 |:--|:--|:--|:--|
-| `HAProxy - Access from Highly Malicious Sources` | This alert is triggered when an HAProxy is accessed from highly malicious IP addresses. | Count `>` 0 | Count `<=` 0 |
-| `HAProxy - Backend Error` | This alert is triggered when we detect backend server errors. | Count `>` 0 | Count `<=` 0 |
-| `HAProxy - Backend Server Down` | This alert is triggered when we detect a backend server for a given HAProxy server is down. 
| Count `>` 0 | Count `<=` 0 |
-| `HAProxy - High Client (HTTP 4xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 4xx. | Count `>` 0 | Count `<=` 0 |
-| `HAProxy - High Server (HTTP 5xx) Error Rate` | This alert fires when there are too many HTTP requests (>5%) with a response status of 5xx. | Count `>` 0 | Count `<=` 0 |
+| `HAProxy - Access from Highly Malicious Sources` | This alert is triggered when an HAProxy server is accessed from highly malicious IP addresses. | Count > 0 | Count <= 0 |
+| `HAProxy - Backend Error` | This alert is triggered when a backend server error is detected. | Count > 0 | Count <= 0 |
+| `HAProxy - Backend Server Down` | This alert is triggered when a backend server for a given HAProxy server is down. | Count > 0 | Count <= 0 |
+| `HAProxy - High Client (HTTP 4xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 4xx. | Count > 0 | Count <= 0 |
+| `HAProxy - High Server (HTTP 5xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 5xx. | Count > 0 | Count <= 0 |
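
Picking up the note under **HAProxy log types** above: a minimal `/etc/haproxy/haproxy.cfg` logging setup might look like the following sketch. It is not taken from the app docs; it assumes a stock rsyslog setup in which `local0` messages land in a file such as `/var/log/haproxy.log`, which the filelog receiver then tails.

```
# Sketch: send HAProxy logs to the local syslog socket on facility local0.
global
    log /dev/log local0 info

defaults
    log     global
    mode    http
    option  httplog   # richer per-request log lines for HTTP traffic
```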