Add Hazelcast integration
ofek committed Apr 7, 2020
1 parent c76862b commit c04ceab
Showing 25 changed files with 2,019 additions and 1 deletion.
4 changes: 3 additions & 1 deletion .azure-pipelines/templates/test-all-checks.yml
@@ -1,6 +1,5 @@
parameters:
  pip_cache_config: null

jobs:
- template: ./test-all.yml
  parameters:
@@ -156,6 +155,9 @@ jobs:
    - checkName: harbor
      displayName: Harbor
      os: linux
    - checkName: hazelcast
      displayName: Hazelcast
      os: linux
    - checkName: hdfs_datanode
      displayName: HDFS Datanode
      os: linux
2 changes: 2 additions & 0 deletions hazelcast/CHANGELOG.md
@@ -0,0 +1,2 @@
# CHANGELOG - Hazelcast

7 changes: 7 additions & 0 deletions hazelcast/MANIFEST.in
@@ -0,0 +1,7 @@
graft datadog_checks

include MANIFEST.in
include README.md
include manifest.json

global-exclude *.py[cod] __pycache__
168 changes: 168 additions & 0 deletions hazelcast/README.md
@@ -0,0 +1,168 @@
# Agent Check: Hazelcast

## Overview

This check monitors [Hazelcast][1].

## Setup

### Installation

The Hazelcast check is included in the [Datadog Agent][2] package.
No additional installation is needed on your server.

### Configuration

#### Host

Follow the instructions below to configure this check for an Agent running on a host. For containerized environments, see the [Containerized](#containerized) section.

##### Metric collection

1. Edit the `hazelcast.d/conf.yaml` file in the `conf.d/` folder at the root of your
   Agent's configuration directory to start collecting your Hazelcast performance data.
   See the [sample hazelcast.d/conf.yaml][3] for all available configuration options.

   This check has a limit of 350 metrics per instance. The number of returned metrics is indicated on the info page.
   You can specify the metrics you are interested in by editing the configuration.
   To learn how to customize the collected metrics, see the [JMX Checks documentation][4] for detailed instructions.
   If you need to monitor more metrics, contact [Datadog support][5].

2. [Restart the Agent][6].

##### Log collection

1. Hazelcast supports many different [logging adapters][7]. Here is an example of a `log4j2.properties` file:

```text
rootLogger=file
rootLogger.level=info
property.filepath=/path/to/log/files
property.filename=hazelcast
appender.file.type=RollingFile
appender.file.name=RollingFile
appender.file.fileName=${filepath}/${filename}.log
appender.file.filePattern=${filepath}/${filename}-%d{yyyy-MM-dd}-%i.log.gz
appender.file.layout.type=PatternLayout
appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} [%thread] %level{length=10} %c{1}:%L - %m%n
appender.file.policies.type=Policies
appender.file.policies.time.type=TimeBasedTriggeringPolicy
appender.file.policies.time.interval=1
appender.file.policies.time.modulate=true
appender.file.policies.size.type=SizeBasedTriggeringPolicy
appender.file.policies.size.size=50MB
appender.file.strategy.type=DefaultRolloverStrategy
appender.file.strategy.max=100
rootLogger.appenderRefs=file
rootLogger.appenderRef.file.ref=RollingFile
#Hazelcast specific logs.
#log4j.logger.com.hazelcast=debug
#log4j.logger.com.hazelcast.cluster=debug
#log4j.logger.com.hazelcast.partition=debug
#log4j.logger.com.hazelcast.partition.InternalPartitionService=debug
#log4j.logger.com.hazelcast.nio=debug
#log4j.logger.com.hazelcast.hibernate=debug
```

2. By default, our integration pipeline supports the following conversion [pattern][8]:

```text
%d{yyyy-MM-dd HH:mm:ss} [%thread] %level{length=10} %c{1}:%L - %m%n
```

Clone and edit the [integration pipeline][9] if you have a different format.

3. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

```yaml
logs_enabled: true
```
4. Add the following configuration block to your `hazelcast.d/conf.yaml` file. Change the `path` and `service` parameter values based on your environment. See the [sample hazelcast.d/conf.yaml][3] for all available configuration options.

```yaml
logs:
  - type: file
    path: /var/log/hazelcast.log
    source: hazelcast
    service: <SERVICE>
    log_processing_rules:
      - type: multi_line
        name: log_start_with_date
        pattern: \d{4}\.\d{2}\.\d{2}
```

5. [Restart the Agent][6].
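
The `multi_line` rule above groups continuation lines (stack traces, wrapped messages) into the entry that started with a date. A quick sanity check of the pattern as written, with hypothetical sample lines:

```python
import re

# Pattern from the log_processing_rules above: a new log entry starts
# with a dotted YYYY.MM.DD-style date; any other line is a continuation.
LOG_START = re.compile(r'\d{4}\.\d{2}\.\d{2}')

entry_start = '2020.04.07 12:34:56 [main] INFO LifecycleService:1 - node is STARTED'
continuation = '    at com.hazelcast.example.SomeClass.method(SomeClass.java:42)'

print(bool(LOG_START.match(entry_start)))   # True: begins a new entry
print(bool(LOG_START.match(continuation)))  # False: appended to the previous entry
```

Note that this rule matches a dotted date; if your layout emits dashed dates such as the `%d{yyyy-MM-dd HH:mm:ss}` pattern shown earlier, adjust the regex accordingly.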

#### Containerized

##### Metric collection

For containerized environments, see the [Autodiscovery with JMX][10] guide.

##### Log collection

Collecting logs is disabled by default in the Datadog Agent. To enable it, see [Docker log collection][11].

| Parameter | Value |
| -------------- | -------------------------------------------------- |
| `<LOG_CONFIG>` | `{"source": "hazelcast", "service": "<SERVICE_NAME>"}` |

### Validation

[Run the Agent's status subcommand][12] and look for `hazelcast` under the **JMXFetch** section:

```text
========
JMXFetch
========
  Initialized checks
  ==================
    hazelcast
      instance_name : hazelcast-localhost-9999
      message :
      metric_count : 46
      service_check_count : 0
      status : OK
```

## Data Collected

### Metrics

See [metadata.csv][13] for a list of metrics provided by this check.

### Service Checks

**hazelcast.can_connect**:<br>
Returns `CRITICAL` if the Agent is unable to connect to and collect metrics from the monitored Hazelcast instance, otherwise returns `OK`.

**hazelcast.mc_cluster_state**:<br>
Represents the state of the Hazelcast Management Center as indicated by its health check.

### Events

Hazelcast does not include any events.

## Troubleshooting

Need help? Contact [Datadog support][5].

[1]: https://hazelcast.org
[2]: https://docs.datadoghq.com/agent/
[3]: https://github.com/DataDog/integrations-core/blob/master/hazelcast/datadog_checks/hazelcast/data/conf.yaml.example
[4]: https://docs.datadoghq.com/integrations/java
[5]: https://docs.datadoghq.com/help
[6]: https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent
[7]: https://docs.hazelcast.org/docs/latest/manual/html-single/index.html#logging-configuration
[8]: https://logging.apache.org/log4j/2.x/manual/layouts.html#Patterns
[9]: https://docs.datadoghq.com/logs/processing/#integration-pipelines
[10]: https://docs.datadoghq.com/agent/guide/autodiscovery-with-jmx/?tab=containerizedagent
[11]: https://docs.datadoghq.com/agent/docker/log/
[12]: https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information
[13]: https://github.com/DataDog/integrations-core/blob/master/hazelcast/metadata.csv
28 changes: 28 additions & 0 deletions hazelcast/assets/configuration/spec.yaml
@@ -0,0 +1,28 @@
name: Hazelcast
files:
- name: hazelcast.yaml
  options:
  - template: init_config
    options:
    - template: init_config/jmx
      overrides:
        is_jmx.value.example: false
    - template: init_config/default
  - template: instances
    options:
    - template: instances/jmx
      overrides:
        host.description: Hazelcast or Hazelcast Management Center server with which to connect.
        port.description: Hazelcast or Hazelcast Management Center port with which to connect.
        port.value.example: 1099
    - template: instances/default
  - template: logs
    example:
    - type: file
      path: /var/log/hazelcast.log
      source: hazelcast
      service: <SERVICE>
      log_processing_rules:
      - type: multi_line
        name: log_start_with_date
        pattern: \d{4}\.\d{2}\.\d{2}
33 changes: 33 additions & 0 deletions hazelcast/assets/service_checks.json
@@ -0,0 +1,33 @@
[
  {
    "agent_version": "6.20.0",
    "integration": "Hazelcast",
    "groups": [
      "host",
      "instance"
    ],
    "statuses": [
      "ok",
      "critical"
    ],
    "check": "hazelcast.can_connect",
    "name": "Can Connect",
    "description": "Returns `CRITICAL` if the Agent is unable to connect to Hazelcast, otherwise returns `OK`."
  },
  {
    "agent_version": "6.20.0",
    "integration": "Hazelcast",
    "groups": [
      "host",
      "endpoint"
    ],
    "statuses": [
      "ok",
      "warning",
      "critical"
    ],
    "check": "hazelcast.mc_cluster_state",
    "name": "Management Center cluster state",
    "description": "Represents the state of the Hazelcast Management Center as indicated by its health check."
  }
]
4 changes: 4 additions & 0 deletions hazelcast/datadog_checks/__init__.py
@@ -0,0 +1,4 @@
# (C) Datadog, Inc. 2020-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
__path__ = __import__('pkgutil').extend_path(__path__, __name__) # type: ignore
4 changes: 4 additions & 0 deletions hazelcast/datadog_checks/hazelcast/__about__.py
@@ -0,0 +1,4 @@
# (C) Datadog, Inc. 2020-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
__version__ = '0.0.1'
7 changes: 7 additions & 0 deletions hazelcast/datadog_checks/hazelcast/__init__.py
@@ -0,0 +1,7 @@
# (C) Datadog, Inc. 2020-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
from .__about__ import __version__
from .check import HazelcastCheck

__all__ = ['__version__', 'HazelcastCheck']
50 changes: 50 additions & 0 deletions hazelcast/datadog_checks/hazelcast/check.py
@@ -0,0 +1,50 @@
# (C) Datadog, Inc. 2020-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
from datadog_checks.base import AgentCheck

from . import utils


class HazelcastCheck(AgentCheck):
    __NAMESPACE__ = 'hazelcast'
    SERVICE_CHECK_CONNECT = 'can_connect'
    SERVICE_CHECK_MC_CLUSTER_STATE = 'mc_cluster_state'

    def __init__(self, name, init_config, instances):
        super(HazelcastCheck, self).__init__(name, init_config, instances)

        self._mc_health_check_endpoint = self.instance.get('mc_health_check_endpoint', '')
        if self._mc_health_check_endpoint and not self._mc_health_check_endpoint.startswith('http'):
            self._mc_health_check_endpoint = 'http://{}'.format(self._mc_health_check_endpoint)

        self._mc_cluster_states = utils.ServiceCheckStatus(
            utils.MC_CLUSTER_STATES, self.instance.get('mc_cluster_states', {})
        )

        self._tags = tuple(self.instance.get('tags', []))

    def check(self, _):
        self.process_mc_health_check()

    def process_mc_health_check(self):
        url = self._mc_health_check_endpoint
        if not url:
            return

        tags = ['endpoint:{}'.format(url)]
        tags.extend(self._tags)

        try:
            response = self.http.get(self._mc_health_check_endpoint)
            response.raise_for_status()
            status = response.json()
        except Exception:
            self.service_check(self.SERVICE_CHECK_CONNECT, AgentCheck.CRITICAL, tags=tags)
            raise
        else:
            self.service_check(self.SERVICE_CHECK_CONNECT, AgentCheck.OK, tags=tags)

        self.service_check(
            self.SERVICE_CHECK_MC_CLUSTER_STATE, self._mc_cluster_states.get(status['managementCenterState']), tags=tags
        )
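
The `utils` module is not shown in this diff. From its usage above, `ServiceCheckStatus` appears to map Management Center state strings to Agent service check statuses, with per-instance overrides via the `mc_cluster_states` option. A minimal self-contained sketch of that idea — the state names, default mapping, and integer constants standing in for the `AgentCheck` status values are all assumptions:

```python
# Stand-ins for AgentCheck.OK / WARNING / CRITICAL / UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical default mapping of Management Center health states.
MC_CLUSTER_STATES = {'ACTIVE': OK, 'PASSIVE': WARNING, 'SHUT_DOWN': CRITICAL}


class ServiceCheckStatus(object):
    """Look up the service check status for a named state.

    User-supplied overrides (e.g. the `mc_cluster_states` instance option)
    take precedence over the defaults; unrecognized states map to UNKNOWN.
    """

    def __init__(self, defaults, overrides):
        self._statuses = dict(defaults)
        self._statuses.update(overrides)

    def get(self, state):
        return self._statuses.get(state, UNKNOWN)


states = ServiceCheckStatus(MC_CLUSTER_STATES, {'PASSIVE': CRITICAL})
print(states.get('ACTIVE'))      # 0: default mapping
print(states.get('PASSIVE'))     # 2: override applied
print(states.get('MYSTERY'))     # 3: unrecognized state falls back to UNKNOWN
```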
