diff --git a/symmetric-assemble/src/asciidoc/configuration.ad b/symmetric-assemble/src/asciidoc/configuration.ad
index dc50e2012f..49ab640212 100644
--- a/symmetric-assemble/src/asciidoc/configuration.ad
+++ b/symmetric-assemble/src/asciidoc/configuration.ad
@@ -43,6 +43,9 @@ include::configuration/parameters.ad[]
 ifdef::pro[]
 include::configuration/users.ad[]
 include::configuration/ldap.ad[]
-include::configuration/mail-server.ad[]
 include::configuration/license-key.ad[]
 endif::pro[]
+
+include::configuration/mail-server.ad[]
+include::configuration/monitors.ad[]
+include::configuration/notifications.ad[]
diff --git a/symmetric-assemble/src/asciidoc/configuration/monitors.ad b/symmetric-assemble/src/asciidoc/configuration/monitors.ad
new file mode 100644
index 0000000000..ba9040ddb3
--- /dev/null
+++ b/symmetric-assemble/src/asciidoc/configuration/monitors.ad
@@ -0,0 +1,47 @@
+
+=== Monitors
+
+A monitor watches some part of the system for a problem, checking whether the monitored value meets or exceeds a threshold.
+(To be notified immediately of new monitor events, configure a notification.)
+
+Monitor ID:: The monitor ID is a unique name to refer to the monitor.
+
+ifndef::pro[]
+Node Group ID:: The node group that will run this monitor. Use "ALL" to match all groups.
+External ID:: The external ID of nodes that will run this monitor. Use "ALL" to match all nodes.
+endif::pro[]
+ifdef::pro[]
+Target Nodes:: The group of nodes that will run this monitor.
+endif::pro[]
+
+Monitor Type:: The monitor type is one of several built-in or custom types that run a specific check and return a numeric value that can
+be compared to a threshold value.
+
+[cols="<2,<7", options="header"]
+|===
+|Type
+|Description
+
+|cpu|Percentage from 0 to 100 of CPU usage for the server process.
+
+|disk|Percentage from 0 to 100 of disk usage (tmp folder staging area) available to the server process.
+
+|memory|Percentage from 0 to 100 of memory usage (tenured heap pool) available to the server process.
+
+|batchError|Number of incoming and outgoing batches in error.
+
+|batchUnsent|Number of outgoing batches waiting to be sent.
+
+|dataUnrouted|Number of change capture rows that are waiting to be batched and sent.
+
+|dataGaps|Number of active data gaps that are being checked during routing for data to commit.
+
+|===
+
+Threshold:: When this threshold value is reached or exceeded, an event is recorded.
+Run Period:: How often, in seconds, to run this monitor. The monitor job also runs on a period, so the monitor can run no more often
+than the monitor job.
+Run Count:: The number of times to run the monitor before calculating an average value to compare against the threshold.
+Severity Level:: The importance of the monitor event recorded when the value meets or exceeds the threshold.
+Enabled:: Whether or not this monitor is enabled to run.
+
diff --git a/symmetric-assemble/src/asciidoc/configuration/notifications.ad b/symmetric-assemble/src/asciidoc/configuration/notifications.ad
new file mode 100644
index 0000000000..8dc881468e
--- /dev/null
+++ b/symmetric-assemble/src/asciidoc/configuration/notifications.ad
@@ -0,0 +1,40 @@
+
+=== Notifications
+
+A notification sends a message to the user when a monitor event records a system problem.
+First, configure a monitor to watch the system and record events with a specific severity level.
+Then, configure a notification to match the severity level and write to the log or send an email.
+
+Notification ID:: The notification ID is a unique name to refer to the notification.
+
+ifndef::pro[]
+Node Group ID:: The node group that will run this notification. Use "ALL" to match all groups.
+External ID:: The external ID of nodes that will run this notification. Use "ALL" to match all nodes.
+endif::pro[]
+ifdef::pro[]
+Target Nodes:: The group of nodes that will run this notification.
+endif::pro[]
+
+Notification Type:: The notification type is either a built-in or custom type that is given the list of monitor events to send.
+
+[cols="<2,<7", options="header"]
+|===
+|Type
+|Description
+
+|log|The monitor events are written to the log using the same severity level.
+ifdef::pro[]
+The web console will indicate WARN and ERROR level notifications in the top-right corner, which are also displayed on the main Dashboard screen.
+endif::pro[]
+
+|email|The monitor events are sent in an email to a list of recipients. Set the expression to a comma-separated list of email addresses.
+ifdef::pro[]
+Use the Configure->Mail Server screen to configure a mail server to use for sending emails.
+endif::pro[]
+
+|===
+
+Expression:: Additional information to configure the notification type.
+Severity Level:: Find monitor events that occur at this severity level or above.
+Enabled:: Whether or not this notification is enabled to run.
+
diff --git a/symmetric-assemble/src/asciidoc/images/manage/manage-monitors.png b/symmetric-assemble/src/asciidoc/images/manage/manage-monitors.png
new file mode 100644
index 0000000000..4b4bec6c50
Binary files /dev/null and b/symmetric-assemble/src/asciidoc/images/manage/manage-monitors.png differ
diff --git a/symmetric-assemble/src/asciidoc/manage.ad b/symmetric-assemble/src/asciidoc/manage.ad
index 735f591217..0def81a27b 100644
--- a/symmetric-assemble/src/asciidoc/manage.ad
+++ b/symmetric-assemble/src/asciidoc/manage.ad
@@ -74,5 +74,7 @@ include::manage/jvm-properties.ad[]
 include::manage/jvm-threads.ad[]
 endif::pro[]
 
+include::manage/monitors.ad[]
+
 include::manage/logging.ad[]
 
diff --git a/symmetric-assemble/src/asciidoc/manage/monitors.ad b/symmetric-assemble/src/asciidoc/manage/monitors.ad
new file mode 100644
index 0000000000..53ef3307fb
--- /dev/null
+++ b/symmetric-assemble/src/asciidoc/manage/monitors.ad
@@ -0,0 +1,28 @@
+=== Monitors
+
+ifdef::pro[]
+
+The Monitors screen allows you to view events of system problems recorded by both local and remote nodes.
+The list of events can be filtered by event type, severity level, and node ID, and limited to a maximum number of events displayed.
+Filtering by severity level will match the level you choose and any level above it.
+Events are listed in descending order by event time, but the order can be changed by clicking column headings.
+The remove button will clear the event from the table on the current node. Events are purged automatically each night
+based on event time, using the `purge.retention.minutes` parameter.
+
+image::manage/manage-monitors.png[]
+
+endif::pro[]
+
+When a <<_monitors,Monitor>> is configured, it is run periodically to check the current value of a system metric and compare it to a threshold value.
+Different monitor types can check the CPU usage, disk usage, memory usage, batch errors, outstanding batches, unrouted data, and number
+of data gaps.
+Custom monitor types can be created using <<_extensions,Extensions>> that use the IMonitorType interface.
+When the value returned from the check meets or exceeds the threshold value, a <<_monitor_event>> is recorded.
+The <<_monitor_event>> table is synchronized on the heartbeat channel, which allows a central server to see events from remote nodes,
+but this behavior can be disabled by setting the `monitor.events.capture.enabled` parameter to false.
+
+To be immediately notified of a monitor event, use <<_notifications,Notifications>> to match on the severity level.
+Different notification types can send a message by writing to the log or sending an email.
+Custom notification types can be created using <<_extensions,Extensions>> that use the INotificationType interface.
+To send email, the <<_mail_server,Mail Server>> must be configured.
+
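
Review note: the Threshold, Run Count, and Severity Level descriptions added above can be read as the small check sketched below. This is only an illustration of the documented behavior, not the project's implementation. The class `MonitorSketch`, its methods, and the `INFO`/`WARN`/`ERROR` ordering are hypothetical names chosen for the example, and clearing the samples after each comparison is an assumption the documentation does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented monitor check; not SymmetricDS code.
class MonitorSketch {

    // Assumed severity ordering, lowest to highest (names are illustrative).
    enum Severity { INFO, WARN, ERROR }

    private final double threshold;
    private final int runCount;
    private final List<Double> samples = new ArrayList<>();

    MonitorSketch(double threshold, int runCount) {
        this.threshold = threshold;
        this.runCount = runCount;
    }

    // Records one monitored value. Returns true only when runCount samples
    // have been collected and their average meets or exceeds the threshold.
    // Assumption: the samples are discarded after each comparison.
    boolean record(double value) {
        samples.add(value);
        if (samples.size() < runCount) {
            return false;
        }
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        double average = sum / samples.size();
        samples.clear();
        return average >= threshold;
    }

    // A notification matches events at its own severity level or above.
    static boolean notificationMatches(Severity notificationLevel, Severity eventLevel) {
        return eventLevel.compareTo(notificationLevel) >= 0;
    }

    public static void main(String[] args) {
        MonitorSketch cpu = new MonitorSketch(80.0, 3);
        System.out.println(cpu.record(90.0)); // false: only 1 of 3 samples
        System.out.println(cpu.record(90.0)); // false: only 2 of 3 samples
        System.out.println(cpu.record(70.0)); // true: average 83.33 >= 80
        System.out.println(notificationMatches(Severity.WARN, Severity.ERROR)); // true
    }
}
```

With a Run Count of 1 the average is the single sampled value, so every run is compared directly against the threshold.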