Commit
0002547: Notification of system event problems
Showing 6 changed files with 121 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
|
||
=== Monitors | ||
|
||
A monitor watches some part of the system for a problem, checking whether the monitored value meets or exceeds a threshold. | ||
(To be notified immediately of new monitor events, configure a notification.) | ||
|
||
Monitor ID:: The monitor ID is a unique name to refer to the monitor. | ||
|
||
ifndef::pro[] | ||
Node Group ID:: The node group that will run this monitor. Use "ALL" to match all groups. | ||
External ID:: The external ID of nodes that will run this monitor. Use "ALL" to match all nodes. | ||
endif::pro[] | ||
ifdef::pro[] | ||
Target Nodes:: The group of nodes that will run this monitor. | ||
endif::pro[] | ||
|
||
Monitor Type:: The monitor type is one of several built-in or custom types that run a specific check and return a numeric value that can | ||
be compared to a threshold value. | ||
|
||
[cols="<2,<7", options="header"] | ||
|=== | ||
|Type | ||
|Description | ||
|
||
|cpu|Percentage from 0 to 100 of CPU usage for the server process. | ||
|
||
|disk|Percentage from 0 to 100 of disk usage (tmp folder staging area) available to the server process. | ||
|
||
|memory|Percentage from 0 to 100 of memory usage (tenured heap pool) available to the server process. | ||
|
||
|batchError|Number of incoming and outgoing batches in error. | ||
|
||
|batchUnsent|Number of outgoing batches waiting to be sent. | ||
|
||
|dataUnrouted|Number of change capture rows that are waiting to be batched and sent. | ||
|
||
|dataGaps|Number of active data gaps that are being checked during routing for data to commit. | ||
|
||
|=== | ||
|
||
Threshold:: When this threshold value is reached or exceeded, an event is recorded. | ||
Run Period:: How often to run this monitor, in seconds. The monitor job also runs on its own period, so a monitor can run no more often | ||
than the monitor job. | ||
Run Count:: The number of times to run the monitor before calculating an average value to compare against the threshold. | ||
Severity Level:: The importance of the event recorded when this monitor meets or exceeds the threshold. | ||
Enabled:: Whether or not this monitor is enabled to run. | ||
|
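The interaction of Threshold and Run Count described above can be sketched in Java. This is a minimal illustration, not SymmetricDS source code: the class name `ThresholdMonitorSketch`, the integer average, and the choice to start a fresh window after each comparison are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of threshold checking; not part of SymmetricDS. */
final class ThresholdMonitorSketch {
    private final long threshold;
    private final int runCount;
    private final Deque<Long> samples = new ArrayDeque<>();

    ThresholdMonitorSketch(long threshold, int runCount) {
        if (runCount < 1) {
            throw new IllegalArgumentException("runCount must be at least 1");
        }
        this.threshold = threshold;
        this.runCount = runCount;
    }

    /** Records one check result; returns true when an event should be recorded. */
    boolean record(long value) {
        samples.addLast(value);
        if (samples.size() < runCount) {
            return false; // not enough runs yet to calculate an average
        }
        long sum = 0;
        for (long sample : samples) {
            sum += sample;
        }
        long average = sum / samples.size(); // integer average (assumption)
        samples.clear(); // assumption: each comparison starts a new window
        return average >= threshold; // "reached or exceeded"
    }

    public static void main(String[] args) {
        // A cpu-style monitor: threshold 80, averaged over 3 runs.
        ThresholdMonitorSketch cpu = new ThresholdMonitorSketch(80, 3);
        System.out.println(cpu.record(70)); // false
        System.out.println(cpu.record(85)); // false
        System.out.println(cpu.record(95)); // true, average is 83
    }
}
```

With a Run Count of 1, every run is compared to the threshold directly; a larger Run Count smooths out short spikes at the cost of slower detection.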
40 changes: 40 additions & 0 deletions
symmetric-assemble/src/asciidoc/configuration/notifications.ad
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
|
||
=== Notifications | ||
|
||
A notification sends a message to the user when a monitor event records a system problem. | ||
First configure a monitor to watch the system and record events with a specific severity level. | ||
Then, configure a notification to match the severity level and write to the log or send an email. | ||
|
||
Notification ID:: The notification ID is a unique name to refer to the notification. | ||
|
||
ifndef::pro[] | ||
Node Group ID:: The node group that will run this notification. Use "ALL" to match all groups. | ||
External ID:: The external ID of nodes that will run this notification. Use "ALL" to match all nodes. | ||
endif::pro[] | ||
ifdef::pro[] | ||
Target Nodes:: The group of nodes that will run this notification. | ||
endif::pro[] | ||
|
||
Notification Type:: The notification type is either a built-in or custom type that is given the list of monitor events to send. | ||
|
||
[cols="<2,<7", options="header"] | ||
|=== | ||
|Type | ||
|Description | ||
|
||
|log|The monitor events are written to the log using the same severity level. | ||
ifdef::pro[] | ||
The web console will indicate WARN and ERROR level notifications in the top-right corner, which are also displayed on the main Dashboard screen. | ||
endif::pro[] | ||
|
||
|email|The monitor events are sent in an email to a list of recipients. Set the expression to a comma-separated list of email addresses. | ||
ifdef::pro[] | ||
Use the Configure->Mail Server screen to configure a mail server to use for sending emails. | ||
endif::pro[] | ||
|
||
|=== | ||
|
||
Expression:: Additional information to configure the notification type. | ||
Severity Level:: Find monitor events that occur at this severity level or above. | ||
Enabled:: Whether or not this notification is enabled to run. | ||
|
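The Severity Level matching and the email Expression described above can be sketched in Java. This is an illustration under stated assumptions, not SymmetricDS source code: the class `NotificationSketch`, the `Severity` enum with its three levels, and both helper methods are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of notification matching; not part of SymmetricDS. */
final class NotificationSketch {
    /** Assumed severity levels, ordered from least to most severe. */
    enum Severity { INFO, WARN, ERROR }

    /** True when an event is at the notification's severity level or above. */
    static boolean matches(Severity eventLevel, Severity notificationLevel) {
        return eventLevel.compareTo(notificationLevel) >= 0;
    }

    /** Splits a comma-separated expression into trimmed, non-empty addresses. */
    static List<String> recipients(String expression) {
        List<String> result = new ArrayList<>();
        if (expression == null) {
            return result;
        }
        for (String part : expression.split(",")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                result.add(address);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matches(Severity.ERROR, Severity.WARN)); // true
        System.out.println(matches(Severity.INFO, Severity.WARN));  // false
        System.out.println(recipients("ops@example.com, dba@example.com"));
    }
}
```

Ordering the levels in an enum keeps the "this level or above" rule to a single comparison, so a notification configured at WARN also picks up ERROR events.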
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
=== Monitors | ||
|
||
ifdef::pro[] | ||
|
||
The Monitors screen allows you to view events of system problems recorded by both local and remote nodes. | ||
The list of events can be filtered by event type, severity level, and node ID, and the number of events displayed can be limited. | ||
Filtering by severity level will match the level you choose and any level above it. | ||
Events are listed in descending order by event time, but the order can be changed by clicking column headings. | ||
The remove button clears the event from the table on the current node. Events are also purged automatically each night, | ||
based on event time and the `purge.retention.minutes` parameter. | ||
|
||
image::manage/manage-monitors.png[] | ||
|
||
endif::pro[] | ||
|
||
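The nightly purge by event time mentioned for the Monitors screen can be pictured as a cutoff calculation. The Java sketch below is an assumption about how a retention period in minutes might be applied, not SymmetricDS source code: the class `EventPurgeSketch` and the method `retain` are hypothetical, and the real purge may treat the boundary differently.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of purging by event time; not part of SymmetricDS. */
final class EventPurgeSketch {
    /**
     * Returns the event times that survive a purge run at 'now'.
     * Assumption: events older than the retention period are removed,
     * and an event exactly at the cutoff is kept.
     */
    static List<Instant> retain(List<Instant> eventTimes, Instant now, long retentionMinutes) {
        Instant cutoff = now.minus(Duration.ofMinutes(retentionMinutes));
        List<Instant> kept = new ArrayList<>();
        for (Instant eventTime : eventTimes) {
            if (!eventTime.isBefore(cutoff)) {
                kept.add(eventTime);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-01-02T00:00:00Z");
        List<Instant> events = List.of(
                Instant.parse("2019-12-31T12:00:00Z"),  // older than one day
                Instant.parse("2020-01-01T18:00:00Z")); // within one day
        // 1440 minutes is one day of retention.
        System.out.println(retain(events, now, 1440));
    }
}
```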
When a <<_monitors,Monitor>> is configured, it is run periodically to check the current value of a system metric and compare it to a threshold value. | ||
Different monitor types can check the CPU usage, disk usage, memory usage, batch errors, outstanding batches, unrouted data, and number | ||
of data gaps. | ||
Custom monitor types can be created using <<_extensions,Extensions>> that use the IMonitorType interface. | ||
When the value returned from the check meets or exceeds the threshold value, a <<_monitor_event>> is recorded. | ||
The <<_monitor_event>> table is synchronized on the heartbeat channel, which allows a central server to see events from remote nodes, | ||
but this behavior can be disabled by setting the `monitor.events.capture.enabled` parameter to false. | ||
|
||
To be immediately notified of a monitor event, use <<_notifications,Notifications>> to match on the severity level. | ||
Different notification types can send a message by writing to the log or sending an email. | ||
Custom notification types can be created using <<_extensions,Extensions>> that use the INotificationType interface. | ||
In order to send email, the <<_mail_server,Mail Server>> should be configured. | ||
|