Logging-based alarms, part 1 (logback API)
This is part 1 (of 4) of the third version of the proposed alarms component.

In the previous version, a Java hierarchy was defined for Alarm types. This required code changes every time a new alarm was defined, in particular:

- an implementation of the new alarm type
- use of this type at the call sites generating the alarm (plus the requirement that the logging call explicitly add the Marker as the first parameter).

After reconsideration, I deemed this too heavyweight and invasive.

This patch modifies the API/logging core of the alarms component so that it is much more flexible and does not necessarily demand changes to the code (if the necessary logging statements concerning the alarm are already present). Moreover, there is no difference between adding a logging statement and an alarm statement; any logging statement can be redefined as an alarm after the fact.

This is accomplished as follows:

1. An AlarmDefinition class allows a JSON description of the alarm to be provided as part of a logging filter. Refer to the attached image for a capture of the Javadoc for this class.

2. The AlarmDefinitionFilter is provided for use with the AlarmDefinitionAppender. This filter intercepts logging events and attempts to match them against the available AlarmDefinitions; when a matching definition is found, the event is accepted; otherwise it is denied. The match relies on an implicit function (logger,level,regex,thread)->definition; hence a given alarm type can be generated by more than one logger, and a logger in turn can send multiple types of alarms if these are mapped to different logging levels (e.g., fatal, error, warn), thread names and/or regex matches on the message string.

3. Before accepting the event, the filter calls a method on the matching definition which embeds the type name and a JSON string representing the alarm in the original event's MDC map, for immediate downstream use in the same thread by the appender.

4. The accepted logging event then arrives at the AlarmDefinitionAppender, which converts the original event into an Alarm event by reading the two embedded properties and turning them into the event Marker and message, respectively. The new event is then passed to this appender's child appenders, such as a SocketAppender which can send the event off to a remote Logback server.

The logback.xml included in skel/etc has been modified so that an alarm appender is added to the root logger. The alarm appender in turn has a child Socket appender set to send events on port 60001 to localhost (this should be modified, depending on the location of the remote server). The second patch in this series includes the shell command script for launching a remote socket server to accept these events.

The only alarm currently defined is for checksum errors generated by the logger in org.dcache.pool.classic.ChecksumScanner. The user/admin, however, may define additional alarms simply by including other <alarmType> elements in the <filter> element (fuller info available in the AlarmDefinition Javadoc):

    <!-- processes events from all loggers into alarms on the basis
         of the alarmType definitions provided; for further information,
         see the javadoc for org.dcache.alarms.logback.AlarmDefinition
         and the dCache Book -->
    <appender name="alarms" class="org.dcache.alarms.logback.AlarmDefinitionAppender">
        <!-- this filter determines which events are to be interpreted
             as alarms; the appender converts these into alarm events
             and passes them to its embedded child appender(s) -->
        <filter class="org.dcache.alarms.logback.AlarmDefinitionFilter">
            <alarmType>
                logger:org.dcache.pool.classic.ChecksumScanner,
                regex:"Checksum mismatch",
                type:CHECKSUM,
                level:ERROR,
                severity:MODERATE,
                include-in-key:message.type.host.service.domain
            </alarmType>
        </filter>
        <appender-ref ref="remote"/>
    </appender>

This patch supersedes http://rb.dcache.org/r/4662; some of the code is still the same, but the structure has been largely simplified.

Three more patches follow, which provide the storage (DAO), front-end (webadmin) and unit test parts of the full implementation.

Target: trunk
Patch: http://rb.dcache.org/r/4885
Acked-by: Dmitry
Require-notes: yes
Require-book: yes
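
For illustration only, the (logger,level,regex,thread)->definition lookup described in step 2 might be sketched roughly as below. This is not the code in this patch: the class AlarmMatchSketch, the nested Definition type and the match method are hypothetical names, the sketch does not use the logback API, and it assumes that an unset attribute matches anything and that the level is compared for equality (the real filter may treat levels differently).

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class AlarmMatchSketch {

    /** Hypothetical stand-in for an alarm definition; NOT the real AlarmDefinition. */
    static final class Definition {
        final String type;
        final String logger;  // null means "any logger" (assumption)
        final String level;   // null means "any level" (assumption)
        final String thread;  // null means "any thread" (assumption)
        final Pattern regex;  // null means "any message" (assumption)

        Definition(String type, String logger, String level,
                   String thread, String regex) {
            this.type = type;
            this.logger = logger;
            this.level = level;
            this.thread = thread;
            this.regex = regex == null ? null : Pattern.compile(regex);
        }

        /** True if every attribute that is set agrees with the event. */
        boolean matches(String logger, String level,
                        String thread, String message) {
            return (this.logger == null || this.logger.equals(logger))
                && (this.level == null || this.level.equals(level))
                && (this.thread == null || this.thread.equals(thread))
                && (this.regex == null || this.regex.matcher(message).find());
        }
    }

    /** Mirrors the <alarmType> example from the commit message. */
    static final List<Definition> DEFINITIONS = List.of(
        new Definition("CHECKSUM",
                       "org.dcache.pool.classic.ChecksumScanner",
                       "ERROR", null, "Checksum mismatch"));

    /** Returns the type of the first matching definition, if any. */
    static Optional<String> match(List<Definition> definitions, String logger,
                                  String level, String thread, String message) {
        for (Definition d : definitions) {
            if (d.matches(logger, level, thread, message)) {
                return Optional.of(d.type);   // the filter would accept here
            }
        }
        return Optional.empty();              // the filter would deny here
    }

    public static void main(String[] args) {
        String type = match(DEFINITIONS,
                            "org.dcache.pool.classic.ChecksumScanner",
                            "ERROR", "scanner-1",
                            "Checksum mismatch detected")
                      .orElse("no alarm");
        System.out.println(type);
    }
}
```

Because the lookup depends only on attributes of the logging event, a new alarm can be introduced by adding a definition rather than by touching the call site, which is the point of the redesign described above.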