[WIP] Integrating Monit for monitoring TorQ #105

Merged
merged 27 commits into from Sep 4, 2018
Changes from 15 commits
Commits
91799bd
First attempt at the monitoring directory structure
rdanutalexandru1993 Jul 6, 2018
695bca6
Created the backbone of fills_templates.sh
rdanutalexandru1993 Jul 6, 2018
e95611d
Added function createmonitrc which takes the procname and proctype fr…
rdanutalexandru1993 Jul 9, 2018
17e46de
Modified monittemplate.txt to use the new TorQ start script
rdanutalexandru1993 Jul 9, 2018
75bd90e
Removed unnecessary quotes after createmonconfig function
rdanutalexandru1993 Jul 10, 2018
355b13d
Added new alerts in monitalert.cfg to monitor the host
rdanutalexandru1993 Jul 17, 2018
ee1fc5c
Attempted to add email alerts using mail.mailutils
rdanutalexandru1993 Jul 17, 2018
3639a6e
Merge branch 'master' into monit
rdanutalexandru1993 Jul 23, 2018
f770e3e
Merge remote-tracking branch 'origin/master' into monit
rdanutalexandru1993 Aug 6, 2018
dc0b17d
Added function checkst in monit.sh and fill_templates.sh to facilitat…
rdanutalexandru1993 Aug 7, 2018
9a8be8f
Replaced ${TORQHOME} with ${PWD} in templates/monitconfig.cfg and set…
rdanutalexandru1993 Aug 8, 2018
2dcc6c1
Modified setenv.sh to set the path for Torq directory based on the di…
rdanutalexandru1993 Aug 8, 2018
1aeb204
Fixed spacing for comment
rdanutalexandru1993 Aug 8, 2018
ff5d14e
Removed unnecessary testmai.sh file
rdanutalexandru1993 Aug 8, 2018
cb93dd2
Removed personal e-mail, username and password from the template monitrc
rdanutalexandru1993 Aug 9, 2018
d3e7f85
Modified monit.sh to be more flexible with how the config files are c…
rdanutalexandru1993 Aug 9, 2018
45ea5a3
Removed old monit.sh
rdanutalexandru1993 Aug 10, 2018
9d82a96
Reverted a change in gateway.q
rdanutalexandru1993 Aug 10, 2018
dbd21ff
Removed unnecessary monitconfig.cfg
rdanutalexandru1993 Aug 10, 2018
b468b3b
Removed homer.aquaq.co.uk and google.com from the PR
rdanutalexandru1993 Aug 16, 2018
add361b
Fixed typo
rdanutalexandru1993 Aug 16, 2018
cef0a1b
Merge remote-tracking branch 'origin/master' into monit
rdanutalexandru1993 Sep 3, 2018
8d0c433
FIXED error with certificates in setenv.sh
rdanutalexandru1993 Sep 3, 2018
3c36469
Added documentation for monit
rdanutalexandru1993 Sep 3, 2018
cb05f05
Fixed indentation for nested list
rdanutalexandru1993 Sep 3, 2018
54dbcaa
Minor modifications to wording
jonnypress Sep 4, 2018
396ee30
added monit to menu
jonnypress Sep 4, 2018
14 changes: 14 additions & 0 deletions code/scripts/templates/monitalert.cfg
@@ -0,0 +1,14 @@
check system homer.aquaq.co.uk
if loadavg (5min) > 3 for 4 cycles then alert
if loadavg (15min) > 1 for 4 cycles then alert
if memory usage > 80% for 4 cycles then alert
if swap usage > 20% for 4 cycles then alert
if cpu usage (system) > 20% for 4 cycles then alert

check host google.com with address gwadawdawdwad.com
if failed url http://gawdawdawdaw.com
then alert

check file alerttest with path /.nonexistent
if does not exist then alert
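Each resource rule above fires only when its condition holds for four consecutive cycles. With the `set daemon 30` polling interval from the monitrc template, that means at least four 30-second checks in a row, roughly two minutes, before an alert is raised. The Python below is a simplified model of that "for N cycles" rule, not Monit's implementation: a counter of consecutive failing checks that resets whenever a check passes. The load figures are made up for illustration.

```python
from typing import Iterable, Optional

DAEMON_INTERVAL_S = 30  # matches "set daemon 30" in the monitrc template


def first_alert_cycle(samples: Iterable[float], threshold: float, cycles: int) -> Optional[int]:
    """Return the 0-based cycle at which 'value > threshold for `cycles` cycles'
    first fires, or None if it never does.

    Simplified model: the condition must be true on `cycles` consecutive
    checks; any check where it is false resets the streak.
    """
    streak = 0
    for i, value in enumerate(samples):
        streak = streak + 1 if value > threshold else 0
        if streak == cycles:
            return i
    return None


# Hypothetical 5-minute load averages, one sample per Monit cycle.
loadavg_5min = [2.5, 3.2, 3.4, 2.9, 3.1, 3.3, 3.6, 3.8, 2.0]
cycle = first_alert_cycle(loadavg_5min, threshold=3, cycles=4)
# Cycle index and seconds since the first sample -> 7 210
print(cycle, cycle * DAEMON_INTERVAL_S if cycle is not None else None)
```

The dip at the fourth sample resets the streak, so the alert only fires on the eighth check; a single short spike never triggers a "for 4 cycles" rule.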

1 change: 1 addition & 0 deletions code/scripts/templates/monitconfig.cfg
@@ -0,0 +1 @@
appconfig/process.csv torq.sh monitconfig.cfg
321 changes: 321 additions & 0 deletions code/scripts/templates/monitrc
@@ -0,0 +1,321 @@
###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All paths MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below you will find examples of some frequently used statements. For
## information about the control file and a complete list of statements and
## options, please have a look in the Monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start Monit in the background (run as a daemon):
#
set daemon 30 # check services at 30-second intervals
#with start delay 240 # optional: delay the first check by 4 minutes (by
# # default Monit checks immediately after Monit starts)
#
#
## Set syslog logging. If you want to log to a standalone log file instead,
## specify the full path to the log file
#
set logfile ${PWD}/monit.log

#
#
## Set the location of the Monit lock file which stores the process id of the
## running Monit instance. By default this file is stored in $HOME/.monit.pid
#
#set pidfile $TORQMONIT/monit.pid
#
## Set the location of the Monit id file which stores the unique id for the
## Monit instance. The id is generated and stored on first Monit start. By
## default the file is placed in $HOME/.monit.id.
#
# set idfile /var/.monit.id
#set idfile $TORQMONIT/.monit.id
#
## Set the location of the Monit state file which saves monitoring states
## on each cycle. By default the file is placed in $HOME/.monit.state. If
## the state file is stored on a persistent filesystem, Monit will recover
## the monitoring state across reboots. If it is on temporary filesystem, the
## state will be lost on reboot which may be convenient in some situations.
#
set statefile ${PWD}/monit.state
#set statefile /var/lib/monit/state
#
#

## Set limits for various tests. The following example shows the default values:
##
# set limits {
# programOutput: 512 B, # check program's output truncate limit
# sendExpectBuffer: 256 B, # limit for send/expect protocol test
# fileContentBuffer: 512 B, # limit for file content test
# httpContentBuffer: 1 MB, # limit for HTTP content test
# networkTimeout: 5 seconds # timeout for network I/O
# programTimeout: 300 seconds # timeout for check program
# stopTimeout: 30 seconds # timeout for service stop
# startTimeout: 30 seconds # timeout for service start
# restartTimeout: 30 seconds # timeout for service restart
# }

## Set global SSL options (only the most common options are shown; see the
## manual for the full list).
#
# set ssl {
# verify : enable, # verify SSL certificates (disabled by default but STRONGLY RECOMMENDED)
# selfsigned : allow # allow self signed SSL certificates (reject by default)
# }
#
#
## Set the list of mail servers for alert delivery. Multiple servers may be
## specified using a comma separator. If the first mail server fails, Monit
## will use the second mail server in the list and so on. By default Monit uses
## port 25; it is possible to override this with the PORT option.
#
# set mailserver mail.bar.baz, # primary mailserver
# backup.bar.baz port 10025, # backup mailserver on port 10025
# localhost # fallback relay
set mailserver smtp.gmail.com port 587
username "USERNAME@MAILSERVICE.COM" password "PASSWORD"
using tlsv1
with timeout 30 seconds


set alert EMAIL
#
#
## By default Monit will drop alert events if no mail servers are available.
## If you want to keep the alerts for later delivery retry, you can use the
## EVENTQUEUE statement. The base directory where undelivered alerts will be
## stored is specified by the BASEDIR option. You can limit the queue size
## by using the SLOTS option (if omitted, the queue is limited by space
## available in the back end filesystem).
#
set eventqueue
# basedir /var/lib/monit/events # set the base directory where events will be stored
basedir ${PWD}/events
slots 100 # optionally limit the queue size
#
#
## Send status and events to M/Monit (for more information about M/Monit
## see https://mmonit.com/). By default Monit registers credentials with
## M/Monit so M/Monit can smoothly communicate back to Monit and you don't
## have to register Monit credentials manually in M/Monit. It is possible to
## disable credential registration using the commented out option below.
## If security is a concern, we recommend instead using HTTPS when
## communicating with M/Monit so that credentials are sent encrypted. The password
## should be URL encoded if it contains URL-significant characters like
## \":\", \"?\", \"@\".
#
# set mmonit http://monit:monit@104.46.37.155:2810/collector
# # and register without credentials # Don't register credentials
#
#
## Monit by default uses the following format for alerts if the mail-format
## statement is missing:
## --8<--
## set mail-format {
## from: Monit <monit@$HOST>
## subject: monit alert -- $EVENT $SERVICE
## message: $EVENT Service $SERVICE
## Date: $DATE
## Action: $ACTION
## Host: $HOST
## Description: $DESCRIPTION
##
## Your faithful employee,
## Monit
## }
## --8<--
set mail-format {
from: torqmonit@gmail.com
subject: [\$SERVICE] monit alert -- \$EVENT at \$DATE
message: Monit Report:
ACTION: \$ACTION
SERVICE: \$SERVICE
DATE: \$DATE
HOST: \$HOST
DESCRIPTION: \$DESCRIPTION

Powered by Monit

This message has been generated automatically!
}
##
## You can override this message format or parts of it, such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded at runtime. For example, to override the sender, use:
#
# set mail-format { from: monit@foo.bar }
#
#
## You can set alert recipients who will receive alerts if/when a
## service defined in this file has errors. Alerts may be restricted on
## events by using a filter as in the second example below.
#
# set alert sysadm@foo.bar # receive all alerts
#
## Do not alert when Monit starts, stops or performs a user initiated action.
## This filter is recommended to avoid getting alerts for trivial cases.
#
# set alert your-name@your.domain not on { instance, action }
#
#
## Monit has an embedded HTTP interface which can be used to view status of
## services monitored and manage services from a web interface. The HTTP
## interface is also required if you want to issue Monit commands from the
## command line, such as 'monit status' or 'monit restart service'. The reason
## for this is that the Monit client uses the HTTP interface to send these
## commands to a running Monit daemon. See the Monit Wiki if you want to
## enable SSL for the HTTP interface.
#
set httpd port 11000 and
# use address localhost # only accept connection from localhost
# allow localhost # allow localhost to connect to the server and
allow admin:monit # require user 'admin' with password 'monit'

###############################################################################
## Services
###############################################################################
##
## Check general system resources such as load average, cpu and memory
## usage. Each test specifies a resource, conditions and the action to be
## performed should a test fail.
#
# check system $HOST
# if loadavg (1min) > 4 then alert
# if loadavg (5min) > 2 then alert
# if cpu usage > 95% for 10 cycles then alert
# if memory usage > 75% then alert
# if swap usage > 25% then alert
#
#
## Check if a file exists, checksum, permissions, uid and gid. In addition
## to alert recipients in the global section, customized alert can be sent to
## additional recipients by specifying a local alert handler. The service may
## be grouped using the GROUP option. More than one group can be specified by
## repeating the 'group name' statement.
#
# check file apache_bin with path /usr/local/apache/bin/httpd
# if failed checksum and
# expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
# if failed permission 755 then unmonitor
# if failed uid root then unmonitor
# if failed gid root then unmonitor
# alert security@foo.bar on {
# checksum, permission, uid, gid, unmonitor
# } with the mail-format { subject: Alarm! }
# group server
#
#
## Check that a process is running, in this case Apache, and that it responds
## to HTTP and HTTPS requests. Check its resource usage such as cpu and memory,
## and number of children. If the process is not running, Monit will restart
## it by default. In case the service is restarted very often and the
## problem remains, it is possible to disable monitoring using the TIMEOUT
## statement. This service depends on another service (apache_bin) which
## is defined above.
#
# check process apache with pidfile /usr/local/apache/logs/httpd.pid
# start program = \"/etc/init.d/httpd start\" with timeout 60 seconds
# stop program = \"/etc/init.d/httpd stop\"
# if cpu > 60% for 2 cycles then alert
# if cpu > 80% for 5 cycles then restart
# if totalmem > 200.0 MB for 5 cycles then restart
# if children > 250 then restart
# if loadavg(5min) greater than 10 for 8 cycles then stop
# if failed host www.tildeslash.com port 80 protocol http
# and request \"/somefile.html\"
# then restart
# if failed port 443 protocol https with timeout 15 seconds then restart
# if 3 restarts within 5 cycles then unmonitor
# depends on apache_bin
# group server
#
#
## Check filesystem permissions, uid, gid, space and inode usage. Other services,
## such as databases, may depend on this resource, and an automatic graceful
## stop may be cascaded to them before the filesystem becomes full and data is
## lost.
#
# check filesystem datafs with path /dev/sdb1
# start program = \"/bin/mount /data\"
# stop program = \"/bin/umount /data\"
# if failed permission 660 then unmonitor
# if failed uid root then unmonitor
# if failed gid disk then unmonitor
# if space usage > 80% for 5 times within 15 cycles then alert
# if space usage > 99% then stop
# if inode usage > 30000 then alert
# if inode usage > 99% then stop
# group server
#
#
## Check a file's timestamp. In this example, we test if a file is older
## than 15 minutes and assume something is wrong if it's not updated. Also,
## if the file size exceeds a given limit, execute a script.
#
# check file database with path /data/mydatabase.db
# if failed permission 700 then alert
# if failed uid data then alert
# if failed gid data then alert
# if timestamp > 15 minutes then alert
# if size > 100 MB then exec \"/my/cleanup/script\" as uid dba and gid dba
#
#
## Check directory permission, uid and gid. An event is triggered if the
## directory does not belong to the user with uid 0 and gid 0. In addition,
## the permissions have to match the octal description of 755 (see chmod(1)).
#
# check directory bin with path /bin
# if failed permission 755 then unmonitor
# if failed uid 0 then unmonitor
# if failed gid 0 then unmonitor
#
#
## Check a remote host availability by issuing a ping test and check the
## content of a response from a web server. Up to three pings are sent, and a
## connection to a port and an application-level network check are performed.
#
# check host myserver with address 192.168.1.1
# if failed ping then alert
# if failed port 3306 protocol mysql with timeout 15 seconds then alert
# if failed port 80 protocol http
# and request /some/path with content = \"a string\"
# then alert
#
#
## Check a network link status (up/down), link capacity changes, saturation
## and bandwidth usage.
#
# check network public with interface eth0
# if failed link then alert
# if changed link then alert
# if saturation > 90% then alert
# if download > 10 MB/s then alert
# if total uploaded > 1 GB in last hour then alert
#
#
## Check custom program status output.
#
# check program myscript with path /usr/local/bin/myscript.sh
# if status != 0 then alert
#
#
###############################################################################
## Includes
###############################################################################
##
## It is possible to include additional configuration parts from other files or
## directories.
#
# include /etc/monit.d/*
#
include ${TORQHOME}/code/scripts/monit/*.cfg
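Several strings in this template are escaped (`\"`, `\$SERVICE`, `\$DATE`) while `${PWD}` and `${TORQHOME}` are not, which suggests the file is passed through bash double-quote expansion when the real monitrc is generated. Assuming that, the Python sketch below models the expansion: `${NAME}` and `$NAME` are replaced from an environment mapping (unset names become empty strings, as in bash), and a backslash before `$`, `"` or `\` yields the literal character, so Monit's own `$SERVICE`/`$EVENT` macros survive for Monit to fill in at alert time. It is a simplified model, not the PR's script: command substitution and other shell features are ignored, and `/opt/torq` is a hypothetical install path.

```python
import re
from typing import Mapping

# ${NAME} or $NAME, as in bash parameter expansion (simplified)
_VAR = re.compile(r"\$(?:\{([A-Za-z_]\w*)\}|([A-Za-z_]\w*))")
# Characters that a backslash escapes inside bash double quotes (subset)
_ESCAPABLE = '$"\\'


def expand_template(text: str, env: Mapping[str, str]) -> str:
    """Expand shell-style variables in `text`, honouring \\$, \\" and \\\\ escapes."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in _ESCAPABLE:
            out.append(text[i + 1])  # \$ -> $, \" -> ", \\ -> \
            i += 2
        elif ch == "$" and (m := _VAR.match(text, i)):
            out.append(env.get(m.group(1) or m.group(2), ""))  # unset -> ""
            i = m.end()
        else:
            out.append(ch)
            i += 1
    return "".join(out)


env = {"PWD": "/opt/torq", "TORQHOME": "/opt/torq"}  # hypothetical paths
for line in [
    "set logfile ${PWD}/monit.log",
    r"subject: [\$SERVICE] monit alert -- \$EVENT at \$DATE",
    "include ${TORQHOME}/code/scripts/monit/*.cfg",
]:
    print(expand_template(line, env))
```

Under the same assumption, the unbraced `$TORQMONIT` in the commented-out `set pidfile` and `set idfile` lines is also expanded at generation time, so uncommenting them only gives a sensible path if `TORQMONIT` is set in the environment that renders the template.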
7 changes: 7 additions & 0 deletions code/scripts/templates/monittemplate.txt
@@ -0,0 +1,7 @@
check process $procname
matching \"$KDBBASEPORT -proctype $proctype -procname $procname\"
start program = \"/bin/bash -c '$startstopsc start $procname'\"
with timeout 10 seconds
stop program = \"/bin/bash -c '$startstopsc stop $procname'\"
every \"* * * * *\"
mode active
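This template is instantiated once per TorQ process. The sketch below shows one way to do that in Python with `string.Template`, whose `$name` placeholder syntax matches the template's. It is a minimal sketch under stated assumptions, not the PR's `fill_templates.sh`: the template text has its shell-level `\"` escapes already removed, the input is a `process.csv`-style table whose header row contains `proctype` and `procname` columns, and the `KDBBASEPORT` value, start-script path and sample rows are hypothetical.

```python
import csv
import io
from string import Template

# Body of monittemplate.txt with the shell-level \" escapes removed;
# string.Template fills the same $name placeholders the template uses.
MONIT_TEMPLATE = Template(
    "check process $procname\n"
    '  matching "$KDBBASEPORT -proctype $proctype -procname $procname"\n'
    "  start program = \"/bin/bash -c '$startstopsc start $procname'\"\n"
    "    with timeout 10 seconds\n"
    "  stop program = \"/bin/bash -c '$startstopsc stop $procname'\"\n"
    '  every "* * * * *"\n'
    "  mode active\n"
)


def build_monit_checks(process_csv: str, kdbbaseport: str, startstopsc: str) -> str:
    """Render one 'check process' stanza per row of a process.csv-style table.

    Assumes a header row that includes 'proctype' and 'procname' columns.
    """
    stanzas = []
    for row in csv.DictReader(io.StringIO(process_csv)):
        stanzas.append(
            MONIT_TEMPLATE.substitute(
                procname=row["procname"],
                proctype=row["proctype"],
                KDBBASEPORT=kdbbaseport,
                startstopsc=startstopsc,
            )
        )
    return "\n".join(stanzas)


# Hypothetical input: a trimmed process.csv with two processes.
sample_csv = (
    "host,port,proctype,procname\n"
    "localhost,{KDBBASEPORT}+1,rdb,rdb1\n"
    "localhost,{KDBBASEPORT}+2,hdb,hdb1\n"
)
print(build_monit_checks(sample_csv, kdbbaseport="6000", startstopsc="/opt/torq/torq.sh"))
```

`Template.substitute` raises `KeyError` if any placeholder is left unfilled, so a missing value fails loudly instead of producing a half-filled `check process` block; `safe_substitute` would be the lenient alternative.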
2 changes: 1 addition & 1 deletion config/settings/gateway.q
@@ -11,5 +11,5 @@ clearinactivetime:0D01:00 // the time to keep inactive handle data

// Server connection details
\d .servers
-CONNECTIONS:`rdb`hdb // list of connections to make at start up
+CONNECTIONS:`rdb`hdb`wdb // list of connections to make at start up
Review comment (Contributor): why is this here?

RETRY:0D00:01 // period on which to retry dead connections. If 0, no reconnection attempts