Initial draft of snapshots size monitoring plugin
CREDIT:

This plugin would simply not have been possible without the help from
@dougm. I'm grateful for both the general feedback (confirming I was
looking in the right direction) and taking the time to craft code
samples based on a review of the PowerCLI (C#) `Get-Snapshot`
implementation.

The logic used in this plugin is heavily inspired by the ideas
presented, but *attempts* to use a slightly different (and probably
less efficient) approach that made more sense to me. The current
implementation is based on an evolution of my understanding of the
API. While I believe the end results between the two implementations
are the same, with further refactoring the implementation used by this
project will probably end up looking nearly the same as the code
examples originally presented.

OVERVIEW:

As with several other plugins in this project, this one borrows
heavily from existing projects. In particular, this plugin was
initially based on a PowerShell / PowerCLI plugin I wrote in 2019.

In short, this plugin attempts to provide size details for each
snapshot for review, but determines the service check result for a VM
from the cumulative size of all of its snapshots, not from individual
snapshot sizes. This differs from the snapshots age plugin, which
checks each snapshot individually to determine the service check
result.

Doc updates have been applied and example usage has been added,
including a command definition "contrib" file illustrating how the
plugin would be referenced within a production Nagios configuration.

The partial size monitoring work started in the snapshots age
monitoring plugin has been completed and is functional as of this set
of changes. Further refactoring and polish are needed, but based on
initial use in our production environment the results appear to match
those provided by PowerCLI `Get-Snapshot`.

OTHER CHANGES:

- Minor tweaks to snapshots age plugin to better mirror new snapshots
  size plugin. The idea is to eventually refactor both to share
  common code instead of replicating between the two.
- Refactoring (more to do) of `internal/vsphere` code used by both
  plugins

REFERENCES:

- refs #4
- refs vmware/govmomi#2243

SEE ALSO:

Note to self: See the following branches for "archival" commits that I
hammered out while trying to understand the API. There is a lot of
cruft and dead ends, but something there may be useful later.

- `ARCHIVE-i4-add-snapshots-size-monitoring-plugin`
  - basically a "dirty" version of the branch holding this commit
- `ARCHIVE-i4-add-snapshots-age-monitoring-plugin`
  - likely contains fragments of functionality from this branch before
    they were yanked to provide a more focused release (using what
    functionality was working at the time for the age-based checks)
- `i4-add-snapshots-monitoring-plugin`
  - what I thought was going to be a combined plugin for age and size
    checks; abandoned, older state than the other branches
atc0005 committed Jan 27, 2021
1 parent db5f7ec commit 79f59b3
Showing 15 changed files with 1,218 additions and 140 deletions.
1 change: 1 addition & 0 deletions .github/workflows/lint-and-build-code.yml
@@ -88,3 +88,4 @@ jobs:
go build -v -mod=vendor ./cmd/check_vmware_hs2ds2vms
go build -v -mod=vendor ./cmd/check_vmware_datastore
go build -v -mod=vendor ./cmd/check_vmware_snapshots_age
go build -v -mod=vendor ./cmd/check_vmware_snapshots_size
1 change: 1 addition & 0 deletions .gitignore
@@ -22,3 +22,4 @@ scratch/
/check_vmware_hs2ds2vms
/check_vmware_datastore
/check_vmware_snapshots_age
/check_vmware_snapshots_size
3 changes: 2 additions & 1 deletion Makefile
@@ -28,7 +28,8 @@ WHAT = check_vmware_tools \
check_vmware_vhw \
check_vmware_hs2ds2vms \
check_vmware_datastore \
check_vmware_snapshots_age
check_vmware_snapshots_age \
check_vmware_snapshots_size \


# What package holds the "version" variable used in branding/version output?
133 changes: 119 additions & 14 deletions README.md
20 changes: 7 additions & 13 deletions cmd/check_vmware_snapshots_age/main.go
@@ -201,15 +201,12 @@ func main() {
for _, vm := range filteredVMs {
vmNames = append(vmNames, vm.Name)
}
log.Debug().
Str("virtual_machines", strings.Join(vmNames, ", ")).
Msg("")
log.Debug().Str("virtual_machines", strings.Join(vmNames, ", ")).Msg("")

log.Debug().Msg("Filter VMs to those with snapshots")
vmsWithSnapshots := vsphere.FilterVMsWithSnapshots(filteredVMs)

log.Debug().Msg("Build snapshot sets for bulk processing")

snapshotSets := make(vsphere.SnapshotSummarySets, 0, len(vmsWithSnapshots))

for _, vm := range vmsWithSnapshots {
@@ -222,10 +219,11 @@ func main() {
vm,
cfg.SnapshotsAgeCritical,
cfg.SnapshotsAgeWarning,

// See atc0005/check-vmware#4,vmware/govmomi#2243
cfg.SnapshotsSizeCritical,
cfg.SnapshotsSizeWarning,

// revisit with GH-76
false,
),
)
}
@@ -235,9 +233,7 @@ func main() {
case snapshotSets.IsAgeCriticalState():

log.Error().
Int("age_days_critical", cfg.SnapshotsAgeCritical).
Int("age_days_warning", cfg.SnapshotsAgeWarning).
Int("snapshots_age_critical", snapshotSets.ExceedsAge(cfg.SnapshotsAgeCritical)).
Int("num_snapshots_age_critical", snapshotSets.ExceedsAge(cfg.SnapshotsAgeCritical)).
Msg("Snapshot sets contain a snapshot which exceeds specified age in days")

nagiosExitState.LastError = vsphere.ErrSnapshotAgeThresholdCrossed
@@ -273,10 +269,8 @@ func main() {
case snapshotSets.IsAgeWarningState():

log.Error().
Int("age_days_critical", cfg.SnapshotsAgeCritical).
Int("age_days_warning", cfg.SnapshotsAgeWarning).
Int("snapshots_age_warning", snapshotSets.ExceedsAge(cfg.SnapshotsAgeWarning)).
Msg("Snapshot sets contain a snapshot which exceeds specified age in days")
Int("num_snapshots_age_warning", snapshotSets.ExceedsAge(cfg.SnapshotsAgeWarning)).
Msg("Snapshot sets contain one or more snapshots which exceed specified age in days")

nagiosExitState.LastError = vsphere.ErrSnapshotAgeThresholdCrossed
