Create plugin to monitor snapshots #4

Closed
7 tasks
atc0005 opened this issue Dec 30, 2020 · 5 comments · Fixed by #94


atc0005 commented Dec 30, 2020

Overview

In the old codebase this was implemented as two plugins:

  • size
  • age

Both plugins allowed excluding individual VMs or resource pools, as did the other plugins in the set. I'm not sure yet whether this project will have two plugins or a shared plugin to handle both items. The check-path project uses a shared plugin approach where monitoring criteria can be specified as needed. If not specified, those thresholds are not checked.

Goals

  • accept CRITICAL/WARNING threshold values (with useful default values)
  • (IncludeRP) allow restricting VMs to select Resource Pools
  • optional User Domain (with automatic selection applied if not given)
  • (ExcludeRP) allow excluding a list of Resource Pools
    • reverse mode where VMs from all pools are checked, except for any VMs in this optional list of Resource Pools
  • (IgnoreVM) allow excluding a list of individual VMs
  • skip cert validation
  • emit ManagedObjectReference ID value in the Long Service Output
    • won't be needed for the vast majority of use cases, but could be useful with troubleshooting work
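
A minimal sketch of how the IncludeRP, ExcludeRP and IgnoreVM goals above might combine. The `VM` type, the `FilterVMs` function and the pool/VM names are hypothetical stand-ins for illustration, not this project's code or the govmomi API, and treating an empty include list as "all pools" is an assumption.

```go
package main

import "fmt"

// VM is a hypothetical, simplified stand-in for a vSphere virtual machine.
type VM struct {
	Name         string
	ResourcePool string
}

// contains reports whether list includes s.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// FilterVMs applies the three lists in order: IncludeRP, ExcludeRP, IgnoreVM.
// Assumption: an empty includeRP list means VMs from all pools are eligible.
func FilterVMs(vms []VM, includeRP, excludeRP, ignoreVM []string) []VM {
	var kept []VM
	for _, vm := range vms {
		if len(includeRP) > 0 && !contains(includeRP, vm.ResourcePool) {
			continue // not in an explicitly included pool
		}
		if contains(excludeRP, vm.ResourcePool) {
			continue // in an excluded pool
		}
		if contains(ignoreVM, vm.Name) {
			continue // individually ignored VM
		}
		kept = append(kept, vm)
	}
	return kept
}

func main() {
	vms := []VM{
		{Name: "web01", ResourcePool: "Production"},
		{Name: "db01", ResourcePool: "Production"},
		{Name: "test01", ResourcePool: "Development"},
	}
	// "Reverse mode": check every pool except Development, and skip db01.
	for _, vm := range FilterVMs(vms, nil, []string{"Development"}, []string{"db01"}) {
		fmt.Println(vm.Name)
	}
}
```

Applying the exclusions after the inclusions keeps the two pool modes from conflicting: a pool that somehow appears in both lists ends up excluded.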

References

@atc0005 atc0005 self-assigned this Dec 30, 2020
@atc0005 atc0005 changed the title from "Create plugin to monitor VMware snapshots" to "Create plugin to monitor snapshots" Dec 30, 2020
@atc0005 atc0005 added this to the v0.3.0 milestone Jan 6, 2021
@atc0005 atc0005 modified the milestones: v0.3.0, v0.4.0 Jan 13, 2021

atc0005 commented Jan 15, 2021

> I'm not sure yet whether this project will have two plugins or a shared plugin to handle both items. The check-path project uses a shared plugin approach where monitoring criteria can be specified as needed. If not specified, those thresholds are not checked.

I'm going to give the shared binary approach a try, but will likely require that only one set (age or size) be used at a time.


atc0005 commented Jan 15, 2021

> I'm going to give the shared binary approach a try, but will likely require that only one set (age or size) be used at a time.

Working on this plugin now.

I'm likely going to stick with one focus per plugin. Thus far the idea of one "thing" per plugin has been followed; I think it's worth keeping the expected pattern as it will likely make the most sense for new users of the project plugins.


atc0005 commented Jan 17, 2021

I've been spinning my wheels on the "size" aspect of this issue for a number of days now and have hit a wall. I'm able to get the size value for snapshotData files, but not for the files which make up the snapshot content. I've reached out on the VMware {code} Slack and to the official project for some guidance. Hopefully one of those paths will yield some insight.

After digging into the simulator, I found this:

https://github.com/vmware/govmomi/blob/50c576d6470e7ab9803ceb466ac34f936971dfc1/simulator/virtual_machine.go#L548

func (vm *VirtualMachine) addSnapshotLayout(snapshot types.ManagedObjectReference, dataKey int32) {
	for _, snapshotLayout := range vm.Layout.Snapshot {
		if snapshotLayout.Key == snapshot {
			return
		}
	}

	var snapshotFiles []string
	for _, file := range vm.LayoutEx.File {
		if file.Key == dataKey || file.Type == "diskDescriptor" {
			snapshotFiles = append(snapshotFiles, file.Name)
		}
	}

	vm.Layout.Snapshot = append(vm.Layout.Snapshot, types.VirtualMachineFileLayoutSnapshotLayout{
		Key:          snapshot,
		SnapshotFile: snapshotFiles,
	})

	vm.updateStorage()
}

func (vm *VirtualMachine) addSnapshotLayoutEx(snapshot types.ManagedObjectReference, dataKey int32, memoryKey int32) {
	for _, snapshotLayoutEx := range vm.LayoutEx.Snapshot {
		if snapshotLayoutEx.Key == snapshot {
			return
		}
	}

	vm.LayoutEx.Snapshot = append(vm.LayoutEx.Snapshot, types.VirtualMachineFileLayoutExSnapshotLayout{
		DataKey:   dataKey,
		Disk:      vm.LayoutEx.Disk,
		Key:       snapshot,
		MemoryKey: memoryKey,
	})

	vm.LayoutEx.Timestamp = time.Now()

	vm.updateStorage()
}

I think the first is for a deprecated type; the latter is applicable here.

The arguments for the addSnapshotLayoutEx method seem to suggest that having the ManagedObjectReference and dataKey values is sufficient to build a comprehensive listing of files associated with a snapshot. I don't know yet whether those files also include the original data, or just the difference (what I'm after).


atc0005 commented Jan 17, 2021

Replicating some notes I posted to the official repo here, for context if nothing else.


Hierarchy of types

Levels above and some below skipped for simplicity and due to my ignorance.

  • VirtualMachine
    • VirtualMachineSnapshotInfo
      • VirtualMachineSnapshotTree
    • VirtualMachineFileLayoutEx
      • VirtualMachineFileLayoutExFileInfo
      • VirtualMachineFileLayoutExSnapshotLayout

Fields expanded

  • vm
    • Name (Used in output)
    • Snapshot
      • RootSnapshotList (VirtualMachineSnapshotTree)
        • Name (Used in output)
        • Id (int32)
        • CreateTime (Used in output)
        • Snapshot (ManagedObjectReference) (e.g., snapshot-229099)
          • Value
            • The specific instance of Managed Object this
              ManagedObjectReference refers to.
            • Links to LayoutEx.Snapshot.Key.Value
        • ChildSnapshotList
          • can be multiple levels deep, or null
    • LayoutEx
      • File
        • type (filtered to snapshotData)
        • Key (e.g., 40)
          • links to LayoutEx.Snapshot.DataKey
        • Size (Used in output; by itself & aggregate)
      • Snapshot
        • DataKey (e.g., 40)
          • links to LayoutEx.File.Key
        • Key (ManagedObjectReference) (e.g., snapshot-229099)
          • Value
            • links to vm.Snapshot.Value

Unfortunately this didn't work. I ended up with the size of the vmsn files associated with the snapshots instead of the snapshots themselves.
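
For context, the linkage described in the field listing can be sketched roughly as follows. The types here are simplified, hypothetical stand-ins for the govmomi types named above (they are not the real structs), and the function names are made up for illustration. Note that this only reproduces the join that yields the snapshotData (vmsn) file size per snapshot, which is the result that fell short; it is not a working snapshot size calculation.

```go
package main

import "fmt"

// SnapshotTree is a hypothetical stand-in for VirtualMachineSnapshotTree.
type SnapshotTree struct {
	Name     string
	MoRef    string // ManagedObjectReference value, e.g. "snapshot-229099"
	Children []SnapshotTree
}

// LayoutFile is a hypothetical stand-in for VirtualMachineFileLayoutExFileInfo.
type LayoutFile struct {
	Key  int32
	Type string
	Size int64
}

// LayoutSnapshot is a hypothetical stand-in for
// VirtualMachineFileLayoutExSnapshotLayout.
type LayoutSnapshot struct {
	MoRef   string
	DataKey int32
}

// Flatten walks the snapshot tree depth-first, since ChildSnapshotList can be
// multiple levels deep.
func Flatten(trees []SnapshotTree) []SnapshotTree {
	var out []SnapshotTree
	for _, t := range trees {
		out = append(out, t)
		out = append(out, Flatten(t.Children)...)
	}
	return out
}

// DataFileSizes maps each snapshot MoRef to the size of its snapshotData file
// by joining tree entry -> LayoutEx.Snapshot (via MoRef) -> LayoutEx.File
// (via DataKey).
func DataFileSizes(trees []SnapshotTree, snaps []LayoutSnapshot, files []LayoutFile) map[string]int64 {
	sizeByKey := make(map[int32]int64)
	for _, f := range files {
		if f.Type == "snapshotData" {
			sizeByKey[f.Key] = f.Size
		}
	}

	dataKeyByRef := make(map[string]int32)
	for _, s := range snaps {
		dataKeyByRef[s.MoRef] = s.DataKey
	}

	sizes := make(map[string]int64)
	for _, t := range Flatten(trees) {
		if key, ok := dataKeyByRef[t.MoRef]; ok {
			sizes[t.MoRef] = sizeByKey[key]
		}
	}
	return sizes
}

func main() {
	trees := []SnapshotTree{{
		Name:  "before-upgrade",
		MoRef: "snapshot-229099",
		Children: []SnapshotTree{
			{Name: "after-upgrade", MoRef: "snapshot-229100"},
		},
	}}
	snaps := []LayoutSnapshot{
		{MoRef: "snapshot-229099", DataKey: 40},
		{MoRef: "snapshot-229100", DataKey: 41},
	}
	files := []LayoutFile{
		{Key: 40, Type: "snapshotData", Size: 28672},
		{Key: 41, Type: "snapshotData", Size: 30720},
		{Key: 42, Type: "diskExtent", Size: 1 << 30},
	}
	for ref, size := range DataFileSizes(trees, snaps, files) {
		fmt.Printf("%s: %d bytes\n", ref, size)
	}
}
```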


I downloaded, compiled and ran the VMSnapshot C# project from the vSphere Management SDK, but it did not report the size of snapshots, so it's no go on using it as a guide for an equivalent govmomi approach.

I pulled this from page 155 of vSphere Web Services SDK Programming Guide - VMware vSphere 7.0 (U1):

File Extension    Usage          File Description
.vmsd             vmname.vmsd    Virtual machine snapshot file.
.vmsn             vmname.vmsn    Virtual machine snapshot data file.
**.delta.vmdk                    Snapshot difference file. A number preceding the extension increases with more snapshots.
**.vmdk                          Metadata about a snapshot.
-Snapshot#.vmsn                  Snapshot of virtual machine memory. Snapshot size is equal to the size of your virtual machine's maximum memory.

This is starting to look like I'll need to match all of those file types in order to calculate snapshot size. The challenge will be tying the specific files back to a specific snapshot ID value. Digging further.
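
A rough sketch of the file-type matching idea, assuming file names alone are enough to bucket files into the categories from the table. The `Classify` function, the category labels and the example file names are hypothetical, and the suffix checks are my reading of the table entries rather than anything confirmed by the API. This does not solve the harder problem of tying a file back to a specific snapshot ID.

```go
package main

import (
	"fmt"
	"strings"
)

// Classify buckets a file name into one of the categories from the table.
// Assumption: matching on name suffixes is sufficient; the order of the cases
// matters because "delta.vmdk" names also end in ".vmdk".
func Classify(name string) string {
	lower := strings.ToLower(name)
	switch {
	case strings.HasSuffix(lower, ".vmsd"):
		return "snapshot file"
	case strings.HasSuffix(lower, ".vmsn") && strings.Contains(lower, "-snapshot"):
		return "memory snapshot"
	case strings.HasSuffix(lower, ".vmsn"):
		return "snapshot data"
	case strings.HasSuffix(lower, "delta.vmdk"):
		return "snapshot difference"
	case strings.HasSuffix(lower, ".vmdk"):
		return "snapshot metadata"
	default:
		return "other"
	}
}

func main() {
	// Example names are made up for illustration.
	names := []string{
		"web01.vmsd",
		"web01.vmsn",
		"web01-Snapshot3.vmsn",
		"web01-000001-delta.vmdk",
		"web01-000001.vmdk",
		"web01.nvram",
	}
	for _, n := range names {
		fmt.Printf("%-26s %s\n", n, Classify(n))
	}
}
```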

I still can't escape the feeling that I'm overlooking something obvious.


atc0005 commented Jan 18, 2021

I've hit a wall on checking the size of a snapshot, but I've got enough to work with to complete an age-based monitoring plugin. I'll do that, then hit pause on further size-based monitoring until I get further feedback on the GH issue I opened in the vmware/govmomi project repo.

atc0005 added a commit that referenced this issue Jan 19, 2021
As with several other plugins in this project, this one borrows
heavily from existing projects. In particular, this plugin
was initially based on a PowerShell / PowerCLI plugin I wrote
in 2019.

Doc updates have been applied, example usage has been added,
including a command definition "contrib" file illustrating
how the plugin would be referenced within a production
Nagios configuration.

Note: Some minor scratch notes from my attempt at crafting
a combined age/size plugin are also included. Those notes
mostly focus on my attempts to understand the process of
determining the size of a snapshot using govmomi and
the vSphere Web Services API.

Partial work towards implementing snapshot size monitoring
has also been included, though it is non-functional at
this time. I hope to return to this once I understand how
the vSphere API (through govmomi) can be used to reliably
determine snapshot size information.

Other small (unrelated) fixes have also been included, including
some bad copy/paste/modify attempts in the README, doc comments,
etc.

- refs GH-4
- refs GH-32
@atc0005 atc0005 modified the milestones: v0.4.0, Future Jan 19, 2021
@atc0005 atc0005 modified the milestones: Future, v0.5.0 Jan 26, 2021
atc0005 added a commit that referenced this issue Jan 27, 2021
CREDIT:

This plugin would simply not have been possible without the help from
@dougm. I'm grateful for both the general feedback (confirming I was
looking in the right direction) and taking the time to craft code
samples based on a review of the PowerCLI (C#) `Get-Snapshot`
implementation.

The logic used in this plugin is heavily inspired by the ideas
presented, but *attempts* to use a slightly different (but probably
less efficient) approach that made more sense to me. The current
implementation is based on an evolution of my understanding of the
API. While I believe the end results between the two implementations
are the same, with further refactoring the implementation used by this
project will probably end up looking nearly the same as the code
examples originally presented.

OVERVIEW:

As with several other plugins in this project, this one borrows
heavily from existing projects. In particular, this plugin was
initially based on a PowerShell / PowerCLI plugin I wrote in 2019.

In short, this plugin attempts to provide snapshot size details per
snapshot for review, but evaluates size of snapshots for a VM based on
the total of all snapshot size values, not individual values. This
differs from the snapshots age plugin, which checks each snapshot
individually to determine the service check result.
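
A minimal sketch of the difference in evaluation described above, assuming
Nagios-style OK/WARNING/CRITICAL states and "greater than or equal to"
threshold comparisons. The `Snapshot` type, the `SizeState` and `AgeState`
functions and the `Day` constant are hypothetical names for illustration and
are not taken from the plugin code.

```go
package main

import (
	"fmt"
	"time"
)

// Day is a convenience constant for expressing age thresholds.
const Day = 24 * time.Hour

// Snapshot is a hypothetical, simplified view of one snapshot.
type Snapshot struct {
	Name string
	Size int64         // bytes
	Age  time.Duration // time since the snapshot was created
}

// SizeState evaluates the total size of all snapshots for a VM, not the
// individual values.
func SizeState(snaps []Snapshot, warn, crit int64) string {
	var total int64
	for _, s := range snaps {
		total += s.Size
	}
	switch {
	case total >= crit:
		return "CRITICAL"
	case total >= warn:
		return "WARNING"
	default:
		return "OK"
	}
}

// AgeState checks each snapshot individually and reports the worst result.
func AgeState(snaps []Snapshot, warn, crit time.Duration) string {
	state := "OK"
	for _, s := range snaps {
		if s.Age >= crit {
			return "CRITICAL"
		}
		if s.Age >= warn {
			state = "WARNING"
		}
	}
	return state
}

func main() {
	snaps := []Snapshot{
		{Name: "before-upgrade", Size: 10 << 30, Age: 2 * Day},
		{Name: "after-upgrade", Size: 20 << 30, Age: time.Hour},
	}
	// Neither snapshot reaches the 25 GiB warning threshold on its own,
	// but the 30 GiB total does.
	fmt.Println("size:", SizeState(snaps, 25<<30, 50<<30))
	fmt.Println("age: ", AgeState(snaps, Day, 3*Day))
}
```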

Doc updates have been applied, example usage has been added, including
a command definition "contrib" file illustrating how the plugin would
be referenced within a production Nagios configuration.

The partial size monitoring work that shipped with the snapshots age
plugin has been completed and is functional as of this set of changes.
Further refactoring and polish are needed, but based on initial use in
our production environment the results appear to match those reported
by PowerCLI `Get-Snapshot`.

OTHER CHANGES:

- Minor tweaks to snapshots age plugin to better mirror new snapshots
  size plugin. The idea is to eventually refactor both to share
  common code instead of replicating between the two.
- Refactoring (more todo) of `internal/vsphere` code used by both
  plugins

REFERENCES:

- refs #4
- refs vmware/govmomi#2243

SEE ALSO:

Note to self: See the following branches for "archival" commits that I
hammered out while trying to understand the API. There is a lot of
cruft and dead ends, but something there may be useful later.

- `ARCHIVE-i4-add-snapshots-size-monitoring-plugin`
  - basically a "dirty" version of the branch holding this commit
- `ARCHIVE-i4-add-snapshots-age-monitoring-plugin`
  - likely contains fragments of functionality from this branch before
    they were yanked to provide a more focused release (using what
    functionality was working at the time for the age-based checks)
- `i4-add-snapshots-monitoring-plugin`
  - what I thought was going to be a combined plugin for age and size
    checks; abandoned, older state than the other branches

Squash