Compare old/new plugin output for missing details #32

Closed
atc0005 opened this issue Jan 5, 2021 · 6 comments

@atc0005
Owner

atc0005 commented Jan 5, 2021

Off the top of my head, I'm thinking of the CRITICAL and WARNING threshold details shown in the one-line summary output of the older plugins. Those make it easy to see at a glance why a Service Check has been determined to be in a non-OK state.

@atc0005 atc0005 added this to the v0.1.0 milestone Jan 5, 2021
@atc0005 atc0005 self-assigned this Jan 5, 2021
@atc0005 atc0005 modified the milestones: v0.1.0, Future, v0.6.0 Jan 6, 2021
@atc0005 atc0005 pinned this issue Jan 6, 2021
@atc0005
Owner Author

atc0005 commented Jan 7, 2021

As an example, here is what the PowerCLI plugin uses for its one-line summary (not the whole output string):

$usageLevelSummary = "vCPU allocation for powered VMs is $($vCPUsPercentageUsedOfAllowed)% of $($MaxVCPUsAllowed) ($($vCPUsRemaining) remaining) [WARNING: $($WarningUse)% , CRITICAL: $($CriticalUse)%]"

and here is real production output for the VMs in a specific resource pool:

OK: vCPU allocation for powered VMs is 70% of 20 (6 remaining) [WARNING: 95% , CRITICAL: 97%]

The new plugin should provide this info in 1:1 form, or in a modified form that better communicates the details.
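For comparison, here is a minimal Go sketch of what a 1:1 port of that summary line could look like. The function name `summaryWithThresholds` and its parameters are hypothetical, not taken from the plugin, and the sketch assumes `maxAllowed` is greater than zero.

```go
package main

import "fmt"

// summaryWithThresholds is a hypothetical helper (not taken from the plugin)
// that reproduces the PowerCLI-style one-line summary, thresholds included.
// It assumes maxAllowed is greater than zero.
func summaryWithThresholds(state string, used, maxAllowed, warning, critical int) string {
	remaining := maxAllowed - used
	percentUsed := float64(used) / float64(maxAllowed) * 100

	return fmt.Sprintf(
		"%s: vCPU allocation for powered VMs is %.0f%% of %d (%d remaining) [WARNING: %d%% , CRITICAL: %d%%]",
		state, percentUsed, maxAllowed, remaining, warning, critical,
	)
}

func main() {
	// Values mirror the production output quoted above.
	fmt.Println(summaryWithThresholds("OK", 14, 20, 95, 97))
}
```

With the values from the production example (14 of 20 vCPUs, thresholds of 95% and 97%) this should yield the same line as the PowerCLI plugin.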

@atc0005
Owner Author

atc0005 commented Jan 11, 2021

Here is some live output from the v0.1.0 (and likely v0.1.1) release of the vCPUs allocation plugin:

OK: 14 vCPUs allocated (70.0%): 6 more remaining from 20 allowed (evaluated 5 VMs, 1 Resource Pools)
OK: 115 vCPUs allocated (71.9%): 45 more remaining from 160 allowed (evaluated 61 VMs, 4 Resource Pools)

The question is whether that format is any better. We don't explicitly note the WARNING and CRITICAL thresholds in the one-line summary, but we do list how many VMs and how many Resource Pools have been evaluated. The hope is that having those counts right there in the summary will help pinpoint configuration issues more quickly than a reminder of what the thresholds are.

The threshold values are shown in the Long Service Output in the web UI like so:

Service State Information
Current Status:	  OK   (for 0d 15h 5m 4s)
Status Information:	OK: 115 vCPUs allocated (71.9%): 45 more remaining from 160 allowed (evaluated 61 VMs, 4 Resource Pools)

**ERRORS**

* None

**THRESHOLDS**

* CRITICAL: 100% of 160 vCPUs allocated
* WARNING: 97% of 160 vCPUs allocated

**DETAILED INFO**

* vCPUs
** Allocated: 115 (71.9%)
** Max Allowed: 160

This seems like a fair compromise?
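To make the compromise concrete, here is a rough Go sketch of that split: counts in the one-line summary, thresholds in a separate section of the Long Service Output. The `vcpuReport` type, its fields, and the "greater than or equal" threshold comparison are all assumptions for illustration, not the plugin's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// vcpuReport holds hypothetical inputs for one service check run.
// It assumes MaxAllowed is greater than zero.
type vcpuReport struct {
	Allocated     int
	MaxAllowed    int
	WarningPct    int
	CriticalPct   int
	VMs           int
	ResourcePools int
}

// percentAllocated returns allocated vCPUs as a percentage of the maximum.
func (r vcpuReport) percentAllocated() float64 {
	return float64(r.Allocated) / float64(r.MaxAllowed) * 100
}

// state maps the percentage onto a Nagios state label. Treating a value equal
// to a threshold as having crossed it is an assumption.
func (r vcpuReport) state() string {
	pct := r.percentAllocated()
	switch {
	case pct >= float64(r.CriticalPct):
		return "CRITICAL"
	case pct >= float64(r.WarningPct):
		return "WARNING"
	default:
		return "OK"
	}
}

// oneLineSummary lists what was evaluated instead of the thresholds.
func (r vcpuReport) oneLineSummary() string {
	return fmt.Sprintf(
		"%s: %d vCPUs allocated (%.1f%%): %d more remaining from %d allowed (evaluated %d VMs, %d Resource Pools)",
		r.state(), r.Allocated, r.percentAllocated(),
		r.MaxAllowed-r.Allocated, r.MaxAllowed, r.VMs, r.ResourcePools,
	)
}

// thresholdsSection renders the thresholds for the Long Service Output.
func (r vcpuReport) thresholdsSection() string {
	var b strings.Builder
	b.WriteString("**THRESHOLDS**\n\n")
	fmt.Fprintf(&b, "* CRITICAL: %d%% of %d vCPUs allocated\n", r.CriticalPct, r.MaxAllowed)
	fmt.Fprintf(&b, "* WARNING: %d%% of %d vCPUs allocated\n", r.WarningPct, r.MaxAllowed)
	return b.String()
}

func main() {
	r := vcpuReport{Allocated: 115, MaxAllowed: 160, WarningPct: 97, CriticalPct: 100, VMs: 61, ResourcePools: 4}
	fmt.Println(r.oneLineSummary())
	fmt.Println()
	fmt.Print(r.thresholdsSection())
}
```

Keeping the thresholds in their own section means the summary stays short while the values are still one click away in the web UI.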

@atc0005
Owner Author

atc0005 commented Jan 11, 2021

One thing that isn't clear is whether the evaluated VMs are powered on or not. Neither the one-line summary nor the Long Service Output listing says.

This should probably be noted for all plugins which allow filtering on power status. The same goes for any other explicit evaluation criteria toggled by the sysadmin configuring the service check command definition. Choices there should be explicitly noted in the Long Service Output, if not in the one-line summary.
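One possible shape for this, sketched in Go: collect the choices made in the command definition and render them as an extra section in the Long Service Output. The `evalCriteria` type, its field names, and the "EVALUATION CRITERIA" heading are hypothetical; only the `**HEADING**` and `* item` layout is borrowed from the existing output.

```go
package main

import (
	"fmt"
	"strings"
)

// evalCriteria is a hypothetical record of the choices a sysadmin made in the
// service check command definition. Field names are illustrative only.
type evalCriteria struct {
	IncludePoweredOff bool
	IncludedPools     []string
	ExcludedPools     []string
}

// listOrNone renders a list of names, or "None" when the list is empty.
func listOrNone(names []string) string {
	if len(names) == 0 {
		return "None"
	}
	return strings.Join(names, ", ")
}

// section renders the criteria using the same "**HEADING**" and "* item"
// layout as the other Long Service Output sections.
func (c evalCriteria) section() string {
	var b strings.Builder
	b.WriteString("**EVALUATION CRITERIA**\n\n")
	fmt.Fprintf(&b, "* Powered off VMs evaluated: %t\n", c.IncludePoweredOff)
	fmt.Fprintf(&b, "* Included Resource Pools: %s\n", listOrNone(c.IncludedPools))
	fmt.Fprintf(&b, "* Excluded Resource Pools: %s\n", listOrNone(c.ExcludedPools))
	return b.String()
}

func main() {
	c := evalCriteria{IncludedPools: []string{"Development"}}
	fmt.Print(c.section())
}
```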

atc0005 added a commit that referenced this issue Jan 19, 2021
As with several other plugins in this project, this one borrows
heavily from existing projects. In particular, this plugin
was initially based on a PowerShell / PowerCLI plugin I wrote
in 2019.

Doc updates have been applied, example usage has been added,
including a command definition "contrib" file illustrating
how the plugin would be referenced within a production
Nagios configuration.

Note: Some minor scratch notes from my attempt at crafting
a combined age/size plugin are also included. Those notes
mostly focus on my attempts to understand the process of
determining the size of a snapshot using govmomi and
the vSphere Web Services API.

Partial work towards implementing snapshot size monitoring
has also been included, though it is non-functional at
this time. I hope to return to this once I understand how
the vSphere API (through govmomi) can be used to reliably
determine snapshot size information.

Other small (unrelated) fixes have also been included, including
some bad copy/paste/modify attempts in the README, doc comments,
etc.

- refs GH-4
- refs GH-32
@atc0005 atc0005 unpinned this issue Jan 26, 2021
@atc0005
Owner Author

atc0005 commented Feb 1, 2021

Working on deploying updated plugins based on a build of the current master branch. The work from #107 in particular is up for extended testing.

#69 provided a new plugin which omits the [WARNING: 97% , CRITICAL: 99%] values displayed by the PowerCLI plugin. Not sure yet if that is a "problem".

From a different service check: 0.96% of total capacity.

That's not included in the new plugin's output.

EDIT:

Probably easier to just include an entire line for completeness:

OK: Memory usage is at 93.89% of 40 GB allowed (2.45 GB remaining), 0.96% of total capacity. [WARNING: 101% , CRITICAL: 110%]

@atc0005
Owner Author

atc0005 commented Feb 1, 2021

OK: Memory usage is at 93.89% of 40 GB allowed (2.45 GB remaining), 0.96% of total capacity. [WARNING: 101% , CRITICAL: 110%]

The 0.96% of total capacity remark seems to be computed using these bits of PowerCLI logic:

$poolDetails = @{
    "name" = $_.Name;
    "cpuActive" = ($_.Runtime.Cpu.OverallUsage / 1000);
    "memoryConsumed" = ($_.Runtime.Memory.OverallUsage / 1GB);
    "memoryTotal" = ($_.Runtime.Memory.MaxUsage / 1GB)
}

and

# This property is attached to each entry in the pool; fetch value from first
# array entry.
if ($detailedPools.Count -gt 0) {
    $totalMemoryAvailable = $detailedPools[0].memoryTotal
}

$memoryPercentageAllowed = [math]::Round(($totalMemoryUsed / $MaxMemoryAllowed) * 100, 2)
$memoryPercentageTotalCapacity = [math]::Round(($totalMemoryUsed / $totalMemoryAvailable) * 100, 2)
$memoryRemaining = [math]::Round(($MaxMemoryAllowed - $totalMemoryUsed), 2)
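For reference, the three calculations above could be expressed in Go roughly as follows. This is a sketch, not the new plugin's code: `summarizeMemory`, `roundTo`, and the `memoryUsage` type are hypothetical names, and the input values in `main` are made up. One difference worth knowing about is that `[math]::Round` rounds midpoints to the nearest even digit by default, while Go's `math.Round` rounds them away from zero, so results can differ in the last digit.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// roundTo rounds value to the given number of decimal places.
func roundTo(value float64, places int) float64 {
	factor := math.Pow(10, float64(places))
	return math.Round(value*factor) / factor
}

// memoryUsage mirrors the three values computed by the PowerCLI snippet.
type memoryUsage struct {
	PercentOfAllowed       float64
	PercentOfTotalCapacity float64
	RemainingGB            float64
}

// summarizeMemory is a hypothetical port of the PowerCLI calculations. Unlike
// the original, it rejects zero or negative denominators up front.
func summarizeMemory(usedGB, maxAllowedGB, totalAvailableGB float64) (memoryUsage, error) {
	if maxAllowedGB <= 0 || totalAvailableGB <= 0 {
		return memoryUsage{}, errors.New("max allowed and total available memory must be greater than zero")
	}

	return memoryUsage{
		PercentOfAllowed:       roundTo(usedGB/maxAllowedGB*100, 2),
		PercentOfTotalCapacity: roundTo(usedGB/totalAvailableGB*100, 2),
		RemainingGB:            roundTo(maxAllowedGB-usedGB, 2),
	}, nil
}

func main() {
	// Illustrative values only; not taken from a real environment.
	usage, err := summarizeMemory(30, 40, 3000)
	if err != nil {
		fmt.Println("UNKNOWN:", err)
		return
	}

	fmt.Printf(
		"Memory usage is at %.2f%% of 40 GB allowed (%.2f GB remaining), %.2f%% of total capacity\n",
		usage.PercentOfAllowed, usage.RemainingGB, usage.PercentOfTotalCapacity,
	)
}
```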

Per the Data Object - ResourcePoolResourceUsage(vim.ResourcePool.ResourceUsage) doc, this is what the maxUsage field is about:

| NAME | TYPE | DESCRIPTION |
| --- | --- | --- |
| maxUsage | xsd:long | Current upper-bound on usage. The upper-bound is based on the limit configured on this resource pool, as well as limits configured on any parent resource pool. |

It may be that I was only able to compute the total memory available in the cluster because the memory limit on the pool was unlimited? That doesn't seem like a reliable way to report the overall percentage of memory consumed from the cluster. Instead you'd have to get the list of hosts, tally their total memory, then calculate the percentage per pool and in aggregate.

If there are pool caps, that would need to factor in somehow?
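A rough Go sketch of the host-tally approach, using plain structs in place of real vSphere objects. The `host` and `pool` types and their fields are hypothetical, how the values would be retrieved (for example through govmomi) is left out, and the open question about pool caps is not addressed here.

```go
package main

import "fmt"

// host and pool are hypothetical stand-ins for the data that would be
// retrieved from vSphere; retrieval itself is out of scope for this sketch.
type host struct {
	Name        string
	MemoryBytes int64
}

type pool struct {
	Name            string
	MemoryUsedBytes int64
}

// totalHostMemory tallies the physical memory across all hosts.
func totalHostMemory(hosts []host) int64 {
	var total int64
	for _, h := range hosts {
		total += h.MemoryBytes
	}
	return total
}

// percentOf returns part as a percentage of whole, or 0 if whole is not positive.
func percentOf(part, whole int64) float64 {
	if whole <= 0 {
		return 0
	}
	return float64(part) / float64(whole) * 100
}

func main() {
	const gib = int64(1) << 30

	// Illustrative values only.
	hosts := []host{{"esx1", 256 * gib}, {"esx2", 256 * gib}}
	pools := []pool{{"Development", 64 * gib}, {"Production", 64 * gib}}

	clusterTotal := totalHostMemory(hosts)

	var usedByAllPools int64
	for _, p := range pools {
		usedByAllPools += p.MemoryUsedBytes
		fmt.Printf("%s: %.2f%% of total capacity\n", p.Name, percentOf(p.MemoryUsedBytes, clusterTotal))
	}

	fmt.Printf("All pools: %.2f%% of total capacity\n", percentOf(usedByAllPools, clusterTotal))
}
```

Because the denominator comes from the hosts rather than from a pool's `maxUsage`, the result should not depend on whether a pool's limit happens to be unlimited.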

@atc0005
Owner Author

atc0005 commented Feb 1, 2021

> The question is whether that format is any better.

I'm biased, but I like the new format better. Overall I think I've met the original intent of this issue. #110 was spun off to handle testing the addition of reporting the percentage of memory used from total cluster capacity, and I can spin off new issues for anything not already covered.

Considering this resolved.

@atc0005 atc0005 closed this as completed Feb 1, 2021
@atc0005 atc0005 added output/extended Long Service Output (aka, "extended" or "detailed") output/summary Service Output (aka, "one-line-summary") labels Feb 2, 2021