Concept

The folks at Stack Exchange built a monitoring tool that gave me some ideas. I like that it gives just one line of status for each server, showing the most critical things you need to know about a server's overall health.

I want to create something similar, using PowerShell, but based on different perfmon counters - the ones from my old, old Perfmon Top Ten Counters article. You read these graphs a little differently than the standard %cpu graph everyone loves, but once you grok it, it's a snap. Because it's super-simple: a flat line at the bottom of the graph means the resource is not over-utilized. Anything above zero is a bottleneck.

Here's a recent screenshot of PSPerf running against a few systems in my home: (screenshot: psperf)

Many status graphs require the reader to know a lot about systems administration to understand what they are seeing. But to read the line graphs in this chart, the main thing you need to know is:

### Low values good, high values bad.

These counters don't require a person looking at the graph to know how much CPU, RAM, disk, or network bandwidth a system has. The idea here is simply to show whether a system is clearly overtaxed (that is, using more resources than are available to it), so as to be able to judge health over time. A system which remains overtaxed over a period of days or weeks would be a candidate for resource balancing: that is, either remove some workload, or upgrade the RAM/disk/CPU/bandwidth available to it.
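To make the "overtaxed over time" idea concrete, here is a minimal sketch of how that judgment could be computed from a series of queue-length samples. It is written in Python purely to illustrate the logic and is not PSPerf's actual PowerShell code. The function names and the 50% threshold are my own assumptions for the example, not values PSPerf uses.

```python
def overtaxed_fraction(queue_samples):
    """Return the share of samples in which work was waiting (queue > 0)."""
    if not queue_samples:
        return 0.0
    waiting = sum(1 for q in queue_samples if q > 0)
    return waiting / len(queue_samples)


def needs_rebalancing(queue_samples, threshold=0.5):
    """Flag a resource as a rebalancing candidate.

    threshold is a hypothetical cutoff chosen for this example:
    the fraction of samples that must show a non-empty queue.
    """
    return overtaxed_fraction(queue_samples) >= threshold


# Eight made-up samples; three of them show queued work.
samples = [0, 0, 3, 12, 0, 7, 0, 0]
print(overtaxed_fraction(samples))
print(needs_rebalancing(samples))
```

Note that nothing here needs to know how much CPU, RAM, or disk the server has. The only question asked is whether work was waiting.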

PSPerf will create the web page and update it at a settable time interval. It can be placed on any web server that PowerShell can write a file to.

The graphs tell a story. Each graph-line shows the last 24 hours of operation for that server; performance is checked every 5 minutes. As you look at this graph, it's important to remember that this is not like the %cpu counter. A CPU at 99% utilization with a queue length of 0 is fine; it's handling all the work it is given, as the work comes in. No work is lining up and waiting while the CPU is busy with other tasks. Conversely, CPU queues higher than 10 signify a lot of work sitting around waiting for the CPU.
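Twenty-four hours of samples at one every 5 minutes works out to 288 points per graph line. The sketch below shows one way such a rolling window could be kept, plus a rough reading of a single CPU queue-length sample. It is Python for illustration only, not PSPerf's PowerShell implementation. The names are hypothetical, and the only threshold taken from the text above is "higher than 10".

```python
from collections import deque

SAMPLE_INTERVAL_MINUTES = 5
WINDOW_HOURS = 24
# 24 hours * 60 minutes / 5 minutes per sample = 288 samples
WINDOW_SIZE = WINDOW_HOURS * 60 // SAMPLE_INTERVAL_MINUTES


def new_history():
    """Create a buffer that silently drops the oldest sample when full."""
    return deque(maxlen=WINDOW_SIZE)


def describe_queue(queue_length):
    """Rough reading of one CPU queue-length sample (labels are illustrative)."""
    if queue_length == 0:
        return "keeping up"
    if queue_length > 10:
        return "a lot of work waiting"
    return "some work waiting"


history = new_history()
for reading in range(300):  # 300 fake readings, more than the window holds
    history.append(reading)

print(len(history))       # the window never grows past WINDOW_SIZE
print(describe_queue(0))
print(describe_queue(25))
```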

All graphs on this page work like that. When any line comes up from the "floor" of the graph, that resource is literally overloaded. It's being asked to do more work than it can instantly do, so there is a queue of work items waiting to be serviced by the resource.

Ideally all graphs would flatline at the bottom all the time, showing that no resource is being asked to work harder than it can, which would create 'wait states' for whatever services run on this server. But of course the real world never seems to work that way! It's just like the supermarket: sometimes you walk right up to a cash register where the checker has no other customers and can instantly serve you. Other times, you wait awhile in line before you can pay for your groceries. As system administrators, it's our goal to have short lines waiting to be served by the CPU, memory, and disk resources in a server. But money is not infinite, so we can't always over-provision our servers, and graphs get spiky.

Disk objects also have bar-charts showing the percentage of disk space used.
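As a rough illustration of the disk-space bars, the sketch below computes percent used and renders it as a plain-text bar. It is Python for illustration, not how PSPerf draws its charts. The helper names and the 20-character bar width are assumptions made for the example; `shutil.disk_usage` is the standard-library call used to read a volume's totals.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return disk space used as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def text_bar(percent, width=20):
    """Render a percentage as a fixed-width text bar (width is illustrative)."""
    percent = max(0.0, min(100.0, percent))  # clamp to the 0..100 range
    filled = round(width * percent / 100)
    return "#" * filled + "-" * (width - filled)


# Read the volume that holds the current directory.
usage = shutil.disk_usage(".")
pct = percent_used(usage.total, usage.used)
print(f"{text_bar(pct)} {pct:.1f}% used")
```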
