Cache VDEV Enumeration and Small Suggestions #1
Hey, thanks! I'm glad you're getting some use out of it; I'm surprised how quickly anyone noticed this repo. 0.7.0 as a version was a bit arbitrary, based on guidelines from the Managing ZenPacks document, and it's the "seems to work with my one box" release. I think the cache device enumeration problem in the ZPool modeler came from trying to pull the device name from a non-matching regex and should be fixed now. Mind grabbing the 0.7.1 egg and giving it a shot? I really like your ideas and will add thresholds and "ignore" zProperties in the future. Could you elaborate on what you mean by DS usage? Any other feedback or input you've got would be appreciated, too!
Thanks for getting back so quick. DS usage being dataset usage. Far as io
I've added zProperties for ignoring datasets based on name/type and pool names. There's a section about it in the readme, but the types are pretty straightforward. Also included 80% (sev 3) and 90% (sev 4) capacity thresholds in the ZPool performance template. Mind seeing if 0.7.2 works for you?
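As a rough illustration of how name/type-based "ignore" zProperties like these tend to work, here's a minimal sketch. The property names and patterns below are assumptions for the example, not the ZenPack's actual property names:

```python
import re

# Hypothetical ignore lists, mimicking device-level zProperties
# (names here are illustrative, not the ZenPack's real properties)
zZFSDatasetIgnoreNames = [r'^rpool/swap', r'/backup$']
zZFSDatasetIgnoreTypes = ['snapshot']

def ignore_dataset(name, ds_type):
    """Return True if a dataset should be skipped during modeling."""
    if ds_type in zZFSDatasetIgnoreTypes:
        return True
    return any(re.search(pattern, name)
               for pattern in zZFSDatasetIgnoreNames)

print(ignore_dataset('tank/home@daily', 'snapshot'))  # True
print(ignore_dataset('tank/home', 'filesystem'))      # False
```

Type matching is exact while name matching is regex-based, so a single pattern like `^rpool/swap` can exclude a whole subtree without listing each dataset.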
Seems to work, but probably want to drop the debug output (ignoring X) so
Ah, yeah, log.info() probably would generate a lot of noise with the number of snaps you've got. I'll crank that down to debug.
This looks a lot better; now we're down to warnings being generated about command execution timeouts.
Seeing the same for zpool-get and zpool-iostat on a system with tens of thousands of snaps (zfs send target for multiple SAN systems).
AFAIK, the zencommand daemon's handling the SSH connections, datasource command executions, and returning the output to the datasources' parsers. Right now, the zfs-get, zpool-get, and zpool-iostat datasources execute every minute. The -gets could probably be okay with every 5 minutes, but since the zpool-iostat datasource is a point-in-time gauge, it's probably not useful at a 5-min interval. "zpool iostat" is probably a little less intensive than "zfs get all", too.

Might be worth checking whether the zSshConcurrentSessions value is less than what MaxSessions is configured to on your server, assuming it's running OpenSSH. Default's 10 for both. I also understand that there's a bug in Zenoss 5.1.5, to be fixed in 5.1.7, where zencommand just keeps adding to its queue if an SSH session times out. I'm still on 4.2.5.

All the datasource commands are the same no matter the component, so zencommand should just be running each once, but that also means the zfs.get parser is spending time processing output for ignored DSes. Might be better if the zfs-get datasource passes the ZFSDataset component title to zfs, so it runs one for each. Probably a little more SSH traffic, but it might finish faster if it's not asking for stats on snaps you're not modeling. I'll see if I can make that change soon.
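To make the bulk-versus-per-component idea concrete, here's a sketch contrasting the two command shapes. The use of a `${here/title}` TALES expression to pass each component's title into the datasource command is an assumption about how this change might look, and the expansion function below is a stand-in for illustration, not Zenoss's actual TALES engine:

```python
# One bulk query: returns properties for every dataset, including
# snapshots the modeler has been told to ignore
bulk_command = 'zfs get -Hp all'

# Per-component: the TALES expression would expand to each modeled
# ZFSDataset's title, so the parser only sees relevant output
per_component_template = 'zfs get -Hp all ${here/title}'

def render(template, component_title):
    # Minimal stand-in for TALES expansion (illustrative only)
    return template.replace('${here/title}', component_title)

print(render(per_component_template, 'tank/home'))
# zfs get -Hp all tank/home
```

The trade-off described above falls out of this: one SSH exec per component instead of one total, but each invocation's output is small and needs no filtering in the parser.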
This is all 4.2.5, so no feedback on the 5.1 stuff.
for all 3 properties, registering as warnings on the Zenoss host (not the ZFS host).
Yeah, I got a few of the same overnight. Small number, though, not from every collection cycle. Odd. Digging around in zendmd, the device's zProperties are inherited by the component, so I've changed the ZFSStoragePool template's thresholds to
If the performance templates are working for you, I'd like to close this issue. Or are you still getting timeouts? As for ZVol I/O, would you mind opening a separate issue for it? I don't honestly know if it's something I'll be able to implement, but it might not be a bad idea to have it in its own thread.
Thank you for this zenpack - it's a lifesaver in our environment. At the latest version (0.7.0), cache drive vdev enumeration fails; I've had to comment it out (https://github.com/daviswr/ZenPacks.daviswr.ZFS/blob/master/ZenPacks/daviswr/ZFS/modeler/plugins/daviswr/cmd/ZPool.py#L153). I'll spin up a lab system to replicate the error, but it's along the lines of "NoneType has no member named 'dev'".
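An error like that usually means a regex failed to match and the code went straight to the match object's groups. A minimal sketch of that failure mode and the guard that avoids it (the pattern and line formats here are hypothetical, not the modeler's actual regex):

```python
import re

# Hypothetical vdev-line pattern: leading whitespace, then a device
# name and status columns. A header line like "cache" won't match.
CACHE_DEV_RE = re.compile(r'^\s+(?P<dev>\S+)\s+\S+')

def parse_vdev(line):
    match = CACHE_DEV_RE.match(line)
    if match is None:
        # Without this guard, match.group('dev') on a non-matching
        # line raises an AttributeError on NoneType
        return None
    return match.group('dev')

print(parse_vdev('  sdb    ONLINE   0'))  # sdb
print(parse_vdev('cache'))                # None
```

Skipping non-matching lines (rather than assuming every line matches) keeps the modeler from crashing when `zpool status` output includes section headers or unexpected formats.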
Separately, I've added local thresholds for pool capacity notification - may be useful to have them in the zenpack. A pool at 90% is something to be concerned about (especially with automated snapshots or heavy use). Also, it would be very useful to have a configuration option to disable enumeration of snapshots. Some of our systems have thousands of snapshots across datasets, and it gets painful pretty quick (we are only monitoring pools for now anyway, but DS usage and ZVOL IO would be nice).