Assertion failed: ksp != NULL, file common.c, line 640 #71
Same thing today.
With --enable-debug this time:
[2012-04-13 16:53:45] kstat chain has been updated
[2012-04-13 16:53:45] plugin_dispatch_values: time = 1334350425.059; interval = 30.000; host = cairo.our.org; plugin = memory; plugin_instance = ; type = memory; type_instance = kernel;
[2012-04-13 16:53:45] uc_update: cairo.our.org/swap/swap-used: ds[0] = 911417344.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] uc_update: cairo.our.org/memory/memory-kernel: ds[0] = 512491520.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] uc_update: cairo.our.org/cpu-0/cpu-user: ds[0] = 0.566302
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] uc_update: cairo.our.org/nfs-v2client/nfs_procedure-null: ds[0] = 0.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] write_graphite plugin: [rcf-metrics]:2003 buf 452/1428 (31.7 %) "cairo_our_org.collectd.zfs_arc.cache_eviction-eligible.value 0 1334350425 "
[2012-04-13 16:53:45] plugin_dispatch_values: time = 1334350425.060; interval = 30.000; host = cairo.our.org; plugin = zfs_arc; plugin_instance = ; type = cache_eviction; type_instance = ineligible;
[2012-04-13 16:53:45] uc_update: cairo.our.org/zfs_arc/cache_eviction-ineligible: ds[0] = 0.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] write_graphite plugin: [rcf-metrics]:2003 buf 529/1428 (37.0 %) "cairo_our_org.collectd.nfs-v2client.nfs_procedure-null.value 0 1334350425 "
[2012-04-13 16:53:45] plugin_dispatch_values: time = 1334350425.061; interval = 30.000; host = cairo.our.org; plugin = nfs; plugin_instance = v2client; type = nfs_procedure; type_instance = getattr;
[2012-04-13 16:53:45] write_graphite plugin: [rcf-metrics]:2003 buf 608/1428 (42.6 %) "cairo_our_org.collectd.zfs_arc.cache_eviction-ineligible.value 0 1334350425 "
[2012-04-13 16:53:45] uc_update: cairo.our.org/nfs-v2client/nfs_procedure-getattr: ds[0] = 0.000000
Assertion failed: ksp != NULL, file common.c, line 640
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
Abort (core dumped)
More data. Note the NaNs:
…odes. Rather than asserting that an argument is not NULL, check this condition and return an error code. This should fix Github issue collectd#71.
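To illustrate the idea in that commit message, here is a minimal sketch of checking for NULL and returning an error code instead of asserting. The types `fake_named_t` and `fake_kstat_t` and the function `fake_get_value` are hypothetical stand-ins invented for this example. They are not the real kstat API and not the actual code in common.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kstat types; NOT the real <kstat.h> API. */
typedef struct {
    const char *name;
    long long value;
} fake_named_t;

typedef struct {
    const fake_named_t *entries;
    size_t count;
} fake_kstat_t;

/*
 * Look up a named value. Returns -1 if ksp or name is NULL, or if the
 * name is not present, instead of aborting the process via assert().
 */
long long fake_get_value(const fake_kstat_t *ksp, const char *name)
{
    if (ksp == NULL || name == NULL) {
        fprintf(stderr, "fake_get_value: ksp or name is NULL\n");
        return -1LL;
    }

    for (size_t i = 0; i < ksp->count; i++) {
        if (strcmp(ksp->entries[i].name, name) == 0)
            return ksp->entries[i].value;
    }

    return -1LL; /* name not found */
}
```

With this shape, a caller that passes a NULL handle gets -1 back and can log the failure and skip the value, so the daemon keeps running. One caveat of the sketch: -1 doubles as the error value, so it only suits counters that are never legitimately negative.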
For convenience, I'll copy the information you gave on the pull request to here:
Interestingly, this starts with the value "deleted" every time the problem comes up. This is from the following lines in the ZFS ARC plugin:
/* Operations */
za_read_derive (ksp, "allocated", "cache_operation", "allocated");
za_read_derive (ksp, "deleted", "cache_operation", "deleted");
za_read_derive (ksp, "stolen", "cache_operation", "stolen");
So reading the field "allocated" works, while reading "deleted" fails. But why doesn't the ZFS ARC plugin print an error message? When printing that message, Can you try the following in the command line?:
Best regards,
Yes, the errors are being reported from zfs_arc -- I had omitted them due to size. Here are 2 full blocks from that time:
Okay, then it's not ZFS ARC as such that's broken, it's more that the kstat handling needs to be corrected.
One of the threads periodically checks the kstat chain and, when needed, updates it (and calls the init functions). The ZFS ARC plugin doesn't handle this gracefully, which may result in this problem. I'll write a patch now that I know what's going on.
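A rough sketch of what handling a chain update more gracefully could look like: the plugin remembers which chain generation its cached pointer came from and re-resolves the pointer when the generation changes. Everything here (`fake_chain_t`, `fake_plugin_t`, `fake_plugin_read`) is a hypothetical model for illustration, not the real kstat API and not the actual patch. It also ignores locking, which real code would need if another thread updates the chain.

```c
#include <stddef.h>

/* Hypothetical model of a chain whose ID changes on every rebuild. */
typedef struct {
    int chain_id;      /* bumped whenever the chain is rebuilt */
    long long *stats;  /* current location of the stat; NULL if it is gone */
} fake_chain_t;

/* Hypothetical plugin state: a cached pointer plus the chain ID it came from. */
typedef struct {
    int seen_chain_id;
    long long *cached;
} fake_plugin_t;

/*
 * Read the stat through the cached pointer, re-resolving it first if the
 * chain has changed since the pointer was cached.
 * Returns 0 on success, -1 if the stat is unavailable or an argument is NULL.
 */
int fake_plugin_read(fake_plugin_t *p, const fake_chain_t *c, long long *out)
{
    if (p == NULL || c == NULL || out == NULL)
        return -1;

    if (p->cached == NULL || p->seen_chain_id != c->chain_id) {
        /* Stale or missing handle: look it up again. */
        p->cached = c->stats;
        p->seen_chain_id = c->chain_id;
    }

    if (p->cached == NULL)
        return -1; /* stat not present in the current chain */

    *out = *p->cached;
    return 0;
}
```

The point of the sketch is only that a read after a chain update should never dereference a pointer obtained from the old chain, and that a missing stat becomes an error return rather than an abort.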
I'm not sure if it matters, but note too that the kstat command output above does not even show an "allocated" stat. There is also no "stolen" shown, but there are many errors: "[2012-09-11 10:00:54] zfs_arc plugin: Reading kstat value "stolen" failed."
Good point. It seems to work most of the time though, right? Or are you getting error messages about "allocated" all the time?
Looking now, I see that those have been appearing all the time. I started the current test collectd on Sep 7. Every Interval seconds the following appears:
Hello,
In src/types.db (and in /usr/share/collectd/types.db):
In src/zfs_arc.c:
Notice the final "s" in mutex_operations in src/types.db.
Short fix: update your /usr/share/collectd/types.db.
Real fix: same thing in src/types.db?
Regards,
Good eye, Yves.
We are evaluating collectd (5.1.0), and my 'collectd -f' on our Solaris 10 box croaked at ~6:30AM today after 1 day of running: