Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failed: ksp != NULL, file common.c, line 640 #71

Closed
jblaine opened this issue Apr 10, 2012 · 14 comments
Closed

Assertion failed: ksp != NULL, file common.c, line 640 #71

jblaine opened this issue Apr 10, 2012 · 14 comments
Labels
Bug A genuine bug
Milestone

Comments

@jblaine
Copy link

jblaine commented Apr 10, 2012

We are evaluating collectd (5.1.0), and my 'collectd -f' on our Solaris 10 box croaked at ~6:30AM today after 1 day of running:

Assertion failed: ksp != NULL, file common.c, line 640
@jblaine
Copy link
Author

jblaine commented Apr 11, 2012

Same thing today.

@jblaine
Copy link
Author

jblaine commented Apr 13, 2012

With --enable-debug this time:

[2012-04-13 16:53:45] kstat chain has been updated
[2012-04-13 16:53:45] plugin_dispatch_values: time = 1334350425.059; interval = 30.000; host = cairo.our.org; plugin = memory; plugin_instance = ; type = memory; type_instance = kernel;
[2012-04-13 16:53:45] uc_update: cairo.our.org/swap/swap-used: ds[0] = 911417344.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] uc_update: cairo.our.org/memory/memory-kernel: ds[0] = 512491520.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] uc_update: cairo.our.org/cpu-0/cpu-user: ds[0] = 0.566302
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] uc_update: cairo.our.org/nfs-v2client/nfs_procedure-null: ds[0] = 0.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] write_graphite plugin: [rcf-metrics]:2003 buf 452/1428 (31.7 %) "cairo_our_org.collectd.zfs_arc.cache_eviction-eligible.value 0 1334350425
"
[2012-04-13 16:53:45] plugin_dispatch_values: time = 1334350425.060; interval = 30.000; host = cairo.our.org; plugin = zfs_arc; plugin_instance = ; type = cache_eviction; type_instance = ineligible;
[2012-04-13 16:53:45] uc_update: cairo.our.org/zfs_arc/cache_eviction-ineligible: ds[0] = 0.000000
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-04-13 16:53:45] write_graphite plugin: [rcf-metrics]:2003 buf 529/1428 (37.0 %) "cairo_our_org.collectd.nfs-v2client.nfs_procedure-null.value 0 1334350425
"
[2012-04-13 16:53:45] plugin_dispatch_values: time = 1334350425.061; interval = 30.000; host = cairo.our.org; plugin = nfs; plugin_instance = v2client; type = nfs_procedure; type_instance = getattr;
[2012-04-13 16:53:45] write_graphite plugin: [rcf-metrics]:2003 buf 608/1428 (42.6 %) "cairo_our_org.collectd.zfs_arc.cache_eviction-ineligible.value 0 1334350425
"
[2012-04-13 16:53:45] uc_update: cairo.our.org/nfs-v2client/nfs_procedure-getattr: ds[0] = 0.000000
Assertion failed: ksp != NULL, file common.c, line 640
[2012-04-13 16:53:45] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
Abort (core dumped)

@jblaine
Copy link
Author

jblaine commented Sep 6, 2012

More data. Note the NaNs:

[2012-09-05 22:59:03] plugin_dispatch_values: time = 1346900343.984; interval = 30.000; host = silmaril.our.
org; plugin = zfs_arc; plugin_instance = ; type = cache_size; type_instance = L2;
[2012-09-05 22:59:03] uc_update: silmaril.our.org/zfs_arc/cache_size-L2: ds[0] = 0.000000
[2012-09-05 22:59:03] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-09-05 22:59:03] write_graphite plugin: wg_flush_nolock: timeout = 0.000; send_buf_fill = 1374;
[2012-09-05 22:59:03] write_graphite plugin: [rcf-metrics]:2003 buf 77/1428 (5.4 %) "silmaril_our_org.collec
td.zfs_arc.cache_size-L2.value 0.000000 1346900343
"
[2012-09-05 22:59:03] zfs_arc plugin: Reading kstat value "allocated" failed.
...
[2012-09-05 22:59:03] plugin_dispatch_values: time = 1346900343.990; interval = 30.000; host = silmaril.our.
org; plugin = zfs_arc; plugin_instance = ; type = cache_ratio; type_instance = L2;
[2012-09-05 22:59:03] uc_update: silmaril.our.org/zfs_arc/cache_ratio-L2: ds[0] = NaN
[2012-09-05 22:59:03] kstat chain has been updated
...
[2012-09-05 22:59:04] plugin_read_thread: Effective interval of the memory plugin is 30.000000000.
[2012-09-05 22:59:04] plugin_read_thread: Next read of the memory plugin at 1346900373.989394022.
[2012-09-05 22:59:03] plugin: plugin_write: Writing values via write_graphite/rcf-metrics/2003.
[2012-09-05 22:59:04] write_graphite plugin: [rcf-metrics]:2003 buf 158/1428 (11.1 %) "silmaril_our_org.collectd.zfs_arc.cache_ratio-L2.value NaN 1346900343
"
Assertion failed: ksp != NULL, file common.c, line 640
Abort (core dumped)

octo added a commit to octo/collectd that referenced this issue Sep 6, 2012
…odes.

Rather than asserting that an argument is not NULL, check this condition
and return an error code.

This should fix Github issue collectd#71.
@octo
Copy link
Member

octo commented Sep 6, 2012

Hi @jblaine,

could you test the changes in PR #126?

Thanks and best regards,
—octo

@octo
Copy link
Member

octo commented Sep 10, 2012

For convenience, I'll copy the information you gave on the pull request to here:

[2012-09-07 13:10:54] get_kstat_value ("deleted"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("stolen"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("mutex_miss"): ksp is NULL.
[...]
[2012-09-07 13:10:54] get_kstat_value ("l2_read_bytes"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("l2_write_bytes"): ksp is NULL.

[2012-09-07 13:35:54] get_kstat_value ("deleted"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("stolen"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("mutex_miss"): ksp is NULL.
[...]
[2012-09-07 13:35:54] get_kstat_value ("l2_read_bytes"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("l2_write_bytes"): ksp is NULL.

[2012-09-07 15:35:54] get_kstat_value ("deleted"): ksp is NULL.
[2012-09-07 15:35:54] get_kstat_value ("stolen"): ksp is NULL.
[2012-09-07 15:35:54] get_kstat_value ("mutex_miss"): ksp is NULL.
[...]
[2012-09-07 15:35:54] get_kstat_value ("l2_read_bytes"): ksp is NULL.
[2012-09-07 15:35:54] get_kstat_value ("l2_write_bytes"): ksp is NULL.

[2012-09-07 21:03:54] get_kstat_value ("deleted"): ksp is NULL.
[2012-09-07 21:03:54] get_kstat_value ("stolen"): ksp is NULL.
[...]
[2012-09-07 21:03:54] get_kstat_value ("l2_read_bytes"): ksp is NULL.
[2012-09-07 21:03:54] get_kstat_value ("l2_write_bytes"): ksp is NULL.

[2012-09-08 04:01:24] get_kstat_value ("deleted"): ksp is NULL.
[2012-09-08 04:01:24] get_kstat_value ("stolen"): ksp is NULL.
[2012-09-08 04:01:24] get_kstat_value ("mutex_miss"): ksp is NULL.
[...]
[2012-09-08 04:01:24] get_kstat_value ("l2_read_bytes"): ksp is NULL.
[2012-09-08 04:01:24] get_kstat_value ("l2_write_bytes"): ksp is NULL.

Interestingly, this starts with the value deleted every time the problem comes up. This is from the following line in the ZFS-ARC plugin:

        /* Operations */
        za_read_derive (ksp, "allocated","cache_operation", "allocated");
        za_read_derive (ksp, "deleted",  "cache_operation", "deleted");
        za_read_derive (ksp, "stolen",   "cache_operation", "stolen");

So reading the field "allocated" works, while reading "deleted" fails. But why doesn't the ZFS ARC plugin print an error message? When printing that message, get_kstat_value() should return an error indicator, which should result in a second error being printed by the ZFS ARC plugin.

Can you try the following in the command line?:

kstat -m zfs -i 0 -n arcstats

Best regards,
—octo

@jblaine
Copy link
Author

jblaine commented Sep 10, 2012

Yes, the errors are being reported from zfs_arc -- I had omitted them due to size.

Here are 2 full blocks from that time:

[2012-09-07 13:10:54] kstat chain has been updated
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "allocated" failed.
[2012-09-07 13:10:54] get_kstat_value ("deleted"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "deleted" failed.
[2012-09-07 13:10:54] get_kstat_value ("stolen"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "stolen" failed.
[2012-09-07 13:10:54] get_kstat_value ("mutex_miss"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "mutex_miss" failed.
[2012-09-07 13:10:54] get_kstat_value ("hash_collisions"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "hash_collisions" failed.
[2012-09-07 13:10:54] get_kstat_value ("evict_l2_cached"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "evict_l2_cached" failed.
[2012-09-07 13:10:54] get_kstat_value ("evict_l2_eligible"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "evict_l2_eligible" failed.
[2012-09-07 13:10:54] get_kstat_value ("evict_l2_ineligible"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "evict_l2_ineligible" failed.
[2012-09-07 13:10:54] get_kstat_value ("demand_data_hits"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "demand_data_hits" failed.
[2012-09-07 13:10:54] get_kstat_value ("demand_metadata_hits"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "demand_metadata_hits" failed.
[2012-09-07 13:10:54] get_kstat_value ("prefetch_data_hits"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "prefetch_data_hits" failed.
[2012-09-07 13:10:54] get_kstat_value ("prefetch_metadata_hits"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "prefetch_metadata_hits" failed.
[2012-09-07 13:10:54] get_kstat_value ("demand_data_misses"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "demand_data_misses" failed.
[2012-09-07 13:10:54] get_kstat_value ("demand_metadata_misses"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "demand_metadata_misses" failed.
[2012-09-07 13:10:54] get_kstat_value ("prefetch_data_misses"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "prefetch_data_misses" failed.
[2012-09-07 13:10:54] get_kstat_value ("prefetch_metadata_misses"): ksp is NULL.
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "prefetch_metadata_misses" failed.
[2012-09-07 13:10:54] get_kstat_value ("hits"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("misses"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("l2_hits"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("l2_misses"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("l2_read_bytes"): ksp is NULL.
[2012-09-07 13:10:54] get_kstat_value ("l2_write_bytes"): ksp is NULL.

...

[2012-09-07 13:35:54] kstat chain has been updated
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "allocated" failed.
[2012-09-07 13:35:54] get_kstat_value ("deleted"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "deleted" failed.
[2012-09-07 13:35:54] get_kstat_value ("stolen"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "stolen" failed.
[2012-09-07 13:35:54] get_kstat_value ("mutex_miss"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "mutex_miss" failed.
[2012-09-07 13:35:54] get_kstat_value ("hash_collisions"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "hash_collisions" failed.
[2012-09-07 13:35:54] get_kstat_value ("evict_l2_cached"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "evict_l2_cached" failed.
[2012-09-07 13:35:54] get_kstat_value ("evict_l2_eligible"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "evict_l2_eligible" failed.
[2012-09-07 13:35:54] get_kstat_value ("evict_l2_ineligible"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "evict_l2_ineligible" failed.
[2012-09-07 13:35:54] get_kstat_value ("demand_data_hits"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "demand_data_hits" failed.
[2012-09-07 13:35:54] get_kstat_value ("demand_metadata_hits"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "demand_metadata_hits" failed.
[2012-09-07 13:35:54] get_kstat_value ("prefetch_data_hits"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "prefetch_data_hits" failed.
[2012-09-07 13:35:54] get_kstat_value ("prefetch_metadata_hits"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "prefetch_metadata_hits" failed.
[2012-09-07 13:35:54] get_kstat_value ("demand_data_misses"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "demand_data_misses" failed.
[2012-09-07 13:35:54] get_kstat_value ("demand_metadata_misses"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "demand_metadata_misses" failed.
[2012-09-07 13:35:54] get_kstat_value ("prefetch_data_misses"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "prefetch_data_misses" failed.
[2012-09-07 13:35:54] get_kstat_value ("prefetch_metadata_misses"): ksp is NULL.
[2012-09-07 13:35:54] zfs_arc plugin: Reading kstat value "prefetch_metadata_misses" failed.
[2012-09-07 13:35:54] get_kstat_value ("hits"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("misses"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("l2_hits"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("l2_misses"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("l2_read_bytes"): ksp is NULL.
[2012-09-07 13:35:54] get_kstat_value ("l2_write_bytes"): ksp is NULL.
adm:silmaril>

@jblaine
Copy link
Author

jblaine commented Sep 10, 2012

kstat -m zfs -i 0 -n arcstats runs fine.


module: zfs                             instance: 0
name:   arcstats                        class:    misc
        c                               528136002
        c_max                           1578153984
        c_min                           197269248
        crtime                          66.794009685
        data_size                       350626816
        deleted                         6274390
        demand_data_hits                9378713
        demand_data_misses              1138984
        demand_metadata_hits            41489545
        demand_metadata_misses          4188808
        evict_l2_cached                 0
        evict_l2_eligible               330758987264
        evict_l2_ineligible             69180891136
        evict_skip                      110616641
        hash_chain_max                  16
        hash_chains                     9690
        hash_collisions                 22375272
        hash_elements                   35613
        hash_elements_max               127411
        hdr_size                        6646104
        hits                            71039707
        l2_abort_lowmem                 0
        l2_cksum_bad                    0
        l2_evict_lock_retry             0
        l2_evict_reading                0
        l2_feeds                        0
        l2_free_on_write                0
        l2_hdr_size                     0
        l2_hits                         0
        l2_io_error                     0
        l2_misses                       0
        l2_read_bytes                   0
        l2_rw_clash                     0
        l2_size                         0
        l2_write_bytes                  0
        l2_writes_done                  0
        l2_writes_error                 0
        l2_writes_hdr_miss              0
        l2_writes_sent                  0
        memory_throttle_count           1874
        mfu_ghost_hits                  2007968
        mfu_hits                        32130850
        misses                          10350139
        mru_ghost_hits                  4061865
        mru_hits                        20174786
        mutex_miss                      2
        other_size                      118803456
        p                               334877357
        prefetch_data_hits              1106754
        prefetch_data_misses            1294671
        prefetch_metadata_hits          19064695
        prefetch_metadata_misses        3727676
        recycle_miss                    3809672
        size                            476076376
        snaptime                        3815140.9903571

@octo
Copy link
Member

octo commented Sep 11, 2012

Okay, then it's not ZFS ARC as such that's broken, it's more that the kstat handling needs to be corrected.

[2012-09-07 13:10:54] kstat chain has been updated
[2012-09-07 13:10:54] zfs_arc plugin: Reading kstat value "allocated" failed.

One of the threads is checking the kstat chain periodically and updates it (and is calling init functions) when needed. This isn't handled gracefully in the ZFS ARC plugin and may result in this problem. I'll write a patch now that I know what's going on.

@jblaine
Copy link
Author

jblaine commented Sep 11, 2012

I'm not sure if it matters, but note too that the kstat command output above does not even show an "allocated" stat. There is also no 'stolen' shown, but many errors: "[2012-09-11 10:00:54] zfs_arc plugin: Reading kstat value "stolen" failed."

@octo
Copy link
Member

octo commented Sep 12, 2012

Good point. It seems to work most of the time though, right? Or are you getting error messages about "allocated" all the time?

@jblaine
Copy link
Author

jblaine commented Sep 12, 2012

Looking now I see that those have been all the time. I started the current test collectd on Sep 7. Every Interval seconds the following appears:

[2012-09-12 11:27:54] zfs_arc plugin: Reading kstat value "allocated" failed.
[2012-09-12 11:27:54] zfs_arc plugin: Reading kstat value "stolen" failed.
[2012-09-12 11:27:54] plugin_dispatch_values: Dataset not found: mutex_operation (from "silmaril.our.org/zfs_arc/mutex_operation-miss"), check your types.db!

@ymettier
Copy link
Contributor

Hello,

In src/types.db (and in /usr/share/collectd/types.db) :

mutex_operations        value:DERIVE:0:U

in src/zfs_arc.c

za_read_derive (ksp, "mutex_miss", "mutex_operation", "miss");

Notice the final "s" in mutex_operations in src/types.db.

Short fix : update your /usr/share/collectd/types.db

Real fix : same thing in src/types.db?

Regards,
Yves

@octo
Copy link
Member

octo commented Nov 29, 2012

@ymettier Looks like the bug is in the code since 4f5234d. I've fixed this in 4d99b79.

Best regards,
—octo

@jblaine
Copy link
Author

jblaine commented Dec 4, 2012

Good eye, Yves.

@octo octo closed this as completed Apr 6, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug A genuine bug
Projects
None yet
Development

No branches or pull requests

3 participants