-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DNM: test: Remove check for "load: jerasure.*lrc" #11501
Conversation
Fix for broken test-erasure-code.sh and test-erasure-eio.sh Signed-off-by: David Zafman <dzafman@redhat.com>
I'm curious about why this is necessary ? |
Low space broke test, saw "flags nearfull,pauserd,pausewr...." Signed-off-by: David Zafman <dzafman@redhat.com>
Added another minor fix for a problem I saw on my build machine due to low space condition. |
The message is logged by ErasureCodePluginRegistry::load (*plugin)->library = library; *ss << __func__ << ": " << plugin_name << " "; |
loading the plugin happens at a very early stage in the mon bootstrap sequence. But there can be a race where CEPH_ARGS='' ceph --admin-daemon $dir/ceph-mon.a.asok log flush || return 1 succeeds before the erasure code plugins are initialized. The log flush command is registered in the CephContext constructor and only later is global_init_preload_erasure_code called. I think a safe way to avoid this rare case is to run ceph -s after the log flush. Does that make sense ? |
1014c67
to
e684bef
Compare
@ldachary The test-erasure-*.sh tests aren't even running in master. I know I saw them failing somewhere. I removed that commit but left my other minor fix. |
They are not running ? Do you mean they have been removed from make check at some point ? |
@ldachary Oops, I disabled them in master a while ago, but had them enabled in a new branch. So I still want a variation of this fix. |
@ldachary I tried a "ceph -s" that failed. Then added a 30 second sleep it still fails the grep! It isn't a rare race for in my build tree. |
@ldachary Using gdb I not only never hit the load() function but the address for it isn't even available for the class " ErasureCodePluginRegistry" Something must be statically linked or something else by default in my build tree. I used vstart.sh to first config up and 1 MON 6 OSD set-up. Then I kill and use gdb to check the ceph-mon start-up.
|
This reverts commit 5bc5533.
e684bef
to
0eb68e7
Compare
@ldachary Just to verify I put the greps back and Jenkins fails for both test the same way on the mon log. Now I'm going to push it with my fix again. I can't get a sleep or ceph -s to make it work. See my gdb comment above.
|
0eb68e7
to
23fc264
Compare
I see they have been disabled in 5bc5533. The global_init_preload_erasure_code calls ErasureCodePluginRegistry::preload which loops over the osd_erasure_code_plugins which has at least jerasure and lrc and will return on error and abort the mon/osd unless it adds the expected message to be printed at verbose level 10. Late september the global_init_preload_erasure_code was moved created to factorize ceph_mon.cc and ceph_osd.cc and the dout(10) was not modified. However they have #define dout_subsys ceph_subsys_mon and #define dout_subsys ceph_subsys_osd which commands a different level of debug than what is in global_init.cc #define dout_subsys ceph_subsys_ I think that raising the proper debug level will fix the problem. @bassamtabbara would have noticed it while working on the patch if the tests had not been disabled. |
This change has been applied to #9304. I did change the dout(10) to dout(0) for the plugin load message. |
Fix for broken test-erasure-code.sh and test-erasure-eio.sh
Signed-off-by: David Zafman dzafman@redhat.com