Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNM: test: Remove check for "load: jerasure.*lrc" #11501

Closed
wants to merge 3 commits into from

Conversation

dzafman
Copy link
Contributor

@dzafman dzafman commented Oct 14, 2016

Fix for broken test-erasure-code.sh and test-erasure-eio.sh

Signed-off-by: David Zafman dzafman@redhat.com

Fix for broken test-erasure-code.sh and test-erasure-eio.sh

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman dzafman added the tests label Oct 14, 2016
@dzafman dzafman added this to the kraken milestone Oct 14, 2016
@dzafman dzafman assigned ghost Oct 14, 2016
@ghost
Copy link

ghost commented Oct 14, 2016

I'm curious about why this is necessary ?

@dzafman
Copy link
Contributor Author

dzafman commented Oct 14, 2016

@dachary This commit is the only one removing the logging which includes "load: " that was merged Aug 17 d0cd88b It doesn't look like the erasure code plug-in would be loaded by async msg code but what do I know.

Low space broke test, saw "flags nearfull,pauserd,pausewr...."

Signed-off-by: David Zafman <dzafman@redhat.com>
@dzafman
Copy link
Contributor Author

dzafman commented Oct 14, 2016

Added another minor fix for a problem I saw on my build machine due to low space condition.

@ghost
Copy link

ghost commented Oct 15, 2016

The message is logged by ErasureCodePluginRegistry::load

  (*plugin)->library = library;
  *ss << __func__ << ": " << plugin_name << " ";

@ghost
Copy link

ghost commented Oct 15, 2016

loading the plugin happens at a very early stage in the mon bootstrap sequence. But there can be a race where

CEPH_ARGS='' ceph --admin-daemon $dir/ceph-mon.a.asok log flush || return 1

succeeds before the erasure code plugins are initialized. The log flush command is registered in the CephContext constructor and only later is global_init_preload_erasure_code called.

I think a safe way to avoid this rare case is to run ceph -s after the log flush. Does that make sense ?

@dzafman dzafman changed the title test: Remove check for "load: jerasure.*lrc" test.sh: Make check for flags more robust Oct 17, 2016
@dzafman
Copy link
Contributor Author

dzafman commented Oct 17, 2016

@ldachary The test-erasure-*.sh tests aren't even running in master. I know I saw them failing somewhere.

I removed that commit but left my other minor fix.

@ghost
Copy link

ghost commented Oct 17, 2016

They are not running ? Do you mean they have been removed from make check at some point ?

@dzafman
Copy link
Contributor Author

dzafman commented Oct 17, 2016

@ldachary Oops, I disabled them in master a while ago, but had them enabled in a new branch. So I still want a variation of this fix.

@dzafman
Copy link
Contributor Author

dzafman commented Oct 17, 2016

@ldachary I tried a "ceph -s" that failed. Then added a 30 second sleep it still fails the grep! It isn't a rare race for in my build tree.

@dzafman
Copy link
Contributor Author

dzafman commented Oct 18, 2016

@ldachary Using gdb I not only never hit the load() function but the address for it isn't even available for the class " ErasureCodePluginRegistry" Something must be statically linked or something else by default in my build tree. I used vstart.sh to first config up and 1 MON 6 OSD set-up. Then I kill and use gdb to check the ceph-mon start-up.

(gdb) info breakpoints
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   <PENDING>  ErasureCodePluginRegistry::load

@dzafman
Copy link
Contributor Author

dzafman commented Oct 18, 2016

@ldachary Just to verify I put the greps back and Jenkins fails for both test the same way on the mon log. Now I'm going to push it with my fix again. I can't get a sleep or ceph -s to make it work. See my gdb comment above.

/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/erasure-code/test-erasure-code.sh:35: run:  grep 'load: jerasure.*lrc' testdir/test-erasure-code/mon.a.log
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/erasure-code/test-erasure-code.sh:35: run:  return 1

@dzafman dzafman changed the title test.sh: Make check for flags more robust DNM: test: Remove check for "load: jerasure.*lrc" Oct 18, 2016
@ghost
Copy link

ghost commented Oct 18, 2016

I see they have been disabled in 5bc5533. The global_init_preload_erasure_code calls ErasureCodePluginRegistry::preload which loops over the osd_erasure_code_plugins which has at least jerasure and lrc and will return on error and abort the mon/osd unless it adds the expected message to be printed at verbose level 10.

Late september the global_init_preload_erasure_code was moved created to factorize ceph_mon.cc and ceph_osd.cc and the dout(10) was not modified. However they have

#define dout_subsys ceph_subsys_mon

and

#define dout_subsys ceph_subsys_osd

which commands a different level of debug than what is in global_init.cc

#define dout_subsys ceph_subsys_

I think that raising the proper debug level will fix the problem. @bassamtabbara would have noticed it while working on the patch if the tests had not been disabled.

@dzafman
Copy link
Contributor Author

dzafman commented Oct 27, 2016

This change has been applied to #9304. I did change the dout(10) to dout(0) for the plugin load message.

@dzafman dzafman closed this Oct 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
1 participant