Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon: change osdmap flags set and unset messages #9252

Merged
merged 1 commit into from Jul 11, 2016

Conversation

vumrao
Copy link
Contributor

@vumrao vumrao commented May 22, 2016

mon : change osdmap flags set and unset message

Fixes: http://tracker.ceph.com/issues/15983

Signed-off-by: Vikhyat Umrao vumrao@redhat.com

@vumrao
Copy link
Contributor Author

vumrao commented May 22, 2016

  • I have tested this change with vstart cluster and now results are as follows :
  • change log 1 : after Kefu suggestion now messages would be : Error EINVAL: noout flag is already set and Error EINVAL: noout flag is not set
  • change log 2 : after Sage inputs and discussion with Kefu now it is same as earlier before change log 1, we should return true without any error in ceph osd set and unset commands.
$ ./ceph osd set noout
set noout

$./ceph -s
    cluster 23bf9597-e8ae-4d04-9c49-4109f5c4bfcd
     health HEALTH_WARN
            noout,sortbitwise flag(s) set
     monmap e1: 3 mons at {a=192.168.200.47:6789/0,b=192.168.200.47:6790/0,c=192.168.200.47:6791/0}
            election epoch 6, quorum 0,1,2 a,b,c
      fsmap e5: 1/1/1 up {0=b=up:active}, 2 up:standby
     osdmap e12: 3 osds: 3 up, 3 in
            flags noout,sortbitwise
      pgmap v40: 24 pgs, 3 pools, 2068 bytes data, 20 objects
            143 GB used, 156 GB / 299 GB avail
                  24 active+clean

$ ./ceph osd set noout
noout flag is already set

$ ./ceph osd unset noout
unset noout

$ ./ceph -s
    cluster 23bf9597-e8ae-4d04-9c49-4109f5c4bfcd
     health HEALTH_OK
     monmap e1: 3 mons at {a=192.168.200.47:6789/0,b=192.168.200.47:6790/0,c=192.168.200.47:6791/0}
            election epoch 6, quorum 0,1,2 a,b,c
      fsmap e5: 1/1/1 up {0=b=up:active}, 2 up:standby
     osdmap e14: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v108: 24 pgs, 3 pools, 2068 bytes data, 20 objects
            143 GB used, 156 GB / 299 GB avail
                  24 active+clean

$ ./ceph osd unset noout
 noout flag is not set

@vumrao vumrao changed the title mon : add osdmap flags set and unset message when mon : add osdmap flags set and unset message when osdmap flags are already set and unset May 22, 2016
@tchaikov
Copy link
Contributor

@vumrao could you add a simple test in qa/workunits/cephtool/test.sh?

pending_inc.new_flags |= flag;
ss << "set " << OSDMap::get_flag_string(flag);
if (osdmap.test_flag(flag)) {
ss << OSDMap::get_flag_string(flag) << " flag is already set";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could return false here right away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

Copy link
Contributor Author

@vumrao vumrao May 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • After applying your suggestion , I tested $ceph osd set and $ceph osd unset commands just hangs if I try to return false without calling wait_for_finished_proposal() in if condition and if I try to return false with calling wait_for_finished_proposal() then also same behavior.
  • To me it looks like as $ceph osd set and $ceph osd unset both expect to return true with wait_for_finished_proposal() as I was doing.
  • As if we are setting a already set flag means condition is true as this is the behavior of $ceph osd set we should return true as flag is already set.
  • Same goes for unset as in unset also if we unset a flag which is not set then also we should return true as $ceph osd unset behavior is to flag should be unset and which is the case here.
  • Please let me know if my understanding is wrong.

@vumrao
Copy link
Contributor Author

vumrao commented May 23, 2016

@tchaikov sure will do it.

@vumrao vumrao force-pushed the wip-vumrao-15983 branch 3 times, most recently from e2952af to 16db7e0 Compare May 23, 2016 19:05
@vumrao
Copy link
Contributor Author

vumrao commented May 23, 2016

@tchaikov I have added a simple test in qa/workunits/cephtool/test.sh. I am not sure if this is how I should write, as this is first test case for me :)

  • Currently make check is running in my local branch.

@vumrao
Copy link
Contributor Author

vumrao commented May 24, 2016

Thank you @tchaikov for your help and nice discussion in #ceph-devel. As discussed will make the changes and will send the updated patch.

@vumrao vumrao force-pushed the wip-vumrao-15983 branch 3 times, most recently from c606f49 to fb1d3a5 Compare May 24, 2016 07:51
@vumrao
Copy link
Contributor Author

vumrao commented May 24, 2016

  • Need to look into sortbitwise flag test as this flag is by default set :
- qa/workunits/cephtool/test.sh:1096: test_mon_osd:  for f in noup nodown noin noout noscrub
   nodeep-scrub nobackfill norebalance norecover notieragent full sortbitwise
- qa/workunits/cephtool/test.sh:1098: test_mon_osd:  ceph osd set sortbitwise
   Error EINVAL: sortbitwise flag is already set
  • Need to look into full flag test as this flag gets unset if any OSD in cluster is not full and hence:
 - qa/workunits/cephtool/test.sh:1096: test_mon_osd:  for f in noup nodown noin noout noscrub
    nodeep-scrub nobackfill norebalance norecover notieragent full sortbitwise
 - qa/workunits/cephtool/test.sh:1098: test_mon_osd:  ceph osd set full
    set full

-  As it got unset in between as there was no OSD in cluster which was full 

 -  qa/workunits/cephtool/test.sh:1099: test_mon_osd:  ceph osd unset full
    Error EINVAL: full flag is not set

@vumrao
Copy link
Contributor Author

vumrao commented May 24, 2016

  • I have tested this current commit locally and now it is not failing in code change and it has one fail test in chain_xattr which is not related to code change in this commit :
[       OK ] chain_xattr.chunk_aligned (2272 ms)
[ RUN      ] chain_xattr.listxattr
[       OK ] chain_xattr.listxattr (85 ms)
[ RUN      ] chain_xattr.fskip_chain_cleanup_and_ensure_single_attr
test/objectstore/chain_xattr.cc:371: Failure
Value of: 1UL
  Actual: 1
Expected: get_xattrs(fd).size()
Which is: 2
[  FAILED  ] chain_xattr.fskip_chain_cleanup_and_ensure_single_attr (0 ms)
[ RUN      ] chain_xattr.skip_chain_cleanup_and_ensure_single_attr
test/objectstore/chain_xattr.cc:415: Failure

@tchaikov
Copy link
Contributor

@vumrao are you using selinux could you check the files in your testdir to see if it has some xattr already?

@vumrao
Copy link
Contributor Author

vumrao commented May 24, 2016

@tchaikov yes you are right Kefu, I have selinux enabled in my local box and also xattr file :

 $ getenforce 
Enforcing

 $ ll chain_xattr 
-rwx------. 1 vumrao vumrao 0 May 24 19:43 chain_xattr

$ getfattr -m . -d -e hex chain_xattr 
# file: chain_xattr
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
user.foo=0xb8b8b8b8b8b8b8b8b8b8b8b8b 
[....]

@tchaikov
Copy link
Contributor

lgtm.

@vumrao i think we can either fix this or leave as it is atm. 5abc95d fixed chain_xattr.listxattr last time.

@liewegas liewegas removed the needs-qa label May 24, 2016
@liewegas
Copy link
Member

This isn't right. The mon commands must be idempotent. There is a debug option somewhere in fact that sends every command twice and asserts that if the first succeeds, teh second succeeds too.

@vumrao
Copy link
Contributor Author

vumrao commented May 25, 2016

Thanks Sage for your inputs. I had discussion with Kefu will modify the patch.

@vumrao
Copy link
Contributor Author

vumrao commented May 25, 2016

make[3]: *** [libec_jerasure.la] Error 1
../libtool: line 9822: .libs/libceph_example.lai: No space left on device
ln: failed to create symbolic link 'libceph_example.la': No space left on device
make[3]: *** [libceph_example.la] Error 1

@vumrao
Copy link
Contributor Author

vumrao commented May 25, 2016

test this please.

@tchaikov
Copy link
Contributor

@vumrao thanks filed http://tracker.ceph.com/issues/16014

@vumrao
Copy link
Contributor Author

vumrao commented May 27, 2016

test this please.

1 similar comment
@tchaikov
Copy link
Contributor

test this please.

ceph osd set noout >& $TMPFILE || return 1
check_response "set noout"
ceph osd set noout >& $TMPFILE || return 1
expect_failure $TMPDIR "noout flag is already set"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vumrao this should not fail. as per sage

The mon commands must be idempotent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kchai ahh sorry , right it should be check_response() instead of expect_failure().
Right, as sage pointed earlier:

The mon commands must be idempotent

I think it will apply to all the new tests for new messages ? will change it or do you think any other change is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. just the new messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kefu, sure will make the changes and test it locally and then will send the updated commit.

@tchaikov
Copy link
Contributor

lgtm.

@vumrao
Copy link
Contributor Author

vumrao commented Jun 29, 2016

  • This branch has conflicts. Doing rebase.

@tchaikov
Copy link
Contributor

tchaikov commented Jul 4, 2016

missed it in last qa batch. will include it in next run.

@vumrao
Copy link
Contributor Author

vumrao commented Jul 4, 2016

@tchaikov thank you !

@liewegas
Copy link
Member

liewegas commented Jul 4, 2016

This PR is still wrong. At any time the MonClient may resend a command (e.g., due ot a transient network disconnect, or a mon quorum change). Making messages like "set foo" and "foo already set" will be incorrect in some cases. This is why we try to use messages like "flag foo is (or was already) set".

@tchaikov tchaikov removed the needs-qa label Jul 4, 2016
@vumrao
Copy link
Contributor Author

vumrao commented Jul 5, 2016

Thank you Sage for your inputs. I will change the messages and will modify the test cases.

Fixes: http://tracker.ceph.com/issues/15983

Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
@vumrao vumrao changed the title mon : add osdmap flags set and unset message when osdmap flags are already set and unset mon : change osdmap flags set and unset message Jul 7, 2016
@vumrao vumrao changed the title mon : change osdmap flags set and unset message mon : change osdmap flags set and unset messages Jul 7, 2016
@tchaikov
Copy link
Contributor

tchaikov commented Jul 7, 2016

lgtm. @liewegas what do you think?

@liewegas liewegas changed the title mon : change osdmap flags set and unset messages mon: change osdmap flags set and unset messages Jul 7, 2016
@liewegas
Copy link
Member

liewegas commented Jul 7, 2016

looks good, thanks!

@tchaikov
Copy link
Contributor

tested in http://pulpito.ceph.com/kchai-2016-07-10_07:48:29-rados-wip-kefu-testing---basic-mira/

the non-environmental failures are addressed by #10234 .

@tchaikov tchaikov merged commit 35ce55d into ceph:master Jul 11, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants