
jewel: ceph_volume_client: fix recovery from partial auth update #11656

Merged
merged 6 commits into from Jan 25, 2017


@ajarr ajarr commented Oct 26, 2016

Fixes http://tracker.ceph.com/issues/17705

and adds tests in tasks/cephfs/test_volume_clients.py

@ajarr ajarr changed the title from "ceph_volume_client: fix recovery from partial auth update" to "jewel: ceph_volume_client: fix recovery from partial auth update" Oct 26, 2016
ajarr commented Oct 26, 2016

test the PR with ceph/ceph-qa-suite#1221

@jcsp jcsp added the cephfs Ceph File System label Oct 26, 2016
@ajarr ajarr added this to the jewel milestone Oct 26, 2016
ghost commented Oct 26, 2016

@ajarr could you link to the test results here please ?

@ghost ghost self-assigned this Oct 26, 2016
@ghost ghost added the bug-fix label Oct 26, 2016
@ghost ghost changed the base branch from jewel to jewel-next November 9, 2016 09:57
ghost commented Nov 9, 2016

jenkins test this please

@ghost ghost changed the title from "jewel: ceph_volume_client: fix recovery from partial auth update" to "DNM: jewel: ceph_volume_client: fix recovery from partial auth update" Nov 9, 2016
ghost commented Nov 9, 2016

pushed to gitbuilders as wip-17705-jewel so that it can be tested with the required ceph-qa-suite modifications https://github.com/ceph/ceph-qa-suite/pull/1221/files

ghost commented Nov 9, 2016

jenkins test this please (jenkins general failure)

ghost commented Nov 9, 2016

jenkins test this please (general jenkins failure)

ghost commented Nov 14, 2016

@ajarr you can now run a fs suite using wip-17705-jewel and ceph/ceph-qa-suite#1221 . Please let me know if you need help doing that.

@ajarr ajarr changed the title from "DNM: jewel: ceph_volume_client: fix recovery from partial auth update" to "jewel: ceph_volume_client: fix recovery from partial auth update" Nov 16, 2016
jcsp commented Nov 17, 2016

ghost pushed a commit that referenced this pull request Nov 23, 2016
…om partial auth update

Reviewed-by: Loic Dachary <ldachary@redhat.com>
@ghost ghost changed the title from "jewel: ceph_volume_client: fix recovery from partial auth update" to "DNM: jewel: ceph_volume_client: fix recovery from partial auth update" Nov 24, 2016
ghost commented Nov 24, 2016

@ajarr in the context of backports DNM means that it will not be merged in the integration branch. I set that for this pull request because it needs its own run of QA with the corresponding ceph-qa-suite branch.

smithfarm commented Nov 24, 2016

Analysis of test failures from @jcsp 's run:

Test 557275 - assert(0) in common/lockdep.cc just like http://tracker.ceph.com/issues/17447

Test 557280:

2016-11-17T21:37:30.227 INFO:tasks.workunit.client.0.smithi087.stdout:test/libcephfs/flock.cc:466: Failure
2016-11-17T21:37:30.227 INFO:tasks.workunit.client.0.smithi087.stdout:Value of: sem_timedwait(&s.sem[1%2], abstime(ts, waitSlowMs))
2016-11-17T21:37:30.228 INFO:tasks.workunit.client.0.smithi087.stdout:  Actual: -1
2016-11-17T21:37:30.228 INFO:tasks.workunit.client.0.smithi087.stdout:Expected: 0
2016-11-17T21:37:30.228 INFO:tasks.workunit.client.0.smithi087.stdout:[  FAILED  ] LibCephFS.InterProcessLocking (5023 ms)

Test 557281: 29 failures in libcephfs-java/test.sh. Java makes my eyes glaze over, but they may be due to this:

2016-11-17T21:33:58.439 INFO:tasks.workunit.client.0.smithi002.stdout:.Loading libcephfs-jni from default path: /usr/lib/jni:/usr/lib64
2016-11-17T21:33:58.440 INFO:tasks.workunit.client.0.smithi002.stdout:Loading libcephfs-jni: /usr/lib64/libcephfs_jni.so
2016-11-17T21:33:58.440 INFO:tasks.workunit.client.0.smithi002.stdout:Loading libcephfs-jni: /usr/lib/jni/libcephfs_jni.so
2016-11-17T21:33:58.441 INFO:tasks.workunit.client.0.smithi002.stdout:Loading libcephfs-jni: Failure!
2016-11-17T21:33:58.527 INFO:tasks.workunit.client.0.smithi002.stdout:E.EEE.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E.E

Test 557316 and test 557321 are similar to 557280

Test 557322 is similar to 557281

smithfarm commented Nov 24, 2016

Re-running the 6 failed jobs:

./virtualenv/bin/teuthology-suite --priority 101 --machine-type smithi --email ncutler@suse.cz --ceph wip-17705-jewel --suite-branch wip-17705-jewel --suite fs --filter "$filter"

Full disclosure: the wip-17705-jewel branch is based on jewel, not jewel-next.

5 fail, 1 pass http://pulpito.ceph.com:80/smithfarm-2016-11-24_20:19:25-fs-wip-17705-jewel---basic-smithi/

I think the one that passed is the same test as 557281

smithfarm commented

@jcsp Does my analysis of your run (see preceding two comments) help any?

@ajarr ajarr changed the title from "DNM: jewel: ceph_volume_client: fix recovery from partial auth update" to "jewel: ceph_volume_client: fix recovery from partial auth update" Dec 2, 2016
@ajarr ajarr changed the title from "jewel: ceph_volume_client: fix recovery from partial auth update" to "DNM: jewel: ceph_volume_client: fix recovery from partial auth update" Dec 2, 2016
jcsp commented Dec 6, 2016

@smithfarm yes, those failures are all ignorable

@smithfarm smithfarm changed the title from "DNM: jewel: ceph_volume_client: fix recovery from partial auth update" to "jewel: ceph_volume_client: fix recovery from partial auth update" Dec 7, 2016
ajarr commented Dec 19, 2016

@smithfarm @dachary can this PR be merged?

smithfarm commented

@ajarr The tests in ceph/ceph-qa-suite#1221 need to be moved into this PR.

@ghost ghost changed the base branch from jewel-next to jewel December 21, 2016 23:31
smithfarm commented

@ajarr Ping

It needs to be an instance method.

Fixes: http://tracker.ceph.com/issues/17216
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 675cb91)

... when recovering from partial auth updates.

Auth update happens in the following order:
auth metadata update, volume metadata update, and then Ceph auth
update.

A partial auth update can happen such that the auth metadata is updated,
but the volume metadata isn't updated and is empty, and the auth
update did not propagate to Ceph. When recovering from such a
scenario, check whether the volume metadata is empty and, if so, remove
the partial auth update info from the auth metadata.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit a95de78)
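The recovery rule in the commit above can be sketched in Python, the language ceph_volume_client is written in. This is a minimal, hypothetical model, not the actual ceph_volume_client code: auth metadata and volume metadata are plain dicts, and the `dirty` flags, the `volumes` key, and the `reapply` callback are assumptions made for illustration. It shows why the update order matters: because the Ceph auth update comes last, empty volume metadata means the Ceph auth update cannot have happened, so the half-written entry in the auth metadata can simply be dropped.

```python
from typing import Callable, Dict, Optional


def recover_auth_meta(
    auth_id: str,
    auth_meta: dict,
    volume_meta: Dict[str, Optional[dict]],
    reapply: Callable[[str, str, dict], None],
) -> dict:
    """Hypothetical sketch: roll back or finish partial auth updates.

    Assumed (illustrative) layout:
      auth_meta   = {"dirty": bool, "volumes": {volume_id: {"dirty": bool, ...}}}
      volume_meta = {volume_id: dict, possibly empty or missing}
    """
    if not auth_meta.get("dirty"):
        return auth_meta  # no update was interrupted

    # Iterate over a copy because entries may be deleted inside the loop.
    for volume_id, vol_info in list(auth_meta["volumes"].items()):
        if not vol_info.get("dirty"):
            continue
        if not volume_meta.get(volume_id):
            # Only step 1 (auth metadata) completed. Ceph auth is updated
            # last, so it was not touched either: drop the partial entry.
            del auth_meta["volumes"][volume_id]
            continue
        # Volume metadata was written: finish the remaining step(s).
        reapply(auth_id, volume_id, vol_info)
        vol_info["dirty"] = False

    auth_meta["dirty"] = False
    return auth_meta


if __name__ == "__main__":
    applied = []
    auth_meta = {
        "dirty": True,
        "volumes": {
            "grp/vol1": {"dirty": True, "access_level": "rw"},  # volume meta empty
            "grp/vol2": {"dirty": True, "access_level": "r"},   # volume meta written
        },
    }
    volume_meta = {"grp/vol1": {}, "grp/vol2": {"auths": {"alice": {}}}}
    recover_auth_meta("alice", auth_meta, volume_meta,
                      lambda a, v, info: applied.append((a, v)))
    print(auth_meta)  # {'dirty': False, 'volumes': {'grp/vol2': {'dirty': False, 'access_level': 'r'}}}
    print(applied)    # [('alice', 'grp/vol2')]
```

Passing the "finish the update" step in as a callback keeps the sketch independent of any real Ceph API; in a real client that step would re-apply the caps for the auth ID.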
... for volumes whose group_id is None.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 0ab8bad)

... in ceph_volume_client.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit f0134a3)

Check that the total size shown by the df output of a mounted volume
is the same as the volume size and the quota set on the volume.

Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit 91c74f4)
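The df check described in the commit above can be sketched as a small Python helper. This is an illustration under stated assumptions, not the code in tasks/cephfs/test_volume_clients.py: it assumes df is run as `df -kP <mountpoint>` (POSIX format, 1024-byte blocks), that the quota was set through the CephFS `ceph.quota.max_bytes` xattr, and that the client reports that quota as the filesystem size. The function names and the sample df output are hypothetical.

```python
def df_total_bytes(df_output: str) -> int:
    """Return the total size in bytes from `df -kP <mountpoint>` output.

    Assumes POSIX format: a header line, then one data line whose second
    field is the size in 1024-byte blocks (the filesystem name must not
    contain spaces).
    """
    lines = [line for line in df_output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and a data line")
    fields = lines[1].split()
    return int(fields[1]) * 1024


def volume_size_matches(df_output: str, volume_size: int, quota_max_bytes: int) -> bool:
    """True if df's total equals both the requested volume size and the quota."""
    return df_total_bytes(df_output) == volume_size == quota_max_bytes


if __name__ == "__main__":
    size = 100 * 1024 * 1024  # a hypothetical 100 MiB volume
    sample = (  # made-up output for illustration
        "Filesystem     1024-blocks  Used Available Capacity Mounted on\n"
        "ceph-fuse           102400     0    102400       0% /mnt/vol\n"
    )
    print(volume_size_matches(sample, size, size))  # True
```

Comparing all three values in one chained expression makes a mismatch between the requested size and the stored quota fail the check, not just a mismatch with df.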
Signed-off-by: Ramana Raja <rraja@redhat.com>
(cherry picked from commit bb60e01)
ajarr commented Jan 16, 2017

@smithfarm done. missed your comment. sorry about the delay.

jcsp commented Jan 25, 2017

@jcsp jcsp merged commit dd703bc into ceph:jewel Jan 25, 2017
Labels
bug-fix cephfs Ceph File System