Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librados: new style (sharded) object listing #6405

Merged
merged 12 commits into from Dec 17, 2015
Merged

Conversation

jcsp
Copy link
Contributor

@jcsp jcsp commented Oct 28, 2015

This is a rebase of what was wip-9963-sage

@jcsp jcsp mentioned this pull request Oct 28, 2015
@ghost ghost added the cephfs Ceph File System label Oct 28, 2015
const size_t n,
const size_t m,
rados_enumerate_cursor *split_start,
rados_enumerate_cursor *split_finish);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this get a comment documetning it? i would expect a single output argument (the cursor at the approx middle/split point)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a minute to remember so docstring definitely needed...

The dual output arguments get you a range from somewhere within the input start/finish, basically (chunk n of m). Maybe "slice" is a better name than "split" here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas
Copy link
Member

for the C++ API, object_list_{begin,end} is awkward but consistent with the C API and with other list methods...

@jcsp jcsp changed the title Sharded PGLS (#9963) librados: new style (sharded) object listing (#9963) Nov 25, 2015
@jcsp
Copy link
Contributor Author

jcsp commented Nov 25, 2015

Updated with C++ bits.

@jcsp
Copy link
Contributor Author

jcsp commented Nov 25, 2015

The reason I didn't do an STL-style iterator thing is that it's a pretty weird API to hide network IO like that, and exposing the "give me the next N + return an error code" interface adds very little burden to the caller.

@liewegas
Copy link
Member

Agreed.. I found the STL-style iterator annoying anyway.

{
std::vector<ObjectItem> result;
int r = ioctx.object_list(c, end, 12, &result, &c);
ASSERT_GE(r, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could ASSERT_EQ(r, result.size()) here too

@jcsp jcsp force-pushed the wip-9963 branch 2 times, most recently from 4acc2e8 to f19390d Compare November 27, 2015 17:32
@jcsp
Copy link
Contributor Author

jcsp commented Nov 27, 2015

Updated

@jcsp jcsp self-assigned this Nov 27, 2015
@jcsp
Copy link
Contributor Author

jcsp commented Nov 30, 2015

loic-bot's failure is weird, because my test branch (including this) is building on the gitbuilders, and when I try an autotools build locally it works as well.

It's failing like this:

  CXXLD    unittest_erasure_code_plugin_shec
  CXXLD    unittest_erasure_code_example
  CXXLD    unittest_librados
  CXXLD    unittest_librados_config
  CXXLD    unittest_journal
  CXXLD    unittest_rbd_replay
  CXXLD    unittest_encoding
  CXXLD    unittest_base64
  CXXLD    unittest_run_cmd
  CXXLD    unittest_simple_spin
  CXXLD    unittest_libcephfs_config
  CXXLD    unittest_mon_moncap
  CXXLD    unittest_mon_pgmap
  CXXLD    unittest_ecbackend
  CXXLD    unittest_osdscrub
  CXXLD    unittest_pglog
  CXXLD    unittest_hitset
  CXXLD    unittest_osd_osdcap
  CXXLD    unittest_chain_xattr
  CXXLD    unittest_lfnindex
  CXXLD    unittest_mds_authcap
  CXXLD    unittest_addrs
  CXXLD    unittest_blkdev
  CXXLD    unittest_bloom_filter
  CXXLD    unittest_histogram
  CXXLD    unittest_prioritized_queue
  CXXLD    unittest_str_map
./.libs/libradostest.a(libradostest_la-test.o): In function `wait_for_healthy(void**)':
/home/jenkins/workspace/ceph/LABELS/ceph-centos-7-jenkins/src/test/librados/test.cc:67: undefined reference to `rados_buffer_free'
collect2: error: ld returned 1 exit status
  CXXLD    unittest_sharedptr_registry
make[4]: *** [unittest_journal] Error 1
make[4]: *** Waiting for unfinished jobs....
make[4]: Leaving directory `/home/jenkins/workspace/ceph/LABELS/ceph-centos-7-jenkins/src'
make[3]: *** [check-am] Error 2
make[3]: Leaving directory `/home/jenkins/workspace/ceph/LABELS/ceph-centos-7-jenkins/src'
make[2]: *** [check-recursive] Error 1
make[2]: Leaving directory `/home/jenkins/workspace/ceph/LABELS/ceph-centos-7-jenkins/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/jenkins/workspace/ceph/LABELS/ceph-centos-7-jenkins/src'
make: *** [check-recursive] Error 1

It also makes no sense that rados_buffer_free would be missing from anything that was linking with libradostest, because that lib was already using rados_mon_command et al.

@jcsp
Copy link
Contributor Author

jcsp commented Nov 30, 2015

It's unittest_journal, which doesn't correctly link with librados

@jcsp
Copy link
Contributor Author

jcsp commented Dec 1, 2015

See a run with no failures attributable to this PR here: http://pulpito.ceph.com/jspray-2015-11-30_02:06:10-fs-wip-testing-jcsp---basic-multi/

This is good to go from my POV, I'll hook DataScan into it in a separate PR.

liewegas and others added 10 commits December 3, 2015 14:57
Convenience methods to easily get the bounds of the object range
that are managed by a given PG.  Note that this is a function of
the pool pg_num because otherwise the pg_t id only gives us the
start position.

Signed-off-by: Sage Weil <sage@redhat.com>
For the namespaced PGNLS op, set the handle to the start
of the next PG when we reach the end of the current PG.

We do this OSD side rather than client side so that
the client doesn't have to wait for the OSD's map
epoch in order to correctly calculate the next hash
in the case of PG splitting.

Only do this when the cluster is in bitwise mode to avoid
totally weird results when the client is in nibblewise
mode--the nibblewise sort order doesn't map contiguous
ranges of the hash to PGs.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Because some unit tests benefit from having a
non-power-of-two pg_num.

Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
The existing request machinery already maps the hash to a pg; use
it.  Among other things, it means that a split will make us
recalculate and resend using existing code paths.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
It is possible in pathological cases for the OSD to return 0
items with a non-MAX next value (indicating the objecter should
ask for more).  In particular, this is easy to reproduce with
max_entries = 1 as the OSD will list and ignore the pgmeta object,
return 0 items, and have next = the next object.

I check back as far as dumpling and the OSD is setting next=MAX
at the end of the PG (and result = 1, FWIW), so using next
here instead of inferring anything from the number of result
items is safe.

Signed-off-by: Sage Weil <sage@redhat.com>
These weren't explicitly linking librados,
so were prone to missing symbols on changes
in librados_test

Signed-off-by: John Spray <john.spray@redhat.com>
@jcsp
Copy link
Contributor Author

jcsp commented Dec 3, 2015

Oops, this lost the update to unit test linking when I was adding sage's commits. repushed.

@liewegas
Copy link
Member

liewegas commented Dec 4, 2015

2015-12-03T20:09:44.741 DEBUG:tasks.ceph.mon.d:waiting for process to exit
2015-12-03T20:09:49.011 INFO:tasks.workunit.client.0.plana17.stdout:test/librados/list.cc:731: Failure
2015-12-03T20:09:49.011 INFO:tasks.workunit.client.0.plana17.stdout:Value of: err_str.empty()
2015-12-03T20:09:49.012 INFO:tasks.workunit.client.0.plana17.stdout: Actual: false
2015-12-03T20:09:49.012 INFO:tasks.workunit.client.0.plana17.stdout:Expected: true
2015-12-03T20:09:49.018 INFO:tasks.workunit.client.0.plana17.stderr:Segmentation fault (core dumped)

from /a/sage-2015-12-03_13:46:35-rados-wip-sage-testing2---basic-multi/1167298

@jcsp
Copy link
Contributor Author

jcsp commented Dec 4, 2015

Huh -- I didn't realise we ran unit tests on a thrashing cluster. In this instance it took about 120 seconds for the test's PGs to get created, which exceeded the 30s timeout in wait_for_healthy. Guess I'll crank up that timeout.

I set this with unperturbed vstart clusters in mind.
The tests are sometimes run with thrashers enabled,
so things might take much longer to advance.

Signed-off-by: John Spray <john.spray@redhat.com>
@jcsp
Copy link
Contributor Author

jcsp commented Dec 4, 2015

Pushed update for timeouts.

@jcsp
Copy link
Contributor Author

jcsp commented Dec 15, 2015

ping @liewegas

@liewegas
Copy link
Member

@jcsp yeah, just waiting for the lab to come back alive so i can run it through again.

liewegas added a commit that referenced this pull request Dec 17, 2015
librados: new style (sharded) object listing

Reviewed-by: Sage Weil <sage@redhat.com>
@liewegas liewegas merged commit a32dd5e into ceph:master Dec 17, 2015
@ghost ghost changed the title librados: new style (sharded) object listing (#9963) librados: new style (sharded) object listing Feb 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants