New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: new style pgls #4919
RFC: new style pgls #4919
Conversation
@liewegas @athanatos see what you think of this so far? The tests don't actually pass; I'm finding that the filestore is not actually returning items in the expected order (i.e. sorted by bit-reversed hash) -- assuming that there's something in its implementation causing that (haven't looked). The math in ReplicatedPG::do_pg_op to calculate the hash offset of the start of the next PG is a bit hacky (takes account of non-power-of-two pg_num though), there is probably a simpler way of expressing that. |
On Wed, 10 Jun 2015, John Spray wrote:
oh.. yeah, there is. the DIR_$hexdigit dirs are sorted lexicographically |
wip-newstore-sort is the branch to rebase this onto |
4e60909
to
0a82b02
Compare
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We'll need this later when we sort the subdirs bitwise instead of nibblewise. Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Note that this breaks compatibility for the moment. Signed-off-by: Sage Weil <sage@redhat.com>
We will sort [g]hobject_t's bitwise (instead of nibblewise) if all OSDs who have participated in peering support the feature. Note that this means a latecomer PG who does a notify may not have the feature while the acting set operates with the new sort order. We will need to be careful about last_backfill in this case.. that patch it coming. Signed-off-by: Sage Weil <sage@redhat.com>
Note that this doesn't build, but it will be easier to review this way. Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We need to separate the bitwise and nibblewise implementations because they diverge too much. Start by converting the old code to a working nibblewise sort (compensating for the new ghobject_t sort and new args). Signed-off-by: Sage Weil <sage@redhat.com>
Handle listing by hash with bitwise sorting. This involves futzing with the hex digit sort order by reversing the digit nibbles and so on. Signed-off-by: Sage Weil <sage@redhat.com>
In particular, make SimpleListTest check both sorting methods and asssert that the result is sorted. Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We need to know what sort order last_backfill is based on. Signed-off-by: Sage Weil <sage@redhat.com>
It's doesn't matter, since it's a deterministic permutation of hte value and we're checking for equality, but avoid the filestore-specific nibblewise key to avoid confusion. Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
…comparator Signed-off-by: Sage Weil <sage@redhat.com>
We initialize all BackfillIntervals when backfill starts. At that point we set the Comparator order via reset(). Signed-off-by: Sage Weil <sage@redhat.com>
Reset the map and set so that the comparator matches the sort order at the start of each backfill. Signed-off-by: Sage Weil <sage@redhat.com>
We need to take care not to clear() any of the maps or sets or else we'll lose the comparator. Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
…kfill Signed-off-by: Sage Weil <sage@redhat.com>
Some work we need to do on any new interval, *including* when the PG is first created and there is no "interval change". Factor out the (mostly new) bits of start_peering_interval() that also need to happen on the initial interval after the PG is first instantiated into on_new_interval(). Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We need to force a backfill restart even if last_update is up to date. Signed-off-by: Sage Weil <sage@redhat.com>
This shouldn't normally happen, except that another byzantine path put us in backfill even though we had a log from 0'0. If we get back here (for any other reason), behave by scanning the local objects and backfilling them. Signed-off-by: Sage Weil <sage@redhat.com>
If the log is empty, last_backfill != max is meaningless, and will confuse peers later one. Signed-off-by: Sage Weil <sage@redhat.com>
…s complete If we have a full log (from 0'0) then we shouldn't backfill at all--don't force_restart_backfill in that case! Signed-off-by: Sage Weil <sage@redhat.com>
Allow the use of a comparator that has a value, and for that comparator to be adjusted at runtime for the purposes for get_next sort order. Since get_next uses the weak_refs map, that is the only map that needs to be rebuilt. We assume, for simplicity, that the comparator can be constructed with no arguments with some default... otherwise this is messier. Signed-off-by: Sage Weil <sage@redhat.com>
Scrub uses the object_contexts sort order to check whether it is safe to extend the scrub interval (verifying there aren't ops in flight). For this to work we need to use the correct sort order for the object_context map (as seen by SharedLRU<>::get_next()). Signed-off-by: Sage Weil <sage@redhat.com>
Calculate which PG comes next in the bit-reversed hash ordering, and set response.handle to the lowest hash in that PG's range. We do this OSD side rather than client side so that the client doesn't have to wait for the OSD's map epoch in order to correctly calculate the next hash in the case of PG splitting. Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
Because some unit tests benefit from having a non-power-of-two pg_num. Signed-off-by: John Spray <john.spray@redhat.com>
Signed-off-by: John Spray <john.spray@redhat.com>
0a82b02
to
5bb35f6
Compare
see wip-9663-sage |
I think we should do this after infernalis unless there is a real pressing need for it there |
@jcsp this needs to be rebased |
Closed in favour of #6405 |
No description provided.