os/bluestore: use block_size for bitmap granularity #10999

Closed
wants to merge 11 commits into from

Conversation


@liewegas liewegas commented Sep 6, 2016

No description provided.

Our transaction writes are labeled with a seq and uuid
to avoid replaying over garbage.

Two bugs, one real, one potential.

1) The second async compaction transaction didn't have
its seq and uuid set, so replay always stopped.

2) We were writing two separate transactions, one with
all the new metadata, and the next one with a jump to
the new log offset.  If the first write completed but
it was torn and the second transaction didn't hit disk,
we might see an old transaction with seq == 2 and the
same uuid and replay that instead.

Fix both of these by making the async log txn one single
transaction that jumps directly to the new log offset.

Signed-off-by: Sage Weil <sage@redhat.com>
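A minimal sketch of the replay guard this commit relies on (type and function names here are illustrative, not BlueFS's actual code): replay stops as soon as a transaction's uuid or seq doesn't match what is expected, which is why folding the compaction metadata and the jump into one transaction closes the torn-write window.

```cpp
// Illustrative sketch only, not BlueFS code.
#include <cstdint>
#include <iostream>
#include <vector>

struct LogTxn {
  uint64_t uuid;  // identifies this incarnation of the log
  uint64_t seq;   // monotonically increasing per transaction
  // ... payload: metadata updates, jump-to-new-offset op, etc.
};

// Replay applies transactions in order; any uuid/seq mismatch means we are
// looking at garbage (or a stale pre-compaction transaction) and must stop.
// Writing the compaction metadata and the jump as one transaction means a
// torn write can never leave behind a stale txn that still matches.
void replay(const std::vector<LogTxn>& log, uint64_t expect_uuid) {
  uint64_t expect_seq = 1;
  for (const auto& t : log) {
    if (t.uuid != expect_uuid || t.seq != expect_seq) {
      std::cout << "replay stops before seq " << t.seq << "\n";
      return;
    }
    // ... apply payload ...
    ++expect_seq;
  }
}

int main() {
  // The second transaction is missing its uuid/seq (bug 1 above),
  // so replay stops after the first.
  replay({{42, 1}, {0, 0}}, 42);
}
```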
(they use + instead of ~)

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Rewrote much of the persistence of onode metadata.  The
highlights:

 - extents and blobs stored together (the blob with the
   first referencing extent).
 - extents sharded across multiple k/v keys
 - if a blob is referenced from multiple shards, it's
   stored in the onode key (called a "spanning blob").
 - when we clone a blob we copy the metadata, but mark
   it shared and put (just) the ref_map on the underlying
   blocks in a shared_blob key.  at this point we also
   assign a globally unique id (sbid = shared blob id)
   so the key has a unique name.
 - we instantiate a SharedBlob in memory regardless of
   whether we need to load the ref_map (which is only
   needed for deallocations!).  the BufferSpace is
   attached to this SharedBlob so we get unified caching
   across clones.

Signed-off-by: Sage Weil <sage@redhat.com>
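A rough sketch of the layout described above, using illustrative type names rather than BlueStore's actual classes, just to show how extent shards, spanning blobs, and the shared_blob ref_map relate:

```cpp
// Layout sketch only; names are illustrative, not BlueStore's real types.
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct SharedBlob {                     // one per cloned physical blob
  uint64_t sbid = 0;                    // globally unique shared blob id;
                                        //   names the shared_blob key
  std::map<uint64_t, uint32_t> ref_map; // refs on the underlying blocks; only
                                        //   loaded when deciding deallocation
  // the BufferSpace cache hangs off this object, so clones share cached data
};

struct Blob {                           // checksum/compression metadata, etc.
  std::shared_ptr<SharedBlob> shared;   // set once the blob has been cloned
};

struct Extent {                         // a piece of the logical offset space
  uint64_t logical_offset = 0, blob_offset = 0, length = 0;
  std::shared_ptr<Blob> blob;           // encoded alongside the first extent
                                        //   that references it
};

struct ExtentShard {                    // one k/v key per shard of the map
  uint64_t shard_offset = 0;            // first logical offset in the shard
  std::vector<Extent> extents;
};

struct Onode {
  std::vector<std::shared_ptr<Blob>> spanning_blobs;  // referenced from more
                                                      //   than one shard, so
                                                      //   kept in the onode key
  std::vector<ExtentShard> shards;
};

int main() {}  // structure sketch only
```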
We could bump the _max value for a TransContext in its
prepare state, have it wait for a long time on IO, and
let another txc allocate and commit something with
an id higher than the previous max.

Fix this first by pushing the max ids into the
TransContext where we can deal with them at commit time,
and then making _kv_sync_thread bump the committed
max in a safe way.

Note that this will need to change if/when we do
these commits in parallel.

Signed-off-by: Sage Weil <sage@redhat.com>
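A minimal sketch of the commit-time scheme described above (names are illustrative): each TransContext carries the ids it allocated, and only the kv sync thread advances the committed high-water mark, monotonically.

```cpp
// Illustrative sketch, not the actual BlueStore code.
#include <algorithm>
#include <cstdint>

struct TransContext {
  uint64_t last_nid;     // highest object id allocated by this txc
  uint64_t last_blobid;  // highest blob id allocated by this txc
};

struct CommittedMax {
  uint64_t nid = 0;
  uint64_t blobid = 0;

  // Called from the kv sync thread (under its lock) when the txc commits.
  // std::max keeps the mark from moving backwards even if a txc that
  // allocated lower ids reaches commit after one that allocated higher ids.
  void bump(const TransContext& txc) {
    nid = std::max(nid, txc.last_nid);
    blobid = std::max(blobid, txc.last_blobid);
    // ... persist nid/blobid as part of the same kv commit ...
  }
};

int main() {
  CommittedMax committed;
  TransContext slow{5, 7}, fast{9, 12};
  committed.bump(fast);  // a later txc commits first
  committed.bump(slow);  // the slow txc can no longer lower the max
}
```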
Only examine the range we just wrote to (and to the left
and right).

Signed-off-by: Sage Weil <sage@redhat.com>
This has to be block_size bits because min_alloc_size
can vary across mounts.

Signed-off-by: Sage Weil <sage@redhat.com>
We need to handle objects written during previous mounts
that may have had a smaller min_alloc_size.  Use
block_size, which is a safe lower bound.

Signed-off-by: Sage Weil <sage@redhat.com>
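A small sketch of why block_size is the safe granularity for the bitmap (bytes_per_block below is an assumed stand-in for the device block size, not actual BlueStore code): block-aligned extents written under any past min_alloc_size always map to whole bits, whereas bits sized to a larger min_alloc_size could cut an old extent mid-bit.

```cpp
// Illustrative sketch only.
#include <cassert>
#include <cstdint>
#include <utility>

struct BitmapGranularity {
  uint64_t bytes_per_block;  // e.g. 4096; fixed for the life of the device

  // Convert a byte extent into a bit range.  Because every write is
  // block-aligned, this is exact for extents created under any previous
  // min_alloc_size.
  std::pair<uint64_t, uint64_t> to_bits(uint64_t offset, uint64_t length) const {
    assert(offset % bytes_per_block == 0);
    assert(length % bytes_per_block == 0);
    return {offset / bytes_per_block, length / bytes_per_block};
  }
};

int main() {
  BitmapGranularity g{4096};
  auto bits = g.to_bits(8192, 4096 * 3);  // bits 2..4
  (void)bits;
}
```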

liewegas commented Sep 6, 2016

@chhabaramesh

These were taking min_alloc_size, but this can change
across mounts; better to use the logical blob length
instead (that's what we want anyway!).

Signed-off-by: Sage Weil <sage@redhat.com>
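A hedged sketch of the idea (illustrative names, not the actual call sites): bound blob-relative work by the blob's own logical length, which travels with the blob, rather than by the current mount's min_alloc_size.

```cpp
// Illustrative sketch only.
#include <algorithm>
#include <cstdint>

struct BlobLite {
  uint64_t logical_length;  // property of the blob, stable across mounts
};

// How many bytes past blob_offset may we touch?  The blob remembers its own
// extent, so the answer does not depend on this mount's min_alloc_size.
uint64_t clamp_to_blob(const BlobLite& b, uint64_t blob_offset, uint64_t want) {
  return std::min(want, b.logical_length - blob_offset);
}

int main() {
  BlobLite b{0x10000};                             // 64 KiB blob
  uint64_t n = clamp_to_blob(b, 0xc000, 0x8000);   // clamped to 0x4000
  (void)n;
}
```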

liewegas commented Sep 6, 2016

eh, rolled this into #10963 so I can run tests. It's all passing there now.

@liewegas liewegas closed this Sep 6, 2016
@liewegas liewegas deleted the wip-bitmap-granularity branch September 6, 2016 21:59