os/bluestore: use block_size for bitmap granularity #10999

Closed
wants to merge 11 commits into from

Conversation


@liewegas liewegas commented Sep 6, 2016

No description provided.

Our transaction writes are labeled with a seq and uuid
to avoid replaying over garbage.

Two bugs, one real, one potential.

1) The second async compaction transaction didn't have
its seq and uuid set, so replay always stopped.

2) We were writing two separate transactions, one with
all the new metadata, and the next one with a jump to
the new log offset.  If the first write completed but
it was torn and the second transaction didn't hit disk,
we might see an old transaction with seq == 2 and the
same uuid and replay that instead.

Fix both of these by making the async log txn one single
transaction that jumps directly to the new log offset.

Signed-off-by: Sage Weil <sage@redhat.com>
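A minimal sketch of the replay guard this commit relies on (type and function names here are illustrative, not BlueFS's actual code): replay stops as soon as a transaction's uuid or seq doesn't match what is expected, which is why folding the compaction metadata and the jump into one transaction closes the torn-write window.

```cpp
// Illustrative sketch only, not BlueFS code.
#include <cstdint>
#include <iostream>
#include <vector>

struct LogTxn {
  uint64_t uuid;  // identifies this incarnation of the log
  uint64_t seq;   // monotonically increasing per transaction
  // ... payload: metadata updates, jump-to-new-offset op, etc.
};

// Replay applies transactions in order; any uuid/seq mismatch means we are
// looking at garbage (or a stale pre-compaction transaction) and must stop.
// Writing the compaction metadata and the jump as one transaction means a
// torn write can never leave behind a stale txn that still matches.
void replay(const std::vector<LogTxn>& log, uint64_t expect_uuid) {
  uint64_t expect_seq = 1;
  for (const auto& t : log) {
    if (t.uuid != expect_uuid || t.seq != expect_seq) {
      std::cout << "replay stops before seq " << t.seq << "\n";
      return;
    }
    // ... apply payload ...
    ++expect_seq;
  }
}

int main() {
  // The second transaction is missing its uuid/seq (bug 1 above),
  // so replay stops after the first.
  replay({{42, 1}, {0, 0}}, 42);
}
```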
(they use + instead of ~)

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Rewrote much of the persistence of onode metadata.  The
highlights:

 - extents and blobs stored together (the blob with the
   first referencing extent).
 - extents sharded across multiple k/v keys
 - if a blob is referenced from multiple shards, it's
   stored in the onode key (called a "spanning blob").
 - when we clone a blob we copy the metadata, but mark
   it shared and put (just) the ref_map on the underlying
   blocks in a shared_blob key.  at this point we also
   assign a globally unique id (sbid = shared blob id)
   so the key has a unique name.
 - we instantiate a SharedBlob in memory regardless of
   whether we need to load the ref_map (which is only
   needed for deallocations!).  the BufferSpace is
   attached to this SharedBlob so we get unified caching
   across clones.

Signed-off-by: Sage Weil <sage@redhat.com>
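A rough sketch of the layout described above, using illustrative type names rather than BlueStore's actual classes, just to show how extent shards, spanning blobs, and the shared_blob ref_map relate:

```cpp
// Layout sketch only; names are illustrative, not BlueStore's real types.
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct SharedBlob {                     // one per cloned physical blob
  uint64_t sbid = 0;                    // globally unique shared blob id;
                                        //   names the shared_blob key
  std::map<uint64_t, uint32_t> ref_map; // refs on the underlying blocks; only
                                        //   loaded when deciding deallocation
  // the BufferSpace cache hangs off this object, so clones share cached data
};

struct Blob {                           // checksum/compression metadata, etc.
  std::shared_ptr<SharedBlob> shared;   // set once the blob has been cloned
};

struct Extent {                         // a piece of the logical offset space
  uint64_t logical_offset = 0, blob_offset = 0, length = 0;
  std::shared_ptr<Blob> blob;           // encoded alongside the first extent
                                        //   that references it
};

struct ExtentShard {                    // one k/v key per shard of the map
  uint64_t shard_offset = 0;            // first logical offset in the shard
  std::vector<Extent> extents;
};

struct Onode {
  std::vector<std::shared_ptr<Blob>> spanning_blobs;  // referenced from more
                                                      //   than one shard, so
                                                      //   kept in the onode key
  std::vector<ExtentShard> shards;
};

int main() {}  // structure sketch only
```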
We could bump the _max value for a TransContext in its
prepare state, have it wait for a long time on IO, and
let another txc allocate and commit something with
an id higher than the previous max.

Fix this first by pushing the max ids into the
TransContext where we can deal with them at commit time,
and then making _kv_sync_thread bump the committed
max in a safe way.

Note that this will need to change if/when we do
these commits in parallel.

Signed-off-by: Sage Weil <sage@redhat.com>
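A minimal sketch of the commit-time scheme described above (names are illustrative): each TransContext carries the ids it allocated, and only the kv sync thread advances the committed high-water mark, monotonically.

```cpp
// Illustrative sketch, not the actual BlueStore code.
#include <algorithm>
#include <cstdint>

struct TransContext {
  uint64_t last_nid;     // highest object id allocated by this txc
  uint64_t last_blobid;  // highest blob id allocated by this txc
};

struct CommittedMax {
  uint64_t nid = 0;
  uint64_t blobid = 0;

  // Called from the kv sync thread (under its lock) when the txc commits.
  // std::max keeps the mark from moving backwards even if a txc that
  // allocated lower ids reaches commit after one that allocated higher ids.
  void bump(const TransContext& txc) {
    nid = std::max(nid, txc.last_nid);
    blobid = std::max(blobid, txc.last_blobid);
    // ... persist nid/blobid as part of the same kv commit ...
  }
};

int main() {
  CommittedMax committed;
  TransContext slow{5, 7}, fast{9, 12};
  committed.bump(fast);  // a later txc commits first
  committed.bump(slow);  // the slow txc can no longer lower the max
}
```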
Only examine the range we just wrote to (and to the left
and right).

Signed-off-by: Sage Weil <sage@redhat.com>
This has to be block_size bits because min_alloc_size
can vary across mounts.

Signed-off-by: Sage Weil <sage@redhat.com>
We need to handle objects written during previous mounts
that may have had a smaller min_alloc_size.  Use
block_size, which is a safe lower bound.

Signed-off-by: Sage Weil <sage@redhat.com>
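A small sketch of why block_size is the safe granularity for the bitmap (bytes_per_block below is an assumed stand-in for the device block size, not actual BlueStore code): block-aligned extents written under any past min_alloc_size always map to whole bits, whereas bits sized to a larger min_alloc_size could cut an old extent mid-bit.

```cpp
// Illustrative sketch only.
#include <cassert>
#include <cstdint>
#include <utility>

struct BitmapGranularity {
  uint64_t bytes_per_block;  // e.g. 4096; fixed for the life of the device

  // Convert a byte extent into a bit range.  Because every write is
  // block-aligned, this is exact for extents created under any previous
  // min_alloc_size.
  std::pair<uint64_t, uint64_t> to_bits(uint64_t offset, uint64_t length) const {
    assert(offset % bytes_per_block == 0);
    assert(length % bytes_per_block == 0);
    return {offset / bytes_per_block, length / bytes_per_block};
  }
};

int main() {
  BitmapGranularity g{4096};
  auto bits = g.to_bits(8192, 4096 * 3);  // bits 2..4
  (void)bits;
}
```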

liewegas commented Sep 6, 2016

@chhabaramesh

These were taking min_alloc_size, but this can change
across mounts; better to use the logical blob length
instead (that's what we want anyway!).

Signed-off-by: Sage Weil <sage@redhat.com>
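A hedged sketch of the idea (illustrative names, not the actual call sites): bound blob-relative work by the blob's own logical length, which travels with the blob, rather than by the current mount's min_alloc_size.

```cpp
// Illustrative sketch only.
#include <algorithm>
#include <cstdint>

struct BlobLite {
  uint64_t logical_length;  // property of the blob, stable across mounts
};

// How many bytes past blob_offset may we touch?  The blob remembers its own
// extent, so the answer does not depend on this mount's min_alloc_size.
uint64_t clamp_to_blob(const BlobLite& b, uint64_t blob_offset, uint64_t want) {
  return std::min(want, b.logical_length - blob_offset);
}

int main() {
  BlobLite b{0x10000};                             // 64 KiB blob
  uint64_t n = clamp_to_blob(b, 0xc000, 0x8000);   // clamped to 0x4000
  (void)n;
}
```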

liewegas commented Sep 6, 2016

eh, rolled this into #10963 so I can run tests. It's all passing there now.

@liewegas liewegas closed this Sep 6, 2016
@liewegas liewegas deleted the wip-bitmap-granularity branch September 6, 2016 21:59