Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/bluestore: detect unnecessary zeroes in bufferlist when data is written #43337

Merged
merged 3 commits into from
Jan 3, 2022

Conversation

ljflores
Copy link
Contributor

@ljflores ljflores commented Sep 28, 2021

Bluestore's _do_write() method handles writing data from bufferlists. Currently, it writes data from bufferlists without checking it first for zeros. The lack zero detection may negatively impact performance.

The _do_write() method handles writing both small and large chunks of data with _do_write_small() and _do_write_large() respectively.

In do_write_small(), we check if a bufferlist is made up of zeros and avoid writing it. A new counter, zero_block_count_small, helps us keep track of how many zero blocks have been skipped in _do_write_small().

Similarly, in do_write_big(), we check if a bufferlist is made up of zeros and avoid writing it. A new counter, zero_block_count_big, helps us keep track of how many zero blocks have been skipped in _do_write_big().

Trello card: https://trello.com/c/qDn0jspb/754-bluestore-zero-block-detection
Signed-off-by: Laura Flores lflores@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@ljflores
Copy link
Contributor Author

@ljflores
Copy link
Contributor Author

jenkins test api

@ljflores
Copy link
Contributor Author

jenkins retest this please

@ljflores ljflores force-pushed the wip-bluestore-zero-detection branch 3 times, most recently from 0583152 to 5c87be9 Compare October 27, 2021 15:25
@ljflores
Copy link
Contributor Author

jenkins test make check

@ljflores ljflores marked this pull request as ready for review October 27, 2021 22:43
@ljflores ljflores changed the title [WIP] os/bluestore: detect unnecessary zeroes in bufferlist when data is written os/bluestore: detect unnecessary zeroes in bufferlist when data is written Oct 27, 2021
@ljflores
Copy link
Contributor Author

@aclamk @ifed01 @benhanokh I'd love if you could take a look at this PR when you have time. I've cleaned up the changes so they're divided between two distinct commits.

  • Commit 5c87be9 addresses the simpler implementation of "zero detection" in _do_write_small(). In do_write_small(), small chunks of data are written, so I simply check if a bufferlist is made up of zeroes and avoid writing it. No matter if the metadata is written or not, I logically extend the space with o->extent_map.punch_hole (I believe that is its intended functionality).
  • Commit b3e2c6a addresses the fine-grain implementation of "zero detection" in _do_write_big(). In do_write_big(), large chunks of data are written, so I go a step further and employ a fine-grain approach to the detection of zeroes. By scanning through the data in bufferlist and dividing it into chunks of length min_alloc_size, I check these smaller chunks for zeroes in an effort to catch even more unnecessary data.

To test the first commit, I launched a vstart cluster (RGW=2 ../src/vstart.sh --debug --new -x --localhost --bluestore) to check the cluster health, try out some commands, etc. The test cluster launched fine, the health was OK, and I was able to dump the metadata from each osd with ceph osd metadata {osd_id}. Upon pushing this commit, all checks including make check and api passed except for the Pull Request Needs Rebase? check.

To test the second commit, I also launched a vstart cluster with the same results as above (health was OK, I could dump the metadata, etc). In addition, I compared the contents of the bufferlist prior to scanning (called t in the source file) to the bufferlist after scanning (called non_zeroes in the source file), and the content appears to match except for blocks of zeroes that were removed from bufferlist t after scanning. (I have a log file of these results, but it's so large that I haven't figured out how to share/upload it yet). This commit, however, fails the api test due to problems with the metadata.
I see several lines from the mgr log in the Jenkins api test output similar to this:
2021-10-27T21:51:29.501+0000 7f416931e700 1 mgr finish mon failed to return metadata for osd.0: (2) No such file or directory
which indicates that there have been problems introduced after the fine-grain implementation.

One last check I did for the fine-grain implementation was to comment out line 14276 in the second commit (wctx->write(offset, b, l, b_off, non_zeroes, b_off, l, false, new_blob);) and launch a vstart cluster. The cluster was not able to lauch, which implies to me that the metadata is getting written correctly to some degree.

Let me know your thoughts.. I'd appreciate any feedback!

@aclamk aclamk self-requested a review October 28, 2021 14:41
@neha-ojha
Copy link
Member

jenkins test api

Copy link
Contributor

@ifed01 ifed01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly encourage to add a UT to store_test suite to make sure the new logic works as expected. Additionally it might be helpful to introduce a perf counter (which can be used in this new UT as well) to track the amount of blocks skipped due to zero detection)

src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Show resolved Hide resolved
@ljflores
Copy link
Contributor Author

jenkins test api

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ljflores
Copy link
Contributor Author

jenkins test make check

@ljflores
Copy link
Contributor Author

ljflores commented Dec 3, 2021

jenkins test make check arm64

Copy link
Contributor

@ifed01 ifed01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally LGTM, just some nits which are up to @ljflores to fix or not.

src/os/bluestore/BlueStore.cc Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/test/objectstore/store_test.cc Outdated Show resolved Hide resolved
src/test/objectstore/store_test.cc Outdated Show resolved Hide resolved
@ljflores
Copy link
Contributor Author

generally LGTM, just some nits which are up to @ljflores to fix or not.

Those suggestions make sense-- I can fix them. Thanks for the careful reviews, @ifed01!

@ljflores ljflores force-pushed the wip-bluestore-zero-detection branch 2 times, most recently from c5ce66b to 1dc7301 Compare December 21, 2021 23:13
@ljflores
Copy link
Contributor Author

jenkins test make check arm64

@ljflores
Copy link
Contributor Author

@aclamk let me know if you have any final comments before I mark this as "needs-qa".

Copy link
Contributor

@aclamk aclamk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. Minor details in douts and comments.

src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
src/os/bluestore/BlueStore.cc Outdated Show resolved Hide resolved
@ljflores
Copy link
Contributor Author

Failures, unrelated:

6580187, 6580436 -- https://tracker.ceph.com/issues/52124
6580226, 6580440 -- https://tracker.ceph.com/issues/38455
6580242-- https://tracker.ceph.com/issues/53499
6580330 -- https://tracker.ceph.com/issues/53681
6580439 -- https://tracker.ceph.com/issues/53723
6580078, 6580296 -- https://tracker.ceph.com/issues/53424
6580192 -- https://tracker.ceph.com/issues/51076

Details:

Bug_#52124: Invalid read of size 8 in handle_recovery_delete() - Ceph - RADOS
Bug_#38455: Test failure: test_module_commands (tasks.mgr.test_module_selftest.TestModuleSelftest): RuntimeError: Synthetic exception in serve - Ceph - Mgr
Bug_#53499: test_dashboard_e2e.sh Failure: orchestrator/02-hosts-inventory.e2e failed. - Ceph - Mgr - Dashboard
Bug_#53681: Failed to extract uid/gid for path /var/lib/ceph - Ceph - Orchestrator
Bug_#53723: Cephadm agent fails to report and causes a health timeout - Ceph - Orchestrator
Bug_#53424: CEPHADM_DAEMON_PLACE_FAIL in orch:cephadm/mgr-nfs-upgrade/ - Ceph - Orchestrator
Bug_#51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC backend. - Ceph - RADOS

@yuriw
Copy link
Contributor

yuriw commented Dec 23, 2021

this was tested ref: https://trello.com/c/im0LpXz9

Bluestore's `_do_write()` method handles writing data from bufferlists. Currently, it writes data from bufferlists without checking for unnecessary zeros. The lack zero detection may negatively impact performance.

In _do_write_small(), we check if a bufferlist is made up of zeros and avoid writing it if so.
Two new counters, `l_bluestore_write_small_skipped` and l_bluestore_write_small_skipped_bytes`, have been introduced to help us count how many zero blocks and bytes from _do_write_small() have been skipped.

Signed-off-by: Laura Flores <lflores@redhat.com>
Bluestore's `_do_write()` method handles writing data from bufferlists. Currently, it writes data from bufferlists without checking for unnecessary zeros. The lack zero detection may negatively impact performance.

In _do_write_big, we also check if a bufferlist is made up of zeros and avoid writing it if so.
Two new counters, `l_bluestore_write_big_skipped_blobs` and `l_bluestore_write_big_skipped_bytes`, have been introduced to help us count how many zero blocks and bytes from _do_write_big() have been skipped.

Signed-off-by: Laura Flores <lflores@redhat.com>
Unit tests include ZeroBlockDetectionSmall{Append/Overwrite} and ZeroBlockDetectionBig{Append/Overwrite}.

Signed-off-by: Laura Flores <lflores@redhat.com>
@ljflores
Copy link
Contributor Author

ljflores commented Jan 3, 2022

Will merge this since it has been tested/approved.

@ljflores ljflores merged commit fa20cb8 into ceph:master Jan 3, 2022
@ljflores ljflores deleted the wip-bluestore-zero-detection branch January 3, 2022 15:16
sseshasa added a commit to sseshasa/ceph that referenced this pull request Feb 22, 2022
An optimization (see PR: ceph#43337) was made
in BlueStore to avoid writing bufferlists made up of zeros. The osd
benchmark used zero filled bufferlists and this resulted in bloated osd
benchmark results.

This issue is fixed by using bufferlists filled with non-zero values.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to sseshasa/ceph that referenced this pull request Feb 22, 2022
An optimization (see PR: ceph#43337) was made
in BlueStore to avoid writing bufferlists made up of zeros. The osd
benchmark used zero filled bufferlists and this resulted in inflated osd
benchmark results.

This issue is fixed by using bufferlists filled with non-zero values.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fixes: https://tracker.ceph.com/issues/54364
sseshasa added a commit to sseshasa/ceph that referenced this pull request Feb 24, 2022
An optimization (see PR: ceph#43337) was made
in BlueStore to avoid writing bufferlists made up of zeros. The osd
benchmark used zero filled bufferlists and this resulted in inflated osd
benchmark results.

This issue is fixed by using bufferlists filled with non-zero values.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fixes: https://tracker.ceph.com/issues/54364
(cherry picked from commit 09f94ac)
nSedrickm pushed a commit to nSedrickm/ceph that referenced this pull request Mar 21, 2022
An optimization (see PR: ceph#43337) was made
in BlueStore to avoid writing bufferlists made up of zeros. The osd
benchmark used zero filled bufferlists and this resulted in inflated osd
benchmark results.

This issue is fixed by using bufferlists filled with non-zero values.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fixes: https://tracker.ceph.com/issues/54364
zalsader pushed a commit to zalsader/ceph that referenced this pull request Apr 11, 2022
An optimization (see PR: ceph#43337) was made
in BlueStore to avoid writing bufferlists made up of zeros. The osd
benchmark used zero filled bufferlists and this resulted in inflated osd
benchmark results.

This issue is fixed by using bufferlists filled with non-zero values.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fixes: https://tracker.ceph.com/issues/54364
@idryomov
Copy link
Contributor

idryomov commented May 3, 2022

This broke rbd create --thick-provision and in general quite a big semantic change. Punching holes is definitely not what was proposed in https://bugzilla.redhat.com/show_bug.cgi?id=1924129 -- we envisioned a moral equivalent of fallocate with mode=FALLOC_FL_ZERO_RANGE. I filed https://tracker.ceph.com/issues/55521 to track this regression.

@ljflores
Copy link
Contributor Author

ljflores commented May 3, 2022

This broke rbd create --thick-provision and in general quite a big semantic change. Punching holes is definitely not what was proposed in https://bugzilla.redhat.com/show_bug.cgi?id=1924129 -- we envisioned a moral equivalent of fallocate with mode=FALLOC_FL_ZERO_RANGE. I filed https://tracker.ceph.com/issues/55521 to track this regression.

Thanks for bringing this up, @idryomov. I'll have a look.

dpaganel pushed a commit to dpaganel/ceph that referenced this pull request May 17, 2022
An optimization (see PR: ceph#43337) was made
in BlueStore to avoid writing bufferlists made up of zeros. The osd
benchmark used zero filled bufferlists and this resulted in inflated osd
benchmark results.

This issue is fixed by using bufferlists filled with non-zero values.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fixes: https://tracker.ceph.com/issues/54364
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants