New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
librbd/cache/pwl/ssd: solve competition between read and retire #42580
librbd/cache/pwl/ssd: solve competition between read and retire #42580
Conversation
@MahatiC @idryomov @majianpeng please review it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we check the reader count before releasing data from RAM?[0]
[0]https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/AbstractWriteLog.cc#L1293
src/librbd/cache/pwl/ssd/WriteLog.h
Outdated
@@ -136,6 +136,8 @@ class WriteLog : public AbstractWriteLog<ImageCtxT> { | |||
Context *ctx); | |||
void aio_read_data_blocks(std::vector<WriteLogCacheEntry*> &log_entries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function still needed? I don't think so. The only other place it's used is[0][1]. In both places it WriteLogCacheEntry
can be changed.
[0]https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/ssd/WriteLog.cc#L960
[1]https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/ssd/WriteLog.cc#L513
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MahatiC yes, the function should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MahatiC this function shouldn't be removed, aio_read_data_block()
keeps raw pointer.
https://github.com/ceph/ceph/blob/master/src/librbd/cache/pwl/ssd/WriteLog.cc#L960 this line don't changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it have to be a raw pointer? Having two different rather long functions do exactly the same thing just on different types of pointers is really undesirable.
Is there a way to refactor this?
hi, @MahatiC About releasing data from RAM before check the reader count, if we don't release it, when will it be release next time? And I don't think this impact the correctness, it might impact the performance. It only release data in RAM, but we can read data from SSD. |
db3ae82
to
949ca0c
Compare
you're right. I don't think this needs to be changed. |
949ca0c
to
0a263fa
Compare
jenkins test make check |
0a263fa
to
b248917
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me, just a couple of nits.
@MahatiC Are you OK with this PR? |
3d0e54a
to
b5b592c
Compare
@hualongfeng I assume @idryomov pending above check, looks good to me. |
@MahatiC I don't understand what you mean? Do I need to write a new unit test to test the PR? Or I need to run the existed unit tests to test the PR? |
run the existing ones |
b5b592c
to
d17a643
Compare
@MahatiC run this? root@ceph-client1:/mnt/ceph/build# ./bin/unittest_librbd --gtest_filter="TestMockCacheSSDWriteLog.*"
seed 2238984
Note: Google Test filter = TestMockCacheSSDWriteLog.*
[==========] Running 16 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 16 tests from TestMockCacheSSDWriteLog
[ RUN ] TestMockCacheSSDWriteLog.init_state_write
Failed to load class: cas (lib/libcls_cas.so): lib/libcls_cas.so: undefined symbol: _Z26cls_get_manifest_ref_countPvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Failed to load class: cmpomap (lib/libcls_cmpomap.so): lib/libcls_cmpomap.so: undefined symbol: _Z28cls_cxx_map_get_vals_by_keysPvRKSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIS6_ESaIS6_EEPSt3mapIS6_N4ceph6buffer7v15_2_04listES8_SaISt4pairIKS6_SH_EEE
Failed to load class: fifo (lib/libcls_fifo.so): lib/libcls_fifo.so: undefined symbol: _Z20cls_gen_random_bytesPci
Failed to load class: log (lib/libcls_log.so): lib/libcls_log.so: undefined symbol: _Z24cls_cxx_map_write_headerPvPN4ceph6buffer7v15_2_04listE
Failed to load class: lua (lib/libcls_lua.so): lib/libcls_lua.so: undefined symbol: _Z19cls_current_versionPv
Failed to load class: rgw (lib/libcls_rgw.so): lib/libcls_rgw.so: undefined symbol: _Z19cls_current_versionPv
Failed to load class: test_remote_reads (lib/libcls_test_remote_reads.so): lib/libcls_test_remote_reads.so: undefined symbol: _Z14cls_cxx_gatherPvRKSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIS6_ESaIS6_EERKS6_PKcSG_RN4ceph6buffer7v15_2_04listE
Failed to load class: user (lib/libcls_user.so): lib/libcls_user.so: undefined symbol: _Z24cls_cxx_map_write_headerPvPN4ceph6buffer7v15_2_04listE
[ OK ] TestMockCacheSSDWriteLog.init_state_write (34 ms)
[ RUN ] TestMockCacheSSDWriteLog.init_state_json_write
[ OK ] TestMockCacheSSDWriteLog.init_state_json_write (23 ms)
[ RUN ] TestMockCacheSSDWriteLog.init_shutdown
[ OK ] TestMockCacheSSDWriteLog.init_shutdown (303 ms)
[ RUN ] TestMockCacheSSDWriteLog.write
[ OK ] TestMockCacheSSDWriteLog.write (312 ms)
[ RUN ] TestMockCacheSSDWriteLog.read_hit_ssd_cache
[ OK ] TestMockCacheSSDWriteLog.read_hit_ssd_cache (317 ms)
[ RUN ] TestMockCacheSSDWriteLog.read_hit_part_ssd_cache
[ OK ] TestMockCacheSSDWriteLog.read_hit_part_ssd_cache (312 ms)
[ RUN ] TestMockCacheSSDWriteLog.read_miss_ssd_cache
[ OK ] TestMockCacheSSDWriteLog.read_miss_ssd_cache (312 ms)
[ RUN ] TestMockCacheSSDWriteLog.compare_and_write_compare_matched
[ OK ] TestMockCacheSSDWriteLog.compare_and_write_compare_matched (312 ms)
[ RUN ] TestMockCacheSSDWriteLog.compare_and_write_compare_failed
[ OK ] TestMockCacheSSDWriteLog.compare_and_write_compare_failed (311 ms)
[ RUN ] TestMockCacheSSDWriteLog.writesame
[ OK ] TestMockCacheSSDWriteLog.writesame (312 ms)
[ RUN ] TestMockCacheSSDWriteLog.discard
[ OK ] TestMockCacheSSDWriteLog.discard (317 ms)
[ RUN ] TestMockCacheSSDWriteLog.invalidate
[ OK ] TestMockCacheSSDWriteLog.invalidate (312 ms)
[ RUN ] TestMockCacheSSDWriteLog.flush
[ OK ] TestMockCacheSSDWriteLog.flush (308 ms)
[ RUN ] TestMockCacheSSDWriteLog.flush_source_shutdown
[ OK ] TestMockCacheSSDWriteLog.flush_source_shutdown (308 ms)
[ RUN ] TestMockCacheSSDWriteLog.flush_source_internal
[ OK ] TestMockCacheSSDWriteLog.flush_source_internal (316 ms)
[ RUN ] TestMockCacheSSDWriteLog.flush_source_user
[ OK ] TestMockCacheSSDWriteLog.flush_source_user (331 ms)
[----------] 16 tests from TestMockCacheSSDWriteLog (4440 ms total)
[----------] Global test environment tear-down
[==========] 16 tests from 1 test suite ran. (4492 ms total)
[ PASSED ] 16 tests. |
Dismissing pending aio_read_data_block() solution.
d17a643
to
364a32e
Compare
364a32e
to
d4baa53
Compare
d4baa53
to
5bb990a
Compare
jenkins test make check |
SSD read is not like rwl's. SSD need aio read. Therefore, we cannot guarantee that the data will not be retire during the period from sending the read request to the SSD and receiving the data to the memory, which may cause the corresponding data on the SSD to be overwritten. Fixes: https://tracker.ceph.com/issues/52236 Signed-off-by: Feng Hualong <hualong.feng@intel.com>
5bb990a
to
dfdb452
Compare
jenkins test make check arm64 |
SSD read is not like rwl's. SSD need aio read. Therefore,
we cannot guarantee that the data will not be retire
during the period from sending the read request to the SSD
and receiving the data to the memory, which may cause
the corresponding data on the SSD to be overwritten.
Signed-off-by: Feng Hualong hualong.feng@intel.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox