Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd/cache/pwl/ssd/WriteLog: decrement m_bytes_allocated when retiring #41068

Merged
merged 6 commits into from May 3, 2021

Conversation

idryomov
Copy link
Contributor

Currently if ssd cache is filled to capacity, all future I/O hangs
indefinitely because even though the cache eventually becomes clean
and retires enough entries to get back under RETIRE_HIGH_WATER, this
isn't communicated to AbstractWriteLog::check_allocation().

Fixes: https://tracker.ceph.com/issues/50560
Signed-off-by: Ilya Dryomov idryomov@gmail.com

@idryomov
Copy link
Contributor Author

cc @MahatiC

@idryomov
Copy link
Contributor Author

cc @CongMinYin

@@ -708,6 +708,8 @@ bool WriteLog<I>::retire_entries(const unsigned long int frees_per_tx) {
m_first_valid_entry = first_valid_entry;
ceph_assert(m_first_valid_entry % MIN_WRITE_ALLOC_SSD_SIZE == 0);
this->m_free_log_entries += retiring_entries.size();
ceph_assert(this->m_bytes_allocated >= allocated_bytes);
this->m_bytes_allocated -= allocated_bytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic does exist in RWL, but it is omitted here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the comment. Do you mean that it is omitted intentionally?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not omitted intentionally. This PR looks correct. I actually noticed this issue and fixed it locally along with a few more fixes and haven't upstreamed them yet. And I am currently running some tests as well. I will review this change once again by tomorrow. Thank you for this PR!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I have another PR for the ssd mode almost ready -- fixing power cycle issues. Should be able to submit by tomorrow. I'll CC you, thanks for taking a look!

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently if ssd cache is filled to capacity, all future I/O hangs
indefinitely because even though the cache eventually becomes clean
and retires enough entries to get back under RETIRE_HIGH_WATER, this
isn't communicated to AbstractWriteLog::check_allocation().

Fixes: https://tracker.ceph.com/issues/50560
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Follow rwl mode and use AbstractWriteLog::m_bytes_allocated_cap
instead of m_log_pool_ring_buffer_size specific to ssd.  This fixes
"bytes available" calculation in STATS output.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…evel

Bump "Waiting for allocation" to 5.

"Retiring" is at 20 for rwl and 1 for ssd.  Bump the latter to 20 as
well.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
While at it, reduce the number of calls to operator<< and drop
the trailing comma.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

Added a fix for an existing free()/delete mismatch.

@MahatiC
Copy link
Contributor

MahatiC commented Apr 30, 2021

Thanks! Looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants