rbd: don't cap the buffer size used in GetImageNames #700
mergify[bot] merged 2 commits into ceph:master
Conversation
Code seems reasonable enough, but I'm a bit concerned about the potential unbounded memory use. I understand that you didn't want to change the existing function... especially since you're only changing the legacy "nautilus" version. One thought I have is that you could move the limit value to a global variable, and then alter the limit that way. It's technically a "new api" but it's fully compatible with the existing code. But I wonder if @ansiwen would think it too hacky. :-) I will think about this some more too.

It's not actually just a nautilus change - there aren't any build tags on that file. I'm okay with going the global variable way too, although yeah it is a bit hacky. Let me know what you think after mulling for a few days.
OK, thank you for pointing that out. I missed that.
Will do.
Any new thoughts on this?
I was waiting for @ansiwen to return from time off to discuss this. Thanks for being patient with us.
Cool, no worries!
ansiwen left a comment:
In general I like the change, because I found the retry mechanism over-engineered for a buffer of a few bytes in size. So keeping it simple is good: try with 4096 bytes and then with whatever is required. (I wouldn't call that "unbound", because it's a specific request.) Speaking of simplicity: I'm not a fan of the recursion (in Go code), and would prefer to see a loop with a break. I will not block on that, though, but that would be clearly my preference.
rbd/rbd_nautilus.go (outdated):

    err := getErrorIfNegative(ret)
    if err != nil {
        if err == errRange {
            return getImageNames(ioctx, size)
Although I'm heavily using recursion when writing OCaml code, I don't think it's idiomatic in Go. Go doesn't guarantee tail-call optimisation (although it seems to have it in some cases), so I think a loop or a goto is a better option for Go. It seems nit-picky, I know, because in most cases it will only get called twice, but it's more a matter of "Go style hygiene" for other coders who will work on the code later.
Actually, the Go compiler does not do tail-call optimization, which means, for example, that the buffers would all stay allocated in parallel:
% cat tail-call-recursion.go
package main
func f() {
f()
}
func main() {
f()
}
% go run ./tail-call-recursion.go
runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0xc0200e0390 stack=[0xc0200e0000, 0xc0400e0000]
fatal error: stack overflow
runtime stack:
runtime.throw({0x10614e2, 0x10b98a0})
/usr/local/Cellar/go/1.17.5/libexec/src/runtime/panic.go:1198 +0x71
runtime.newstack()
/usr/local/Cellar/go/1.17.5/libexec/src/runtime/stack.go:1088 +0x5ac
runtime.morestack()
/usr/local/Cellar/go/1.17.5/libexec/src/runtime/asm_amd64.s:461 +0x8b
goroutine 1 [running]:
main.f()
/Users/svanders/tmp/go/tail-call-recursion.go:3 +0x26 fp=0xc0200e03a0 sp=0xc0200e0398 pc=0x1054cc6
main.f()
/Users/svanders/tmp/go/tail-call-recursion.go:4 +0x17 fp=0xc0200e03b0 sp=0xc0200e03a0 pc=0x1054cb7
main.f()
/Users/svanders/tmp/go/tail-call-recursion.go:4 +0x17 fp=0xc0200e03c0 sp=0xc0200e03b0 pc=0x1054cb7
main.f()
...
ansiwen left a comment:
That looks better, thanks!
I think the pacific failure is a flaky test (the failure is in cephfs/admin), can you re-run it?
@Mergifyio rebase
This removes the limit on the max buffer size GetImageNames is willing to pass to rbd_list2, which is somewhat arbitrary and is too small for large clusters. GetImageNames will continue to start with a small buffer size and retry with a larger buffer if rbd_list2 returns ERANGE (just without a cap on the max buffer size it's willing to go to).

Signed-off-by: Sanford Miller <smiller@digitalocean.com>
This is done because using a for loop is more idiomatic in Go code.

Signed-off-by: Sanford Miller <smiller@digitalocean.com>
✅ Branch has been successfully rebased
ec53a8b to 02ded3f
phlogistonjohn left a comment:
While I do still think it would have been a lot less work to simply bump up the WithSizes maximum, this is fine with me.
I plan to have this land in the v0.16.0 release to be created today. This is about as last minute as we get. To use a sports metaphor, you got this one in right before the buzzer. :-D
This removes the limit on the max buffer size GetImageNames is willing to pass to rbd_list2, which is somewhat arbitrary and is too small for large clusters. GetImageNames will continue to start with a small buffer size and retry with a larger buffer if rbd_list2 returns ERANGE (just without a cap on the max buffer size it's willing to go to).

I considered creating a new API method in which max buffer size is configurable, but ultimately it seemed much simpler/cleaner to me to just keep the current method and kill the cap entirely. My team has experimented with a no-cap version of GetImageNames (since the capped one fails on our clusters, which have many thousands of volumes), and it hasn't had any issues. However, if there is a subset of users for whom having the cap is important, I'm happy to take another approach.