
rowexec: release buckets from hash aggregator eagerly #47466

Merged
merged 2 commits on Apr 15, 2020

Conversation

yuzefovich (Member)

rowexec: release buckets from hash aggregator eagerly

This commit makes the hash aggregator release the memory under its
buckets eagerly (once we're done with a bucket) so that it is returned
to the system. This can matter a lot when we have a large number of
buckets (on the order of 100k). Previously, this would happen only on
flow shutdown, once we lose the references to the hashAggregator
processor. That was problematic: we "released" the associated memory
from the memory accounting, yet we were still holding the references.
With this commit we reduce the memory footprint and get a lot closer
to what our memory accounting thinks we're using.

Fixes: #47205.

Release note: None
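
As a rough sketch of the idea in the commit above (hypothetical, self-contained Go; buckets and aggregateFunc here are stand-ins, not the actual aggregator.go code): once a bucket's result has been emitted, its map entry is deleted so the Go GC can reclaim the aggregation state right away instead of only at flow shutdown.

    package main

    import "fmt"

    // aggregateFunc is a stand-in for the per-bucket aggregation state that the
    // hash aggregator keeps around until its results are emitted.
    type aggregateFunc struct{ sum int }

    func main() {
        // buckets plays the role of ag.buckets: grouping key -> aggregation state.
        buckets := map[string]*aggregateFunc{
            "k1": {sum: 1},
            "k2": {sum: 2},
        }

        for _, key := range []string{"k1", "k2"} {
            fmt.Println(key, buckets[key].sum) // emit the result row for this bucket
            // Release the bucket eagerly: without this, the map keeps every
            // aggregateFunc alive until the whole flow shuts down, even though
            // the memory accounting has already been released.
            delete(buckets, key)
        }
    }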

roachtest: bump number of warehouses in alterpk-tpcc

Release note: None

@cockroach-teamcity (Member)

This change is Reviewable

@asubiotto (Contributor) left a comment:

Nice find!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/rowexec/aggregator.go, line 665 at r1 (raw file):

	// This will allow us to return the memory to the system before the hash
	// aggregator is fully done (which matters when we have many buckets).
	state, row, meta := ag.getAggResults(ag.buckets[bucket])

This implies that we're only accounting for the size of the keys in ag.buckets. Should we also be accounting for the size of values? It's unclear to me whether you're worried about being so close to the edge in the previous 19.2 point release. Could a crash have occurred there with the previous struct definition?

@yuzefovich (Member, Author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/rowexec/aggregator.go, line 665 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…


Why do you say that it implies accounting only for keys? The keys here are strings, they are accounted for via stringarena, and the values are aggregateFuncs, which are accounted for when they are created and when we add things to them.

Yes, I'm worried in general that we do not seem to respect --max-sql-memory (which is 3.7 GiB in the tests I ran) very closely, but that just highlights the difficulty of memory accounting and that a simple bug like the one this PR is fixing (releasing memory from the account but not actually freeing it up) can have bad consequences, especially when disk spilling is missing.
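
A minimal sketch of that accounting, assuming a simplified account type in place of bucketsAcc and stringarena (hypothetical names, not the real mon/stringarena APIs): both the key bytes and the size of each new aggregateFunc are registered with the account as buckets are created.

    package main

    import (
        "fmt"
        "unsafe"
    )

    // account is a hypothetical stand-in for a memory account such as bucketsAcc.
    type account struct{ used int64 }

    func (a *account) Grow(n int64) { a.used += n }

    // aggregateFunc is a stand-in for the per-bucket aggregation state.
    type aggregateFunc struct{ sum int64 }

    func main() {
        acc := &account{}
        buckets := map[string]*aggregateFunc{}

        addRow := func(key string, v int64) {
            if _, ok := buckets[key]; !ok {
                // Account for the key bytes (in the real code the key is copied
                // into a stringarena backed by the account) and for the new
                // aggregateFunc itself.
                acc.Grow(int64(len(key)) + int64(unsafe.Sizeof(aggregateFunc{})))
                buckets[key] = &aggregateFunc{}
            }
            buckets[key].sum += v
        }

        addRow("k1", 10)
        addRow("k1", 5)
        addRow("k2", 7)
        fmt.Println("accounted bytes:", acc.used)
    }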

With this PR, though, the memory usage is a lot better. In a run on the 19.2 branch plus this PR, two nodes never exceed 3.7 GiB of Go alloc, and the logs from the third are:

I200414 15:42:58.122232 47 server/status/runtime.go:498  [n3] runtime stats: 4.2 GiB RSS, 251 goroutines, 1.9 GiB/117 MiB/2.1 GiB GO alloc/idle/total, 2.0 GiB/2.5 GiB CGO alloc/total, 856.0 CGO/sec, 101.7/3.6 %(u/s)time, 0.0 %gc (0x), 513 KiB/281 KiB (r/w)net
I200414 15:43:08.122052 47 server/status/runtime.go:498  [n3] runtime stats: 4.6 GiB RSS, 251 goroutines, 1.9 GiB/120 MiB/2.1 GiB GO alloc/idle/total, 2.2 GiB/2.6 GiB CGO alloc/total, 770.8 CGO/sec, 171.9/6.6 %(u/s)time, 0.0 %gc (1x), 801 KiB/302 KiB (r/w)net
I200414 15:43:18.122144 47 server/status/runtime.go:498  [n3] runtime stats: 5.6 GiB RSS, 251 goroutines, 2.7 GiB/55 MiB/2.9 GiB GO alloc/idle/total, 2.4 GiB/2.9 GiB CGO alloc/total, 20905.7 CGO/sec, 115.9/9.4 %(u/s)time, 0.0 %gc (0x), 3.2 MiB/528 KiB (r/w)net
I200414 15:43:28.122136 47 server/status/runtime.go:498  [n3] runtime stats: 6.3 GiB RSS, 252 goroutines, 2.4 GiB/762 MiB/3.5 GiB GO alloc/idle/total, 2.5 GiB/3.0 GiB CGO alloc/total, 712.9 CGO/sec, 245.3/15.0 %(u/s)time, 0.0 %gc (1x), 8.7 MiB/799 KiB (r/w)net
I200414 15:43:38.122474 47 server/status/runtime.go:498  [n3] runtime stats: 6.8 GiB RSS, 250 goroutines, 3.5 GiB/28 MiB/3.7 GiB GO alloc/idle/total, 2.7 GiB/3.3 GiB CGO alloc/total, 780.7 CGO/sec, 138.8/14.4 %(u/s)time, 0.0 %gc (0x), 8.4 MiB/1006 KiB (r/w)net
I200414 15:43:49.140758 47 server/status/runtime.go:498  [n3] runtime stats: 7.8 GiB RSS, 251 goroutines, 3.5 GiB/28 MiB/3.7 GiB GO alloc/idle/total(stale), 2.8 GiB/3.4 GiB CGO alloc/total, 1090.9 CGO/sec, 278.7/17.9 %(u/s)time, 0.0 %gc (0x), 12 MiB/1022 KiB (r/w)net
I200414 15:43:58.134800 47 server/status/runtime.go:498  [n3] runtime stats: 8.2 GiB RSS, 247 goroutines, 4.6 GiB/614 MiB/5.0 GiB GO alloc/idle/total, 3.0 GiB/3.6 GiB CGO alloc/total, 806.8 CGO/sec, 171.7/9.5 %(u/s)time, 0.0 %gc (1x), 5.6 MiB/728 KiB (r/w)net
I200414 15:44:08.134696 47 server/status/runtime.go:498  [n3] runtime stats: 9.3 GiB RSS, 247 goroutines, 5.6 GiB/135 MiB/5.9 GiB GO alloc/idle/total, 3.0 GiB/3.6 GiB CGO alloc/total, 96.3 CGO/sec, 170.7/38.7 %(u/s)time, 0.0 %gc (0x), 1.6 MiB/11 MiB (r/w)net
I200414 15:44:19.135360 47 server/status/runtime.go:498  [n3] runtime stats: 10 GiB RSS, 248 goroutines, 5.6 GiB/135 MiB/5.9 GiB GO alloc/idle/total(stale), 3.0 GiB/3.6 GiB CGO alloc/total, 990.2 CGO/sec, 298.4/31.7 %(u/s)time, 0.0 %gc (0x), 1.7 MiB/14 MiB (r/w)net
I200414 15:44:28.134794 47 server/status/runtime.go:498  [n3] runtime stats: 10 GiB RSS, 248 goroutines, 4.6 GiB/1.5 GiB/7.0 GiB GO alloc/idle/total, 3.1 GiB/3.7 GiB CGO alloc/total, 20999.8 CGO/sec, 283.8/33.1 %(u/s)time, 0.0 %gc (1x), 1.6 MiB/14 MiB (r/w)net
I200414 15:44:38.134963 47 server/status/runtime.go:498  [n3] runtime stats: 10 GiB RSS, 238 goroutines, 6.0 GiB/151 MiB/7.0 GiB GO alloc/idle/total, 3.2 GiB/3.8 GiB CGO alloc/total, 15133.9 CGO/sec, 245.4/28.4 %(u/s)time, 0.0 %gc (0x), 13 MiB/19 MiB (r/w)net
I200414 15:44:48.137490 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 238 goroutines, 7.5 GiB/39 MiB/8.3 GiB GO alloc/idle/total, 3.2 GiB/3.9 GiB CGO alloc/total, 1003.9 CGO/sec, 240.8/36.7 %(u/s)time, 0.0 %gc (0x), 17 MiB/17 MiB (r/w)net
I200414 15:44:58.137485 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 226 goroutines, 2.4 GiB/4.2 GiB/8.6 GiB GO alloc/idle/total, 3.2 GiB/3.9 GiB CGO alloc/total, 96631.7 CGO/sec, 236.4/22.6 %(u/s)time, 0.0 %gc (1x), 8.9 MiB/7.0 MiB (r/w)net
I200414 15:45:08.137710 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 223 goroutines, 3.4 GiB/3.2 GiB/8.6 GiB GO alloc/idle/total, 3.1 GiB/3.9 GiB CGO alloc/total, 242809.7 CGO/sec, 136.6/12.2 %(u/s)time, 0.0 %gc (0x), 2.9 MiB/2.7 MiB (r/w)net
I200414 15:45:18.137807 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 223 goroutines, 2.3 GiB/5.9 GiB/8.6 GiB GO alloc/idle/total, 3.2 GiB/3.9 GiB CGO alloc/total, 223054.6 CGO/sec, 170.5/28.3 %(u/s)time, 0.0 %gc (1x), 4.7 MiB/27 MiB (r/w)net
I200414 15:45:28.137976 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 224 goroutines, 1.9 GiB/6.2 GiB/8.6 GiB GO alloc/idle/total, 3.2 GiB/3.9 GiB CGO alloc/total, 239906.7 CGO/sec, 163.6/27.7 %(u/s)time, 0.0 %gc (1x), 5.1 MiB/32 MiB (r/w)net
I200414 15:45:38.138027 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 224 goroutines, 2.8 GiB/5.4 GiB/8.6 GiB GO alloc/idle/total, 3.2 GiB/3.9 GiB CGO alloc/total, 169585.8 CGO/sec, 146.0/29.4 %(u/s)time, 0.0 %gc (0x), 4.3 MiB/34 MiB (r/w)net
I200414 15:45:48.138093 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 224 goroutines, 1.9 GiB/6.3 GiB/8.6 GiB GO alloc/idle/total, 3.3 GiB/4.0 GiB CGO alloc/total, 204666.9 CGO/sec, 162.9/24.3 %(u/s)time, 0.0 %gc (1x), 2.9 MiB/20 MiB (r/w)net
I200414 15:45:58.138029 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 223 goroutines, 2.5 GiB/5.6 GiB/8.6 GiB GO alloc/idle/total, 3.4 GiB/4.0 GiB CGO alloc/total, 219946.8 CGO/sec, 141.7/27.1 %(u/s)time, 0.0 %gc (0x), 2.3 MiB/28 MiB (r/w)net
I200414 15:46:08.138453 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 223 goroutines, 1.6 GiB/6.6 GiB/8.6 GiB GO alloc/idle/total, 3.4 GiB/4.1 GiB CGO alloc/total, 81249.6 CGO/sec, 149.7/20.9 %(u/s)time, 0.0 %gc (1x), 2.6 MiB/30 MiB (r/w)net
I200414 15:46:18.138610 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 223 goroutines, 2.3 GiB/5.8 GiB/8.6 GiB GO alloc/idle/total, 3.5 GiB/4.1 GiB CGO alloc/total, 158729.9 CGO/sec, 133.6/31.5 %(u/s)time, 0.0 %gc (0x), 3.5 MiB/27 MiB (r/w)net
I200414 15:46:28.825796 47 server/status/runtime.go:498  [n3] runtime stats: 12 GiB RSS, 224 goroutines, 2.9 GiB/5.3 GiB/8.6 GiB GO alloc/idle/total, 3.5 GiB/4.1 GiB CGO alloc/total, 99344.2 CGO/sec, 148.1/23.7 %(u/s)time, 0.0 %gc (1x), 2.9 MiB/38 MiB (r/w)net
I200414 15:46:38.141772 47 server/status/runtime.go:498  [n3] runtime stats: 13 GiB RSS, 224 goroutines, 2.1 GiB/6.1 GiB/8.6 GiB GO alloc/idle/total, 3.5 GiB/4.2 GiB CGO alloc/total, 136691.1 CGO/sec, 177.0/28.2 %(u/s)time, 0.0 %gc (0x), 3.5 MiB/29 MiB (r/w)net
I200414 15:46:48.141377 47 server/status/runtime.go:498  [n3] runtime stats: 13 GiB RSS, 224 goroutines, 1.8 GiB/6.3 GiB/8.6 GiB GO alloc/idle/total, 3.5 GiB/4.2 GiB CGO alloc/total, 243149.8 CGO/sec, 162.0/29.3 %(u/s)time, 0.0 %gc (1x), 5.3 MiB/35 MiB (r/w)net
I200414 15:46:58.740523 47 server/status/runtime.go:498  [n3] runtime stats: 13 GiB RSS, 223 goroutines, 2.9 GiB/5.3 GiB/8.6 GiB GO alloc/idle/total, 3.6 GiB/4.2 GiB CGO alloc/total, 240568.2 CGO/sec, 159.8/26.8 %(u/s)time, 0.0 %gc (1x), 5.7 MiB/574 KiB (r/w)net
I200414 15:47:08.155447 47 server/status/runtime.go:498  [n3] runtime stats: 13 GiB RSS, 223 goroutines, 2.4 GiB/5.7 GiB/8.6 GiB GO alloc/idle/total, 3.6 GiB/4.3 GiB CGO alloc/total, 370347.3 CGO/sec, 157.0/31.7 %(u/s)time, 0.0 %gc (0x), 3.3 MiB/465 KiB (r/w)net
I200414 15:47:18.155478 47 server/status/runtime.go:498  [n3] runtime stats: 13 GiB RSS, 221 goroutines, 1.9 GiB/6.3 GiB/8.6 GiB GO alloc/idle/total, 3.6 GiB/4.3 GiB CGO alloc/total, 197441.0 CGO/sec, 162.6/30.7 %(u/s)time, 0.0 %gc (1x), 2.7 MiB/428 KiB (r/w)net
I200414 15:47:28.155154 47 server/status/runtime.go:498  [n3] runtime stats: 13 GiB RSS, 214 goroutines, 2.0 GiB/6.1 GiB/8.6 GiB GO alloc/idle/total, 3.6 GiB/4.3 GiB CGO alloc/total, 22573.7 CGO/sec, 32.4/4.7 %(u/s)time, 0.0 %gc (0x), 568 KiB/338 KiB (r/w)net

True, there is a spike up to 7.5 GiB, which is 2x our --max-sql-memory, but at that point we're doing a hash join with one input being the hash aggregator (so it cannot release its memory yet) and the other being the table reader (so the allocations are below the SQL layer). Nonetheless, it is a dramatic improvement over what we had before.

I believe it was quite a lucky coincidence that we were not hitting the crash on 19.2.5, and that we started hitting it with the addition of those structs in the aggregate builtins.

@asubiotto (Contributor) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/rowexec/aggregator.go, line 665 at r1 (raw file):

Previously, yuzefovich wrote…


I mean that if we were properly accounting for the struct size of the aggregateFuncs, shouldn't we have returned a memory error before crashing? We're also keeping all that accounted memory in the account until we close, right? So it seems that what we're doing here is relieving actual memory pressure but the account stays the same, which I think might be problematic because we might be hiding misaccounting somewhere else.

@yuzefovich (Member, Author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/rowexec/aggregator.go, line 665 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…


Here is my understanding of the current master:

  • we create a bunch of aggregateFuncs and store the references to them in ag.buckets
  • we account for these allocations accordingly with bucketsAcc
  • when the hash aggregator is done, we close bucketsAcc, which releases the budget to the SQL root memory monitor
  • however, ag.buckets is still pointing to a map, and that map still points to a bunch of aggregateFuncs. The memory will be GC'ed only when there are no more references to it, which I think happens only when the flow that contains the hash aggregator is shut down (because we have the following chain: hashJoiner -> hashAggregator -> ag.buckets -> bunch of aggregateFuncs).

So IIUC the problem is that we're telling the memory monitoring system that we have released the memory when, in fact, we didn't actually release it when closing the hash aggregator.
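
A self-contained sketch of that mismatch, using hypothetical stand-ins (memAccount for bucketsAcc, hashAggregatorLike for the processor; this is an illustration, not the real code): the account is closed, so the monitor believes the memory was returned, but the map still references every bucket, so the heap stays large until the references are dropped.

    package main

    import (
        "fmt"
        "runtime"
    )

    // memAccount is a hypothetical stand-in for bucketsAcc; Close tells the
    // monitor that the accounted memory has been returned.
    type memAccount struct{ used int64 }

    func (a *memAccount) Close() { a.used = 0 }

    // aggregateFunc is a stand-in for per-bucket aggregation state (~1 MiB each
    // so the effect is visible in the heap numbers).
    type aggregateFunc struct{ payload [1 << 20]byte }

    type hashAggregatorLike struct {
        buckets map[string]*aggregateFunc
        acc     memAccount
    }

    func heapAlloc() uint64 {
        var ms runtime.MemStats
        runtime.GC()
        runtime.ReadMemStats(&ms)
        return ms.HeapAlloc
    }

    func main() {
        ag := &hashAggregatorLike{buckets: map[string]*aggregateFunc{}}
        for i := 0; i < 100; i++ {
            ag.buckets[fmt.Sprint("k", i)] = &aggregateFunc{}
            ag.acc.used += 1 << 20
        }

        // The situation described above: the account is closed (the monitor
        // thinks ~100 MiB were freed), but ag.buckets still references every
        // bucket, so nothing can actually be collected while ag stays reachable
        // (e.g. from a downstream hashJoiner).
        ag.acc.Close()
        fmt.Printf("accounted: %d B, live heap: ~%d B\n", ag.acc.used, heapAlloc())

        // Dropping the references (eagerly per bucket, or at least in close)
        // lets actual usage track the accounting again.
        ag.buckets = nil
        fmt.Printf("after dropping references, live heap: ~%d B\n", heapAlloc())
    }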

@yuzefovich (Member, Author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/rowexec/aggregator.go, line 665 at r1 (raw file):

Previously, yuzefovich wrote…


I guess the comment I put in the code might be confusing - the "philosophically correct" fix is setting ag.buckets to nil in close because that's when we're closing the memory accounts.

However, I think it is very beneficial to delete the entries from the map as early as possible to reduce the memory usage. We'll still be accounting for all buckets until close is called, but most of the memory will actually have been GC'ed by that point, which should reduce the probability of a crash due to misaccounting in other places.

@yuzefovich (Member, Author) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/rowexec/aggregator.go, line 665 at r1 (raw file):

Previously, yuzefovich wrote…


Updated the comment to provide more context per our offline discussion.

@asubiotto (Contributor) left a comment:

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

@yuzefovich (Member, Author)

TFTR!

bors r+


craig bot commented Apr 15, 2020

Build succeeded


Successfully merging this pull request may close these issues.

sql: investigate out of memory issues on TPCC check 3.3.2.6