Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eth2util/keystore: lower amount of loadStoreWorkers #2584

Merged
merged 1 commit into from
Oct 16, 2023

Conversation

eth2353
Copy link
Contributor

@eth2353 eth2353 commented Sep 13, 2023

When running create cluster with the --split-existing-keys flag, I ran into the charon service getting OOM killed. A bit of trial and error helped me figure this was related to the amount of keys to be split. A low amount (<30) would work fine.

I found out splitting 25 existing keys resulted in a memory consumption of over 6GB, 10 keys used about 2.5GB.

Curious to see what would lead to such high memory consumption, I built the binary locally and extracted a heap dump. The heap dump showed that pretty much all the memory was consumed by golang.org/x/crypto/scrypt.Key objects , 256MB per key (temporarily even 512MB is allocated if I'm looking at the data correctly).

64 workers (current value) * 256MB per key results in an upper limit of more than 16GB of memory consumed, with an even higher temporary spike.

This PR lowers the amount of workers to 10, resulting in an upper limit of 10 * 512MB = 5GB of memory consumption.

I didn't investigate why the scrypt.Key object itself needs as much memory as it seems to be using, I'm not that well versed at Go. But it does seem a bit excessive. Fixing that would likely be the more proper fix.

Let me know if this deserves an issue, but it seemed trivial enough to me.

category: bug
ticket: none

@gsora
Copy link
Collaborator

gsora commented Sep 20, 2023

Hey! Sorry for having you wait this much but it has been a hectic week.

Scrypt is a very memory-intensive hashing function, so with a secure set of parameters it's expected to use a great deal of memory: we sadly can't do much about it without compromising the generated keys.

What are the specs of the machine you're running?

@codecov
Copy link

codecov bot commented Sep 20, 2023

Codecov Report

Patch coverage has no change and project coverage change: +0.12% 🎉

Comparison is base (d201cd0) 53.68% compared to head (8c0f0f4) 53.81%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2584      +/-   ##
==========================================
+ Coverage   53.68%   53.81%   +0.12%     
==========================================
  Files         200      200              
  Lines       27132    27132              
==========================================
+ Hits        14567    14602      +35     
+ Misses      10742    10707      -35     
  Partials     1823     1823              
Files Changed Coverage Δ
eth2util/keystore/keystore.go 66.94% <ø> (ø)

... and 9 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eth2353
Copy link
Contributor Author

eth2353 commented Sep 21, 2023

Hey, no worries.

I first ran the split command (attempting to split 500 keys) on my a M1 Max Macbook with 64GB of RAM via Docker, which has 8GB of RAM allocated to it. That got killed. Then I tried the same command on a Linux machine with 16GB of RAM, thinking it might be architecture-related, but the process also got killed.

Then I compiled the binary on the Macbook, ran it again giving it access to the full 64GB of RAM, still splitting 500 keys. I saw the memory spike up to about 30GB of RAM if I remember correctly. Not sure if that went fine but I believe it ended up being killed too.

The change in this PR allowed me to finally split all 500 keys with a lower amount of memory used - as mentioned in the description, it should now use up to 5GB of memory with 10 loadStoreWorkers.

@@ -32,7 +32,7 @@ const (
insecureCost = 4

// loadStoreWorkers is the amount of workers to use when loading/storing keys concurrently.
loadStoreWorkers = 64
loadStoreWorkers = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest a power of 2, so 8 or 16.

@xenowits xenowits changed the title eth2util/keystore: lower amount of loadStoreWorkers to reduce memory consumption eth2util/keystore: lower amount of loadStoreWorkers Oct 16, 2023
@xenowits xenowits added the merge when ready Indicates bulldozer bot may merge when all checks pass label Oct 16, 2023
@obol-bulldozer obol-bulldozer bot merged commit 41dd725 into ObolNetwork:main Oct 16, 2023
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge when ready Indicates bulldozer bot may merge when all checks pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants