
"Too many open files" error when performing a sync using the docker image #3221

Closed
habdelra opened this issue Jul 12, 2021 · 6 comments

habdelra commented Jul 12, 2021

Describe the bug
I am performing a full archive sync of the POA Sokol network from the docker image located at https://hub.docker.com/r/nethermind/nethermind. After about 17.3 million blocks I encounter this error:

2021-07-12 12:55:39.0614|BlockchainProcessor encountered an exception. RocksDbSharp.RocksDbException: IO error: While open a file for random read: /nethermind/data/nethermind_db/sokol_archive/state/085434.sst: Too many open files
   at RocksDbSharp.Native.rocksdb_write(IntPtr db, IntPtr options, IntPtr batch)
   at Nethermind.Db.Rocks.DbOnTheRocks.RocksDbBatch.Dispose()
   at Nethermind.Trie.Pruning.TrieStore.FinishBlockCommit(TrieType trieType, Int64 blockNumber, TrieNode root)
   at Nethermind.Trie.PatriciaTree.Commit(Int64 blockNumber)
   at Nethermind.State.StorageProvider.CommitTrees(Int64 blockNumber)
   at Nethermind.Blockchain.Processing.BlockProcessor.PreCommitBlock(Keccak newBranchStateRoot, Int64 blockNumber)
   at Nethermind.Blockchain.Processing.BlockProcessor.Process(Keccak newBranchStateRoot, List`1 suggestedBlocks, ProcessingOptions options, IBlockTracer blockTracer)
   at Nethermind.Blockchain.Processing.BlockchainProcessor.ProcessBranch(ProcessingBranch processingBranch, ProcessingOptions options, IBlockTracer tracer)
   at Nethermind.Blockchain.Processing.BlockchainProcessor.Process(Block suggestedBlock, ProcessingOptions options, IBlockTracer tracer)
   at Nethermind.Blockchain.Processing.BlockchainProcessor.RunProcessingLoop()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__277_0(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
2021-07-12 12:55:39.1816|Failed to store data in /nethermind/data/nethermind_db/sokol_archive/peers/SimpleFileDb.db System.IO.IOException: Too many open files
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.IO.StreamWriter.ValidateArgsAndOpenPath(String path, Boolean append, Encoding encoding, Int32 bufferSize)
   at Nethermind.Network.SimpleFilePublicKeyDb.CommitBatch() in /src/Nethermind/Nethermind.Network/SimpleFilePublicKeyDb.cs:line 118

After that error is encountered, no more blocks are processed and the syncing process is essentially frozen.
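A quick way to confirm that the descriptor limit is the culprit is to read the effective limit for the Nethermind process inside the container (a sketch; <container> is a placeholder for your container name, and Nethermind is assumed to run as PID 1 in the official image):

docker exec <container> sh -c 'grep "open files" /proc/1/limits'

The reported soft limit (often the default of 1024) is the ceiling the process hits when RocksDB tries to open another .sst file.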

To Reproduce
Execute the following:

docker pull nethermind/nethermind:latest
docker run -d \
  -v /data/ethereum/:/nethermind/data \
  -p 8545:8545 -p 30300-30400:30300-30400 -p 30300-30400:30300-30400/udp \
  nethermind/nethermind \
    --datadir data \
    --config sokol_archive \
    --HealthChecks.Enabled true \
    --HealthChecks.UIEnabled true \
    --JsonRpc.Enabled true \
    --JsonRpc.Host 0.0.0.0

Wait for a large number of blocks (in my case about 17.3 million); the stack trace above then appears.

Expected behavior
Ideally, we should never get this error. In the documentation here: https://docs.nethermind.io/nethermind/first-steps-with-nethermind/manage-nethermind-with-systemd, the maximum number of open files is explicitly set. Should something like this be specified in the Dockerfile for Nethermind as well?
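For reference, the linked systemd guide raises the limit with the LimitNOFILE= directive in the service unit, along these lines (the value below is only illustrative; see the docs for the recommended number):

[Service]
LimitNOFILE=1000000

A Dockerfile has no equivalent instruction: ulimits for a container are applied at container start (docker run --ulimit, or the daemon's default-ulimits setting), not at image build time.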

Platform:

  • OS: AWS Amazon Linux (CentOS) EC2 instance w/ EBS volume mounted
@habdelra (Author)

In my situation I'm mounting my host disk as a volume in the Docker container. Maybe this is as simple as updating the documentation for running in Docker to note that the maximum number of open files should be raised to the OS hard limit when you mount a host volume into the Docker container running Nethermind.
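If the goal is to match the OS hard limit, the relevant ceilings can be read on the host before starting the container (a sketch; the per-user hard limit and the system-wide maximum are separate values):

ulimit -Hn                    # hard "open files" limit for the current user/session
cat /proc/sys/fs/file-max     # system-wide maximum number of open file handles

The value from ulimit -Hn is what could then be passed to docker run via --ulimit nofile=<soft>:<hard>, as discussed in the comments below.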

@LukaszRozmej (Member)

@matilote can we raise the default limit in our Dockerfile?

@matilote (Member)

We cannot do that explicitly in the Dockerfile. I will update the documentation to include --ulimit nofile=1000000:1000000 in the docker and docker-compose instructions.
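Applied to the docker run command from the issue description, only the extra flag changes (values taken from the comment above):

docker run -d \
  --ulimit nofile=1000000:1000000 \
  -v /data/ethereum/:/nethermind/data \
  -p 8545:8545 -p 30300-30400:30300-30400 -p 30300-30400:30300-30400/udp \
  nethermind/nethermind \
    --datadir data \
    --config sokol_archive \
    --HealthChecks.Enabled true \
    --HealthChecks.UIEnabled true \
    --JsonRpc.Enabled true \
    --JsonRpc.Host 0.0.0.0

For docker-compose, a sketch of the equivalent is the ulimits key on the service (service name and remaining settings are placeholders):

services:
  nethermind:
    image: nethermind/nethermind
    ulimits:
      nofile:
        soft: 1000000
        hard: 1000000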

@matilote (Member)

Added instructions on our docs.

@attila-lendvai

It would be nice to have a more accurate estimate than 1000000.

My node synced up and was running fine for several days with the default of 1024.
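One way to arrive at a less arbitrary number is to watch how many descriptors the node actually holds while it runs and size the limit with some headroom above the observed peak (a sketch; <container> is a placeholder and Nethermind is assumed to run as PID 1 in the container):

while true; do
  docker exec <container> sh -c 'ls /proc/1/fd | wc -l'
  sleep 60
done

The count tends to grow with the size of the RocksDB databases, since each .sst file RocksDB keeps open consumes a descriptor, which would explain why a node can run fine at the default 1024 for a long time and only hit the limit later.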

@attila-lendvai

> my node synced up and was running fine for several days with the default 1024.

Up until now, that is: it has started to fail in strange ways (#6195).

65536 is the new magic number I'm running with, but I can't say anything about stability yet.
