COMET - accumulomaster out of memory issue #14
Comments
Master was killed again with an OutOfMemoryError within a day. Additional observation: on each occurrence, the following trace appears in the Accumulo logs.
Per the Accumulo documentation, the memory calculated for running the master is 5GB. Re-spawned the EC2 instance for the Accumulo master as an EC2 large instance. The stats.sh script used to collect measurements is available at /root on 18.191.186.255 (master). The collected measurements are redirected to meas.txt.
According to meas3.txt collected by stats.sh, the master had already consumed 223m of memory just after it started. It went down again on 3/11, having consumed 289m one hour before the crash. Memory consumption was quite stable over the week (277m~292m in the past 6 days). Based on this, I increased the heap size limit to 512m for a better margin and will check how it goes.
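The kind of measurement described above can be sketched as follows. The stats.sh script itself is not shown in this issue, so this is only an assumption about what such a collector might do: it reads the resident set size (VmRSS) of a process from Linux's /proc/<pid>/status and computes the headroom against a limit. The function names are hypothetical. Note that RSS covers the whole process, not just the Java heap, so comparing it with the heap limit is only a rough guide.

```python
import re
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_mb(pid):
    """Read the resident set size of a running process in MB (Linux only)."""
    text = Path(f"/proc/{pid}/status").read_text()
    rss_kb = parse_vmrss_kb(text)
    return None if rss_kb is None else rss_kb / 1024.0


def headroom_mb(used_mb, limit_mb):
    """Memory left before the configured limit is reached."""
    return limit_mb - used_mb


if __name__ == "__main__":
    # Values quoted in this thread: 289m shortly before a crash, 512m new limit.
    print(headroom_mb(289, 512))
```

Running a sampler like this from cron and appending each reading with a timestamp would give a file comparable to meas.txt.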
Master crashed twice more this week. With the recent crashes, the log always shows something like
From tcpdump, there is indeed a suspicious TCP SYN msg from
No crash observed for a week after setting up the security group policy to accept inbound traffic to the accumulomaster and gc ports only from the same security group.
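A rule of this kind could be expressed with boto3 roughly as below. This is a sketch under assumptions, not the configuration actually applied to the COMET cluster: the group ID is a placeholder, and ports 9999 (master) and 50091 (gc) are assumed Accumulo 1.x defaults that should be checked against the site configuration. The helper names are hypothetical. Adding these rules does not by itself close the ports; any broader existing ingress rules would still need to be revoked.

```python
def same_group_ingress(group_id, ports):
    """Build IpPermissions entries allowing TCP on `ports` only from members of `group_id`."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Source is the security group itself, not a CIDR range.
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
        for port in ports
    ]


def apply_rules(group_id, ports):
    """Apply the rules with boto3 (requires AWS credentials; not executed here)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=same_group_ingress(group_id, ports),
    )


if __name__ == "__main__":
    # Placeholder group ID and assumed default ports, for illustration only.
    for rule in same_group_ingress("sg-0123456789abcdef0", [9999, 50091]):
        print(rule)
```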
The issue is not a bug.
In the COMET cluster running in AWS, the node running accumulomaster also hosts the comet head node.
In the current deployment, the EC2 instance is of type small, which has 2GB of RAM.
Issue:
The accumulomaster process is killed due to an OutOfMemoryError. This happens roughly every 4-5 days.
The Accumulo process is killed by the watchdog, as can be observed in the log: /opt/accumulo-1.9.1/logs/gc_accumulomaster.out
Investigation so far:
The current deployment configures Accumulo to use 1GB of memory distributed across its various processes.
Also, no memory parameters are passed to comet at launch. By default, a Spring Boot app uses the JVM's default memory settings. This results in the comet process taking up to 1GB of memory.
With these two configurations we pretty much exhaust the EC2 instance's memory.
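The arithmetic behind this theory can be written out as a small sketch. The figures are the ones quoted above (2GB instance, 1GB for Accumulo, up to 1GB for comet); the function name is hypothetical, and the calculation ignores what the operating system itself needs, which only makes the picture worse.

```python
def memory_budget_mb(total_ram_mb, allocations_mb):
    """Sum the per-process allocations and report what is left of total RAM."""
    committed = sum(allocations_mb.values())
    return {"committed": committed, "remaining": total_ram_mb - committed}


if __name__ == "__main__":
    # Figures from this issue: EC2 small (2GB), Accumulo 1GB, comet up to 1GB.
    budget = memory_budget_mb(2048, {"accumulo": 1024, "comet": 1024})
    print(budget)
```

With these inputs nothing remains for the OS or for any growth in either process.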
To verify this theory, I have launched comet alone on the comet-head node VM. Accumulomaster is running standalone with memory options -Xmx256m -Xms256m.
I also have a script running on the accumulomaster node collecting memory usage. I will monitor the system to check whether the memory issue still occurs. Based on the observations above, I will change the deployment and configuration scripts.