Memory requirements and Server Out of memory exception - help #903
Comments
Only opened this issue since other one is closed - #826 |
Hello @pklanka . This will largely depend on your environment. Depending on the size of the environment and the technologies you are watching, it is recommended that you spin off separate Security Monkey worker instances that are dedicated to select accounts and technologies. Do you have an idea on which technologies being scanned are killing SM? Knowing the specific technology in question will also help us to better address the issue. |
I increased the VM size to C2xlarge. We have 20 accounts. It is difficult to say which one is killing SM: none of them returns an error (except a few access denied errors, which are expected) when I follow the previous mail thread. Is there any other way I can debug? |
This one is consuming all the memory / CPU:
nginx 9340 10.3 52.7 12015944 8355808 ? Sl Jan09 113:46 /usr/local/src/security_monkey/venv/bin/python /usr/local/src/security_monkey/venv/bin/monkey start_scheduler |
This may be what @zpritcha was discussing yesterday on Gitter. We'll need to investigate whether the scheduler has a memory leak. |
This is the first time I've seen such a recommendation. Can you provide any more specifics? Is there a threshold of number of accounts that you have in mind? |
@markofu It's a loose recommendation that I provide when users are experiencing massive scalability issues. Generally if the watchers are taking a very long time to describe all the resources, then it makes sense to break it up. This will be resolved in the future when we work on more event-driven watchers. |
🤞 Fixed in #904 ?? |
@pklanka Please fetch the latest develop branch down, re-install dependencies, and test. The latest version should address the issues with a newer scheduler library. |
Absolutely. Will give it a spin and test it over the weekend. Many thanks for the quick fix. |
I installed the latest and it ran the hourly scheduler OK 3 times, but then it just failed with a Throttling error.
|
@rayj-pgi That appears to be more of an AWS rate-limiting error. We'll need to see why cloudaux isn't handling that properly. EDIT: I see what's happening. Looks like the managed policy watcher isn't using CloudAux. CloudAux would properly retry with backoff. |
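For readers unfamiliar with the "retry with backoff" behaviour mentioned above, here is a minimal sketch of the general technique. It is not CloudAux's actual implementation: `ThrottlingError` and `call_with_backoff` are hypothetical names made up for illustration, and the delay values are assumptions.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for an AWS rate-limiting error."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ThrottlingError with exponential backoff.

    The wait before retry N is a random value between 0 and
    min(max_delay, base_delay * 2**N) ("full jitter"), so that many
    workers throttled at the same moment do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottlingError:
            # Give up and re-raise once the attempt budget is spent.
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

A watcher that calls AWS directly, without a wrapper of this kind, surfaces the first throttling response as a failure, which would be consistent with the error reported above.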
Thanks @mikegrima! That error seems to have occurred only once this weekend, so it is probably not a big deal. Otherwise, with this release I'm seeing the scheduler container plateau at about 3G of memory, compared to about 3.6G previously. I have 3 medium-sized AWS accounts in my SM at the moment. |
Hi All - still running into this issue even after redeploying with the code changes.
PID VDATA VSTACK VSIZE RSIZE PSIZE VGROW RGROW SWAPSZ MEM CMD 1/19
I'm not a Python expert but willing to do some troubleshooting if you can provide any guidance. Thanks, |
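One generic way to narrow down where a Python process is growing is to compare two `tracemalloc` snapshots. The sketch below is a standalone illustration, not something taken from Security Monkey: `top_growth` is a hypothetical helper, the list of byte strings only simulates a leak, and `tracemalloc` requires Python 3.4 or later, which may not match the interpreter your deployment runs on.

```python
import tracemalloc


def top_growth(before, after, limit=10):
    """Return the source lines whose allocations changed most between snapshots."""
    # compare_to() sorts by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Stand-in for the suspect workload (for example, one scheduler run).
    leaked = [bytes(1024) for _ in range(10000)]

    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after, limit=5):
        print(stat)
```

Taking the first snapshot after startup and the second after a few scheduler runs should show which lines keep accumulating memory, assuming the growth comes from Python-level allocations.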
Hi @mikegrima - Thanks for the update - I'll keep an eye on #909. |
Please see #910 |
This (#911) looks great Mike. I'm seeing that the scheduler is able to utilize multiple cores now and my memory usage, which was normally 3G+, is going up to 1.5G while running and back down to 500M when idle. 🥦 💯 |
Fixed in #911 |
Description of issue:
There is a memory leak in the monkey process: memory usage keeps growing until our Security Monkey server stops responding. Here is what atop looks like -
The server has 2 vCPUs and 8 GB of RAM. Is that sufficient for Security Monkey operations? Could someone please guide / help?