11 Accounts Show no data. #916
Comments
I've seen this happen, but I found it hard to reproduce and don't know why yet. |
I think the last exception is different from the first two. The first two I've seen a few times, but it seems to be transient. The last one is interesting (I haven't seen that one yet). |
Running locally (with security groups), I'm seeing exceptions with this method:
In |
So this is interesting... Looking in my testing db, I am seeing what I guess is a half-stored value. One where the So need to:
|
@fstuck37 Can you attempt to run the following SQL query:
^^ Does this return the 11 accounts you are seeing issues with? |
Looking at our databases, it appears that these were all deleted items. But for some reason, the deletion doesn't have a deletion revision ID attached... So I think the solution is, on startup of the given (celery) task, to look for these orphaned items and create a "deletion" change record for them. That should make things in the DB copasetic and hopefully resolve the problem. |
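For readers following along, here is a minimal sketch of the "find orphaned items and attach a deletion record" idea. It is not Security Monkey's actual code or schema: only the `item` table and its `latest_revision_id` column are named in this thread, while the `itemrevision` table, the `active` flag, and both function names are assumptions made for illustration. It runs against an in-memory SQLite database rather than a real deployment's database.

```python
import sqlite3

# Hypothetical, simplified schema. Only `item` and `latest_revision_id`
# are named in this thread; everything else here is an assumption.
SCHEMA = """
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    latest_revision_id INTEGER
);
CREATE TABLE itemrevision (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL,
    active INTEGER NOT NULL
);
"""


def find_orphaned_items(conn):
    """Return the ids of items that have no latest revision attached."""
    rows = conn.execute(
        "SELECT id FROM item WHERE latest_revision_id IS NULL ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def add_deletion_records(conn):
    """Give every orphaned item an inactive ("deleted") revision."""
    fixed = []
    for item_id in find_orphaned_items(conn):
        cur = conn.execute(
            "INSERT INTO itemrevision (item_id, active) VALUES (?, 0)",
            (item_id,),
        )
        # Point the item at the revision that was just created.
        conn.execute(
            "UPDATE item SET latest_revision_id = ? WHERE id = ?",
            (cur.lastrowid, item_id),
        )
        fixed.append(item_id)
    conn.commit()
    return fixed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO item VALUES (1, 'healthy-role', 1)")
    conn.execute("INSERT INTO item VALUES (2, 'orphaned-role', NULL)")
    print("orphaned before:", find_orphaned_items(conn))
    print("repaired:", add_deletion_records(conn))
    print("orphaned after:", find_orphaned_items(conn))
```

Running the read-only `find_orphaned_items` step first is the safer order on a real database, since it shows what would be touched before anything is written.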
#922 is a possible fix. |
@fstuck37 Can you do me another favor? Can you run the following query:
I see that |
@mikegrima First off, thank you for the quick response to this request. I ran the first query and it only returned one account, which was not one of the 11. I did notice that the 11 that aren't populating are the last 11 accounts that I added. The first 25 accounts, which are working, all have IDs between 4 and 72. I'm going to try deleting the account that has ID 76 and see what happens. Thanks again for all the help. |
OK no luck with deleting the first account nor all 11 and then adding one back. Hope this helps. Thanks again, |
Feel free to reach out to me on gitter, and we'll try to debug. I would like to make sure these issues are completely resolved before I make the next major release. |
@mikegrima If you have anything specific you'd like me to run I'll find some time to get it to you. Really appreciate the help and apologize for not being able to work on this sooner. |
@mikegrima Thanks, |
Can you do me a favor and verify that those accounts are |
@mikegrima |
Is it possible that the Terraform template didn't place the proper trust relationships? Can you verify that the permissions for those 11 new accounts are correct? |
@mikegrima Will have to look to see if there is anything missing in my build script with the latest changes. Thanks for the help, |
@mikegrima I also tried disabling the account in the error message and running find_changes again and the same error comes up with a different account. Wondering if I need to redeploy the DB?
|
Can you update to the latest version in the I didn't make any DB schema changes, but can't hurt to run a |
@mikegrima Does this get the branch you want me to use? Also, I run the monkey db upgrade whenever I redeploy the instance. Thanks, |
The latest version is supposed to fix the orphaned items (the first 2 exceptions). Not sure why you are still getting them. Do you only see this for IAM roles? |
@mikegrima
|
@mikegrima
Thanks, |
@mikegrima File "/usr/local/src/security_monkey/security_monkey/datastore.py", line 317
|
Hello @fstuck37 - We need to understand why it cannot find the Can you please do me a favor and run the following SQL query:
The problem is that you still have some orphaned items in your DB. The new watcher code is supposed to find the orphaned items, and place a deletion record there. So, we need to get to the bottom of why those items are not being corrected. |
@mikegrima
I've started a manual monkey find_changes and so far have not received the same errors, but it's still running. Here are the errors I've observed thus far:
I think it may be working now but the 1 account still doesn't have any objects in it so will need to see if it just needs time to get through the other accounts first. Hope this helps you find the issue. Thanks, |
I'm very curious why the latest isn't automatically correcting that orphaned item, since it's the first thing it does when If you enable debug logging, do you see a log entry about orphaned items not being found? |
@mikegrima Also the system still doesn't pull new data even though we fixed the issue with the null record. Thanks, |
To turn on debug logging (I really need to add a section to the quickstart for this), you will need to modify your |
@mikegrima I checked the DB and found that all of them are missing.
|
@mikegrima |
I think your DB is messed up. I hate to say this, but it might be worth re-creating. |
@mikegrima Do you think I'm missing something in the setup or is there an issue with the code? Thanks, |
I have not been able to replicate your issue and few other users so far have reported this. |
@fstuck37 Did you try a manual |
@mstair I basically copy the original files to /etc/supervisord.d/ cp /usr/local/src/security_monkey/supervisor/security_monkey_ui.conf /etc/supervisord.d/security_monkey_ui.ini In both security_monkey_scheduler.ini & security_monkey_workers.ini I changed numprocs=1 to numprocs=10. Not sure if this is appropriate but figured it was a good starting point, Thanks again for all the help, |
Could that be the issue? Worker concurrency should only be set in the Also, there should only be exactly one scheduler running at any time. |
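As a quick sanity check for the "exactly one scheduler" point, something like the sketch below could flag supervisor programs configured with `numprocs` greater than 1. The sample config text is made up for illustration (only the program names and the `numprocs=10` change come from this thread), and `programs_with_multiple_procs` is a hypothetical helper, not part of supervisor or Security Monkey.

```python
import configparser

# Illustrative config only; the command lines are placeholders.
SAMPLE = """
[program:securitymonkeyscheduler]
command=/bin/true
numprocs=10
process_name=%(program_name)s_%(process_num)02d

[program:securitymonkeyworkers]
command=/bin/true
numprocs=1
"""


def programs_with_multiple_procs(ini_text):
    """Map program name -> numprocs for every program with numprocs > 1."""
    # Interpolation is disabled so supervisor's %(...)s expressions
    # are read as plain text instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    result = {}
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        # supervisor treats a missing numprocs as 1.
        numprocs = parser.getint(section, "numprocs", fallback=1)
        if numprocs > 1:
            result[section.split(":", 1)[1]] = numprocs
    return result


if __name__ == "__main__":
    print(programs_with_multiple_procs(SAMPLE))
```

To check a real deployment, the text of each file under /etc/supervisord.d/ could be passed in instead of `SAMPLE`.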
@mikegrima The find_changes did fail with an error: Going to see if the changes to the processes fixes both. I waited a little while to see if the accounts populate automatically but didn't see anything so I started the find changes again. Thanks, |
Whoa, guess with that many accounts you may hit default file descriptor limits with the single manual find_changes. Accounts and associated roles/policies must be good, at least for some, if you are seeing some results from find_changes. Check /var/log/supervisor/* logs for permission errors (has gotten me more than once). Sorry if I missed above, is this all on a single instance? |
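If file descriptor limits are the suspect, a rough way to see the limits a Python process runs under is sketched below. It only inspects the process that runs it, not an already running find_changes, so it would need to be run as the same user and under the same environment; `fd_usage` is a hypothetical helper name, and the open-descriptor count relies on the Linux-specific `/proc/self/fd` directory.

```python
import os
import resource


def fd_usage():
    """Report this process's file descriptor limits and current usage."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Linux-specific: one entry per open descriptor.
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None  # not available on this platform
    return {"soft": soft, "hard": hard, "open": open_fds}


if __name__ == "__main__":
    print(fd_usage())
```

A soft limit that is low relative to the number of accounts being scanned in one run would be consistent with the failure described above, though the thread does not confirm that as the cause.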
@mstair
Below is what I found in /var/log/supervisor/* logs:
...Restarted Workers here...
has stayed up for > than 60 seconds (startsecs) Thanks, |
Your I think it would be a good feature for CloudAux to have that set as an environment variable. How many other things do you have calling that API? If you have other apps deployed calling that API in the given account, then you will likely hit those rate limits faster. You could also try submitting a ticket to AWS to see if the limits can be lifted. |
Hi @mikegrima @mstair Could I be missing something in the setup since the celery changes? Thanks, |
What does a |
@mstair @mikegrima sudo supervisorctl status Thanks, |
My celery tasking is being logged to /var/log/supervisor/securitymonkeyscheduler-stderr---supervisor-xxxxx.log. Looks like this: |
@mstair Where is the config for the log file you've specified? Thanks, |
Can you jump on gitter? Might be able to assist better real-time |
Just to document what fixed this for me. So the end of this looks like we had a null latest_revision_id in the item table. Thanks to both of you for all the help, |
Hi All,
Hoping someone could help me identify an issue with getting data from 11 accounts.
Here's where I'm at:
securitymonkeyscheduler RUNNING pid 28938, uptime 1:46:14
securitymonkeyui RUNNING pid 10923, uptime 19:11:23
securitymonkeyworkers RUNNING pid 29280, uptime 0:07:18
Any ideas would be very much appreciated.
Thanks,
Fred