How to track query progress? #11

Closed
iainelder opened this issue Sep 7, 2021 · 8 comments

Comments

@iainelder
Collaborator

iainelder commented Sep 7, 2021

First let me thank you for this tool. It's a game changer! botocove is the best tool I know for ad-hoc analysis across an organization.

Currently I'm working with two organizations that each have on the order of 500 to 1000 accounts.

Across such large organizations, botocove takes hundreds of seconds to return a result. Anecdotally, depending on network conditions, I can wait between 120 and 300 seconds to get a result.

That's still good enough for interactive use, but it would be helpful to get some kind of "loading bar"-style feedback to know how long I should expect to wait.

I've considered adding a counter to the function wrapped by botocove. I've not tried it yet, but I guess it would work. I would need to run botocove in a second thread to be able to check the counter value.
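
The counter idea could be sketched roughly as below. This is a minimal sketch using only the standard library, not botocove's API: `CountingWrapper`, `fake_query`, and `run_in_all_accounts` are hypothetical names, and `run_in_all_accounts` merely stands in for a blocking call that runs a function once per account.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CountingWrapper:
    """Wrap a per-account function and count finished calls thread-safely."""

    def __init__(self, func):
        self._func = func
        self._lock = threading.Lock()
        self.completed = 0

    def __call__(self, *args, **kwargs):
        try:
            return self._func(*args, **kwargs)
        finally:
            # Count the call whether it succeeded or raised.
            with self._lock:
                self.completed += 1


def fake_query(account_id):
    # Hypothetical stand-in for the function that would be wrapped.
    time.sleep(0.01)
    return f"result for {account_id}"


def run_in_all_accounts(func, account_ids):
    # Hypothetical stand-in for a blocking call across all accounts.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(func, account_ids))


account_ids = [f"{n:012d}" for n in range(20)]
counted = CountingWrapper(fake_query)
results = []

# Run the blocking call in a second thread so this thread can poll.
worker = threading.Thread(
    target=lambda: results.extend(run_in_all_accounts(counted, account_ids))
)
worker.start()
while worker.is_alive():
    print(f"{counted.completed}/{len(account_ids)} accounts done")
    worker.join(timeout=0.02)
print(f"{counted.completed}/{len(account_ids)} accounts done")
```

The lock matters because several worker threads may finish at the same moment; without it, increments could be lost.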

Another solution could be to make botocove return immediately and run in the background. It would return an object with a blocking call to get the result and other calls to get the number of queries in progress, the number completed, the number remaining, and so on.
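
A rough sketch of such a handle object, assuming a thread pool does the work. `BackgroundRun` and its attribute names are hypothetical and chosen only for illustration; nothing here reflects botocove's actual interface.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class BackgroundRun:
    """Start work immediately and expose progress counts plus a blocking result."""

    def __init__(self, func, account_ids, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # submit() returns at once; the work proceeds in the pool's threads.
        self._futures = {a: self._pool.submit(func, a) for a in account_ids}

    @property
    def total(self):
        return len(self._futures)

    @property
    def completed(self):
        return sum(1 for f in self._futures.values() if f.done())

    @property
    def in_progress(self):
        return sum(1 for f in self._futures.values() if f.running())

    @property
    def remaining(self):
        return self.total - self.completed

    def result(self):
        """Block until every account has finished and return all results."""
        try:
            return {a: f.result() for a, f in self._futures.items()}
        finally:
            self._pool.shutdown()


def fake_query(account_id):
    # Hypothetical stand-in for a per-account query.
    time.sleep(0.01)
    return f"result for {account_id}"


run = BackgroundRun(fake_query, [f"{n:012d}" for n in range(20)])
print(f"{run.completed} completed, {run.remaining} remaining")
output = run.result()
```

In this sketch `result()` re-raises the first exception it meets; a real implementation would probably collect failures per account instead.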

Is that something you have already considered?

@connelldave
Owner

You're welcome! Thanks for the feedback - glad it's helpful. If you're interested in contributing or sharing ideas on implementation I'm open to it. Looking back over the codebase today, I do feel the internals could do with a refactor, and I have started hacking out an idea for that on a branch, but I don't think there's any need to make user-facing changes.

I hadn't thought much about interactive UX, in all honesty: it definitely seems like a useful feature that would be easier to implement inside the library than to bolt on afterwards. I'm happy to have a go at implementing it when I can find some spare time, although I don't have access to a large organization to test after changing employer, and the highest number of accounts I was dealing with was only around 250. The main bottleneck, if I remember correctly, is the org master account rate limiting sts:AssumeRole at an AWS API level, but after that I'd expect the constraint to be how many thread workers a machine is happy with.

I wonder if moving to lazy loading and yielding where possible would add a bit more performance - I intentionally set the "assume roles" and "do work" into two disparate steps on the first pass.

@iainelder
Collaborator Author

> If you're interested in contributing or sharing ideas on implementation I'm open to it.

I'd be happy to help out. I know enough Python to be dangerous and I've published a few experimental boto clients to make it easier to query across pages and across regions.

> although I don't have access to a large organization to test after changing employer, and the highest number of accounts I was dealing with was only around 250.

I'm happy to help you test it in my personal organization to get the right ergonomics and programmability.

It should be possible to create a test AWS organization with many accounts. The initial quota for members is 4, but there's no published hard limit. CloudFormation stack sets could be used to populate them with arbitrary resources for the sake of having something to query.

The biggest problem would be closing the member accounts afterwards. There's no API for that!

> The main bottleneck, if I remember correctly, is the org master account rate limiting sts:AssumeRole at an AWS API level

I've also received throttling errors from the Organizations DescribeAccounts API. I don't have a reliable way to reproduce them. The call usually works again on the second attempt.
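
Since a second attempt usually succeeds, a retry with exponential backoff is one way to absorb occasional throttling. The sketch below uses only the standard library; `ThrottlingError`, `call_with_retry`, and `flaky_list_accounts` are hypothetical names, and a real client would raise its own exception type.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttling error raised by an AWS client."""


def call_with_retry(func, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call func(), retrying on ThrottlingError with jittered exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ThrottlingError:
            if attempt == max_attempts:
                raise
            # Wait a random time up to base_delay * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


attempts = []


def flaky_list_accounts():
    # Hypothetical call that is throttled once and then succeeds.
    attempts.append(1)
    if len(attempts) == 1:
        raise ThrottlingError("rate exceeded")
    return ["111111111111", "222222222222"]


accounts = call_with_retry(flaky_list_accounts, sleep=lambda seconds: None)
```

The jitter spreads out retries so that many throttled callers do not all retry at the same instant.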

> I wonder if moving to lazy loading and yielding where possible would add a bit more performance - I intentionally set the "assume roles" and "do work" into two disparate steps on the first pass.

I had a suspicion it worked like that! As soon as the credentials for an assumed role are obtained, it would be possible to start querying the given account. I'm not sure how it would work in practice. I have never programmed with asyncio. But I have had success with multiprocessing.Pool for similar tasks.
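
One way to picture querying an account as soon as its credentials are available is to make "assume role, then query" a single task per account and yield results as tasks finish. This is only a sketch with a thread pool rather than asyncio or multiprocessing.Pool; `assume_role`, `query`, and `run_lazily` are hypothetical stand-ins and make no real AWS calls.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def assume_role(account_id):
    # Hypothetical stand-in for obtaining credentials for one account.
    return {"account": account_id, "token": f"token-{account_id}"}


def query(credentials):
    # Hypothetical stand-in for the per-account work.
    return f"queried {credentials['account']}"


def assume_then_query(account_id):
    # Both steps run in the same task, so no account waits for all
    # the other accounts' credentials before its query starts.
    return account_id, query(assume_role(account_id))


def run_lazily(account_ids, max_workers=4):
    """Yield (account_id, result) pairs in the order the tasks finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(assume_then_query, a) for a in account_ids]
        for future in as_completed(futures):
            yield future.result()


collected = dict(run_lazily(["111111111111", "222222222222"]))
```

Because results arrive in completion order, a caller could update a progress display after each yielded pair.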

@iainelder
Collaborator Author

Copied from #14 (comment):

> I refactored the codebase without disrupting the user-facing API on this branch. It drops some of the cruft like asyncio loops and async funcs that don't really add much, and should hopefully allow progress bars. Feel free to check out https://github.com/connelldave/botocove/tree/progress_bar if you'd like to test. It'll be easy to port contributions to that change too if needs be.

Glad to hear it! I'll have a look after creating a PR for #14.

@connelldave
Owner

[image]

Some progress after I finally got round to setting up a small test org - would be interested to know how it handles your scale!

@iainelder
Collaborator Author

Looks pretty! I'll give you some feedback on my next use of the tool :-)

@iainelder
Collaborator Author

I love this! It's just what I need to get visual feedback of the progress.

I've tried it in a smaller org of about 50 accounts.

When this gets published, I hope I'll have the chance to use it on something bigger.

I'd also like to use this in a tool I've built that uses botocove to collect inventory.

https://github.com/iainelder/aws-org-inventory/

The API calls to list resources can take a long time to run in accounts with a lot of resources.

@connelldave
Owner

Released in 1.4.0 - #19

@iainelder
Collaborator Author

Thanks! I'll try it out when I get a moment.
