How to track query progress? #11

iainelder · 2021-09-07T14:03:44Z

First let me thank you for this tool. It's a game changer! botocove is the best tool I know for ad-hoc analysis across an organization.

Currently I'm working with two organizations that each have on the order of 500 to 1000 accounts.

Across such large organizations, botocove takes hundreds of seconds to return a result. Anecdotally, depending on network conditions, I can wait between 120 and 300 seconds to get a result.

That's still good enough for interactive use, but it would be helpful to get some kind of "loading bar"-style feedback to know how long I should expect to wait.

I've considered adding a counter to the function wrapped by botocove. I've not tried it yet, but I guess it would work. I would need to run botocove in a second thread to be able to check the counter value.
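A minimal sketch of the counter idea, assuming the wrapped function is called once per account from worker threads. `CallCounter`, `fake_cove_run` and `query_account` are hypothetical stand-ins: the real botocove decorator is not used here, only a thread pool that behaves roughly like one.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


class CallCounter:
    """Thread-safe count of completed calls to a wrapped function."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.completed = 0

    def wrap(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                # Count the call whether it succeeded or raised.
                with self._lock:
                    self.completed += 1
        return wrapper


def fake_cove_run(func, account_ids):
    """Hypothetical stand-in for a botocove call: runs func once per account."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(func, account_ids))


def query_account(account_id: str) -> str:
    time.sleep(0.01)  # simulate an API call
    return f"result for {account_id}"


counter = CallCounter()
accounts = [f"{i:012d}" for i in range(50)]
results = []

# Run the whole query in a second thread so the main thread can poll the counter.
runner = threading.Thread(
    target=lambda: results.extend(fake_cove_run(counter.wrap(query_account), accounts))
)
runner.start()
while runner.is_alive():
    print(f"{counter.completed}/{len(accounts)} accounts done", end="\r")
    time.sleep(0.05)
runner.join()
print(f"{counter.completed}/{len(accounts)} accounts done")
```

The lock matters because `+=` on a shared integer is not atomic across threads.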

Another solution could be to make botocove return immediately and run in the background. It would return an object with a blocking call to get the result and other calls to get the number of queries in progress, the number completed, the number remaining, and so on.
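A sketch of what such a non-blocking object could look like, built on `concurrent.futures`. `BackgroundQuery` and its properties are hypothetical names for illustration, not part of botocove's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class BackgroundQuery:
    """Hypothetical non-blocking API: starts one task per account and returns at once."""

    def __init__(self, func, account_ids, max_workers: int = 8) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        # One future per account, keyed by account ID.
        self._futures = {
            account_id: self._executor.submit(func, account_id)
            for account_id in account_ids
        }

    @property
    def total(self) -> int:
        return len(self._futures)

    @property
    def in_progress(self) -> int:
        return sum(f.running() for f in self._futures.values())

    @property
    def completed(self) -> int:
        return sum(f.done() for f in self._futures.values())

    @property
    def remaining(self) -> int:
        return self.total - self.completed

    def result(self) -> dict:
        """Block until every account finishes; re-raises the first failure in submission order."""
        wait(self._futures.values())
        self._executor.shutdown()
        return {account_id: f.result() for account_id, f in self._futures.items()}


def query_account(account_id: str) -> str:
    time.sleep(0.01)  # simulate an API call
    return account_id[-4:]


query = BackgroundQuery(query_account, [f"{i:012d}" for i in range(20)])
print(f"{query.completed}/{query.total} done, {query.in_progress} running")
results = query.result()  # blocks here
```

Keying futures by account ID keeps the progress properties cheap to compute and lets the blocking call return results grouped by account.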

Is that something you have already considered?

connelldave · 2021-09-09T20:21:57Z

You're welcome! Thanks for the feedback; glad it's helpful. If you're interested in contributing or sharing ideas on implementation, I'm open to it. Looking back over the codebase today, I do feel the internals could do with a refactor, and I've started hacking out an idea on a branch, but I don't think there's any need for user-facing changes.

I hadn't thought much about interactive UX, in all honesty: it definitely seems like a useful feature that would be easier to implement inside the library than to bolt on afterwards. I'm happy to have a go at implementing it when I can find some spare time, although I don't have access to a large organization to test with since changing employer, and the most accounts I was dealing with was only around 250. The main bottleneck, if I remember correctly, is the org master account rate-limiting sts:AssumeRole at the AWS API level, but after that I'd expect the constraint to be how many worker threads a machine is happy with.
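If STS throttling on the management account is the first bottleneck, one option (an assumption here, not how botocove works) is to cap the rate of AssumeRole calls on the client side, independently of how many worker threads run. `RateLimiter`, `fake_assume_role` and the 50-calls-per-second budget are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Allow at most `rate` acquisitions per second, shared across all threads."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def acquire(self) -> None:
        with self._lock:
            now = time.monotonic()
            wait_for = self._next_allowed - now
            # Reserve the next slot before releasing the lock.
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if wait_for > 0:
            time.sleep(wait_for)


limiter = RateLimiter(rate=50)  # hypothetical budget of 50 AssumeRole calls per second


def fake_assume_role(account_id: str) -> str:
    limiter.acquire()  # wait for a slot before "calling" STS
    return f"creds-{account_id}"


start = time.monotonic()
with ThreadPoolExecutor(max_workers=20) as pool:
    creds = list(pool.map(fake_assume_role, [str(i) for i in range(11)]))
elapsed = time.monotonic() - start
print(f"{len(creds)} role assumptions in {elapsed:.2f}s")
```

Here 20 workers are available, but the limiter still spaces the 11 calls at least 20 ms apart, so worker count and API rate can be tuned separately.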

I wonder if moving to lazy loading and yielding where possible would add a bit more performance. On the first pass I intentionally split "assume roles" and "do work" into two separate steps.
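A sketch of the lazy, yielding alternative: a generator that assumes the role and does the work inside one task per account, then yields each result as soon as it completes. `assume_role` and `iter_results` are hypothetical names; the STS call is simulated.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Tuple


def assume_role(account_id: str) -> dict:
    """Hypothetical stand-in for sts:AssumeRole; returns fake credentials."""
    time.sleep(0.01)
    return {"account": account_id}


def iter_results(
    accounts: Iterable[str],
    work: Callable[[dict], str],
    max_workers: int = 8,
) -> Iterator[Tuple[str, str]]:
    """Yield (account_id, result) pairs in completion order, not input order."""

    def task(account_id: str) -> Tuple[str, str]:
        creds = assume_role(account_id)  # step 1 for this account only
        return account_id, work(creds)   # step 2 starts immediately after

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, a) for a in accounts]
        for future in as_completed(futures):
            yield future.result()


for account_id, result in iter_results(["111111111111", "222222222222"], lambda c: c["account"][:3]):
    print(account_id, result)
```

The caller can start processing (or updating a progress bar) on the first result instead of waiting for the slowest account.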

iainelder · 2021-09-10T19:14:00Z

> If you're interested in contributing or sharing ideas on implementation I'm open to it.

I'd be happy to help out. I know enough Python to be dangerous and I've published a few experimental boto clients to make it easier to query across pages and across regions.

> although I don't have access to a large organization to test after changing employer, and the highest number of accounts I was dealing with was only around 250~.

I'm happy to help you test it in my personal organization to get the right ergonomics and programmability.

It should be possible to create a test AWS organization with many accounts. The initial quota for members is 4, but there's no published hard limit. CloudFormation stack sets could be used to populate them with arbitrary resources for the sake of having something to query.

The biggest problem would be closing the member accounts afterwards. There's no API for that!

> The main bottleneck if I remember correctly is the org master account rate limiting sts:assumerole at an AWS api level

I've been hit with throttling errors from the Organizations DescribeAccount API. I don't have a reliable way to reproduce them, and the call usually works again on the second attempt.
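Since the call usually succeeds on a second attempt, retrying with capped exponential backoff and jitter is a common remedy. The sketch below is generic: `ThrottlingError` and `flaky_describe_account` are hypothetical stand-ins for the real AWS error and call. botocore's own retry configuration (for example `Config(retries={"mode": "standard"})`) may already cover this without custom code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttling error returned by an AWS API."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
) -> T:
    """Retry `call` on ThrottlingError with capped exponential backoff and full jitter."""
    attempt = 0
    while True:
        try:
            return call()
        except ThrottlingError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))  # jitter spreads out concurrent retries


calls = {"n": 0}


def flaky_describe_account() -> dict:
    """Hypothetical call that is throttled once, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ThrottlingError("Rate exceeded")
    return {"Id": "123456789012"}


account = with_backoff(flaky_describe_account, base_delay=0.01)
```

Jitter matters when many worker threads hit the same limit at once: without it they would all retry in lockstep and get throttled again.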

> I wonder if moving to lazy loading and yielding where possible would add a bit more performance - I intentionally set the "assume roles" and "do work" into two disparate steps on the first pass.

I had a suspicion it worked like that! As soon as credentials for an assumed role are obtained, it would be possible to start querying that account. I'm not sure how it would work in practice. I have never programmed with asyncio, but I have had success with multiprocessing.Pool for similar tasks.
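`multiprocessing.pool.ThreadPool` has the same interface as `multiprocessing.Pool` but uses threads, which suits I/O-bound API calls and avoids pickling. A sketch contrasting the two-phase approach with a pipelined one; `assume_role`, `query` and the simulated latencies are hypothetical.

```python
import random
import time
from multiprocessing.pool import ThreadPool

ACCOUNTS = [f"{i:012d}" for i in range(40)]


def assume_role(account_id: str) -> str:
    """Hypothetical stand-in for sts:AssumeRole with variable latency."""
    time.sleep(random.uniform(0.0, 0.02))
    return f"creds-{account_id}"


def query(creds: str) -> str:
    """Hypothetical stand-in for the per-account API calls."""
    time.sleep(random.uniform(0.0, 0.02))
    return creds.replace("creds-", "result-")


def two_phase(pool: ThreadPool) -> list:
    # Every role must be assumed before any account is queried.
    all_creds = pool.map(assume_role, ACCOUNTS)
    return pool.map(query, all_creds)


def assume_then_query(account_id: str) -> str:
    return query(assume_role(account_id))


def pipelined(pool: ThreadPool) -> list:
    # Each account is queried as soon as its own credentials arrive;
    # imap_unordered hands back results in completion order.
    return list(pool.imap_unordered(assume_then_query, ACCOUNTS))


with ThreadPool(processes=10) as pool:
    for strategy in (two_phase, pipelined):
        start = time.perf_counter()
        results = strategy(pool)
        print(f"{strategy.__name__}: {len(results)} results in {time.perf_counter() - start:.2f}s")
```

The pipelined version never has a worker sitting idle between the two phases, which is where the speed-up would come from when latencies vary between accounts.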

iainelder · 2021-10-05T08:40:51Z

Copied from #14 (comment):

> I refactored the codebase without disrupting the user-facing API on this branch. It drops some of the cruft, like asyncio loops and async funcs that don't really add much, and should hopefully allow progress bars. Feel free to check out https://github.com/connelldave/botocove/tree/progress_bar if you'd like to test. It'll be easy to port contributions to that change too if need be.

Glad to hear it! I'll have a look after creating a PR for #14.

connelldave · 2021-10-21T14:40:59Z

Some progress, now that I've finally got round to setting up a small test org. I'd be interested to know how it handles your scale!

iainelder · 2021-10-26T10:48:17Z

Looks pretty! I'll give you some feedback on my next use of the tool :-)

iainelder · 2021-10-27T00:02:54Z

I love this! It's just what I need to get visual feedback of the progress.

I've tried it in a smaller org of about 50 accounts.

When this gets published, I hope I'll have the chance to use it on something bigger.

I'd also like to use this in a tool I've built that uses botocove to collect inventory.

https://github.com/iainelder/aws-org-inventory/

The API calls to list resources can take a long time to run in accounts with a lot of resources.

connelldave · 2021-11-15T23:25:46Z

Released in 1.4.0 - #19

iainelder · 2021-11-16T11:26:07Z

Thanks! I'll try it out when I get a moment.

iainelder mentioned this issue Oct 5, 2021

Add ability to set session policies for assumed roles #14

Closed

connelldave closed this as completed Nov 15, 2021

iainelder mentioned this issue Dec 19, 2021

Use generators to stream output #23

Closed
