Make Agent Run for all ASG in each iteration, packaging discussions #15

Closed
roeyazroel opened this issue Sep 21, 2016 · 18 comments

@roeyazroel
Contributor

It looks like each iteration of the Lambda function runs only for one ASG, making the replacement process very long in a multi-ASG environment.

@roeyazroel
Contributor Author

I've updated the Event Trigger in CloudWatch to run every 1m. If my ASGs are very dynamic (lots of scaling actions), is there any reason why I shouldn't keep it at 1m?

Thanks

@cristim
Member

cristim commented Sep 22, 2016

It should run in parallel for all ASGs in all regions (anything else is a bug), but it only takes one action per ASG per run. This is intentional, in order to reduce the chance of replacing too many machines at once, which may bring down your environment if your application isn't really ready for that many replacements.
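
For illustration, here is a minimal Go sketch of that behavior, assuming hypothetical type and function names (none of this is the project's actual code): all groups are processed concurrently, but each group takes at most a single replacement action per run.

```go
// Minimal sketch: process all groups concurrently, at most one action each per run.
// All names here are made up for illustration, not taken from the real codebase.
package main

import (
	"fmt"
	"sync"
)

type autoScalingGroup struct {
	name, region string
}

// processGroup performs at most one replacement action for the group, then
// returns; any further replacements are left to subsequent runs.
func processGroup(g autoScalingGroup) {
	fmt.Printf("[%s] %s: taking at most one replacement action this run\n", g.region, g.name)
}

func runOnce(groups []autoScalingGroup) {
	var wg sync.WaitGroup
	for _, g := range groups {
		wg.Add(1)
		go func(g autoScalingGroup) {
			defer wg.Done()
			processGroup(g)
		}(g)
	}
	wg.Wait()
}

func main() {
	runOnce([]autoScalingGroup{
		{"web-asg", "us-east-1"},
		{"worker-asg", "eu-west-1"},
	})
}
```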

Regarding the increased frequency, I don't really mind, feel free to use it if it works for you, although at the moment it would cost me a few more pennies in bandwidth because of the self-update checks that happen on each run. I still have enough AWS credits for the next few months, but once I start paying with my own money I may do something about the currently wasteful self-update mechanism.

I haven't really tested it at such a high frequency, so please report any issues you may find.

@roeyazroel
Contributor Author

I configured it for 1m to "onboard" faster, and then changed it back to 5m. It worked perfectly: we had heavy load during the night and Spot instances were all over the place, working like magic. 👍

In the long term, how do you plan to implement it without depending on your account?
What exactly runs on your credits?

Thanks,
Roey

@cristim
Member

cristim commented Sep 22, 2016

Thanks, it feels great to see such feedback :-)

The plan is to include more in the Lambda zip file if it gets expensive. At the moment it only includes the Python Lambda shim, while the static binary written in golang is downloaded on demand if the latest version is not available in the Lambda function's /tmp, and some content is downloaded from the Github CDN on every run, which may be costly for them as well and may not be sustainable long term. The benefit is that it allows users to get updates automatically without performing any CloudFormation stack changes.
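
Roughly, that on-demand download logic amounts to something like the sketch below. The actual shim is written in Python; this is just a Go illustration, and the CDN URL and file names are placeholders rather than the project's real ones.

```go
// Sketch of "download the binary only if /tmp doesn't already hold the latest
// version". The CDN URL and file names below are placeholders.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

const (
	binaryPath  = "/tmp/agent"         // cached across warm Lambda invocations
	versionPath = "/tmp/agent.version" // version string of the cached binary
	baseURL     = "https://example.cloudfront.net" // placeholder CDN endpoint
)

func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func ensureLatestBinary() error {
	latest, err := fetch(baseURL + "/latest_version")
	if err != nil {
		return err
	}
	cached, _ := os.ReadFile(versionPath)
	if string(cached) == string(latest) {
		return nil // /tmp already holds the latest version, skip the download
	}
	binary, err := fetch(baseURL + "/agent")
	if err != nil {
		return err
	}
	if err := os.WriteFile(binaryPath, binary, 0o755); err != nil {
		return err
	}
	return os.WriteFile(versionPath, latest, 0o644)
}

func main() {
	if err := ensureLatestBinary(); err != nil {
		fmt.Fprintln(os.Stderr, "self-update check failed:", err)
	}
}
```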

The check for determining the latest version and the download of the binary both require some bandwidth and requests, which show up on my CloudFront bill; the runtime data likely shows up on Github's bill, but as long as the traffic is small they may not care about it. With the 5 min frequency it costs me about 20c/user/month, and since so far I only have a handful of active users and enough AWS credits, that's not a problem, but if this really takes off I may need to do something about it, and Github may block it as well.

An easy option would be to include everything at build time and stop downloading anything at runtime, but that would break auto-updates, which I'm not sure I want to do.

A nicer solution may be creating an S3 bucket in the user's account, where the binary and some runtime dependencies are updated once a day. If the user doesn't want auto-updates, they could control this by blocking write access to that S3 bucket with an IAM policy change.

@roeyazroel
Contributor Author

I think I'd prefer to be out of date and be responsible for updating the stack to the latest version myself, rather than paying the on-demand prices. Maybe consider giving this decision to the users.

Anyway, I have some ideas. Hopefully I'll press the fork button in a couple of days/weeks. 👍

Thanks a million!
Roey

@cristim
Member

cristim commented Sep 22, 2016

As an Ops guy I totally agree that auto-updates are scary and, when in doubt, should be left to the user, but as a developer, back when I was rolling out fixes pretty much on an hourly basis, it was really convenient to just commit and have everything auto-update out of the box, without asking the users to update their stacks.

But considering that the target audience of this tool is mostly Ops'y people, I think dropping auto-updates would actually be desirable once the tool is more or less stable, which, judging by your feedback, may have already happened.

I'll try to come up with a change that bundles everything and removes auto-updates, stay tuned. This entire discussion should actually be part of #6

cristim added a commit that referenced this issue Sep 23, 2016
The goal is to be able to run without any runtime dependencies
- everything is now self-contained, included in the Lambda zip file
- no more runtime downloads of data from external infrastructure
- more reliable, everything is available out of the box
- stable software, no longer auto-updates behind the user's back
- remove dependency on UPX, since the size no longer matters so much
- saving some execution time and a few pennies spent on bandwidth
- lots of Lambda wrapper cleanups, the wrapper code is now 3x smaller
- some build system cleanups and improvements
- introduced a command-line argument to avoid hard-coding the
  instances.json file name

Downside: the users need to update the CloudFormation stack in order to
update the software to the latest version
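
As an aside, the command-line argument for the instances.json file name mentioned in the last bullet of the commit above could look roughly like the hypothetical Go sketch below; the actual flag name and default in the project may differ.

```go
// Hypothetical sketch of a flag replacing the hard-coded instances.json name;
// the real flag name and default may differ.
package main

import (
	"flag"
	"fmt"
)

func main() {
	instancesFile := flag.String("data", "instances.json",
		"path to the instance data file bundled in the Lambda package")
	flag.Parse()
	fmt.Println("loading instance data from", *instancesFile)
}
```
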
@cristim
Member

cristim commented Sep 24, 2016

I just tested, and the replacement process correctly handles multiple AutoScaling groups in the same region concurrently, although only one action is taken per group per function run, which is the expected behavior, so this bug is essentially invalid.

I recently implemented the self-contained packaging method I mentioned before, removing the self-update mechanism entirely, so from now on users will need to update on their own. I will document the update procedure in the Readme file.

This allows users to set any event frequency without much additional cost: it only consumes more Lambda execution time and bandwidth for API calls, which is negligible or well within the free tier. It will no longer generate any costs on my infrastructure and will no longer misuse the Github CDN.

In 29fcd3c929605 I parameterized the event frequency to let people set any value they like, but I would keep the default of 5 minutes, which I still think is a good out-of-the-box compromise between speed and safety.

@cristim cristim closed this as completed Sep 24, 2016
@cristim
Member

cristim commented Sep 26, 2016

I thought that maybe I should reopen this until I get confirmation from the user that everything is fine :)

@roeyazroel please give this a try when you have some time, and let me know if you run into any issues.

I hope the documentation for software updates is clear enough.

@cristim cristim reopened this Sep 26, 2016
@roeyazroel
Contributor Author

I'm not sure why, but I think when I ran the upgrade of the CF stack, the Lambda CW trigger was added again (the same one). Can you check it? Not sure if it was because my CW trigger was disabled during the update, as we had some AWS issues.

@roeyazroel
Contributor Author

Another thing: think about using Github tags, as the SHA isn't very "friendly" and it's not easy to "see" whether you're up to date.

@cristim
Member

cristim commented Sep 26, 2016

Thanks for testing it.

In theory CloudFormation shouldn't duplicate such resources unless you created the more frequent event resource outside the CloudFormation stack; if you just edited the existing resource in place, it should take it over and change it. Can you confirm that wasn't the case? I can have a look into the Lambda trigger creation tonight, unless you figure it out first.

Interesting idea to use tags. I'm thinking of maybe using the Travis build number as the tag name, what do you think?

@roeyazroel
Contributor Author

That should be good enough; tags are friendlier for "human eyes" and for versioning.

Not sure what went wrong after the update, but now it's not running at all. Still trying to understand why.

@cristim cristim changed the title Make Agent Run for all ASG in each iteration Make Agent Run for all ASG in each iteration, packaging discussions Sep 26, 2016
@roeyazroel
Contributor Author

False alarm, working fine.

But it looks like updating the stack does create another trigger in the Lambda function.

@cristim
Member

cristim commented Sep 26, 2016

Do I understand correctly that every update creates yet another event generator that runs the lambda function? This looks to me like a bug in CloudFormation.

@cristim
Member

cristim commented Sep 26, 2016

I tested the CloudFormation stack updates and the resource is correctly updated according to the given stack parameters, updating the event generator to the defined frequency. In my case I didn't notice any additional CloudWatch event generators being created.

I also implemented support for using Travis CI build numbers for software updates, and I documented this in the readme.

@roeyazroel Please have a look and let me know if this is good enough.

@cristim
Member

cristim commented Sep 27, 2016

I now understand what you meant about the event generator getting duplicated after a stack update; I think I just experienced the same issue. The resource doesn't get duplicated, but the Lambda function gets subscribed to the same event generator twice.

It looks much like a CloudFormation issue to me. If I see it again, I'll report it as an AWS support case.

@cristim
Member

cristim commented Sep 27, 2016

I deleted one of them and the function wouldn't be executed anymore. It recovered after disabling and re-enabling the remaining event in the Lambda function's list of triggers. So I assume having both of them is harmless (although not so nice to see in the console), since only one of them actually launches the function. But this is still a bug that should be fixed.

@cristim
Member

cristim commented Nov 3, 2016

I can't see this happening anymore, likely AWS fixed it, so I would like to finally close this issue.

@roeyazroel please reopen if you are still having issues.

@cristim cristim closed this as completed Nov 3, 2016
@cristim cristim self-assigned this Nov 3, 2016