Make Agent Run for all ASG in each iteration, packaging discussions #15

Closed
roeyazroel opened this issue Sep 21, 2016 · 18 comments

@roeyazroel
Contributor

It looks like each iteration of the Lambda function runs only for one ASG, making the replacement process very long in a multi-ASG environment.

@roeyazroel
Contributor Author

I've updated the Event Trigger in CloudWatch to run every 1m. If my ASGs are very dynamic (lots of scaling actions), is there any reason why I shouldn't keep it at 1m?

Thanks

@cristim
Member

cristim commented Sep 22, 2016

It should run in parallel for all ASGs in all regions (anything else is a bug), but it only takes one action per ASG per run. This is intentional, in order to reduce the chance of replacing too many machines at once, which may bring down your environment if your application isn't really ready for that many replacements.
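
For illustration, here is a minimal Go sketch of that behavior, assuming hypothetical type and function names (none of this is the project's actual code): all groups are processed concurrently, but each group takes at most a single replacement action per run.

```go
// Minimal sketch: process all groups concurrently, at most one action each per run.
// All names here are made up for illustration, not taken from the real codebase.
package main

import (
	"fmt"
	"sync"
)

type autoScalingGroup struct {
	name, region string
}

// processGroup performs at most one replacement action for the group, then
// returns; any further replacements are left to subsequent runs.
func processGroup(g autoScalingGroup) {
	fmt.Printf("[%s] %s: taking at most one replacement action this run\n", g.region, g.name)
}

func runOnce(groups []autoScalingGroup) {
	var wg sync.WaitGroup
	for _, g := range groups {
		wg.Add(1)
		go func(g autoScalingGroup) {
			defer wg.Done()
			processGroup(g)
		}(g)
	}
	wg.Wait()
}

func main() {
	runOnce([]autoScalingGroup{
		{"web-asg", "us-east-1"},
		{"worker-asg", "eu-west-1"},
	})
}
```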

Regarding the increased frequency, I don't really mind, feel free to use it if it works for you, although at the moment it would cost me a few more pennies in bandwidth because of the self-update checks that happen on each run. I still have enough AWS credits for the next few months, but once I start paying with my own money I may do something about the currently wasteful self-update mechanism.

I haven't really tested it at such a high frequency, so please report any issues you may find.

@roeyazroel
Contributor Author

I configured it for 1m to "onboard" faster, and then changed it back to 5m. It worked perfectly: we had heavy load during the night and Spot instances were all over the place, working like magic. 👍

In the long term, how do you plan to implement it without depending on your account?
What exactly runs on your credits?

Thanks,
Roey

@cristim
Member

cristim commented Sep 22, 2016

Thanks, it feels great to see such feedback :-)

The plan is to include more in the Lambda zip file if it gets expensive. At the moment it only includes the Python Lambda shim, while the static binary written in golang is downloaded on demand if the latest version is not available in the Lambda function's /tmp, and some content is downloaded from the Github CDN on every run, which may be costly for them as well and may not be sustainable long term. The benefit is that it allows users to get updates automatically without performing any CloudFormation stack changes.
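
Roughly, that on-demand download logic amounts to something like the sketch below. The actual shim is written in Python; this is just a Go illustration, and the CDN URL and file names are placeholders rather than the project's real ones.

```go
// Sketch of "download the binary only if /tmp doesn't already hold the latest
// version". The CDN URL and file names below are placeholders.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

const (
	binaryPath  = "/tmp/agent"         // cached across warm Lambda invocations
	versionPath = "/tmp/agent.version" // version string of the cached binary
	baseURL     = "https://example.cloudfront.net" // placeholder CDN endpoint
)

func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func ensureLatestBinary() error {
	latest, err := fetch(baseURL + "/latest_version")
	if err != nil {
		return err
	}
	cached, _ := os.ReadFile(versionPath)
	if string(cached) == string(latest) {
		return nil // /tmp already holds the latest version, skip the download
	}
	binary, err := fetch(baseURL + "/agent")
	if err != nil {
		return err
	}
	if err := os.WriteFile(binaryPath, binary, 0o755); err != nil {
		return err
	}
	return os.WriteFile(versionPath, latest, 0o644)
}

func main() {
	if err := ensureLatestBinary(); err != nil {
		fmt.Fprintln(os.Stderr, "self-update check failed:", err)
	}
}
```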

The check for determining the latest version and the download of the binary both require some bandwidth and requests, which show up on my CloudFront bill; the runtime data likely shows up on Github's bill, but as long as the traffic is small they may not care about it. With the 5 min frequency it costs me about 20c/user/month, and since so far I only have a handful of active users and enough AWS credits, that's not a problem, but if this really takes off I may need to do something about it, and Github may block it as well.

An easy option would be to include everything at build time and stop downloading anything at runtime, but that would break auto-updates, which I'm not sure I want to do.

A nicer solution may be creating an S3 bucket in the user's account, where the binary and some runtime dependencies are updated once a day. If the user doesn't want auto-updates, they could control this by blocking write access to that S3 bucket with an IAM policy change.

@roeyazroel
Contributor Author

I think I'd prefer to be out of date and be responsible for updating the stack to the latest version myself, rather than paying the on-demand prices. Maybe consider giving this decision to the users.

Anyway, I have some ideas. Hopefully I'll press the fork button in a couple of days/weeks. 👍

Thanks a million!
Roey

@cristim
Member

cristim commented Sep 22, 2016

As an Ops guy I totally agree that auto-updates are scary and, when in doubt, should be left to the user, but as a developer, back when I was rolling out fixes pretty much on an hourly basis, it was really convenient to just commit and have everything auto-update out of the box, without asking the users to update their stacks.

But considering that the target audience of this tool is mostly Ops'y people, I think dropping auto-updates would actually be desirable once the tool is more or less stable, which, judging by your feedback, may have already happened.

I'll try to come up with a change that bundles everything and removes auto-updates, stay tuned. This entire discussion should actually be part of #6

cristim added a commit that referenced this issue Sep 23, 2016
The goal is to be able to run without any runtime dependencies
- everything is now self-contained, included in the Lambda zip file
- no more runtime downloads of data from external infrastructure
- more reliable, everything is available out of the box
- stable software, no longer auto-updates behind the user's back
- remove dependency on UPX, since the size no longer matters so much
- saving some execution time and a few pennies spent on bandwidth
- lots of Lambda wrapper cleanups, the wrapper code is now 3x smaller
- some build system cleanups and improvements
- introduced a command-line argument to avoid hard-coding the
  instances.json file name

Downside: the users need to update the CloudFormation stack in order to
update the software to the latest version
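
As an aside, the command-line argument for the instances.json file name mentioned in the last bullet of the commit above could look roughly like the hypothetical Go sketch below; the actual flag name and default in the project may differ.

```go
// Hypothetical sketch of a flag replacing the hard-coded instances.json name;
// the real flag name and default may differ.
package main

import (
	"flag"
	"fmt"
)

func main() {
	instancesFile := flag.String("data", "instances.json",
		"path to the instance data file bundled in the Lambda package")
	flag.Parse()
	fmt.Println("loading instance data from", *instancesFile)
}
```
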
@cristim
Member

cristim commented Sep 24, 2016

I just tested, and the replacement process correctly handles multiple AutoScaling groups in the same region concurrently, although only one action is taken per group per function run, which is the expected behavior, so this bug is essentially invalid.

I recently implemented the self-contained packaging method I mentioned before, removing the self-update mechanism entirely, so from now on users will need to update on their own. I will document the update procedure in the Readme file.

This allows users to set any event frequency without much additional cost: it only consumes more Lambda execution time and bandwidth for API calls, which is negligible or well within the free tier. It will no longer generate any costs on my infrastructure and will no longer misuse the Github CDN.

In 29fcd3c929605 I parameterized the event frequency to let people set any value they like, but I would keep the default of 5 minutes, which I still think is a good out-of-the-box compromise between speed and safety.

@cristim cristim closed this as completed Sep 24, 2016
@cristim
Member

cristim commented Sep 26, 2016

I thought that maybe I should reopen this until I get confirmation from the user that everything is fine :)

@roeyazroel please give this a try when you have some time, and let me know if you run into any issues.

I hope the documentation for software updates is clear enough.

@cristim cristim reopened this Sep 26, 2016
@roeyazroel
Contributor Author

I'm not sure why, but I think when I ran the upgrade of the CF stack, the Lambda CW trigger was added again (the same one). Can you check it? Not sure if it was because my CW trigger was disabled during the update, as we had some AWS issues.

@roeyazroel
Contributor Author

Another thing: think about using Github tags, as the SHA isn't very "friendly" and it's not easy to "see" whether you're up to date.

@cristim
Member

cristim commented Sep 26, 2016

Thanks for testing it.

In theory CloudFormation shouldn't duplicate such resources unless you created the more frequent event resource outside the CloudFormation stack; if you just edited the existing resource in place, it should take it over and change it. Can you confirm that wasn't the case? I can have a look into the Lambda trigger creation tonight, unless you figure it out first.

Interesting idea to use tags. I'm thinking of maybe using the Travis build number as the tag name, what do you think?

@roeyazroel
Contributor Author

That should be good enough; tags are friendlier for "human eyes" and for versioning.

Not sure what went wrong after the update, but now it's not running at all. Still trying to understand why.

@cristim cristim changed the title Make Agent Run for all ASG in each iteration Make Agent Run for all ASG in each iteration, packaging discussions Sep 26, 2016
@roeyazroel
Contributor Author

False alarm, working fine.

But it looks like updating the stack does create another trigger in the Lambda function.

@cristim
Member

cristim commented Sep 26, 2016

Do I understand correctly that every update creates yet another event generator that runs the lambda function? This looks to me like a bug in CloudFormation.

@cristim
Member

cristim commented Sep 26, 2016

I tested the CloudFormation stack updates and the resource is correctly updated according to the given stack parameters, updating the event generator to the defined frequency. In my case I didn't notice any additional CloudWatch event generators being created.

I also implemented support for using Travis CI build numbers for software updates, and I documented this in the readme.

@roeyazroel Please have a look and let me know if this is good enough.

@cristim
Member

cristim commented Sep 27, 2016

I now understand what you meant about the event generator getting duplicated after a stack update; I think I just experienced the same issue. The resource doesn't get duplicated, but the Lambda function gets subscribed to the same event generator twice.

It looks much like a CloudFormation issue to me. If I see it again, I'll report it as an AWS support case.

@cristim
Member

cristim commented Sep 27, 2016

I deleted one of them and the function wouldn't be executed anymore. It recovered after disabling and re-enabling the remaining event in the Lambda function's list of triggers. So I assume having both of them is harmless (although not so nice to see in the console), since only one of them actually launches the function. But this is still a bug that should be fixed.

@cristim
Member

cristim commented Nov 3, 2016

I can't see this happening anymore, likely AWS fixed it, so I would like to finally close this issue.

@roeyazroel please reopen if you are still having issues.

@cristim cristim closed this as completed Nov 3, 2016
@cristim cristim self-assigned this Nov 3, 2016