Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delayed notifications running after failed chef run can cause broken configuration #2084

Closed
micmac opened this issue Sep 18, 2014 · 6 comments

Comments

@micmac
Copy link

micmac commented Sep 18, 2014

If I generate a config file calling the template's :create action delayed after collecting all the needed parts during the chef run, calling the template after a failed chef run can cause some config parts missing, turning a mostly harmless chef breakage into a potential service outage.

I've created a proof-of-concept fix, creating an additional option :only_on_success for notifications that causes them to not run if there was an exception earlier: prezi#1

@lamont-granquist
Copy link
Contributor

Gathering information throughout recipes to feed into a single template resource is generally better done via definitions:

https://docs.getchef.com/essentials_cookbook_definitions.html

See the "Many Recipes, One Definition" section. Since you're not throwing a delayed notification there it will not run if the chef-client run fails before it gets to it.

@micmac
Copy link
Author

micmac commented Sep 19, 2014

Unfortunately - as I understand based on behaviour - the solution with definitions only works because definitions are processed during recipe load/compile time and compile-time exceptions seem to be fatal, causing delayed notifications to be skipped. I feel that if this behavior changes or someone decides to write a resource that tries to call these definitions based on a lazy attribute, things will break again.

What I'm trying to do: we collect all the monitoring items to be registered for that machine and send them to the monitoring service, then at the end, fetch the list of registered services and delete the ones no longer defined in Chef. This means I have to call the cleanup once, at the end of the chef run, if the chef run was successful (otherwise I may end up deleting a lot of important monitoring items).

Did I misunderstand something?

@lamont-granquist
Copy link
Contributor

Well the point is that you avoid calling a delayed notification. It moves the collection of monitoring things to compile stage, and once you're done with that you are done collecting things. This uses compile mode more correctly as a "prep" stage. Then you have resources that run normally in converge mode which take the collected stuff and use them to do the thing. If you use any notifications for idempotency then you would use :immediate ones instead of :delayed. Either way you've fully collected all your information by the time that compile mode is done so you'll never push partial data. Also this way you're closer to atomic -- all of your recipes run their compile time code and assemble the information, any failures happen in compile mode and there's no possibility that you've even scheduled a delayed notification yet since no resource have run. Then you assemble your resource collection and the resources associated with using and pushing that data should run 'together' as much as possible and either failures beforehand will cause the whole thing to abort, or failures afterwards don't cause any corruption.

If you are assembling things in converge mode with ruby_blocks and LWRPs modifying the data which later needs to be presented, then that's an indication that code needs to get refactored.

I'm not sure what you mean by someone tries to call these definitions based on a lazy attribute since definitions are compile-time and that means that lazy doesn't make any sense and isn't supported. You can compile your data (probably in the run_state) and then inject that prepared data into a resource and use lazy on the resource so that the data is fully compiled before its handed off to the resource, in fact you should do that if you expect later recipes to be calling your definition, but that won't affect any of this analysis -- since you use lazy to delay consuming the data you are assured that all your compile-time code in the definitions are evaluated before the lazy evaluation, so you will never have partial data.

@micmac
Copy link
Author

micmac commented Sep 22, 2014

When using notifications I can only use an immediate one if I'm reloading some service. I cannot just rewrite a configfile immediately when the collected content changes.

There are cases when I cannot collect all the information in compile time - think for example of creating a user with a resource then using the user information later.

What you're suggesting boils down to rewriting most of my LWRPs as resources. Before I start doing that, do you know about any guidelines about how these use cases should be implemented in a "proper" way? Like, when should one use an LWRP and when a definition, or what should happen during "prep" stage?

Also, with a definition, I have to explicitly implement only_if/not_if logic.

I could not find a use case for lazy, however I have plenty of resources creating other resources creating even further resources that in turn need some data collected for monitoring. Maybe it's not how I should use chef but again, all these need is a method of doing things at the end of a successful converge.

In the end I still feel that avoiding a delayed notification just because I cannot call it in a failsafe way is a workaround and having the option to run something at the end of a successful chef run is good.

@smurawski
Copy link
Contributor

@micmac You are welcome to submit a PR addressing this behavior, but this feels more like a workflow solution than a change in code. This is not something we intend to pick up as a change. Thanks!

@lamont-granquist
Copy link
Contributor

So when you're creating a user with a resource, you should really be asserting the state that you desire, and you can therefore assert the state in the other resources. Creating a user, then reading the pid (using a converge-time ruby block) then using a lazy attribute on a template is code smell. In that case you should just select a pid at compile_time somewhere and assert it in both resources. Unless you've got a more concrete example.

Also since you're using a definition to create a template resource you can hang a not_if/only_if off of the template resource that you are constructing. In chef 12 if your definition returns the resource then it will correctly get returned in the dsl and you can use the absence of an argument to the definition to create a no-op that will return your resource so you can easily chain notifications/guards/etc off of the resource.

And again, I think the direction you're going is fighting the tool. If you've got resources creating other resources that is going to end badly. You want to be using definitions to iteratively create resources.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants