Audit mode proposal #69
Conversation
I'm a little confused as to how this is different from ChefSpec?

@sethvargo I'm not super familiar with ChefSpec, but from the README I can say that audit mode is an additional phase of the Chef Client run and can run with or without converging the node. It looks like ChefSpec is intended for local testing using Chef Solo, whereas audit mode is intended to be used in testing/preprod and prod.

The closest project to this out there is minitest-chef-handler -- it runs "in production", on the node actually being converged; it isn't development testing like chefspec or test-kitchen. It is used to run a set of audit cookbooks that inspect a system (similar to why-run, but able to report things like "port 80 is/is-not open" or "host www.google.com correctly resolves" in policy language).
> # Audit Mode for Chef Client
>
> Add an auditing feature to core Chef which will allow users to develop a test
> drivin infrastructure, as well as evaluate and monitor its state, without the
driven
I'm a little ways through this, but I wanted to clarify something up front: this RFC is not for the same auditors that we have talked about before, correct? It seems like we're free to use the words
> ### Chef client runner
>
> An audit phase will be performed after client converges. By default, audit mode
> will be enabled. Only audits included in recipes in the expanded run list will
> be evaluated on the node.
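For concreteness, here is a sketch of what an audit inside a recipe might look like under this proposal. The `controls` block name and the serverspec-style matchers come from the surrounding discussion; this is an illustrative hypothetical, not a finalized API:

```ruby
# Hypothetical audit block inside an ordinary recipe (syntax illustrative,
# not final). It would be collected at compile time and evaluated in the
# audit phase, after converge.
controls "port security" do
  it "has nothing listening on port 111" do
    expect(port(111)).to_not be_listening
  end
end
```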
As soon as we complete our `include_recipe` card, I think we should update this to explain how we're going to deal with `controls` blocks that are included multiple times.
I don't think that we will need to call this out explicitly. We will include a recipe only once with the existing logic in `include_recipe`.
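As a rough illustration of the include-once behavior being referenced, here is a toy sketch in plain Ruby. The class and method names are invented for illustration; this is not Chef's actual implementation:

```ruby
# Toy sketch of include-once logic along the lines of Chef's include_recipe:
# a recipe (and any controls blocks it defines) is evaluated only on the
# first inclusion; later inclusions are no-ops.
class RunContext
  def initialize
    @seen = {}
  end

  # Returns true if the recipe was evaluated, false if it was already included.
  def include_recipe(name)
    return false if @seen[name]
    @seen[name] = true
    # ... evaluate the recipe (and queue its audits) here ...
    true
  end
end

ctx = RunContext.new
ctx.include_recipe("apache2::default")  # => true, evaluated
ctx.include_recipe("apache2::default")  # => false, skipped
```

Under this logic, a `controls` block would naturally be registered only once per run, which is the point being made above.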
👍
Other than the minor tweaks this looks sweet to me @mcquin. 👍
> Our goal has been to simplify the directory structure of cookbooks. Writing
> audits in recipes means users don't need to add an additional directory to their
> cookbooks, and more tightly couples testing and development which helps users
> develop good test driven infrastructure practices.
I'm okay with this as a pragmatic way to get things done in the short term, but I think our long-term goal should be to allow tests/controls to be written in other files, as soon as the server APIs support it. I mean, if writing tests inline was always better, why don't we do that in Chef itself?
I feel like I'm missing some context on this whole idea. Why do we want to put test code in recipes? At a minimum this seems like it should be a thing that isn't in recipe code, but I'm guessing there has been some internal discussion of this feature previously.

@coderanger my interpretation: right now the "segments" stuff in the Server APIs kinda screws us from including files in different directories, and we want to ship this without having to fix that first.

Given the impact adding stuff to the recipe DSL will have on compat (read: we'll likely never be able to change this), I don't think that's enough reason to not do the work to create a new segment.

My opinion is that we should support both, as both styles are common enough in other software development contexts.
In the abstract I'm not against having test-ish code in recipes, but I think we would need to figure out a better way to adapt RSpec to work more like the existing resource format. Having two super-different DSLs in one file seems like a fast way to confuse people, especially if they don't already know RSpec (most ops folks).

Can you provide an example? I'm having a hard time finding another paradigm where the tests are inside the code instead of a separate file...

http://doc.rust-lang.org/guide.html#writing-tests is one, I've also seen it in ruby where you wrap the test suite in something like

It's also common with eunit http://www.erlang.org/doc/apps/eunit/chapter.html
I don't think you are correct that Rust does this, from my reading of the linked doc it seems to put them under

I agree that it could be useful for "learning", but after hearing more, my vote is 👎
As an example, I would want, at a minimum, the test code enclosed in a resource of its own, so it is clearer when it will execute and that the DSL is only available inside that block. I would also want to see that implemented outside of core before we talk about adding it, since that part wouldn't need any core mods that I can think of. Adding the audit events system seems like the most interesting part of this, since it will be useful in a broad range of final forms and must be in core to make any sense.
@mcquin @tyler-ball can you provide the recipe example as well? Can you elaborate on why audits are modeled on top of a testing framework? How will this interact with LWRPs or other dynamically defined resources, in why-run mode, with nesting, etc.? Currently we have several other situations where a multi-phase chef run might be useful. If we take this route at all, it would be better to add multi-phase/stage support to core Chef before implementing features with it, then provide a middleware & DI mechanism on top of that, and then build audit or other functionality on top, while also allowing multiple implementations to pick and choose from it.
I think adding more phases to chef runs is an interesting idea, but blocking this on it is scope creep IMO. Also, the controls/tests aren't a resource, so making them look like one isn't going to make anything less confusing. We already have compile/converge, where resources represent actions that get added to a queue; this is the same concept, but the queue is different.
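The compile/converge analogy made here can be sketched in a few lines of plain Ruby. All names below are invented for illustration; this is a toy model of the idea, not Chef internals:

```ruby
# Toy model: compile time builds up two separate queues, then each phase
# drains its own queue in order. Audits are queued like resources but run
# in a separate phase, after converge completes.
class ToyRunner
  attr_reader :log

  def initialize
    @converge_queue = []
    @audit_queue = []
    @log = []
  end

  def resource(name)
    @converge_queue << -> { @log << "converge #{name}" }
  end

  def control(name)
    @audit_queue << -> { @log << "audit #{name}" }
  end

  def run
    @converge_queue.each(&:call)  # converge phase
    @audit_queue.each(&:call)     # audit phase always runs afterwards
  end
end

r = ToyRunner.new
r.control("port 111 closed")   # declared first in the "recipe"...
r.resource("package[mysql]")
r.run
# ...but converge actions still run before any audits.
```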
I think there are a few things going on in parallel in our discussion:
@tyler-ball if a service fails to start, your service resource should error and fail the converge.
I won't be able to write an ohai plugin that can check if the database is connectable, since ohai runs before converge and since we don't have a built-in way of distributing ohai plugins (we really should, but that's a different topic). I agree with @tyler-ball that even if the template validation works, it doesn't mean that my apache / nginx is running, because maybe it doesn't have access to the log location, or the service template is not correct, etc... I agree with you @coderanger: one shouldn't need to write a rule for every Chef resource they add. But our resources don't cover all the things one might want to check / set in an idempotent way.
@coderanger this is more useful to assert things like "nothing is listening on port 111". And that doesn't map well to a service resource, because it may be named "portmap" or "rpcbind" based on which distro I'm using and when they decided to rename that service; it assumes I get the service provider correct; and there are other things that can go wrong with that (it could be running, but the init script could be removed, which would defeat any service resource :stop action). As a SOX/PCI-DSS auditor or sufferer, what I want to be able to say is "nothing is listening on port 111" and "nothing is listening on port 23". Serverspec does that well. The Chef DSL does not do that well, and neither does minitest-chef-handler (because that rides on the back of Chef's resources and doesn't have its own matchers like serverspec does).
In chef, the cross-platform code to shut down port 111:

```ruby
portmap_service =
  case node['platform_family']
  when "debian"
    case node['platform']
    when "ubuntu"
      node['platform_version'].to_f >= 13.10 ? "rpcbind" : "portmap"
    when "debian", "raspbian"
      node['platform_version'].to_f >= 7.0 ? "rpcbind" : "portmap"
    else
      "rpcbind"
    end
  when "rhel"
    node['platform_version'].to_f >= 6.0 ? "rpcbind" : "portmap"
  else
    "rpcbind"
  end

service portmap_service do
  action :stop
end
```

In serverspec, the cross-platform code to assert that the previous code did the right thing:

```ruby
describe port(111) do
  it { should_not be_listening }
end
```
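For reference, the kind of check that serverspec's `be_listening` matcher performs can be approximated in plain Ruby with a TCP connection attempt. This is a simplified illustration with an invented helper name, not serverspec's actual implementation (which also inspects `netstat`/`ss` output on some platforms):

```ruby
require "socket"

# Simplified approximation of a "something listening on this port" check:
# attempt a TCP connection; a refused or timed-out connection means no
# listener on that address/port.
def listening?(port, host = "127.0.0.1")
  Socket.tcp(host, port, connect_timeout: 1) { true }
rescue SystemCallError, IOError
  false
end

listening?(111)  # false on a host where portmap/rpcbind is stopped
```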
I understand the point, and agree it is a useful thing to have. Trying to make sure this is done in a way that allows enough flexibility in development to find the best syntax and whatnot, since RSpec is a complicated beast. Serverspec is definitely a compelling argument to stick to that DSL, but I would much rather see the DSL done in such a way that you can depend on
> end
> end
> end
> end
Chef's whole premise is that Chef will converge on a system state; these tests look like precisely the kind of thing Chef is supposed to save users from writing. If the root user wasn't created, or MySQL didn't start, why didn't the converge fail?
Are there examples that would better illustrate audit mode's purpose?
Lamont has a good comment here that I will incorporate into the next revision.
@randomcamel Think of it as end-to-end testing. Chef installed a package, wrote an auth config, and started the service. The test is "can log in and run a query". This catches stuff like a new version of the software changing things or whatnot in an unexpected fashion.
@randomcamel The other use case is specifically when chef ISN'T configured yet. I agree with you - once you have a `mysql::install` recipe converging successfully you should no longer need an audit that checks `package("mysql")`.
But the code snippet above is useful for an organization trying to convert to Chef for the first time - they can ensure that all the systems are in a known state before adding the mysql recipe. What if one of the nodes (this is a bad example) has oracle installed instead of mysql? Audits would highlight that.
@tyler-ball I don't think I would call that out as the right use case, that is far far better served by plain serverspec.
A few things that this RFC makes me think:

@coderanger and others, what do you think about adding a marker like that to the RFC metadata, and making it clear a feature is experimental (its syntax subject to change, and maybe even deep implementation)?
@adamhjk I think that is a very solid indication something shouldn't be an RFC yet. Go implement something externally from Chef, learn the lessons of doing it wrong a few times, and then write an RFC for a design you think is workable. We don't need an RFC to agree something is a good idea or worthy of experimentation; we need them to have a way to discuss detailed plans.
I'm not sure I agree. One of the things I would like to see us do is move faster in the core. I'll noodle more.

I guess I agree for cases where not being in core would substantially complicate things, like I've been saying about adding an audit events framework as detailed here. I also think it is telling that that piece seems much less controversial :) Bundling a DSL extension as a cookbook is, comparatively speaking, very easy to do and would detract very little IMO. In the more general case, I worry that moving fast in core can easily put a lot of momentum behind design choices that are being made "because it seemed like a good idea". For example, rspec vs. minitest in this case. Adding something experimental to core could work, with sufficient documentation and output to indicate the immaturity, but then the experimental feature is tied to Chef's release cycle. Imagine if something like chef-metal had to wait for Chef releases to get feedback on new design changes.
@coderanger yeah, I also think we need to release every piece of software we have at least daily, and ship from those to the "stable" release channel.
@adamhjk I think people would be much more willing to run an experimental plugin for Chef than to deploy a big ball labeled "unstable" or "daily". The former much more clearly scopes what is unstable and what is not. Chef-metal is a great example of this at work.

(though I'm not against daily releases so people can try out bug fixes sooner)
IMO the fact that something is implemented inside the core or as a plugin doesn't matter too much, as long as it is shipped daily. I think the choice should be made on a case-by-case basis. For example, Chef Metal is easy to implement outside Chef because it doesn't change any of the core logic, but Policyfile-related features in Chef Client are tough to implement outside because they change the very core of it. As long as we keep the daily releases healthy and have a way to turn the feature on and off completely, I don't see a reason why we should take on the extra complexity of implementing it outside core.
@coderanger I'm not sure that consumers would be more likely to run experimental features as a plugin vs a feature-built omnibus install or gem install. We have a system for installing knife plugins, but not core chef feature plugins. Let's say I am an infrastructure maintainer and I have a way to install Chef. If they release a new plugin, I need to maintain one workflow for installing Chef and another workflow for installing/uninstalling/updating plugins in the chef install. If they just release an omnibus named

I've personally found plugin maintenance to be a lot of work (curse you, OSGI). And because Chef typically runs on a scheduled interval, it is easy to just re-install the entire package without disrupting the normal converge workflow. I would love for someone to show me how to make plugin management easy :) Ironically, Chef is the best tool I have found for doing this. I think the separation of these experimental features is more a developer concern. Personally, Git branches work for me.
I also agree - daily releases FTW

@tyler-ball In this context, plugins would just be cookbooks that register extensions of some kind. In this case it would just be adding new DSL methods. No process is needed per se, just the existing cookbook dependencies system and the tooling around that.
I think we can boil some of this down to two remaining concerns:

More things outstanding? @coderanger, @jonlives?
Exactly, @sersut. We need to make it safe to innovate in core, and let people opt in or out of those experiments.
I think that just about covers it. I'm cool with core moving fast and breaking things as long as non-core can come along for the ride in some way. If the internal APIs are based on some kind of registration system (a la dialects, which could provide this kind of pluggability), then we can have code in core that happens to register itself as an audit provider, but the same code can be called just as well from a gem or cookbook. The existing dialects code also has a concept of "quality" levels, so you can also flat-out override built-in providers if you don't like them or want to disable them. As long as those capabilities exist, I'm happy :)

Based on my experiences working on similar extension points, and a desire to eventually break Chef (the git repo) into smaller components, I think a totally sane way of doing this is to develop it in core, with an unstable marker of some sort (e.g., policyfile mode currently emits a warning saying the feature is experimental), and then spin some parts of it out to a gem that chef would depend on. Developing it in core lets you adjust the interfaces between classes and such without having to modify two projects until you reach a place where the interfaces are sane and stable, which is a good indication that it's probably usable by other people/projects.
If this can be implemented independently from Chef internals, I think an external gem makes more sense. Bearing in mind that there are independent projects that do something similar, it may be possible, but I cannot tell. Either way, if chef-client needs to expose a new API (or experimental code) for this to work properly, I totally agree with @danielsdeleo.
Here is the plan of record I'd like to propose to move this RFC forward:

We can chat more about this tomorrow in our IRC meeting and get to a consensus. Any thoughts? /cc: @adamhjk
Per the rfc meeting, we are going to merge this RFC and set its state to "accepted" but not "final". We need to add a section letting potential readers know it is in flux, but it is our intent to accept this feature. Paging @jonlives! 👍