Audit mode proposal #69

mcquin · 2014-11-13T21:23:02Z

No description provided.

sethvargo · 2014-11-13T22:48:42Z

I'm a little confused as to how this is different from ChefSpec?

mcquin · 2014-11-13T23:43:17Z

@sethvargo I'm not super familiar with ChefSpec, but from the README I can say that audit mode is an additional phase to Chef Client and can run with or without converging the node. It looks like ChefSpec is intended for local testing using Chef Solo, whereas audit mode is intended to be used in testing/preprod and prod.

lamont-granquist · 2014-11-13T23:51:19Z

The closest project to this out there is minitest-chef-handler -- it runs, "in production" on the node actually being converged, it isn't development testing like chefspec or test-kitchen. Used either to have a set of audit cookbooks to inspect a system (similar to why-run but able to report things like "port 80 is/is-not open" or "host www.google.com correctly resolves" in policy language).

tyler-ball · 2014-11-14T00:10:50Z

new/audit-mode.md

+# Audit Mode for Chef Client
+
+Add an auditing feature to core Chef which will allow users to develop a test
+drivin infrastructure, as well as evaluate and monitor its state, without the


tyler-ball · 2014-11-14T01:19:25Z

I'm a little ways through this, but I wanted to clarify something up front: this RFC is not for the same auditors that we have talked about before, correct? It seems like we're free to use the words spec, test and make comparisons to regular RSpec/Serverspec code, right?

tyler-ball · 2014-11-14T01:21:52Z

new/audit-mode.md

+### Chef client runner
+An audit phase will be performed after client converges. By default, audit mode
+will be enabled. Only audits included in recipes in the expanded run list will
+be evaluated on the node.


As soon as we complete our include_recipe card I think we should update this to explain how we're going to deal with controls blocks that are included multiple times.

I don't think that we will need to call this out explicitly. We will include a recipe only once with the existing logic in include_recipe.

sersut · 2014-11-14T17:04:25Z

Other than the minor tweaks this looks sweet to me @mcquin. 👍

danielsdeleo · 2014-11-14T19:49:52Z

new/audit-mode.md

+Our goal has been to simplify the directory structure of cookbooks. Writing
+audits in recipes means users don't need to add an additional directory to their
+cookbooks, and more tightly couples testing and development which helps users
+develop good test driven infrastructure practices.


I'm okay with this as a pragmatic way to get things done in the short term, but I think our long-term goal should be to allow tests/controls to be written in other files, as soon as the server APIs support it. I mean, if writing tests inline was always better, why don't we do that in Chef itself?

coderanger · 2014-11-14T19:53:01Z

I feel like I'm missing some context on this whole idea. Why do we want to put test code in recipes? At a minimum this seems like it should be a thing that isn't in recipe code, but I'm guessing there has been some internal discussion of this feature previously.

danielsdeleo · 2014-11-14T19:55:18Z

@coderanger my interpretation: right now the "segments" stuff in the Server APIs kinda screws us from including files in different directories, and we want to ship this without having fixing that first.

coderanger · 2014-11-14T20:55:01Z

Given the impact adding stuff to the recipe DSL will have on compat (read: we'll likely never be able to change this) I don't think I like that as enough reason to not do the work to create a new segment.

danielsdeleo · 2014-11-14T21:05:51Z

My opinion is that we should support both, as both styles are common enough in other software development contexts.

coderanger · 2014-11-14T21:12:59Z

In the abstract I'm not against having test-ish code in recipes, but I think we would need to figure out a better way to adapt RSpec to work more like the existing resource format. Having two super-different DSLs in one file seems like a fast way to confuse people, especially if they don't already know RSpec (most ops folks).

sethvargo · 2014-11-14T21:13:17Z

both styles are common enough in other software development contexts.

Can you provide an example? I'm having a hard time finding another paradigm where the tests are inside the code instead of a separate file...

danielsdeleo · 2014-11-14T21:21:41Z

http://doc.rust-lang.org/guide.html#writing-tests is one, I've also seen it in ruby where you wrap the test suite in something like if $0 == __FILE__ as an introduction to unit testing (so that the directory structure of the test framework can be taught later).

danielsdeleo · 2014-11-14T21:23:32Z

It's also common with eunit http://www.erlang.org/doc/apps/eunit/chapter.html

coderanger · 2014-11-14T21:44:33Z

I don't think you are correct that Rust does this, from my reading of the linked doc it seems to put them under tests/. The inline format does happen in Python, but it is generally discouraged and used only for "toy"-ish programs. Similarly I've never seen a gem that used that trick for its real tests.

sethvargo · 2014-11-14T21:50:28Z

I agree that it could be useful for "learning", but after hearing more, my vote is 👎

coderanger · 2014-11-14T21:59:31Z

As an example, I would want a minimum to be enclosing the test code in a resource to itself so it is clearer when it will execute and that the DSL is only available inside that block. I would also want to see that implemented outside of core before we talk about adding it since that part wouldn't need any core mods that I can think of. Adding the audit events system seems like the most interesting part of this since it will be useful in a broad range of final forms and must be in core to make any sense.

ranjib · 2014-11-14T22:17:30Z

@mcquin @tyler-ball can you provide the recipe example as well?

Can you elaborate on why modeling audits on top of a testing framework? How this will interact with lwrp's or other dynamically defined resources , in why_run mode, with nesting etc.
It will be nice to test this as a module or add on gem(similar to metal/zero) first, as noah mentioned,

Currently we have several other situation where a multi-phase chef run might be useful, and if we at all take this route, it will better have the multi phase/stage support to core chef before implementing features using it, and then provide a middle ware & DI mechanism on top of it, and then building audit or other functionality, but also allow multiple implementations to pick and choose from.

danielsdeleo · 2014-11-14T22:33:42Z

I think adding more phases to chef runs is an interesting idea, but blocking this on it is scope creep IMO.

Also, the controls/tests aren't a resource, so making them look like one isn't going to make anything less confusing. We already have compile/converge, where resources represent actions that get added to a queue; this is the same concept, but the queue is different.

sersut · 2014-11-14T22:48:03Z

I think there are a few things going on in parallel in our discussion:

I think a built-in way of running custom written rules is useful for smoke testing your deployments and opens the door for some auditing scenarios. Having arbitrary stages is definitely a cool idea but has more ambitious goals than the proposal here and this proposal doesn't in any way block to have more stages in the future.
Distributing these custom written rules via recipes enables us to leverage many things like having access to run information, versioning, being run on demand via run_list etc. Having a separate segment for them is definitely an option but I think having segments impose unnecessary complexity for the cookbook structure and we should have less not more.
why-run mode is a great point @ranjib. I think we shouldn't run these rules in why-run mode. I don't see how dynamically defined resources or LWRPs are relevant. controls is a keyword for recipe DSL. The idea is to collect all the controls during compile phase and run them after the converge phase. We don't need the ability to write rules from within resources 😄.
We should note that the tests mentioned here are not tests for your resources or cookbooks. These tests should exist to test the infrastructure. IMO if I write a package "mysql" resource I wouldn't need to write an audit rule to check if mysql is installed. But I should definitely write an audit rule that ensures I can connect to your DB.

coderanger · 2014-11-19T22:45:03Z

@tyler-ball if a service fails to start, your service resource should error and fail the converge.

sersut · 2014-11-20T01:48:15Z

I won't be able to write an ohai plugin that can check if the database is connectable since ohai runs before converge and since we don't have a built-in way of distributing ohai plugin (we really should but that's a different topic).

I agree with @tyler-ball that even if the template validation works, it doesn't mean that my apache / nginx is running because maybe it doesn't have access to log location, or the service template is not correct, etc...

I agree with you @coderanger. One shouldn't need to write a rule for all Chef resources they add. But all of our resources doesn't cover all the things one might want to check / set in an idempotent way.

lamont-granquist · 2014-11-20T02:21:08Z

@coderanger this is more useful to assert things like "nothing is listening on port 111". And that doesn't map well to a service resource because it may be named "portmap" or "rpcbind" based on which distro I'm using and when they decided to rename that service, and it assumes I get the service provider correct and there's other things that can go wrong with that (it could be running, but the init script could be removed which would defeat any service resource :stop action). As a SOX/PCI-DSS auditor or sufferer what I want to be able to say is "nothing is listening on port 111" and "nothing is listening on port 23". Serverspec does that well. The Chef DSL does not do that well, and neither does minitest-chef-handler (because that rides on the back of Chef's resources and doesn't have its own matchers like serverspec does).

lamont-granquist · 2014-11-20T02:24:50Z

in chef, the cross platform code to shut down port 111:

portmap_service =
  case node['platform_family']
  when "debian"
    case node['platform']
    when "ubuntu"
      node['platform_version'].to_f >= 13.10 ? "rpcbind" : "portmap"
    when "debian", "raspbian"
      node['platform_version'].to_f >= 7.0 ? "rpcbind" : "portmap"
    else
      "rpcbind"
    end
  when "rhel"
    node['platform_version'].to_f >= 6.0 ? "rpcbind" : "portmap"
  else
    "rpcbind"
  end

service portmap_service do
  action :stop
end

in serverspec the cross-platform code to assert that the previous code did the right thing:

describe port(111) do
  expect(it).to_not be_listening
end

coderanger · 2014-11-20T05:04:27Z

I understand the point, and agree it is a useful thing to have. Trying to make sure this is done in a way that allows enough flexibility in development to find the best syntax and whatnot since RSpec is a complicated beast. Serverspec is definitely a compelling argument to stick to that DSL, but I would much rather see the DSL done in such a way that you can depend on audit-rspec in your metadata and then it JFWs (famous last words).

randomcamel · 2014-11-20T19:55:03Z

new/audit-mode.md

+      end
+    end
+  end
+end


Chef's whole premise is that Chef will converge on a system state; these tests look like precisely the kind of thing Chef is supposed to save users from writing. If the root user wasn't created, or MySQL didn't start, why didn't the converge fail?

Are there examples that would better illustrate audit mode's purpose?

Lamont has a good comment here that I will incorporate into the next revision.

@randomcamel Think of it as end-to-end testing. Chef installed a package, wrote an auth config, and started the service. The test is "can log in and run a query". This catches stuff like a new version of the software changing things or whatnot in an unexpected fashion.

@randomcamel The other use case is specifically when chef ISN'T configured yet. I agree with you - once you have a mysql::install recipe converging successfully you should no longer need an audit that checks package("mysql").

But the code snippet above is useful for an organization trying to convert to Chef for the first time - they can ensure that all the systems are in a known state before adding the mysql recipe. What if one of the nodes (this is a bad example) has oracle installed instead of mysql? Audits would highlight that.

@tyler-ball I don't think I would call that out as the right use case, that is far far better served by plain serverspec.

adamhjk · 2014-11-25T04:14:36Z

A few things that this RFC makes me think:

We need a way to add segments elegantly, because I (like @sethvargo and others) really don't like the inline style.
We need a way to mark an RFC, and its implementation, as stable/unstable/experimental. For example, I think we should move ahead with this RFC, but I think we need to mark it as experimental - because we need more experience to really understand what a "final" style really is.

@coderanger and others, what do you think about adding a marker like that to the RFC metadata, and making it clear a feature is experimental (its syntax subject to change, and maybe even deep implementation)?

coderanger · 2014-11-25T04:24:42Z

@adamhjk I think that is a very solid indication something shouldn't be an RFC yet. Go implement something externally from Chef, learn the lessons of doing it wrong a few times, and then write an RFC for a design you think is workable. We don't need an RFC to agree something is a good idea or worth of experimentation, we need them to have a way to discuss detailed plans.

adamhjk · 2014-11-25T04:45:05Z

I'm not sure I agree. One of the things I would like to see us do is move faster in the core. I'll noodle more.

coderanger · 2014-11-25T04:55:43Z

I guess I agree for cases where not being in core would substantially complicate things, like I've been saying about adding an audit events framework as detailed here. I also think it is telling that that piece seems much less controversial :) Bundling a DSL extension as a cookbook is, comparatively speaking, very easy to do and would detract very little IMO.

In the more general case, I worry that moving fast in core can easily put a lot of momentum behind design choices that are being made "because it seemed like a good idea". For example, rspec vs. minitest in this case. Adding something experimental to core could work, with sufficient documentation and output to indicate the immaturity, but then the experimental feature is tied to Chef's release cycle. Imagine if something like chef-metal had to wait for Chef releases to get feedback on new design changes.

adamhjk · 2014-11-25T05:24:09Z

@coderanger yeah, I also think we need to to release every piece of software we have at least daily, and ship from those to the "stable" release channel.

coderanger · 2014-11-25T05:40:01Z

@adamhjk I think people would be much more willing to run an experimental plugin for Chef than to deploy a big ball labeled "unstable" or "daily". The former has much more scope to what is unstable and what is not. Chef-metal is a great example of this at work.

coderanger · 2014-11-25T05:40:34Z

(though I'm not against daily releases so people can try out bug fixes sooner)

sersut · 2014-11-25T06:06:13Z

IMO the fact that something is implemented inside the core or as a plugin doesn't matter too much as long as it is shipped daily. I think the choice should be on a case by case basis. For example Chef Metal is easy to implement outside Chef because it doesn't change any of the core logic but Policyfile related features in Chef Client is tough to implement outside because it changes the very core of it.

As long as we keep the daily releases healthy and have a way to turn on - off the feature completely I don't see a reason why we should take the extra complexity to implement it outside core.

tyler-ball · 2014-11-25T16:35:35Z

@coderanger I'm not sure that consumers would be more likely to run experimental features as a plugin vs a feature-built omnibus install or gem install. We have a system for installing knife plugins, but not core chef feature plugins.

Lets say I am an infrastructure maintainer and I have a way to install Chef. If they release a new plugin, I need to maintain one workflow for installing Chef and another workflow for installing/uninstalling/updating plugins in the chef install.

If they just release an omnibus named chef-12-experimental-audit then I can use my normal deploy process to deploy that omnibus rather than my regular chef-12 deploy.

I've personally found plugin maintenance to be a lot of work (curse you, OSGI). And because Chef typically runs on a scheduled interval it is easy to just re-install the entire package without disrupting normal converge workflow. I would love for someone to show me how to make plugin management easy :) Ironically, Chef is the best tool I have found for doing this.

I think the separation of these experimental features seems more a developer concern. Personally, Git branches work for me.

tyler-ball · 2014-11-25T16:37:44Z

I also agree - daily releases FTW

coderanger · 2014-11-25T16:51:04Z

@tyler-ball In this context, plugins would just be cookbooks that register extensions of some kind. In this case it would just be adding new DSL methods. No process is needed per se, just the existing cookbook dependencies system and the tooling around that.

adamhjk · 2014-11-25T17:57:17Z

I think we can boil some of this down to two remaining concerns:

We want to ensure that, to some extent, the specifics of the DSL and location of controls are pluggable. We all recognize this means there will be changes to chef core. Please make sure that either the first implementaiton, or shortly thereafter, lets us engage in plugabillity for syntax.
We need to be specific about marking the feature as actively unstable within the current release cycle. People who use it should know that it's under active development, at the very least when they hit the documentation. I want us to be able to break semver for this feature.

More things outstanding? @coderanger, @jonlives?

adamhjk · 2014-11-26T02:51:53Z

Exactly, @sersut. We need to move to making it safe to innovate in core, and let people opt-in or out to those experiments.

coderanger · 2014-11-26T06:18:30Z

I think that just about covers it. I'm cool with core moving fast and breaking things as long as non-core can come along for the ride in some way. If the internal APIs are based on some kind of registration system (a la dialects, which could provide this kind of plugability) then we can have code in core that happens to register itself as an audit provider but the same code can be called just as well from a gem or cookbook. The existing dialects code also has a concept of "quality" levels so you can also flat out override built-in providers if you don't like them or want to disable them. As long as those capabilities exist, I'm happy :)

danielsdeleo · 2014-11-26T18:18:26Z

Based on my experiences working on similar extension points, and a desire to eventually break Chef (the git repo) into smaller components, I think a totally sane way of doing this is to develop it in core, with an unstable marker of some sort (e.g., policyfile mode currently emits a warning saying the feature is experimental), and then spin some parts of it out to a gem that chef would depend on. Developing it in core lets you adjust the interfaces between classes and such without having to modify two projects until you reach a place where the interfaces are sane and stable, which is a good indication that it's probably usable by other people/projects.

zuazo · 2014-11-26T20:03:50Z

If this can be implemented independently from Chef internals, I think an external gem makes more sense. Bearing in mind that there are independent projects that do something similar, it may be possible, but cannot tell. Either way, if chef-client needs to expose a new API (or experimental code) for this to work properly, I totally agree with @danielsdeleo.

sersut · 2014-12-10T16:30:37Z

Here is the plan of record I'd like to propose to move this RC forward:

Merge audit-mode branch into master by disabling audit mode by default and adding a warn message saying that "This feature is under experimental and under heavy development. Expect APIs for it to change and use it at your own risk."
Once we have an idea of how the feature looks, update the RFC with some requirements to choose a different audit dsl / language.
Merge the RFC.
Implement the pluggability in master.

We can chat more about this tomorrow in our IRC meeting and get to a consensus. Any thoughts?

/cc: @adamhjk

adamhjk · 2014-12-11T17:13:19Z

Per the rfc meeting, we are going to merge this RFC, and put its state to "accepted" but not "final". We need to add a section letting potential readers know it is in flux, but it is our intent to accept this feature. Paging @jonlives! 👍

Audit mode proposal

Audit mode RFC, first draft.

3b8883d

tyler-ball reviewed Nov 14, 2014
View reviewed changes

danielsdeleo reviewed Nov 14, 2014
View reviewed changes

randomcamel reviewed Nov 20, 2014
View reviewed changes

Update for clarity, based on feedback.

cb1125b

Add copyright

1ef2a91

adamleff added a commit that referenced this pull request Dec 11, 2014

Merge pull request #69 from opscode/mcquin/audit-mode

bfec256

Audit mode proposal

adamleff merged commit bfec256 into master Dec 11, 2014

adamleff deleted the mcquin/audit-mode branch December 11, 2014 18:13

tyler-ball mentioned this pull request Dec 18, 2014

Audit mode chef/chef#2674

Merged

Audit mode proposal #69

Audit mode proposal #69

Conversation

mcquin commented Nov 13, 2014

sethvargo commented Nov 13, 2014

mcquin commented Nov 13, 2014

lamont-granquist commented Nov 13, 2014

Choose a reason for hiding this comment

tyler-ball commented Nov 14, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sersut commented Nov 14, 2014

Choose a reason for hiding this comment

coderanger commented Nov 14, 2014

danielsdeleo commented Nov 14, 2014

coderanger commented Nov 14, 2014

danielsdeleo commented Nov 14, 2014

coderanger commented Nov 14, 2014

sethvargo commented Nov 14, 2014

danielsdeleo commented Nov 14, 2014

danielsdeleo commented Nov 14, 2014

coderanger commented Nov 14, 2014

sethvargo commented Nov 14, 2014

coderanger commented Nov 14, 2014

ranjib commented Nov 14, 2014

danielsdeleo commented Nov 14, 2014

sersut commented Nov 14, 2014

coderanger commented Nov 19, 2014

sersut commented Nov 20, 2014

lamont-granquist commented Nov 20, 2014

lamont-granquist commented Nov 20, 2014

coderanger commented Nov 20, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamhjk commented Nov 25, 2014

coderanger commented Nov 25, 2014

adamhjk commented Nov 25, 2014

coderanger commented Nov 25, 2014

adamhjk commented Nov 25, 2014

coderanger commented Nov 25, 2014

coderanger commented Nov 25, 2014

sersut commented Nov 25, 2014

tyler-ball commented Nov 25, 2014

tyler-ball commented Nov 25, 2014

coderanger commented Nov 25, 2014

adamhjk commented Nov 25, 2014

adamhjk commented Nov 26, 2014

coderanger commented Nov 26, 2014

danielsdeleo commented Nov 26, 2014

zuazo commented Nov 26, 2014

sersut commented Dec 10, 2014

adamhjk commented Dec 11, 2014