[Discussion/Refactoring request] Split benchmark content from "infrastructure"/tooling code #5125

jalliot · 2020-01-22T14:15:21Z

Hello everyone!
I don't know if this is the right place to start such a discussion, but I guess it is the best way to get the most attention.

My colleagues and I have started using the ComplianceAsCode content and tooling for our company's internal needs, and we are very interested in reusing it for all our security guides (mostly SCAP-based), whatever the platform (e.g. Windows, macOS, Cisco, etc.). We have already tested the CIS WorkBench and talked with other companies, such as Siemens about their scapolite, but your tool chain is currently the best available option: the most mature and/or the most flexible.
We are not afraid of modifying your source code to adapt it to our specific requirements, but we still prefer to stay as close as possible to your master branch to ease upgrading between versions.

Even setting aside support for non-Linux content (which we perfectly understand is not your priority), one of the main issues we face is the tight coupling between the content (rules, remediations, templates...) and the code required to build the datastreams and guides (makefiles, Python...).
In our environment, because the packages for the various platforms are completely independent and different teams work on each one, we decided not to put everything in a single git repo. Instead, we split by "technology" and/or implementing team, and extracted everything that is not content-related (all scripts) into a separate repository. For now, our build pipeline "merges" the compilation tools and the content, but we haven't spent much time on it and it is just a big hack with many unwanted artefacts...
And since your release changelogs usually mostly refer to changes in content and not the tool chain, it is difficult to identify breaking changes.

After peeking at the build scripts, it did not seem like an impossible task to do this split and make the scripts as content- and product-agnostic as possible. Here is a non-exhaustive list of items I identified (I probably forgot some while writing) that would be prerequisites or recommendations for this split:

Any opinion/feedback would be really appreciated (and I don't expect you to put a priority on such a topic obviously).


jan-cerny · 2020-01-23T08:34:02Z

Hi @jalliot,

Thanks for starting this discussion. I like your proposal very much. Build system improvements are one of our long-term efforts, because improving the build system makes the content creation faster and easier. We are happy to accept any pull requests that improve it.

I agree with you that the content and build system are tied together too much and we should separate them a little. But, I think that creating 2 separate git repositories isn't the right thing to do at this moment. The Linux distributions would have to change their packaging, testing and automation. Also, it could be a little more difficult for newcomers and the documentation would need to be reformulated as well.

I think that at this moment, we should work in the existing repository to make it easier to add a new product, increase flexibility and re-usability of the build tools and reduce the dependency between tools and content. Also I like the idea about supporting other languages.

I like almost all the proposals in your list.

Only regarding the "bash_remediation_functions", I would prefer not to improve it but to remove it and replace it with Jinja and/or templates. The reason is that the "bash_remediation_functions" mechanism depends on XSLTs and multiple ugly hacks in the Python scripts, with a lot of technical debt. For example, to use a bash function in a bash remediation, you need to source a non-existent file, /usr/share/scap-security-guide/remediation_functions, which causes the build tools to insert the function there.
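To make the mechanism concrete, here is a minimal Python sketch of what such a build step does: the marker `source` line is replaced by the definitions of the library functions that the script actually calls. The function name `inline_remediation_functions` and the in-memory function library are hypothetical; the real implementation goes through XSLTs and Python scripts and differs in detail.

```python
import re

# Marker line that triggers inlining; the path comes from the discussion above.
MARKER = "source /usr/share/scap-security-guide/remediation_functions"


def inline_remediation_functions(script: str, library: dict[str, str]) -> str:
    """Hypothetical helper: replace the marker line with the definitions of
    the library functions that the rest of the script references."""
    lines = script.splitlines()
    if MARKER not in (line.strip() for line in lines):
        return script  # nothing to do without the marker
    body_text = "\n".join(line for line in lines if line.strip() != MARKER)
    # Keep only functions whose names appear as whole words in the script body.
    used = [name for name in library
            if re.search(rf"\b{re.escape(name)}\b", body_text)]
    out = []
    for line in lines:
        if line.strip() == MARKER:
            out.extend(library[name] for name in used)
        else:
            out.append(line)
    return "\n".join(out)
```

With Jinja, the remediation would instead call a macro explicitly where it is needed, so neither the fake `source` line nor this post-processing pass would be required.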

yuumasato · 2020-01-23T13:10:33Z

Hello @jalliot, thank you for starting the discussion.

The idea of separating the build system from the content is definitely interesting, but I'm not sure it should go as far as using different repos. As you described, the use case for the split is being able to use a different set of rules and groups and to add new products easily.

> And since your release changelogs usually mostly refer to changes in content and not the tool chain, it is difficult to identify breaking changes.

If there is interest, changelogs for the tool chain can also be generated and published.
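One way such a tool-chain changelog could be produced is to classify each change by the paths it touches (for example from `git log --name-only`). The following is a minimal sketch; the directory prefixes in `TOOLING_PREFIXES` and the `split_changelog` helper are assumptions about the repository layout, not an existing script.

```python
from collections import defaultdict

# Assumed prefixes of build/tooling code; adjust to the real repository layout.
TOOLING_PREFIXES = ("ssg/", "build-scripts/", "utils/", "cmake/", "CMakeLists.txt")


def split_changelog(changes):
    """Hypothetical helper: group (path, message) pairs into 'tooling' and
    'content' sections, keeping first-seen order and dropping duplicates."""
    sections = defaultdict(list)
    for path, message in changes:
        kind = "tooling" if path.startswith(TOOLING_PREFIXES) else "content"
        if message not in sections[kind]:
            sections[kind].append(message)
    return dict(sections)
```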

All your suggestions seem good.
Below are possible caveats for a few of the items.

> * Remove all hardcoded mentions of individual products in the build chain (`CMakeLists.txt`, `build_product`, etc.) or at least make `build_product` capable of compiling an unknown product by simply providing the `product.yml` location.

I believe the mentions of specific products are there to build the content in a slightly different way, as in rhel7, for example, to add a second benchmark.
So I'm afraid this may require more flexibility than CMake allows (or than I'm currently aware of), like some kind of "hook" where the product could customize the way its content is built.
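A minimal sketch of what such a hook could look like, assuming the build is driven from Python: a generic build runs for any product described by its `product.yml`, and products register extra steps only when they need them. The registry, the decorator and `build_generic_product` are hypothetical names, and the parsed `product.yml` is stood in for by a plain dict to stay within the standard library.

```python
from typing import Callable

Step = Callable[[dict], None]

# Hypothetical registry: product ID -> extra steps run after the generic build.
PRODUCT_HOOKS: dict[str, list[Step]] = {}


def register_hook(product_id: str):
    """Decorator registering a product-specific build step."""
    def decorator(step: Step) -> Step:
        PRODUCT_HOOKS.setdefault(product_id, []).append(step)
        return step
    return decorator


def build_generic_product(product: dict) -> list[str]:
    """Run the product-agnostic steps, then any hooks the product registered.
    `product` stands in for the parsed product.yml; returns a build log."""
    product["log"] = [f"generic build of {product['product']}"]
    for step in PRODUCT_HOOKS.get(product["product"], []):
        step(product)
    return product["log"]


@register_hook("rhel7")
def add_second_benchmark(product: dict) -> None:
    # Placeholder for the rhel7-specific extra benchmark mentioned above.
    product["log"].append("add second benchmark")
```

An unknown product then builds with no changes to the build chain, and only products with special needs carry extra code.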

I agree with Jan Černy on the bash_remediation_functions.

> * [ ]  Extract the hardcoded groups/rules ordering/priority logic from `build_yaml.py` (2 quick ideas for doing that: add some optional priority key in group and rule YAMLs, or use group/rule ID alphabetical order since the ID is a technical detail and we could easily add some priority prefix such as `01_` to each priority directory)

The ID of the rule is important when tailoring a profile. So, while the directory of the rule can have the prefix for priority, in my opinion this shouldn't go into the built content.
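A minimal sketch of that compromise: directories carry an optional numeric prefix such as `01_` that controls ordering, and the prefix is stripped so the ID in the built content stays stable for tailoring. The `order_and_strip` helper is hypothetical, and placing unprefixed directories last is an assumed policy.

```python
import re

# Optional ordering prefix on directory names, e.g. "01_system".
PREFIX = re.compile(r"^(\d+)_")


def order_and_strip(dir_names):
    """Hypothetical helper: sort by numeric prefix (numerically, so 10 follows 02),
    put unprefixed names last in alphabetical order, and return prefix-free IDs."""
    def sort_key(name):
        match = PREFIX.match(name)
        return (0, int(match.group(1)), name) if match else (1, 0, name)

    return [PREFIX.sub("", name, count=1) for name in sorted(dir_names, key=sort_key)]
```

For example, `order_and_strip(["services", "02_network", "01_system", "10_audit"])` gives `["system", "network", "audit", "services"]`.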

> * [ ]  Review the `_get_implied_properties` mechanism?

Should it be made explicit in product.yml?
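If the derivation is kept, one option is to treat it strictly as a fallback: anything declared in `product.yml` wins, and implied values only fill the gaps. The sketch below assumes a property `pkg_manager_config_file` derived from `pkg_manager`; these names and the mapping are illustrative assumptions, not necessarily what `_get_implied_properties` computes.

```python
# Hypothetical mapping used to derive a property when product.yml omits it.
PKG_MANAGER_CONFIG = {
    "yum": "/etc/yum.conf",
    "dnf": "/etc/dnf/dnf.conf",
}


def resolve_properties(declared: dict) -> dict:
    """Return the declared properties plus implied ones; explicit values always win."""
    result = dict(declared)
    manager = declared.get("pkg_manager")
    if manager in PKG_MANAGER_CONFIG:
        # setdefault keeps any value already declared in product.yml.
        result.setdefault("pkg_manager_config_file", PKG_MANAGER_CONFIG[manager])
    return result
```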

matejak · 2020-02-11T12:16:24Z

Hello @jalliot, great to see how deeply you went into the build system. As others have already pointed out, the build code has issues that need to be resolved before decoupling can even be considered. As those issues are numerous, decoupling is totally out of sight for now.
However, bringing the code to a state where decoupling is possible is definitely part of the project's vision. Therefore, we encourage you to start discussions on our public mailing list, and rest assured that we welcome, and look forward to reviewing, pull requests that move the code towards that goal.

marcusburghardt added the Infrastructure Our content build system label Sep 7, 2022
