[Discussion/Refactoring request] Split benchmark content from "infrastructure"/tooling code #5125

Open
11 tasks
jalliot opened this issue Jan 22, 2020 · 3 comments
Labels
Infrastructure Our content build system

Comments

@jalliot

jalliot commented Jan 22, 2020

Hello everyone!
I don't know if this is the best place to start such a discussion but I guess this is the best way to gather the most attention.

My colleagues and I have started using the ComplianceAsCode content and tooling for our company's internal needs, and we are very interested in reusing it for all our security guides (mostly SCAP-based), whatever the platform (e.g. Windows, macOS, Cisco, etc.). We have already tested the CIS workbench and talked with other companies such as Siemens about their scapolite, but your tool chain is at the moment the best available and/or more mature and/or more flexible.
We are not afraid of modifying your source code to adapt to our specific requirements but we still prefer to stick as much as possible to your master to ease the upgrade path between versions.

Leaving aside any support for non-Linux content (which we perfectly understand is not your priority), one of the main issues we are facing is the tight coupling between the content (rules, remediations, templates...) and the code required to build the datastreams and guides (makefiles, Python...).
In our environment and because of the completely independent packages for various platforms and different teams working on each one, we have decided not to put everything in a single git repo but instead to split by "technology" and/or implementing team + extract everything which is not content-related (all scripts) into a different repository. Compilation tools + content are for now "merged" by our build pipeline but we haven't spent a lot of time on it and it is just a big hack with many unwanted artefacts...
And since your release changelogs usually mostly refer to changes in content and not the tool chain, it is difficult to identify breaking changes.

After peeking at the build scripts, it didn't seem like an impossible task to do this split and make the scripts as content/product-agnostic as possible. Here is a non-exhaustive list of items I managed to identify (I probably forgot some while writing) that would be prerequisites or recommendations for this split:

  • (recommended but we could live without it) a stable "data model/structure" (YAMLs, directory structure, etc.) for which the compilation tool chain ensures backward compatibility. That would allow upgrading the tooling without having to update the entire content codebase. Of course, this does not prevent implementing new content features (e.g. the templating system some time ago) that would require a minimum version of the toolchain.
  • Remove all hardcoded mentions of individual products in the build chain (CMakeLists.txt, build_product, etc.) or at least make build_product capable of compiling an unknown product by simply providing the product.yml location.
  • Move products' references from constants.py (e.g. CPE mapping, etc.) into the product.yml or a product specific script that could live in the product's directory
  • Make the shared directory configurable/optional (similar to the additional_content_directories key in product.yml). For some products, it may be useless to share anything and then all shared sub-folders/files could be moved into the product's own directory.
  • Make the bash_remediation_functions system more generic to allow the same type of processing for other kinds of remediation languages (e.g. PowerShell for Windows). Note that I am not asking you to specifically add support for other languages, just to refactor the current bash specificities.
  • Have some global Jinja macros kept in the "toolchain repo" while allowing each product to define their own set of custom macros in their own repo
  • Extract the hardcoded groups/rules ordering/priority logic from build_yaml.py (2 quick ideas for doing that: add some optional priority key in group and rule YAMLs, or use group/rule ID alphabetical order since the ID is a technical detail and we could easily add some priority prefix such as 01_ to each priority directory)
  • I don't remember exactly where, but in one Python file I saw a hardcoded reference to linux_os (vs. applications). This should be made generic as well.
  • Make all the STIG overlay content/references and other similar profile-specific references optional (compile if present, ignore otherwise).
  • Review the _get_implied_properties mechanism?
  • Make the templating system slightly more flexible/reusable: make it language-agnostic and maybe find a way to remove the needed callback function in templates.py (or allow custom templates per product at least)
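To make the first ordering idea from the list above concrete, here is a minimal sketch of sorting groups or rules by an optional priority key. Everything in it is an assumption for illustration: the `Item` class, the `priority` field and `DEFAULT_PRIORITY` are hypothetical names and do not reflect how `build_yaml.py` currently works.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical default for items that do not declare a priority.
DEFAULT_PRIORITY = 1000


@dataclass
class Item:
    """Stand-in for a group or rule loaded from its YAML file."""
    id: str
    priority: Optional[int] = None  # hypothetical optional key in the YAML


def order_items(items: List[Item]) -> List[Item]:
    """Sort by declared priority first, then alphabetically by ID."""
    def sort_key(item: Item):
        prio = DEFAULT_PRIORITY if item.priority is None else item.priority
        return (prio, item.id)
    return sorted(items, key=sort_key)


groups = [
    Item("services"),
    Item("accounts", priority=20),
    Item("software", priority=10),
    Item("auditing"),
]
ordered_ids = [g.id for g in order_items(groups)]
print(ordered_ids)
```

Items without a priority fall back to alphabetical order after the prioritized ones, so existing content would not need to be touched to keep building.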

Any opinion/feedback would be really appreciated (and I don't expect you to put a priority on such a topic obviously).

@jan-cerny
Collaborator

Hi @jalliot,

Thanks for starting this discussion. I like your proposal very much. Build system improvements are one of our long-term efforts, because improving the build system makes the content creation faster and easier. We are happy to accept any pull requests that improve it.

I agree with you that the content and build system are tied together too much and we should separate them a little. But, I think that creating 2 separate git repositories isn't the right thing to do at this moment. The Linux distributions would have to change their packaging, testing and automation. Also, it could be a little more difficult for newcomers and the documentation would need to be reformulated as well.

I think that at this moment, we should work in the existing repository to make it easier to add a new product, increase flexibility and re-usability of the build tools and reduce the dependency between tools and content. Also I like the idea about supporting other languages.

I like almost all the proposals in your list.

Only regarding the "bash_remediation_functions", I would prefer not to improve it, but to remove it and replace it with Jinja and/or templates. The reason is that the "bash_remediation_functions" mechanism depends on XSLTs and multiple ugly hacks in the Python scripts, with a lot of technical debt. For example, to use a bash function in a bash remediation you need to source a non-existent file, /usr/share/scap-security-guide/remediation_functions, which causes the build tools to insert the function there.

@yuumasato
Member

Hello @jalliot, thank you for starting the discussion.

The idea of separating the build system from the content is definitely interesting, but I'm not sure if it should go as far as using different repos. And the use case for the split is, as you described, to be able to use a different set of rules and groups and to easily add new products.

> And since your release changelogs usually mostly refer to changes in content and not the tool chain, it is difficult to identify breaking changes.

If there is interest, changelogs for the tool chain can also be generated and published.

All your suggestions seem good.
I put below possible caveats for a few of the items.

> * Remove all hardcoded mentions of individual products in the build chain (`CMakeLists.txt`, `build_product`, etc.) or at least make `build_product` capable of compiling an unknown product by simply providing the `product.yml` location.

I believe the mentions of specific products are there to build the content in a slightly different way, for example to add a second benchmark in rhel7.
So I'm afraid this may require more flexibility than CMake allows (or than I'm currently aware of). Something like a "hook" where the product could customize the way content is built.
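As a rough sketch of what such a hook could look like on the Python side of the build (not in CMake itself): the build looks for an optional file in the product's directory and lets it adjust a build plan. The file name `build_hook.py`, the function name `customize_build` and the shape of the plan dictionary are all hypothetical; nothing like this exists in the project today.

```python
import importlib.util
from pathlib import Path
from typing import Callable, Optional

HOOK_FILENAME = "build_hook.py"    # hypothetical file in the product directory
HOOK_FUNCTION = "customize_build"  # hypothetical entry point in that file


def load_product_hook(product_dir: Path) -> Optional[Callable[[dict], dict]]:
    """Return the product's hook function, or None if it defines none."""
    hook_path = product_dir / HOOK_FILENAME
    if not hook_path.is_file():
        return None
    spec = importlib.util.spec_from_file_location(
        product_dir.name + "_build_hook", str(hook_path)
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, HOOK_FUNCTION, None)


def build_plan(product_dir: Path) -> dict:
    """Start from a generic plan and let the product customize it."""
    plan = {"product": product_dir.name, "benchmarks": ["default"]}
    hook = load_product_hook(product_dir)
    return hook(plan) if hook is not None else plan
```

A product that needs a second benchmark would then ship its own `build_hook.py` that appends to `plan["benchmarks"]`, while products without the file are built the generic way.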

I agree with Jan Černy on the bash_remediation_functions.

> * Extract the hardcoded groups/rules ordering/priority logic from `build_yaml.py` (2 quick ideas for doing that: add some optional priority key in group and rule YAMLs, or use group/rule ID alphabetical order since the ID is a technical detail and we could easily add some priority prefix such as `01_` to each priority directory)

The ID of the rule is important when tailoring a profile. So, while the directory of the rule can have the prefix for priority, in my opinion this shouldn't go into the built content.
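One way to keep the prefix out of the built content is to split it off when the directory name is read, so that only the bare ID is emitted. The following is a minimal sketch under that assumption; `split_priority_prefix` is a hypothetical helper, not an existing function in the build scripts.

```python
import re
from typing import Optional, Tuple

# Matches a numeric prefix such as "01_" at the start of a directory name.
_PREFIX_RE = re.compile(r"^(\d+)_(.+)$")


def split_priority_prefix(dirname: str) -> Tuple[Optional[int], str]:
    """Return (priority, id); priority is None when there is no prefix."""
    match = _PREFIX_RE.match(dirname)
    if match is None:
        return None, dirname
    return int(match.group(1)), match.group(2)


print(split_priority_prefix("01_accounts_password"))
print(split_priority_prefix("sshd_disable_root_login"))
```

With this split, the priority only influences ordering at build time, and the rule ID used for tailoring stays the same if a directory is later renumbered.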

> * Review the `_get_implied_properties` mechanism?

Should it be made explicit in product.yml?

@matejak
Member

matejak commented Feb 11, 2020

Hello @jalliot , great to see how deep you went when looking into the build system. As others already pointed out, the build code has issues that need to be resolved before even considering decoupling. As those issues are numerous, decoupling is totally out of sight.
However, bringing the code to a state where decoupling is possible is definitely part of the project's vision. Therefore, we encourage you to start discussions on our public mailing list, and rest assured that we welcome and look forward to reviewing pull requests that bring the code towards that goal.

@marcusburghardt marcusburghardt added the Infrastructure Our content build system label Sep 7, 2022