RFC Dependencies #888

Closed
arlimus opened this Issue Aug 8, 2016 · 18 comments


@arlimus
Contributor
arlimus commented Aug 8, 2016 edited

Follow-up to #798 on dependency resolution.

Design

(1) dependencies are written in inspec.yml:

depends:
  - name: hello
  [ ... ]

(2) dependencies are based on semantic versioning: http://semver.org/ . Dependencies do not support pre-release or build metadata (semver items 9 and 10; see the comment below)
(3) dependency version constraints are specified with:

- version: [op] [version] [...]

with op being <, >, <=, >=, =, != or ~>. Multiple constraints may be specified. Whitespace between the operator and the version is optional. Whitespace inside a version number is not supported.
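A minimal sketch of how such a constraint string could be checked, assuming the semantics of RubyGems' Gem::Requirement are acceptable for the operators listed above. The helper name satisfies? is hypothetical, and the comma separator for multiple constraints is an assumption, since the RFC does not fix one.

```ruby
require 'rubygems'

# Hypothetical helper: true if `version` meets every constraint in `spec`.
# Assumption: multiple constraints are separated by commas.
def satisfies?(spec, version)
  constraints = spec.split(',').map(&:strip)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?('>= 1.2.0, < 2.0.0', '1.4.3') # inside the range
puts satisfies?('~>1.2', '2.0.0')             # outside the pessimistic range
puts satisfies?('!= 1.3.0', '1.3.0')          # explicitly excluded
```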

(4) each dependency has one source, with the default pointing to supermarket. sources can be specified via:

- path: ../relative_path/to/profile
- path: /absolute/path
- supermarket: owner/name
- compliance: owner/name
- github: owner/name
- url: http://sth...

(5) upon resolution, a lockfile is created in inspec.lock. If this lockfile exists, dependencies are not resolved but taken from this file instead.
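The lockfile behaviour in (5) could look roughly like the sketch below. The helper name locked_dependencies and the YAML layout of the lockfile are assumptions for illustration only. The RFC does not define the file format.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: reuse inspec.lock when present, otherwise call the
# resolver block and persist its result. The lockfile layout is an assumption.
def locked_dependencies(profile_dir)
  lockfile = File.join(profile_dir, 'inspec.lock')
  return YAML.safe_load(File.read(lockfile)) if File.exist?(lockfile)

  resolved = yield
  File.write(lockfile, YAML.dump(resolved))
  resolved
end

Dir.mktmpdir do |dir|
  calls = 0
  resolver = -> { calls += 1; [{ 'name' => 'hello', 'version' => '1.0.0' }] }
  first  = locked_dependencies(dir, &resolver) # resolves and writes the lock
  second = locked_dependencies(dir, &resolver) # reads the lock, no resolution
  puts first == second
  puts calls
end
```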

(6) dependencies are vendored to a local cache in ${inspec-home}/cache. Users may specify a custom vendor location.

(7) dependencies provide their library functions to the profile (without additional specification). For example: Resources that are defined in libraries are available to the profile that requires them.

(8) dependencies may provide controls via include_controls (all) or require_controls (selective). If none of these are used, dependencies will not execute or report on any controls.

(9) dependencies are scoped with their name. scoping applies to all aspects, including resources, controls, and attributes. resources are added to the global space if there is no conflict.

... and discuss 😁

scoping

  • All profiles have a simple name. Names are short, to the point (and should not contain spaces). Example: ssh, cis-centos6-lvl1

  • When a profile is pulled in via dependencies, its name may be overwritten. This allows the inclusion of 2 profiles with the same name. Example: my ssh becomes my-ssh while upstream ssh becomes upstream-ssh. It's done via the name field:

    depends:
    - name: my-ssh
      url: go.to/my/ssh
    
  • All profile resources, controls, attributes, and other future artifacts are scoped under this name. Controls and attributes must always follow the name convention. Resources may be added to a shared global scope, but may overwrite an existing resource with the same name.
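To make the scoping rules concrete, here is a toy sketch. ScopedRegistry is a hypothetical class, not part of InSpec, and the "name/id" key format for controls is an assumption borrowed from the slash-separated paths used elsewhere in this thread.

```ruby
# Hypothetical registry illustrating the scoping rules above.
class ScopedRegistry
  attr_reader :controls, :resources

  def initialize
    @controls = {}
    @resources = {}
  end

  # Controls are always stored under "<profile-name>/<control-id>".
  def add_control(profile, id, body)
    @controls["#{profile}/#{id}"] = body
  end

  # Resources share one global scope; a later definition overwrites an
  # earlier one. Returns true when an existing resource was replaced.
  def add_resource(name, impl)
    replaced = @resources.key?(name)
    @resources[name] = impl
    replaced
  end
end

reg = ScopedRegistry.new
reg.add_control('my-ssh', 'sshd-01', :mine)
reg.add_control('upstream-ssh', 'sshd-01', :upstream)
puts reg.controls.keys.inspect # both controls coexist under distinct names
```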

sources

The following types are supported:

path

Profile which is located in a folder on disk. This should only be used for development and debugging.

Does not support version constraints. The folder must exist. If it doesn't, throw an error.

depends:
- name: my-profile
  path: /absolute/path
- name: another
  path: ../relative/path

url

Fixed HTTP/HTTPS-based URL which contains a profile. To retrieve the profile, use an HTTP GET operation. The profile is provided in either zip, tar, or tar.gz format. If the download fails or doesn't provide the expected format, throw an error.

depends:
- name: my-profile
  url: https://my.domain/path/to/profile.tgz

supermarket, git, and github

These sources are translated into a URL upon resolution. All support version indexing. For versions to be indexed, they must be provided via semantic versioning as git tags.

Git is the basic mechanism, which supports an optional branch, tag, commit, or version specification. Version specs are resolved via tags matching semantic versioning patterns. If a version constraint cannot be resolved, an error is thrown.

depends:
- name: git-profile
  git: http://url/to/repo
  [branch:  desired_branch]
  [tag:     desired_version]
  [commit:  pinned_commit]
  [version: semver_via_tags]
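Resolving a version spec against git tags might work along these lines. This is only a sketch: resolve_tag is a hypothetical helper, the accepted tag shapes ("1.2.3" or "v1.2.3") are an assumption, and Gem::Requirement stands in for whatever constraint matcher is finally chosen. Tags carrying pre-release suffixes are skipped, in line with (2).

```ruby
require 'rubygems'

# Assumption: version tags look like "1.2.3" or "v1.2.3"; others are ignored.
SEMVER_TAG = /\Av?(\d+\.\d+\.\d+)\z/

# Hypothetical helper: pick the highest semver tag that satisfies `constraint`.
def resolve_tag(tags, constraint)
  requirement = Gem::Requirement.new(constraint)
  candidates = tags.select do |tag|
    m = SEMVER_TAG.match(tag)
    m && requirement.satisfied_by?(Gem::Version.new(m[1]))
  end
  best = candidates.max_by { |tag| Gem::Version.new(SEMVER_TAG.match(tag)[1]) }
  # An unresolvable constraint is an error, as the text above requires.
  raise ArgumentError, "no tag satisfies #{constraint}" if best.nil?
  best
end

tags = ['v1.0.0', 'v1.2.0', 'v2.0.0', 'nightly', '1.10.1-rc1']
puts resolve_tag(tags, '~> 1.0')
```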

Github and supermarket build on this source and support all git options:

depends:
- name: gh-profile
  github: username/project
- name: super-profile
  supermarket: username/profile

MVP

  • Specify a dependency in inspec.yml
  • Dependency is path-based and does not require version resolution
  • No need for vendoring yet
  • Ability to require controls from said profile via include_controls and require_controls
  • Create a Lockfile

Slices

MVP

  • MVP path #891
  • Transitive dependencies (A pulls B pulls C) #915
  • Create and load a Lockfile for dependencies after resolution #950
  • Introduce scoping to the ProfileContext which has a view of all of its dependencies #958
    • create an up-front profile context for every profile that is pulled in (e.g. as dependencies)
    • attach profile contexts to the dependency tree
    • runner loads content in the context of profile inside the dep tree
  • Vendor URL dependencies so that they don't conflict on disk and load them (ignore conflicts)
  • Vendor Github and Supermarket dependencies #959
  • Design UX for scoping of attributes and resources #1057
  • All resources are scoped #1058
@arlimus arlimus modified the milestone: 1.0.0, 0.30.0 Aug 8, 2016
@alexpop
Contributor
alexpop commented Aug 8, 2016

🍒 | 🍎 | 🍌

🍉 | 🍒 | 🍓

🍏 | 🍉 | 🍒

@arlimus arlimus added a commit that referenced this issue Aug 8, 2016
@arlimus arlimus introduce dependency resolution
This commit is the foundation of the dependency resolution as described in #888 .

It currently only works with local dependencies, as seen in the example inheritance profile.

Tests and full resolution are coming next on the path to an MVP implementation.
4e9c394
@arlimus arlimus added a commit that referenced this issue Aug 9, 2016
@arlimus arlimus introduce dependency resolution
9b660c1
@vinyar
Member
vinyar commented Aug 9, 2016 edited

While you are working on this feature, I would like to know how the default supermarket URL will be resolved. I've seen bugs where the default URL is hardcoded to the public supermarket (supermarket.chef.io), which blows up inside a firewalled environment where the customer runs their own supermarket.

Please consider how/where this URL will be fetched from, and where/how it can be overwritten to point to internal source by default.

@vinyar
Member
vinyar commented Aug 9, 2016 edited

Is there a page with words that describes (9) ?

@chris-rock
Contributor

Adding to @vinyar's point: the same challenge applies to our compliance:// prefix. It therefore affects both:

- supermarket: owner/name
- compliance: owner/name

One solution could be to use the compliance and supermarket url from inspec compliance login and a new inspec supermarket login.

@arlimus arlimus added a commit that referenced this issue Aug 9, 2016
@arlimus arlimus introduce dependency resolution
4cefc85
@arlimus arlimus added a commit that referenced this issue Aug 9, 2016
@arlimus arlimus introduce dependency resolution
68f52e4
@arlimus arlimus modified the milestone: 1.0.0, 0.30.0 Aug 9, 2016
@arlimus
Contributor
arlimus commented Aug 9, 2016 edited

@vinyar @chris-rock fun story, this might even happen to github prefix ;)

To address it, we have a few options afaics

  1. Let the user inspec [sth] login [url] to the service. This is great, if you have a login. But what if you don't? How about an internal supermarket that doesn't require a login?
  2. Specify the provider's url via the target: supermarket://my.server/owner/profile. This also has a number of problems, eg there is no way to specify http vs https
  3. Specify the provider's url via an additional parameter: inspec exec profile --supermarket-url https://my.server. This may get quite wordy.
  4. This ^^ could also be split in 2 stages: inspec [supermarket|compliance|github] login/url https://my.server and then inspec exec profile (i.e. not just login but also just specify via url). The login information is either kept in a local(ish) file or in env (recommended imho)
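One way options 3 and 4 could combine is a simple precedence chain: an explicit command-line value wins, then the environment, then the public default. The sketch below assumes exactly that order. The function name and the INSPEC_SUPERMARKET_URL variable are invented for illustration and are not existing InSpec settings.

```ruby
DEFAULT_SUPERMARKET_URL = 'https://supermarket.chef.io'

# Hypothetical lookup order: explicit CLI value, then environment, then default.
# INSPEC_SUPERMARKET_URL is an invented variable name.
def supermarket_url(cli_value: nil, env: ENV)
  [cli_value, env['INSPEC_SUPERMARKET_URL'], DEFAULT_SUPERMARKET_URL]
    .find { |v| v && !v.empty? }
end

puts supermarket_url(cli_value: 'https://my.server', env: {})
```

Passing the environment as a parameter keeps the lookup testable without touching the real process environment.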
@arlimus
Contributor
arlimus commented Aug 9, 2016

@vinyar added a lot of text around scoping

@alexpop
Contributor
alexpop commented Aug 10, 2016

Proposing a git source as well or instead of github

@chris-rock
Contributor

@alexpop great idea to support git directly

@alexpop
Contributor
alexpop commented Aug 10, 2016 edited

(A) Is the url source going to point to an archive?


(B) Is ${inspec-home} the installation path for inspec? If so, what happens when inspec is upgraded?


(C) I like the idea of defining a source per dependency. But this creates the need to be able to override these locations for environments that don't have direct access to them or have a policy to review and store all dependencies before using. Let's take this for example, where gitlab is a local git repository:

hippa-profile (gitlab)
│
├─────> oracle-db (gitlab) ─────────> oracle (github)
│                                     └─────────────────────> inspec-sugar (bitbucket)
│          
└─────> docker-engine (private gitlab) ─────> cis-docker-benchmark (public supermarket)

(C1) Passing overriding options to inspec can get wordy very fast and lacks granularity.

(C2) Not wordy for inspec and very granular is to use a file, similar to Berksfile/Policyfile where you can define a default source and individual source for each dependency.

(C3) Like (C2) but without another file, but instead using inspec.yml to override the source of dependencies


(D) Should probably design the sources to receive parameters. So, instead of:

depends:
  - name: profile
    github: owner/name

have something like this:

depends:
  - name: profile1
    source:
      github: docker/security
      branch: stable
      rel: 'inspec-profiles/profile1'
  - name: profile2
    source:
      s3: profiles-bucket/inspec/profile2.tgz
      aws_access_key: x
      aws_secret_access_key: y

same way berkshelf is doing it.

@alexpop
Contributor
alexpop commented Aug 10, 2016

Dom (8) isn't include_controls (all) and require_controls(selective)?
Link this as example for the two: https://github.com/chef/inspec/blob/master/docs/profiles.rst#profile-inheritance

@vinyar
Member
vinyar commented Aug 10, 2016

@arlimus there is another problem, outside the realm of compliance: supermarket presently does not support publishing profiles via the command line or even curl.
(chef/supermarket#1166)

@stevendanna
Member

Re (2) dependencies are based on semantic versioning: http://semver.org/. Do we intend to support alpha/pre-release identifiers or build identifiers?

My vote would be that we do not support these parts of semver.

@chris-rock chris-rock added a commit that referenced this issue Aug 10, 2016
@arlimus @chris-rock arlimus + chris-rock introduce dependency resolution
7e56966
@arlimus
Contributor
arlimus commented Aug 11, 2016 edited

@alexpop thank you, fixed 👍 :)
somehow github edit reverted my last set of edits to the first post, will have to do it again...

@arlimus
Contributor
arlimus commented Aug 11, 2016

@stevendanna We don't have a use-case for these yet, so happy to keep the feature-set small. 👍

@stevendanna
Member

I'm having trouble understanding how some of this dependency RFC will work. It mixes some ideas from dependency systems that do minimal version management and others that do full dependency resolution, usually using a central index of the universe of available packages to speed up the process.

Here is a more basic question that I think we should spell out more explicitly:

Consider: A depends on B and C with no version constraints. B and C depend on D with no version constraints but from different sources. (You can create other examples with competing versions.) It seems in this case we have a few options:

  • Raise an error and instruct the user they need to add D to their top level inspec.yml and specify a source that will take precedence.
  • Develop features inside of inspec that would allow both versions of D to be used simultaneously, with B and C each having access to the version of D that they required. Perhaps, building the "source" into the internal identifier that one requires.
  • Not consider "source" information from our transitive dependencies and assume transitive deps will be available in some default source.

Or am I missing an option? Or more generally: For a given dependency "X" are we going to try to find a single version of X to try to load into the app or will we allow for multiple versions.
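For the first option (raise an error), detecting the situation is a walk over the dependency tree. The sketch below is only illustrative: conflicting_sources is a hypothetical helper, and the hash shape with :name, :source, and :depends keys is an assumption, not the inspec.yml schema.

```ruby
# Hypothetical check: walk the tree and report any dependency name that is
# requested from more than one source.
def conflicting_sources(profile, seen = Hash.new { |h, k| h[k] = [] })
  profile.fetch(:depends, []).each do |dep|
    seen[dep[:name]] |= [dep[:source]] # collect each distinct source once
    conflicting_sources(dep, seen)
  end
  seen.select { |_name, sources| sources.size > 1 }
end

d_git    = { name: 'D', source: 'git: http://url/to/repo' }
d_market = { name: 'D', source: 'supermarket: owner/d' }
a = { name: 'A', depends: [
  { name: 'B', source: 'path: ../b', depends: [d_git] },
  { name: 'C', source: 'path: ../c', depends: [d_market] }
] }
puts conflicting_sources(a).inspect # lists D with both of its sources
```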

@stevendanna
Member

I'd also like to see us spec out some of the UX of how you will interact with this feature. It seems to me the core operations of a tool like this are:

  • Resolve & Fetch all dependencies
  • Update all dependencies
  • Update a single dependency
  • Show the "activated" list of dependencies
  • Show the "activated" list of dependencies in tree form to better understand your dependencies
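The tree-form view from the last item could be rendered with a short recursive helper. This is a sketch under assumptions: tree_lines is a hypothetical name, and the input shape (:name, :version, :depends) and the version numbers are made up for illustration.

```ruby
# Hypothetical renderer for the "tree form" view; the input shape is assumed.
def tree_lines(dep, depth = 0)
  line = "#{'  ' * depth}#{dep[:name]} (#{dep[:version]})"
  [line] + dep.fetch(:depends, []).flat_map { |child| tree_lines(child, depth + 1) }
end

profile = { name: 'A', version: '0.1.0', depends: [
  { name: 'B', version: '1.0.0', depends: [{ name: 'D', version: '1.0.0' }] },
  { name: 'C', version: '1.1.0' }
] }
puts tree_lines(profile)
```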
@stevendanna
Member

@arlimus After discussing this with @chris-rock a bit, I think an important point to discuss and make a bit more explicit in this RFC is how dependencies will be scoped and the impact that has on the features we need in the dependency resolution.

From our discussion, I believe one of the goals of this work is to allow two different versions of the same dependency to be loaded at the same time. For example, you might have a dependency tree that looks like:

A
|
+-->B-->D@1.0.0
|
+-->C-->D@2.0.0

This proposes that we change inspec such that code inside B can reference code inside D and be sure it is getting the 1.0.0 version of the code, while code inside C can reference code inside D and be sure it is getting the 2.0.0 version of the code. Is this a correct characterization of part of the feature being proposed?

If so, it leads to a follow-on thought:

Is there utility in including the version operators at all? In the dependency management systems I've used (I'm by no means an expert), one of the major reasons to include the version operators is that the runtime can only load a single version of a given dependency. Packages use version constraints to ensure that the single version that is loaded is within the range of versions they support. The package manager solves the constraint problem after getting version constraint information from every source to ensure it has the full universe of dependencies.

However, in the world described above, it seems that the version constraints would only be used to limit which version is fetched in the absence of a lockfile. However, most of the proposed sources don't have a standard API for exposing information about the available versions.
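The per-profile view of D in the tree above (B sees D@1.0.0, C sees D@2.0.0) can be illustrated with a toy context object. DepContext is a hypothetical class, not InSpec's own ProfileContext, and the versions given to A, B, and C are made up.

```ruby
# Hypothetical context: each profile resolves dependency names through its
# own table instead of a single global one.
class DepContext
  attr_reader :name, :version

  def initialize(name, version, deps = [])
    @name = name
    @version = version
    @deps = deps.to_h { |ctx| [ctx.name, ctx] }
  end

  # Look up a direct dependency as seen from this profile.
  def dependency(dep_name)
    @deps.fetch(dep_name)
  end
end

b = DepContext.new('B', '0.1.0', [DepContext.new('D', '1.0.0')])
c = DepContext.new('C', '0.1.0', [DepContext.new('D', '2.0.0')])
a = DepContext.new('A', '0.1.0', [b, c])
puts a.dependency('B').dependency('D').version
puts a.dependency('C').dependency('D').version
```

Because lookups never leave the requesting profile's own table, no single global version of D has to be chosen.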

@arlimus
Contributor
arlimus commented Aug 11, 2016 edited

Great discussion with @stevendanna and @chris-rock :

Which sources do we have that provide version information in an index?

  • url, path, compliance: no index information
  • github, supermarket: index information without dependency/version info
  • for all indexes, we find the optimal item that matches the user's specification

Ideas:

  • We still build the full dependency tree for the final pinned state; We fetch the version that matches the specification for the sources that support it.

I'll update the spec:

  • When a Profile X executes, it will have access to its dependencies D' and their dependencies D". Resources have flat exposure, e.g. if D" creates d_conf, X will have access to it by calling d_conf
  • When a Profile X executes, its dependent attributes are accessible via the mapped names of dependencies D' and D", e.g. for profile postgres you get access to its attribute user via postgres/user
  • If a Profile X pulls in any dependencies with the same namespace but different versions D1 and D2, the profile X gets access to the latest version of D in terms of library contents and attributes.
  • Users may address attributes, libraries, and controls of two identically namespaced child profiles D1 and D2 within X by giving a full access path X/Y/D1/control...
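The "latest version wins" rule for attribute lookups such as postgres/user could be sketched as follows. The helper name attribute, the data shape, and the attribute values are assumptions for illustration. Gem::Version is used only as a convenient version comparator.

```ruby
require 'rubygems'

# Hypothetical lookup: "name/attribute" resolves against the newest loaded
# version of the profile called `name`.
def attribute(loaded, path)
  profile_name, attr = path.split('/', 2)
  candidates = loaded.select { |p| p[:name] == profile_name }
  raise KeyError, "unknown profile #{profile_name}" if candidates.empty?

  newest = candidates.max_by { |p| Gem::Version.new(p[:version]) }
  newest[:attributes].fetch(attr)
end

loaded = [
  { name: 'postgres', version: '1.0.0', attributes: { 'user' => 'pg' } },
  { name: 'postgres', version: '2.0.0', attributes: { 'user' => 'postgres' } }
]
puts attribute(loaded, 'postgres/user') # value comes from version 2.0.0
```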
@arlimus arlimus added RFC and removed feature request labels Aug 22, 2016
@chris-rock chris-rock modified the milestone: 1.0.0, 1.1.0 Sep 21, 2016
@chris-rock chris-rock closed this Sep 29, 2016
@chris-rock chris-rock modified the milestone: 1.0.0, 1.1.0 Sep 29, 2016