
Packages can require patches on dependencies #5476

Merged (9 commits), Sep 30, 2017

Conversation

@tgamblin (Member) commented Sep 26, 2017

Addresses an issue that came up in #4907.

A package can now depend on a special patched version of its dependencies.

  • depends_on() can optionally take a patch directive or a list of them (see the package sketch at the end of this description):

         depends_on(<spec>,
                    patches=patch(..., when=<cond>),    # condition on dependency
                    when=<cond>)                        # condition on this package
         # or
         depends_on(<spec>,
                    patches=[patch(..., when=<cond>),   # condition on dependency
                             patch(..., when=<cond>)],  # condition on dependency
                    when=<cond>)                        # condition on this package
  • The Spec YAML (and therefore the hash) now includes the sha256 of
    each patch applied to a spec.

    • The special patched version will be built separately from a "vanilla"
      version of the same package.
    • This allows packages to maintain patches on their dependencies
      without affecting either the dependency package or its dependents.
      This could previously be accomplished with special variants, but
      having to add variants means the hash of the dependency changes
      frequently when it really doesn't need to. This commit allows the
      hash to change just for dependencies that need patches.
    • Patching dependencies shouldn't be the common case, but some packages
      (qmcpack, hpctoolkit, openspeedshop) do this kind of thing and it
      makes the code structure mirror maintenance responsibilities.
  • This commit means that adding or changing a patch on a
    package will change its hash. This is probably what should happen,
    but we haven't done it so far. (@adamjstewart mentioned some ANL
    folks wanting this)

    • Only applies to patch() directives; package.py files (and their
      patch() functions) are not hashed, but we'd like to do that in the
      future.
  • To implement these extensions to Spack's DSL, some fanciness was added to
    directives.DirectiveMetaMixin. In particular:

    • directives can be nested (called as parameters to other directives) and their
      results won't be enqueued for lazy execution like a normal directive.
    • same for directives called within directives
  • Previously, the patch() directive only took an md5 parameter. Now
    it takes only a sha256 parameter. We restrict the hash used because we
    want to be consistent about which hash is used in the Spec.

    • A side effect of hashing patches is that compressed patches fetched
      from URLs now need two checksums: one for the downloaded archive and
      one for the content of the patch itself. Patches fetched uncompressed
      only need a checksum for the patch. Rationale:
    • we include the content of the patch in the spec hash, as that is
      the checksum we can do consistently for patches included in Spack's
      source and patches fetched remotely, both compressed and
      uncompressed.
    • we still need the checksum of the downloaded archive, because we want
      to verify the download before handing it off to tar, unzip, or
      another decompressor. Not doing so is a security risk and leaves
      users exposed to any arbitrary code execution vulnerabilities in
      compression tools.
  • This also refactors the way dependency metadata is stored on packages.

    • Previously, dependencies and dependency_types were stored as separate
      dicts on Package. This means a package can only depend on another in one
      specific way, which is usually but not always true.

    • Prior code unioned dependency types statically across dependencies on
      the same package.

    • New code stores dependency relationships as their own object (a Dependency,
      sketched after this list), with a spec constraint and a set of dependency
      types and patches per relationship.

      • Dependency types are now more precise
      • There is now room to add more information to dependency relationships (like patches)
    • New Dependency class lives in spack.dependency, along with deptype
      definitions that used to live in spack.spec.

    • Updated nwchem and nauty to use sha256 for patches (@junghans)

    • Lines with long checksums are now ignored by spack flake8

  • Documentation
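A rough sketch of the Dependency relationship object described above (illustrative only; the real class lives in spack.dependency and differs in detail):

    class Dependency(object):
        """One relationship: a package depends on a spec in certain ways."""

        def __init__(self, pkg, spec, type=('build', 'link')):
            self.pkg = pkg          # package class declaring the dependency
            self.spec = spec        # spec constraint on the dependency
            self.type = set(type)   # deptypes; default assumed here
            self.patches = []       # patches this package applies to the dep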

@naromero77: this is nearly done, and should let you continue to maintain your own patches on dependencies for qmcpack.

@jgalarowicz: you may be interested in this for openspeedshop.
@mwkrentel: you may be interested in this for hpctoolkit, too, as I know you maintain a lot of patches on your dependencies. This would allow you to maintain them with the hpctoolkit package without disrupting other packages.
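For concreteness, here is a minimal hypothetical package using the new DSL (every name, URL, version, and checksum below is made up for illustration):

    from spack import *

    class Qsim(Package):
        """Hypothetical package that maintains patches on its dependencies."""
        homepage = "https://example.com/qsim"
        url = "https://example.com/qsim-1.0.tar.gz"

        version('1.0', md5='0123456789abcdef0123456789abcdef')  # placeholder
        variant('mpi', default=True, description='Build with MPI support')

        # One local patch on a dependency: the patch's sha256 enters
        # libelf's spec hash, so this patched libelf is built separately
        # from the vanilla libelf used by other packages.
        depends_on('libelf@0.8.13', patches=patch('qsim-libelf.patch'))

        # A list of patches, each with its own condition on the dependency.
        # The compressed remote patch needs both checksums (rationale above).
        depends_on('mpich', when='+mpi', patches=[
            patch('mpich-hang.patch', when='@3.2'),
            patch('https://example.com/mpich-fix.patch.gz',
                  sha256='<sha256 of the uncompressed patch>',
                  archive_sha256='<sha256 of the downloaded .gz>',
                  when='%gcc'),
        ])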

@tgamblin changed the title from "Features/dependency patching" to "Packages can require patches on dependencies" on Sep 26, 2017
@alalazo (Member) left a comment:

Among the points above I was able to give a close look at:

  1. the refactoring of dependencies metadata
  2. the modifications to the meta-classes and directives

I think 1. is great, and it will let us be quite flexible in the future if we need a dependent to modify other aspects of its dependencies.

For 2., the only concern I have is whether we really need that additional complexity. The meta-class served the purpose of delaying the directive machinery until the package to which we want to attach attributes actually exists. The support for nested directives goes to great lengths to deactivate that same machinery when a nested directive is encountered.

But since the Dependency objects to which we want to attach attributes are already constructed when we need to evaluate patch (see comment below), is it worth generalizing the mechanism in this way, or could we give up using lowercase patch to permit more direct logic in the implementation of depends_on? Am I missing some important details here?

builtin function ``all`` or the string 'all', which result in
a tuple of all dependency types known to Spack.
"""
if deptype in (None, 'all', all):
Member:

Why do we check for None here? From what I can see the only point where:

s = canonical_deptype(None)

is used is within unit tests. Is this a leftover from some refactoring?

Member Author (tgamblin):

None is commonly the default value for deptypes= kwargs. But you're right, I suppose we should find usages and replace it with 'all'. It'll be clearer that way.

Member Author (tgamblin):

fixed in next push


elif isinstance(deptype, string_types):
    if deptype not in all_deptypes:
        raise ValueError('Invalid dependency type: %s' % deptype)
Member:

OT (not strictly part of the code review): I also tend to use built-in exceptions in code, rather than declaring specific ones, mainly on the assumption that Spack is basically an application. Anyhow, I have seen in a couple of places lately (like #5462) that Spack is also being used "as a library" to write scripts.

Do you think it makes sense to start adopting a pattern like:

class SpackValueError(ValueError, SpackError):
    pass

so that people are allowed to decide whether to manage a generic ValueError or an error coming from Spack? Just asking to raise the question and know your opinion on this.

Member Author (tgamblin):

I think when a ValueError occurs, it's generally because there is an error in the program, not something that would need to be handled programmatically or that you'd need an exception handler for. The only place I've seen ValueError even caught is when someone's trying to construct an int from a string or something similar, and generally in that case you just make a really narrow try/catch statement.

For things that the caller would actually want to handle (like network errors, or issues with packages) I think wrapping in a SpackError makes a lot more sense.
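For reference, the pattern @alalazo suggests would look roughly like this (a sketch; the real SpackError lives in spack.error):

    class SpackError(Exception):
        """Stand-in for Spack's base error class."""

    class SpackValueError(ValueError, SpackError):
        pass

    try:
        raise SpackValueError('Invalid dependency type: foo')
    except SpackError:
        # scripts embedding Spack can catch only Spack-originated errors...
        pass
    # ...while other callers can still catch it as a plain ValueError.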

    return (deptype,)

elif isinstance(deptype, (tuple, list)):
    invalid = next((d for d in deptype if d not in all_deptypes), None)
Member:

Minor nitpicking: if there's more than one invalid argument in the list or tuple, the error message will just show the first one, instead of pointing them out all at once.

Member Author (tgamblin):

fixed
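Putting this thread's fixes together (drop the None sentinel, report all invalid entries), canonical_deptype might end up looking something like this (a sketch under those assumptions, not the exact Spack code):

    all_deptypes = ('build', 'link', 'run', 'test')

    def canonical_deptype(deptype='all'):
        """Normalize a deptype argument to a canonical tuple of names."""
        if deptype in ('all', all):  # the builtin ``all`` doubles as a sentinel
            return all_deptypes
        elif isinstance(deptype, str):  # real code uses six.string_types
            if deptype not in all_deptypes:
                raise ValueError('Invalid dependency type: %s' % deptype)
            return (deptype,)
        elif isinstance(deptype, (tuple, list)):
            invalid = [d for d in deptype if d not in all_deptypes]
            if invalid:  # report *all* invalid entries, not just the first
                raise ValueError(
                    'Invalid dependency types: %s' % ', '.join(invalid))
            return tuple(sorted(set(deptype)))
        raise ValueError('Invalid deptype: %s' % str(deptype))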


A dependency is a requirement for a configuration of another package
that satisfies a particular spec. The dependency can have *types*,
which determine *how* that packge configuration is required,
Member:

typo: packge

Member Author (tgamblin):

fixed

Args:
pkg (type): Package that has this dependency
spec (Spec): Spec indicating dependency requirements
patches (list): list of patches to apply to this dependency
Member:

__init__ signature and docs seem to disagree, as there's no patches argument to __init__.

Member Author (tgamblin):

fixed

Member:

This still appears to be out of sync


# wrapped function returns same result as original so
# that we can nest directives
return result
Member:

I see the nastiness here:

  1. nested directives are stripped from the argument list
  2. then the current directive is evaluated with the stripped list, like it was before
  3. the executor is returned so that inner directives like patches=[patches(...)] can be collected by the outer directive

else p for p in patches]
assert all(callable(p) for p in patches)

# this is where we actally add the dependency to this package
Member:

typo : actually

Member Author (tgamblin):

fixed

return dict(
    (name, conds) for name, conds in self.dependencies.items()
    if any(dt in self.dependencies[name][cond].type
           for cond in conds for dt in deptypes))
Member:

"""Get subset of the dependencies that *may* have certain types."""

?

Member Author (tgamblin):

Personally I think the spack info output should be reworked for clarity, but maybe not in this PR. Organizing the dependency data has suddenly become harder. But yes, this comment conveys the message better. Fixed.


# apply patches to the dependency
for execute_patch in patches:
    execute_patch(dependency)
Member:

Am I right in saying that all the nested directive machinery exists to evaluate this single statement:

depends_on(<spec>,
    patches=[patch(..., when=<cond>), ...],    # condition on dependency
    when=<cond>)

while still permitting use of the word "patch" on the rhs of a named argument? Can't we just capture the arguments in a named tuple - or something similar - like:

depends_on(<spec>,
    patches=[Patch(..., when=<cond>), ...],    # named tuple spack.directives.Patch, not spack.patch.Patch
    when=<cond>)

and use a more direct logic within _depends_on? Am I missing some fundamental details here?

# Special-case: keeps variant values unique but ordered.
s.variants['patches'] = MultiValuedVariant('patches', ())
mvar = s.variants['patches']
mvar._value = mvar._original_value = tuple(dedupe(patches))
Member:

Here we are using a MV variant with a reserved name to keep track of patches, and correctly set it after concretization. I am curious why we didn't go with an additional Spec attribute and a modification to Spec.to_node_dict.

@scheibelp (Member):

I'll have some initial comments on this by the end of today

@tgamblin (Member, Author):

@alalazo: ok all the suggestions are fixed and I added docs. @scheibelp FYI.

Comments on @alalazo's more in-depth questions:

Clearing the DirectiveMetaMixin attribute seems absolutely correct to me, as now it may get populated as a side effect of the lazy evaluation above, for instance when depends_on eagerly evaluates the nested patches directive.

There's a little ambiguity in the term "nested" here. clearing the queue handles calls that really are "nested", in the sense that the depends_on directive calls patch. It doesn't handle calls to patches that are in parameters -- those are handled inside the directive decorator. The reason for that is that the parameters are enqueued right before the decorator is called, but truly nested calls are only enqueued in the mixin, after each directive is executed. It's complicated but it seems to work well, and I can't think of any cases where it would do something unexpected.

Am I right in saying that all the nested directive machinery exists to evaluate this single statement:

depends_on(<spec>,
    patches=[patch(..., when=<cond>), ...],    # condition on dependency
    when=<cond>)

while still permitting use of the word "patch" on the rhs of a named argument? Can't we just capture the arguments in a named tuple - or something similar - like:

depends_on(<spec>,
    patches=[Patch(..., when=<cond>), ...],    # named tuple spack.directives.Patch, not spack.patch.Patch
    when=<cond>)

and use a more direct logic within _depends_on? Am I missing some fundamental details here?

I don't think you're missing much:

  1. The call to patch inside _depends_on doesn't actually require a lot of machinery -- just the one statement in DirectiveMetaMixin.
  2. The parameter calls to patch() do require a bit more complexity -- all the code in the directive decorator.

But I think there are good reasons to make them the same. Right now, if you know how to write a patch directive, you also know how to write a patch parameter for depends_on. The packager doesn't have to remember that it's Patch in one scenario and patch in the other, and I think we should try really hard to avoid adding more details and subtle special cases to packagers' lives.

This also means there is only one piece of code to maintain for patch directives, and only one place to make a mistake. That's only partially true since patch now adds patches to either Dependency or Package objects, but I still think it's good that adding a patch to either (which is moderately complex) is done essentially the same way.

Here we are using a MV variant with a reserved name to keep track of patches, and correctly set it after concretization. I am curious why we didn't go with an additional Spec attribute and a modification to Spec.to_node_dict.

So, the basic rationale here is that there is a lot of machinery already in place for multi-value variants: adding them, managing a list of attributes, displaying them in spack find, adding them to the spec YAML, hashing them, etc. etc. I actually think we should figure out a way to consolidate them with compiler flags so that we don't have to maintain a special dict class for that. Compiler flags are almost the same, in that their names are effectively reserved (we should probably disallow adding variants for them, too), but they are separate attributes. I think it'll be much easier to have all that stuff as a unified parameter system, especially when we start doing SAT solves on it.
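For illustration, a patched dependency then shows up in concretized specs roughly like this (hypothetical output; the exact rendering may differ):

    $ spack spec qmcpack
    ...
        ^libelf@0.8.13%gcc@7.1.0 patches=5dce43a8a71c... arch=linux-x86_64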

@scheibelp (Member) left a comment:

This is mostly questions and documentation suggestions. I'm still trying to figure out the directive logic. I'll keep looking at this tomorrow.

@@ -328,3 +332,126 @@ def find_versions_of_archive(archive_urls, list_url=None, list_depth=0):
continue

return versions


def spider_checksums(url_dict, name, stage_function=None, keep_stage=False):
Member:

This appears to have been moved vs. edited. Is that correct? Why is it called spider_checksums? Spidering typically means to me building on an initial set to pull more results

Member Author (tgamblin):

Yep -- just moved. Mainly because it spiders webpages.

Member:

I guess what I'm saying is that all this is doing is taking a list of versions, converting each version to a url, and then fetching from that list of URLs. There is no poking around inside each URL, which is what I normally think of when I think of spidering. I think get_checksums_for_versions would be more accurate and straightforward.

return version_lines


def fetch_remote_package_versions(pkg):
Member:

Recently the protobuf package had to override Package.fetch_remote_versions (see https://github.com/LLNL/spack/pull/5373), what are your thoughts on that?

Member Author (tgamblin):

Oh that's a good catch. We should probably move it back if there are actually packages that need custom logic. I can tweak that.


@property
def name(self):
    """Get the name of the package to be patched.
Member:

This docstring is confusing IMO. It would be fine to say "The name of the dependency package".

@@ -967,7 +969,7 @@ def from_list_url(pkg):
the specified package's version."""
if pkg.list_url:
try:
versions = pkg.fetch_remote_versions()
versions = spack.util.web.fetch_remote_packge_versions(pkg)
Member:

Typo: fetch_remote_pack[a]ge_versions

self.url = path_or_url
self.md5 = kwargs.get('md5')

self.archive_sha256 = None
Member:

It looks like you handled swapping sha256 for md5 for a couple packages. Out of curiosity was that all the packages using UrlPatch?

Member Author (tgamblin):

Yes -- that's the only place the checksum is needed. For patches in the filesystem, we compute the sha256 on demand.
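Computing that checksum on demand is straightforward; a sketch (not necessarily Spack's exact code):

    import hashlib

    def sha256_of_file(path, blocksize=2 ** 20):
        """Stream a file through sha256 and return the hex digest."""
        hasher = hashlib.sha256()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(blocksize), b''):
                hasher.update(block)
        return hasher.hexdigest()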

# caller can return the directive if it should be queued.
def remove_directives(arg):
    directives = DirectiveMetaMixin._directives_to_be_executed
    if isinstance(arg, (list, tuple)):
Member:

This function is always initially called with a collection so IMO it's clearer to replace everything from here to the else with for arg in args:

Member Author (tgamblin):

Not sure I understand what you want here -- args is outside the function scope and if an arg happens to be a collection, we still need to descend into it...

Member:

For the two cases where this function is used

remove_directives(args)
remove_directives(kwargs.values())

in both cases a collection is being passed in. I meant to say: change the name of the parameter in this definition to args and then replace lines 206-210 with for arg in args:.

Member:

Also are there other examples of directives as arguments besides passing patch to depends_on? Either way IMO it would make sense to describe that particular case in the explanatory comment.

if 'archive_sha256' not in kwargs:
    raise PatchDirectiveError(
        "Compressed patches require 'archive_sha256' "
        "and patch 'sha256' attributes: %s" % self.url)
Member:

Why is there a 'sha256' and an 'archive_sha256' attribute? It doesn't seem strictly required. Is it to alert the user that they should be creating a hash of the archive and not the decompressed patch? When there is a sha256 sum for the archive, why is the sha256 of the decompressed patch also required?

@tgamblin (Member, Author) commented Sep 28, 2017:

Yeah, this annoyed me as well, but I do not see a good way around it. Here's why:

  1. This PR makes spec.yaml (and thus the DAG hash) include patches. So, we need a sha256 (or some deterministic hash) for each patch, so that each spec.yaml represents one unique configuration in a consistent way.
  2. We need to add patch hashes to the spec on or before the end of concretization, which is when specs are set in stone.
  3. We need to verify archives before passing them to things like tar or unzip. If we do not verify, someone could potentially replace a download with a malformed tarball or zip file and exploit a vulnerability in the compression tools.

(3) is really why we need the archive_sha256. If we could just expand things safely and check only the patch file, we'd be fine with just the patch file's checksum.
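Concretely, a compressed remote patch therefore carries both hashes (placeholder values):

    patch('https://example.com/fix.patch.gz',
          sha256='<hash of the uncompressed patch contents>',
          archive_sha256='<hash of the .gz archive as downloaded>')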

We could make (2) work for compressed patches by fetching all the necessary patches before or during concretization. But that requires us to do remote operations that may fail (or may need a mirror), and I like the fact that you can currently reason about packages (i.e. concretize a spec) without anything but the spec and a repo. I think we should try to keep things that way.

You could think about approximating (1) by just using the hash of the compressed archive (which is pretty much unique), but there are issues with that. gzip, tar, etc. aren't deterministic, so if someone changes the way a patch is distributed (e.g. uses a different compression mechanism, or re-tars the same patch), the hash changes, and you won't have the same hash for the same patch over time.

Anyway, that's the rationale. I think compressed patches are infrequent (even if nwchem has an awful lot of them) and adding a little pain for compressed patches seems worth it to avoid a bunch of downloads being required before concretize.

Thoughts?

@alalazo?

Member:

If it's easier to implement this way and the majority of patches aren't affected by it, then that seems reasonable. That being said, while all of 1-3 make sense to me, I still don't see why these are both required and/or why both would need to be set. I was assuming that if we have a sha256 and the patch is an archive, the sha256 is for the archive. So I think that's another way of saying I agree with:

You could think about approximating (1) by just using the hash of the compressed archive (which is pretty much unique)

As a side note: I'm unclear why the hash of the compressed archive would be less unique than the hash of the decompressed archive.

Regarding:

there are issues with that. gzip, tar, etc. aren't deterministic, so if someone changes the way a patch is distributed (e.g. uses a different compression mechanism, or re-tars the same patch), the hash changes, and you won't have the same hash for the same patch over time.

I think it's fine to treat this the same as if they had edited the patch contents themselves. It would be no less noisy because the archive's sha256 would be changing as often as this happens anyway (and the package must track the archive hash on account of 3).

"""Get the name of the package to be patched.

This is needed so that a ``Dependency`` acts like a ``Package``
object when passed to a directive.
Member:

When is a dependency object passed to a directive? I see cases where patches are passed to a directive.

Member Author (tgamblin):

I just deleted this part of the comment since I think it's unnecessary with your suggestion to just say "the name of the dependency package".

But Dependency objects are passed to the patch() directive inside _depends_on.


# Nasty, but it's the best way I can think of to avoid
# side effects if directive results are passed as args
remove_directives(args)
Member:

I'm trying to wrap my head around the directives logic and your updates to it. Could you provide an example of what goes wrong when this isn't done?

I feel like there's got to be a way to avoid undoing undesired modifications, but I admit I don't get what's going on here yet.

Member Author (tgamblin):

Sure. Say you do this:

class Foo(Package):
    depends_on('bar', patches=[
        patch('bar-bugfix.patch'),
        patch('https://example.com/bar-feature.patch', sha256='abc123...')])

If you do not remove the results of the two patch calls from the queue here, they'll be lazily executed along with all the other directives in DirectiveMetaMixin.__init__ (the metaclass initializer), because the @directive decorator adds them to DirectiveMetaMixin._directives_to_be_executed. The point of this is to only pass the result of patch() to the depends_on() directive, and let that directive handle it.

Basically the point of all the extra directive machinery is so that only directives at the class definition level get executed. The others are either nested (e.g., like patch() in _depends_on, which is only executed for its side effects) or they're passed as parameters, and we let the callee directive decide what to do with their results. The callee (depends_on) could execute the _execute_patch function objects itself, or it could conceivably enqueue them.
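A heavily simplified sketch of that machinery (illustrative only; the real code lives in spack.directives and also handles truly nested calls in the metaclass):

    class DirectiveMetaMixin(type):
        """Metaclass that queues directive closures until the class exists."""
        _directives_to_be_executed = []

    def _remove_directives(args):
        """Dequeue directive results that were passed as arguments."""
        queue = DirectiveMetaMixin._directives_to_be_executed
        for arg in args:
            if isinstance(arg, (list, tuple)):
                _remove_directives(arg)
            elif arg in queue:
                queue.remove(arg)

    def directive(fn):
        def wrapper(*args, **kwargs):
            # Results of directives used as parameters must not ALSO run
            # at class-definition time, so strip them from the queue...
            _remove_directives(args)
            _remove_directives(list(kwargs.values()))
            # ...then build the _execute_<directive> closure and queue it.
            result = fn(*args, **kwargs)
            DirectiveMetaMixin._directives_to_be_executed.append(result)
            # Returning the closure lets an outer directive (depends_on)
            # collect it and run it against a Dependency object.
            return result
        return wrapper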

Member:

The evaluation of patch is passed to depends_on so that it may call _execute_patch with the Dependency created by _depends_on. So I'm confused by the line

The others are either nested (e.g., like patch() in _depends_on, which is only executed for its side effects) or they're passed as parameters, and we let the callee directive decide what to do with their results.

because patch in depends_on is an example of the latter as far as I can tell from examining the code, but you explicitly use it as an example of the former.

@@ -1044,7 +1090,11 @@ def do_patch(self):
except spack.multimethod.NoSuchMethodError:
    # We are running a multimethod without a default case.
    # If there's no default it means we don't need to patch.
    tty.msg("No patches needed for %s" % self.name)
if not patched:
    # if we didn't apply a patch, AND the patch function
Member:

Suggestion: "if we didn't apply a patch from a patch directive, AND..."

"apply a patch" is general and I had to read this several times to figure out what it meant.

@tgamblin (Member, Author):

Ok @scheibelp: I pushed changes to address your review. I also replied to your questions but the replies are on now-outdated commits so I think you'll have to expand them above.

@tgamblin force-pushed the features/dependency-patching branch 2 times, most recently from 00f2603 to 05ea79d on September 28, 2017 at 21:10
@scheibelp (Member) left a comment:

OK thanks for the responses. I have a few more comments and questions. I'm going to look at this again tomorrow but by this point I think I have a general understanding of the how and why of most of the changes here.


# Ignore any directives executed *within* top-level
# directives by clearing out the queue they're appended to
DirectiveMetaMixin._directives_to_be_executed = []
@scheibelp (Member) commented Sep 28, 2017:

(EDIT: this was intended to add to https://github.com/LLNL/spack/pull/5476/files#r141277358)

The only example I see of this is the call to patch in _depends_on (which only seems to be to support the syntactic sugar depends_on('libelf', patches='foo.patch')). Why not handle this the way the extends directive calls _depends_on vs. depends_on? In other words I can see that the remove_directives logic below would be difficult to get rid of if users can pass directives as arguments, but this case looks like a problem we create for ourselves.



@scheibelp scheibelp mentioned this pull request Sep 29, 2017
@tgamblin force-pushed the features/dependency-patching branch 2 times, most recently from d6882e4 to 7cee03f on September 30, 2017 at 07:39
- Previously, dependencies and dependency_types were stored as separate
  dicts on Package.
  - This means a package can only depend on another in one specific way,
    which is usually but not always true.
  - Prior code unioned dependency types statically across dependencies on
    the same package.

- New code stores dependency relationships as their own object, with a
  spec constraint and a set of dependency types per relationship.
  - Dependency types are now more precise
  - There is now room to add more information to dependency relationships.

- New Dependency class lives in dependency.py, along with deptype
  definitions that used to live in spack.spec.

Move deptype definitions to spack.dependency
- move `spack.cmd.checksum.get_checksums` to `spack.util.web.spider_checksums`

- move `spack.error.NoNetworkError` to `spack.util.web.NoNetworkError` since
  it is only used there.
- Functions returned by directives were all called `_execute`, which made
  reading stack traces hard because you couldn't tell what directive a
  frame came from.
  - renamed them all to `_execute_<directive>`

- Exceptions in directives were only really used in one or two places --
  get rid of the boilerplate init functions and let the callsite specify
  the message.
- A package can depend on a special patched version of its dependencies.

  - The `Spec` YAML (and therefore the hash) now includes the sha256 of
    the patch in the `Spec` YAML, which changes its hash.

  - The special patched version will be built separately from a "vanilla"
    version of the same package.

  - This allows packages to maintain patches on their dependencies
    without affecting either the dependency package or its dependents.
    This could previously be accomplished with special variants, but
    having to add variants means the hash of the dependency changes
    frequently when it really doesn't need to.  This commit allows the
    hash to change *just* for dependencies that need patches.

  - Patching dependencies shouldn't be the common case, but some packages
    (qmcpack, hpctoolkit, openspeedshop) do this kind of thing and it
    makes the code structure mirror maintenance responsibilities.

- Note that this commit means that adding or changing a patch on a
  package will change its hash.  This is probably what *should* happen,
  but we haven't done it so far.

  - Only applies to `patch()` directives; `package.py` files (and their
    `patch()` functions) are not hashed, but we'd like to do that in the
    future.

- The interface looks like this: `depends_on()` can optionally take a
  patch directive or a list of them:

     depends_on(<spec>,
                patches=patch(..., when=<cond>),
                when=<cond>)
     # or
     depends_on(<spec>,
                patches=[patch(..., when=<cond>),
                         patch(..., when=<cond>)],
                when=<cond>)

- Previously, the `patch()` directive only took an `md5` parameter.  Now
  it only takes a `sha256` parameter.  We restrict this because we want
  to be consistent about which hash is used in the `Spec`.

- A side effect of hashing patches is that *compressed* patches fetched
  from URLs now need *two* checksums: one for the downloaded archive and
  one for the content of the patch itself.  Patches fetched uncompressed
  only need a checksum for the patch.  Rationale:

  - we include the content of the *patch* in the spec hash, as that is
    the checksum we can do consistently for patches included in Spack's
    source and patches fetched remotely, both compressed and
    uncompressed.

  - we *still* need the checksum of the downloaded archive, because we want
    to verify the download *before* handing it off to tar, unzip, or
    another decompressor.  Not doing so is a security risk and leaves
    users exposed to any arbitrary code execution vulnerabilities in
    compression tools.
@tgamblin (Member, Author):

Ok this is updated with @scheibelp's suggestions and some bugfixes.

@tgamblin tgamblin merged commit 96d2488 into develop Sep 30, 2017
@tgamblin (Member, Author):

@naromero77: this is merged.

@tgamblin tgamblin deleted the features/dependency-patching branch September 30, 2017 23:32
@naromero77 (Contributor):

@tgamblin thanks

@tgamblin tgamblin mentioned this pull request Oct 1, 2017
alalazo added a commit to epfl-scitas/spack that referenced this pull request Oct 4, 2017
refers spack#5587
closes spack#5593

The implementation introduced in spack#5476 used `dedupe` to set a private
member of a `MultiValuedVariant` object. `dedupe` is meant to give a
stable deduplication of the items in a sequence, and returns an object
whose value depends on the order of items in its argument.

Now the private member is set implicitly using the associated property
setter. The logic changes from stable deduplication to a totally ordered
deduplication (i.e. instead of `dedupe(t)` it uses `sorted(set(t))`).
This should also guarantee that packages that shuffle the same set of
patch directives maintain the same hash.
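The behavioral difference, for illustration (dedupe here stands in for Spack's llnl.util.lang.dedupe):

    def dedupe(seq):
        """Stable deduplication: keep first occurrences, preserve order."""
        seen = set()
        return (x for x in seq if not (x in seen or seen.add(x)))

    t = ('b', 'a', 'b')
    print(tuple(dedupe(t)))       # ('b', 'a') -- depends on input order
    print(tuple(sorted(set(t))))  # ('a', 'b') -- same for any input order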
else:
    mvar = dspec.spec.variants['patches']
    mvar._value = mvar._original_value = tuple(
        dedupe(list(mvar._value) + patches))
@alalazo (Member) commented Oct 4, 2017:

@tgamblin @scheibelp I'll write this comment here just to be sure the information gets across. What we are doing in the line:

mvar._value = mvar._original_value = tuple(dedupe(list(mvar._value) + patches))

is breaking an invariant of the class (the assumption that self._value is totally ordered). This assumption is transparent to the user of the class, and bypassing the property setter here is not very robust. In fact any call to a method that does:

self.value = self.value + other

can potentially reorder self._value (and lead to #5587?). An example of such a method may be for instance MultiValuedVariant.constrain.
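A simplified sketch of the invariant being discussed (hypothetical class, for illustration):

    class MultiValuedVariant(object):
        def __init__(self, name, value):
            self.name = name
            self.value = value  # goes through the setter below

        @property
        def value(self):
            return self._value

        @value.setter
        def value(self, value):
            # The setter maintains the invariant: a sorted, deduplicated tuple.
            self._value = tuple(sorted(set(value)))

    mvar = MultiValuedVariant('patches', ())
    mvar._value = ('b', 'a')           # bypasses the setter: invariant broken
    mvar.value = mvar.value + ('a',)   # setter restores the total order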

@@ -1829,6 +1805,43 @@ def concretize(self):
if s.namespace is None:
    s.namespace = spack.repo.repo_for_pkg(s.name).namespace

# Add any patches from the package to the spec.
patches = []
for cond, patch_list in s.package_class.patches.items():
@alalazo (Member) commented Oct 4, 2017:

@tgamblin @scheibelp There's a thing I don't fully understand about the patch mechanism. The information a packager provides with a directive is stored in a dictionary, so the for loop above iterates in no guaranteed order. This means we are assuming only a weak ordering by constraints. To be more clear, if we have:

patch('foo', when='@1')
patch('another_foo', when='%gcc')
patch('bar', when='@1')
patch('another_bar', when='%gcc')
patch('baz', when='+shared')

right now we'll be ensured that:

  1. foo is applied immediately before bar
  2. another_foo is applied immediately before another_bar

because they share the same cond, but we don't know whether the first patch is foo, another_foo or baz.

Would it make sense to relax this constraint and say that the patches must be commutative? If this is the case, then I think #5596 will solve #5587 after aggregating the two patches for mpfr which depend on their relative order of application.

If that's not the case then I think we should go completely the other direction and ensure that the order of patches:

  1. reflects the order of directives, regardless of the condition
  2. enters the hash mechanism

Note that this is getting a lot trickier than the case before, since we can now inject patches. For example, if in a DAG we have 2 nodes A and B that both want to inject patches on C, we must ensure that A and B are always traversed in the same relative order in any graph where they are present.

Do the considerations above sound reasonable to you?

@scheibelp (Member) commented Oct 4, 2017:

To update this: https://github.com/LLNL/spack/pull/5596 now preserves the weak ordering that currently exists on package patches.

@mathstuf (Contributor) commented Oct 4, 2017:

Note that this has some consequences: there are concretization problems associated with the (implicit?) patches variant. See #5598. Some initial poking has not been sufficient, so something more in-depth is required.

scheibelp pushed a commit that referenced this pull request Oct 4, 2017
fixes #5587

In trying to preserve patch ordering, #5476 made equality inconsistent
for the added 'patches' variant. This commit maintains the original
weak ordering of patch applications while preserving consistency of
comparisons. The ordering DOES NOT enter the hashing mechanism. It's
supposed to be a hotfix, while we think of a cleaner and more-permanent
solution.
@naromero77 (Contributor):

Thanks, this seems to have resolved my issue with PR #4907.

@tgamblin tgamblin added this to the v0.11.0 milestone Nov 12, 2017
@tgamblin tgamblin mentioned this pull request Feb 20, 2018
scheibelp added a commit that referenced this pull request Jun 7, 2018
Fixes #7885

#7193 added the patches_to_apply function to collect patches which are then
applied in Package.do_patch. However this only collects patches that are
associated with the Package object and does not include Spec-related patches
(which are applied by dependents, added in #5476).

Spec.patches already collects patches from the package as well as those applied
by dependents, so the Package.patches_to_apply function isn't necessary. All
uses of Package.patches_to_apply are replaced with Package.spec.patches.

This also updates Package.content_hash to require the associated spec to be
concrete: Spec.patches is only set after concretization. Before this PR, it was
possible for Package.content_hash to be valid before concretizing the associated
Spec if all patches were associated with the Package (vs. being applied by
dependents). This behavior was unreliable though so the change is unlikely to
be disruptive.