Packages can require patches on dependencies #5476
Conversation
Force-pushed from 89f9a7a to abfaa1c.
Among the points above I was able to take a close look at:
1. the refactoring of dependency metadata
2. the modifications to the meta-classes and directives
I think 1. is great, and it will permit quite a lot of flexibility in the future if we need a dependee to modify other aspects of its dependencies.
For 2. the only concern I have is whether we really need that additional complexity. The meta-class served the purpose of delaying the directive machinery until the package to which we want to attach attributes actually exists. The support for nested directives goes to great lengths to deactivate that same machinery when a nested directive is encountered.
But since the `Dependency` objects to which we want to attach attributes are already constructed when we need to evaluate `patch` (see comment below), is it worth generalizing the mechanism in this way, or can we give up using lowercase `patch` to permit more direct logic in the implementation of `depends_on`? Am I missing some important details here?
lib/spack/spack/dependency.py
Outdated
    builtin function ``all`` or the string 'all', which result in
    a tuple of all dependency types known to Spack.
    """
    if deptype in (None, 'all', all):
Why do we check for `None` here? From what I can see, the only point where `s = canonical_deptype(None)` is used is within unit tests. Is this a leftover from some refactoring?
`None` is commonly the default value for `deptypes=` kwargs. But you're right, I suppose we should find usages and replace it with 'all'. It'll be clearer that way.
fixed in next push
    elif isinstance(deptype, string_types):
        if deptype not in all_deptypes:
            raise ValueError('Invalid dependency type: %s' % deptype)
OT (not strictly part of the code review): I also tend to use built-in exceptions in code, rather than declaring specific ones. I do that mainly based on the assumption that Spack is basically an application. Anyhow, I have seen in a couple of places lately (like #5462) that Spack is also being used "as a library" to write scripts.
Do you think it makes sense to start adopting a pattern like:

    class SpackValueError(ValueError, SpackError):
        pass

so that people can decide whether to handle a generic `ValueError` or an error coming from Spack? Just asking to raise the question and hear your opinion on this.
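A minimal sketch of what that pattern would allow callers to do (the `SpackError` base here is a simplified stand-in for Spack's real one):

    class SpackError(Exception):
        """Stand-in for Spack's base error class."""

    class SpackValueError(ValueError, SpackError):
        pass

    # application code can stay generic...
    try:
        raise SpackValueError('Invalid dependency type: foo')
    except ValueError:
        pass

    # ...while a script using Spack as a library can filter on Spack errors only
    try:
        raise SpackValueError('Invalid dependency type: foo')
    except SpackError:
        pass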
I think when a `ValueError` occurs, it's generally because there is an error in the program, not something that would need to be handled programmatically or that you'd need an exception handler for. The only place I've seen `ValueError` even caught is when someone's trying to construct an int from a string or something similar, and generally in that case you just make a really narrow try/except statement.
For things that the caller would actually want to handle (like network errors, or issues with packages), I think wrapping in a `SpackError` makes a lot more sense.
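For example, the narrow case usually looks like this (a generic illustration, not Spack code):

    raw_text = 'not-a-number'
    try:
        count = int(raw_text)   # keep the try block tight around the risky call
    except ValueError:
        count = 0               # fall back to a default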
lib/spack/spack/dependency.py
Outdated
        return (deptype,)

    elif isinstance(deptype, (tuple, list)):
        invalid = next((d for d in deptype if d not in all_deptypes), None)
Minor nitpick: if there's more than one invalid argument in the list or tuple, the error message will just show the first one, instead of pointing them all out at once.
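One possible shape for that fix (a sketch, not the code that actually landed; the `all_deptypes` values here are illustrative):

    all_deptypes = ('build', 'link', 'run', 'test')

    def validate_deptypes(deptype):
        # collect *all* invalid entries so the error reports them at once
        invalid = [d for d in deptype if d not in all_deptypes]
        if invalid:
            raise ValueError('Invalid dependency types: %s' % ', '.join(invalid))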
fixed
lib/spack/spack/dependency.py
Outdated
A dependency is a requirement for a configuration of another package
that satisfies a particular spec. The dependency can have *types*,
which determine *how* that packge configuration is required,
typo: packge
fixed
lib/spack/spack/dependency.py
Outdated
Args:
    pkg (type): Package that has this dependency
    spec (Spec): Spec indicating dependency requirements
    patches (list): list of patches to apply to this dependency
`__init__` signature and docs seem to disagree, as there's no `patches` argument to `__init__`.
fixed
This still appears to be out of sync
# wrapped function returns same result as original so
# that we can nest directives
return result
I see the nastiness here:
- nested directives are stripped from the argument list
- then the current directive is evaluated with the stripped list, like it was before
- the executor is returned so that inner directives like `patches=[patch(...)]` can be collected by the outer directive
lib/spack/spack/directives.py
Outdated
           else p for p in patches]
assert all(callable(p) for p in patches)

# this is where we actally add the dependency to this package
typo: actually
fixed
lib/spack/spack/package.py
Outdated
return dict(
    (name, conds) for name, conds in self.dependencies.items()
    if any(dt in self.dependencies[name][cond].type
           for cond in conds for dt in deptypes))
"""Get subset of the dependencies that *may* have certain types."""
?
Personally I think the `spack info` output should be reworked for clarity, but maybe not in this PR. Organizing the dependency data is all of a sudden harder. But yes, this comment conveys the message better. Fixed.
# apply patches to the dependency
for execute_patch in patches:
    execute_patch(dependency)
Am I right in saying that all the nested directive machinery exists to evaluate this single statement:

    depends_on(<spec>,
               patches=[patch(..., when=<cond>), ...],  # condition on dependency
               when=<cond>)

while still permitting the use of the word "patch" on the rhs of a named argument? Can't we just capture the arguments in a named tuple - or something similar - like:

    depends_on(<spec>,
               patches=[Patch(..., when=<cond>), ...],  # named tuple spack.directives.Patch, not spack.patch.Patch
               when=<cond>)

and use more direct logic within `_depends_on`? Am I missing some fundamental details here?
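A minimal sketch of that alternative (field names are hypothetical):

    from collections import namedtuple

    # plain data holder: _depends_on could unpack these directly,
    # with no nested-directive machinery required
    Patch = namedtuple('Patch', ['path_or_url', 'sha256', 'when'])

    # usage inside a package definition would then look like:
    #   depends_on('bar', patches=[Patch('bar-bugfix.patch', None, '@1.0:')])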
# Special-case: keeps variant values unique but ordered.
s.variants['patches'] = MultiValuedVariant('patches', ())
mvar = s.variants['patches']
mvar._value = mvar._original_value = tuple(dedupe(patches))
Here we are using an MV variant with a reserved name to keep track of patches, and correctly set it after concretization. I am curious why we didn't go with an additional `Spec` attribute and a modification to `Spec.to_node_dict`.
I'll have some initial comments on this by the end of today.
Force-pushed from 92d4daf to f225d47.
@alalazo: ok, all the suggestions are fixed and I added docs. @scheibelp FYI. Comments on @alalazo's more in-depth questions:

There's a little ambiguity in the term "nested" here. Clearing the queue handles calls that really are "nested", in the sense that the call happens inside another directive's body (like `patch()` inside `_depends_on`).

I don't think you're missing much. But I think there are good reasons to make them the same. Right now, if you know how to write a top-level `patch()` directive, you know how to write one nested in `depends_on()`. This also means there is only one piece of code to maintain for patch directives, and only one place to make a mistake. That's only partially true, since ...

So, the basic rationale here is that there is a lot of machinery already in place for multi-value variants: adding them, managing a list of attributes, displaying them in `spack info`.
This is mostly questions and documentation suggestions. I'm still trying to figure out the directive logic. I'll keep looking at this tomorrow.
lib/spack/spack/util/web.py
Outdated
@@ -328,3 +332,126 @@ def find_versions_of_archive(archive_urls, list_url=None, list_depth=0):
        continue

    return versions


def spider_checksums(url_dict, name, stage_function=None, keep_stage=False):
This appears to have been moved vs. edited. Is that correct? Why is it called `spider_checksums`? Spidering typically means to me building on an initial set to pull more results.
Yep -- just moved. Mainly because it spiders webpages.
I guess what I'm saying is that all this is doing is taking a list of versions, converting each version to a url, and then fetching from that list of URLs. There is no poking around inside each URL, which is what I normally think of when I think of spidering. I think `get_checksums_for_versions` would be more accurate and straightforward.
lib/spack/spack/util/web.py
Outdated
return version_lines


def fetch_remote_package_versions(pkg):
Recently the protobuf package had to override `Package.fetch_remote_versions` (see https://github.com/LLNL/spack/pull/5373); what are your thoughts on that?
Oh that's a good catch. We should probably move it back if there are actually packages that need custom logic. I can tweak that.
lib/spack/spack/dependency.py
Outdated
@property
def name(self):
    """Get the name of the package to be patched.
This docstring is confusing IMO. It would be fine to say "The name of the dependency package".
lib/spack/spack/fetch_strategy.py
Outdated
@@ -967,7 +969,7 @@ def from_list_url(pkg):
    the specified package's version."""
    if pkg.list_url:
        try:
-           versions = pkg.fetch_remote_versions()
+           versions = spack.util.web.fetch_remote_packge_versions(pkg)
Typo: fetch_remote_pack[a]ge_versions
self.url = path_or_url
self.md5 = kwargs.get('md5')

self.archive_sha256 = None
It looks like you handled swapping sha256 for md5 for a couple packages. Out of curiosity, was that all the packages using `UrlPatch`?
Yes -- that's the only place the checksum is needed. For patches in the filesystem, we compute the sha256 on demand.
# caller can return the directive if it should be queued.
def remove_directives(arg):
    directives = DirectiveMetaMixin._directives_to_be_executed
    if isinstance(arg, (list, tuple)):
This function is always initially called with a collection, so IMO it's clearer to replace everything from here to the `else` with `for arg in args:`.
Not sure I understand what you want here -- `args` is outside the function scope, and if an arg happens to be a collection, we still need to descend into it...
For the two cases where this function is used:

    remove_directives(args)
    remove_directives(kwargs.values())

in both cases a collection is being passed in. I meant to say: change the name of the parameter in this definition to `args` and then replace lines 206-210 with `for arg in args:`.
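A sketch of that suggested shape (assuming the rest of the original function body continues as in the snippet above):

    def remove_directives(args):
        directives = DirectiveMetaMixin._directives_to_be_executed
        for arg in args:
            if isinstance(arg, (list, tuple)):
                remove_directives(arg)    # descend into nested collections
            elif arg in directives:
                directives.remove(arg)    # drop the queued directive result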
Also, are there other examples of directives as arguments besides passing `patch` to `depends_on`? Either way, IMO it would make sense to describe that particular case in the explanatory comment.
if 'archive_sha256' not in kwargs:
    raise PatchDirectiveError(
        "Compressed patches require 'archive_sha256' "
        "and patch 'sha256' attributes: %s" % self.url)
Why is there a 'sha256' and an 'archive_sha256' attribute? It doesn't seem strictly required. Is it to alert the user that they should be creating a hash of the archive and not the decompressed patch? When there is a sha256 sum for the archive, why is the sha256 of the decompressed patch also required?
Yeah, this annoyed me as well, but I do not see a good way around it. Here's why:
1. This PR makes `spec.yaml` (and thus the DAG hash) include patches. So, we need a `sha256` (or some deterministic hash) for each patch, so that each `spec.yaml` represents one unique configuration in a consistent way.
2. We need to add patch hashes to the spec on or before the end of concretization, which is when specs are set in stone.
3. We need to verify archives before passing them to things like `tar` or `unzip`. If we do not verify, someone could potentially replace a download with a malformed tarball or zip file and exploit a vulnerability in the compression tools.

(3) is really why we need the `archive_sha256`. If we could just expand things safely and check only the patch file, we'd be fine with just the patch file's checksum.
We could make (2) work for compressed patches by fetching all the necessary patches before or during concretization. But that requires us to do remote operations that may fail (or may need a mirror), and I like the fact that you can currently reason about packages (i.e. concretize a spec) without anything but the spec and a repo. I think we should try to keep things that way.
You could think about approximating (1) by just using the hash of the compressed archive (which is pretty much unique), but there are issues with that. `gzip`, `tar`, etc. aren't deterministic, so if someone changes the way a patch is distributed (e.g. uses a different compression mechanism, or re-tars the same patch), the hash changes, and you won't have the same hash for the same patch over time.
Anyway, that's the rationale. I think compressed patches are infrequent (even if nwchem has an awful lot of them), and adding a little pain for compressed patches seems worth it to avoid a bunch of downloads being required before `concretize`.
Thoughts?
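To make the trade-off concrete, the two cases would look something like this in a package (URL and hashes are placeholders):

    # compressed patch: checksum of the archive (verified before decompression)
    # plus checksum of the patch contents (what goes into the spec hash)
    patch('https://example.com/fixes/bar-1.0.patch.gz',
          sha256='<sha256 of the uncompressed patch contents>',
          archive_sha256='<sha256 of the downloaded .gz archive>')

    # uncompressed patch: only the patch checksum is needed
    patch('https://example.com/fixes/bar-1.0.patch',
          sha256='<sha256 of the patch file>')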
If it's easier to implement this way and the majority of patches aren't affected by it, then that seems reasonable. That being said, while all of 1-3 make sense to me, I still don't see why these are both required and/or why both would need to be set. I was assuming that if we have a sha256 and the patch is an archive, the sha256 is for the archive. So I think that's another way of saying I agree with:

> You could think about approximating (1) by just using the hash of the compressed archive (which is pretty much unique)

As a side note: I'm unclear why the hash of the compressed archive would be less unique than the hash of the decompressed archive.
Regarding:

> there are issues with that. gzip, tar, etc. aren't deterministic, so if someone changes the way a patch is distributed (e.g. uses a different compression mechanism, or re-tars the same patch), the hash changes, and you won't have the same hash for the same patch over time.

I think it's fine to treat this the same as if they had edited the patch contents themselves. It would be no less noisy, because the archive's sha256 would be changing as often as this happens anyway (and the package must track the archive hash on account of 3).
lib/spack/spack/dependency.py
Outdated
"""Get the name of the package to be patched. | ||
|
||
This is needed so that a ``Dependency`` acts like a ``Package`` | ||
object when passed to a directive. |
When is a dependency object passed to a directive? I see cases where patches are passed to a directive.
I just deleted this part of the comment since I think it's unnecessary with your suggestion to just say "the name of the dependency package".
But `Dependency` objects are passed to the `patch()` directive inside `_depends_on`.
# Nasty, but it's the best way I can think of to avoid
# side effects if directive results are passed as args
remove_directives(args)
I'm trying to wrap my head around the directives logic and your updates to it. Could you provide an example of what goes wrong when this isn't done?
I feel like there's got to be a way to avoid undoing undesired modifications, but I admit I don't get what's going on here yet.
Sure. Say you do this:

    class Foo(Package):
        depends_on('bar', patches=[
            patch('bar-bugfix.patch'),
            patch('https://example.com/bar-feature.patch', sha256='abc123...')])
If you do not remove the results of the two `patch` calls from the queue here, they'll be lazily executed along with all the other directives in `DirectiveMetaMixin.__init__` (the metaclass initializer), because the `@directive` decorator adds them to `DirectiveMetaMixin._directives_to_be_executed`. The point of this is to only pass the result of `patch()` to the `depends_on()` directive, and let that directive handle it.
Basically, the point of all the extra directive machinery is that only directives at the class definition level get executed. The others are either nested (e.g., like `patch()` in `_depends_on`, which is only executed for its side effects) or they're passed as parameters, and we let the callee directive decide what to do with their results. The callee (`depends_on`) could execute the `_execute_patch` function objects itself, or it could conceivably enqueue them.
The evaluation of `patch` is passed to `depends_on` so that it may call `_execute_patch` with the `Dependency` created by `_depends_on`. So I'm confused by the line:

> The others are either nested (e.g., like patch() in _depends_on, which is only executed for its side effects) or they're passed as parameters, and we let the callee directive decide what to do with their results.

because `patch` in `depends_on` is an example of the latter as far as I can tell from examining the code, but you explicitly use it as an example of the former.
lib/spack/spack/package.py
Outdated
@@ -1044,7 +1090,11 @@ def do_patch(self):
except spack.multimethod.NoSuchMethodError:
    # We are running a multimethod without a default case.
    # If there's no default it means we don't need to patch.
    tty.msg("No patches needed for %s" % self.name)
if not patched:
    # if we didn't apply a patch, AND the patch function
Suggestion: "if we didn't apply a patch from a patch directive, AND..."
"apply a patch" is general and I had to read this several times to figure out what it meant.
Force-pushed from f225d47 to d5e5a17.
Ok @scheibelp: I pushed changes to address your review. I also replied to your questions, but the replies are on now-outdated commits, so I think you'll have to expand them above.
Force-pushed from 00f2603 to 05ea79d.
OK thanks for the responses. I have a few more comments and questions. I'm going to look at this again tomorrow but by this point I think I have a general understanding of the how and why of most of the changes here.
# Ignore any directives executed *within* top-level
# directives by clearing out the queue they're appended to
DirectiveMetaMixin._directives_to_be_executed = []
(EDIT: this was intended to add to https://github.com/LLNL/spack/pull/5476/files#r141277358)
The only example I see of this is the call to `patch` in `_depends_on` (which only seems to exist to support the syntactic sugar `depends_on('libelf', patches='foo.patch')`). Why not handle this the way the `extends` directive calls `_depends_on` vs. `depends_on`? In other words, I can see that the `remove_directives` logic below would be difficult to get rid of if users can pass directives as arguments, but this case looks like a problem we create for ourselves.
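For reference, the pattern being pointed to looks roughly like this (a sketch of the directives module's shape, not its exact code): `extends` skips the `depends_on` directive and calls the private `_depends_on` helper directly, so nothing extra ever lands in the directive queue.

    @directive('extendees')
    def extends(spec, **kwargs):
        def _execute_extends(pkg):
            when = kwargs.pop('when', True)
            _depends_on(pkg, spec, when=when)  # private helper; bypasses the queue
            pkg.extendees[spec] = (spack.spec.Spec(spec), kwargs)
        return _execute_extends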
Force-pushed from d6882e4 to 7cee03f.
Move deptype definitions to spack.dependency:
- Previously, dependencies and dependency_types were stored as separate dicts on Package.
- This means a package can only depend on another in one specific way, which is usually but not always true.
- Prior code unioned dependency types statically across dependencies on the same package.
- New code stores dependency relationships as their own object, with a spec constraint and a set of dependency types per relationship.
- Dependency types are now more precise.
- There is now room to add more information to dependency relationships.
- New Dependency class lives in dependency.py, along with deptype definitions that used to live in spack.spec.
Force-pushed from 7cee03f to 4c1dd80.
- move `spack.cmd.checksum.get_checksums` to `spack.util.web.spider_checksums`
- move `spack.error.NoNetworkError` to `spack.util.web.NoNetworkError` since it is only used there.
- Functions returned by directives were all called `_execute`, which made reading stack traces hard because you couldn't tell what directive a frame came from. Renamed them all to `_execute_<directive>`.
- Exceptions in directives were only really used in one or two places -- get rid of the boilerplate init functions and let the callsite specify the message.
- A package can depend on a special patched version of its dependencies.
- The `Spec` YAML (and therefore the hash) now includes the sha256 of the patch in the `Spec` YAML, which changes its hash.
- The special patched version will be built separately from a "vanilla" version of the same package.
- This allows packages to maintain patches on their dependencies without affecting either the dependency package or its dependents. This could previously be accomplished with special variants, but having to add variants means the hash of the dependency changes frequently when it really doesn't need to. This commit allows the hash to change *just* for dependencies that need patches.
- Patching dependencies shouldn't be the common case, but some packages (qmcpack, hpctoolkit, openspeedshop) do this kind of thing and it makes the code structure mirror maintenance responsibilities.
- Note that this commit means that adding or changing a patch on a package will change its hash. This is probably what *should* happen, but we haven't done it so far.
- Only applies to `patch()` directives; `package.py` files (and their `patch()` functions) are not hashed, but we'd like to do that in the future.
- The interface looks like this: `depends_on()` can optionally take a patch directive or a list of them:

      depends_on(<spec>, patches=patch(..., when=<cond>), when=<cond>)
      # or
      depends_on(<spec>,
                 patches=[patch(..., when=<cond>),
                          patch(..., when=<cond>)],
                 when=<cond>)

- Previously, the `patch()` directive only took an `md5` parameter. Now it only takes a `sha256` parameter. We restrict this because we want to be consistent about which hash is used in the `Spec`.
- A side effect of hashing patches is that *compressed* patches fetched from URLs now need *two* checksums: one for the downloaded archive and one for the content of the patch itself. Patches fetched uncompressed only need a checksum for the patch. Rationale:
  - we include the content of the *patch* in the spec hash, as that is the checksum we can do consistently for patches included in Spack's source and patches fetched remotely, both compressed and uncompressed.
  - we *still* need the checksum of the downloaded archive, because we want to verify the download *before* handing it off to tar, unzip, or another decompressor. Not doing so is a security risk and leaves users exposed to any arbitrary code execution vulnerabilities in compression tools.
Ok, this is updated with @scheibelp's suggestions and some bugfixes.
@naromero77: this is merged.
@tgamblin thanks
refers spack#5587, closes spack#5593

The implementation introduced in spack#5476 used `dedupe` to set a private member of a `MultiValuedVariant` object. `dedupe` is meant to give a stable deduplication of the items in a sequence, and returns an object whose value depends on the order of items in its argument. Now the private member is set implicitly using the associated property setter. The logic changes from stable deduplication to a totally ordered deduplication (i.e. instead of `dedupe(t)` it uses `sorted(set(t))`). This should also guarantee that packages that shuffle the same set of patch directives maintain the same hash.
else:
    mvar = dspec.spec.variants['patches']
    mvar._value = mvar._original_value = tuple(
        dedupe(list(mvar._value) + patches))
@tgamblin @scheibelp I'll write this comment here just to be sure I passed on the information. What we are doing in the line:

    mvar._value = mvar._original_value = tuple(dedupe(list(mvar._value) + patches))

is breaking an invariant of the class (the assumption that `self._value` is totally ordered). This assumption is transparent to the user of the class, and bypassing the property setter here is not very robust. In fact, any call to a method that does:

    self.value = self.value + other

can potentially reorder `self._value` (and lead to #5587?). An example of such a method may be, for instance, `MultiValuedVariant.constrain`.
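A toy illustration of that invariant (hypothetical class, just to show why bypassing the setter is fragile):

    class ToyMultiValuedVariant(object):
        def __init__(self):
            self._value = ()

        @property
        def value(self):
            return self._value

        @value.setter
        def value(self, v):
            # the setter is what enforces the total-ordering invariant
            self._value = tuple(sorted(set(v)))

    mvar = ToyMultiValuedVariant()
    mvar.value = ('b', 'a')    # goes through the setter -> ('a', 'b')
    mvar._value = ('b', 'a')   # bypasses the setter -> invariant silently broken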
@@ -1829,6 +1805,43 @@ def concretize(self):
    if s.namespace is None:
        s.namespace = spack.repo.repo_for_pkg(s.name).namespace

    # Add any patches from the package to the spec.
    patches = []
    for cond, patch_list in s.package_class.patches.items():
@tgamblin @scheibelp There's a thing I don't fully understand about the patch mechanism. The information a packager provides with a directive is stored in a dictionary. This means that the for loop above is not ordered, so we are assuming a weak ordering by constraints. To be more clear, if we have:

    patch('foo', when='@1')
    patch('another_foo', when='%gcc')
    patch('bar', when='@1')
    patch('another_bar', when='%gcc')
    patch('baz', when='+shared')

right now we'll be ensured that:
- `foo` is applied immediately before `bar`
- `another_foo` is applied immediately before `another_bar`
because they share the same `cond`, but we don't know whether the first patch is `foo`, `another_foo` or `baz`.
Would it make sense to relax this constraint and say that the patches must be commutative? If this is the case, then I think #5596 will solve #5587 after aggregating the two patches for `mpfr` which depend on their relative order of application.
If that's not the case, then I think we should go completely the other direction and ensure that the order of patches:
- reflects the order of directives, regardless of the condition
- enters the hash mechanism
Note that this is getting a lot trickier than before, since we can now inject patches. For example, if in a DAG we have 2 nodes A and B that both want to inject patches on C, we must ensure that A and B are always traversed in the same relative order in any graph where they are present.
Do the considerations above sound reasonable to you?
To update this: https://github.com/LLNL/spack/pull/5596 now preserves the weak ordering that currently exists on package patches (EDIT).
Note that this has some consequences: there are problems concretizing associated with the (implicit?) ordering.
fixes #5587

In trying to preserve patch ordering, #5476 made equality inconsistent for the added 'patches' variant. This commit maintains the original weak ordering of patch applications while preserving consistency of comparisons. The ordering DOES NOT enter the hashing mechanism. It's supposed to be a hotfix while we think of a cleaner and more permanent solution.
Thanks, this seems to have resolved my issue with PR #4907.
Fixes #7885.

#7193 added the `patches_to_apply` function to collect patches, which are then applied in `Package.do_patch`. However, this only collects patches that are associated with the `Package` object and does not include Spec-related patches (which are applied by dependents, added in #5476). `Spec.patches` already collects patches from the package as well as those applied by dependents, so the `Package.patches_to_apply` function isn't necessary. All uses of `Package.patches_to_apply` are replaced with `Package.spec.patches`.

This also updates `Package.content_hash` to require the associated spec to be concrete: `Spec.patches` is only set after concretization. Before this PR, it was possible for `Package.content_hash` to be valid before concretizing the associated Spec if all patches were associated with the Package (vs. being applied by dependents). This behavior was unreliable though, so the change is unlikely to be disruptive.
Addresses an issue that came up in #4907.
A package can now depend on a special patched version of its dependencies. `depends_on()` can optionally take a patch directive or a list of them.
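As the commit message above spells out, the interface looks like this:

    depends_on(<spec>, patches=patch(..., when=<cond>), when=<cond>)
    # or
    depends_on(<spec>,
               patches=[patch(..., when=<cond>),
                        patch(..., when=<cond>)],
               when=<cond>)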
The `Spec` YAML (and therefore the hash) now includes the sha256 of the patch in the `Spec` YAML, which changes its hash. The special patched version will be built separately from a "vanilla" version of the same package.
This allows packages to maintain patches on their dependencies without affecting either the dependency package or its dependents. This could previously be accomplished with special variants, but having to add variants means the hash of the dependency changes frequently when it really doesn't need to. This commit allows the hash to change just for dependencies that need patches.
Patching dependencies shouldn't be the common case, but some packages (qmcpack, hpctoolkit, openspeedshop) do this kind of thing and it makes the code structure mirror maintenance responsibilities.
This commit means that adding or changing a patch on a package will change its hash. This is probably what should happen, but we haven't done it so far. (@adamjstewart mentioned some ANL folks wanting this.)
This only applies to `patch()` directives; `package.py` files (and their `patch()` functions) are not hashed, but we'd like to do that in the future.
To implement these extensions to Spack's DSL, some fanciness was added to `directives.DirectiveMetaMixin`. In particular, when a directive is passed as an argument to another directive, its results won't be enqueued for lazy execution like a normal directive.
Previously, the `patch()` directive only took an `md5` parameter. Now it takes a `sha256` parameter. We restrict the hash used because we want to be consistent about which hash is used in the `Spec`.
A side effect of hashing patches is that compressed patches fetched from URLs now need two checksums: one for the downloaded archive and one for the content of the patch itself. Patches fetched uncompressed only need a checksum for the patch. Rationale:
- We include the content of the patch in the spec hash, as that is the checksum we can do consistently for patches included in Spack's source and patches fetched remotely, both compressed and uncompressed.
- We still need the checksum of the downloaded archive, because we want to verify the download before handing it off to tar, unzip, or another decompressor. Not doing so is a security risk and leaves users exposed to any arbitrary code execution vulnerabilities in compression tools.
This also refactors the way dependency metadata is stored on packages. Previously, dependencies and dependency_types were stored as separate dicts on Package. This means a package can only depend on another in one specific way, which is usually but not always true. Prior code unioned dependency types statically across dependencies on the same package. New code stores dependency relationships as their own object (a `Dependency`), with a spec constraint and a set of dependency types and patches per relationship.
The new `Dependency` class lives in `spack.dependency`, along with the deptype definitions that used to live in `spack.spec`.
Updated `nwchem` and `nauty` to use `sha256` for patches (@junghans).
Lines with long checksums are now ignored by `spack flake8`.
Documentation was added.
@naromero77: this is nearly done, and should let you continue to maintain your own patches on dependencies for `qmcpack`.
@jgalarowicz: you may be interested in this for `openspeedshop`.
@mwkrentel: you may be interested in this for `hpctoolkit`, too, as I know you maintain a lot of patches on your dependencies. This would allow you to maintain them with the `hpctoolkit` package without disrupting other packages.