
Spack should be more aggressive about reusing installed software #311

Closed
alfredo-gimenez opened this issue Jan 7, 2016 · 35 comments · Fixed by #25310
Labels: concretization, snl-atdm (Issues for SNL ATDM usage of spack)

@alfredo-gimenez

When I have python@2.7.10 installed:

(cab689):~$ spack find python
==> 1 installed packages.
-- chaos_5_x86_64_ib / gcc@4.9.2 --------------------------------
python@2.7.10

I now try to install py-twisted, explicitly setting the dependency ^python@2.7.10:

(cab689):~$ spack install -v py-twisted^python@2.7.10
==> Installing py-twisted
==> Installing python
==> bzip2 is already installed in /g/g22/gimenez1/src/spack/opt/spack/chaos_5_x86_64_ib/gcc-4.9.2/bzip2-1.0.6-wl4v7wdok42cfndertdgyxys2au2ljpz.
==> ncurses is already installed in /g/g22/gimenez1/src/spack/opt/spack/chaos_5_x86_64_ib/gcc-4.9.2/ncurses-6.0-2v7r63atwq6aw3p66bc3mkp7hxeoxgqx.
==> zlib is already installed in /g/g22/gimenez1/src/spack/opt/spack/chaos_5_x86_64_ib/gcc-4.9.2/zlib-1.2.8-mbw4kksfiiloopjcuqbwrktbxe7hq73x.
==> openssl is already installed in /g/g22/gimenez1/src/spack/opt/spack/chaos_5_x86_64_ib/gcc-4.9.2/openssl-1.0.2e-qs3iwf2rhwlck3qsyrlea7i7zbxluntg.
==> sqlite is already installed in /g/g22/gimenez1/src/spack/opt/spack/chaos_5_x86_64_ib/gcc-4.9.2/sqlite-3.8.5-2fhvbyidf72xkkazmqnng4ofp2z2hgxk.
==> readline is already installed in /g/g22/gimenez1/src/spack/opt/spack/chaos_5_x86_64_ib/gcc-4.9.2/readline-6.3-zclrirpahthnvxm2kj2qbz3rup6agcg5.
==> Already downloaded /g/g22/gimenez1/src/spack/var/spack/stage/python-2.7.10-4azwfxr6b6fddsanso7fgk5xivgdnffs/Python-2.7.10.tar.xz.

As you can see, spack tries to reinstall python 2.7.10.

I went ahead with the installation to see the dependency graph of the new python, here it is:

(cab689):~$ spack find -d python
==> 2 installed packages.
-- chaos_5_x86_64_ib / gcc@4.9.2 --------------------------------
    python@2.7.10
        ^bzip2@1.0.6
        ^ncurses@6.0
        ^openssl@1.0.2e
            ^zlib@1.2.8
        ^readline@6.3
            ^ncurses@6.0
        ^sqlite@3.8.5
    python@2.7.10
        ^bzip2@1.0.6
        ^ncurses@6.0
        ^openssl@1.0.2e
            ^zlib@1.2.8
        ^readline@6.3
            ^ncurses@6.0
        ^sqlite@3.8.5
        ^zlib@1.2.8

The newly installed python has ^zlib; everything else is the same. py-twisted does not, however, depend on python^zlib, so I'm not sure why Spack is adding this new dependency requirement. To verify that py-twisted is using the new python^zlib:

(cab689):~$ spack find -d py-twisted
==> 1 installed packages.
-- chaos_5_x86_64_ib / gcc@4.9.2 --------------------------------
    py-twisted@15.4.0
        ^py-setuptools@18.1
            ^python@2.7.10
                ^bzip2@1.0.6
                ^ncurses@6.0
                ^openssl@1.0.2e
                    ^zlib@1.2.8
                ^readline@6.3
                    ^ncurses@6.0
                ^sqlite@3.8.5
                ^zlib@1.2.8
        ^python@2.7.10
            ^bzip2@1.0.6
            ^ncurses@6.0
            ^openssl@1.0.2e
                ^zlib@1.2.8
            ^readline@6.3
                ^ncurses@6.0
            ^sqlite@3.8.5
            ^zlib@1.2.8

TL;DR: py-twisted should use the existing python, but Spack builds a new python^zlib for no apparent reason.

@tgamblin commented Jan 7, 2016

I think I need to put this on the agenda for the telecon today. Spack's current logic is:

  1. Concretize
  2. Match against existing installs.

This was intended to be conservative, but what it ends up doing is not reusing as much as it could. If we did more matching against existing installs as part of the concretization process, we would reuse more by default, at the price of sometimes linking against older software.

So I guess my question is: do you want Spack to "settle" for installed stuff more often? I think the answer for most people (like @trws) is "yes". I think we should change the default concretization policy to be to match installed first.

That will make installs less deterministic, i.e., your install order will affect what you link against. We should probably provide a command-line option to install a "clean-slate" version of a package, where we concretize without considering installed packages and rebuild anything that's not current. That would let you fall back to the current behavior when linking against an existing package doesn't work.
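
To make the two modes concrete, the command-line split being proposed might look something like this (the --clean-slate flag is purely hypothetical shorthand for the option described above, not an existing Spack option):

$ spack install py-twisted                 # proposed default: reuse the installed python@2.7.10 that already satisfies the spec
$ spack install --clean-slate py-twisted   # hypothetical: concretize without considering installed packages, i.e. today's behavior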

@alfredo-gimenez: Thoughts?

@davidbeckingsale

@tgamblin This has bitten me too with GCC. I think the settling for installed stuff is good, but maybe this needs to be controlled per package. For example, rebuilding something small with a new dependency is no big deal, but if it's something like gcc or llvm it would be better to stick with what's already installed.

@tgamblin commented Jan 7, 2016

@davidbeckingsale: It sounds like you'd be happy with the default being to take what is already installed. You could always do an install --aggressive or something to rebuild more stuff.

@alfredo-gimenez

I agree with @davidbeckingsale. Especially in a situation where I want multiple python libraries to all extend the same python, I'd prefer it to look for existing python.

The other weird thing is, python has zlib as a dependency, but spack first installed python with no zlib in the dependency graph...

@tgamblin commented Jan 7, 2016

@alfredo-gimenez: that happens when a package.py file changes and you do a git pull. Likely, you installed a prior version of the Python package that assumed a system zlib, and someone has since updated the python package.

This change is actually pretty easy to implement, so I think we can play with it. Do others have opinions on this? @eschnett, @alalazo, @nrichart, @mathstuf, @mplegendre?

@mathstuf commented Jan 7, 2016

The problem, as I see it, is that when a new dependency is added, existing installs don't have a flag for it and are treated as if they weren't built with it. As an example, adding a c++ flag to the GCC build to indicate that g++ is wanted would mean that existing builds "don't conform", since Spack would think they don't meet the requirement (even though a gcc build would already have g++). There are at least two ways to deal with this:

  • treat the previous missing dependency declaration as a bug and say "sorry"; or
  • when adding a new dependency link, have a field for what it means when an existing build doesn't have anything about its state (is this a flag to turn off a default-on bit or is it a flag to turn on some bit of previously unavailable functionality).

I prefer the first, simply because the second's maintenance burden is likely very high compared to the time it takes to just rebuild the software. (When can the declarations safely be dropped? Was the build dependent on some system state, say a system Qt that happened to be found, in which case there is no right answer?)

@eschnett commented Jan 7, 2016

I have openmpi installed, and whenever I install a package that requires mpi, spack begins to build mpich. So: yes please, spack should look at installed packages and variants.

@mathstuf commented Jan 7, 2016

I think this is different from taking a system version and making it a package (as far as Spack is concerned, at least). The process for that case would be "tell spack that this mpi I already have installed satisfies the "mpi" requirement any package might need" (maybe with some related information about a prefix or something else). For this case, it's as if we have a package frobnitz which happens to use MPI if it finds it on its own, but Spack never knew that it needed MPI in the first place. So an existing build has MPI if MPI was present at the time of that package's build and doesn't if it wasn't. I think what you want is indeed something Spack should support, but it is not the same problem.

@mplegendre

The work in #120 can handle the MPI case. It uses a packages.yaml config file to set "preferred" package configurations in Spack. You could, for example, specify that you're an openmpi@1.8.4 and icc@15 shop, and Spack will by default concretize with that MPI and compiler. You can also specify locations for external packages, which Spack will then use rather than building its own versions. By pointing an external package at your local MPI installation and setting that version as preferred, you could have default Spack builds always link against the local MPI.

This is independent of the original request to prefer already-installed packages. To first order we could prefer packages specified in packages.yaml, and to second order we could prefer packages that are already installed.
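
For reference, here is a minimal sketch of what such preferences and an external entry look like in packages.yaml (the paths and versions are illustrative, and the schema has evolved since #120, so take this as an approximation of the current syntax rather than the original proposal):

packages:
  all:
    providers:
      mpi: [openmpi]             # prefer openmpi whenever a package needs the virtual mpi
  openmpi:
    version: [1.8.4]             # preferred version when concretizing
    externals:
    - spec: openmpi@1.8.4
      prefix: /opt/openmpi-1.8.4 # hypothetical existing installation outside Spack
    buildable: false             # never build openmpi; always use the external above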

@tgamblin commented Jan 7, 2016

@mathstuf:
I think we have this one covered. It's your option 1, but instead of saying "sorry" we just conservatively rebuild. The specific semantics are implemented in spec.py#323. Basically, if a spec is concrete (i.e. it is already installed and all known variants were filled in when it was concretized), then absence of a variant is treated as unsatisfiable. If the spec is abstract, then absence is treated as satisfiable, because you could still constrain it to have that variant. If we went to a model where we looked at existing installs first, we would do #1 but rebuild instead of saying sorry. This seems right to me. I suppose we could warn the user in cases where we rebuild something that is close to an already installed version. Printing out why we rebuilt might actually be kind of cool.

"tell spack that this mpi I already have installed satisfies the "mpi" requirement any package might need" (maybe with some related information about a prefix or something else).

The externals stuff (#120), as @mplegendre mentioned, does this. That is getting merged along with the cray port PR #309.

I think what you want is indeed something spack should support, but this is not the same problem.

I think @eschnett meant he has OpenMPI installed via Spack. If OpenMPI is the first thing you install with Spack, it would be really nice if subsequent specs that require MPI just resolved against it.

@mplegendre

As a first-order we could prefer packages specified in the packages.py, and as a second-order we could prefer packages already installed.

Which to prefer gets a little complicated, and you might want it configurable. I think we should allow someone to decide how they rank their installed packages relative to their own and the site's packages.yaml concretization preferences. I could see people wanting both. If I'm the one deploying for everyone, I probably want to stick to the site stack as much as possible. If I'm building in my home directory, I might prefer my own packages to site concretization prefs (especially if those were bundled with Spack and not mine).

@eschnett commented Jan 7, 2016

@mathstuf Let me clarify: I have the spack package openmpi installed, and nevertheless spack wants to install the mpich package to satisfy the virtual mpi package requirement. Instead, it should be happy with openmpi.

@tgamblin commented Jan 7, 2016

Based on this, I think the consensus is to look at what is installed, and to add some type of precedence for file-based concretization preferences (once that is merged... I don't think anyone but me, @mplegendre, and @becker33 has seen that yet :)

@mplegendre

@tgamblin Allowing users to rank packages.yaml vs. site installs vs. user installs would get complicated. I think we should just pick a simple order and enforce it.

If a user explicitly specifies a package version as preferred in packages.yaml, then they probably already have that package installed or want to install it, so the first-order preference should be packages.yaml. Second order should be preferring local installs, and third order should be preferring site installs. These are all just defaults, so if Spack picks a default wrong, a user can always be more explicit about what they want.

@tgamblin commented Jan 7, 2016

@mplegendre: I think picking an order within a config scope makes sense. I was thinking more in terms of "what about site scope". My preference would be something like ~/.spack/packages.yaml > installed packages > $spack/etc/spack/packages.yaml. That way we can put sensible defaults in the Spack distro, which users can hard-override, but if a user likes a particular MPI better and installs it, they get the MPI they implicitly asked for.

@trws commented Jan 7, 2016

I'm coming in on this late, and it seems that the overall decision matches my preferences, but I would note that a couple of things are being conflated just a bit here, at least in terms of options. It only really matters in corner cases, but I think of these as separate:

  1. rebuild or not when a package of the correct version exists, with all specified variants matching, but with non-default values for unspecified variants
  2. rebuild or not when a new version of a direct dependency is available
  3. rebuild or not when a new version of an indirect dependency is available
  4. rebuild transitive and intermediate dependencies for which nearer packages meet all requirements of the current package but may not match updated global variant and package preferences (for example, when building a vim plugin that depends on vim, should you rebuild python with a new dependency because of an updated configuration setting?)

Cases 1 and 2 are, to my mind and seemingly in this discussion, pretty clearly things that we do not want to cause rebuilds by default. Presumably 3 and 4 are as well, but at least 2-4 probably warrant configuration options or control flags, because someone will want each of those behaviors. In gentoo-land, where as @tgamblin points out the names are often a bit odd, these roughly translate to:

  • --update (-u), for case 2: update this package and all direct dependencies
  • --deep (-D), for case 3: consider the entire dependency graph, including packages that are already available
  • --newuse (-N), for case 4: rebuild every package whose use flags (variants) have changed (by user addition or subtraction, or package-maintainer addition or subtraction) since the installed version was built

These can be used alone or combined to get pretty much every behavior I can think of, from nothing, which gives you "reuse everything", to my usual emerge flags, -uDN, for "I want this package and all of its direct and transitive dependencies to be their most up-to-date and variant-compliant versions."

This also reminds me: is there a way to refer to "all packages directly requested to be installed by the user"? It occurs to me because the standard "update everything please" in emerge (and ports, come to think of it) is something akin to emerge -uDN world, where world refers only to packages installed intentionally, excluding everything pulled in by dependency resolution.

@alalazo commented Jan 8, 2016

Due to the time zone I am also entering the discussion late. Besides agreeing with basically everything that has been said, I just want to add a consideration that maybe everyone considered implicit: it seems to me that most of the points in this discussion basically ask for more metadata associated with package entries in the db.

Example 1: the original python ^zlib issue

The original issue that started the discussion stems from the fact that a package.py file has been updated and, if I understood correctly, we don't store any kind of hash for the installation recipe yet. If we inspect local installs before concretization, we may very likely use outdated installed packages silently in cases like this. Adding a hash that covers only the package installation instructions (or something similar) would at least let us mark packages as up-to-date, out-of-date: package.py changed, or even out-of-date: dependency <name> out-of-date, and warn users about those situations. This goes in the direction of telling users why something fails (or just why something may be potentially dangerous).

Example 2: finer updates

@trws's consideration about being able to update every package explicitly installed by a user could be implemented by adding an explicit_request = <boolean> attribute to db entries.

Example 3: external repositories

The last example that comes to mind is external repositories. I still have to read the code, so my concerns may already be answered there, but in case they aren't: do we keep track of the provenance of an installed package? A use case that comes to mind is having, in an external repository, a custom site-specific version of a package that is also present in the built-in repository:

  1. I would like to be able to install simultaneously the 'vanilla' version along with the custom version present in my external repo
  2. I want to be warned at dependency resolution time if I removed the custom package.py from the external repository but left an installed version of it

@tgamblin mentioned this issue Jan 14, 2016
nrichart referenced this issue in scheibelp/spack Jan 15, 2016
@mathstuf

Is this basically solved given #839?

@adamjstewart

Not really. While #839 may solve this particular case with Python, the underlying problem is that if I install, let's say, hdf5+szip, and then I install netcdf, Spack will reinstall hdf5~szip because szip defaults to False. Spack does not currently take into account what is already installed. It decides exactly what should be installed, and then checks if that exact combination of versions and variants is already installed.
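
A concrete illustration of that behavior (specs are illustrative):

$ spack install hdf5+szip   # installs hdf5 with the szip variant enabled
$ spack install netcdf      # concretizes against hdf5~szip (the default), so hdf5 is rebuilt rather than reused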

@davydden commented Jul 18, 2016

@adamjstewart But I think this particular issue would be solved if #839 were implemented, assuming that Spack reuses build-only dependencies. I think that's what @mathstuf was referring to.

@adamjstewart

Yes, Python would no longer be re-installed I guess. But this problem occurs for every package in Spack, not just Python.

@alalazo added this to Needs triage in Most wanted via automation Jul 26, 2018
@alalazo changed the title from "Installing py-twisted, spack installs a new version of python with ^zlib" to "Spack should be more aggressive about reusing installed software" Jul 26, 2018
@alalazo moved this from Needs triage to High priority in Most wanted Jul 26, 2018
@sethrj commented Mar 27, 2019

I'm coming in really late to this discussion, but I've hit this issue a few times [1] and yesterday noticed a potentially useful feature in Homebrew that could possibly serve as an analog for Spack.

The brew pin command will prevent a package "from being upgraded when issuing the brew upgrade formula command". Perhaps something like this could be implemented similarly to (and perhaps using the same infrastructure as) the external buildable: false feature in packages.yaml?


[1] For example, I have Python set to use [3.7.2] in my packages.yaml file, but spack spec py-ipython shows (truncated here):

py-ipython@7.3.0%clang@9.0.0-apple
    ^py-appnope@0.1.0%clang@9.0.0-apple
        ^python@2.7.16%clang@9.0.0-apple

even though the only dependency of py-appnope is the PythonPackage class. Changing the spec to py-ipython^python@3.7.2 fixes the dependency. Also note that I only have python 3 installed, not python 2.
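
For context, the preference mentioned in [1] is set with something like the following packages.yaml entry (version value illustrative):

packages:
  python:
    version: [3.7.2]   # prefer this python when concretizing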

@healther

This should be "just" a bug in the concretiser or some other dependency restricts to python@2.7.X, the list in spack spec does not show all dependency edges, but only the "first" encountered


@jthies commented May 14, 2019

Hi,
this issue has been open for more than three years and summarizes my frustrations with Spack quite well. Suppose I want to quickly try out some high-level simulation code: I don't want Spack to reinstall MPI and its dependencies, hdf5, etc. I just want to type (say)

spack install --reuse

and have irrelevant variant settings in dependencies skipped when concretizing. To get such behavior right now I have to maintain an extensive packages.yaml, and it still happens all the time that openmpi is rebuilt because some irrelevant variant was added to a dependency. Are there any plans to include such an option in Spack?

@healther

Are you aware of the possibility to do spack install <simulation> ^/<hashofyouropenmpi>?
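
A minimal illustration of that hash-based reuse (the package name and hash are placeholders):

$ spack find -l openmpi                  # -l / --long shows the short hash of each installed spec
$ spack install my-simulation ^/abc1234  # reuse the exact openmpi whose hash starts with abc1234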

In general I don't really see the point, as it does not cost me time to do the rebuild; it costs my computer, and I don't care about its opinions too much. I think that is also the point of view of most of the main contributors (just judging by this issue being open for 3 years), so feel free to open a pull request that implements this functionality. I'd expect it to be merged relatively quickly, but I'd also expect it to take significant time to implement (as it touches the inner workings of the concretizer).

@jthies commented May 16, 2019

Thanks, the hash-based package selection looks useful; I didn't know about that. It still means I have to maintain the packages.yaml file carefully, though.

@adamjstewart

@healther I think most of the main contributors, including myself, would love to see Spack reuse installed dependencies as much as possible. But as you said, this completely changes how the concretizer works and would be a large overhaul, which is why this issue has been open for so long.

@jthies commented May 17, 2019

A non-intrusive approach would be to have a Spack command that generates (part of) a packages.yaml file prescribing a number of installed packages (or loaded modules).
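
As a stopgap, existing commands can feed such a file today, though turning the output into packages.yaml entries still has to be scripted by hand:

$ spack find -p   # lists every installed spec together with its install prefix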

@bartlettroscoe

CC: @fryeguy52, @becker33

Any chance this will get fixed, or that Spack will add an option to make it do this?

For SNL ATDM we need to break up the building of the compiler, the "tools" packages, MPI, and the TPL packages into four separate spack install commands, and Spack is rebuilding the same package with different hashes several times, even though there are no differences that I can see in the specs.

We already need to generate packages.yaml files to reuse the "tools" packages and MPI in the downstream TPL package builds. I guess we could script up the generation of entries in the packages.yaml files for these other libraries that get constantly rebuilt, like 'libiconv', 'numactl', etc. We have already created the infrastructure for doing this, so it would not be that hard. The only challenge is having to pin down the versions of all of these packages, since that is the only safe way to find install directories of the form:

spack/opt/spack/<arch>/<pkg-compiler-name>-<pkg-compiler-ver>/<pkg-name>-<pkg-ver>-<hash>

A priori we know all of this info except for the hash <hash>, so we do a:

$ ls -d spack/opt/spack/<arch>/<pkg-compiler-name>-<pkg-compiler-ver>/<pkg-name>-<pkg-ver>-*

Currently, that can return more than one directory because of this issue (so we just take the first directory found). But if we populate package_common_<pkg-compiler-name>-<pkg-compiler-ver>.yaml files with the list of packages that we know Spack rebuilds between these different sets of packages, then I think we can guarantee only a single install of each of these packages for each compiler <pkg-compiler-name>-<pkg-compiler-ver>. Since we have a closed set of packages that we need to install with Spack for SNL ATDM, that should be tractable. But it will add to our scripting code, which is already up to:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Bourne Shell                     2            195            153            710
YAML                             6              5             19            133
-------------------------------------------------------------------------------
SUM:                             8            200            172            843
-------------------------------------------------------------------------------

But 800+ lines of scripting is still less code than writing a package install system from scratch, so be it.

@becker33 commented Aug 8, 2019

@bartlettroscoe this is dependent on a rewrite of the concretization algorithm that Todd is currently working on. @fryeguy52 can tell you about the demo @tgamblin gave at the workshop this week. It's not fully functional yet, but it appears to be coming along well.

@bartlettroscoe added the snl-atdm label Nov 1, 2019
@jthies commented Jun 5, 2020

Any progress on this? After much frustration I adopted a multi-stage installation similar to what @bartlettroscoe described, but it would be a lot easier if I could tell spack to reuse installed specs rather than rebuild the latest version.

@tgamblin commented Aug 9, 2021

For folks still interested in this, check out #25310, which will finally allow the concretizer to aggressively reuse installed packages, packages from build caches, and packages from upstreams.
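
Once that lands, reuse can be requested at install time, roughly like this (flag per #25310; exact behavior may vary by release):

$ spack install --reuse py-twisted   # prefer already-installed specs, build caches, and upstreams over rebuilding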

Most wanted automation moved this from High priority to Closed Nov 5, 2021