Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster virtuals and concretization #1015

Merged
merged 19 commits into from
Aug 9, 2016
Merged

Conversation

tgamblin
Copy link
Member

@tgamblin tgamblin commented Jun 2, 2016

This should fix #676.

@citibeth, @davydden, @alalazo, @becker33, @mplegendre and others will be interested in this. The changes here bring deal concretization down to around 3 seconds from around 20. dealii might be one of the most complicated packages in off-the-shelf spack. We have larger ones internally and so does @citibeth, so hopefully this will ease their life a bit.

Things in this PR (or related to it):

  • e8b4d5f (already in develop because it was a major time saver) fixes large amount of time spent in re.Scanner. A Lexer (implemented with re.Scanner), which compiles regexes, was being constructed every time we parse a spec, and that is expensive. Since spec parsing is a large part of evaluating directives like depends_on, this caused package.py import to take a long time. Adding a single lexer for all parsers fixes this.
  • Cache the ProviderIndex. We have to import all package.py files when we concretize specs that have virtual dependencies, because we have to find providers but we don't know which package.py files actually call provides. The ProviderIndex holds that information and is now cached per package repository and automatically updated when package.py files are newer than the index. It can also be updated incrementally, so updates after the first will not be very expensive.
  • Lazy evaluation of all_package_names() so that processing small specs doesn't even look at most package.py files.
  • Faster hash key function for compiler flags (FlagMap) -- uses tuples and doesn't manipulate strings. This saves a second or so for dealii.
  • To make the ProviderIndex compact, Spec.to_yaml won't write out empty fields in abstract specs. This should not affect things like spec.yaml or the install database, as those only contain concrete specs.
  • Allow ProviderIndex caches to be stored in ~/.spack/caches when $repo/index.yaml is not writable. This would let users cache providers even for multi-user spack repos.
  • Tests for the new ProviderIndex cache.

All of these commits attack the problem that concretization (and running many spack commands) require some fairly expensive operations to happen for every package.py file. This PR reduces the cost of these operations and avoids O(num packages) operations where it can. In the best case, only has to stat each package.py file instead of loading all of them and parsing specs. Generally just runningstat is pretty fast.

The concretization algorithm is still at least quadratic, but the DAGs are still small by modern standards in Spack, so this is not killing us yet.

@tgamblin tgamblin changed the title [WIP] Features/faster virtuals [WIP] Features/faster virtuals and concretization Jun 2, 2016
@citibeth
Copy link
Member

citibeth commented Jun 2, 2016

Is this ready to beta test in regular use?
On Jun 2, 2016 4:54 AM, "Todd Gamblin" notifications@github.com wrote:

@citibeth https://github.com/citibeth, @davydden
https://github.com/davydden, @alalazo https://github.com/alalazo,
@becker33 https://github.com/becker33, @mplegendre
https://github.com/mplegendre and others will be interested in this.
The changes here bring deal concretization down to around 3 seconds from
around 20. dealii might be one of the most complicated packages in
off-the-shelf spack. We have larger ones internally and so does @citibeth
https://github.com/citibeth, so hopefully this will ease their life a
bit.

Things in this PR (or related to it):

  • e8b4d5f
    e8b4d5f
    (already in develop) fixes large amount of time spent in re.Scanner. A
    Lexer (implemented with re.Scanner), which compiles regexes, was being
    constructed every time we parse a spec, and that is expensive. Since spec
    parsing is a large part of evaluating directives like depends_on, this
    caused package.py import to take a long time. Adding a single lexer
    for all parsers fixes this.
  • Cache the ProviderIndex. We have to import all package.py files
    when we concretize specs that have virtual dependencies, because we have to
    find providers but we don't know which package.py files actually call
    provides. The ProviderIndex holds that information and is now cached
    per package repository and automatically updated when package.py files
    are newer than the index. It can also be updated incrementally, so updates
    after the first will not be very expensive.
  • Lazy evaluation of all_package_names() so that processing small
    specs doesn't even look at most package.py files.
  • Faster hash key function for compiler flags (FlagMap) -- uses tuples
    and doesn't manipulate strings. This saves a second or so for dealii.
  • To make the ProviderIndex compact, Spec.to_yaml won't write out
    empty fields in abstract specs. This should not affect things like
    spec.yaml or the install database, as those only contain concrete
    specs.
  • [] Tests for the new ProviderIndex cache.
  • [] Allow ProviderIndex caches to be stored in ~/.spack/caches when
    $repo/index.yaml is not writable. This would let users cache providers
    even for multi-user spack repos.

I don't believe it's the actual algorithm that's hurting concretization

at the moment

You can view, comment on, or merge this pull request online at:

#1015
Commit Summary

  • Lazily evaluate all_package_names in repository.py
  • More compact YAML formatting for abstract specs.
  • Make ProviderIndex yaml-izable.
  • Make ProviderIndexes mergeable, so we can cache them per-repo.
  • Add a ProviderIndex cache.
  • Remove vestigial methods from Package.
  • Faster key in FlagMap._cmp_key
  • Fix namespace support in Repo.get_pkg_class()

File Changes

Patch Links:


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1015, or mute the thread
https://github.com/notifications/unsubscribe/AB1cdwk5SuLRA0RKdaohlTbO8WaYdjoSks5qHpougaJpZM4IsUcF
.

@tgamblin
Copy link
Member Author

tgamblin commented Jun 2, 2016

@citibeth I think so. The only reason I have not merged it yet is that I have not written unit tests for the ProviderIndex cache, and I want to ensure the cache works even if you can't read the repository directory.

If you want to beta test it, the only thing you would need to remember is that if things go wrong, remove $repo_root/index.yaml and $repo_root/lock. If you are just using the builtin spack repository, $repo_root is $spack_root/var/spack/repos/builtin.

@willelm
Copy link

willelm commented Jun 2, 2016

I've been frustrated by speed and eager to try the fix. I merged with 9411cb1, and I think there's an issue with python activation. Try building py-ipython and activating it. It builds fine but gives errors when I start it about readline services being unavailable. Readline is clearly a dependency of python in the spec, but doesn't seem to be followed.

@citibeth
Copy link
Member

I got the problem below. Maybe my merge was bad.

[rpfische@gs611-gibbs spack]$ spack install patchelf
Traceback (most recent call last):
  File "/home2/rpfische/spack/bin/spack", line 179, in <module>
    main()
  File "/home2/rpfische/spack/bin/spack", line 157, in main
    return_val = command(parser, args)
  File "/home2/rpfische/spack/lib/spack/spack/cmd/install.py", line 82, in install
    explicit=True)
  File "/home2/rpfische/spack/lib/spack/spack/package.py", line 870, in do_install
    if 'install' in install_phases and spack.install_layout.check_installed(self.spec):
  File "/home2/rpfische/spack/lib/spack/spack/directory_layout.py", line 272, in check_installed
    path = self.path_for_spec(spec)
  File "/home2/rpfische/spack/lib/spack/spack/directory_layout.py", line 138, in path_for_spec
    path = self.relative_path_for_spec(spec)
  File "/home2/rpfische/spack/lib/spack/spack/directory_layout.py", line 208, in relative_path_for_spec
    spec.dag_hash(self.hash_len))
  File "/home2/rpfische/spack/lib/spack/spack/spec.py", line 743, in dag_hash
    self.to_node_dict(), default_flow_style=True, width=sys.maxint)
  File "/home2/rpfische/spack/lib/spack/external/yaml/__init__.py", line 202, in dump
    return dump_all([data], stream, Dumper=Dumper, **kwds)
  File "/home2/rpfische/spack/lib/spack/external/yaml/__init__.py", line 190, in dump_all
    dumper.represent(data)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 28, in represent
    node = self.represent_data(data)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 57, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 223, in represent_dict
    return self.represent_mapping(u'tag:yaml.org,2002:map', data)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 123, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 57, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 223, in represent_dict
    return self.represent_mapping(u'tag:yaml.org,2002:map', data)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 123, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 41, in represent_data
    if self.ignore_aliases(data):
  File "/home2/rpfische/spack/lib/spack/external/yaml/representer.py", line 142, in ignore_aliases
    if data in [None, ()]:
  File "/home2/rpfische/spack/lib/spack/llnl/util/lang.py", line 240, in <lambda>
    setter('__eq__', lambda s,o: (s is o) or (o is not None and s._cmp_key() == o._cmp_key()))
AttributeError: 'tuple' object has no attribute '_cmp_key'

@citibeth
Copy link
Member

I've had success with a different merge. This PR has made my life with Spack so much better. I recommend it be merged without delay.

- Don't need to list all packages unless we have to.

- Only use the list of all packages for existence checks if we have
  generated it for some other purpose.
- Don't add empty/absent fields to Spec YAML when they're not there.
- allow a provider index to be stored and re-read.
- Spack will check if the index needs updating, and will only parse
  all package files if it does.

- Spack tries to parse as few package files as necessary.
- modules weren't set properly as attributes in parent modules
- global compiler cache breaks tests that come after this one.
- Make namespace, arch, and dependnecies show up in spec yaml
  only if they're set.

- Lost some of this functionality with deptypes
Major stuff:

- Created a FileCache for managing user cache files in Spack.  Currently just
  handles virtuals.

- Moved virtual cache from the repository to the home directory so that users do
  not need write access to Spack repositories to use them.

- Refactored `Transaction` class in `database.py` -- moved it to
  `LockTransaction` in `lock.py` and made it reusable by other classes.

Other additions:

- Added tests for file cache and transactions.

- Added a few more tests for database

- Fixed bug in DB where writes could happen even if exceptions were raised
  during a transaction.

- `spack uninstall` now attempts to repair the database when it discovers that a
  prefix doesn't exist but a DB record does.
@tgamblin
Copy link
Member Author

tgamblin commented Aug 9, 2016

This is ready to merge. I've cleaned up a number of changes, and caching is now done in ~/.spack so it doesn't require write access to Spack repos to work. I also simplified file locking a bit.

It's rebased on current develop, so the merge should be clean. Can @citibeth, @alalazo, @gartung, @davydden, @glennpj try this out and let me know if it helps?

@glennpj: I think your issue in #676 may be something different, but this is worth a try.

@tgamblin tgamblin changed the title [WIP] Features/faster virtuals and concretization Features/faster virtuals and concretization Aug 9, 2016
@tgamblin tgamblin changed the title Features/faster virtuals and concretization Faster virtuals and concretization Aug 9, 2016
@tgamblin tgamblin merged commit a095fd5 into develop Aug 9, 2016
@davydden
Copy link
Member

davydden commented Aug 9, 2016

@tgamblin thanks! sorry, did not have time to check it out, but ought to be fine.

@tgamblin tgamblin deleted the features/faster-virtuals branch October 11, 2016 20:02
olupton pushed a commit to olupton/spack that referenced this pull request Feb 7, 2022
Co-authored-by: Blanco Alonso Jorge <jblanco@bbpv2.epfl.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spack is Great, but too slow
4 participants