Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add build artifact caching #67

Merged
merged 1 commit into from
Jul 17, 2016
Merged

Conversation

vmrob
Copy link
Contributor

@vmrob vmrob commented Jul 9, 2016

This patch introduces caching that can be made available with needy cache-path set [path] and acts as a key/value store to archive the contents of the build directory after builds. When the cache contains artifacts for a particular build, they are used instead of going through the build process.

@vmrob
Copy link
Contributor Author

vmrob commented Jul 9, 2016

Looking for some comments. I still need to write some tests to cover the new stuff, but this implementation currently works on my machine.

from .caches.directorycache import DirectoryCache


def __invoke_with_configured_needy(func, parameters):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to deduplicate repeated code here, but ugh. I'm not sure of a clean way to do this using the with blocks.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleaner way would be with another context manager. This is fine though. Not a big deal.

@vmrob
Copy link
Contributor Author

vmrob commented Jul 10, 2016

This PR is still a little bit WIP, but I think the structure is mostly good. RE garbage collection in the base class: I don't think it's applicable here unless we generalize it further with methods for enumerating keys and deleting keys. If that's the case, then GC in the base class essentially becomes something like...

def collect_garbage(self, max_age=timedelta(weeks=2)):
    for key in [k for k in self.keys() if k.age() > max_age]:
        self.expire_key(key)

I'm not saying that's a nice thing to have, I'm just not sure it's within the scope of this first iteration without knowing what we want out of other cache implementations.

So TODO:

  • Use transactions for keys
  • Tests
  • Better garbage collection?

ldflags emits the linker flags required to use the satisfied needs
builddir emits the build directory for a need
sourcedir emits the source directory for a need
pkg-config-path emits the pkg-config path for a need
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Why is this pkg-config-path stuff showing up in this diff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and you forgot to add that to the epilog before.

@vmrob vmrob changed the title Add build artifact caching WIP: Add build artifact caching Jul 11, 2016
@vmrob
Copy link
Contributor Author

vmrob commented Jul 11, 2016

Tests and better garbage collection are what remain. I'm not sure if the file locking works or not--about to write some tests for it.

@ccbrown
Copy link
Owner

ccbrown commented Jul 11, 2016

Notes as discussed in person:

  • Garbage collection can be built externally if transactions and an unset method are provided by the cache implementation. One possible implementation would be to maintain an index or manifest, and write last-use times to it.
  • Some potential cache configurables such as garbage collection policy should be set per-cache. Thus their configuration would need to be stored in the cache itself. This emphasizes the need for transactions.

@vmrob vmrob changed the title WIP: Add build artifact caching Add build artifact caching Jul 16, 2016
@vmrob vmrob force-pushed the directory-cache branch 5 times, most recently from 9dea642 to c69ab23 Compare July 16, 2016 10:05
logging.info('Storing build artifacts in cache')
try:
with self.__load_manifest() as m:
self.__cache.store_directory(directory, key, timeout=self.__lock_timeout)
Copy link
Owner

@ccbrown ccbrown Jul 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could reduce contention if you update the manifest, then release the lock before the store

with self.__load_manifest() as m:
    m.touch(key)
    self.__collect_garbage(m)
self.__cache.store_directory(directory, key, timeout=self.__lock_timeout)

for key in [k for k in manifest if manifest[k].use_time < min_time]:
try:
with self.__cache.lease(key, timeout=0):
self.__cache.unset_key(key)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if you invoke self.__cache.unset_key(key) without a lease?

@vmrob vmrob force-pushed the directory-cache branch 3 times, most recently from 7ea22f0 to 0d48fa2 Compare July 16, 2016 23:58

def set_cache_path(args=[]):
needs_dir = Needy.resolve_needs_directory('.')
default_cache_path = os.path.join(needs_dir, '.cache') if needs_dir else None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't currently have a use-case for a default cache in the needs directory, which is already a build cache. i'd leave off the default for now

@ccbrown
Copy link
Owner

ccbrown commented Jul 17, 2016

Okay so I'm gonna put two hard merge requirements on this:

  • Make Directory take normal arguments instead of a dictionary. I would consider ways to refactor the serialization / deserialization as well, but it's good enough for now.
  • I'd like to agree on a set of commands that takes into account future support for other cache types.

@vmrob
Copy link
Contributor Author

vmrob commented Jul 17, 2016

I'd like to agree on a set of commands that takes into account future support for other cache types.

I'm all ears. Best I can think of is what you've already suggested or just requiring a modification by hand if you want something other than a directory to be default. I'm totally ok requiring hand-crafted configuration files too.

I'll go ahead and make a needy.caching module tomorrow with cache, build_cache, and manifest. Are you sure about breaking the convention of sources.*, projects.*, and caches.* with needy.caching.directory_cache?

This patch introduces caching that can be made available with `needy cache set [path]` and acts as a key/value store to archive the contents of the build directory after builds. When the cache contains artifacts for a particular build, they are used instead of going through the build process.
@vmrob
Copy link
Contributor Author

vmrob commented Jul 17, 2016

Fixed the name issues in main, moved needy.manifest.Manifest to needy.cache.Manifest and moved the tests accordingly.

@ccbrown ccbrown merged commit 752f411 into ccbrown:master Jul 17, 2016
@ccbrown
Copy link
Owner

ccbrown commented Jul 17, 2016

Alright, let's do this.

@vmrob vmrob deleted the directory-cache branch July 18, 2016 03:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants