
Cache control #5136

Open
Makman2 opened this issue Feb 7, 2018 · 10 comments
Labels: area/core · difficulty/medium · status/blocked (The issue requires other referenced issues/PRs to be solved/merged before being worked on)

Comments

Makman2 (Member) commented Feb 7, 2018

Once the NextGen-Core is implemented, we have many more possibilities for different cache operating modes, which shall be available as CLI arguments in coala.

  • --cache-strategy / --cache-protocol:
    Controls how coala manages caches for the next run.

    The following modes could be implemented:

    • none: Don't use a cache at all. Additionally, a shortcut flag --no-cache could be implemented, effectively meaning --cache-protocol=none.
    • primitive: Use a cache that grows indefinitely. All cache entries are stored for all following runs and aren't removed. Effective when recurrent changes happen often in coafiles and settings. Fastest in storing.
    • lri / last-recently-used: Cached items persist only until the next run.
      Stretch issue: Implement count parameters that allow controlling when to discard items from the cache, e.g. discard a cached item after 3 runs of coala without it being used. (A sketch of this eviction policy follows after this list.)

    My recommendation is to use lri as the default, as coala is mostly executed locally.

  • --clear-cache
    Clears the cache.

  • --export-cache / --import-cache
    Maybe useful for sharing caches. For example, a CI server runs coala for a project, and you can download the cache from there as an artifact to speed up your builds / coala runs.

  • --cache-compression
    Accepts as arguments:

    • none: No cache compression. This is the default.
    • Other values that specify common compression capabilities Python provides (for example lzma or gzip).
      Cache compression should be evaluated for effectiveness beforehand: because the cache will mainly store hashes, which usually aren't very redundant, the gain might be very low. The small performance penalty when loading the cache might not be worth a possibly very low reduction in cache size.
  • --optimize-cache
    Accept a small performance penalty when storing the cache in order to make cache loading faster. In particular, this feature shall utilize pickletools.optimize. But this is not exclusive to this flag.
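
To make the lri stretch idea concrete, here is a minimal sketch of such a run-count-based eviction policy. All names (LastRecentlyUsedCache, finish_run, max_unused_runs) are hypothetical illustrations, not actual coala API:

class LastRecentlyUsedCache:
    """Sketch of the lri policy with the stretch count parameter."""

    def __init__(self, max_unused_runs=1):
        self.max_unused_runs = max_unused_runs
        self._entries = {}      # key -> cached result
        self._unused_runs = {}  # key -> full runs since the last hit

    def get(self, key, default=None):
        if key in self._entries:
            self._unused_runs[key] = 0  # used in this run
            return self._entries[key]
        return default

    def put(self, key, value):
        self._entries[key] = value
        self._unused_runs[key] = 0

    def finish_run(self):
        # Call once at the end of each coala run: age every entry and
        # discard those that exceeded the allowed number of unused runs.
        # max_unused_runs=1 yields the plain "persist only until the
        # next run" behaviour described above.
        for key in list(self._entries):
            self._unused_runs[key] += 1
            if self._unused_runs[key] > self.max_unused_runs:
                del self._entries[key]
                del self._unused_runs[key]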

Makman2 added the status/blocked (The issue requires other referenced issues/PRs to be solved/merged before being worked on) label on Feb 7, 2018
palash25 (Member) commented Mar 9, 2018

@Makman2 With reference to cache compression, what I gather from your description is that we need to know whether compression would be a good idea before implementing it (i.e. projects requiring large disk space will need to cache their results). For this we could have a small piece of code that determines the size of the repository and only initializes compression if the repo is above a minimum size threshold (which we will need to decide, but I'm guessing it might be hundreds of MBs or a few GBs for large projects). Correct me if I'm wrong.

palash25 (Member) commented Mar 9, 2018

I am also curious about import/export cache: will this be a separate module, e.g. something like CacheTransfer.py, which will make GET and POST requests to different CI servers like Travis and Circle using their respective APIs and the requests library?

palash25 (Member) commented Mar 9, 2018

I think this issue can be a part of the caching/performance optimization project.

Makman2 (Member, Author) commented Mar 10, 2018

> Cache compression should be evaluated for effectiveness beforehand: because the cache will mainly store hashes, which usually aren't very redundant, the gain might be very low. The small performance penalty when loading the cache might not be worth a possibly very low reduction in cache size.

The point is that we don't compress source files, but binary data which might not have much redundancy (actually the cache currently also contains files, but this shall change, as explained below). If the data has no redundancy, compression is too ineffective, and maintaining compression features would be useless.
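
To illustrate the kind of evaluation meant here, this is a small, self-contained experiment (assuming the cache is roughly a pickled mapping of hash digests, as described in this thread) that compares raw, pickletools-optimized, and compressed sizes:

import gzip
import hashlib
import lzma
import pickle
import pickletools

# Simulate a cache of 10000 hash -> hash entries.
cache = {hashlib.sha256(str(i).encode()).digest():
         hashlib.sha256(str(-i).encode()).digest()
         for i in range(10000)}

raw = pickle.dumps(cache, protocol=pickle.HIGHEST_PROTOCOL)
optimized = pickletools.optimize(raw)  # what --optimize-cache would apply

for name, data in (('raw', raw),
                   ('optimized', optimized),
                   ('gzip', gzip.compress(raw)),
                   ('lzma', lzma.compress(raw))):
    print(name, len(data), 'bytes')

If the compressed sizes come out close to the raw size, that would confirm the "very low gain" concern above.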

> I am also curious about import/export cache: will this be a separate module, e.g. something like CacheTransfer.py, which will make GET and POST requests to different CI servers like Travis and Circle using their respective APIs and the requests library?

No no no :) It's just that I want to be able to do coala --export-cache coacache.cache (and the same for importing) to have a reliable interface for transferring caches. It could be that for now this boils down to a simple copy command (copying the coala cache file out of some coala-specific location).
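
A minimal sketch of that "simple copy command" reading: the cache path below is an assumption for illustration, not the real coala location.

import os
import shutil

# Assumption: a stand-in for "some coala-specific location", NOT the
# actual path coala uses.
INTERNAL_CACHE_FILE = os.path.expanduser('~/.cache/coala/cache.pickle')

def export_cache(destination):
    # What "coala --export-cache <destination>" could boil down to.
    shutil.copyfile(INTERNAL_CACHE_FILE, destination)

def import_cache(source):
    # What "coala --import-cache <source>" could boil down to.
    shutil.copyfile(source, INTERNAL_CACHE_FILE)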

The idea with the CI is just a possible use case (which also has to be investigated). Consider a very large project which generates a 100MB cache (that's already quite insane), where the coala analysis has taken 2h. So that new developers can speed up their runs, they would just download this file, which the CI is configured to offer as an artifact. They do coala --import-cache ..., and their initial coala run no longer takes 2h. Or consider the CI build cache itself: instead of requiring a clean coala run each time, we would just cache coala's cache file inside our builds, and the next build restores it. Consecutive CI builds take way less time.

> I think this issue can be a part of the caching/performance optimization project.

Yes.

> (actually the cache currently also contains files, but this shall change, as explained below)

So, about caching again: the new core caches the task objects emitted by the bears. These task objects are effectively just the arguments to the analyze function packed into a tuple. As you recall, local bears (now called FileBear) have the following signature:

def analyze(self, filename, file, ...):
    ...

The argument file contains the complete file contents directly, and thus they are saved into the cache. This is something I want to avoid in the future by using a "file-factory" or "file-proxy", which is just some interface for reading files. It will implement proper cache-saving methods to reduce storage requirements by not including the file contents themselves (but just the name, timestamps, etc.).
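
A rough sketch of what such a file-proxy could look like; the class name and all method choices here are hypothetical, not the planned coala design:

import os

class FileProxy:
    """Hypothetical interface for reading a file whose pickled/cached
    form carries only metadata, never the file contents."""

    def __init__(self, filename):
        self.filename = filename
        self._content = None

    @property
    def content(self):
        # Load the contents lazily, only when a bear actually needs them.
        if self._content is None:
            with open(self.filename, encoding='utf-8') as fl:
                self._content = tuple(fl)
        return self._content

    def __getstate__(self):
        # Only the name ends up in the pickled cache, not the contents.
        return {'filename': self.filename}

    def __setstate__(self, state):
        self.filename = state['filename']
        self._content = None

    def __eq__(self, other):
        return (isinstance(other, FileProxy)
                and self.filename == other.filename)

    def __hash__(self):
        # Hash from name + timestamp + size, so task hashes change when
        # the file changes on disk and stale cache entries stop matching.
        stat = os.stat(self.filename)
        return hash((self.filename, stat.st_mtime, stat.st_size))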

palash25 (Member) commented Mar 21, 2018

@Makman2 I wanted to know more about the design of this. Could all these flags reside in a separate module (CacheModes.py) as decorator functions, so that we can then use these decorators wherever we need them in our codebase to cache data, like in Core.py? (This is the design that I am currently including in my proposal.)

Makman2 (Member, Author) commented Mar 21, 2018

I don't quite understand that^^ What do you want to cache like in Core.py?

palash25 (Member) commented Mar 21, 2018
I'm sorry that was poorly phrased.

I meant to ask whether the implementations of these flags will reside in a separate module or as functions in Core.py.

Makman2 (Member, Author) commented Mar 21, 2018

Separate module, but could be located inside core module (not the Core.py file itself).

Makman2 (Member, Author) commented Mar 21, 2018

Or even inside coalib.core.caching/coalib.core.cache or so
