Memcached #58

afbjorklund · 2015-12-07T06:50:34Z

Support for memcached as a distributed cache for ccache

This is the rebased and squashed version of memcached-ccache, memcached-review and memcached-only. It adds two new options, memcached_conf and memcached_only (and a read_only_memcached).

Based on tardyp:memcached-ccache code from 2013 (Pull Request #30)

See http://memcached.org for the cache server (you will also need libmemcached)
Some getting started instructions can be found here: GETTING_STARTED.md

Update: The above instructions will install everything but on localhost only.
Here is how to continue after that, by using MULTIPLE_SERVERS.md

jrosdahl · 2015-12-22T22:02:26Z

Looks promising from a cursory glance!

I've started looking at the patch series now but I can't promise to be very responsive during christmas...

[PATCH 01/16] Add function lost in merge (imerge bug?)

Seems like the merge in f49770a
was faulty, in that it dropped the safe_create_wronly function ?

I don't remember, of course, but my guess is no. I think that I simply inlined the function since it only was used in one place and because the master branch at the time had started using mkstemp instead of relying on the home-grown tmp_string() name being unique. (The major point of git-imerge is to carefully take things step by step, so I doubt that I trusted git-imerge to do something without verifying the result.)

[PATCH 02/16] Add function to write to a file

I think that safe_write needs to be even safer to earn its name. :-) More specifically, it should handle EINTR (and EAGAIN while at it) like copy_fd does.

On the other hand:

[PATCH 04/16] Add memcached support, as a secondary cache

Using write_file to populate the ccache from memcached won't fly for several reasons:

If the system crashes in the middle of a copy, the cached file will likely be corrupt.
The files won't be compressed even if compression is requested in the ccache configuration.
Cache statistics won't get updated properly.

I think that we need a function that more or less does what put_file_in_cache already does but with a data buffer as the input.

+/* blob format for storing:
+

char magic[4]; # 'CCH1', might change for other version of ccache

ccache will erase the blob in memcached if wrong magic

Have you considered adding the magic to the key (as part of the string or as part of the hash) instead of the value? Then it wouldn't be necessary to verify/prune results with the wrong format. ccache does similar things for the object hash already; see for instance the first if statement in calculate_object_hash.

+memcached_suite() {

Since memcached support is opt-in, the memcached part of the test suite also needs to be opt-in in some way.

afbjorklund · 2015-12-23T12:35:41Z

Thanks for reviewing. Will take a look at the write functions, and rebase + squash the late bug fixes...

When it comes to magic, I think that was only for the memcached storage. It does use the main hash.
Currently we have three types of memcached entries, so it can be useful to verify that they are "sane".
We probably already need a second version of it, to be able to include cov and dwo entries in the cache. But they are supposed to be compressed before uploading, so copying on download should be OK.

afbjorklund · 2015-12-23T15:20:03Z

Did the rebase to "master" and the cleanup with regards to squashing bug fixes and commit messages.
The comments about write_file/put_file_in_cache and memcached_suite still needs addressing, though.

afbjorklund · 2016-01-07T20:03:38Z

Have rebased to master, and addressed the cppcheck style issues.
Also made the testsuites opt-in, as in Makefile created by configure.

Avoid writing truncated files in cache, in case of any errors. Since we are writing cached files, they're already compressed.

afbjorklund · 2016-01-08T21:43:17Z

@jrosdahl, all issues are addressed. Added a "put_data_in_cache".
Made sure to update stats, and made write_file use a temporary file.

jrosdahl · 2016-01-10T19:10:32Z

Looks good. Thanks! I have pushed the commits to a new dev/memcached branch along with some tweaks.

I think that it would be a good idea to ask on the mailing list for other testers/comments before merging it to master.

afbjorklund · 2016-01-10T19:54:24Z

Sounds good! I wrote some "getting started" instructions (above), how to build it.
What we need now is some better benchmarks to compare local/remote cache...

Maybe use the same code as in https://ccache.samba.org/performance.html,
or possibly some other similar newer code, on a handful of configured servers ?

cleeland · 2018-11-05T17:58:33Z

Here's what I would like to see:

Implement #218.

Does this really need to be fully implemented as a pre-requisite? Or would it be sufficient to create an abstraction that synthesizes access to the constituents parts of the file and leave the choice of single/multi-file to the abstraction?

jrosdahl · 2018-11-06T20:39:47Z

@afbjorklund writes:

It makes sense, but adds even more work before it can be merged...

Yes, I'm painfully aware...

It's similar to how the ccache configuration system is often one of the biggest "hurdles" to adding new functionality, even though it might be a better user experience in the end.

Not sure if I understand this. In what way is the configuration system a hurdle? Should we change something?

Anyway, I suppose we could using something like the Apache module system and let the user decide how they want to actually build it...

I'm not very familiar with the Apache module system, but I think that it would be overkill to be that flexible. I don't see a need for compiling backends statically into the main binary, for instance. Do you?

As we discussed earlier, something similar could be used for compression too. Currently it is very wired to zlib, and adding something like lz4 is still just another special case (similar to replacing/extending file storage with memcached storage). It makes for a lot of code branching and conditional compilation, and hard to change at runtime (different ccache).

I'm less interested in having support for different compression algorithms. I think that there is a clear difference between compression and storage: We can choose a single compression algorithm and ccache will work fine for everybody, but I don't think that we can make a single opininated choice for storage backends.

Will see if I can revive some of the old code, and see what the API would look like for such a backend abstraction.

Thank you.

I did multiple implementations, but only kept one of them - due to the added complexity...

Right. Even if we only ever will write one storage backend (memcached), I think that the code will benefit from a clean API instead of being sort of hardcoded.

Not sure that adding libmemcached-dev to the list of dependencies qualifies here, but OK.

Note that I'm talking about runtime dependencies, not buildtime dependencies. It is an installation dependency on libmemcached11 (not the -dev package) I want to avoid for the main ccache package (and linking statically with libmemcached is not a good option). The envisioned ccache-memcached package would only contain the memcached backend plugin, not the main program.

jrosdahl · 2018-11-06T21:15:37Z

@cleeland writes:

Does this really need to be fully implemented as a pre-requisite? Or would it be sufficient to create an abstraction that synthesizes access to the constituents parts of the file and leave the choice of single/multi-file to the abstraction?

My tentative idea about the backend API is that it more or less only should contain simple key-value storage primitives. I don't want backends to be aware of which parts a result comprises or how/if they are compressed, etc.

The current code stores several entries per result (.o, .stderr, .d, ...) on disk, so using a simple key-value API both for that and for memcached operations would mean that ccache would make several network calls per result, which isn't good. Compare this with @afbjorklund's implementation which combines all entries into one to avoid multiple memcached operations. This is why I thought that having #218 in place first would make things easier.

Does this make sense?

afbjorklund · 2018-11-06T21:25:38Z

For configuration, I just meant that you have to do the gperf and the rest of it. Which is slightly more work than just adding a getenv. But I think we got the hang of it now, and adding new configuration items in a separate commit (from the rest of the code) makes it manageable - and goes well with docs and tests...

The main reason for compiling it into the main binary is performance, but I guess it would be a good idea to actually measure the time it would take to dlopen it before jumping to any premature conclusions. But it will be loaded a lot of times, unlike a server program that would load once and then stay resident.

afbjorklund · 2018-11-06T21:30:47Z

We used the Apache runtime for the Ganglia modules, which is why I came to think of it...

e.g. https://github.com/ganglia/ganglia-modules-linux/blob/master/io/Makefile.am

Then again it also uses automake (while ccache does not), so maybe that also helped.

https://apr.apache.org/docs/apr/1.6/group__apr__dso.html (glib would also work)

jonnyyu · 2018-11-15T05:19:40Z

Thanks for discussing the possibilities to evolve this branch.

In the meantime, could you do a favor to merge master to this branch?
The branch is lag behind quite a few changes. I tried to resolve but there're too many conflicts I am not confident to resolve. thanks!

jrosdahl · 2018-11-20T20:16:51Z

In the meantime, could you do a favor to merge master to this branch?

I understand that you and others are using the dev/memcached branch in production, which makes me uncomfortable since the branch was meant for developing stuff, not to hold a production-ready and supported semi-permanent branch. I want to have freedom to make backward incompatible changes on any dev/* branch or just delete them and start over. At the same time I don't want to do you a disservice, but I also don't want to spend time on maintaining this branch since I have decided that I want to see another direction of the solution as described earlier. This solution will most likely be incompatible with the dev/memcached branch, and it will be done iteratively on master and/or other branches, while this branch will be deleted or left as it is until we are done cherry-picking code from the branch.

ccache version 3.4.2

ccache version 3.4.3

afbjorklund · 2019-02-03T20:28:33Z

As per the discussion above, I will change the old branch to only track 3.4-maint.
Hopefully this is "good enough" if you were using the old memcached in production...

The new branch dev/memcached will be deleted, and will have to be re-written first.
Will leave this PR (against 3.4), but that probably means that it will never get merged.

jimklimov · 2019-03-05T17:15:30Z

Hi, I found this work from earlier mailing list archives :) and wanted to ask if this or newer direction of shared ccache database index would fit our use-case: we have a build farm with many recyclable nodes (their local storage tends to disappear), and a shared NFS server for persistent data, including a .ccache directory used by the many build nodes.

Sometimes, the NFS access seems like a bottleneck (e.g. local work on the server, such as find listing of files in the index is from few times to an order of magnitude faster than from a build worker over NFS).

I wonder if memcached running on each node (the NFS server as well as builders) could be a faster solution than filesystem queries over NFS to find if an object with needed checksum exists, while writes to the .ccache directory would still be backed by the same NFS share so newly compiled and cached object files do persist across worker recyclings?

Hope the question makes sense :)

Also I wonder if this removal of (just part of) filesystem-bound work, would help avoid the issue we have sometimes, when a worker is abruptly killed, and it leaves stats.lock(....lock) files that effectively block and radically slow down subsequent ccache clients because these lock files never disappear, and subsequent ccache programs loop over the dozens of added .lock extensions until they give up... this is very annoying to say the least ;)

afbjorklund · 2019-03-05T19:59:31Z

@jimklimov : yes, this was/is the idea of the memcached backend. there is no current support for a mixed memcached/nfs store such as the one you are mentioning, instead other solutions such as couchbase were recommended for persistent cache. But it should be possible to do a home-grown solution, either by synching the local cache to external backup storage or by using a memcached-download script.

It also runs in a mixed mode, where files are still stored locally on each server - but pushed/pulled to the remote for usage by other servers. Instead of the alternative, without local disk - here called "memcached only" - where each server talks directly to the memcached cluster... It is also recommended to use a local proxy, to save each and every ccache from having to talk with servers on the network directly.

Reference:

There's no good docs on how to set up a cluster, just some sample Vagrantfile on multiple servers...

jrosdahl · 2019-05-20T19:40:22Z

Closing this PR now as it won't be merged in its current form. It will hopefully reappear in another form in the future.

afbjorklund · 2019-05-20T22:30:15Z

As noted above, this memcached branch was a slightly older (3.4.3) version than dev/memcached, which was updated to commit f771208 based on ccache v3.4.3-48-g554195a (v3.5~35) - before it was deleted.

Due to the new storage backend, the new memcached structure will probably not be compatible with the old memcached structure used here - but more likely to be based on the "aggregated" storage format of #411.

afbjorklund · 2021-05-25T19:01:19Z

I moved the code that used to be in this branch, into a separate ccache fork and into a separate binary:
(when requiring a special branch and a special ./configure, it was harder to build and harder to install)

https://github.com/afbjorklund/memccache

I think that the C++ version is better served by the new API, and the http and redis backends in PR #676
With couchbase no longer supporting memcached, I think those other alternatives are easier to support...

The old code (in C) will follow 3.7-maint.

…emcached port. This patch is not safe for WITH_CCACHE_BUILD support yet as that causes all ports to depend on devel/ccache. Enabling that patch would then cause the new devel/libmemcached dependency to require devel/ccache which is a cyclic dependency. The autoconf dependency also causes issues. Add a devel/ccache-memcached slave port that would allow a user to use the ccache+memcached package manually with ports without WITH_CCACHE_BUILD. This patch comes from ccache/ccache#58 and has been an ongoing effort over a few years to be merged into the mainline of ccache. Documenation for it can be found in the MANUAL file at: /usr/local/share/doc/ccache/MANUAL.txt Sponsored by: Dell EMC Isilon

Submitted to: ccache/ccache#58

tardyp mentioned this pull request Dec 11, 2015

Memcached ccache #30

Closed

afbjorklund force-pushed the memcached branch from a439672 to 4dc540e Compare December 23, 2015 15:17

afbjorklund added 15 commits January 7, 2016 18:11

Add function to create a file

24232d0

Add function to write to a file

cb156a7

Add memcached configuration

b87b1f6

Add memcached support, as a secondary cache

b1e4882

Allow values bigger than the memcached max

bf0204c

Also store manifest in memcached for direct lookup

db705bf

Include python helper scripts for memcached

764df52

Allow opt-out from memcached storage (get only)

aa44ad0

Add some memcached documentation to the manual

9812bc8

Mention moxi (memcached proxy) and couchbase

008b065

Add memcached_only configuration

58cb675

Add memcached_only, not using file system

cc91018

Avoid hitting memcached twice (direct and cpp)

d72a882

Add enable static build option

2670417

Address style issues flagged by cppcheck

8dbf369

afbjorklund force-pushed the memcached branch from 4dc540e to ec3d0b4 Compare January 7, 2016 20:04

afbjorklund added 4 commits January 8, 2016 22:12

Only run the memcached tests when enabled

1f699f7

Update stats when writing cache from memcached

270a467

Use temp file for write_file (like copy_file)

ccefd47

Avoid writing truncated files in cache, in case of any errors. Since we are writing cached files, they're already compressed.

Remove unused safe_create_wronly function

1cfcfb1

afbjorklund force-pushed the memcached branch from ec3d0b4 to 1cfcfb1 Compare January 8, 2016 21:43

jrosdahl mentioned this pull request Dec 3, 2018

merge master to dev/memcached #328

Closed

jrosdahl mentioned this pull request Feb 3, 2019

memcached branch corrupts dependency file #352

Closed

afbjorklund added 2 commits February 3, 2019 21:14

Merge tag 'v3.4.2' into memcached

af4d79b

ccache version 3.4.2

Merge tag 'v3.4.3' into memcached

9498101

ccache version 3.4.3

afbjorklund changed the base branch from master to 3.4-maint February 3, 2019 20:29

jrosdahl mentioned this pull request Apr 3, 2019

Implement memcached as backend #372

Closed

This was referenced May 1, 2019

Roadmap of upcoming major features #404

Closed

Design and implement a backend API and plugin framework #414

Closed

jrosdahl closed this May 20, 2019

shawnl mentioned this pull request Jun 12, 2019

Support kyototycoon backend #435

Closed

nim65s mentioned this pull request Feb 28, 2020

[CI] update gitlab CI stack-of-tasks/eigenpy#170

Merged

jrosdahl mentioned this pull request Jun 10, 2021

Storage backends: configuration format #855

Closed

jrosdahl added the closed: won't fix/implement/merge Will not fix/implement/merge label Nov 2, 2021

jrosdahl mentioned this pull request Nov 2, 2021

Visual C/C++ compiler support #506

Merged

6 tasks

jrosdahl mentioned this pull request Nov 13, 2022

Remote storage backend proxy #1212

Closed

svmhdvn pushed a commit to svmhdvn/freebsd-ports that referenced this pull request Jan 10, 2024

ccache-memcached: Fix crash due to shadowed manifest_name.

0717f70

Submitted to: ccache/ccache#58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memcached #58

Memcached #58

afbjorklund commented Dec 7, 2015 •

edited

jrosdahl commented Dec 22, 2015

ccache will erase the blob in memcached if wrong magic

afbjorklund commented Dec 23, 2015

afbjorklund commented Dec 23, 2015

afbjorklund commented Jan 7, 2016

afbjorklund commented Jan 8, 2016

jrosdahl commented Jan 10, 2016 •

edited

afbjorklund commented Jan 10, 2016

cleeland commented Nov 5, 2018

jrosdahl commented Nov 6, 2018

jrosdahl commented Nov 6, 2018

afbjorklund commented Nov 6, 2018

afbjorklund commented Nov 6, 2018 •

edited

jonnyyu commented Nov 15, 2018 •

edited

jrosdahl commented Nov 20, 2018

afbjorklund commented Feb 3, 2019

jimklimov commented Mar 5, 2019

afbjorklund commented Mar 5, 2019

jrosdahl commented May 20, 2019

afbjorklund commented May 20, 2019 •

edited

afbjorklund commented May 25, 2021

Memcached #58

Memcached #58

Conversation

afbjorklund commented Dec 7, 2015 • edited

Support for memcached as a distributed cache for ccache

jrosdahl commented Dec 22, 2015

ccache will erase the blob in memcached if wrong magic

afbjorklund commented Dec 23, 2015

afbjorklund commented Dec 23, 2015

afbjorklund commented Jan 7, 2016

afbjorklund commented Jan 8, 2016

jrosdahl commented Jan 10, 2016 • edited

afbjorklund commented Jan 10, 2016

cleeland commented Nov 5, 2018

jrosdahl commented Nov 6, 2018

jrosdahl commented Nov 6, 2018

afbjorklund commented Nov 6, 2018

afbjorklund commented Nov 6, 2018 • edited

jonnyyu commented Nov 15, 2018 • edited

jrosdahl commented Nov 20, 2018

afbjorklund commented Feb 3, 2019

jimklimov commented Mar 5, 2019

afbjorklund commented Mar 5, 2019

jrosdahl commented May 20, 2019

afbjorklund commented May 20, 2019 • edited

afbjorklund commented May 25, 2021

afbjorklund commented Dec 7, 2015 •

edited

jrosdahl commented Jan 10, 2016 •

edited

afbjorklund commented Nov 6, 2018 •

edited

jonnyyu commented Nov 15, 2018 •

edited

afbjorklund commented May 20, 2019 •

edited