
Local disk cache: part 1 #530

Open
wants to merge 3 commits into base: master

Conversation

ola-rozenfeld
Contributor

Introducing the diskcache package, CAS-only (Action Cache to be added in part 2).

Split out from #529, which has the whole thing, for reference.

for i := 0; i < 256; i++ {
_ = os.MkdirAll(filepath.Join(root, fmt.Sprintf("%02x", i)), os.ModePerm)
}
_ = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
@goodov goodov Feb 1, 2024

Can this be run in parallel for each 00-ff directory inside the for-loop above? 256 synchronous directory walks don't look great, especially when each directory will contain thousands of files.

DiskCache::New should be multithreaded and use as much of the available IO as possible. An FS walk on POSIX OSes may be pretty fast, but Windows is very slow when it comes to walking directories with thousands of files.
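
For illustration, a minimal sketch of a parallelized scan (the function name and visit callback are illustrative, not the PR's actual code):

package diskcache

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"sync"
)

// scanAllPrefixes walks the 256 prefix directories concurrently. The visit
// callback records each cached file and must be safe for concurrent use.
func scanAllPrefixes(root string, visit func(path string, d fs.DirEntry)) {
	var wg sync.WaitGroup
	for i := 0; i < 256; i++ {
		prefixDir := filepath.Join(root, fmt.Sprintf("%02x", i))
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = filepath.WalkDir(prefixDir, func(path string, d fs.DirEntry, err error) error {
				if err != nil || d.IsDir() {
					return nil // this sketch skips unreadable entries and directories
				}
				visit(path, d)
				return nil
			})
		}()
	}
	wg.Wait()
}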

Contributor Author

Yep, absolutely. Done.

return err
}
defer out.Close()
_, err = io.Copy(out, in)
@goodov goodov Feb 1, 2024

Can we move the file copy function into a separate source file (similar to atim_*) so it can be specialized for darwin and windows to make use of clonefile and the upcoming Windows DevDrive CoW?

clonefile preserves the file mode, so it can be a single call. We also don't need to do a manual buffer copy here; just let the OS perform the file copy and update the mode if required. It should be much faster.

CloneFile utils can be found here as an example:
https://github.com/git-lfs/git-lfs/blob/main/tools/util_darwin.go
https://github.com/git-lfs/git-lfs/blob/main/tools/util_linux.go
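
For reference, a hedged sketch of what a darwin-specific variant could look like, using unix.Clonefile from golang.org/x/sys/unix; the function name and the fallback path are assumptions, not part of this PR:

//go:build darwin

package diskcache

import (
	"io"
	"os"

	"golang.org/x/sys/unix"
)

// copyFileCoW attempts an APFS copy-on-write clone and falls back to a plain
// byte copy if cloning fails (e.g. non-APFS volume, cross-device copy).
func copyFileCoW(src, dst string) error {
	// Clonefile fails if dst already exists, so remove any stale file first.
	if err := os.Remove(dst); err != nil && !os.IsNotExist(err) {
		return err
	}
	if err := unix.Clonefile(src, dst, 0); err == nil {
		return nil // the clone preserves contents and file mode in a single call
	}
	// Fallback: regular copy.
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, in)
	return err
}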

Contributor Author

Great idea, thank you! I didn't know of clonefile. I renamed the system-specific files to sys_* and changed the header to match, so that this change can be nicely done in a follow-up PR. It is a very contained optimization which doesn't affect the cache layout, so it's backward-compatible, and hence imo it belongs in a follow-up.

@maruel maruel left a comment

Thanks for the implementation!

if err != nil {
return err
}
// Required sanity check: sometimes the copy pretends to succeed, but doesn't, if

Can you expand on that? That's news to me. Or do you mean in the general case of a race condition?

Contributor Author

In general, if we write to a file while concurrently deleting it, the Copy might not return an error (even if no bytes were actually copied), and neither will the Stat. But the Stat will return a different size than expected (usually 0).
But the comment made me notice that io.Copy actually returns the number of bytes copied, which seems to be correct, so I amended the sanity check to only verify the copy return value, which is faster.
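
The amended check could look roughly like this (a sketch with an illustrative helper name, not the PR's exact code):

package diskcache

import (
	"fmt"
	"io"
)

// copyWithSizeCheck sketches the amended sanity check: rely on io.Copy's byte
// count instead of a second Stat call.
func copyWithSizeCheck(dst io.Writer, src io.Reader, want int64) error {
	n, err := io.Copy(dst, src)
	if err != nil {
		return err
	}
	if n != want {
		// The source was likely truncated or deleted concurrently; treat as a miss.
		return fmt.Errorf("copied %d bytes, expected %d", n, want)
	}
	return nil
}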

}
it := iUntyped.(*qitem)
it.mu.RLock()
if err := copyFile(d.getPath(k), path, dg.Size); err != nil {

This simplifies the control flow:

it.mu.RLock()
err := copyFile(d.getPath(k), path, dg.Size)
it.mu.RUnlock()
if err != nil {

Contributor Author

Yep, thank you.

prefixDir := filepath.Join(root, fmt.Sprintf("%02x", i))
go func() {
defer wg.Done()
_ = os.MkdirAll(prefixDir, os.ModePerm)

There are extreme cases where this could fail (e.g. disk full). I'd recommend using https://pkg.go.dev/golang.org/x/sync/errgroup#WithContext for error management instead of treating it as a silent failure.
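
A hedged sketch of that suggestion, applied to the prefix-directory creation shown above (makePrefixDirs is an illustrative name):

package diskcache

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sync/errgroup"
)

// makePrefixDirs creates the 256 prefix directories concurrently and returns
// the first error (e.g. disk full) instead of discarding it. Sketch only.
func makePrefixDirs(ctx context.Context, root string) error {
	eg, _ := errgroup.WithContext(ctx) // the derived context matters once walking starts
	for i := 0; i < 256; i++ {
		prefixDir := filepath.Join(root, fmt.Sprintf("%02x", i))
		eg.Go(func() error {
			return os.MkdirAll(prefixDir, os.ModePerm)
		})
	}
	return eg.Wait()
}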

Contributor Author

Yeah, basically my main problem here was that I didn't want to return an error from New, because that would mean that the client Opt can return an error on Apply, and the code currently doesn't support that.

To be fair, that is the right thing to do, imo -- options can, in general, fail. So I'd propose a separate independent PR to add an error return type to Apply, and then I can propagate all the errors properly from here. @mrahs does that SGTY?

Another idea (pushing the problem further down the line) would be to return an error from New now, but then log + ignore it in Apply. Which is what I'll do now, until @mrahs weighs in with whether he's okay with me changing the Apply type.

Thank you!

Collaborator

I'd push error handling onto the user by making the client.Opt accept a *DiskCache instance.
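
A rough sketch of that shape, with stub types standing in for the SDK's real client.Client and Opt (everything here is hypothetical, including the diskCache field):

package clientopt

// Stub types standing in for the SDK's real client.Client, client.Opt and
// diskcache.DiskCache; only the shape of the suggestion matters here.
type DiskCache struct{}

type Client struct {
	diskCache *DiskCache // hypothetical field
}

// DiskCacheOpt carries an already-constructed cache, so any error from
// diskcache.New is handled by the caller before options are applied.
type DiskCacheOpt struct {
	Cache *DiskCache
}

// Apply can stay infallible because construction errors were reported earlier,
// roughly:
//   dc, err := diskcache.New(ctx, root, maxCapacityBytes)
//   if err != nil { /* handle */ }
//   client.NewClient(ctx, instance, params, &DiskCacheOpt{Cache: dc})
func (o *DiskCacheOpt) Apply(c *Client) {
	c.diskCache = o.Cache
}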


func (d *DiskCache) StoreCas(dg digest.Digest, path string) error {
if dg.Size > int64(d.maxCapacityBytes) {
return fmt.Errorf("blob size %d exceeds DiskCache capacity %d", dg.Size, d.maxCapacityBytes)

A more graceful behavior is to try to trigger a synchronous GC first before erroring out.

Contributor Author

Why? This will fail for sure regardless of GC. Note that we compare with maxCapacityBytes, not the current sizeBytes.

}
it.mu.Lock()
defer it.mu.Unlock()
_, exists := d.store.LoadOrStore(it.key, it)

d.store is already synchronized by it.mu. I'd vote to make store a normal map[digest]*qitem instead; it would be faster.

Contributor Author

Oh, no, store cannot be synchronized by it.mu because it.mu is new (that would be a race condition). It could, in theory, be a normal map synchronized by mu instead (the DiskCache class variable that we use to protect queue), but that would be slower than using a sync map:

The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.

(from https://pkg.go.dev/sync#Map)

We have a classic case 1 here, where a key is almost always written once but read many times from concurrent threads. A sync map should greatly reduce lock contention.
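
Roughly, the write-once/read-many pattern being described (a simplified, self-contained sketch; the real qitem and key types in the PR carry more fields):

package diskcache

import "sync"

type key struct{ hash string }

type qitem struct {
	key key
	mu  sync.RWMutex // guards the file contents on disk, not the map itself
}

type storeSketch struct {
	store sync.Map // key -> *qitem
}

// getOrAdd stores a fresh item only if the key is new. Readers take the
// lock-free fast path of sync.Map on every subsequent access, which is the
// "written once, read many times" case the sync docs describe.
func (s *storeSketch) getOrAdd(k key) (it *qitem, existed bool) {
	actual, loaded := s.store.LoadOrStore(k, &qitem{key: k})
	return actual.(*qitem), loaded
}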


// This will return correct values only if `fsutil behavior set disablelastaccess 0` is set.
// Tracking of last access time is disabled by default on Windows.
func GetLastAccessTime(path string) (time.Time, error) {

I'd recommend instead func FileInfoToAccessTime(i *os.FileInfo) time.Time and keep the os.Stat() call at the call site.
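
For context, a sketch of what the Linux flavor of such a helper could look like (other OSes need their own build-tagged file; this is not necessarily the PR's exact code):

//go:build linux

package diskcache

import (
	"io/fs"
	"syscall"
	"time"
)

// fileInfoToAccessTime extracts the atime already carried by the FileInfo,
// avoiding a second stat call. Linux flavor; darwin/windows use different
// Sys() types and need their own build-tagged files.
func fileInfoToAccessTime(info fs.FileInfo) time.Time {
	st := info.Sys().(*syscall.Stat_t)
	return time.Unix(st.Atim.Unix())
}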

Contributor Author

Done.

}

// If the digest exists in the disk cache, copy the file contents to the given path.
func (d *DiskCache) LoadCas(dg digest.Digest, path string) bool {

I'd propose renaming LoadCas to CopyFromCache.


I realize the sister function is called StoreCas. I don't know if this nomenclature is reused in this code base; if so, ignore this comment.

Contributor Author

Yeah, Load* and Store* are consistent with sync.Map's naming; we also sometimes use Get/Update. I'll leave the naming decisions to @mrahs.

@mrahs mrahs (Collaborator) Feb 23, 2024

I think the cache can have three main methods: putting bytes, getting bytes, copying bytes to a file path. Since the cache deals with bytes, I'm inclined towards Write(key, Reader), Read(key, Writer), Copy(key, filepath), but Store, Load, LoadInto/CopyTo also sound reasonable. I made another comment on generalizing this method which would avoid having to choose suffixes.
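
Spelled out as an interface, the naming being discussed might look like this (a sketch assuming the SDK's digest package; nothing here is settled in the thread):

package diskcache

import (
	"io"

	"github.com/bazelbuild/remote-apis-sdks/go/pkg/digest"
)

// Cache is a naming sketch of the three operations discussed above; the
// method set and signatures are illustrative, not the PR's final API.
type Cache interface {
	// Write stores the reader's bytes under dg. Note: the cache does not
	// verify that the bytes actually hash to dg.
	Write(dg digest.Digest, r io.Reader) error
	// Read streams the cached bytes for dg into w.
	Read(dg digest.Digest, w io.Writer) error
	// Copy materializes the cached bytes for dg at the given file path.
	Copy(dg digest.Digest, path string) error
}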

}()
}
wg.Wait()
go res.gc()

I think gc should only be called if len(res.queue) > X && res.sizeBytes > Y && res.queue[0].lat < time.Now().Sub(Z) to reduce unnecessary processing.

Contributor Author

Not sure I understand -- gc is a daemon thread; it blocks on the GC request channel (gcReq) and only works if there is something to do (total size is over capacity). And we only send a GC request to the channel if we reach capacity. The current line only starts the daemon thread. Did you mean changes to the gc thread itself, or to the places where a GC request is added to the channel?
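
For readers following along, the daemon pattern being described is roughly the following (a simplified, self-contained sketch; field and method names are illustrative, not the PR's exact code):

package diskcache

// gcSketch illustrates the pattern: a long-lived goroutine that blocks on a
// request channel and evicts only when asked.
type gcSketch struct {
	gcReq chan bool
	evict func() // stand-in for the real eviction logic
}

func (g *gcSketch) gc() {
	for range g.gcReq { // blocks until a store signals that capacity was exceeded
		g.evict()
	}
}

// requestGC is called by writers after the total size crosses capacity.
// The non-blocking send coalesces requests while a GC is already pending.
func (g *gcSketch) requestGC() {
	select {
	case g.gcReq <- true:
	default:
	}
}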

@ola-rozenfeld force-pushed the ola-disk-cache-1 branch 4 times, most recently from 16916b3 to 7b8156c on February 11, 2024 at 17:25
goodov commented Feb 13, 2024

I'm a little concerned about the whole "scan this huge dir on startup" process. In Chromium, Reclient is auto-started/stopped during each autoninja invocation. This is different from how Goma worked: there was a separate goma_ctl ensure_start command that kept the proxy in the background until the user terminated it manually, which is why re-reading the local disk cache wasn't an issue.

Does this mean reproxy startup (and the autoninja call in general) will be delayed by the overall time to scan the "local cache" directory? Or is this a separate process right now? @ola-rozenfeld, can you please do some tests around this area?

I can think of these options to improve the situation:

  1. Run the "local cache" scan in background and just "fail" cache lookups until it's ready (maybe with some separate status). This is not perfect, because the scan will be even slower when the actual build starts.
  2. Use some kind of a simple "database" file that can be hot-loaded at startup and then be updated/synced in the background (a proto file, for example, or maybe some Go-specific serialization).

@maruel maruel left a comment


I agree and was about to suggest the same thing. In isolated, I used the second option. A very simple JSON file worked fine, really, just keeping the objects in an LRU list. Then a priority queue is not needed; just mutate the slice in place. Go is surprisingly fast at this; I would be surprised if it were not faster than using a priority queue. One lock instead of many.

The isolated code soft-failed when a file was missing / corrupted on fetch.
Only "once in a while" would you scan the whole directory to make sure there's no forgotten files.

"time"
)

func FileInfoToAccessTime(info fs.FileInfo) time.Time {

I'd vote to not export this function.

}
}
// Randomly access and store files from different threads.
eg, _ := errgroup.WithContext(context.Background())

If there's no context, use eg := errgroup.Group{}

return filepath.WalkDir(prefixDir, func(path string, d fs.DirEntry, err error) error {
// We log and continue on all errors, because cache read errors are not critical.
if err != nil {
return fmt.Errorf("error reading cache directory: %v", err)

s/%v/%w/g ?

if err != nil {
return fmt.Errorf("error parsing cached file name %s: %v", path, err)
}
atime, err := getLastAccessTime(path)

It's likely already cached in d.Info(); I'd inline this function here to save one stat per file.

return uint64(atomic.LoadInt64(&d.sizeBytes))
}

func (d *DiskCache) getKeyFromFileName(fname string) (key, error) {

This doesn't need to be a member function.

testGcTicks chan uint64
}

func New(ctx context.Context, root string, maxCapacityBytes uint64) (*DiskCache, error) {
Collaborator

I think walking the disk on initialization is going to introduce a performance hit proportional to the size of the cache. I like the idea of persisting the state to a file and reloading that single file on startup. This has two main advantages:

  • cheaper startup cost
  • access times are persisted as known by the cache, regardless of the nuances of the underlying filesystem (e.g. may be mounted with noatime)

return filepath.Join(d.root, k.digest.Hash[:2], fmt.Sprintf("%s.%d", k.digest.Hash[2:], k.digest.Size))
}

func (d *DiskCache) StoreCas(dg digest.Digest, path string) error {
Collaborator

I think generalizing this method to Write(dg digest.Digest, io.Reader) error provides several advantages.

We should also document that the cache does not enforce any relationship between dg and the reader's bytes.

return nil
}

func (d *DiskCache) gc() {
Collaborator

I think this implementation may have the following issues:

  • if the capacity is not sufficient for the workload, eviction becomes part of every update.
  • having three different locks being acquired and released independently is prone to races; e.g. attempting to load an item after it's been deleted from the ordered list but before it's been deleted from the map will fail to update the index in the list.
  • the client must remember to Shutdown the cache to avoid resource leaks.

The capacity issue can be mitigated by evicting a percentage of the cache rather than just enough to meet the threshold. This accommodates several new items before triggering another eviction.

The synchronization issue can be traded off against a simpler implementation that uses a single lock. Updating the list and the map should be an atomic operation, in which case the entire list is shared and therefore is the bottleneck. Actually, hits are bounded by IO anyway. Misses, on the other hand, could be faster with less locking. Then again, misses are always followed by updates, which are bounded by IO. I think IO is more of a bottleneck than synchronization, so it's worth going for a simple locking flow.

How bad would it be if eviction events were blocking? I think if eviction saturates IO bandwidth it should be the same whether it's async or sync. Maybe blocking would be faster if we consider disk locality. I also think that most users would prefer using a large cache to avoid eviction anyway. Without a background eviction task, the cache would be simpler to work with.
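
To make the single-lock, blocking-eviction suggestion concrete, here is a self-contained sketch that evicts down to a fraction of capacity under one mutex (all names and the 90% target are illustrative):

package diskcache

import (
	"container/list"
	"os"
	"sync"
)

// lruSketch illustrates the single-lock design discussed above: one mutex
// guards both the map and the LRU list, and eviction runs inline (blocking)
// whenever an add pushes the cache over capacity.
type lruSketch struct {
	mu        sync.Mutex
	byKey     map[string]*list.Element // key -> element whose Value is *entry
	order     *list.List               // front = most recently used
	sizeBytes int64
	capBytes  int64
}

type entry struct {
	key  string
	path string
	size int64
}

func newLRUSketch(capBytes int64) *lruSketch {
	return &lruSketch{
		byKey:    make(map[string]*list.Element),
		order:    list.New(),
		capBytes: capBytes,
	}
}

// add inserts a new entry and, if the cache is over capacity, evicts oldest
// entries down to 90% of capacity so several new items fit before the next
// eviction is triggered.
func (c *lruSketch) add(e *entry) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byKey[e.key] = c.order.PushFront(e)
	c.sizeBytes += e.size
	if c.sizeBytes <= c.capBytes {
		return nil
	}
	target := c.capBytes * 9 / 10
	for c.sizeBytes > target {
		oldest := c.order.Back()
		if oldest == nil {
			break
		}
		victim := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.byKey, victim.key)
		c.sizeBytes -= victim.size
		if err := os.Remove(victim.path); err != nil && !os.IsNotExist(err) {
			return err
		}
	}
	return nil
}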

wknapik commented Apr 5, 2024

*bump*

@ola-rozenfeld
Contributor Author

bump

Yes, I'm really sorry for disappearing -- I've had a very hectic few weeks and didn't have time for this. This effort is something I'm doing voluntarily in my spare time, and it is not currently prioritized by EngFlow. I will be back to working on this in the next week or two!

wknapik commented Apr 11, 2024

Thanks @ola-rozenfeld, much appreciated
