
Local disk cache: part 1 #530

Open
wants to merge 3 commits into base: master

Conversation

ola-rozenfeld
Contributor

Introducing the diskcache package, CAS-only (Action Cache to be added in part 2).

Split out from #529, which has the whole thing, for reference.

for i := 0; i < 256; i++ {
_ = os.MkdirAll(filepath.Join(root, fmt.Sprintf("%02x", i)), os.ModePerm)
}
_ = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
@goodov goodov Feb 1, 2024

Can this be run in parallel for each 00-ff directory inside the for-loop above? 256 synchronous directory walks don't look great, especially when each directory will contain thousands of files.

DiskCache::New should be multithreaded and use as much of the available IO as possible. An FS walk on POSIX OSes may be pretty fast, but Windows is very slow when it comes to walking directories with thousands of files.
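
For illustration, a minimal sketch of a parallelized scan (the function name and visit callback are illustrative, not the PR's actual code):

package diskcache

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"sync"
)

// scanAllPrefixes walks the 256 prefix directories concurrently. The visit
// callback records each cached file and must be safe for concurrent use.
func scanAllPrefixes(root string, visit func(path string, d fs.DirEntry)) {
	var wg sync.WaitGroup
	for i := 0; i < 256; i++ {
		prefixDir := filepath.Join(root, fmt.Sprintf("%02x", i))
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = filepath.WalkDir(prefixDir, func(path string, d fs.DirEntry, err error) error {
				if err != nil || d.IsDir() {
					return nil // this sketch skips unreadable entries and directories
				}
				visit(path, d)
				return nil
			})
		}()
	}
	wg.Wait()
}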

Contributor Author

Yep, absolutely. Done.

return err
}
defer out.Close()
_, err = io.Copy(out, in)
@goodov goodov Feb 1, 2024

Can we move the file copy function into a separate source file (similar to atim_*) so it can be specialized for darwin and windows to make use of clonefile and the upcoming Windows DevDrive CoW?

clonefile preserves the file mode, so it can be a single call. We also don't need to do a manual buffer copy here; just let the OS perform the file copy and update the mode if required. It should be much faster.

CloneFile utils can be found here as an example:
https://github.com/git-lfs/git-lfs/blob/main/tools/util_darwin.go
https://github.com/git-lfs/git-lfs/blob/main/tools/util_linux.go
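
For reference, a hedged sketch of what a darwin-specific variant could look like, using unix.Clonefile from golang.org/x/sys/unix; the function name and the fallback path are assumptions, not part of this PR:

//go:build darwin

package diskcache

import (
	"io"
	"os"

	"golang.org/x/sys/unix"
)

// copyFileCoW attempts an APFS copy-on-write clone and falls back to a plain
// byte copy if cloning fails (e.g. non-APFS volume, cross-device copy).
func copyFileCoW(src, dst string) error {
	// Clonefile fails if dst already exists, so remove any stale file first.
	if err := os.Remove(dst); err != nil && !os.IsNotExist(err) {
		return err
	}
	if err := unix.Clonefile(src, dst, 0); err == nil {
		return nil // the clone preserves contents and file mode in a single call
	}
	// Fallback: regular copy.
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, in)
	return err
}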

Contributor Author

Great idea, thank you! I didn't know of clonefile. I renamed the system-specific files to sys_* and changed the header to match, so that this change can be nicely done in a follow-up PR. It is a very contained optimization which doesn't affect the cache layout, so it's backward-compatible, and hence imo it belongs in a follow-up.

@maruel maruel left a comment

Thanks for the implementation!

if err != nil {
return err
}
// Required sanity check: sometimes the copy pretends to succeed, but doesn't, if

Can you expand on that? That's news to me. Or do you mean in the general case of a race condition?

Contributor Author

In general, if we write to a file while concurrently deleting it, the Copy might not return an error (even if no bytes were actually copied), and neither will the Stat. But the Stat will return a different size than expected (usually 0).
But the comment made me notice that io.Copy actually returns the number of bytes copied, which seems to be correct, so I amended the sanity check to only verify the copy return value, which is faster.
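
The amended check could look roughly like this (a sketch with an illustrative helper name, not the PR's exact code):

package diskcache

import (
	"fmt"
	"io"
)

// copyWithSizeCheck sketches the amended sanity check: rely on io.Copy's byte
// count instead of a second Stat call.
func copyWithSizeCheck(dst io.Writer, src io.Reader, want int64) error {
	n, err := io.Copy(dst, src)
	if err != nil {
		return err
	}
	if n != want {
		// The source was likely truncated or deleted concurrently; treat as a miss.
		return fmt.Errorf("copied %d bytes, expected %d", n, want)
	}
	return nil
}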

}
it := iUntyped.(*qitem)
it.mu.RLock()
if err := copyFile(d.getPath(k), path, dg.Size); err != nil {

This simplifies the control flow:

it.mu.RLock()
err := copyFile(d.getPath(k), path, dg.Size)
it.mu.RUnlock()
if err != nil {

Contributor Author

Yep, thank you.

prefixDir := filepath.Join(root, fmt.Sprintf("%02x", i))
go func() {
defer wg.Done()
_ = os.MkdirAll(prefixDir, os.ModePerm)

There are extreme cases where this could fail (e.g. disk full). I'd recommend using https://pkg.go.dev/golang.org/x/sync/errgroup#WithContext for error management instead of treating it as a silent failure.
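
A hedged sketch of that suggestion, applied to the prefix-directory creation shown above (makePrefixDirs is an illustrative name):

package diskcache

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sync/errgroup"
)

// makePrefixDirs creates the 256 prefix directories concurrently and returns
// the first error (e.g. disk full) instead of discarding it. Sketch only.
func makePrefixDirs(ctx context.Context, root string) error {
	eg, _ := errgroup.WithContext(ctx) // the derived context matters once walking starts
	for i := 0; i < 256; i++ {
		prefixDir := filepath.Join(root, fmt.Sprintf("%02x", i))
		eg.Go(func() error {
			return os.MkdirAll(prefixDir, os.ModePerm)
		})
	}
	return eg.Wait()
}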

Contributor Author

Yeah, basically my main problem here was that I didn't want to return an error from New, because that would mean that the client Opt can return an error on Apply, and the code currently doesn't support that.

To be fair, that is the right thing to do, imo -- options can, in general, fail. So I'd propose a separate independent PR to add an error return type to Apply, and then I can propagate all the errors properly from here. @mrahs does that SGTY?

Another idea (pushing the problem further down the line) would be to return an error from New now, but then log + ignore it in Apply. Which is what I'll do now, until @mrahs weighs in with whether he's okay with me changing the Apply type.

Thank you!

Collaborator

I'd push error handling onto the user by making the client.Opt accept a *DiskCache instance.
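
A rough sketch of that shape, with stub types standing in for the SDK's real client.Client and Opt (everything here is hypothetical, including the diskCache field):

package clientopt

// Stub types standing in for the SDK's real client.Client, client.Opt and
// diskcache.DiskCache; only the shape of the suggestion matters here.
type DiskCache struct{}

type Client struct {
	diskCache *DiskCache // hypothetical field
}

// DiskCacheOpt carries an already-constructed cache, so any error from
// diskcache.New is handled by the caller before options are applied.
type DiskCacheOpt struct {
	Cache *DiskCache
}

// Apply can stay infallible because construction errors were reported earlier,
// roughly:
//   dc, err := diskcache.New(ctx, root, maxCapacityBytes)
//   if err != nil { /* handle */ }
//   client.NewClient(ctx, instance, params, &DiskCacheOpt{Cache: dc})
func (o *DiskCacheOpt) Apply(c *Client) {
	c.diskCache = o.Cache
}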


func (d *DiskCache) StoreCas(dg digest.Digest, path string) error {
if dg.Size > int64(d.maxCapacityBytes) {
return fmt.Errorf("blob size %d exceeds DiskCache capacity %d", dg.Size, d.maxCapacityBytes)

A more graceful behavior is to try to trigger a synchronous GC first before erroring out.

Contributor Author

Why? This will fail for sure regardless of GC. Note that we compare with maxCapacityBytes, not the current sizeBytes.

}
it.mu.Lock()
defer it.mu.Unlock()
_, exists := d.store.LoadOrStore(it.key, it)

d.store is already synchronized by it.mu. I'd vote to make store a normal map[digest]*qitem instead; it would be faster.

Contributor Author

Oh, no, store cannot be synchronized by it.mu because it.mu is new (that would be a race condition). It could, in theory, be a normal map synchronized by mu instead (the DiskCache class variable that we use to protect queue), but that would be slower than using a sync map:

The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.

(from https://pkg.go.dev/sync#Map)

We have a classic case 1 here, where a key is almost always written once but read many times from concurrent threads. A sync map should greatly reduce lock contention.
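
Roughly, the write-once/read-many pattern being described (a simplified, self-contained sketch; the real qitem and key types in the PR carry more fields):

package diskcache

import "sync"

type key struct{ hash string }

type qitem struct {
	key key
	mu  sync.RWMutex // guards the file contents on disk, not the map itself
}

type storeSketch struct {
	store sync.Map // key -> *qitem
}

// getOrAdd stores a fresh item only if the key is new. Readers take the
// lock-free fast path of sync.Map on every subsequent access, which is the
// "written once, read many times" case the sync docs describe.
func (s *storeSketch) getOrAdd(k key) (it *qitem, existed bool) {
	actual, loaded := s.store.LoadOrStore(k, &qitem{key: k})
	return actual.(*qitem), loaded
}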


// This will return correct values only if `fsutil behavior set disablelastaccess 0` is set.
// Tracking of last access time is disabled by default on Windows.
func GetLastAccessTime(path string) (time.Time, error) {

I'd recommend instead func FileInfoToAccessTime(i *os.FileInfo) time.Time and keep the os.Stat() call at the call site.
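
For context, a sketch of what the Linux flavor of such a helper could look like (other OSes need their own build-tagged file; this is not necessarily the PR's exact code):

//go:build linux

package diskcache

import (
	"io/fs"
	"syscall"
	"time"
)

// fileInfoToAccessTime extracts the atime already carried by the FileInfo,
// avoiding a second stat call. Linux flavor; darwin/windows use different
// Sys() types and need their own build-tagged files.
func fileInfoToAccessTime(info fs.FileInfo) time.Time {
	st := info.Sys().(*syscall.Stat_t)
	return time.Unix(st.Atim.Unix())
}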

Contributor Author

Done.

}

// If the digest exists in the disk cache, copy the file contents to the given path.
func (d *DiskCache) LoadCas(dg digest.Digest, path string) bool {

I'd propose renaming LoadCas to CopyFromCache.


I realize the sister function is called StoreCas. I don't know if this nomenclature is reused in this code base; if so, ignore this comment.

Contributor Author

Yeah, Load* and Store* are consistent with sync.Map's naming; we also sometimes use Get/Update. I'll leave the naming decisions to @mrahs.

@mrahs mrahs (Collaborator) Feb 23, 2024

I think the cache can have three main methods: putting bytes, getting bytes, copying bytes to a file path. Since the cache deals with bytes, I'm inclined towards Write(key, Reader), Read(key, Writer), Copy(key, filepath), but Store, Load, LoadInto/CopyTo also sound reasonable. I made another comment on generalizing this method which would avoid having to choose suffixes.
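
Spelled out as an interface, the naming being discussed might look like this (a sketch assuming the SDK's digest package; nothing here is settled in the thread):

package diskcache

import (
	"io"

	"github.com/bazelbuild/remote-apis-sdks/go/pkg/digest"
)

// Cache is a naming sketch of the three operations discussed above; the
// method set and signatures are illustrative, not the PR's final API.
type Cache interface {
	// Write stores the reader's bytes under dg. Note: the cache does not
	// verify that the bytes actually hash to dg.
	Write(dg digest.Digest, r io.Reader) error
	// Read streams the cached bytes for dg into w.
	Read(dg digest.Digest, w io.Writer) error
	// Copy materializes the cached bytes for dg at the given file path.
	Copy(dg digest.Digest, path string) error
}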

}()
}
wg.Wait()
go res.gc()

I think gc should only be called if len(res.queue) > X && res.sizeBytes > Y && res.queue[0].lat < time.Now().Sub(Z) to reduce unnecessary processing.

Contributor Author

Not sure I understand -- gc is a daemon thread; it blocks on the GC request channel (gcReq) and only works if there is something to do (total size is over capacity). And we only send a GC request to the channel if we reach capacity. The current line only starts the daemon thread. Did you mean changes to the gc thread itself, or to the places where a GC request is added to the channel?
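
For readers following along, the daemon pattern being described is roughly the following (a simplified, self-contained sketch; field and method names are illustrative, not the PR's exact code):

package diskcache

// gcSketch illustrates the pattern: a long-lived goroutine that blocks on a
// request channel and evicts only when asked.
type gcSketch struct {
	gcReq chan bool
	evict func() // stand-in for the real eviction logic
}

func (g *gcSketch) gc() {
	for range g.gcReq { // blocks until a store signals that capacity was exceeded
		g.evict()
	}
}

// requestGC is called by writers after the total size crosses capacity.
// The non-blocking send coalesces requests while a GC is already pending.
func (g *gcSketch) requestGC() {
	select {
	case g.gcReq <- true:
	default:
	}
}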

@ola-rozenfeld force-pushed the ola-disk-cache-1 branch 4 times, most recently from 16916b3 to 7b8156c on February 11, 2024 at 17:25
goodov commented Feb 13, 2024

I'm a little concerned about the whole "scan this huge dir on startup" process. In Chromium, Reclient is auto-started/stopped during each autoninja invocation. This is different from how Goma worked: there was a separate goma_ctl ensure_start command that kept the proxy in the background until the user terminated it manually, which is why re-reading the local disk cache wasn't an issue.

Does this mean reproxy startup (and the autoninja call in general) will be delayed by the overall time to scan the "local cache" directory? Or is this a separate process right now? @ola-rozenfeld, can you please do some tests around this area?

I can think of these options to improve the situation:

  1. Run the "local cache" scan in background and just "fail" cache lookups until it's ready (maybe with some separate status). This is not perfect, because the scan will be even slower when the actual build starts.
  2. Use some kind of a simple "database" file that can be hot-loaded at startup and then be updated/synced in the background (a proto file, for example, or maybe some Go-specific serialization).

@maruel maruel left a comment


I agree and was about to suggest the same thing. In isolated, I used the second option. A very simple JSON file worked fine, really, just keeping the objects in an LRU list. Then a priority queue is not needed; just mutate the slice in place. Go is surprisingly fast at this; I would be surprised if it were not faster than using a priority queue. One lock instead of many.

The isolated code soft-failed when a file was missing / corrupted on fetch.
Only "once in a while" would you scan the whole directory to make sure there's no forgotten files.

"time"
)

func FileInfoToAccessTime(info fs.FileInfo) time.Time {

I'd vote to not export this function.

}
}
// Randomly access and store files from different threads.
eg, _ := errgroup.WithContext(context.Background())

If there's no context, use eg := errgroup.Group{}

return filepath.WalkDir(prefixDir, func(path string, d fs.DirEntry, err error) error {
// We log and continue on all errors, because cache read errors are not critical.
if err != nil {
return fmt.Errorf("error reading cache directory: %v", err)

s/%v/%w/g ?

if err != nil {
return fmt.Errorf("error parsing cached file name %s: %v", path, err)
}
atime, err := getLastAccessTime(path)

It's likely already cached in d.Info(); I'd inline this function here to save one stat per file.

return uint64(atomic.LoadInt64(&d.sizeBytes))
}

func (d *DiskCache) getKeyFromFileName(fname string) (key, error) {

This doesn't need to be a member function.

testGcTicks chan uint64
}

func New(ctx context.Context, root string, maxCapacityBytes uint64) (*DiskCache, error) {
Collaborator

I think walking the disk on initialization is going to introduce a performance hit proportional to the size of the cache. I like the idea of persisting the state to a file and reloading that single file on startup. This has two main advantages:

  • cheaper startup cost
  • access times are persisted as known by the cache, regardless of the nuances of the underlying filesystem (e.g. may be mounted with noatime)

return filepath.Join(d.root, k.digest.Hash[:2], fmt.Sprintf("%s.%d", k.digest.Hash[2:], k.digest.Size))
}

func (d *DiskCache) StoreCas(dg digest.Digest, path string) error {
Collaborator

I think generalizing this method to Write(dg digest.Digest, io.Reader) error provides several advantages.

We should also document that the cache does not enforce any relationship between dg and the reader's bytes.

return nil
}

func (d *DiskCache) gc() {
Collaborator

I think this implementation may have the following issues:

  • if the capacity is not sufficient for the workload, eviction becomes part of every update.
  • having three different locks being acquired and released independently is prone to races; e.g. attempting to load an item after it's been deleted from the ordered list but before it's been deleted from the map will fail to update the index in the list.
  • the client must remember to Shutdown the cache to avoid resource leaks.

The capacity issue can be mitigated by evicting a percentage of the cache rather than just enough to meet the threshold. This accommodates several new items before triggering another eviction.

The synchronization issue can be traded off against a simpler implementation that uses a single lock. Updating the list and the map should be an atomic operation, in which case the entire list is shared and therefore is the bottleneck. Actually, hits are bounded by IO anyway. Misses, on the other hand, could be faster with less locking. Then again, misses are always followed by updates, which are bounded by IO. I think IO is more of a bottleneck than synchronization, so it's worth going for a simple locking flow.

How bad would it be if eviction events were blocking? I think if eviction saturates IO bandwidth it should be the same whether it's async or sync. Maybe blocking would be faster if we consider disk locality. I also think that most users would prefer using a large cache to avoid eviction anyway. Without a background eviction task, the cache would be simpler to work with.
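
To make the single-lock, blocking-eviction suggestion concrete, here is a self-contained sketch that evicts down to a fraction of capacity under one mutex (all names and the 90% target are illustrative):

package diskcache

import (
	"container/list"
	"os"
	"sync"
)

// lruSketch illustrates the single-lock design discussed above: one mutex
// guards both the map and the LRU list, and eviction runs inline (blocking)
// whenever an add pushes the cache over capacity.
type lruSketch struct {
	mu        sync.Mutex
	byKey     map[string]*list.Element // key -> element whose Value is *entry
	order     *list.List               // front = most recently used
	sizeBytes int64
	capBytes  int64
}

type entry struct {
	key  string
	path string
	size int64
}

func newLRUSketch(capBytes int64) *lruSketch {
	return &lruSketch{
		byKey:    make(map[string]*list.Element),
		order:    list.New(),
		capBytes: capBytes,
	}
}

// add inserts a new entry and, if the cache is over capacity, evicts oldest
// entries down to 90% of capacity so several new items fit before the next
// eviction is triggered.
func (c *lruSketch) add(e *entry) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byKey[e.key] = c.order.PushFront(e)
	c.sizeBytes += e.size
	if c.sizeBytes <= c.capBytes {
		return nil
	}
	target := c.capBytes * 9 / 10
	for c.sizeBytes > target {
		oldest := c.order.Back()
		if oldest == nil {
			break
		}
		victim := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.byKey, victim.key)
		c.sizeBytes -= victim.size
		if err := os.Remove(victim.path); err != nil && !os.IsNotExist(err) {
			return err
		}
	}
	return nil
}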

wknapik commented Apr 5, 2024

*bump*

@ola-rozenfeld
Contributor Author

bump

Yes, I'm really sorry for disappearing -- I've had a very hectic few weeks and didn't have time for this. This effort is something I'm doing voluntarily in my spare time, and it is not currently prioritized by EngFlow. I will be back to working on this in the next week or two!

wknapik commented Apr 11, 2024

Thanks @ola-rozenfeld, much appreciated
