Cache should not be racy #49

Open
steveklabnik opened this issue Aug 2, 2023 · 4 comments

Comments

@steveklabnik
Collaborator

So in #42, one issue that comes up multiple times is cache misses.

Basically, `RegistryManager::get_package` is racy:

  1. look up package in cache, if it's there, return it
  2. if not, fetch the package
  3. insert package into cache

This works but has a race condition (though not a data race!). If we spin up two threads that are both looking for the same package, and thread 1 runs step 1, doesn't find it, and goes to step 2 to fetch, and then thread 2 starts and does the same, we've fetched the package twice. This manifests as a cache miss.
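The three steps above can be sketched like this. This is a minimal, hypothetical stand-in, not the real `RegistryManager` code: the `FETCH_COUNT` counter exists only so the double-download is observable.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Counts how many times we actually "fetch" so a double download shows up.
static FETCH_COUNT: AtomicUsize = AtomicUsize::new(0);

fn get_package(cache: &Mutex<HashMap<String, String>>, name: &str) -> String {
    // 1. look up package in cache; if it's there, return it
    if let Some(pkg) = cache.lock().unwrap().get(name) {
        return pkg.clone();
    }
    // 2. if not, fetch the package. The lock is NOT held here, so a second
    //    thread can also see a miss for the same name and fetch it again.
    FETCH_COUNT.fetch_add(1, Ordering::SeqCst);
    let pkg = format!("contents of {}", name); // stand-in for a network fetch
    // 3. insert package into cache
    cache.lock().unwrap().insert(name.to_string(), pkg.clone());
    pkg
}
```

Each step is individually synchronized, but the sequence as a whole isn't atomic: between steps 1 and 2 another thread can interleave and do the same work.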

The alternative is to make step 1 block in some fashion. We want it to hold some sort of lock, so that if someone else wants something we're already working on, we ask them to wait until it's ready. But there's more than one way to do this!

The first way, and the most obvious, is to put a mutex around the whole thing. That way, if someone is downloading a new package, you wait until they're done. The issue there is that you'd end up blocking everyone whenever any package is being downloaded, which kind of defeats the purpose of downloading them in parallel, haha.
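A sketch of that coarse-mutex approach (again hypothetical names, not the real code): it's correct, but the lock is held across the slow fetch, so every other caller blocks in the meantime.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

fn get_package_coarse(cache: &Mutex<HashMap<String, String>>, name: &str) -> String {
    // The guard lives until the end of the function, covering the fetch too.
    let mut cache = cache.lock().unwrap();
    if let Some(pkg) = cache.get(name) {
        return pkg.clone();
    }
    // Everyone else is blocked on the mutex while this runs, even callers
    // who want a completely different package.
    let pkg = format!("contents of {}", name); // imagine a slow network fetch here
    cache.insert(name.to_string(), pkg.clone());
    pkg
}
```

No double fetches are possible, because check-and-fetch happens under one lock acquisition; the cost is that all downloads are serialized.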

So my next thought is that we need to change the data modelling from "package exists or not" to "package doesn't exist, package is being downloaded, package is complete in the cache." This more properly represents what's going on in the real world. But it's not actually the best way forward. Why? Well, what happens when we have a cache hit? We want to simply return the package. If we know that some other task is already downloading the package, what we actually want is to return. We have no more work to do! Sure, it's not ready yet, but the other task will make sure it is.
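A minimal sketch of that tri-state model, with a hypothetical `CacheEntry` enum ("doesn't exist" is just the absence of a map entry, so the enum only needs the other two states). It includes the "return immediately if someone else is downloading" behavior described above, which a later comment revisits:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

enum CacheEntry {
    Downloading,
    Ready(String),
}

// Returns Some(pkg) on a hit (or after doing the fetch ourselves), and None
// if another task has already claimed the download.
fn get_package(cache: &Mutex<HashMap<String, CacheEntry>>, name: &str) -> Option<String> {
    let mut guard = cache.lock().unwrap();
    if let Some(entry) = guard.get(name) {
        return match entry {
            CacheEntry::Ready(pkg) => Some(pkg.clone()),
            // Someone else is downloading it; we have no more work to do.
            CacheEntry::Downloading => None,
        };
    }
    // Claim the slot while holding the lock, then fetch outside the lock so
    // callers wanting other packages aren't blocked behind this download.
    guard.insert(name.to_string(), CacheEntry::Downloading);
    drop(guard);
    let pkg = format!("contents of {}", name); // stand-in for the real fetch
    cache
        .lock()
        .unwrap()
        .insert(name.to_string(), CacheEntry::Ready(pkg.clone()));
    Some(pkg)
}
```

Note how this avoids both problems so far: the claim happens atomically under the lock (no double fetch), and the lock is dropped before the slow fetch (no global blocking).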

So, I think that's the task to fix this issue. I don't think it should be a super complex diff, but I'm not going to try and write a patch right this second.

@steveklabnik
Collaborator Author

@Boshen had asked this over there:

> For cache misses, I want to learn the proper approach as well. Do we put the data structure in a single thread and communicate with it via message passing, or do we use some mutexes and condvars as described in this blog post https://medium.com/@polyglot_factotum/rust-concurrency-patterns-condvars-and-locks-e278f18db74f?

I think both can work; it just depends. Well, rather: I would never use a condvar unless you're absolutely sure you need a condvar ;)

The basic question is "mutex or message passing?" and in my opinion, mutexes tend to be easier more often. It really just depends: if you want to do an actor-style "this task owns this data and responds to messages about it," that works fine too.
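For completeness, here's what the actor-style alternative looks like as a sketch (hypothetical `Msg` enum and `spawn_cache_actor` function, built on std channels): one thread owns the `HashMap` and everyone else talks to it through a channel, so no mutex is needed, because only the owning thread ever touches the map.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

enum Msg {
    Insert { name: String, pkg: String },
    Get { name: String, reply: mpsc::Sender<Option<String>> },
}

fn spawn_cache_actor() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // This thread is the sole owner of the cache.
        let mut cache: HashMap<String, String> = HashMap::new();
        for msg in rx {
            match msg {
                Msg::Insert { name, pkg } => {
                    cache.insert(name, pkg);
                }
                Msg::Get { name, reply } => {
                    // Ignore send errors: the requester may have gone away.
                    let _ = reply.send(cache.get(&name).cloned());
                }
            }
        }
    });
    tx
}
```

Because the actor processes messages one at a time, check-then-insert sequences can't interleave, which is exactly the property the racy version lacked.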

@steveklabnik
Collaborator Author

> We want to simply return the package. If we know that some other task is already downloading the package, what we actually want is to return. We have no more work to do! Sure it's not ready yet, but the other task will make sure it is.

Speaking with @anonrig in private messages: this won't actually work, because the caller can assume that once this returns, the package is downloaded, so work that depends on the package could begin before the task that's actually downloading it has finished.

So really what we're looking for is a sort of synchronization point, where for example things block on "everything is downloaded" rather than a bunch of individual "this is downloaded" responses. I feel like that's slightly poorly worded but I hope it makes sense.
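One shape that synchronization point could take, as a minimal sketch (the `download_all` function and its string "contents" are hypothetical): spawn one fetch per package, then join them all, so callers never ask "is X ready?" and instead wait at a single point after which everything is ready.

```rust
use std::thread;

fn download_all(names: &[&str]) -> Vec<String> {
    let handles: Vec<_> = names
        .iter()
        .map(|name| {
            let name = name.to_string();
            // Each download runs in parallel with the others.
            thread::spawn(move || format!("contents of {}", name)) // stand-in fetch
        })
        .collect();

    // The synchronization point: nothing past this line runs until every
    // download has finished.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The same shape works with async tasks and something like `join_all` instead of OS threads; the key design point is that "done" is a property of the whole batch, not of individual packages.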

@metcoder95

What about semaphores over individual packages?
Maybe I'm being too simplistic, but each package might be in three states: cached, downloading, pending; and each thread coordinates on what step follows.

Or has this been discarded already? 😅

@anonrig
Member

anonrig commented Aug 14, 2023

> Maybe I'm being too simplistic, but each package might be in three states: cached, downloading, pending; and each thread coordinates on what step follows.

I think this might be the best approach.
