Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re-introduce cache healing when we see an invalid cache entry #1707

Merged
merged 3 commits into from
Feb 19, 2024

Conversation

BurntSushi
Copy link
Member

@BurntSushi BurntSushi commented Feb 19, 2024

This PR introduces more robust cache healing when uv fails to
deserialize an existing cache entry.

("Cache healing" in this context means that if uv fails to
deserialize a cache entry, then it will automatically invalidate that
entry and re-generate the data. Typically by sending an HTTP request.)

Previous to some optimizations I made around deserialization, we were
already doing this. After those optimizations, deserializing a cache
policy and the payload were split into two steps. While deserializing
a cache policy retained its cache healing behavior, deserializing the
payload did not. This became an issue when #1556 landed, which changed
one of our rkyv data types. This in turn made our internal types
incompatible with existing cache entries. One could work-around this
by clearing uv's cache with uv clean, but we should just do it
automatically on a cache entry by entry basis.

This does technically introduce a new cost by pessimistically cloning
the HTTP request so that we can re-send it if necessary (see the commit
messages for the knot pushing me toward this approach). So I re-ran my
favorite ad-hoc benchmark:

$ hyperfine -w10 --runs 50 "uv-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "uv-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A bart
Benchmark 1: uv-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     114.4 ms ±   3.2 ms    [User: 149.4 ms, System: 221.5 ms]
  Range (min … max):   106.7 ms … 122.0 ms    50 runs

Benchmark 2: uv-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     114.0 ms ±   3.0 ms    [User: 146.0 ms, System: 223.3 ms]
  Range (min … max):   105.3 ms … 121.4 ms    50 runs

Summary
  uv-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.00 ± 0.04 times faster than uv-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null

Which is about what I expected.

We should endeavor to have a better testing strategy for these kinds of bugs, but I think it might be a little tricky to do. I created #1699 to track that.

Fixes #1571

Copy link
Member

@zanieb zanieb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Charlie might be a better reviewer though.

This is just moving code to make it easier to use response handling in a
different context. No behavior changes.
Basically, instead of it wrapping its data up inside an enum, we just
return the data itself. That way, the caller can control what is done
with it. Right now, all uses just wrap it into the enum anyway, so this
looks somewhat pointless, but we'll use this flexibility in the next
commit.
This change detects payload deserialization errors, emits a (non-user)
warning and sends a fresh request to get new data.

This self-healing existed before some of my zero-copy deserialization
work landed, but my changes ended up losing half of it. The half that
still remains is if the cache policy itself fails to deserialize (since
it is done separately from the payload). This commit now also does it if
the payload fails to deserialize.

Unfortunately, we do clone the request here pessimistically in case we
need to resend it. Indeed, we might end up sending two requests: a
re-validation request and then another fresh request if we discover our
cached data is bad after the revalidation request.

We could try to deserialize the cached data before (possibly) sending
the initial request. But this does extra work (checking the validity of
the cached copy) even if we don't use it. It's hard to know exactly
which approach would be faster, but I went this route because the
request is typically very small.
@BurntSushi BurntSushi merged commit cd1f619 into main Feb 19, 2024
7 checks passed
@BurntSushi BurntSushi deleted the ag/fix-cache-healing branch February 19, 2024 17:33
@BurntSushi BurntSushi added the bug Something isn't working label Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Reading from cache archive failed while upgrading pytest
2 participants