Support hash-checking mode in pip-sync #474

Closed
1 of 5 tasks
konstin opened this issue Nov 21, 2023 · 1 comment
Assignees
Labels
enhancement New feature or request security

Comments

@konstin
Member

konstin commented Nov 21, 2023

See https://pip.pypa.io/en/stable/topics/secure-installs/#hash-checking-mode

  • Support hash-checking mode in pip-compile output #131
  • Read hashes from requirements.txt format (https://pip.pypa.io/en/stable/reference/requirements-file-format/#per-requirement-options)
  • Compute the sha256 when downloading a distribution (both source dist and wheel), store them in the cache (and make sure to keep in sync with cache invalidation) or check that they match the File description (TODO: Does this have a perf impact? If yes, do we always want to do this or only if the registry doesn't tell us the sha?)
  • When installing, check the hashes
    • Ignore distributions with mismatching hashes: A better matching wheel might have been uploaded since the lockfile was created, but we have to ignore it in hash-checking mode and fall back to the next file. Report when there is no distribution because none matched the hashes (but one would have matched without hashes)
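
As a rough illustration of the checklist above, here is a minimal Python sketch (not uv's Rust implementation). It reads `--hash=` options from one requirements line, computes a SHA-256 digest while streaming a file, and falls back to the next candidate file when a digest does not match. The function names (`parse_requirement_line`, `sha256_of_stream`, `first_matching`) are hypothetical, and the parser is deliberately simplified: it ignores environment markers and other per-requirement options.

```python
import hashlib
import io
import shlex


def parse_requirement_line(line):
    """Split 'name==1.0 --hash=sha256:abc' into (requirement, set of hashes)."""
    # pip-compile style output continues lines with a trailing backslash.
    line = line.replace("\\\n", " ")
    requirement_parts, hashes = [], set()
    for token in shlex.split(line, comments=True):
        if token.startswith("--hash="):
            hashes.add(token[len("--hash="):])
        else:
            requirement_parts.append(token)
    return " ".join(requirement_parts), hashes


def sha256_of_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream chunk by chunk, as one could during a download."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def first_matching(candidates, allowed_hashes):
    """Return the first filename whose digest is allowed, else None.

    `candidates` is an iterable of (filename, binary stream) pairs in
    preference order; mismatching files are skipped, not treated as fatal.
    """
    for filename, stream in candidates:
        if sha256_of_stream(stream) in allowed_hashes:
            return filename
    return None


# Example: the preferred wheel was uploaded after the lockfile was written,
# so only the older source distribution matches the pinned hash.
sdist_bytes = b"pretend this is example-1.0.tar.gz"
pinned = "sha256:" + hashlib.sha256(sdist_bytes).hexdigest()
requirement, allowed = parse_requirement_line(
    "example==1.0 \\\n    --hash=" + pinned
)
chosen = first_matching(
    [
        ("example-1.0-py3-none-any.whl", io.BytesIO(b"newer upload")),
        ("example-1.0.tar.gz", io.BytesIO(sdist_bytes)),
    ],
    allowed,
)
```

If `first_matching` returns `None`, a caller could report that candidates existed but none matched the pinned hashes, which is the error case the last checklist item describes.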
@charliermarsh
Member

I've started to work on this. Doing something I've never done, sharing my existing notes on how it works: https://astral-sh.notion.site/Hash-checking-in-uv-fd27f5a51e8f4f5f8547a183fbf3e006

@charliermarsh charliermarsh self-assigned this Apr 4, 2024
charliermarsh added a commit that referenced this issue Apr 10, 2024
## Summary

This PR adds support for hash-checking mode in `pip install` and `pip
sync`. It's a large change, both in terms of the size of the diff and
the modifications in behavior, but it's also one that's hard to merge in
pieces (at least, with any test coverage) since it needs to work
end-to-end to be useful and testable.

Here are some of the most important highlights:

- We store hashes in the cache. Where we previously stored pointers to
unzipped wheels in the `archives` directory, we now store pointers with
a set of known hashes. So every pointer to an unzipped wheel also
includes its known hashes.
- By default, we don't compute any hashes. If the user runs with
`--require-hashes`, and the cache doesn't contain those hashes, we
invalidate the cache, redownload the wheel, and compute the hashes as we
go. For users that don't run with `--require-hashes`, there will be no
change in performance. For users that _do_, the only change will be if
they don't run with `--generate-hashes` -- then they may see some
repeated work between resolution and installation, if they use `pip
compile` then `pip sync`.
- Many of the distribution types now include a `hashes` field, like
`CachedDist` and `LocalWheel`.
- Our behavior is similar to pip, in that we enforce hashes when pulling
any remote distributions, and when pulling from our own cache. Like pip,
though, we _don't_ enforce hashes if a distribution is _already_
installed.
- Hash validity is enforced in a few different places:
1. During resolution, we enforce hash validity based on the hashes
reported by the registry. If we need to access a source distribution,
though, we then enforce hash validity at that point too, prior to
running any untrusted code. (This is enforced in the distribution
database.)
2. In the install plan, we _only_ add cached distributions that have
matching hashes. If a cached distribution is missing any hashes, or the
hashes don't match, we don't return it from the install plan.
3. In the downloader, we _only_ return distributions with matching
hashes.
4. The final combination of "things we install" is: (1) the wheels from
the cache, and (2) the downloaded wheels. So this ensures that we never
install any mismatching distributions.
- Like pip, if `--require-hashes` is provided, we require that _all_
distributions are pinned with either `==` or a direct URL. We also
require that _all_ distributions have hashes.
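
To make the install-plan and `--require-hashes` rules above concrete, here is a minimal Python sketch (illustrative only, not the Rust code in this PR). The `CachedDist` name is borrowed from the list above, but its fields, the `Requirement` type, and the `validate`, `usable`, and `plan` helpers are hypothetical. The sketch also assumes a cached distribution "matches" when at least one of its known digests is among the requirement's allowed digests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    name: str
    version: Optional[str] = None     # set only for an exact `==` pin
    url: Optional[str] = None         # set only for a direct URL requirement
    hashes: frozenset = frozenset()   # allowed digests, e.g. {"sha256:..."}


@dataclass(frozen=True)
class CachedDist:
    name: str
    version: str
    hashes: frozenset = frozenset()   # digests recorded when the wheel was cached


def validate(req, require_hashes):
    """With hash checking on, every requirement must be pinned and hashed."""
    if not require_hashes:
        return
    if req.version is None and req.url is None:
        raise ValueError(f"{req.name}: must be pinned with `==` or a direct URL")
    if not req.hashes:
        raise ValueError(f"{req.name}: missing hashes")


def usable(dist, req, require_hashes):
    """Can this cached distribution satisfy the requirement?"""
    if dist.name != req.name:
        return False
    if req.version is not None and dist.version != req.version:
        return False
    if not require_hashes:
        return True
    # A cache entry with no recorded hashes, or only other hashes, is rejected.
    return bool(dist.hashes & req.hashes)


def plan(requirements, cache, require_hashes):
    """Split requirements into cached dists to reuse and requirements to fetch."""
    reuse, fetch = [], []
    for req in requirements:
        validate(req, require_hashes)
        hit = next((d for d in cache if usable(d, req, require_hashes)), None)
        if hit is not None:
            reuse.append(hit)
        else:
            fetch.append(req)
    return reuse, fetch


reqs = [
    Requirement("a", "1.0", hashes=frozenset({"sha256:aa"})),
    Requirement("b", "2.0", hashes=frozenset({"sha256:bb"})),
    Requirement("c", "3.0", hashes=frozenset({"sha256:cc"})),
]
cache = [
    CachedDist("a", "1.0", frozenset({"sha256:aa"})),  # known, matching hash
    CachedDist("b", "2.0"),                            # cached without hashes
    CachedDist("c", "3.0", frozenset({"sha256:zz"})),  # known, mismatching hash
]
reuse_checked, fetch_checked = plan(reqs, cache, require_hashes=True)
reuse_plain, fetch_plain = plan(reqs, cache, require_hashes=False)
```

With hash checking enabled, only `a` is reused from the cache. `b` and `c` land in the fetch list, where a downloader would recompute digests and return only matching distributions.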

There are a few notable TODOs:

- We don't support hash-checking mode for unnamed requirements. These
should be _somewhat_ rare, though, since `pip compile` never outputs
unnamed requirements. I can fix this, it's just some additional work.
- We don't automatically enable `--require-hashes` when a hash exists in
the requirements file. We require `--require-hashes`.

Closes #474.

## Test Plan

I'd like to add some tests for registries that report incorrect hashes,
but otherwise: `cargo test`

3 participants