Better hashing of sources #4762

Open
2 tasks done
chrisburr opened this issue Jan 31, 2023 · 4 comments · May be fixed by #5277
Labels
source::contributor created by a frequent contributor stale::recovered [bot] recovered after being marked as stale type::feature request for a new feature or capability

Comments

@chrisburr
Contributor

Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

Support a content-dependent hash of sources rather than only hashing the source file itself. This is how nixpkgs's fetchzip works, and it could also be used to get a stable hash for recipes using git_url/git_rev.
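
To make the idea concrete, here is a minimal sketch of a content-dependent hash over an already-extracted source tree. It is only an illustration: `content_sha256` is a hypothetical helper, not an existing conda-build function, and the serialization (sorted relative paths, a file/executable/symlink marker, then the bytes) is an assumption rather than the scheme nixpkgs uses. Empty directories and symlinks to directories are ignored in this sketch.

```python
import hashlib
import os
import stat


def content_sha256(root: str) -> str:
    """Hash files under `root` by relative path, kind and bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, full))
    for rel, full in sorted(entries):
        st = os.lstat(full)
        if stat.S_ISLNK(st.st_mode):
            # Hash the link target, not whatever it points at.
            kind, payload = b"l", os.readlink(full).encode()
        else:
            kind = b"x" if st.st_mode & stat.S_IXUSR else b"f"
            with open(full, "rb") as fh:
                payload = fh.read()
        # Length-prefix each field so adjacent fields cannot be confused.
        for field in (kind, rel.encode(), payload):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()
```

Timestamps, ownership and archive-level metadata are deliberately left out, so two extractions of the same files should produce the same digest regardless of how the archive was generated.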

Why is this needed?

I occasionally see builds fail because an autogenerated git tarball/zip has changed (most commonly on GitHub, but the same applies to most git hosts). These archives depend on the git version used to generate them and are not intended to be stable or reproducible.
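
A small stand-in for this failure mode, using only the Python standard library: the same payload compressed with different gzip settings gives different archive bytes, so a hash of the file changes even though the content does not. The payload and settings below are made up for illustration and do not reproduce what any git host actually does.

```python
import gzip
import hashlib

payload = b"example source tree contents\n" * 100

# Same payload, different gzip settings (compression level and header mtime).
archive_a = gzip.compress(payload, compresslevel=1, mtime=0)
archive_b = gzip.compress(payload, compresslevel=9, mtime=1)

file_hash_a = hashlib.sha256(archive_a).hexdigest()
file_hash_b = hashlib.sha256(archive_b).hexdigest()
content_hash_a = hashlib.sha256(gzip.decompress(archive_a)).hexdigest()
content_hash_b = hashlib.sha256(gzip.decompress(archive_b)).hexdigest()

print(file_hash_a == file_hash_b)        # False: the archive bytes differ
print(content_hash_a == content_hash_b)  # True: the decompressed payload is identical
```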

Applying the same check to git_url/git_rev would also make recipes more resistant to tags that change after the fact.

What should happen?

My first guess would be to add a key such as content_sha256 to the sources section, alongside sha256, which would trigger a check of the content-based hash.
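
One way the check could be wired up, as a rough sketch only: `verify_source` and its parameters are hypothetical names, not conda-build API, and the content hasher is passed in as a callable so the sketch stays independent of how the content hash is actually defined.

```python
import hashlib
from typing import Callable, List, Mapping


def verify_source(
    source: Mapping[str, str],
    archive_path: str,
    extracted_dir: str,
    content_hasher: Callable[[str], str],
) -> List[str]:
    """Check whichever hashes `source` declares; return the keys that were verified."""
    verified = []
    if "sha256" in source:
        # Existing behaviour: hash the downloaded file itself.
        with open(archive_path, "rb") as fh:
            actual = hashlib.sha256(fh.read()).hexdigest()
        if actual != source["sha256"]:
            raise ValueError(f"sha256 mismatch: expected {source['sha256']}, got {actual}")
        verified.append("sha256")
    if "content_sha256" in source:
        # Proposed behaviour: hash the extracted content instead of the archive.
        actual = content_hasher(extracted_dir)
        if actual != source["content_sha256"]:
            raise ValueError(
                f"content_sha256 mismatch: expected {source['content_sha256']}, got {actual}"
            )
        verified.append("content_sha256")
    return verified
```

Keeping the two keys independent means a recipe for a stable artifact can keep using sha256 alone, while a recipe pointing at an autogenerated archive can declare only content_sha256.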

Additional Context

Nixpkgs solved this quite a long time ago with its fetchzip function, so that would probably be a valuable source of ideas for the implementation.

@chrisburr chrisburr added the type::feature request for a new feature or capability label Jan 31, 2023
@dholth dholth added the source::contributor created by a frequent contributor label Feb 6, 2023

github-actions bot commented Feb 7, 2024

Hi there, thank you for your contribution!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs.

If you would like this issue to remain open please:

  1. Verify that you can still reproduce the issue at hand
  2. Comment that the issue is still reproducible and include:
    - What OS and version you reproduced the issue on
    - What steps you followed to reproduce the issue

NOTE: If this issue was closed prematurely, please leave a comment.

Thanks!

@github-actions github-actions bot added the stale [bot] marked as stale due to inactivity label Feb 7, 2024
@github-actions github-actions bot added the stale::closed [bot] closed after being marked as stale label Mar 8, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 8, 2024
@jaimergp jaimergp reopened this Apr 12, 2024
@jaimergp jaimergp added stale::recovered [bot] recovered after being marked as stale and removed stale [bot] marked as stale due to inactivity labels Apr 12, 2024
@jaimergp
Contributor

This would be handy if we ever start storing sources alongside the built artifacts. One thing to take care of is the local conda-build cache: sha256 is used to see if the artifact is already available, so I guess with content_sha256 we wouldn't be able to provide a cache. This would be a good option for GitHub dynamic archives and the like, but we don't need it for stable artifacts, which could still use sha256.

@jaimergp jaimergp linked a pull request Apr 12, 2024 that will close this issue
3 tasks
@github-actions github-actions bot removed the stale::closed [bot] closed after being marked as stale label Apr 13, 2024
@wolfv
Contributor

wolfv commented Apr 15, 2024

We could also record a content hash in the "finalized sources" section of the "rendered recipe" for the new recipe format: conda/ceps#74

@ifitchet

ifitchet commented Apr 18, 2024

Just a thought on the YAML structure regarding content_* hashes. If the hash is a function of the content and independent of the compression method, then should content_md5 etc. be a child node of source rather than a (repeated) child node of each compressed file?

  - source:
    - content_md5: blah
    - fn: foo.tar.gz
      url:
      md5: other-blah
    - fn: foo.zip
      url:
      md5: third-blah

(I doubt that the above is valid YAML but you get the idea.)
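
The premise here, that a content hash does not depend on the container format, can be checked with a short sketch. Everything below is illustrative: the file names, the helper functions and the serialization (sorted member names plus bytes) are assumptions, and sha256 is used in place of md5.

```python
import hashlib
import io
import tarfile
import zipfile

FILES = {"pkg/__init__.py": b"", "pkg/main.py": b"print('hi')\n"}


def make_tar_gz(files):
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_zip(files):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def tar_members(archive):
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tf:
        return {m.name: tf.extractfile(m).read() for m in tf if m.isfile()}


def zip_members(archive):
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {n: zf.read(n) for n in zf.namelist() if not n.endswith("/")}


def content_hash(members):
    """Hash {name: bytes} in sorted order, length-prefixing every field."""
    digest = hashlib.sha256()
    for name, data in sorted(members.items()):
        for field in (name.encode(), data):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


tar_gz, zipped = make_tar_gz(FILES), make_zip(FILES)
print(hashlib.sha256(tar_gz).hexdigest() == hashlib.sha256(zipped).hexdigest())  # False
print(content_hash(tar_members(tar_gz)) == content_hash(zip_members(zipped)))    # True
```

Under these assumptions a single content_* value would cover both downloads, which is the argument for placing it once at the source level rather than repeating it per file.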

Projects
Status: 🏗️ In Progress