Burn in flow_run_id, remote_url, sha into package meta data #1577

Merged
merged 19 commits into from
Nov 6, 2023

Conversation

dbast
Member

@dbast dbast commented Feb 12, 2022

This is ready for discussion: would this concept be accepted, and what changes should be applied to it? The idea is to make it possible to trace existing packages back to their origin (repo, commit and CI run) for audit purposes, and to make future rebuilds of older packages easier by recording which repo and commit they were created from.

This uses the conda-build 3.21.8 feature for adding extra meta data to the resulting package, recording which git repo/feedstock, commit sha1 and CI run ID a package was created from.
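
For illustration, here is a minimal Python sketch of how the three values could be turned into command-line arguments. It assumes the conda-build option is spelled `--extra-meta` and accepts space-separated `KEY=VALUE` pairs; the helper name `extra_meta_args` and the example values are hypothetical and not part of this PR.

```python
def extra_meta_args(flow_run_id: str, remote_url: str, sha: str) -> list:
    """Build the argument list for the assumed `--extra-meta KEY=VALUE ...` option.

    Empty values are skipped so that a CI provider which does not expose
    one of the values does not produce a dangling `key=` pair.
    """
    meta = {"flow_run_id": flow_run_id, "remote_url": remote_url, "sha": sha}
    pairs = [f"{key}={value}" for key, value in meta.items() if value]
    return ["--extra-meta", *pairs] if pairs else []


# Hypothetical values; in CI they would come from provider-specific variables.
command = ["conda", "build", "recipe/"] + extra_meta_args(
    "123456", "https://github.com/conda-forge/example-feedstock", "0123abc"
)
print(" ".join(command))
```

Skipping empty values is a design choice of this sketch only; the generated CI scripts may handle missing values differently.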

How can I test that? Should I just take an existing feedstock that I have access to and do a local rerender, rotating through all the different CI providers?

@wolfv Can I assume this also works with mamba-build/boa as it uses conda-build in the end to create the package?

Thanks!

cc @cjmartian, who implemented that in conda-build.

Docs:

Checklist

  • Added a news entry

@wolfv
Member

wolfv commented Feb 12, 2022

We should test that mambabuild forwards the arguments properly but in general this should work, yeah!

@cjmartian

This all looks correct to me, haven't really touched conda forge CI before though.

@h-vetinari
Member

This sounds like a cool addition! Any updates on this?

@dbast
Member Author

dbast commented May 17, 2022

This is now enabled for defaults packages... will look into rebasing and testing that.

Update: see e.g. wget -qO- https://anaconda.org/anaconda/pytest/7.1.2/download/linux-64/pytest-7.1.2-py38h06a4308_0.tar.bz2 | tar xOj info/about.json | jq '.extra'

Though the extra key names are a bit different in defaults: remote_url, sha, flow_run_id vs GIT_URL, GIT_SHA1, CI_RUN_ID. That can be harmonised.
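
The same inspection can be sketched in Python with only the standard library. This is a rough sketch, assuming a `.tar.bz2` package that contains `info/about.json`; the helper names `read_extra` and `harmonise` and the `KEY_ALIASES` table are hypothetical, and the alias table simply pairs up the key names listed above.

```python
import json
import tarfile

# Assumed pairing of the key names first proposed in this PR with the
# names found in defaults packages.
KEY_ALIASES = {"CI_RUN_ID": "flow_run_id", "GIT_URL": "remote_url", "GIT_SHA1": "sha"}


def read_extra(fileobj) -> dict:
    """Return the `extra` mapping from info/about.json of a .tar.bz2 package."""
    with tarfile.open(fileobj=fileobj, mode="r:bz2") as archive:
        member = archive.extractfile("info/about.json")
        if member is None:
            raise ValueError("info/about.json is not a regular file")
        about = json.load(member)
    return about.get("extra", {})


def harmonise(extra: dict) -> dict:
    """Rename the proposed keys to the defaults-style names; keep other keys."""
    return {KEY_ALIASES.get(key, key): value for key, value in extra.items()}
```

With a downloaded package, usage could look like `with open("pytest-7.1.2-py38h06a4308_0.tar.bz2", "rb") as fh: print(harmonise(read_extra(fh)))`.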

@dbast dbast changed the title [RFC] Burn in CI_RUN_ID, GIT_URL, GIT_SHA1 into package meta data [WIP] Burn in CI_RUN_ID, GIT_URL, GIT_SHA1 into package meta data Jun 22, 2022
Co-authored-by: Isuru Fernando <isuruf@gmail.com>
@jaimergp
Member

jaimergp commented May 17, 2023

@dbast - is this still [WIP]?

@dbast
Member Author

dbast commented May 26, 2023

@jaimergp Thanks for looking into this. The PR is not really WIP anymore, but it is hard to test:

  • The conda-build feature used here to burn in that meta data was added (with tests) in the PR "Add extra meta to about.json" (conda/conda-build#4303).
  • The write_about_json(_) function, which ultimately burns that data into the package in conda-build, unfortunately does not write anything to the build log that would help verify that the right extra meta data ends up in the package.
  • Thus this can only be verified by inspecting created packages, which is harder than just looking at the build log.
  • So even if this conda-smithy PR is correct at the moment, it would be very hard to guarantee over time that future changes (like newly added CI providers) don't degrade this functionality in conda-smithy.
  • The best fix would be to add a log line in conda-build that logs the extra data. After that we can see in every build log whether we consistently burn in the right meta data, and it even helps to correlate the build log with the package, since both show the same hashes.
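
Until such a log line exists, the package-inspection route could be scripted. The following is only a sketch of a check over an already extracted `extra` mapping; the helper `provenance_problems` is hypothetical, and the key names `flow_run_id`, `remote_url` and `sha` are assumed to be the ones that end up in the package.

```python
import re

# Assumed key names; adjust if the burned-in keys are spelled differently.
REQUIRED_KEYS = ("flow_run_id", "remote_url", "sha")
SHA1_PATTERN = re.compile(r"[0-9a-f]{40}")


def provenance_problems(extra: dict) -> list:
    """List human-readable problems with the provenance keys in `extra`."""
    problems = [
        f"missing or empty key: {key}" for key in REQUIRED_KEYS if not extra.get(key)
    ]
    sha = extra.get("sha")
    if sha and not SHA1_PATTERN.fullmatch(str(sha)):
        problems.append(f"sha is not a 40-character hex digest: {sha!r}")
    return problems
```

An empty list means all three keys are present and the commit hash has the expected shape. It does not prove that the values point at the right repository or CI run.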

@jaimergp
Member

jaimergp commented Jul 5, 2023

The logs will be available in the upcoming conda-build release this month 👀

@dbast
Member Author

dbast commented Aug 7, 2023

conda-build 3.26.0 is available and makes testing this much easier by logging the extra-meta data.

@jakirkham
Member

jakirkham commented Aug 7, 2023

Thanks Daniel! 🙏

Think something like this is a good idea

There are other use cases we might want to consider, for example adding links on Anaconda.org to the recipe source (see conda/conda-build#2489). There are also open questions: what tool do we use for this, and do we package build logs?

This may benefit from a design discussion (and maybe a CEP) to ensure broad tooling support and to cover the use cases for this kind of functionality.

@isuruf isuruf requested a review from a team as a code owner November 2, 2023 20:52
@isuruf isuruf changed the title [WIP] Burn in CI_RUN_ID, GIT_URL, GIT_SHA1 into package meta data Burn in CI_RUN_ID, GIT_URL, GIT_SHA1 into package meta data Nov 2, 2023
@isuruf
Member

isuruf commented Nov 2, 2023

@conda-forge/core, this is ready for a review

@isuruf isuruf changed the title Burn in CI_RUN_ID, GIT_URL, GIT_SHA1 into package meta data Burn in flow_run_id, remote_url, sha into package meta data Nov 2, 2023
@isuruf
Member

isuruf commented Nov 2, 2023

@jakirkham, I changed the variable names to the ones used by Anaconda.

Here's the metadata from the pkgs/main/python pkg

extra:
  copy_test_source_files: true
  feedstock-name: python
  final: true
  flow_run_id: 5a3cdab7-40d3-44ab-8d4e-16946a951bef
  recipe-maintainers:
    - isuruf
    - jakirkham
    - katietz
    - mbargull
    - mingwandroid
    - msarahan
    - ocefpaf
    - pelson
    - scopatz
    - xhochy
  remote_url: git@github.com:AnacondaRecipes/python-feedstock.git
  sha: 4ad3a1da3343d359f1961b7a0cb721cdd396f6b9

@jaimergp
Member

jaimergp commented Nov 3, 2023

Added support for defaults in conda-metadata-app, and it now displays detailed provenance info if available, taking you to the exact commit that produced that build.
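
As a rough illustration of how a commit link can be derived from `remote_url` and `sha`, here is a minimal sketch. It is not the conda-metadata-app implementation: the helper `commit_url` is hypothetical, and it assumes a GitHub-style host where commits live under `/commit/<sha>`.

```python
import re
from typing import Optional

# Matches scp-style SSH remotes such as git@github.com:org/repo.git
_SSH_REMOTE = re.compile(r"git@(?P<host>[^:]+):(?P<path>.+)")


def commit_url(remote_url: str, sha: str) -> Optional[str]:
    """Return a browsable commit URL, or None if the remote is not understood."""
    url = remote_url.strip()
    match = _SSH_REMOTE.fullmatch(url)
    if match:
        url = f"https://{match['host']}/{match['path']}"
    if not url.startswith("https://"):
        return None
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return f"{url.rstrip('/')}/commit/{sha}"


print(commit_url(
    "git@github.com:AnacondaRecipes/python-feedstock.git",
    "4ad3a1da3343d359f1961b7a0cb721cdd396f6b9",
))
```

Returning `None` for unrecognised remotes keeps the sketch from producing links that point nowhere.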


So when this PR lands, we will have that info for conda-forge moving forward! 🥳

dbast and others added 4 commits November 3, 2023 13:37
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
@dbast
Member Author

dbast commented Nov 3, 2023

Yay, thanks for the help here!

@isuruf isuruf requested a review from a team November 6, 2023 16:11
@isuruf isuruf merged commit df653f1 into conda-forge:main Nov 6, 2023
2 checks passed
@jaimergp
Member

jaimergp commented Nov 8, 2023

Woohoo! We have provenance data now! Look at this https://conda-metadata-app.streamlit.app/?q=conda-forge%2Fnoarch%2Fmakim-1.8.3-pyh707e725_0.conda

image

"Provenance" cell in the table.

@AndresGuzman-Ballen

This is absolutely amazing indeed! Out of curiosity, does this mean that only packages built from today onward will include the remote_url metadata? Or will it be applied retroactively, so that in a couple of weeks or months, once CI has made its way through all the packages, theoretically every conda package that has ever existed in the conda-forge channel will have a remote_url entry?

@jaimergp
Member

jaimergp commented Nov 8, 2023

I'm afraid this PR only covers new artifacts. I'm not even sure how we could retroactively find the needed bits.

@AndresGuzman-Ballen

I had a feeling. If you've ever watched the movie "Primer", the same time-travel logic applies where you can only travel back to the time the time-machine was invented LOL

@jakirkham
Member

Or it could be like the game Continuum. You can change the past, but you have to change it back when you are done 😉


9 participants