Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change tar command used in get_source_tarball_from_git to get reproducible tarballs #4248

Merged
merged 9 commits into from Mar 18, 2024

Conversation

Rovanion
Copy link
Contributor

Hi,
This pull request contains two commits performing two changes to easybuild that were needed to build a piece of Python software that was not located on the Python Package Index but only existed in a git repository.

The first change allows sources in exts_list to specify a git_config. The second change changes the command used to produce tarballs from git archives so that they are made in a reproducible manner. Otherwise each rerun of eb would result in a checksum mismatch.

These two patches have been tested on CentOS 7 and no other operating system. The command used to produce the tarball from git is complicated by the old versions of software that is found on CentOS/RHEL 7. But in the comments instructions on how to drop support for such outdated operating systems is also included.

If there are any questions, feel free to ask.
Cheerio!

Micket
Micket previously requested changes May 4, 2023
easybuild/framework/easyblock.py Outdated Show resolved Hide resolved
easybuild/tools/filetools.py Outdated Show resolved Hide resolved
@Rovanion Rovanion force-pushed the reproducible-git-tarballs branch from 1b4e18d to 049b0cb Compare May 5, 2023 07:42
@boegel boegel added this to the 4.x milestone May 10, 2023
@boegel boegel changed the title Reproducible git tarballs change tar command used in get_source_tarball_from_git to get reproducible tarballs May 10, 2023
@Rovanion
Copy link
Contributor Author

Rovanion commented May 11, 2023 via email

@Rovanion Rovanion force-pushed the reproducible-git-tarballs branch 2 times, most recently from 82cb6f0 to d257534 Compare May 11, 2023 08:15
@Rovanion
Copy link
Contributor Author

Rovanion commented May 11, 2023 via email

@Rovanion Rovanion changed the title change tar command used in get_source_tarball_from_git to get reproducible tarballs Change tar command used in get_source_tarball_from_git to get reproducible tarballs May 11, 2023
@boegel boegel modified the milestones: 4.x, 5.0 Jun 7, 2023
@boegel boegel added the EasyBuild-5.0 EasyBuild 5.0 label Jun 7, 2023
@boegel
Copy link
Member

boegel commented Jun 7, 2023

We've discussed this PR a bit in a recent EasyBuild conf call, and the general consensus was that we can indeed proceed with this, but do so in the scope of the EasyBuild 5.0 effort (since it's a significant change).

@Rovanion Are you up for making the tests happy here that complain about style issues?
Once that's done, we could look into moving this over to target the 5.0.x branch...

@Rovanion
Copy link
Contributor Author

Rovanion commented Jun 7, 2023

Yeah I can do that. I'll see if I can get it in before vacation.

@Rovanion Rovanion force-pushed the reproducible-git-tarballs branch 2 times, most recently from 768d66a to f621000 Compare June 12, 2023 11:03
@Rovanion
Copy link
Contributor Author

All linter complaints should be fixed now.

boegel
boegel previously requested changes Jun 13, 2023
test/framework/filetools.py Outdated Show resolved Hide resolved
@Rovanion Rovanion force-pushed the reproducible-git-tarballs branch 3 times, most recently from 9daa03c to bf4f3f6 Compare June 19, 2023 15:15
@Rovanion
Copy link
Contributor Author

The only workflow errors that seem to occur are running Python 2.7 and 3.5 which seem to have been (partly?) dropped from the list of supported Pythons to run Easybuild on?

@boegel boegel added the change label Jun 20, 2023
@Rovanion
Copy link
Contributor Author

The check "Tests for Apptainer container support / build (3.7, 1.0.0) (pull_request)" have failed due to a fail to write a file, looks like the machine used for testing failed in this case.

@Rovanion
Copy link
Contributor Author

Rovanion commented Jul 4, 2023

Is there anything else that needs to be done before this can be merged?

@boegel
Copy link
Member

boegel commented Jul 5, 2023

@Rovanion We need to take another detailed look at this, but what definitely needs to happen is re-targeting this to the 5.0.x branch, since we're actively working towards EasyBuild 5.0 which includes some breaking changes (see https://github.com/orgs/easybuilders/projects/18 +

@Rovanion Rovanion changed the base branch from develop to 5.0.x July 5, 2023 15:49
@jfgrimm
Copy link
Member

jfgrimm commented Dec 20, 2023

when not excluding .git, then:

  • .git/logs/HEAD, .git/logs/refs/heads/main, .git/logs/refs/remotes/origin/HEAD contain unix timestamps from when the repo was cloned
  • .git/index differs
  • .git/objects/pack/pack-*.{idx,pack} have different names, and contents

the unix timestamps are easy enough to deal with, but the metadata encoded in the index and pack files is more complicated

@boegel
Copy link
Member

boegel commented Jan 3, 2024

@Rovanion Any idea if the issues that @jfgrimm raised can be fixed?

@Rovanion
Copy link
Contributor Author

Rovanion commented Jan 8, 2024

Hi,
I've thought some about this but not reached any conclusions I'm satisfied with.

On the BSD/Mac side: Does EasyBuild have a bootstrapping process/story in place so that it may use a self-built version of GNU Tar?

If not: I read up on flags for FreeBSDs Tar one day during the Holidays, was not able to find any definitive answer but hints that it should be possible to coax it into submission.

On the subject of including .git: It could perhaps be possible to rewrite the files in .git, but then its contents wouldn't be the same as when the binary artifact was built so is it even valid by then? My main question is though: For which usecase is does this feature exist?

@boegel boegel changed the title Change tar command used in get_source_tarball_from_git to get reproducible tarballs change tar command used in get_source_tarball_from_git to get reproducible tarballs Jan 17, 2024
@lexming
Copy link
Contributor

lexming commented Feb 14, 2024

I brought this PR to the last EB5 conf call and we might have a solution for it:

  1. use a common set of features between GNU tar and BSD tar to guarantee compatibility
  2. discard --mtime due to point 1 and instead manually set the timestamps of each file with touch -t 197001010100 prior to tar
  3. use reproducible tarballs for sources from git repos that are pulled without .git (that is the default behaviour)
  4. sources from git repos explicitly setting keep_git_dir=True will be kept with .git and won't be checksumed, because we can't as discussed here

@Rovanion is this a workable solution for you?

@lexming
Copy link
Contributor

lexming commented Feb 14, 2024

A little proof of concept with touch:

$ git clone https://github.com/easybuilders/easybuild
Cloning into 'easybuild'...
remote: Enumerating objects: 11089, done.
remote: Counting objects: 100% (909/909), done.
remote: Compressing objects: 100% (457/457), done.
remote: Total 11089 (delta 482), reused 851 (delta 442), pack-reused 10180
Receiving objects: 100% (11089/11089), 595.66 MiB | 10.09 MiB/s, done.
Resolving deltas: 100% (7324/7324), done.
# Delete .git folder
$ rm -rf easybuild/.git
# Checksum of downloaded files
$ tar --create --file easybuild.tar easybuild/
$ sha256sum easybuild.tar
98457ba141e829641878550bcf82681c87380acd106d695235385043dc7327ad  easybuild.tar
# Checksum of downloaded files with timestamp reset
$ find easybuild/ -exec touch -t 197001010100 {} \;
$ tar --create --owner=0 --group=0 --numeric-owner --format=gnu --null --file easybuild.0.tar easybuild/
$ sha256sum easybuild.0.tar
e1c54e194764bcd89165da527c25b4f5d82fcbd132be9d5f1f75ee40d9eb6ddc  easybuild.0.tar
# Wait some time and repeat the clone + touch + tar
$ sha256sum easybuild.1.tar
e1c54e194764bcd89165da527c25b4f5d82fcbd132be9d5f1f75ee40d9eb6ddc  easybuild.1.tar

@Rovanion
Copy link
Contributor Author

Yes, that would work. I'm not available to work on EasyBuild at the moment, I have been directed to prioritize other projects. Feel free to implement this until I am able to work on EasyBuild again.

Copy link
Contributor

@lexming lexming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I made a PR to this one with the discussed changes. Please check Rovanion#1

The PR looks messy because I'm also syncing to current state of 5.0.x branch.
The meaningful commit is Rovanion@8d9e14e

  • use reproducible archive for git repos without .git. dir
  • set timestamps with touch
  • avoid deprecated GZIP environment variable by piping into gzip

# print names of all files and folders excluding .git directory
'find', repo_name, '-name ".git"', '-prune', '-o', '-print0',
# reset access and modification timestamps
'-exec', 'touch', '-t 197001010100', '{}', '\;', '|',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rovanion I missed using a raw string for this bit:

Suggested change
'-exec', 'touch', '-t 197001010100', '{}', '\;', '|',
'-exec', 'touch', '-t 197001010100', '{}', r'\;', '|',

for \; in get_source_tarball_from_git.
@boegel
Copy link
Member

boegel commented Mar 2, 2024

@Rovanion Problems with Apptainer tests was fixed in #4472, which was also applied to the 5.0.x branch (so you'll need to sync up your PR branch with the current 5.0.x branch)

Copy link
Contributor

@lexming lexming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rovanion I opened another PR to this one that just syncs with recent changes in the 5.0.x branch. Unfortunately, at the time of the previous PR, unit tests were broken. Can you please merge Rovanion#2?

@boegel
Copy link
Member

boegel commented Mar 11, 2024

@Rovanion ping?

@Rovanion
Copy link
Contributor Author

I think all tests have passed now. All that remains are the three reviews.

@jfgrimm jfgrimm dismissed stale reviews from boegel, Micket, and lexming March 18, 2024 17:47

req. changes made

Copy link
Contributor

@lexming lexming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rovanion Thanks a lot for keeping up with the updates!

LGTM

@lexming lexming merged commit db2ab3a into easybuilders:5.0.x Mar 18, 2024
32 checks passed
@Rovanion
Copy link
Contributor Author

Rovanion commented Mar 20, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants