In packaging, ideally anything that is fetched should have its integrity checked. In most cases, this means comparing the file's checksum against an expected value. For the source-build tarball, it is reasonable to expect reproducibility, since producing it involves fetching source code at exactly the commits defined in eng/Version.Details.xml. Unfortunately, this is not the case, because the produced tarball contains many volatile files:
- ./src/*/.git directory in each product repo
- ./src/*/.git/modules in each product repo
- ./git-info, which records a date-based OfficialBuildId for certain products
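The checksum comparison mentioned above can be sketched as a small shell helper. This is only an illustration: the function name verify_tarball and the choice of SHA-512 are assumptions made for the example, not something the source-build tooling prescribes.

```shell
#!/bin/sh
# verify_tarball FILE EXPECTED_SHA512
# Hypothetical helper: returns 0 when the SHA-512 digest of FILE
# equals EXPECTED_SHA512, non-zero otherwise.
verify_tarball() {
    file=$1
    expected=$2
    # sha512sum prints "<digest>  <name>"; keep only the digest field.
    actual=$(sha512sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}
```

A check like this is only useful if regenerating the tarball from the same commits yields the same digest, which is what the volatile files listed above prevent.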
While producing the Alpine aport for dotnet6, I wanted to be able to integrity-check the tarball and cache it. I did this with the following patch:
From 518e05e8b4cd823c2c7365f46178322331206416 Mon Sep 17 00:00:00 2001
From: Antoine Martin <dev@ayakael.net>
Date: Wed, 16 Feb 2022 15:59:42 +0000
Subject: [PATCH 1/1] reproducible tarball
---
.../src/Tarball_WriteSourceRepoProperties.cs | 6 ++---
.../tools/SourceBuildArcadeTarball.targets | 8 +++----
.../SourceBuildArcadeTarball_strip-git.sh | 24 +++++++++++++++++++
3 files changed, 31 insertions(+), 7 deletions(-)
create mode 100755 src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball_strip-git.sh
diff --git a/src/SourceBuild/Arcade/src/Tarball_WriteSourceRepoProperties.cs b/src/SourceBuild/Arcade/src/Tarball_WriteSourceRepoProperties.cs
index b6b2793f3..55d680971 100644
--- a/src/SourceBuild/Arcade/src/Tarball_WriteSourceRepoProperties.cs
+++ b/src/SourceBuild/Arcade/src/Tarball_WriteSourceRepoProperties.cs
@@ -105,12 +105,12 @@ private static DerivedVersion GetVersionInfo(string version, string commitCount)
if (releaseParts.Length == 2)
{
// NuGet does this - arbitrary build IDs
- return new DerivedVersion { OfficialBuildId = DateTime.Now.ToString("yyyyMMdd.1"), PreReleaseVersionLabel = releaseParts[0] };
+ return new DerivedVersion { OfficialBuildId = DateTime.Now.ToString("19900101.1"), PreReleaseVersionLabel = releaseParts[0] };
}
else if (releaseParts.Length == 3)
{
// VSTest uses full dates for the first part of their preview build numbers
- if (DateTime.TryParseExact(releaseParts[1], "yyyyMMdd", new CultureInfo("en-US"), DateTimeStyles.AssumeLocal, out DateTime fullDate))
+ if (DateTime.TryParseExact(releaseParts[1], "19900101", new CultureInfo("en-US"), DateTimeStyles.AssumeLocal, out DateTime fullDate))
{
return new DerivedVersion { OfficialBuildId = $"{releaseParts[1]}.{releaseParts[2]}", PreReleaseVersionLabel = releaseParts[0] };
}
@@ -139,7 +139,7 @@ private static DerivedVersion GetVersionInfo(string version, string commitCount)
{
// finalized version number (x.y.z) - probably not our code
// VSTest, Application Insights, Newtonsoft.Json do this
- return new DerivedVersion { OfficialBuildId = DateTime.Now.ToString("yyyyMMdd.1"), PreReleaseVersionLabel = string.Empty };
+ return new DerivedVersion { OfficialBuildId = DateTime.Now.ToString("19900101.1"), PreReleaseVersionLabel = string.Empty };
}
throw new FormatException($"Can't derive a build ID from version {version} (commit count {commitCount}, release {string.Join(";", nugetVersion.Release.Split('-', '.'))})");
diff --git a/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball.targets b/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball.targets
index aaf297ac0..d50d4a63c 100644
--- a/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball.targets
+++ b/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball.targets
@@ -15,6 +15,7 @@
<PropertyGroup>
<ArtifactsDir>$(RepoRoot)artifacts/</ArtifactsDir>
<TarballDir>$(ArtifactsDir)tarball/</TarballDir>
+ <TarballNameDir>$([MSBuild]::MakeRelative($([System.IO.Path]::GetDirectoryName($(TarballDir))), $(TarballDir)))</TarballNameDir>
<TarballRootDir>$([MSBuild]::EnsureTrailingSlash('$(TarballDir)'))</TarballRootDir>
<TarballSourceDir>$(TarballRootDir)src/</TarballSourceDir>
<TarballGitInfoDir>$(TarballRootDir)git-info/</TarballGitInfoDir>
@@ -42,7 +43,7 @@
<Error Text="TarballFilePath not specified." Condition=" '$(TarballFilePath)' == '' " />
<MakeDir Directories="$(TarballFileDir)" Condition=" '$(TarballFileDir)' != '' " />
- <Exec Command="tar --numeric-owner -zcf $(TarballFilePath) -C $(TarballDir) ." />
+ <Exec Command="tar --format=gnu --sort=name --owner=0 --group=0 --numeric-owner --mtime='1990-01-01Z' -zcf $(TarballFilePath) -C $(TarballDir)/.. $(TarballNameDir)" />
<Message Text="Tarball '$(TarballFilePath)' was successfully created from '$(TarballDir)'" Importance="High" />
</Target>
@@ -175,10 +176,10 @@
Command="git submodule update --init --recursive --depth 1"
WorkingDirectory="$(TarballRepoSourceDir)" />
- <!-- Remove the git objects folder to free up tarball space -->
+ <!-- Strips git directories / submodules to bare minimum for reproducibility -->
<Exec
- Command="rm -rf objects"
- WorkingDirectory="$(TarballRepoSourceDir).git"
+ Command="$(RepoRoot)/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball_strip-git.sh $(SourceDir)"
+ WorkingDirectory="$(TarballSourceDir)"
Condition="$(PreserveTarballGitFolders) != 'true'" />
<Message Text="--> Done Cloning Repo $(SourceBuildRepoName)" Importance="High" />
diff --git a/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball_strip-git.sh b/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball_strip-git.sh
new file mode 100755
index 000000000..51a4ed842
--- /dev/null
+++ b/src/SourceBuild/Arcade/tools/SourceBuildArcadeTarball_strip-git.sh
@@ -0,0 +1,24 @@
+#!/bin/bash
+repodir="$1"
+for i in $(find $repodir -name '.git' | sort -r); do
+ if [ -f "$i" ]; then
+ gitdir="${i/\/.git}/$(sed 's/gitdir: //' $i)"
+ commit=$(cat "$gitdir"/HEAD)
+ else
+ gitdir=$i
+ commit=${i%/.*}
+ commit=${commit##*.}
+ fi
+
+ url="$(grep -m 1 url "$gitdir"/config)"
+ url=${url#*= }
+
+ rm -rf "$i"
+ mkdir -p "$i"
+
+ echo $commit > "$i"/HEAD
+ {
+ echo "[remote \"origin\"]"
+ echo -n "url = \"$url\""
+ } > "$i"/config
+done
--
2.34.1
By default, SourceBuildArcadeTarball.targets only deletes .git/objects to save space. This patch takes the deletion further, down to the bare minimum, which I define as what the build and git apply require: a .git/HEAD containing the commit and a .git/config with the remote URL defined. The same has to apply to git submodules. So during the tarball build, instead of running rm -rf objects inside .git, a script looks for all .git files and directories, extracts the required information (commit and remote URL), then deletes them and replaces each with a minimal .git folder.
Another issue is that certain repos have a volatile OfficialBuildId, because it is generated from the tarball generation date. The code that generates these build IDs therefore now sets them to a fixed placeholder date, in this case 19900101.1. This is a problem in its own right, as the value then has to be changed back before the build. The Alpine aport compensates by resetting it to the build date, executing sed -i "s|<OfficialBuildId>19900101.1</OfficialBuildId>|<OfficialBuildId>$(date +"%Y%m%d.1")</OfficialBuildId>|" "$i" within a for loop. A better solution should be implemented.
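For illustration, the for loop around that sed command might look like the sketch below. The function name restore_build_ids, the optional second argument, and the assumption that the affected files are *.props files in a single directory (such as git-info) are mine; the issue does not say which files "$i" iterates over.

```shell
#!/bin/sh
# restore_build_ids DIR [BUILD_ID]
# Hypothetical helper: replaces the 19900101.1 placeholder with BUILD_ID
# (default: today's date followed by ".1") in every *.props file in DIR.
# Assumption: the files carrying OfficialBuildId are DIR/*.props.
restore_build_ids() {
    dir=$1
    build_id=${2:-$(date +"%Y%m%d.1")}
    for i in "$dir"/*.props; do
        # Skip the unexpanded glob when DIR holds no .props files.
        [ -f "$i" ] || continue
        sed -i "s|<OfficialBuildId>19900101\.1</OfficialBuildId>|<OfficialBuildId>${build_id}</OfficialBuildId>|" "$i"
    done
}
```

Calling restore_build_ids git-info from the extracted tarball root would then rewrite the placeholder before the build starts, under the assumptions stated above.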
That is not all, though: reproducibility also has to hold at the archiving and compression level, hence the use of <Exec Command="tar --format=gnu --sort=name --owner=0 --group=0 --numeric-owner --mtime='1990-01-01Z' -zcf $(TarballFilePath) -C $(TarballDir)/.. $(TarballNameDir)" />. TarballNameDir is included because I wanted the tarball to extract into a subdirectory by default rather than into ./
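The effect of those tar flags can be tried outside MSBuild with a sketch like the following. It assumes GNU tar (for --sort=name) and differs from the patch in one respect: instead of -z, it pipes through gzip -n so that the gzip header never carries a name or timestamp, rather than relying on gzip's defaults. The function name make_repro_tarball is hypothetical.

```shell
#!/bin/sh
# make_repro_tarball SRC_DIR OUT_FILE
# Hypothetical helper: archives SRC_DIR under its own directory name with
# sorted entries, fixed ownership, and a fixed mtime, then compresses it
# without gzip header metadata. Requires GNU tar.
make_repro_tarball() {
    src=$1
    out=$2
    parent=$(dirname "$src")
    name=$(basename "$src")
    tar --format=gnu --sort=name --owner=0 --group=0 --numeric-owner \
        --mtime='1990-01-01Z' -cf - -C "$parent" "$name" | gzip -n > "$out"
}
```

Running make_repro_tarball artifacts/tarball out.tar.gz twice and comparing the digests of the two outputs is a quick way to check whether the file contents, rather than the archive metadata, are the remaining source of differences.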
I'm opening a discussion on this because I'd love a better solution to the OfficialBuildId issue, and to see whether there is broader interest in a reproducible tarball.
Made as part of the Alpine Linux dotnet6 packaging project; see #2782