[WIP] Remove non reproducible parts from image config for stable builds #30611

Open
wants to merge 1 commit into
from

Projects

None yet

7 participants

@justincormack
Member

If you do the same Docker build again and again, you will currently get
a different outcome. Madness!

The reason for this is that the config file inside the image, from which
the hash is calculated, has some fields that are effectively random or
which change over time.

These are

  • config.Hostname - build containers have random hostnames
  • container_config.Hostname - same
  • container - ID of container used to commit, effectively random
  • created - timestamp of creation
  • history.created - same

This PR leaves the hostname (and domainname) fields blank, and
the container ID.

Timestamps are set to Go zero time. See https://reproducible-builds.org/docs/timestamps/
for rationale. Users who want timestamps should use labels with eg the commit
time or other repeatable time.

TODO: tests, deprecation notices on fields.

With this patch:

whale:repro justin$ cat Dockerfile
FROM scratch
ADD . .
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578

without:

justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:47170bd064a699b2838ecc8b279813da1edde5b49a00d8e866863fa1da94c477
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:f9e5f8f8ac1e23f1d1ed45150d4c69f251056840d8745c4606001f6c340b2baa
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:5924b383e19182bc2d6e382041d5acc11985cdea76c18db12fcbbcd867fd66fb

Signed-off-by: Justin Cormack justin.cormack@docker.com

cc @riyazdf @stevvooe

little-egret

@justincormack justincormack [WIP] Remove non reproducible parts from image config for stable builds
If you do the same Docker build again and again, you will currently get
a different outcome. Madness!

The reason for this is that the config file inside the image, from which
the hash is calculated, has some fields that are effectively random or
which change over time.

These are

- config.Hostname - build containers have random hostnames
- container_config.Hostname - same
- container - ID of container used to commit, effectively random
- created - timestamp of creation
- history.created - same

This PR leaves the hostname (and domainname) fields blank, and
the container ID.

Timestamps are set to Go zero time. See https://reproducible-builds.org/docs/timestamps/
for rationale. Users who want timestamps should use labels with eg the commit
time or other repeatable time.

TODO: tests, deprecation notices on fields.

With this patch:
```
whale:repro justin$ cat Dockerfile
FROM scratch
ADD . .
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
```
without:
```
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:47170bd064a699b2838ecc8b279813da1edde5b49a00d8e866863fa1da94c477
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:f9e5f8f8ac1e23f1d1ed45150d4c69f251056840d8745c4606001f6c340b2baa
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:5924b383e19182bc2d6e382041d5acc11985cdea76c18db12fcbbcd867fd66fb
```

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
bf7bdd5
@stevvooe
Contributor
stevvooe commented Feb 1, 2017

@justincormack Interesting!

So, this is classically handled in merge.

While we really do need timestamps for reproducible builds, eliding them will result in some odd UX behavior. We'll have to consider the omission against this.

@justincormack
Member

Test failures are:

  • TestBuildSameDockerfileWithAndWithoutCache test uses the fact that build hash is the same to see if cache used, which no longer works, should be fairly easy to fix
  • lots of date based failures - as creation date is not there any more these fail. More problematic.

The other option to still retain creation date is to make reproducible builds an option, not a default, by having a build (and other create operation) time option to set creation date (eg you could set it from the git timestamp of your code, or unix 0 if you don't care about metadata). The other changes are ok anyway I think, it is just the timestamp. This would work for me, as I am fine opting in to reproducibility, but is not as nice as reproducible by default. However given the number of tests we have with dates, I think this is probably the best option, will take a look at it.

@justincormack
Member

Specifying the timestamp would also allow using the equivalent of --clamp-mtime on the layer tarballs which would help a lot see https://reproducible-builds.org/docs/archives/

@tonistiigi
Contributor

I think having an option is the way to go at least initially. I think both timestamp and boolean flag make sense. We also need to clean up the UI, zero time should not show up or in image inspect and images/history should not show that it was created 292 years ago.

@@ -117,6 +117,35 @@ func merge(userConf, imageConf *containertypes.Config) error {
return nil
}
+// Return the reproducible parts of the config, which currently excludes hostname as random
@tonistiigi
tonistiigi Feb 10, 2017 Contributor

Can't we just mark hostname as omitempty

@sathieu
Contributor
sathieu commented Feb 11, 2017

Why not use SOURCE_DATE_EPOCH as creation time, defaulting to current time ?

@tonistiigi
Contributor

Remembered one other thing that should be checked with this. Local build cache and image deletion logic is based on the parent chain. With this change, it becomes very easy to end up with the same image IDs and that could probably overwrite previous parent IDs or leave them in an inconsistent state. This should happen for example when someone just reorders Dockerfile commands.

@justincormack
Member

@tonistiigi ah interesting, will think about adding some tests for that. It may be ok, but worth checking. Planning to get back to this in a bit when I have some more time...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment