md5sum hashes do not match when you build the same code twice on macos #53

@xynny

I discovered this recently while working on build caching: the default Scala rules here do not build jars with the same hash on two separate runs with the same inputs. This seems to break the caching code in Bazel, which identifies outputs by an MD5 digest.
This is what I ran in this repo:

bazel build test/... 
md5 bazel-bin/test/*.jar > hash
bazel clean
bazel build test/... 
md5 bazel-bin/test/*.jar > hash2

Diffing the two files shows that most of the jars have different hashes, even the ijars.

Xins-MacBook-Pro:rules_scala xinlu$ diff hash hash2
1,3c1,3
< MD5 (bazel-bin/test/ExportOnly_deploy.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e
< MD5 (bazel-bin/test/ExportOnly_ijar.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e
< MD5 (bazel-bin/test/Exported_deploy.jar) = 4e1b1b620a002219a2edda47dc9a9cd2

---
> MD5 (bazel-bin/test/ExportOnly_deploy.jar) = 764d68cad512ea85baf75aec623c465e
> MD5 (bazel-bin/test/ExportOnly_ijar.jar) = 764d68cad512ea85baf75aec623c465e
> MD5 (bazel-bin/test/Exported_deploy.jar) = 31ea9c77a3bbf21d2b0ffc6def3247e4
5,6c5,6
< MD5 (bazel-bin/test/HelloLibTest_deploy.jar) = 1faf15bba18d60a4d9fee7937f3d18d6
< MD5 (bazel-bin/test/HelloLib_deploy.jar) = d72085331bdb480df424be08c0b8b5bd

---
> MD5 (bazel-bin/test/HelloLibTest_deploy.jar) = 9822c828fc3a1cbb87dbcd3800ba9e62
> MD5 (bazel-bin/test/HelloLib_deploy.jar) = 398401f2f02554b2d60aa62288cb255f
10,11c10,11
< MD5 (bazel-bin/test/MacroTest_deploy.jar) = 8c4767eead2843d79b850e56c86ea42e
< MD5 (bazel-bin/test/OtherLib_deploy.jar) = 2fc708e492140ed78ff0a489df106dcd

---
> MD5 (bazel-bin/test/MacroTest_deploy.jar) = 8024d3769157936d4777fe21aa88c25e
> MD5 (bazel-bin/test/OtherLib_deploy.jar) = be1812ca6d4eda50fdbb0bf340b088c8
13c13
< MD5 (bazel-bin/test/Runtime_deploy.jar) = fffa4662d3403c7a2daa042154e03f8f

---
> MD5 (bazel-bin/test/Runtime_deploy.jar) = efea7e4838f6bcd8582a0a5d86ca38f8
15,17c15,17
< MD5 (bazel-bin/test/ScalaBinary_deploy.jar) = 75871f3e5ef7addcbbfad04fe813b87a
< MD5 (bazel-bin/test/ScalaLibBinary_deploy.jar) = c46c0b69cdea700f37b40ceeb6b0bdd6
< MD5 (bazel-bin/test/ScalaLibResources_deploy.jar) = 4c0d57f5ad553e91f83f8b8bc866a195

---
> MD5 (bazel-bin/test/ScalaBinary_deploy.jar) = 4d9d183d54111532a7b88067d6e4572d
> MD5 (bazel-bin/test/ScalaLibBinary_deploy.jar) = 79684af04f1f3fa6b581775611b05aac
> MD5 (bazel-bin/test/ScalaLibResources_deploy.jar) = ea4157e3f8f2e59587bddbd26999bd89
19c19
< MD5 (bazel-bin/test/a_deploy.jar) = 08fe25daf5678fe3e1d43bc52c88daf5

---
> MD5 (bazel-bin/test/a_deploy.jar) = 5f071b2165bd91ff1fee97b5409cf282
21,25c21,25
< MD5 (bazel-bin/test/b_deploy.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e
< MD5 (bazel-bin/test/b_ijar.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e
< MD5 (bazel-bin/test/c_deploy.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e
< MD5 (bazel-bin/test/c_ijar.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e
< MD5 (bazel-bin/test/d_deploy.jar) = ecb50527b59a959320496d20809276bc

---
> MD5 (bazel-bin/test/b_deploy.jar) = 764d68cad512ea85baf75aec623c465e
> MD5 (bazel-bin/test/b_ijar.jar) = 764d68cad512ea85baf75aec623c465e
> MD5 (bazel-bin/test/c_deploy.jar) = 764d68cad512ea85baf75aec623c465e
> MD5 (bazel-bin/test/c_ijar.jar) = 764d68cad512ea85baf75aec623c465e
> MD5 (bazel-bin/test/d_deploy.jar) = 9bbe26c787071564c34af4cd418f7b9d
27,28c27,28
< MD5 (bazel-bin/test/jar_export_deploy.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e
< MD5 (bazel-bin/test/jar_export_ijar.jar) = e513f71cb2024bf2a5c6ac0a0208fb5e

---
> MD5 (bazel-bin/test/jar_export_deploy.jar) = 764d68cad512ea85baf75aec623c465e
> MD5 (bazel-bin/test/jar_export_ijar.jar) = 764d68cad512ea85baf75aec623c465e

---
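The same before/after comparison can be scripted instead of diffing `md5` output by hand. This is only a rough sketch in Python: the function names `md5_of_file`, `snapshot`, and `changed` are made up for illustration, and the `bazel-bin/test` path in the usage note is simply the directory from the commands above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory: Path, pattern: str = "*.jar") -> dict[str, str]:
    """Map each matching file name in the directory to its MD5 digest."""
    return {p.name: md5_of_file(p) for p in sorted(directory.glob(pattern))}


def changed(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Names whose digest differs, or that appear in only one snapshot."""
    names = sorted(set(before) | set(after))
    return [name for name in names if before.get(name) != after.get(name)]
```

Usage would be to call `snapshot(Path("bazel-bin/test"))` after the first build, run `bazel clean` and rebuild, take a second snapshot, and pass both to `changed`. An empty list would mean the build was reproducible for those jars.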

I think this is because the jar command used by the scala.bzl rule rewrites the timestamp of the manifest file, even if you touch it beforehand. Example:


Xins-MacBook-Pro:rules_scala xinlu$ unzip -l bazel-bin/test/ExportOnly_deploy.jar
Archive:  bazel-bin/test/ExportOnly_deploy.jar
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  05-18-16 13:58   META-INF/
      118  05-18-16 13:58   META-INF/MANIFEST.MF
 --------                   -------
      118                   2 files

In other words, the touch that sets the timestamp to 1980 has no effect.
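To check this across many jars without reading `unzip -l` output by eye, something like the following could work. It is a sketch that assumes the intended fixed timestamp is 1980-01-01 00:00:00 (the earliest time a zip entry can store); `ZIP_EPOCH`, `entry_times`, and `entries_not_at_epoch` are illustrative names, not part of rules_scala or Bazel.

```python
import zipfile

# Assumed target timestamp: the earliest value the zip format can represent.
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def entry_times(jar):
    """Return (entry name, date_time tuple) for every entry in the archive.

    `jar` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as archive:
        return [(info.filename, info.date_time) for info in archive.infolist()]


def entries_not_at_epoch(jar):
    """Names of entries whose stored timestamp differs from ZIP_EPOCH."""
    return [name for name, stamp in entry_times(jar) if stamp != ZIP_EPOCH]
```

For the jar shown above, `entries_not_at_epoch` would be expected to report both `META-INF/` and `META-INF/MANIFEST.MF`, since both carry the build time rather than 1980.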
In our own internal version I changed the jarring step to zip -X -q -FS, since -FS keeps timestamps. The hash is now the same between two different builds that have the same inputs, because the timestamps match.
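The underlying principle (identical entry timestamps give identical archive bytes) can be illustrated without `jar` or `zip` at all. The sketch below uses Python's `zipfile` module as a stand-in; it is not how scala.bzl or our internal change builds jars, and `build_jar` and `FIXED_TIME` are hypothetical names.

```python
import hashlib
import io
import zipfile

# Assumed fixed timestamp, matching the 1980 value mentioned above.
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_jar(entries, date_time=FIXED_TIME):
    """Build a zip archive in memory and return its bytes.

    Entries are written in sorted name order, each with the same
    timestamp and permissions, so equal inputs give equal bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(entries):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed rw-r--r-- permissions
            archive.writestr(info, entries[name])
    return buffer.getvalue()


def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()
```

Calling `build_jar` twice with the same `entries` dict should give the same `md5_hex` value, while passing a different `date_time` (for example the build time) changes the digest even though the file contents are unchanged. That is the same effect as the manifest timestamp in the listing above.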

Also, this seems to happen only on Mac, not on Linux. The hashes look fine on Linux, but a lot of our devs use Macs, and this breaks their local cache.
