-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU data file builds are non-deterministic #55637
Comments
Tagging subscribers to this area: @directhex Issue DetailsSee Rolf's comment on https://github.com/xamarin/xamarin-macios/pull/11983/files#r657759425
|
/cc @steveisok |
For reference here's a table with some of the sizes: dotnet/designs#225 (comment). They differ even for the same architecture between two platforms. One of my theories is that it's related to the zlib version / default options on the build machine. |
https://linux.die.net/man/1/pkgdata Emphasis mine. icudt.dat is arch specific according to the documentation. |
So, are you suggesting "by design" then? |
Would that explain why the size differs between arm64 on Android and iOS? (ie. essentially same arch) I would also expect x64 and arm64 to be within margin of error, that was not the case. |
The ICU docs say it's by design. |
@filipnavara I'd need to examine pkgdata.cpp more closely to be able to answer that |
The iOS one is about 200 Kb (13%) bigger than the Android one. That may be worth another look even if it turns out to be by design. |
Hm. From more docs:
|
The problem I have with "it says it's by design but that's likely wrong in the docs and I'm sure we can ignore them" is it may well be true today, but it might not be tomorrow. We're on ICU 67, maybe 68 is the time when it goes from "big endian vs little endian" to "big endian vs little endian, also 32 bit vs 64 bit" or "big endian vs little endian, also 32 bit vs 64 bit, also ILP32 vs LLP64 vs LP64" |
There are two separate practical issues:
|
Local build today:
I'll try to run Android builds on the same machine tomorrow to check whether there is any difference. Can someone recheck the current transport packages whether the problem still exists? |
Android arm64 on the same machine is also identical:
Looks like the local vanilla builds do the correct thing. The pipelines for all iOS variants run on the same machine so that should rule out some compiler configuration differences (for arm64 vs arm). Android and iOS use different build machines though (Linux vs macOS). |
I checked some somewhat recent transport package and it's still broken:
Really not sure what is going on and what is the difference between my local builds and those from the CI. For reference, the command line I used for the builds (taken from the YAML):
Build machine: macOS 12.0 Beta (21A5284e), Xcode 12.5, Android NDK 22.1 |
New theory: Some of the CI builds are done sequentially in a tree that is not cleaned in-between the builds. Maybe there are some artifacts from previous build incorrectly packaged into later builds? For example, doing the sequence of the first two builds of iOS Simulator (x64, followed by x86) already produces wrong results:
|
Slowly making some progress:
Building the architectures in opposite order results in identical output data. Retrying it few more times seems to produce non-deterministic results. |
This is a good theory. Building ICU leaves one dirty file in the source tree (ie outside artifacts/), that could be the issue
…Sent from my iPhone
On Jul 29, 2021, at 4:13 AM, Filip Navara ***@***.***> wrote:
New theory: Some of the CI builds are done sequentially in a tree that is not cleaned in-between the builds. Maybe there are some artifacts from previous build incorrectly packaged into later builds?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I also see the dirty file in the source tree (curr/root.txt). Additionally I was able to reproduce one more botched build and found out few more details: There's a tool to extract the icudt.dat file ( If nothing else this seems to be a reliable way to detect the corruption without having a reference data files for comparison. |
The single modified file seems to be a red herring. It's done by the MSBuild script early on:
|
New theory based on my latest experiments. The .res files use a feature called "pool bundle" where one invocation of the In the instances where I managed to capture two successive runs with different .res files generated the difference in the binary files were the references into the "pool bundle". You can see that on the very first screenshot above where one files has the strings in the hex dump while the other one does not (because they are references into the bundle). The Makefile rules will regenerate the pool bundle in some cases as I've accidentally witnessed in some of the experiments where I deleted single .res file and tried to rebuild it. That leads me to believe that there could be a race condition with parallelization of the make process where two (or more) rules decide to rebuild the pool file which does not exist yet and the second invocation overwrites it with partial data while the first .res files are still generated. To answer one more question: Why was I getting the |
And this is why computers should be illegal |
If it's of any help I have saved artifacts of some of the builds that ended up with different binaries. The .res files definitely differ on whether they used the pool.res resources or not. |
Reducing the |
It was promising, but it was also a coincidence I think. |
Hm. I have a "bad" .res file which derb won't touch, even with a -i |
@directhex Good job, can you share it somewhere (along with the other .res files, or at least pool.res)? I learned to read the files in hex editor so I would take a look at some point... |
Hm. No, never mind, it does have valid includes, it's just it seems to not accept a fully qualified path to the .res file (i.e. BUT The lone file with the includes is why the .dat files differ. If I pull in am_ET.res from another arch, i get the same checksum when regenerating the .dat |
I need to look at the source of the .res creation tool, and see how it decides whether or not to use includes |
You don't even need to pull it from another arch. Usually rebuilding the single .res files produces the smaller one. Note that the problem happened for me on various different .res files inconsistently. There could be some pattern but I didn't see it at first. |
As per dotnet/runtime#55637 we have a deterministic builds problem with ICU, where any given build is likely to produce different icudt*.dat files when they ought to be the same (and builds across different host OSes are likely to yield different results) We can help reduce app sizes if we can de-duplicate these files, so commit and use "known good" files to git and use those for preference in packaging. It's gross, but it'll do for now until we get to the bottom of the determinism problem.
The ICU build is a mess, and that mess is largely upstream.
We should have the same data files for all mobile targets, and we can't even get them the same building the same target twice in a row. So the best we can do is build a whole bunch of times, check in the most consistent files (i.e. those with the highest chance of counting as "correct"), and refresh the checked in files when we make changes to the repo |
See Rolf's comment on https://github.com/xamarin/xamarin-macios/pull/11983/files#r657759425
This is unintentional, we're using the same filter so they should be identical.
The text was updated successfully, but these errors were encountered: