converter.sh: Don't parallelise write_hash (#17742)
write_hash appends to scratch_files/hashes.csv without any form of
locking, yet it is run in parallel. Additionally, there is no
synchronisation to ensure all write_hash invocations have completed
before the CSV is read. This results in non-deterministic breakage and
missing hashes in the resulting files.
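For reference, the racy pattern being removed, annotated with the two failure modes described above (the loop is taken from the hunk further down; write_hash is the repository's own helper):

# Before this commit: each write_hash call runs in the background.
while IFS=, read -r gid predicate path
do
    # (a) Concurrent appends to scratch_files/hashes.csv can interleave,
    #     because write_hash takes no lock on the file.
    # (b) Nothing waits for these jobs, so the CSV may be read
    #     before every line has been written.
    write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv" &
done < scratch_files/paths.csv > /dev/null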

Remove parallelism here to fix the breakage.

Note: just calling wait_for_jobs after the loop reduces how often the
problem occurs, but it is insufficient, since multiple appends still
occasionally race and hashes may still go missing.
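For comparison, a minimal sketch of the alternative this note alludes to: keep the background jobs, wait for them before reading the CSV, and additionally serialise the appends with flock(1). This is not the change made by this commit (which simply drops the &); it assumes flock from util-linux is available and uses the wait builtin as a stand-in for the repository's wait_for_jobs helper.

# Hypothetical alternative, NOT what this commit does:
while IFS=, read -r gid predicate path
do
    {
        # Take an exclusive lock so only one write_hash appends at a time;
        # the lock is released when the job exits and fd 9 is closed.
        flock -x 9
        write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv"
    } 9> scratch_files/hashes.lock &
done < scratch_files/paths.csv > /dev/null

# Wait for every background job before hashes.csv is read below; on its
# own (without the locking above) this only reduces, rather than
# eliminates, the missing-hash problem.
wait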
amalon committed Dec 24, 2023
1 parent a6c8903 commit 57cc7dd
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion converter.sh
@@ -296,7 +296,7 @@ function write_hash () {
 }
 
 while IFS=, read -r gid predicate path
-do write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv" &
+do write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv"
 done < scratch_files/paths.csv > /dev/null
 
 jq -cR 'split(",")' scratch_files/hashes.csv | jq -s 'map({(.[0]): [.[1], .[2]]}) | add' > scratch_files/hashmap.json
