converter.sh: Don't parallelise write_hash (#17742)
write_hash appends to scratch_files/hashes.csv without any form of
locking, yet it is run in parallel. Additionally, there is no
synchronisation to ensure all write_hash invocations have completed
before the CSV is read. This results in non-deterministic breakage and
missing hashes in the resulting files.
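For reference, the racy pattern being removed, annotated with the two failure modes described above (the loop is taken from the hunk further down; write_hash is the repository's own helper):

# Before this commit: each write_hash call runs in the background.
while IFS=, read -r gid predicate path
do
    # (a) Concurrent appends to scratch_files/hashes.csv can interleave,
    #     because write_hash takes no lock on the file.
    # (b) Nothing waits for these jobs, so the CSV may be read
    #     before every line has been written.
    write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv" &
done < scratch_files/paths.csv > /dev/null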

Remove parallelism here to fix the breakage.

Note: just calling wait_for_jobs after the loop reduces how often the
problem occurs, but it is insufficient, since multiple appends still
occasionally race and hashes may still go missing.
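For comparison, a minimal sketch of the alternative this note alludes to: keep the background jobs, wait for them before reading the CSV, and additionally serialise the appends with flock(1). This is not the change made by this commit (which simply drops the &); it assumes flock from util-linux is available and uses the wait builtin as a stand-in for the repository's wait_for_jobs helper.

# Hypothetical alternative, NOT what this commit does:
while IFS=, read -r gid predicate path
do
    {
        # Take an exclusive lock so only one write_hash appends at a time;
        # the lock is released when the job exits and fd 9 is closed.
        flock -x 9
        write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv"
    } 9> scratch_files/hashes.lock &
done < scratch_files/paths.csv > /dev/null

# Wait for every background job before hashes.csv is read below; on its
# own (without the locking above) this only reduces, rather than
# eliminates, the missing-hash problem.
wait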
amalon committed Dec 24, 2023
1 parent a6c8903 commit 57cc7dd
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion converter.sh
@@ -296,7 +296,7 @@ function write_hash () {
 }
 
 while IFS=, read -r gid predicate path
-do write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv" &
+do write_hash "${predicate}" "${path}" "${gid}" "scratch_files/hashes.csv"
 done < scratch_files/paths.csv > /dev/null
 
 jq -cR 'split(",")' scratch_files/hashes.csv | jq -s 'map({(.[0]): [.[1], .[2]]}) | add' > scratch_files/hashmap.json
