Apache Iceberg version
1.4.2
Query engine
Spark
Please describe the bug 🐞
Hi.
Many Spark jobs are concurrently writing partition data to the same table. During the final metadata commit phase (`HadoopTableOperations::commit`), when multiple processes perform the rename at the same time, there appears to be a race condition: the metadata.json file can be overwritten, so one job's partition data is lost even though that Spark job reports success.
Related source code:
```java
private void renameToFinal(FileSystem fs, Path src, Path dst, int nextVersion) {
  try {
    // The lock is held only for the duration of this method.
    lockManager.acquire(dst.toString(), src.toString());
    // The existence check and the rename below are two separate filesystem calls.
    if (fs.exists(dst)) {
      throw new CommitFailedException("Version %d already exists: %s", nextVersion, dst);
    }
    if (!fs.rename(src, dst)) {
      CommitFailedException cfe =
          new CommitFailedException("Failed to commit changes using rename: %s", dst);
      RuntimeException re = tryDelete(src);
      if (re != null) {
        cfe.addSuppressed(re);
      }
      throw cfe;
    }
  } catch (IOException e) {
    CommitFailedException cfe =
        new CommitFailedException(e, "Failed to commit changes using rename: %s", dst);
    RuntimeException re = tryDelete(src);
    if (re != null) {
      cfe.addSuppressed(re);
    }
    throw cfe;
  } finally {
    lockManager.release(dst.toString(), src.toString());
  }
}
```
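To make the suspected race concrete, here is a minimal sketch using a hypothetical in-memory model (not Iceberg or Hadoop code; `RenameRaceDemo` and its methods are made up for illustration). It assumes two things: that the lock taken in `renameToFinal` only excludes callers within the same JVM, so two separate Spark drivers can both pass the `fs.exists(dst)` check before either renames, and that the underlying rename replaces an existing destination, as POSIX `rename(2)` does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of a shared metadata directory; not Iceberg/Hadoop code.
public class RenameRaceDemo {

    // path -> file contents, shared by both simulated "jobs"
    static final Map<String, String> fs = new ConcurrentHashMap<>();

    // Assumed POSIX-like rename: silently replaces an existing destination.
    static void overwritingRename(String src, String dst) {
        fs.put(dst, fs.remove(src));
    }

    // Atomic publish: succeeds only if no one has committed dst yet.
    static boolean renameIfAbsent(String src, String dst) {
        if (fs.putIfAbsent(dst, fs.get(src)) != null) {
            return false; // lost the race; caller should clean up src and retry
        }
        fs.remove(src);
        return true;
    }

    static void stageTwoWriters() {
        fs.clear();
        fs.put("tmp-a", "snapshot from job A");
        fs.put("tmp-b", "snapshot from job B");
    }

    // Check-then-rename with an interleaving that a JVM-local lock cannot
    // prevent when job A and job B run in different processes.
    static String naiveInterleaving(String dst) {
        stageTwoWriters();
        boolean aSeesFree = !fs.containsKey(dst); // job A: exists(dst) == false
        boolean bSeesFree = !fs.containsKey(dst); // job B: exists(dst) == false
        if (aSeesFree) overwritingRename("tmp-a", dst);
        if (bSeesFree) overwritingRename("tmp-b", dst); // replaces A's commit
        return fs.get(dst);
    }

    // Same interleaving, but the existence check and the publish are one atomic step.
    static boolean[] atomicInterleaving(String dst) {
        stageTwoWriters();
        boolean aOk = renameIfAbsent("tmp-a", dst);
        boolean bOk = renameIfAbsent("tmp-b", dst);
        return new boolean[] {aOk, bOk};
    }

    public static void main(String[] args) {
        String dst = "metadata/v2.metadata.json";
        System.out.println("naive:  " + naiveInterleaving(dst));
        boolean[] r = atomicInterleaving(dst);
        System.out.println("atomic: " + fs.get(dst) + " (A=" + r[0] + ", B=" + r[1] + ")");
    }
}
```

In the naive interleaving both writers believe they own the next version and job A's metadata is silently replaced, which matches the symptom described above. The atomic variant uses `ConcurrentHashMap.putIfAbsent` as a stand-in for a filesystem primitive that publishes a file only if the destination does not exist; the losing writer then gets a failure it could surface as a `CommitFailedException` and retry. Whether the actual filesystem (and lock manager) in use provides that guarantee across processes is the open question here.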
Has anyone encountered a similar issue, and if so, how was it resolved? Thanks!
Willingness to contribute