Since PR #6335, `FastAppend` and subclasses of `MergingSnapshotProducer` skip newly added data files during manual commit retries (that is, when user code calls `commit` again inside a try-catch).
This happens because `cleanUncommittedAppends` now sets the cached value to an empty list instead of `null`. On the retry, when `newDataFilesAsManifests` is called, the cache looks already populated, so the logic that writes manifests for the new data files is skipped and no data files are returned.
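To make the failure mode concrete, here is a minimal, self-contained Scala sketch of the caching pattern described above. It does not use Iceberg: `ToyProducer`, `cachedManifests`, `CacheResetDemo` and the `resetToNull` flag are hypothetical stand-ins, and only the method names `newDataFilesAsManifests` and `cleanUncommittedAppends` are borrowed from the description. It assumes the cache uses `null` to mean "manifests not written yet", so resetting it to an empty list makes the retry look as if there is nothing left to write.

```scala
import scala.collection.mutable.ListBuffer

// Toy model (hypothetical, not Iceberg's real classes) of a producer that lazily
// writes manifests for newly added files and caches the result.
final class ToyProducer(resetToNull: Boolean) {
  private val newFiles = ListBuffer.empty[String]

  // null means "manifests for newFiles have not been written yet".
  private var cachedManifests: List[String] = null

  def addFile(path: String): Unit = newFiles += path

  // Plays the role of newDataFilesAsManifests: only writes when the cache is unset.
  def newDataFilesAsManifests(): List[String] = {
    if (cachedManifests == null) {
      cachedManifests = newFiles.toList.map(f => s"manifest-for-$f")
    }
    cachedManifests
  }

  // Plays the role of cleanUncommittedAppends after a failed commit.
  // Only the cache reset is modeled; cleanup of uncommitted files is omitted.
  def cleanUncommittedAppends(): Unit = {
    cachedManifests = if (resetToNull) null else Nil
  }
}

object CacheResetDemo {
  // Returns (manifests on the first attempt, manifests on the manual retry).
  def attempt(resetToNull: Boolean): (List[String], List[String]) = {
    val producer = new ToyProducer(resetToNull)
    producer.addFile("new-data-file.parquet")
    val first = producer.newDataFilesAsManifests() // first commit attempt
    producer.cleanUncommittedAppends()             // commit failed, cleanup runs
    val retry = producer.newDataFilesAsManifests() // manual retry
    (first, retry)
  }

  def main(args: Array[String]): Unit = {
    println(s"reset to Nil:  ${attempt(resetToNull = false)}")
    println(s"reset to null: ${attempt(resetToNull = true)}")
  }
}
```

With `resetToNull = false` the retry returns an empty list, so the new file is silently dropped; with `resetToNull = true` the retry writes the manifest for the new file again.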
The result can be a partially applied change when the user retries commits manually. For example, the following code produces a rewrite that applies the deletes but does not add the new file:
```scala
import java.util
import java.util.UUID

import scala.collection.JavaConverters._
import scala.util.control.NonFatal

import org.apache.iceberg.aws.glue.GlueCatalog
import org.apache.iceberg.catalog._
import org.apache.iceberg.data.GenericRecord
import org.apache.iceberg.data.parquet.GenericParquetWriter
import org.apache.iceberg.parquet.Parquet
import org.apache.iceberg.types.Types
import org.apache.iceberg.{DataFile, PartitionSpec, Schema, Table, data}

object TestRewriteCommits {
  def main(args: Array[String]): Unit = {
    val catalog = new GlueCatalog()
    catalog.initialize("iceberg", Map.empty[String, String].asJava)

    val schema = new Schema(
      Types.NestedField.required(1, "id", Types.StringType.get())
    )

    val tableName = "temp4"
    val tableId = TableIdentifier.of("prod_iceberg", tableName)
    val basePath = s"s3://s3-bucket-path/ice/tables/${tableName}/"
    val tableProperties: util.Map[String, String] = Map(
      "format-version" -> "2",
      "commit.retry.num-retries" -> "0" // turn off retries for more control during testing
    ).asJava

    if (!catalog.tableExists(tableId)) {
      catalog.createTable(tableId, schema, PartitionSpec.unpartitioned(), basePath, tableProperties)
    }
    val table = catalog.loadTable(tableId)

    // Commit two data files, each with its own append
    val addedFiles = (1 to 2).map { _ =>
      val file: DataFile = writeFile(basePath, table)
      val append = table.newAppend()
      append.appendFile(file)
      append.commit()
      file
    }

    // In a transaction, rewrite: delete both files and add one new file
    val transaction = table.newTransaction()
    val rewrite = transaction.newRewrite()
    addedFiles.foreach(file => rewrite.deleteFile(file))
    rewrite.addFile(writeFile(basePath, table))
    rewrite.commit()

    try {
      // Make sure this commit fails (I failed it by breaking at
      // glue.updateTable(updateTableRequest.build()); and changing the table from Athena).
      transaction.commitTransaction()
    } catch {
      case NonFatal(_) =>
        // This retry succeeds, but the result does not contain the data file added by the rewrite.
        transaction.commitTransaction()
    }
  }

  // Writes a single-row Parquet data file under basePath and returns its DataFile
  private def writeFile(basePath: String, table: Table): DataFile = {
    val writer = Parquet.writeData(
        table.io().newOutputFile(basePath + UUID.randomUUID().toString + ".parquet"))
      .forTable(table)
      .overwrite(true)
      .createWriterFunc(GenericParquetWriter.buildWriter)
      .build[data.Record]()
    writer.write(Iterable[data.Record](GenericRecord.create(table.schema()).copy("id", "1")).asJava)
    writer.close()
    writer.toDataFile
  }
}
```
I think this can be fixed either by setting the cached value back to `null`, as it was before, or by forbidding `commit` from being called more than once.
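The second option could look roughly like the sketch below. It is a hypothetical guard, not an existing Iceberg API: `CommitOnce` and `CommitOnceDemo` are illustrative names, and the wrapper takes an arbitrary commit action so the example stays self-contained. Any call to `commit()` after the first attempt throws `IllegalStateException`, so a manual retry fails loudly instead of committing a partial change.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical guard (not an Iceberg API): allows exactly one commit attempt.
final class CommitOnce(doCommit: () => Unit) {
  // Flipped on the first call, whether or not that attempt succeeds.
  private val attempted = new AtomicBoolean(false)

  def commit(): Unit = {
    if (!attempted.compareAndSet(false, true)) {
      throw new IllegalStateException(
        "commit() was already called on this operation; create a new operation to retry")
    }
    doCommit()
  }
}

object CommitOnceDemo {
  // Returns (number of times the underlying commit ran, whether the retry was rejected).
  def run(): (Int, Boolean) = {
    var underlyingCalls = 0
    val op = new CommitOnce(() => {
      underlyingCalls += 1
      throw new RuntimeException("simulated commit failure")
    })

    // First attempt fails, like the failed commitTransaction() in the repro.
    try op.commit()
    catch { case _: RuntimeException => () }

    // Manual retry: rejected before anything else is committed.
    val retryRejected =
      try { op.commit(); false }
      catch { case _: IllegalStateException => true }

    (underlyingCalls, retryRejected)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Using `AtomicBoolean.compareAndSet` keeps the guard correct even if two threads race to call `commit()`. Where such a check would live inside Iceberg (for example, in the snapshot producer's own commit path) is left open here.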
This reverts commit bb66918.
Iceberg 1.4.x contains a silent correctness issue when concurrently
committing writes to a table.
See: apache/iceberg#9227
Apache Iceberg version: 1.4.2 (latest release)
Query engine: Athena