You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When I run the generate method it creates/updates manifest files for all the partitions in the table . Currently it works fine and seems fast enough since the number of partitions in the table is less . But the number of partitions is expected to grow to thousands .
The text was updated successfully, but these errors were encountered:
Next release 0.7.0 (hopefully this week) will add support for an incremental manifest generation. You will be able to set a table property and all commits made to the table will automatically update the manifests for only those partitions that the commit had written to.
Is there way to generate manifest files for only the partitions that were updated/created in the current transaction .
df = spark.read.csv(s3://temp/2020-01-01.csv)
delta_table = DeltaTable.forPath(spark, delta_table_path)
delta_table.alias("source").merge(df.alias("new_data"), "source.id =
new_data.id").whenNotMatchedInsertAll().execute()
partition = " date=2020-01-01 "
spark.read.format("delta").load(delta_table_path).where(
expr(partition)).repartition( 4).write.option("dataChange",
"False").format(
"delta").mode("overwrite").option('replaceWhere',partition).save(delta_table_path)
delta_table.generate("symlink_format_manifest")
When I run the generate method it creates/updates manifest files for all the partitions in the table . Currently it works fine and seems fast enough since the number of partitions in the table is less . But the number of partitions is expected to grow to thousands .
The text was updated successfully, but these errors were encountered: