

Support for Incremental Manifest File Generation #453

Closed
jainpriyansh786 opened this issue Jun 14, 2020 · 3 comments

@jainpriyansh786

Is there a way to generate manifest files only for the partitions that were updated or created in the current transaction? For example:

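```python
from delta.tables import DeltaTable
from pyspark.sql.functions import expr

# Assumes the CSV has a header row that includes an `id` column
df = spark.read.option("header", "true").csv("s3://temp/2020-01-01.csv")
delta_table = DeltaTable.forPath(spark, delta_table_path)

# Insert only the rows whose id is not already in the table
delta_table.alias("source").merge(
    df.alias("new_data"), "source.id = new_data.id"
).whenNotMatchedInsertAll().execute()

# Compact the touched partition into 4 files; dataChange=false records this as a file rearrangement only
partition = "date = '2020-01-01'"
(spark.read.format("delta").load(delta_table_path)
    .where(expr(partition))
    .repartition(4)
    .write.format("delta").mode("overwrite")
    .option("dataChange", "false")
    .option("replaceWhere", partition)
    .save(delta_table_path))

# Rewrites the manifests of every partition, not only date=2020-01-01
delta_table.generate("symlink_format_manifest")
```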

When I run the `generate` method, it creates or updates the manifest files for all partitions in the table. This works fine for now and is fast enough because the table has only a few partitions, but the number of partitions is expected to grow into the thousands.
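To make the cost difference concrete, here is a minimal sketch using a hypothetical in-memory model, not Delta Lake's implementation: each partition has one manifest listing its data files. `full_regenerate` rewrites every manifest, which is the behavior described above, while `incremental_regenerate` rewrites only the partitions a commit touched. All names (`full_regenerate`, `incremental_regenerate`, `_rewrite`) are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy model, not Delta Lake's implementation:
# manifests maps partition -> sorted list of that partition's data files.
Manifests = Dict[str, List[str]]
FilesByPartition = Dict[str, Set[str]]


def _rewrite(manifests: Manifests, files: FilesByPartition,
             partitions: Iterable[str]) -> int:
    """Rewrite the manifests of the given partitions; return how many were written."""
    written = 0
    for p in partitions:
        current = files.get(p, set())
        if current:
            manifests[p] = sorted(current)
            written += 1
        else:
            # The partition no longer has files, so drop its manifest
            manifests.pop(p, None)
    return written


def full_regenerate(manifests: Manifests, files: FilesByPartition) -> int:
    """Full regeneration: drop stale manifests, then rewrite every partition's manifest."""
    for stale in [p for p in manifests if p not in files]:
        del manifests[stale]
    return _rewrite(manifests, files, list(files))


def incremental_regenerate(manifests: Manifests, files: FilesByPartition,
                           touched: Set[str]) -> int:
    """Incremental update: rewrite only the manifests of partitions a commit wrote to."""
    return _rewrite(manifests, files, touched)


# Usage: 30 daily partitions, then a commit that rewrites one of them
files = {f"date=2020-01-{d:02d}": {f"part-{d}.parquet"} for d in range(1, 31)}
manifests: Manifests = {}
print(full_regenerate(manifests, files))  # 30 manifests written
files["date=2020-01-01"] = {"part-a.parquet", "part-b.parquet"}
print(incremental_regenerate(manifests, files, {"date=2020-01-01"}))  # 1 manifest written
```

In this model the full pass costs one write per partition regardless of what changed, which is why it scales with table size; the incremental pass costs one write per partition the commit actually touched.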

@brkyvz
Collaborator

brkyvz commented Jun 15, 2020

@tdas any recommendations here?

@tdas
Contributor

tdas commented Jun 15, 2020

The next release, 0.7.0 (hopefully this week), will add support for incremental manifest generation. You will be able to set a table property, after which every commit to the table automatically updates the manifests for only the partitions that the commit wrote to.
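For reference, a sketch of what setting that table property might look like. The property name `delta.compatibility.symlinkFormatManifest.enabled` is the one documented for Delta Lake 0.7.0 and later; the helper `enable_incremental_manifests_sql` is hypothetical and only builds the SQL string, so it can be inspected without a Spark cluster.

```python
def enable_incremental_manifests_sql(table_path: str) -> str:
    """Build DDL that enables automatic manifest updates on a path-based Delta table.

    Hypothetical helper; only the table property name comes from the Delta Lake docs.
    """
    return (
        f"ALTER TABLE delta.`{table_path}` "
        "SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true)"
    )


# With a SparkSession configured for Delta SQL (Spark 3.0+, Delta 0.7.0+), this
# would be run as: spark.sql(enable_incremental_manifests_sql(delta_table_path))
```

Once the property is set, later writes such as the merge and the `replaceWhere` compaction in the question are expected to refresh only the manifests of the partitions they wrote, so the explicit `generate("symlink_format_manifest")` call after each write should no longer be needed for those updates.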

tdas added this to the 0.7.0 milestone Jun 15, 2020
@jainpriyansh786
Author

Thanks for the update.


3 participants