add documentation how to configure delta.logRetentionDuration #2072

Closed
djouallah opened this issue Jan 13, 2024 · 8 comments · Fixed by #2250
Labels
enhancement New feature or request

Comments

@djouallah

Description

I am trying to create a Delta table whose log retention is limited to 1 day, like this:

from deltalake import DeltaTable
import pyarrow as pa

dt = DeltaTable.create(
    table_uri='s3://aemo/scada2',
    schema=pa.schema([
        pa.field('SETTLEMENTDATE', pa.timestamp('us')),
        pa.field('DUID', pa.string()),
        pa.field('SCADAVALUE', pa.float64()),
        pa.field('Date', pa.date32()),
        pa.field('week', pa.string()),
        pa.field('file', pa.string()),
    ]),
    mode='error',
    partition_by="week",
    configuration={"delta.logRetentionDuration": "1 days"},
    storage_options=storage_options,
)

When I run dt.cleanup_metadata(), it still seems to be using 30 days?

djouallah added the enhancement label Jan 13, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Jan 13, 2024

@djouallah does the table contain checkpoints? If it doesn't, no logs are removed, since removing them could corrupt the table.

@djouallah
Author

@ion-elgreco it does indeed. Is this specific to Delta Rust?

@ion-elgreco
Collaborator

@djouallah if you're using 0.15.1, it has the correct behavior: it only removes logs up to a checkpoint, based on the logRetentionDuration. Before 0.15.1 it would remove logs based on the logRetentionDuration alone, which could invalidate the table state.
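
To make the rule described above concrete, here is a simplified model of "only remove up to a checkpoint, based on the retention duration". It is a sketch and not the delta-rs implementation: the function name `expired_log_versions` and its arguments are hypothetical, and it assumes that a commit is removable only if it is both older than the retention cutoff and strictly before the latest checkpoint version.

```python
from datetime import datetime, timedelta


def expired_log_versions(commits, last_checkpoint_version, retention, now):
    """Return the commit versions that this simplified rule would remove.

    commits: dict mapping commit version -> commit timestamp (hypothetical input).
    last_checkpoint_version: version of the latest checkpoint, or None if absent.
    retention: timedelta corresponding to delta.logRetentionDuration.
    """
    # Assumption: without a checkpoint nothing is removed, because the
    # table state could no longer be reconstructed from the remaining logs.
    if last_checkpoint_version is None:
        return []
    cutoff = now - retention
    return sorted(
        version
        for version, timestamp in commits.items()
        if version < last_checkpoint_version and timestamp < cutoff
    )


now = datetime(2024, 1, 13)
commits = {
    0: datetime(2024, 1, 1),
    1: datetime(2024, 1, 5),
    2: datetime(2024, 1, 12, 6),
    3: datetime(2024, 1, 12, 12),
}
print(expired_log_versions(commits, 3, timedelta(days=1), now))
print(expired_log_versions(commits, None, timedelta(days=1), now))
```

With a one-day retention, commits 0 and 1 are past the cutoff and precede checkpoint version 3, so the model removes only those. With no checkpoint, it removes nothing.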

@djouallah
Author

@ion-elgreco I am using 0.15.1 and it is not removing anything. Is the format I used correct?

@ion-elgreco
Collaborator

ion-elgreco commented Jan 13, 2024

Ah, the format should be interval <amount> <unit>, so try interval 1 day.

Can you also try interval 1 days? I think we don't parse the plural form at the moment, so it might not work.
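
For reference, the suggestion above applied to the table from the description might look like the following sketch. The helper names `log_retention_config` and `create_table_with_retention` are hypothetical and only wrap the `DeltaTable.create` call from the issue. The singular unit is used because, per the comment above, the plural form may not be parsed in 0.15.1.

```python
def log_retention_config(amount, unit="day"):
    """Build the table configuration in the `interval <amount> <unit>` form."""
    return {"delta.logRetentionDuration": f"interval {amount} {unit}"}


def create_table_with_retention(table_uri, storage_options=None):
    """Create the table from the issue with a one-day log retention."""
    # Imported here so the config helper can be used without deltalake installed.
    import pyarrow as pa
    from deltalake import DeltaTable

    schema = pa.schema([
        pa.field("SETTLEMENTDATE", pa.timestamp("us")),
        pa.field("DUID", pa.string()),
        pa.field("SCADAVALUE", pa.float64()),
        pa.field("Date", pa.date32()),
        pa.field("week", pa.string()),
        pa.field("file", pa.string()),
    ])
    return DeltaTable.create(
        table_uri=table_uri,
        schema=schema,
        mode="error",
        partition_by=["week"],
        configuration=log_retention_config(1),  # "interval 1 day"
        storage_options=storage_options,
    )


print(log_retention_config(1))
```

Since logs are only removed up to a checkpoint, calling `dt.create_checkpoint()` before `dt.cleanup_metadata()` is presumably needed for the cleanup to have any effect on a table that has no checkpoint yet.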

@djouallah
Author

Sorry for being pedantic, but "1 days" is what Spark uses. For compatibility reasons, shouldn't Delta Rust follow the same approach? Is the format part of the Delta protocol?

@ion-elgreco
Collaborator

@djouallah these things are not part of the protocol. I am aware that Spark only supports the plural version. We can add that soon; it's trivial to add.
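
As an illustration of why accepting the plural form is a small change, here is a minimal parser sketch for the `interval <amount> <unit>` format that treats `day` and `days` the same. It is not the delta-rs parser: the name `parse_interval`, the set of supported units, and the requirement of an `interval` prefix are all assumptions made for this example.

```python
import re
from datetime import timedelta

# Assumed unit set for illustration; the real parser may support others.
_UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 604800,
}
_PATTERN = re.compile(r"^interval\s+(\d+)\s+([a-z]+)$", re.IGNORECASE)


def parse_interval(value):
    """Parse 'interval <amount> <unit>' into a timedelta, accepting plural units."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"expected 'interval <amount> <unit>', got {value!r}")
    amount = int(match.group(1))
    unit = match.group(2).lower()
    # Normalize the plural form ("days") to the singular ("day").
    if unit.endswith("s"):
        unit = unit[:-1]
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown unit in {value!r}")
    return timedelta(seconds=amount * _UNIT_SECONDS[unit])


print(parse_interval("interval 1 day") == parse_interval("interval 1 days"))
```

In this sketch, a bare "1 days" without the `interval` prefix raises a `ValueError`, which mirrors the format stated earlier in the thread.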

@djouallah
Author

@ion-elgreco I appreciate that you are doing free work. I am happy with whatever you pick, I was just curious :)
