[FEATURE] Introduce external storage API for Delta Log based on FoundationDB #867
Comments
Just to clarify, the linked PR doesn't really "store the Delta Log in DynamoDB". DynamoDB is only used to provide mutual exclusion, which S3 lacks because it has no "put-if-absent" API. Mutual exclusion is one of the three properties on which Delta Lake's ACID guarantees are predicated. Are you interested in storing the entire DeltaLog in DynamoDB / FoundationDB? If so, are there specific use cases that make you prefer this? Thanks for making this issue and prompting this discussion!
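To make the mutual-exclusion point concrete, here is a minimal Python sketch of a put-if-absent commit. It is not the code from the linked PR: `InMemoryKV` and `try_commit` are hypothetical names, and the in-memory dictionary merely stands in for an external store such as DynamoDB or FoundationDB. The only detail taken from Delta Lake itself is the log file naming (a zero-padded 20-digit version followed by `.json`).

```python
import threading


class InMemoryKV:
    """Hypothetical toy store offering an atomic put-if-absent."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        # Only the first writer of `key` gets True; later writers get False.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        return self._data.get(key)


def try_commit(store, table, version, actions):
    """Try to claim log entry `version` of `table`.

    Returns True if this writer won the version, False if another
    writer had already committed it (the caller should then retry
    against the next version).
    """
    key = f"{table}/_delta_log/{version:020d}.json"
    return store.put_if_absent(key, actions)
```

Two writers racing for the same version both call `try_commit(store, "events", 0, ...)`; exactly one call returns `True`, and the loser has to re-read the log and retry with version 1. Plain S3 object writes, as described above, could not give that guarantee on their own.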
Hi @scottsand-db,
Yes, I think this is at least an interesting technical idea worth trying out, and here are some of the potential use cases and benefits of this approach. Potential use cases:
More opportunities: using a fast, transactional, distributed KV-like store with decent key-listing performance as the metadata layer opens the door to further optimizations, such as:
Some of these designs are already applied in other systems, for example, Firebolt and Iceberg.
Following up on this. Having a LogStore that writes to FoundationDB won't work, as our LogStore APIs don't encapsulate all file system (log store) interactions, despite the name. For example, checkpoints don't go through the LogStore.
This PR introduces a very good concept: storing the Delta Log in DynamoDB.
However, DynamoDB might not be the tool of choice for users who are on other clouds or running an on-premises setup. Distributed KV stores with transaction support, such as FoundationDB, could be a nice extension for Delta Log storage.
Another benefit is that such an external system makes it possible to introduce multi-table transactions with an explicit API, for example (just a concept; the final API would need to be revised):
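As a rough illustration only, and not the API proposed in this issue, a multi-table commit on top of a transactional KV store might be sketched in Python as follows. Every name here (`TransactionalKV`, `commit_all`, `commit_tables`) is hypothetical, and the in-memory store is an assumption standing in for a system like FoundationDB that can write several keys atomically.

```python
import threading


class TransactionalKV:
    """Hypothetical toy store that writes a batch of keys all-or-nothing."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def commit_all(self, entries):
        # Write every (key, value) pair, or none if any key already exists.
        with self._lock:
            if any(key in self._data for key in entries):
                return False
            self._data.update(entries)
            return True

    def get(self, key):
        return self._data.get(key)


def commit_tables(store, commits):
    """Atomically commit one new log version per table.

    `commits` maps a table name to a (version, actions) tuple.
    Returns True if all tables were committed, False if any version
    was already taken, in which case nothing is written.
    """
    entries = {
        f"{table}/_delta_log/{version:020d}.json": actions
        for table, (version, actions) in commits.items()
    }
    return store.commit_all(entries)
```

With this shape, a conflict on any single table aborts the whole batch, so readers never observe one table updated without the other.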