Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checkpoint writer #280

Merged
merged 22 commits into from
Jun 11, 2021
Merged

Add checkpoint writer #280

merged 22 commits into from
Jun 11, 2021

Conversation

xianwill
Copy link
Collaborator

@xianwill xianwill commented Jun 2, 2021

Description

This PR adds support for writing checkpoints.

This PR is ready with the exception of the missing Arrow Map feature discussed in this slack thread. I would like to merge as is to get the code in and follow up with a fix afterwards.

I haven't implemented an invocation pattern in this PR. I want to take a look at using S3 event triggers with lambda instead of the design proposed originally.

TBH though, I'd like to leave the invocation to a different PR and merge this one first. If we merge this API, callers may create their own invocation strategy.

For the PR proper, I have a couple of design points I'm especially interested in getting feedback on:

  1. I had a hard time deciding where to hang the methods that actually do the checkpoint. The design proposed initially in Writing Checkpoints #273 adds them to DeltaTable. When it came time to implement, I tucked them under the CheckPointWriter instead, cuz I just had a funny smell in my nose. Easily movable and would love opinions from others.
  2. I ended up creating this type in schema.rs called DeltaLogSchemaFactory that merges the table specific schema into common schema fields . I went back and forth in my mind between readable/maintainable code vs efficient code on how to structure this and err'ed on the side of readable/maintainable. My existing implementation is based on the thought that common delta log schema fields are more readable and maintainable if written using the json macro and deserialized from there. Also - I struggled with mod placement. I've written this code a few different ways already so it will be easy to restructure if you have strong opinions that make the decision easier.

Related Issue(s)

NOTE I also have this dirty pre-draft branch I'm working on to add a lambda executable for checkpoints after this PR.

@xianwill xianwill requested review from houqp and rtyler June 2, 2021 19:46
@xianwill xianwill requested a review from mosyp June 3, 2021 13:29
@xianwill xianwill changed the title Add checkpoint writer (wip) [WIP] Add checkpoint writer Jun 5, 2021
@xianwill xianwill marked this pull request as ready for review June 7, 2021 16:19
@xianwill xianwill changed the title [WIP] Add checkpoint writer Add checkpoint writer Jun 7, 2021
rust/src/delta.rs Outdated Show resolved Hide resolved
rust/src/delta.rs Outdated Show resolved Hide resolved
@xianwill
Copy link
Collaborator Author

xianwill commented Jun 8, 2021

NOTE I also have this dirty pre-draft branch I'm working on to add a lambda executable for checkpoints after this PR.

@houqp
Copy link
Member

houqp commented Jun 8, 2021

sent a PR for reducing clones: xianwill#1

rust/src/delta.rs Outdated Show resolved Hide resolved
rust/src/delta.rs Outdated Show resolved Hide resolved
rust/src/delta.rs Outdated Show resolved Hide resolved
rust/src/delta.rs Outdated Show resolved Hide resolved
rust/src/schema.rs Outdated Show resolved Hide resolved
rust/src/schema.rs Outdated Show resolved Hide resolved
rust/src/schema.rs Outdated Show resolved Hide resolved
rust/src/schema.rs Outdated Show resolved Hide resolved
rust/src/delta.rs Outdated Show resolved Hide resolved
rust/src/delta.rs Outdated Show resolved Hide resolved
rust/' Outdated Show resolved Hide resolved
@xianwill xianwill requested a review from houqp June 10, 2021 15:06
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest looks good to me 👍

rust/src/delta.rs Show resolved Hide resolved
rust/src/checkpoints.rs Outdated Show resolved Hide resolved
rust/src/checkpoints.rs Outdated Show resolved Hide resolved
houqp
houqp previously approved these changes Jun 11, 2021
@xianwill xianwill merged commit 3d16de9 into delta-io:main Jun 11, 2021
@xianwill xianwill deleted the checkpoints branch June 13, 2021 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement delta log checkpointing
3 participants