Overhaul hoodie-commons and re-implement HoodieTableMetadata #22

Closed
prazanna opened this issue Jan 4, 2017 · 1 comment

prazanna commented Jan 4, 2017

With the implementation of merge-on-read underway, this is a good time to overhaul hoodie-commons as a prerequisite and to work on the right abstractions.
We should also introduce Java 8 features into the code to make it more succinct.

@prazanna prazanna added this to the 0.2.5 milestone Jan 4, 2017
@prazanna prazanna self-assigned this Jan 4, 2017

prazanna commented Jan 9, 2017

The following is the gist of the changes made:

  1. All of the low-level code for creating a commit lived in HoodieClient, which made it hard to share that code with a compaction commit.
  2. HoodieTableMetadata mixed metadata handling with file filtering. (Also, a few operations required a FileSystem to be passed in because they were called from TaskExecutors, while others used a FileSystem held as a global variable.) Merge-on-read needs much of that code but has to change slightly how it operates on the metadata and how it filters files, so the two sets of operations are split into HoodieTableMetaClient and TableFileSystemView.
  3. Everything in HoodieTableMetaClient (active commits, archived commits, the cleaner log, the savepoint log and, in future, delta and compaction commits) is a HoodieTimeline. A timeline is a series of instants, with a built-in concept of inflight and completed commit markers.
  4. A timeline can be queried for ranges and containment, and can also be used to create new data points (a new commit, etc.). Creation and deletion of commits (and of all the metadata above) is streamlined through the timeline.
  5. Multiple timelines can be merged into a single timeline, giving us an audit timeline of whatever happened in a hoodie dataset. This also helps with #55 (Implement a way to see all audit events with details on a hoodie dataset).
  6. Move to Java 8 and use its more succinct syntax in the refactored code.
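
To make items 3 to 5 a bit more concrete, here is a rough Java 8 sketch of the timeline idea. The class and method names (`TimelineSketch`, `Instant`, `completedOnly`, `inRange`, `mergeWith`) are made up for illustration and are not the actual hoodie-commons API. It also assumes that timestamps are sortable strings and that range bounds are inclusive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a timeline; not the real HoodieTimeline. */
final class TimelineSketch {

    /** One point on a timeline: an action at a timestamp, inflight or completed. */
    static final class Instant {
        final String action;     // e.g. "commit", "clean", "savepoint"
        final String timestamp;  // assumed to sort lexicographically
        final boolean inflight;  // true until the action is marked completed

        Instant(String action, String timestamp, boolean inflight) {
            this.action = action;
            this.timestamp = timestamp;
            this.inflight = inflight;
        }
    }

    private final List<Instant> instants; // always kept sorted by timestamp

    TimelineSketch(List<Instant> instants) {
        List<Instant> sorted = new ArrayList<>(instants);
        sorted.sort(Comparator.comparing((Instant i) -> i.timestamp));
        this.instants = Collections.unmodifiableList(sorted);
    }

    /** A new timeline holding only the completed instants. */
    TimelineSketch completedOnly() {
        return new TimelineSketch(instants.stream()
                .filter(i -> !i.inflight)
                .collect(Collectors.toList()));
    }

    /** A new timeline holding instants with start <= timestamp <= end. */
    TimelineSketch inRange(String start, String end) {
        return new TimelineSketch(instants.stream()
                .filter(i -> i.timestamp.compareTo(start) >= 0
                          && i.timestamp.compareTo(end) <= 0)
                .collect(Collectors.toList()));
    }

    /** True if any instant on this timeline has the given timestamp. */
    boolean contains(String timestamp) {
        return instants.stream().anyMatch(i -> i.timestamp.equals(timestamp));
    }

    /** Merges two timelines into one, e.g. commits plus cleans for an audit view. */
    TimelineSketch mergeWith(TimelineSketch other) {
        return new TimelineSketch(
                Stream.concat(instants.stream(), other.instants.stream())
                      .collect(Collectors.toList()));
    }

    /** Timestamps in sorted order, handy for inspection. */
    List<String> timestamps() {
        return instants.stream().map(i -> i.timestamp).collect(Collectors.toList());
    }
}
```

In this sketch every query returns a new immutable timeline, so a commit timeline and a cleaner timeline can be merged and then filtered without affecting the originals. Whether the real implementation takes the same approach is not stated in this thread.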

@prazanna prazanna removed this from the 0.2.5 milestone Jan 11, 2017
@prazanna prazanna closed this as completed Apr 2, 2017
vinishjail97 pushed a commit to vinishjail97/hudi that referenced this issue Dec 15, 2023
…03980dd0e7a011fd0a04748e6e44

Fixing determining target table schema for delta sync with empty batch