New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor HoodieTableFileSystemView using FileGroups & FileSlices #201
Conversation
vinothchandar
commented
Jun 20, 2017
- This is a pre-requisite for implementing something like BucketedIndex
- Portions of code need to be reworked (Cleaner, Caching, Index) to work using these abstractions
- Will file follow ons for scoped down work.
- UpdateHandle -> MergeHandle, InsertHandle -> CreateHandle - Also bunch of code cleanup in different places
- Merged all filter* and get* methods - new constructor takes filestatus[] - All existing tests pass - FileGroup is all files that belong to a fileID within a partition - FileSlice is a generation of data and log files, starting at a base commit
- Usage now marks code as clearly using either RO or RT views, for future evolution - Tests on all of FileGroups and FileSlices
return recordRDD.map(record -> { | ||
String bucket = getBucket(record.getRecordKey()); | ||
//HACK(vc) a non-existent commit is provided here. | ||
record.setCurrentLocation(new HoodieRecordLocation("000", bucket)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make commitTime an Optional in HoodieRecordLocation ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to just make the location the fileGroupID going forward.
@@ -234,7 +234,8 @@ private Connection getHBaseConnection() { | |||
|
|||
@Override | |||
public boolean rollbackCommit(String commitTime) { | |||
// TODO (weiy) | |||
// Can't really rollback here. HBase only can let you go from recordKey to fileID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially take snapshots and restore hbase to snapshot before commit started. But this has to be understood well before implementing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you cant snapshot every commit. its pretty expensive. anyways :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. There are minor things about being consistent in usage of builders and models being immutable, but since this is a big diff and we do not want to rebase this over and over - I am going to merge this in. I will do a pass over this again and do the minor changes once you are done with the bucketed index.
yeah. the mutability/immutablity part - I punted on for now. I will explain more in person. High level, we need to make a call on if its better to list the entire table once before starting and then we can construct a nice immuatable FileSystemView object and achieve all that.. but then there are tradeoffs. but agree on your high level sentiment |