-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-200] Add cleanable state store dataset #2097
Conversation
@ibuenros please review |
return this.version.equals(otherAsDateTime.version) ? 0 : this.version.compareTo(otherAsDateTime.version); | ||
} | ||
|
||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this equal() if the behavior is the same as parent class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this triggers findbugs warning, which recommends explicitly overriding even if behaviour is the same.
* A cleanable {@link DatasetStoreDataset} | ||
*/ | ||
@Data | ||
public class CleanableDatasetStoreDataset extends ModificationTimeDataset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably don't want to assume CleanableDatasetStoreDataset always use time based strategy. . You probably should rename it to ModificationTimeDatasoreDataset if only modification time is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the ModificationTimeDataset contains many FileSystem specific constructs which doesn't represent all the datasets of DatasetStore type, which could be many other types like mysql DB, FS, or other implementations.
I would recommend to keep your first commit, and create your own interface like CleanableDatasetStore, which is similar to CleanableDataset but excluding the inheritance from FileSystemDataset.
Then your CleanableDatasetStoreDataset can implement CleanableDatasetStore ::clean(), which is similar to MultiVersionCleanableDatasetBase::clean (), but can be simplified that VersionFinder is always DatasetStateStoreVersion type. The DatasetStateStoreVersion is an abstract class which takes DatasetStateStoreEntryManager as a constructor argument and serve as a base class of all the version types related to dataset state store
@@ -39,7 +40,7 @@ | |||
/** | |||
* A {@link DatasetsFinder} to find {@link DatasetStoreDataset}s. | |||
*/ | |||
public class DatasetStoreDatasetFinder implements DatasetsFinder<DatasetStoreDataset> { | |||
public class DatasetStoreDatasetFinder implements DatasetsFinder<Dataset> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's better to limit the type to DatasetStoreDataset here. However this will cause some incompatibility if you want to reuse the existing ModificationTimeDataset. That's why I don't like this class to be derived from ModificationTimeDataset. Please check with Issac to see if this type restriction can be removed. Otherwise you can define your own CleanableDatasetStoreDataset like I mentioned earlier which should always be DatasetStoreDataset type.
@Override | ||
protected void cleanImpl(Collection deletableVersions) throws IOException { | ||
for (Object version : deletableVersions) { | ||
((TimestampedDatasetStateStoreVersion) version).getEntry().delete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic can be generic (apply to all datastore datasets), so we can avoid the downcast type conversion.
* A cleanable {@link DatasetStoreDataset} | ||
*/ | ||
@Data | ||
public class CleanableDatasetStoreDataset extends ModificationTimeDataset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the ModificationTimeDataset contains many FileSystem specific constructs which doesn't represent all the datasets of DatasetStore type, which could be many other types like mysql DB, FS, or other implementations.
I would recommend to keep your first commit, and create your own interface like CleanableDatasetStore, which is similar to CleanableDataset but excluding the inheritance from FileSystemDataset.
Then your CleanableDatasetStoreDataset can implement CleanableDatasetStore ::clean(), which is similar to MultiVersionCleanableDatasetBase::clean (), but can be simplified that VersionFinder is always DatasetStateStoreVersion type. The DatasetStateStoreVersion is an abstract class which takes DatasetStateStoreEntryManager as a constructor argument and serve as a base class of all the version types related to dataset state store
* {@link TimestampedDatasetVersion} that has a {@link DatasetStateStoreEntryManager} | ||
*/ | ||
@Getter | ||
public class TimestampedDatasetStateStoreVersion extends TimestampedDatasetVersion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimestampedDatasetVersion is FileSystem typed. Consider to have a base class and make TimestampedDatasetStateStoreVersion as a derived class.
@@ -76,7 +77,7 @@ private StateStorePredicate buildPredicate() { | |||
} | |||
|
|||
@Override | |||
public List<DatasetStoreDataset> findDatasets() throws IOException { | |||
public List<Dataset> findDatasets() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar problem.
@yukuai518 refactored based on your comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
Collections.sort(versions, Collections.reverseOrder()); | ||
|
||
Collection<T> deletableVersions = this.getVersionSelectionPolicy().listSelectedVersions(versions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be the other way around? VersionSelectionPolicy
selects the versions to keep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be used as versions to delete as well, and MultiVersionCleanableDatasetBase
does it this way.
/** | ||
* {@link DatasetVersion} that has a {@link DatasetStateStoreEntryManager} | ||
*/ | ||
public interface DatasetStateStoreVersion extends DatasetVersion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows CleanableDatasetStoreDataset
to call getEntry()
+1 |
Closes apache#2097 from jack-moseley/state_store_cleaner
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Create wrappers for
DatasetStoreDataset
andDatasetStoreDatasetFinder
to make them CleanableTests
Created
CleanableDatasetStoreDatasetTest
Commits