Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-200] Add cleanable state store dataset #2097

Closed
wants to merge 5 commits into from

Conversation

jack-moseley
Copy link
Contributor

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Create wrappers for DatasetStoreDataset and DatasetStoreDatasetFinder to make them Cleanable

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Created CleanableDatasetStoreDatasetTest

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@jack-moseley
Copy link
Contributor Author

@ibuenros please review

return this.version.equals(otherAsDateTime.version) ? 0 : this.version.compareTo(otherAsDateTime.version);
}

@Override
Copy link
Contributor

@yukuai518 yukuai518 Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this equal() if the behavior is the same as parent class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this triggers findbugs warning, which recommends explicitly overriding even if behaviour is the same.

* A cleanable {@link DatasetStoreDataset}
*/
@Data
public class CleanableDatasetStoreDataset extends ModificationTimeDataset {
Copy link
Contributor

@yukuai518 yukuai518 Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't want to assume CleanableDatasetStoreDataset always use time based strategy. . You probably should rename it to ModificationTimeDatasoreDataset if only modification time is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the ModificationTimeDataset contains many FileSystem specific constructs which doesn't represent all the datasets of DatasetStore type, which could be many other types like mysql DB, FS, or other implementations.

I would recommend to keep your first commit, and create your own interface like CleanableDatasetStore, which is similar to CleanableDataset but excluding the inheritance from FileSystemDataset.

Then your CleanableDatasetStoreDataset can implement CleanableDatasetStore ::clean(), which is similar to MultiVersionCleanableDatasetBase::clean (), but can be simplified that VersionFinder is always DatasetStateStoreVersion type. The DatasetStateStoreVersion is an abstract class which takes DatasetStateStoreEntryManager as a constructor argument and serve as a base class of all the version types related to dataset state store

@@ -39,7 +40,7 @@
/**
* A {@link DatasetsFinder} to find {@link DatasetStoreDataset}s.
*/
public class DatasetStoreDatasetFinder implements DatasetsFinder<DatasetStoreDataset> {
public class DatasetStoreDatasetFinder implements DatasetsFinder<Dataset> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's better to limit the type to DatasetStoreDataset here. However this will cause some incompatibility if you want to reuse the existing ModificationTimeDataset. That's why I don't like this class to be derived from ModificationTimeDataset. Please check with Issac to see if this type restriction can be removed. Otherwise you can define your own CleanableDatasetStoreDataset like I mentioned earlier which should always be DatasetStoreDataset type.

@Override
protected void cleanImpl(Collection deletableVersions) throws IOException {
for (Object version : deletableVersions) {
((TimestampedDatasetStateStoreVersion) version).getEntry().delete();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic can be generic (apply to all datastore datasets), so we can avoid the downcast type conversion.

* A cleanable {@link DatasetStoreDataset}
*/
@Data
public class CleanableDatasetStoreDataset extends ModificationTimeDataset {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the ModificationTimeDataset contains many FileSystem specific constructs which doesn't represent all the datasets of DatasetStore type, which could be many other types like mysql DB, FS, or other implementations.

I would recommend to keep your first commit, and create your own interface like CleanableDatasetStore, which is similar to CleanableDataset but excluding the inheritance from FileSystemDataset.

Then your CleanableDatasetStoreDataset can implement CleanableDatasetStore ::clean(), which is similar to MultiVersionCleanableDatasetBase::clean (), but can be simplified that VersionFinder is always DatasetStateStoreVersion type. The DatasetStateStoreVersion is an abstract class which takes DatasetStateStoreEntryManager as a constructor argument and serve as a base class of all the version types related to dataset state store

* {@link TimestampedDatasetVersion} that has a {@link DatasetStateStoreEntryManager}
*/
@Getter
public class TimestampedDatasetStateStoreVersion extends TimestampedDatasetVersion {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimestampedDatasetVersion is FileSystem typed. Consider to have a base class and make TimestampedDatasetStateStoreVersion as a derived class.

@@ -76,7 +77,7 @@ private StateStorePredicate buildPredicate() {
}

@Override
public List<DatasetStoreDataset> findDatasets() throws IOException {
public List<Dataset> findDatasets() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar problem.

@jack-moseley
Copy link
Contributor Author

@yukuai518 refactored based on your comments:

  • Removed CleanableDataset dependency on FileSystem
  • Changed CleanableDatasetStoreDataset to inherit directly from CleanableDataset
  • Changed CleanableDatasetStoreDataset to be an abstract base class where subclasses can choose which VersionFinder/Policy to use
  • Added DatasetStateStoreVersion

Copy link
Contributor

@yukuai518 yukuai518 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


Collections.sort(versions, Collections.reverseOrder());

Collection<T> deletableVersions = this.getVersionSelectionPolicy().listSelectedVersions(versions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be the other way around? VersionSelectionPolicy selects the versions to keep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be used as versions to delete as well, and MultiVersionCleanableDatasetBase does it this way.

/**
* {@link DatasetVersion} that has a {@link DatasetStateStoreEntryManager}
*/
public interface DatasetStateStoreVersion extends DatasetVersion {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows CleanableDatasetStoreDataset to call getEntry()

@ibuenros
Copy link
Contributor

+1

@asfgit asfgit closed this in 70cbe91 Sep 29, 2017
zxliucmu pushed a commit to zxliucmu/incubator-gobblin that referenced this pull request Nov 16, 2017
Closes apache#2097 from jack-moseley/state_store_cleaner
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants