Skip to content

Deprecate GC_TRASH_IGNORE, add tests to confirm Trash works#3436

Merged
dlmarion merged 4 commits into
apache:2.1from
dlmarion:deprecate-gc-trash-ignore
Jun 1, 2023
Merged

Deprecate GC_TRASH_IGNORE, add tests to confirm Trash works#3436
dlmarion merged 4 commits into
apache:2.1from
dlmarion:deprecate-gc-trash-ignore

Conversation

@dlmarion
Copy link
Copy Markdown
Contributor

This commit:

  1. Deprecated GC_TRASH_IGNORE
  2. Modifies MiniAccumuloCluster so that the user can provide Hadoop properties for the Hadoop configuration that is used by the MiniDFS cluster.
  3. Adds ITs that confirm:
    a. Deleted files do not get put into the Trash in the default configuration (Accumulo enabled / Hadoop disabled)
    because the default value for fs.trash.interval is zero, which means disabled.
    b. Deleted files do get put into the Trash when the default Accumulo configuration is used (GC_TRASH_IGNORE = false) and fs.trash.interval is set to a nonzero value.
    c. Deleted files do not get put into the Trash when GC_TRASH_IGNORE is set to true and fs.trash.interval is set to a nonzero value.
    d. Deleted files do not get put into the Trash when Accumulo and Hadoop are enabled, but a custom Trash policy is used that filters file names.

This commit:
  1. Deprecated GC_TRASH_IGNORE
  2. Modifies MiniAccumuloCluster so that the user can provide
     Hadoop properties for the Hadoop configuration that is used
     by the MiniDFS cluster.
  3. Adds ITs that confirm:
    a. Deleted files do not get put into the Trash in the default
       configuration (Accumulo enabled / Hadoop disabled) because
       the default value for fs.trash.interval is zero, which
       means disabled.
    b. Deleted files do get put into the Trash when the default
       Accumulo configuration is used (GC_TRASH_IGNORE = false)
       and fs.trash.interval is set to a nonzero value.
    c. Deleted files do not get put into the Trash when
       GC_TRASH_IGNORE is set to true and fs.trash.interval is
       set to a nonzero value.
    d. Deleted files do not get put into the Trash when Accumulo
       and Hadoop are enabled, but a custom Trash policy is used
       that filters file names.
@dlmarion
Copy link
Copy Markdown
Contributor Author

Kicked off full IT build

@dlmarion
Copy link
Copy Markdown
Contributor Author

Full IT build successful

Copy link
Copy Markdown
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes suggested to add comments and expand the log messages with some helpful details. I was in a rush and didn't really proofread them, so they can probably be worded better than my attempt. Mainly, I just think it's helpful to have a brief description, rather than rely on the name alone to try to figure out the differences between the test cases in the separate classes.

It does strike me that most of the ITs are basically just testing Hadoop's own behavior, rather than our own. Once our prop is removed, these tests can be simplified. I'm not sure we need to test Hadoop behavior so comprehensively in Accumulo, but for now it makes sense, because of the testing of the interaction with our config prop.

Comment thread server/gc/src/test/java/org/apache/accumulo/gc/SimpleGarbageCollectorTest.java Outdated
Comment thread server/gc/src/test/java/org/apache/accumulo/gc/SimpleGarbageCollectorTest.java Outdated
Comment thread server/gc/src/test/java/org/apache/accumulo/gc/SimpleGarbageCollectorTest.java Outdated
Comment thread server/gc/src/main/java/org/apache/accumulo/gc/GCRun.java Outdated
Comment thread server/gc/src/main/java/org/apache/accumulo/gc/GCRun.java Outdated
@EdColeman
Copy link
Copy Markdown
Contributor

An argument to test the hadoop functionality - at least to some extent is to validate the the function continues to work as expected in future releases - if the "hadoop" functionality changed (or broke) we may catch that in out automated test runs and then we would be in a position to take some action - maybe we need to do something different? Submit a patch to hadoop? Or at a minimum, warn users that the function may not work as expected after x.y.z.

I'm not sure how extensively we are checking the behavior here, but some validation might be valuable.

@dlmarion dlmarion merged commit d4a5559 into apache:2.1 Jun 1, 2023
@dlmarion dlmarion deleted the deprecate-gc-trash-ignore branch June 1, 2023 11:22
@ctubbsii
Copy link
Copy Markdown
Member

ctubbsii commented Jun 7, 2023

An argument to test the hadoop functionality - at least to some extent is to validate the the function continues to work as expected in future releases - if the "hadoop" functionality changed (or broke) we may catch that in out automated test runs and then we would be in a position to take some action - maybe we need to do something different? Submit a patch to hadoop? Or at a minimum, warn users that the function may not work as expected after x.y.z.

I'm not sure how extensively we are checking the behavior here, but some validation might be valuable.

My argument against, as a general rule, is that regression testing for Hadoop features should be contributed to Hadoop. We cannot be the landing zone for testing all the Hadoop features we think Hadoop is not testing adequately. I suggest contributing those tests to Hadoop to make Hadoop better, and watching its build statuses/change history for the features we care about, and try to focus our tests on testing our code.

I'm definitely not arguing against testing... just advocating for contributing them to the proper venue. We shouldn't treat Accumulo as a personal playground for testing other libraries we care about. We should play more nicely with those other open source projects, and contribute tests to them, if we think they need that testing.

That said, everything is on a case-by-case basis, and I'm not sure exactly what, if anything, should change here once the property is dropped, with respect to this testing, until I look at it, which I will do shortly. But I think it's important to stake out a general principle, so we don't suffer from scope creep.

EDIT: after looking at this specific case, I kept all the tests (except the one that checked that we bypassed the trash, since we can't bypass the trash after the property is dropped). The reason I kept them in this specific case is that I think they are also integration tests to ensure we're calling moveToTrash in the first place. These ITs could be replaced with a mocked UT, though, just to make sure we're calling fs.moveToTrash, and we can defer to Hadoop to make sure that fs.moveToTrash actually works. I will pass on making these mocked UTs for now, though.

ctubbsii added a commit to ctubbsii/accumulo that referenced this pull request Jun 7, 2023
* Remove references to deprecated compaction strategies that don't exist
* Relocate unfortunately named `ConfigurableCompactionStrategy`, which
  existed solely to support the shell for user-initiated compactions,
  into two separate classes in accumulo-core.jar, as
  `ShellCompactCommandConfigurer` and `ShellCompactCommandSelector`
  1. Split into two classes to make it clear how they map to the SPI
     interfaces and to avoid confusing overloaded init methods with
     two different InitParameters classes that have the same simple name
  2. Put in accumulo-core.jar so it's on the classpath for servers
     that do compaction and the shell, so we can get the class name
     without using a String literal, which is fragile
  3. Avoid use of the "CompactionStrategy" naming, because that is the
     name of the deprecated, and now removed, interface
  4. Update corresponding tests
* Remove check for deprecated accumulo-site.xml, which was replaced in
  2.1. 3.x requires upgrade from at least 2.1, so that either will have
  already been updated in 2.1, or it will never have existed in 3.x.
  Also remove related config converter utilities.
* Remove randomizeVolumes from the admin command, which was previously
  made a noop and deprecated
* Remove unused accumulo-site.xml test resource
* Remove deprecated FateCommand from the shell
* Remove noop debug option from shell command-line that did nothing
  because its functionality has been replaced by the ability to
  dynamically configure log4j2 logging in 2.x
* Remove deprecated gc.trash.ignore (`GC_TRASH_IGNORE`) property, as
  follow-on to apache#3436; this includes removing a test that's no longer
  possible to succeed, since it's no longer possible to configure
  Accumulo to ignore Hadoop's trash settings; some comments were cleaned
  up in the remaining tests
ctubbsii added a commit that referenced this pull request Jun 7, 2023
* Remove references to deprecated compaction strategies that don't exist
* Relocate unfortunately named `ConfigurableCompactionStrategy`, which
  existed solely to support the shell for user-initiated compactions,
  into two separate classes in accumulo-core.jar, as
  `ShellCompactCommandConfigurer` and `ShellCompactCommandSelector`
  1. Split into two classes to make it clear how they map to the SPI
     interfaces and to avoid confusing overloaded init methods with
     two different InitParameters classes that have the same simple name
  2. Put in accumulo-core.jar so it's on the classpath for servers
     that do compaction and the shell, so we can get the class name
     without using a String literal, which is fragile
  3. Avoid use of the "CompactionStrategy" naming, because that is the
     name of the deprecated, and now removed, interface
  4. Update corresponding tests
* Remove check for deprecated accumulo-site.xml, which was replaced in
  2.1. 3.x requires upgrade from at least 2.1, so that either will have
  already been updated in 2.1, or it will never have existed in 3.x.
  Also remove related config converter utilities.
* Remove randomizeVolumes from the admin command, which was previously
  made a noop and deprecated
* Remove unused accumulo-site.xml test resource
* Remove deprecated FateCommand from the shell
* Remove noop debug option from shell command-line that did nothing
  because its functionality has been replaced by the ability to
  dynamically configure log4j2 logging in 2.x
* Remove deprecated gc.trash.ignore (`GC_TRASH_IGNORE`) property, as
  follow-on to #3436; this includes removing a test that's no longer
  possible to succeed, since it's no longer possible to configure
  Accumulo to ignore Hadoop's trash settings; some comments were cleaned
  up in the remaining tests
@ctubbsii ctubbsii added this to the 2.1.1 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants