Deprecate GC_TRASH_IGNORE, add tests to confirm Trash works#3436
Conversation
This commit:
1. Deprecated GC_TRASH_IGNORE
2. Modifies MiniAccumuloCluster so that the user can provide
Hadoop properties for the Hadoop configuration that is used
by the MiniDFS cluster.
3. Adds ITs that confirm:
a. Deleted files do not get put into the Trash in the default
configuration (Accumulo enabled / Hadoop disabled) because
the default value for fs.trash.interval is zero, which
means disabled.
b. Deleted files do get put into the Trash when the default
Accumulo configuration is used (GC_TRASH_IGNORE = false)
and fs.trash.interval is set to a nonzero value.
c. Deleted files do not get put into the Trash when
GC_TRASH_IGNORE is set to true and fs.trash.interval is
set to a nonzero value.
d. Deleted files do not get put into the Trash when Accumulo
and Hadoop are enabled, but a custom Trash policy is used
that filters file names.
|
Kicked off full IT build |
|
Full IT build successful |
ctubbsii
left a comment
There was a problem hiding this comment.
Minor changes suggested to add comments and expand the log messages with some helpful details. I was in a rush and didn't really proofread them, so they can probably be worded better than my attempt. Mainly, I just think it's helpful to have a brief description, rather than rely on the name alone to try to figure out the differences between the test cases in the separate classes.
It does strike me that most of the ITs are basically just testing Hadoop's own behavior, rather than our own. Once our prop is removed, these tests can be simplified. I'm not sure we need to test Hadoop behavior so comprehensively in Accumulo, but for now it makes sense, because of the testing of the interaction with our config prop.
|
An argument to test the hadoop functionality - at least to some extent is to validate the the function continues to work as expected in future releases - if the "hadoop" functionality changed (or broke) we may catch that in out automated test runs and then we would be in a position to take some action - maybe we need to do something different? Submit a patch to hadoop? Or at a minimum, warn users that the function may not work as expected after x.y.z. I'm not sure how extensively we are checking the behavior here, but some validation might be valuable. |
My argument against, as a general rule, is that regression testing for Hadoop features should be contributed to Hadoop. We cannot be the landing zone for testing all the Hadoop features we think Hadoop is not testing adequately. I suggest contributing those tests to Hadoop to make Hadoop better, and watching its build statuses/change history for the features we care about, and try to focus our tests on testing our code. I'm definitely not arguing against testing... just advocating for contributing them to the proper venue. We shouldn't treat Accumulo as a personal playground for testing other libraries we care about. We should play more nicely with those other open source projects, and contribute tests to them, if we think they need that testing. That said, everything is on a case-by-case basis, and I'm not sure exactly what, if anything, should change here once the property is dropped, with respect to this testing, until I look at it, which I will do shortly. But I think it's important to stake out a general principle, so we don't suffer from scope creep. EDIT: after looking at this specific case, I kept all the tests (except the one that checked that we bypassed the trash, since we can't bypass the trash after the property is dropped). The reason I kept them in this specific case is that I think they are also integration tests to ensure we're calling moveToTrash in the first place. These ITs could be replaced with a mocked UT, though, just to make sure we're calling fs.moveToTrash, and we can defer to Hadoop to make sure that fs.moveToTrash actually works. I will pass on making these mocked UTs for now, though. |
* Remove references to deprecated compaction strategies that don't exist
* Relocate unfortunately named `ConfigurableCompactionStrategy`, which
existed solely to support the shell for user-initiated compactions,
into two separate classes in accumulo-core.jar, as
`ShellCompactCommandConfigurer` and `ShellCompactCommandSelector`
1. Split into two classes to make it clear how they map to the SPI
interfaces and to avoid confusing overloaded init methods with
two different InitParameters classes that have the same simple name
2. Put in accumulo-core.jar so it's on the classpath for servers
that do compaction and the shell, so we can get the class name
without using a String literal, which is fragile
3. Avoid use of the "CompactionStrategy" naming, because that is the
name of the deprecated, and now removed, interface
4. Update corresponding tests
* Remove check for deprecated accumulo-site.xml, which was replaced in
2.1. 3.x requires upgrade from at least 2.1, so that either will have
already been updated in 2.1, or it will never have existed in 3.x.
Also remove related config converter utilities.
* Remove randomizeVolumes from the admin command, which was previously
made a noop and deprecated
* Remove unused accumulo-site.xml test resource
* Remove deprecated FateCommand from the shell
* Remove noop debug option from shell command-line that did nothing
because its functionality has been replaced by the ability to
dynamically configure log4j2 logging in 2.x
* Remove deprecated gc.trash.ignore (`GC_TRASH_IGNORE`) property, as
follow-on to apache#3436; this includes removing a test that's no longer
possible to succeed, since it's no longer possible to configure
Accumulo to ignore Hadoop's trash settings; some comments were cleaned
up in the remaining tests
* Remove references to deprecated compaction strategies that don't exist
* Relocate unfortunately named `ConfigurableCompactionStrategy`, which
existed solely to support the shell for user-initiated compactions,
into two separate classes in accumulo-core.jar, as
`ShellCompactCommandConfigurer` and `ShellCompactCommandSelector`
1. Split into two classes to make it clear how they map to the SPI
interfaces and to avoid confusing overloaded init methods with
two different InitParameters classes that have the same simple name
2. Put in accumulo-core.jar so it's on the classpath for servers
that do compaction and the shell, so we can get the class name
without using a String literal, which is fragile
3. Avoid use of the "CompactionStrategy" naming, because that is the
name of the deprecated, and now removed, interface
4. Update corresponding tests
* Remove check for deprecated accumulo-site.xml, which was replaced in
2.1. 3.x requires upgrade from at least 2.1, so that either will have
already been updated in 2.1, or it will never have existed in 3.x.
Also remove related config converter utilities.
* Remove randomizeVolumes from the admin command, which was previously
made a noop and deprecated
* Remove unused accumulo-site.xml test resource
* Remove deprecated FateCommand from the shell
* Remove noop debug option from shell command-line that did nothing
because its functionality has been replaced by the ability to
dynamically configure log4j2 logging in 2.x
* Remove deprecated gc.trash.ignore (`GC_TRASH_IGNORE`) property, as
follow-on to #3436; this includes removing a test that's no longer
possible to succeed, since it's no longer possible to configure
Accumulo to ignore Hadoop's trash settings; some comments were cleaned
up in the remaining tests
This commit:
a. Deleted files do not get put into the Trash in the default configuration (Accumulo enabled / Hadoop disabled)
because the default value for fs.trash.interval is zero, which means disabled.
b. Deleted files do get put into the Trash when the default Accumulo configuration is used (GC_TRASH_IGNORE = false) and fs.trash.interval is set to a nonzero value.
c. Deleted files do not get put into the Trash when GC_TRASH_IGNORE is set to true and fs.trash.interval is set to a nonzero value.
d. Deleted files do not get put into the Trash when Accumulo and Hadoop are enabled, but a custom Trash policy is used that filters file names.