ACCUMULO-3521: minor improvements to iterators #237
If there is no opposition to deprecating OrIterator, I will merge these changes.
I'm iffy on this. It seems like you're just deprecating it without replacement because it doesn't have any tests. Is this accurate?
Mostly... it's also not clear whether it actually functions correctly or not.
Sorry to be pedantic -- again, is this because of the lack of unit tests? Or did you try to use it and not have success? Backstory is that I have a lot of experience with this iterator and, while it's been years, I know it did work. I can't think of any changes that would have caused it to stop working -- this is why I'm asking.
I haven't tried to use it, mainly because it's very confusing and not documented. I do believe you that it worked at one point. But no one has ever touched the code (other than formatting), and without a good test it's likely to rot, if it hasn't already. Since you have experience with it, would you be able to write a test?
I have the ability and knowledge, just can't guarantee the time :) IMO, leaving this iterator out of the
Isn't that something that Deprecation does? Tells users to use at your own risk. Just having the class up in the "iterator" package as opposed to the "user" package doesn't portray this...
Verbatim from JDK8: "A program element annotated @Deprecated is one that programmers are discouraged from using, typically because it is dangerous, or because a better alternative exists. Compilers warn when a deprecated program element is used or overridden in non-deprecated code." Personally, I wouldn't call the OrIterator's possible code-rot/negligence "dangerous". IIRC, the {{o.a.a.c.i.user}} package was introduced to bridge the gap between "Iterators we expect users to pull off the shelf" and "Iterators which YMMV using" (a nod towards SKVI not being in public API). I think the OrIterator's lack of love is adequately advertised by not being in {{o.a.a.c.i.user}}.
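For readers weighing the deprecation argument above, here is a minimal sketch of what the annotation does in practice. `LegacyOrIterator` and `DeprecationDemo` are hypothetical stand-ins made up for illustration; they are not the Accumulo classes.

```java
// Hypothetical stand-in for an iterator class; NOT the real Accumulo OrIterator.
/**
 * @deprecated illustration only; no replacement is named here.
 */
@Deprecated
class LegacyOrIterator {
  String describe() {
    return "legacy";
  }
}

class DeprecationDemo {
  // @Deprecated has runtime retention, so it can also be detected reflectively.
  static boolean isDeprecated(Class<?> c) {
    return c.isAnnotationPresent(Deprecated.class);
  }

  public static void main(String[] args) {
    // Using the deprecated class still works; the annotation only discourages use.
    LegacyOrIterator it = new LegacyOrIterator();
    System.out.println(it.describe() + " deprecated=" + isDeprecated(LegacyOrIterator.class));
  }
}
```

The annotation changes nothing about runtime behavior. It only surfaces a warning to callers, which is why the thread debates whether it is the right signal for "untested, use at your own risk".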
I'm not a fan of using deprecation tags to signal "YMMV". Not being in the public API does that all by itself. Currently (for better or worse), no iterators are in the public API. Some are more risky than others, but I don't think we can use deprecation tags to meaningfully distinguish between continuous ranges of risk.
On the PR for adding close to iterators, we noticed the OrIterator seemed to be doing something incorrect. However, without knowing what the iterator is actually supposed to do, we were not sure :) In some cases in seek, it removes an iterator and then later adds it. Does anyone know what's going on here?
@keith-turner It removes sources that don't match the term or don't have any values, thus removing them from the TermSources. This seems reasonable when treating a source as one term among a list of iterable sources.
@phrocker do you think it should still call
That does look like a bug to me. When it is heap-ifying each column (termsource), it's saying that for the given However, most of the time, I would guess that the above is not actually triggered (they would get a new instance of the iterator that would re-construct the term sources). edit: in other words, I think this is meant to be an "optimization" to prevent re-seeking a TermSource that we already once knew didn't exist in the current
@keith-turner I agree it seems to break the SKVI contract by trying to make some optimization that likely isn't of great benefit.
@keith-turner I'll admit I'm still trying to wrap my head around the why. I'm trying to reverse engineer it to figure out if there's something I'm missing...
+1 I think that removal breaks the contract, but for the wikisearch use case with very large rows/shards, this bug wouldn't have been noticed. I don't know of anyone who has ever used the OrIterator/AndIterator for general-purpose table schemas. But again, a quick unit test would likely be the proof in the puddin'.
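The concern in this exchange can be illustrated with a small model. The sketch below is not the OrIterator code: it assumes integer keys instead of Accumulo `Key` objects, and `TermSource` and `OrMerge` here are hypothetical classes written for illustration. It shows an OR-style heap merge in which every `seek` rebuilds the heap from the full source list, so a source that had nothing in one range is still considered for the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of one term's sorted entries.
class TermSource implements Comparable<TermSource> {
  private final List<Integer> sorted; // keys for this term, in ascending order
  private int pos;

  TermSource(List<Integer> sorted) {
    this.sorted = sorted;
  }

  // Position at the first key >= start. A seek may move backwards.
  void seek(int start) {
    pos = 0;
    while (pos < sorted.size() && sorted.get(pos) < start) {
      pos++;
    }
  }

  boolean hasTop() {
    return pos < sorted.size();
  }

  int top() {
    return sorted.get(pos);
  }

  void next() {
    pos++;
  }

  @Override
  public int compareTo(TermSource other) {
    return Integer.compare(top(), other.top());
  }
}

class OrMerge {
  private final List<TermSource> all; // never shrinks across seeks
  private final PriorityQueue<TermSource> heap = new PriorityQueue<>();

  OrMerge(List<TermSource> sources) {
    this.all = new ArrayList<>(sources);
  }

  // Rebuild the heap from ALL sources on every seek; an exhausted source
  // is left out of the heap for this range only, never dropped for good.
  void seek(int start) {
    heap.clear();
    for (TermSource s : all) {
      s.seek(start);
      if (s.hasTop()) {
        heap.add(s);
      }
    }
  }

  boolean hasTop() {
    return !heap.isEmpty();
  }

  int top() {
    return heap.peek().top();
  }

  // Advance every source positioned at the current key so duplicates collapse.
  void next() {
    int current = top();
    while (!heap.isEmpty() && heap.peek().top() == current) {
      TermSource s = heap.poll();
      s.next();
      if (s.hasTop()) {
        heap.add(s);
      }
    }
  }

  // Convenience helper: seek, then drain the merged keys into a list.
  static List<Integer> scan(OrMerge merge, int start) {
    merge.seek(start);
    List<Integer> out = new ArrayList<>();
    while (merge.hasTop()) {
      out.add(merge.top());
      merge.next();
    }
    return out;
  }
}
```

If `seek` instead removed an empty source from `all`, a later seek to an earlier range would silently miss that source's entries, which is the kind of behavior the commenters suspect.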
There is also another version of OrIterator in the wikisearch repo: https://github.com/apache/accumulo-wikisearch/blob/master/query/src/main/java/org/apache/accumulo/examples/wikisearch/iterator/OrIterator.java
*/
@Test
public void testLossyOption() throws IOException, IllegalAccessException, InstantiationException {
Encoder<List<Long>> encoder = SummingArrayCombiner.VarLongArrayEncoder.class.newInstance();
Is there a reason you could not do new VarLongArrayEncoder()?
Whoops... weird copy-and-paste holdover. Fixed in 44262a3
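To make the reviewer's point concrete, here is a hedged comparison of the two ways to obtain an instance. `DemoEncoder` and `InstantiationDemo` are hypothetical names used only for illustration; `DemoEncoder` is not Accumulo's VarLongArrayEncoder.

```java
// Hypothetical encoder used only for illustration.
class DemoEncoder {
  String encode(long value) {
    return Long.toString(value);
  }
}

class InstantiationDemo {
  // Reflective creation: forces callers to declare or handle two checked
  // exceptions, and Class.newInstance() has been deprecated since Java 9.
  static DemoEncoder viaReflection() throws InstantiationException, IllegalAccessException {
    return DemoEncoder.class.newInstance();
  }

  // Direct construction: checked at compile time, no extra throws clauses.
  static DemoEncoder viaConstructor() {
    return new DemoEncoder();
  }
}
```

Both paths yield an equivalent object when the class is known at compile time, so the direct constructor also lets a test method drop `IllegalAccessException` and `InstantiationException` from its signature.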
@@ -53,30 +54,33 @@ private static long process(TreeMap<Key,Value> sourceMap, TreeMap<Key,Value> res
public void test() throws IOException {
TreeMap<Key,Value> sourceMap = new TreeMap<>();
Value emptyValue = new Value("".getBytes());
IteratorSetting iteratorSetting = new IteratorSetting(1, FirstEntryInRowIterator.class);
FirstEntryInRowIterator.setNumScansBeforeSeek(iteratorSetting, 10);
Will the test fail if this is not set properly? If not, you could call iteratorSetting.getOptions() after this and check that it was really set.
Good idea. Added in 44262a3
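The check suggested above can be sketched without the Accumulo classes. `DemoSetting` and `DemoIteratorConfig` below are hypothetical stand-ins for `IteratorSetting` and the iterator's static setter, and the option key is an assumed name, not confirmed from the source.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a settings object that stores string options.
class DemoSetting {
  private final Map<String,String> options = new HashMap<>();

  void addOption(String key, String value) {
    options.put(key, value);
  }

  // Read-only view, so tests can inspect options but not change them.
  Map<String,String> getOptions() {
    return Collections.unmodifiableMap(options);
  }
}

// Hypothetical stand-in for an iterator class with a static option setter.
class DemoIteratorConfig {
  static final String NUM_SCANS_OPTION = "scansBeforeSeek"; // assumed key name

  static void setNumScansBeforeSeek(DemoSetting setting, int num) {
    setting.addOption(NUM_SCANS_OPTION, Integer.toString(num));
  }
}
```

A test would call the setter and then read the value back through `getOptions()`, so a setter that silently did nothing would cause a failure instead of passing unnoticed.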
Created a separate PR for OrIterator: #238
Will merge today since it seems everyone is OK with the other changes.
👍
44262a3 to b30160b
modified: core/src/main/java/org/apache/accumulo/core/iterators/FirstEntryInRowIterator.java
modified: core/src/main/java/org/apache/accumulo/core/iterators/TypedValueCombiner.java
modified: core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java
modified: core/src/test/java/org/apache/accumulo/core/iterators/user/CombinerTest.java
b30160b to 34532dc
Updated iterators mentioned in ACCUMULO-3521, added tests to cover untested methods and deprecated OrIterator. Couldn't find IteratorUtil.getMaxPriority and .findIterator methods. StatsCombiner is in examples.