Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACCUMULO-3521: minor improvements to iterators #237

Merged
merged 1 commit into from Apr 4, 2017

Conversation

milleruntime
Copy link
Contributor

Updated iterators mentioned in ACCUMULO-3521, added tests to cover untested methods and deprecated OrIterator. Couldn't find IteratorUtil.getMaxPriority and .findIterator methods. StatsCombiner is in examples.

@milleruntime
Copy link
Contributor Author

If there is no opposition to deprecating OrIterator, I will merge these changes.

@joshelser
Copy link
Member

If there is no opposition to deprecating OrIterator

I'm iffy on this. It seems like you're just deprecating it without replacement because it doesn't have any tests. Is this accurate?

@milleruntime
Copy link
Contributor Author

Mostly... its also not clear whether it actually functions correctly or not.

@joshelser
Copy link
Member

Mostly... its also not clear whether it actually functions correctly or not

Sorry to be pedantic -- again, is this because of the lack of unit tests? Or, did you try to use it and didn't have success?

Backstory is that I have a lot experience with this iterator and, while it's been years, I know it did work. I can't think of any changes that would have caused it to not work -- this is why I'm asking.

@milleruntime
Copy link
Contributor Author

I haven't tried to use it, mainly is because its very confusing and not documented. I do believe you that it worked at one point. But no one has touched the code ever (other than formatting) and without a good test, its likely to rot if it hasn't already. Since you have experience with it, would you be able to write a test?

@joshelser
Copy link
Member

Since you have experience with it, would you be able to write a test?

I have the ability and knowledge, just can't guarantee the time :)

IMO, leaving this iterator out of the user package was intended to note that it's not great for users to directly consume. I don't see any value added to slapping a "Deprecated" on it too.

@milleruntime
Copy link
Contributor Author

MO, leaving this iterator out of the user package was intended to note that it's not great for users to directly consume. I don't see any value added to slapping a "Deprecated" on it too.

Isn't that something that Deprecation does? Tells users to use at your own risk. Just having the class up in the "iterator" package as opposed to the "user" package doesn't portray this...

@joshelser
Copy link
Member

Isn't that something that Deprecation does? Tells users to use at your own risk

Verbatim from JDK8

"A program element annotated @deprecated is one that programmers are discouraged from using, typically because it is dangerous, or because a better alternative exists. Compilers warn when a deprecated program element is used or overridden in non-deprecated code."

Personally, I wouldn't call the OrIterator's possible code-rot/negligence, "dangerous". IIRC, the {{o.a.a.c.i.user}} package was introduced to bridge the gap between "Iterators we expect users to pull off the shelf" and "Iterators which YMMV using" (a nod towards SKVI not being in public API). I think the OrIterator's lack of love is adequately advertised by not being in {{o.a.a.c.i.user}}.

@ctubbsii
Copy link
Member

ctubbsii commented Apr 3, 2017

Isn't that something that Deprecation does?

I'm not a fan of using deprecation tags to signal "YMMV". Not being in the public API does that all by itself. Currently, (for better or worse) no iterators are in the public API. Some are more risky than others, but I don't think we can use deprecation tags to meaningfully distinguish between continuous ranges of risk.

@keith-turner
Copy link
Contributor

On the PR for adding close to iterators we noticed the OrIterator seemed to be doing something incorrect. However, without knowing what the iterators is supposed to actually do, we were not sure :)

Under some case in seek it removes an iterator and then later adds it. Does anyone know whats going on here?

https://github.com/milleruntime/accumulo/blob/4366d58e60b641f4ace075d7812fc947e2e18910/core/src/main/java/org/apache/accumulo/core/iterators/OrIterator.java#L225

@phrocker
Copy link
Contributor

phrocker commented Apr 3, 2017

@keith-turner it remove sources that don't match the term or don't have any values. Thus removing it from the TermSources. This seems reasonable based on dealing with a source as a term amongst a list of iterable sources.

@keith-turner
Copy link
Contributor

@phrocker do you think it should still call sorted.add(TS) when it removes TS from the iterator? The comments make it sound like maybe it shouldn't, but I am not sure.

@joshelser
Copy link
Member

joshelser commented Apr 3, 2017

should still call sorted.add(TS) when it removes TS from the iterator?

That does look like a bug to me. When it is heap-ifying each column (termsource), it's saying that for the given Range it was seek'ed to, there are no results. However, there is no reason that this instance of the OrIterator couldn't be seek'ed to a new Range without being torn-down.

However, most of the time, I would guess that the above is not actually triggered (they would get a new instance of the iterator that would re-construct the term sources).

edit: in other words, I think this is meant to be an "optimization" to prevent re-seeking a TermSource that we already once knew didn't exist in the current Range (during the teardown-reseek case)... if that actually clarifies things (lol).

@phrocker
Copy link
Contributor

phrocker commented Apr 3, 2017

@keith-turner I agree it seems to break the SKVI contract by trying to make some optimization that likely isn't of great benefit.

@phrocker
Copy link
Contributor

phrocker commented Apr 3, 2017

@keith-turner I'll admit I'm still trying to wrap my head around the why. I'm trying to reverse engineer it to figure out if there's something I'm missing...

@joshelser
Copy link
Member

I agree it seems to break the SKVI contract

+1

I think that removal breaks the contract, but, for the wikisearch use-case with very large rows/shards, this bug wouldn't have been noticed. I'm not sure of anyone who has ever used the OrIterator/AndIterator for general purpose table schemas. But again, a quick unit test would likely be the proof in the puddin'.

@milleruntime
Copy link
Contributor Author

*/
@Test
public void testLossyOption() throws IOException, IllegalAccessException, InstantiationException {
Encoder<List<Long>> encoder = SummingArrayCombiner.VarLongArrayEncoder.class.newInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason could not do new VarLongArrayEncoder()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops... weird copy and paste hold over. Fixed in 44262a3

@@ -53,30 +54,33 @@ private static long process(TreeMap<Key,Value> sourceMap, TreeMap<Key,Value> res
public void test() throws IOException {
TreeMap<Key,Value> sourceMap = new TreeMap<>();
Value emptyValue = new Value("".getBytes());
IteratorSetting iteratorSetting = new IteratorSetting(1, FirstEntryInRowIterator.class);
FirstEntryInRowIterator.setNumScansBeforeSeek(iteratorSetting, 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the test fail if this is not set to properly? If not, could call iteratorSetting.getOptions() after this and check if it was really set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Added in 44262a3

@milleruntime
Copy link
Contributor Author

Created a separate PR for OrIterator: #238

@milleruntime
Copy link
Contributor Author

Will merge today since it seems everyone is OK with the other changes.

@joshelser
Copy link
Member

👍

	modified:   core/src/main/java/org/apache/accumulo/core/iterators/FirstEntryInRowIterator.java
	modified:   core/src/main/java/org/apache/accumulo/core/iterators/TypedValueCombiner.java
	modified:   core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java
	modified:   core/src/test/java/org/apache/accumulo/core/iterators/user/CombinerTest.java
@asfgit asfgit merged commit 34532dc into apache:master Apr 4, 2017
@milleruntime milleruntime deleted the ACCUMULO-3521 branch May 19, 2017 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants