ArrayIndexOutOfBoundsException in Cascading #1794

Open
fwbrasil opened this issue Feb 9, 2018 · 7 comments
Comments

@fwbrasil
Contributor

fwbrasil commented Feb 9, 2018

One of our e2e tests fails when I try to use the develop branch:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
	at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:79)
	at cascading.tuple.TupleEntryChainIterator.next(TupleEntryChainIterator.java:32)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$getIterable$1$$anon$1.foreach(AsyncFlowDefRunner.scala:360)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$getIterable$1$$anon$1.map(AsyncFlowDefRunner.scala:360)
	at com.twitter.data_platform.e2e_testing.jobs.dal_keyval_source_summingbird.VerifyResultsExecutionApp$$anonfun$3.apply(VKVSTest.scala:104)
	at com.twitter.data_platform.e2e_testing.jobs.dal_keyval_source_summingbird.VerifyResultsExecutionApp$$anonfun$3.apply(VKVSTest.scala:102)
	at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)

Since Iterator.foreach checks hasNext before each call to next, it seems that
TupleEntryChainIterator enters a bad state where currentIterator points to an invalid position.
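
One way to confirm that the caller really follows the hasNext-then-next protocol is to wrap the iterator in a checker. The sketch below is a hypothetical debugging aid, not part of Cascading or Scalding; the names `ProtocolCheckingIterator` and `drain` are made up for this illustration. `drain` has the same shape as the loop a foreach runs.

```java
import java.util.Iterator;

// Hypothetical debugging aid, not part of Cascading or Scalding.
// Wraps an iterator and fails fast if next() is called without a
// preceding hasNext() call that returned true.
final class ProtocolCheckingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private boolean checked = false; // true once hasNext() has returned true

    ProtocolCheckingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        checked = delegate.hasNext();
        return checked;
    }

    @Override
    public T next() {
        if (!checked) {
            throw new IllegalStateException("next() called without a successful hasNext()");
        }
        checked = false; // require a fresh hasNext() before the next element
        return delegate.next();
    }

    // Same shape as a foreach loop: ask hasNext, then call next.
    static <T> int drain(Iterator<T> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }
}
```

If `drain` completes without an `IllegalStateException` from the wrapper and the wrapped iterator still throws, the caller used it correctly and the fault is inside the iterator or something it delegates to.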

I haven't been able to reproduce the cascading bug in isolation yet.

cc/ @johnynek

@johnynek
Collaborator

johnynek commented Feb 9, 2018

I wonder if the source you are dealing with has a bug with toIterator? We assume we can call that again and again, but maybe this source has an issue there?

@fwbrasil
Contributor Author

fwbrasil commented Feb 9, 2018

It seems to be a bug in cascading. TupleEntryChainIterator should never throw if used correctly (hasNext and then next), which is the case.

@johnynek
Collaborator

johnynek commented Feb 9, 2018

I wonder if it is exhibited in cascading 2.7?

@johnynek
Collaborator

johnynek commented Feb 9, 2018

also: why did we not trigger it before, but now we do?

@johnynek
Collaborator

I'd love to find a repro of this issue.

@fwbrasil
Contributor Author

fwbrasil commented Apr 16, 2018

I've investigated this issue a little more. The bug is not in TupleEntryChainIterator, but in the underlying iterator implementation, HadoopTupleEntrySchemeIterator. Its hasNext returns true initially, but a second call to hasNext returns false, even before next is called.
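
The failure mode described above can be sketched in isolation. Everything in this block is a simplified, hypothetical stand-in: `UnstableIterator` only mimics the reported hasNext behavior, and `SimpleChainIterator` is written for this illustration and is not the Cascading source. It assumes a chain iterator that calls hasNext again inside next to roll over to the next child.

```java
import java.util.Iterator;

// Hypothetical stand-in for an iterator whose hasNext() is not stable:
// the first call returns true, every later call returns false.
final class UnstableIterator implements Iterator<String> {
    private int hasNextCalls = 0;

    @Override
    public boolean hasNext() {
        hasNextCalls++;
        return hasNextCalls == 1;
    }

    @Override
    public String next() {
        return "value";
    }
}

// Simplified chain iterator for illustration only (NOT the Cascading code).
final class SimpleChainIterator implements Iterator<String> {
    private final Iterator<String>[] children;
    private int current = 0; // index of the child currently being read

    @SafeVarargs
    SimpleChainIterator(Iterator<String>... children) {
        this.children = children;
    }

    @Override
    public boolean hasNext() {
        // Skip exhausted children until one reports more data.
        while (current < children.length) {
            if (children[current].hasNext()) {
                return true;
            }
            current++;
        }
        return false;
    }

    @Override
    public String next() {
        // Re-check to roll over to the next child. This trusts that the
        // answer the caller got from hasNext() a moment ago still holds.
        hasNext();
        // If the child changed its answer, current has run past the end.
        return children[current].next();
    }
}
```

With a single `UnstableIterator` as the only child, the caller's hasNext returns true, the re-check inside next sees false and advances `current` to 1, and the array access fails with an `ArrayIndexOutOfBoundsException` for index 1, even though the caller followed the hasNext-then-next protocol.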

@johnynek
Collaborator

@fwbrasil is this a race condition in Hadoop? We have seen a few issues that look like that.
