shufflejoin and ArrayIndexOutOfBoundsException #1436

Closed
antonkulaga opened this issue Mar 12, 2017 · 7 comments

Comments

@antonkulaga (Contributor) commented Mar 12, 2017

I am trying to reproduce bedtools operations with ADAM, and I often get annoying java.lang.ArrayIndexOutOfBoundsException errors from shuffleJoin on this data (http://quinlanlab.org/tutorials/bedtools/bedtools.html). In particular, I am running a shuffle join on cpg.bed and exons.bed:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 111, localhost): java.lang.ArrayIndexOutOfBoundsException: 1484
@antonkulaga changed the title from "org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 111, localhost): java.lang.ArrayIndexOutOfBoundsException: 1484" to "shufflejoin and ArrayIndexOutOfBoundsException" on Mar 12, 2017
@fnothaft (Member) commented Mar 12, 2017

OOC, have you used the new implementation in #1324? It is a really clean rewrite of the ShuffleRegionJoin core. CC @devin-petersohn

@antonkulaga (Contributor, Author) commented Mar 12, 2017

@fnothaft I am using the latest stable ADAM release. There, broadcastJoining freezes (even though the BED files are very small, as you can see), while shuffleJoin gives this annoying java.lang.ArrayIndexOutOfBoundsException.

@antonkulaga (Contributor, Author) commented Mar 12, 2017

Hm, I think spark-notebook is to blame for the broadcastJoining freeze; after a reload it was OK.
By the way, in which situations do you recommend shuffleJoining, and in which broadcastJoining?

@devin-petersohn (Member) commented Mar 14, 2017

Hi @antonkulaga. I am trying to reproduce the ArrayIndexOutOfBoundsException, but I am having a hard time doing so with the files you linked to. I tried every type of shuffleRegionJoin on both datasets and had no issues. Were you doing something specific when this happened?

@antonkulaga (Contributor, Author) commented Mar 16, 2017

@devin-petersohn I have not tried your PR; maybe everything works fine there.

@heuermh (Member) commented Mar 21, 2017

@antonkulaga Can this issue be closed?

@devin-petersohn (Member) commented Jun 5, 2017

Resolved with #1324

@heuermh modified the milestone: 0.23.0 on Jul 22, 2017
@heuermh added this to Completed in Release 0.23.0 on Jan 4, 2018