PartitionAndJoin should throw an exception if it sees an unmapped read #297

Closed
kozanitis opened this Issue Jul 9, 2014 · 9 comments

5 participants
@kozanitis
Contributor

kozanitis commented Jul 9, 2014

PartitionAndJoin crashes with a null pointer exception when I call it to join a set of 4 mouse-chrM coordinates with a small mouse file.

You can find my mouse.bam, mouse.adam, the coordinate test.txt file and my source code here:

https://github.com/kozanitis/misc

This is the exception stack:
2014-07-09 09:39:57 WARN TaskSetManager:70 - Loss was due to java.lang.NullPointerException
java.lang.NullPointerException
at org.bdgenomics.adam.rdd.NonoverlappingRegions.hasRegionsFor(RegionJoin.scala:311)
at org.bdgenomics.adam.rdd.MultiContigNonoverlappingRegions.filter(RegionJoin.scala:365)
at org.bdgenomics.adam.rdd.RegionJoin$$anonfun$5.apply(RegionJoin.scala:125)
at org.bdgenomics.adam.rdd.RegionJoin$$anonfun$5.apply(RegionJoin.scala:125)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

@kozanitis

Contributor

kozanitis commented Jul 9, 2014

I resolved the issue by filtering out the unmapped ADAM records before calling PartitionAndJoin. Perhaps a more descriptive exception would help?
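
A minimal sketch of that workaround in Scala, for illustration only. The `Read` case class and the `mappedOnly` helper are hypothetical stand-ins, not ADAM's record class or API; the sketch assumes an unmapped read is simply one with no contig.

```scala
// Hypothetical stand-in for an ADAM read record; NOT the real ADAM class.
// Assumption: an unmapped read has no contig (modelled here as None).
final case class Read(name: String, contig: Option[String], start: Long, stop: Long) {
  def isMapped: Boolean = contig.isDefined
}

object FilterUnmapped {
  // Keep only mapped reads so a region join never sees a missing contig.
  def mappedOnly(reads: Seq[Read]): Seq[Read] = reads.filter(_.isMapped)

  def main(args: Array[String]): Unit = {
    val reads = Seq(
      Read("r1", Some("chrM"), 100L, 150L),
      Read("r2", None, 0L, 0L), // unmapped
      Read("r3", Some("chrM"), 400L, 450L)
    )
    // Print the names of the reads that survive the filter.
    mappedOnly(reads).foreach(r => println(r.name))
  }
}
```

On a Spark RDD the same predicate could be passed to `rdd.filter(...)` before the join is called.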

@kozanitis kozanitis closed this Jul 9, 2014

@carlyeks

Member

carlyeks commented Jul 10, 2014

Good point, we should detect unmapped reads and throw a descriptive exception in that case. I'm going to leave this open until we fix that.

@carlyeks carlyeks reopened this Jul 10, 2014

@carlyeks carlyeks self-assigned this Jul 10, 2014

@tdanford tdanford changed the title from PartitionAndJoin exceptions to PartitionAndJoin should throw an exception if it sees an unmapped read Oct 14, 2014

@tdanford tdanford assigned tdanford and unassigned carlyeks Oct 14, 2014

@tdanford

Contributor

tdanford commented Oct 15, 2014

So the question here is: should it throw an exception (which, technically, it already does: it throws an uninterpretable NPE), or should it simply filter out the unmapped reads before attempting the join?

I'm inclined to think that the latter is a better option.

@kozanitis

Contributor

kozanitis commented Oct 15, 2014

@tdanford your second option definitely sounds more user-friendly, but my vote goes to the first approach, as it looks cleaner from a design point of view. @fnothaft, @massie, what do you guys think?

@fnothaft

Member

fnothaft commented Oct 15, 2014

I prefer #1; the user should check the validity of their data before doing the join.

@massie

Member

massie commented Oct 15, 2014

I like the fast-fail exception, but we should ensure that it's easy for a user to understand the error, e.g. "PartitionAndJoin on RDD with unmapped reads is not supported" using a NotSupportedException or something similar.

-Matt
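
A rough Scala sketch of the fail-fast idea with a readable message. The `ReadRecord` case class and the `requireAllMapped` guard are hypothetical and are not taken from ADAM or from #421. Java's standard `UnsupportedOperationException` is swapped in here as the "something similar" exception type.

```scala
// Hypothetical stand-in for an ADAM read record; NOT the real ADAM class.
// Assumption: an unmapped read has no contig (modelled here as None).
final case class ReadRecord(name: String, contig: Option[String])

object FailFastJoin {
  // Hypothetical guard: reject the input up front with a clear message
  // instead of letting a missing contig surface later as a NullPointerException.
  def requireAllMapped(reads: Seq[ReadRecord]): Seq[ReadRecord] =
    reads.find(_.contig.isEmpty) match {
      case Some(r) =>
        throw new UnsupportedOperationException(
          s"PartitionAndJoin on RDD with unmapped reads is not supported (first unmapped read: ${r.name})")
      case None => reads
    }
}
```

A caller would run the guard once before the join, so the failure names the cause rather than a line inside the join code.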


@fnothaft

Member

fnothaft commented Oct 15, 2014

@massie that's what @tdanford adds in #421

@massie

Member

massie commented Oct 15, 2014

+1

-Matt


@fnothaft

Member

fnothaft commented Jul 6, 2016

Closed as won't fix.

@fnothaft fnothaft closed this Jul 6, 2016
