FASTQ Reader leaks connections #1974

Closed
harsha2010 opened this Issue Apr 4, 2018 · 1 comment

harsha2010 commented Apr 4, 2018

When reading paired FASTQ files with sc.loadAlignments, if I loop over a set of files and load RDDs, each iteration leaks connections and the S3 connection pool is soon exhausted.
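A minimal sketch of the kind of loop that triggers this, assuming a spark-shell SparkContext sc and the implicit ADAMContext conversion; the S3 paths are placeholders and the optPathName2 parameter name may differ across ADAM versions:

import org.bdgenomics.adam.rdd.ADAMContext._

// Hypothetical paired FASTQ inputs on S3.
val samples = Seq(
  ("s3a://bucket/sampleA_1.fastq", "s3a://bucket/sampleA_2.fastq"),
  ("s3a://bucket/sampleB_1.fastq", "s3a://bucket/sampleB_2.fastq"))

samples.foreach { case (reads1, reads2) =>
  // Each load probes the FASTQ inputs for BGZF compression; the probe
  // stream is never closed, so every iteration leaks pooled connections.
  val alignments = sc.loadAlignments(reads1, optPathName2 = Some(reads2))
  alignments.toFragments.rdd.count()
}

After enough iterations, any action that computes partitions fails with: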

com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1264)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1676)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:256)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:256)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:256)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:256)
	at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
	at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:84)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:256)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:256)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:256)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:256)
	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
	at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:699)
	at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:699)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:371)
	at org.apache.spark.rdd.RDD.groupBy(RDD.scala:698)
	at org.bdgenomics.adam.rdd.read.SingleReadBucket$.apply(SingleReadBucket.scala:98)
	at org.bdgenomics.adam.rdd.read.AlignmentRecordRDD.groupReadsByFragment(AlignmentRecordRDD.scala:1114)
	at org.bdgenomics.adam.rdd.read.AlignmentRecordRDD.toFragments(AlignmentRecordRDD.scala:410)

@fnothaft fnothaft added the bug label Apr 4, 2018

@fnothaft fnothaft added this to the 0.24.1 milestone Apr 4, 2018

@fnothaft fnothaft self-assigned this Apr 4, 2018

fnothaft commented Apr 5, 2018

The culprit is:

splittable = BlockCompressedInputStream.isValidFile(new BufferedInputStream(is));
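htsjdk's BlockCompressedInputStream.isValidFile reads from the stream it is handed to check for a BGZF header but never closes it, so the underlying S3 object stream stays checked out of the connection pool. A sketch of one way to plug the leak; the helper name and surrounding structure are illustrative, not ADAM's actual fix:

import java.io.{BufferedInputStream, InputStream}
import htsjdk.samtools.util.BlockCompressedInputStream

// Probe whether the input is BGZF-compressed (and hence splittable),
// closing the stream afterwards so the pooled connection is released.
// The BufferedInputStream wrapper is kept because isValidFile relies
// on mark/reset support.
def isSplittable(is: InputStream): Boolean = {
  val buffered = new BufferedInputStream(is)
  try {
    BlockCompressedInputStream.isValidFile(buffered)
  } finally {
    buffered.close() // also closes the wrapped stream
  }
}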
