Too many open files when running PileupSpark on a WES sample #6642

Open
mwiewior opened this issue Jun 6, 2020 · 2 comments · May be fixed by #6643

Comments


mwiewior commented Jun 6, 2020

Hello,
I'm trying to run PileupSpark from the most recent GATK 4.1.7 on a WES sample like this:

gatk PileupSpark --spark-runner SPARK --spark-master local[{threads}] --conf "spark.driver.memory=22g" -I $DATA/NA12878.proper.wes.md.bam -R $DATA/Homo_sapiens_assembly18.fasta -O /tmp/gatk4s_{threads}.pileup

mwiewior@Mareks-MacBook-Pro ~ % spark-submit --version
Welcome to Spark version 2.4.5
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_252
Branch HEAD

No matter how high I set the maximum number of file descriptors (even to 1M):
mwiewior@Mareks-MacBook-Pro ~ % ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-v: address space (kbytes) unlimited
-l: locked-in-memory size (kbytes) unlimited
-u: processes 2048
-n: file descriptors 1000000
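
For completeness, since ulimit only applies to the shell it is run in, I have also been double-checking the macOS system-wide caps and the descriptor count of the driver JVM itself. Commands only; <driver-pid> is a placeholder for the local Spark JVM's PID:

# System-wide descriptor caps on macOS (ulimit can be silently clamped by these)
sysctl kern.maxfiles kern.maxfilesperproc
launchctl limit maxfiles

# Number of descriptors currently held by the Spark driver JVM
lsof -p <driver-pid> | wc -l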

I keep getting the following error:

20/06/06 14:56:35 ERROR Utils: Aborting task
org.broadinstitute.hellbender.exceptions.UserException$CouldNotReadInputFile: Couldn't read file file:///private/var/folders/5s/v5t08tmd42z_2m2c30vqf6kc0000gn/T/spark-556aa7a2-4d88-4bae-ad16-36d5af920fa9/userFiles-aeb68992-3215-4897-8f8a-040396296185/Homo_sapiens_assembly18.fasta. Error was: Fasta index file could not be opened: /private/var/folders/5s/v5t08tmd42z_2m2c30vqf6kc0000gn/T/spark-556aa7a2-4d88-4bae-ad16-36d5af920fa9/userFiles-aeb68992-3215-4897-8f8a-040396296185/Homo_sapiens_assembly18.fasta.fai
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:159)
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:125)
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:110)
at org.broadinstitute.hellbender.engine.ReferenceFileSource.<init>(ReferenceFileSource.java:35)
at org.broadinstitute.hellbender.engine.spark.LocusWalkerSpark.lambda$getAlignmentsFunction$a99dbf6a$1(LocusWalkerSpark.java:113)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:130)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:141)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: htsjdk.samtools.SAMException: Fasta index file could not be opened: /private/var/folders/5s/v5t08tmd42z_2m2c30vqf6kc0000gn/T/spark-556aa7a2-4d88-4bae-ad16-36d5af920fa9/userFiles-aeb68992-3215-4897-8f8a-040396296185/Homo_sapiens_assembly18.fasta.fai
at htsjdk.samtools.reference.FastaSequenceIndex.<init>(FastaSequenceIndex.java:74)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:98)
at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:139)
at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.<init>(CachingIndexedFastaSequenceFile.java:148)
... 24 more
Caused by: java.nio.file.FileSystemException: /private/var/folders/5s/v5t08tmd42z_2m2c30vqf6kc0000gn/T/spark-556aa7a2-4d88-4bae-ad16-36d5af920fa9/userFiles-aeb68992-3215-4897-8f8a-040396296185/Homo_sapiens_assembly18.fasta.fai: Too many open files
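
To check whether descriptors really accumulate per partition (rather than the limit simply not being honored), the driver's open-file count can be sampled while the job runs. A rough sketch, with <driver-pid> again a placeholder:

while true; do
  # print a timestamped count of files currently open by the driver JVM
  echo "$(date '+%H:%M:%S')  $(lsof -p <driver-pid> | wc -l) open files"
  sleep 10
done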

Is there any additional parameter I can set to overcome this issue?
Any ideas welcome.

Thanks,
Marek


mwiewior commented Jun 6, 2020

Maybe it is related to #6578 and #5316? However, I tested versions 4.1.7, 4.1.6, 4.1.5, and 4.1.4.1, and the problem persists with both Spark 2.4.3 and Spark 2.4.5.

mwiewior pushed a commit to mwiewior/gatk that referenced this issue Jun 7, 2020
mwiewior linked a pull request Jun 7, 2020 that will close this issue

mwiewior commented Jun 7, 2020

The proposed PR doesn't fix the problem completely. When running local[1] it fails with the same error(too many open files) after finishing ~400/555 partitions - without the fix it failed after ~25. When running in parallel(local[n], n>1) it starts hanging after processing around ~40 partitions.
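
If I'm reading the common GATK Spark tool arguments correctly, --bam-partition-size sets how many bytes of reads go into each input partition, so raising it should mean fewer partitions and therefore fewer re-opens of the reference per run. An untested sketch of what I mean as a workaround (same command as above, just with larger partitions; the 128 MiB value is arbitrary):

gatk PileupSpark --spark-runner SPARK --spark-master local[{threads}] \
  --conf "spark.driver.memory=22g" \
  --bam-partition-size 134217728 \
  -I $DATA/NA12878.proper.wes.md.bam -R $DATA/Homo_sapiens_assembly18.fasta \
  -O /tmp/gatk4s_{threads}.pileup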
