Improve parallelism during FASTA output #842

Closed
fnothaft opened this Issue Oct 2, 2015 · 2 comments

fnothaft commented Oct 2, 2015

We should be able to increase the parallelism of the ADAM → FASTA transformation introduced in #816 by using repartitionAndSortWithinPartitions and streaming data through, instead of using a groupBy.
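
To illustrate the idea, here is a minimal sketch in plain Python. It is not Spark or ADAM code: the record layout `(contig, index, sequence)`, the function names, and the use of `zlib.crc32` as a stand-in for a hash partitioner are all assumptions made for the example. It routes fragments to partitions by contig name, sorts within each partition, and then streams FASTA lines without first collecting every fragment of a contig into one group.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition_and_sort(fragments, num_partitions):
    """Route (contig, index, sequence) records to partitions by contig name,
    then sort each partition by (contig, index).

    Hypothetical stand-in for a repartition-and-sort step; crc32 is used only
    to get a deterministic hash of the contig name.
    """
    partitions = [[] for _ in range(num_partitions)]
    for frag in fragments:
        slot = zlib.crc32(frag[0].encode("utf-8")) % num_partitions
        partitions[slot].append(frag)
    for part in partitions:
        part.sort(key=itemgetter(0, 1))
    return partitions


def stream_fasta(partition, line_width=60):
    """Yield FASTA lines for one sorted partition.

    Because the partition is sorted, fragments of a contig arrive in order,
    so only a small buffer is held instead of the whole contig.
    """
    for contig, frags in groupby(partition, key=itemgetter(0)):
        yield ">" + contig
        buffer = ""
        for _, _, seq in frags:
            buffer += seq
            # Emit full-width lines as soon as enough bases are buffered.
            while len(buffer) >= line_width:
                yield buffer[:line_width]
                buffer = buffer[line_width:]
        if buffer:
            yield buffer
```

Each partition can be processed independently by calling `stream_fasta` on it, which is where the extra parallelism would come from. The order in which contigs appear across partitions depends on the hash and is not fixed by this sketch.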

heuermh commented Dec 3, 2015

I don't see any explicit groupBy or groupByKey calls in the code path for ADAM → FASTA, although I could be missing something. There is a reduceByKey here:

https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/contig/NucleotideContigFragmentRDDFunctions.scala#L102

@fnothaft fnothaft added the wontfix label Mar 3, 2017

fnothaft commented Mar 3, 2017

Upon reflection, I don't think this is actually important. Closing.

@fnothaft fnothaft closed this Mar 3, 2017