Improve parallelism during FASTA output #842

Closed
fnothaft opened this issue Oct 2, 2015 · 2 comments


@fnothaft fnothaft commented Oct 2, 2015

We should be able to increase the parallelism of the ADAM → FASTA transformation introduced in #816 by using repartitionAndSortWithinPartitions and streaming the data through, instead of using a groupBy.
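
To make the proposal concrete, here is a minimal plain-Python sketch of the pattern: route records to partitions by key, sort each partition, then stream the output without first collecting each contig's fragments into one grouped value. It is not Spark and not ADAM's code. The record layout `(contig_name, fragment_index, sequence)`, the function names, the CRC-based partitioner, and the line width are all assumptions made for illustration.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition_and_sort(fragments, num_partitions):
    """Mimic the shape of repartitionAndSortWithinPartitions on a local list.

    Each hypothetical record is (contig_name, fragment_index, sequence).
    Records are routed to a partition by contig name only, then each
    partition is sorted by (contig_name, fragment_index).
    """
    partitions = [[] for _ in range(num_partitions)]
    for frag in fragments:
        # Stable hash (assumed partitioner) so a contig always lands
        # in the same partition.
        idx = zlib.crc32(frag[0].encode("utf-8")) % num_partitions
        partitions[idx].append(frag)
    for part in partitions:
        part.sort(key=itemgetter(0, 1))
    return partitions


def stream_fasta(partition, line_width=60):
    """Yield FASTA lines from one sorted partition.

    Because the partition is sorted, the fragments of a contig are adjacent
    and in order, so only a small line buffer is held at a time.
    """
    for contig, frags in groupby(partition, key=itemgetter(0)):
        yield ">" + contig
        buffer = ""
        for _, _, sequence in frags:
            buffer += sequence
            while len(buffer) >= line_width:
                yield buffer[:line_width]
                buffer = buffer[line_width:]
        if buffer:
            yield buffer
```

Each partition can be written independently, which is where the extra parallelism would come from. In Spark itself, the sort happens as part of the shuffle rather than as a separate in-memory step as it does here.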


@heuermh heuermh commented Dec 3, 2015

I don't see any explicit groupBy or groupByKey in the code path for ADAM → FASTA, although I could be missing something. There is a reduceByKey here:

https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/contig/NucleotideContigFragmentRDDFunctions.scala#L102

@fnothaft fnothaft added the wontfix label Mar 3, 2017

@fnothaft fnothaft commented Mar 3, 2017

Upon reflection, I don't think this is actually important. Closing.

@fnothaft fnothaft closed this Mar 3, 2017