Top level WrappedRDD or similar abstraction #1173
I don't think the wrapped RDD pattern makes sense unless you have metadata that you want to package up with the RDD. That's the sole reason I wanted to move to the
Indeed; that comes in from here, but the concrete non-
If it's a trait and not an abstract class, you'll still be struggling with #1092, specifically if implementing classes make use of multiple trait inheritance.
Couldn't the methods in
Right now we can only support RDDs under the GenomicRDD wrapper. It would be nice to swap this out for IntervalRDDs.
The main thing we do with IntervalRDD is filterByInterval(), for which there is no good equivalent query in GenomicRDD.
So we could modify IntervalRDD so that it accepts [T:ClassTag] and hard-code ReferenceRegion instead of allowing it to be a ClassTag, which I am fine with. The only remaining problem is that Mango has no way to reference things by sample name/file, which we need in order to store multiple samples and differentiate them in the GUI.
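The change discussed above (fixing the region type to ReferenceRegion, keeping the value type generic, and carrying a sample name with the data) might look roughly like the sketch below. Everything here is an assumption for illustration: RegionKeyedRDD and sampleName are hypothetical names, ReferenceRegion is a minimal stand-in with half-open coordinates rather than ADAM's class, and a plain Seq stands in for the RDD so the sketch runs without Spark. A real interval-indexed RDD would avoid the linear scan used here.

```scala
// Sketch only: Seq stands in for an RDD, and ReferenceRegion is a minimal
// stand-in for ADAM's class (half-open [start, end) coordinates assumed).
final case class ReferenceRegion(referenceName: String, start: Long, end: Long) {
  require(start <= end, "start must not exceed end")

  /** True if the two regions share at least one base on the same reference. */
  def overlaps(other: ReferenceRegion): Boolean =
    referenceName == other.referenceName && start < other.end && other.start < end
}

/** Hypothetical wrapper: the region type is fixed to ReferenceRegion, only the
  * value type V is generic, and a sample name travels with the data so that
  * multiple samples can be told apart. With a real Spark RDD, V would also
  * need a ClassTag context bound. */
final case class RegionKeyedRDD[V](
    sampleName: String,
    rdd: Seq[(ReferenceRegion, V)]) {

  /** Keep records whose region overlaps the query; the sample name is preserved.
    * This is a linear scan, unlike an interval-indexed structure. */
  def filterByInterval(query: ReferenceRegion): RegionKeyedRDD[V] =
    copy(rdd = rdd.filter { case (region, _) => region.overlaps(query) })
}

object IntervalExample {
  val calls = RegionKeyedRDD(
    "sampleA",
    Seq(
      ReferenceRegion("chr1", 100L, 200L) -> "variant1",
      ReferenceRegion("chr1", 500L, 600L) -> "variant2",
      ReferenceRegion("chr2", 100L, 200L) -> "variant3"))

  // Overlaps the first two chr1 records; the chr2 record is on another reference.
  val hits = calls.filterByInterval(ReferenceRegion("chr1", 150L, 550L))
}
```

Keeping the sample name on the wrapper rather than on each record is one way to get per-sample identity without changing the record type; whether that fits Mango's GUI needs is left open here.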
Since reporting this, I've gone with having a sequence dictionary even when one doesn't necessarily make sense, so

Note regarding your comment above: we've since removed the Flatten CLI, so perhaps more cleanup can be done. That concrete class moved to https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMRDDFunctions.scala#L175
Thinking about implementing phenotype support, or sequence/slice/read as proposed in bdg-formats, it appears that GenomicRDD is not quite the top-level abstraction that it should be. I propose a new top-level trait/interface/abstract class WrappedRDD (or a better name) that includes only … and perhaps some of the saveAsParquet stuff (not exactly sure, does that come in from ADAMRDDFunctions?). Then GenomicRDD adds the sequence dictionary, and so on.

This would give us something to extend from when a sequence dictionary is not required, and a place to put code-level documentation about the wrapped-RDD pattern that we've established. It may also help address #1092.
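A rough sketch of the proposed hierarchy follows, assuming the top-level abstraction holds just the wrapped collection plus a way to transform it. The names WrappedRDD, transform, replaceRdd, ReadWrapper, and PhenotypeWrapper are illustrative assumptions, not ADAM's actual API; SequenceDictionary is a simplified stand-in for ADAM's class; and a plain Seq stands in for Spark's RDD so the sketch runs without Spark.

```scala
// Sketch only: Seq[T] stands in for Spark's RDD[T] so this runs without Spark.
// All names below are illustrative, not ADAM's actual API.

/** Top-level abstraction: a wrapped collection and nothing else. */
trait WrappedRDD[T, U <: WrappedRDD[T, U]] {
  def rdd: Seq[T]

  /** Rebuild a wrapper of the same concrete type around a new collection. */
  protected def replaceRdd(newRdd: Seq[T]): U

  /** Apply a collection-level transformation, keeping any metadata. */
  def transform(f: Seq[T] => Seq[T]): U = replaceRdd(f(rdd))
}

/** Simplified stand-in for ADAM's SequenceDictionary: contig name -> length. */
final case class SequenceDictionary(records: Map[String, Long])

/** GenomicRDD adds the sequence dictionary on top of WrappedRDD. */
trait GenomicRDD[T, U <: GenomicRDD[T, U]] extends WrappedRDD[T, U] {
  def sequences: SequenceDictionary
}

/** Hypothetical record type used only for this example. */
final case class Read(contig: String, start: Long, end: Long)

/** Aligned data needs a sequence dictionary, so it extends GenomicRDD. */
final case class ReadWrapper(rdd: Seq[Read], sequences: SequenceDictionary)
    extends GenomicRDD[Read, ReadWrapper] {
  protected def replaceRdd(newRdd: Seq[Read]): ReadWrapper = copy(rdd = newRdd)
}

/** Hypothetical phenotype record: no reference coordinates involved. */
final case class Phenotype(sample: String, name: String)

/** Data without a sequence dictionary extends WrappedRDD directly. */
final case class PhenotypeWrapper(rdd: Seq[Phenotype])
    extends WrappedRDD[Phenotype, PhenotypeWrapper] {
  protected def replaceRdd(newRdd: Seq[Phenotype]): PhenotypeWrapper = copy(rdd = newRdd)
}

object WrappedRddExample {
  val reads = ReadWrapper(
    Seq(Read("chr1", 0L, 100L), Read("chr2", 5L, 50L)),
    SequenceDictionary(Map("chr1" -> 1000L, "chr2" -> 500L)))

  // The filter runs on the wrapped collection; the dictionary is carried along.
  val chr1Only: ReadWrapper = reads.transform(_.filter(_.contig == "chr1"))
}
```

The self-referential type parameter U lets transform return the concrete wrapper type, so metadata such as the sequence dictionary survives a transformation without each subclass reimplementing it. Whether the top level should be a trait or an abstract class is exactly the #1092 concern raised in the comments, and this sketch does not settle it.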