BioSequences is a large package, probably larger than it needs to be. It seems like a holdover from the olden days when BioJulia was much more monolithic, similar to e.g. Biopython.
In my opinion, it's easier and cleaner if BioSequences only contains the sequence types themselves, and the most basic processing on them. Nothing fancy.
Are there things we can move out of BioSequences for v3? Here are my suggestions. If anyone thinks any of this belongs in this package, please leave a comment.
Functionality to remove
- src/composition.jl
- src/demultiplexer.jl
- src/minhash.jl
- src/nmask.jl
- src/refseq
- ConditionIterator
- VoidAlphabet
- CharAlphabet
Functionality to migrate to other packages
- K-mer composition
- Barcode demultiplexing
- K-mer minhashing
Dependencies to remove
These are either unused or can be removed with trivial changes.
- BioGenerics
- Printf
- Combinatorics
- IndexableBitVectors
A few comments on these
composition and minhash
These kmer techniques really belong in another package. There are so many cool kmer bithacks, and so much to do with kmers. We can't possibly do it all in this package.
I'd much rather have a handful of kmer iterators in this package, and leave it at that. Other packages can then build on the kmer iterators and sequences from here.
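To illustrate the layering I have in mind, here is a rough sketch in plain Julia of how a downstream package could build k-mer composition on top of a k-mer iterator. It works on ASCII strings rather than BioSequences types, and both `each_kmer` and `kmer_composition` are hypothetical names made up for this example, not existing API.

```julia
# Hypothetical helper: lazily yield every overlapping k-mer of an ASCII string.
# In practice this role would be played by a k-mer iterator from this package.
each_kmer(seq::AbstractString, k::Integer) =
    (SubString(seq, i, i + k - 1) for i in 1:(length(seq) - k + 1))

# Hypothetical downstream functionality: count how often each k-mer occurs.
function kmer_composition(seq::AbstractString, k::Integer)
    counts = Dict{String,Int}()
    for kmer in each_kmer(seq, k)
        key = String(kmer)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

comp = kmer_composition("ACGTACGT", 3)
# comp maps "ACG" => 2, "CGT" => 2, "GTA" => 1, "TAC" => 1
```

The point is only that the counting logic needs nothing from this package beyond the iterator, so it can live elsewhere.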
NMask and RefSeq
Do people really use these? It seems to me one might as well use LongDNASeq or its 2-bit equivalent. Theoretically, there is some space saving possible by using the NMask. Practically speaking, I doubt we have the dev manpower to maintain yet another longsequence-like sequence.
Demultiplexer
This seems very application-specific, and should probably be part of some NGS-related package.
ConditionIterator
This just seems like a re-implementation of Iterators.Filter, with much less functionality.
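For comparison, a minimal sketch of what Base already offers. It uses only `Iterators.filter` from Base Julia on a plain string, not a BioSequences type, so treat it as an illustration of the idea rather than a drop-in replacement for any existing call.

```julia
# Plain Base Julia; the string stands in for a sequence.
seq = "ACGTNNACGT"

# Lazily iterate over the positions whose symbol satisfies a predicate.
n_positions = Iterators.filter(i -> seq[i] == 'N', eachindex(seq))

collect(n_positions)  # → [5, 6]
```

Since the result is an ordinary lazy iterator, it composes with the rest of Base (`first`, `count`, `collect`, and so on) without any extra code on our side.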
VoidAlphabet and CharAlphabet
I can't see the point of VoidAlphabet at all. CharAlphabet is useful for testing the generic interfaces, so I propose we move it from src to test.