Spring cleanup for v3 #138

@jakobnissen

BioSequences is a large package, probably larger than it needs to be. It seems like a holdover from the days when BioJulia was much more monolithic, similar to e.g. Biopython.

In my opinion, it's easier and cleaner if BioSequences contains only the sequence types themselves and the most basic operations on them. Nothing fancy.

Are there things we can move out of BioSequences for v3? Here are my suggestions. If anyone thinks any of this belongs in this package, please leave a comment.

Functionality to remove

  • src/composition.jl
  • src/demultiplexer.jl
  • src/minhash.jl
  • src/nmask.jl
  • src/refseq
  • ConditionIterator
  • VoidAlphabet
  • CharAlphabet

Functionality to migrate to other packages

  • K-mer composition
  • Barcode demultiplexing
  • K-mer minhashing

Dependencies to remove
These are either unused or can be removed with trivial changes:

  • BioGenerics
  • Printf
  • Combinatorics
  • IndexableBitVectors

A few comments on these:

composition and minhash

These kmer techniques really belong in another package. There are so many cool kmer bithacks, and so much to do with kmers. We can't possibly do it all in this package.
I'd much rather have a handful of kmer iterators in this package, and leave it at that. Other packages can then build on the kmer iterators and sequences from here.
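To make the split concrete, here is a rough sketch of how composition could live downstream of a k-mer iterator. It uses only Base, with a plain ASCII `String` standing in for a sequence; `each_kmer` and `kmer_composition` are hypothetical names for illustration, not the BioSequences API.

```julia
# Hypothetical stand-in for a k-mer iterator that would stay in this package.
# Assumes an ASCII string, so character positions equal byte indices.
each_kmer(seq::AbstractString, k::Integer) =
    (SubString(seq, i, i + k - 1) for i in 1:(length(seq) - k + 1))

# What a separate package could build on top of it: k-mer composition.
function kmer_composition(seq::AbstractString, k::Integer)
    counts = Dict{String, Int}()
    for kmer in each_kmer(seq, k)
        key = String(kmer)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

comp = kmer_composition("ACGACG", 3)
```

The point is only that the counting logic needs nothing from this package beyond the iterator and the sequence type.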

NMask and RefSeq

Do people really use these? It seems to me one might as well use LongDNASeq or its 2-bit equivalent. In theory, the NMask allows some space savings. In practice, I doubt we have the dev manpower to maintain yet another LongSequence-like sequence type.

Demultiplexer

This seems very application-specific, and should probably be part of some NGS-related package.

ConditionIterator

This just seems like a re-implementation of Iterators.Filter, with much less functionality.
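For comparison, a minimal sketch of getting the positions that satisfy a condition with Base alone. A plain `String` stands in for a sequence here, and `is_ambiguous` is a made-up predicate for illustration.

```julia
# A plain String stands in for a biological sequence (assumption).
seq = "ACGTNNACGT"

# Hypothetical predicate: is this symbol an ambiguous base?
is_ambiguous(c) = c == 'N'

# Lazily iterate over the positions whose symbol satisfies the condition.
npositions = Iterators.filter(i -> is_ambiguous(seq[i]), eachindex(seq))

positions = collect(npositions)  # materialize the lazy iterator
```

`Iterators.filter` returns an `Iterators.Filter`, which already works with `collect`, `first`, `for` loops and the rest of the iteration machinery.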

VoidAlphabet and CharAlphabet

I can't see the point of VoidAlphabet at all. CharAlphabet is useful for testing the generic interfaces, so I propose we move it from src to test.
