Data structure for reference sequences

ReferenceSequences

ReferenceSequences.jl provides a data structure for reference sequences. A reference sequence typically contains only five kinds of nucleotides ("ACGTN"), and occurrences of 'N' tend to be sparse and clustered in a few long runs. In that case, ReferenceSequence can store the positions of 'N' in compressed form and save a substantial amount of memory.

julia> using Bio.Seq

julia> using ReferenceSequences

# create ReferenceSequence from DNASequence of Bio.Seq
julia> seq = ReferenceSequence(dna"ACGT"^5 * dna"N"^10)
30nt Reference Sequence:
ACGTACGTACGTACGTACGTNNNNNNNNNN

julia> DNASequence(seq)  # round trip
30nt DNA Sequence:
ACGTACGTACGTACGTACGTNNNNNNNNNN

julia> seq[4]  # access an element
T

julia> seq[15:25]  # make a subsequence (copy-free)
11nt Reference Sequence:
GTACGTNNNNN

julia> seq[1:4] == dna"ACGT"  # comparison
true
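
The reason clustered 'N' positions compress well can be illustrated with a small, self-contained sketch. This is not the package's actual implementation: the helpers `n_runs` and `is_n` are hypothetical names made up for this example, and a plain `String` stands in for a sequence type. The sketch records each maximal run of 'N' as a single index range, so a run of any length costs one entry.

```julia
# Hypothetical illustration only; not the internals of ReferenceSequences.jl.
# Assumes an ASCII string, so character positions and indices coincide.

# Collect the maximal runs of 'N' in a sequence as index ranges.
function n_runs(seq::AbstractString)
    runs = UnitRange{Int}[]
    start = 0                          # 0 means "not currently inside a run"
    for (i, c) in enumerate(seq)
        if c == 'N'
            start == 0 && (start = i)  # a new run begins here
        elseif start != 0
            push!(runs, start:i-1)     # the run ended at the previous position
            start = 0
        end
    end
    start != 0 && push!(runs, start:length(seq))  # run reaching the end
    return runs
end

# Check whether position i falls inside any stored run of 'N'.
is_n(runs, i::Integer) = any(r -> i in r, runs)

seq = "ACGT"^5 * "N"^10
runs = n_runs(seq)         # runs == [21:30]: ten 'N's stored as one range
println(length(runs))      # prints 1
println(is_n(runs, 4))     # prints false
println(is_n(runs, 21))    # prints true
```

A linear scan over the runs is enough for a sketch like this; a real implementation would likely want a faster lookup when a sequence contains many runs.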

TODO

  • FASTA parser