Skip to content
Julia's package for working on Bioinformatics with DNA, RNA and Protein Sequences
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
doc
src
test
LICENSE.md
README.md
REQUIRE

README.md

BioSeq.jl

Version 0.4.0: BioSeq

Package for working with Nucleotides and Amino Acids on The Julia Language

Installation

Pkg.init() # Only the first time you install a Julia's Package

Pkg.add("BioSeq") # Install BioSeq.jl

using BioSeq # Starting to use BioSeq

Features

  • 2-bit DNA sequence DNA2Seq for saving memory
    • Faster vectorized test for calculate percentage of GC, and test A C T G on DNA2Seq
  • 8-bit bitstype Nucleotide and AminoAcid
    • Vectors of this types can be used as DNA, RNA or Protein Sequences
      • Some string's functions working for Sequences:
        • Case conversions
        • Matching functions (search, replace and others)
      • IUPAC Regex is available for matching functions
      • PROSITE patterns are available for matching functions
    • Alignments can be represented as Matrices of this types
    • DArray of this types can be used for parallel computation
    • Memory-mapped arrays of this types can be used for huge data
  • 8-bit Bit-Level Coding Scheme for Nucleotides
  • Translation methods and genetic codes
  • Tools for using IntSet/Set/Dict as alphabets
    • Common alphabets as IntSet, including extended IUPAC
    • Dicts for generate complement for nucleotide sequences or change between 3 letter and 1 letter alphabets on Proteins
    • Test for characters on alphabet
    • Check for all characters on alphabet
    • Swap for alphabet conversions

Documentation

Demo

julia> using BioSeq

julia> const dna4alphabet = alphabet(nt"ACTG", false)
Case Insensitive Alphabet{Nucleotide} of 4 elements:

 indice   : 256-element Uint8 Array
 alphabet : 4-element Nucleotide Array

 alphabet                       indice[alphabet]
 Nucleotide (Int64)             Uint8 (Int64)

 A (65)                         0x01 (1)
 C (67)                         0x02 (2)
 T (84)                         0x03 (3)
 G (71)                         0x04 (4)


julia> dnaseq = repeat( nt"GATTACA" , 2 )
14-element Nucleotide Array:
 G
 A
 T
 T
 A
 C
 A
 G
 A
 T
 T
 A
 C
 A

julia> check(dnaseq, dna4alphabet)
true

julia> protseq = translate(dnaseq,1)
4-element AminoAcid Array:
 D
 Y
 R
 L

julia> if ismatch( prosite"<D-x-[RM]" , protseq )
         threeletters = swap(protseq, AMINO_1LETTER_TO_3 )
       end
4-element ASCIIString Array:
 "ASP"
 "TYR"
 "ARG"
 "LEU"

Contributing

Fork and send a pull request or create a GitHub issue for bug reports or feature requests

You can’t perform that action at this time.