A Julia package for bioinformatics work with DNA, RNA and protein sequences

diegozea/BioSeq.jl

BioSeq.jl

Version 0.4.0: BioSeq

A package for working with nucleotides and amino acids in the Julia language

Installation

Pkg.init() # Only needed the first time you install a Julia package

Pkg.add("BioSeq") # Install BioSeq.jl

using BioSeq # Load BioSeq

Features

  • 2-bit DNA sequence type DNA2Seq, to save memory
    • Faster vectorized tests on DNA2Seq for computing the GC percentage and for testing for A, C, T and G
  • 8-bit bitstypes Nucleotide and AminoAcid
    • Vectors of these types can be used as DNA, RNA or protein sequences
      • Some string functions work on sequences:
        • Case conversions
        • Matching functions (search, replace and others)
      • IUPAC regexes are available for the matching functions
      • PROSITE patterns are available for the matching functions
    • Alignments can be represented as matrices of these types
    • DArrays of these types can be used for parallel computation
    • Memory-mapped arrays of these types can be used for huge data sets
  • 8-bit bit-level coding scheme for nucleotides
  • Translation methods and genetic codes
  • Tools for using IntSet/Set/Dict as alphabets
    • Common alphabets as IntSet, including extended IUPAC
    • Dicts for generating the complement of nucleotide sequences or for converting between 3-letter and 1-letter alphabets for proteins
    • Test whether a character belongs to an alphabet
    • Check that all characters of a sequence belong to an alphabet
    • Swap for alphabet conversions
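
To illustrate why a 2-bit representation saves memory, here is a minimal sketch in plain Julia (written for Julia 1.x, without BioSeq). It is not the DNA2Seq implementation: the names CODE, BASES, pack2bit, unpack2bit and gc_percent are hypothetical, and the bit layout is an assumption chosen for the example. It packs four unambiguous bases into each byte and computes the GC percentage directly from the packed data.

```julia
# Hypothetical helpers for illustration only; none of these names come from BioSeq.jl.
const CODE = Dict('A' => 0x00, 'C' => 0x01, 'G' => 0x02, 'T' => 0x03)
const BASES = ('A', 'C', 'G', 'T')

# Pack a DNA string into bytes, four bases per byte (2 bits each).
# Returns the packed bytes and the number of bases.
function pack2bit(seq::AbstractString)
    n = length(seq)
    packed = zeros(UInt8, cld(n, 4))
    for (i, base) in enumerate(uppercase(seq))
        haskey(CODE, base) || throw(ArgumentError("not an unambiguous DNA base: $base"))
        shift = 2 * ((i - 1) % 4)
        packed[div(i - 1, 4) + 1] |= CODE[base] << shift
    end
    return packed, n
end

# Read the 2-bit code of the i-th base.
code_at(packed, i) = (packed[div(i - 1, 4) + 1] >> (2 * ((i - 1) % 4))) & 0x03

# Recover the original string from the packed bytes.
function unpack2bit(packed::Vector{UInt8}, n::Integer)
    chars = Vector{Char}(undef, n)
    for i in 1:n
        chars[i] = BASES[code_at(packed, i) + 1]
    end
    return String(chars)
end

# Percentage of G and C bases, computed on the packed data.
function gc_percent(packed::Vector{UInt8}, n::Integer)
    n == 0 && return 0.0
    gc = 0
    for i in 1:n
        c = code_at(packed, i)
        gc += (c == 0x01 || c == 0x02)
    end
    return 100 * gc / n
end

packed, n = pack2bit("GATTACAGATTACA")
println(length(packed), " bytes for ", n, " bases")
println(unpack2bit(packed, n))
println(gc_percent(packed, n))
```

With this layout, 14 bases fit in 4 bytes instead of 14. The trade-off is that ambiguous IUPAC symbols cannot be stored, which is why the sketch rejects anything other than A, C, G and T.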

Documentation

Demo

julia> using BioSeq

julia> const dna4alphabet = alphabet(nt"ACTG", false)
Case Insensitive Alphabet{Nucleotide} of 4 elements:

 indice   : 256-element Uint8 Array
 alphabet : 4-element Nucleotide Array

 alphabet                       indice[alphabet]
 Nucleotide (Int64)             Uint8 (Int64)

 A (65)                         0x01 (1)
 C (67)                         0x02 (2)
 T (84)                         0x03 (3)
 G (71)                         0x04 (4)


julia> dnaseq = repeat( nt"GATTACA" , 2 )
14-element Nucleotide Array:
 G
 A
 T
 T
 A
 C
 A
 G
 A
 T
 T
 A
 C
 A

julia> check(dnaseq, dna4alphabet)
true

julia> protseq = translate(dnaseq,1)
4-element AminoAcid Array:
 D
 Y
 R
 L

julia> if ismatch( prosite"<D-x-[RM]" , protseq )
         threeletters = swap(protseq, AMINO_1LETTER_TO_3 )
       end
4-element ASCIIString Array:
 "ASP"
 "TYR"
 "ARG"
 "LEU"

Contributing

Fork the repository and send a pull request, or open a GitHub issue for bug reports or feature requests.
