# BioRuby Tutorial

* Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org>
* Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others

This document was last modified: 2011/10/14
Current editor: Michael O'Keefe <okeefm (at) rpi (dot) edu>
The latest version resides in the Git source code repository: [Tutorial.rd](https://github.com/bioruby/bioruby/blob/master/doc/Tutorial.rd).

## Introduction

This is a tutorial for using BioRuby. A basic knowledge of Ruby is required.
If you want to learn more about the programming language, we recommend the
latest edition of the Ruby book [Programming Ruby](http://www.pragprog.com/titles/ruby)
by Dave Thomas and Andy Hunt; the first edition can be read online
[here](http://www.ruby-doc.org/docs/ProgrammingRuby/).

To use BioRuby you need to install Ruby and the BioRuby package on your computer.
You can check whether Ruby is installed on your computer, and which
version it is, with the
```shell
  % ruby -v
```
command. You should see something like:
```
  ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux]
```
If you see no such output, you will have to install Ruby, for example with your
system's package manager. For more information see the
[Ruby](http://www.ruby-lang.org/en/) website.

Once Ruby is installed, download and install BioRuby using the links on the
[BioRuby](http://bioruby.org/) website. The recommended installation is via
RubyGems:
```shell
gem install bio
```
See also the BioRuby [wiki](http://bioruby.open-bio.org/wiki/Installation).

A lot of BioRuby's documentation exists in the source code and unit tests. To
really dive in you will need the latest source code tree. The embedded rdoc
documentation can be viewed online at
[bioruby's rdoc](http://bioruby.org/rdoc/). But first, let's get started!

## Trying BioRuby

Now test the following:

```ruby
require 'bio'
# => true

seq = Bio::Sequence::NA.new("atgcatgcaaaa")
# => "atgcatgcaaaa"

seq.complement
# => "ttttgcatgcat"
```

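The output shows that `complement` returns the reverse complement: each base is replaced by its partner and the strand is read backwards, so the result is again written 5' to 3'. As a plain-Ruby illustration only (not BioRuby's own implementation), the same result can be reproduced with `String#tr` and `String#reverse`; `reverse_complement` is a hypothetical helper, and the sketch assumes a lowercase sequence of `a`, `c`, `g` and `t` with no ambiguity codes.

```ruby
# Hypothetical helper, not part of BioRuby: reverse complement of a
# lowercase DNA string containing only a, c, g and t.
def reverse_complement(dna)
  # Swap each base for its partner (a<->t, c<->g), then reverse the strand.
  dna.tr("acgt", "tgca").reverse
end

puts reverse_complement("atgcatgcaaaa")  # prints ttttgcatgcat
```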
## Working with nucleic / amino acid sequences (Bio::Sequence class)

The Bio::Sequence class allows the usual sequence transformations and
translations. In the example below the DNA sequence "atgcatgcaaaa" is
converted into its complementary strand and a subsequence is extracted;
next, the nucleic acid composition is calculated, the sequence is
translated into an amino acid sequence, its molecular weight is
calculated, and so on. When translating into amino acid sequences, the
reading frame can be specified and, optionally, the codon table (as
defined in codontable.rb).

```ruby
seq = Bio::Sequence::NA.new("atgcatgcaaaa")
# => "atgcatgcaaaa"

# complementary strand (Bio::Sequence::NA object)
seq.complement
# => "ttttgcatgcat"

seq.subseq(3,8) # subsequence of positions 3 to 8 (counting from 1)
# => "gcatgc"

seq.gc_percent
# => 33
```
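Here 4 of the 12 bases are `g` or `c`, so the GC content is 4/12 ≈ 33.3 %; the displayed value 33 suggests that the result is reported as an integer. A plain-Ruby sketch of that arithmetic follows; it is an illustration rather than BioRuby's code, `gc_percent` here is a hypothetical top-level helper, and it assumes a lowercase sequence without ambiguous bases.

```ruby
# Hypothetical helper, not BioRuby's implementation: integer GC percentage
# of a lowercase DNA string without ambiguity codes.
def gc_percent(dna)
  gc = dna.count("gc")   # String#count counts every 'g' and every 'c'
  100 * gc / dna.length  # integer division truncates 33.33... to 33
end

puts gc_percent("atgcatgcaaaa")  # prints 33
```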

```ruby
seq.composition
# => {"a"=>6, "t"=>2, "g"=>2, "c"=>2}
```
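`composition` returns a hash that maps each base to the number of times it occurs. The same tally can be sketched in plain Ruby with a hash whose default value is 0; `base_composition` is a hypothetical helper used only for illustration, not a BioRuby method.

```ruby
# Hypothetical helper, not part of BioRuby: count each character of a sequence.
def base_composition(seq)
  counts = Hash.new(0)                        # unseen bases start at 0
  seq.each_char { |base| counts[base] += 1 }  # one increment per character
  counts                                      # keys keep first-seen order
end

comp = base_composition("atgcatgcaaaa")
puts comp["a"]  # prints 6
puts comp["c"]  # prints 2
```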

```ruby
seq.translate
# => "MHAK"

seq.translate(2)        # translate from frame 2
# => "CMQ"

seq.translate(1,11)     # codon table 11
# => "MHAK"
```

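Frame 1 reads codons starting at the first base, frame 2 at the second, and so on; a trailing partial codon is dropped, which is why frame 2 yields only three residues. The sketch below illustrates this in plain Ruby with a deliberately tiny, hypothetical codon table (`CODONS`) that contains only the codons occurring in this example, with their standard-code assignments. `translate_frame` is likewise a hypothetical helper; BioRuby's `translate` uses the complete tables defined in codontable.rb.

```ruby
# Hypothetical subset of the standard genetic code, covering only the
# codons that appear in this example.
CODONS = {
  "atg" => "M", "cat" => "H", "gca" => "A",
  "aaa" => "K", "tgc" => "C", "caa" => "Q"
}.freeze

# Hypothetical helper: translate a lowercase DNA string in frame 1, 2 or 3.
# Codons missing from CODONS become "X"; a trailing partial codon is ignored.
def translate_frame(dna, frame = 1)
  codons = dna[(frame - 1)..-1].scan(/.../)  # split into complete triplets
  codons.map { |codon| CODONS.fetch(codon, "X") }.join
end

puts translate_frame("atgcatgcaaaa", 1)  # prints MHAK
puts translate_frame("atgcatgcaaaa", 2)  # prints CMQ
```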
```ruby
seq.translate.codes
# => ["Met", "His", "Ala", "Lys"]

seq.translate.names
# => ["methionine", "histidine", "alanine", "lysine"]

seq.translate.composition
# => {"M"=>1, "H"=>1, "A"=>1, "K"=>1}

seq.translate.molecular_weight
# => 485.6050000000001

seq.complement.translate
# => "FCMH"
```

To get a random sequence with the same nucleotide composition, first count each base:

```ruby
counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')}
# => {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
```
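With these counts, BioRuby's `Bio::Sequence::NA.randomize(counts)` should return a random sequence with exactly that composition. The underlying idea can be sketched in plain Ruby: repeat each base as often as it was counted, then shuffle the pool. `shuffle_with_counts` is a hypothetical helper for illustration only, and its output differs from run to run unless a seeded `Random` is passed in.

```ruby
# Hypothetical helper, not part of BioRuby: random DNA string with exactly
# the given number of each base.
def shuffle_with_counts(counts, rng: Random.new)
  pool = counts.flat_map { |base, n| [base] * n }  # e.g. 6 "a", 2 "c", ...
  pool.shuffle(random: rng).join                   # random order, same counts
end

counts = { "a" => 6, "c" => 2, "g" => 2, "t" => 2 }
random_seq = shuffle_with_counts(counts)
puts random_seq  # a 12-base string; the order varies between runs
```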

Nucleic acid sequences are instances of the ```Bio::Sequence::NA``` class, and
amino acid sequences are instances of the ```Bio::Sequence::AA``` class. Methods
shared by both are defined in the ```Bio::Sequence::Common``` module.

As ```Bio::Sequence::NA``` and ```Bio::Sequence::AA``` inherit from Ruby's
```String``` class, you can use ```String``` methods on them. For example, to get a
subsequence you can use not only ```subseq(from, to)``` but also ```String#[]```.
Note that Ruby strings are zero-based, i.e. the first letter
has index 0, for example: