jandot edited this page Apr 4, 2011 · 27 revisions

Feedback

First things first: we’ve set up a feedback page at UserVoice

Introduction and history

The Ruby Ensembl API was first devised by Jan Aerts in June 2007 on release 45 of the core database, and was later extended by Francesco Strozzi to cover the variation database. Its purpose is to mimic the functionality that the Perl API offers for these databases. The API for the core database was worked out with the help of the Ensembl core team during a “Geek for a Week” stint.

An example

require 'ensembl'

include Ensembl::Core

DBConnection.connect('homo_sapiens',50)

puts "== Get a slice =="
slice = Slice.fetch_by_region('chromosome','4',10000,99999,-1)
puts slice.display_name

puts "== Print all genes for that slice (regardless of coord_system genes are annotated on) =="
slice.genes.each do |gene|
  puts gene.stable_id + "\t" + gene.status + "\t" + gene.slice.display_name
end

puts "== Get a transcript and print its 5'UTR, CDS and protein sequence =="
transcript = Transcript.find_by_stable_id('ENST00000380593')
puts "5'UTR: " + transcript.five_prime_utr_seq
puts "CDS: " + transcript.cds_seq
puts "peptide: " + transcript.protein_seq

DBConnection.connection.disconnect!
DBConnection.connect('bos_taurus',45)

puts "== Transforming a cow gene from chromosome level to scaffold level =="
gene = Gene.find(2408)
gene_on_scaffold = gene.transform('scaffold')
puts "Original: " + gene.slice.display_name
puts "Now: " + gene_on_scaffold.slice.display_name

puts "== What things are related to a 'gene' object? =="
puts 'Genes belong to: ' + Gene.reflect_on_all_associations(:belongs_to).collect{|a| a.name.to_s}.join(',')
puts 'Genes have many: ' + Gene.reflect_on_all_associations(:has_many).collect{|a| a.name.to_s}.join(',')

Output is:

== Get a slice ==
chromosome:NCBI36:4:10000:99999:-1
== Print all genes for that slice (regardless of coord_system genes are annotated on) ==
ENSG00000197701 KNOWN chromosome:NCBI36:4:43227:77340:1
ENSG00000207643 NOVEL chromosome:NCBI36:4:55032:55124:1
== Get a transcript and print its 5'UTR, CDS and protein sequence ==
5'UTR: ggaggaggtgaggagggtttgctgggtgg…agcactaggtcttcccgtcacctccacctctctcc
CDS: atgacccggctctgcttacccagacccgaagcacgtg…caaccccatcccactgcctgtgtctgttga
peptide: MTRLCLPRPEAREDPIPVPP…HDSPRRHSGFGSIEGQPHPTACVC*
== Transforming a cow gene from chromosome level to scaffold level ==
Original: chromosome:Btau_3.1:4:8104409:8496477:-1
Now: scaffold:Btau_3.1:Chr4.003.10:1590801:1982869:1
== What things are related to a 'gene' object? ==
Genes belong to: seq_region
Genes have many: object_xrefs,attrib_types,xrefs,transcripts,gene_attribs
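
The display names printed above appear to follow a colon-separated pattern: coordinate system, assembly version, seq_region name, start, stop and strand. As a minimal sketch of how such a string could be taken apart in plain Ruby, the helper below splits one of the names from the output into named fields. SliceName and parse_display_name are hypothetical names made up for this illustration; they are not part of the ensembl gem.

```ruby
# Hypothetical helper, NOT part of the ensembl gem: split a display name
# such as "chromosome:NCBI36:4:10000:99999:-1" into named fields.
SliceName = Struct.new(:coord_system, :version, :seq_region, :start, :stop, :strand)

def parse_display_name(name)
  fields = name.split(':', -1)
  unless fields.size == 6
    raise ArgumentError, "expected 6 colon-separated fields, got #{fields.size}"
  end

  coord_system, version, seq_region, start, stop, strand = fields
  # Integer() raises on malformed numbers instead of silently returning 0.
  SliceName.new(coord_system, version, seq_region,
                Integer(start), Integer(stop), Integer(strand))
end

# Display name taken from the cow example output above.
parsed = parse_display_name('scaffold:Btau_3.1:Chr4.003.10:1590801:1982869:1')
puts parsed.seq_region
puts parsed.stop - parsed.start + 1   # length of the region in bases
```

Parsing the chromosome-level name from the same example gives the same length, which is what one would expect when a gene is transformed between coordinate systems without gaps.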

At the moment only the core and variation databases are covered; others such as compara, funcgen and other_features are not yet supported. Hopefully these will be added in the future. Marc Hoeppner from Stockholm University is working on the compara part.

Comparison to the Perl API

This Ruby API to the Ensembl databases is very much inspired by the Perl API provided by the Ensembl team. Given that they are two different languages, there are of course some differences:

  • There is only one API for the different Ensembl releases. In the Perl API, users need to load the version of the API that matches the database release they want to work with.
  • The Slice class is defined slightly differently. In the Perl API, the “slice” of an object is the whole seq_region (read: chromosome) that the object is defined on. For example: the “slice” of the gene BRCA2 is chromosome 13. In contrast, the “slice” in the Ruby API is delineated by the start and stop positions of that object; the “slice” for the same gene using the Ruby API is chromosome:GRCh37:13:32889611:32973347:1. This makes additional functionality available for the Ruby objects. You can, for example, check whether one object overlaps with or is contained within another object: gene1.overlaps?(gene2).
  • Ruby’s introspection makes it possible to investigate the structure of the database from within the code, for example to check which types of object are related to a gene: Gene.reflect_on_all_associations(:belongs_to) reports that a gene “belongs to” a seq_region and an analysis.
  • The simplicity of the Ruby programming language makes the interactive shell a viable option for many queries, bypassing the need to write one-off scripts. The command ensembl homo_sapiens 60, for example, starts an irb session with a connection to release 60 of the Homo sapiens core and variation databases.
  • Very important for us, the developers: the Ruby API is very easy to maintain. The ActiveRecord library on which the API is built takes care of almost all the functionality that we want to have in the API: querying, linking tables, … As a result, we need fewer than 1500 lines of significant code (that is, after removing empty lines, comments, and lines that only contain the word “end”) to provide this full functionality.
  • In contrast to the Perl API, there is no need to create Adaptor objects. In the Perl API these are necessary to connect to a specific table (e.g. GeneAdaptor). In Ruby this is handled by class methods.
  • At the moment, the Ruby API only provides an interface to the core and variation databases. As this project is not driven by the Ensembl team itself, we have to set priorities. If you are interested in adding other databases (e.g. compara, funcgen), please let us know!
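
To make the point about slices more concrete: once an object carries its own start and stop positions, overlap and containment reduce to simple coordinate comparisons. The sketch below illustrates that idea in plain Ruby. Region and contained_in? are hypothetical names invented for this illustration, and the code is not the gem's own implementation of overlaps?; only the BRCA2 coordinates come from the text above.

```ruby
# Illustration only: a hypothetical stand-in for an object with a slice,
# NOT the ensembl gem's Slice class.
Region = Struct.new(:seq_region, :start, :stop) do
  # True when both regions are on the same seq_region and share at least one base.
  def overlaps?(other)
    seq_region == other.seq_region && start <= other.stop && other.start <= stop
  end

  # True when this region lies entirely inside the other one.
  def contained_in?(other)
    seq_region == other.seq_region && start >= other.start && stop <= other.stop
  end
end

brca2 = Region.new('13', 32_889_611, 32_973_347)  # coordinates quoted in the list above
probe = Region.new('13', 32_900_000, 32_910_000)  # made-up interval for the demo
other = Region.new('4',  32_900_000, 32_910_000)  # same coordinates, different chromosome

puts brca2.overlaps?(probe)       # → true
puts probe.contained_in?(brca2)   # → true
puts brca2.overlaps?(other)       # → false
```

If a slice always spanned the whole chromosome, as described for the Perl API, any two objects on the same chromosome would trivially "overlap", so a check like this would carry no information.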

Installation

The API is made available as a gem. Prerequisites include the mysql gem. To install, type

gem install ruby-ensembl-api
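
A quick way to check that the gem is visible to your Ruby installation is to try loading it. The sketch below assumes the library is loaded with require 'ensembl', as in the example at the top of this page; the method name ensembl_available? is made up for this check.

```ruby
# Hypothetical helper: returns true if the library can be loaded, false otherwise.
def ensembl_available?
  require 'ensembl'
  true
rescue LoadError
  false
end

if ensembl_available?
  puts 'ensembl gem loaded'
else
  puts 'ensembl gem not found; check the gem installation'
end
```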

Documentation

An extensive tutorial is available here. This tutorial is a Ruby version of the Perl tutorial available at the Ensembl website (with permission).

Full documentation on classes and methods can be found here.

To do

  • Add additional databases: compara, …