Rfam API

Most data in Rfam can be accessed programmatically using a RESTful API allowing for integration with other resources.


You can also access the data using the public MySQL database, which contains the latest Rfam release.

The data can be accessed in several formats which can be specified in the URL:

Here is how to retrieve an XML description of an Rfam family using curl:



<?xml version="1.0" encoding="UTF-8"?>
<!-- information on Rfam family RF00360 (snoZ107_R87), generated: 12:57:01 31-Oct-2016 -->
<rfam xmlns:xsi=""
  <entry entry_type="Rfam" accession="RF00360" id="snoZ107_R87">
Small nucleolar RNA Z107/R87
Z107 and R87 are members of the C/D class of snoRNA which contain the C (UGAUGA) and D (CUGA) box motifs. Most of the members of the box C/D family function in directing site-specific 2'-O-methylation of substrate RNA
      <author>Moxon SJ</author>
      <seed_source>Moxon SJ</seed_source>
      <type>Gene; snRNA; snoRNA; CD-box;</type>
      <structure_source>Predicted; RNAfold; Moxon SJ, Daub J, Gardner PP</structure_source>
    <cm_details num_states="">
      <build_command>cmbuild -F CM SEED</build_command>
      <calibrate_command>cmcalibrate --mpi CM</calibrate_command>
      <search_command>cmsearch --cpu 4 --verbose --nohmmonly -T 19 -Z 549862.597050 CM SEQDB</search_command>

The Rfam API can also be used from a script written in any programming language, such as Python or Perl.

Python example script

import requests

# Fetch the family description as JSON and print its accession.
r = requests.get('')
print(r.json()['rfam']['acc'])

Perl example script


use strict;
use warnings;

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

my $res = $ua->get('' );

if ( $res->is_success ) {
  print $res->content;
}
else {
  print STDERR $res->status_line, "\n";
}

Returns general information about an Rfam family, such as curation details, search parameters, etc.


Returns the ID for the family with the given Rfam accession or ID.


Example output:


Example output:



Returns the schematic secondary structure image for the family. The following types of secondary structure diagrams are supported:

  • cons (sequence conservation)
  • fcbp (basepair conservation)
  • cov (covariation)
  • ent (relative entropy)
  • maxcm (maximum CM parse)
  • norm (normal)
  • rscape (R-scape [1] analysis of Rfam SEED alignment)
  • rscape-cyk (secondary structure predicted by R-scape [1] based on Rfam SEED alignment)


Returns the covariance model for the specified family.


Returns the list of all sequence regions for the specified family in tab-delimited format.


Some families have too many regions to list. The server will return a status of 403 Forbidden in these cases.
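A client can treat the 403 response as a signal that the region list is unavailable rather than as a generic failure. The sketch below is an illustration only: `parse_regions` is a hypothetical helper that takes the status code and body of a response the caller has already fetched. It assumes nothing about the columns beyond their being tab-separated, and skipping lines that start with `#` is also an assumption.

```python
def parse_regions(status_code, body):
    """Return region rows as lists of strings, or None if the list is too large.

    Hypothetical helper: status_code and body come from a response that the
    caller has already fetched from the regions endpoint.
    """
    if status_code == 403:
        # The server declines to list regions for very large families.
        return None
    if status_code != 200:
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    rows = []
    for line in body.splitlines():
        # Skip blank lines and (assumed) comment lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    return rows
```

Returning `None` for the 403 case lets calling code distinguish "too many regions" from an empty list.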


Returns the raw data for the phylogenetic tree in NHX format based on seed alignment.


Returns a PNG image showing the phylogenetic tree for the specified family based on seed alignment. The image can be labelled either using species names or sequence accessions.


Returns the HTML image map that is used in conjunction with the tree image to highlight tree nodes in the Rfam website.



The HTML snippet contains an <img> tag that automatically loads the tree image.

Returns the mapping between an Rfam family, EMBL sequence regions and PDB residues. The plain text file has a tab-delimited format.


The following methods can be used to return family alignments in various formats.


You can request a compressed version of the alignment by adding gzip=1 to the URL.
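When a script requests the compressed version, it has to decompress the body before parsing it. Whether an HTTP client does this transparently depends on the client and on the response headers, which are not described here, so the sketch below checks for the gzip magic bytes instead. `decode_alignment` is a hypothetical helper name, and UTF-8 text is assumed.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_alignment(body: bytes) -> str:
    """Return alignment text from a response body that may be gzip-compressed."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode("utf-8")
```

The same function works for uncompressed responses, so a script does not need separate code paths for the two cases.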

Returns the Stockholm-format seed alignment for the specified family.


Returns the seed alignment for the specified family in one of the following formats:

  • stockholm (standard Stockholm format - default)
  • pfam (Stockholm format with each sequence on a single line)
  • fasta (gapped FASTA format)
  • fastau (ungapped FASTA format)
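To illustrate how the gapped and ungapped FASTA formats relate, the sketch below strips gap characters from sequence lines. It is not the server's implementation: `ungap_fasta` is a hypothetical helper, the gap characters `-` and `.` are an assumption, and the server's own ungapped output may differ in details such as line wrapping.

```python
def ungap_fasta(gapped: str, gap_chars: str = "-.") -> str:
    """Remove gap characters from sequence lines, leaving header lines intact."""
    strip_gaps = str.maketrans("", "", gap_chars)
    out = []
    for line in gapped.splitlines():
        if line.startswith(">"):
            # Headers may legitimately contain '-' (for example in coordinates).
            out.append(line)
        else:
            out.append(line.translate(strip_gaps))
    return "\n".join(out)
```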


In addition to a sequence search user interface, it is possible to run single-sequence Rfam searches programmatically.

Running a search is a two-step process:

  1. submit the search sequence
  2. retrieve search results

The operation is split into two steps because the time taken to perform a sequence search varies with the length of the sequence. Most web clients, whether browsers or scripts, will time out if a response is not received within a short period, usually less than a minute. Submitting a search, waiting, and then retrieving the results as a separate operation avoids the risk of the client timing out before the results are returned.

The following example uses simple command-line tools to submit the search and retrieve results, but the whole process is easily transferred to a single script or program.

It is usually most convenient to save your sequence into a plain text file, something like this:

$ cat test.seq

The sequence should contain only valid sequence characters. You can break the sequence across multiple lines to make it easier to handle.
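A script can join a multi-line sequence and check its characters before submitting it. The sketch below is one way to do that under stated assumptions: the set of valid characters is taken to be the IUPAC nucleotide codes, which this document does not confirm, and `load_sequence` is a hypothetical helper that expects a bare sequence with no FASTA header line.

```python
import re

# Assumption: IUPAC nucleotide codes are the valid characters; the server's
# exact rules are not specified in this document.
VALID_SEQUENCE = re.compile(r"[ACGTUNRYSWKMBDHV]+", re.IGNORECASE)


def load_sequence(text: str) -> str:
    """Join a sequence broken across lines and check its characters."""
    # split() with no arguments drops all whitespace, including line breaks.
    sequence = "".join(text.split())
    if not VALID_SEQUENCE.fullmatch(sequence):
        raise ValueError("sequence is empty or contains invalid characters")
    return sequence
```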

When you send a request to the server, you can specify the format of the response. The server supports JSON (application/json) and XML (text/xml) output. The examples below use the JSON output format by adding an Accept header to the request, specifying the media type application/json. Alternatively, you can use the "content-type" parameter in the URL instead of setting a header.

curl -X 'POST' '' -H 'accept: application/json' -F 'sequence_file=@test.seq'

Example output:

{
  "resultURL": "",
  "jobId": "infernal_cmscan-R20240522-154022-0777-68207895-p1m"
}

Having submitted the search, you now need to check the resultURL given in the response.
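In a script, the two fields of the submission response can be pulled out of the JSON body as follows. This is a minimal sketch: `parse_submission` is a hypothetical helper, and the field names `resultURL` and `jobId` are taken from the example output above.

```python
import json


def parse_submission(response_text: str) -> tuple[str, str]:
    """Return (result_url, job_id) from the JSON body of a submission response."""
    data = json.loads(response_text)
    # A KeyError here means the server returned something other than
    # the expected submission document.
    return data["resultURL"], data["jobId"]
```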

Although you can check for results immediately, if you poll before your job has completed you won't receive a full response. Instead, the HTTP response will have its status set appropriately and the body of the response will contain only a string giving the status. You should ideally check the HTTP status of the response rather than relying on the body. See below for a table showing the response status codes that the server may return.

When writing a script to submit searches and retrieve results, please add a short delay between the submission and the first attempt to retrieve results. Most search jobs are returned within four to five seconds of submission, depending greatly on the length of the sequence to be searched.
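The wait-then-poll pattern described above might look like the following sketch. It is not an official client: `wait_for_results` and its parameters are hypothetical, the five-second defaults are an assumption based on the typical job time mentioned above, and treating a 202 status as "still pending or running" on retrieval is inferred from the status table below. The HTTP call is passed in as `fetch` so that the loop does not depend on any particular HTTP library.

```python
import time


def wait_for_results(fetch, first_delay=5.0, poll_interval=5.0, max_polls=60,
                     sleep=time.sleep):
    """Poll until the job finishes and return the body of the 200 response.

    fetch is a caller-supplied function that performs a GET on the resultURL
    and returns a (status_code, body) tuple.
    """
    sleep(first_delay)  # short delay between submission and the first attempt
    for _ in range(max_polls):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            # Decide on the HTTP status, not on the body text.
            raise RuntimeError(f"search failed with HTTP {status}: {body}")
        sleep(poll_interval)  # job is still pending or running
    raise TimeoutError("search did not finish within the polling limit")
```

`fetch` can wrap any HTTP client, for example a function that calls `requests.get` on the resultURL with an `Accept: application/json` header and returns `(response.status_code, response.text)`.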

The response that was returned from the first query includes a URL from which you can now retrieve results:

curl -X 'GET' '' -H 'accept: application/json'

Example output:

{
  "numHits": 1,
  "jobId": "infernal_cmscan-R20240522-154022-0777-68207895-p1m",
  "opened": "2024-05-22 15:40:27",
  "started": "2024-05-22 15:40:27",
  "closed": "2024-05-22 15:42:16",
  "hits": {
    "5S_rRNA": [
      {
        "id": "5S_rRNA",
        "acc": "RF00001",
        "start": 1,
        "end": 119,
        "strand": "+",
        "GC": 0.49,
        "score": 104.9,
        "E": 4.5e-24,
        "alignment": {
          "nc": "#NC                                                                                                                                       ",
          "ss": "#SS               (((((((((,,,,\u003C\u003C-\u003C\u003C\u003C\u003C\u003C---\u003C\u003C--\u003C\u003C\u003C\u003C\u003C\u003C______\u003E\u003E--\u003E\u003E\u003E\u003E--\u003E\u003E----\u003E\u003E\u003E\u003E\u003E--\u003E\u003E\u003C\u003C\u003C-\u003C\u003C----\u003C-\u003C\u003C-----\u003C\u003C____\u003E\u003E-----\u003E\u003E-\u003E--\u003E\u003E-\u003E\u003E\u003E))))))))): ",
          "hit_seq": "#CM 1 gccuGcggcCAUAccagcgcgaAagcACcgGauCCCAUCcGaACuCcgAAguUAAGcgcgcUugggCcagggUAGUAcuagGaUGgGuGAcCuCcUGggAAgaccagGugccgCaggcc 119",
          "match": "#MATCH               :: U:C:GCCAUACC ::G:GAA ::ACCG AUCCC+U+CGA CU CGAA::UAAGC:C:: +GGGC: :G  AGUACUA  +UGGGUGACC+  UGGGAA+AC:A:GUGC:G:A ::+",
          "pp": "#PP               *********************************************************************************************************************** "
        }
      }
    ]
  }
}



Old search results are cleared out regularly, but results remain available for one week after the original search completes.
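Once a results document like the one shown above has been parsed with `json.loads`, its hits can be flattened into a simple report. The sketch below assumes the layout seen in the example, where `hits` maps each family ID to a list of hit objects; `summarize_hits` is a hypothetical helper and the choice of columns is arbitrary.

```python
def summarize_hits(results: dict) -> list[str]:
    """Return one tab-separated line per hit, sorted by ascending E-value."""
    hits = []
    # Assumed layout: "hits" maps a family ID to a list of hit objects.
    for family_hits in results.get("hits", {}).values():
        hits.extend(family_hits)
    hits.sort(key=lambda hit: hit["E"])
    columns = ("acc", "id", "start", "end", "strand", "score", "E")
    return ["\t".join(str(hit[name]) for name in columns) for hit in hits]
```

For the example output, this yields a single line for RF00001 (5S_rRNA) covering positions 1 to 119 on the plus strand.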

Server responses include a standard HTTP status code giving information about the current state of your job. These are the possible status codes:

| HTTP method | HTTP status code | Status description | Response body | Notes |
|---|---|---|---|---|
| POST | 202 | Accepted | PEND / RUN | The job has been accepted by the search system and is either pending (waiting to be started) or running. After a short delay, your script should check for results again. |
| POST | 502 | Bad gateway | Error message | There was a problem scheduling or running the job. The job has failed and will not produce results. There is no need to check the status again. |
| POST | 503 | Service unavailable | Error message | Occasionally the search server may become overloaded. If the error message suggests that the search queue is full, try submitting your search later. |
| GET | 200 | OK | Search results | The job completed successfully and the results are included in the response body. |
| GET | 410 | Gone | DEL | Your job was deleted from the search system. This status is assigned by an administrator, not by the search system. There was probably a problem with the job and you should contact the help desk for assistance with it. |
| GET | 503 | Service unavailable | HOLD | Your job was accepted but is on hold. This status is assigned by an administrator, not by the search system. There is probably a problem with the job and you should contact the help desk for assistance with it. |
| GET, POST | 500 | Internal server error | Error message | There was a problem accepting or running your job that does not fall into any of the other categories. The body of the response will contain an error message from the server. Contact the help desk for assistance with the problem. |
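A script can turn the status codes listed above into a decision about what to do next. The following is a rough sketch only: `next_action` and the action labels it returns are hypothetical, and handling 202 the same way for both HTTP methods is an assumption. The one place where the method matters is 503, which means "queue full" on submission but "job on hold" on retrieval.

```python
def next_action(method: str, status_code: int) -> str:
    """Map an HTTP method and status code to a suggested client action."""
    if status_code == 200:
        return "read_results"
    if status_code == 202:
        return "wait_and_poll"  # job is pending or running
    if status_code == 502:
        return "give_up"  # job failed; no need to check again
    if status_code == 503:
        # Overloaded server on submission, job on hold on retrieval.
        return "resubmit_later" if method.upper() == "POST" else "contact_help_desk"
    if status_code in (410, 500):
        return "contact_help_desk"
    return "unexpected"
```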
[1](1, 2)