Convert and query IDs of circular RNAs from many different databases

jakobilab/circhemy

circhemy

The alchemy of circular RNA ID conversion

Introduction

Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, typically not polyadenylated, and have been shown to be highly specific for cell type and developmental stage.

The prediction of circular RNAs is a multi-stage bioinformatics process that starts with raw sequencing data and usually ends with a list of circRNA candidates which, depending on tissue and condition, may contain hundreds to thousands of entries. While a number of tools exist for the prediction process (e.g. circtools, developed by our group), a unified naming convention for circRNAs is not available.

Multiple databases have gathered hundreds of thousands of circRNAs. However, most databases employ their own naming scheme, making it increasingly hard to keep track of known circRNAs and their identifiers.

Circhemy

We developed circhemy, a modular, Python3-based framework for circRNA ID conversion that unifies several functionalities in a single Python package. Three different routes are implemented within the package to access more than 2 million circRNA IDs:

  • User-friendly web application at circhemy.jakobilab.org
  • Streamlined CLI application for direct access to the prepackaged local SQLite3 database
  • A public REST API that enables direct access to the most recent ID database from HPC systems using curl or similar tools

Circhemy includes two different modes of action: convert and query. Convert allows the user to convert from one type of circRNA ID to a wide variety of other database identifiers, while query allows users to run direct queries on the circRNA database to extract circRNAs fulfilling a user-defined set of constraints.

Moreover, circhemy is the first circRNA resource that supports and integrates the first version of the CircRNA Standard Nomenclature (abbreviated CSNv1 in circhemy) as outlined in "A guide to naming eukaryotic circular RNAs", Chen et al. 2023.

Currently, circhemy contains computationally generated CSNv1 names for nearly 1 million circRNAs of human, mouse, and rat.

Installation

The circhemy CLI package is written in Python3 (>=3.8) and consists of two core modules, namely convert and query. The command line version requires only one external dependency, sqlite3, for access to the internal SQLite3 database with circRNA ID data.

Installation is managed through python3 -m pip install circhemy, or through python3 setup.py install when installing from the cloned GitHub repository. No sudo access is required if the installation is executed with --user, which installs the package into a user-writable folder. On Debian-based systems the binaries should be installed to /home/$user/.local/bin/.

circhemy was developed and tested on Debian Buster, but should run with any other distribution.

The latest release version of circhemy can be installed via pip:

python3 -m pip install circhemy

Additionally, this repository offers the latest development version:

python3 -m pip install git+https://github.com/jakobilab/circhemy.git

Command Line Interface

Circhemy currently offers two modules:

Convert module

The convert module converts a range of input circRNA IDs into one or more different database identifiers.

Example: Convert a list of CircAtlas2 IDs read via STDIN from file input.csv into Circpedia2 IDs, but also output CircAtlas2 IDs, while writing the output to /tmp/output.csv:

cat input.csv | circhemy convert -q STDIN -i CircAtlas2 -o Circpedia2 CircAtlas2 -O /tmp/output.csv
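When the IDs already live inside a Python pipeline, the same call can be driven from the standard library instead of a shell pipe. The sketch below is only an illustration: it reuses exactly the flags from the example above (-q STDIN, -i, -o, -O), the helper names build_convert_command and run_convert are hypothetical and not part of circhemy, and it assumes that the circhemy executable is on PATH and that STDIN input is one ID per line.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_convert_command(input_db: str, output_dbs: Sequence[str],
                          output_file: str) -> List[str]:
    """Assemble the argv for `circhemy convert`, reading IDs from STDIN."""
    # Only flags that appear in the example above are used here.
    return ["circhemy", "convert", "-q", "STDIN",
            "-i", input_db,
            "-o", *output_dbs,
            "-O", output_file]


def run_convert(ids: Sequence[str], input_db: str,
                output_dbs: Sequence[str], output_file: str) -> None:
    """Pipe the IDs to circhemy (assumption: one ID per line on STDIN)."""
    cmd = build_convert_command(input_db, output_dbs, output_file)
    # check=True raises CalledProcessError if circhemy exits with an error.
    subprocess.run(cmd, input="\n".join(ids) + "\n", text=True, check=True)


if __name__ == "__main__":
    # Show the command line that run_convert() would execute.
    command = build_convert_command(
        "CircAtlas2", ["Circpedia2", "CircAtlas2"], "/tmp/output.csv")
    print(shlex.join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when IDs or paths contain unusual characters.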

Query module

The query module is able to retrieve circRNA IDs from the internal database that fulfil a set of user-defined constraints.

Example: Retrieve a list of circBase and CircAtlas2 circRNA IDs that are located on chromosome 3 of the species Rattus norvegicus; only print out circRNAs from the rn6 genome build.

circhemy query -o circbase CircAtlas2 -C chr3 -s rattus_norvegicus -g rn6

Representational State Transfer Interface (REST)

Representational State Transfer, or REST for short, allows users and software developers to access circhemy from within their own tools or pipelines. Circhemy's REST API uses JSON for both input queries and returned output, so queries are easy to construct from any programming language or even by hand.

The REST API is publicly available and uses a fixed set of keywords to perform conversions or queries. Two examples for the two different modes of action are shown below.

Convert module

The convert module converts a range of input circRNA IDs into one or more different database identifiers.

Example: Convert a list of CircAtlas2 IDs into circBase and into Circpedia2 IDs, including the Genome build.

curl -X 'POST' 'https://circhemy.jakobilab.org/api/convert' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
      "input": "CircAtlas2",
      "output": ["circBase","Circpedia2","Genome"],
      "query": ["hsa-MYH9_0004","hsa-MYH9_0004"]
      }'

Output is returned as a JSON-formatted string that can be used directly for AG Grid tables or for any other postprocessing:

{
  "columnDefs": [
    {
      "headerName": "circBase",
      "field": "circBase"
    },
    {
      "headerName": "Circpedia2",
      "field": "Circpedia2"
    },
    {
      "headerName": "Genome",
      "field": "Genome"
    }
  ],
  "rowData": [
    {
      "circBase": "hsa_circ_0004470",
      "Circpedia2": "HSA_CIRCpedia_36582",
      "Genome": "hg38"
    },
    {
      "circBase": "hsa_circ_0004470",
      "Circpedia2": "HSA_CIRCpedia_36582",
      "Genome": "hg19"
    }
  ]
}
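The same conversion can be issued from Python using only the standard library. The following is a minimal sketch, not an official client: the endpoint URL, the headers, and the input/output/query keys are taken from the curl example above, while the helper names build_convert_request, fetch_json, and response_to_rows are hypothetical. Filling missing fields with an empty string is an assumption made for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_CONVERT = "https://circhemy.jakobilab.org/api/convert"


def build_convert_request(input_db: str, output_fields: List[str],
                          ids: List[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request for the convert endpoint."""
    payload = {"input": input_db, "output": output_fields, "query": ids}
    return urllib.request.Request(
        API_CONVERT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def response_to_rows(response: Dict[str, Any]) -> List[List[str]]:
    """Flatten a columnDefs/rowData reply into a header row plus data rows."""
    fields = [col["field"] for col in response["columnDefs"]]
    rows = [[col["headerName"] for col in response["columnDefs"]]]
    for record in response["rowData"]:
        # Assumption: a field missing from a record is shown as an empty cell.
        rows.append([record.get(field, "") for field in fields])
    return rows


if __name__ == "__main__":
    request = build_convert_request(
        "CircAtlas2", ["circBase", "Circpedia2", "Genome"], ["hsa-MYH9_0004"])
    # Print the JSON body that would be posted; no network call is made here.
    print(request.data.decode("utf-8"))
```

To run the conversion for real, pass the request to fetch_json and hand the result to response_to_rows, for example to write a tab-separated table.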

Query module

The query module is able to retrieve circRNA IDs from the internal database that fulfil a set of user-defined constraints.

Example: Retrieve all circRNAs with a CircAtlas2 ID containing nppa in the species Homo sapiens and return the IDs in circBase and CircAtlas2 format:

curl -X 'POST' \
  'https://circhemy.jakobilab.org/api/query' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
      "input": [
        {
          "query": "nppa",
          "field": "CircAtlas2",
          "operator1": "AND",
          "operator2": "LIKE"
        },
        {
          "query": "homo_sapiens",
          "field": "Species",
          "operator1": "AND",
          "operator2": "is"
        }
      ],
      "output": [
        "circBase",
        "CircAtlas2"
      ]
    }'

Output is returned as a JSON-formatted string that can be used directly for AG Grid tables or for any other postprocessing:

{
  "columnDefs": [
    {
      "headerName": "circBase",
      "field": "circBase"
    },
    {
      "headerName": "CircAtlas2",
      "field": "CircAtlas2"
    }
  ],
  "rowData": [
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA_0001"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA_0002"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA-AS1_0001"
    },
    {
      "circBase": "hsa_circ_0009871",
      "CircAtlas2": "hsa-NPPA-AS1_0004"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA-AS1_0002"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA-AS1_0003"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA_0001"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA_0002"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA-AS1_0001"
    },
    {
      "circBase": "hsa_circ_0009871",
      "CircAtlas2": "hsa-NPPA-AS1_0004"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA-AS1_0002"
    },
    {
      "circBase": "",
      "CircAtlas2": "hsa-NPPA-AS1_0003"
    }
  ]
}
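Query bodies with several constraints are easier to assemble programmatically than by hand. The sketch below builds the JSON body from the keys used in the example above (query, field, operator1, operator2, input, output); the helper names constraint, build_query_payload, and unique_rows are hypothetical and not part of circhemy. Since the example output lists each row twice, and whether such repeats are meaningful is not stated here, unique_rows is offered only as an optional postprocessing step.

```python
import json
from typing import Dict, List


def constraint(query: str, field: str, operator1: str = "AND",
               operator2: str = "LIKE") -> Dict[str, str]:
    """One entry of the "input" list, using the keys from the example above."""
    return {"query": query, "field": field,
            "operator1": operator1, "operator2": operator2}


def build_query_payload(constraints: List[Dict[str, str]],
                        output_fields: List[str]) -> str:
    """Serialize constraints and output fields into the JSON request body."""
    return json.dumps({"input": constraints, "output": output_fields}, indent=2)


def unique_rows(row_data: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated rows while keeping the first-seen order."""
    seen = set()
    unique = []
    for row in row_data:
        # Dicts are unhashable, so use a sorted tuple of items as the key.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    body = build_query_payload(
        [constraint("nppa", "CircAtlas2"),
         constraint("homo_sapiens", "Species", operator2="is")],
        ["circBase", "CircAtlas2"])
    print(body)  # JSON text suitable for the -d argument of the curl call
```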

About

Circhemy is developed at the Jakobi Lab, part of the Translational Cardiovascular Research Center (TCRC), in the Department of Internal Medicine at The University of Arizona College of Medicine – Phoenix.

Contact: circhemy@jakobilab.org