Semantically Interoperable Genome Annotations

SIGA is a command-line tool written in Python that generates Semantically Interoperable Genome Annotations from text files in the Generic Feature Format (GFF), following the Resource Description Framework (RDF) specification.

Fig. SIGA software architecture.

Key features

  • Input:
    • one or more files in the GFF format (version 2 or 3)
    • config.ini file with ontology mappings and feature type amendments (if applicable)
  • Output: genomic features stored in a SQLite database or serialized in one of the supported RDF formats (turtle, nt, n3 or xml)
  • check referential integrity for parent-child feature relationships in SQLite
  • controlled vocabularies and ontologies used:
    • DCMI terms (e.g. creator, hasVersion, license)
    • Sequence Ontology (SO) to describe feature types (e.g. genome, chromosome, gene, transcript) and their relationships (e.g. has part/part of, genome of, transcribed to, translated_to)
    • Feature Annotation Location Description Ontology (FALDO)
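
To make the role of these vocabularies more concrete, here is a minimal sketch of how the location of one gene might be written as N-Triples using SO and FALDO terms. The base URI, the helper name feature_triples and the exact triple layout are assumptions made for illustration; they are not taken from SIGA and its real output may differ.

```python
# Hypothetical sketch: describe one gene with SO and FALDO as N-Triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FALDO = "http://biohackathon.org/resource/faldo#"
SO_GENE = "http://purl.obolibrary.org/obo/SO_0000704"  # SO term for 'gene'
BASE = "http://example.org/feature/"  # assumed base URI, not from SIGA


def feature_triples(feature_id, start, end):
    """Return N-Triples lines typing a feature and giving its FALDO region."""
    feat = "<%s%s>" % (BASE, feature_id)
    region = "<%s%s#region>" % (BASE, feature_id)
    begin = "<%s%s#begin>" % (BASE, feature_id)
    stop = "<%s%s#end>" % (BASE, feature_id)
    xsd_int = "^^<http://www.w3.org/2001/XMLSchema#integer>"
    return [
        "%s <%s> <%s> ." % (feat, RDF_TYPE, SO_GENE),
        "%s <%slocation> %s ." % (feat, FALDO, region),
        "%s <%s> <%sRegion> ." % (region, RDF_TYPE, FALDO),
        "%s <%sbegin> %s ." % (region, FALDO, begin),
        "%s <%send> %s ." % (region, FALDO, stop),
        '%s <%sposition> "%d"%s .' % (begin, FALDO, start, xsd_int),
        '%s <%sposition> "%d"%s .' % (stop, FALDO, end, xsd_int),
    ]


if __name__ == "__main__":
    for line in feature_triples("gene1", 100, 200):
        print(line)
```

Each feature yields one typing triple plus a small FALDO region with a begin and an end position, which is the general pattern FALDO uses for locations.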


Requirements

python (>=2.7)
docopt (0.6.2)
RDFLib (4.2.2)
gffutils (
optional: RDF store to query ingested data using SPARQL (e.g. using Virtuoso or Berkeley DB)


Installation

git clone
cd siga
virtualenv .sigaenv
source .sigaenv/bin/activate
pip install -r requirements.txt

How to use

Command-line interface

Usage:
  -h|--help
  -v|--version
  db [-ruV] [-d DB_FILE | -e DB_FILEXT] GFF_FILE...
  rdf [-V] [-o FORMAT] [-c CFG_FILE] DB_FILE...

  GFF_FILE...      Input file(s) in GFF version 2 or 3.
  DB_FILE...       Input database file(s) in SQLite.

  -h, --help
  -v, --version
  -V, --verbose    Show verbose output in debug mode.
  -c CFG_FILE      Set the path of the config file [default: config.ini].
  -d DB_FILE       Create a database from GFF file(s).
  -e DB_FILEXT     Set the database file extension [default: .db].
  -r               Check the referential integrity of the database(s).
  -u               Generate unique IDs for duplicated features.
  -o FORMAT        Output RDF graph in one of the following formats:
                     turtle (.ttl) [default: turtle]
                     nt (.nt),
                     n3 (.n3),
                     xml (.rdf)
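
The -r option checks that parent-child feature relationships in the database are consistent. One way such a check could look with Python's built-in sqlite3 module is sketched below. The two-table schema (features, relations) is a simplified assumption loosely modeled on a gffutils-style database, and find_orphans is a hypothetical helper, not SIGA's implementation.

```python
import sqlite3


def find_orphans(conn):
    """Return (parent, child) rows whose parent or child is not in features."""
    query = """
        SELECT r.parent, r.child
        FROM relations AS r
        LEFT JOIN features AS p ON p.id = r.parent
        LEFT JOIN features AS c ON c.id = r.child
        WHERE p.id IS NULL OR c.id IS NULL
        ORDER BY r.parent, r.child
    """
    return conn.execute(query).fetchall()


# Build a tiny in-memory database that contains one dangling relation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE features (id TEXT PRIMARY KEY, featuretype TEXT);
    CREATE TABLE relations (parent TEXT, child TEXT);
    INSERT INTO features VALUES ('gene1', 'gene'), ('mRNA1', 'mRNA');
    INSERT INTO relations VALUES ('gene1', 'mRNA1'), ('gene2', 'mRNA2');
""")
orphans = find_orphans(conn)

if __name__ == "__main__":
    print(orphans)
```

An empty result would mean every relation points at features that exist; any returned row identifies a relation to inspect.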

Input files

A small test set is provided in examples/features.gff3, together with a config.ini file. Alternatively, download the tomato or potato genome annotations.
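
For orientation, each GFF3 feature line has nine tab-separated columns, the last of which holds key=value attributes. The sketch below parses one such line using only the standard library. The sample record and the parse_gff3_line helper are made up for illustration; SIGA lists gffutils among its requirements, so this is not how the tool itself reads its input.

```python
from urllib.parse import unquote

COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; attributes become a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    record = dict(zip(COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attrs = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    record["attributes"] = attrs
    return record


# Invented sample record, not taken from examples/features.gff3.
sample = "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=demo"
gene = parse_gff3_line(sample)

if __name__ == "__main__":
    print(gene)
```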


Generate RDF graph

  1. GFF->DB

    python db -rV ../examples/features.gff3 # output *.db
  2. DB->RDF (default: turtle)

    python rdf -c config.ini ../examples/features.db # output *.ttl

Summary of I/O files:

  • config file: config.ini
  • GFF file: features.gff3
  • SQLite DB file: features.db
  • RDF Turtle file: features.ttl
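
The naming pattern in this summary (same base name, with the extension changing at each step) can be sketched as follows. The format-to-extension mapping is copied from the -o option list; the output_path helper is hypothetical and only illustrates the convention.

```python
from pathlib import Path

# Extensions as listed for the -o option.
RDF_EXTENSIONS = {"turtle": ".ttl", "nt": ".nt", "n3": ".n3", "xml": ".rdf"}


def output_path(input_file, fmt=None, db_ext=".db"):
    """Derive the output name: GFF -> DB when fmt is None, else DB -> RDF."""
    suffix = db_ext if fmt is None else RDF_EXTENSIONS[fmt]
    return Path(input_file).with_suffix(suffix)


if __name__ == "__main__":
    print(output_path("features.gff3"))
    print(output_path("features.db", "turtle"))
```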

Import RDF graph into Virtuoso RDF Quad Store

See the documentation on bulk data loading.

Edit the virtuoso.ini config file and add /mydir/ to DirsAllowed.

Connect to db server as dba user:

isql 1111 dba dba

Delete (existing) RDF graph if necessary:


Delete any previously registered data files:

DELETE FROM DB.DBA.load_list ;

Register data file(s):

ld_dir('/mydir/', 'features.ttl', '') ;

List registered data file(s):

SELECT * FROM DB.DBA.load_list ;

Bulk data loading:

rdf_loader_run() ;

Re-index triples for full-text search (via Faceted Browser):


Note: A single data file can be uploaded using the following command:

SPARQL LOAD "file:///mydir/features.ttl" INTO "" ;

Count imported RDF triples:

SPARQL SELECT COUNT(*) WHERE { ?s ?p ?o } ;
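
As a local cross-check before loading, the triples in a file can also be counted without a store, provided the graph was serialized as N-Triples (-o nt), where each triple occupies exactly one line. The count_ntriples helper below is a hypothetical sketch; it does not apply to Turtle, where one statement can span several lines.

```python
import io


def count_ntriples(stream):
    """Count triples in N-Triples text: one per non-blank, non-comment line."""
    return sum(1 for line in stream
               if line.strip() and not line.lstrip().startswith("#"))


# Invented sample data standing in for a features.nt file.
sample = io.StringIO(
    "# comment\n"
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"
    "\n"
    '<http://example.org/s> <http://example.org/p> "x" .\n'
)
n_triples = count_ntriples(sample)

if __name__ == "__main__":
    print(n_triples)
```

For a real file, pass an open file object instead of the in-memory sample.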

Alternatively, import RDF graph into Berkeley DB (requires Redland RDF processor)

rdfproc features parse features.ttl turtle
rdfproc features serialize turtle


License

The software is released under the Apache License, Version 2.0.
