Skip to content


Subversion checkout URL

You can clone with
Download ZIP
Metagenomics data (sequences and annotations) management system, powered by a MongoDB document-oriented database wrapped into a Python API and command-line tools.
Python Shell
Fetching latest commit...
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.



MetagenomeDB is a Python-powered toolkit designed to easily store, retrieve and annotate genomic and metagenomic sequences. It is especially useful when dealing with large amount of sequences (e.g., reads and contigs resulting from several runs of sequencing and assembly), and offers a clean, programmatic interface to manage and query this information.

The web site for this project is

MetagenomeDB acts as an abstraction layer on top of a MongoDB database. It provides an API to create and modify and connect two types of objects, namely sequences and collections. Sequences (Sequence class) can be reads, contigs, PCR clones, etc. Collections (Collection class) are sets of sequences; e.g., reads resulting from the sequencing of a sample, contigs assembled from a set of reads, PCR library.

Any sequence or collection can be annotated using a dictionary-like syntax:

# first, we import the library
import MetagenomeDB as mdb

# then we create a new Sequence object with two
# (mandatory) properties, 'name' and 'sequence'
s = mdb.Sequence({"name": "My sequence", "sequence": "atgc"})

# the object can now be annotated
print s["length"]
s["type"] = "read"

# once modified, the object need to be committed
# to the database for the modifications to remain

Objects of type Sequence or Collection can be connected to each other in order to represent various metagenomic datasets. Examples include, but are not limited to:

  • collection of reads resulting from a sequencing run (relationship between multiple Sequence objects and one Collection)
  • set of contigs resulting from the assembly of a set of reads (relationship between two Collection objects)
  • reads that are part of a contig (relationship between multiple Sequence objects and one Sequence)
  • sequence that is similar to another sequence (relationship between two Sequence objects)
  • collection that is part of a bigger collection (relationship between two Collection objects)

The result is a network of sequences and collection, which can be explored using dedicated methods; e.g., Collection.list_sequences(), Sequence.list_collections(), Sequence.list_related_sequences(). Each one of those methods allow for sophisticated filters using the MongoDB querying syntax:

# list all collections of type 'collection_of_reads'
# the sequence 's' belong to
collections = s.list_collections({"type": "collection_of_reads"})

# list all sequences that also belong to these collections
# with a length of at least 50 bp
for c in collections:
        print c.list_sequences({"length": {"$gt": 50}})

MetagenomeDB also provides a set of command-line tools to import nucleotide sequences, protein sequences, BLAST and FASTA alignment algorithms output, and ACE assembly files. Other tools are provided to add or remove multiple objects, or to annotate them.


Metagenomic, Bioinformatics, Python, MongoDB


Aurelien Mazurie,

Getting started

MetagenomeDB relies on another Python library to function, Pymongo (version 1.9 or above). The latest version of Pymongo must be installed, for example by typing sudo easy_install Pymongo on the command line.

That's it. The only other requirement is, of course, a working MongoDB server, either on your computer or on a computer that can be accessed through TCP/IP.

MetagenomeDB can be installed using two methods:

Using GitHub

All versions of MetagenomeDB, including the latest developer releases, can be downloaded at

Once the archive in your computer, installing it can be done by typing sudo easy_install [path to your archive] in a console (see the easy_install documentation:

If you want more control (such as requesting the library and the tools to be installed in specific directories), you should first unzip the archive, then type sudo python plus any needed option from the archive's content directory (see the documentation: For example, to ensure the various mdb-* tools are installed in /usr/local/bin/ you can type sudo python install --install-scripts=/usr/local/bin/.

GitHub is the preferred source if you are interested in the most recent, albeit potentially unstable, releases of MetagenomeDB.

Using PyPI

All production-ready versions of MetagenomeDB are registered against the PyPI package manager. Thanks to this, you can install the toolkit by typing sudo easy_install MetagenomeDB on the command line.

Final step

By default MetagenomeDB will read a file named .MetagenomeDB in your home directory to know how to access the MongoDB database. A template file named doc/installation/MetagenomeDB_configuration.txt is provided. Change its name to .MetagenomeDB, move it in your home directory, then update it with your own parameters.

Optionally, you can provide those information when importing MetagenomeDB in your script:

import MetagenomeDB as mdb

mdb.connect(host = "localhost", port = 1234, database = "MyDatabase")

From then you can store and retrieve objects:

c = mdb.Collection.find_one({"name": "my_collection"})

for sequence in c.list_sequences():
        print sequence["name"], sequence["sequence"]
Something went wrong with that request. Please try again.