
MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive

This repository contains the code implementing the pipeline used to construct the MetaSRA database described in our publication:

This pipeline re-annotates key-value descriptions of biological samples using biomedical ontologies.

The MetaSRA can be searched and downloaded from:


This project requires the following Python libraries:


To run the pipeline, a few external resources must be downloaded and configured. First, set the PYTHONPATH environment variable so that it includes both the directory containing the map_sra_to_ontology directory and the bktree directory. Then, to set up the pipeline, run the following commands:

cd ./setup_map_sra_to_ontology

This script downloads the latest ontology OBO files and the SPECIALIST Lexicon files, then configures the ontologies to work with the pipeline.
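Because a misconfigured PYTHONPATH is an easy mistake to make, the following is a minimal sketch of a check you could run before the setup step. It is not part of the pipeline. The helper names `pythonpath_entries` and `missing_from_pythonpath` are hypothetical, and the sketch assumes that both map_sra_to_ontology and bktree sit directly under the repository root.

```python
import os


def pythonpath_entries(repo_root):
    """Return the two directories PYTHONPATH is expected to include.

    Assumption: map_sra_to_ontology and bktree both live directly
    under repo_root.
    """
    repo_root = os.path.abspath(repo_root)
    return [
        repo_root,                          # directory containing map_sra_to_ontology
        os.path.join(repo_root, "bktree"),  # the bktree directory itself
    ]


def missing_from_pythonpath(repo_root, environ=None):
    """List the expected directories that are absent from PYTHONPATH."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    # Normalize each entry so relative and absolute spellings compare equal.
    current = [os.path.abspath(p) for p in raw.split(os.pathsep) if p]
    return [p for p in pythonpath_entries(repo_root) if p not in current]


if __name__ == "__main__":
    for path in missing_from_pythonpath("."):
        print("Not on PYTHONPATH:", path)
```

Run from the repository root, this prints any expected directory that PYTHONPATH does not yet include, and prints nothing when both are present.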


The pipeline can be run on a set of sample-specific key-value pairs using the script. This script is used as follows:

python <input key-value pairs JSON file>

The script accepts as input a JSON file storing a list of sets of key-value pairs. For example, the pipeline will accept a file with the following content:

    [
        {
            "ID": "P352_141",
            "age": "48",
            "bmi": "24",
            "gender": "female",
            "source_name": "vastus lateralis muscle_female",
            "tissue": "vastus lateralis muscle"
        },
        {
            "ID": "P352_141",
            "age": "29",
            "bmi": "30",
            "gender": "male",
            "source_name": "vastus lateralis muscle_female",
            "tissue": "vastus lateralis muscle"
        }
    ]
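If your sample metadata already lives in Python, an input file of this shape can be produced with the standard json module. The sketch below is only an illustration. The function names `samples_to_json` and `write_samples` are hypothetical, the output file name is up to you, and the only structure it checks is the one described above: a list whose items are sets of key-value pairs.

```python
import json

# The two example samples from this README, copied verbatim.
samples = [
    {
        "ID": "P352_141",
        "age": "48",
        "bmi": "24",
        "gender": "female",
        "source_name": "vastus lateralis muscle_female",
        "tissue": "vastus lateralis muscle",
    },
    {
        "ID": "P352_141",
        "age": "29",
        "bmi": "30",
        "gender": "male",
        "source_name": "vastus lateralis muscle_female",
        "tissue": "vastus lateralis muscle",
    },
]


def samples_to_json(samples):
    """Serialize a list of key-value dictionaries to JSON text."""
    if not isinstance(samples, list):
        raise TypeError("expected a list of samples")
    for sample in samples:
        if not isinstance(sample, dict):
            raise TypeError("each sample must be a set of key-value pairs")
    return json.dumps(samples, indent=4)


def write_samples(samples, path):
    """Write the samples to `path` (a file name chosen by the caller)."""
    with open(path, "w") as handle:
        handle.write(samples_to_json(samples))
```

Calling `write_samples(samples, path)` with a path of your choice produces a file that can then be passed to the pipeline script as its input argument.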