Running the Example Pipeline

This page serves to familiarize new users with the basic flow of running the ImmuneDB pipeline. Example input FASTQ files are provided which contain human B-cell heavy chain sequences.

Commands are listed as being run either in the Docker container or on the host.

To begin, run the Docker container :ref:`as documented <running-the-container>`:

 $ docker run -v $HOME/immunedb_share:/share \
      -p 8080:8080 -it arosenfeld/immunedb:v0.28.2

Metadata Specification

Before ImmuneDB can be run, metadata must be specified for each input file. For this example, a metadata file has already been created for you. To learn how to create a metadata file for your own data, see :ref:`Creating a Metadata Sheet`.

ImmuneDB Instance Creation

Next, we create a database for the data with:

 $ immunedb_admin create example_db /share/configs

This creates a new database named example_db and stores its configuration in /share/configs/example_db.json.

Identifying or Importing Sequences

The first step of the pipeline is to annotate sequences and store the resulting data in the newly created database. To do so, the immunedb_identify command is used. It requires that V and J germline sequences be specified in two separate FASTA files. The Docker image provides human and mouse IGH, TRA, and TRB germlines in $HOME/germlines.

For this example, two input files are provided in /example along with the requisite metadata.tsv file. You can list them with:

 $ ls /example

Given this, run the immunedb_identify command:

 $ immunedb_identify /share/configs/example_db.json \
      /root/germlines/imgt_human_ighv.fasta \
      /root/germlines/imgt_human_ighj.fasta \
      /example

Sequence Collapsing

ImmuneDB determines the uniqueness of a sequence both at the sample and subject level. For the latter, immunedb_collapse is used to find sequences that are the same except at positions that have an N. Thus, the sequences ATNN and ANCN would be collapsed.

To collapse sequences, run:

 $ immunedb_collapse /share/configs/example_db.json
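
The matching rule described above can be illustrated with a minimal sketch. This is not ImmuneDB's implementation; the function name `matches_ignoring_n` is hypothetical, and the sketch assumes the two sequences are already aligned and of equal length.

```python
def matches_ignoring_n(seq_a, seq_b):
    """Return True if two aligned sequences agree at every position
    where neither of them has an N (assumed rule, for illustration)."""
    if len(seq_a) != len(seq_b):
        return False
    return all(
        a == b or a == "N" or b == "N"
        for a, b in zip(seq_a, seq_b)
    )


# The example from the text: every differing position involves an N.
print(matches_ignoring_n("ATNN", "ANCN"))
# A real mismatch (G vs. C at the third position) prevents collapsing.
print(matches_ignoring_n("ATGN", "ANCN"))
```

The first call reports a match because the sequences only differ where one of them has an N; the second does not, since G and C conflict at a position where both bases are known.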

Clonal Assignment

After sequences are assigned V and J genes, they can be clustered into clones with the immunedb_clones command. This command takes a number of arguments, which should be reviewed before use.

There are three ways to create clones: based on CDR3 AA similarity, T-cell exact CDR3 NT identity, and a lineage-based method. For this example we'll use the similarity-based method with default parameters:

 $ immunedb_clones /share/configs/example_db.json similarity

This will create clones in which all sequences share the same V gene and J gene and have (by default) at least 85% CDR3 AA identity.
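
To make the identity threshold concrete, here is a minimal sketch of a percent-identity check between two CDR3 amino acid strings. It is an illustration only, not the clustering that immunedb_clones performs: the function names are hypothetical, the CDR3 strings are made up, and the sketch assumes the two CDR3s have equal length and are compared position by position.

```python
def aa_identity(cdr3_a, cdr3_b):
    """Fraction of positions at which two equal-length AA strings agree."""
    if len(cdr3_a) != len(cdr3_b):
        raise ValueError("this sketch only compares CDR3s of equal length")
    matches = sum(a == b for a, b in zip(cdr3_a, cdr3_b))
    return matches / len(cdr3_a)


def similar_enough(cdr3_a, cdr3_b, min_identity=0.85):
    """Apply the default 85% threshold mentioned in the text."""
    return aa_identity(cdr3_a, cdr3_b) >= min_identity


# Made-up 10-residue CDR3s: one mismatch gives 90%, two give 80%.
print(similar_enough("CARDYSSGWY", "CARDYTSGWY"))
print(similar_enough("CARDYSSGWY", "CAREYTSGWY"))
```

Under these assumptions, the first pair passes the threshold and the second does not. Sequences would additionally need the same V gene and J gene to be placed in the same clone.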

Statistics Generation

Two sets of statistics can be calculated in ImmuneDB:

Clone Statistics: For each clone and sample combination, how many unique and total sequences appear as well as the mutations from the germline.
Sample Statistics: Distribution of sequence and clone features on a per-sample basis, including V and J usage, nucleotides matching the germline, copy number, V length, and CDR3 length. All of these are calculated with and without outliers, and including and excluding partial reads.

These are calculated with the immunedb_clone_stats and immunedb_sample_stats commands, which must be run in that order.

 $ immunedb_clone_stats /share/configs/example_db.json
 $ immunedb_sample_stats /share/configs/example_db.json
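
Because the required steps up to this point must run in a fixed order, it can help to script them. The following is a minimal sketch, assuming it is run inside the Docker container where the commands and paths shown on this page exist. The helper names `pipeline_commands` and `run_pipeline` are hypothetical; only the command lines themselves come from this page.

```python
import subprocess

GERMLINES = "/root/germlines"  # germline directory used on this page


def pipeline_commands(config):
    """Return the required pipeline steps, in order, as argument lists."""
    return [
        ["immunedb_identify", config,
         f"{GERMLINES}/imgt_human_ighv.fasta",
         f"{GERMLINES}/imgt_human_ighj.fasta",
         "/example"],
        ["immunedb_collapse", config],
        ["immunedb_clones", config, "similarity"],
        ["immunedb_clone_stats", config],   # must precede sample stats
        ["immunedb_sample_stats", config],
    ]


def run_pipeline(config):
    """Run each step in turn; check=True stops at the first failure."""
    for argv in pipeline_commands(config):
        subprocess.run(argv, check=True)


# Usage (inside the container):
#   run_pipeline("/share/configs/example_db.json")
```

Stopping at the first failing step avoids computing statistics on a database in which an earlier stage did not complete.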

Selection Pressure (Optional)

Warning

Selection pressure calculations are time-consuming, so you can skip this step if time is limited.

Selection pressure of clones can be calculated with Baseline. To do so, run:

 $ immunedb_clone_pressure /share/configs/example_db.json \
      /apps/baseline/Baseline_Main.r

Note, this process is relatively slow and may take some time to complete.

Clone Trees (Optional)

Lineage trees for clones are generated with the immunedb_clone_trees command. The only currently supported method is neighbor-joining as provided by Clearcut.

Among others, the --min-mut-copies parameter allows for mutations to be omitted if they have not occurred at least a specified number of times. This can be useful to correct for sequencing error.

 $ immunedb_clone_trees /share/configs/example_db.json --min-mut-copies 2
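
The effect of a copy threshold such as --min-mut-copies can be sketched as a simple count filter. This is an illustration of the idea only, not how immunedb_clone_trees is implemented: the function name and the (position, from, to) mutation tuples are hypothetical, and the sketch treats "copies" as the number of times a mutation was observed.

```python
from collections import Counter


def frequent_mutations(observed, min_copies=2):
    """Keep only mutations observed at least `min_copies` times."""
    counts = Counter(observed)
    return {mut for mut, n in counts.items() if n >= min_copies}


# Hypothetical observations: one mutation seen twice, another once.
observed = [(10, "A", "G"), (10, "A", "G"), (57, "C", "T")]
kept = frequent_mutations(observed, min_copies=2)
print(kept)
```

Here the mutation seen only once is dropped, on the reasoning that a singleton is more likely to be a sequencing error than one observed repeatedly.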

Web Interface

ImmuneDB has a web interface to interact with a database instance. The Docker container automatically makes this available at http://localhost:8080/frontend/example_db

When you create more databases, simply replace example_db with the proper database name.
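
As a small convenience, the two places where the database name appears on this page (the configuration file path and the frontend URL) can be derived from it. This is a minimal sketch with hypothetical helper names; the defaults mirror the values used in this example and would need adjusting for a different configuration directory or port mapping.

```python
import posixpath


def config_path(db_name, config_dir="/share/configs"):
    """Path of the JSON configuration written by immunedb_admin create."""
    return posixpath.join(config_dir, db_name + ".json")


def frontend_url(db_name, host="localhost", port=8080):
    """URL of the web interface for a given database name."""
    return f"http://{host}:{port}/frontend/{db_name}"


print(config_path("example_db"))
print(frontend_url("example_db"))
```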
