AIRRMINE - Adaptive Immune Receptor Repertoire data processing system

Directory structure and definitions

Ready-to-use scripts are found in the following directories:

  • airrnat : AIRR dataset analytics and network assessment (wrappers and custom algorithms)

  • immchaintracer : Immune Receptor clonal lineage tracker and assembler toolchain

  • immchaintracer_analysis : Immune Receptor clonal lineage analysis and visualization toolset

  • samples : Demonstration datasets for testing purposes.

Running the analytic steps in the order listed above is recommended, since each step largely depends on the output of the previous ones.

1. System requirements (versions the software has been tested on)

1.1. Operating system

These scripts were written, used and tested in UNIX/Linux environments (specifically on Ubuntu Linux 21.04). A command line interface is sufficient to use every piece of code.

1.2. AIRR sequence alignments

The external IMGT/V-QUEST service (http://www.imgt.org/IMGT_vquest) was used, and can be used, for Adaptive Immune Receptor sequence alignment.

1.3. Command line toolset

  • bash 5.1
  • make 4.3 (GNU make)
  • gawk 5.1 (GNU awk)
  • R 4.0.4 ("Lost Library Book")
  • python 2.7.18
  • python3-pip (for easy installation of Immcantation tools)
  • phylip 3.697 (PHYLogeny Inference Package)
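
A quick way to confirm that the toolset above is present is to look each command up on the PATH. The following is a minimal sketch rather than part of the toolbox: the function name missing_tools is hypothetical, the executable names (R, python, pip3, phylip) are assumptions about how these packages are exposed on an Ubuntu-like system, and version numbers are not checked.

```bash
#!/usr/bin/env bash
# Print every command from the argument list that is not found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; adjust them to your distribution.
missing=$(missing_tools bash make gawk R python pip3 phylip)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All listed tools were found."
fi
```

Anything reported as missing can then be installed as described in 2.1.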

1.4. Immcantation software packages

Immcantation framework components and R libraries (https://immcantation.readthedocs.io) are extensively used in the code.
  • changeo 1.1.0

1.5. R library dependencies

  • alakazam 1.1.0
  • shazam 1.0.2
  • igraph 1.2.6
  • treemap 2.4.2
  • dplyr 1.0.7
  • readr 1.4.0
  • tidyr 1.1.3
  • colorspace 2.0.1
  • openxlsx 4.2.4

2. Installation guide

Your computer needs to be online during the whole installation process. Installation typically takes 20-30 minutes on a modern multi-core computer.

2.1. Operating system and command line software packages

Please consult your operating system's manual and install the command line toolset, or make sure the tools are already available. This may require administrator privileges on the system. For example, on Ubuntu Linux use apt (e.g. sudo apt install bash).

2.2. Immcantation framework

You can install "changeo" with the following command: pip install changeo

2.3. R libraries

Run R and at the R command prompt install the required libraries: install.packages(c("alakazam","colorspace","dplyr","igraph","openxlsx","readr","shazam","tidyr","treemap"))
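
The commands from 2.2 and 2.3 can also be collected into one non-interactive script. The sketch below is an illustration under assumptions, not part of the toolbox: run_or_print and install_deps are hypothetical helper names, pip3 stands in for pip on systems where only python3-pip is installed, and the CRAN mirror URL is just one possible choice. By default it only prints the commands; set RUN=1 to execute them.

```bash
#!/usr/bin/env bash
# R package vector copied from section 2.3.
r_pkgs='c("alakazam","colorspace","dplyr","igraph","openxlsx","readr","shazam","tidyr","treemap")'

# Execute the given command when RUN=1, otherwise just print it (dry run).
run_or_print() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

install_deps() {
    # pip3 is an assumption; use "pip" if that is what your system provides.
    run_or_print pip3 install changeo
    # An explicit repository avoids the mirror prompt, which is not
    # available outside an interactive R session.
    run_or_print Rscript -e "install.packages($r_pkgs, repos='https://cloud.r-project.org')"
}

install_deps
```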

3. Demonstration material and testing installation

Processing the sample data takes about an hour on a contemporary multi-core desktop computer.

3.1. Create the directory environment

  • Copy "airrnat", "immchaintracer" and "immchaintracer_analysis" directories with their contents to an empty target directory (e.g. airr).

  • Unzip "samples.zip".

  • Copy the contents of "samples" to the same target directory (e.g. airr).

Important: DO NOT RENAME or MOVE files. Strict file naming and placement conventions are used.
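
The three steps above can be sketched as a small shell function. This is a hedged illustration only: setup_airr_env and its two arguments are hypothetical names, and the sketch assumes that "samples.zip" unpacks into a directory called "samples" next to the archive. Files are copied, never renamed, so the naming and placement conventions stay intact.

```bash
#!/usr/bin/env bash
# Assemble the working directory described in section 3.1.
# Usage: setup_airr_env <source_dir> <target_dir>
setup_airr_env() {
    local src=$1 target=$2 dir
    mkdir -p "$target" || return 1
    for dir in airrnat immchaintracer immchaintracer_analysis; do
        # cp -R keeps file names and relative placement unchanged
        cp -R "$src/$dir" "$target/" || return 1
    done
    # Unpack the demo archive only if it has not been unpacked yet
    # (assumes the archive contains a top-level "samples" directory).
    if [ ! -d "$src/samples" ] && [ -f "$src/samples.zip" ]; then
        unzip -q "$src/samples.zip" -d "$src" || return 1
    fi
    # "samples/." copies the contents of samples, not the directory itself
    cp -R "$src/samples/." "$target/"
}

# Example call, assuming the repository was downloaded into ./airrmine:
# setup_airr_env ./airrmine ./airr
```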

3.2. Check/modify path to PHYLIP tools

If you do not use the standard "phylip" package from an Ubuntu Linux repository, or if the data processing aborts, you need to modify the "dnapars_bin" variable in two files accordingly:
  • airrnat/scripts/generate_nd.R
  • immchaintracer_analysis/scripts/treegen.R
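
Before editing, it can help to see what the variable is currently set to and where dnapars actually lives on your machine. The sketch below assumes the working directory layout from 3.1; show_dnapars_bin and find_dnapars are hypothetical helper names, and /usr/lib/phylip/bin is only an assumption about where a Debian/Ubuntu package places the executable.

```bash
#!/usr/bin/env bash
# List every line that mentions dnapars_bin in the two scripts named above.
# Usage: show_dnapars_bin <target_dir>
show_dnapars_bin() {
    local root=$1
    grep -n 'dnapars_bin' \
        "$root/airrnat/scripts/generate_nd.R" \
        "$root/immchaintracer_analysis/scripts/treegen.R"
}

# Print candidate locations of the dnapars executable on this machine.
find_dnapars() {
    command -v dnapars 2>/dev/null || true
    # Assumed package location; adjust if your installation differs.
    if [ -x /usr/lib/phylip/bin/dnapars ]; then
        echo /usr/lib/phylip/bin/dnapars
    fi
    return 0
}

# Example: show_dnapars_bin ./airr; find_dnapars
```

If find_dnapars prints nothing, PHYLIP is probably not installed or sits in a non-standard location that has to be entered by hand.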

3.3. Run AIRRNAT pipeline

Go into the "airrnat" directory and use the make command to run the individual steps of the AIRRNAT pipeline. Running make without command line arguments runs the whole processing toolchain on the raw and pre-processed datasets.

Hint: If you have ample computing power and want to speed up the whole process, use the "-j" flag to utilize more CPU cores (e.g. make -j4 runs up to 4 jobs concurrently).
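
Instead of hard-coding the job count, it can be derived from the number of available cores. The following is a minimal sketch assuming the working directory layout from 3.1; cpu_count and run_airrnat are hypothetical helper names, and using one job per core is a convention, not a recommendation from the authors.

```bash
#!/usr/bin/env bash
# Report how many CPU cores are online, falling back to 1 if unknown.
cpu_count() {
    local n
    n=$(nproc 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    echo "${n:-1}"
}

# Run the whole AIRRNAT toolchain with one make job per core.
# Usage: run_airrnat <target_dir>
run_airrnat() {
    # The subshell keeps the caller's working directory unchanged
    (cd "$1/airrnat" && make -j"$(cpu_count)")
}

# Example: run_airrnat ./airr
```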

If everything worked correctly, you will be able to find the following additional directories containing results:

  • coll : collapsed and filtered datasets

  • clone : clonally assigned AIRR datasets

    • file suffix "_allclone.tsv" : clonal affiliations added
    • file suffix "_withgl.tsv" : putative germline sequence added
    • file suffix "_agsel.tsv" : BASELINe antigen selection data added
    • file suffix "_rsmut.xlsx" : R/S mutation statistics
    • file suffix "_cstats.tsv" : general compartment indices and statistics
  • dest : additional data on compartments

    • file suffix "_genes.xlsx" : V(D)J gene usage
    • file suffix "_tmap.tiff" : treemap of 2500 random sequences
    • file suffix "_tmap_cl.tiff" : treemap of the most abundant 2500 clones
  • graph : AIRR lineage connectivity data and network diagrams

    • file suffix "_AIRRND.svg" : raw AIRR network diagram
    • file suffix "_AIRRND_mutations.svg" : mutation "heatmap"
    • file suffix "_AIRRND_itypes.svg" : color-coded isotypes
    • file suffix "_AIRRND_agsel.svg" : heatmap of BASELINe sigma
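
Whether the run produced the result directories listed above can be checked with a few lines of shell. This is a sketch under the assumption that the directories are created directly inside "airrnat"; missing_results is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print the names of expected result directories that do not exist.
# Usage: missing_results <pipeline_dir> <name>...
missing_results() {
    local base=$1 name
    shift
    for name in "$@"; do
        [ -d "$base/$name" ] || printf '%s\n' "$name"
    done
}

# Example for the AIRRNAT run, using the directory names listed above:
# missing_results ./airr/airrnat coll clone dest graph
```

Empty output means every listed directory is present. The same helper can be pointed at another pipeline directory with a different list of names.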

3.4. Run ImmChainTracer pipeline

You can run make in the "immchaintracer" directory in the same way.

These directories with processed data are created:

  • filtered : mini, pre-filtered "repertoire" subsets created on the basis of similarity criteria. Each file corresponds to a single (cloned) Adaptive Immune Receptor sequence and contains all similar sequences found alongside it in a bulk AIRR sequence pool.

  • clone : clonally assigned "mini" repertoires

  • lin : extracted clonal lineages

3.5. Run analysis on captured lineages

If you copy the "lin" directory (with its contents) from step 3.4 to "immchaintracer_analysis" and run make there, you can generate various plots and statistics for single lineage trees.
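
The copy step can be sketched as follows, assuming the working directory layout from 3.1. The function name stage_lineages is hypothetical; it refuses to run when the ImmChainTracer pipeline has not produced a "lin" directory yet.

```bash
#!/usr/bin/env bash
# Copy the captured lineages into the analysis directory.
# Usage: stage_lineages <target_dir>
stage_lineages() {
    local root=$1
    if [ ! -d "$root/immchaintracer/lin" ]; then
        echo "no lin directory found; run make in immchaintracer first" >&2
        return 1
    fi
    # Copies the directory itself, including its contents
    cp -R "$root/immchaintracer/lin" "$root/immchaintracer_analysis/"
}

# Example:
# stage_lineages ./airr && (cd ./airr/immchaintracer_analysis && make)
```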

4. Re-run analytics on real data

  • Download the supplementary raw data published by our group and arrange it the same way as the demo data in the "samples" directory.
  • Replace the demo data in the "airrnat" directory with the real data.
  • Run make in the "airrnat" directory.
  • When the "airrnat" pipeline finishes, copy or move the collapsed datasets into the "prep" directory in "immchaintracer".
  • Download the corresponding single clone files and place them in the same "prep" directory.
  • Run make in "immchaintracer".
  • Copy the "lin" directory as in 3.5 and run the analytics.
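
The hand-over of collapsed datasets from "airrnat" to "immchaintracer" could look like the sketch below. It rests on two assumptions that should be checked against your own run: that the collapsed datasets are the contents of the "coll" result directory described in 3.3, and that the working directory follows the layout from 3.1. The function name stage_collapsed is hypothetical.

```bash
#!/usr/bin/env bash
# Copy the collapsed datasets into the ImmChainTracer input directory.
# Usage: stage_collapsed <target_dir>
stage_collapsed() {
    local root=$1
    if [ ! -d "$root/airrnat/coll" ]; then
        echo "airrnat/coll not found; run make in airrnat first" >&2
        return 1
    fi
    mkdir -p "$root/immchaintracer/prep" || return 1
    # "coll/." copies the files inside coll without renaming them
    cp -R "$root/airrnat/coll/." "$root/immchaintracer/prep/"
}

# Example:
# stage_collapsed ./airr && (cd ./airr/immchaintracer && make)
```

The single clone files still have to be downloaded and added to "prep" by hand before make is started.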

Developed by

  • Peter Blazso : development, code maintenance
  • Krisztian Csomos : concepts and testing
  • Boglarka Ujhazi : data management and testing
  • Jolan E. Walter : principal investigator
