Ready-to-use scripts are found in the following directories:
- airrnat : AIRR dataset analytics and network assessment wrap-around and custom algorithms
- immchaintracer : Immune Receptor clonal lineage tracker and assembler toolchain
- immchaintracer_analysis : Immune Receptor clonal lineage analysis and visualization toolset
- samples : Demonstration datasets for testing purposes.
Running the analytic steps in this same order is recommended, since each step largely depends on the output of the previous ones.
These scripts were written, used and tested in UNIX/Linux environments (specifically on Ubuntu Linux 21.04). A command line interface is sufficient to use every piece of code. The external IMGT V-Quest service (http://www.imgt.org/IMGT_vquest) was used, and can be used, for Adaptive Immune Receptor sequence alignment.
- bash 5.1
- make 4.3 (GNU make)
- gawk 5.1 (GNU awk)
- R 4.0.4 ("Lost Library Book")
- python 2.7.18
- python3-pip (for easy installation of Immcantation tools)
- phylip 3.697 (PHYLogeny Inference Package)
- changeo 1.1.0
- alakazam 1.1.0
- shazam 1.0.2
- igraph 1.2.6
- treemap 2.4.2
- dplyr 1.0.7
- readr 1.4.0
- tidyr 1.1.3
- colorspace 2.0.1
- openxlsx 4.2.4
Your computer needs to be online during the whole installation process. Installation typically takes 20-30 minutes on a modern multi-core computer.
Please see your operating system's manual and install the command line tools, or make sure they are available. This may need administrative privileges on the system. For example, on Ubuntu Linux use apt (e.g. sudo apt install bash).
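Before installing anything, it can help to see which command line tools are already present. The following is a minimal sketch, not part of the distributed scripts: `check_tools` is a hypothetical helper, and the command names in the example line are assumptions about how the listed tools are called on your system.

```shell
# Report which of the given commands are available on PATH.
# Prints one "found:" or "missing:" line per command and
# returns 1 if at least one command is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$tool"
        else
            printf 'missing: %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (command names are assumptions; adjust them to your system):
# check_tools bash make gawk R python pip
```

Anything reported as missing can then be installed with the package manager of your operating system.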
You can install "changeo" with the following command:
pip install changeo
Run R and, at the R command prompt, install the required libraries:
install.packages(c("alakazam","colorspace","dplyr","igraph","openxlsx","readr","shazam","tidyr","treemap"))
Sample data are processed in about an hour on a contemporary, multi-core desktop computer.
- Copy "airrnat", "immchaintracer" and "immchaintracer_analysis" directories with their contents to an empty target directory (e.g. airr).
- Unzip "samples.zip".
- Copy the contents of "samples" to the same target directory (e.g. airr).
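The three steps above can be sketched as a small shell function. This is only an illustration: `setup_airr` is a hypothetical helper, and it assumes that "samples.zip" sits next to the script directories and unpacks into a directory called "samples".

```shell
# Copy the script directories and the demo data into one target directory.
# Usage: setup_airr SOURCE_DIR TARGET_DIR
setup_airr() {
    local src="$1" target="$2" dir
    mkdir -p "$target" || return 1
    for dir in airrnat immchaintracer immchaintracer_analysis; do
        cp -R "$src/$dir" "$target/" || return 1
    done
    # Extract the demo data only if it has not been extracted yet
    # (assumption: the archive unpacks into "samples").
    if [ ! -d "$src/samples" ]; then
        unzip -q "$src/samples.zip" -d "$src" || return 1
    fi
    # Copy the contents of "samples", not the directory itself,
    # so that file names and relative placement stay unchanged.
    cp -R "$src/samples/." "$target/"
}

# Example:
# setup_airr . airr
```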
Important: DO NOT RENAME or MOVE files. Strict file naming and placement conventions are used.
If you do not use the standard "phylip" package from an Ubuntu Linux repository, or if the data processing aborts, you need to modify the "dnapars_bin" variable accordingly in two files:
- airrnat/scripts/generate_nd.R
- immchaintracer_analysis/scripts/treegen.R
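To find out what value "dnapars_bin" should get, you can look for the dnapars executable first. The sketch below is an assumption-laden helper, not part of the pipeline: `find_dnapars` is a hypothetical name, and both candidate locations in the example are guesses about where phylip places the binary on your system.

```shell
# Print the first candidate that resolves to an executable and return 0;
# return 1 if none of the candidates is usable.
find_dnapars() {
    local candidate path
    for candidate in "$@"; do
        path="$(command -v "$candidate" 2>/dev/null)" || continue
        if [ -x "$path" ]; then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

# Example (both candidates are assumptions; adjust to your installation):
# find_dnapars dnapars /usr/lib/phylip/bin/dnapars
#
# Show the lines that currently set the variable in the two files:
# grep -n 'dnapars_bin' airrnat/scripts/generate_nd.R immchaintracer_analysis/scripts/treegen.R
```

The printed path is the value to put into "dnapars_bin" in both files.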
Use the make command in the "airrnat" directory to run the different steps of the AIRRNAT pipeline. If you simply run make without command line arguments, it will run the whole processing toolchain on the raw and pre-processed datasets.
Hint: If you have ample computing power and want to speed up the whole process, use the "-j" flag to utilize more CPU cores (e.g. make -j4 runs up to 4 jobs concurrently).
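Instead of hard-coding the job count, it can be derived from the number of CPU cores. A minimal sketch, assuming `nproc` or `getconf` is available (`cpu_count` is a hypothetical helper name):

```shell
# Print the number of available CPU cores, or 1 if it cannot be determined.
cpu_count() {
    local n
    n="$(nproc 2>/dev/null)" || n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)" || n=1
    # Fall back to 1 if the result is empty or not a plain number.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    printf '%s\n' "$n"
}

# Example: let make run as many jobs at once as there are cores.
# make -j"$(cpu_count)"
```

Using fewer jobs than cores keeps the machine responsive for other work while the pipeline runs.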
If everything worked correctly, you will be able to find the following additional directories containing results:
- coll : collapsed and filtered datasets
- clone : clonally assigned AIRR datasets
  - file suffix "_allclone.tsv" : clonal affiliations added
  - file suffix "_withgl.tsv" : putative germline sequence added
  - file suffix "_agsel.tsv" : BASELINe antigen selection data added
  - file suffix "_rsmut.xlsx" : R/S mutation statistics
  - file suffix "_cstats.tsv" : general compartment indices and statistics
- dest : additional data on compartments
  - file suffix "_genes.xlsx" : V(D)J gene usage
  - file suffix "_tmap.tiff" : treemap of random 2500 sequences
  - file suffix "_tmap_cl.tiff" : treemap of the most abundant 2500 clones
- graph : AIRR lineage connectivity data and network diagrams
  - file suffix "_AIRRND.svg" : raw AIRR network diagram
  - file suffix "_AIRRND_mutations.svg" : mutation "heatmap"
  - file suffix "_AIRRND_itypes.svg" : color-coded isotypes
  - file suffix "_AIRRND_agsel.svg" : heatmap of BASELINe sigma
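A quick way to confirm that a run produced the files listed above is to count them per suffix. The sketch below is illustrative only: `count_suffix` and `report_results` are hypothetical helpers, and the base directory passed to them (the place where "clone", "dest" and "graph" were created) is an assumption you need to adapt.

```shell
# Count the files in DIR whose names end with SUFFIX (0 if DIR does not exist).
count_suffix() {
    local dir="$1" suffix="$2"
    [ -d "$dir" ] || { echo 0; return 0; }
    find "$dir" -type f -name "*$suffix" | wc -l | tr -d ' '
}

# Print one line per documented result suffix with the number of matching files.
report_results() {
    local base="$1" entry dir suffix
    for entry in \
        clone:_allclone.tsv clone:_withgl.tsv clone:_agsel.tsv \
        clone:_rsmut.xlsx clone:_cstats.tsv \
        dest:_genes.xlsx dest:_tmap.tiff dest:_tmap_cl.tiff \
        graph:_AIRRND.svg graph:_AIRRND_mutations.svg \
        graph:_AIRRND_itypes.svg graph:_AIRRND_agsel.svg
    do
        dir="${entry%%:*}"      # part before the colon: result directory
        suffix="${entry#*:}"    # part after the colon: file suffix
        printf '%s/*%s: %s\n' "$dir" "$suffix" "$(count_suffix "$base/$dir" "$suffix")"
    done
}

# Example (assumes the result directories were created in the current directory):
# report_results .
```

A count of 0 for a suffix suggests that the corresponding step did not finish.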
Run make in the "immchaintracer" directory as well.
These directories with processed data are created:
- filtered : mini, pre-filtered "repertoire" subsets are made based on similarity criteria. One file corresponds to a single (cloned) Adaptive Immune Receptor sequence and contains all the other similar sequences that can be found with it in a bulk AIRR sequence pool.
- clone : clonally assigned "mini" repertoires
- lin : extracted clonal lineages
Run make in the "immchaintracer_analysis" directory to generate various plots and statistics of single lineage trees.
- Download the supplementary raw data published by our group, in the same way as the demo data found in the "samples" directory.
- Replace the demo data with the real data in the "airrnat" directory.
- Run make in the "airrnat" directory.
- When the "airrnat" pipeline finishes, copy or move the collapsed datasets into the "prep" directory in "immchaintracer".
- Download the corresponding single clone files and put them there as well.
- Run make in "immchaintracer".
- Copy the "lin" directory just as in 3.5. and run the analytics.
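The hand-over of collapsed datasets from "airrnat" to "immchaintracer" can be sketched as follows. This is a hedged illustration: `stage_collapsed` is a hypothetical helper, and it assumes that the "coll" directory is created inside "airrnat" and that both script directories share one parent directory.

```shell
# Copy the collapsed datasets from airrnat/coll into immchaintracer/prep.
# Usage: stage_collapsed ROOT   (ROOT holds "airrnat" and "immchaintracer")
stage_collapsed() {
    local root="$1"
    local coll="$root/airrnat/coll" prep="$root/immchaintracer/prep"
    # Assumption: "coll" lives inside "airrnat"; adjust if your layout differs.
    [ -d "$coll" ] || { echo "no collapsed datasets in $coll" >&2; return 1; }
    mkdir -p "$prep" || return 1
    # Copy the contents only, keeping the original file names.
    cp -R "$coll/." "$prep/"
}

# Example:
# stage_collapsed airr
```

The single clone files still have to be downloaded and placed into the same "prep" directory by hand.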
- Peter Blazso : development, code maintenance
- Krisztian Csomos : concepts and testing
- Boglarka Ujhazi : data management and testing
- Jolan E. Walter : principal investigator