GitHub - blazsop/airrmine: Adaptive immune receptor repertoire data analysis toolbox

AIRRMINE - Adaptive Immune Receptor Repertoire data processing system

Directory structure and definitions

Ready-to-use scripts are found in the following directories:

airrnat : AIRR dataset analytics and network assessment wrap-around and custom algorithms
immchaintracer : Immune Receptor clonal lineage tracker and assembler toolchain
immchaintracer_analysis : Immune Receptor clonal lineage analysis and visualization toolset
samples : Demonstration datasets for testing purposes.

Running the analytic steps in this order is recommended, since each step depends largely on the output of the previous one.

1. System requirements (software has been tested on)

1.1. Operating system

These scripts were written, used and tested in UNIX/Linux environments (specifically on Ubuntu Linux 21.04). A command line interface is sufficient to use every piece of code.

1.2. AIRR sequence alignments

The external IMGT/V-QUEST service (http://www.imgt.org/IMGT_vquest) was used, and can be used, for Adaptive Immune Receptor sequence alignment.

1.3. Command line toolset

bash 5.1
make 4.3 (GNU make)
gawk 5.1 (GNU awk)
R 4.0.4 ("Lost Library Book")
python 2.7.18
python3-pip (for easy installation of Immcantation tools)
phylip 3.697 (PHYLogeny Inference Package)
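
Before installing anything, it can help to see which of the tools listed above are already on the PATH. The following is a minimal sketch and not part of the AIRRMINE scripts: the function name check_tools is made up for illustration, and the executable names in the example call (for instance R, pip3 and phylip) are assumptions that may differ on your system. It does not check version numbers.

```shell
# Hypothetical helper (not part of AIRRMINE): report which commands are
# missing from the PATH. Returns 0 if all were found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the executable names below are assumptions:
# check_tools bash make gawk R python pip3 phylip
```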

1.4. Immcantation software packages

Immcantation framework components and R libraries (https://immcantation.readthedocs.io) are extensively used in the code.

changeo 1.1.0

1.5. R library dependencies

alakazam 1.1.0
shazam 1.0.2
igraph 1.2.6
treemap 2.4.2
dplyr 1.0.7
readr 1.4.0
tidyr 1.1.3
colorspace 2.0.1
openxlsx 4.2.4

2. Installation guide

Your computer needs to be online during the whole installation process. Installation typically takes 20-30 minutes on a modern multi-core computer.

2.1. Operating system and command line software packages

Please consult your operating system's manual to install the command line toolset, or make sure the tools are already available. Installation may need administrator privileges on the system. For example, on Ubuntu Linux use apt (e.g.: sudo apt install bash).

2.2. Immcantation framework

You can install "changeo" with the following command: pip install changeo

2.3. R libraries

Run R and at the R command prompt install the required libraries: install.packages(c("alakazam","colorspace","dplyr","igraph","openxlsx","readr","shazam","tidyr","treemap"))

3. Demonstration material and testing installation

Sample data are processed in an hour on a contemporary, multiprocessing desktop computer.

3.1. Create the directory environment

Copy "airrnat", "immchaintracer" and "immchaintracer_analysis" directories with their contents to an empty target directory (e.g. airr).
Unzip "samples.zip".
Copy the contents of "samples" to the same target directory (e.g. airr).

Important: DO NOT RENAME or MOVE files. Strict file naming and placement conventions are used.
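
The copy steps above can be sketched as a small shell function. This is only an illustration, not a script shipped with AIRRMINE: the name setup_airr_env is hypothetical, and the sketch assumes that "samples.zip" unpacks into a directory called "samples" next to the archive. It skips the extraction if that directory already exists.

```shell
# Hypothetical helper (not part of AIRRMINE): assemble the working directory.
# $1 = directory holding the repository contents, $2 = empty target directory.
setup_airr_env() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Copy the three pipeline directories with their contents.
    for d in airrnat immchaintracer immchaintracer_analysis; do
        cp -R "$src/$d" "$dest/" || return 1
    done
    # Extract the demo data unless that was already done (assumes the
    # archive unpacks into a directory called "samples").
    if [ ! -d "$src/samples" ]; then
        unzip -q "$src/samples.zip" -d "$src" || return 1
    fi
    # Copy the contents of "samples", not the directory itself.
    cp -R "$src/samples/." "$dest/" || return 1
}

# Example call (paths are placeholders):
# setup_airr_env ./airrmine ./airr
```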

3.2. Check/modify path to PHYLIP tools

If you do not use the standard "phylip" package from an Ubuntu Linux repository, or if the data processing aborts, you need to modify the "dnapars_bin" variable accordingly in two files:

airrnat/scripts/generate_nd.R
immchaintracer_analysis/scripts/treegen.R
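
To find out which path to put into "dnapars_bin", you could search for the executable first. The sketch below is an assumption-laden illustration: the function find_dnapars is hypothetical, and the default location /usr/lib/phylip/bin/dnapars is only a guess at where Debian/Ubuntu packages place the binary, so verify it on your system. The sketch prints a path; editing the two R scripts is still a manual step.

```shell
# Hypothetical helper (not part of AIRRMINE): print the path of the PHYLIP
# "dnapars" executable, or fail if it cannot be found.
find_dnapars() {
    # 1) A dnapars that is directly on the PATH.
    if command -v dnapars >/dev/null 2>&1; then
        command -v dnapars
        return 0
    fi
    # 2) Candidate locations passed as arguments. The default is an
    #    assumption about the Debian/Ubuntu package layout.
    [ "$#" -gt 0 ] || set -- /usr/lib/phylip/bin/dnapars
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "dnapars not found" >&2
    return 1
}

# Example call with an extra, made-up candidate location:
# find_dnapars /opt/phylip/exe/dnapars /usr/lib/phylip/bin/dnapars
```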

3.3. Run AIRRNAT pipeline

Go into the "airrnat" directory and use the make command to run the individual steps of the AIRRNAT pipeline. If you run make without command line arguments, it runs the whole processing toolchain on the raw and pre-processed datasets.

Hint: If you have ample computing power and want to speed up the whole process, use the "-j" flag to utilize more CPU cores (e.g. make -j4 runs up to 4 jobs concurrently).
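
If you do not want to hard-code the job count, it can be derived from the number of CPU cores. The following is a sketch under assumptions rather than part of the pipeline: cpu_count and run_pipeline are hypothetical names, and the wrapper relies on GNU make's -C (change directory) and -j (parallel jobs) options.

```shell
# Hypothetical helper (not part of AIRRMINE): print the number of available
# CPU cores, falling back to 1 if it cannot be determined.
cpu_count() {
    n=$(nproc 2>/dev/null) ||
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
        n=1
    echo "$n"
}

# Hypothetical wrapper: run the whole pipeline in the given directory
# (default "airrnat") with one make job per core.
run_pipeline() {
    make -C "${1:-airrnat}" -j"$(cpu_count)"
}

# Example call from the target directory:
# run_pipeline airrnat
```

One job per core is only a starting point; a lower number may be the better choice if memory is limited.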

If everything worked correctly, you will be able to find the following additional directories containing results:

coll : collapsed and filtered datasets
clone : clonally assigned AIRR datasets
- file suffix "_allclone.tsv" : clonal affiliations added
- file suffix "_withgl.tsv" : putative germline sequence added
- file suffix "_agsel.tsv" : BASELINe antigen selection data added
- file suffix "_rsmut.xlsx" : R/S mutation statistics
- file suffix "_cstats.tsv" : general compartment indices and statistics
dest : additional data on compartments
- file suffix "_genes.xlsx" : V(D)J gene usage
- file suffix "_tmap.tiff" : treemap of random 2500 sequences
- file suffix "_tmap_cl.tiff" : treemap of the most abundant 2500 clones
graph : AIRR lineage connectivity data and network diagrams
- file suffix "_AIRRND.svg" : raw AIRR network diagram
- file suffix "_AIRRND_mutations.svg" : mutation "heatmap"
- file suffix "_AIRRND_itypes.svg" : color-coded isotypes
- file suffix "_AIRRND_agsel.svg" : heatmap of BASELINe sigma
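
A quick way to confirm that the run produced the directories listed above is sketched below. The function check_result_dirs is a hypothetical helper, and the example call assumes that the result directories are created inside "airrnat"; adjust the base directory if your layout differs.

```shell
# Hypothetical helper (not part of AIRRMINE): report which of the expected
# result directories exist. $1 = base directory, remaining args = names.
check_result_dirs() {
    base=$1
    shift
    status=0
    for dir in "$@"; do
        if [ -d "$base/$dir" ]; then
            echo "ok:      $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}

# Example call, assuming the results are written inside "airrnat":
# check_result_dirs airrnat coll clone dest graph
```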

3.4. Run ImmChainTracer pipeline

You can run make in the "immchaintracer" directory as well.

These directories with processed data are created:

filtered : mini, pre-filtered "repertoire" subsets are made based on similarity criteria. One file corresponds to a single (cloned) Adaptive Immune Receptor sequence and contains all the other similar sequences that can be found with it in a bulk AIRR sequence pool.
clone : clonally assigned "mini" repertoires
lin : extracted clonal lineages

3.5. Run analysis on captured lineages

If you copy the "lin" directory (with its contents) from step 3.4. to "immchaintracer_analysis" and run make there, you can generate various plots and statistics of single lineage trees.
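
The copy step could look like the sketch below. It is an illustration only: stage_lineages is a hypothetical name, and the sketch assumes that "lin" was created inside "immchaintracer" and that both pipeline directories sit in the same target directory.

```shell
# Hypothetical helper (not part of AIRRMINE): copy the extracted lineages
# from the ImmChainTracer output into the analysis directory.
# $1 = target directory that holds both pipeline directories (default: .)
stage_lineages() {
    root=${1:-.}
    if [ ! -d "$root/immchaintracer/lin" ]; then
        echo "no lin directory found; run step 3.4. first" >&2
        return 1
    fi
    cp -R "$root/immchaintracer/lin" "$root/immchaintracer_analysis/"
}

# Afterwards the analysis could be started with:
# make -C immchaintracer_analysis
```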

4. Re-run analytics on real data

Download the supplementary raw data published by our group and arrange them the same way as the demo data found in the "samples" directory.
Replace the demo data with the real data in the "airrnat" directory.
Run make in the "airrnat" directory.
When the "airrnat" pipeline finishes, copy or move the collapsed datasets into the "prep" directory in "immchaintracer".
Download the corresponding single clone files and put them into the same "prep" directory.
Run make in "immchaintracer".
Copy the "lin" directory just like in 3.5. and run the analytics.
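
Once the real data are in place, the remaining steps can be chained as in the sketch below. This is not a script from the repository: rerun_real_data and the MAKE_CMD override are hypothetical, and the sketch assumes that the collapsed datasets are the contents of "airrnat/coll" and that "lin" is created inside "immchaintracer". The download itself is not covered.

```shell
# Hypothetical helper (not part of AIRRMINE): chain the steps listed above.
# $1 = target directory holding the three pipeline directories
# $2 = directory with the downloaded single clone files
# MAKE_CMD can be set to override the make executable.
rerun_real_data() {
    root=$1
    clones=$2
    make_cmd=${MAKE_CMD:-make}

    # Run the AIRRNAT pipeline on the real data.
    "$make_cmd" -C "$root/airrnat" || return 1

    # Stage the collapsed datasets (assumed to be in airrnat/coll) and the
    # single clone files in immchaintracer/prep.
    mkdir -p "$root/immchaintracer/prep" || return 1
    cp -R "$root/airrnat/coll/." "$root/immchaintracer/prep/" || return 1
    cp -R "$clones/." "$root/immchaintracer/prep/" || return 1

    # Trace the lineages, then analyze them as in 3.5.
    "$make_cmd" -C "$root/immchaintracer" || return 1
    cp -R "$root/immchaintracer/lin" "$root/immchaintracer_analysis/" || return 1
    "$make_cmd" -C "$root/immchaintracer_analysis"
}

# Example call (paths are placeholders):
# rerun_real_data ./airr ./single_clones
```

The function stops at the first failing step, so a broken AIRRNAT run does not leave half-staged data in "prep".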

Developed by

Peter Blazso : development, code maintenance
Krisztian Csomos : concepts and testing
Boglarka Ujhazi : data management and testing
Jolan E. Walter : principal investigator
