The iReceptor Turnkey is a quick and easy mechanism for researchers to create their own AIRR Data Commons repository.
| Version | Branch | Status | Last update | |
|---|---|---|---|---|
| 3.1 | production-v3 | Stable. Recommended. | May 10, 2021 | Release Notes |
| 4.0 | production-v4 | Stable. Used internally. Features still being added. | June 2, 2021 | Release Notes |
The iReceptor Turnkey consists of:
- a database
- scripts to add data to the database
- a web service exposing the database via the ADC API
These components are packaged as Docker images. The installation script will:
- install Docker
- download and run these Docker images
Read more about the iReceptor Turnkey on the iReceptor website. The remainder of this document only provides installation instructions.
- Linux Ubuntu. The turnkey was tested on Ubuntu 16.04, 18.04, and 20.04.
- sudo without password. It's usually the default on virtual machines.
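Both prerequisites can be checked from the command line before installing. The sketch below assumes the standard /etc/os-release file is present and uses sudo -n, which fails instead of prompting when a password would be required. The function names are illustrative and not part of the turnkey.

```shell
# Hypothetical helpers, not part of the turnkey scripts.

# Print the VERSION_ID (e.g. 20.04) declared in an os-release file.
ubuntu_version() {
    os_release_file="${1:-/etc/os-release}"
    # os-release files use shell variable syntax, so they can be sourced.
    ( . "$os_release_file" && printf '%s\n' "$VERSION_ID" )
}

# Succeed if the given version is one the turnkey was tested on.
is_tested_version() {
    case "$1" in
        16.04|18.04|20.04) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if sudo works without prompting for a password.
has_passwordless_sudo() {
    sudo -n true 2>/dev/null
}

# Example usage:
# is_tested_version "$(ubuntu_version)" || echo "Untested Ubuntu version"
# has_passwordless_sudo || echo "sudo requires a password"
```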
Download the production-v3 code:
git clone --branch production-v3 https://github.com/sfu-ireceptor/turnkey-service-php.git
Launch the installation script. Note: multiple Docker images will be downloaded from DockerHub. Installation time estimate: 10-30 min.
cd turnkey-service-php
scripts/install_turnkey.sh
Check that the installation worked:
curl --data "{}" "http://localhost/airr/v1/repertoire"
This returns the list of repertoires in your database, by querying the web service at /airr/v1/repertoire, an ADC API entry point.
You can also visit http://localhost in your browser (replace "localhost" with your server URL if necessary). You'll see the home page for your repository, with information about the ADC API and iReceptor.
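To script this check, one option is to look only at the HTTP status code of the response. The sketch below assumes the service is reachable over plain HTTP at the given host. The helper names adc_url and adc_status are illustrative.

```shell
# Hypothetical helpers, not part of the turnkey scripts.

# Build the URL of an ADC API entry point, e.g. "repertoire".
adc_url() {
    host="$1"
    entry_point="$2"
    printf 'http://%s/airr/v1/%s\n' "$host" "$entry_point"
}

# Print the HTTP status code returned for an empty query ("{}").
# curl prints 000 when no connection could be made.
adc_status() {
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --data '{}' "$(adc_url "$1" "$2")"
}

# Example usage (a status of 200 suggests the web service is up):
# adc_status localhost repertoire
```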
The general data loading procedure for a study that has generated sequence data is to:
- load the associated repertoire metadata using the iReceptor Metadata CSV format. Note: it's also possible to use the AIRR Repertoire Schema JSON format.
- load the sequence annotations (rearrangements) from IMGT, MiXCR, etc.
Load the included test data to familiarize yourself with the data loading procedure. You will delete that test data afterwards.
Note: the test data is a single repertoire with 1000 rearrangements. It's a subset of the data from the study "The Different T-cell Receptor Repertoires in Breast Cancer Tumors, Draining Lymph Nodes, and Adjacent Tissues".
- Load the repertoire metadata file test_data/PRJNA330606_Wang_1_sample_metadata.csv.
scripts/load_metadata.sh ireceptor test_data/PRJNA330606_Wang_1_sample_metadata.csv
Check it worked:
curl --data "{}" "http://localhost/airr/v1/repertoire"
The repertoire metadata is returned as JSON.
- Load the rearrangements file test_data/SRR4084215_aa_mixcr_annotation_1000_lines.txt:
scripts/load_rearrangements.sh mixcr test_data/SRR4084215_aa_mixcr_annotation_1000_lines.txt
Check it worked:
curl --data "{}" "http://localhost/airr/v1/rearrangement"
All of the rearrangement data for the 1000 sequences is returned as JSON.
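Returning every rearrangement can become unwieldy with larger data sets. The ADC API defines size and facets query parameters. Assuming your turnkey version implements them, the sketch below builds query bodies that limit the number of records returned or count records per repertoire. The helper names are illustrative.

```shell
# Hypothetical helpers that build ADC API query bodies.
# Assumption: the web service supports the ADC API "size" and "facets" parameters.

# Query body asking for at most N records.
adc_query_limit() {
    printf '{"size": %d}\n' "$1"
}

# Query body asking for record counts per value of a field.
adc_query_facets() {
    printf '{"facets": "%s"}\n' "$1"
}

# Example usage:
# curl --data "$(adc_query_limit 10)" "http://localhost/airr/v1/rearrangement"
# curl --data "$(adc_query_facets repertoire_id)" "http://localhost/airr/v1/rearrangement"
```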
Note: both load_metadata.sh and load_rearrangements.sh produce a log file in the log directory for each file processed. Log files are named using the current date, followed by the name of the processed file.
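To inspect the result of the most recent load, you can look at the newest file in the log directory. The sketch below picks it by modification time rather than by parsing the file name, since the exact date format is not described here. The helper name latest_log is illustrative.

```shell
# Hypothetical helper: print the path of the most recently modified file
# in a log directory (defaults to "log").
latest_log() {
    log_dir="${1:-log}"
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$log_dir" | head -n 1)
    [ -n "$newest" ] && printf '%s/%s\n' "$log_dir" "$newest"
}

# Example usage, run from the turnkey-service-php directory:
# tail -n 20 "$(latest_log log)"
```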
That's all, congratulations!
Note: use a clearly defined curation process for your data to ensure good provenance. Refer to the iReceptor Curation process and the iReceptor Curation GitHub repository for recommended data curation approaches.
To load your own data, follow the same procedure as with the test data.
Note: make sure your rearrangements files are declared in the repertoire metadata file, under the data_processing_files column.
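A rough way to catch undeclared files before loading is to search the metadata file for each rearrangements file name. The sketch below is only a text search: it does not parse the CSV or confirm that the name appears specifically in the data_processing_files column, and a name that is a substring of another declared name will also match. The helper name check_declared is illustrative.

```shell
# Hypothetical helper: report rearrangements files whose names do not appear
# anywhere in the metadata file. Returns non-zero if any file is missing.
check_declared() {
    metadata_file="$1"
    shift
    missing=0
    for f in "$@"; do
        name=$(basename "$f")
        # -F treats the name as a fixed string, not a regular expression.
        if ! grep -qF -- "$name" "$metadata_file"; then
            echo "Not declared in $metadata_file: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
# check_declared my_metadata.csv my_study_folder/*.txt
```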
- Load your repertoire metadata:
scripts/load_metadata.sh ireceptor <file path of your CSV or JSON metadata file>
- Load your rearrangements files. You can load multiple files at once:
scripts/load_rearrangements.sh mixcr <your study data folder>/*.txt
This will load all files ending in .txt from your study data folder.
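The two steps can also be chained so that rearrangements are only loaded if the metadata load succeeded. This sketch assumes the loading scripts return a non-zero exit status on failure, which is not confirmed here. The load_study function and the TURNKEY_SCRIPTS variable are illustrative names.

```shell
# Hypothetical wrapper around the turnkey loading scripts.
# Assumption: both scripts exit with a non-zero status when loading fails.
TURNKEY_SCRIPTS="${TURNKEY_SCRIPTS:-scripts}"

# Usage: load_study <metadata file> <rearrangements format> <rearrangements files...>
load_study() {
    metadata_file="$1"
    format="$2"
    shift 2
    # The second script only runs if the first one succeeds.
    "$TURNKEY_SCRIPTS/load_metadata.sh" ireceptor "$metadata_file" &&
        "$TURNKEY_SCRIPTS/load_rearrangements.sh" "$format" "$@"
}

# Example usage, run from the turnkey-service-php directory:
# load_study my_metadata.csv mixcr my_study_folder/*.txt
```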
Note: Compressed .gz files are supported and can be loaded directly. Example:
scripts/load_rearrangements.sh mixcr <your study data folder>/*.gz
Note: make sure that the full file name, including the .gz extension, was declared in the repertoire metadata file.
To load IMGT or AIRR rearrangements files instead of MiXCR files, replace the mixcr parameter with imgt or airr. Example:
scripts/load_rearrangements.sh imgt <IMGT files>
Use nohup to run the script in the background and redirect its output to a log file, so you can log out and come back later to check on the data loading progress by looking at that file. Example:
nohup scripts/load_rearrangements.sh mixcr my_study_folder/*.txt > progress.log &
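The same idea can be wrapped so that the process ID is kept for later checks. This is a sketch: run_in_background and is_running are illustrative helper names, and kill -0 only tests whether the process still exists, not whether the load succeeded.

```shell
# Hypothetical helpers, not part of the turnkey scripts.

# Run a command with nohup in the background, sending its output
# (including error messages, via 2>&1) to a log file.
# The process ID is stored in BG_PID.
run_in_background() {
    log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    BG_PID=$!
}

# Succeed if the process with the given ID still exists.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Example usage:
# run_in_background progress.log scripts/load_rearrangements.sh mixcr my_study_folder/*.txt
# is_running "$BG_PID" && tail -n 5 progress.log
```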
When you've loaded your data, we recommend backing up the database to avoid having to load your data again if a problem occurs.