GitHub - TLlab/trig2

TRIg2

1. Installation
--------------------------------------------------------------------------------
TRIg2 is written in Perl and C++. It requires USEARCH for merging paired-end
reads and MUMMER for initial alignments, so users must install both tools.
They can be obtained as follows:
* USEARCH (v11.0.667) : https://www.drive5.com/usearch/
* MUMMER  (v3.23)     : http://mummer.sourceforge.net/
TRIg2 is ready for use after cloning and building as follows.
--------------------------------------------------------------------------------

> git clone https://github.com/TLlab/trig2.git
> cd trig2
> make clean
> make

Then add the directory trig2/bin to your $PATH. Also make sure that the folders
containing the mummer and usearch commands are in your $PATH.
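
A quick way to confirm the $PATH setup is to look up each command before the
first run. The following is a minimal sketch in Python, not part of TRIg2. The
executable names "nucmer" and "usearch" are assumptions (a downloaded USEARCH
binary may carry a longer, versioned file name), so adjust the tuple to match
your installation.

```python
import shutil


def missing_tools(tools=("trig.pl", "nucmer", "usearch")):
    """Return the names in `tools` that cannot be found on $PATH.

    The default names are assumptions about how the executables are called
    on your system; pass your own tuple if they differ.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required commands were found on $PATH.")
```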


2. Running the program
---------------------------------------------------------------------------------------------------------
usage  : trig.pl [options] R1.fq R2.fq
e.g.   : trig.pl -s hsa -g trb -o test R1.fq R2.fq

option : -species  <str>    species name                 [hsa*, mmu] (*default)
         -gene     <str>    immune receptor gene         [tra, trb*, trd, trg, igh, igl, igk]
                            can also be combined, e.g.,  [tra,trb]
         -outdir   <str>    output directory name        [output*]
         -minmatch <int>    minimal match of nucmer      [15*]
         -adjolq   <int>    adjust overlap Q             [0, 1*]
         -patchq   <int>    patch missing alignment Q    [0*, 1]
         -frac     <float>  alignment length fraction    [0.5*]
         -thread   <int>    number of processors         [1*]

---------------------------------------------------------------------------------------------------------

Note: Because the genomic loci of TCRA and TCRD overlap, we use the same reference 
      sequence and VDJ annotations of the two genes when either gene is specified.
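
When TRIg2 is called from a pipeline, it can help to assemble the command line
in one place. The sketch below is a hypothetical Python wrapper, not part of
TRIg2. It uses only the option names listed above and assumes that trig.pl
accepts them in their long form as well as the abbreviated form shown in the
example.

```python
def build_trig_command(r1, r2, species="hsa", gene="trb", outdir="output", thread=1):
    """Assemble the trig.pl argument list from the documented options.

    Defaults mirror the starred defaults in the option table. Options not
    listed here (-minmatch, -adjolq, -patchq, -frac) are left to trig.pl.
    """
    return [
        "trig.pl",
        "-species", species,
        "-gene", gene,
        "-outdir", outdir,
        "-thread", str(thread),
        r1,
        r2,
    ]


if __name__ == "__main__":
    # Combined genes are written as a comma-separated string, e.g. "tra,trb".
    cmd = build_trig_command("R1.fq", "R2.fq", gene="tra,trb", outdir="test", thread=4)
    print(" ".join(cmd))
    # To actually run TRIg2 (requires trig.pl on $PATH and real FASTQ files):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names
that contain spaces.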


3. Format of output files
---------------------------------------------------------------------------------------------------------
log: documentation of parameters and run time

read.vdjdelta: alignments in delta format

read.vdjdelta:
(1) read ID
(2) sequence type [0, 1, 2]
    0: merged read
    1: read1
    2: read2
(3) sequence length
(4) reg [-1, 0, 1, 2]
   -1: alignment length fraction below the cutoff (default: 0.5)
    0: non-recombined sequence
    1: recombined sequence (but contains more than one V(D)J gene)
    2: recombined sequence (contains only one of each V(D)J gene)
(5) species and gene
(6) strand [+, -]
(7) VDJ annotation
    Three symbols (~|:) are used to indicate recombination.
    "~" : a read segment stretching from one region to another, e.g., "TRBJ2-2P~TRBJ2-3".
    "|" : a read segment aligned equally well to two regions, e.g., "TRBV6-2|TRBV6-3".
    ":" : concatenation of two regions, e.g., "TRBV27:TRBJ1-6"
(8) VDJ delta
    annotation of all aligned segments (from query start to end)
    [sp_gene]:[VDJ]:[ref_pos]:[qry_pos]:[mmgp]:[gap_pos]
(9) VDJ index in vdjdelta
    [V_index, D_index, J_index]
(10) alignment length
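
The three symbols in the VDJ annotation, field (7), can be unpacked with
nested splits. The sketch below is a hypothetical Python helper, not part of
TRIg2. It assumes that ":" is the outermost separator, followed by "~" and then
"|". The README does not state a precedence, so treat this ordering as an
assumption.

```python
def parse_vdj_annotation(annotation):
    """Split a VDJ annotation into regions, parts, and candidate genes.

    Returns one entry per ":"-joined region. Each region is a list of
    "~"-joined parts, and each part is a list of "|"-separated candidates.
    The separator precedence (":" then "~" then "|") is an assumption.
    """
    return [
        [part.split("|") for part in region.split("~")]
        for region in annotation.split(":")
    ]


if __name__ == "__main__":
    # The three example annotations given in the field description.
    for example in ("TRBJ2-2P~TRBJ2-3", "TRBV6-2|TRBV6-3", "TRBV27:TRBJ1-6"):
        print(example, "->", parse_vdj_annotation(example))
```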

read.cdr3:
(1) read ID
(2) reg [-1, 0, 1, 2] (same as read.vdjdelta)
(3) VJ combination and CDR3 sequences
(4) sequence quality
(5) protein sequence
---------------------------------------------------------------------------------------------------------
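
As an illustration of the read.cdr3 layout, the following Python sketch reads
the five fields and tallies reads by reg code. It is not part of TRIg2. The
field names are made up for this example, and the tab delimiter is an
assumption because the README does not state the column separator. The demo
lines are placeholders, not real TRIg2 output.

```python
import io
from collections import Counter

# Hypothetical field names for the five read.cdr3 columns.
CDR3_FIELDS = ("read_id", "reg", "vj_cdr3", "quality", "protein")

REG_LABELS = {
    -1: "alignment length fraction below cutoff",
    0: "non-recombined",
    1: "recombined, more than one V(D)J gene",
    2: "recombined, one of each V(D)J gene",
}


def read_cdr3(handle, sep="\t"):
    """Yield one dict per non-empty line, with `reg` converted to int.

    The separator is an assumption; pass a different `sep` if needed.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        values = line.split(sep)
        if len(values) != len(CDR3_FIELDS):
            raise ValueError(f"expected {len(CDR3_FIELDS)} fields, got {len(values)}")
        record = dict(zip(CDR3_FIELDS, values))
        record["reg"] = int(record["reg"])
        yield record


def count_reg(records):
    """Count how many records carry each reg code."""
    return Counter(record["reg"] for record in records)


if __name__ == "__main__":
    # Placeholder lines for demonstration only.
    demo = io.StringIO("r1\t2\tVJ_CDR3\tQUAL\tPROT\nr2\t0\tVJ_CDR3\tQUAL\tPROT\n")
    for reg, n in sorted(count_reg(read_cdr3(demo)).items()):
        print(reg, REG_LABELS[reg], n)
```

To process a real run, replace the demo handle with an opened read.cdr3 file
from the output directory.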