# IgBLAST for TCR

Given the FASTA file `A2-i131.fasta`, use IgBLAST to assign germline V, D, and J segments, and post-process using Change-O.

## Obtain reference sequences

Go to [http://www.imgt.org/vquest/refseqh.html](http://www.imgt.org/vquest/refseqh.html) and download the human TRBV, TRBD, and TRBJ sequences in FASTA format. Save them as `IMGT_Human_TRBV.fasta`, `IMGT_Human_TRBD.fasta`, and `IMGT_Human_TRBJ.fasta`.

(Advanced users can instead download the whole database from [here](http://www.imgt.org/download/GENE-DB/) and postprocess it.)
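
Before building the databases, it can help to confirm that each download contains what you expect. The sketch below assumes the usual IMGT header layout, in which the allele name is the second `|`-separated field. The helper names `count_fasta_records` and `imgt_allele_names` are illustrative and not part of IgBLAST or Change-O.

```bash
# Count FASTA records (header lines start with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the allele name from each IMGT header
# (assumed to be the second '|'-separated field).
imgt_allele_names() {
    awk -F'|' '/^>/ { print $2 }' "$1"
}

# Report the record count for each file that has been downloaded.
for segment in V D J; do
    fasta="IMGT_Human_TRB${segment}.fasta"
    if [ -f "$fasta" ]; then
        echo "$fasta: $(count_fasta_records "$fasta") sequences"
    fi
done
```

Running `imgt_allele_names IMGT_Human_TRBV.fasta | head` gives a quick view of the allele names that should later appear in the V calls.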

## Converting IMGT FASTA files to IgBLAST databases

In [1]:
%%bash
# Create the output directory if it does not exist yet
mkdir -p database
# V-segment database
perl ./edit_imgt_file.pl IMGT_Human_TRBV.fasta > database/human_trb_v
makeblastdb -parse_seqids -dbtype nucl -in database/human_trb_v
# D-segment database
perl ./edit_imgt_file.pl IMGT_Human_TRBD.fasta > database/human_trb_d
makeblastdb -parse_seqids -dbtype nucl -in database/human_trb_d
# J-segment database
perl ./edit_imgt_file.pl IMGT_Human_TRBJ.fasta > database/human_trb_j
makeblastdb -parse_seqids -dbtype nucl -in database/human_trb_j



Building a new DB, current time: 10/27/2016 07:30:54
New DB name:   /home/simon/Projects/aairr16/solutions/database/human_trb_v
New DB title:  database/human_trb_v
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 142 sequences in 0.027348 seconds.


Building a new DB, current time: 10/27/2016 07:30:54
New DB name:   /home/simon/Projects/aairr16/solutions/database/human_trb_d
New DB title:  database/human_trb_d
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3 sequences in 0.00134206 seconds.


Building a new DB, current time: 10/27/2016 07:30:54
New DB name:   /home/simon/Projects/aairr16/solutions/database/human_trb_j
New DB title:  database/human_trb_j
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 16 sequences in 0.00300503 seconds.
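
The log above reports 142 V, 3 D, and 16 J sequences. As a further check, the sketch below tests whether the core nucleotide database files (`.nhr`, `.nin`, `.nsq`) exist for each prefix. The helper name `check_blast_db` is hypothetical, and the check deliberately ignores the extra index files that `-parse_seqids` produces.

```bash
# Return non-zero (and report on stderr) if any core nucleotide
# BLAST database file is missing for the given prefix.
check_blast_db() {
    local prefix="$1" ext status=0
    for ext in nhr nin nsq; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            status=1
        fi
    done
    return "$status"
}

# Check the three databases built in the cell above.
for db in database/human_trb_v database/human_trb_d database/human_trb_j; do
    if check_blast_db "$db" 2>/dev/null; then
        echo "$db: OK"
    else
        echo "$db: incomplete or not built"
    fi
done
```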


## Use IgBLAST

IgBLAST has many options (see below) but the most important ones are as follows:

- germline_db_V: the V gene database
- germline_db_D: the D gene database
- germline_db_J: the J gene database
- auxiliary_data: an optional file of annotations for the germline J genes (such as the coding frame start position)
- domain_system: the system used (e.g. imgt) for defining the domains
- ig_seqtype: Ig or TCR
- organism: e.g. human, mouse
- outfmt: the output format; for postprocessing with Change-O, it must be '7 std qseq sseq btop'
- query: the input data in FASTA format
- out: the output filename
- num_threads: the number of threads to use
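
Beyond the options listed here, IgBLAST also needs its `internal_data` directory, which it looks for in the current directory or in the directory named by the `IGDATA` environment variable. The following is a minimal preflight sketch. It assumes that `optional_file/human_gl.aux` sits alongside `internal_data`; the helper name `check_igblast_inputs` is made up for illustration.

```bash
# Directory expected to hold internal_data/ and optional_file/.
# Assumption: fall back to the current directory when IGDATA is unset.
igdata_dir="${IGDATA:-.}"

# Return non-zero (and report on stderr) if an expected input is missing.
check_igblast_inputs() {
    local base="$1" status=0 path
    for path in "$base/internal_data" "$base/optional_file/human_gl.aux"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            status=1
        fi
    done
    return "$status"
}

if check_igblast_inputs "$igdata_dir"; then
    echo "IgBLAST support files found under $igdata_dir"
else
    echo "some IgBLAST support files are missing under $igdata_dir"
fi
```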

In [2]:
!igblastn -help

USAGE
  igblastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-germline_db_V germline_database_name]
    [-num_alignments_V int_value] [-germline_db_V_seqidlist filename]
    [-germline_db_D germline_database_name] [-num_alignments_D int_value]
    [-germline_db_D_seqidlist filename]
    [-germline_db_J germline_database_name] [-num_alignments_J int_value]
    [-germline_db_J_seqidlist filename] [-auxiliary_data filename]
    [-min_D_match min_D_match] [-D_penalty D_penalty]
    [-organism germline_origin] [-domain_system domain_system]
    [-ig_seqtype sequence_type] [-focus_on_V_segment] [-show_translation]
    [-db database_name] [-dbsize num_letters] [-gilist filename]
    [-seqidlist filename] [-negative_gilist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
  

Complete the following cell to run `A2-i131.fasta` against the TRB databases generated previously. Ensure that the `outfmt` option is '7 std qseq sseq btop', and save the output as `A2-i131.fmt7`.

In [2]:
%%bash
igblastn \
    -germline_db_V database/human_trb_v \
    -germline_db_D database/human_trb_d \
    -germline_db_J database/human_trb_j \
    -auxiliary_data optional_file/human_gl.aux \
    -domain_system imgt -ig_seqtype TCR -organism human \
    -outfmt '7 std qseq sseq btop' \
    -query A2-i131.fasta \
    -out A2-i131.fmt7
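
A quick sanity check after the run is to compare the number of input sequences with the number of query blocks in the output. This sketch relies on each block in the tabular output starting with a `# Query:` comment line. The helper names are illustrative.

```bash
# Count FASTA records (header lines start with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count query blocks in the IgBLAST output; each one begins
# with a '# Query:' comment line.
count_fmt7_queries() {
    grep -c '^# Query:' "$1"
}

# Only report when both files are present.
if [ -f A2-i131.fasta ] && [ -f A2-i131.fmt7 ]; then
    echo "FASTA records:   $(count_fasta_records A2-i131.fasta)"
    echo "IgBLAST queries: $(count_fmt7_queries A2-i131.fmt7)"
fi
```

If the two numbers differ, the run was probably interrupted or the output file is left over from a different input.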

## Postprocess IgBLAST

The following cells postprocess the BLAST output using Change-O.

In [14]:
%%bash
MakeDb.py igblast -i A2-i131.fmt7 -s A2-i131.fasta -r IMGT_Human_TRB[VDJ].fasta \
    --regions --scores

        START> MakeDB
      ALIGNER> IgBlast
ALIGN_RESULTS> SRR765688.fmt7
     SEQ_FILE> SRR765688.fasta
     NO_PARSE> False
 SCORE_FIELDS> True
REGION_FIELDS> True

PROGRESS> 07:54:44 [                    ]   0% (    0) 0.0 min
PROGRESS> 07:54:44 [#                   ]   5% (  129) 0.0 min
PROGRESS> 07:54:45 [##                  ]  10% (  258) 0.0 min
PROGRESS> 07:54:46 [###                 ]  15% (  387) 0.0 min
PROGRESS> 07:54:46 [####                ]  20% (  516) 0.0 min
PROGRESS> 07:54:47 [#####               ]  25% (  645) 0.1 min
PROGRESS> 07:54:47 [######              ]  30% (  774) 0.1 min
PROGRESS> 07:54:48 [#######             ]  35% (  903) 0.1 min
PROGRESS> 07:54:49 [########            ]  40% (1,032) 0.1 min
PROGRESS> 07:54:50 [#########           ]  45% (1,161) 0.1 min
PROGRESS> 07:54:50 [##########          ]  50% (1,290) 0.1 min
PROGRESS> 07:54:51 [###########         ]  55% (1,419) 0.1 min
PROGRESS> 07:54:52 [############        ]  60% (1,548) 0.1 min
PROGRESS> 07:
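
The next cell reads `A2-i131_db-pass.tab`, a tab-delimited file with one row per sequence. To see which fields `--regions` and `--scores` added, you can list the header columns. This is a small sketch; `list_columns` is an illustrative helper, not a Change-O command.

```bash
# Print the column names of a tab-delimited file, one per line.
list_columns() {
    head -n 1 "$1" | tr '\t' '\n'
}

# Only run when the MakeDb.py output is present.
if [ -f A2-i131_db-pass.tab ]; then
    list_columns A2-i131_db-pass.tab
fi
```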

In [15]:
%%bash
ParseDb.py split -d A2-i131_db-pass.tab -f FUNCTIONAL

    START> ParseDb
  COMMAND> split
     FILE> SRR765688_db-pass.tab
    FIELD> FUNCTIONAL
NUM_SPLIT> None

PROGRESS> 07:55:05 [                    ]   0% (    0) 0.0 min
PROGRESS> 07:55:05 [#                   ]   5% (  128) 0.0 min
PROGRESS> 07:55:05 [##                  ]  10% (  256) 0.0 min
PROGRESS> 07:55:05 [###                 ]  15% (  384) 0.0 min
PROGRESS> 07:55:05 [####                ]  20% (  512) 0.0 min
PROGRESS> 07:55:05 [#####               ]  25% (  640) 0.0 min
PROGRESS> 07:55:05 [######              ]  30% (  768) 0.0 min
PROGRESS> 07:55:05 [#######             ]  35% (  896) 0.0 min
PROGRESS> 07:55:05 [########            ]  40% (1,024) 0.0 min
PROGRESS> 07:55:05 [#########           ]  45% (1,152) 0.0 min
PROGRESS> 07:55:05 [##########          ]  50% (1,280) 0.0 min
PROGRESS> 07:55:05 [###########         ]  55% (1,408) 0.0 min
PROGRESS> 07:55:05 [############        ]  60% (1,536) 0.0 min
PROGRESS> 07:55:05 [#############       ]  65% (1,664) 0.0 min
PROGRESS> 
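
The split partitions the rows by the value of the `FUNCTIONAL` field. To preview how many rows fall into each group, the sketch below tallies the values of a named column in the tab-delimited file. The helper `count_by_column` is hypothetical and only needs `awk` and `sort`.

```bash
# Tally the values of a named column in a tab-delimited file.
# Prints "value<TAB>count" lines, sorted by value.
count_by_column() {
    local file="$1" column="$2"
    awk -F'\t' -v col="$column" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { counts[$idx]++ }
        END { for (value in counts) print value "\t" counts[value] }
    ' "$file" | sort
}

# Only run when the MakeDb.py output is present.
if [ -f A2-i131_db-pass.tab ]; then
    count_by_column A2-i131_db-pass.tab FUNCTIONAL
fi
```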