# IgBLAST

Given the following sequence file (FASTQ, converted to FASTA below), use IgBLAST to assign germline V, D and J segments to each read, then post-process the results with Change-O.

## Set up IgBLAST

In [1]:
%%bash
# Download IgBLAST's internal_data and optional_file directories into the working directory
wget -r -nH --cut-dirs=4 --no-parent ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data
wget -r -nH --cut-dirs=4 --no-parent ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/optional_file

--2016-10-22 06:58:05--  ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data
           => ‘.listing’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 130.14.250.11, 2607:f220:41e:250::13
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|130.14.250.11|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /blast/executables/igblast/release ... done.
==> PASV ... done.    ==> LIST ... done.

     0K                                                        15.6K=0.04s

2016-10-22 06:58:07 (15.6 KB/s) - ‘.listing’ saved [599]

Removed ‘.listing’.
--2016-10-22 06:58:07--  ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/internal_data/internal_data
           => ‘internal_data/.listing’
==> CWD (1) /blast/executables/igblast/release/internal_data ... done.
==> PASV ... done.    ==> LIST ... done.

     0K                                                         227K=0.001s

2016-10-22 06:58:0
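igblastn reads its per-organism germline data from `internal_data/` and, in this notebook, an auxiliary file from `optional_file/` passed with `-auxiliary_data`. As a quick sanity check after the download, the hypothetical helper below (not part of IgBLAST) lists which of those two paths are missing under a given base directory. The `<organism>_gl.aux` naming is an assumption generalised from the `human_gl.aux` file used later.

```python
from pathlib import Path


def missing_igblast_files(base, organism="human"):
    """Return the IgBLAST support paths used in this notebook that are missing under `base`.

    Hypothetical helper (not part of IgBLAST). It checks only:
      internal_data/<organism>/          per-organism data read by igblastn
      optional_file/<organism>_gl.aux    file passed via -auxiliary_data
    """
    base = Path(base)
    required = [
        base / "internal_data" / organism,
        base / "optional_file" / f"{organism}_gl.aux",
    ]
    # Report paths relative to base so the result is easy to read.
    return [str(path.relative_to(base)) for path in required if not path.exists()]
```

For example, `missing_igblast_files('.')` should return an empty list once both wget commands have finished. If igblastn is run from another directory, the `IGDATA` environment variable can be pointed at the directory that contains `internal_data/`.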

In [4]:
%%bash
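# Build one BLAST database per segment type; edit_imgt_file.pl (distributed with IgBLAST) first rewrites the IMGT FASTA headers
mkdir -p database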
# V-segment database
perl ./edit_imgt_file.pl IMGT_Human_IGHV.fasta > database/human_igh_v
makeblastdb -parse_seqids -dbtype nucl -in database/human_igh_v
# D-segment database
perl ./edit_imgt_file.pl IMGT_Human_IGHD.fasta > database/human_igh_d
makeblastdb -parse_seqids -dbtype nucl -in database/human_igh_d
# J-segment database
perl ./edit_imgt_file.pl IMGT_Human_IGHJ.fasta > database/human_igh_j
makeblastdb -parse_seqids -dbtype nucl -in database/human_igh_j



Building a new DB, current time: 10/22/2016 07:01:03
New DB name:   /home/simon/Projects/aairr16/solutions/database/human_igh_v
New DB title:  database/human_igh_v
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 350 sequences in 0.00901294 seconds.


Building a new DB, current time: 10/22/2016 07:01:03
New DB name:   /home/simon/Projects/aairr16/solutions/database/human_igh_d
New DB title:  database/human_igh_d
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/simon/Projects/aairr16/solutions/database/human_igh_d
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 44 sequences in 0.00116992 seconds.


Building a new DB, current time: 10/22/2016 07:01:03
New DB name:   /home/simon/Projects/aairr16/solutions/database/human_igh_j
New DB title:  database/human_igh_j
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named
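The exact rules in `edit_imgt_file.pl` are not shown here. To illustrate the kind of clean-up it is used for, the sketch below assumes IMGT-style headers of the form `>ACCESSION|ALLELE|species|...`, keeps only the allele name (e.g. `IGHV1-2*01`) as the sequence ID, strips IMGT gap dots and upper-cases the bases. It also rejects duplicate IDs, since sequence IDs are expected to be unique when `makeblastdb` is run with `-parse_seqids`. The function name is hypothetical; the Perl script remains what builds the real databases.

```python
def imgt_to_blast_fasta(lines):
    """Hypothetical sketch: rewrite IMGT germline FASTA lines for makeblastdb.

    Assumes IMGT-style headers such as
    '>X62106|IGHV1-2*01|Homo sapiens|F|V-REGION|...'.
    Returns the cleaned FASTA lines without trailing newlines.
    """
    out = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split("|")
            # Keep the allele name (second '|' field) as the sequence ID.
            name = (fields[1] if len(fields) > 1 else fields[0]).strip()
            if name in seen:
                raise ValueError(f"duplicate sequence ID: {name}")
            seen.add(name)
            out.append(">" + name)
        else:
            # Drop IMGT gap characters ('.') and normalise case.
            out.append(line.replace(".", "").upper())
    return out
```

Usage: `with open('IMGT_Human_IGHV.fasta') as fh: cleaned = imgt_to_blast_fasta(fh)`.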

## Convert FASTQ to FASTA

In [8]:
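from Bio import SeqIO
# Returns the number of records converted. The IgBLAST step below queries SRR765688.fasta, so use one file name in both steps.
SeqIO.convert('S43_atleast-2.fastq', 'fastq', 'S43_atleast-2.fasta', 'fasta')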

100
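`SeqIO.convert` returns the number of records it converted, which is the `100` shown above. For readers without Biopython, a minimal stdlib-only sketch of the same conversion follows. It assumes unwrapped FASTQ (exactly four lines per record), which Biopython does not require, and the function name is hypothetical.

```python
def fastq_to_fasta(fastq_lines):
    """Convert 4-line FASTQ records to FASTA lines; return (fasta_lines, record_count).

    Assumes unwrapped FASTQ: header, sequence, '+' separator, quality.
    """
    lines = [l.rstrip("\n") for l in fastq_lines if l.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ line count is not a multiple of 4")
    fasta = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Basic structural checks on each record.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        fasta.append(">" + header[1:])  # '@id desc' becomes '>id desc'
        fasta.append(seq)
    return fasta, len(lines) // 4
```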

## Use IgBLAST

In [12]:
%%bash
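# Assign germline segments; outfmt '7 std qseq sseq btop' is the IgBLAST format Change-O's MakeDb.py expects
igblastn \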
    -germline_db_V database/human_igh_v \
    -germline_db_D database/human_igh_d \
    -germline_db_J database/human_igh_j \
    -auxiliary_data optional_file/human_gl.aux \
    -domain_system imgt -ig_seqtype Ig -organism human \
    -outfmt '7 std qseq sseq btop' \
    -query SRR765688.fasta \
    -out SRR765688.fmt7

## Post-process IgBLAST output

In [14]:
%%bash
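# Parse the IgBLAST output into a Change-O tab file; --regions adds FWR/CDR columns, --scores adds alignment score columns
MakeDb.py igblast -i SRR765688.fmt7 -s SRR765688.fasta -r IMGT_Human_IGH[VDJ].fasta \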
    --regions --scores

        START> MakeDB
      ALIGNER> IgBlast
ALIGN_RESULTS> SRR765688.fmt7
     SEQ_FILE> SRR765688.fasta
     NO_PARSE> False
 SCORE_FIELDS> True
REGION_FIELDS> True

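Once MakeDb.py has written `SRR765688_db-pass.tab`, a simple first summary is V gene usage. The sketch below counts, per record, the first gene in the V call with its allele suffix removed. It assumes the tab-delimited Change-O layout with an upper-case `V_CALL` column (matching the upper-case `FUNCTIONAL` field used below) and comma-separated tied calls; for AIRR-format output, pass `column='v_call'`. The function name is hypothetical.

```python
import csv
import re
from collections import Counter

# Gene name such as 'IGHV3-23', stopping before the '*01' allele suffix.
GENE_RE = re.compile(r"IGH[VDJ][^*,\s]*")


def v_gene_usage(tab_lines, column="V_CALL"):
    """Count the first gene called per record in a Change-O tab file."""
    counts = Counter()
    for row in csv.DictReader(tab_lines, delimiter="\t"):
        m = GENE_RE.search(row.get(column) or "")
        if m:  # records with an empty call are skipped
            counts[m.group(0)] += 1
    return counts
```

Usage: `with open('SRR765688_db-pass.tab') as fh: print(v_gene_usage(fh).most_common(10))`.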

In [15]:
%%bash
# Split the records into one file per FUNCTIONAL value
ParseDb.py split -d SRR765688_db-pass.tab -f FUNCTIONAL

    START> ParseDb
  COMMAND> split
     FILE> SRR765688_db-pass.tab
    FIELD> FUNCTIONAL
NUM_SPLIT> None

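`ParseDb.py split` writes one file per distinct value of the chosen field, so the command above separates records by their `FUNCTIONAL` annotation. An in-memory sketch of the same idea with the standard `csv` module follows. The helper names are hypothetical, and the `<prefix>_<FIELD>-<value>.tab` file names used by `write_groups` are an assumption modelled on Change-O's naming, not taken from its code.

```python
import csv
from collections import defaultdict


def split_by_field(tab_lines, field="FUNCTIONAL"):
    """Group rows of a tab-delimited file by `field`; return (fieldnames, {value: [rows]})."""
    reader = csv.DictReader(tab_lines, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[field]].append(row)
    return reader.fieldnames, dict(groups)


def write_groups(fieldnames, groups, prefix, field="FUNCTIONAL"):
    """Write each group to '<prefix>_<field>-<value>.tab' (assumed naming); return the paths."""
    paths = []
    for value, rows in sorted(groups.items()):
        path = f"{prefix}_{field}-{value}.tab"
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                    delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths
```

Keeping the grouping separate from the writing makes it easy to inspect group sizes (e.g. functional vs. non-functional counts) before anything is written to disk.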