Code to identify PDB files of antibodies using forward and reverse BLAST searches

AndrewCRMartin/findpdbabs

findpdbabs

A new program to find PDB files containing antibodies

Prerequisites

  • A local mirror of the PDB (you can use our ftpmirror script for this)
  • BiopTools installed and available in the path
  • Legacy BLAST installed and available in the path

Configuration

Edit the file findpdbabs.conf to specify:

  • pdbdir - the location of your PDB mirror
  • faadir - the location of the FASTA equivalent of the PDB (this is created during processing, so you do not need to create it beforehand)
  • abpdbdir - the location of a directory containing confirmed known PDB files of antibodies
  • dataroot - the root directory in which to store the BLAST database files and other data files

Running the software

Simply type:

   ./update.sh

This first creates a sequence database based on PDB files. It uses getpdbseqs.pl to create a sequence file in faadir for each PDB file not already processed.
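As an illustration of the "not already processed" check, here is a minimal Python sketch. It is not the code in getpdbseqs.pl or update.sh. The function name `unprocessed` and the file-naming pattern (`pdbXXXX.ent` paired with `pdbXXXX.faa`) are assumptions made for the example only.

```python
from pathlib import PurePath


def unprocessed(pdb_files, faa_files):
    """Return the PDB files that have no matching sequence file yet.

    ASSUMPTION: a PDB file and its sequence file share the same base
    name and differ only in extension. The real naming scheme may differ.
    """
    done = {PurePath(name).stem for name in faa_files}
    return [name for name in pdb_files if PurePath(name).stem not in done]


# Hypothetical file names, for illustration only.
pending = unprocessed(
    ["pdb1abc.ent", "pdb2xyz.ent", "pdb3def.ent"],
    ["pdb1abc.faa"],
)
print(pending)  # ['pdb2xyz.ent', 'pdb3def.ent']
```

Only the files returned here would need a new sequence file, which is what keeps repeated runs of the update cheap.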

It then runs the program (./findpdbabs.pl) to identify new PDB files that contain an antibody.

Preparing the data

Reference data are supplied with the program, so this shouldn't need to be repeated unless you are rebuilding the dataset.

To prepare the data files for findpdbabs to use, you will need:

  • A directory containing known antibody structure Fvs (e.g. from AbDb)

Ensure that abpdbdir in the config file points to this directory.

Enter the dataprep/ directory and type:

   ./builddata.sh

This obtains non-redundant FASTA files of the known antibodies, non-redundant TCR sequences and a file containing them both.

Algorithm

The algorithm first identifies PDB sequences that have not previously been examined (a DBM file keeps track of this) and scans each of the template sequences against them. A hit is retained if the match spans at least 80 residues, has a sequence identity of >= 0.3 and has an e-value <= 0.01. As each template is run with BLAST, its result replaces any older result for the same new sequence if the new match is better.
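The retention rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic of findpdbabs.pl: the `Hit` record, the function names and the example identifiers are hypothetical. The README does not say what makes one match "better" than another, so this sketch assumes it means a lower e-value.

```python
from dataclasses import dataclass

# Thresholds taken from the description above.
MIN_LENGTH = 80      # residues covered by the match
MIN_IDENTITY = 0.3   # fractional sequence identity
MAX_EVALUE = 0.01


@dataclass
class Hit:
    target: str      # new PDB sequence that was hit
    template: str    # template sequence used as the query
    length: int
    identity: float
    evalue: float


def passes(hit):
    """Apply the three retention thresholds."""
    return (hit.length >= MIN_LENGTH
            and hit.identity >= MIN_IDENTITY
            and hit.evalue <= MAX_EVALUE)


def is_better(new, old):
    # ASSUMPTION: "better" means a lower e-value.
    return new.evalue < old.evalue


def collect(hits):
    """Keep the best passing hit for each target sequence."""
    best = {}
    for hit in hits:
        if not passes(hit):
            continue
        old = best.get(hit.target)
        if old is None or is_better(hit, old):
            best[hit.target] = hit
    return best


# Hypothetical hits, for illustration only.
hits = [
    Hit("1abc_H", "tplA", 110, 0.55, 1e-20),
    Hit("1abc_H", "tplB", 115, 0.70, 1e-35),  # replaces the tplA result
    Hit("2xyz_A", "tplA", 60, 0.90, 1e-10),   # too short, dropped
    Hit("3def_L", "tplB", 100, 0.25, 1e-5),   # identity too low, dropped
]
best = collect(hits)
print(sorted(best))            # ['1abc_H']
print(best["1abc_H"].template)  # tplB
```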

The next stage is a BLAST search of each hit against a database of antibody and TCR sequences. If the best hit is an antibody, the sequence is kept; if it is a TCR (or nothing matches), it is rejected.
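The keep-or-reject decision of this reverse search can be sketched in Python as below. The function name `classify`, the `(kind, evalue)` tuples and the labels `"antibody"` and `"tcr"` are hypothetical. The sketch also assumes that the best hit is the one with the lowest e-value, which the text does not state.

```python
def classify(reverse_hits):
    """Decide whether a candidate sequence is kept.

    reverse_hits is a list of (kind, evalue) pairs, where kind is
    "antibody" or "tcr" (hypothetical labels for this sketch).
    """
    if not reverse_hits:
        # No match against the antibody/TCR database: rejected.
        return "rejected"
    # ASSUMPTION: the best hit is the one with the lowest e-value.
    kind, _ = min(reverse_hits, key=lambda hit: hit[1])
    return "kept" if kind == "antibody" else "rejected"


print(classify([("antibody", 1e-40), ("tcr", 1e-12)]))  # kept
print(classify([("antibody", 1e-10), ("tcr", 1e-30)]))  # rejected
print(classify([]))                                     # rejected
```

Including TCR sequences in the reverse database gives close non-antibody relatives something to match better than an antibody, which is how they are filtered out.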
