findpdbabs

A new program to find PDB files containing antibodies

Prerequisites

A local mirror of the PDB (you can use our ftpmirror script for this)
BiopTools installed and available in the path
Legacy BLAST installed and in the path

Configuration

Edit the file findpdbabs.conf to specify:

pdbdir - the location of your PDB mirror
faadir - the location of the FASTA equivalent of the PDB (these data are created during processing; you do not need to create them in advance)
abpdbdir - the location of a directory containing confirmed known PDB files of antibodies
dataroot - the root directory under which to store the BLAST database files and other data files

Running the software

Simply type:

   ./update.sh

This first creates a sequence database based on PDB files. It uses getpdbseqs.pl to create a sequence file in faadir for each PDB file not already processed.

It then runs the program (./findpdbabs.pl) to identify new PDB files that contain an antibody.
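The "not already processed" check in the first step can be pictured with a short Python sketch. This is an illustration only, not the code in getpdbseqs.pl: the function name, the `.ent` and `.faa` file extensions, and the assumption that a sequence file shares its base name with the PDB file are all hypothetical.

```python
from pathlib import Path


def unprocessed_pdb_files(pdbdir, faadir, pdb_ext=".ent", faa_ext=".faa"):
    """Return PDB files in pdbdir that have no sequence file in faadir.

    Assumption (not from the README): a sequence file has the same base
    name as its PDB file, differing only in the extension.
    """
    pdbdir, faadir = Path(pdbdir), Path(faadir)
    # Base names of sequence files that already exist
    done = {p.stem for p in faadir.glob(f"*{faa_ext}")}
    # Keep only PDB files whose base name has not been seen yet
    return sorted(p for p in pdbdir.glob(f"*{pdb_ext}") if p.stem not in done)
```

Running a check like this on each update means only newly mirrored PDB files need their sequences extracted.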

Preparing the data

Reference data are supplied with the program, so this shouldn't need to be repeated unless you are rebuilding the dataset.

To prepare the data files for findpdbabs to use, you will need:

A directory containing known antibody structure Fvs (e.g. from AbDb)

Ensure that abpdbdir in the config file points to this directory.

Enter the dataprep/ directory and type:

   ./builddata.sh

This obtains non-redundant FASTA files of the known antibodies, non-redundant TCR sequences and a file containing them both.

Algorithm

The algorithm identifies PDB sequences that haven't previously been examined (a DBM file is used to keep track of this) and scans each of the template sequences against them. A hit is retained if the match covers >= 80 residues and has a sequence identity of >= 0.3 with an e-value <= 0.01. As each template is run with BLAST, its result replaces the older result for a given new sequence if the new match is better.
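The retention rule can be sketched in Python as follows. This is a minimal illustration, not the logic in findpdbabs.pl: the `Hit` record and function names are hypothetical, and because the README does not define "better", the sketch assumes it means a lower e-value, with ties broken by higher identity.

```python
from dataclasses import dataclass

# Thresholds as stated in the README
MIN_ALIGN_LEN = 80    # residues
MIN_IDENTITY = 0.3    # fractional sequence identity
MAX_EVALUE = 0.01


@dataclass(frozen=True)
class Hit:
    seq_id: str       # identifier of the new PDB sequence that was hit
    template: str     # template sequence used as the BLAST query
    align_len: int    # length of the match in residues
    identity: float   # fractional sequence identity (0..1)
    evalue: float


def passes_thresholds(hit):
    """Apply the three cut-offs from the README."""
    return (hit.align_len >= MIN_ALIGN_LEN
            and hit.identity >= MIN_IDENTITY
            and hit.evalue <= MAX_EVALUE)


def is_better(new, old):
    # Assumption: "better" means lower e-value, then higher identity
    return (new.evalue, -new.identity) < (old.evalue, -old.identity)


def collect_hits(hits):
    """Keep the best qualifying hit for each new sequence."""
    best = {}
    for hit in hits:
        if not passes_thresholds(hit):
            continue
        current = best.get(hit.seq_id)
        if current is None or is_better(hit, current):
            best[hit.seq_id] = hit
    return best
```

Keying the results on the sequence identifier means each new sequence ends up with a single retained hit, however many templates matched it.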

The next stage is a BLAST search of each hit against a database of antibody and TCR sequences. If the best hit is an antibody, the sequence is kept; if it is a TCR (or the sequence doesn't match anything), it is rejected.
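The keep-or-reject decision can be sketched as below. The labels "antibody" and "tcr", the (label, e-value) tuple layout and the function names are hypothetical, and the sketch assumes the "best" hit is the one with the lowest e-value.

```python
def best_label(hits):
    """Return the label of the best hit, or None if there were no hits.

    hits is an iterable of (label, evalue) pairs.
    Assumption: the best hit is the one with the lowest e-value.
    """
    hits = list(hits)
    if not hits:
        return None
    return min(hits, key=lambda h: h[1])[0]


def is_antibody(hits):
    """Keep the sequence only if its best hit is an antibody."""
    return best_label(hits) == "antibody"
```

A sequence with no hits returns `None` from `best_label` and is therefore rejected, matching the "doesn't match anything" case.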
