Web Crawler to Extract Data from PDB (Protein Data Bank) Files
- Requirements:
  - Perl
  - Perl modules (can be installed using CPAN):
    - DBI (for connecting to the database; connecting to MySQL also needs the DBD::mysql driver)
    - LWP::Simple (for web crawling)
    - IO::String (for web crawling)
  - MySQL Server
- The script extracts the following information:
  - Experiment type (e.g. X-ray diffraction, NMR)
  - Protein type (e.g. lectin)
  - Resolution of the structure
  - R-factor
- For each individual chain in a structure, the script:
  - Determines the chain type (protein, DNA or RNA)
  - Extracts the primary sequence (from the FASTA file)
- The script discards the extracted data if:
  - A chain contains any unknown residue
  - The structure contains no protein chains (only DNA and/or RNA)
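The header fields listed above (experiment type, protein type, resolution, R-factor) sit in fixed record types of a PDB-format file. The sketch below is not the script's own code: it assumes the "protein type" is the classification in the HEADER record and the R-factor is the working-set R value from REMARK 3, and `parse_pdb_header` is a hypothetical name. It uses only core Perl, so it works on text that has already been downloaded, for example the string returned by LWP::Simple's `get`.

```perl
use strict;
use warnings;

# Hypothetical helper: pull header fields out of PDB-format text.
# Field locations follow the fixed-width PDB format:
#   HEADER cols 11-50 = classification (assumed to be the "protein type")
#   EXPDTA             = experimental technique
#   REMARK   2         = resolution in angstroms (absent for NMR entries)
#   REMARK   3         = refinement statistics; the working-set R value is
#                        taken as the R-factor (an assumption)
sub parse_pdb_header {
    my ($text) = @_;
    my %info;
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^HEADER {4}(.{1,40})/) {
            ($info{classification} = $1) =~ s/\s+$//;   # trim column padding
        }
        elsif ($line =~ /^EXPDTA\s+(.+?)\s*$/) {
            $info{experiment} = $1;
        }
        elsif ($line =~ /^REMARK   2 RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS/) {
            $info{resolution} = $1;
        }
        elsif (!defined $info{r_factor}
               && $line =~ /^REMARK   3   R VALUE\s+\(WORKING SET\)\s*:\s*([\d.]+)/) {
            $info{r_factor} = $1;                      # keep the first match only
        }
    }
    return \%info;
}

# Usage with made-up records laid out in the fixed PDB columns.
my $sample = join "\n",
    sprintf('HEADER    %-40s%9s   %4s', 'LECTIN', '01-JAN-00', 'XXXX'),
    'EXPDTA    X-RAY DIFFRACTION',
    'REMARK   2 RESOLUTION.    2.00 ANGSTROMS.',
    'REMARK   3   R VALUE            (WORKING SET) : 0.193';
my $info = parse_pdb_header($sample);
print join(' | ', @{$info}{qw(experiment classification resolution r_factor)}), "\n";
# prints: X-RAY DIFFRACTION | LECTIN | 2.00 | 0.193
```

For NMR entries REMARK 2 reads `RESOLUTION. NOT APPLICABLE.`, so `resolution` is simply left undefined rather than guessed.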
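Extracting the primary sequence of each chain from the FASTA file can be sketched as follows. It assumes RCSB-style FASTA headers such as `>XXXX_1|Chains A, B|...`, where one record covers every chain that shares the sequence; `parse_fasta` and `chain_ids` are hypothetical helpers, and author-chain annotations like `A[auth B]` are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split FASTA text into [header, sequence] pairs in file
# order. Wrapped sequence lines are joined and whitespace is removed.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(.*)/) {
            push @records, [ $1, '' ];        # start a new record
        }
        elsif (@records) {
            (my $seq = $line) =~ s/\s+//g;
            $records[-1][1] .= $seq;          # append to the current record
        }
    }
    return @records;
}

# Hypothetical helper: read chain IDs from an assumed "|Chain A|" or
# "|Chains A, B|" header field; returns an empty list if none is found.
sub chain_ids {
    my ($header) = @_;
    return () unless $header =~ /\|Chains? ([^|]+)/;
    return split /,\s*/, $1;
}

# Usage with a made-up FASTA file.
my $fasta = <<'FASTA';
>XXXX_1|Chains A, B|HYPOTHETICAL PROTEIN|synthetic construct
MKTAYIAKQR
QISFVKSHFS
>XXXX_2|Chain C|HYPOTHETICAL DNA|synthetic construct
ACGTACGT
FASTA

for my $rec (parse_fasta($fasta)) {
    my ($header, $seq) = @$rec;
    print join(',', chain_ids($header)), "\t$seq\n";
}
```

Keeping the header alongside the sequence lets each chain ID be mapped back to its sequence before the data is written to MySQL.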
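Chain typing and the two discard rules can be sketched from the SEQRES records of the PDB file, since FASTA letters alone cannot separate DNA (A, C, G, T) from protein. This is a heuristic, not the original script's method: it assumes DNA chains use only DA/DC/DG/DT, RNA chains only A/C/G/U, treats `UNK` (the PDB code for an unknown amino acid) as the unknown-residue marker, and discards the whole entry when any chain contains one. `parse_seqres`, `chain_type` and `keep_entry` are hypothetical names.

```perl
use strict;
use warnings;

# Assumed residue-name sets (PDB chemical component codes).
my %DNA = map { $_ => 1 } qw(DA DC DG DT);   # deoxyribonucleotides
my %RNA = map { $_ => 1 } qw(A C G U);       # ribonucleotides
my $UNKNOWN = 'UNK';                         # unknown amino acid

# Collect SEQRES residue names per chain: the chain ID is column 12 and
# residue names start at column 20 (fixed-width PDB format).
sub parse_seqres {
    my ($text) = @_;
    my %chains;
    for my $line (split /\r?\n/, $text) {
        next unless $line =~ /^SEQRES/ && length($line) > 19;
        my $chain = substr($line, 11, 1);
        push @{ $chains{$chain} }, split ' ', substr($line, 19);
    }
    return \%chains;
}

# Heuristic: all deoxynucleotides => DNA, all ribonucleotides => RNA,
# anything else => Protein.
sub chain_type {
    my @res = @_;
    return 'DNA' if @res && !grep { !$DNA{$_} } @res;
    return 'RNA' if @res && !grep { !$RNA{$_} } @res;
    return 'Protein';
}

# Discard rules (assumed to apply to the whole entry): reject it if any
# chain has an unknown residue, or if no chain is a protein chain.
sub keep_entry {
    my ($chains) = @_;
    my $has_protein = 0;
    for my $id (sort keys %$chains) {
        my @res = @{ $chains->{$id} };
        return 0 if grep { $_ eq $UNKNOWN } @res;
        $has_protein = 1 if chain_type(@res) eq 'Protein';
    }
    return $has_protein;
}

# Usage with made-up SEQRES records built to the fixed column layout.
my $pdb = join "\n",
    sprintf('SEQRES %3d %s %4d  %s', 1, 'A', 4, 'MET LYS THR ALA'),
    sprintf('SEQRES %3d %s %4d  %s', 1, 'B', 4, ' DA  DC  DG  DT');
my $chains = parse_seqres($pdb);
printf "%s: %s\n", $_, chain_type(@{ $chains->{$_} }) for sort keys %$chains;
print keep_entry($chains) ? "kept\n" : "discarded\n";
```

Modified nucleotides (any name outside the two sets) make a nucleic-acid chain fall through to "Protein", so a production version would need a fuller residue table.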