Web Crawler to Extract data from PDB(Protein Data Bank) Files

akasha147/PDB-Scrapper

PDB-Scrapper

A web crawler that extracts data from PDB (Protein Data Bank) files.

Software requirements (on Fedora Linux):

  • Perl
  • Required modules (can be installed using CPAN):
    1. DBI (for connecting to the database)
    2. LWP::Simple (for web crawling)
    3. IO::String (for web crawling)
  • MySQL Server

Features of the crawler

  • The script extracts the following information:
  1. Experiment type (e.g. X-ray diffraction, NMR)
  2. Protein type (e.g. lectin)
  3. Resolution of the structure
  4. R-factor
  • For each individual chain in a structure, the code:
  1. Determines the type of the chain (protein/DNA/RNA)
  2. Extracts the primary sequence (from the FASTA file)
  • The code discards the extracted data under the following conditions:
  1. The chain contains any unknown residue
  2. There are no protein chains in the structure (only DNA and/or RNA)
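
The features above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the repository's Perl code. All function names are hypothetical. It assumes the header fields are read from the standard PDB flat-file records (HEADER, EXPDTA, REMARK 2, REMARK 3), that the chain type is guessed from the letters in the FASTA sequence, that an unknown residue appears as `X` in a protein chain or `N` in a nucleic acid chain, and that one unknown residue discards the whole structure. The actual script may do any of these differently.

```python
import re

# Assumed record layouts; the REMARK 3 wording varies between refinement programs.
RESOLUTION_RE = re.compile(r"REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")
R_VALUE_RE = re.compile(r"REMARK\s+3\s+R VALUE\s+\(WORKING SET\)\s*:\s*(\d+(?:\.\d+)?)")

# Alphabets used by the heuristic; N stands for an unknown nucleotide.
DNA_LETTERS = set("ACGTN")
RNA_LETTERS = set("ACGUN")


def parse_header_fields(pdb_text):
    """Pull protein type, experiment type, resolution and R-factor from PDB text."""
    fields = {"protein_type": None, "experiment": None,
              "resolution": None, "r_factor": None}
    for line in pdb_text.splitlines():
        if line.startswith("HEADER"):
            # Classification occupies columns 11-50 of the HEADER record.
            fields["protein_type"] = line[10:50].strip()
        elif line.startswith("EXPDTA"):
            fields["experiment"] = line[10:].strip()
        elif fields["resolution"] is None and RESOLUTION_RE.match(line):
            fields["resolution"] = float(RESOLUTION_RE.match(line).group(1))
        elif fields["r_factor"] is None and R_VALUE_RE.match(line):
            fields["r_factor"] = float(R_VALUE_RE.match(line).group(1))
    return fields


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    chains, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            chains[header] = ""
        elif header is not None and line:
            chains[header] += line.upper()
    return chains


def chain_type(seq):
    """Guess 'DNA', 'RNA' or 'Protein' from the letters in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    letters = set(seq)
    if letters <= DNA_LETTERS:
        return "DNA"
    if letters <= RNA_LETTERS:
        return "RNA"
    return "Protein"


def has_unknown(seq, kind):
    """True if the chain contains the unknown-residue letter for its type."""
    return ("X" if kind == "Protein" else "N") in seq


def keep_structure(chains):
    """Apply both discard rules; return {chain: type}, or None if discarded."""
    typed = {name: chain_type(seq) for name, seq in chains.items()}
    if any(has_unknown(chains[name], kind) for name, kind in typed.items()):
        return None  # rule 1: a chain contains an unknown residue
    if "Protein" not in typed.values():
        return None  # rule 2: only DNA and/or RNA chains
    return typed
```

Calling `keep_structure(parse_fasta(fasta_text))` returns a chain-to-type mapping for structures that pass both rules and `None` otherwise. The alphabet heuristic is a simplification: a short peptide made up only of A, C, G and T would be misclassified as DNA, so a real implementation would more likely rely on the molecule type stated in the FASTA header or in the PDB file itself.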
