# PDB Module

Biopython provides the `Bio.PDB` module for parsing, navigating, and manipulating macromolecular structures such as proteins. The PDB (Protein Data Bank) is the main public archive of experimentally determined 3D structures of proteins and nucleic acids. Besides single proteins, it hosts many complexes, including protein-protein, protein-DNA, and protein-RNA complexes.

In [1]:
# Import PDBList, which downloads entries from the PDB
from Bio.PDB.PDBList import PDBList

In [2]:
pdbl = PDBList()
pdbl

<Bio.PDB.PDBList.PDBList at 0x153c7065160>

In [3]:
# Download entry 2FAT in mmCIF format into the current directory
pdbl.retrieve_pdb_file('2FAT', pdir='.', file_format='mmCif')

Downloading PDB structure '2fat'...


'.\\2fat.cif'

In [5]:
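# Parse the downloaded mmCIF file; QUIET=True suppresses parser warnings
from Bio.PDB.MMCIFParser import MMCIFParser
parser = MMCIFParser(QUIET=True)
# retrieve_pdb_file saved the file with a lowercase name: '2fat.cif'
data = parser.get_structure("2FAT", "2fat.cif")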
print(type(data))
data

<class 'Bio.PDB.Structure.Structure'>


<Structure id=2FAT>
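An mmCIF file stores its data as `_category.item value` pairs, with tables written as `loop_` blocks. As a rough, hedged sketch of that layout, the hypothetical `parse_mmcif_items` function below (plain Python, no Biopython) collects only single-line, non-loop items; the sample text is a made-up fragment, not the real 2FAT file, and quoting is only approximated with `shlex`.

```python
import shlex

def parse_mmcif_items(text):
    """Return {'_category.item': value} for single-line, non-loop items only.

    Simplifying assumptions: loop_ tables and multi-line (semicolon-delimited)
    values are ignored, and mmCIF quoting is approximated with shlex.
    """
    items = {}
    for line in text.splitlines():
        if not line.startswith("_"):
            continue  # skips data_ headers, '#' separators, loop_ and loop rows
        tokens = shlex.split(line)
        if len(tokens) == 2:  # loop column headers carry a single token
            items[tokens[0]] = tokens[1]
    return items

# Hypothetical fragment in mmCIF style
sample = """data_2FAT
#
_entry.id   2FAT
#
_refine.ls_d_res_high   1.77
#
loop_
_atom_site.group_PDB
_atom_site.id
ATOM 1
"""
items = parse_mmcif_items(sample)
print(items["_entry.id"], float(items["_refine.ls_d_res_high"]))  # 2FAT 1.77
```

For real files, Biopython's `Bio.PDB.MMCIF2Dict.MMCIF2Dict` reads the full mmCIF dictionary, including loops and multi-line values.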

## PDB Format Parser

In [6]:
# Download the same entry in the legacy PDB format (saved as pdb2fat.ent)
pdbl.retrieve_pdb_file('2FAT', pdir='.', file_format="pdb")

Downloading PDB structure '2fat'...


'.\\pdb2fat.ent'

In [7]:
# Parse the PDB-format file; PERMISSIVE=True (the default) skips problem atoms/residues instead of raising
from Bio.PDB.PDBParser import PDBParser
parser = PDBParser(PERMISSIVE=True, QUIET=True)
data = parser.get_structure("2fat", "pdb2fat.ent")
print(type(data))
data

<class 'Bio.PDB.Structure.Structure'>


<Structure id=2fat>
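Unlike mmCIF, the PDB format is column-based: each `ATOM`/`HETATM` line places its fields at fixed character positions, and that is what a PDB-format parser has to decode. The sketch below is plain Python with a hypothetical `parse_atom_record` helper and a hand-written sample line (not taken from 2FAT); it slices out the main fields and ignores alternate locations, insertion codes, and all other record types.

```python
def parse_atom_record(line):
    """Parse one ATOM/HETATM line using the fixed PDB-format columns (0-based slices)."""
    return {
        "record": line[0:6].strip(),      # columns 1-6: ATOM or HETATM
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain_id": line[21],             # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue sequence number
        "xyz": (float(line[30:38]),       # columns 31-38: x (angstroms)
                float(line[38:46]),       # columns 39-46: y
                float(line[46:54])),      # columns 47-54: z
        "element": line[76:78].strip(),   # columns 77-78: element symbol
    }

# Hand-written, hypothetical record laid out in the fixed columns
record = "ATOM      1  N   ASP L   1      12.345  67.890  10.111  1.00 20.00           N"
atom = parse_atom_record(record)
print(atom["chain_id"], atom["res_name"], atom["name"], atom["xyz"])
```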

In [10]:
data.header.keys()

dict_keys(['name', 'head', 'idcode', 'deposition_date', 'release_date', 'structure_method', 'resolution', 'structure_reference', 'journal_reference', 'author', 'compound', 'source', 'has_missing_residues', 'missing_residues', 'biomoltrans', 'keywords', 'journal'])

In [9]:
data.header["name"]

'an anti-urokinase plasminogen activator receptor (upar) antibody: crystal structure and binding epitope'

In [11]:
data.header["release_date"]

'2006-11-14'

In [12]:
data.header["resolution"]

1.77
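In a PDB-format file, the resolution is recorded in the `REMARK   2` record. As a hedged illustration, the hypothetical helper `resolution_from_remark2` below (plain Python, hand-written sample lines) extracts it with a regular expression; entries without a resolution, such as NMR structures, state `NOT APPLICABLE` there, and the helper returns `None` for them.

```python
import re

# REMARK 2 line with a numeric resolution, e.g. "REMARK   2 RESOLUTION.    1.77 ANGSTROMS."
_REMARK2 = re.compile(r"^REMARK   2 RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS")

def resolution_from_remark2(lines):
    """Return the resolution in angstroms from REMARK 2, or None if none is given."""
    for line in lines:
        match = _REMARK2.match(line)
        if match:
            return float(match.group(1))
    return None  # e.g. "REMARK   2 RESOLUTION. NOT APPLICABLE."

print(resolution_from_remark2(["REMARK   2", "REMARK   2 RESOLUTION.    1.77 ANGSTROMS."]))  # 1.77
```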

## PDB Structure

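The 2FAT structure is composed of a single model containing two chains:
- chain L, containing 293 residues
- chain H, containing 311 residues

These counts come from `get_residues()`, which yields every residue in a chain, including waters and other hetero groups, not only amino acids.

Each residue is composed of multiple atoms, each having a 3D position given by (x, y, z) coordinates. `Bio.PDB` mirrors this as a Structure → Model → Chain → Residue → Atom hierarchy (often called SMCRA), and each level has a `get_*` method that iterates over the level below it.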
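To make the hierarchy concrete, here is a toy version of it in plain Python. The dataclasses only borrow the names of the `Bio.PDB` classes and methods; they are a simplified, hypothetical sketch, not Biopython's implementation. Each `get_*` method is a generator over the level below, and `Structure.get_atoms` walks every level.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Atom:
    name: str
    coord: Tuple[float, float, float]

@dataclass
class Residue:
    resname: str
    atoms: List[Atom] = field(default_factory=list)

    def get_atoms(self) -> Iterator[Atom]:
        yield from self.atoms

@dataclass
class Chain:
    id: str
    residues: List[Residue] = field(default_factory=list)

    def get_residues(self) -> Iterator[Residue]:
        yield from self.residues

@dataclass
class Model:
    id: int
    chains: List[Chain] = field(default_factory=list)

    def get_chains(self) -> Iterator[Chain]:
        yield from self.chains

@dataclass
class Structure:
    id: str
    models: List[Model] = field(default_factory=list)

    def get_models(self) -> Iterator[Model]:
        yield from self.models

    def get_atoms(self) -> Iterator[Atom]:
        # Walk model -> chain -> residue -> atom
        for model in self.get_models():
            for chain in model.get_chains():
                for residue in chain.get_residues():
                    yield from residue.get_atoms()

# Hypothetical toy data: one model with chains L and H, one residue each
asp = Residue("ASP", [Atom("N", (0.0, 0.0, 0.0)), Atom("CA", (1.5, 0.0, 0.0))])
gly = Residue("GLY", [Atom("N", (5.0, 0.0, 0.0))])
toy = Structure("toy", [Model(0, [Chain("L", [asp]), Chain("H", [gly])])])

print([c.id for c in next(toy.get_models()).get_chains()])  # ['L', 'H']
print(sum(1 for _ in toy.get_atoms()))                      # 3
```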

### Model

In [13]:
model = data.get_models()
model

<generator object Structure.get_models at 0x00000153DD516B00>

In [14]:
models = list(model)
models

[<Model id=0>]

In [15]:
type(models[0])

Bio.PDB.Model.Model
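`get_models()` returns a generator (see the output above), which is why the models were wrapped in `list()`. A generator can be consumed only once, so iterating over it a second time yields nothing. The sketch below shows this with a hypothetical stand-in function rather than Biopython itself.

```python
def get_models():
    """Hypothetical stand-in for Structure.get_models(): yields model ids lazily."""
    yield 0

models_gen = get_models()
first_pass = list(models_gen)   # consumes the generator
second_pass = list(models_gen)  # nothing left to yield
print(first_pass, second_pass)  # [0] []
```

`Bio.PDB` entities are also iterable and indexable by id, so `data[0]` returns model 0 and `data[0]["L"]` returns chain L without building intermediate lists.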

### Chain

In [16]:
chains = list(models[0].get_chains())
chains

[<Chain id=L>, <Chain id=H>]

In [17]:
type(chains[0])

Bio.PDB.Chain.Chain

### Residue

In [18]:
residue = list(chains[0].get_residues())
len(residue)

293

In [19]:
residue1 = list(chains[1].get_residues())
len(residue1)

311
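Because `get_residues()` also yields waters and other hetero groups, these counts are not pure amino-acid counts. In `Bio.PDB`, each residue's `id` is a tuple `(hetero_flag, sequence_number, insertion_code)`, where the flag is `" "` for standard residues, `"W"` for water, and `"H_"` plus the residue name for other hetero groups. The plain-Python sketch below classifies a list of hypothetical ids by that flag; with Biopython, the same test would be applied to `residue.id[0]` for each residue in `chains[0]`.

```python
from collections import Counter

def residue_kind(res_id):
    """Classify a Bio.PDB-style residue id tuple by its hetero flag (first element)."""
    hetero_flag = res_id[0]
    if hetero_flag == " ":
        return "standard"
    if hetero_flag == "W":
        return "water"
    return "hetero"  # e.g. "H_SO4" for a sulfate ion

# Hypothetical ids: (hetero_flag, sequence_number, insertion_code)
ids = [(" ", 1, " "), (" ", 2, " "), (" ", 27, "A"), ("W", 301, " "), ("H_SO4", 401, " ")]
counts = Counter(residue_kind(i) for i in ids)
print(counts["standard"], counts["water"], counts["hetero"])  # 3 1 1
```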

### Atoms

In [20]:
atoms = list(residue[0].get_atoms())
atoms

[<Atom N>,
 <Atom CA>,
 <Atom C>,
 <Atom O>,
 <Atom CB>,
 <Atom CG>,
 <Atom OD1>,
 <Atom OD2>]

In [21]:
# get_vector() returns the atom's 3D coordinates (x, y, z) as a Bio.PDB Vector object
atoms[0].get_vector()

<Vector 18.49, 73.26, 44.16>
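Since each atom carries (x, y, z) coordinates in angstroms, the distance between two atoms is the ordinary Euclidean distance. A minimal sketch with Python's `math.dist` (Python 3.8+), using the N coordinate printed above and a hypothetical second atom:

```python
import math

def distance(a, b):
    """Euclidean distance between two (x, y, z) points; PDB coordinates are in angstroms."""
    return math.dist(a, b)

n_xyz = (18.49, 73.26, 44.16)      # N of the first residue in chain L, as printed above
other_xyz = (19.69, 73.26, 45.06)  # hypothetical second atom
print(round(distance(n_xyz, other_xyz), 2))  # 1.5
```

In `Bio.PDB` itself, subtracting two `Atom` objects (`atom_a - atom_b`) returns the distance between them.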