# Test notebook for structure checking

A complete checking analysis of a single structure follows.  
Use `.revert_changes()` to recover the original structure.

#### Setting python paths

#### Basic imports

In [1]:
import nglview as nv
import biobb_structure_checking
import biobb_structure_checking.constants as cts
from biobb_structure_checking.structure_checking import StructureChecking
base_dir_path = biobb_structure_checking.__path__[0]


Setting default data and paths

In [2]:
args = cts.set_defaults(base_dir_path, {'notebook': True})

General help

In [3]:
with open(args['commands_help_path']) as help_file:
    print(help_file.read())


BioBB's check_structure.py performs MDWeb structure checking set as a command line
utility.

commands:     Help on available commands
command_list: Run all tests from conf file
checkall:     Perform all checks without fixes
fixall:       Performs a default fix (v1.1)
load:         Stores structure on local cache and provide basic statistics

1. System Configuration
models [--select model_num]
    Detect/Select Models
chains [--select chain_ids]
    Detect/Select Chains
inscodes [--renum]
    Detects residues with insertion codes. --renum (v1.1) renumbers residues
altloc [--select occupancy| alt_id | list of res_id:alt_id]
    Detect/Select Alternative Locations
metals [--remove All | None | Met_ids_list | Residue_list]
    Detect/Remove Metals
ligands [--remove All | None | Res_type_list | Residue_list]
    Detect/Remove Ligands
hetatm [--remove All | None | Res_type_list | Residue_list] (v1.1)
    Detect/Remove Ligands, revert modified residues
water [--remove Yes|No]
    Remove Wate

Set the input (a PDB entry or a local file, in PDB or mmCIF format) and the output (a local file in PDB format).  
Use `pdb:pdbid` to download the structure from the PDB (RCSB).

In [4]:
args['input_structure_path'] = 'pdb:2ki5'
args['output_structure_path'] = 'test.pdb'
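
As a rough illustration of the `pdb:pdbid` convention, the sketch below separates a PDB identifier from a local file path. `parse_input_spec` is a hypothetical helper written for this notebook; it is not part of biobb_structure_checking, whose own parsing may differ.

```python
def parse_input_spec(spec):
    """Split an input specification into (source, value).

    Hypothetical helper: 'pdb:<id>' is treated as a PDB download request,
    anything else as a local file path.
    """
    prefix = 'pdb:'
    if spec.lower().startswith(prefix):
        return 'pdb', spec[len(prefix):].upper()
    return 'file', spec


print(parse_input_spec('pdb:2ki5'))   # ('pdb', '2KI5')
print(parse_input_spec('test.pdb'))   # ('file', 'test.pdb')
```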

Initializing checking engine, loading structure and showing statistics

In [5]:

st_c = StructureChecking(base_dir_path, args)


Structure exists: 'tmpPDB/ki/2ki5.cif' 
Structure pdb:2ki5 loaded
 PDB id: 2KI5
 Title: HERPES SIMPLEX TYPE-1 THYMIDINE KINASE IN COMPLEX WITH THE DRUG ACICLOVIR AT 1.9A RESOLUTION
 Experimental method: X-RAY DIFFRACTION
 Keywords: TRANSFERASE
 Resolution: 1.90 A

 Num. models: 1
 Num. chains: 2 (A: Protein, B: Protein)
 Num. residues:  908
 Num. residues with ins. codes:  0
 Num. HETATM residues:  296
 Num. ligands or modified residues:  4
 Num. water mol.:  292
 Num. atoms:  4961
Small mol ligands found
SO4 A4
AC2 A1
SO4 B3
AC2 B2



#### models
Checks for the presence of models in the structure.  
MD simulations require a single structure. Some structures (e.g. biounits), however, are defined as a series of models, and in that case all of them are usually required.  
Use models('--select N') to select model number N for further analysis.

In [6]:
st_c.models()

Running models.
1 Model(s) detected
Single model found
Running  check_only. Nothing else to do.


#### chains
Checks for chains (also reported by print_stats) and allows selecting one or more of them.  
MD simulations are usually performed with complete structures. However, the input structure may contain several copies of the system, or additional chains such as peptides or nucleic acids that may be removed.  
Use chains('X,Y') to select chain(s) X and Y and proceed.

In [7]:
st_c.chains()


Running chains.
2 Chain(s) detected
 A: Protein
 B: Protein
Running  check_only. Nothing else to do.


#### altloc
Checks for the presence of residues with alternative locations. Atoms with alternative coordinates and their occupancy are reported.  
MD simulations require a single position for each atom.  
Use altloc('occupancy | alt_ids | list of res:id') to select the alternative.


In [8]:
st_c.altloc()

Running altloc.
Detected 2 residues with alternative location labels
AC2 A1
  C3'  A (1.00) B (0.01)
  O3'  A (1.00) B (0.01)
  C2'  A (1.00) B (0.01)
  O1'  A (1.00) B (0.01)
  C1'  A (1.00) B (0.01)
HOH A779
  O    A (1.00)
Running  check_only. Nothing else to do.
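
To make the `occupancy` option concrete, here is a minimal sketch of picking the alternative location with the highest occupancy for one atom. `select_by_occupancy` is a hypothetical function, and the tie-breaking rule (first label in alphabetical order) is an assumption; the tool's own selection logic is not shown in this notebook.

```python
def select_by_occupancy(alt_locs):
    """Return the alt_id with the highest occupancy.

    alt_locs maps alternative-location labels to occupancies,
    e.g. {'A': 1.00, 'B': 0.01}. Ties go to the label that sorts first
    (an assumption made for this sketch).
    """
    return min(alt_locs, key=lambda alt_id: (-alt_locs[alt_id], alt_id))


# Occupancies copied from the altloc report above (AC2 A1, atom C3')
print(select_by_occupancy({'A': 1.00, 'B': 0.01}))  # A
```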


#### metals
Detects HETATM records that are metal ions and allows removing them selectively.  
To remove them use metals('All | None | metal_type list | residue list')

In [9]:
st_c.metals()

Running metals.
No metal ions found
Running  check_only. Nothing else to do.


#### ligands
Detects HETATM records (excluding water molecules) so that they can be removed selectively.  
To remove them use ligands('All | None | Residue List (by id, by num)')


In [10]:
st_c.ligands()

Running ligands.
4 Ligands detected
 SO4 A4
 AC2 A1
 SO4 B3
 AC2 B2
Running  check_only. Nothing else to do.


#### rem_hydrogen
Detects and removes hydrogen atoms.  
MD setup can be done with the original H atoms; however, removing them is safer because it avoids problems with non-standard labelling.  
To remove them use rem_hydrogen('yes')


In [11]:
st_c.rem_hydrogen()

Running rem_hydrogen.
No residues with Hydrogen atoms found
Running  check_only. Nothing else to do.


#### water
Detects water molecules and allows removing them.  
Crystallographic water molecules may be relevant for keeping the structure; however, in most cases only some of them are required. These can be added later using other methods (titration) or manually.

To remove water molecules use water('yes')


In [12]:
st_c.water()

Running water.
292 Water molecules detected
Running  check_only. Nothing else to do.
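
For readers who want to see what removing waters amounts to at the file level, the following sketch filters water records out of PDB-format text. It relies only on the fixed-column layout of PDB coordinate records (residue name in columns 18-20). `pdb_record` and `strip_waters` are hypothetical helpers, the serial numbers are made up, and treating both `HOH` and `WAT` as water is an assumption.

```python
def pdb_record(kind, serial, atom, res_name, chain, res_num):
    """Build the first 26 columns of a PDB coordinate record (toy helper)."""
    return f"{kind:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_num:>4}"


def strip_waters(pdb_lines, water_names=('HOH', 'WAT')):
    """Drop ATOM/HETATM records whose residue name is a water name."""
    kept = []
    for line in pdb_lines:
        is_coordinate_record = line.startswith(('ATOM', 'HETATM'))
        # Residue name occupies columns 18-20, i.e. slice [17:20]
        if is_coordinate_record and line[17:20].strip() in water_names:
            continue
        kept.append(line)
    return kept


lines = [
    pdb_record('ATOM', 1, 'N', 'MET', 'A', 46),
    pdb_record('HETATM', 2, 'O', 'HOH', 'A', 779),
    pdb_record('HETATM', 3, 'S', 'SO4', 'A', 4),
]
print(len(strip_waters(lines)))  # 2
```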


#### amide
Terminal amide atoms in Asn and Gln residues can be labelled incorrectly.  
amide suggests possible fixes by checking the surrounding environment.

To fix use amide('All | None | residue_list')

Note that inverting the amide atoms may trigger additional contacts.

In [13]:
st_c.amide()

Running amide.
No unusual contact(s) involving amide atoms found
Running  check_only. Nothing else to do.


#### chiral
The side chains of Thr and Ile are chiral; incorrect atom labelling leads to the wrong chirality.  
To fix use chiral('All | None | residue_list')

In [14]:
st_c.chiral()

Running chiral.
No residues with incorrect side-chain chirality found
Running  check_only. Nothing else to do.
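
One common way to see why labelling matters is the signed volume (scalar triple product) spanned by three substituents around a chiral centre: swapping two labels flips its sign. The sketch below demonstrates this on made-up unit vectors. It is a generic geometric illustration, not necessarily the test that chiral applies, and `signed_volume` is a hypothetical helper.

```python
def signed_volume(center, a, b, c):
    """Scalar triple product of the three substituent vectors around center."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(u[i] * cross[i] for i in range(3))


# Made-up coordinates: a centre with three substituents along the axes
center = (0.0, 0.0, 0.0)
sub1, sub2, sub3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

print(signed_volume(center, sub1, sub2, sub3))  # 1.0
print(signed_volume(center, sub2, sub1, sub3))  # -1.0 (two labels swapped)
```

For Thr, for example, the centre would be CB with CA, OG1 and CG2 as substituents; exchanging the OG1 and CG2 labels changes the sign.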


#### backbone
Detects and fixes several problems with the backbone.  
Use any of:
* --fix_atoms All|None|Residue List
* --fix_main All|None|Break list
* --add_caps All|None|Break list
* --no_recheck


In [15]:
st_c.backbone()

Running backbone.
8 Residues with missing backbone atoms found
 VAL A70    CA,C,O,OXT
 GLY A148   CA,C,O,OXT
 GLY A264   CA,C,O,OXT
 ALA A375   CA,C,O,OXT
 GLY B73    CA,C,O,OXT
 GLY B148   CA,C,O,OXT
 SER B263   CA,C,O,OXT
 ALA B375   CA,C,O,OXT
6 Backbone breaks found
 VAL A70    - ASP A77    
 GLY A148   - PRO A154   
 GLY A264   - PRO A280   
 GLY B73    - ASP B77    
 GLY B148   - PRO B154   
 SER B263   - ALA B278   
Running  check_only. Nothing else to do.
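
A backbone break can be thought of as a pair of consecutive residues whose C and N atoms are too far apart to form a peptide bond (about 1.33 Å), or where one of those atoms is missing. The sketch below applies that idea to made-up coordinates. `find_backbone_breaks` is a hypothetical function and the 2.0 Å cutoff is an assumption, not the criterion used by backbone.

```python
import math


def find_backbone_breaks(residues, cutoff=2.0):
    """Return (previous_id, next_id) pairs that are not peptide bonded.

    residues: list of dicts in sequence order with an 'id' key and,
    when present in the structure, 'N' and 'C' coordinates.
    cutoff: assumed maximum C-N distance in Angstrom.
    """
    breaks = []
    for prev, nxt in zip(residues, residues[1:]):
        c_xyz, n_xyz = prev.get('C'), nxt.get('N')
        # A missing atom or an over-long C-N distance both count as a break
        if c_xyz is None or n_xyz is None or math.dist(c_xyz, n_xyz) > cutoff:
            breaks.append((prev['id'], nxt['id']))
    return breaks


# Toy chain with made-up coordinates and residue ids
toy_chain = [
    {'id': 'ALA X1', 'N': (-1.0, 1.0, 0.0), 'C': (0.0, 0.0, 0.0)},
    {'id': 'GLY X2', 'N': (1.33, 0.0, 0.0), 'C': (3.0, 1.0, 0.0)},
    {'id': 'SER X9', 'N': (15.0, 4.0, 0.0), 'C': (16.0, 5.0, 0.0)},
]
print(find_backbone_breaks(toy_chain))  # [('GLY X2', 'SER X9')]
```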


#### fixside
Detects and rebuilds missing protein side chains.  
To fix use fixside('All | None | residue_list')

In [16]:
st_c.fixside()

Running fixside.
6 Residues with missing side chain atoms found
 MET A46    CG,SD,CE
 PRO A154   CG,CD
 PRO A280   CG,CD
 MET B46    CG,SD,CE
 PRO B154   CG,CD
 ARG B220   CG,CD,NE,CZ,NH1,NH2
Running  check_only. Nothing else to do.


#### add_hydrogen
Adds hydrogen atoms.
* Auto: standard changes at pH 7.0 (His->Hie).
* pH: set the pH value.
* list: explicit list as [*:]HisXXHid.
* Interactive[_his]: prompts for all selectable residues.

Fixes missing side chain atoms unless --no_fix_side is set.  
Existing hydrogen atoms are removed before adding new ones unless --keep_h is set.
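
The idea behind a pH-dependent choice can be sketched by comparing the pH with a model pKa for each ionizable side chain. The pKa values below are approximate textbook numbers (sources differ slightly) and are an assumption of this sketch; `MODEL_PKA` and `is_protonated` are hypothetical names, and add_hydrogen may decide differently.

```python
# Approximate textbook side-chain pKa values (assumed; not taken from the tool)
MODEL_PKA = {'ASP': 3.9, 'GLU': 4.3, 'HIS': 6.0, 'CYS': 8.3,
             'TYR': 10.1, 'LYS': 10.5, 'ARG': 12.5}


def is_protonated(res_name, ph=7.0):
    """True if the side chain is expected to keep its titratable proton."""
    return ph < MODEL_PKA[res_name]


for res_name in sorted(MODEL_PKA):
    state = 'protonated' if is_protonated(res_name) else 'deprotonated'
    print(res_name, state)
```

With these numbers His comes out deprotonated (neutral) at pH 7.0, which is in line with the His->Hie default mentioned above.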

In [17]:
st_c.add_hydrogen('auto')

Running add_hydrogen. Options: auto
148 Residues requiring selection on adding H atoms
CYS A171,A251,A336,A362,B171,B251,B336,B362
ASP A55,A77,A108,A116,A136,A162,A211,A215,A228,A258,A286,A303,A313,A328,A330,A338,A363,B55,B77,B108,B116,B136,B162,B211,B215,B228,B258,B286,B303,B313,B328,B330,B338,B363
GLU A83,A95,A111,A146,A210,A225,A257,A296,A371,A374,B83,B95,B111,B146,B210,B225,B257,B296,B371,B374
HIS A58,A105,A142,A164,A213,A283,A323,A351,B58,B105,B142,B164,B213,B283,B323,B351
LYS A62,A219,A317,B62,B219,B317
ARG A51,A89,A106,A163,A176,A212,A216,A220,A222,A226,A236,A237,A247,A256,A281,A293,A318,A320,A337,A366,A370,B51,B89,B106,B163,B176,B212,B216,B220,B222,B226,B236,B237,B247,B256,B281,B293,B318,B320,B337,B366,B370
TYR A53,A80,A87,A101,A132,A172,A177,A239,A248,A305,A329,B53,B80,B87,B101,B132,B172,B177,B239,B248,B305,B329
Running fixside. Options: --fix all
6 Residues with missing side chain atoms found
 MET A46    CG,SD,CE
 PRO A154   CG,CD
 PRO A280   CG,CD
 MET B46    CG,SD,CE
 PRO B

#### getss
Detects possible -S-S- bonds based on distance criteria.
Proper simulation requires those bonds to be correctly set.

In [18]:
st_c.getss()

Running getss.
No SS bonds detected
Running  check_only. Nothing else to do.
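
A distance criterion of this kind can be sketched as a scan over all pairs of cysteine SG atoms. `find_ss_bonds` is a hypothetical function, the coordinates are made up, and the 2.5 Å cutoff (somewhat above the roughly 2.05 Å S-S bond length) is an assumption rather than the value used by getss.

```python
import math
from itertools import combinations

SS_CUTOFF = 2.5  # Angstrom; assumed threshold for this sketch


def find_ss_bonds(sg_atoms, cutoff=SS_CUTOFF):
    """Return (res1, res2, distance) for SG pairs closer than cutoff.

    sg_atoms: dict mapping a residue id to the coordinates of its SG atom.
    """
    bonds = []
    for (res1, xyz1), (res2, xyz2) in combinations(sorted(sg_atoms.items()), 2):
        distance = math.dist(xyz1, xyz2)
        if distance <= cutoff:
            bonds.append((res1, res2, round(distance, 3)))
    return bonds


# Made-up SG coordinates for three cysteines
toy_sg = {
    'CYS X10': (0.0, 0.0, 0.0),
    'CYS X50': (2.05, 0.0, 0.0),
    'CYS X90': (10.0, 0.0, 0.0),
}
print(find_ss_bonds(toy_sg))
```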


#### clashes
Detects steric clashes based on distance criteria.  
Contacts are classified as:
* Severe: atoms that are too close, usually indicating superimposed structures or badly modelled regions. Should be fixed.
* Apolar: vdW collisions. Usually fixed during the simulation.
* Polar and ionic: usually indicate wrong side chain conformations. Usually fixed during the simulation.
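
The classification above can be pictured as a set of distance thresholds, one per contact class. The sketch below shows the idea only: `classify_contact` is a hypothetical function and the values in `EXAMPLE_CUTOFFS` are placeholders chosen for illustration, not the cutoffs used by clashes.

```python
# Placeholder cutoffs in Angstrom (illustrative only, not the tool's values)
EXAMPLE_CUTOFFS = {'severe': 0.8, 'apolar': 2.9, 'polar': 3.1}


def classify_contact(distance, polar_pair, cutoffs=EXAMPLE_CUTOFFS):
    """Return 'severe', 'apolar', 'polar' or None for one atom pair.

    distance: interatomic distance in Angstrom.
    polar_pair: True if both atoms are considered polar.
    """
    if distance < cutoffs['severe']:
        return 'severe'
    kind = 'polar' if polar_pair else 'apolar'
    if distance < cutoffs[kind]:
        return kind
    return None


print(classify_contact(0.5, polar_pair=False))  # severe
print(classify_contact(2.7, polar_pair=False))  # apolar
print(classify_contact(3.0, polar_pair=True))   # polar
print(classify_contact(3.0, polar_pair=False))  # None
```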


In [19]:
st_c.clashes()

Running clashes.
No severe clashes detected
16 Steric apolar clashes detected
 MET A46.CE   VAL A352.CG2    1.843 A
 MET A46.CE   THR A354.O      1.869 A
 GLY A61.O    THR A65.CG2     2.799 A
 MET A85.CG   GLU A374.OE1    2.857 A
 ARG A89.CD   GLU A95.OE1     2.776 A
 LEU A232.O   ARG A236.CG     2.741 A
 MET A372.SD  VAL B307.CG1    2.777 A
 MET B46.CE   VAL B352.O      0.921 A
 GLY B61.CA   ARG B220.NH1    2.301 A
 LYS B62.O    THR B66.CG2     2.878 A
 THR B66.OG1  TYR B80.CE1     2.890 A
 THR B86.OG1  GLU B374.CB     2.892 A
 GLU B210.O   ILE B214.CG1    2.842 A
 ARG B212.CZ  ASP B330.OD1    2.875 A
 ARG B220.CZ  SO4 B3.O3       1.597 A
 SER B357.O   ILE B361.CG1    2.861 A
8 Steric polar_acceptor clashes detected
 MET A46.SD   VAL A352.O      2.171 A
 MET A46.SD   THR A354.O      2.569 A
 ASP A55.O    VAL A204.O      3.035 A
 MET B46.SD   VAL B352.O      2.291 A
 ARG B89.O    GLU B95.OE1     2.792 A
 ASP B211.O   ASP B215.OD2    2.914 A
 ALA B334.O   ASP B338.OD2    3.100 A
 PRO B3

In [20]:
st_c._save_structure(args['output_structure_path'])

'test.pdb'

In [21]:
import nglview as nv
w = nv.show_biopython(st_c.strucm.st[0])


AttributeError: 'Atom' object has no attribute 'disordered_get_list'

In [27]:
from Bio.PDB import PDBParser
parser = PDBParser()
struc = parser.get_structure("pp", "test.pdb")
# test.pdb holds a single model, which Bio.PDB indexes as model 0
w = nv.show_biopython(struc[0])