# Structure checking tutorial

A complete checking analysis of a single structure follows.  
Use .revert_changes() at any time to recover the original structure.

Structure checking is a key step before setting up a protein system for simulations. 
A number of common issues found in structures from the Protein Data Bank may compromise the success of the simulation, or may suggest that longer equilibration procedures are necessary.

The biobb_structure_checking modules allow you to:
- Do basic manipulations on structures (selection of models, chains, alternative locations)
- Detect and fix amide assignments and wrong chirality
- Detect and fix protein backbone issues (missing fragments and atoms, capping)
- Detect and fix missing side-chain atoms
- Add hydrogen atoms according to several criteria
- Detect and classify clashes
- Detect possible SS bonds

biobb_structure_checking modules can also be used from the command line with biobb_structure_checking/bin/check_structure.


In [None]:
%load_ext autoreload
%autoreload 2

## Installation

#### Basic imports and initialization

In [None]:
import biobb_structure_checking as bsch
from biobb_structure_checking.structure_checking import StructureChecking
from biobb_structure_checking.constants import help, set_defaults
base_dir_path = bsch.__path__[0]
args = set_defaults(base_dir_path)

## General help

In [None]:
help()

Set the input (PDB entry or local file; pdb and mmCIF formats are allowed) and the output (local file, pdb format).  
Use pdb:pdbid to download the structure from the PDB (RCSB).

In [None]:
args['input_structure_path'] = 'pdb:6axg'
args['output_structure_path'] = '6axg_fixed.pdb'
args['output_structure_path_pdbqt'] = '6axg_fixed.pdbqt'
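
The pdb:pdbid convention can be illustrated with a small standalone sketch. The helper parse_input_spec below is hypothetical and is not part of biobb_structure_checking; it only shows how such a string could be split into a source and an identifier.

```python
def parse_input_spec(spec):
    """Split an input specification into (source, identifier).

    Hypothetical helper for illustration only; it is not part of
    biobb_structure_checking.
    """
    if spec.lower().startswith('pdb:'):
        # Remote entry: everything after the prefix is the PDB id
        return 'remote', spec[4:]
    # Anything else is treated as a local file path
    return 'local', spec


print(parse_input_spec('pdb:6axg'))        # ('remote', '6axg')
print(parse_input_spec('6axg_fixed.pdb'))  # ('local', '6axg_fixed.pdb')
```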

Initializing checking engine, loading structure and showing statistics

In [None]:
structure = StructureChecking(base_dir_path,args)

#### models
Checks for the presence of models in the structure.  
MD simulations require a single structure. However, some structures (e.g. biounits) are defined as a series of models, in which case all of them are usually required.  
Use models('--select N') to select model number N for further analysis.

In [None]:
structure.models()

#### chains
Checks for chains (also obtained from print_stats) and allows selecting one or more.  
MD simulations are usually performed with complete structures. However, the input structure may contain several copies of the system, or additional chains such as peptides or nucleic acids that may be removed.  
Use chains('X,Y') to select chain(s) X and Y to proceed.

In [None]:
structure.chains()

6axg has 6 copies in the crystal asymmetric unit. To get a single copy, select only chains A and B.

In [None]:
structure.chains('A,B')
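
What a chain selection amounts to can be sketched on raw PDB records, where the chain identifier sits in column 22 of ATOM and HETATM lines. This is a standalone illustration, not the library's implementation: make_atom_line and select_chains are hypothetical helpers, and the records are made up and truncated to the identification columns.

```python
def make_atom_line(serial, name, res_name, chain, res_seq):
    """Build a truncated PDB ATOM record (identification columns only)."""
    return f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"


def select_chains(pdb_lines, chain_ids):
    """Keep ATOM/HETATM records whose chain ID (column 22) is in chain_ids."""
    wanted = set(chain_ids.split(','))
    return [
        line for line in pdb_lines
        if line.startswith(('ATOM', 'HETATM')) and line[21] in wanted
    ]


# Made-up records, one per chain
lines = [
    make_atom_line(1, 'CA', 'ALA', 'A', 1),
    make_atom_line(2, 'CA', 'GLY', 'B', 1),
    make_atom_line(3, 'CA', 'SER', 'C', 1),
]
kept = select_chains(lines, 'A,B')
print([line[21] for line in kept])  # ['A', 'B']
```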

#### altloc
Checks for the presence of residues with alternative locations. Atoms with alternative coordinates and their occupancy are reported.  
MD simulations require a single position for each atom.  
Use altloc('occupancy | alt_ids | list of res:id') to select the alternative.


In [None]:
structure.altloc()
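
As an illustration of the occupancy criterion, the following standalone sketch keeps, for every atom, the alternative location with the highest occupancy. The records and the helper select_by_occupancy are made up for this example and do not reflect how the library stores atoms.

```python
# Each record: (residue id, atom name, altloc label, occupancy); values are made up
atoms = [
    ('A12', 'CB', 'A', 0.60),
    ('A12', 'CB', 'B', 0.40),
    ('A12', 'CG', 'A', 0.60),
    ('A12', 'CG', 'B', 0.40),
    ('A45', 'OG', 'A', 0.30),
    ('A45', 'OG', 'B', 0.70),
]


def select_by_occupancy(records):
    """For every (residue, atom) pair keep the altloc with the highest occupancy."""
    best = {}
    for res_id, atom, altloc, occ in records:
        key = (res_id, atom)
        # Replace the stored choice only when a strictly higher occupancy is found
        if key not in best or occ > best[key][1]:
            best[key] = (altloc, occ)
    return {key: altloc for key, (altloc, _) in best.items()}


chosen = select_by_occupancy(atoms)
print(chosen[('A12', 'CB')], chosen[('A45', 'OG')])  # A B
```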

#### metals
Detects HETATM records that are metal ions and allows removing them selectively.  
To remove them use metals('All | None | metal_type list | residue list').

In [None]:
structure.metals()

#### ligands
Detects HETATM records (excluding water molecules) so that they can be removed selectively.  
To remove them use ligands('All | None | Residue List (by id, by num)').


In [None]:
structure.ligands()

#### rem_hydrogen
Detects and removes hydrogen atoms.  
MD setup can be done with the original H atoms; however, removing them is safer because it avoids problems with non-standard labelling.  
To remove them use rem_hydrogen('yes').


In [None]:
structure.rem_hydrogen()

#### water
Detects water molecules and allows removing them.  
Crystallographic water molecules may be relevant for keeping the structure; however, in most cases only some of them are required. These can be added later using other methods (titration) or manually.

To remove water molecules use water('yes')


In [None]:
structure.water()

#### amide
Amide terminal atoms in Asn and Gln residues can be labelled incorrectly.  
amide suggests possible fixes by checking the surrounding environment.

To fix use amide('All | None | residue_list')

Note that the inversion of amide atoms may trigger additional contacts. 

In [None]:
structure.amide()

Fix all amide residues and recheck

In [None]:
structure.amide('all')

Comparing both checks, it becomes clear that GLN A233 and ASN A248 are now in a worse situation, so they should be changed back to the original labelling.

In [None]:
structure.amide('A223,A248')
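
An amide fix exchanges the terminal O and N assignments of the side chain (OD1/ND2 in Asn, OE1/NE2 in Gln), so applying it a second time to the same residue restores the original labelling. The standalone sketch below shows this on made-up coordinates; flip_amide is a hypothetical helper, not the library's code.

```python
# Terminal amide atom pairs (standard PDB atom names)
AMIDE_ATOMS = {'ASN': ('OD1', 'ND2'), 'GLN': ('OE1', 'NE2')}


def flip_amide(res_name, coords):
    """Return a copy of coords with the amide O and N positions exchanged."""
    o_name, n_name = AMIDE_ATOMS[res_name]
    flipped = dict(coords)
    flipped[o_name], flipped[n_name] = coords[n_name], coords[o_name]
    return flipped


# Made-up coordinates for one Asn side chain
asn = {'CG': (0.0, 0.0, 0.0), 'OD1': (1.2, 0.0, 0.0), 'ND2': (-0.7, 1.1, 0.0)}
flipped = flip_amide('ASN', asn)
print(flipped['OD1'], flipped['ND2'])  # (-0.7, 1.1, 0.0) (1.2, 0.0, 0.0)

# Flipping twice gives back the original assignment
print(flip_amide('ASN', flipped) == asn)  # True
```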

#### chiral
Side chains of Thr and Ile are chiral; incorrect atom labelling leads to the wrong chirality.  
To fix use chiral('All | None | residue_list')

In [None]:
structure.chiral()
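
One common way to test the handedness of a chiral centre is the sign of the triple product of the vectors pointing from the centre to three of its substituents. For Thr the centre is CB with CA, OG1 and CG2 attached; exchanging the OG1 and CG2 labels flips the sign. The sketch below uses made-up coordinates and is not necessarily the criterion used by the library; signed_volume is a hypothetical helper.

```python
def signed_volume(center, a, b, c):
    """Triple product (a-center) . ((b-center) x (c-center))."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(u[i] * cross[i] for i in range(3))


# Made-up coordinates standing in for Thr CB (centre) and CA, OG1, CG2
cb = (0.0, 0.0, 0.0)
ca, og1, cg2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

original = signed_volume(cb, ca, og1, cg2)
swapped = signed_volume(cb, ca, cg2, og1)  # OG1 and CG2 labels exchanged
print(original, swapped)  # 1.0 -1.0
```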

#### Backbone
Detects and fixes several problems with the backbone.  
Use any of:
- --fix_atoms All|None|Residue List
- --fix_chain All|None|Break list
- --add_caps All|None|Terms|Breaks|Residue list
- --no_recheck
- --no_check_clashes


In [None]:
structure.backbone()

Re-building backbone breaks with Modeller (Modeller requires a license key)

In [None]:
# args['modeller_key'] = 'XXXXXXX' #Need to register to Modeller
opts = {
    'fix_chain': 'all',
    'add_caps' : 'none',
    'fix_atoms': 'none',
    'no_recheck': True
}
structure.backbone(opts)
#structure.backbone('--fix_chain all --add_caps none --fix_atoms none --no_recheck')
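
The dictionary and the commented string in the cell above carry the same options. The correspondence can be sketched with a small hypothetical helper, opts_to_cli, which is not part of the library: boolean flags become bare switches and every other entry becomes a switch followed by its value.

```python
def opts_to_cli(opts):
    """Render an options dict as a command-line style string (illustrative only)."""
    parts = []
    for key, value in opts.items():
        if value is True:
            # Boolean switches carry no value
            parts.append(f'--{key}')
        elif value is False or value is None:
            # Unset switches are simply omitted
            continue
        else:
            parts.append(f'--{key} {value}')
    return ' '.join(parts)


opts = {
    'fix_chain': 'all',
    'add_caps': 'none',
    'fix_atoms': 'none',
    'no_recheck': True,
}
print(opts_to_cli(opts))
# --fix_chain all --add_caps none --fix_atoms none --no_recheck
```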

#### fixside
Detects and rebuilds missing protein side chains.  
To fix use fixside('All | None | residue_list')

In [None]:
structure.fixside()

#### getss
Detects possible -S-S- bonds based on distance criteria.
Proper simulation requires those bonds to be correctly set.

In [None]:
structure.getss()
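
A distance criterion of this kind can be sketched as a search over pairs of Cys SG atoms. The coordinates are made up and the 2.5 Å cutoff is an assumption chosen for illustration (a disulfide S-S bond is about 2.05 Å long); the library may use a different value. find_ss_candidates is a hypothetical helper.

```python
import math
from itertools import combinations

SS_CUTOFF = 2.5  # Å, assumed cutoff for this sketch only

# Made-up SG coordinates keyed by residue
sg_atoms = {
    'CYS A10': (0.0, 0.0, 0.0),
    'CYS A55': (2.0, 0.0, 0.0),
    'CYS B20': (10.0, 0.0, 0.0),
}


def find_ss_candidates(sg_coords, cutoff=SS_CUTOFF):
    """Return (res1, res2, distance) for SG pairs closer than the cutoff."""
    pairs = []
    for (res1, xyz1), (res2, xyz2) in combinations(sg_coords.items(), 2):
        dist = math.dist(xyz1, xyz2)
        if dist <= cutoff:
            pairs.append((res1, res2, round(dist, 2)))
    return pairs


print(find_ss_candidates(sg_atoms))  # [('CYS A10', 'CYS A55', 2.0)]
```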

#### add_hydrogen
Adds hydrogen atoms. Available modes:
- Auto: standard changes at pH 7.0, His->Hie
- pH: set pH value
- list: explicit list as [*:]HisXXHid
- Interactive[_his]: prompts for all selectable residues

Fixes missing side chain atoms unless --no_fix_side is set.  
Existing hydrogen atoms are removed before adding new ones unless --keep_h is set.

In [None]:
structure.add_hydrogen()

#### clashes
Detects steric clashes based on distance criteria.  
Contacts are classified as:
* Severe: Atoms too close, usually indicating superimposed structures or badly modelled regions. Should be fixed.
* Apolar: vdW collisions. Usually fixed during the simulation.
* Polar and ionic: Usually indicate wrong side chain conformations. Usually fixed during the simulation.


In [None]:
structure.clashes()
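
The classification can be sketched as a set of distance cutoffs applied per contact type. Everything below is a simplified illustration: the cutoff values, the reduction of polarity to the N/O elements and the helper classify_contact are assumptions, not the library's definitions, and ionic contacts are left out.

```python
import math

# Illustrative cutoffs in Å; assumptions for this sketch only
SEVERE_CUTOFF = 1.0
APOLAR_CUTOFF = 2.9
POLAR_CUTOFF = 3.1

POLAR_ELEMENTS = {'N', 'O'}  # crude stand-in for polar atoms


def classify_contact(atom1, atom2):
    """Classify a contact between two atoms given as (element, (x, y, z)).

    Returns 'severe', 'polar', 'apolar' or None when the atoms are far apart.
    """
    (el1, xyz1), (el2, xyz2) = atom1, atom2
    dist = math.dist(xyz1, xyz2)
    if dist < SEVERE_CUTOFF:
        return 'severe'
    both_polar = el1 in POLAR_ELEMENTS and el2 in POLAR_ELEMENTS
    if both_polar and dist < POLAR_CUTOFF:
        return 'polar'
    if not both_polar and dist < APOLAR_CUTOFF:
        return 'apolar'
    return None


origin = (0.0, 0.0, 0.0)
print(classify_contact(('C', origin), ('C', (0.5, 0.0, 0.0))))  # severe
print(classify_contact(('C', origin), ('C', (2.5, 0.0, 0.0))))  # apolar
print(classify_contact(('O', origin), ('N', (3.0, 0.0, 0.0))))  # polar
print(classify_contact(('C', origin), ('C', (4.0, 0.0, 0.0))))  # None
```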

Complete check in a single method

In [None]:
structure.checkall()

In [None]:
structure.save_structure(args['output_structure_path'])

In [None]:
import nglview as nv
nv.show_biopython(structure.strucm.st[0])

In [None]:
#structure.backbone('--fix_atoms A430 --fix_chain all --add_caps none --no_recheck')
opts = {
    'fix_atoms':'A430',
    'fix_chain':'all',
    'add_caps':'none',
    'no_recheck': True,
}
structure.backbone(opts)

In [None]:
opts = {
    'add_mode':'auto',
    'add_charges': 'ADT'
}

structure.add_hydrogen(opts)

#structure.add_hydrogen('--add_mode auto --add_charges ADT')

In [None]:
structure.save_structure('6axg.pdbqt')