# Test notebook for structure checking

A complete checking analysis of a single structure follows.

#### Setting Python paths

In [1]:
import sys
bioexcel_dir = '/data/DEVEL/BioExcel'
base_dir = bioexcel_dir+'/structure_checking'
sys.path.append(base_dir)
sys.path.append(bioexcel_dir)


#### Basic imports

In [2]:
from structure_checking.help_manager import HelpManager
from structure_checking.structure_checking import StructureChecking
from structure_checking.default_settings import DefaultSettings

Setting default data

In [3]:
def_sets = DefaultSettings(base_dir)
help = HelpManager(def_sets.help_dir_path)
args={'Notebook':True, 'debug':False}

General help

In [4]:
help.print_help('general')


MDWeb checkStruc.py performs MDWeb structure checking set as a command line
utility.

It includes some structure manipulation options like selecting models or chains,
removing components of the system, completing missing atoms, and some quality
checking as residue quirality, amide orientation, or vdw clashes.

Usage:  checkStruc [-h|--help] command help|options 
                   -i input_pdb_path -o input_pdb_path

Missing parameters will be prompted in interactive sessions

Available commands:

commands:  This help
command_list:      Run all tests from conf file


1. System Configuration 
models [--select_model model_num]     
    Detect/Select Models
chains [--select_chains chain_ids]    
    Detect/Select Chains 
altloc [--select_altloc occupancy| alt_id | list of res_id:alt_id]
    Detect/Select Alternative Locations 
metals [--remove All | None | Met_ids_list | Residue_list]   
    Detect/Remove Metals 
ligands [--remove All | None | Res_type_list | Residue_list]
    Detect/Rem

Set the input (a PDB entry or a local file; PDB and mmCIF formats are accepted) and the output (a local file in PDB format).  
Use pdb:pdbid to download the structure from the PDB (RCSB).

In [5]:
args['input_structure_path'] ='pdb:1ark'
args['output_structure_path']='temp.pdb'
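
The `pdb:pdbid` convention above can be illustrated with a small stand-alone sketch. `parse_input_spec` is a hypothetical helper written for this notebook; it is not part of `structure_checking`, and the way the library itself resolves the prefix may differ.

```python
def parse_input_spec(spec):
    """Split an input spec into (source, value).

    Hypothetical helper: 'pdb:<id>' is read as a PDB entry to download,
    anything else is treated as a local file path.
    """
    prefix = 'pdb:'
    if spec.startswith(prefix):
        return 'pdb', spec[len(prefix):]
    return 'file', spec


print(parse_input_spec('pdb:1ark'))   # entry to fetch from the PDB
print(parse_input_spec('temp.pdb'))   # local file
```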

Initializing checking engine

In [6]:
st_c = StructureChecking(def_sets,args)
st_c.print_stats()

Structure exists: 'tmpPDB/ar/1ark.cif' 
Structure pdb:1ark loaded
 Num. models: 15
 Num. chains: 1 (A)
 Num. residues:  900
 Num. residues with ins. codes:  0
 Num. HETATM residues:  0
 Num. ligand or modified residues:  0
 Num. water mol.:  0
 Num. atoms:  13575


Loading structure and showing statistics

#### models
Checks for the presence of models in the structure.  
MD simulations require a single structure. However, some structures (e.g. biounits) are defined as a series of models, and in that case all of them are usually required.  
Use models('N') to select model number N for further analysis.

In [7]:
st_c.models('1')

Running models. Options: 1 
15 Model(s) detected
Models do not superimpose, RMSd:   35.032 A, guessed as Biounit type
Selecting model num. 1
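
The RMSd reported above measures how far apart the models are. As a minimal sketch, assuming the comparison is made on the coordinates as stored (no fitting), the value for two models with the same atom order could be computed as follows. The function name and the toy coordinates are illustrative; the procedure and threshold actually used by the tool are not shown in this notebook.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists.

    No superposition is performed: atoms are compared in place.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError('coordinate lists must be non-empty and of equal length')
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy data: model_2 is model_1 shifted by (3, 4, 0), i.e. 5 A per atom
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
print(rmsd(model_1, model_2))
```

A large value with this kind of in-place comparison suggests that the models occupy different regions of space, which is consistent with the "Biounit type" guess in the output.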


#### chains
Checks for chains (also reported by print_stats) and allows selecting one or more of them.  
MD simulations are usually performed on complete structures. However, the input structure may contain several copies of the system, or additional chains such as peptides or nucleic acids that may be removed.  
Use chains('X,Y') to select chains X and Y and proceed.

In [8]:
st_c.chains()

Running chains.
1 Chains detected
  A: Protein
Running  check_only. Nothing else to do.


#### altloc
Checks for the presence of residues with alternative locations. Atoms with alternative coordinates and their occupancy are reported.  
MD simulations require a single position for each atom.  
Use altloc('occupancy | alt_ids | list of res:id') to select the alternative.


In [9]:
st_c.altloc()

Running altloc.
No residues with alternative location labels detected
Running  check_only. Nothing else to do.
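
This structure has no alternative locations, so the following sketch uses made-up atoms to show what an occupancy-based selection could look like. The record layout (`res_id`, `name`, `alt_id`, `occupancy`) and the function name are assumptions made for the example, not the library's data model.

```python
def select_by_occupancy(atoms):
    """For each (residue, atom name), keep the alternative with the highest occupancy.

    Ties keep the first alternative encountered.
    """
    best = {}
    for atom in atoms:
        key = (atom['res_id'], atom['name'])
        if key not in best or atom['occupancy'] > best[key]['occupancy']:
            best[key] = atom
    return list(best.values())


# Hypothetical atoms: CB of residue A12 has two alternatives
atoms = [
    {'res_id': 'A12', 'name': 'CB', 'alt_id': 'A', 'occupancy': 0.6},
    {'res_id': 'A12', 'name': 'CB', 'alt_id': 'B', 'occupancy': 0.4},
    {'res_id': 'A13', 'name': 'CA', 'alt_id': '', 'occupancy': 1.0},
]
kept = select_by_occupancy(atoms)
print([(a['res_id'], a['name'], a['alt_id']) for a in kept])
```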


#### metals
Detects HETATM records that are metal ions and allows removing them selectively.  
To remove them, use metals('All | None | metal_type list | residue list')

In [10]:
st_c.metals()

Running metals.
No metal ions present
Running  check_only. Nothing else to do.


#### ligands
Detects HETATM residues (excluding water molecules) so that they can be selectively removed.  
To remove them, use ligands('All | None | Residue List (by id, by num)')


In [11]:
st_c.ligands()

Running ligands.
No ligands found
Running  check_only. Nothing else to do.


#### remh
Detects and removes hydrogen atoms.  
MD setup can be done with the original H atoms; however, removing them is safer because it avoids problems with non-standard labelling.  
To remove them, use remh('yes')


In [12]:
st_c.remh()

Running remh.
60 Residues containing H atoms detected
Running  check_only. Nothing else to do.


#### remwat
Detects water molecules and allows removing them.  
Crystallographic water molecules may be relevant for maintaining the structure; however, in most cases only some of them are required. These can be added later using other methods (titration) or manually.

To remove water molecules use remwat('yes')


In [13]:
st_c.remwat()

Running remwat.
No water molecules found
Running  check_only. Nothing else to do.


#### amide
Amide terminal atoms in Asn and Gln residues can be labelled incorrectly.  
amide suggests possible fixes by checking the surrounding environment.

To fix them, use amide('All | None | residue_list')

Note that the inversion of amide atoms may trigger additional contacts. 

In [14]:
st_c.amide()

Running amide.
1 unusual contact(s) involving amide atoms found
 ASN A30.OD1  GLN A32.O       2.598 A
Running  check_only. Nothing else to do.
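
Conceptually, an amide fix exchanges the terminal O and N atoms of the side chain (OD1/ND2 in Asn, OE1/NE2 in Gln). The sketch below shows that exchange on a plain dictionary of coordinates. It is only an illustration under that assumption: `flip_amide` and the toy coordinates are hypothetical, and the library may implement the fix differently.

```python
# Terminal amide atom pairs for the two residue types
AMIDE_PAIRS = {'ASN': ('OD1', 'ND2'), 'GLN': ('OE1', 'NE2')}


def flip_amide(res_name, coords):
    """Return a copy of coords (atom name -> xyz) with the amide O and N exchanged."""
    o_name, n_name = AMIDE_PAIRS[res_name]
    flipped = dict(coords)  # leave the input untouched
    flipped[o_name], flipped[n_name] = coords[n_name], coords[o_name]
    return flipped


# Toy coordinates, not taken from 1ark
asn = {'CG': (0.0, 0.0, 0.0), 'OD1': (1.2, 0.0, 0.0), 'ND2': (-0.7, 1.1, 0.0)}
flipped = flip_amide('ASN', asn)
print(flipped['OD1'], flipped['ND2'])
```

Because the O and N atoms end up in new positions, the contacts they make change as well, which is why a flip may trigger additional contacts.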


#### chiral
Side chains of Thr and Ile are chiral; incorrect atom labelling leads to the wrong chirality.  
To fix use chiral('All | None | residue_list')

In [15]:
st_c.chiral()

Running chiral.
No residues with incorrent side-chain chirality found
Running  check_only. Nothing else to do.
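
A common way to test the handedness of a chiral center is the sign of the triple product of three substituent vectors. The sketch below uses toy coordinates to show that exchanging two substituent labels (for example OG1 and CG2 on Thr) flips the sign, which is why a labelling error appears as wrong chirality. Whether the library uses this exact test is an assumption, and the function name is illustrative.

```python
def signed_volume(center, a, b, c):
    """Triple product (a - center) . ((b - center) x (c - center))."""
    ux, uy, uz = (a[i] - center[i] for i in range(3))
    vx, vy, vz = (b[i] - center[i] for i in range(3))
    wx, wy, wz = (c[i] - center[i] for i in range(3))
    # Cross product v x w
    cx = vy * wz - vz * wy
    cy = vz * wx - vx * wz
    cz = vx * wy - vy * wx
    return ux * cx + uy * cy + uz * cz


# Toy geometry: three substituents along the axes around a central atom
center = (0.0, 0.0, 0.0)
s1, s2, s3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

print(signed_volume(center, s1, s2, s3))  # one handedness
print(signed_volume(center, s1, s3, s2))  # two labels exchanged: opposite sign
```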


#### fixside
Detects and rebuilds missing protein side chains.  
To fix use fixside('All | None | residue_list')

In [16]:
st_c.fixside()

Running fixside.
No residues with missing side chain atoms found
Running  check_only. Nothing else to do.


#### getss
Detects possible -S-S- bonds based on distance criteria.
Proper simulation requires those bonds to be correctly set.

In [17]:
st_c.getss()

Running getss.
No SS bonds detected
Running  check_only. Nothing else to do.
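
A distance criterion for -S-S- bonds can be sketched as a pairwise search over cysteine SG atoms. The cutoff of 2.5 A, the function name, and the coordinates below are illustrative assumptions; the value used by the library is not stated in this notebook.

```python
import math
from itertools import combinations

SS_CUTOFF = 2.5  # Angstrom; illustrative value, not taken from the library


def find_ss_candidates(sg_atoms, cutoff=SS_CUTOFF):
    """Return (res_a, res_b, distance) for SG pairs closer than the cutoff.

    sg_atoms maps a residue label to the xyz coordinates of its SG atom.
    """
    pairs = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sg_atoms.items(), 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance <= cutoff:
            pairs.append((res_a, res_b, round(distance, 3)))
    return pairs


# Toy SG positions: only the first two are within bonding distance
sg = {
    'CYS A5': (0.0, 0.0, 0.0),
    'CYS A30': (2.0, 0.0, 0.0),
    'CYS A48': (10.0, 0.0, 0.0),
}
ss_pairs = find_ss_candidates(sg)
print(ss_pairs)
```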


#### clashes
Detects steric clashes based on distance criteria.  
Contacts are classified as:
* Severe: atoms that are too close, usually indicating superimposed structures or badly modelled regions. These should be fixed.
* Apolar: vdW collisions. Usually fixed during the simulation.
* Polar and ionic: usually indicate wrong side-chain conformations. Usually fixed during the simulation.


In [18]:
st_c.clashes()

Running clashes.
No severe clashes detected
4 Steric apolar clashes detected
 MET A9.CE    GLU A58.OE2     2.695 A
 GLU A19.OE2  TRP A38.CZ3     2.645 A
 TYR A40.CD1  GLY A50.O       2.794 A
 THR A42.OG1  THR A49.CG2     2.788 A
4 Steric acceptor clashes detected
 MET A9.O     ASP A24.OD2     2.851 A
 TYR A10.O    PHE A22.O       3.006 A
 ASN A30.OD1  GLN A32.O       2.598 A
 MET A39.O    MET A51.SD      2.999 A
No donor clashes detected
No positive clashes detected
No negative clashes detected
Running  check_only. Nothing else to do.
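
For further processing, the contact lines printed above can be turned into structured records. The following sketch parses lines of the form shown in this output (`RES Cnum.ATOM  RES Cnum.ATOM  distance A`). The regular expression is written against this output only and may need adjusting for other cases, such as atom names containing primes or residues with insertion codes.

```python
import re

# Matches e.g. " MET A9.CE    GLU A58.OE2     2.695 A"
CONTACT_RE = re.compile(
    r'^\s*(\w{3})\s+(\w)(\d+)\.(\w+)\s+'
    r'(\w{3})\s+(\w)(\d+)\.(\w+)\s+'
    r'([\d.]+)\s+A\s*$'
)


def parse_contact(line):
    """Parse one contact line; return None for headers and other text."""
    match = CONTACT_RE.match(line)
    if match is None:
        return None
    res1, ch1, num1, at1, res2, ch2, num2, at2, dist = match.groups()
    return {
        'atom1': (res1, ch1, int(num1), at1),
        'atom2': (res2, ch2, int(num2), at2),
        'distance': float(dist),
    }


record = parse_contact(' MET A9.CE    GLU A58.OE2     2.695 A')
print(record)
print(parse_contact('No severe clashes detected'))
```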
