# MolSysMT Introduction

## Installation

MolSysMT has no "stable" release yet, so to install it you have to clone the main repository:

```
git clone git@github.com:uibcdf/MolSysMT.git
```

Some dependencies have to be installed manually (a requirements file is still pending):
numpy, networkx, parmed, mdtraj, nglview, pyunitwizard... (json, os and urllib are also used, but they ship with the Python standard library).
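
Before installing, it can help to see which of these packages are already present in the active environment. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this example, and it assumes each package's import name matches the name in the list above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# Third-party packages from the dependency list (import names are assumed
# to be identical to the package names).
third_party = ["numpy", "networkx", "parmed", "mdtraj", "nglview", "pyunitwizard"]
print("Missing:", missing_modules(third_party))
```

An empty list means every package was found; anything printed still has to be installed with conda or pip.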

Now, with your conda environment activated, install molsysmt in development mode:

```
python setup.py develop
```

You can now open python, ipython or jupyter to check that molsysmt imports without complaints:

```
import molsysmt as msm
```

If the import finishes without errors or warnings, run the NGLView patch to make that library compatible with MolSysMT (this has to be done only the first time):

```
from molsysmt.tools import nglview
nglview.adding_molsysmt()
```

## Tasks to do

To solve these tasks, make a copy of this notebook and work on the copy. Name your notebook as you wish, and push it to the main branch of the repository with the cells that do the following:

- Load the pdb file '1NCR' (from the Protein Data Bank or locally) as a molsysmt.MolSys object.
- Get the form of the new object.
- Print out the information summary of the new object.
- Check if the system has hydrogen atoms. Why are there no hydrogen atoms?
- Remove the atoms of water molecules and ions.
- Print out the information summary about the molecules of the new object.
- Protonate (add hydrogens) at pH=7.4 to the molecular system.
- Get the number of hydrogen atoms added to the system.
- Get the sequence of amino acids (three-letter code) of the molecule with index.
- How many atoms named "CA" does the molecular system have?
- Get the list of indices of the groups named "VAL".
- Get the coordinates of the atoms with name "CA" from those groups named "VAL". 
- Show the view of the entity named "W11".
- Show the view of the whole system (water and ions were removed).
- Get the minimum distance between any atom of the small molecule named "W11" and any atom of the molecule with index 0. The same with the molecules indexed 1, and 2.
- Get the distance matrix between the atoms of type "C" of the small molecule named "W11" and the atoms named "CA" of the molecule with index 0.
- Get all phi and psi dihedral angles of the molecule type peptide.
- Get all covalent chains with the sequence of atoms with name "CB1"-"CB2"-"CB3"-"CB4"-"CB5"-"CB6"-"CB1"
- Get all covalent chains with 7 atoms of type "C". How many chains did you get? Could this be a bug? If you think so, open an issue reporting it. Could the method `covalent_chains` be used to detect rings? Should we define a method to get covalent rings?

In [1]:
import molsysmt as msm
import pandas as pd
import nglview as nv



In [2]:
pdb_file = 'pdb:1ncr'
# Fetch file
protein = msm.convert(pdb_file)
print('Form: {}'.format(msm.get_form(protein)))

Form: molsysmt.MolSys


In [3]:
# Print out summary of the system
msm.info(protein)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_waters,n_ions,n_small_molecules,n_peptides,n_proteins,n_frames
molsysmt.MolSys,6717,1145,344,10,344,7,338,1,2,1,2,1


In [4]:
# Look for hydrogen atoms
n_Hs = msm.get(protein, selection='atom_type=="H"', n_atoms=True)
print('{} hydrogen atoms in the system'.format(n_Hs))

0 hydrogen atoms in the system


In [5]:
# Remove waters and ions (the summary below no longer lists n_waters or n_ions)
protein = msm.remove_solvent(protein)
msm.info(protein)

form,n_atoms,n_groups,n_components,n_chains,n_molecules,n_entities,n_small_molecules,n_peptides,n_proteins,n_frames
molsysmt.MolSys,6378,806,5,6,5,5,2,1,2,1


In [6]:
# Protonate the molecule
protein = msm.add_missing_hydrogens(protein)
n_Hs = msm.get(protein, selection='atom_type=="H"', n_atoms=True)
print('Number of H atoms after protonation: {}'.format(n_Hs))

Number of H atoms after protonation: 6160


In [7]:
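# Sequence of amino acids (three-letter codes) with their group indices
# Note: no selection is applied, so the two small-molecule groups (W11 and MYR) are counted as well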
aa, ind = msm.get(protein, target="group", group_name=True, group_index=True)
aa_chain = pd.DataFrame(data=aa, index=ind, columns=["Aminoacid"])
print("Protein has {} aminoacids".format(aa_chain.shape[0]))
aa_chain.head(10)

Protein has 806 aminoacids


Unnamed: 0,Aminoacid
0,ASN
1,PRO
2,VAL
3,GLU
4,ARG
5,TYR
6,VAL
7,ASP
8,GLU
9,VAL


In [10]:
# Atoms with name CA
print("Number of atoms with name CA: {}".format(msm.select(protein, target="atom", selection='atom_name=="CA"').shape[0]))

Number of atoms with name CA: 804


In [11]:
# Indices of VAL
msm.get(protein, target="group", selection="group_name == 'VAL'", index=True)

array([  2,   6,   9,  13,  15,  16,  32,  55,  66,  80,  84, 124, 127,
       138, 144, 167, 208, 209, 221, 229, 231, 232, 253, 269, 273, 284,
       303, 307, 308, 312, 383, 385, 399, 400, 410, 415, 431, 435, 477,
       480, 483, 489, 497, 501, 513, 517, 534, 535, 540, 542, 576, 585,
       601, 606, 623, 665, 689, 690, 693, 699, 702, 703, 734, 735, 749,
       778])
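
As a quick cross-check that does not depend on MolSysMT, the first ten group names printed earlier (ASN, PRO, VAL, GLU, ARG, TYR, VAL, ASP, GLU, VAL) can be filtered with plain Python. This is only a sketch; `indices_of` is a helper invented for the example, not a library function.

```python
# First ten group names, copied from the table printed earlier in the notebook.
group_names = ["ASN", "PRO", "VAL", "GLU", "ARG", "TYR", "VAL", "ASP", "GLU", "VAL"]


def indices_of(names, target):
    """Return the positions in `names` whose entry equals `target`."""
    return [index for index, name in enumerate(names) if name == target]


val_indices = indices_of(group_names, "VAL")
print(val_indices)  # [2, 6, 9]
```

The result agrees with the first three entries of the array above.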

In [12]:
msm.info(protein, target="group", indices=[2,6,9])

index,id,name,type,n atoms,component index,chain index,molecule index,molecule type,entity index,entity name
2,3,VAL,aminoacid,16,0,0,0,protein,0,Protein_0
6,7,VAL,aminoacid,16,0,0,0,protein,0,Protein_0
9,10,VAL,aminoacid,16,0,0,0,protein,0,Protein_0


In [13]:
# Coordinates of the atoms with name "CA" from those groups named "VAL"
msm.get(protein, target="atom", selection='atom_name=="CA" & group_name=="VAL"', coordinates=True)

0,1
Magnitude,[[[4.269899845123291 1.370300054550171 8.995699882507324]  [4.51170015335083 1.8454999923706055 9.371600151062012]  [4.763500213623047 1.8478000164031982 9.819000244140625]  [3.8684000968933105 2.1089000701904297 9.942500114440918]  [3.647700071334839 2.69320011138916 10.031800270080566]  [3.4795000553131104 2.9827001094818115 10.209400177001953]  [1.9955999851226807 4.338799953460693 11.156800270080566]  [3.674799919128418 4.0289998054504395 9.769800186157227]  [3.6572999954223633 1.6894999742507935 11.240300178527832]  [5.606400012969971 0.2766000032424927 13.416600227355957]  [6.379799842834473 -0.11949999630451202 14.22700023651123]  [5.735799789428711 0.5640000104904175 12.267800331115723]  [6.48829984664917 0.2919999957084656 12.894800186157227]  [6.285200119018555 -0.6345000267028809 12.459799766540527]  [4.373799800872803 -0.8439000248908997 12.6983003616333]  [5.349699974060059 -0.6381999850273132 11.996199607849121]  [3.237799882888794 0.38040000200271606 13.294400215148926]  [2.929800033569336 0.16920000314712524 13.223299980163574]  [6.063399791717529 -0.9821000099182129 12.80150032043457]  [6.6407999992370605 0.006099999882280827 13.453399658203125]  [6.124499797821045 0.1720999926328659 13.13070011138916]  [5.855000019073486 0.4296000003814697 13.066900253295898]  [3.722100019454956 2.345400094985962 13.474900245666504]  [3.5613999366760254 3.031599998474121 14.18019962310791]  [3.7816998958587646 2.8901000022888184 13.528900146484375]  [3.897200107574463 4.664999961853027 12.763999938964844]  [0.5583000183105469 3.205199956893921 12.027799606323242]  [1.280400037765503 2.4384000301361084 11.715299606323242]  [1.3424999713897705 2.095400094985962 11.565400123596191]  [1.5598000288009644 1.5920000076293945 11.235699653625488]  [0.8447999954223633 1.988800048828125 12.463399887084961]  [0.7224000096321106 2.521899938583374 12.805299758911133]  [1.5116000175476074 2.661799907684326 12.771200180053711]  [1.746000051498413 2.3819000720977783 
12.666099548339844]  [3.1328999996185303 1.8717000484466553 14.34000015258789]  [3.1563000679016113 0.9591000080108643 14.376500129699707]  [1.81659996509552 2.428999900817871 14.175200462341309]  [1.9757000207901 1.9729000329971313 15.14210033416748]  [1.3509000539779663 1.653499960899353 11.938199996948242]  [1.9250999689102173 1.0536999702453613 11.979299545288086]  [2.200000047683716 0.2493000030517578 11.98449993133545]  [1.462499976158142 0.2888999879360199 13.3149995803833]  [1.711899995803833 2.251699924468994 13.157400131225586]  [1.9275000095367432 3.4202001094818115 13.284299850463867]  [0.4194999933242798 3.359999895095825 13.781599998474121]  [0.5860000252723694 2.1993000507354736 13.117300033569336]  [0.9603000283241272 -0.19059999287128448 13.989999771118164]  [1.226699948310852 0.0632999986410141 14.086600303649902]  [6.690299987792969 0.18459999561309814 9.848600387573242]  [6.6869001388549805 0.559499979019165 10.361900329589844]  [2.9112000465393066 1.7625000476837158 11.27180004119873]  [2.611999988555908 2.923099994659424 11.631999969482422]  [2.447700023651123 3.41129994392395 13.57509994506836]  [2.444000005722046 4.406199932098389 13.109999656677246]  [3.7002999782562256 4.2418999671936035 11.924200057983398]  [2.1451001167297363 5.2434000968933105 12.169500350952148]  [2.452699899673462 5.022200107574463 11.41510009765625]  [2.1846001148223877 5.229499816894531 11.589300155639648]  [1.423799991607666 4.8460001945495605 12.083499908447266]  [1.986199975013733 4.331999778747559 11.697999954223633]  [2.877000093460083 4.280799865722656 11.220399856567383]  [3.2460999488830566 4.347700119018555 11.28030014038086]  [1.2374000549316406 5.513800144195557 12.899399757385254]  [0.9505000114440918 5.3495001792907715 13.084799766540527]  [2.8678998947143555 3.6098999977111816 11.92710018157959]  [4.474800109863281 0.44290000200271606 8.996100425720215]]]
Units,nanometer


In [14]:
msm.info(protein, target='component')

index,n atoms,n groups,chain index,molecule index,molecule type,entity index,entity name
0,4489,285,0,0,protein,0,Protein_0
1,7571,490,1,1,protein,1,Protein_1
2,436,29,3,2,peptide,2,Peptide_0
3,27,1,4,3,small molecule,3,W11
4,15,1,5,4,small molecule,4,MYR


In [None]:
# View of the entity named "W11"
# This cell has to run first: the next cell uses w11_molecule
w11_molecule = msm.select(protein, selection='component_index==3', to_syntaxis="NGLView")
w11_view = msm.view(w11_molecule, viewer="NGLView")

In [62]:
# Taking a look at the whole system, with W11 highlighted in orange

view = msm.view(protein, viewer="NGLView")
view.add_ball_and_stick(w11_molecule, color="orange")
view

NGLWidget()

In [44]:
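# Get the minimum distance between the small molecule named "W11" and the molecules with index 0, 1 and 2
# Note: the group_behavior='geometric_center' arguments below suggest that distances are measured
# between the geometric centers of groups, not between individual atoms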

w11_atom_index = msm.get(protein, target='group', selection='component_index==3', atom_index=True)

print(f"Small molecule shape: {w11_atom_index.shape}\n")

for i in range(3):
    protein_atom_index = msm.get(protein, target='group', selection='component_index=={}'.format(i), atom_index=True)
    min_pairs, min_distances = msm.minimum_distance(protein,
                                                    groups_of_atoms=protein_atom_index,
                                                    group_behavior='geometric_center',
                                                    groups_of_atoms_2=w11_atom_index,
                                                    group_behavior_2='geometric_center')
    
    print(f"Molecule number {i}\n\nShape: {protein_atom_index.shape}")
    print(f"Min pairs: {min_pairs[0]}")
    print(f"Min distance: {min_distances[0]}\n\n")

Small molecule shape: (1, 27)

Molecule number 0

Shape: (285,)
Min pairs: [97  0]
Min distance: 0.5041120436060018 nanometer


Molecule number 1

Shape: (490,)
Min pairs: [275   0]
Min distance: 0.882876145000038 nanometer


Molecule number 2

Shape: (29,)
Min pairs: [20  0]
Min distance: 1.2653449156135292 nanometer
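
The cell above passes `group_behavior='geometric_center'`, while the task asks for the minimum distance between any pair of atoms. The two quantities can differ noticeably. The following pure-Python sketch shows this on invented toy coordinates (in nanometers); the function names are hypothetical and this is not how MolSysMT computes distances internally.

```python
from itertools import product
from math import dist


def geometric_center(coords):
    """Average position of a set of 3D points."""
    n = len(coords)
    return tuple(sum(point[axis] for point in coords) / n for axis in range(3))


def min_atom_distance(set_a, set_b):
    """Smallest distance over all atom pairs (one atom from each set)."""
    return min(dist(a, b) for a, b in product(set_a, set_b))


def center_distance(set_a, set_b):
    """Distance between the geometric centers of the two sets."""
    return dist(geometric_center(set_a), geometric_center(set_b))


# Toy coordinates, invented for illustration only.
ligand = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
residue = [(0.5, 0.0, 0.0), (1.1, 0.0, 0.0)]

print("atom-atom minimum:", min_atom_distance(ligand, residue))
print("center-center:", center_distance(ligand, residue))
```

Here the closest atoms are about 0.3 nm apart while the centers are about 0.7 nm apart, so a center-based value can overestimate the closest contact.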




In [52]:
# Get the distance matrix between the atoms of type "C" from the small molecule named "W11" and any atom named "CA" of molecule with index 0
protein_atom_index = msm.get(protein, target='group', selection='component_index==0 & atom_name=="CA"', atom_index=True)
w11_atom_index = msm.get(protein, target='group', selection='component_index==3 & atom_type=="C"', atom_index=True)

neighbors, distances = msm.neighbors(protein,
                                    groups_of_atoms=protein_atom_index,
                                    group_behavior='geometric_center',
                                    groups_of_atoms_2=w11_atom_index,
                                    group_behavior_2='geometric_center',
                                    num_neighbors=1)

distances.shape

(1, 285, 1)

In [67]:
# Get all phi and psi dihedral angles of the molecule type peptide
covalent_chains = msm.covalent_dihedral_quartets(protein, dihedral_angle="phi-psi", selection='molecule_type=="peptide"')
dihedral_angles = msm.get_dihedral_angles(protein, quartets=covalent_chains)
dihedral_angles.shape

(1, 56)
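
For reference, phi is the dihedral angle defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The shape (1, 56) is consistent with the 29 groups of the peptide giving 28 phi and 28 psi angles for the single frame. The sketch below computes a dihedral angle from four points in pure Python; it illustrates the geometry only, its sign convention may differ from the one MolSysMT uses, and `dihedral_degrees` is a name made up for this example.

```python
from math import atan2, degrees, sqrt


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, factor):
    return tuple(x * factor for x in a)


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees) around the p1-p2 axis for four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    b1 = _scale(b1, 1.0 / sqrt(_dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    return degrees(atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy points: the outer atoms sit on the same side (cis) or on opposite sides (trans).
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```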

In [8]:
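# Get all covalent chains with the sequence of atoms named "C1B"-"C2B"-"C3B"-"C4B"-"C5B"-"C6B"-"C1B"
# (the task list writes these names as "CB1"..."CB6"; the names used here are the ones that match an atom in the system)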
chain = ["C1B", "C2B", "C3B", "C4B", "C5B", "C6B", "C1B"]

for j in range(len(chain)):
    chain[j] = 'atom_name=="{}"'.format(chain[j])
    
msm.covalent_chains(protein, chain=chain)

array([[12506, 12507, 12509, 12510, 12511, 12512, 12506]])

In [9]:
msm.select(protein, target="atom", selection='atom_name=="C1B"')

array([12506])

In [10]:
# Get all covalent chains with 7 atoms type "C"
chain = ['atom_type=="C"'] * 7
msm.covalent_chains(protein, chain=chain)

array([[    4,     6,     4, ...,     4,     6,     4],
       [    4,     6,     4, ...,     4,     8,     4],
       [    4,     6,     4, ...,     4,     8,    11],
       ...,
       [12537, 12536, 12537, ..., 12535, 12536, 12537],
       [12537, 12536, 12537, ..., 12537, 12536, 12535],
       [12537, 12536, 12537, ..., 12537, 12536, 12537]])
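
Rows such as 4, 6, 4, ... repeat the same atom index, which suggests that the search is allowed to walk back along a bond it has already used. The toy example below is a pure-Python sketch, not MolSysMT's implementation: it enumerates sequences of bonded atoms on an invented bond graph (a six-membered ring with one substituent), with and without revisits, and treats a sequence whose last atom equals its first as a ring. All names are hypothetical.

```python
def covalent_walks(bonds, length, allow_revisit):
    """Enumerate sequences of `length` bonded atoms in the graph `bonds`.

    With allow_revisit=False an atom may appear twice only when the last
    atom closes a ring by returning to the first one.
    """
    results = []

    def extend(path):
        if len(path) == length:
            results.append(tuple(path))
            return
        for neighbor in bonds[path[-1]]:
            closes_ring = (len(path) == length - 1
                           and len(path) > 2
                           and neighbor == path[0])
            if allow_revisit or neighbor not in path or closes_ring:
                extend(path + [neighbor])

    for start in bonds:
        extend([start])
    return results


def unique_rings(walks):
    """Atom sets of the walks that end where they started."""
    return {frozenset(walk) for walk in walks if walk[0] == walk[-1]}


# Toy bond graph: atoms 0-5 form a ring, atom 6 is bonded to atom 0.
bonds = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4],
         4: [3, 5], 5: [4, 0], 6: [0]}

loose = covalent_walks(bonds, 7, allow_revisit=True)
strict = covalent_walks(bonds, 7, allow_revisit=False)
print(len(loose), len(strict), unique_rings(strict))
```

Forbidding revisits removes the back-and-forth sequences, and the closed sequences that remain all describe the same six atoms, which hints that a dedicated ring method would also need to merge different starting points and directions.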