# CNNs4QSPR TUTORIAL

This notebook is a tutorial for getting started with `CNNs4QSPR`. It contains examples showing how each function runs and what output to expect. Work through it to get familiar with the repo so that you can then run the package smoothly. Have fun!

Okay, let's start by loading a PDB file and getting some protein data out of it.

### Modules you will need to import before you get started

In [3]:
import torch
import os
import numpy as np
import pandas as pd
import plotly.express as px
from biopandas.mol2 import PandasMol2
from biopandas.pdb import PandasPdb
from scipy.stats import special_ortho_group
from cnns4qspr import loader
from cnns4qspr import visualizer

In [6]:
myprotein_dict = loader.load_pdb('/Users/nisargjoshi/Desktop/direct_project/cnns4qspr/cnns4qspr/formatting_data/sample_pdbs/1a0rP02')



In [7]:
myprotein_dict

{'x_coords': array([20.998, 22.097, 23.492, 23.797, 21.734, 22.767, 22.445, 21.447,
        23.214, 24.369, 25.73 , 26.224, 26.257, 26.603, 27.121, 25.972,
        24.993, 28.124, 29.178, 29.948, 29.408, 31.21 , 26.084, 25.069,
        24.918, 25.906, 25.447, 23.676, 23.402, 22.009, 21.477, 23.5  ,
        22.642, 21.417, 20.069, 19.981, 19.064, 19.104, 19.024, 20.043,
        18.042, 19.691, 18.482, 20.932, 20.918, 22.257, 23.292, 20.509,
        21.201, 19.01 , 22.219, 23.424, 24.147, 23.545, 25.457, 26.298,
        26.175, 25.908, 27.702, 27.463, 26.252, 26.347, 26.253, 24.872,
        24.741, 26.607, 28.082, 28.372, 29.85 , 30.149, 23.844, 22.495,
        22.403, 21.863, 22.994, 23.003, 23.734, 23.252, 23.707, 23.633,
        23.084, 24.891, 25.71 , 24.914, 24.824, 26.953, 27.805, 27.777,
        28.958, 24.318, 23.544, 22.317, 21.979, 23.138, 23.03 , 23.924,
        21.952, 21.65 , 20.468, 20.853, 20.145, 19.855, 19.129, 18.786,
        18.893, 21.983, 22.482, 22.729, 22.363, 23.7

So `load_pdb` gives us the protein data in the form of a dictionary, and you can use this data to create the channels that are fed into the neural network.
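To give a feel for what a "channel" can look like, here is a minimal sketch that turns a handful of atom positions into a 3D density grid by placing a Gaussian on each atom. This is not the `cnns4qspr` implementation: the function name `gaussian_field`, its parameters (`grid_size`, `box_width`, `sigma`) and the Gaussian smearing itself are assumptions made purely for illustration, and it uses NumPy rather than torch to stay small.

```python
import numpy as np

def gaussian_field(coords, grid_size=11, box_width=20.0, sigma=1.0):
    """Hypothetical helper: sum a Gaussian centred on each atom over a cubic grid.

    The grid is centred on the origin and spans box_width along each axis.
    """
    axis = np.linspace(-box_width / 2, box_width / 2, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    field = np.zeros((grid_size, grid_size, grid_size))
    for x, y, z in np.asarray(coords, dtype=float):
        # Squared distance from this atom to every grid point
        sq_dist = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        field += np.exp(-sq_dist / (2.0 * sigma ** 2))
    return field

# Two made-up atom positions, already centred around the origin
atoms = np.array([[0.0, 0.0, 0.0], [3.0, -2.0, 1.0]])
field = gaussian_field(atoms)
print(field.shape)
```

Grid points far away from every atom end up with values close to zero, so most of such a grid is empty when the protein only fills a small part of the box.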

In [10]:
loader.shift_coords(myprotein_dict)

{'x_coords': array([20.998, 22.097, 23.492, 23.797, 21.734, 22.767, 22.445, 21.447,
        23.214, 24.369, 25.73 , 26.224, 26.257, 26.603, 27.121, 25.972,
        24.993, 28.124, 29.178, 29.948, 29.408, 31.21 , 26.084, 25.069,
        24.918, 25.906, 25.447, 23.676, 23.402, 22.009, 21.477, 23.5  ,
        22.642, 21.417, 20.069, 19.981, 19.064, 19.104, 19.024, 20.043,
        18.042, 19.691, 18.482, 20.932, 20.918, 22.257, 23.292, 20.509,
        21.201, 19.01 , 22.219, 23.424, 24.147, 23.545, 25.457, 26.298,
        26.175, 25.908, 27.702, 27.463, 26.252, 26.347, 26.253, 24.872,
        24.741, 26.607, 28.082, 28.372, 29.85 , 30.149, 23.844, 22.495,
        22.403, 21.863, 22.994, 23.003, 23.734, 23.252, 23.707, 23.633,
        23.084, 24.891, 25.71 , 24.914, 24.824, 26.953, 27.805, 27.777,
        28.958, 24.318, 23.544, 22.317, 21.979, 23.138, 23.03 , 23.924,
        21.952, 21.65 , 20.468, 20.853, 20.145, 19.855, 19.129, 18.786,
        18.893, 21.983, 22.482, 22.729, 22.363, 23.7

We use the `shift_coords` function to place the protein so that its coordinates sit in the center of the field tensor. Sometimes the protein that is fed into the neural net does not show up in the plots, because it is too large to fit completely inside the field tensor. So `shift_coords` shifts the coordinates by the mean value of the extreme coordinates, and we get the beautiful plots.
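The centering idea described above can be sketched in a few lines of NumPy. This is only an illustration under the assumption that "the mean value of the extreme coordinates" means the midpoint between the minimum and maximum along each axis. The function name `center_on_extremes` is hypothetical and is not part of `cnns4qspr`.

```python
import numpy as np

def center_on_extremes(coords):
    """Hypothetical helper: shift an (N, 3) array so that the midpoint of its
    extreme coordinates (per axis) lands on the origin."""
    coords = np.asarray(coords, dtype=float)
    midpoint = (coords.min(axis=0) + coords.max(axis=0)) / 2.0
    return coords - midpoint

# Made-up coordinates, not taken from the sample PDB
coords = np.array([[20.0, 5.0, -3.0],
                   [30.0, 9.0, 1.0],
                   [24.0, 7.0, 0.0]])
centered = center_on_extremes(coords)
print(centered)
```

After the shift, the smallest and largest value along each axis are equally far from zero, which is what keeps the protein in the middle of the grid.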

In [19]:
field_dict = loader.make_fields(myprotein_dict)
field_dict

This is channel  CA


{'CA': tensor([[[[[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            ...,
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.]],
 
           [[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            ...,
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.]],
 
           [[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            ...,
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.]],
 
           ...,
 
           [[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0

In [32]:
make_ = loader.voxelize('/Users/nisargjoshi/Desktop/direct_project/cnns4qspr/cnns4qspr/formatting_data/sample_pdbs/1a0rP02', channels=['charged'])
make_

*** in atoms_from_residues *** 
This is channel  charged


{'charged': tensor([[[[[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            ...,
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.]],
 
           [[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            ...,
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.]],
 
           [[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            ...,
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.]],
 
           ...,
 
           [[0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 0., 0., 0.],
            [0., 0., 0.,  ..., 

In [44]:
afp = loader.atoms_from_residues(myprotein_dict, ['GLU'])
afp

*** in atoms_from_residues *** 


array([[ -1.9785,   9.2875, -13.454 ],
       [ -0.8795,  10.1025, -14.006 ],
       [  0.5155,   9.9065, -13.426 ],
       [  0.8205,  10.3885, -12.335 ],
       [ -1.2425,  11.5875, -13.952 ],
       [ -0.2095,  12.4995, -14.607 ],
       [ -0.5315,  13.9855, -14.469 ],
       [ -1.5295,  14.3535, -13.805 ],
       [  0.2375,  14.7925, -15.04  ],
       [ -4.6955,   2.0025,   5.834 ],
       [ -6.1145,   2.2175,   6.102 ],
       [ -6.5625,   1.5055,   7.368 ],
       [ -7.3835,   2.0365,   8.119 ],
       [ -6.9735,   1.7625,   4.911 ],
       [ -6.8225,   2.6385,   3.645 ],
       [ -7.6985,   2.2005,   2.462 ],
       [ -8.2765,   1.0965,   2.512 ],
       [ -7.8075,   2.9615,   1.47  ],
       [ -4.1455,  -0.2955,   9.585 ],
       [ -3.0685,  -0.1835,  10.562 ],
       [ -2.4655,   1.2205,  10.494 ],
       [ -1.8235,   1.6335,  11.493 ],
       [ -1.9985,  -1.2395,  10.259 ],
       [ -2.5425,  -2.6685,  10.239 ],
       [ -1.6285,  -3.6505,   9.524 ],
       [ -0.3965,  -3.591

The `atoms_from_residues` function gives you the positions of the atoms that belong to the residues you ask for (here, `GLU`).
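Picking atoms by residue name boils down to a boolean mask over the atom list. The sketch below shows that idea on made-up data. The function name `positions_for_residues` and the layout of its inputs (a positions array plus one residue name per atom) are assumptions for illustration, not the actual internals of `atoms_from_residues`.

```python
import numpy as np

def positions_for_residues(positions, residue_names, wanted):
    """Hypothetical helper: return the rows of `positions` whose residue name
    is in `wanted`."""
    positions = np.asarray(positions, dtype=float)
    mask = np.isin(np.asarray(residue_names), wanted)
    return positions[mask]

# Made-up atoms: one (x, y, z) row and one residue name per atom
positions = np.array([[0.0, 1.0, 2.0],
                      [3.0, 4.0, 5.0],
                      [6.0, 7.0, 8.0]])
residue_names = ["GLU", "ALA", "GLU"]

glu_positions = positions_for_residues(positions, residue_names, ["GLU"])
print(glu_positions)
```

The result is an array with one `(x, y, z)` row per selected atom, the same shape of output as the `afp` array shown above.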

In [49]:
plottable = visualizer.plot_field(field_dict['CA'][0][0])
plottable