##IPerl Notebook implementation of the Gaussian Network Model

Demian Riccardi, March 1, 2015

####Description
This notebook implements the [Gaussian Network Model (GNM) of Bahar, Atilgan, and Erman](http://www.sciencedirect.com/science/article/pii/S1359027897000242) using HackaMol and the Perl Data Language (PDL), and compares the calculated fluctuations to the crystallographic B-factors. It will be extended to calculate the atom-atom correlations. GNM is a significant simplification of the [Elastic Network Model introduced by Tirion](http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.77.1905), which was itself a drastic simplification of protein dynamics. The notebook runs on the IPerl kernel for the IPython notebook, written by Zaki Mughal.

In [1]:
use Modern::Perl;
use HackaMol;

####1. Download the molecule from the Protein Data Bank and read it into a HackaMol::Molecule object:

In [2]:
my $bldr = HackaMol->new;

my $pdb = "2cba.pdb";
$bldr->getstore_pdbid($pdb); #download from pdb and store it locally (to reduce multiple downloads)

my $mol = HackaMol->new
                  ->read_file_mol($pdb); # object methods chain from left to right


HackaMol::Molecule=HASH(0x7ffc5f187f78)


####2. Coarse-grain the molecule
GNM represents the protein by the C$_\alpha$ atoms of its backbone, one node per residue. Let's pull out those with full occupancy and create a new HackaMol::Molecule object:

In [3]:
my $mol_cg = HackaMol::Molecule->new(
                                     atoms => [
                                               grep{ $_->occ == 1.0 }
                                               grep{ $_->name eq 'CA' } $mol->all_atoms
                                     ]
);

$mol_cg->print_pdb;


MODEL        1
ATOM     12 CA   HIS A   4       9.989   0.244   8.985  1.00 21.11           C
ATOM     28 CA   TRP A   5       7.704  -1.519  11.429  1.00 13.74           C
ATOM     42 CA   GLY A   6       5.389  -4.309  10.476  1.00 10.51           C
ATOM     46 CA   TYR A   7       4.377  -7.768  11.641  1.00 11.88           C
ATOM     58 CA   GLY A   8       7.285  -9.837  10.303  1.00 18.31           C
ATOM     62 CA   LYS A   9      10.036 -11.614  12.218  1.00 22.88           C
ATOM     71 CA   HIS A  10      12.463  -8.762  11.679  1.00 19.11           C
ATOM     81 CA   ASN A  11      10.176  -5.737  12.225  1.00 13.76           C
ATOM     89 CA   GLY A  12       7.329  -7.156  14.327  1.00 11.93           C
ATOM     93 CA   PRO A  13       5.868  -6.135  17.657  1.00 11.03           C
ATOM    100 CA   GLU A  14       8.848  -7.285  19.694  1.00 10.72           C
ATOM    113 CA   HIS A  15      11.016  -4.780  17.882  1.00  9.33           C
ATOM    123 CA   TRP A  16       8.72

GLOB(0x7ffc5a822e08)


####3. Next, calculate a Kirchhoff matrix using a single parameter: the cutoff distance
The Kirchhoff matrix (K), also referred to as the connectivity matrix, is simple to implement. It is a square N x N matrix, where N is the number of atoms. The cutoff distance is the parameter that determines whether or not two atoms are connected. If the cutoff distance is small, the matrix is sparse (most elements are zero). The elements of the matrix are evaluated as follows:

$ K_{ij} = \left\{ 
\begin{array}{l l}
  -1 & \quad \mbox{if $i \ne j$ and $R_{ij} \le R_{cut}$}\\
   0 & \quad \mbox{if $i \ne j$ and $R_{ij} > R_{cut}$}\\ 
  -\displaystyle \sum_{k \ne i} K_{ik} & \quad \mbox{if $i = j$}  
\end{array} \right. $
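
As a concrete illustration, consider a hypothetical three-atom chain (not taken from 2cba) in which atoms 1 and 2 and atoms 2 and 3 lie within $R_{cut}$ of each other, while atoms 1 and 3 do not. Applying the rule above element by element gives:

$$ K = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix} $$

Each diagonal element counts the contacts of that atom, the matrix is symmetric, and every row sums to zero.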

Below, we square the cutoff distance (so that squared distances can be compared without calculating square roots) and then loop over the unique pairs of atomic coordinates to construct the matrix.

In [4]:
sub kirchoff_crunch{
    #args: AtomGroup (or Molecule), cutoff distance
    my $mol   = shift;
    my $rcut  = shift;
    
    my $rsqr  = $rcut*$rcut;                 # compare squared distances; no sqrt needed
    my $N     = $mol->count_atoms;
    my @xyzs  = map{ $_->xyz } $mol->all_atoms ;

    my @K     = map{ [ (0) x $N ] } 1 .. $N; # N x N Kirchhoff matrix, initialized to zero

    foreach my $i (0 .. $#xyzs){
        my $xyz_i = $xyzs[$i];
        foreach my $j ($i+1 .. $#xyzs){      # visit each unique pair once
            my $dxyz2 = $xyzs[$j]->dist2($xyz_i);
            if ($dxyz2 <= $rsqr){
                $K[$i][$j]--;                # off-diagonal: -1 for each contact
                $K[$j][$i]--;
                $K[$i][$i]++;                # diagonal: number of contacts
                $K[$j][$j]++;
            }
        }
    }

    return (\@K);
    
}
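
The construction can be checked on a toy system before applying it to the protein. The following is a minimal sketch, assuming plain `[x, y, z]` array references in place of HackaMol atoms; `toy_kirchhoff` is a hypothetical helper introduced only for this illustration and is not part of HackaMol:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HackaMol): build a Kirchhoff matrix
# from plain [x, y, z] array references instead of atom objects.
sub toy_kirchhoff {
    my ($coords, $rcut) = @_;
    my $n    = scalar @$coords;
    my $rsqr = $rcut * $rcut;                  # compare squared distances
    my @K    = map { [ (0) x $n ] } 1 .. $n;   # dense N x N matrix of zeros

    for my $i (0 .. $n - 1) {
        for my $j ($i + 1 .. $n - 1) {
            my $d2 = 0;
            $d2 += ($coords->[$i][$_] - $coords->[$j][$_])**2 for 0 .. 2;
            next if $d2 > $rsqr;               # beyond the cutoff: no contact
            $K[$i][$j] = $K[$j][$i] = -1;      # off-diagonal contact
            $K[$i][$i]++;                      # diagonal counts contacts
            $K[$j][$j]++;
        }
    }
    return \@K;
}

# Four points on a line, 3.8 apart, with a cutoff of 4.0:
# only nearest neighbours are connected.
my @coords = map { [ 3.8 * $_, 0, 0 ] } 0 .. 3;
my $K_toy  = toy_kirchhoff(\@coords, 4.0);

print join(' ', map { sprintf '%2d', $_ } @$_), "\n" for @$K_toy;
```

For this chain the matrix is tridiagonal with diagonal elements 1, 2, 2, 1 and off-diagonal contacts of -1. Every row sums to zero, which is the property that makes the matrix singular.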

####4. Compute the pseudoinverse of the Kirchhoff matrix and compare the B-factors.
Here, we use PDL and a PDL interface to LAPACK (PDL::LinearAlgebra). Every row of the Kirchhoff matrix sums to zero, so its smallest eigenvalue is zero and the matrix cannot be inverted directly. Instead, we calculate the pseudoinverse (using the mpinv function).
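
For reference, the relations behind this step can be written out as follows. They are quoted from the general GNM literature rather than derived in this notebook, and the symbols $\gamma$ (spring constant), $k_B T$ (thermal energy), $\lambda_k$ and $\mathbf{u}_k$ (eigenvalues and eigenvectors of $K$) are notation assumed here for illustration:

$$ K^{+} = \sum_{k:\,\lambda_k \ne 0} \frac{1}{\lambda_k}\,\mathbf{u}_k \mathbf{u}_k^{T} $$

$$ \langle \Delta R_i^2 \rangle = \frac{3 k_B T}{\gamma}\,[K^{+}]_{ii}, \qquad B_i = \frac{8\pi^2}{3}\,\langle \Delta R_i^2 \rangle $$

Because the prefactors are the same for every residue, the diagonal of the pseudoinverse is proportional to the predicted B-factors, so it can be correlated with the experimental values without fixing $\gamma$.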

In [5]:
use PDL::Lite;
use PDL::LinearAlgebra qw(mpinv);
use PDL::Stats::Basic;
use Time::HiRes qw(time);

my $bfact_exp = pdl(map {$_->bfact} $mol_cg->all_atoms); 

my $tgnm1 = time;

my $K = kirchoff_crunch($mol_cg,7.5);

my $kirch_pdl  = pdl(@{$K});
my $pinv_kirch = $kirch_pdl->mpinv;

my $tgnm2 = time;

printf ("time: %.3f\n", $tgnm2-$tgnm1 ); 
my $bfact_calc = $pinv_kirch->diag;
printf ("Pearson coefficient: %.3f\n",  $bfact_calc->corr($bfact_exp));


time: 0.045
Pearson coefficient: 0.731


1


Comparisons between B-factors and theoretical models have limitations that will not be discussed here, but a correlation above 0.7 is generally pretty good. Next, we scale the calculated fluctuations so that their average matches the experimental average, and then print both columns (experimental, calculated) for plotting.
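
In symbols, the rescaling is a single multiplicative factor (the superscripts and angle brackets for averages over residues are notation assumed here):

$$ B_i^{scaled} = B_i^{calc}\,\frac{\langle B^{exp} \rangle}{\langle B^{calc} \rangle} $$

Since this multiplies every calculated value by the same positive constant, the Pearson coefficient reported above is unchanged by the scaling; only the absolute magnitudes become comparable.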

In [6]:
$bfact_calc = $bfact_exp->avg*$bfact_calc/$bfact_calc->avg;
printf("%10.3f %10.3f\n",$bfact_exp->at($_), $bfact_calc->at($_) ) foreach (0 .. $bfact_exp->nelem-1);

    21.110     30.812
    13.740     19.298
    10.510     15.476
    11.880     13.308
    18.310     20.693
    22.880     23.133
    19.110     26.947
    13.760     14.904
    11.930     14.563
    11.030     14.384
    10.720     19.968
     9.330     16.509
     9.580     12.500
    10.770     12.812
    11.630     17.806
    13.370     19.890
    12.320     11.739
    12.090     15.622
    11.140     12.601
    10.100     11.777
    13.480     13.877
    14.240     14.621
    14.010     14.473
     8.980      8.896
     5.780      7.420
     5.390      8.384
     5.430      7.916
     5.400      8.815
     7.530      9.366
     9.820      9.336
    12.390     12.579
    13.540     14.019
    17.240     21.917
    19.110     18.145
    15.530      9.379
    16.840     11.419
    18.230     11.185
    22.170     12.159
    28.470     17.814
    28.640     21.662
    25.540     12.963
    24.310     14.910
    19.910     16.059
    16.290     11.099
    14.870     12.272
    16.080