# Subspace Overlap

---

The subspace overlap is one of several measures used to compare the subspaces determined by the Elastic Network Model (ENM) and by Principal Component Analysis (PCA). PCA is a powerful tool for quantifying the extent to which two protein simulations, or fragments thereof (__A__ and __B__), explore the same conformational space. A common approach is to select a subset of $n$ eigenvectors for each ensemble, e.g., $\pmb{v}_{1}^{A},\dots,\pmb{v}_{n}^{A}$ and $\pmb{v}_{1}^{B},\dots,\pmb{v}_{n}^{B}$, and to evaluate the following expression:

\begin{align}
\Psi_{A:B} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} (\pmb{v}_{i}^{A} \pmb{\cdot} \, \pmb{v}_{j}^{B})^{2}
\end{align}

The _subspace overlap_ $\Psi$ ranges from $0$, when the two eigenvector subsets are completely dissimilar (mutually orthogonal), to $1$ (or $100\%$), when they span the same subspace. This range assumes that the eigenvectors within each subset are orthonormal. The number of eigenvectors used, $n$, is typically chosen so that the subset represents a significant proportion of the fluctuations in the simulation.
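
As a minimal sketch of the expression above, the double sum can be written as a single matrix product. The function name `subspace_overlap` and the toy bases are illustrative assumptions, not part of ProDy; each subset is assumed to be stored as the orthonormal columns of a NumPy array.

```python
import numpy as np

def subspace_overlap(VA, VB):
    """Subspace overlap of two eigenvector subsets stored as columns.

    VA, VB: arrays of shape (d, n) whose columns are orthonormal vectors.
    """
    VA = np.asarray(VA, dtype=float)
    VB = np.asarray(VB, dtype=float)
    if VA.shape != VB.shape:
        raise ValueError("both subsets must hold n vectors of equal length")
    n = VA.shape[1]
    # dots[i, j] is the dot product of vector i of A with vector j of B
    dots = VA.T @ VB
    return float(np.sum(dots ** 2) / n)

# Toy example in 4 dimensions with n = 2 (illustrative data only)
I = np.eye(4)
same = subspace_overlap(I[:, :2], I[:, :2])      # identical subsets
disjoint = subspace_overlap(I[:, :2], I[:, 2:])  # orthogonal subsets
print(same, disjoint)
```

The two toy cases correspond to the limits described above: identical subsets and mutually orthogonal subsets.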

Knowing this, let us calculate the subspace overlap between two trajectories.

---

In [2]:
# Author: Jan A. Siess
# Date: 06 September 2019
#
# Purpose: To calculate the subspace overlap between two trajectories in order
# to determine the degree of similarity between two proteins spanning a similar
# conformational space.

from prody import *
from pylab import *
import numpy as np
import py3Dmol

### For Trajectory A

First, we run the Anisotropic Network Model (ANM) analysis. At the end, we obtain a set of eigenvectors that we can use to compute the overlap.

---

In [4]:
# Python Version: Intended for Python versions 3.[*]+
prot = 'drosophila_mutopt_r3_backbone.nowa.avg'

#if in regular terminal working with a regular [*].py file:
#prot = sys.argv[1]

# Parse the structure from a PDB file. If a PDB identifier is passed instead and the
# file is not in the current working directory, it will be downloaded.
avg = parsePDB('avg.pdb')

#Select only the Cα (alpha carbon) atoms.
calphas = avg.select('protein and name CA')

#Store the number of alpha carbon atoms in a variable
num_CA = calphas.numAtoms('protein')

print(f'The number of Carbon Atoms is: {num_CA}')

@> 568 atoms and 1 coordinate set(s) were parsed in 0.01s.


The number of Carbon Atoms is: 568


In [5]:
#Building the Hessian

#Instantiate an ANM instance:
anm = ANM('avg ANM analysis')

#Build the Hessian matrix by passing the 568 selected atoms to the ANM.buildHessian() method:
anm.buildHessian(calphas, cutoff = 13.5, gamma = 1.0)

#Get a copy of the Hessian matrix using the ANM.getHessian() method, rounded to 3 decimal places:
structHessian = anm.getHessian().round(3)

#Out of curiosity, count the number of elements in the array:
SquareMatrix = np.prod(structHessian.shape)
print(f'The number of array elements in the Hessian matrix amounts to: {SquareMatrix}')

#Record the force constant K (1.0 kcal/Å^2) and the cutoff distance (13.5 Å) used above for the output file names.
#These values were taken from previously derived general ranges for proteins.
CO = 13.5
K = 1*10**0

#Save the array to a text file 
np.savetxt(f'{prot}_Hessian_gamma_{K}_cutoff_{CO}.dat', structHessian)

@> Hessian was built in 0.15s.


The number of array elements in the Hessian matrix amounts to: 2903616
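
To make the Hessian construction less of a black box, here is a minimal NumPy sketch of the standard ANM Hessian. It is not ProDy's implementation: the function name `anm_hessian` and the toy coordinates are assumptions for illustration, and the defaults simply mirror the cutoff and force constant used above. Each pair of nodes within the cutoff contributes a 3×3 block, and each diagonal block is the negative sum of the off-diagonal blocks in its row. For $N$ nodes the matrix is $3N \times 3N$, which is why 568 Cα atoms give $1704^2 = 2903616$ elements.

```python
import numpy as np

def anm_hessian(coords, cutoff=13.5, gamma=1.0):
    """Build a 3N x 3N ANM Hessian from an (N, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    n_atoms = coords.shape[0]
    hessian = np.zeros((3 * n_atoms, 3 * n_atoms))
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # no spring between nodes beyond the cutoff
            # Off-diagonal super-element for the pair (i, j)
            block = -gamma * np.outer(d, d) / dist2
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal super-elements accumulate minus the off-diagonal blocks
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian

# Toy coordinates (illustrative, not taken from the structure above)
toy = np.array([[0.0, 0.0, 0.0],
                [3.8, 0.0, 0.0],
                [3.8, 3.8, 0.0],
                [0.0, 3.8, 3.8],
                [1.9, 1.9, 5.0]])
H = anm_hessian(toy)
print(H.shape)
```

The resulting matrix is symmetric, and rigid-body translations and rotations cost no energy, which is where the six zero-eigenvalue modes come from.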


In [6]:
#Obtaining the normal modes data 

#A normal mode of an oscillating system is a pattern of motion in which all parts of the
#system move sinusoidally with the same frequency and with a fixed phase relation.

#n_modes -- number of non-zero eigenvalues/vectors to calculate
#zeros -- if True, modes with zero eigenvalues will be kept
#turbo -- use a memory-intensive, but faster, way to calculate modes

#First determine the number of modes to request: 3 coordinates per alpha carbon.
#With zeros = False, the 6 zero-eigenvalue modes (rigid-body translation and rotation) are excluded.
num_modes = num_CA*3 

#Go ahead with the main calculation...

#Calculating Normal Modes, eigenvalues/-vectors, and the covariance matrix:
anm.calcModes(n_modes=num_modes, zeros = False, turbo = True)
eigVals = anm.getEigvals()
eigVects = anm.getEigvecs()
egCov = anm.getCovariance()

#Save the eigenvalue/-vectors into a txt file:
np.savetxt(f'{prot}_egvals2_gamma_{K}_cutoff_{CO}.dat', eigVals)
np.savetxt(f'{prot}_egvects2_gamma_{K}_cutoff_{CO}.dat', eigVects)
np.savetxt(f'{prot}_egCov2_gamma_{K}_cutoff_{CO}.dat', egCov)

@> 1698 modes were calculated in 0.77s.
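
The number of eigenvectors $n$ to keep can be estimated from the eigenvalues. The sketch below rests on stated assumptions: the helper name `modes_for_variance`, the 80% target, and the toy eigenvalues are illustrative. For an ANM, the variance contributed by mode $k$ is taken to be proportional to $1/\lambda_k$, with eigenvalues sorted in ascending order. For PCA the eigenvalues are themselves the variances, so the inversion step would be dropped.

```python
import numpy as np

def modes_for_variance(eigvals, target=0.8):
    """Smallest number of leading modes whose cumulative variance reaches target.

    eigvals: non-zero ANM eigenvalues in ascending order.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    variances = 1.0 / eigvals  # ANM: softer modes (small eigenvalue) move more
    fractions = np.cumsum(variances) / np.sum(variances)
    n = int(np.searchsorted(fractions, target)) + 1
    return min(n, len(eigvals))

# Toy eigenvalues (illustrative only)
toy_eigvals = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
n_keep = modes_for_variance(toy_eigvals, target=0.8)
print(n_keep)  # 3: the first three modes cover about 90% of the variance
```

With the notebook's own data, the same helper could be called on `eigVals` to pick $n$ instead of fixing it at 20.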


In [17]:
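# ProDy stores each eigenvector as a COLUMN of the array returned by
# getEigvecs(), so mode i is eigVects[:, i] (eigVects[i] would be a row).
A = []
B = []
for i in range(0, 20):    # modes 0-19
    A.append(eigVects[:, i])
for i in range(20, 40):   # modes 20-39
    B.append(eigVects[:, i])

An = A[0]
Bn = B[5]
print(len(An)) ; print(len(Bn))

print(len(A))

# Dot product of a single pair of eigenvectors
result = 0

for i in range(0, len(An)):
    prod = An[i]*Bn[i]
    result = prod + result

# One term of the double sum: the squared dot product scaled by 1/n,
# where n is the number of eigenvectors in each subset.
# Both subsets come from the same ANM here, so distinct modes are orthogonal
# and this term should be close to zero.
result = result ** 2 / len(A)

print(result)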


## Approach

Since `A` and `B` are lists of lists, and each inner list is an eigenvector, I would like to experiment with summing over an individual list and then saving the sum in position. That is, write one function that first gets the sum, and then another that is responsible for computing the dot product between the two vectors.

In [16]:
def calcVectorSum(EigenVector1, Eigenvector2):
    return 0 
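
One possible way to organise the two helpers described in this approach is sketched below. The names `dot_product` and `overlap_from_lists` are hypothetical, each eigenvector is assumed to be a plain sequence of floats, and both lists are assumed to hold the same number $n$ of orthonormal vectors.

```python
def dot_product(vec_a, vec_b):
    """Sum of element-wise products of two equally long vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(vec_a, vec_b))

def overlap_from_lists(set_a, set_b):
    """Subspace overlap for two lists of n eigenvectors each."""
    n = len(set_a)
    if n == 0 or n != len(set_b):
        raise ValueError("both sets must hold the same non-zero number of vectors")
    total = 0.0
    for vec_a in set_a:
        for vec_b in set_b:
            total += dot_product(vec_a, vec_b) ** 2
    return total / n

# Toy unit vectors in 3 dimensions (illustrative only)
e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
full = overlap_from_lists([e1, e2], [e1, e2])  # same subspace
half = overlap_from_lists([e1, e2], [e2, e3])  # one shared direction
print(full, half)  # 1.0 0.5
```

Keeping the dot product in its own function makes the double sum read almost exactly like the equation, at the cost of being slower than a vectorised NumPy version for long eigenvectors.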