## Calculate a similarity metric for real and predicted protein 3D structures

In the synthetic example below, each row of the coords1 array and the row at the same position in the coords2 array are assumed to describe the same atom in their respective protein structures. For instance, the second row of coords1 and the second row of coords2 are treated as corresponding atoms for the purpose of the RMSD (Root Mean Square Deviation) calculation. This is why the ordering is important: RMSD is computed over row-wise pairs, so shuffling the rows of one array changes the result.

To illustrate, consider the following simplified coordinates:
- coords1 has a second row of [4.0, 5.0, 6.0].
- coords2 has a second row of [4.1, 5.1, 6.1].

These two rows are assumed to represent the same atom in each structure. Each coordinate differs by 0.1, so the two points are about 0.17 apart, and the RMSD calculation includes this distance in its overall computation.

However, it's worth noting that in real-world applications, determining which atoms correspond between two different protein structures can be a non-trivial task that might involve sequence alignment or other advanced techniques to ensure a meaningful comparison. The synthetic example simplifies this by assuming that the atoms are already aligned in the arrays.
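As a minimal sketch of the simplest kind of correspondence step, the snippet below pairs atoms by a shared key and builds two arrays whose rows line up. The helper `paired_coords`, the `(residue number, atom name)` key format, and the input dictionaries are all hypothetical and chosen only for illustration. Matching by key assumes both structures use the same residue numbering; when they do not, a sequence alignment would be needed first.

```python
import numpy as np

def paired_coords(atoms_a, atoms_b):
    """Keep only atoms present in both structures and return them in the same row order.

    atoms_a, atoms_b: dicts mapping a hypothetical (residue number, atom name) key
    to an [x, y, z] coordinate.
    """
    shared = sorted(set(atoms_a) & set(atoms_b))
    a = np.array([atoms_a[k] for k in shared], dtype=float).reshape(-1, 3)
    b = np.array([atoms_b[k] for k in shared], dtype=float).reshape(-1, 3)
    return shared, a, b

# Illustrative inputs: the two structures list atoms in a different order,
# and each contains one atom the other lacks.
real = {
    (1, "CA"): [1.0, 2.0, 3.0],
    (2, "CA"): [4.0, 5.0, 6.0],
    (3, "CA"): [7.0, 8.0, 9.0],
}
predicted = {
    (2, "CA"): [4.1, 5.1, 6.1],
    (1, "CA"): [1.1, 2.1, 3.1],
    (4, "CA"): [0.0, 0.0, 0.0],
}

keys, coords_real, coords_pred = paired_coords(real, predicted)
print(keys)
print(coords_real.shape, coords_pred.shape)
```

After this step, row i of `coords_real` and row i of `coords_pred` refer to the same key, which is the precondition the RMSD calculation relies on.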

To compare two protein 3D structures, one commonly used metric is the Root Mean Square Deviation (RMSD): the square root of the mean squared distance between corresponding atoms in the two structures. In practice, RMSD is usually reported after the structures have been optimally superimposed, because otherwise a pure translation or rotation of one structure inflates the value. Other metrics, such as the Template Modeling Score (TM-score) or the Global Distance Test (GDT), may also be of interest, but RMSD is one of the most straightforward and commonly used.
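Written out, with notation assumed here for illustration ($N$ paired atoms, $\mathbf{x}_i$ and $\mathbf{y}_i$ the coordinates of atom $i$ in the two structures), the definition reads as follows. The second line applies it to the synthetic coordinates in this section, where every atom is displaced by $(0.1, 0.1, 0.1)$:

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^2}

\mathrm{RMSD} = \sqrt{\frac{1}{5} \sum_{i=1}^{5} \left(0.1^2 + 0.1^2 + 0.1^2\right)} = \sqrt{0.03} \approx 0.173
```

The result carries the same length unit as the input coordinates (typically ångströms for protein structures); it is not a fraction or a percentage.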

In [None]:
from Bio.PDB import Superimposer
import numpy as np

In [6]:
# Synthetic 3D coordinates for atoms in two protein structures
coords1 = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [1.0, 2.0, 2.0],
    [4.0, 2.0, 0.0]
])

coords2 = np.array([
    [1.1, 2.1, 3.1],
    [4.1, 5.1, 6.1],
    [7.1, 8.1, 9.1],
    [1.1, 2.1, 2.1],
    [4.1, 2.1, 0.1]
])

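# Calculate the RMSD over row-wise paired atoms (no superposition is applied)
def calculate_rmsd(coords1, coords2):
    if coords1.shape != coords2.shape:
        raise ValueError("coords1 and coords2 must have the same shape")
    diff = coords1 - coords2
    # Mean over atoms of the squared Euclidean distance, then the square root
    rmsd = np.sqrt(np.sum(diff * diff) / len(coords1))
    return rmsd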

rmsd = calculate_rmsd(coords1, coords2)
# RMSD is a distance in the units of the coordinates, not a percentage
print(f"RMSD: {rmsd:.4f}")

RMSD: 0.1732
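The calculation above compares the coordinates as given. Since coords2 is just coords1 shifted by 0.1 along each axis, the reported value reflects a rigid translation rather than a difference in shape. The sketch below shows one common way to remove such rigid-body differences before measuring RMSD: the Kabsch algorithm, implemented here with NumPy only. The function name `kabsch_rmsd` is an assumption made for illustration and is not part of the original notebook or of Biopython.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between paired (N, 3) arrays after optimally superimposing P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("P and Q must have the same shape")

    # Remove translation by centering both point sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Cross-covariance matrix and its singular value decomposition
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the centered P (row vectors) and measure the remaining deviation
    diff = Pc @ R.T - Qc
    return np.sqrt(np.sum(diff * diff) / len(P))

coords1 = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [1.0, 2.0, 2.0],
    [4.0, 2.0, 0.0],
])
coords2 = coords1 + 0.1  # same shape, shifted by a rigid translation

aligned_rmsd = kabsch_rmsd(coords1, coords2)
print(f"RMSD after superposition: {aligned_rmsd:.4f}")
```

For these synthetic arrays the superimposed RMSD is zero up to floating-point error, because the two structures differ only by a translation. For real and predicted structures the two values generally differ, and the superimposed one is usually the number being reported.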
