<a href="https://colab.research.google.com/github/Jamoxidase/MachineLearning/blob/main/drugPointCloud.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I am working on drug-protein binding inference, specifically on a large-scale dataset containing drug molecule structures and experimental binding states for each molecule with respect to three proteins.

An interesting input representation of a molecule is a 3D point cloud that encodes both the spatial structure of the molecule and its local point charges through regional scaling: each atom is drawn as a sphere of points whose radius is scaled by the atom's partial charge.

My thought process:
The electrochemical fingerprint of a small-molecule drug mediates target interaction. By representing the molecule in this way, the model should be able to capture the inverse of the protein binding cavity.

Possible limitations:
- data orientation: apply principles of geometric deep learning to build in an inductive bias and reduce the need for repeated sampling with augmentation.
- bond rotations:
- generalizability: proteins have flexible binding pockets and can interact with distinct classes of molecules. Training data should represent the full scope of binding conformations between the protein and the drug. Given that detailed structures of the proteins are available, it may be possible to generate synthetic data based on the inverse of the protein binding site.
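
To make the orientation issue concrete, here is a minimal sketch of the augmentation baseline that a rotation-aware architecture would aim to replace: center the cloud, then apply a random rotation. The helper names `random_rotation_matrix` and `center_and_rotate` are hypothetical and not part of this notebook; the sketch assumes only NumPy.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Fix the sign ambiguity of the QR factorization column by column
    q = q * np.sign(np.diag(r))
    # Flip one axis if needed so the result is a proper rotation (det = +1)
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def center_and_rotate(points, rng):
    """Translate the cloud to its centroid, then rotate it randomly."""
    centered = points - points.mean(axis=0)
    return centered @ random_rotation_matrix(rng).T

rng = np.random.default_rng(0)
demo_points = rng.normal(size=(50, 3))   # stand-in for a real point cloud
augmented = center_and_rotate(demo_points, rng)
```

Pairwise distances between points are unchanged by this transform, which is the property an invariant model could exploit directly instead of relying on many rotated copies of each molecule.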

In [None]:
!pip install rdkit
!pip install torch_geometric

In [37]:
import torch
import torch.nn as nn
import torch_geometric.nn as pyg_nn
import torch_geometric.data as pyg_data
from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd
import random
import numpy as np
from sklearn.preprocessing import MinMaxScaler

In [41]:

def compute_point_cloud(mol, num_points_per_atom=100):
    """Return atom coordinates and a point cloud of charge-scaled spheres."""
    mol = Chem.AddHs(mol)  # add explicit hydrogens before embedding

    # Generate a 3D conformer and relax it with the UFF force field
    if AllChem.EmbedMolecule(mol) == -1:
        raise ValueError("RDKit could not embed a 3D conformer for this molecule")
    AllChem.UFFOptimizeMolecule(mol)
    AllChem.ComputeGasteigerCharges(mol)  # per-atom partial charges

    # Collect 3D coordinates and partial charges atom by atom
    conformer = mol.GetConformer()
    coords_list = []
    charge_list = []
    for i, atom in enumerate(mol.GetAtoms()):
        pos = conformer.GetAtomPosition(i)
        coords_list.append([pos.x, pos.y, pos.z])
        charge_list.append(atom.GetDoubleProp('_GasteigerCharge'))

    coords_list = np.array(coords_list)
    charge_list = np.array(charge_list, dtype=float).reshape(-1, 1)  # 2D array for MinMaxScaler

    # Scale charges to [0, 1]; note the most negative atom gets radius 0
    scaler = MinMaxScaler()
    normalized_charges = scaler.fit_transform(charge_list).flatten()

    point_cloud = []
    for i in range(len(coords_list)):
        # init charge spheres
        center = coords_list[i]
        radius = normalized_charges[i]

        # 'project' points to surface of charge spheres
        for _ in range(num_points_per_atom):
            theta = 2 * np.pi * np.random.rand()  # azimuthal angle
            phi = np.arccos(2 * np.random.rand() - 1)  # polar angle, uniform over the sphere

            # spherical -> cartesian coordinates
            x = center[0] + radius * np.sin(phi) * np.cos(theta)
            y = center[1] + radius * np.sin(phi) * np.sin(theta)
            z = center[2] + radius * np.cos(phi)
            point_cloud.append([x, y, z])

    return coords_list, np.array(point_cloud)
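
The per-point loop in the cell above could also be written in vectorized form. The sketch below is one possible variant under stated assumptions, not the notebook's method: `sample_sphere_surfaces` is a hypothetical helper, and it takes the atom centers and radii as plain arrays so that it runs without RDKit.

```python
import numpy as np

def sample_sphere_surfaces(centers, radii, num_points_per_atom=100, rng=None):
    """Sample points uniformly on a sphere around each center (vectorized)."""
    rng = np.random.default_rng() if rng is None else rng
    centers = np.asarray(centers, dtype=float)   # shape (N, 3)
    radii = np.asarray(radii, dtype=float)       # shape (N,)
    n = len(centers)

    theta = 2 * np.pi * rng.random((n, num_points_per_atom))  # azimuthal angle
    # Drawing cos(phi) uniformly in [-1, 1] gives uniform density over the sphere surface
    cos_phi = 2 * rng.random((n, num_points_per_atom)) - 1
    sin_phi = np.sqrt(1 - cos_phi ** 2)

    # Unit direction vectors, shape (N, P, 3)
    unit = np.stack(
        [sin_phi * np.cos(theta), sin_phi * np.sin(theta), cos_phi], axis=-1
    )
    points = centers[:, None, :] + radii[:, None, None] * unit
    return points.reshape(-1, 3)  # all points of atom 0 first, then atom 1, ...

demo_centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
demo_radii = np.array([1.0, 0.5])
demo_cloud = sample_sphere_surfaces(demo_centers, demo_radii, 10, np.random.default_rng(0))
```

Every sampled point lies exactly at its atom's radius from the center, so an atom with a scaled charge of 0 collapses to a single repeated point.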


In [42]:
mol = Chem.MolFromSmiles("CC(=O)Nc1ccc(cc1)S(=O)(=O)N")
coords_list, point_cloud = compute_point_cloud(mol)

In [40]:
print(point_cloud)

[[ 4.49351277e+00 -1.45289661e-01  2.94833257e-03]
 [ 4.17922544e+00  6.60150631e-01 -3.21217046e-01]
 [ 4.46568966e+00  4.12426212e-01  4.34404117e-01]
 ...
 [-4.08656124e+00 -4.36948166e-01 -1.98929195e+00]
 [-4.79940254e+00 -2.04317456e-01 -1.92936590e+00]
 [-4.83211846e+00  4.84351422e-01 -1.64880254e+00]]


In [43]:
import plotly.graph_objects as go

x_coords = point_cloud[:, 0]
y_coords = point_cloud[:, 1]
z_coords = point_cloud[:, 2]

# Create a 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=x_coords,
    y=y_coords,
    z=z_coords,
    mode='markers',
    marker=dict(
        size=6,
        color=z_coords,  # set color to z values
        colorscale='Viridis',  # choose a colorscale
        opacity=0.8
    )
)])
fig.update_layout(title='3D Scatter Plot of point cloud')
# Show the plot
fig.show()

It would be interesting to add some contouring to yield a more continuous representation.
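
One way to approach this, sketched here as an assumption rather than a tested design, is to sum a Gaussian per atom on a regular grid and contour the resulting density. The function `gaussian_density_grid` and its parameters `grid_size` and `padding` are hypothetical; the Gaussian width stands in for the charge-scaled radius.

```python
import numpy as np

def gaussian_density_grid(centers, widths, grid_size=32, padding=2.0):
    """Sum one isotropic Gaussian per atom on a regular 3D grid."""
    centers = np.asarray(centers, dtype=float)   # shape (N, 3)
    widths = np.asarray(widths, dtype=float)     # shape (N,)

    # Grid bounds: bounding box of the atoms plus some padding
    lo = centers.min(axis=0) - padding
    hi = centers.max(axis=0) + padding
    axes = [np.linspace(lo[d], hi[d], grid_size) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)       # shape (G, G, G, 3)

    density = np.zeros((grid_size,) * 3)
    for center, width in zip(centers, widths):
        sq_dist = ((grid - center) ** 2).sum(axis=-1)
        sigma = max(width, 1e-3)                 # avoid division by zero for width 0
        density += np.exp(-sq_dist / (2.0 * sigma ** 2))
    return axes, density

demo_axes, demo_density = gaussian_density_grid(
    [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]], [0.5, 1.0]
)
```

The flattened grid coordinates and density values could then be passed to a contouring tool such as plotly's `go.Isosurface` to draw a smooth surface in place of discrete points.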

The following cell displays the atomic points in the molecule. Bonds are not shown.

In [44]:
import plotly.graph_objects as go

x_coords = coords_list[:, 0]
y_coords = coords_list[:, 1]
z_coords = coords_list[:, 2]

# Create a 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=x_coords,
    y=y_coords,
    z=z_coords,
    mode='markers',
    marker=dict(
        size=6,
        color=z_coords,  # set color to z values
        colorscale='Viridis',  # choose a colorscale
        opacity=0.8
    )
)])
fig.update_layout(title='3D Scatter Plot of coords_list')
# Show the plot
fig.show()