# Tutorial 1: How to Apply Geom3D On Your Customized Data

## Step 1. Load Packages and Set Random Seeds and Device

In [25]:
import numpy as np
import torch
from Geom3D.models import SchNet

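seed = 42
np.random.seed(seed)
torch.manual_seed(seed)  # seed the default PyTorch generator
torch.cuda.manual_seed_all(seed)  # seed every GPU; ignored if CUDA is unavailable

# always a torch.device, whichever branch is taken
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")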

## Step 2. Data Loading

Geometric data for small molecules, proteins, and crystalline materials comes in many file formats, for example:
- SDF
- CIF
- HDF5

Whatever the format, the essential step is to extract the atom types and the 3D coordinates. Below we build two simple data points by hand.
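As one illustration of that extraction step, here is a minimal sketch that reads atom types and coordinates from the atom block of a V2000 SDF record using only the standard library. The record is hand-written with approximate, illustrative coordinates, and `parse_sdf_atoms` and `ATOMIC_NUMBER` are hypothetical helpers, not part of Geom3D. For real files, a dedicated chemistry toolkit is likely the safer choice.

```python
# Hand-written V2000 record for hydrogen cyanide (illustrative coordinates only).
SDF_TEXT = """\
hcn
  hand-written
  hypothetical example record
  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    1.1560 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000   -1.0640 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  3  0
  1  3  1  0
M  END
$$$$
"""

# Only the elements needed here; a full table would cover the periodic table.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def parse_sdf_atoms(text):
    """Return (atom_types, positions) from the atom block of one V2000 record."""
    lines = text.splitlines()
    # The fourth line is the counts line; its first three characters hold the atom count.
    num_atoms = int(lines[3][:3])
    atom_types, positions = [], []
    for line in lines[4:4 + num_atoms]:
        fields = line.split()
        positions.append([float(fields[0]), float(fields[1]), float(fields[2])])
        # 0-based atom type, matching the convention used in this tutorial (Carbon -> 5).
        atom_types.append(ATOMIC_NUMBER[fields[3]] - 1)
    return atom_types, positions


sdf_atom_types, sdf_positions = parse_sdf_atoms(SDF_TEXT)
print(sdf_atom_types)  # [5, 6, 0]
print(sdf_positions)
```

The two returned lists have the same form as the hand-written `atom_types` and `positions` lists used in the cells that follow.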

In [13]:
from torch_geometric.data import Data, Batch

Define Molecule 0

In [9]:
# 3D coordinates (x, y, z) of the three atoms
positions = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
]
# atom types are 0-based (atomic number - 1), so 5 stands for Carbon
atom_types = [5, 5, 5]
# label
y = torch.tensor(0.5, dtype=torch.float32)

atom_types = torch.tensor(atom_types, dtype=torch.long)
positions = torch.tensor(positions, dtype=torch.float)

molecule_0 = Data(
    x=atom_types,
    positions=positions,
    y=y
)

Define Molecule 1

In [11]:
# 3D coordinates (x, y, z) of the four atoms
positions = [
    [0, 0, 0],
    [0, 0, 0.5],
    [0, 1, 0.5],
    [0, 0.5, 0],
]
# atom types are 0-based (atomic number - 1): 5 stands for Carbon, 6 for Nitrogen
atom_types = [5, 5, 5, 6]
# label
y = torch.tensor(0.6, dtype=torch.float32)

atom_types = torch.tensor(atom_types, dtype=torch.long)
positions = torch.tensor(positions, dtype=torch.float)

molecule_1 = Data(
    x=atom_types,
    positions=positions,
    y=y
)

Next, we collate the two molecules into a batch. PyG (PyTorch Geometric) merges the small graphs into one large graph with no connections between them, and records which graph each atom belongs to.

**Notice**: Typically, PyG's `DataLoader` class performs this collation automatically.

In [21]:
data_list = [molecule_0, molecule_1]
batch = Batch.from_data_list(data_list)
print("molecule 0 is:\n{}\n".format(molecule_0))
print("molecule 1 is:\n{}\n".format(molecule_1))
print("The collated batch is:\n{}\n".format(batch))
print("The batch.batch field defines which atoms belong to which molecule/graph:\n{}".format(batch.batch))

molecule 0 is:
Data(x=[3], y=0.5, positions=[3, 3])

molecule 1 is:
Data(x=[4], y=0.6000000238418579, positions=[4, 3])

The collated batch is:
DataBatch(x=[7], y=[2], positions=[7, 3], batch=[7], ptr=[3])

The batch.batch field defines which atoms belong to which molecule/graph:
tensor([0, 0, 0, 1, 1, 1, 1])
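To make the bookkeeping in the output above concrete, the following sketch rebuilds the `batch` and `ptr` vectors from the per-molecule atom counts with NumPy. It only mirrors the values shown above and is not PyG's actual implementation; the variable names are chosen for illustration.

```python
import numpy as np

# Number of atoms in molecule 0 and molecule 1.
num_atoms = np.array([3, 4])

# batch_index[i] is the index of the molecule that atom i belongs to.
batch_index = np.repeat(np.arange(len(num_atoms)), num_atoms)

# ptr[k] is the row offset of molecule k's first atom; ptr[-1] is the total atom count.
ptr = np.concatenate([[0], np.cumsum(num_atoms)])

print(batch_index)  # [0 0 0 1 1 1 1]
print(ptr)          # [0 3 7]

# Rows ptr[1]:ptr[2] of the batched tensors hold the atoms of molecule 1.
start, end = int(ptr[1]), int(ptr[2])
print(start, end)   # 3 7
```

This is why `ptr` is reported with shape `[3]` for a batch of two molecules: it stores one offset per molecule plus the final total.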


## Step 3. Set Model

In [26]:
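# node_class: number of distinct atom types the model can embed
# edge_class is defined here but is not passed to SchNet below
node_class, edge_class = 119, 5
num_tasks = 1  # one predicted value per molecule

emb_dim = 128  # size of the molecule representation
SchNet_num_filters = 128
SchNet_num_interactions = 6  # number of interaction blocks
SchNet_num_gaussians = 51  # Gaussians used to expand interatomic distances
SchNet_cutoff = 10  # distance cutoff for interatomic interactions
SchNet_readout = "mean"  # how atom embeddings are pooled per molecule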

model = SchNet(
    hidden_channels=emb_dim,
    num_filters=SchNet_num_filters,
    num_interactions=SchNet_num_interactions,
    num_gaussians=SchNet_num_gaussians,
    cutoff=SchNet_cutoff,
    readout=SchNet_readout,
    node_class=node_class,
).to(device)
graph_pred_linear = torch.nn.Linear(emb_dim, num_tasks).to(device)

## Step 4. Make Predictions

In [30]:
batch = batch.to(device)
molecule_3D_repr = model(batch.x, batch.positions, batch.batch)
pred = graph_pred_linear(molecule_3D_repr).squeeze()
print("molecule 3D representation:\n{}\n".format(molecule_3D_repr.size()))
print("The predicted values for two molecules are:\n{}".format(pred))

molecule 3D representation:
torch.Size([2, 128])

The predicted values for two molecules are:
tensor([0.2268, 0.3431], device='cuda:0', grad_fn=<SqueezeBackward0>)
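The representation has shape `[2, 128]` because the per-atom embeddings are pooled into one vector per molecule. The sketch below shows what a mean readout over the `batch` vector could look like in NumPy, using random stand-in embeddings. It assumes that `readout="mean"` averages the atom embeddings of each molecule; the actual Geom3D implementation may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim = 128

# Stand-in for the per-atom embeddings of the 7 atoms in the batch (random values).
node_repr = rng.normal(size=(7, emb_dim))
batch_index = np.array([0, 0, 0, 1, 1, 1, 1])
num_graphs = int(batch_index.max()) + 1

# Sum the atom rows of each molecule, then divide by that molecule's atom count.
sums = np.zeros((num_graphs, emb_dim))
np.add.at(sums, batch_index, node_repr)
counts = np.bincount(batch_index, minlength=num_graphs)
graph_repr = sums / counts[:, None]

print(graph_repr.shape)  # (2, 128)
```

A mean readout keeps the representation on a comparable scale for molecules of different sizes, whereas a sum readout would grow with the number of atoms.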


Notice that so far we have only used a randomly initialized SchNet to make predictions; no optimization has been performed, so the predicted values are not yet meaningful. The following tutorials show how to train the model.