The purpose of this notebook is to demonstrate some of the basics of the Open Catalyst Project's (OCP) codebase and data. In this example, we will train a CGCNN model to predict the energy of a given structure, reusing the task settings of the S2EF (structure to energy and forces) task. First, ensure you have installed the OCP `ocp` repo and all of its dependencies according to the [README](https://github.com/Open-Catalyst-Project/ocp/blob/master/README.md).

## Imports

In [1]:
import torch
import os
from ocpmodels.trainers import ForcesTrainer, EnergyTrainer
from ocpmodels import models
from ocpmodels.datasets import SinglePointLmdbDataset
from ocpmodels.common import logger
from ocpmodels.common.utils import setup_logging
setup_logging()



In [2]:
# a simple sanity check that a GPU is available
print(torch.cuda.is_available())

False


## The essential steps for training an OCP model

1) Download data

2) Preprocess data (if necessary)

3) Define or load a configuration (config), which includes the following
   
   - task
   - model
   - optimizer
   - dataset
   - trainer

4) Train

5) Depending on the model/task, there might be an intermediate relaxation step

6) Predict
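The config sections listed in step 3 can be sanity-checked before a trainer is built. The sketch below is only an illustration: `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, not part of the OCP codebase, and the placeholder values stand in for the dictionaries defined later in this notebook.

```python
# Hypothetical helper: not part of ocpmodels, it only illustrates the config layout.
REQUIRED_SECTIONS = ("task", "model", "optimizer", "dataset", "trainer")


def missing_sections(config):
    """Return the required section names that are absent from `config`."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Placeholder values; the real dictionaries are defined further down.
config = {
    "task": {"type": "regression"},
    "model": {"name": "cgcnn"},
    "optimizer": {"batch_size": 32},
    "dataset": [{"src": "path/to/data.lmdb"}],
}

# The "trainer" section has not been set yet, so it is reported as missing.
print(missing_sections(config))
```

A check like this is cheap to run before launching a long job, where a missing section would otherwise surface only after the data has been loaded.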

In [3]:
state_dict = torch.load("D://SA//project//fairchem//test2//cgcnn_all.pt", map_location=torch.device('cpu'))



In [4]:
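# create a new OrderedDict with one leading `module.` prefix removed from each key
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict['state_dict'].items():
    # strip the prefix only when it is present, so other keys are left untouched
    name = k[len('module.'):] if k.startswith('module.') else k
    new_state_dict[name] = v
# display the renamed parameters
new_state_dict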

OrderedDict([('module.embedding_fc.weight',
              tensor([[ 0.0468,  0.0057,  0.0012,  ...,  0.0793,  0.0661,  0.0160],
                      [ 0.0182,  0.0314, -0.0565,  ..., -0.0029, -0.0660,  0.0022],
                      [-0.0585,  0.0909, -0.0573,  ...,  0.0105, -0.0431, -0.0517],
                      ...,
                      [-0.0336, -0.0377, -0.0228,  ...,  0.0712,  0.0549,  0.0387],
                      [-0.0580, -0.0002,  0.0628,  ..., -0.0233, -0.0593, -0.0950],
                      [ 0.0090,  0.0935,  0.1070,  ...,  0.0032, -0.0211,  0.0792]])),
             ('module.embedding_fc.bias',
              tensor([-0.5584, -0.5162, -0.1619, -0.5860, -0.1684, -0.7584, -0.5744, -0.5147,
                      -0.4426, -0.6918, -0.5034, -0.0352, -0.3997, -0.1668, -0.1715,  0.1615,
                      -0.6017, -0.5735, -0.6237, -0.2790, -0.3381, -0.3939, -0.5932, -0.0866,
                      -0.2900, -0.4407, -0.4915, -0.1574, -0.2935, -0.8508, -0.0586, -0.2196,
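
Checkpoints saved from a model wrapped in a parallel container carry a `module.` prefix on every parameter name, and nested wrappers add the prefix more than once. The following sketch uses a hypothetical `strip_prefix` helper and placeholder values instead of tensors. It removes a chosen number of leading prefixes and leaves keys without the prefix untouched.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module.", count=1):
    """Remove up to `count` leading occurrences of `prefix` from each key."""
    renamed = OrderedDict()
    for key, value in state_dict.items():
        for _ in range(count):
            if not key.startswith(prefix):
                break
            key = key[len(prefix):]
        renamed[key] = value
    return renamed


# Placeholder values stand in for the parameter tensors.
wrapped = OrderedDict([
    ("module.module.embedding_fc.weight", 0),
    ("module.module.embedding_fc.bias", 1),
])

print(list(strip_prefix(wrapped)))           # one prefix removed
print(list(strip_prefix(wrapped, count=2)))  # both prefixes removed
```

How many prefixes to remove depends on how the target model is wrapped: the key names must match those reported by the target model's own `state_dict()`.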
    

## Dataset

This example uses the LMDB generated in the following [tutorial](http://laikapack.cheme.cmu.edu/notebook/open-catalyst-project/mshuaibi/notebooks/projects/ocp/docs/source/tutorials/lmdb_dataset_creation.ipynb). Please run that notebook before moving on. Alternatively, if you have other LMDBs available, you may specify those instead.

In [5]:
# set the path to your local lmdb directory
train_src = "D://SA//project//fairchem//test2//cu//cu100oh.lmdb"
dev_src = "D://SA//project//fairchem//test2//cu//cu100oh.lmdb"
#test_src = '/mnt/disks/exdisk/co_diff_pos.lmdb'

In [6]:
dataset = SinglePointLmdbDataset({"src": train_src})
len(dataset)



3

## Define config

For this example, we will explicitly define the config; however, a set of default config files exists in the config folder of this repository. Default config YAML files can easily be loaded with the `build_config` util (found in `ocp/ocpmodels/common/utils.py`). Loading a YAML config is preferable when launching jobs from the command line. We have included our best models' config files [here](https://github.com/Open-Catalyst-Project/ocp/tree/master/configs/s2ef).

**Task** 

In [7]:
task = {
    'dataset': 'single_point_lmdb',
    'description': 'Regressing to energies and forces for DFT trajectories from OCP',
    'type': 'regression',
    'metric': 'mae',
    'labels': ['potential energy'],
    'grad_input': 'atomic forces',
    'train_on_free_atoms': True,
    'eval_on_free_atoms': True
}
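
The `'metric': 'mae'` entry refers to the mean absolute error. As a reminder of what that number measures, here is a minimal pure-Python sketch. `mean_absolute_error` is an illustrative helper, not the OCP evaluator, and the values are made up for the example.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over all samples."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    total = sum(abs(p - t) for p, t in zip(predictions, targets))
    return total / len(targets)


# Made-up values, only to exercise the function.
predicted = [-0.5, 1.0, 0.25]
reference = [-1.0, 1.5, 0.25]

print(mean_absolute_error(predicted, reference))
```

Because every error contributes linearly, a single badly predicted structure shifts the MAE less than it would shift a squared-error metric.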

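**Model** - CGCNN for this example. The GemNet-T configuration below is an alternative; it is superseded by the CGCNN definition that follows.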

model = {
  'name': 'gemnet_t',
  'num_spherical': 7,
  'num_radial': 128,
  'num_blocks': 3,
  'emb_size_atom': 512,
  'emb_size_edge': 512,
  'emb_size_trip': 64,
  'emb_size_rbf': 16,
  'emb_size_cbf': 16,
  'emb_size_bil_trip': 64,
  'num_before_skip': 1,
  'num_after_skip': 2,
  'num_concat': 1,
  'num_atom': 3,
  'cutoff': 6.0,
  'max_neighbors': 50,
  'rbf':
    {'name': 'gaussian'},
  'envelope':
    {'name': 'polynomial',
    'exponent': 5},
  'cbf':
    {'name': 'spherical_harmonics'},
  'extensive': True,
  'otf_graph': False,
  'output_init': 'HeOrthogonal',
  'activation': 'silu',
  'scale_file': 'ocp/configs/s2ef/all/gemnet/scaling_factors/gemnet-dT.json'
}

In [22]:
# model = {  
#   'name': 'cgcnn',
#   'atom_embedding_size': 384,
#   'fc_feat_size': 512,
#   'num_fc_layers': 4,
#   'num_graph_conv_layers': 6,
#   'num_gaussians': 100,
#   'cutoff': 6.0,
#   'regress_forces': False,
#   'use_pbc': True,
# }

model = {  
  'name': 'cgcnn',
  'atom_embedding_size': 384,
  'fc_feat_size': 512,
  'num_fc_layers': 4,
  'num_graph_conv_layers': 6,
  'num_gaussians': 100,
  'cutoff': 6.0,
  'regress_forces': False,
  'use_pbc': True,
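  'output_channels': 1,  # rejected with a TypeError when the trainer is built below; drop this key if your version does the same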
}

**Optimizer**

optimizer = {
  'batch_size': 32,
  'eval_batch_size': 32,
  #'eval_every': 5000
  'num_workers': 2,
  'lr_initial': 5.e-4,
  'optimizer': 'AdamW',
  'optimizer_params': {"amsgrad": True},
  'scheduler': 'ReduceLROnPlateau',
  'mode': 'min',
  'factor': 0.8,
  'patience': 3,
  'max_epochs': 80,
  #'force_coefficient': 100
  #energy_coefficient: 1
  'ema_decay': 0.999,
  'clip_grad_norm': 10,
  'loss_energy': 'mae'
}

In [23]:
# optimizer = {
#     'batch_size': 32,
#   'eval_batch_size': 32,
#   'num_workers': 1,
#   'lr_initial': 0.0005,
#     'scheduler': "ReduceLROnPlateau",
#     'mode': "min",
#     'factor': 0.5,
#     'patience': 3,
#     'max_epochs': 60,
# }

optimizer = {
    'batch_size': 32,
    'eval_batch_size': 32,
    'num_workers': 0,
    'lr_initial': 0.0005,
    'scheduler': "ReduceLROnPlateau",
    'mode': "min",
    'factor': 0.5,
    'patience': 3,
    'max_epochs': 60,
}
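
The `scheduler`, `mode`, `factor`, and `patience` entries control when the learning rate is lowered. The sketch below is a simplified pure-Python model of a reduce-on-plateau rule in `min` mode, written to show how `factor` and `patience` interact. It is not PyTorch's `ReduceLROnPlateau` implementation and ignores its threshold and cooldown options; `plateau_schedule` and the loss values are made up for illustration.

```python
def plateau_schedule(losses, lr_initial=0.0005, factor=0.5, patience=3):
    """Return the learning rate in effect after each epoch's validation loss."""
    lr = lr_initial
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:
            # improvement: remember it and reset the counter
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # more than `patience` epochs without improvement: shrink the rate
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# Made-up validation losses that stall at 0.8 for several epochs.
losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7]
for epoch, lr in enumerate(plateau_schedule(losses)):
    print(epoch, lr)
```

With `patience` set to 3, the rate is halved only on the fourth consecutive epoch without improvement, so short plateaus do not change the schedule.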

**Dataset**

For simplicity, `train_src` and `dev_src` point to the same LMDB, and the test set is commented out. Feel free to substitute the actual S2EF val and test sets, but doing so requires additional downloads and preprocessing. To normalize your targets, set `normalize_labels` to `True` and specify the corresponding mean and standard deviation. These values have been precomputed for you and can be found in any of the [`base.yml`](https://github.com/Open-Catalyst-Project/ocp/blob/master/configs/s2ef/20M/base.yml#L5-L9) config files.
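
To make the role of the mean and standard deviation concrete, the sketch below standardizes a handful of toy energies and then restores them. It assumes that normalization means `(value - mean) / std`. The helper names are hypothetical, the numbers are invented, and whether OCP's precomputed values use a population or a sample standard deviation is not stated here (the population form is used below).

```python
import statistics


def fit_normalizer(values):
    """Return the mean and population standard deviation of `values`."""
    return statistics.fmean(values), statistics.pstdev(values)


def normalize(value, mean, std):
    return (value - mean) / std


def denormalize(value, mean, std):
    return value * std + mean


# Toy energies, invented for the example.
energies = [-1.0, 0.0, 1.0, 2.0]
mean, std = fit_normalizer(energies)

scaled = [normalize(e, mean, std) for e in energies]
restored = [denormalize(s, mean, std) for s in scaled]

print(mean, std)
print(scaled)
print(restored)
```

The model is then trained on the scaled targets, and predictions are mapped back to physical units with the inverse transform.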

In [24]:
dataset = [
    {'src': train_src, 'normalize_labels': False},  # train set
    {'src': dev_src},  # val set
    # {'src': test_src}  # test set (optional)
]

In [25]:
dataset

[{'src': 'D://SA//project//fairchem//test2//cu//cu100oh.lmdb',
  'normalize_labels': False},
 {'src': 'D://SA//project//fairchem//test2//cu//cu100oh.lmdb'}]

**Trainer**

In [26]:
trainer = EnergyTrainer(
    task=task,
    model=model,
    dataset=dataset,
    optimizer=optimizer,
    identifier='cgcnn_oh',
    print_every=100,
)

2024-11-12 11:28:42 (INFO): amp: false
cmd:
  checkpoint_dir: d:\SA\project\fairchem\test2\checkpoints\2024-11-12-11-29-04-cgcnn_oh
  commit: 6f24a48b
  identifier: cgcnn_oh
  logs_dir: d:\SA\project\fairchem\test2\logs\tensorboard\2024-11-12-11-29-04-cgcnn_oh
  print_every: 100
  results_dir: d:\SA\project\fairchem\test2\results\2024-11-12-11-29-04-cgcnn_oh
  seed: null
  timestamp_id: 2024-11-12-11-29-04-cgcnn_oh
dataset:
  normalize_labels: false
  src: D://SA//project//fairchem//test2//cu//cu100oh.lmdb
gpus: 0
logger: tensorboard
model: cgcnn
model_attributes:
  atom_embedding_size: 384
  cutoff: 6.0
  fc_feat_size: 512
  num_fc_layers: 4
  num_gaussians: 100
  num_graph_conv_layers: 6
  output_channels: 1
  regress_forces: false
  use_pbc: true
noddp: false
optim:
  batch_size: 32
  eval_batch_size: 32
  factor: 0.5
  lr_initial: 0.0005
  max_epochs: 60
  mode: min
  num_workers: 0
  patience: 3
  scheduler: ReduceLROnPlateau
slurm: {}
task:
  dataset: single_point_lmdb
  descript

TypeError: __init__() got an unexpected keyword argument 'output_channels'

## Check the model

In [13]:
print(trainer.model)

OCPDataParallel(
  (module): CGCNN(
    (embedding_fc): Linear(in_features=92, out_features=384, bias=True)
    (convs): ModuleList(
      (0-5): 6 x CGCNNConv()
    )
    (conv_to_fc): Sequential(
      (0): Linear(in_features=384, out_features=512, bias=True)
      (1): Softplus(beta=1.0, threshold=20.0)
    )
    (fcs): Sequential(
      (0): Linear(in_features=512, out_features=512, bias=True)
      (1): Softplus(beta=1.0, threshold=20.0)
      (2): Linear(in_features=512, out_features=512, bias=True)
      (3): Softplus(beta=1.0, threshold=20.0)
      (4): Linear(in_features=512, out_features=512, bias=True)
      (5): Softplus(beta=1.0, threshold=20.0)
    )
    (fc_out): Linear(in_features=512, out_features=1, bias=True)
    (distance_expansion): GaussianSmearing()
  )
)


## Pretrained Model

In [14]:
trainer.model.load_state_dict(new_state_dict)

<All keys matched successfully>

In [15]:
trainer.model.state_dict()

OrderedDict([('module.embedding_fc.weight',
              tensor([[ 0.0468,  0.0057,  0.0012,  ...,  0.0793,  0.0661,  0.0160],
                      [ 0.0182,  0.0314, -0.0565,  ..., -0.0029, -0.0660,  0.0022],
                      [-0.0585,  0.0909, -0.0573,  ...,  0.0105, -0.0431, -0.0517],
                      ...,
                      [-0.0336, -0.0377, -0.0228,  ...,  0.0712,  0.0549,  0.0387],
                      [-0.0580, -0.0002,  0.0628,  ..., -0.0233, -0.0593, -0.0950],
                      [ 0.0090,  0.0935,  0.1070,  ...,  0.0032, -0.0211,  0.0792]])),
             ('module.embedding_fc.bias',
              tensor([-0.5584, -0.5162, -0.1619, -0.5860, -0.1684, -0.7584, -0.5744, -0.5147,
                      -0.4426, -0.6918, -0.5034, -0.0352, -0.3997, -0.1668, -0.1715,  0.1615,
                      -0.6017, -0.5735, -0.6237, -0.2790, -0.3381, -0.3939, -0.5932, -0.0866,
                      -0.2900, -0.4407, -0.4915, -0.1574, -0.2935, -0.8508, -0.0586, -0.2196,
    

## Train

In [16]:
import time
tic = time.time()
trainer.train()
toc = time.time()



AttributeError: 'tuple' object has no attribute 'shape'

In [30]:
print(toc-tic, 'sec Elapsed')

4168.266081094742 sec Elapsed



### Load Checkpoint
Once training has completed, the `Trainer` is, by default, loaded with the best checkpoint as determined by training or validation (if available) metrics. To initialize a `Trainer` directly from a pretrained model, specify the `checkpoint_path` of your previously trained model (located in its `checkpoint_dir`):

In [17]:
checkpoint_path = os.path.join(trainer.config["cmd"]["checkpoint_dir"], "checkpoint.pt")

In [18]:
model = {  
  'name': 'cgcnn',
  'atom_embedding_size': 384,
  'fc_feat_size': 512,
  'num_fc_layers': 4,
  'num_graph_conv_layers': 6,
  'num_gaussians': 100,
  'cutoff': 6.0,
  'regress_forces': False,
  'use_pbc': True,
}

pretrained_trainer = EnergyTrainer(
    task=task,
    model=model,
    dataset=dataset,
    optimizer=optimizer,
    identifier="cgcnn_test",
    run_dir="./", # directory to save results if is_debug=False. Prediction files are saved here, so be careful not to overwrite them!
    is_debug=False, # if True, do not save checkpoint, logs, or results
    is_vis=False, # rejected with a TypeError by the trainer version used for the run below
    print_every=10,
    seed=0, # random seed to use
    logger="tensorboard", # logger of choice (tensorboard and wandb supported)
    local_rank=0,
    amp=False, # use PyTorch Automatic Mixed Precision (faster training and less memory usage)
)

TypeError: __init__() got an unexpected keyword argument 'is_vis'

In [21]:
state_dict = torch.load('/home/henryhu103/checkpoints/2021-09-05-06-13-20-cgcnn_test/checkpoint.pt')
state_dict

{'epoch': 44.0,
 'step': 11660,
 'state_dict': OrderedDict([('module.embedding_fc.weight',
               tensor([[ 0.0449,  0.0208,  0.0161,  ...,  0.0761,  0.0634,  0.0153],
                       [ 0.0175,  0.0263, -0.0607,  ..., -0.0028, -0.0634,  0.0021],
                       [-0.0562,  0.0633, -0.0393,  ...,  0.0101, -0.0414, -0.0496],
                       ...,
                       [-0.0323, -0.0325, -0.0207,  ...,  0.0684,  0.0527,  0.0372],
                       [-0.0556,  0.0018,  0.0703,  ..., -0.0223, -0.0569, -0.0912],
                       [ 0.0087,  0.0916,  0.0929,  ...,  0.0031, -0.0203,  0.0760]],
                      device='cuda:0')),
              ('module.embedding_fc.bias',
               tensor([-0.6161, -0.6124, -0.2050, -0.6381, -0.2032, -0.8209, -0.6825, -0.5861,
                       -0.4391, -0.7662, -0.5195, -0.0739, -0.4849, -0.2240, -0.2221,  0.1943,
                       -0.6043, -0.6297, -0.6946, -0.3588, -0.3552, -0.4349, -0.6215, -0.1200,
 

In [22]:
pretrained_trainer.model.state_dict()

OrderedDict([('module.embedding_fc.weight',
              tensor([[-0.0008,  0.0559, -0.0858,  ..., -0.0618,  0.0382,  0.0527],
                      [ 0.0746,  0.0390, -0.1032,  ..., -0.0218,  0.0745,  0.0291],
                      [ 0.0501,  0.0368, -0.0251,  ...,  0.0225, -0.0585, -0.0930],
                      ...,
                      [-0.0497,  0.0165, -0.0206,  ..., -0.0574,  0.0026, -0.0575],
                      [-0.0864,  0.0724,  0.0413,  ..., -0.0826, -0.0310, -0.0525],
                      [ 0.0566, -0.0097, -0.0386,  ...,  0.0559,  0.0278,  0.0176]],
                     device='cuda:0')),
             ('module.embedding_fc.bias',
              tensor([-0.1022,  0.0278, -0.0785, -0.0484,  0.0108, -0.0258, -0.0846, -0.0880,
                       0.0297, -0.0816,  0.0020,  0.0835,  0.1036, -0.0786, -0.0536, -0.0871,
                       0.0497,  0.0063,  0.0431,  0.0953,  0.0912,  0.0851, -0.0654, -0.0517,
                       0.0997, -0.0945,  0.0604, -0.0828, -0

In [23]:
pretrained_trainer.model.load_state_dict(state_dict['state_dict'])

<All keys matched successfully>

## Predict

If a test set has been provided in your config, predictions are generated and written to disk automatically upon training completion. Otherwise, to make predictions on unseen data, a `torch.utils.data.DataLoader` object must be constructed. Here we use the trainer's existing test loader. Predictions are saved to `{results_file}.npz` in your `results_dir`.

In [27]:
# make predictions on the existing test_loader
predictions = pretrained_trainer.predict(pretrained_trainer.test_loader, results_file="diff_pos_results", disable_tqdm=False)

2021-09-06 12:25:32 (INFO): Predicting on test.


device 0: 100%|██████████| 1/1 [00:03<00:00,  3.14s/it]

2021-09-06 12:25:35 (INFO): Writing results to ./results/2021-09-06-12-18-08-cgcnn_test/is2re_diff_pos_results.npz





In [28]:
energies = predictions["energy"]

In [30]:
energies

[-0.6879230737686157,
 -0.6879231333732605,
 -0.6879233121871948,
 -0.6879233717918396,
 -0.6879231929779053,
 -0.68792325258255,
 -0.6879233717918396,
 -0.6879231929779053,
 -0.6879231929779053,
 -0.68792325258255,
 -0.6901029944419861,
 -0.6879245638847351,
 -0.6901012063026428,
 -0.6901006102561951,
 -0.6901009678840637,
 -0.7515005469322205,
 1.2311439514160156,
 0.9232613444328308]