# DeepReaction Model Prediction

This notebook uses a pre-trained DeepReaction model checkpoint to make predictions on a dataset specified by a CSV file and corresponding XYZ files.

## 1. Import Required Libraries

In [1]:
import os
import sys
import torch
import pandas as pd
import numpy as np
from pathlib import Path

# Import from deepreaction package
from deepreaction import ReactionPredictor, ReactionDataset

## 2. Configuration Parameters

Modify the parameters below to match your dataset, model checkpoint, and desired output locations.

In [2]:
# Dataset parameters
dataset_root = './dataset/DATASET_DA_F' 
dataset_csv = './dataset/DATASET_DA_F/dataset_xtb_final.csv'
input_features = ['G(TS)_xtb', 'DrG_xtb'] 
file_patterns = ['*_reactant.xyz', '*_ts.xyz', '*_product.xyz']
id_field = 'ID'
dir_field = 'R_dir'
reaction_field = 'reaction'

# Model parameters
checkpoint_path = './results/reaction_model/checkpoints/best-epoch=0000-val_total_loss=0.4343.ckpt' 

# Output parameters
output_csv = './predictions.csv'
output_dir = './predictions'

# Inference parameters
batch_size = 32
use_cuda = True
gpu_id = 0
num_workers = 4

# Create output directory
os.makedirs(output_dir, exist_ok=True)
print(f"Output directory set to: {output_dir}")

Output directory set to: ./predictions
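Before loading the model or the dataset, it can help to confirm that the configured paths exist and that the CSV header contains the fields named above. The following is a minimal sketch, not part of the `deepreaction` package: the helpers `find_missing_paths` and `find_missing_columns` are hypothetical names, and it assumes that `id_field`, `dir_field`, `reaction_field` and every entry of `input_features` are column names in the CSV.

```python
import csv
from pathlib import Path


def find_missing_paths(paths):
    """Return the labels of configured paths that do not exist on disk."""
    return [label for label, path in paths.items() if not Path(path).exists()]


def find_missing_columns(header, required):
    """Return the required column names that are absent from a CSV header."""
    present = set(header)
    return [name for name in required if name not in present]


# Values copied from the configuration cell above so the sketch is self-contained
dataset_csv = "./dataset/DATASET_DA_F/dataset_xtb_final.csv"
paths = {
    "dataset_root": "./dataset/DATASET_DA_F",
    "dataset_csv": dataset_csv,
    "checkpoint_path": "./results/reaction_model/checkpoints/best-epoch=0000-val_total_loss=0.4343.ckpt",
}
# Assumption: these configured fields are all CSV column names
required_columns = ["ID", "R_dir", "reaction", "G(TS)_xtb", "DrG_xtb"]

missing_paths = find_missing_paths(paths)
print(f"Missing paths: {missing_paths or 'none'}")

# Only read the header if the CSV itself was found
if "dataset_csv" not in missing_paths:
    with open(dataset_csv, newline="") as fh:
        header = next(csv.reader(fh), [])
    missing_cols = find_missing_columns(header, required_columns)
    print(f"Missing columns: {missing_cols or 'none'}")
```

Failing early here is cheaper than discovering a mistyped path after the checkpoint has been loaded.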


## 3. Setup Device (GPU/CPU)

In [3]:
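if use_cuda:
    # CUDA_VISIBLE_DEVICES is read when CUDA is first initialized, so set it
    # before any CUDA call (it has no effect if CUDA was initialized earlier).
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)

if use_cuda and torch.cuda.is_available():
    # The single visible GPU is renumbered to index 0
    device = torch.device("cuda:0")
    print(f"Using GPU: {torch.cuda.get_device_name(device)}")
else:
    # Hide all GPUs so downstream code also falls back to the CPU
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    device = torch.device("cpu")
    print("Using CPU")
    use_cuda = False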

Using GPU: NVIDIA TITAN Xp


## 4. Initialize Predictor

Create and initialize the ReactionPredictor with the trained model checkpoint.

In [4]:
# Create predictor
predictor = ReactionPredictor(
    checkpoint_path=checkpoint_path,
    output_dir=output_dir,
    batch_size=batch_size,
    gpu=use_cuda,
    num_workers=num_workers
)

print(f"Predictor initialized with checkpoint: {checkpoint_path}")
print(f"Target fields from model: {predictor.target_field_names}")

Predictor initialized with checkpoint: ./results/reaction_model/checkpoints/best-epoch=0000-val_total_loss=0.4343.ckpt
Target fields from model: ['G(TS)', 'DrG']


## 5. Load Dataset for Inference

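Load the dataset in inference mode. Because `target_fields` is `None`, the CSV does not need to contain reference values for the targets.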

In [5]:
print(f"Loading dataset from {dataset_root} using CSV {dataset_csv}")

# Load dataset in inference mode
dataset = ReactionDataset(
    root=dataset_root,
    csv_file=dataset_csv,
    target_fields=None, # None for inference mode
    file_patterns=file_patterns,
    input_features=input_features,
    id_field=id_field,
    dir_field=dir_field,
    reaction_field=reaction_field,
    inference_mode=True
)

print(f"Dataset loaded successfully with {len(dataset.test_data)} samples for inference")

Loading dataset from ./dataset/DATASET_DA_F using CSV ./dataset/DATASET_DA_F/dataset_xtb_final.csv
Dataset loaded successfully with 1580 samples for inference
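If the reported sample count is lower than the number of rows in the CSV, one possible cause is a reaction directory that lacks one of the expected XYZ files. The sketch below lists such directories. It rests on an assumption about the layout that the notebook does not state: each reaction lives in its own subdirectory of `dataset_root`, and every pattern in `file_patterns` should match at least one file there. The helper name `find_incomplete_dirs` is hypothetical.

```python
from pathlib import Path


def find_incomplete_dirs(root, patterns):
    """Map each subdirectory of `root` to the glob patterns that match no file in it."""
    root = Path(root)
    report = {}
    if not root.is_dir():
        return report
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        unmatched = [pat for pat in patterns if not any(sub.glob(pat))]
        if unmatched:
            report[sub.name] = unmatched
    return report


# Values copied from the configuration cell
incomplete = find_incomplete_dirs(
    "./dataset/DATASET_DA_F",
    ["*_reactant.xyz", "*_ts.xyz", "*_product.xyz"],
)
print(f"{len(incomplete)} directories are missing at least one XYZ file")
for name, unmatched in list(incomplete.items())[:5]:
    print(f"  {name}: no match for {unmatched}")
```

Subdirectories that do not hold reactions (for example a cache folder written by the dataset loader, if there is one) would also appear in this report and can be ignored.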


## 6. Run Inference

Run the model on the loaded dataset. `predict_from_dataset` returns the predictions as a DataFrame and also writes them to `output_csv`.

In [6]:
import time

print("Running inference...")

# Use the predictor to make predictions, timing the call
start_time = time.perf_counter()
results_df = predictor.predict_from_dataset(
    dataset=dataset,
    csv_output_path=output_csv
)
elapsed = time.perf_counter() - start_time

print(f"Processed {len(results_df)} samples in {elapsed:.1f} seconds\n")
print(f"Predictions successfully saved to: {output_csv}")
print(f"Raw predictions saved to: {os.path.join(output_dir, 'predictions.npy')}\n")
print(f"Total number of predictions generated: {len(results_df)}")

Running inference...
Processed 1580 samples in 27.5 seconds

Predictions successfully saved to: ./predictions.csv
Raw predictions saved to: ./predictions/predictions.npy

Total number of predictions generated: 1580


## 7. View Prediction Results

In [7]:
if len(results_df) > 0:
    print("Sample predictions (first 5 rows):")
    print(results_df.head())
    
    # Calculate statistics on predictions
    print("\nPrediction statistics:")
    for col in results_df.columns:
        if col.endswith('_predicted'):
            mean_val = results_df[col].mean()
            min_val = results_df[col].min()
            max_val = results_df[col].max()
            # Pad the column name to a fixed width so the statistics line up
            print(f"{col:<19}Mean: {mean_val:.2f}, Min: {min_val:.2f}, Max: {max_val:.2f}")

Sample predictions (first 5 rows):
  reaction_id               id  \
0     ID63623      reaction_R0   
1     ID86062      reaction_R1   
2     ID52093     reaction_R10   
3     ID31786    reaction_R100   
4     ID30289  reaction_R10166   

                                            reaction  G(TS)_predicted  \
0  [C:1](=[C:2]([C:3](=[C:4]([H:11])[H:12])[H:10]...        33.314888   
1  [C:6](=[C:7]([H:14])[H:15])([H:12])[H:13].[c:1...        60.801392   
2  [C:1]([c:2]1[c:3]([H:12])[c:4]([H:13])[c:5]([H...        64.907562   
3  [N:6](/[C:7](=[C:8](\[N:9]([H:20])[H:21])[H:19...        57.548111   
4  [C:1]([C:2](=[C:3]([C:4](=[C:5]([H:24])[H:25])...        67.294449   

   DrG_predicted  
0    -140.557632  
1     -17.935015  
2       7.588223  
3     -67.345047  
4       6.474979  

Prediction statistics:
G(TS)_predicted    Mean: 48.25, Min: 8.45, Max: 112.78
DrG_predicted      Mean: -32.14, Min: -189.32, Max: 64.81
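The per-column loop above can also be written as a single pandas aggregation. A minimal sketch follows. It uses a stand-in DataFrame built from the five sample rows printed above so that it runs on its own, which means its numbers differ from the full-dataset statistics. In the notebook, `results_df` would be passed instead. The helper name `summarize_predictions` is hypothetical.

```python
import pandas as pd

# Stand-in for results_df: the five sample rows shown in the output above
sample_df = pd.DataFrame({
    "id": ["reaction_R0", "reaction_R1", "reaction_R10", "reaction_R100", "reaction_R10166"],
    "G(TS)_predicted": [33.314888, 60.801392, 64.907562, 57.548111, 67.294449],
    "DrG_predicted": [-140.557632, -17.935015, 7.588223, -67.345047, 6.474979],
})


def summarize_predictions(df):
    """Return mean, min and max for every column whose name ends in '_predicted'."""
    predicted = df.filter(regex=r"_predicted$")
    # Transpose so that each predicted column becomes one row of the summary
    return predicted.agg(["mean", "min", "max"]).T


summary = summarize_predictions(sample_df)
print(summary.round(2))
```

Returning a DataFrame rather than printing inside a loop keeps the summary available for further use, such as saving it next to `predictions.csv`.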
