# QCArchive+QCMLForge Demo with CyberShuttle

The first half of this demo shows how to use QCArchive to set up a dataset
and run computations on it. The compute resource can be local or
accessed through CyberShuttle.

The second half of this demo shows how one can consume the generated data
to train AP-Net2 and dAPNet2 models through QCMLForge. 

In [1]:
import psi4
from pprint import pprint as pp
import pandas as pd
import numpy as np
import re
from qm_tools_aw import tools
# QCElemental Imports
from qcelemental.models import Molecule
import qcelemental as qcel
# Dataset Imports
from qcportal import PortalClient
from qcportal.singlepoint import SinglepointDataset, SinglepointDatasetEntry, QCSpecification
from qcportal.manybody import ManybodyDataset, ManybodyDatasetEntry, ManybodyDatasetSpecification, ManybodySpecification
from torch import manual_seed

manual_seed(42)

h2kcalmol = qcel.constants.hartree2kcalmol
print('Imports')

Imports
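The `h2kcalmol` constant defined above is used throughout the notebook to convert energies from hartree to kcal/mol. Here is a minimal sketch of that conversion. The hard-coded factor is an approximation standing in for `qcel.constants.hartree2kcalmol`, so the snippet runs without QCElemental, and the helper name is hypothetical.

```python
# Approximate conversion factor. The notebook takes the exact value from
# qcel.constants.hartree2kcalmol; the literal below is an assumption.
HARTREE_TO_KCALMOL = 627.5095


def hartree_to_kcalmol(energy_hartree: float) -> float:
    """Convert an energy from hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCALMOL


# An interaction energy of -0.01 hartree is roughly -6.3 kcal/mol.
ie_kcalmol = hartree_to_kcalmol(-0.01)
print(f"{ie_kcalmol:.3f} kcal/mol")
```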


# QCArchive Setup

In [2]:
# del setup_qcarchive_qcfractal

In [4]:
from setup_qcfractal import setup_qcarchive_qcfractal
import os

setup_qcarchive_qcfractal(
    QCF_BASE_FOLDER=os.path.join(os.getcwd(), "qcfractal"),
    start=False,
    reset=False,
    db_config={
        "name": None,
        "enable_security": "false",
        "allow_unauthenticated_read": None,
        "logfile": None,
        "loglevel": None,
        "service_frequency": 5,
        "max_active_services": None,
        "heartbeat_frequency": 60,
        "log_access": None,
        "database": {
            "base_folder": None,
            "host": None,
            "port": 5433,
            "database_name": "qca",
            "username": None,
            "password": None,
            "own": None,
        },
        "api": {
            "host": None,
            "port": 7778,
            "secret_key": None,
            "jwt_secret_key": None,
        },
    },
    resources_config={
            "update_frequency": 5,
            "cores_per_worker": 8,
            "max_workers": 3,
            "memory_per_worker": 20,
    },
    conda_env=None,
    worker_sh=None,
)

/home/relativity64/gits/cybershuttle_demo/qcfractal
/home/relativity64/gits/cybershuttle_demo/qcfractal/postgres

--------------------------------------------------------------------------------
Python executable:  /home/relativity64/miniconda3/envs/p4_qcml/bin/python
QCFractal version:  0.59
QCFractal alembic revision:  d5988aa750ae
pg_ctl path:  /home/relativity64/miniconda3/envs/p4_qcml/bin/pg_ctl
PostgreSQL server version:  PostgreSQL 17.4 on x86_64-conda-linux-gnu, compiled by x86_64-conda-linux-gnu-cc (conda-forge gcc 13.3.0-2) 13.3.0, 64-bit
--------------------------------------------------------------------------------


Displaying QCFractal configuration below
--------------------------------------------------------------------------------
access_log_keep: 0
allow_unauthenticated_read: true
api:
  extra_flask_options: null
  extra_waitress_options: null
  host: localhost
  jwt_access_token_expires: 3600
  jwt_refresh_token_expires: 86400
  jwt_secret_key: qjq4LvtJ5jUvR260JCCq
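In the configuration printed above, fields that were passed as `None` (for example the API `host`) show up with concrete values such as `localhost`. This suggests that `None` means "keep the server default". The sketch below illustrates that merge behavior. It is not the code inside `setup_qcarchive_qcfractal`, and both the helper name and the default values are hypothetical.

```python
from copy import deepcopy


def merge_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults updated with every override that is not None."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if value is None:
            continue  # None means "keep the default"
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)  # recurse into sections
        else:
            merged[key] = value
    return merged


# Hypothetical defaults, for illustration only.
defaults = {"api": {"host": "localhost", "port": 7777}, "loglevel": "INFO"}
overrides = {"api": {"host": None, "port": 7778}, "loglevel": None}
config = merge_overrides(defaults, overrides)
print(config)
```

Only `port` changes in this example; `host` and `loglevel` fall back to the defaults.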

In [None]:
get_ipython().system = os.system
!qcfractal-server --config=`pwd`/qcfractal/qcfractal_config.yaml start > qcfractal/qcf_server.log &

0

[2025-04-30 20:28:34 EDT] (MainProcess     )     INFO: qcfractal.qcfractal_server_cli: Checking the PostgreSQL connection...
[2025-04-30 20:28:34 EDT] (MainProcess     )     INFO: PostgresHarness: /home/relativity64/miniconda3/envs/p4_qcml/bin

[2025-04-30 20:28:34 EDT] (MainProcess     )     INFO: PostgresHarness: Using Postgres tools found via pg_config located in /home/relativity64/miniconda3/envs/p4_qcml/bin
[2025-04-30 20:28:34 EDT] (MainProcess     )     INFO: PostgresHarness: pg_ctl (PostgreSQL) 17.4

[2025-04-30 20:28:34 EDT] (MainProcess     )     INFO: PostgresHarness: Postgresql version found: pg_ctl (PostgreSQL) 17.4
[2025-04-30 20:28:34 EDT] (MainProcess     )     INFO: PostgresHarness: Starting the PostgreSQL instance
[2025-04-30 20:28:35 EDT] (MainProcess     )     INFO: PostgresHarness: waiting for server to start.... done
server started

[2025-04-30 20:28:35 EDT] (MainProcess     )     INFO: PostgresHarness: pg_ctl: server is running (PID: 21821)
/home/relativity64/min

In [None]:
!qcfractal-server --help

usage: qcfractal-server [-h] [--version] [-v] [--config CONFIG]
                        {init-config,init-db,start,start-job-runner,start-api,upgrade-db,upgrade-config,info,user,role,backup,restore}
                        ...

A CLI for managing & running a QCFractal server.

positional arguments:
  {init-config,init-db,start,start-job-runner,start-api,upgrade-db,upgrade-config,info,user,role,backup,restore}
    init-config         Creates an initial configuration for a server
    init-db             Initializes a QCFractal server and database
                        information from a given configuration
    start               Starts a QCFractal server instance.
    start-job-runner    Starts a QCFractal server job-runner
    start-api           Starts a QCFractal server instance.
    upgrade-db          Upgrade QCFractal database.
    upgrade-config      Upgrade a QCFractal configuration file.
    info                Manage users and permissions on a QCFractal server
              

0


In [None]:
!qcfractal-compute-manager --config=`pwd`/qcfractal/resources.yml &

0

[2025-04-30 20:28:38 EDT]     INFO: qcfractalcompute.config: Reading configuration data from /home/relativity64/gits/cybershuttle_demo/qcfractal/resources.yml
**********
Logging to file /home/relativity64/gits/cybershuttle_demo/qcfractal/qcfractal-manager.log, logging level INFO
**********
[2025-04-30 20:28:40 EDT] (Process-1       )     INFO: qcfractal.components.tasks.socket: Manager theoryfs-DESKTOP-DUBNI4E-0f451c31-e633-48b6-87a4-1270e5afb35b has claimed 0 new tasks
[2025-04-30 20:28:40 EDT] (Process-2       )     INFO: internal_job_runner:7512d6b2-cd8d-4d1f-9d7f-ae4356fc1a80: running job iterate_services id=5 scheduled_date=2025-04-30 20:28:40.941227-04:00
[2025-04-30 20:28:40 EDT] (Process-2       )     INFO: qcfractal.components.services.socket: Iterating on services
[2025-04-30 20:28:40 EDT] (Process-2       )     INFO: qcfractal.components.services.socket: After iteration, now 0 running services. Max is 20
[2025-04-30 20:28:45 EDT] (Process-1       )     INFO: qcfractal.compon

In [None]:
# NOTE kill server when finished by removing the # and executing:
# !ps aux | grep qcfractal | awk '{ print $2 }' | xargs kill -9

# QCArchive single point example

In [None]:
# Establish client connection
client = PortalClient("http://localhost:7778", verify=False)


In [8]:
# Running a single job
client = PortalClient("http://localhost:7778", verify=False)
mol = Molecule.from_data(
    """
     0 1
     O  -1.551007  -0.114520   0.000000
     H  -1.934259   0.762503   0.000000
     H  -0.599677   0.040712   0.000000
     --
     0 1
     O   1.350625   0.111469   0.000000
     H   1.680398  -0.373741  -0.758561
     H   1.680398  -0.373741   0.758561

     units angstrom
     no_reorient
     symmetry c1
"""
)

psi4.set_options(
    {"basis": "aug-cc-pvdz", "scf_type": "df", "e_convergence": 6, "freeze_core": True}
)

client.add_singlepoints(
    [mol],
    "psi4",
    driver="energy",
    method="b3lyp",
    basis="aug-cc-pvdz",
    keywords={"scf_type": "df", "e_convergence": 6, "freeze_core": True},
    tag="local",
)

# Records can be inspected after submission:
# for rec in client.query_records():
#     pp(rec.dict())
#     pp(rec.error)

(InsertMetadata(error_description=None, errors=[], inserted_idx=[0], existing_idx=[]),
 [1])
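The returned metadata separates `inserted_idx` (inputs that created a new record) from `existing_idx` (inputs that matched a record already on the server), and the second element lists the record ids. The toy store below sketches that bookkeeping with a content hash. It is an illustration under that assumption, not QCFractal's implementation, and `ToyStore` is a hypothetical name.

```python
import hashlib
import json


class ToyStore:
    """Hypothetical in-memory store that deduplicates items by content hash."""

    def __init__(self):
        self._ids = {}  # content hash -> assigned integer id

    def add(self, items):
        inserted_idx, existing_idx, ids = [], [], []
        for position, item in enumerate(items):
            key = hashlib.sha256(
                json.dumps(item, sort_keys=True).encode()
            ).hexdigest()
            if key in self._ids:
                existing_idx.append(position)  # already stored, reuse its id
            else:
                self._ids[key] = len(self._ids) + 1
                inserted_idx.append(position)
            ids.append(self._ids[key])
        return {"inserted_idx": inserted_idx, "existing_idx": existing_idx}, ids


store = ToyStore()
job = {"method": "b3lyp", "basis": "aug-cc-pvdz", "molecule": "water dimer"}
first_meta, first_ids = store.add([job])
second_meta, second_ids = store.add([job])
print(first_meta, first_ids)
print(second_meta, second_ids)
```

Submitting the same job twice returns the same id, with the index moving from `inserted_idx` to `existing_idx`.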

# QCArchive dataset examples

In [10]:
# Creating a QCArchive Dataset...
# Load in a dataset from a recent Sherrill work (Levels of SAPT II)
df_LoS = pd.read_pickle("./combined_df_subset_358.pkl")
print(df_LoS[['Benchmark', 'SAPT2+3(CCD)DMP2 TOTAL ENERGY aqz', 'MP2 IE atz', 'SAPT0 TOTAL ENERGY adz' ]])

# Limit to 10 dimers with at most 16 atoms each to keep computational cost down
df_LoS['size'] = df_LoS['atomic_numbers'].apply(len)
df_LoS = df_LoS[df_LoS['size'] <= 16]
# df_LoS = df_LoS.sample(200, random_state=42, axis=0).copy()
df_LoS = df_LoS.sample(10, random_state=42, axis=0).copy()
df_LoS.reset_index(drop=True, inplace=True)
print(df_LoS['size'].describe())

# Create QCElemental Molecules to generate the dataset
def qcel_mols(row):
    """
    Convert the row to a qcel molecule
    """
    atomic_numbers = [row['atomic_numbers'][row['monAs']], row['atomic_numbers'][row['monBs']]]
    coords = [row['coordinates'][row['monAs']], row['coordinates'][row['monBs']]]
    cm = [
        [row['monA_charge'], row['monA_multiplicity']],
        [row['monB_charge'], row['monB_multiplicity']],
     ]
    return tools.convert_pos_carts_to_mol(atomic_numbers, coords, cm)
df_LoS['qcel_molecule'] = df_LoS.apply(qcel_mols, axis=1)
geoms = df_LoS['qcel_molecule'].tolist()
ref_IEs = df_LoS['Benchmark'].tolist()
sapt0_adz = (df_LoS['SAPT0 TOTAL ENERGY adz'] * h2kcalmol).tolist()

     Benchmark SAPT2+3(CCD)DMP2 TOTAL ENERGY aqz  MP2 IE atz  \
0      -10.248                         -0.016681   -0.015629   
1      -15.245                         -0.024763   -0.023012   
2       -3.517                         -0.005637   -0.005608   
3       -0.127                         -0.000187   -0.000194   
4       -8.990                         -0.014655   -0.013687   
..         ...                               ...         ...   
353     -4.390                         -0.007196   -0.006835   
354     -1.130                         -0.001489   -0.002395   
355     -0.260                         -0.000432   -0.000450   
356     -5.740                         -0.009198   -0.008974   
357     -3.120                         -0.004909   -0.005518   

     SAPT0 TOTAL ENERGY adz  
0                 -0.018254  
1                 -0.027620  
2                 -0.005920  
3                 -0.000192  
4                 -0.016209  
..                      ...  
353               -0.
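The `qcel_mols` helper splits each dimer into monomers A and B using the `monAs` and `monBs` index lists, and pairs each monomer with its charge and multiplicity. The internals of `tools.convert_pos_carts_to_mol` are not shown here, so the sketch below only illustrates the same split. It builds a two-fragment geometry string in the `--`-separated format used in the single point example. `dimer_block` and `SYMBOLS` are hypothetical names.

```python
# Small symbol table covering only the elements used in this example.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}


def dimer_block(atomic_numbers, coords, mon_a, mon_b, charge_mult):
    """Build a two-fragment geometry string with '--' between the monomers."""
    fragments = []
    for indices, (charge, mult) in zip((mon_a, mon_b), charge_mult):
        lines = [f"{charge} {mult}"]  # fragment charge and multiplicity
        for i in indices:
            x, y, z = coords[i]
            lines.append(f"{SYMBOLS[atomic_numbers[i]]} {x:.6f} {y:.6f} {z:.6f}")
        fragments.append("\n".join(lines))
    return "\n--\n".join(fragments) + "\nunits angstrom"


# Water dimer geometry from the single point example.
Z = [8, 1, 1, 8, 1, 1]
R = [
    (-1.551007, -0.114520, 0.000000),
    (-1.934259, 0.762503, 0.000000),
    (-0.599677, 0.040712, 0.000000),
    (1.350625, 0.111469, 0.000000),
    (1.680398, -0.373741, -0.758561),
    (1.680398, -0.373741, 0.758561),
]
block = dimer_block(Z, R, mon_a=[0, 1, 2], mon_b=[3, 4, 5],
                    charge_mult=[(0, 1), (0, 1)])
print(block)
```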

## Singlepoint Dataset

In [11]:
# Create client dataset

ds_name = 'S22-singlepoint'
client_datasets = [i['dataset_name'] for i in client.list_datasets()]
# Check if dataset already exists, if not create a new one
if ds_name not in client_datasets:
    ds = client.add_dataset("singlepoint", ds_name,
                            f"Dataset to contain {ds_name}")
    print(f"Added {ds_name} as dataset")
    # Insert entries into dataset
    entry_list = []
    for idx, mol in enumerate(geoms):
        extras = {
            "name": 'S22-' + str(idx),
            "idx": idx,
        }
        mol = Molecule.from_data(mol.dict(), extras=extras)
        ent = SinglepointDatasetEntry(name=extras['name'], molecule=mol)
        entry_list.append(ent)
    ds.add_entries(entry_list)
    print(f"Added {len(entry_list)} molecules to dataset")
else:
    ds = client.get_dataset("singlepoint", ds_name)
    print(f"Found {ds_name} dataset, using this instead")

print(ds)

Added S22-singlepoint as dataset
Added 10 molecules to dataset
id=1 dataset_type='singlepoint' name='S22-singlepoint' description='Dataset to contain S22-singlepoint' tagline='' tags=[] group='default' visibility=True provenance={} default_tag='*' default_priority=<PriorityEnum.normal: 1> owner_user=None owner_group=None metadata={} extras={} contributed_values_=None attachments_=None auto_fetch_missing=True


In [None]:
# Can delete the dataset if you want to start over. Need to know dataset_id
# client.delete_dataset(dataset_id=ds.id, delete_records=True)



In [13]:
# Multipole Example
# method, basis = "hf", "sto-3g"
#
# # Set the QCSpecification (QM interaction energy in our case)
# spec = QCSpecification(
#     program="psi4",
#     driver="energy",
#     method=method,
#     basis=basis,
#     keywords={
#         "d_convergence": 8,
#         "dft_radial_points": 99,
#         "dft_spherical_points": 590,
#         "e_convergence": 10,
#         "guess": "sad",
#         "mbis_d_convergence": 9,
#         "mbis_radial_points": 99,
#         "mbis_spherical_points": 590,
#         "scf_properties": ["mbis_charges", "MBIS_VOLUME_RATIOS"],
#         "scf_type": "df",
#     },
#     protocols={"wavefunction": "orbitals_and_eigenvalues"},
# )
# ds.add_specification(name=f"psi4/{method}/{basis}", specification=spec)

In [14]:
# SAPT0 Example
method, basis = "SAPT0", "cc-pvdz"

# Set the QCSpecification (QM interaction energy in our case)
spec = QCSpecification(
    program="psi4",
    driver="energy",
    method=method,
    basis=basis,
    keywords={
        "scf_type": "df",
    },
)
ds.add_specification(name=f"psi4/{method}/{basis}", specification=spec)

InsertMetadata(error_description=None, errors=[], inserted_idx=[0], existing_idx=[])

In [15]:
# Run the computations
ds.submit()
print(f"Submitted {ds_name} dataset")

Submitted S22-singlepoint dataset


In [16]:
# Check the status of the dataset - can repeatedly run this to see the progress
ds.status()

{'psi4/SAPT0/cc-pvdz': {<RecordStatusEnum.waiting: 'waiting'>: 10}}



## Manybody Dataset

In [17]:
# Create client dataset
ds_name_mb = 'S22-manybody'
client_datasets = [i['dataset_name'] for i in client.list_datasets()]
# Check if dataset already exists, if not create a new one
if ds_name_mb not in client_datasets:
    print("Setting up new dataset:", ds_name_mb)
    ds_mb = client.add_dataset("manybody", ds_name_mb,
                            f"Dataset to contain {ds_name_mb}")
    print(f"Added {ds_name_mb} as dataset")
    # Insert entries into dataset
    entry_list = []
    for idx, mol in enumerate(geoms):
        ent = ManybodyDatasetEntry(name=f"S22-IE-{idx}", initial_molecule=mol)
        entry_list.append(ent)
    ds_mb.add_entries(entry_list)
    print(f"Added {len(entry_list)} molecules to dataset")
else:
    ds_mb = client.get_dataset("manybody", ds_name_mb)
    print(f"Found {ds_name_mb} dataset, using this instead")

print(ds_mb)

# Can delete the dataset if you want to start over. Need to know dataset_id
# client.delete_dataset(dataset_id=2, delete_records=True)

Setting up new dataset: S22-manybody
Added S22-manybody as dataset
Added 10 molecules to dataset
id=2 dataset_type='manybody' name='S22-manybody' description='Dataset to contain S22-manybody' tagline='' tags=[] group='default' visibility=True provenance={} default_tag='*' default_priority=<PriorityEnum.normal: 1> owner_user=None owner_group=None metadata={} extras={} contributed_values_=None attachments_=None auto_fetch_missing=True


In [18]:
ds_mb.status()

{}

In [19]:
# Set multiple levels of theory; add or remove levels as needed.
# Cost grows quickly with higher-level methods and larger basis sets.

methods = [
    'hf', 'pbe',
]
basis_sets = [
    '6-31g*'
]

for method in methods:
    for basis in basis_sets:
        # Set the QCSpecification (QM interaction energy in our case)
        qc_spec_mb = QCSpecification(
            program="psi4",
            driver="energy",
            method=method,
            basis=basis,
            keywords={
                "d_convergence": 8,
                "scf_type": "df",
            },
        )

        spec_mb = ManybodySpecification(
            program='qcmanybody',
            bsse_correction=['cp'],
            levels={
                1: qc_spec_mb,
                2: qc_spec_mb,
            },
        )
        print("spec_mb", spec_mb)

        ds_mb.add_specification(name=f"psi4/{method}/{basis}", specification=spec_mb)

        # Run the computations
        ds_mb.submit()
        print(f"Submitted {ds_name_mb} dataset")
# Check the status of the dataset - can repeatedly run this to see the progress
ds_mb.status()

spec_mb program='qcmanybody' levels={1: QCSpecification(program='psi4', driver=<SinglepointDriver.energy: 'energy'>, method='hf', basis='6-31g*', keywords={'d_convergence': 8, 'scf_type': 'df'}, protocols=AtomicResultProtocols(wavefunction=<WavefunctionProtocolEnum.none: 'none'>, stdout=True, error_correction=ErrorCorrectionProtocol(default_policy=True, policies=None), native_files=<NativeFilesProtocolEnum.none: 'none'>)), 2: QCSpecification(program='psi4', driver=<SinglepointDriver.energy: 'energy'>, method='hf', basis='6-31g*', keywords={'d_convergence': 8, 'scf_type': 'df'}, protocols=AtomicResultProtocols(wavefunction=<WavefunctionProtocolEnum.none: 'none'>, stdout=True, error_correction=ErrorCorrectionProtocol(default_policy=True, policies=None), native_files=<NativeFilesProtocolEnum.none: 'none'>))} bsse_correction=[<BSSECorrectionEnum.cp: 'cp'>] keywords=ManybodyKeywords(return_total_data=False) protocols={}
Submitted S22-manybody dataset
spec_mb program='qcmanybody' levels={

{'psi4/hf/6-31g*': {<RecordStatusEnum.waiting: 'waiting'>: 10},
 'psi4/pbe/6-31g*': {<RecordStatusEnum.waiting: 'waiting'>: 10}}
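With `bsse_correction=['cp']` and levels 1 and 2, each entry yields a counterpoise-corrected two-body interaction energy: the dimer energy minus the two monomer energies, with each monomer evaluated in the full dimer basis. The arithmetic can be sketched as follows. The energies are made up for illustration, and the conversion factor is an approximation of `qcel.constants.hartree2kcalmol`.

```python
def cp_interaction_energy(e_dimer, e_mon_a_dimer_basis, e_mon_b_dimer_basis):
    """Counterpoise-corrected two-body interaction energy, in the input units."""
    return e_dimer - e_mon_a_dimer_basis - e_mon_b_dimer_basis


# Made-up total energies in hartree, purely illustrative.
e_ab = -152.070  # dimer
e_a = -76.030    # monomer A computed with ghost functions on B
e_b = -76.032    # monomer B computed with ghost functions on A

ie_hartree = cp_interaction_energy(e_ab, e_a, e_b)
ie_kcalmol = ie_hartree * 627.5095  # approximate hartree -> kcal/mol factor
print(f"{ie_kcalmol:.2f} kcal/mol")
```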

In [25]:
pp(ds.status())
pp(ds_mb.status())

{'psi4/SAPT0/cc-pvdz': {<RecordStatusEnum.complete: 'complete'>: 10}}
{'psi4/hf/6-31g*': {<RecordStatusEnum.complete: 'complete'>: 10},
 'psi4/pbe/6-31g*': {<RecordStatusEnum.complete: 'complete'>: 10}}



In [26]:
pp(ds)
pp(ds_mb)
pp(ds_mb.computed_properties)

SinglepointDataset(id=1, dataset_type='singlepoint', name='S22-singlepoint', description='Dataset to contain S22-singlepoint', tagline='', tags=[], group='default', visibility=True, provenance={}, default_tag='*', default_priority=<PriorityEnum.normal: 1>, owner_user=None, owner_group=None, metadata={}, extras={}, contributed_values_=None, attachments_=None, auto_fetch_missing=True)
ManybodyDataset(id=2, dataset_type='manybody', name='S22-manybody', description='Dataset to contain S22-manybody', tagline='', tags=[], group='default', visibility=True, provenance={}, default_tag='*', default_priority=<PriorityEnum.normal: 1>, owner_user=None, owner_group=None, metadata={}, extras={}, contributed_values_=None, attachments_=None, auto_fetch_missing=True)
{'psi4/hf/6-31g*': ['results',
                    'mc_results',
                    'ret_energy',
                    'energy_body_dict',
                    'component_properties'],
 'psi4/pbe/6-31g*': ['results',
                     'mc



# Data Assembly

The following cells can be run before every computation has finished, but it is best to wait until all records report `complete`; otherwise the assembled data will be incomplete.
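One way to wait is to poll the dataset status until nothing is left pending. The sketch below assumes a callable that returns a mapping shaped like `{specification_name: {status_name: count}}`, using plain strings as status names. The real `ds.status()` output shown earlier uses `RecordStatusEnum` keys, so an adapter such as `{k.value: v}` would be needed. `wait_until_complete` is a hypothetical helper, not part of QCPortal.

```python
import time


def wait_until_complete(get_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_status() until every record is complete or the timeout passes.

    Returns the last status seen and a flag telling whether all were complete.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        # Done when there is at least one specification and none of them
        # reports any status other than "complete".
        done = bool(status) and all(
            set(counts) <= {"complete"} for counts in status.values()
        )
        if done or time.monotonic() >= deadline:
            return status, done
        time.sleep(poll_seconds)


# Fake status source standing in for a dataset: it finishes on the third poll.
_polls = iter([
    {"psi4/SAPT0/cc-pvdz": {"waiting": 10}},
    {"psi4/SAPT0/cc-pvdz": {"running": 4, "complete": 6}},
    {"psi4/SAPT0/cc-pvdz": {"complete": 10}},
])
final_status, finished = wait_until_complete(lambda: next(_polls), poll_seconds=0.01)
print(finished, final_status)
```

The timeout matters because records that end in an error state never become complete, so an unbounded loop would not return.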

In [27]:
# Assemble singlepoint data
def assemble_singlepoint_data(record):
    """Collect per-record values; MBIS fields are None for SAPT records."""
    record_dict = record.dict()
    qcvars = record_dict["properties"]
    # Order: total, elst, exch, ind, disp (hartree); stays NaN for MBIS records
    sapt_energies = np.full(5, np.nan)
    if "mbis charges" in qcvars:
        charges = qcvars["mbis charges"]
        dipoles = qcvars["mbis dipoles"]
        quadrupoles = qcvars["mbis quadrupoles"]
        n = len(charges)
        charges = np.reshape(charges, (n, 1))
        dipoles = np.reshape(dipoles, (n, 3))
        quad = np.reshape(quadrupoles, (n, 3, 3))

        quad = [q[np.triu_indices(3)] for q in quad]
        quadrupoles = np.array(quad)
        multipoles = np.concatenate(
            [charges, dipoles, quadrupoles], axis=1)
        return (
        record.molecule,
        qcvars['mbis volume ratios'],
        qcvars['mbis valence widths'],
        qcvars['mbis radial moments <r^2>'],
        qcvars['mbis radial moments <r^3>'],
        qcvars['mbis radial moments <r^4>'],
        record.molecule.atomic_numbers,
        record.molecule.geometry * qcel.constants.bohr2angstroms,
        multipoles,
        int(record.molecule.molecular_charge),
        record.molecule.molecular_multiplicity,
        sapt_energies,
        )
    else:
        sapt_energies[0] = qcvars['sapt total energy']
        sapt_energies[1] = qcvars['sapt elst energy']
        sapt_energies[2] = qcvars['sapt exch energy']
        sapt_energies[3] = qcvars['sapt ind energy']
        sapt_energies[4] = qcvars['sapt disp energy']
        return (
        record.molecule,
        None,
        None,
        None,
        None,
        None,
        record.molecule.atomic_numbers,
        record.molecule.geometry * qcel.constants.bohr2angstroms,
        None,
        int(record.molecule.molecular_charge),
        record.molecule.molecular_multiplicity,
        sapt_energies,
        )




def assemble_singlepoint_data_value_names():
    return [
        'qcel_molecule',
        "volume ratios",
        "valence widths",
        "radial moments <r^2>",
        "radial moments <r^3>",
        "radial moments <r^4>",
        "Z",
        "R",
        "cartesian_multipoles",
        "TQ",
        "molecular_multiplicity",
        "SAPT Energies"
    ]

df = ds.compile_values(
    value_call=assemble_singlepoint_data,
    value_names=assemble_singlepoint_data_value_names(),
    unpack=True,
)
pp(df.columns.tolist())
df_sapt0 = df['psi4/SAPT0/cc-pvdz']

[('psi4/SAPT0/cc-pvdz', 'qcel_molecule'),
 ('psi4/SAPT0/cc-pvdz', 'volume ratios'),
 ('psi4/SAPT0/cc-pvdz', 'valence widths'),
 ('psi4/SAPT0/cc-pvdz', 'radial moments <r^2>'),
 ('psi4/SAPT0/cc-pvdz', 'radial moments <r^3>'),
 ('psi4/SAPT0/cc-pvdz', 'radial moments <r^4>'),
 ('psi4/SAPT0/cc-pvdz', 'Z'),
 ('psi4/SAPT0/cc-pvdz', 'R'),
 ('psi4/SAPT0/cc-pvdz', 'cartesian_multipoles'),
 ('psi4/SAPT0/cc-pvdz', 'TQ'),
 ('psi4/SAPT0/cc-pvdz', 'molecular_multiplicity'),
 ('psi4/SAPT0/cc-pvdz', 'SAPT Energies')]
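In the MBIS branch of `assemble_singlepoint_data`, each atom's 3x3 quadrupole tensor is reduced to its six upper-triangle elements and concatenated with the charge and dipole, giving ten multipole components per atom. A small sketch of that packing, using made-up numbers chosen only to make the indexing visible:

```python
import numpy as np

n_atoms = 2
charges = np.array([-0.8, 0.4])                    # made-up values
dipoles = np.arange(n_atoms * 3, dtype=float)      # flat, length 3n
quadrupoles = np.arange(n_atoms * 9, dtype=float)  # flat, length 9n

charges = charges.reshape(n_atoms, 1)
dipoles = dipoles.reshape(n_atoms, 3)
quad = quadrupoles.reshape(n_atoms, 3, 3)

# Keep the upper triangle (xx, xy, xz, yy, yz, zz) of each 3x3 tensor.
rows, cols = np.triu_indices(3)
quad_packed = quad[:, rows, cols]                  # shape (n_atoms, 6)

# One row per atom: 1 charge + 3 dipole + 6 quadrupole components.
multipoles = np.concatenate([charges, dipoles, quad_packed], axis=1)
print(multipoles.shape)  # (2, 10)
```

Indexing with `quad[:, rows, cols]` is a vectorized equivalent of the per-atom list comprehension used in the notebook.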


In [28]:
def assemble_data(record):
    """Collect interaction energies (kcal/mol) and geometry for a manybody record."""
    record_dict = record.dict()
    qcvars = record_dict["properties"]
    CP_IE = qcvars['results']['cp_corrected_interaction_energy'] * h2kcalmol
    # NOCP is only present if it was requested; fall back to NaN otherwise
    NOCP_IE = qcvars['results'].get('nocp_corrected_interaction_energy', np.nan) * h2kcalmol
    return (
    record.initial_molecule,
    CP_IE,
    NOCP_IE,
    record.initial_molecule.atomic_numbers,
    record.initial_molecule.geometry * qcel.constants.bohr2angstroms,
    int(record.initial_molecule.molecular_charge),
    record.initial_molecule.molecular_multiplicity,
    )

def assemble_data_value_names():
    return [
        'qcel_molecule',
        "CP_IE",
        "NOCP_IE",
        "Z",
        "R",
        "TQ",
        "molecular_multiplicity"
    ]

df_mb = ds_mb.compile_values(
    value_call=assemble_data,
    value_names=assemble_data_value_names(),
    unpack=True,
)

pp(df_mb.columns.tolist())

[('psi4/hf/6-31g*', 'qcel_molecule'),
 ('psi4/pbe/6-31g*', 'qcel_molecule'),
 ('psi4/hf/6-31g*', 'CP_IE'),
 ('psi4/pbe/6-31g*', 'CP_IE'),
 ('psi4/hf/6-31g*', 'NOCP_IE'),
 ('psi4/pbe/6-31g*', 'NOCP_IE'),
 ('psi4/hf/6-31g*', 'Z'),
 ('psi4/pbe/6-31g*', 'Z'),
 ('psi4/hf/6-31g*', 'R'),
 ('psi4/pbe/6-31g*', 'R'),
 ('psi4/hf/6-31g*', 'TQ'),
 ('psi4/pbe/6-31g*', 'TQ'),
 ('psi4/hf/6-31g*', 'molecular_multiplicity'),
 ('psi4/pbe/6-31g*', 'molecular_multiplicity')]
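The compiled frame has two-level columns: the specification name on the outer level and the value name on the inner level. The toy frame below sketches how to select from that layout with pandas. The numbers are made up, and the `.copy()` is there so that adding a column to the selection does not raise a `SettingWithCopyWarning`.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("psi4/hf/6-31g*", "CP_IE"),
    ("psi4/pbe/6-31g*", "CP_IE"),
    ("psi4/hf/6-31g*", "TQ"),
    ("psi4/pbe/6-31g*", "TQ"),
])
toy = pd.DataFrame(
    [[-4.0, -5.0, 0, 0], [-3.5, -4.9, 0, 0]],  # made-up values
    index=["S22-IE-0", "S22-IE-1"],
    columns=columns,
)

# Selecting the outer level returns a frame with only the inner names.
hf = toy["psi4/hf/6-31g*"].copy()  # copy so new columns can be added safely
hf["CP_IE_abs"] = hf["CP_IE"].abs()

# One value name across all specifications.
cp_all = toy.xs("CP_IE", axis=1, level=1)
print(hf.columns.tolist())
print(cp_all.columns.tolist())
```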


In [29]:
from cdsg_plot import error_statistics

# Work on a copy so adding a column does not trigger SettingWithCopyWarning
df_sapt0 = df_sapt0.copy()
df_sapt0['sapt0 total energies'] = df_sapt0['SAPT Energies'].apply(lambda x: x[0] * h2kcalmol)
df_plot = pd.DataFrame(
    {
        "qcel_molecule": df_mb["psi4/pbe/6-31g*"]["qcel_molecule"],
        "HF/6-31G*": df_mb["psi4/hf/6-31g*"]["CP_IE"],
        "PBE/6-31G*": df_mb["psi4/pbe/6-31g*"]["CP_IE"],
        'SAPT0/cc-pvdz': df_sapt0['sapt0 total energies'].values,
    }
)
print(df_plot)
# Entry names look like "S22-IE-<n>"; recover <n> so rows line up with ref_IEs
entry_ids = [int(name[7:]) for name in df_plot.index]
df_plot['id'] = entry_ids
df_plot.sort_values(by='id', inplace=True, ascending=True)
df_plot['reference'] = ref_IEs
df_plot['SAPT0/aug-cc-pvdz'] = sapt0_adz
df_plot['HF/6-31G* error'] = (df_plot['HF/6-31G*'] - df_plot['reference']).astype(float)
df_plot['PBE/6-31G* error'] = (df_plot['PBE/6-31G*'] - df_plot['reference']).astype(float)
df_plot['SAPT0/cc-pvdz error'] = (df_plot['SAPT0/cc-pvdz'] - df_plot['reference']).astype(float)
df_plot['SAPT0/aug-cc-pvdz error'] = (df_plot['SAPT0/aug-cc-pvdz'] - df_plot['reference']).astype(float)
pd.set_option('display.max_rows', None)
print(df_plot[['PBE/6-31G*', 'SAPT0/cc-pvdz', 'reference', "SAPT0/aug-cc-pvdz"]])
print(df_plot[['PBE/6-31G* error', 'SAPT0/cc-pvdz error', "SAPT0/aug-cc-pvdz error"]].describe())

                                              qcel_molecule  HF/6-31G*  \
entry                                                                    
S22-IE-0  Molecule(name='C2H4O4', formula='C2H4O4', hash...  -15.29702   
S22-IE-1  Molecule(name='CH7NO', formula='CH7NO', hash='...  -4.790555   
S22-IE-2  Molecule(name='C3H9NO2', formula='C3H9NO2', ha...      -3.52   
S22-IE-3  Molecule(name='C6H8O', formula='C6H8O', hash='...  -0.441187   
S22-IE-4  Molecule(name='C4H8O4', formula='C4H8O4', hash...  -8.814417   
S22-IE-5  Molecule(name='C4H8O4', formula='C4H8O4', hash... -14.037943   
S22-IE-6  Molecule(name='H4O2', formula='H4O2', hash='80...   -4.02006   
S22-IE-7  Molecule(name='C2H6N2O2', formula='C2H6N2O2', ...  -3.383401   
S22-IE-8  Molecule(name='C3H9NO2', formula='C3H9NO2', ha...  -6.188112   
S22-IE-9  Molecule(name='C2H4O4', formula='C2H4O4', hash... -14.982619   

         PBE/6-31G*  SAPT0/cc-pvdz  
entry                               
S22-IE-0 -17.766131     -19.728673  






# Plotting the interaction energy errors

In [None]:
error_statistics.violin_plot(
    df_plot,
    df_labels_and_columns={
        "HF/6-31G*": "HF/6-31G* error",
        "PBE/6-31G*": "PBE/6-31G* error",
        # "B3LYP/6-31G*": "B3LYP/6-31G* error",
        "SAPT0/cc-pvdz": "SAPT0/cc-pvdz error",
        "SAPT0/aug-cc-pvdz": "SAPT0/aug-cc-pvdz error",
    },
    output_filename="S22-IE.png",
    figure_size=(6, 6),
    x_label_fontsize=16,
    ylim=(-15, 15),
    rcParams={},
    usetex=False,
    ylabel=r"IE Error vs. CCSD(T)/CBS (kcal/mol)",
)

Plotting S22-IE.png
(-15, 15)
lower_bound = -15, upper_bound = 20, inc = 5
S22-IE_violin.png


<Figure size 3840x2880 with 0 Axes>

<Figure size 400x400 with 0 Axes>


![S22-IE_violin.png](./S22-IE_violin.png)

# QCMLForge

## AP-Net2 inference

In [None]:
import apnet_pt
from apnet_pt.AtomPairwiseModels.apnet2 import APNet2Model
from apnet_pt.AtomModels.ap2_atom_model import AtomModel

# Load the pretrained atom model, then a pretrained AP-Net2 model that uses it
atom_model = AtomModel().set_pretrained_model(model_id=0)
ap2 = APNet2Model(atom_model=atom_model.model).set_pretrained_model(model_id=0)
ap2.atom_model = atom_model.model
# Predict interaction energies for every dimer in df_plot
apnet2_ies_predicted = ap2.predict_qcel_mols(
    mols=df_plot['qcel_molecule'].tolist(),
    batch_size=16
)

running on the CPU
running on the CPU
self.dataset=None



In [None]:
# AP-Net2 IE: sum each row of predictions (axis=1) to obtain one total per dimer
df_plot['APNet2'] = np.sum(apnet2_ies_predicted, axis=1)
df_plot['APNet2 error'] = (df_plot['APNet2'] - df_plot['reference']).astype(float)
print(df_plot.sort_values(by='APNet2 error', ascending=True)[['APNet2', 'reference']])
error_statistics.violin_plot(
    df_plot,
    df_labels_and_columns={
        "HF/6-31G*": "HF/6-31G* error",
        "PBE/6-31G*": "PBE/6-31G* error",
        "SAPT0/cc-pvdz": "SAPT0/cc-pvdz error",
        "SAPT0/aug-cc-pvdz": "SAPT0/aug-cc-pvdz error",
        "APNet2": "APNet2 error",
    },
    output_filename="S22-IE-AP2.png",
    rcParams={},
    usetex=False,
    figure_size=(4, 4),
    ylabel=r"IE Error vs. CCSD(T)/CBS (kcal/mol)",
)

             APNet2  reference
entry                         
S22-IE-9 -30.105849    -19.650
S22-IE-0 -25.293842    -18.075
S22-IE-5 -23.582037    -18.790
S22-IE-8  -8.432620     -7.460
S22-IE-1  -5.880203     -5.220
S22-IE-6  -5.179166     -4.570
S22-IE-4  -9.745188     -9.210
S22-IE-7  -3.873984     -3.528
S22-IE-3  -0.488251     -0.420
S22-IE-2  -4.233640     -4.700
Plotting S22-IE-AP2.png
S22-IE-AP2_violin.png


<Figure size 3840x2880 with 0 Axes>

<Figure size 400x400 with 0 Axes>
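
The table above can be reduced to a few summary statistics. As a minimal sketch (not part of the `error_statistics` module used in this notebook), the snippet below recomputes the signed errors from the printed APNet2 and reference values and reports the mean absolute error, the root-mean-square error, and the entry with the largest deviation. The numbers are copied from the output above; the variable names are illustrative.

```python
import math

# (APNet2 prediction, reference) pairs in kcal/mol, copied from the table above.
pairs = {
    "S22-IE-9": (-30.105849, -19.650),
    "S22-IE-0": (-25.293842, -18.075),
    "S22-IE-5": (-23.582037, -18.790),
    "S22-IE-8": (-8.432620, -7.460),
    "S22-IE-1": (-5.880203, -5.220),
    "S22-IE-6": (-5.179166, -4.570),
    "S22-IE-4": (-9.745188, -9.210),
    "S22-IE-7": (-3.873984, -3.528),
    "S22-IE-3": (-0.488251, -0.420),
    "S22-IE-2": (-4.233640, -4.700),
}

# Signed error = prediction - reference, matching the 'APNet2 error' column.
errors = {name: pred - ref for name, (pred, ref) in pairs.items()}

mae = sum(abs(e) for e in errors.values()) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors.values()) / len(errors))
worst = max(errors, key=lambda name: abs(errors[name]))

print(f"MAE = {mae:.3f} kcal/mol, RMSE = {rmse:.3f} kcal/mol")
print(f"Largest |error|: {worst} ({errors[worst]:+.3f} kcal/mol)")
```

In this table the three entries with the most negative reference energies (S22-IE-9, S22-IE-0, S22-IE-5) account for most of the accumulated error.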


In [34]:
# Build a training dataset from the new QM data for transfer learning

from apnet_pt import pairwise_datasets

ds2 = pairwise_datasets.apnet2_module_dataset(
    root="data_dir",
    spec_type=None,
    atom_model=atom_model,
    qcel_molecules=df_plot['qcel_molecule'].tolist(),
    energy_labels=[np.array([i]) for i in df_plot['reference'].tolist()],
    skip_compile=True,
    force_reprocess=True,
    atomic_batch_size=8,
    prebatched=False,
    in_memory=True,
    batch_size=4,
)
print(ds2)

Received 10 QCElemental molecules with energy labels
Processing directly from provided QCElemental molecules...
Processing 10 dimers from provided QCElemental molecules...
Creating data objects...
len(RAs)=10, self.atomic_batch_size=8, self.batch_size=4
0/10, 0.05s, 0.05s
8/10, 0.02s, 0.07s
Processing directly from provided QCElemental molecules...
Processing 10 dimers from provided QCElemental molecules...
Creating data objects...
len(RAs)=10, self.atomic_batch_size=8, self.batch_size=4
0/10, 0.05s, 0.05s
8/10, 0.02s, 0.07s
self.root='data_dir', self.spec_type=None, self.in_memory=True
apnet2_module_dataset(10)


Processing...
Done!
Processing...
Done!
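
The `energy_labels` argument in the cell above is built as a list holding one single-element array per dimer. A small sketch of that construction follows, using the ten reference values from the APNet2 table printed earlier, reordered by entry index; the variable names are illustrative.

```python
import numpy as np

# Reference interaction energies (kcal/mol) for S22-IE-0 ... S22-IE-9.
reference = [
    -18.075, -5.220, -4.700, -0.420, -9.210,
    -18.790, -4.570, -3.528, -7.460, -19.650,
]

# One array of shape (1,) per dimer, mirroring the list comprehension above.
energy_labels = [np.array([e]) for e in reference]
print(len(energy_labels), energy_labels[0].shape)
```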



## Transfer Learning

In [None]:
# Fine-tune the pretrained APNet2 model on the computed QM data (transfer learning)
ap2.train(
    dataset=ds2,
    n_epochs=50,
    transfer_learning=True,
    skip_compile=True,
    model_path="apnet2_transfer_learning.pt",
    split_percent=0.8,
)

Saving training results to...
apnet2_transfer_learning.pt
~~ Training APNet2Model ~~
    Training on 8 samples, Testing on 2 samples

Network Hyperparameters:
  self.model.n_message=3
  self.model.n_neuron=128
  self.model.n_embed=8
  self.model.n_rbf=8
  self.model.r_cut=5.0
  self.model.r_cut_im=8.0

Training Hyperparameters:
  n_epochs=50
  lr=0.0005

  lr_decay=None

  batch_size=4
Running single-process training
                                       Total
  (Pre-training) (0.59   s)  MAE:   1.837/5.714  
  EPOCH:    0 (0.36   s)  MAE:   0.779/4.719   *
  EPOCH:    1 (0.35   s)  MAE:   4.253/3.457   *
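
The line "Training on 8 samples, Testing on 2 samples" follows from `split_percent=0.8` applied to the 10 dimers in `ds2`. The sketch below shows one plausible way such a split could be made. The helper `split_indices`, the shuffling, the seed, and the rounding rule are all assumptions for illustration; the actual splitting logic inside `ap2.train` may differ.

```python
import random


def split_indices(n_samples, split_percent, seed=42):
    """Hypothetical helper: shuffle sample indices and split into train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * split_percent)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(n_samples=10, split_percent=0.8)
print(len(train_idx), len(test_idx))
```

With only two held-out dimers, the test MAE reported during training is a very noisy estimate, which is worth keeping in mind when reading the per-epoch output.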

In [None]:
# AP-Net2 IE predictions after transfer learning
apnet2_ies_predicted_transfer = ap2.predict_qcel_mols(
    mols=df_plot['qcel_molecule'].tolist(),
    batch_size=16,
)
df_plot['APNet2 transfer'] = np.sum(apnet2_ies_predicted_transfer, axis=1)
df_plot['APNet2 transfer error'] = (df_plot['APNet2 transfer'] - df_plot['reference']).astype(float)

error_statistics.violin_plot(
    df_plot,
    df_labels_and_columns={
        "HF/6-31G*": "HF/6-31G* error",
        "PBE/6-31G*": "PBE/6-31G* error",
        "SAPT0/aug-cc-pvdz": "SAPT0/aug-cc-pvdz error",
        "APNet2": "APNet2 error",
        "APNet2 transfer": "APNet2 transfer error",
    },
    output_filename="S22-IE-AP2-tf.png",
    rcParams={},
    usetex=False,
    figure_size=(6, 4),
    ylabel=r"IE Error vs. CCSD(T)/CBS (kcal/mol)",
)

Plotting S22-IE-AP2-tf.png
S22-IE-AP2-tf_violin.png


<Figure size 3840x2880 with 0 Axes>

<Figure size 600x400 with 0 Axes>


![S22-IE-AP2-tf_violin.png](./S22-IE-AP2-tf_violin.png)

## $\Delta$AP-Net2

In [37]:
from apnet_pt.pt_datasets.dapnet_ds import dapnet2_module_dataset_apnetStored

# Delta-learning targets: HF/6-31G* IE errors relative to the CCSD(T)/CBS reference
delta_energies = df_plot['HF/6-31G* error'].tolist()

# Only operates in pre-batched mode
ds = dapnet2_module_dataset_apnetStored(
    root="data_dir",
    r_cut=5.0,
    r_cut_im=8.0,
    spec_type=None,
    max_size=None,
    force_reprocess=True,
    batch_size=2,
    num_devices=1,
    skip_processed=False,
    skip_compile=True,
    print_level=2,
    in_memory=True,
    m1="HF/6-31G*",
    m2="CCSD(T)/CBS",
    qcel_molecules=df_plot['qcel_molecule'].tolist(),
    energy_labels=delta_energies,
)
print(ds)

Received 10 QCElemental molecules with energy labels
running on the CPU
running on the CPU
Loading pre-trained APNet2_MPNN model from /home/relativity64/miniconda3/envs/p4_qcml/lib/python3.10/site-packages/apnet_pt/models/ap2_ensemble/ap2_0.pt
self.dataset=None
raw_path: data_dir/raw/splinter_spec1.pkl
Loading dimers...
Creating data objects...
len(qcel_mols)=10, self.batch_size=2


Processing...
Done!
Processing...


raw_path: data_dir/raw/splinter_spec1.pkl
Loading dimers...
Creating data objects...
len(qcel_mols)=10, self.batch_size=2
self.root='data_dir', self.spec_type=None, self.in_memory=True
raw_path: data_dir/raw/splinter_spec1.pkl
Loading dimers...
Saving to data_dir/processed_delta/targets_HF6-31G_to_CCSD_LP_T_RP_CBS.pt
dapnet2_module_dataset_apnetStored(5)


Done!
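
Note that the dataset reports a length of 5 although 10 molecules were passed in. This is consistent with the pre-batched mode mentioned in the cell above: with `batch_size=2`, each dataset item presumably holds a batch of two dimers. A minimal sketch of that grouping (the helper `make_batches` and the dimer labels are illustrative, not the library's implementation):

```python
def make_batches(items, batch_size):
    """Group items into consecutive batches; the last batch may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


dimer_ids = [f"S22-IE-{i}" for i in range(10)]
batches = make_batches(dimer_ids, batch_size=2)
print(len(batches))
print(batches[0])
```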


In [38]:
from apnet_pt.AtomPairwiseModels.dapnet2 import dAPNet2Model

dap2 = dAPNet2Model(
    atom_model=AtomModel().set_pretrained_model(model_id=0),
    apnet2_model=APNet2Model().set_pretrained_model(model_id=0).set_return_hidden_states(True),
)
dap2.train(
    ds,
    n_epochs=50,
    skip_compile=True,
    split_percent=0.6,
)

running on the CPU
running on the CPU
self.dataset=None
running on the CPU
None
Saving training results to...
None
~~ Training APNet2Model ~~
    Training on 3 samples, Testing on 2 samples

Network Hyperparameters:
  self.model.n_neuron=128

Training Hyperparameters:
  n_epochs=50
  lr=0.0005

  lr_decay=None

  batch_size=1
Running single-process training
num_workers = 4
                                       Energy
  (Pre-training) (0.24   s)  MAE:   1.619/1.698  
  EPOCH:    0 (0.23   s)  MAE:   1.498/1.185   *
  EPOCH:    1 (0.24   s)  MAE:   1.021/0.908   *
  EPOCH:    2 (0.22   s)  MAE:   0.807/0.715   *
  EPOCH:    3 (0.22   s)  MAE:   0.754/0.722   *
  EPOCH:    4 (0.25   s)  MAE:   0.715/0.702    
  EPOCH:    5 (0.26   s)  MAE:   0.786/0.669   *

In [39]:
dAPNet2_ies_predicted_transfer = dap2.predict_qcel_mols(
    mols=df_plot['qcel_molecule'].tolist(),
    batch_size=2,
)
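# dAPNet2 predicts the HF/6-31G* error, so subtracting it from the HF/6-31G* IE
# gives a corrected IE that approximates the CCSD(T)/CBS reference
df_plot['dAPNet2'] = dAPNet2_ies_predicted_transfer
df_plot['HF/6-31G*-dAPNet2'] = df_plot['HF/6-31G*'] - df_plot['dAPNet2']
print(df_plot[['dAPNet2', 'HF/6-31G*', 'HF/6-31G*-dAPNet2',  'reference']])
df_plot['dAPNet2 error'] = (df_plot['HF/6-31G*-dAPNet2'] - df_plot['reference']).astype(float)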

error_statistics.violin_plot(
    df_plot,
    df_labels_and_columns={
        "HF/6-31G*": "HF/6-31G* error",
        "PBE/6-31G*": "PBE/6-31G* error",
        "SAPT0/aug-cc-pvdz": "SAPT0/aug-cc-pvdz error",
        "APNet2": "APNet2 error",
        "APNet2 transfer": "APNet2 transfer error",
        "dAPNet2 HF/6-31G* to CCSD(T)/CBS": "dAPNet2 error",
    },
    output_filename="S22-IE-AP2-dAP2.png",
    rcParams={},
    usetex=False,
    figure_size=(6, 4),
    ylabel=r"IE Error vs. CCSD(T)/CBS (kcal/mol)",
)

           dAPNet2  HF/6-31G* HF/6-31G*-dAPNet2  reference
entry                                                     
S22-IE-0  2.771652  -15.29702        -18.068671    -18.075
S22-IE-1  0.273095  -4.790555         -5.063651     -5.220
S22-IE-2  1.199080      -3.52          -4.71908     -4.700
S22-IE-3  0.112260  -0.441187         -0.553447     -0.420
S22-IE-4  0.637892  -8.814417         -9.452309     -9.210
S22-IE-5  4.699096 -14.037943        -18.737039    -18.790
S22-IE-6  1.052413   -4.02006         -5.072472     -4.570
S22-IE-7  0.232343  -3.383401         -3.615744     -3.528
S22-IE-8  0.738854  -6.188112         -6.926966     -7.460
S22-IE-9  4.029387 -14.982619        -19.012006    -19.650
Plotting S22-IE-AP2-dAP2.png
S22-IE-AP2-dAP2_violin.png


<Figure size 3840x2880 with 0 Axes>

<Figure size 600x400 with 0 Axes>
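
The delta-learning correction in the table above can be checked by hand. The sketch below takes the printed HF/6-31G* energies, the predicted dAPNet2 corrections, and the reference values, then compares the mean absolute error before and after the correction. It assumes the error columns are defined as method minus reference, as for the `APNet2 error` column; the variable names are illustrative.

```python
# (HF/6-31G* IE, predicted dAPNet2 correction, reference IE) in kcal/mol,
# copied from the table above.
rows = {
    "S22-IE-0": (-15.29702, 2.771652, -18.075),
    "S22-IE-1": (-4.790555, 0.273095, -5.220),
    "S22-IE-2": (-3.52, 1.199080, -4.700),
    "S22-IE-3": (-0.441187, 0.112260, -0.420),
    "S22-IE-4": (-8.814417, 0.637892, -9.210),
    "S22-IE-5": (-14.037943, 4.699096, -18.790),
    "S22-IE-6": (-4.02006, 1.052413, -4.570),
    "S22-IE-7": (-3.383401, 0.232343, -3.528),
    "S22-IE-8": (-6.188112, 0.738854, -7.460),
    "S22-IE-9": (-14.982619, 4.029387, -19.650),
}

# Uncorrected error: HF - reference. Corrected error: (HF - delta) - reference.
hf_errors = [hf - ref for hf, _, ref in rows.values()]
corrected_errors = [(hf - delta) - ref for hf, delta, ref in rows.values()]

mae_hf = sum(abs(e) for e in hf_errors) / len(hf_errors)
mae_corrected = sum(abs(e) for e in corrected_errors) / len(corrected_errors)

print(f"HF/6-31G* MAE:         {mae_hf:.3f} kcal/mol")
print(f"HF/6-31G*-dAPNet2 MAE: {mae_corrected:.3f} kcal/mol")
```

Because the model was trained on a subset of these same ten dimers, the corrected error here is not an independent test of how well the correction generalizes.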


![S22-IE-AP2-dAP2_violin.png](./S22-IE-AP2-dAP2_violin.png)

In [None]:
# Use with care: this force-kills every process matching "qcfractal" and can corrupt the status of running jobs
# !ps aux | grep qcfractal | awk '{ print $2 }' | xargs kill -9

# The end...