# Using single point Datasets in QCFractal

In [1]:
from qcportal import FractalClient

Get an instance of the QCPortal client:

In [2]:
fc = FractalClient()

In [3]:
fc

You can list available single point `Dataset`s with:

In [4]:
fc.list_collections('Dataset')

collection,name,tagline
Dataset,A Benchmark Data Set for Hydrogen Combustion,A Benchmark Data Set for Hydrogen Combustion
Dataset,ANI-1,22 million off-equilibrium conformations and e...
Dataset,ANI-1ccx,Coupled cluster properties for molecules
Dataset,ANI-1x,Density functional theory properties for molec...
Dataset,COMP6 ANI-MD,Benchmark containing MD trajectories from the ...
Dataset,COMP6 DrugBank,Benchmark containing DrugBank off-equilibrium ...
Dataset,COMP6 GDB10to13,Benchmark containing off-equilibrium molecules...
Dataset,COMP6 GDB7to9,Benchmark containing off-equilibrium molecules...
Dataset,COMP6 S66x8,Benchmark for noncovalent interactions.
Dataset,COMP6 Tripeptides,Benchmark containing off-equilibrium geometrie...


Use `FractalClient.get_collection` to retrieve the dataset you want to work with:

In [6]:
ds = fc.get_collection('Dataset', 'SPICE PubChem Set 1 Single Points Dataset v1.1')

You can list the compute specs included in this dataset with:

In [10]:
ds.list_records()

,driver,program,method,basis,keywords,name
0,gradient,psi4,wb97m-d3bj,def2-tzvppd,spice_default,WB97M-D3BJ/def2-tzvppd-spice_default


We can use this as the basis for querying all records corresponding to this spec:

In [19]:
spec = ds.list_records().iloc[0].to_dict()
spec

{'driver': 'gradient',
 'program': 'psi4',
 'method': 'wb97m-d3bj',
 'basis': 'def2-tzvppd',
 'keywords': 'spice_default',
 'name': 'WB97M-D3BJ/def2-tzvppd-spice_default'}

In [20]:
recs = ds.get_records(method=spec['method'], basis=spec['basis'], program=spec['program'], keywords=spec['keywords'])
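The keys of `spec` match the keyword arguments used in the call above, so the same query can be built by selecting those keys and unpacking them. This is a minimal sketch: `QUERY_KEYS` and `build_query` are hypothetical helpers introduced here for illustration, and the spec values are copied from the output above.

```python
# Keys of the spec that the get_records call above passes as keyword arguments.
QUERY_KEYS = ("method", "basis", "program", "keywords")


def build_query(spec):
    """Select only the query-relevant entries of a spec dictionary (hypothetical helper)."""
    return {key: spec[key] for key in QUERY_KEYS}


# Spec values copied from the `ds.list_records()` output shown above.
spec = {
    "driver": "gradient",
    "program": "psi4",
    "method": "wb97m-d3bj",
    "basis": "def2-tzvppd",
    "keywords": "spice_default",
    "name": "WB97M-D3BJ/def2-tzvppd-spice_default",
}

query = build_query(spec)
print(query)

# With a live dataset object `ds`, the equivalent call would be:
# recs = ds.get_records(**query)
```

Dropping `driver` and `name` mirrors the explicit call above, which passes only the other four fields.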

In [22]:
recs

index,record
85240538-13,"ResultRecord(id='94471235', status='COMPLETE')"
85240538-14,"ResultRecord(id='94460477', status='COMPLETE')"
85240538-15,"ResultRecord(id='94460491', status='COMPLETE')"
85240538-16,"ResultRecord(id='94460463', status='COMPLETE')"
85240538-17,"ResultRecord(id='94460486', status='COMPLETE')"
...,...
85301056-45,"ResultRecord(id='95098471', status='INCOMPLETE')"
85301056-46,"ResultRecord(id='95098470', status='INCOMPLETE')"
85301056-47,"ResultRecord(id='94465881', status='COMPLETE')"
85301056-48,"ResultRecord(id='94465844', status='COMPLETE')"


This is a `pandas.DataFrame` with a single `record` column; we can extract an individual record by its index label, e.g.:

In [34]:
rec = recs.loc['85240538-13'].record
rec

ResultRecord(id='94471235', status='COMPLETE')
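The listing of `recs` contains both `COMPLETE` and `INCOMPLETE` records, so it can help to tally statuses and keep only the finished ones before analysis. The sketch below runs without a server connection: `StubRecord` is a hypothetical stand-in that exposes only `id` and `status`, and its rows are copied from the listing above. It assumes `status` compares equal to a plain string; the real record objects may use an enum here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StubRecord:
    """Hypothetical stand-in for a result record; only the fields used below."""
    id: str
    status: str


# A few rows copied from the listing above, keyed by index label.
records = {
    "85240538-13": StubRecord("94471235", "COMPLETE"),
    "85240538-14": StubRecord("94460477", "COMPLETE"),
    "85301056-45": StubRecord("95098471", "INCOMPLETE"),
    "85301056-46": StubRecord("95098470", "INCOMPLETE"),
    "85301056-47": StubRecord("94465881", "COMPLETE"),
}

# Count how many records are in each status.
status_counts = Counter(rec.status for rec in records.values())

# Keep only the records that finished.
complete = {label: rec for label, rec in records.items() if rec.status == "COMPLETE"}

print(status_counts)
print(sorted(complete))
```

With the real result, the same loop could iterate over `recs["record"]` paired with `recs.index` instead of the stub dictionary.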

You can access a record's data either directly as attributes or by converting the record to a `dict`:

In [37]:
rec.dict()

{'id': '94471235',
 'hash_index': None,
 'procedure': 'single',
 'program': 'psi4',
 'version': 1,
 'protocols': {'wavefunction': <WavefunctionProtocolEnum.orbitals_and_eigenvalues: 'orbitals_and_eigenvalues'>},
 'extras': {'_qcfractal_tags': {'program': 'psi4', 'keywords': '31'},
  'qcvars': {'2-BODY DISPERSION CORRECTION ENERGY': -0.02436807,
   'CURRENT DIPOLE X': 3.309422199548904,
   'CURRENT DIPOLE Y': 0.3631580920655685,
   'CURRENT DIPOLE Z': -7.337052559504908,
   'CURRENT ENERGY': -634.0854865686065,
   'CURRENT REFERENCE ENERGY': -634.0854865686065,
   'DFT FUNCTIONAL TOTAL ENERGY': -634.0611184986066,
   'DFT TOTAL ENERGY': -634.0854865686066,
   'DFT VV10 ENERGY': 0.0,
   'DFT XC ENERGY': -58.6621816296679,
   'DISPERSION CORRECTION ENERGY': -0.02436807,
   'GRID ELECTRONS ALPHA': 43.00001127392447,
   'GRID ELECTRONS BETA': 43.00001127392447,
   'GRID ELECTRONS TOTAL': 86.00002254784894,
   'NUCLEAR REPULSION ENERGY': 687.1979552984772,
   'ONE-ELECTRON ENERGY': -2224.742