# QCPortal Cookbook

This cookbook gathers recipes for manipulating QCArchive data using QCPortal. If you can't find what you're looking for, please feel free to [reach out to us](https://docs.qcarchive.molssi.org/en/latest/community.html#work-with-us).

## Get molecules and geometries...

### ...from an ID

Every molecule computed with QCArchive is assigned a unique ID.
If you know a molecule's ID, you can retrieve the molecule from the Molecules table with `client.query_molecules`.

In [1]:
import qcportal as ptl
client = ptl.FractalClient()

For example, molecule 1234 is 1,2,3-trimethylbenzene.

In [2]:
mol = client.query_molecules(1234)[0]
mol 

<Molecule(name='C9H12' formula='C9H12' hash='572b510')>

In [3]:
print(mol)

    Geometry (in Angstrom), charge = 0.0, multiplicity = 1:

       Center              X                  Y                   Z       
    ------------   -----------------  -----------------  -----------------
    C                 0.776479871994     1.156134463385     0.121542591228
    C                 0.438429690334     0.679567908122    -1.141595091975
    C                 0.439577078821     0.423533055514     1.255585387764
    C                -0.363723536834    -0.465178778108    -1.279725991730
    C                -0.415502828385    -0.685937227907     1.160631416613
    C                -0.792912983429    -1.170236644458    -0.121804279943
    C                -0.744392084678    -0.917923156500    -2.666766549983
    C                -0.856925058179    -1.374181477949     2.427060703777
    C                -1.703936690413    -2.374380900784    -0.246989621254
    H                 1.380610203168     2.049406423411     0.216714048921
    H                 0.770290662964   
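
The printout above reports coordinates in Angstrom. QCElemental molecules are generally understood to store their `geometry` array in Bohr, so a unit conversion is often needed when comparing the two; treat that storage convention as an assumption to verify against your installed version. The following minimal sketch converts the first carbon atom from the printout, using only the standard library. The helper names are hypothetical and not part of QCPortal.

```python
# Bohr radius in Angstrom (CODATA 2018 recommended value).
BOHR_TO_ANGSTROM = 0.529177210903


def angstrom_to_bohr(xyz):
    """Convert one (x, y, z) coordinate from Angstrom to Bohr."""
    return tuple(c / BOHR_TO_ANGSTROM for c in xyz)


def bohr_to_angstrom(xyz):
    """Convert one (x, y, z) coordinate from Bohr to Angstrom."""
    return tuple(c * BOHR_TO_ANGSTROM for c in xyz)


# First carbon atom of molecule 1234, copied from the printout (Angstrom).
first_carbon_angstrom = (0.776479871994, 1.156134463385, 0.121542591228)
first_carbon_bohr = angstrom_to_bohr(first_carbon_angstrom)
print(first_carbon_bohr)
```

With a live client, the same conversion could be applied to each row of the molecule's coordinate array.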

The following sections show how to find molecule IDs from Collections.

### ...from a ReactionDataset

Load a `ReactionDataset`:

In [4]:
import qcportal as ptl
client = ptl.FractalClient()

ds = client.get_collection("ReactionDataset", "S22")
ds.df  # list available reactions

| | S220 | S22a | S22b |
|---|---|---|---|
| Ammonia Dimer | -3.17 | -3.15 | -3.133 |
| Water Dimer | -5.02 | -5.07 | -4.989 |
| Formic Acid Dimer | -18.61 | -18.81 | -18.753 |
| Formamide Dimer | -15.96 | -16.11 | -16.062 |
| Uracil Dimer HB | -20.65 | -20.69 | -20.641 |
| 2-Pyridone-2-Aminopyridine Complex | -16.71 | -17.0 | -16.934 |
| Adenine-Thymine Complex WC | -16.37 | -16.74 | -16.66 |
| Methane Dimer | -0.53 | -0.53 | -0.527 |
| Ethene Dimer | -1.51 | -1.48 | -1.472 |
| Benzene-Methane Complex | -1.5 | -1.45 | -1.448 |


Each reaction has a `stoichiometry` describing which molecules are involved in the reactants and products:

In [5]:
ds.get_rxn('Adenine-Thymine Complex WC').stoichiometry

{'default1': {'25': 1.0, '26': 1.0},
 'cp1': {'27': 1.0, '28': 1.0},
 'default': {'29': 1.0},
 'cp': {'29': 1.0}}

In the S22 dataset, `default` refers to the dimer (molecule ID 29) and `default1` refers to the two monomers (molecule IDs 25 and 26), both without counterpoise correction. The `cp` and `cp1` entries are the corresponding dimer and monomers for counterpoise-corrected calculations.

In [6]:
client.query_molecules('25')[0]

<Molecule(name='C10H11N7O2 ((0,),[])' formula='C5H5N5' hash='c0e7ed3')>

In [7]:
client.query_molecules('26')[0]

<Molecule(name='C10H11N7O2 ((1,),[])' formula='C5H6N2O2' hash='a4f9749')>

In [8]:
client.query_molecules('29')[0]

<Molecule(name='C10H11N7O2' formula='C10H11N7O2' hash='5357c2c')>

Monomers used in counterpoise-corrected calculations contain ghost atoms:

In [9]:
client.query_molecules('27')[0]

<Molecule(name='C10H11N7O2 ((0,),[1])' formula='C10H11N7O2' hash='d3955aa')>

In [10]:
client.query_molecules('28')[0]

<Molecule(name='C10H11N7O2 ((1,),[0])' formula='C10H11N7O2' hash='e63c41f')>
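
Rather than copying IDs by hand, the molecule IDs can be collected from the stoichiometry dictionary itself. The sketch below works on a literal copy of the dictionary printed above, so it runs without a server connection; the variable names are illustrative only.

```python
# Literal copy of the stoichiometry printed for 'Adenine-Thymine Complex WC'.
stoichiometry = {
    "default1": {"25": 1.0, "26": 1.0},
    "cp1": {"27": 1.0, "28": 1.0},
    "default": {"29": 1.0},
    "cp": {"29": 1.0},
}

# Molecule IDs grouped by stoichiometry key, in numeric order.
ids_by_key = {
    key: sorted(coefficients, key=int)
    for key, coefficients in stoichiometry.items()
}

# Every distinct molecule ID that takes part in the reaction.
unique_ids = sorted(
    {mol_id for coefficients in stoichiometry.values() for mol_id in coefficients},
    key=int,
)

print(ids_by_key)
print(unique_ids)
```

With a live client, each ID in `unique_ids` can then be looked up with `client.query_molecules` as in the cells above.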

### ...from an OptimizationDataset

Load an `OptimizationDataset`:

In [11]:
import qcportal as ptl
client = ptl.FractalClient()

client.list_collections()  # browse the collections available on the server
ds = client.get_collection("OptimizationDataset", "SMIRNOFF Coverage Set 1")

Show some available molecules:

In [12]:
ds.df.head()

COC(O)OC-0
C[S-]-0
CS-0
CO-0
CCO-0


Show available specifications:

In [13]:
ds.list_specifications()

| Name | Description |
|---|---|
| default | Standard OpenFF optimization quantum chemistry... |


Obtain a specific record from a molecule and specification:

In [14]:
r = ds.get_record("CCO-0","default")

Get the optimized molecule:

In [15]:
r.get_final_molecule()

<Molecule(name='C2H6O' formula='C2H6O' hash='422ad57')>

Get the optimization trajectory:

In [16]:
r.get_molecular_trajectory()

[<Molecule(name='C2H6O' formula='C2H6O' hash='29df3ae')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='93989e4')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='14261f7')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='3b6db86')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='b35d632')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='c900f12')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='a1e9d7a')>,
 <Molecule(name='C2H6O' formula='C2H6O' hash='422ad57')>]

### ...from a TorsionDriveDataset

Load a `TorsionDriveDataset`:

In [17]:
import qcportal as ptl
client = ptl.FractalClient()

ds = client.get_collection("TorsionDriveDataset", "SMIRNOFF Coverage Torsion Set 1")

Show some available torsions:

In [18]:
ds.df.head()  

[CH3:1][O:2][CH:3]([OH:4])OC
[CH3:1][O:2][CH:3](O)[O:4]C
CO[CH:3]([OH:4])[O:2][CH3:1]
C[O:4][CH:3](O)[O:2][CH3:1]
[H:4][C:3](O)([O:2][CH3:1])OC


Show available specifications:

In [19]:
ds.list_specifications()

| Name | Description |
|---|---|
| default | Standard OpenFF torsiondrive specification. |


Get a specific torsiondrive:

In [20]:
td = ds.get_record("CO[CH:3]([OH:4])[O:2][CH3:1]", "default")

Get molecules for each angle along the torsion scan:

In [21]:
td.get_final_molecules()

{(-75,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='60e16ca')>,
 (-90,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c337c03')>,
 (-60,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='b4ff4d4')>,
 (-105,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='5b05d3a')>,
 (-45,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='240c817')>,
 (-120,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='399d214')>,
 (-30,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='8737c8f')>,
 (-135,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='530c77d')>,
 (-15,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='05c30a0')>,
 (-150,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='1c56b54')>,
 (0,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='f1b0dd1')>,
 (-165,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='c81a1fc')>,
 (15,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='f329f87')>,
 (30,): <Molecule(name='C3H8O3' formula='C3H8O3' hash='99156ab')>,
 (180,): <Molecule(name='C3H8O3' formula='C3H8O

Get the molecule at a single angle:

In [22]:
td.get_final_molecules()[(30,)]

<Molecule(name='C3H8O3' formula='C3H8O3' hash='99156ab')>
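
The dictionary returned for a torsion scan is keyed by 1-tuples of angles and, as the output above shows, is not ordered by angle. A minimal sketch of walking the scan in angle order follows. It uses a small stand-in dictionary whose values are the short hashes printed above rather than real `Molecule` objects, so it runs offline.

```python
# Stand-in for td.get_final_molecules(): keys are 1-tuples of angles in degrees.
# Values are the short hashes from the output above, not Molecule objects.
final_molecules = {
    (-75,): "60e16ca",
    (-90,): "c337c03",
    (-60,): "b4ff4d4",
    (0,): "f1b0dd1",
    (15,): "f329f87",
    (30,): "99156ab",
}

# Tuples compare element by element, so sorting the keys orders the scan by angle.
sorted_angles = [key[0] for key in sorted(final_molecules)]

for angle in sorted_angles:
    print(f"{angle:5d}  {final_molecules[(angle,)]}")
```

The same loop should apply unchanged to the real dictionary, since only the keys are sorted.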

## Dataset and ReactionDataset: list available methods, basis sets, programs, etc.

When exploring a `Dataset` or `ReactionDataset`, it helps to know which search parameters and search values are available. This section shows how to list the basis sets, methods, programs, and other parameters represented in a dataset's results.

In [23]:
import qcportal as ptl
client = ptl.FractalClient()

ds = client.get_collection("ReactionDataset", "S22")

To see available search parameters:

In [24]:
ds.list_history().reset_index().columns

Index(['driver', 'program', 'method', 'basis', 'keywords', 'stoichiometry'], dtype='object')

To see available search values (e.g. which basis sets and methods are available in the dataset):

In [25]:
ds.list_history().reset_index()['method'].unique()

array(['b2plyp', 'b2plyp-d3', 'b2plyp-d3(bj)', 'b2plyp-d3m',
       'b2plyp-d3m(bj)', 'b3lyp', 'b3lyp-d3', 'b3lyp-d3(bj)', 'b3lyp-d3m',
       'b3lyp-d3m(bj)', 'hf', 'mp2', 'pbe', 'sapt0', 'wb97m-v', 'wb97x-d'],
      dtype=object)

In [26]:
ds.list_history().reset_index()['basis'].unique()

array(['aug-cc-pvdz', 'aug-cc-pvtz', 'def2-svp', 'def2-tzvp', 'sto-3g',
       'jun-cc-pvdz'], dtype=object)

List combinations of method and basis set:

In [27]:
ds.list_history().reset_index().groupby(['method','basis']).size().reset_index()

| | method | basis | 0 |
|---|---|---|---|
| 0 | b2plyp | aug-cc-pvdz | 2 |
| 1 | b2plyp | aug-cc-pvtz | 2 |
| 2 | b2plyp | def2-svp | 2 |
| 3 | b2plyp | def2-tzvp | 2 |
| 4 | b2plyp-d3 | aug-cc-pvdz | 2 |
| 5 | b2plyp-d3 | aug-cc-pvtz | 2 |
| 6 | b2plyp-d3 | def2-svp | 2 |
| 7 | b2plyp-d3 | def2-tzvp | 2 |
| 8 | b2plyp-d3(bj) | aug-cc-pvdz | 2 |
| 9 | b2plyp-d3(bj) | aug-cc-pvtz | 2 |


## Get all calculations done on a molecule

In [28]:
import qcportal as ptl
client = ptl.FractalClient()

mol = client.query_molecules(2)[0]
mol

<Molecule(name='H6N2 ((1,),[])' formula='H3N' hash='5ad632d')>

Query the results table for all calculations performed on the molecule:

In [29]:
ret = client.query_results(molecule=mol.id)
ret

[<ResultRecord(id='735838' status='COMPLETE')>,
 <ResultRecord(id='735899' status='COMPLETE')>,
 <ResultRecord(id='970453' status='COMPLETE')>,
 <ResultRecord(id='970514' status='COMPLETE')>,
 <ResultRecord(id='608545' status='COMPLETE')>,
 <ResultRecord(id='735599' status='COMPLETE')>,
 <ResultRecord(id='735489' status='COMPLETE')>,
 <ResultRecord(id='735550' status='COMPLETE')>,
 <ResultRecord(id='1847216' status='COMPLETE')>,
 <ResultRecord(id='1847316' status='COMPLETE')>,
 <ResultRecord(id='624' status='COMPLETE')>,
 <ResultRecord(id='663' status='COMPLETE')>,
 <ResultRecord(id='1846816' status='COMPLETE')>,
 <ResultRecord(id='1846916' status='COMPLETE')>,
 <ResultRecord(id='468' status='COMPLETE')>,
 <ResultRecord(id='507' status='COMPLETE')>,
 <ResultRecord(id='63' status='COMPLETE')>,
 <ResultRecord(id='208187' status='COMPLETE')>,
 <ResultRecord(id='208287' status='COMPLETE')>,
 <ResultRecord(id='1764410' status='COMPLETE')>,
 <ResultRecord(id='1819425' status='COMPLETE')>,
 <

Inspect one of the results:

In [30]:
r = ret[9]
print(f"Program: {r.program}\nMethod: {r.method}\nBasis: {r.basis}")

Program: psi4
Method: b2plyp
Basis: aug-cc-pvtz
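
The same three attributes can be used to summarize a whole list of records instead of inspecting them one at a time. The sketch below tallies records by program, method, and basis. The `Record` type and the sample entries are hypothetical stand-ins that expose only the attributes printed above; with a live server, the list returned by `client.query_results` would take their place.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a result record with the three attributes used above.
Record = namedtuple("Record", ["program", "method", "basis"])

# Made-up sample records for illustration only.
records = [
    Record("psi4", "b2plyp", "aug-cc-pvtz"),
    Record("psi4", "b2plyp", "aug-cc-pvdz"),
    Record("psi4", "b3lyp", "def2-svp"),
    Record("psi4", "b2plyp", "aug-cc-pvtz"),
]

# Count how many records share each (program, method, basis) combination.
counts = Counter((r.program, r.method, r.basis) for r in records)

# Sort on string forms so that a missing basis (None) would not break the ordering.
for combo, n in sorted(counts.items(), key=lambda kv: tuple(str(x) for x in kv[0])):
    program, method, basis = combo
    print(f"{program:6s} {method:10s} {basis!s:14s} {n}")
```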


In [31]:
print(r.get_stdout())


  Memory set to  60.800 GiB by Python driver.

*** tstart() called on ca131
*** at Fri Jul 12 22:18:51 2019

   => Loading Basis Set <=

    Name: AUG-CC-PVTZ
    Role: ORBITAL
    Keyword: BASIS
    atoms 1   entry N          line   281 file /home/lnaden/miniconda3/envs/qca/share/psi4/basis/aug-cc-pvtz.gbs 
    atoms 2-4 entry H          line    36 file /home/lnaden/miniconda3/envs/qca/share/psi4/basis/aug-cc-pvtz.gbs 


         ---------------------------------------------------------
                                   SCF
               by Justin Turney, Rob Parrish, Andy Simmonett
                          and Daniel G. A. Smith
                              RKS Reference
                       16 Threads,  62259 MiB Core
         ---------------------------------------------------------

  ==> Geometry <==

    Molecular point group: cs
    Full point group: Cs

    Geometry (in Bohr), charge = 0, multiplicity = 1:

       Center              X                  Y                

## What can QCArchive help you with?

If you have questions about how to access QCArchive data for your specific use case, please [reach out to us](https://docs.qcarchive.molssi.org/en/latest/community.html#work-with-us). Also, please feel free to [submit recipes of your own](https://github.com/MolSSI/QCArchiveExamples).