# Reaction Datasets

A `ReactionDataset` is a dataset whose primary index represents a chemical reaction, expressed as a stoichiometrically weighted linear combination of individual computations.
For example, in an interaction energy dataset, each index entry is the energy of the complex minus the energies of its individual monomers, which gives the final interaction energy.
This idea can be extended to standard reaction energies, conformational defect energies, and more.
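
The weighted sum described above can be sketched in plain Python. This is a minimal illustration of the arithmetic only, not QCArchive's implementation: the `reaction_energy` helper, the species labels, and the energies are hypothetical placeholders.

```python
# Hypothetical total energies in arbitrary units; not taken from any dataset.
energies = {"dimer": -10.0, "monomer_A": -4.0, "monomer_B": -5.5}

# Stoichiometric coefficients: +1 for the complex, -1 for each monomer.
stoichiometry = {"dimer": 1.0, "monomer_A": -1.0, "monomer_B": -1.0}


def reaction_energy(energies, stoichiometry):
    """Return the stoichiometrically weighted sum of individual energies."""
    return sum(coef * energies[name] for name, coef in stoichiometry.items())


interaction_energy = reaction_energy(energies, stoichiometry)
print(interaction_energy)
```

Other reaction types would only change the labels and coefficients; the summation stays the same.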

This dataset type has been developed by the QCArchive Team in conjunction with:

 - [David Sherrill](http://vergil.chemistry.gatech.edu)
 - Lori Burns
 - Daniel Nascimento
 - Dom Sirianni

To begin, we can connect to the MolSSI QCArchive server:

In [2]:
#import qcportal as ptl
import qcfractal.interface as ptl
client = ptl.FractalClient()
print(client)

FractalClient(server_name='The MolSSI QCArchive Server', address='https://api.qcarchive.molssi.org:443/', username='None')


The current `ReactionDataset`s can be explored below:

In [2]:
client.list_collections("ReactionDataset").head()

| collection | name | tagline |
| --- | --- | --- |
| ReactionDataset | A21 | Equilibrium complexes from A24 database of sma... |
| ReactionDataset | A24 | Interaction energies for small bimolecular com... |
| ReactionDataset | ACONF | Conformation energies for alkanes |
| ReactionDataset | AlkBind12 | Binding energies of saturated and unsaturated ... |
| ReactionDataset | AlkIsod14 | Isodesmic reaction energies for alkanes N=3--8 |


## Exploring a Dataset

For this example, we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:

In [3]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)

ReactionDataset(name=`S22`, id='184', client='https://api.qcarchive.molssi.org:443/') 


The reactions in the dataset -- dimerization reactions in the case of S22 -- can be listed:

In [4]:
ds.get_index()

['2-Pyridone-2-Aminopyridine Complex',
 'Adenine-Thymine Complex Stack',
 'Adenine-Thymine Complex WC',
 'Ammonia Dimer',
 'Benzene-Ammonia Complex',
 'Benzene Dimer PD',
 'Benzene Dimer T-Shape',
 'Benzene-HCN Complex',
 'Benzene-Methane Complex',
 'Benzene-Water Complex',
 'Ethene Dimer',
 'Ethene-Ethine Complex',
 'Formamide Dimer',
 'Formic Acid Dimer',
 'Indole-Benzene Complex Stack',
 'Indole-Benzene Complex T-Shape',
 'Methane Dimer',
 'Phenol Dimer',
 'Pyrazine Dimer',
 'Uracil Dimer HB',
 'Uracil Dimer Stack',
 'Water Dimer']

Datasets contain two types of data: those computed through QCArchive ("native") and those provided from external sources ("contributed"). Contributed data often come from experiments or from very costly benchmarks taken from the literature.

Both `Dataset` and `ReactionDataset` collections list all data that have been computed or contributed via the `list_values` method.

In [5]:
ds.list_values().head()

| native | driver | program | method | basis | keywords | stoichiometry | name |
| --- | --- | --- | --- | --- | --- | --- | --- |
| False | Unknown | Unknown | Unknown | Unknown | Unknown | default | S220 |
| False | Unknown | Unknown | Unknown | Unknown | Unknown | default | S22a |
| False | Unknown | Unknown | Unknown | Unknown | Unknown | default | S22b |
| True | energy | psi4 | b2plyp | aug-cc-pvdz | scf_default | cp | cp-B2PLYP/aug-cc-pvdz |
| True | energy | psi4 | b2plyp | aug-cc-pvdz | scf_default | default | B2PLYP/aug-cc-pvdz |


Here, we have listed the first five available data sources. The first three are contributed data (`native=False`) and correspond to benchmarks. The last two are computed data (`native=True`).

There are six primary keys to describe data:

 - `native` - Whether a computation was done using QCArchive.
 - `driver` - The type of computation: energy, gradient, Hessian, or properties.
 - `program` - The program used in the computation.
 - `method` - The quantum chemistry method, semiempirical method, AI model, or force field used in the computation.
 - `basis` - The basis used in the computation.
 - `keywords` - A keywords alias used in the computation, specific to the details of the program or procedure.

In addition, there is also the `stoichiometry` field which is unique to `ReactionDatasets`. 
There exist several ways to compute the interaction energy: counterpoise-corrected (`cp`), non-counterpoise-corrected (`default`), and Valiron–Mayer function counterpoise (`vmfc`). 
The `stoichiometry` field allows for the selection of this particular form.
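
To make the difference between the `default` and `cp` forms concrete, here is a small sketch. It assumes the usual counterpoise convention, in which each monomer energy is evaluated in the full dimer basis (with ghost functions on the partner) rather than in its own basis. All labels and energies are hypothetical, and `vmfc` is left out because it combines more terms.

```python
# Hypothetical energies in arbitrary units; labels and values are illustrative only.
energies = {
    "dimer": -10.0,
    "A_in_monomer_basis": -4.0,
    "B_in_monomer_basis": -5.5,
    "A_in_dimer_basis": -4.125,  # monomer A with ghost functions on B (assumed)
    "B_in_dimer_basis": -5.625,  # monomer B with ghost functions on A (assumed)
}

# Each stoichiometry names the computations it combines and their weights.
stoichiometries = {
    "default": [("dimer", 1.0), ("A_in_monomer_basis", -1.0), ("B_in_monomer_basis", -1.0)],
    "cp": [("dimer", 1.0), ("A_in_dimer_basis", -1.0), ("B_in_dimer_basis", -1.0)],
}


def interaction_energy(kind):
    """Sum the weighted energies for the chosen stoichiometry."""
    return sum(coef * energies[label] for label, coef in stoichiometries[kind])


results = {kind: interaction_energy(kind) for kind in stoichiometries}
print(results)
```

In this toy example the two forms share the dimer term and differ only in which monomer computations enter the sum.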

Searches in `list_values` may be narrowed by specifying some or all of the keys. In this case, we will filter the list by the DFT method `B2PLYP` and the basis set `def2-SVP`.

In [5]:
ds.list_values(method="B2PLYP", basis="def2-SVP")

| native | driver | program | method | basis | keywords | stoichiometry | name |
| --- | --- | --- | --- | --- | --- | --- | --- |
| True | energy | psi4 | b2plyp | def2-svp | scf_default | cp | cp-B2PLYP/def2-svp |
| True | energy | psi4 | b2plyp | def2-svp | scf_default | default | B2PLYP/def2-svp |
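
The query `basis="def2-SVP"` returned rows stored as `def2-svp`, which suggests that keys are matched without regard to case. That is an inference from the listing, not documented behavior. The sketch below mimics this kind of narrowing on plain dictionaries; `filter_records` is a hypothetical helper, not part of QCPortal, and the rows are copied from the listings above with only a few columns kept.

```python
# Rows copied from the listings above; only a few columns are kept.
records = [
    {"method": "b2plyp", "basis": "aug-cc-pvdz", "stoichiometry": "cp", "name": "cp-B2PLYP/aug-cc-pvdz"},
    {"method": "b2plyp", "basis": "aug-cc-pvdz", "stoichiometry": "default", "name": "B2PLYP/aug-cc-pvdz"},
    {"method": "b2plyp", "basis": "def2-svp", "stoichiometry": "cp", "name": "cp-B2PLYP/def2-svp"},
    {"method": "b2plyp", "basis": "def2-svp", "stoichiometry": "default", "name": "B2PLYP/def2-svp"},
]


def filter_records(records, **criteria):
    """Keep records whose fields match every criterion, ignoring case."""
    wanted = {key: value.lower() for key, value in criteria.items()}
    return [
        rec
        for rec in records
        if all(rec[key].lower() == value for key, value in wanted.items())
    ]


matches = filter_records(records, method="B2PLYP", basis="def2-SVP")
print([rec["name"] for rec in matches])
```

Adding a further criterion such as `stoichiometry="cp"` would narrow the result to a single row.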


## Querying Data

To obtain the data for these computations, we must query the server. For example, we can pull all `B3LYP-D3M` interaction energies:

In [6]:
ds.get_values(method="B3LYP-D3M")

|  | B3LYP-D3M/def2-svp | B3LYP-D3M/aug-cc-pvtz | B3LYP-D3M/aug-cc-pvdz | B3LYP-D3M/def2-tzvp |
| --- | --- | --- | --- | --- |
| 2-Pyridone-2-Aminopyridine Complex | -22.831506 | -18.238308 | -19.005121 | -18.53653 |
| Adenine-Thymine Complex Stack | -15.577143 | -11.77809 | -12.89793 | -12.149707 |
| Adenine-Thymine Complex WC | -22.574701 | -17.687043 | -18.449484 | -17.833451 |
| Ammonia Dimer | -6.248386 | -3.328184 | -3.50998 | -4.049052 |
| Benzene Dimer PD | -3.459984 | -2.467563 | -3.058981 | -2.5561 |
| Benzene Dimer T-Shape | -3.597379 | -3.01672 | -3.617173 | -3.072012 |
| Benzene-Ammonia Complex | -3.251346 | -2.57247 | -2.833251 | -2.9342 |
| Benzene-HCN Complex | -5.480155 | -5.22179 | -5.479076 | -5.279021 |
| Benzene-Methane Complex | -1.917835 | -1.552191 | -1.83085 | -1.555573 |
| Benzene-Water Complex | -4.926573 | -3.727725 | -3.92457 | -4.613285 |


The units of these energies are stored in `ds.units`:

In [8]:
ds.units

'kcal / mol'
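
Once the values and their units are known, simple comparisons can be done by hand. The sketch below copies three rows from the `B3LYP-D3M` table above, computes the mean absolute difference between the `def2-svp` and `aug-cc-pvtz` columns, and converts it to kJ/mol. The choice of rows is arbitrary and the variable names are illustrative; nothing here calls QCPortal.

```python
# Interaction energies in kcal/mol, copied from the B3LYP-D3M table above.
def2_svp = {
    "Ammonia Dimer": -6.248386,
    "Benzene-HCN Complex": -5.480155,
    "Benzene-Methane Complex": -1.917835,
}
aug_cc_pvtz = {
    "Ammonia Dimer": -3.328184,
    "Benzene-HCN Complex": -5.22179,
    "Benzene-Methane Complex": -1.552191,
}

KCAL_TO_KJ = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ

# Absolute per-reaction difference between the two basis sets.
diffs = [abs(def2_svp[name] - aug_cc_pvtz[name]) for name in def2_svp]

mean_abs_diff_kcal = sum(diffs) / len(diffs)
mean_abs_diff_kj = mean_abs_diff_kcal * KCAL_TO_KJ
print(f"{mean_abs_diff_kcal:.3f} kcal/mol = {mean_abs_diff_kj:.3f} kJ/mol")
```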

## Statistics and Visualization

Visual statistics and plots can be generated with the `visualize` method:

In [9]:
ds.visualize(method=["B3LYP", "B3LYP-D3", "B3LYP-D3M"], basis=["def2-tzvp"], groupby="D3")

KeyError: 'd3'

In [22]:
ds.visualize(method=["B3LYP", "B3LYP-D3", "B2PLYP", "B2PLYP-D3"], basis="def2-tzvp", groupby="D3", kind="violin")

KeyError: 'd3'

## Next steps

The next sections cover other collections that are used for organizing workflows, such as geometry optimization. There are more examples using `Dataset` and `ReactionDataset` in the [Cookbook](https://qcarchivetutorials.readthedocs.io/en/latest/cookbook/overview.html). 
Full documentation of [Dataset](http://docs.qcarchive.molssi.org/projects/QCPortal/en/stable/collection-dataset.html) and [ReactionDataset](http://docs.qcarchive.molssi.org/projects/QCPortal/en/stable/collection-reactiondataset.html) is available in the [QCPortal documentation](http://docs.qcarchive.molssi.org/projects/QCPortal/en/stable/index.html).