# Reaction Datasets

ReactionDatasets are datasets where the primary index is made up of linear combinations of individual computations. For example, an interaction energy dataset would index each entry as the energy of the complex minus the energies of the individual monomers, which yields the final interaction energy. This idea can be extended to standard reaction energies, conformational defect energies, and more. This dataset type has been developed by the QCArchive Team in conjunction with:

 - [David Sherrill](http://vergil.chemistry.gatech.edu)
 - Lori Burns
 - Daniel Nascimento
 - Dom Sirianni
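
The linear-combination idea can be illustrated with a minimal, standalone sketch. The energies below are hypothetical placeholders rather than values from any dataset, and `reaction_energy` is an illustrative helper, not part of the QCPortal API:

```python
# Hypothetical total energies (arbitrary units); not taken from any dataset.
energies = {"dimer": -152.50, "monomer_A": -76.24, "monomer_B": -76.25}

# A reaction entry expressed as (species, coefficient) pairs:
# +1 for the complex, -1 for each monomer.
interaction = [("dimer", 1.0), ("monomer_A", -1.0), ("monomer_B", -1.0)]


def reaction_energy(stoichiometry, energies):
    """Return the coefficient-weighted sum of the individual energies."""
    return sum(coef * energies[name] for name, coef in stoichiometry)


ie = reaction_energy(interaction, energies)
print(f"Interaction energy: {ie:.2f}")
```

Changing the coefficients is all that is needed to describe a different kind of reaction, which is why a single dataset type can cover interaction, reaction, and conformational energies.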

To begin, we can connect to the MolSSI QCArchive server:

In [8]:
import qcportal as ptl
client = ptl.FractalClient()
print(client)

FractalClient(server_name='The MolSSI QCArchive Server', address='https://api.qcarchive.molssi.org:443/', username='None')


The current `ReactionDataset`s can be explored below:

In [2]:
client.list_collections("ReactionDataset")

['A21',
 'A24',
 'ACONF',
 'AlkBind12',
 'AlkIsod14',
 'AlkIsomer11',
 'Bauza30',
 'BH76RC',
 'BHPERI26',
 'BSR36',
 'Butanediol65',
 'C20C24',
 'CYCONF',
 'DBH24',
 'DIE60',
 'DS14',
 'FmH2O10',
 'G21EA',
 'H2O16Rel5',
 'H2O17Rel4',
 'HB15',
 'HNBrBDE18',
 'HSG',
 'HTBH38',
 'NC15',
 'NHTBH38',
 'S22',
 'S22by5',
 'S22by7',
 'S66',
 'SSI',
 'X40']

## Exploring a Dataset

For this example, we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer, the methane dimer, and more. To obtain this collection:

In [3]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)

ReactionDataset(name=`S22`, id='5c8159a4b6a2de3bd1e74306', client='https://api.qcarchive.molssi.org:443/') 


This dataset automatically comes with some ``Contributed Value`` data, or data that has been provided and not explicitly computed. This data is often either experimental data or very costly benchmarks taken from literature. As these Datasets are based off of Pandas DataFrames, we can directly access the underlying DataFrame to see the data provided:

In [4]:
ds.df.head()

|  | S220 | S22a | S22b |
|---|---|---|---|
| Ammonia Dimer | -3.17 | -3.15 | -3.133 |
| Water Dimer | -5.02 | -5.07 | -4.989 |
| Formic Acid Dimer | -18.61 | -18.81 | -18.753 |
| Formamide Dimer | -15.96 | -16.11 | -16.062 |
| Uracil Dimer HB | -20.65 | -20.69 | -20.641 |


Here we used `.head()` to access the first five records in the `ReactionDataset`.

All `Collection`s that have `Dataset` in the name (including `ReactionDataset`) keep a history that lists the data that has been computed. In this case we will filter the history by the DFT method `B2PLYP` and the basis set `def2-SVP`:

In [5]:
ds.list_history(method="B2PLYP", basis="def2-SVP")

| driver | program | method | basis | keywords | stoichiometry |
|---|---|---|---|---|---|
| energy | psi4 | b2plyp | def2-svp | scf_default | cp |
| energy | psi4 | b2plyp | def2-svp | scf_default | default |


Here we can see that there are five primary keys in the computation:

 - `driver` - The type of computation; this can be energy, gradient, Hessian, or properties.
 - `program` - The program used in the computation.
 - `method` - The quantum chemistry, semiempirical, AI-model, or force field method used in the computation.
 - `basis` - The basis set used in the computation.
 - `keywords` - A keywords alias used in the computation; these aliases reference KeywordSets (see advanced tutorials).

In addition, there is the `stoichiometry` field, which is unique to `ReactionDataset`s. There are several ways to compute an interaction energy: counterpoise-corrected (`cp`), non-counterpoise-corrected (`default`), and Valiron–Mayer function counterpoise (`vmfc`). The `stoichiometry` field selects which of these forms is used.
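
As a rough sketch of how two of these forms differ for a dimer: in the non-counterpoise-corrected form each monomer is computed in its own basis, while in the counterpoise-corrected form each monomer is computed in the full dimer basis. The energies, the `(species, basis)` keys, and the `interaction_energy` helper below are hypothetical and only illustrate the bookkeeping; they are not QCPortal data structures:

```python
# Hypothetical energies keyed by (species, basis-set centers); arbitrary units.
energies = {
    ("AB", "AB"): -152.500,  # dimer in the dimer basis
    ("A", "A"): -76.240,     # monomer A in its own basis
    ("B", "B"): -76.250,     # monomer B in its own basis
    ("A", "AB"): -76.242,    # monomer A in the full dimer basis
    ("B", "AB"): -76.251,    # monomer B in the full dimer basis
}

# Each stoichiometry is a different linear combination for the same dimer.
stoichiometries = {
    "default": [(("AB", "AB"), 1.0), (("A", "A"), -1.0), (("B", "B"), -1.0)],
    "cp": [(("AB", "AB"), 1.0), (("A", "AB"), -1.0), (("B", "AB"), -1.0)],
}


def interaction_energy(kind):
    """Evaluate the linear combination selected by the stoichiometry name."""
    return sum(coef * energies[key] for key, coef in stoichiometries[kind])


results = {kind: interaction_energy(kind) for kind in stoichiometries}
for kind, value in results.items():
    print(f"{kind}: {value:.3f}")
```

The same dimer therefore appears once in the dataset index, and the stoichiometry name decides which set of individual computations is combined.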

## Querying Data

To obtain the data for the various historical computations we must query them from the server. Here we will automatically pull all relevant computations that match our query:

In [6]:
ds.get_history(method="b2plyp", basis="def2-SVP")
ds.df.head()

|  | S220 | S22a | S22b | B2PLYP/def2-svp |
|---|---|---|---|---|
| Ammonia Dimer | -3.17 | -3.15 | -3.133 | -5.164385 |
| Water Dimer | -5.02 | -5.07 | -4.989 | -7.966866 |
| Formic Acid Dimer | -18.61 | -18.81 | -18.753 | -22.767617 |
| Formamide Dimer | -15.96 | -16.11 | -16.062 | -18.777757 |
| Uracil Dimer HB | -20.65 | -20.69 | -20.641 | -22.268662 |
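
Once a computed column sits next to a benchmark column, the two can be compared directly. The sketch below does this by hand in plain Python, using only the five rows displayed above (not the full dataset) and the `S220` column as the reference; the names `rows`, `errors`, and `mue` are illustrative:

```python
# (benchmark S220, computed B2PLYP/def2-svp) copied from the rows shown above.
rows = {
    "Ammonia Dimer": (-3.17, -5.164385),
    "Water Dimer": (-5.02, -7.966866),
    "Formic Acid Dimer": (-18.61, -22.767617),
    "Formamide Dimer": (-15.96, -18.777757),
    "Uracil Dimer HB": (-20.65, -22.268662),
}

# Signed error of the computed value relative to the benchmark.
errors = {name: computed - bench for name, (bench, computed) in rows.items()}

# Mean unsigned error over the displayed rows only.
mue = sum(abs(err) for err in errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name:20s} {err:8.3f}")
print(f"MUE over {len(errors)} dimers: {mue:.3f}")
```

Every signed error is negative for these rows, meaning the computed column binds each dimer more strongly than the benchmark does.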


## Statistics and Visualization

Visual statistics and plots can be generated with the ``visualize`` command:

In [7]:
ds.visualize(method="B2PLYP", basis=["def2-svp", "def2-tzvp"], bench="S220")

In [9]:
ds.visualize(method="B2PLYP", basis=["def2-svp", "def2-tzvp"], bench="S220", kind="violin")