# Getting Started

<div>
<img src="https://qcarchive.molssi.org/images/molssi-logo.svg" alt="MolSSI" align="right" style="width: 120px;"/>

The [Molecular Sciences Software Institute](https://molssi.org) hosts the Quantum Chemistry Archive (QCArchive) and makes this data available to the entire Computational Molecular Sciences community free of charge. The QCArchive is both a database for viewing, analyzing, and exploring existing data and a live instance that continuously generates new data as directed by the community.
</div>

<div>
<img src="https://qcarchive.molssi.org/images/QCArchiveDrop2.svg" alt="QCArchive" align="left" style="width: 120px;"/>

The primary interface to this database in Python is a ``FractalClient`` from the ``qcportal`` package, which can be installed via pip (`pip install qcportal`) or conda (`conda install qcportal -c conda-forge`). A new ``FractalClient`` automatically connects to the QCArchive and has access to all data inside it.

</div>

In [4]:
import qcportal as ptl
client = ptl.FractalClient()
print(client)

FractalClient(server_name='The MolSSI QCArchive Server', address='https://api.qcarchive.molssi.org:443/', username='None')


## Finding Collections

One of the main ways to explore the QCArchive is to examine `Collection`s, which are structures that allow easy manipulation of data in preset ways. Several examples of `Collection`s contained within the QCArchive are as follows:

 - `Dataset` - A dataset where each record is a single molecule and many methods can be applied to each record.
 - `ReactionDataset` - A dataset where each record is a combination of molecules (e.g. interaction and reaction energies) and many methods can be applied to each record.
 - `OpenFFWorkflow` - A workflow collection for torsiondrives and constrained optimizations developed with the [Open Force Field Initiative](https://openforcefield.org).

In [5]:
client.list_collections()

{'ReactionDataset': ['S22',
  'S22by7',
  'S22by5',
  'A21',
  'A24',
  'DS14',
  'AlkBind12',
  'Bauza30',
  'FmH2O10',
  'HB15',
  'HSG',
  'NC15',
  'S66',
  'X40',
  'CYCONF',
  'AlkIsod14',
  'ACONF',
  'AlkIsomer11',
  'BH76RC',
  'BHPERI26',
  'BSR36',
  'Butanediol65',
  'C20C24',
  'DBH24',
  'DIE60',
  'G21EA',
  'H2O16Rel5',
  'H2O17Rel4',
  'HNBrBDE18',
  'HTBH38',
  'NHTBH38',
  'SSI'],
 'OpenFFWorkflow': ['chemper_rdkit'],
 'TorsionDriveDataset': ['OpenFF Fragmenter Phenyl Benchmark']}

Specific `Collection` types can be queried to limit the number of collections to browse through:

In [6]:
client.list_collections("reactiondataset")

['A21',
 'A24',
 'ACONF',
 'AlkBind12',
 'AlkIsod14',
 'AlkIsomer11',
 'Bauza30',
 'BH76RC',
 'BHPERI26',
 'BSR36',
 'Butanediol65',
 'C20C24',
 'CYCONF',
 'DBH24',
 'DIE60',
 'DS14',
 'FmH2O10',
 'G21EA',
 'H2O16Rel5',
 'H2O17Rel4',
 'HB15',
 'HNBrBDE18',
 'HSG',
 'HTBH38',
 'NC15',
 'NHTBH38',
 'S22',
 'S22by5',
 'S22by7',
 'S66',
 'SSI',
 'X40']
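Once the listing grows, it can help to search it by name. The following is a minimal sketch that assumes the listing is a plain mapping from collection type to a list of names, as printed above. The `listing` dictionary is a hand-copied subset of that output so the sketch runs without a server connection, and `find_collections` is a hypothetical helper, not part of `qcportal`.

```python
# Hand-copied subset of the listing printed above (an assumption made so
# the sketch runs offline; a real session would use client.list_collections()).
listing = {
    "ReactionDataset": ["S22", "S22by7", "S22by5", "A21", "A24", "S66"],
    "OpenFFWorkflow": ["chemper_rdkit"],
    "TorsionDriveDataset": ["OpenFF Fragmenter Phenyl Benchmark"],
}


def find_collections(listing, text):
    """Return (collection type, name) pairs whose name contains `text`, ignoring case."""
    needle = text.lower()
    return [
        (ctype, name)
        for ctype, names in listing.items()
        for name in sorted(names)
        if needle in name.lower()
    ]


matches = find_collections(listing, "s22")
for ctype, name in matches:
    print(f"{ctype}: {name}")
```

Each returned pair has the same two pieces of information that `client.get_collection` is called with later in this tutorial: the collection type and the collection name.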

## Exploring Collections

Collections can be obtained by pulling their data from the central server. A collection is primarily metadata, so even extremely large collections can be pulled in a few seconds. For this example we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:

In [7]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)

ReactionDataset(name=`S22`, id='5c8159a4b6a2de3bd1e74306', client='https://api.qcarchive.molssi.org:443/') 


This dataset automatically comes with some ``Contributed Value`` data, that is, data that has been provided rather than explicitly computed. This data is often either experimental data or very costly benchmarks taken from the literature. Because these datasets are built on pandas DataFrames, we can directly access the underlying DataFrame to see the data provided:

In [9]:
ds.df.head()

|  | S220 | S22a | S22b |
|---|---|---|---|
| Ammonia Dimer | -3.17 | -3.15 | -3.133 |
| Water Dimer | -5.02 | -5.07 | -4.989 |
| Formic Acid Dimer | -18.61 | -18.81 | -18.753 |
| Formamide Dimer | -15.96 | -16.11 | -16.062 |
| Uracil Dimer HB | -20.65 | -20.69 | -20.641 |


Here we used `.head()` to access the first five records in the `ReactionDataset`.
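The contributed columns can be compared with each other directly. As a rough sketch, the snippet below copies the `S220` and `S22b` values for the five records displayed above into plain dictionaries (so that it runs without a server connection or pandas) and reports how far the two columns differ for each record. The variable names are illustrative only.

```python
# Values copied from the five records shown above.
s220 = {
    "Ammonia Dimer": -3.17,
    "Water Dimer": -5.02,
    "Formic Acid Dimer": -18.61,
    "Formamide Dimer": -15.96,
    "Uracil Dimer HB": -20.65,
}
s22b = {
    "Ammonia Dimer": -3.133,
    "Water Dimer": -4.989,
    "Formic Acid Dimer": -18.753,
    "Formamide Dimer": -16.062,
    "Uracil Dimer HB": -20.641,
}

# Difference between the two contributed columns for each record.
shifts = {name: s22b[name] - s220[name] for name in s220}

# Record with the largest absolute difference between the two columns.
largest = max(shifts, key=lambda name: abs(shifts[name]))

for name, shift in shifts.items():
    print(f"{name:20s} {shift:+.3f}")
print("largest difference:", largest)
```

On the real DataFrame the same comparison is a column subtraction, `ds.df["S22b"] - ds.df["S220"]`.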

All `Collection`s that have `Dataset` in the name (including `ReactionDataset`) have a history that lists the data that has been computed. In this case we will filter our history by the DFT method `B2PLYP` and the basis set `def2-SVP`:

In [10]:
ds.list_history(method="B2PLYP", basis="def2-SVP")

| driver | program | method | basis | keywords | stoichiometry |
|---|---|---|---|---|---|
| energy | psi4 | b2plyp | def2-svp | scf_default | cp |
| energy | psi4 | b2plyp | def2-svp | scf_default | default |


Here we can see that there are five primary keys in the computation:

 - `driver` - The type of computation; this can be energy, gradient, Hessian, or properties.
 - `program` - The program used in the computation.
 - `method` - The quantum chemistry, semiempirical, AI-model, or force field method used in the computation.
 - `basis` - The basis used in the computation.
 - `keywords` - A keywords alias used in the computation; these aliases reference KeywordSets (see advanced tutorials).
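To make the role of these keys concrete, here is a small sketch that models a history row as a named tuple and filters rows by any subset of the keys. It is an illustration only and not how `qcportal` implements `list_history`. `HistoryKey` and `filter_history` are hypothetical names, the second row is invented, and the case-insensitive matching is an assumption suggested by the query `method="B2PLYP"` returning `b2plyp` above.

```python
from typing import NamedTuple


class HistoryKey(NamedTuple):
    """The five keys that identify a computation, as listed above."""
    driver: str
    program: str
    method: str
    basis: str
    keywords: str


history = [
    # Mirrors the history output above (without the stoichiometry column).
    HistoryKey("energy", "psi4", "b2plyp", "def2-svp", "scf_default"),
    # Hypothetical extra row, added only so the filter has something to reject.
    HistoryKey("energy", "psi4", "b3lyp", "def2-svp", "scf_default"),
]


def filter_history(rows, **query):
    """Keep rows whose fields equal every query value, compared case-insensitively."""
    wanted = {field: value.lower() for field, value in query.items()}
    return [
        row
        for row in rows
        if all(getattr(row, field).lower() == value for field, value in wanted.items())
    ]


selected = filter_history(history, method="B2PLYP", basis="def2-SVP")
print(selected)
```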

In addition, there is the `stoichiometry` field, which is unique to `ReactionDataset`s. There are several ways to compute the interaction energy: counterpoise-corrected (`cp`), non-counterpoise-corrected (`default`), and Valiron–Mayer function counterpoise (`vmfc`). The `stoichiometry` field selects which of these forms is used.
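The difference between the `default` and `cp` forms can be sketched with a few lines of arithmetic. In both cases the interaction energy is the dimer energy minus the sum of the monomer energies; in the counterpoise-corrected form the monomers are evaluated in the full dimer basis rather than in their own. All energies below are made-up numbers in arbitrary units, and `interaction_energy` is a hypothetical helper, not a `qcportal` function.

```python
def interaction_energy(dimer, monomers):
    """Interaction energy: dimer energy minus the sum of the monomer energies."""
    return dimer - sum(monomers)


# Made-up energies in arbitrary units, for illustration only.
e_dimer = -152.60

# Monomers evaluated in their own basis sets (non-counterpoise-corrected).
e_monomers_own_basis = [-76.29, -76.30]

# Monomers evaluated in the full dimer basis (counterpoise-corrected);
# the larger basis lowers each monomer energy slightly.
e_monomers_dimer_basis = [-76.292, -76.303]

e_default = interaction_energy(e_dimer, e_monomers_own_basis)
e_cp = interaction_energy(e_dimer, e_monomers_dimer_basis)

print(f"default: {e_default:.3f}  cp: {e_cp:.3f}")
```

With these invented numbers the counterpoise-corrected value is less binding than the uncorrected one, which is the usual direction of the correction.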

## Querying Data

To obtain the data for the various historical computations we must query them from the server. Here we will automatically pull all relevant computations that match our query:

In [11]:
ds.get_history(method="b2plyp", basis="def2-SVP")
ds.df.head()

|  | S220 | S22a | S22b | B2PLYP/def2-svp |
|---|---|---|---|---|
| Ammonia Dimer | -3.17 | -3.15 | -3.133 | -5.164385 |
| Water Dimer | -5.02 | -5.07 | -4.989 | -7.966866 |
| Formic Acid Dimer | -18.61 | -18.81 | -18.753 | -22.767617 |
| Formamide Dimer | -15.96 | -16.11 | -16.062 | -18.777757 |
| Uracil Dimer HB | -20.65 | -20.69 | -20.641 | -22.268662 |


## Statistics and Visualization

Visual statistics and plots can be generated with the ``visualize`` command:

In [14]:
ds.visualize(method="B2PLYP", basis=["def2-svp", "def2-tzvp"], bench="S220", kind="violin")
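The `bench="S220"` argument names the contributed column used as the reference. As a rough, hand-computed illustration of the kind of error statistic such a comparison rests on, the sketch below takes the five `S220` and `B2PLYP/def2-svp` values displayed in the Querying Data section and computes the signed errors and their mean unsigned error. It uses plain Python so it runs without a server connection, it covers only those five records rather than the full dataset, and it is not a description of what `visualize` does internally.

```python
# Benchmark (S220) and computed (B2PLYP/def2-svp) values copied from the
# five records displayed in the Querying Data section.
benchmark = {
    "Ammonia Dimer": -3.17,
    "Water Dimer": -5.02,
    "Formic Acid Dimer": -18.61,
    "Formamide Dimer": -15.96,
    "Uracil Dimer HB": -20.65,
}
computed = {
    "Ammonia Dimer": -5.164385,
    "Water Dimer": -7.966866,
    "Formic Acid Dimer": -22.767617,
    "Formamide Dimer": -18.777757,
    "Uracil Dimer HB": -22.268662,
}

# Signed error of the computed value relative to the benchmark, per record.
errors = {name: computed[name] - benchmark[name] for name in benchmark}

# Mean unsigned error over the five records.
mue = sum(abs(err) for err in errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name:20s} {err:+.3f}")
print(f"mean unsigned error over {len(errors)} records: {mue:.3f}")
```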