# Getting Started

The [Molecular Sciences Software Institute](https://molssi.org) hosts the Quantum Chemistry Archive (QCArchive) and makes its data available to all users. To begin exploring the data in the Archive, build a new client and query it. By default, the client automatically connects to the MolSSI QCArchive.

In [5]:
import qcfractal.interface as ptl # TODO change to `import qcportal as ptl`
client = ptl.FractalClient()
print(client)

FractalClient(server_name='The MolSSI QCArchive Server', address='https://api.qcarchive.molssi.org:443/', username='None')


## Finding Collections

One of the main ways to explore the QCArchive is to examine `Collection`s, which are structures that allow easy manipulation of data in preset ways. Several examples of `Collection`s contained within the QCArchive are as follows:
 - `Dataset` - A dataset where each record is a single molecule and many methods can be applied to each record.
 - `ReactionDataset` - A dataset where each record is a combination of molecules (e.g. interaction and reaction energies) and many methods can be applied to each record.
 - `OpenFFWorkflow` - A workflow collection for torsiondrives and constrained optimizations developed with the [Open Force Field Initiative](https://openforcefield.org).

In [3]:
client.list_collections()

{'reactiondataset': ['S22', 'S22by7', 'S22by5']}

Specific `Collection` types can be queried to limit the number of collections to browse through:

In [6]:
client.list_collections("reactiondataset")

['S22', 'S22by5', 'S22by7']
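
The filtering step can be mimicked offline on a plain dictionary. The sketch below is only an illustration: `filter_listing` is a hypothetical helper, not part of the client API, and the case-insensitive matching of the type name is an assumption based on the lower-case key printed above.

```python
# Stand-in for the dictionary printed by client.list_collections() above.
listing = {"reactiondataset": ["S22", "S22by7", "S22by5"]}


def filter_listing(listing, collection_type=None):
    """Return the whole listing, or the sorted names for one collection type.

    Illustrative helper only; it is not a QCArchive client function.
    """
    if collection_type is None:
        return listing
    # Assumption: collection type names are matched case-insensitively.
    return sorted(listing.get(collection_type.lower(), []))


names = filter_listing(listing, "ReactionDataset")
print(names)  # ['S22', 'S22by5', 'S22by7']
```

An unknown collection type simply yields an empty list in this sketch; how the real server responds to one is not covered here.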

## Exploring Collections

Collections can be obtained by pulling their data from the central server. A collection is primarily metadata, so even extremely large collections can be pulled in a few seconds. To obtain a collection:

In [20]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)

ReactionDataset(name=`S22`, id='5c8159a4b6a2de3bd1e74306', client='https://api.qcarchive.molssi.org:443/') 


This dataset is the popular interaction energy dataset containing dimers such as the water dimer, benzene dimer, and adenine-thymine complex. Datasets are built on pandas DataFrames; to access the underlying DataFrame:

In [15]:
ds.df.head()

Ammonia Dimer
Water Dimer
Formic Acid Dimer
Formamide Dimer
Uracil Dimer HB


Here we used `.head()` to access the first five records in the `ReactionDataset`.
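
`.head()` behaves the same way on any pandas DataFrame. A minimal offline sketch follows, in which the first five index labels are copied from the output above and the last two are placeholders invented for illustration, not real dataset entries:

```python
import pandas as pd

# The final two labels are hypothetical placeholders, not real S22 entries.
index = [
    "Ammonia Dimer",
    "Water Dimer",
    "Formic Acid Dimer",
    "Formamide Dimer",
    "Uracil Dimer HB",
    "Placeholder Entry 6",
    "Placeholder Entry 7",
]
frame = pd.DataFrame(index=index)

first_five = frame.head()   # default is n=5
first_two = frame.head(2)   # explicit row count

print(list(first_five.index))
print(list(first_two.index))
```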

All `Collection`s that derive from `Dataset` (including `ReactionDataset`) have a history available that lists the data that has been computed. In this case we filter the history by the DFT method `B2PLYP` and the basis set `def2-SVP`:

In [13]:
ds.list_history(method="B2PLYP", basis="def2-SVP")

| driver | program | method | basis | keywords | stoichiometry |
|--------|---------|--------|----------|-------------|---------------|
| energy | psi4 | b2plyp | def2-svp | scf_default | cp |
| energy | psi4 | b2plyp | def2-svp | scf_default | default |
| energy | psi4 | b2plyp | def2-svp | scf_default | nocp |


Here we can see that there are five primary keys in the computation:
 - `driver` - The type of computation: energy, gradient, Hessian, or properties.
 - `program` - The program used in the computation.
 - `method` - The quantum chemistry method, semiempirical method, AI model, or force field used in the computation.
 - `basis` - The basis set used in the computation.
 - `keywords` - A keywords alias used in the computation; these aliases reference KeywordSets (see advanced tutorials).
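
Filtering a history on these keys can be sketched offline with pandas. The code below is an illustration, not the library's implementation: `filter_history` is a hypothetical helper, the case-insensitive matching is an assumption suggested by the lower-case values in the output above, and one extra row (method `b3lyp`) is invented so that the filter has something to exclude.

```python
import pandas as pd

# Rows copied from the history output above, plus one hypothetical row.
history = pd.DataFrame(
    [
        ("energy", "psi4", "b2plyp", "def2-svp", "scf_default", "cp"),
        ("energy", "psi4", "b2plyp", "def2-svp", "scf_default", "default"),
        ("energy", "psi4", "b2plyp", "def2-svp", "scf_default", "nocp"),
        ("energy", "psi4", "b3lyp", "def2-svp", "scf_default", "cp"),  # hypothetical
    ],
    columns=["driver", "program", "method", "basis", "keywords", "stoichiometry"],
)


def filter_history(history, **keys):
    """Keep rows whose key columns match the given values, ignoring case.

    Illustrative helper only; it is not the library's `list_history`.
    """
    mask = pd.Series(True, index=history.index)
    for column, value in keys.items():
        mask &= history[column].str.lower() == value.lower()
    return history[mask]


subset = filter_history(history, method="B2PLYP", basis="def2-SVP")
print(subset)
```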

In addition, there is the `stoichiometry` field, which is unique to `ReactionDataset`s. There are several ways to compute the interaction energy: counterpoise-corrected (cp), non-counterpoise-corrected (default), and Valiron–Mayer function counterpoise (vmfc). The `stoichiometry` field selects which of these forms is used.
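
As a reminder of what is being selected, an interaction energy is a stoichiometry-weighted sum of component energies: the complex minus its monomers. The sketch below uses a hypothetical `weighted_energy` helper and made-up energies in arbitrary units. It shows only the arithmetic; the counterpoise-corrected and uncorrected forms differ in which basis the monomer energies are evaluated in, not in this sum.

```python
def weighted_energy(energies, stoichiometry):
    """Sum coefficient * energy over every species in the stoichiometry."""
    return sum(coeff * energies[name] for name, coeff in stoichiometry.items())


# Coefficients for a dimer interaction energy: E(AB) - E(A) - E(B).
stoichiometry = {"dimer": 1.0, "monomer_a": -1.0, "monomer_b": -1.0}

# Made-up energies for illustration only; not results from the Archive.
energies = {"dimer": -2.5, "monomer_a": -1.0, "monomer_b": -1.25}

interaction = weighted_energy(energies, stoichiometry)
print(interaction)  # -0.25
```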

To obtain the data for the records, they must be queried from the server:

In [21]:
ds.query("b2plyp", "def2-svp")
ds.df.head()

| | b2plyp/def2-svp |
|-------------------|-----------------|
| Ammonia Dimer | |
| Water Dimer | |
| Formic Acid Dimer | |
| Formamide Dimer | |
| Uracil Dimer HB | |
