# First Steps

<div>
<img src="https://qcarchive.molssi.org/images/molssi-logo.svg" alt="MolSSI" align="right" style="width: 120px;"/>

The [Molecular Sciences Software Institute](https://molssi.org) hosts the Quantum Chemistry Archive (QCArchive) and makes its data available to the entire Computational Molecular Sciences community free of charge. The QCArchive is both a database for viewing, analyzing, and exploring existing data and a live instance that continuously generates new data as directed by the community.
</div>

<div>
<img src="https://qcarchive.molssi.org/images/QCArchiveDrop2.svg" alt="QCArchive" align="left" style="width: 120px;"/>

The primary Python interface to this database is a ``FractalClient`` from the ``qcportal`` package, which can be installed via pip (`pip install qcportal`) or conda (`conda install qcportal -c conda-forge`). A new ``FractalClient`` automatically connects to the QCArchive and has access to all data contained within it.

</div>

In [6]:
import qcportal as ptl
client = ptl.FractalClient()
print(client)

FractalClient(server_name='The MolSSI QCArchive Server', address='https://api.qcarchive.molssi.org:443/', username='None')


## Finding Collections

One of the main ways to explore the QCArchive is to examine `Collection`s, which are structures that allow easy manipulation of data in preset ways. Several examples of `Collection`s contained within the QCArchive are as follows:

 - `Dataset` - A dataset where each record is a single molecule and many methods can be applied to each record.
 - `ReactionDataset` - A dataset where each record is a combination of molecules (e.g. interaction and reaction energies) and many methods can be applied to each record.
 - `OpenFFWorkflow` - A workflow collection for torsiondrives and constrained optimizations developed with the [Open Force Field Initiative](https://openforcefield.org).

In [2]:
client.list_collections()

{'ReactionDataset': ['S22',
  'S22by7',
  'S22by5',
  'A21',
  'A24',
  'DS14',
  'AlkBind12',
  'Bauza30',
  'FmH2O10',
  'HB15',
  'HSG',
  'NC15',
  'S66',
  'X40',
  'CYCONF',
  'AlkIsod14',
  'ACONF',
  'AlkIsomer11',
  'BH76RC',
  'BHPERI26',
  'BSR36',
  'Butanediol65',
  'C20C24',
  'DBH24',
  'DIE60',
  'G21EA',
  'H2O16Rel5',
  'H2O17Rel4',
  'HNBrBDE18',
  'HTBH38',
  'NHTBH38',
  'SSI'],
 'OpenFFWorkflow': ['chemper_rdkit'],
 'TorsionDriveDataset': ['OpenFF Fragmenter Phenyl Benchmark']}
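The value returned above is a plain dictionary that maps each collection type to a list of collection names, so ordinary Python is enough to summarize it. The sketch below works on a hand-copied subset of that output and does not contact the server; `listing` and `count_by_type` are illustrative names and are not part of `qcportal`.

```python
# Hand-copied subset of the client.list_collections() output shown above,
# so this sketch runs without a server connection.
listing = {
    "ReactionDataset": ["S22", "S22by7", "S22by5", "A21", "A24"],
    "OpenFFWorkflow": ["chemper_rdkit"],
    "TorsionDriveDataset": ["OpenFF Fragmenter Phenyl Benchmark"],
}


def count_by_type(listing):
    """Return a dict mapping each collection type to its number of collections.

    Hypothetical helper for illustration; not a qcportal function.
    """
    return {ctype: len(names) for ctype, names in listing.items()}


counts = count_by_type(listing)

# Print one line per collection type, in alphabetical order.
for ctype, n in sorted(counts.items()):
    print(f"{ctype}: {n}")
```

With a live client, the same helper could presumably be applied directly to the dictionary returned by `client.list_collections()`.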

Specific `Collection` types can be queried to limit the number of collections to browse through:

In [3]:
client.list_collections("reactiondataset")

['A21',
 'A24',
 'ACONF',
 'AlkBind12',
 'AlkIsod14',
 'AlkIsomer11',
 'Bauza30',
 'BH76RC',
 'BHPERI26',
 'BSR36',
 'Butanediol65',
 'C20C24',
 'CYCONF',
 'DBH24',
 'DIE60',
 'DS14',
 'FmH2O10',
 'G21EA',
 'H2O16Rel5',
 'H2O17Rel4',
 'HB15',
 'HNBrBDE18',
 'HSG',
 'HTBH38',
 'NC15',
 'NHTBH38',
 'S22',
 'S22by5',
 'S22by7',
 'S66',
 'SSI',
 'X40']
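The lowercase `"reactiondataset"` in the call above suggests that the type name is matched without regard to case, and the returned names appear in case-insensitive alphabetical order. A minimal sketch of that kind of lookup on a plain dictionary follows. It assumes a hand-copied subset of the earlier `client.list_collections()` output, `names_for_type` is a hypothetical helper, and nothing here is meant to describe how `qcportal` implements the query internally.

```python
# Hand-copied subset of the unfiltered listing shown earlier.
listing = {
    "ReactionDataset": ["S22", "S22by7", "S22by5", "A21", "A24"],
    "OpenFFWorkflow": ["chemper_rdkit"],
}


def names_for_type(listing, collection_type):
    """Return the names for one collection type, ignoring case in the type name.

    Hypothetical helper for illustration; not a qcportal function.
    """
    wanted = collection_type.lower()
    for ctype, names in listing.items():
        if ctype.lower() == wanted:
            # Sort case-insensitively, mirroring the ordering seen in the output above.
            return sorted(names, key=str.lower)
    raise KeyError(f"No collection type matching {collection_type!r}")


reaction_names = names_for_type(listing, "reactiondataset")
print(reaction_names)
```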

## Exploring Collections

Collections can be obtained by pulling their data from the central server. A collection is primarily metadata, so even extremely large collections can be pulled in a few seconds. For this example we will explore the S22 dataset, a small interaction energy dataset of 22 common dimers such as the water dimer and the methane dimer. To obtain this collection:

In [4]:
ds = client.get_collection("ReactionDataset", "S22")
print(ds)

ReactionDataset(name=`S22`, id='5c8159a4b6a2de3bd1e74306', client='https://api.qcarchive.molssi.org:443/') 


## Statistics and Visualization

Visual statistics and plots can be generated with the ``visualize`` command:

In [5]:
ds.visualize(method="B2PLYP", basis=["def2-svp", "def2-tzvp"], bench="S220", kind="violin")