# Optimization Datasets

An `OptimizationDataset` represents geometry optimizations performed on a set of molecules. 

In [44]:
import qcportal as ptl
client = ptl.FractalClient()
client

In [45]:
client.list_collections("OptimizationDataset")

| collection | name | tagline |
|---|---|---|
| OptimizationDataset | FDA Optimization Dataset 1 | |
| OptimizationDataset | JGI Metabolite Set 1 | |
| OptimizationDataset | OpenFF Discrepancy Benchmark 1 | |
| OptimizationDataset | OpenFF Full Optimization Benchmark 1 | |
| OptimizationDataset | OpenFF NCI250K Boron 1 | |
| OptimizationDataset | OpenFF Optimization Set 1 | |
| OptimizationDataset | OpenFF Primary Optimization Benchmark 1 | |
| OptimizationDataset | OpenFF VEHICLe Set 1 | |
| OptimizationDataset | Pfizer Discrepancy Optimization Dataset 1 | |
| OptimizationDataset | SMIRNOFF Coverage Set 1 | |


In [46]:
ds = client.get_collection("OptimizationDataset", "SMIRNOFF Coverage Set 1")

## Exploring the Dataset

Each row of the dataset is an `Entry`, which corresponds to a molecule.

In [47]:
ds.df.head()

COC(O)OC-0
C[S-]-0
CS-0
CO-0
CCO-0
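
The entry names above appear to follow a `<SMILES>-<index>` pattern, with several entries able to share one SMILES string (compare `CSSCCN=C=S-7` through `CSSCCN=C=S-10` in the step counts later in this section). Assuming that convention holds, a name can be split into its two parts. The `split_entry_name` helper is a hypothetical name for illustration and is not part of QCPortal. It splits on the last hyphen because a SMILES string may itself contain one, as in `C[S-]-0`.

```python
def split_entry_name(name):
    """Split an entry name of the assumed form '<SMILES>-<index>'."""
    # rpartition splits on the LAST hyphen, so charges such as [S-] survive.
    smiles, _, index = name.rpartition("-")
    return smiles, int(index)


# Entry names copied from the output above.
for entry in ["COC(O)OC-0", "C[S-]-0", "CCO-0"]:
    smiles, index = split_entry_name(entry)
    print(f"{entry!r}: SMILES={smiles!r}, index={index}")
```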


New computations are based on specifications, which contain many additional parameters that tune the geometry optimization as well as the underlying computational method. Here, we list all specifications attached to this dataset.

In [48]:
ds.list_specifications()

| Name | Description |
|---|---|
| default | Standard OpenFF optimization quantum chemistry... |


In this case, there is a single specification, corresponding to one level of theory. Keep in mind that these collections are "live": new specifications can be added, and individual optimizations may still be running. The `status` method reports the current state of each specification:

In [49]:
ds.status(["default"])

| | default |
|---|---|
| COMPLETE | 1116 |
| ERROR | 16 |
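
As a quick sanity check on these counts, the fraction of finished optimizations can be worked out by hand. The sketch below hard-codes the two numbers from the output above rather than querying the server, so it will drift from the live collection over time. The `status_counts` dictionary and the `completion_fraction` helper are illustrative names, not part of QCPortal.

```python
# Counts copied from the status output above (not fetched live).
status_counts = {"COMPLETE": 1116, "ERROR": 16}


def completion_fraction(counts):
    """Return the share of records whose status is COMPLETE."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.get("COMPLETE", 0) / total


fraction = completion_fraction(status_counts)
total_records = sum(status_counts.values())
print(f"{fraction:.1%} of {total_records} optimizations are complete")
```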


The number of geometry steps for each molecule can be shown:

In [50]:
ds.counts()

| | default |
|---|---|
| COC(O)OC-0 | 11.0 |
| C[S-]-0 | 6.0 |
| CS-0 | 5.0 |
| CO-0 | 4.0 |
| CCO-0 | 8.0 |
| ... | ... |
| CSSCCN=C=S-7 | 26.0 |
| CSSCCN=C=S-8 | 45.0 |
| CSSCCN=C=S-9 | 48.0 |
| CSSCCN=C=S-10 | 60.0 |
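
To spot the optimizations that needed the most work, it helps to rank entries by step count. The sketch below operates on a plain dictionary holding only the rows visible in the output above; `step_counts` and `slowest_entries` are hypothetical names for illustration. Because the result of `ds.counts()` is displayed as a pandas table, calling pandas' `sort_values` on it should give the same ordering, though the return type is an assumption here.

```python
# Step counts copied from the rows shown above (the elided rows are omitted).
step_counts = {
    "COC(O)OC-0": 11,
    "C[S-]-0": 6,
    "CS-0": 5,
    "CO-0": 4,
    "CCO-0": 8,
    "CSSCCN=C=S-7": 26,
    "CSSCCN=C=S-8": 45,
    "CSSCCN=C=S-9": 48,
    "CSSCCN=C=S-10": 60,
}


def slowest_entries(counts, n=3):
    """Return the n (entry, steps) pairs with the most geometry steps."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


for entry, steps in slowest_entries(step_counts):
    print(f"{entry}: {steps} steps")
```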


The record for an individual molecule can be retrieved by entry name and specification:

In [51]:
optrec = ds.get_record(name="CCO-0", specification="default")

These records contain the geometries and energies of the optimization trajectory. Below are some examples of the data that can be pulled from an `OptimizationRecord`. The initial and final molecules can be extracted:

In [52]:
optrec.get_initial_molecule()

<Molecule(name='C2H6O' formula='C2H6O' hash='29df3ae')>

In [53]:
optrec.get_final_molecule()

<Molecule(name='C2H6O' formula='C2H6O' hash='422ad57')>

The energy trajectory of the optimization can also be plotted:

In [54]:
optrec.show_history()
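
The same trajectory can also be inspected numerically. The sketch below assumes the per-step energies are available as a plain list of floats; in QCPortal versions of this era the record is believed to expose them as `optrec.energies`, but treat that attribute as an assumption to check against your installed version. The `energy_changes` helper and the sample numbers are placeholders for illustration, not results from this dataset.

```python
def energy_changes(energies):
    """Return the energy change between each pair of consecutive steps."""
    return [later - earlier for earlier, later in zip(energies, energies[1:])]


# Placeholder energies, NOT taken from the dataset. With a live record one
# would pass the record's list of step energies (assumed: optrec.energies).
sample_energies = [-10.0, -10.5, -10.75]

changes = energy_changes(sample_energies)
total_change = sample_energies[-1] - sample_energies[0]
print("per-step changes:", changes)
print("total change:", total_change)
```

A well-behaved optimization would typically show changes that shrink in magnitude as the geometry approaches convergence.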