# Working with metadata

nablaDFT includes three independent datasets. You can mix data from several of them and join records using the unique identifiers of the molecule and the conformation.

Each record has two IDs:
- `moses_id` is the index of the molecule in the MOSES database.
- `conformation_id` is the index of the conformation for that specific molecule.

The order of records is not consistent across the three nablaDFT databases, so row positions cannot be used to match records. Instead, the pair (`moses_id`, `conformation_id`) uniquely identifies a molecule and its conformation across all databases.
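As a minimal sketch of such a join, the example below matches records from two sources on the (`moses_id`, `conformation_id`) pair with pandas. The two frames and the `row_in_a` / `row_in_b` columns are hypothetical stand-ins for data exported from two nablaDFT databases; only the key column names come from the text above.

```python
import pandas as pd

# Hypothetical exports from two databases. The same three records
# appear in a different order in each frame.
records_a = pd.DataFrame(
    {
        "moses_id": [1, 1, 164],
        "conformation_id": [0, 1, 0],
        "row_in_a": [1, 2, 3],  # illustrative row positions only
    }
)
records_b = pd.DataFrame(
    {
        "moses_id": [164, 1, 1],
        "conformation_id": [0, 1, 0],
        "row_in_b": [1, 2, 3],  # illustrative row positions only
    }
)

# Join on the identifier pair rather than on row position.
# validate="one_to_one" raises if a key pair is duplicated on either side.
fused = records_a.merge(
    records_b,
    on=["moses_id", "conformation_id"],
    how="inner",
    validate="one_to_one",
)
print(fused)
```

Joining on both columns matters: `conformation_id` alone restarts from 0 for every molecule, so it is only meaningful together with `moses_id`.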

In [40]:
import ase
from ase.db import connect

In [41]:
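with connect("train_2k_v2_formation_energy_w_forces.db") as train_db:
    atom_row = train_db.get(1)  # ASE database row ids start at 1
    print(atom_row.key_value_pairs)  # identifiers and SMILES of this record
    print(train_db.metadata)  # units used for the stored properties
    # row = atom_row.toatoms()  # uncomment to convert the row to ase.Atoms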

{'moses_id': 164, 'conformation_id': 0, 'smiles': 'CCOC(=O)C1CN(Cc2ccccc2)C(=O)C1=O'}
{'_distance_unit': 'Ang', '_property_unit_dict': {'energy': 'Hartree', 'forces': 'Hartree/Ang'}}


In addition, `summary.csv` gives you access to the SMILES string, the PSI4 energy components, the HOMO-LUMO gap, the dipole moments, and other properties of each molecule and conformation. This file is much smaller than the full dataset of raw PSI4 wavefunctions.

In [42]:
import pandas as pd

In [43]:
summary = pd.read_csv('summary.csv', index_col=0, nrows=100)

In [44]:
summary.columns

Index(['MOSES id', 'CONFORMER id', 'archive name', 'SMILES',
       'DFT TOTAL ENERGY', 'DFT XC ENERGY', 'DFT NUCLEAR REPULSION ENERGY',
       'DFT ONE-ELECTRON ENERGY', 'DFT TWO-ELECTRON ENERGY', 'DFT DIPOLE X',
       'DFT DIPOLE Y', 'DFT DIPOLE Z', 'DFT TOTAL DIPOLE',
       'DFT ROT CONSTANT A', 'DFT ROT CONSTANT B', 'DFT ROT CONSTANT C',
       'DFT HOMO', 'DFT LUMO', 'DFT HOMO-LUMO GAP', 'DFT ATOMIC ENERGY',
       'DFT FORMATION ENERGY'],
      dtype='object')

In [45]:
summary[['MOSES id', 'CONFORMER id', 'SMILES']]

| id | MOSES id | CONFORMER id | SMILES |
|---|---|---|---|
| 1216192 | 1 | 0 | CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1 |
| 1216193 | 1 | 1 | CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1 |
| 1216194 | 1 | 2 | CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1 |
| 1216195 | 1 | 3 | CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1 |
| 1216196 | 1 | 4 | CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1 |
| ... | ... | ... | ... |
| 1216287 | 26 | 4 | CC(c1ncncc1F)C(O)(Cn1cncn1)c1ccc(F)cc1F |
| 1216288 | 26 | 5 | CC(c1ncncc1F)C(O)(Cn1cncn1)c1ccc(F)cc1F |
| 1216289 | 26 | 6 | CC(c1ncncc1F)C(O)(Cn1cncn1)c1ccc(F)cc1F |
| 1216290 | 26 | 7 | CC(c1ncncc1F)C(O)(Cn1cncn1)c1ccc(F)cc1F |
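
The summary table names its identifier columns `MOSES id` and `CONFORMER id`, while the database records use `moses_id` and `conformation_id`. The sketch below shows one possible way to align the two naming schemes and look up a conformation by its identifier pair. It assumes that the summary columns hold the same identifiers as the database keys, and it uses a small frame built from a few rows of the output above instead of reading `summary.csv`; the variable names are illustrative.

```python
import pandas as pd

# A few rows copied from the output above, standing in for summary.csv.
summary_sample = pd.DataFrame(
    {
        "MOSES id": [1, 1, 1, 26],
        "CONFORMER id": [0, 1, 2, 4],
        "SMILES": [
            "CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1",
            "CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1",
            "CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n1ccnc1",
            "CC(c1ncncc1F)C(O)(Cn1cncn1)c1ccc(F)cc1F",
        ],
    },
    index=pd.Index([1216192, 1216193, 1216194, 1216287], name="id"),
)

# Rename the summary columns to the key names used in the database records.
renamed = summary_sample.rename(
    columns={"MOSES id": "moses_id", "CONFORMER id": "conformation_id"}
)

# Index by the identifier pair so one conformation can be looked up directly.
keyed = renamed.set_index(["moses_id", "conformation_id"])
smiles_26_4 = keyed.loc[(26, 4), "SMILES"]
print(smiles_26_4)

# Count how many conformations of each molecule are present in the sample.
n_conformers = renamed.groupby("moses_id")["conformation_id"].count()
print(n_conformers)
```

After the rename, the same pair of columns can serve as the join key against records read from any of the databases.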
