# Access to data
nablaDFT includes three databases:
1. **Energy database.** Contains molecular structures, energies, and forces. The data are accessed through the Atomic Simulation Environment (ASE) interface.
2. **Hamiltonian database.** Contains molecular structures, energies, forces, Hamiltonians, and overlap matrices. The data are accessed through a custom nablaDFT interface.
3. **Raw psi4 wave function.** Contains serialized Psi4 wavefunctions. The data are accessible through Psi4 or NumPy.

Each database has its own units, record order, and ordering of atomic orbitals in the Hamiltonians. In this tutorial, we show how to load and visualize some of the data. Advanced processing of metadata and Hamiltonians is described in the following lessons.
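Because the atomic-orbital ordering can differ between databases, converting a Hamiltonian or overlap matrix from one convention to another amounts to permuting its rows and columns together. The sketch below shows that operation with NumPy. The permutation and the helper names are hypothetical illustrations, not the actual nablaDFT or Psi4 ordering, which is covered in the later lessons.

```python
import numpy as np

def reorder_ao_matrix(matrix: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Permute rows and columns of an AO matrix together.

    perm[i] is the index (in the old ordering) of the orbital that ends up
    at position i in the new ordering.
    """
    perm = np.asarray(perm)
    return matrix[np.ix_(perm, perm)]

def invert_permutation(perm: np.ndarray) -> np.ndarray:
    """Return the permutation that undoes `perm`."""
    perm = np.asarray(perm)
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(len(perm))
    return inverse

# Hypothetical example: a p shell stored as (p0, p+1, p-1), reordered to (p-1, p0, p+1).
perm = np.array([2, 0, 1])
H_toy = np.array([[1.0, 0.1, 0.2],
                  [0.1, 2.0, 0.3],
                  [0.2, 0.3, 3.0]])
H_reordered = reorder_ao_matrix(H_toy, perm)
H_restored = reorder_ao_matrix(H_reordered, invert_permutation(perm))
print(np.diag(H_reordered))  # [3. 1. 2.]
```

Applying the same permutation to rows and columns keeps the matrix symmetric and leaves its eigenvalues unchanged, so orbital energies do not depend on the chosen ordering.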

## Energy database

The Atomic Simulation Environment (ASE) package is required to process the energy database. It also helps to visualize molecules.

In [9]:
import ase
from ase.db import connect
from ase.units import Bohr
from ase.visualize import view

Molecule indexing in the ASE database starts from 1.

In [10]:
with connect("train_2k_v2_formation_energy_w_forces.db") as train_db:
    atom_row = train_db.get(1)
    row = atom_row.toatoms()

Atomic numbers and positions are available through standard ASE attributes. Positions are in ångströms, the ASE default.

In [11]:
row.numbers, row.positions

(array([6, 6, 8, 6, 8, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 6, 8, 6, 8, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 array([[ 5.649981,  0.292707, -0.359445],
        [ 4.488466,  1.21383 , -0.624704],
        [ 3.379362,  0.938695,  0.203172],
        [ 2.766558, -0.290821,  0.174661],
        [ 3.231289, -1.149445, -0.61145 ],
        [ 1.597579, -0.6515  ,  1.015378],
        [ 0.412593,  0.249381,  0.74796 ],
        [-0.750674, -0.561331,  1.088722],
        [-2.024895, -0.121393,  1.564805],
        [-2.992659,  0.158262,  0.484208],
        [-4.251773,  0.59135 ,  0.861804],
        [-5.209216,  0.871821, -0.108371],
        [-4.914042,  0.721817, -1.44446 ],
        [-3.657549,  0.289714, -1.819159],
        [-2.695867,  0.007762, -0.850864],
        [-0.394728, -1.919742,  0.860161],
        [-1.157842, -2.929562,  0.864303],
        [ 1.060777, -1.975332,  0.606771],
        [ 1.654344, -2.964555,  0.142461],
        [ 5.445909, -0.429589,  0.454969],
        [ 6.584967, 

You can check the data by visualizing them. If the units are correct, each atom is drawn as a sphere, and neighboring spheres touch each other without overlapping or floating apart.
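The visual check can be complemented by a numeric one. In ångströms, the distance from each atom to its nearest neighbor in an organic molecule is typically about 1 to 1.5 Å, while the same geometry expressed in bohr gives values almost twice as large. The heuristic below is a sketch under our own assumptions (the 1.7 threshold and the helper names are not part of nablaDFT or ASE) and uses the first three atoms printed above.

```python
import numpy as np

BOHR_IN_ANGSTROM = 0.52917721067  # CODATA 2014 value of the bohr radius in Å

def nearest_neighbor_distances(positions: np.ndarray) -> np.ndarray:
    """Distance from each atom to its closest neighbor, in the units of `positions`."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore each atom's distance to itself
    return dist.min(axis=1)

def looks_like_angstrom(positions, threshold: float = 1.7) -> bool:
    """Heuristic (our assumption): a median nearest-neighbor distance below ~1.7 suggests Å."""
    nn = nearest_neighbor_distances(np.asarray(positions, dtype=float))
    return float(np.median(nn)) < threshold

# First three atoms of the energy-database molecule printed above.
positions = np.array([[5.649981, 0.292707, -0.359445],
                      [4.488466, 1.21383, -0.624704],
                      [3.379362, 0.938695, 0.203172]])
print(np.round(nearest_neighbor_distances(positions), 3))  # [1.506 1.411 1.411]
print(looks_like_angstrom(positions))                       # True
print(looks_like_angstrom(positions / BOHR_IN_ANGSTROM))    # False: same geometry in bohr
```

Using the median rather than requiring every distance to fall in a range keeps the check tolerant of a few longer bonds, for example to chlorine.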

In [12]:
view(row, viewer='x3d')

## Hamiltonian database

We provide a custom interface for accessing the Hamiltonians.

In [13]:
#[TODO] from nablaDFT.dataset import HamiltonianDatabase
from hamiltonian_dataset import HamiltonianDatabase

In [14]:
train = HamiltonianDatabase("dataset_train_2k.db")
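# Unlike the ASE database, indexing here starts from 0.
# Returns atomic numbers, positions (bohr), energy (Eh), forces (Eh/bohr),
# full Hamiltonian, overlap matrix, and core Hamiltonian, as listed in the class docstring.
Z, R, E, F, H, S, C = train[0]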
Z, R

(array([6, 8, 6, 6, 6, 8, 6, 6, 7, 6, 8, 6, 6, 6, 6, 6, 6, 9, 6, 6, 9, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32),
 array([[-9.9580383e+00,  4.7681794e+00,  3.8711627e+00],
        [-7.7771869e+00,  3.9617693e+00,  3.1881814e+00],
        [-5.7580123e+00,  2.9139218e+00,  2.2679133e+00],
        [-5.8301983e+00,  1.0391434e+00,  4.7204602e-01],
        [-3.6052120e+00, -5.2921779e-02, -4.8023799e-01],
        [-3.5979176e+00, -1.8067011e+00, -2.1569712e+00],
        [-3.6760540e+00, -3.5478814e+00, -3.8251157e+00],
        [-1.2556607e+00,  8.4752893e-01,  4.7767553e-01],
        [ 9.3554670e-01, -3.1577134e-01, -5.4378569e-01],
        [ 3.4638245e+00,  1.5381047e-01, -6.9315154e-03],
        [ 3.9389677e+00,  1.8539952e+00,  1.6124599e+00],
        [ 5.6709924e+00, -1.0703012e+00, -1.0771514e+00],
        [ 8.0825138e+00, -1.9120060e-01, -4.3459165e-01],
        [ 1.0261353e+01, -1.2486668e+00, -1.3914318e+00],
        [ 1.0117476e+01, -3.2479763e+00, -3.0457680e+00

The data formats are described in the class docstring. Note that atomic positions are in bohr, not ångströms.
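According to the class docstring, the Hamiltonian database uses atomic units (bohr, hartree, hartree/bohr), while ASE works in ångströms and electronvolts. A minimal conversion sketch, assuming CODATA 2014 constants (`ase.units.Bohr` and `ase.units.Hartree` give equivalent values up to the CODATA revision), is:

```python
import numpy as np

# CODATA 2014 constants; ase.units.Bohr and ase.units.Hartree give equivalent values.
BOHR_IN_ANGSTROM = 0.52917721067
HARTREE_IN_EV = 27.21138602

def hamiltonian_db_to_ase_units(R, E, F):
    """Convert positions (bohr), energy (Eh) and forces (Eh/bohr) to Å, eV and eV/Å."""
    R_ang = np.asarray(R, dtype=float) * BOHR_IN_ANGSTROM
    E_ev = float(E) * HARTREE_IN_EV
    F_ev_per_ang = np.asarray(F, dtype=float) * (HARTREE_IN_EV / BOHR_IN_ANGSTROM)
    return R_ang, E_ev, F_ev_per_ang

# Toy values, not taken from the database.
R_toy = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])      # bohr
E_toy = -1.0                                              # Eh
F_toy = np.array([[0.0, 0.0, 0.01], [0.0, 0.0, -0.01]])   # Eh/bohr
R_ang, E_ev, F_ev_per_ang = hamiltonian_db_to_ase_units(R_toy, E_toy, F_toy)
print(R_ang[1, 2], E_ev, F_ev_per_ang[0, 2])  # about 1.0584, -27.2114, 0.5142
```

Positions only need the first factor, which is what `R*Bohr` does when building an ASE object; forces need the combined hartree/bohr factor.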

In [15]:
help(HamiltonianDatabase)

Help on class HamiltonianDatabase in module hamiltonian_dataset:

class HamiltonianDatabase(builtins.object)
 |  HamiltonianDatabase(filename, flags=1)
 |  
 |  This is a class to store large amounts of ab initio reference data
 |  for training a neural network in a SQLite database
 |  
 |  Data structure:
 |  Z (N)    (int)        nuclear charges
 |  R (N, 3) (float)      Cartesian coordinates in bohr
 |  E ()     (float)      energy in Eh
 |  F (N, 3) (float)      forces in Eh/bohr
 |  H (Norb, Norb)        full hamiltonian in atomic units
 |  S (Norb, Norb)        overlap matrix in atomic units
 |  C (Norb, Norb)        core hamiltonian in atomic units
 |  moses_id () (int)     molecule id in MOSES dataset
 |  conformer_id () (int) conformation id
 |  
 |  Methods defined here:
 |  
 |  __getitem__(self, idx)
 |  
 |  __init__(self, filename, flags=1)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __len__(self)
 |  
 |  add_Z(self, Z, flags=2)
 |  
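Because `H` and `S` are expressed in the same atomic-orbital basis, orbital energies and coefficients follow from the generalized eigenvalue problem H C = S C ε. The NumPy sketch below solves it with symmetric (Löwdin) orthogonalization on small toy matrices; interpreting the eigenvalues as the reference orbital energies assumes that `H` is the converged Kohn-Sham matrix, and `solve_generalized_eigenproblem` is a hypothetical helper, not part of nablaDFT.

```python
import numpy as np

def solve_generalized_eigenproblem(H: np.ndarray, S: np.ndarray):
    """Solve H C = S C diag(eps) for symmetric H and positive-definite S.

    Uses symmetric (Löwdin) orthogonalization with X = S^(-1/2).
    Returns orbital energies in ascending order and S-orthonormal coefficients.
    """
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T  # S^(-1/2), symmetric
    eps, C_prime = np.linalg.eigh(X @ H @ X)         # ordinary symmetric eigenproblem
    C = X @ C_prime                                   # back-transform to the AO basis
    return eps, C

# Toy stand-ins for the H and S matrices returned by HamiltonianDatabase.
rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))
H_toy = (A + A.T) / 2                                 # symmetric "Hamiltonian"
M = np.eye(n) + 0.1 * rng.normal(size=(n, n))
S_toy = M @ M.T                                       # symmetric positive-definite "overlap"

eps, C_toy = solve_generalized_eigenproblem(H_toy, S_toy)
print(eps)  # orbital energies, ascending
```

With real data the same function would be called on `H` and `S` from `train[0]`; if SciPy is installed, `scipy.linalg.eigh(H, S)` solves the same problem directly.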
 

You can build an ASE `Atoms` object from these data and visualize the molecule. ASE expects positions in ångströms, so they are multiplied by `ase.units.Bohr` first.

In [16]:
atom = ase.Atoms(Z, R*Bohr)
view(atom, viewer='x3d')

## Raw psi4 wave function

You can load the Psi4 wavefunctions either into Psi4 or into NumPy. The NumPy route is simpler because it does not require installing Psi4.

In [20]:
import numpy as np

In [21]:
# [TODO]
data = np.load('wfn_conf_1_0.npy', allow_pickle=True).tolist()
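The `allow_pickle=True` and `.tolist()` calls are needed because the file stores a Python dictionary wrapped in a zero-dimensional NumPy object array. The self-contained sketch below reproduces that round trip with a hypothetical miniature dictionary (`toy_wfn.npy` and its contents are made up) and also shows how the flat `geom` array is reshaped into `(N_atoms, 3)`.

```python
import os
import tempfile
import numpy as np

# Hypothetical miniature of the wavefunction-file layout: a nested dict of arrays.
toy_wfn = {
    "molecule": {"elez": np.array([8, 1, 1]),
                 "geom": np.array([0.0, 0.0, 0.0, 0.0, 0.757, 0.587, 0.0, -0.757, 0.587])},
    "matrix": {"Da": np.eye(2)},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_wfn.npy")
    np.save(path, toy_wfn, allow_pickle=True)  # the dict is wrapped in a 0-d object array
    raw = np.load(path, allow_pickle=True)     # loading pickled objects must be allowed explicitly
    print(raw.shape, raw.dtype)                # () object
    toy_data = raw.tolist()                    # unwraps the dict (same as raw.item())

toy_R = toy_data["molecule"]["geom"].reshape((-1, 3))  # flat geometry -> (N_atoms, 3)
print(toy_R.shape)                                       # (3, 3)
```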

In [22]:
Z = data['molecule']['elez']
R = data['molecule']['geom'].reshape((-1,3))
Z, R

(array([ 6,  6,  6,  6,  6,  8,  6,  8,  6,  6,  6,  6, 17,  6,  6,  7,  6,
         6,  7,  6,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1]),
 array([[-2.138707e+00, -1.797816e+00,  6.077890e-01],
        [-2.440455e+00, -4.234010e-01,  3.012600e-02],
        [-3.006454e+00,  4.416220e-01,  1.097567e+00],
        [-3.521434e+00, -6.235700e-01, -1.042317e+00],
        [-1.234783e+00,  7.731300e-02, -6.529530e-01],
        [-1.266777e+00,  3.910150e-01, -1.845080e+00],
        [ 5.171700e-02,  2.194200e-01,  5.818700e-02],
        [ 1.034551e+00,  6.177090e-01, -8.778360e-01],
        [ 2.222095e+00, -7.806000e-03, -1.139941e+00],
        [ 2.658358e+00, -1.154381e+00, -5.163260e-01],
        [ 3.867739e+00, -1.765375e+00, -8.036820e-01],
        [ 4.694243e+00, -1.215618e+00, -1.759644e+00],
        [ 6.247248e+00, -1.953424e+00, -2.165361e+00],
        [ 4.262167e+00, -6.752400e-02, -2.387107e+00],
        [ 3.060803e+00,  5.423980e-01, -2.104000e+00],
 

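You can also visualize the molecule from these data. In this file the geometry is already stored in ångströms (see `data['molecule']['units']`), so no bohr conversion is applied.

In [23]:
atom = ase.Atoms(Z, R)
view(atom, viewer='x3d')

You can work directly with the density matrices, Hamiltonians, and other Psi4 arrays through the NumPy interface.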

In [24]:
wfn = np.load('wfn_conf_1_0.npy', allow_pickle=True).tolist()
orbital_matrix_a = wfn["matrix"]["Ca"]        # alpha orbital coefficients
orbital_matrix_b = wfn["matrix"]["Cb"]        # beta orbital coefficients
density_matrix_a = wfn["matrix"]["Da"]        # alpha electron density matrix
density_matrix_b = wfn["matrix"]["Db"]        # beta electron density matrix
aotoso_matrix = wfn["matrix"]["aotoso"]       # atomic-orbital to symmetry-orbital transformation matrix
core_hamiltonian_matrix = wfn["matrix"]["H"]  # core Hamiltonian matrix
fock_matrix_a = wfn["matrix"]["Fa"]           # DFT alpha Fock matrix
fock_matrix_b = wfn["matrix"]["Fb"]           # DFT beta Fock matrix
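These matrices are related to each other. For an SCF wavefunction computed without symmetry, the alpha density should equal `Ca[:, :n_alpha] @ Ca[:, :n_alpha].T`, and the electron count is tr(Da S) + tr(Db S), where S is the AO overlap matrix (not loaded above). The sketch below checks both relations on toy matrices with S-orthonormal coefficients rather than on a real file; the helper names are hypothetical.

```python
import numpy as np

def density_from_coefficients(C: np.ndarray, n_occ: int) -> np.ndarray:
    """Spin density D = C_occ C_occ^T built from the first n_occ (occupied) columns of C."""
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T

def electron_count(D_a: np.ndarray, D_b: np.ndarray, S: np.ndarray) -> float:
    """Total number of electrons, N = tr(D_a S) + tr(D_b S)."""
    return float(np.trace(D_a @ S) + np.trace(D_b @ S))

# Toy data: 5 basis functions, 2 alpha and 2 beta electrons.
rng = np.random.default_rng(1)
n_basis, n_alpha, n_beta = 5, 2, 2
M = np.eye(n_basis) + 0.1 * rng.normal(size=(n_basis, n_basis))
S_toy = M @ M.T                                           # symmetric positive-definite overlap
L = np.linalg.cholesky(S_toy)                             # S = L L^T
Q, _ = np.linalg.qr(rng.normal(size=(n_basis, n_basis)))  # random orthogonal matrix
C_toy = np.linalg.inv(L).T @ Q                            # S-orthonormal: C^T S C = I

D_a = density_from_coefficients(C_toy, n_alpha)
D_b = density_from_coefficients(C_toy, n_beta)
print(round(electron_count(D_a, D_b, S_toy), 6))          # 4.0
```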

More advanced processing of wavefunctions is also possible with Psi4 itself.
Note that Psi4 must be compiled from source or installed with conda. Information on obtaining Psi4 is available at https://psicode.org/psi4manual/master/build_obtaining.html

In [27]:
import psi4


In [28]:
wfn = psi4.core.Wavefunction.from_file('wfn_conf_1_0.npy')
psi4.oeprop(wfn, "MAYER_INDICES")
psi4.oeprop(wfn, "WIBERG_LOWDIN_INDICES")
psi4.oeprop(wfn, "MULLIKEN_CHARGES")
psi4.oeprop(wfn, "LOWDIN_CHARGES")
mayer_bos = wfn.array_variables()["MAYER INDICES"]                  # Mayer bond indices
wiberg_lowdin_bos = wfn.array_variables()["WIBERG LOWDIN INDICES"]  # Wiberg bond indices (Löwdin basis)
mulliken_charges = wfn.array_variables()["MULLIKEN CHARGES"]        # Mulliken atomic charges
lowdin_charges = wfn.array_variables()["LOWDIN CHARGES"]            # Löwdin atomic charges
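Without Psi4, Mulliken and Löwdin charges can still be computed from the NumPy arrays using their textbook definitions, q_A = Z_A − Σ_{μ∈A} (PS)_μμ and q_A = Z_A − Σ_{μ∈A} (S^{1/2} P S^{1/2})_μμ, with P = Da + Db. The sketch below is not Psi4's implementation and has not been compared against its output; the AO-to-atom map `ao_to_atom` is a hypothetical input that in practice would come from the basis set, and the example is a toy minimal-basis H2 model.

```python
import numpy as np

def _sum_per_atom(ao_values, ao_to_atom, n_atoms):
    """Sum per-AO populations onto atoms using an AO -> atom index map."""
    return np.bincount(ao_to_atom, weights=ao_values, minlength=n_atoms)

def mulliken_charges_np(P, S, Z, ao_to_atom):
    """q_A = Z_A - sum over AOs mu on atom A of (P S)_{mu mu}."""
    populations = np.diag(P @ S)
    return np.asarray(Z, dtype=float) - _sum_per_atom(populations, ao_to_atom, len(Z))

def lowdin_charges_np(P, S, Z, ao_to_atom):
    """q_A = Z_A - sum over AOs mu on atom A of (S^1/2 P S^1/2)_{mu mu}."""
    s_vals, s_vecs = np.linalg.eigh(S)
    S_half = s_vecs @ np.diag(np.sqrt(s_vals)) @ s_vecs.T
    populations = np.diag(S_half @ P @ S_half)
    return np.asarray(Z, dtype=float) - _sum_per_atom(populations, ao_to_atom, len(Z))

# Toy H2 in a minimal basis: one 1s function per atom (hypothetical overlap value).
s = 0.66
S_toy = np.array([[1.0, s], [s, 1.0]])
c = 1.0 / np.sqrt(2.0 * (1.0 + s))     # normalized bonding orbital (c, c)
Da_toy = np.outer([c, c], [c, c])      # one alpha electron
Db_toy = Da_toy.copy()                 # one beta electron
P_toy = Da_toy + Db_toy                # total density
Z_toy = [1, 1]
ao_to_atom = np.array([0, 1])          # hypothetical AO -> atom map
print(mulliken_charges_np(P_toy, S_toy, Z_toy, ao_to_atom))  # both ~0 by symmetry
print(lowdin_charges_np(P_toy, S_toy, Z_toy, ao_to_atom))    # both ~0 by symmetry
```

Both schemes distribute the same total number of electrons, tr(PS), so their charges always sum to the same value; they differ only in how overlap populations are split between atoms.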
