# The `Molecule` object

This basic tutorial introduces the `Molecule` class, which holds all the information about a molecule that we need to compute its total energy and other related properties.

## Initializing a Molecule object

To prepare a `Molecule`, we need to compute many properties of a system. We will use PySCF to do so, although we could in principle use other software packages. For example:

In [1]:
from pyscf import gto, dft

# Define the geometry of the molecule
geometry = [["H", (0, 0, 0)], ["F", (0, 0, 1.1)]]
mol = gto.M(atom=geometry, basis="def2-tzvp", charge=0, spin=0)

# And we will also need a mean-field object
mf = dft.UKS(mol, xc="b3lyp")
mf.max_cycle = 0  # We can choose whether to converge the SCF; 0 runs no SCF cycles
ground_truth_energy = mf.kernel()



SCF not converged.
SCF energy = -100.310777591563 after 0 cycles  <S^2> = 3.191346e-06  2S+1 = 1.0000064


If we want to use the `Molecule` to compute Hartree-Fock exact-exchange components, we need to decide which values of $\omega$ to use in the range-separated Coulomb kernel, $\text{erfc}(\omega r)/r$. Setting $\omega = 0$ means no range separation: the kernel reduces to $1/r$.

In [2]:
omegas = [0.0, 0.4]
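
To get a feel for what these values of $\omega$ do, the following minimal sketch evaluates the kernel $\text{erfc}(\omega r)/r$ at a few distances. The helper `range_separated_kernel` is a hypothetical name used only for illustration; it is not part of `grad_dft` or PySCF.

```python
import math


def range_separated_kernel(r: float, omega: float) -> float:
    """Evaluate erfc(omega * r) / r for a distance r > 0 (illustrative helper)."""
    return math.erfc(omega * r) / r


omegas = [0.0, 0.4]
for r in (0.5, 1.0, 2.0, 5.0):
    values = [range_separated_kernel(r, omega) for omega in omegas]
    row = "  ".join(f"omega={w}: {v:.6f}" for w, v in zip(omegas, values))
    print(f"r = {r:3.1f}  {row}")
```

Since $\text{erfc}(0) = 1$, the $\omega = 0$ column is exactly $1/r$, while for $\omega > 0$ the kernel decays faster with distance, keeping only the short-range part of the interaction.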

Then we can use the following convenience function to generate the `Molecule` object:

In [3]:
import grad_dft as gd

name = "HF"
HF_molecule = gd.molecule_from_pyscf(
    mf, grad_order=2, name=name, energy=ground_truth_energy, omegas=omegas
)



Alternatively, we can compute each property of the molecule separately and pass them all to the `Molecule` constructor:

In [4]:
HF_molecule = gd.Molecule(
    HF_molecule.grid,
    HF_molecule.atom_index,
    HF_molecule.nuclear_pos,
    HF_molecule.ao,
    HF_molecule.grad_ao,
    HF_molecule.grad_n_ao,
    HF_molecule.rdm1,
    HF_molecule.nuclear_repulsion,
    HF_molecule.h1e,
    HF_molecule.vj,
    HF_molecule.mo_coeff,
    HF_molecule.mo_occ,
    HF_molecule.mo_energy,
    HF_molecule.mf_energy,
    HF_molecule.s1e,
    HF_molecule.omegas,
    HF_molecule.chi,
    HF_molecule.rep_tensor,
    HF_molecule.energy,
    HF_molecule.basis,
    HF_molecule.name,
    HF_molecule.spin,
    HF_molecule.charge,
    HF_molecule.unit_Angstrom,
    HF_molecule.grid_level,
    HF_molecule.scf_iteration,
    HF_molecule.fock,
)

Most of these attributes are arrays; others are floats or integers. `grad_n_ao` is a dictionary of arrays, keyed by the order $n$, holding the $n^{\text{th}}$-order gradients
of the atomic orbitals, $\nabla^n \text{ao} = \sum_i \partial^n \text{ao} / \partial x_i^n$.

It is also worth mentioning that, to avoid type errors in JAX, we convert strings (the basis, the name of the molecule, etc.) into integer arrays like this:

In [5]:
import jax.numpy as jnp

name_ints = jnp.array([ord(char) for char in name])
name = "".join(chr(num) for num in name_ints)
print(name, name_ints)

HF [72 70]


## Computing gradients

Now that we have a `Molecule` instance, we can compute gradients with respect to some of its properties. For example, we can compute the gradient of the electronic density with respect to the atomic orbitals.

Let us compute $\nabla \rho$, the spatial gradient of the density. In `~/grad_dft/molecule.py` we have defined the following function:

In [6]:
def grad_density(rdm1, ao, grad_ao):
    return 2 * jnp.einsum("...ab,ra,rbj->r...j", rdm1, ao, grad_ao)


grad_density_0 = grad_density(HF_molecule.rdm1, HF_molecule.ao, HF_molecule.grad_ao)
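
As a sanity check of the einsum above, the following sketch compares the same contraction against central finite differences of the density. Everything in it is an assumption made for illustration: the "orbitals" are hypothetical s-type Gaussians with made-up exponents and centers, the density matrix is random but symmetric, and NumPy is used instead of JAX so the example runs without a `Molecule`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_orb, n_grid = 3, 5

# Hypothetical s-type Gaussians: phi_a(r) = exp(-alpha_a * |r - c_a|^2)
alphas = np.array([0.5, 1.0, 1.5])
centers = rng.normal(size=(n_orb, 3))


def ao_values(points):
    """Orbital values on the grid, shape (n_grid, n_orb)."""
    diff = points[:, None, :] - centers[None, :, :]
    return np.exp(-alphas * np.sum(diff**2, axis=-1))


def ao_gradients(points):
    """Analytic spatial gradients of the orbitals, shape (n_grid, n_orb, 3)."""
    diff = points[:, None, :] - centers[None, :, :]
    return -2.0 * alphas[None, :, None] * diff * ao_values(points)[:, :, None]


def density(rdm1, points):
    """rho_s(r) = sum_ab D^s_ab phi_a(r) phi_b(r), shape (n_grid, n_spin)."""
    ao = ao_values(points)
    return np.einsum("sab,ra,rb->rs", rdm1, ao, ao)


# Random symmetric density matrix for each of the two spin channels
raw = rng.normal(size=(2, n_orb, n_orb))
rdm1 = raw + raw.transpose(0, 2, 1)
points = rng.normal(size=(n_grid, 3))

# Same contraction as grad_density above, shape (n_grid, n_spin, 3)
analytic = 2 * np.einsum(
    "...ab,ra,rbj->r...j", rdm1, ao_values(points), ao_gradients(points)
)

# Central finite differences of the density along x, y and z
h = 1e-5
numeric = np.zeros_like(analytic)
for j in range(3):
    step = np.zeros(3)
    step[j] = h
    numeric[..., j] = (density(rdm1, points + step) - density(rdm1, points - step)) / (2 * h)

max_error = np.max(np.abs(analytic - numeric))
print("Largest deviation from finite differences:", max_error)
```

The factor of 2 relies on the density matrix being symmetric; for a non-symmetric matrix the two product-rule terms would have to be kept separately.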

Alternatively, we can compute $\nabla \rho$ efficiently by differentiating the density with respect to the atomic orbitals, `jax.vmap`'ing over the spin and grid axes, and then applying the chain rule:

In [7]:
from jax import vmap, grad

def parallelized_density(rdm1, ao):
    return jnp.einsum("ab,a,b->", rdm1, ao, ao)

grad_density_ao = vmap(
    vmap(grad(parallelized_density, argnums=1), in_axes=[None, 0]), in_axes=[0, None]
)(HF_molecule.rdm1, HF_molecule.ao)

grad_density_1 = jnp.einsum("...rb,rbj->r...j", grad_density_ao, HF_molecule.grad_ao)

We can then check that both approaches give the same result:

In [8]:
print(
    "Are the two forms of computing the gradient of the density the same?",
    jnp.allclose(grad_density_0, grad_density_1),
)

Are the two forms of computing the gradient of the density the same? True
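
The agreement can be understood from a short derivation, written here for a single spin channel and a single grid point, and assuming the density matrix $D$ (`rdm1`) is symmetric. With $\phi_a$ denoting the atomic orbital values (`ao`), the density is

$$\rho = \sum_{ab} D_{ab}\, \phi_a \phi_b .$$

Differentiating with respect to one orbital value gives

$$\frac{\partial \rho}{\partial \phi_c} = \sum_b D_{cb}\, \phi_b + \sum_a D_{ac}\, \phi_a = 2 \sum_b D_{cb}\, \phi_b ,$$

which is the quantity obtained by applying `grad` to `parallelized_density`. The chain rule then yields

$$\frac{\partial \rho}{\partial x_j} = \sum_c \frac{\partial \rho}{\partial \phi_c} \frac{\partial \phi_c}{\partial x_j} = 2 \sum_{ab} D_{ab}\, \phi_a\, \frac{\partial \phi_b}{\partial x_j} ,$$

which matches the explicit einsum in `grad_density`, including its factor of 2.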


We can now compute one of the finite-range dimensionless variables, $u = x^2 / (1 + x^2)$, where $x$ is the reduced density gradient:

In [9]:
grad_density_norm = jnp.linalg.norm(grad_density_0, axis=-1)
density = HF_molecule.density()
# We need to avoid dividing by zero
x = jnp.where(
    density > 1e-25,
    grad_density_norm / (2 * (3 * jnp.pi**2) ** (1 / 3) * density ** (4 / 3)),
    0.0,
)
u = x**2 / (1 + x**2)
print("We can check the range is bounded between", jnp.min(u), jnp.max(u))

We can check the range is bounded between 4.455914e-05 1.0
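
The bounded range follows directly from the form of the map. The sketch below reproduces the two formulas from the cell above with plain Python floats; the function names `reduced_gradient` and `finite_range` are hypothetical and chosen only for this illustration.

```python
import math


def reduced_gradient(grad_norm: float, density: float, threshold: float = 1e-25) -> float:
    """x = |grad rho| / (2 (3 pi^2)^(1/3) rho^(4/3)), set to 0 where the density vanishes."""
    if density <= threshold:
        return 0.0
    return grad_norm / (2 * (3 * math.pi**2) ** (1 / 3) * density ** (4 / 3))


def finite_range(x: float) -> float:
    """Map x in [0, inf) to u = x^2 / (1 + x^2), which lies in [0, 1)."""
    return x**2 / (1 + x**2)


samples = [0.0, 0.5, 1.0, 10.0, 1e3]
us = [finite_range(x) for x in samples]
for x, u in zip(samples, us):
    print(f"x = {x:8.1f}  ->  u = {u:.8f}")
```

Mathematically $u < 1$ for any finite $x$, but in finite precision very large values of $x$ round to exactly 1, which is presumably why the maximum printed above is `1.0`.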


## Saving and loading

Now let's talk about how to save and load a `Molecule` instance (or a list of `Molecule` instances).

In [10]:
import os
from grad_dft.interface import loader, saver as save

save(molecules=[HF_molecule], fname="./HF.hdf5")

Now let's load it back:

In [11]:
from tqdm import tqdm

load = loader(fname="./HF.hdf5", randomize=True, training=False, config_omegas=[])
for _, system in tqdm(load, "Molecules/reactions per file"):
    HF_molecule = system
    print(
        "Molecule name", "".join(chr(num) for num in HF_molecule.name)
    )  # We use training = False so molecule.name is a string

Molecules/reactions per file: 1it [00:00, 35.02it/s]

Molecule name b'HF'





We can also create reactions, save, and load them. For example, let us emulate the formation reaction of HF from H and F atoms:

In [12]:
products = [HF_molecule]

reaction_energy = ground_truth_energy

reactants = []
for atom in ["H", "F"]:
    # Define the geometry of the molecule
    mol = gto.M(atom=[[atom, (0, 0, 0)]], basis="def2-tzvp", charge=0, spin=1)

    # To perform DFT we also need a grid
    grids = dft.gen_grid.Grids(mol)
    grids.level = 2
    grids.build()

    # And we will also need a mean-field object
    mf = dft.UKS(mol)
    mf.grids = grids
    ground_truth_energy = mf.kernel()

    molecule = gd.molecule_from_pyscf(
        mf, grad_order=2, name=atom, energy=ground_truth_energy, omegas=omegas
    )

    reactants.append(molecule)
    reaction_energy -= ground_truth_energy

reaction = gd.make_reaction(reactants, products, [1, 1], [1], reaction_energy, name="HF_formation")

converged SCF energy = -0.478343887986114  <S^2> = 0.75  2S+1 = 2
converged SCF energy = -99.1074043968129  <S^2> = 0.75117427  2S+1 = 2.0011739
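
The bookkeeping behind `reaction_energy` can be sketched in isolation. The example below assumes the usual convention of products minus reactants, each weighted by its stoichiometric coefficient, and uses the energies (in Hartree) printed by PySCF earlier in this tutorial. The `reaction_energy` function and the `energies` dictionary are hypothetical helpers, not part of `grad_dft`.

```python
# Energies in Hartree, copied from the PySCF outputs shown in this tutorial.
# Note that the HF value comes from the SCF run that was stopped after 0 cycles.
energies = {
    "HF": -100.310777591563,
    "H": -0.478343887986114,
    "F": -99.1074043968129,
}


def reaction_energy(reactants, products):
    """Sum of coeff * E over the products minus the same sum over the reactants."""
    total_products = sum(coeff * energies[name] for name, coeff in products)
    total_reactants = sum(coeff * energies[name] for name, coeff in reactants)
    return total_products - total_reactants


delta_e = reaction_energy(reactants=[("H", 1), ("F", 1)], products=[("HF", 1)])
print(f"Reaction energy: {delta_e:.6f} Ha")
```

Because the HF energy was taken from an unconverged calculation, the resulting number should be read as an illustration of the bookkeeping rather than as a physically meaningful formation energy.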


We can save the reaction together with the molecule:

In [13]:
save(molecules=[HF_molecule], reactions=[reaction], fname="HF_formation.hdf5")

And load both back:

In [14]:
load = loader(fname="HF_formation.hdf5", randomize=True, training=False, config_omegas=[])
for _, system in tqdm(load, "Molecules/reactions per file"):
    print(
        type(system), "".join(chr(num) for num in system.name)
    )  # We use training = False so system.name is a string

Molecules/reactions per file: 2it [00:00, 30.41it/s]

<class 'grad_dft.molecule.Molecule'> b"b'HF'"
<class 'grad_dft.molecule.Reaction'> ['HF', 'formation', '0']



