Implementation of Tensor Hypercontraction.

Ref:  https://doi.org/10.1063/1.4768233

Motivation:
to factorize the two-body integral $G_{pqrs}$ in a electronic Hamiltonian into:
$$
G_{pqrs} = \sum ^ {M} _ {\mu,  \nu = 1} \chi^{P}_p  \chi^{P}_q Z^{PQ}  \chi^{Q}_r \chi^{Q}_s
$$
This is the so-called tensor hypercontraction (THC).

In this notebook, all superscripts denote indices of grid points, indices are represented by capital letters P and Q.

In [39]:
import numpy as np
from pyscf import gto, dft, lib, scf
from pyscf.dft import numint
from pyscf.dft import r_numint
import copy

In [29]:
#create the mole:
#atom = '''H 0 0 0; H 0 0 0.735; O 0 0 1'''
atom = '''H 0 0 0; H 0 0 0.735'''
#atom = '''H 0 0 0; F 0 0 0.735'''

#basis set:
basis = 'sto-3g'
#basis ="6-31G"
unit = 'angstrom'
charge = 0
spin = 0
verbose = 0

mol = gto.Mole()
mol.build(atom    = atom,
          basis   = basis,
          charge  = charge,
          spin    = spin,
          unit    = unit,
          verbose = verbose)

nmo, nao, natm = mol.nao, mol.nao, mol.natm

In [30]:
mf = scf.RHF(mol)
mf.kernel()
coeff = mf.mo_coeff
#g refers to the two-body term in AO basis
#g_mo refers to the two-body term in MO basis
g = mol.intor("int2e")
g_mo = np.einsum('pqrs,pi,qj,rk,sl->ijkl', g, coeff, coeff, coeff, coeff)

In [31]:
#create the grids using pyscf
grids = dft.Grids(mol)
#here we choose the least dense grids set to speed up the caculaton.
grids.atom_grid = (4, 14)
#to improve the accuracy, you should choose dense grids setup.
#grids.atom_grid =(75, 302)
grids.build()
coords = grids.coords
#weight of each grid:
weights = (grids.weights).reshape(grids.weights.size,1)
#ao_value: the value of AO basis in each grid.
ao_value = numint.eval_ao(mol, coords)

Weighted wave function for each grid:
$$
X^{P}_{\mu}  = \sqrt[4]{\omega_P} R^{P}_{\mu}
$$
where R is the ao_value. (in AO basis here)

In [32]:
#generate_X (weights, R):
def generate_X (weights_touse, R):
    I_12 = np.ones([1,nao])
    weights_reshape = np.dot(weights_touse, I_12)
    print(weights_reshape.shape)
    weights_scale = weights_reshape ** (1/4)
    X = weights_scale * R
    return (X)

X = generate_X (weights, ao_value)


(112, 2)


In MO basis:

In [33]:
ao_value_mo = np.einsum("ui, gu -> gi", coeff, ao_value)
X_mo = generate_X (weights, ao_value_mo)


(112, 2)


From now on, we will work on MO basis:

$$
X^{P}_{ab} \equiv X^{P}_{a} X^{P}_{b}
$$
$$
S^{PP'} \equiv X^{P}_{ab} X^{P'}_{ab}
$$
(all written in MO basis and use einstein sum here)

In [34]:
X_two_ind_mo = np.zeros([grids.weights.size, nao, nao])
for i in range (grids.weights.size):
    for j in range (nao):
        for k in range (nao):
            X_two_ind_mo[i, j, k] = X_mo[i, j] * X_mo[i, k]

space_metric_mo = np.einsum("acd, bcd -> ab", X_two_ind_mo, X_two_ind_mo)

Using  spectral decomposition to get pseudoinverse of S:

In [35]:
e_S, v_S = np.linalg.eig(space_metric_mo)
number_nonzero = np.count_nonzero(e_S > 1e-8)
pseudoinverse_mo = np.zeros([grids.weights.size, grids.weights.size], dtype = 'complex128')
for i in range(number_nonzero):
    v_S_reshape = v_S[:,i].reshape(grids.weights.size,1)
    pseudoinverse_mo += (1/e_S[i]) * np.dot(v_S_reshape, v_S_reshape.transpose())

space_metric_inverse_mo = pseudoinverse_mo


The pseudoinverse function in numpy is not  feasible as shown below. But anyway, the dimenstion of matrix $S$ is roughly only square of the size of system, so it will not be a issue to decompose matrix $S$.

In [41]:
np.linalg.pinv(space_metric_mo)

array([[-2.04324735e+13,  7.52119528e+12,  1.34372849e+13, ...,
         4.21620778e+12,  1.86598955e+12,  5.18903644e+09],
       [ 2.05095502e+15, -7.54956721e+14, -1.34879739e+15, ...,
        -4.23211242e+14, -1.87302856e+14, -5.20861085e+11],
       [ 3.26294809e+14, -1.20109148e+14, -2.14585685e+14, ...,
        -6.73304046e+13, -2.97987762e+13, -8.28659169e+10],
       ...,
       [-6.84090765e+13,  2.51813870e+13,  4.49887896e+13, ...,
         1.41161020e+13,  6.24743852e+12,  1.73731874e+10],
       [ 1.32314511e+13, -4.87049830e+12, -8.70157879e+12, ...,
        -2.73028847e+12, -1.20835833e+12, -3.36026286e+09],
       [ 1.02178688e+11, -3.76119838e+10, -6.71971574e+10, ...,
        -2.10844065e+10, -9.33143818e+09, -2.59493268e+07]])

Equation A13 - A15 in the ref:
$$
E^{P'Q'} \equiv X^{P'}_{pq} G_{pqrs} X^{Q'}_{rs}
$$

$$
Z^{PQ} = [S^{P'P}]^{-1}  E^{P'Q'} [S^{Q'Q}]^{-1}
$$

$$
G_{pqrs} \approx  \chi^{P}_p  \chi^{P}_q Z^{P Q}  \chi^{Q}_r \chi^{Q}_s
$$
(in MO basis, using  einstein sum)

In [36]:
E_mo = np.einsum('pab, abcd, qcd -> pq', X_two_ind_mo, g_mo, X_two_ind_mo)

Z_mo = np.einsum("pa, ab, bq -> pq", space_metric_inverse_mo, E_mo, space_metric_inverse_mo)

g_THC_mo = np.einsum("pa, pb, pq, qc, qd -> abcd", X_mo, X_mo, Z_mo, X_mo, X_mo)

Compare $G$ from THC and the real $G$:

In [37]:
np.linalg.norm(g_THC_mo - g_mo)

1.9937472228287797e-14

In [38]:
g_THC_mo - g_mo

array([[[[-6.66133815e-16+0.j,  7.90704083e-16+0.j],
         [ 7.97697187e-16+0.j, -8.88178420e-16+0.j]],

        [[ 5.84581897e-15+0.j, -1.27675648e-15+0.j],
         [-1.16573418e-15+0.j,  6.19101697e-15+0.j]]],


       [[[ 5.90133012e-15+0.j, -1.36002321e-15+0.j],
         [-1.22124533e-15+0.j,  6.19101697e-15+0.j]],

        [[ 1.04360964e-14+0.j, -4.49216501e-16+0.j],
         [-7.68405621e-16+0.j, -1.15463195e-14+0.j]]]])

possible optimization:
1. remove the grids with very small weights
2. low-rank approx. in S matrix
3. does pruning method matter?

In [43]:
#you can try different cases by changing the molecule at the beginning of this notebook.