In [1]:
import numpy as np
import os

# Background

The overlap matrix is:

$$S_{\mu \nu} =  \int d\vec{r}_{1} \phi_{\mu}(1)^{*}\phi_{\nu}(1)$$

- $\phi_{\mu}$ are basis functions (defined by the chosen basis set)
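As a concrete example of $S_{\mu \nu}$, the sketch below assumes each basis function is a single normalized s-type Gaussian. This is a simplification, since basis sets such as STO-3G use contractions of several Gaussians. For two such functions the overlap has the closed form $S_{\mu \nu} = \big(2\sqrt{\alpha_{\mu}\alpha_{\nu}}/(\alpha_{\mu}+\alpha_{\nu})\big)^{3/2} \exp\big(-\frac{\alpha_{\mu}\alpha_{\nu}}{\alpha_{\mu}+\alpha_{\nu}}|\vec{R}_{\mu}-\vec{R}_{\nu}|^{2}\big)$. The exponents and centres are made up for illustration.

```python
import numpy as np


def s_gaussian_overlap(a, b, center_a, center_b):
    """Overlap <g_a|g_b> of two normalized 3D s-type Gaussians
    g(r) = (2*alpha/pi)**(3/4) * exp(-alpha * |r - R|**2)."""
    r2 = float(np.sum((np.asarray(center_a) - np.asarray(center_b)) ** 2))
    prefactor = (2.0 * np.sqrt(a * b) / (a + b)) ** 1.5
    return prefactor * np.exp(-a * b / (a + b) * r2)


# Hypothetical basis: (exponent, centre in bohr) for three s Gaussians
basis = [
    (0.8, [0.0, 0.0, 0.0]),
    (0.8, [0.0, 0.0, 1.4]),
    (0.3, [0.0, 0.0, 0.7]),
]

# S_{mu nu} = <phi_mu|phi_nu>; the functions are real, so no conjugate is needed
S = np.array(
    [[s_gaussian_overlap(a, b, ra, rb) for (b, rb) in basis] for (a, ra) in basis]
)
print(np.round(S, 4))
```

Normalized functions give a unit diagonal, and because the functions are linearly independent, $S$ is symmetric positive definite.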


The unknown molecular orbitals $\psi_{i}$ are expanded as a linear combination of the $K$ known basis functions $\{ \phi_{\mu} | \mu=1,2,\dots, K \}$:

$$ \psi_{i} =  \sum_{\mu=1}^{K} C_{\mu i} \phi_{\mu}$$


$C$ is a $K \times K$ matrix of expansion coefficients $C_{\mu i}$. The columns of $C$ describe the molecular orbitals!


We can find the total number of electrons $N$ in the system by:

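$$ N =  2 \sum_{a}^{N/2}\int d\vec{r}  \bigg( \psi_{a}(\vec{r})^{*} \psi_{a}(\vec{r}) \bigg) =  2 \sum_{a}^{N/2} 1$$

- each integral is the normalization of the occupied orbital $\psi_{a}$, i.e. the probability of finding an electron in that orbital somewhere in space (must be 1)
- each occupied orbital holds two electrons, so summing over the $N/2$ occupied orbitals and multiplying by 2 gives the total number of electrons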

The charge density has the following definition:

$$\rho(\vec{r}) = 2 \sum_{a}^{N/2} \bigg( \psi_{a}(\vec{r})^{*} \psi_{a}(\vec{r}) \bigg)$$

- rewrite using the definition $\psi_{a} = \sum_{\mu=1}^{K} C_{\mu a} \phi_{\mu}$

$$\rho(\vec{r}) = 2 \sum_{a}^{N/2} \Bigg( \bigg[ \sum_{\nu}^{K} C_{\nu a}^{*} \phi_{\nu}(\vec{r})^{*} \bigg] \bigg[ \sum_{\mu}^{K} C_{\mu a}\phi_{\mu}(\vec{r}) \bigg] \Bigg)$$

- rearrange the sums

$$\rho(\vec{r}) = \sum_{\nu}^{K} \sum_{\mu}^{K} \Big( 2 \sum_{a}^{N/2} C_{\mu a} C_{\nu a}^{*} \Big) \phi_{\mu}(\vec{r}) \phi_{\nu}(\vec{r})^{*} $$

- which is 

$$\rho(\vec{r}) = \sum_{\mu, \nu}^{K} P_{\mu \nu} \phi_{\mu}(\vec{r}) \phi_{\nu}(\vec{r})^{*} $$


- $P_{\mu \nu}$ is known as the density matrix and is:

$$P_{\mu \nu} = 2 \sum_{a}^{N/2} C_{\mu a} C_{\nu a}^{*}$$
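A short NumPy sketch of the density matrix. It assumes real coefficients, that the occupied MOs are the first $N/2$ columns of $C$ (as when orbitals are ordered by energy), and uses made-up numbers for $C$. It checks the matrix form $P = 2 C_{occ} C_{occ}^{\dagger}$ against the explicit sum above.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_occ = 6, 2  # hypothetical: 6 basis functions, N = 4 electrons in 2 doubly occupied MOs
C = rng.normal(size=(K, K))  # stand-in coefficient matrix; columns are MOs

# Matrix form: P = 2 * C_occ C_occ^dagger, using only the occupied columns
C_occ = C[:, :n_occ]
P = 2.0 * C_occ @ C_occ.conj().T

# Explicit form: P_{mu nu} = 2 * sum_a C_{mu a} C_{nu a}^*
P_loop = np.zeros((K, K))
for mu in range(K):
    for nu in range(K):
        P_loop[mu, nu] = 2.0 * sum(C[mu, a] * np.conj(C[nu, a]) for a in range(n_occ))

print(np.allclose(P, P_loop))
```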

Therefore we can also find the total number of electrons in the system by:

$$ N =  2 \sum_{a}^{N/2}\int d\vec{r}  \bigg( \psi_{a}(\vec{r})^{*} \psi_{a}(\vec{r}) \bigg) =  \sum_{\nu}^{K} \sum_{\mu}^{K} \Big( 2 \sum_{a}^{N/2} C_{\mu a} C_{\nu a}^{*} \Big) \int d\vec{r} \phi_{\mu}(\vec{r})  \phi_{\nu}(\vec{r})^{*}$$

- This is simply:

$$N =  \sum_{\nu}^{K} \sum_{\mu}^{K} P_{\mu \nu} S_{\nu \mu} = \sum_{\mu}^{K} (PS)_{\mu \mu} = \mathrm{Tr}(PS)$$

- One can interpret $(PS)_{\mu \mu}$ in the above equation as the number of electrons associated with $\phi_{\mu}$
- This is a **Mulliken population analysis**
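The Mulliken bookkeeping can be checked numerically. The sketch below builds a stand-in overlap matrix (symmetric positive definite with unit diagonal) and MOs that are orthonormal in that metric, $C = S^{-1/2} Q$ with $Q$ orthogonal, so that $C^{T} S C = I$. All numbers are synthetic; only the relations $(PS)_{\mu \mu}$ and $\mathrm{Tr}(PS) = N$ come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_occ = 6, 3  # hypothetical: 6 basis functions, N = 6 electrons

# Stand-in overlap matrix: symmetric positive definite with unit diagonal
B = rng.normal(size=(K, K))
M = B @ B.T + K * np.eye(K)
d = np.sqrt(np.diag(M))
S = M / np.outer(d, d)

# MOs orthonormal in the S metric: C = S^{-1/2} Q, so C^T S C = I
evals, evecs = np.linalg.eigh(S)
S_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
Q, _ = np.linalg.qr(rng.normal(size=(K, K)))
C = S_inv_half @ Q

# Density matrix from the occupied MOs (real coefficients)
C_occ = C[:, :n_occ]
P = 2.0 * C_occ @ C_occ.T

PS = P @ S
gross_populations = np.diag(PS)  # electrons associated with each phi_mu
n_electrons = np.trace(PS)       # Tr(PS) = N
print(np.round(gross_populations, 3), n_electrons)
```

Individual gross populations are not bounded to $[0, 2]$ and can even be slightly negative; only their sum is guaranteed to equal $N$.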

# Orbital Localization

- When we perform an SCF calculation, we obtain an optimized $C$ matrix
    - $C$ is a $K \times K$ matrix of expansion coefficients $C_{\mu i}$
    - The columns of $C$ describe the molecular orbitals!
    - MO i: $ \psi_{i} =  \sum_{\mu=1}^{K} C_{\mu i} \phi_{\mu}$
    
    
- These molecular orbitals are usually **delocalized**
    - non-negligible amplitude over the whole system, rather than only around some atom(s) or bond(s)

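- But the choice of orbitals is NOT unique: mixing the occupied orbitals among themselves with any unitary matrix leaves the total density (and hence the energy) unchanged


- We can therefore perform a unitary rotation on the occupied molecular orbitals

$$ \psi_{i}^{new} = \sum_{j} \psi_{j} \big( U_{rot} \big)_{ji} = \sum_{\mu=1}^{K} \big( C U_{rot} \big)_{\mu i} \phi_{\mu}$$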
    
    
The idea is to use a rotation such that the resulting orbitals $\psi_{i}^{new}$ are as spatially localized as possible. 
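To see why such a rotation is allowed, the sketch below uses random stand-in coefficients and a random real orthogonal $U_{rot}$ that acts only on the occupied orbitals. The individual orbitals change, but the density matrix $P = 2 C_{occ} C_{occ}^{T}$ does not, and therefore neither do the density and the electron count.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n_occ = 5, 3
C_occ = rng.normal(size=(K, n_occ))  # stand-in occupied MO coefficients (columns)

# Random real orthogonal matrix mixing the occupied orbitals only
U_rot, _ = np.linalg.qr(rng.normal(size=(n_occ, n_occ)))
C_occ_new = C_occ @ U_rot  # column i: psi_i^new = sum_j psi_j (U_rot)_{ji}

P_old = 2.0 * C_occ @ C_occ.T
P_new = 2.0 * C_occ_new @ C_occ_new.T  # U_rot U_rot^T = I, so P is unchanged

print(np.allclose(P_old, P_new), np.allclose(C_occ, C_occ_new))
```

Localization methods exploit exactly this freedom: they search over $U_{rot}$ for the orbitals that optimize a locality measure.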


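The Pipek-Mezey (PM) [localization](https://notendur.hi.is/hj/papers/paperPipekmezey8.pdf) **maximizes the sum of the squared Mulliken atomic populations of each orbital**:

$$ f (U_{rot}) = \sum_{i}^{N/2} \sum_{A}^{N_{atoms}} \Big( Q_{A}^{i} \Big)^{2}, \qquad Q_{A}^{i} = \sum_{\mu \text{ on atom } A} \sum_{\nu}^{K} C_{\mu i} S_{\mu \nu} C_{\nu i}$$

- here $C$ denotes the rotated coefficients $C U_{rot}$ (real orbitals), and $Q_{A}^{i}$ is the Mulliken population of orbital $i$ on atom $A$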
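A minimal sketch of the Pipek-Mezey objective, the sum over occupied orbitals $i$ and atoms $A$ of $(Q_{A}^{i})^{2}$ with $Q_{A}^{i} = \sum_{\mu \text{ on } A} \sum_{\nu} C_{\mu i} S_{\mu \nu} C_{\nu i}$. It assumes real orbitals and a hypothetical `ao_atom` array giving the atom index of each AO. It only evaluates the objective; the optimization over $U_{rot}$ (e.g. by Jacobi sweeps) is not shown. The two-atom toy case uses an orthonormal basis so the numbers are easy to follow.

```python
import numpy as np


def pipek_mezey_objective(C_occ, S, ao_atom):
    """Sum over occupied orbitals i and atoms A of (Q_A^i)^2 for real orbitals."""
    SC = S @ C_occ
    per_ao = C_occ * SC  # per_ao[mu, i] = C_{mu i} (S C)_{mu i}
    total = 0.0
    for atom in np.unique(ao_atom):
        q_atom = per_ao[ao_atom == atom].sum(axis=0)  # Q_A^i for every orbital i
        total += float(np.sum(q_atom ** 2))
    return total


# Hypothetical toy system: one orthonormal AO on each of two atoms
S = np.eye(2)
ao_atom = np.array([0, 1])
C_localized = np.eye(2)                                            # one orbital per atom
C_delocalized = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # bonding/antibonding

f_loc = pipek_mezey_objective(C_localized, S, ao_atom)
f_deloc = pipek_mezey_objective(C_delocalized, S, ao_atom)
print(f_loc, f_deloc)  # 2.0 and 1.0, up to floating-point rounding
```

Because each orbital's populations $Q_{A}^{i}$ sum to 1 over the atoms, the squared sum is largest when each orbital sits on as few atoms as possible.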

### Method 1
- Start from the optimized $C$ coefficient matrix
    - which has been rotated to localize the orbitals
    - (and is used to build the localized density matrix)


- **Look through basis functions $\phi_{\mu}$ of the ACTIVE atoms**

    
- For each localized orbital, check its Mulliken population on the active atoms
    - if it is above a certain threshold, associate the orbital with the active system
    - otherwise, put it in the environment
 


To choose the active and environment subsystems we do the following:

1. Given the localized molecular orbitals (localized $C$ matrix), we take the squared magnitudes of a localized orbital's coefficients on the active AOs and divide by the sum of the squared magnitudes of all of that orbital's coefficients. This gives the fraction of the orbital contributed by the active system.

2. Mathematically, for orbital $j$:
    - recall that the MOs are the columns of the $C$ matrix
    - in the equation below, $C$ is the LOCALIZED form!


$$ f_{j} = \frac{\sum_{\mu \in \text{active AO}} |C_{\mu j}|^{2}}{\sum_{\mu =1}^{K} |C_{\mu j}|^{2}}$$

Orbital $j$ is assigned to the active system if $f_{j}$ is above the threshold (e.g. 0.95) and to the environment otherwise.
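The selection rule can be sketched in a few lines of NumPy: compute the squared-coefficient ratio for each localized orbital and compare it with a threshold. The 4-AO, 3-MO coefficient matrix and the boolean `active_ao` mask are hypothetical; in the notebook the active AOs are the first `n_act_aos` basis functions.

```python
import numpy as np


def active_fraction(C_loc, active_ao):
    """Fraction of each localized MO (column of C_loc) carried by the active AOs."""
    weights = np.abs(C_loc) ** 2
    return weights[active_ao].sum(axis=0) / weights.sum(axis=0)


# Hypothetical localized coefficients: 4 AOs (first 2 on active atoms), 3 MOs
C_loc = np.array([
    [0.9, 0.1, 0.0],
    [0.4, 0.0, 0.1],
    [0.1, 0.7, 0.2],
    [0.0, 0.7, 0.9],
])
active_ao = np.array([True, True, False, False])
threshold = 0.95

f = active_fraction(C_loc, active_ao)
active_MO_inds = np.where(f >= threshold)[0]
enviro_MO_inds = np.where(f < threshold)[0]
print(np.round(f, 3), active_MO_inds, enviro_MO_inds)
```

Only the first orbital, with about 99% of its weight on the active AOs, passes the 0.95 cut.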

### Method 2 - SPADE

    

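Subsystem Projected Atomic orbital DEcomposition (SPADE) begins by expressing the occupied MOs in the orthogonalised (Löwdin) AO basis

$$ \bar{C}_{occ} = S^{1/2}C_{occ}$$

so that $\bar{C}_{occ}^{T}\bar{C}_{occ} = C_{occ}^{T} S C_{occ} = I$ (for real orbitals).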

We project these onto the active atomic orbitals (erasing the contribution from the environment AOs to the MO matrix).

$$ \bar{C}_{occ}^A = P_A\bar{C}_{occ}$$

A singular value decomposition of these is then taken

$$ \bar{C}_{occ}^A = U \Sigma V^{T}$$

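The singular values $\{\sigma_{i}\}$, given in descending order as the diagonal elements of $\Sigma$, are then used to define the subsystem decomposition by locating the largest drop between consecutive singular values

$$ m = \underset{i}{\arg\max} \bigg(\sigma_{i} - \sigma_{i+1} \bigg)$$

With $i$ counted from 1, $m$ is the number of active orbitals.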

The occupied MOs are then rotated into the SPADE basis using the right singular vectors of the SVD

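$$ \bar{C}_{occ}^{SPADE} = \bar{C}_{occ} V$$

The SPADE basis is then used to define the active and environment subsystems: the first $m$ orbitals (obtained with $V_{m}$, the first $m$ columns of $V$) form the active subsystem and the remaining orbitals form the environment. Because $V$ acts on the orbital index, the same rotation can be applied directly to $C_{occ}$, which is what the code below does.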
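The SPADE steps can be collected into one function and tried on a toy system where the right answer is known. This is a sketch, not the Nbed implementation. It assumes real orbitals and, as in the notebook, that the active AOs are the first `n_act_aos` basis functions. It uses $S^{1/2}$, as the notebook's SVD cell does. One deliberate choice: when there are fewer active AOs than occupied orbitals, the missing singular values are padded with zeros so that every orbital takes part in the gap search.

```python
import numpy as np


def spade_partition(C_occ, S, n_act_aos):
    """Split occupied MOs into active and environment sets (SPADE sketch)."""
    n_occ = C_occ.shape[1]
    # Orthonormal-AO representation: C_bar = S^{1/2} C_occ
    evals, evecs = np.linalg.eigh(S)
    s_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    c_bar = s_half @ C_occ
    # Project onto the active AOs (keep their rows) and decompose
    _, sigma, vh = np.linalg.svd(c_bar[:n_act_aos, :])
    # Singular values beyond min(n_act_aos, n_occ) are zero; pad to one per orbital
    sigma_full = np.zeros(n_occ)
    sigma_full[: len(sigma)] = sigma
    if n_occ == 1:
        n_act = 1
    else:
        n_act = int(np.argmax(sigma_full[:-1] - sigma_full[1:])) + 1
    # V acts on the orbital index, so rotate the original MOs directly
    c_spade = C_occ @ vh.T
    return c_spade[:, :n_act], c_spade[:, n_act:], sigma_full


# Toy system: 6 orthonormal AOs, the first 4 on "active" atoms
rng = np.random.default_rng(3)
n_act_aos = 4
C_true = np.eye(6)[:, [0, 1, 5]]  # two active orbitals, one environment orbital
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
C_occ = C_true @ Q                 # delocalized mixture, as an SCF might return
S = np.eye(6)

c_act, c_env, sigma = spade_partition(C_occ, S, n_act_aos)
print(np.round(sigma, 6), c_act.shape, c_env.shape)
```

The singular values come out as roughly $(1, 1, 0)$, so the largest gap puts two orbitals in the active set. The recovered active orbitals have no weight on the environment AOs, and vice versa.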

Let's start by building a molecule and an SCF object.

In [2]:
from pathlib import Path

# acetonitrile geometry; the active atoms must be listed first in the xyz file
water_filepath = Path("molecular_structures/acetonitrile.xyz").absolute()
print(water_filepath)

basis = "STO-3G"
charge = 0
xc_functional = "b3lyp"
convergence = 1e-6
pyscf_print_level = 1
max_ram_memory = 4_000
n_active_atoms = 2
occ_cutoff = 0.95
virt_cutoff = 0.95
run_virtual_localization = False


/home/mwilliams/Code/Nbed/docs/notebooks/molecular_structures/acetonitrile.xyz


In [3]:
from pyscf import gto, scf

full_mol = gto.Mole(
    atom=str(water_filepath),
    basis=basis,
    charge=charge,
).build()

global_ks = scf.RKS(full_mol)
global_ks.conv_tol = convergence
global_ks.xc = xc_functional
global_ks.max_memory = max_ram_memory
global_ks.verbose = pyscf_print_level
global_ks.kernel()



-131.06532518358154

In [4]:
from scipy import linalg
import numpy as np

# Locate the occupied orbitals
occupancy = global_ks.mo_occ
n_occupied_orbitals = np.count_nonzero(occupancy)
occupied_orbitals = global_ks.mo_coeff[:, :n_occupied_orbitals]

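# Project onto the active AOs by erasing the rows of the C matrix
# that correspond to contributions from the environment AOs.
# aoslice_by_atom() gives [shell_start, shell_end, ao_start, ao_end] for each atom,
# so n_act_aos is the AO end index of the last active atom.
# This only works because we have placed our active atoms at the start of the file.
n_act_aos = global_ks.mol.aoslice_by_atom()[n_active_atoms - 1][-1]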

# Express the MOs in the orthogonalised (Löwdin) AO basis: C_bar = S^{1/2} C
ao_overlap = global_ks.get_ovlp()

rotated_orbitals = (
    linalg.fractional_matrix_power(ao_overlap, 0.5) @ occupied_orbitals
)

# Take the SVD of the rotated and projected orbitals
_, sigma, right_vectors = linalg.svd(rotated_orbitals[:n_act_aos, :])


# Prevents an error with argmax
if len(sigma) == 1:
    n_act_mos = 1
else:
    value_diffs = sigma[:-1] - sigma[1:]
    n_act_mos = np.argmax(value_diffs) + 1

n_env_mos = n_occupied_orbitals - n_act_mos

# get active and enviro indices
active_MO_inds = np.arange(n_act_mos)
enviro_MO_inds = np.arange(n_act_mos, n_act_mos + n_env_mos)

# Rotate the occupied MOs into the SPADE basis and split them into active / environment
c_active = occupied_orbitals @ right_vectors.T[:, :n_act_mos]
c_enviro = occupied_orbitals @ right_vectors.T[:, n_act_mos:]
c_loc_occ = occupied_orbitals @ right_vectors.T

In [5]:
print(f"{n_act_mos=}")
print(f"{n_env_mos=}")

print(f"{active_MO_inds=}")
print(f"{enviro_MO_inds=}")

print(f"{c_active.shape=}")
print(f"{c_enviro.shape=}")

n_act_mos=7
n_env_mos=4
active_MO_inds=array([0, 1, 2, 3, 4, 5, 6])
enviro_MO_inds=array([ 7,  8,  9, 10])
c_active.shape=(18, 7)
c_enviro.shape=(18, 4)


### Virtual Orbital Localization via Concentric Localization

[1] D. Claudino and N. J. Mayhall, "Simple and Efficient Truncation of Virtual Spaces in Embedded Wave Functions via Concentric Localization", Journal of Chemical Theory and Computation, vol. 15, no. 11, pp. 6085-6096, Nov. 2019, doi: 10.1021/ACS.JCTC.9B00682.

Concentric localization is an extension of SPADE which allows for virtual orbitals to be localized.

We begin by projecting the MOs onto the active region as before.

$$ \bar{C}^A = P_A\bar{C}$$

The first step of the iterative process requires finding the virtual orbitals in the projected basis. Note that these virtual orbitals should not include those which have been projected out of the active space.

$$ C^A_{vir} = S^{-1}_A S_{PB,WB} C_{vir} $$

where the overlap matrix between the projection basis (PB) and the working basis (WB) is given by

$$ [S_{PB,WB}]_{i,j} = \langle a_i | u_j \rangle \ \{a_i \in C^A, u_j \in C\}$$
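A sketch of the projection $C^{A} = S_{PB}^{-1} S_{PB,WB} C$, assuming (as in this notebook) that the projection basis is a subset of the working-basis functions, so both overlaps are blocks of the full overlap matrix. The index array `pb_idx` and the synthetic positive-definite overlap are illustrative. A function that already lies in the projection basis should come back unchanged.

```python
import numpy as np


def project_onto_subbasis(C, S, pb_idx):
    """Least-squares projection C^A = S_PB^{-1} S_{PB,WB} C of orbitals C
    (expanded in the working basis) onto the working-basis functions in pb_idx."""
    s_pb = S[np.ix_(pb_idx, pb_idx)]  # S_PB
    s_pb_wb = S[pb_idx, :]             # S_{PB,WB}
    return np.linalg.solve(s_pb, s_pb_wb @ C)


rng = np.random.default_rng(4)
B = rng.normal(size=(6, 6))
S = B @ B.T + 6.0 * np.eye(6)  # stand-in symmetric positive-definite overlap
pb_idx = np.arange(3)           # hypothetical: first 3 functions form the PB

c_inside = np.zeros((6, 1))
c_inside[:3, 0] = [0.5, -1.0, 2.0]  # lies entirely in the PB
c_outside = np.zeros((6, 1))
c_outside[5, 0] = 1.0               # a single function outside the PB

proj_inside = project_onto_subbasis(c_inside, S, pb_idx)
proj_outside = project_onto_subbasis(c_outside, S, pb_idx)
print(proj_inside.ravel(), proj_outside.ravel())
```

Using `np.linalg.solve` rather than forming $S_{PB}^{-1}$ explicitly gives the same result with better numerical behaviour.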

We can then begin to build up a set of localized orbitals iteratively. For the initial step, we compute the overlap between the projected and the working virtual orbitals and take its singular value decomposition.

$$ \big( C^A_{vir} \big)^{\dagger} S_{PB,WB} C_{vir} = U \Sigma V^{T}$$

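By splitting the columns of $V$ into those with non-zero singular values ($V_{image}$) and those with zero singular values ($V_{ker}$), we can define two sets of orbitals for the zeroth iteration.

Let $T: X \to Y$ be a linear transformation; then the image and kernel of $T$ are defined as

$$ \mathrm{im}\ T = T(X) = \{T(x) | x \in X\}$$
$$ \ker T = \{x \in X | T(x) = 0\}$$

The columns of $V_{ker}$ span the kernel of the decomposed matrix, and the columns of $V_{image}$ span its orthogonal complement, which the matrix maps onto its image.

$$ C_0 = C_{vir} V_{image}$$
$$ C_{0,k} = C_{vir} V_{ker}$$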
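The split of the right singular vectors can be sketched as follows. Columns of $V$ whose singular value exceeds a small tolerance go into $V_{image}$ and the rest into $V_{ker}$. The tolerance, the rank-2 stand-in matrix, and the virtual coefficients `C_vir` are all illustrative assumptions.

```python
import numpy as np


def split_image_kernel(A, tol=1e-10):
    """Return (V_image, V_ker): right singular vectors of A with singular value
    above tol, and those spanning ker A."""
    _, sigma, vh = np.linalg.svd(A)  # full_matrices=True, so vh is (n, n)
    n_image = int(np.count_nonzero(sigma > tol))
    V = vh.T
    return V[:, :n_image], V[:, n_image:]


rng = np.random.default_rng(5)
# Stand-in for the matrix being decomposed: 4 x 4 but only rank 2
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 4))
V_image, V_ker = split_image_kernel(A)

C_vir = rng.normal(size=(7, 4))  # hypothetical virtual coefficients: 7 AOs, 4 virtual MOs
C_0 = C_vir @ V_image            # zeroth shell
C_0_ker = C_vir @ V_ker          # orbitals left for later iterations
print(V_image.shape, V_ker.shape)
```

The tolerance matters in practice: singular values that are zero in exact arithmetic come out around $10^{-16}$ in floating point, so testing for exact zero would put every vector in the image.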

Subsequent iterations are found by decomposing the coupling between these two sets of orbitals through the Fock operator $F$.

$$ C_n^{\dagger} F C_{n,k} = U_n \Sigma_n V_n^{T}$$

$$ C_{n+1} = C_{n,k} V_{n,image}$$
$$ C_{n+1,k} = C_{n,k} V_{n,ker}$$

The localized virtual space is built up shell by shell from these orbitals:

$$ C_{active\ space} \to \{C_0, C_1, \dots, C_{n+1}\}$$

In [6]:
n_act_aos

10

In [7]:
# First, let's define how many total orbitals we want to have
max_orbs = 15

In [8]:
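c_initial = global_ks.mo_coeff
virtual_orbs = np.where(global_ks.mo_occ == 0)[0]
c_virtual = c_initial[:, virtual_orbs]

# Overlaps for the projection basis (PB) and working basis (WB).
# The PB is taken to be the AOs on the active atoms, i.e. the first n_act_aos
# functions of the working basis, so both overlaps are blocks of the AO overlap.
ao_overlap = global_ks.get_ovlp()
s_pb = ao_overlap[:n_act_aos, :n_act_aos]
s_pbwb = ao_overlap[:n_act_aos, :]
fock = global_ks.get_fock()

In [11]:
# Virtual orbitals projected onto the PB: C^A_vir = S_PB^{-1} S_{PB,WB} C_vir
c_proj_virtual = np.linalg.solve(s_pb, s_pbwb @ c_virtual)

# Zeroth shell: SVD of the overlap between projected and working virtuals
_, sigma, right_vectors = linalg.svd(c_proj_virtual.T @ s_pbwb @ c_virtual)
tol = 1e-10  # singular values below this are treated as zero

# We'll iteratively build up the total C matrix, starting from the SPADE active MOs
c_total = c_active
c_kernel = c_virtual

while c_kernel.shape[1] > 0:
    print(sigma)

    # Split the right singular vectors into the span (non-zero sigma) and the kernel
    n_span = np.count_nonzero(sigma > tol)
    if n_span == 0:
        break
    v_span = right_vectors.T[:, :n_span]
    v_ker = right_vectors.T[:, n_span:]

    # New shell of localized virtuals and the orbitals left in the kernel
    c_i = c_kernel @ v_span
    c_kernel = c_kernel @ v_ker

    c_total = np.hstack((c_total, c_i))

    if c_total.shape[1] >= max_orbs or c_kernel.shape[1] == 0:
        break

    # Next shell: couple the newest shell to the kernel through the Fock operator
    _, sigma, right_vectors = linalg.svd(c_i.T @ fock @ c_kernel)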

### PySCF methods

In [None]:
# get xyz file for water

notebook_dir = os.getcwd()
docs_dir = os.path.dirname(notebook_dir)
NBed_dir = os.path.dirname(docs_dir)
Test_dir = os.path.join(NBed_dir, "tests")
mol_dir = os.path.join(Test_dir, "molecules")

water_xyz_path = os.path.join(mol_dir, "water.xyz")

In [None]:
### inputs
from pyscf import gto, scf

basis = "STO-3G"
charge = 0
spin = 0
full_system_mol = gto.Mole(
    atom=water_xyz_path,
    basis=basis,
    charge=charge,
    spin=spin,
)
full_system_mol.build()

HF_scf = scf.RHF(full_system_mol)
HF_scf.verbose = 1
HF_scf.conv_tol = 1e-6
HF_scf.kernel()
###

In [None]:
from nbed.localizers import (
    BOYSLocalizer,
    IBOLocalizer,
    Localizer,
    PMLocalizer,
    SPADELocalizer,
)

In [None]:
localizers = {
    "spade": SPADELocalizer,
    "boys": BOYSLocalizer,
    "ibo": IBOLocalizer,
    "pipek-mezey": PMLocalizer,
}

In [None]:
n_active_atoms = 2  # (first n rows are active in xyz file)
loc_str = "boys"  # <--- change to perform different localization
threshold = 0.95


## object runs localization when initialized!
loc_system = localizers[loc_str](
    HF_scf,
    n_active_atoms,
    occ_cutoff=threshold,
    # virt_cutoff=0.95,
    # run_virtual_localization=False
)

In [None]:
print(f"active MO inds: {loc_system.active_MO_inds}")
print(f"enviro MO inds: {loc_system.enviro_MO_inds}")


# orb threshold
if loc_str != "spade":
    print("Localized MO threshold:", loc_system.enviro_selection_condition)
    print(f"indices above {threshold} set active")
    # indices above threshold (usually 95%) set to active

In [None]:
dm_localised_full_system = 2 * loc_system._c_loc_occ @ loc_system._c_loc_occ.conj().T
dm_active = loc_system.dm_active
dm_enviro = loc_system.dm_enviro

# check act and env density give the full density
print(
    f"does: y_full = y_act + y_env ... {np.allclose(dm_localised_full_system, dm_active + dm_enviro)}"
)


# check number of electrons is still the same after orbitals have been localized (change of basis)
s_ovlp = loc_system._global_ks.get_ovlp()
n_active_electrons = np.trace(dm_active @ s_ovlp)
n_enviro_electrons = np.trace(dm_enviro @ s_ovlp)


# check number of electrons is correct
print(
    f"does: n_elec_full = n_elec_act + n_elec_env ... {np.allclose(HF_scf.mol.nelectron, n_active_electrons+n_enviro_electrons)}"
)

In [None]:
# active MOs
loc_system.c_active

In [None]:
# enviro MOs
loc_system.c_enviro

In [None]:
# full localized C matrix
loc_system.c_loc_occ_and_virt

In [None]:
# to get the active and environment C matrices, slice this array using active_MO_inds and enviro_MO_inds

## e.g. for environment
np.allclose(
    loc_system.c_loc_occ_and_virt[:, loc_system.enviro_MO_inds], loc_system.c_enviro
)

See notebook 1 on how to plot these LOCALIZED orbitals

Use `c_loc_occ_and_virt`, `c_active` or `c_enviro` to plot the localized orbitals.