# Semiempirical Methods
Hartree-Fock theory is the foundation of *ab initio* methods, providing the reference that correlated methods improve upon. Unfortunately, Hartree-Fock theory formally scales as $N^4$ with the number of basis functions $N$, and post-Hartree-Fock methods are substantially more computationally expensive. This limits the size of molecular systems that can be treated with these methods.

Semiempirical methods aim to address this limitation by introducing various approximations to (the already approximate) Hartree-Fock theory. The goal is to implement methods with reduced computational scaling. To compensate for the loss of accuracy, semiempirical methods fit various parameters to experimental or computational data.

## The Central Approximations 
The Hartree-Fock equations are given by:
\begin{align}
    \textbf{FC} &=\textbf{SC} \epsilon
\end{align}
and solved by iteratively forming and diagonalizing the Fock matrix $\textbf{F}$. The Fock matrix is written in terms of the one-electron integral $h$ and the two-electron Coulomb $J$ and exchange $K$ integrals. The two-electron integrals are harder to compute because they involve evaluating numerous integrals of the form $\langle \mu \nu|\lambda \sigma \rangle$, where
\begin{align}
    \langle \mu \nu|\lambda \sigma \rangle &= \langle \psi_\mu \psi_\nu | \psi_\lambda \psi_\sigma \rangle = \int d\textbf{r}_1 d\textbf{r}_2 \psi_\mu^*(\textbf{r}_1) \psi_\nu^*(\textbf{r}_2)
r_{12}^{-1} \psi_\lambda(\textbf{r}_1) \psi_\sigma(\textbf{r}_2)
\end{align}
with $\psi$ denoting the atomic orbital basis functions; physicists' notation is used.

Three main approximations are typically employed in semiempirical methods:
- The neglect of core electrons in explicit computations: core electrons are accounted for by reducing the nuclear charge or introducing functional forms for modeling the combined effect of nuclei and core electrons.
- The use of minimal basis set models.
- The neglect of a subset of the one-electron and two-electron integrals.

The central assumption in semiempirical methods is the *Zero Differential Overlap* (ZDO) approximation. This approximation sets $\mu_{\rm A}(\textbf{r}_i) \nu_{\rm B}(\textbf{r}_i)=0$ when ${\rm A} \neq {\rm B}$, where $\mu_{\rm A}$ is a basis function centered on atom $\rm A$ and $\nu_{\rm B}$ is a basis function centered on atom $\rm B$. That is, it neglects all products of basis functions that depend on the same electron coordinate when centered on different atoms. There are three main results of this approximation:
- The overlap matrix $\textbf{S}$ is the unit matrix.
- Three-center, one-electron integrals are zero (one center from the operator and two from the basis functions).
- Three- and four-center, two-electron integrals are neglected.

To compensate for these approximations, the remaining integrals are usually replaced by parameters that are fitted to computational or experimental data.
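To get a feel for how much the ZDO approximation prunes the two-electron integral list, the following minimal sketch counts the integrals $\langle \mu \nu|\lambda \sigma \rangle$ that survive when only products of basis functions on the same atom are kept. The function name `count_two_electron_integrals` and the methane-like basis sizes are illustrative assumptions, and permutational symmetry of the integrals is deliberately ignored.

```python
from itertools import product


def count_two_electron_integrals(basis_per_atom):
    """Return (total, kept) counts of <mu nu|lambda sigma> integrals.

    basis_per_atom: number of basis functions on each atom (illustrative input).
    An integral is kept when mu and lambda (electron 1) sit on one atom and
    nu and sigma (electron 2) sit on one atom, as implied by ZDO between atoms.
    """
    # Label every basis function with the index of the atom it is centered on.
    atom_of = [atom for atom, n in enumerate(basis_per_atom) for _ in range(n)]
    n_basis = len(atom_of)
    total = n_basis ** 4  # formal N^4 count, no symmetry exploited
    kept = 0
    for mu, nu, lam, sig in product(range(n_basis), repeat=4):
        # Physicists' notation: mu/lambda share r_1, nu/sigma share r_2.
        if atom_of[mu] == atom_of[lam] and atom_of[nu] == atom_of[sig]:
            kept += 1
    return total, kept


# Assumed minimal valence basis for a methane-like molecule:
# 4 functions on carbon (2s, 2p) and 1 function on each hydrogen (1s).
total, kept = count_two_electron_integrals([4, 1, 1, 1, 1])
print(f"all integrals: {total}, one- and two-center integrals kept: {kept}")
```

Only one- and two-center integrals remain, so the count grows with the number of atom pairs rather than with the fourth power of the basis size.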

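Slater-type orbitals are typically used in semiempirical methods. The main appeal of Gaussian-type orbitals is that they make multi-center integrals easy to evaluate; once the three- and four-center integrals are neglected, this advantage largely disappears, and Gaussian-type orbitals are inherently less accurate than Slater-type orbitals.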

The various flavors of semiempirical methods differ in the specific types of neglected integrals and the parameterization strategies.

## Neglect of Diatomic Differential Overlap (NDDO)
In the NDDO method, only the above-mentioned approximations are employed. The nuclear charges are reduced by the number of core electrons. The equations describing the method are:
\begin{align}
    \langle \mu_{\rm A} |\textbf{h}|\nu_{\rm A} \rangle &= \delta_{\mu \nu} \left \langle \mu_{\rm A} \left | -\frac{1}{2} \nabla^2 - \textbf{V}_{\rm A} \right | \mu_{\rm A} \right \rangle - \sum_{a \neq {\rm A}}^{N_{\rm nuclei}} \langle \mu_{\rm A} |\textbf{V}_a|\nu_{\rm A} \rangle \\
    \langle \mu_{\rm A} |\textbf{h}|\nu_{\rm B} \rangle &= \left \langle \mu_{\rm A} \left | -\frac{1}{2} \nabla^2 - \textbf{V}_{\rm A} - \textbf{V}_{\rm B} \right | \nu_{\rm B} \right \rangle \\
    \langle \mu_{\rm A} |\textbf{V}_{\rm C}|\nu_{\rm B} \rangle &= 0 \\
    \langle \mu_{\rm A} \nu_{\rm B} | \lambda_{\rm C} \sigma_{\rm D} \rangle &= \delta_{\rm AC} \delta_{\rm BD} \langle \mu_{\rm A} \nu_{\rm B} | \lambda_{\rm A} \sigma_{\rm B} \rangle
\end{align}

## Intermediate Neglect of Differential Overlap (INDO)
The INDO method employs the following additional approximations to the one-electron integral:
\begin{align}
    \langle \mu_{\rm A} |\textbf{h}|\mu_{\rm A} \rangle &= \left \langle \mu_{\rm A} \left | -\frac{1}{2} \nabla^2 - \textbf{V}_{\rm A} \right | \mu_{\rm A} \right \rangle - \sum_{a \neq {\rm A}}^{N_{\rm nuclei}} \langle \mu_{\rm A} |\textbf{V}_a|\mu_{\rm A} \rangle \\
    \langle \mu_{\rm A} |\textbf{h}|\nu_{\rm A} \rangle &= -\delta_{\mu \nu} \sum_{a \neq {\rm A}}^{N_{\rm nuclei}} \left \langle \mu_{\rm A} |\textbf{V}_a| \mu_{\rm A} \right \rangle.
\end{align}

Furthermore, the two-electron integrals are subjected to the condition:
\begin{align}
    \langle \mu_{\rm A} \nu_{\rm B} | \lambda_{\rm C} \sigma_{\rm D} \rangle = \delta_{\rm AC} \delta_{\rm BD} \delta_{\mu \lambda} \delta_{\nu \sigma} \langle \mu_{\rm A} \nu_{\rm B} | \mu_{\rm A} \nu_{\rm B} \rangle 
\end{align}
However, all one-center integrals $\langle \mu_{\rm A} \nu_{\rm A} | \lambda_{\rm A} \sigma_{\rm A} \rangle$ are kept.

## Complete Neglect of Differential Overlap (CNDO)
This method is the most approximate of the three. All two-electron integrals are subjected to the condition:
\begin{align}
    \langle \mu_{\rm A} \nu_{\rm B} | \lambda_{\rm C} \sigma_{\rm D} \rangle = \delta_{\rm AC} \delta_{\rm BD} \delta_{\mu \lambda} \delta_{\nu \sigma} \langle \mu_{\rm A} \nu_{\rm B} | \mu_{\rm A} \nu_{\rm B} \rangle,
\end{align}
including one-center integrals.
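The three levels of approximation can be compared side by side with a small selection rule. The sketch below only encodes the Kronecker-delta conditions stated above for $\langle \mu_{\rm A} \nu_{\rm B} | \lambda_{\rm C} \sigma_{\rm D} \rangle$; it does not evaluate or parameterize any integral. The function name `survives` and the `(atom, orbital)` tuple labels are assumptions made for illustration.

```python
def survives(method, mu, nu, lam, sig):
    """Say whether <mu nu|lam sig> is retained under NDDO, INDO, or CNDO.

    Each index is an (atom, orbital) tuple, e.g. (0, "s") or (1, "px").
    Physicists' notation: mu and lam belong to electron 1, nu and sig to electron 2.
    """
    (A, m), (B, n), (C, l), (D, s) = mu, nu, lam, sig
    # delta_AC * delta_BD: at most two centers are involved.
    same_centers = (A == C) and (B == D)
    # Additional delta_{mu lam} * delta_{nu sig}: Coulomb-type integral only.
    coulomb_type = same_centers and (m == l) and (n == s)
    one_center = A == B == C == D

    if method == "NDDO":
        return same_centers
    if method == "INDO":
        # Coulomb-type condition, except that all one-center integrals are kept.
        return coulomb_type or one_center
    if method == "CNDO":
        # Coulomb-type condition applied to one-center integrals as well.
        return coulomb_type
    raise ValueError(f"unknown method: {method}")


# One-center exchange-type integral <s_A p_A | p_A s_A>.
exchange = ((0, "s"), (0, "px"), (0, "px"), (0, "s"))
for name in ("NDDO", "INDO", "CNDO"):
    print(name, survives(name, *exchange))
```

The one-center exchange-type integral is retained by NDDO and INDO but dropped by CNDO, which is the distinction the last two sections draw.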

## Modified NDDO Methods
Modified NDDO models, such as MNDO, AM1, and PM3, use parameters derived from atomic variables. They differ in their treatment of core-core repulsion and in how the parameters are assigned. The MNDO model predicted core-core repulsions that were too high for atoms 2-3 Angstroms apart. Therefore, in the Austin Model 1 (AM1), the core-core function was modified and the whole model was reparameterized. The atomic parameters were assigned by hand, so only a small reference dataset was used. The PM3 method reoptimized all the parameters automatically and used a larger reference dataset. All these methods used only $s$- and $p$-type orbitals, which limited their application to a small subset of the periodic table. Later methods (e.g., AM1/d) added $d$ basis functions, which allowed the description of transition metals and improved the description of other atoms. The PM6 and PM7 models are newer still, employing larger datasets for fitting and improved functional forms.

## Extended Huckel Theory
Unlike the previous methods, which parameterize the integrals, extended Huckel theory parameterizes the Fock matrix elements directly. Thus, it is non-iterative and requires only a single diagonalization of the Fock matrix. This method again considers only the valence electrons. The Fock matrix elements are assigned as:
\begin{align}
    F_{\mu \mu} &= -I_\mu \\
    F_{\mu \nu} &= -\frac{1}{2} K (I_\mu + I_\nu) S_{\mu \nu},
\end{align}
where $I$ is the atomic ionization potential and $K$ is a constant usually taken to be 1.75. The extended Huckel theory can be improved by using an iterative scheme that accounts for the differences in the electronic environment of identical elements. 
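As a rough illustration of the non-iterative procedure, the sketch below builds the extended Huckel Fock matrix from the two formulas above and solves $\textbf{FC} = \textbf{SC}\epsilon$ once by symmetric (Löwdin) orthogonalization. The function name `extended_huckel` is hypothetical, and the H$_2$ input uses the hydrogen ionization potential of 13.6 eV together with an assumed overlap of 0.6 chosen only for illustration.

```python
import numpy as np


def extended_huckel(ionization_potentials, overlap, K=1.75):
    """Return orbital energies and coefficients from one diagonalization.

    ionization_potentials: I_mu for each valence basis function.
    overlap: overlap matrix S with unit diagonal.
    """
    I = np.asarray(ionization_potentials, dtype=float)
    S = np.asarray(overlap, dtype=float)

    # Off-diagonal elements: F_mu_nu = -1/2 K (I_mu + I_nu) S_mu_nu.
    F = -0.5 * K * (I[:, None] + I[None, :]) * S
    # Diagonal elements: F_mu_mu = -I_mu.
    np.fill_diagonal(F, -I)

    # Symmetric orthogonalization, X = S^(-1/2), turns FC = SC eps
    # into an ordinary eigenvalue problem.
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    eps, C_prime = np.linalg.eigh(X @ F @ X)
    C = X @ C_prime  # back-transform to the original (non-orthogonal) basis
    return eps, C


# H2 with one 1s function per atom; the overlap value 0.6 is an assumption.
eps, C = extended_huckel([13.6, 13.6], [[1.0, 0.6], [0.6, 1.0]])
print("orbital energies (eV):", eps)
```

For this symmetric two-orbital case the energies reduce to $(F_{11} \pm F_{12})/(1 \pm S_{12})$, which gives a quick check on the numerical result.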

This method is typically used to generate guess orbitals for Hartree-Fock theory.

## Density Functional Tight-Binding (DFTB) Method
This is a semiempirical method based on density functional theory. DFTB similarly uses a minimal basis set, neglects all three- and four-center integrals, and only considers valence electrons explicitly. The core-core energy is parameterized in terms of spline functions fitted to all-electron DFT calculations. The valence electronic energy is calculated from the Kohn-Sham Fock matrix, with matrix elements given by:
\begin{align}
    F_{\mu \mu} &= \epsilon_\mu^{\rm atoms} \\
    F_{\mu \nu} &= \left \langle \mu_{\rm A} \left | -\frac{1}{2} \nabla^2 + V_{\rm A} + V_{\rm B} \right | \nu_{\rm B} \right \rangle; \quad A \neq B.
\end{align}
The diagonal elements $\epsilon_\mu^{\rm atoms}$ are the orbital energies of the free atoms and are approximations to the ionization potential, similar to the extended Huckel theory. The off-diagonal elements depend on the effective potentials $V$ of the free atoms, which consist of the attraction to the nuclei and the Coulomb and exchange-correlation interactions. This is a noniterative method with the Fock matrix diagonalized only once. However, it can also be improved using an iterative scheme, just like the extended Huckel method.

## Electronic Energy and Heat of Formation
The energy calculated by semiempirical methods is the energy relative to infinitely separated electrons and nuclei. It is common to convert the electronic energy to a heat of formation by subtracting the electronic energy of the isolated atoms and adding the experimental atomic heat of formation:
\begin{align}
    \Delta H_{\rm f}({\rm molecule}) = E_{\rm elec} ({\rm molecule}) - \sum^{\rm atoms} E_{\rm elec} ({\rm atoms}) + \sum^{\rm atoms} \Delta H_{\rm f} ({\rm atoms}).
\end{align}
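The conversion described in the text (subtract the atomic electronic energies, add the experimental atomic heats of formation) is plain bookkeeping, as the minimal sketch below shows. The function name `heat_of_formation` is hypothetical and the numbers are made up purely to show the arithmetic; they are not data for any real molecule. All quantities are assumed to be in the same energy unit.

```python
def heat_of_formation(e_elec_molecule, e_elec_atoms, dhf_atoms):
    """Convert a molecular electronic energy to a heat of formation.

    e_elec_molecule: electronic energy of the molecule.
    e_elec_atoms: electronic energies of the isolated atoms (one per atom).
    dhf_atoms: experimental atomic heats of formation (one per atom).
    """
    if len(e_elec_atoms) != len(dhf_atoms):
        raise ValueError("need one atomic energy and one atomic heat of formation per atom")
    # Subtract the atomic electronic energies, then add the atomic heats of formation.
    return e_elec_molecule - sum(e_elec_atoms) + sum(dhf_atoms)


# Made-up values for a triatomic molecule, all in the same (arbitrary) unit.
dhf = heat_of_formation(-1000.0, [-500.0, -200.0, -200.0], [30.0, 20.0, 20.0])
print(dhf)
```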

Thermodynamic corrections, such as the zero-point energy, must not be added to the $\Delta H_{\rm f}$ values, as these are included implicitly by parameterization.

## Advantages and Disadvantages of Semiempirical Methods
Semiempirical methods perform best for systems for which experimental data are available. For other systems, the predictions are less accurate and carry greater uncertainty. Compared to classical molecular mechanics methods, semiempirical methods contain fewer parameters, which depend only on atomic or diatomic properties. The drawback is that reparameterizing the model can fail to fix underlying deficiencies in specific problems, such as the rotational barrier in amides.

## Performance of Semiempirical Methods
The following tables (taken from Jensen's Introduction to Computational Chemistry) summarize the performance of semiempirical methods for predicting heats of formation and geometries.

The following table shows the mean absolute deviations for heat of formation in kJ/mol for different data sets:

|Data Set                | Number | MNDO/d | AM1/d | PM3 | PM6 | PM7 |
|------------------------|--------|--------|-------|-----|-----|-----|
|H, C, N, O              | 1141   | 49     | 39    | 23  | 19  | 16  |
|H, C, N, O, F, P, S, Cl | 1572   | 50     | 41    | 26  | 19  | 17  |
|Extended                | 3163   | N/A    | 141   | 122 | 88  | 85  |

The Extended data set includes the above elements plus all the s-block elements, the fourth- and fifth-period p-block elements, and selected other elements. From left to right, the methods generally have a larger number of parameters and are better optimized against experimental data, which is reflected in their performance above. Including the additional s-block and p-block elements significantly decreases the accuracy of the methods.

The following table shows the mean absolute deviations for bond distances in Angstrom:

|Data Set                | Number | MNDO/d | AM1/d |  PM3  |  PM6  |  PM7  |
|------------------------|--------|--------|-------|-------|-------|-------|
|H, C, N, O              | 313    | 0.019  | 0.021 | 0.018 | 0.018 | 0.019 |
|H, C, N, O, F, P, S, Cl | 424    | 0.021  | 0.031 | 0.024 | 0.018 | 0.019 |
|Extended                | 6605   | 0.100  | 0.090 | 0.084 | 0.078 | 0.073 |
|All Periodic Table      | 9118   | 0.095  | 0.090 | 0.084 | 0.084 | 0.080 |

Errors in bond angles are typically in the 5-7$^\circ$ range. Semiempirical methods generally predict reasonable geometries.
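For reference, the mean absolute deviation reported in the tables above is the average of $|x_{\rm predicted} - x_{\rm reference}|$ over a data set. A minimal sketch follows; the function name and the three pairs of values are hypothetical and are not taken from the benchmark data.

```python
def mean_absolute_deviation(predicted, reference):
    """Average absolute difference between predicted and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical heats of formation in kJ/mol (illustrative values only).
predicted = [-70.0, 55.0, -240.0]
reference = [-74.0, 52.0, -242.0]
print(mean_absolute_deviation(predicted, reference))
```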

## Useful Resources

- Cramer, C. J. *Essentials of Computational Chemistry: Theories and Models*, 2nd ed.; John Wiley & Sons: Chichester, England, 2004. (Chapter 5)
- Jensen, F. *Introduction to Computational Chemistry*, 3rd ed.; John Wiley & Sons: Nashville, TN, 2017. (Chapter 7)