# Information on Code and Mathematical Considerations

The data structure is intended to work as follows: all xyz files in a given folder can be processed with the implemented methods. This allows screening of existing databases such as QM7 as well as randomly generated ones. If the mapping function from an xyz file to a representation is known, the first and second derivatives of that representation with respect to properties of chemical space can be calculated.
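
As a minimal sketch of the folder-based workflow described above, the following shows one way to turn every xyz file in a folder into nuclear charges and positions. The names `read_xyz`, `iter_molecules` and `ATOMIC_NUMBERS` are hypothetical and not taken from the repository; the sketch assumes the plain xyz layout (atom count, comment line, then one `symbol x y z` record per atom) and applies no unit conversion.

```python
from pathlib import Path

import numpy as np

# Hypothetical lookup table (an assumption); extend it for the elements in your database.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "S": 16, "Cl": 17}


def read_xyz(path):
    """Parse one xyz file into nuclear charges Z (n,) and positions R (n, 3)."""
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0].split()[0])
    # Line 2 is the comment line; atom records start on line 3.
    records = [line.split() for line in lines[2:2 + n_atoms]]
    Z = np.array([ATOMIC_NUMBERS[rec[0]] for rec in records], dtype=float)
    R = np.array([[float(x) for x in rec[1:4]] for rec in records])
    return Z, R


def iter_molecules(folder):
    """Yield (filename, Z, R) for every *.xyz file in folder, in sorted order."""
    for path in sorted(Path(folder).glob("*.xyz")):
        Z, R = read_xyz(path)
        yield path.name, Z, R
```

Each yielded pair `(Z, R)` can then be passed to any mapping function $f_M$.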

## Code Architecture

**sympy dependent**

* main.py: contains global variables and paths and calls functions
* derivative.py: contains functions to calculate the first and second derivatives of the mapping function $f_M(Z, R, N)$ at a position $k$.
* repro.py: contains mapping functions $f_M$
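
To illustrate what a symbolic derivative of a mapping function looks like, here is a small sympy sketch for two atoms. It is not the code in derivative.py or repro.py; the symbol names are assumptions, and the entries follow the Coulomb matrix definition given under Representations.

```python
import sympy as sp

# Symbols for two atoms (names are illustrative assumptions).
Z1, Z2 = sp.symbols("Z1 Z2", positive=True)
x1, y1, z1, x2, y2, z2 = sp.symbols("x1 y1 z1 x2 y2 z2", real=True)

distance = sp.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)

# Coulomb matrix entries: off-diagonal Z_i Z_j / |R_i - R_j|, diagonal 0.5 * Z_i^2.4.
M_12 = Z1 * Z2 / distance
M_11 = sp.Rational(1, 2) * Z1 ** sp.Rational(12, 5)

# First derivatives with respect to a nuclear charge and a coordinate.
dM12_dZ1 = sp.diff(M_12, Z1)
dM11_dZ1 = sp.diff(M_11, Z1)
dM12_dx1 = sp.diff(M_12, x1)

# Mixed second derivative with respect to both nuclear charges.
d2M12_dZ1dZ2 = sp.diff(M_12, Z1, Z2)
```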

**jax dependent**

* jax_representation.py: contains functions for $f_M$
* jax_derivative.py: sorts the derivatives; the logic is somewhat involved

## Representations

### Sorted Coulomb Matrix

$$M_{ij}(k) = \begin{cases} \frac{1}{2} Z_i^{2.4} & \text{if } i=j\\ \frac{Z_iZ_j}{||\mathbf{R_i}- \mathbf{R_j}||} & \text{if } i \neq j \end{cases} $$


Rows are sorted by their norm: $\sum_j M_{1j}(k)^2 \geq \sum_j M_{2j}(k)^2 \geq \dotso \geq \sum_j M_{nj}(k)^2$
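
The definition and the sorting rule above can be sketched in JAX as follows. The function names `coulomb_matrix` and `sort_by_row_norm` are hypothetical, not the repository's. The sketch assumes that sorting permutes rows and columns together so the matrix stays symmetric, and it adds the identity under the square root purely as a numerical guard against a zero distance on the diagonal.

```python
import jax.numpy as jnp


def coulomb_matrix(Z, R):
    """Unsorted Coulomb matrix for charges Z (n,) and positions R (n, 3)."""
    n = Z.shape[0]
    diff = R[:, None, :] - R[None, :, :]
    # jnp.eye(n) makes the diagonal "distance" 1 instead of 0; those entries
    # are replaced by the diagonal formula below, so only NaNs are avoided.
    dist = jnp.sqrt(jnp.sum(diff ** 2, axis=-1) + jnp.eye(n))
    off_diagonal = jnp.outer(Z, Z) / dist
    diagonal = jnp.diag(0.5 * Z ** 2.4)
    return jnp.where(jnp.eye(n, dtype=bool), diagonal, off_diagonal)


def sort_by_row_norm(M):
    """Reorder atoms so that the squared row norms are non-increasing."""
    row_norms = jnp.sum(M ** 2, axis=1)
    order = jnp.argsort(-row_norms)  # descending order
    # Assumption: apply the same permutation to rows and columns.
    return M[order][:, order], order


# Usage: hydrogen listed first, oxygen second (geometry is illustrative).
Z = jnp.array([1.0, 8.0, 1.0])
R = jnp.array([[0.0, 0.757, 0.587], [0.0, 0.0, 0.0], [0.0, -0.8, 0.5]])
M_sorted, order = sort_by_row_norm(coulomb_matrix(Z, R))
```

After sorting, the oxygen atom, which has the largest row norm, occupies the first row and column.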

All representations using _jax_ are implemented in the _jax_representation.py_ file. A full Coulomb matrix can be generated using the function _CM_trial_, while the single entry at position $(i,j)$ can be retrieved using the _CM_index_ function. The _CM_ev_ function returns the $i$th eigenvalue of the Coulomb matrix; if $i$ is out of bounds, it prints an error message to the console and returns no value.
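
The following sketch illustrates scalar accessors of the kind described above. It does not reproduce _CM_index_ or _CM_ev_; the names `cm_entry` and `cm_eigenvalue` are hypothetical, and the ascending eigenvalue order returned by `jnp.linalg.eigvalsh` is an assumption about the ordering.

```python
import jax.numpy as jnp


def coulomb_matrix(Z, R):
    """Unsorted Coulomb matrix for charges Z (n,) and positions R (n, 3)."""
    n = Z.shape[0]
    diff = R[:, None, :] - R[None, :, :]
    # The identity keeps the diagonal distance at 1 to avoid division by zero.
    dist = jnp.sqrt(jnp.sum(diff ** 2, axis=-1) + jnp.eye(n))
    return jnp.where(jnp.eye(n, dtype=bool),
                     jnp.diag(0.5 * Z ** 2.4),
                     jnp.outer(Z, Z) / dist)


def cm_entry(Z, R, i, j):
    """Return the single matrix entry M_ij as a scalar."""
    return coulomb_matrix(Z, R)[i, j]


def cm_eigenvalue(Z, R, i):
    """Return the i-th eigenvalue (ascending order), or None if out of bounds."""
    n = Z.shape[0]
    if not 0 <= i < n:
        print(f"index {i} is out of bounds for a matrix with {n} eigenvalues")
        return None
    # eigvalsh is appropriate because the Coulomb matrix is symmetric.
    return jnp.linalg.eigvalsh(coulomb_matrix(Z, R))[i]
```

Both functions return scalars, which is what makes them suitable inputs for automatic differentiation.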

The derivative is taken using the _grad_ function from the _jax_ Python package. This dictates the structure of the functions: they must always return scalars. For the sake of a simple, globally applicable implementation, the functions depend on the variables $(\mathbf{Z}, \mathbf{R}, N, ...)$. Here $\mathbf{Z}$ is a vector containing the nuclear charges $Z_i$, $\mathbf{R}$ holds the Cartesian position vectors $\mathbf{r_i} = (x_i,y_i,z_i)^T$, and $N$ is the total number of electrons. By default, _grad_ differentiates with respect to the first argument, which is $\mathbf{Z}$ in our case. The resulting function returns an array with the same dimensions as the variable it was differentiated by; in the case of $\mathbf{Z}$, its elements are
$$ \mathrm{grad}(f_M(\mathbf{Z}, \mathbf{R}, N, ...)) = \left[ \frac{\partial f_M}{\partial Z[0]}, \frac{\partial f_M}{\partial Z[1]}, ...\right] $$
Reconstructing the derivative of the full representation with respect to a single variable therefore requires some reshuffling of values. This is yet to be implemented.
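
A minimal sketch of this behaviour, assuming a hypothetical scalar mapping function `f_M` that returns one Coulomb matrix entry: `jax.grad` with `argnums=0` differentiates by $\mathbf{Z}$ and `argnums=1` by $\mathbf{R}$, and the output takes the shape of the chosen argument. The last lines use `jax.jacfwd` on the full matrix as one possible way to obtain all entries at once; this is a swapped-in technique for illustration, not the reshuffling planned for the repository.

```python
import jax
import jax.numpy as jnp


def coulomb_matrix(Z, R):
    """Unsorted Coulomb matrix for charges Z (n,) and positions R (n, 3)."""
    n = Z.shape[0]
    diff = R[:, None, :] - R[None, :, :]
    # The identity keeps the diagonal distance at 1 to avoid NaN gradients.
    dist = jnp.sqrt(jnp.sum(diff ** 2, axis=-1) + jnp.eye(n))
    return jnp.where(jnp.eye(n, dtype=bool),
                     jnp.diag(0.5 * Z ** 2.4),
                     jnp.outer(Z, Z) / dist)


def f_M(Z, R, N, i, j):
    """Hypothetical scalar mapping function: entry (i, j) of the matrix."""
    return coulomb_matrix(Z, R)[i, j]


Z = jnp.array([8.0, 1.0, 1.0])
R = jnp.array([[0.0, 0.0, 0.0], [0.0, 0.757, 0.587], [0.0, -0.757, 0.587]])
N = 10.0  # total number of electrons; unused by this particular f_M

# Default argnums=0: derivative with respect to Z, same shape as Z.
dM01_dZ = jax.grad(f_M)(Z, R, N, 0, 1)
# argnums=1: derivative with respect to R, same shape as R.
dM01_dR = jax.grad(f_M, argnums=1)(Z, R, N, 0, 1)
# Second derivative with respect to Z, shape (n, n).
d2M01_dZ2 = jax.hessian(f_M, argnums=0)(Z, R, N, 0, 1)

# Alternative (assumption, not the repository's approach): differentiate the
# whole matrix at once. jacfwd returns shape (n, n, n) with the Z index last;
# moving it to the front gives one derivative matrix dM/dZ[k] per atom k.
dM_dZ = jnp.moveaxis(jax.jacfwd(coulomb_matrix, argnums=0)(Z, R), -1, 0)
```

Here `dM_dZ[k]` is the matrix of derivatives of all entries with respect to `Z[k]`, so `dM_dZ[0][0, 1]` agrees with `dM01_dZ[0]`.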

### Gaussian Overlap Matrix

The overlap matrix in the basis chosen here is 5 times larger than the respective Coulomb matrix. This is because for every atom, the 5