# Derivatives

As explained in the backend dev guide, we use `JAX` for automatic differentiation. If `JAX` is not installed, we fall back to finite differences to compute the derivatives. The switch happens in `desc/derivatives.py`:

```python
from desc.backend import use_jax # True if there is JAX installation, False otherwise
Derivative = AutoDiffDerivative if use_jax else FiniteDiffDerivative
```

In practice this switch is rarely visible, because `Objective` classes expose their own derivative methods, such as `jac_scaled`, `jac_scaled_error` and `jvp_scaled_error`.

Let's start with an example: computing the full Jacobian matrix of the `ForceBalance` objective.

```python
import sys
import os

sys.path.insert(0, os.path.abspath("."))
sys.path.append(os.path.abspath("../../../"))
```

```python
from desc.objectives import ObjectiveFunction, ForceBalance
from desc.examples import get
```

```python
# Use the W7-X equilibrium from the examples
eq = get("W7-X")
# Initialize and build the objective
obj = ForceBalance(eq)
obj.build()
```

```
Precomputing transforms
```

```python
params = obj.xs(eq)
params
```

```
({'R_lmn': Array([-7.61552162e-06,  9.16996648e-05,  1.86963797e-05, ...,
          2.19188894e-08, -1.74145484e-06,  0.00000000e+00], dtype=float64),
  'Z_lmn': Array([-2.40502618e-05, -8.58547166e-05, -1.50730988e-05, ...,
          5.67762944e-07, -5.57790527e-07,  8.68739525e-09], dtype=float64),
  'L_lmn': Array([ 2.44528892e-05, -3.26433442e-05,  4.54266096e-05, ...,
          1.96745697e-06,  1.08110518e-06,  9.52033876e-07], dtype=float64),
  'p_l': Array([ 185596.929, -371193.859,  185596.929,       0.   ,       0.   ,
               0.   ,       0.   ], dtype=float64),
  'i_l': Array([-0.85604702, -0.03880954, -0.06867951, -0.01869703,  0.01905612,
          0.        ,  0.        ], dtype=float64),
  'c_l': Array([], shape=(0,), dtype=float64),
  'Psi': Array([-2.133], dtype=float64),
  'Te_l': Array([], shape=(0,), dtype=float64),
  'ne_l': Array([], shape=(0,), dtype=float64),
  'Ti_l': Array([], shape=(0,), dtype=float64),
  'Zeff_l': Array([], shape=(0,), dtype=float64),
```

```python
(J,) = obj.jac_scaled(*params)
total = 0  # named "total" to avoid shadowing the built-in sum()
print("The portion of the Jacobian for")
for key in J.keys():
    print(f"\t{key:10} has shape {J[key].shape}")
    total += J[key].shape[1]
print("Total number of parameters that we took the derivative for is", total)
```

```
The portion of the Jacobian for
	G          has shape (5346, 0)
	I          has shape (5346, 0)
	L_lmn      has shape (5346, 1134)
	Phi_mn     has shape (5346, 0)
	Psi        has shape (5346, 1)
	R_lmn      has shape (5346, 1141)
	Ra_n       has shape (5346, 13)
	Rb_lmn     has shape (5346, 313)
	Te_l       has shape (5346, 0)
	Ti_l       has shape (5346, 0)
	Z_lmn      has shape (5346, 1134)
	Za_n       has shape (5346, 12)
	Zb_lmn     has shape (5346, 312)
	Zeff_l     has shape (5346, 0)
	a_lmn      has shape (5346, 0)
	c_l        has shape (5346, 0)
	i_l        has shape (5346, 7)
	ne_l       has shape (5346, 0)
	p_l        has shape (5346, 7)
Total number of parameters that we took the derivative for is 4074
```


Alternatively, we can pass the equilibrium's parameter dictionary directly:

```python
(J,) = obj.jac_scaled(eq.params_dict)
J["R_lmn"].shape
```

```
(5346, 1141)
```

Taking the Jacobian this way is useful for investigating the effect of individual parameters. If you instead want a single Jacobian matrix, the proper way is to wrap the `Objective` in an `ObjectiveFunction`:

```python
objfun = ObjectiveFunction(ForceBalance(eq))
objfun.build()
J = objfun.jac_scaled(objfun.x(eq))
J.shape
```

```
Building objective: force
Precomputing transforms

(5346, 4074)
```

The 4074 columns match the total from the per-parameter blocks above: concatenating those blocks along the column axis, in the order the parameters appear in the state vector, gives the same Jacobian matrix.
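
The relationship between per-parameter blocks and a single matrix can be sketched with plain JAX. This is a toy, not DESC code: `params` and `residual` are hypothetical, and the column order comes from `jax.flatten_util.ravel_pytree` (which orders dictionary entries by sorted key). That order need not match the one DESC's `ObjectiveFunction` uses.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

jax.config.update("jax_enable_x64", True)

# Hypothetical toy parameters, loosely mimicking a params dict such as eq.params_dict
params = {
    "R_lmn": jnp.array([1.0, 0.2, -0.1]),
    "Z_lmn": jnp.array([0.3, 0.05]),
    "Psi": jnp.array([-2.0]),
}


def residual(p):
    # Hypothetical residual with 4 outputs that depends on every parameter
    r, z, psi = p["R_lmn"], p["Z_lmn"], p["Psi"]
    return jnp.array([
        r[0] * z[0] + psi[0],
        r[1] ** 2 - z[1],
        jnp.sin(r[2]) * psi[0],
        jnp.sum(r) + jnp.sum(z ** 2),
    ])


# 1) Per-parameter blocks: jacfwd over a dict returns a dict of (4, n_key) arrays
J_blocks = jax.jacfwd(residual)(params)

# 2) One matrix: flatten the dict into a single state vector first
x, unravel = ravel_pytree(params)
J_full = jax.jacfwd(lambda v: residual(unravel(v)))(x)

# ravel_pytree orders leaves by sorted dict key, so concatenate blocks in that order
J_cat = jnp.concatenate([J_blocks[k] for k in sorted(params)], axis=1)
print(J_full.shape, jnp.allclose(J_full, J_cat))
```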

Throughout the code you will see many methods with the prefixes `compute_`, `jac_`, `vjp_` and `jvp_`. They are variations of the same underlying computation that differ in whether scaling, normalization, and targets or bounds are applied. Here is a brief summary:

| **Function**           | **Purpose**                                                                                                           | **Full Jacobian / gradient** |
|------------------------|-----------------------------------------------------------------------------------------------------------------------|------------------------------|
| `compute`              | Main method; computes the raw objective values.                                                                       |                              |
| `compute_unscaled`     | Computes the raw value of the objective, optionally applying a loss function.                                        | `jac_unscaled`               |
| `compute_scaled`       | Computes the objective with weighting and normalization applied.                                                      | `jac_scaled`                 |
| `compute_scaled_error` | Like `compute_scaled`, but also subtracts the target (or applies the bounds) before weighting and normalization.      | `jac_scaled_error`           |
| `compute_scalar`       | Computes the scalar value $\frac{1}{2}\lVert\mathbf{f}\rVert^2$, where $\mathbf{f}$ is the output of `compute_scaled_error`. | `grad`                       |
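
The relationships between these variants can be illustrated with a toy sketch. The formulas are a simplified assumption (a scalar `target`, `weight` and `normalization`, no loss function, no bounds), and the functions are hypothetical stand-ins, not DESC's implementation. The sketch also checks that the gradient of the scalar $\frac{1}{2}\lVert\mathbf{f}\rVert^2$ equals $J^T\mathbf{f}$, where $J$ is the Jacobian of the scaled error.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

# Hypothetical scaling constants (assumption; not DESC's actual attributes)
target, weight, normalization = 0.5, 2.0, 4.0


def compute(x):
    # Raw objective values for a toy problem
    return jnp.array([x[0] * x[1], x[1] ** 2, jnp.exp(x[0])])


def compute_scaled(x):
    # Weighting and normalization, no target
    return weight * compute(x) / normalization


def compute_scaled_error(x):
    # Target subtracted before weighting and normalization
    return weight * (compute(x) - target) / normalization


def compute_scalar(x):
    f = compute_scaled_error(x)
    return 0.5 * jnp.sum(f ** 2)


x = jnp.array([0.3, -1.2])
jac_scaled = jax.jacfwd(compute_scaled)(x)
jac_scaled_error = jax.jacfwd(compute_scaled_error)(x)
grad_scalar = jax.grad(compute_scalar)(x)

# grad of 0.5 * ||f||^2 is J^T f
print(jnp.allclose(grad_scalar, jac_scaled_error.T @ compute_scaled_error(x)))
```

With a fixed target, `jac_scaled` and `jac_scaled_error` agree because subtracting a constant does not change the derivative. This need not hold once bounds make the error piecewise.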


The `jvp_` and `vjp_` methods compute derivatives along given directions. The names stand for Jacobian-vector product and vector-Jacobian product: a JVP returns $J\mathbf{v}$ and a VJP returns $\mathbf{u}^T J$. Computing one of these directly is more efficient than forming the full Jacobian and then extracting the column or row you need. In fact, if you look at the implementation of the `jac_` methods, you will see that they take a JVP along every unit direction to form the full Jacobian:

```python
@jit
def jac_scaled_error(self, x, constants=None):
    """Compute Jacobian matrix of self.compute_scaled_error wrt x."""
    v = jnp.eye(x.shape[0])
    return self.jvp_scaled_error(v, x, constants).T
```
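
Here is a standalone JAX sketch of the same ideas, with a hypothetical toy function `f` standing in for an objective. `jax.jvp` returns $J\mathbf{v}$ in one forward pass (a column of $J$ when $\mathbf{v}$ is a unit vector), and `jax.vjp` returns $\mathbf{u}^T J$ in one reverse pass (a row when $\mathbf{u}$ is a unit vector). Mapping `jax.jvp` over the rows of an identity matrix with `jax.vmap` rebuilds the full Jacobian, analogous to the `.T` in `jac_scaled_error` above. How DESC batches its own JVPs internally is not shown.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)


def f(x):
    # Hypothetical toy map R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[2]) + x[1] ** 2])


x = jnp.array([1.0, 2.0, 0.5])
J = jax.jacfwd(f)(x)  # reference Jacobian, shape (2, 3)

# JVP: one forward pass gives J @ v (here the second column of J)
v = jnp.array([0.0, 1.0, 0.0])
_, Jv = jax.jvp(f, (x,), (v,))

# VJP: one reverse pass gives u @ J (here the first row of J)
u = jnp.array([1.0, 0.0])
_, f_vjp = jax.vjp(f, x)
(uJ,) = f_vjp(u)

# Full Jacobian from JVPs along every unit direction, batched with vmap.
# Stacking one output per tangent gives shape (3, 2), i.e. J^T, hence the .T
eye = jnp.eye(x.shape[0])
J_from_jvp = jax.vmap(lambda t: jax.jvp(f, (x,), (t,))[1])(eye).T

print(jnp.allclose(Jv, J[:, 1]), jnp.allclose(uJ, J[0]), jnp.allclose(J_from_jvp, J))
```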

Here, `v` is the identity matrix, whose rows are the tangent vectors along each unit direction. In practice, though, we rarely need the full Jacobian with respect to the full state vector. For example, `LinearConstraintProjection` reduces the number of parameters so that the optimizer only works in the null space of the constraint matrix, while our `compute` functions still take the full state vector. How do we take derivatives in that case? The answer is a bit of linear algebra. Consider the following problem:

$$ \min_{\mathbf{x}} \mathbf{f}(\mathbf{x}) $$
$$ \text{subject to } x_1 = x_2, $$
$$ \mathbf{x} = [x_1, x_2, x_3, x_4]. $$

Since the constraint ties $x_1$ to $x_2$, the reduced state vector has only 3 parameters, $\mathbf{y} = [y_1, y_2, y_3]$, with $y_1 = x_1 = x_2$, $y_2 = x_3$ and $y_3 = x_4$.
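
Written out as a worked equation, with $x_3$ and $x_4$ left free, the reduced parameters map linearly to the full state, and the chain rule gives the reduced derivative. Here $N$ is a name introduced only for this illustration; its columns span the null space of the constraint $x_1 - x_2 = 0$ but, unlike the orthonormal null-space matrix returned by `svd_inv_null` further down, they are not normalized.

$$
\mathbf{x}(\mathbf{y}) = \begin{bmatrix} y_1 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = N\mathbf{y},
\qquad
N = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
$$
$$
\frac{\partial \mathbf{f}}{\partial \mathbf{y}} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} N
\quad\Longrightarrow\quad
\frac{\partial \mathbf{f}}{\partial y_1} = \frac{\partial \mathbf{f}}{\partial x_1} + \frac{\partial \mathbf{f}}{\partial x_2}.
$$

Since $[1, -1, 0, 0]\,N = \mathbf{0}$, every $\mathbf{x} = N\mathbf{y}$ satisfies the constraint.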

Taking the derivative of $\mathbf{f}$ with respect to $y_2$ or $y_3$ is straightforward, since each of them moves a single entry of $\mathbf{x}$. A change in $y_1$, however, moves both $x_1$ and $x_2$, so the derivative has to be taken along both directions at once. In this simple example it was easy to decide which parameters are free and which are dependent. For more complex linear constraints, a more systematic approach is to use a null-space matrix $Z$, whose columns span the null space of the constraint matrix. To take the derivative in the $y_1$ direction, the tangent vector in the reduced space is $\mathbf{v}_r = [1, 0, 0]$, and the corresponding tangent vector in the full space is $Z\mathbf{v}_r$. We have a handy utility function, `svd_inv_null`, that computes both the pseudo-inverse and the null space of a matrix. Here is how we get the full tangent direction for the simple example:

```python
import numpy as np
from desc.utils import svd_inv_null

A = np.array([[1.0, -1.0, 0.0, 0.0]])
Ainv, Z = svd_inv_null(A)
vr = np.array([1, 0, 0])
print("Full tangent: ", Z @ vr)
print("Null-space:\n", Z)
```

```
Full tangent:  [0.70710678 0.70710678 0.         0.        ]
Null-space:
 [[0.70710678 0.         0.        ]
 [0.70710678 0.         0.        ]
 [0.         1.         0.        ]
 [0.         0.         1.        ]]
```
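
The same idea carries over to derivatives: push the reduced tangent $Z\mathbf{v}_r$ through a full-space JVP or, for all reduced directions at once, multiply the full Jacobian by $Z$. The sketch below is standalone and assumes a hypothetical objective `f`. It builds $Z$ with `numpy.linalg.svd` as a stand-in for `svd_inv_null`, so its basis may differ from the one printed above by a rotation or sign; any orthonormal basis of the null space works.

```python
import jax
import jax.numpy as jnp
import numpy as np

jax.config.update("jax_enable_x64", True)

# Constraint x1 = x2 written as A @ x = 0
A = np.array([[1.0, -1.0, 0.0, 0.0]])

# Null space from the SVD: the rows of Vt beyond rank(A) span null(A).
# (Stand-in for desc.utils.svd_inv_null; the basis may differ.)
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
Z = jnp.asarray(Vt[rank:].T)  # shape (4, 3), orthonormal columns


def f(x):
    # Hypothetical objective acting on the full state vector
    return jnp.array([x[0] * x[1] + x[2], x[3] ** 2 - x[0]])


x = jnp.array([0.7, 0.7, -0.3, 1.5])  # a point satisfying x1 = x2

# Derivative along one reduced direction: a full-space JVP with tangent Z @ v_r
vr = jnp.array([1.0, 0.0, 0.0])
_, df_dy1 = jax.jvp(f, (x,), (Z @ vr,))

# Whole reduced Jacobian via the chain rule: df/dy = (df/dx) @ Z
J_reduced = jax.jacfwd(f)(x) @ Z
print(jnp.allclose(df_dy1, J_reduced[:, 0]))
```

Because $Z$ has orthonormal columns, a unit step in $y_1$ in the output above moves $x_1$ and $x_2$ by $1/\sqrt{2}$ each. This $y_1$ is therefore a rescaled version of the informal $y_1 = x_1 = x_2$; the direction, and hence the subspace being optimized over, is the same.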
