# Differentiable Automata

import gtn
import nb_utils
nb_utils.init()

Many operations used with vectors, matrices, and tensors are differentiable: we can compute the change in any output element with respect to an infinitesimal change in any input element. For example, consider a vector $\mathbf{z} = f(\mathbf{x}, \mathbf{y})$, the output of a function of two vectors $\mathbf{x}$ and $\mathbf{y}$. The Jacobian of $\mathbf{z}$ with respect to $\mathbf{x}$ is the matrix of partial derivatives with entries $\frac{\partial \mathbf{z}_i}{\partial \mathbf{x}_j}$. The gradient is defined as the tensor of partial derivatives of a scalar function. So if $f(\mathbf{x}) \in \mathbb{R}$ is a scalar function of $\mathbf{x} \in \mathbb{R}^n$, then the gradient is

$$
\nabla f(\mathbf{x}) = \left[ \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}_{1}}, \ldots, \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}_{n}}  \right].
$$
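As a quick numerical illustration of the definition above, the sketch below approximates each entry of the gradient with a central finite difference. The helper `numerical_gradient`, the step size `eps`, and the test function `f` (a sum of squares, whose exact gradient is $2\mathbf{x}$) are illustrative choices and are not part of `gtn`.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of a scalar function f at the point x.

    Each partial derivative is estimated with a central difference:
    (f(x + eps * e_i) - f(x - eps * e_i)) / (2 * eps).
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def f(x):
    # Illustrative scalar function: f(x) = sum_i x_i^2, so df/dx_i = 2 * x_i.
    return sum(v * v for v in x)


x = [1.0, -2.0, 3.0]
grad = numerical_gradient(f, x)
print(grad)  # each entry should be close to 2 * x_i
```

Finite differences are only a sanity check here. In practice the partial derivatives are computed analytically, which is both exact and cheaper.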

In the same way, we can compute partial derivatives of the arc weights of the output graph of an operation with respect to the arc weights of any of its input graphs. Take the concatenation operation as an example. Suppose we are given two graphs, $A$ and $B$, and we construct the concatenated graph $C = AB$ as in the figure below.

<div class="figure">
  <div class="img">
    <img src="figures/concat_grad_A.svg" style="width:150px;"/>
  </div>
  <div class="img">
    <img src="figures/concat_grad_B.svg" style="width:150px;"/> 
  </div>
  <div class="img">
    <img src="figures/concat_grad_C.svg" style="width:340px;"/> 
  </div>
  <div class="caption" markdown="span">
     The concatenation of the graphs $A$ (left) and $B$ (middle) produces $C$ (right). For each graph the arc weights are shown as variables on the edges.
  </div>
</div>

For each arc weight $C_i$ in the concatenated graph $C$, we can compute the partial derivative with respect to the input arc weights. Every arc in $C$ either has a corresponding arc in $A$ or $B$ from which it gets its weight, or it has a weight of zero. The partial derivative of an output arc weight $C_i$ with respect to an input arc weight $A_j$ or $B_j$ is $1$ if the two arcs correspond and $0$ otherwise. For example, in the above graphs we have

$$
\frac{\partial C_1}{\partial A_1} = 1, \quad \frac{\partial C_2}{\partial A_2} = 1, \quad \frac{\partial C_4}{\partial B_1} = 1, \quad\textrm{and}\quad \frac{\partial C_5}{\partial B_2} = 1.
$$

The remaining partial derivatives are all $0$. For example,

$$
\frac{\partial C_1}{\partial A_2} = 0, \quad \frac{\partial C_1}{\partial B_1} = 0, \quad \textrm{and} \quad \frac{\partial C_1}{\partial B_2} = 0.
$$
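The full set of partial derivatives can be collected into a Jacobian with one row per output arc and one column per input arc. The sketch below is a toy model rather than the `gtn` implementation: it assumes, consistent with the derivatives listed above, that the arcs of $C$ are ordered as the two arcs of $A$, then one zero-weight connecting arc ($C_3$), then the two arcs of $B$. The functions `concat_weights` and `jacobian` and the numeric weights are hypothetical.

```python
def concat_weights(a_weights, b_weights):
    # Toy model of concatenation acting on arc weights only (assumed ordering):
    # arcs copied from A, one zero-weight connecting arc, arcs copied from B.
    return list(a_weights) + [0.0] + list(b_weights)


def jacobian(op, inputs, eps=1e-6):
    """Central-difference Jacobian of op, with entry [i][j] = d out_i / d in_j."""
    num_outputs = len(op(inputs))
    jac = [[0.0] * len(inputs) for _ in range(num_outputs)]
    for j in range(len(inputs)):
        plus = list(inputs)
        minus = list(inputs)
        plus[j] += eps
        minus[j] -= eps
        out_plus, out_minus = op(plus), op(minus)
        for i in range(num_outputs):
            jac[i][j] = round((out_plus[i] - out_minus[i]) / (2 * eps), 6)
    return jac


# Flattened inputs [A_1, A_2, B_1, B_2] with arbitrary example values.
weights = [0.5, 1.5, 2.0, 3.0]
jac = jacobian(lambda w: concat_weights(w[:2], w[2:]), weights)

# Rows are C_1..C_5, columns are A_1, A_2, B_1, B_2.
for row in jac:
    print(row)
```

Each row contains at most a single $1$, and the row for the connecting arc is all zeros, since its weight does not depend on any input.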

---

### Example

Compute the partial derivatives of the arc weights of the closure of the graph $A$ with respect to the input arc weights.

<div class="figure">
  <div class="img">
    <img src="figures/closure_grad.svg"/>
  </div>
  <div class="caption" markdown="span">
    The closure of the graph $A$ is given above. The weights are denoted by the variables $w_i$.
  </div>
</div>

The non-zero partial derivatives are:

$$
\frac{\partial w_2}{\partial A_1} = 1 \quad \textrm{and} \quad \frac{\partial w_3}{\partial A_2} = 1.
$$

The remaining partial derivatives are zero:

$$
\frac{\partial w_1}{\partial A_1} = 0, \quad \frac{\partial w_1}{\partial A_2} = 0, \quad \frac{\partial w_2}{\partial A_2} = 0, \quad \frac{\partial w_3}{\partial A_1} = 0, \quad \frac{\partial w_4}{\partial A_1} = 0, \quad \textrm{and} \quad \frac{\partial w_4}{\partial A_2} = 0.
$$
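Because every output arc either copies one input arc or is independent of the inputs, the Jacobian of the closure can be written down directly from the arc correspondence. The sketch below assumes, based on the derivatives listed in this example, that $w_2$ copies $A_1$, $w_3$ copies $A_2$, and $w_1$ and $w_4$ do not depend on the inputs. The list `closure_sources` and the function `backward` are hypothetical bookkeeping for illustration and do not reflect how `gtn` stores this information.

```python
# For each output arc w_1..w_4: the index of the input arc it copies,
# or None if its weight does not depend on any input arc (assumed mapping).
closure_sources = [None, 0, 1, None]
num_inputs = 2  # A_1 and A_2

# Jacobian entry [i][j] is 1 when output arc i copies input arc j, else 0.
jacobian = [
    [1.0 if src == j else 0.0 for j in range(num_inputs)]
    for src in closure_sources
]


def backward(output_grads, sources, num_inputs):
    """Push gradients on the output arcs back onto the input arcs.

    This is the product of the output gradient with the 0/1 Jacobian,
    computed without materializing the Jacobian.
    """
    input_grads = [0.0] * num_inputs
    for grad, src in zip(output_grads, sources):
        if src is not None:
            input_grads[src] += grad
    return input_grads


# Arbitrary example gradients with respect to w_1..w_4.
input_grads = backward([0.5, 2.0, -1.0, 4.0], closure_sources, num_inputs)
print(jacobian)
print(input_grads)
```

The gradients attached to $w_1$ and $w_4$ are dropped, while those attached to $w_2$ and $w_3$ pass through unchanged to $A_1$ and $A_2$.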

---