# Generating gradients with respect to model inputs

Before we start saving our models to load them into a simulation framework, we need the gradients with respect to the inputs. Implicit simulation algorithms need to generate tangent matrices so that they can solve nonlinear equations by repeatedly solving linearized systems of the form
\begin{equation}
f(u) + \underline{\frac{\partial f}{\partial u}} \Delta u = 0
\end{equation}
where, for our application, the primary variables are $u$, the model is $f$, and the tangent we need is $\partial f / \partial u$. For standard theoretical equations that give the form of $f$, Popcorn uses symbolic differentiation to generate the tangent code. TensorFlow computes gradients itself by automatic differentiation, but we need the gradients with respect to the inputs of the model, not with respect to the parameters of the model as we did during training.
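To make the role of the tangent concrete, here is a minimal sketch of the iteration that the linearized equation implies: solve for $\Delta u$, update $u$, and repeat until $f(u)$ is small. The two-equation `residual`, its hand-derived `tangent`, and the helper names are all hypothetical and chosen only for illustration; they are not part of Popcorn or of the model in this notebook.

```python
def residual(u):
    """Hypothetical two-equation model f(u) whose root we want."""
    x, y = u
    return [x * x + y * y - 4.0, x - y]


def tangent(u):
    """Tangent matrix df/du of the residual above, derived by hand."""
    x, y = u
    return [[2.0 * x, 2.0 * y],
            [1.0, -1.0]]


def solve_2x2(a, b):
    """Solve the 2x2 linear system a * d = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    d0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    d1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [d0, d1]


def newton(u, tol=1e-12, max_iter=20):
    """Repeat: solve (df/du) du = -f(u), then update u <- u + du."""
    for _ in range(max_iter):
        f = residual(u)
        if max(abs(v) for v in f) < tol:
            break
        du = solve_2x2(tangent(u), [-v for v in f])
        u = [u[0] + du[0], u[1] + du[1]]
    return u


root = newton([1.0, 0.5])
print(root)  # both components should approach sqrt(2)
```

In a simulation code the tangent would come from the model itself rather than from a hand derivation, which is why we need TensorFlow to hand us $\partial f / \partial u$.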

In [2]:
import numpy as np
import tensorflow as tf

Let's start off by loading the prototypical model we've been examining:

In [3]:
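# Create a fresh graph and an interactive session bound to it, so that
# Tensor.eval() can be called without passing the session explicitly.
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
# Rebuild the saved graph structure, then restore the trained variable values.
saver = tf.train.import_meta_graph('savemymodel.meta')
saver.restore(sess, 'savemymodel')
# Tensor names have the form '<op name>:<output index>'.
tf_x = graph.get_tensor_by_name('THEINPUT:0')
tf_y = graph.get_tensor_by_name('THEMODEL:0')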

INFO:tensorflow:Restoring parameters from savemymodel


We have grabbed the input and output tensors. Next we grab the operation node that produces the output.

In [6]:
outp = graph.get_operation_by_name('THEMODEL')

We have to take gradients with respect to the tensors, not the ops:

In [7]:
# tf.gradients returns a list with one gradient tensor per entry in xs.
grad = tf.gradients(tf_y, tf_x)
g = grad[0]
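It may help to spell out what `tf.gradients` computes here. `tf.gradients(ys, xs)` differentiates the sum of all entries of `ys`, so `g` has the same shape as `tf_x`. Suppose we read the graph as a linear model $y = xW + b$ with $x$ of shape $(N, 2)$ and $W$ of shape $(2, 1)$. This is an assumption based on the `MatMul` and `Add` ops in the graph listings later in this section, and the symbols $W$, $b$ and the indices are introduced here only for illustration. Then
\begin{equation}
g_{ij} = \frac{\partial}{\partial x_{ij}} \sum_{k=1}^{N} y_k
       = \frac{\partial}{\partial x_{ij}} \sum_{k=1}^{N} \left( \sum_{l=1}^{2} x_{kl} W_{l} + b \right)
       = W_{j}
\end{equation}
Every row of $g$ is therefore the same pair $(W_1, W_2)$, independent of $x$. Because each output row depends only on its own input row, the sum over $k$ does not mix samples, and row $i$ of $g$ is the per-sample tangent $\partial y_i / \partial x_i$.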

Now let's check out the gradient and see if it works. Because we only trained a linear fit to the data to keep things small, the gradient should be constant.

In [8]:
g.eval(feed_dict={tf_x: np.array([[9.0, 2.0], [8.0, 2.0], [8.0, 1.0]])})

array([[ 0.17217076, -0.2723155 ],
       [ 0.17217076, -0.2723155 ],
       [ 0.17217076, -0.2723155 ]], dtype=float32)
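As an independent cross-check on this output, a central finite difference on a plain-Python linear model should recover the same row at every input point. This is only a sketch: the weights are copied from the printed gradient, the bias value is made up because the trained one is not shown in this section, and `model` and `fd_gradient` are hypothetical helper names.

```python
# Weights copied from the gradient output above; the bias is a made-up
# placeholder, since its trained value is not shown in this section.
W = [0.17217076, -0.2723155]
B = 0.5  # hypothetical value


def model(x):
    """Linear model y = x . W + B for a single two-component input x."""
    return x[0] * W[0] + x[1] * W[1] + B


def fd_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx at the point x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


# Same three input points that were fed to g.eval above.
points = [[9.0, 2.0], [8.0, 2.0], [8.0, 1.0]]
grads = [fd_gradient(model, x) for x in points]
for x, gx in zip(points, grads):
    print(x, gx)
```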

Looks right! Those should just be the values of the two weights of the linear fit, repeated for every input row. Now let's look at what the gradient graph actually looks like.

In [9]:
from afqstensorutils import travel_op
travel_op(g.op)

gradients/MatMul_grad/MatMul_2  :  MatMul
    gradients/THEMODEL_grad/Reshape_2  :  Reshape
        gradients/THEMODEL_grad/Sum_2  :  Sum
            gradients/Fill_1  :  Fill
                gradients/Shape_1  :  Shape
                    THEMODEL  :  Add
                        MatMul  :  MatMul
                            THEINPUT  :  Placeholder
                            Reshape  :  Reshape
                                strided_slice  :  StridedSlice
                                    Variable/read  :  Identity
                                        Variable  :  VariableV2
                                    strided_slice/stack  :  Const
                                    strided_slice/stack_2  :  Pack
                                        strided_slice/stack_2/values_0  :  Const
                                    strided_slice/stack_3  :  Const
                                Reshape/shape  :  Const
                        strided_slice_1  :  StridedSlice
               

Oy! That looks awfully big for what should be a single invocation of the Variable. Let's compare it to the size of the original graph:

In [29]:
travel_op(tf_y.op)

 THEMODEL  :  Add
    MatMul  :  MatMul
        THEINPUT  :  Placeholder
        Reshape  :  Reshape
            strided_slice  :  StridedSlice
                Variable/read  :  Identity
                    Variable  :  VariableV2
                strided_slice/stack  :  Const
                strided_slice/stack_2  :  Pack
                    strided_slice/stack_2/values_0  :  Const
                strided_slice/stack_3  :  Const
            Reshape/shape  :  Const
    strided_slice_1  :  StridedSlice
        Variable/read  :  Identity
            Variable  :  VariableV2
        strided_slice_1/stack  :  Const
        strided_slice_1/stack_1  :  Const
        strided_slice_1/stack_2  :  Const
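Both listings above come from `travel_op`, which is imported from `afqstensorutils`; its source is not shown here, so the following is not its actual implementation. It is a minimal sketch of how such a traversal could be written, assuming only that each op exposes `.name`, `.type`, and `.inputs`, and that each input exposes `.op`, as `tf.Operation` and `tf.Tensor` do in TensorFlow 1.x. The `FakeOp` and `FakeTensor` classes and the names `w_slice` and `b_slice` are hypothetical stand-ins so the example runs without TensorFlow. A second helper counts distinct ops, which gives a single number for comparing graph sizes.

```python
class FakeTensor:
    """Stand-in for tf.Tensor: only knows which op produced it."""
    def __init__(self, op):
        self.op = op


class FakeOp:
    """Stand-in for tf.Operation with the three attributes the walk needs."""
    def __init__(self, name, type_, inputs=()):
        self.name = name
        self.type = type_
        self.inputs = [FakeTensor(op) for op in inputs]


def travel_op_sketch(op, depth=0):
    """Return indented 'name  :  type' lines for op and everything upstream."""
    lines = ["    " * depth + op.name + "  :  " + op.type]
    for tensor in op.inputs:
        lines.extend(travel_op_sketch(tensor.op, depth + 1))
    return lines


def count_unique_ops(op, seen=None):
    """Count distinct ops reachable from op through its data inputs."""
    seen = set() if seen is None else seen
    if op.name not in seen:
        seen.add(op.name)
        for tensor in op.inputs:
            count_unique_ops(tensor.op, seen)
    return len(seen)


# Tiny stand-in graph: y = MatMul(x, w) + b, with w and b sliced from one variable.
var = FakeOp("Variable", "VariableV2")
x = FakeOp("THEINPUT", "Placeholder")
w = FakeOp("w_slice", "StridedSlice", [var])
b = FakeOp("b_slice", "StridedSlice", [var])
y = FakeOp("THEMODEL", "Add", [FakeOp("MatMul", "MatMul", [x, w]), b])

print("\n".join(travel_op_sketch(y)))
print("unique ops:", count_unique_ops(y))
```

The sketch follows data inputs only and ignores control dependencies. It prints shared ops such as `Variable` once per path that reaches them, which is why the line count of a listing can exceed the number of distinct ops.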


The original model had a really small graph, but taking the gradient expands it. This makes sense in the general case, where the gradient depends on the inputs. However, we're going to want the ability to simplify things ahead of time. For example, in this instance, we know that there is no actual computation to be done because the gradient is a constant. Loading the graphs and using the C API won't be viable in the long term without figuring out how to do ahead-of-time simplifications. [Fortunately, TensorFlow now has an AOT compiler.](https://www.tensorflow.org/performance/xla/tfcompile) Our next step will be figuring out how to use that instead of the C API wrapper we wrote and seeing what the performance increase is.