# Approximating functions with deep ReLU networks
*Practical session during the [DAEDALUS Introductory Intensive Course](https://daedalus-berlin.github.io/events.html), December 2018.*
The content is mostly based on [D. Yarotsky, 2017](https://www.sciencedirect.com/science/article/pii/S0893608017301545).

## Part II: Getting started with Tensorflow

In this part we introduce the basic concepts of Tensorflow. This is not yet related to the approximation-theoretic results.

In [1]:
import numpy as np
import tensorflow as tf

At the core of Tensorflow are the **tensors**, which can be thought of as multi-dimensional data containers.
* A 1D tensor represents, for example, a **vector** or a collection of scalars.
* A 2D tensor represents, for example, a **matrix** or a collection of vectors.
* And so on...

The **shape** of a tensor is the number of components in each of its dimensions, for example a tensor representing a $5\times 4$ matrix has shape $[5, 4]$.
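
The same notion of shape exists for NumPy arrays, so it can be explored without Tensorflow. The following is a minimal sketch; the array names are chosen for illustration only.

```python
import numpy as np

scalar = np.array(5.0)          # 0D: a single number, shape ()
vector = np.zeros(3)            # 1D: three components, shape (3,)
matrix = np.zeros((5, 4))       # 2D: a 5 x 4 matrix, shape (5, 4)
batch = np.zeros((10, 5, 4))    # 3D: a collection of ten 5 x 4 matrices

for name, array in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("batch", batch)]:
    # ndim is the number of dimensions, shape the size along each of them
    print(name, array.ndim, array.shape)
```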

There are several types of tensors in Tensorflow:
* **constant** tensors (these have defined shape and values and are not changed after creation)
* **variable** tensors (these have defined shape and initial values but can be changed after creation, for example used for the weights of a neural network)
* **placeholder** tensors (these only have a shape but do not contain any data (yet) and are used to serve as placeholders in computations where the data is provided later, for example network inputs)

In [2]:
# create a constant scalar (0D) tensor with value 5
constant_scalar = tf.constant(5.0)
print(constant_scalar)

Tensor("Const:0", shape=(), dtype=float32)


In [3]:
# create a constant matrix (2D) tensor with random values
constant_matrix = tf.constant(np.random.randn(3,2))
print(constant_matrix)

Tensor("Const_1:0", shape=(3, 2), dtype=float64)


During the creation of a tensor its **data type** can be specified by providing the `dtype` argument. For example, we can create floating point tensors using 32-bit single precision instead of 64-bit double precision arithmetic; single precision is accurate enough for many applications and reduces computational cost. We can also give **names** to tensors to make it easier to identify them later.
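
The trade-off between the two precisions can be checked directly in NumPy, whose `randn` returns double precision values (which is why the constant matrix above has `dtype=float64`). This is only a sketch of the memory and rounding behaviour, not a statement about Tensorflow internals.

```python
import numpy as np

values = np.random.randn(3, 2)          # NumPy defaults to float64
single = values.astype(np.float32)      # same shape, half the memory

print(values.dtype, values.nbytes)      # 6 entries of 8 bytes each
print(single.dtype, single.nbytes)      # 6 entries of 4 bytes each

# Rounding error introduced by converting to single precision
max_error = np.max(np.abs(values - single.astype(np.float64)))
print(max_error)
```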

In [4]:
# create a variable matrix (2D) tensor with random initial values
variable_matrix = tf.Variable(np.random.randn(3,2), dtype=tf.float32, name='my_matrix')
print(variable_matrix)

variable_vector = tf.Variable(np.random.randn(3), dtype=tf.float32, name='my_vector')
print(variable_vector)

<tf.Variable 'my_matrix:0' shape=(3, 2) dtype=float32_ref>
<tf.Variable 'my_vector:0' shape=(3,) dtype=float32_ref>


Tensors can be transformed into other tensors by so-called **operations** (ops). For example, we can define another tensor, say an input placeholder, and compute a new tensor using the **matrix multiplication** and **vector addition** operations.

However, note that **matrix multiplication** requires two matrices, so we have to **reshape** the vector tensor into a matrix with a single column. After the multiplication we can reshape the result back into a vector.

Finally we can apply a componentwise non-linear function, for example the ReLU. We have then coded our first neural network layer.
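
The arithmetic of such a layer can be traced by hand in NumPy. The sketch below uses hypothetical fixed values for the matrix `W`, the vector `b` and the input `x` (they are not the random values used in this notebook), so that the result is easy to verify.

```python
import numpy as np

def relu(z):
    """Componentwise ReLU: max(0, z)."""
    return np.maximum(0.0, z)

# Hypothetical fixed values, chosen so the result can be checked by hand
W = np.array([[1.0, 0.0],
              [0.0, -1.0],
              [2.0, 1.0]])           # shape (3, 2)
b = np.array([0.5, 0.0, -1.0])       # shape (3,)
x = np.array([1.0, 2.0])             # shape (2,)

column = x.reshape(2, 1)             # vector -> matrix with a single column
product = (W @ column).reshape(3)    # matrix product, then back to a vector
layer_output = relu(product + b)     # vector addition, then componentwise ReLU
print(layer_output)
```

Here `W @ x` equals `(1, -2, 4)`, adding `b` gives `(1.5, -2, 3)`, and the ReLU sets the negative component to zero.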

In [5]:
placeholder_input = tf.placeholder(dtype=tf.float32, shape=(2,), name='my_input')
print(placeholder_input)
matrix_product = tf.reshape(tf.matmul(variable_matrix, tf.reshape(placeholder_input, [2, 1])), [3])
print(matrix_product)
vector_sum = matrix_product + variable_vector
print(vector_sum)
relu_vector = tf.nn.relu(vector_sum)
print(relu_vector)

Tensor("my_input:0", shape=(2,), dtype=float32)
Tensor("Reshape_1:0", shape=(3,), dtype=float32)
Tensor("add:0", shape=(3,), dtype=float32)
Tensor("Relu:0", shape=(3,), dtype=float32)


Okay, this is nice, but we want to see the actual values of the tensors, not only their shapes and names. We cannot do this yet, because all we have done so far is build the so-called **computation graph**, which tells Tensorflow how the tensors are related. No actual computations have been done. In fact, they are not even possible yet, since Tensorflow does not know the values of our placeholder vector. So we have to do three things:
1. Start a Tensorflow **session**, which is used to manage the actual computations (for example to determine which computations should be done on which hardware device in case you have multiple devices like CPU cores and GPUs).
2. Initialize all variable tensors in the computation graph.
3. Let the session run certain computations, by specifying which output tensors we would like to compute, and **feeding** in all placeholder values necessary to execute the computations.
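
The separation between building a graph and running it can be mimicked in a few lines of plain Python. The following toy sketch is not how Tensorflow is implemented; the `Node` class and the helpers `placeholder`, `constant`, `add`, `multiply` and `run` are hypothetical names introduced only to illustrate the idea of deferred computation with fed values.

```python
class Node:
    """A hypothetical graph node: stores how to compute a value, not the value."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op            # function applied to the evaluated inputs
        self.inputs = inputs    # parent nodes in the graph
        self.name = name


def placeholder(name):
    # A placeholder has no operation; its value must be fed at run time.
    return Node(op=None, name=name)


def constant(value):
    return Node(op=lambda: value)


def add(a, b):
    return Node(op=lambda u, v: u + v, inputs=(a, b))


def multiply(a, b):
    return Node(op=lambda u, v: u * v, inputs=(a, b))


def run(node, feed_dict):
    """Evaluate a node recursively, reading placeholder values from feed_dict."""
    if node.op is None:
        if node not in feed_dict:
            raise KeyError("no value fed for placeholder " + repr(node.name))
        return feed_dict[node]
    return node.op(*(run(parent, feed_dict) for parent in node.inputs))


# Building the graph performs no arithmetic yet.
x = placeholder("x")
y = add(multiply(constant(3.0), x), constant(1.0))

# Only running it with a fed value produces a number: 3 * 2 + 1.
result = run(y, {x: 2.0})
print(result)
```

Running `y` without feeding `x` raises an error in this sketch, which mirrors why the placeholder values have to be supplied when the session runs a computation.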

In [6]:
session = tf.Session()
session.run(tf.global_variables_initializer())
relu_vector_value = session.run([relu_vector], feed_dict={placeholder_input: np.asarray([1, 2])})
print(relu_vector_value)

[array([2.7278907, 0.691774 , 3.0894105], dtype=float32)]


In order to **train** the variable tensors in our network, we need to define a second placeholder for the desired output values, a **loss** function that we wish to minimize, and an **optimizer**, for example a gradient descent method. As before, we will first define the operations as part of the computation graph and then let the session run these operations for them to take effect.

Let us try to train our single layer network to learn the very simple map 

$$ \begin{pmatrix} x_1\\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{relu}(x_1) \\ 0 \\ 0 \end{pmatrix}$$

In [10]:
placeholder_output = tf.placeholder(dtype=tf.float32, shape=[3], name='my_output')
loss = tf.reduce_sum(tf.square(relu_vector-placeholder_output))
optimizer = tf.train.GradientDescentOptimizer(0.1)
gradient_descent_step = optimizer.minimize(loss)

# run some gradient descent steps with random inputs and observe the loss function
print('{:8s}\t{:8s}\n{:8s}\t{:8s}'.format('iter', 'loss', '--------', '--------'))
for i in range(10):
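    in_vec = np.random.randn(2)
    # target (relu(x_1), 0, 0): append a zero, apply ReLU, keep only the first component
    out_vec = np.maximum(0, np.append(in_vec, 0))*[1, 0, 0]
    # one run both performs a gradient descent step and returns the loss
    _, loss_value = session.run(
        [gradient_descent_step, loss],
        feed_dict={placeholder_input: in_vec, placeholder_output: out_vec}
    )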
    print('{:8d}\t{:1.2e}'.format(i, loss_value))

iter    	loss    
--------	--------
       0	0.00e+00
       1	5.03e-05
       2	1.07e-05
       3	0.00e+00
       4	1.15e-11
       5	1.76e-06
       6	0.00e+00
       7	7.37e-06
       8	3.55e-08
       9	2.24e-06
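
To see what a gradient descent step does to the variables, the update can be written out by hand. The following NumPy sketch assumes the same single ReLU layer and squared-error loss as above, with the gradients derived manually via the chain rule; the function names `forward`, `loss_and_gradients` and `target` are introduced here for illustration, and Tensorflow computes these gradients automatically rather than in this way.

```python
import numpy as np

def forward(W, b, x):
    """Single ReLU layer: returns the pre-activation z and the output a."""
    z = W @ x + b
    return z, np.maximum(0.0, z)

def loss_and_gradients(W, b, x, y):
    """Loss sum((a - y)^2) and its gradients with respect to W and b."""
    z, a = forward(W, b, x)
    loss = np.sum((a - y) ** 2)
    grad_z = 2.0 * (a - y) * (z > 0)    # ReLU passes gradients only where z > 0
    return loss, np.outer(grad_z, x), grad_z

def target(x):
    """The map from the text: (x_1, x_2) -> (relu(x_1), 0, 0)."""
    return np.array([max(0.0, x[0]), 0.0, 0.0])

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))
b = rng.standard_normal(3)
learning_rate = 0.1

for i in range(200):
    x = rng.standard_normal(2)
    loss, grad_W, grad_b = loss_and_gradients(W, b, x, target(x))
    W -= learning_rate * grad_W         # gradient descent update
    b -= learning_rate * grad_b

print(W)
print(b)
```

Each step moves the variables against the gradient of the loss for the current sample, which is the role of `optimizer.minimize(loss)` in the cell above.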


Let us see what the network has **learned** by retrieving the values of the variable tensors. Note that we do not have to feed values for the placeholders this time, since none of the requested tensors depends on them.

In [11]:
matrix_value, vector_value = session.run([variable_matrix, variable_vector])
print(matrix_value)
print(vector_value)

[[ 1.0041620e+00  4.1708021e-04]
 [ 2.9930681e-02  7.4012488e-02]
 [-1.9072771e-02 -1.3234124e-02]]
[-0.00299934 -0.43223253 -0.2657008 ]
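
These values can be interpreted as follows: the first row of the matrix is close to $(1, 0)$ with a bias close to zero, while the other two rows have small weights and clearly negative biases, so the ReLU maps them to zero for inputs of moderate size. As a sanity check, the printed values can be plugged back into the layer by hand. The sketch below copies the numbers from the output above; the helper names `learned_layer` and `target` are introduced here for illustration.

```python
import numpy as np

# Values copied from the printed output above
matrix_value = np.array([[ 1.0041620e+00,  4.1708021e-04],
                         [ 2.9930681e-02,  7.4012488e-02],
                         [-1.9072771e-02, -1.3234124e-02]])
vector_value = np.array([-0.00299934, -0.43223253, -0.2657008])

def learned_layer(x):
    """The trained layer relu(W x + b) with the retrieved values."""
    return np.maximum(0.0, matrix_value @ x + vector_value)

def target(x):
    """The map we tried to learn: (x_1, x_2) -> (relu(x_1), 0, 0)."""
    return np.array([max(0.0, x[0]), 0.0, 0.0])

for x in [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]:
    print(x, learned_layer(x), target(x))
```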


In most cases we will not feed single input vectors to a call of `session.run()` but collections of multiple input vectors, called **batches**. It is customary to reserve the first dimension of all tensors for the batch index. Hence, when you build Tensorflow operations, always make sure that your code can handle tensors with an additional first dimension that is passed on unaltered. If the batch size is not known in advance, Tensorflow allows tensors to have an undetermined shape in some dimensions.

For example, the placeholder tensor below can be fed with a collection of 5-dimensional vectors, where the number of vectors can vary.
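
The batch convention can be illustrated in NumPy. In the sketch below, the hypothetical function `relu_layer` applies one ReLU layer to all rows of a batch at once; the weights are illustrative fixed values, not those of the network above.

```python
import numpy as np

def relu_layer(X, W, b):
    """Apply one ReLU layer to a batch X of shape (batch_size, n_in).

    W has shape (n_out, n_in) and b has shape (n_out,). The result has
    shape (batch_size, n_out): the batch dimension is passed on unaltered.
    """
    return np.maximum(0.0, X @ W.T + b)    # b is broadcast over the batch

W = np.array([[1.0, 0.0],
              [0.0, -1.0],
              [2.0, 1.0]])
b = np.array([0.5, 0.0, -1.0])

small_batch = np.array([[1.0, 2.0]])       # a batch with 1 input vector
large_batch = np.random.randn(7, 2)        # a batch with 7 input vectors

print(relu_layer(small_batch, W, b).shape)
print(relu_layer(large_batch, W, b).shape)
```

The same function works for any batch size, which is the behaviour that an undetermined first dimension expresses in Tensorflow.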

In [None]:
batched_vectors = tf.placeholder(dtype=tf.float32, shape=[None, 5])
print(batched_vectors)

For various frequently used operations Tensorflow provides shortcuts. For example, the single ReLU layer that we created above can be defined using the `dense` operation, which creates a fully connected layer. In that case we do not have to worry about creating the variable tensors (this is done automatically by Tensorflow "under the hood"). This simplicity comes at the cost of less flexibility though, as we do not have direct access to the variable tensors. So the best way to define a network is a matter of personal preference and depends on the application; there is usually more than one way to do something in Tensorflow.

Have a look around the [Tensorflow Documentation](https://www.tensorflow.org/api_docs/python/tf) to see what other operations are available.

In [None]:
alternative_relu_vector = tf.layers.dense(tf.reshape(placeholder_input, [-1, 2]), 3, activation=tf.nn.relu)
print(alternative_relu_vector)