# Quickstart

This notebook is a gentle introduction to the few concepts and abstractions of deepr.

It demonstrates how to train a model that learns how to multiply a number by 2.

To train a model with deepr, the main entry point is the [Trainer](https://criteo.github.io/deepr/API/_autosummary/deepr.jobs.Trainer.html#deepr.jobs.Trainer) job.

It is important to stress at this point that `deepr` is not yet another library to build neural networks, but merely a utility to build functions that operate on basic TensorFlow types, i.e. [tf.Tensor](https://www.tensorflow.org/api_docs/python/tf/Tensor) and [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset).

Using functional programming makes it easy to lazily define graphs that will only be built at run time by the [tf.estimator](https://www.tensorflow.org/guide/estimator) high-level API.
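
As a rough illustration of this lazy style, here is a sketch in plain Python rather than TensorFlow. Composing functions only describes the computation, and nothing runs until the composed function is called. The names `compose`, `double` and `add_one` are hypothetical and not part of deepr.

```python
from typing import Callable, List

calls: List[str] = []  # records which functions actually ran


def double(x: float) -> float:
    calls.append("double")
    return 2 * x


def add_one(x: float) -> float:
    calls.append("add_one")
    return x + 1


def compose(*fns: Callable[[float], float]) -> Callable[[float], float]:
    """Return a function that applies fns from left to right."""

    def composed(x: float) -> float:
        for fn in fns:
            x = fn(x)
        return x

    return composed


pipeline = compose(double, add_one)   # definition only, nothing is executed
calls_after_definition = list(calls)  # still empty at this point
result = pipeline(3.0)                # execution happens here
print(calls_after_definition, result, calls)
```

In deepr the same idea applies to graph construction: layers and prepros are defined up front, and the graph is only built when the estimator calls them.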

The `Trainer` job uses most of the [important concepts](https://criteo.github.io/deepr/API/core.html) of deepr, while only expecting basic types (mainly functions operating on datasets, dictionaries of tensors, etc.). Its main arguments are:


* `path_model : str`
    Path to the model directory. Can be either local or HDFS.
    
* `pred_fn : Callable[[Dict[str, tf.Tensor], str], Dict[str, tf.Tensor]]`
    Typically a [Layer](https://criteo.github.io/deepr/API/_autosummary/deepr.layers.Layer.html#deepr.layers.Layer) instance, but in general, any callable.

* `loss_fn : Callable[[Dict[str, tf.Tensor], str], Dict[str, tf.Tensor]]`
    Typically a [Layer](https://criteo.github.io/deepr/API/_autosummary/deepr.layers.Layer.html#deepr.layers.Layer) instance, but in general, any callable.

* `optimizer_fn : Callable[[tf.Tensor], tf.Tensor]`
    Typically an [Optimizer](https://criteo.github.io/deepr/API/_autosummary/deepr.optimizers.Optimizer.html#deepr.optimizers.Optimizer) instance, but in general, any callable.

* `train_input_fn : Callable[[], tf.data.Dataset]`
    Typically a [Reader](https://criteo.github.io/deepr/API/_autosummary/deepr.readers.Reader.html#deepr.readers.Reader) instance, but in general, any callable.

* `eval_input_fn : Callable[[], tf.data.Dataset]`
    Typically a [Reader](https://criteo.github.io/deepr/API/_autosummary/deepr.readers.Reader.html#deepr.readers.Reader) instance, but in general, any callable.

* `prepro_fn : Callable[[tf.data.Dataset, str], tf.data.Dataset], Optional`
    Typically a [Prepro](https://criteo.github.io/deepr/API/_autosummary/deepr.prepros.Prepro.html#deepr.prepros.Prepro) instance, but in general, any callable.

The `Trainer` accepts more parameters that rely on other concepts (hooks, metrics, exporters, ...); these are covered in another guide.

To train our model, we need to define each of these. Let's start!

## Dataset

The first step is to build a dataset. For this example, we will build a synthetic dataset of pairs (x, 2x).

See the [reader reference](https://criteo.github.io/deepr/API/core.html#reader) for other ways to build a dataset.

Some imports first

In [1]:
import logging
import sys
logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logging.getLogger("tensorflow").setLevel(logging.CRITICAL)

In [2]:
import tensorflow as tf
import deepr as dpr
import numpy as np
import deepr.layers as dprl

Let's define a generator function and then use a [GeneratorReader](https://criteo.github.io/deepr/API/_autosummary/deepr.readers.GeneratorReader.html#deepr.readers.GeneratorReader) to create a `tf.data.Dataset`.

In [3]:
def generator_fn():
    for _ in range(1000):
        x = np.random.random()
        yield {"x": x, "y": 2 * x}

reader = dpr.readers.GeneratorReader(
    generator_fn,
    output_types={"x": tf.float32, "y": tf.float32},
    output_shapes={"x": (), "y": ()}
)

The `Reader` classes are simple helpers to create a `tf.data.Dataset`, heavily inspired by the `tensorflow_datasets` package.

Once the reader is configured, you can create a new `Dataset` with

In [4]:
dataset = reader.as_dataset()
print(dataset)
dataset = reader()  # Simply an alias for as_dataset
print(dataset)

<DatasetV1Adapter shapes: {x: (), y: ()}, types: {x: tf.float32, y: tf.float32}>
<DatasetV1Adapter shapes: {x: (), y: ()}, types: {x: tf.float32, y: tf.float32}>


In "graph" mode, it is not possible to iterate directly over a `tf.data.Dataset` from Python.

The base `Reader` class makes it possible to iterate over the dataset, faking eager-execution mode (under the hood it simply creates a session in the special `__iter__` method).

Let's have a look at the content of our dataset

In [5]:
for index, item in enumerate(reader):
    print(item)
    if index == 10:
        break

{'x': 0.5050241, 'y': 1.0100482}
{'x': 0.74931484, 'y': 1.4986297}
{'x': 0.6693086, 'y': 1.3386172}
{'x': 0.713442, 'y': 1.426884}
{'x': 0.840372, 'y': 1.680744}
{'x': 0.7257865, 'y': 1.451573}
{'x': 0.7972316, 'y': 1.5944632}
{'x': 0.71821946, 'y': 1.4364389}
{'x': 0.90175074, 'y': 1.8035015}
{'x': 0.6040216, 'y': 1.2080432}
{'x': 0.6545429, 'y': 1.3090858}
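
The reader interface used above (calling the reader, `as_dataset`, and direct iteration) can be sketched in plain Python. This is only a stand-in under stated assumptions: `ListReader` and `toy_generator_fn` are hypothetical names, a list replaces the `tf.data.Dataset`, and no session is involved.

```python
from typing import Callable, Dict, Iterator, List


class ListReader:
    """Hypothetical stand-in for a reader: a callable that builds a fresh dataset."""

    def __init__(self, generator_fn: Callable[[], Iterator[Dict[str, float]]]):
        self.generator_fn = generator_fn

    def as_dataset(self) -> List[Dict[str, float]]:
        # A real reader returns a tf.data.Dataset; a list keeps the sketch runnable.
        return list(self.generator_fn())

    def __call__(self) -> List[Dict[str, float]]:
        # Calling the reader is an alias for as_dataset.
        return self.as_dataset()

    def __iter__(self) -> Iterator[Dict[str, float]]:
        # deepr evaluates the dataset in a session here; this sketch just iterates.
        return iter(self.as_dataset())


def toy_generator_fn() -> Iterator[Dict[str, float]]:
    for x in (0.0, 0.5, 1.0):
        yield {"x": x, "y": 2 * x}


toy_reader = ListReader(toy_generator_fn)
items = [item for item in toy_reader]
print(items)
```

Because the reader is itself a zero-argument callable returning a dataset, it can be passed anywhere an `input_fn` is expected.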


The `Trainer` job expects two `input_fn` arguments: simple callables that each create a new `tf.data.Dataset`.

Our `reader` does exactly that, so let's set

In [6]:
train_input_fn = reader
eval_input_fn = reader

## Prepro

Now that we have datasets, we need to preprocess them before feeding data to our model. In this example, we only need to create batches of data, and allow multiple iterations over the dataset to be able to perform multiple epochs.

Let's use the `prepro` module to functionally define a preprocessing function.

See the [prepro reference](https://criteo.github.io/deepr/API/core.html#prepro)

In [7]:
prepro_fn = dpr.prepros.Serial(
    dpr.prepros.Batch(batch_size=32),
    dpr.prepros.Repeat(10, modes=[tf.estimator.ModeKeys.TRAIN])
)

As expected, the output of this prepro function is a batched dataset

In [8]:
prepro_fn(reader())

<DatasetV1Adapter shapes: {x: (?,), y: (?,)}, types: {x: tf.float32, y: tf.float32}>

Let's check the result of our preprocessing by iterating over the dataset. We use the helper function `from_dataset` that creates a `reader` from any `tf.data.Dataset`, which gives us eager-like iteration over the underlying dataset.

In [9]:
for item in dpr.readers.base.from_dataset(prepro_fn(reader())):
    print(item)
    break

{'x': array([0.2884208 , 0.6716708 , 0.60438156, 0.74616903, 0.60974383,
       0.8843869 , 0.28427488, 0.744994  , 0.02057592, 0.3612376 ,
       0.9891428 , 0.04443246, 0.98389417, 0.07303068, 0.46858358,
       0.8129141 , 0.42637283, 0.68399006, 0.7564984 , 0.16813973,
       0.30100608, 0.69422716, 0.1550892 , 0.995761  , 0.91428363,
       0.909327  , 0.36975038, 0.74172604, 0.7243495 , 0.44936314,
       0.4023981 , 0.8480999 ], dtype=float32), 'y': array([0.5768416 , 1.3433416 , 1.2087631 , 1.4923381 , 1.2194877 ,
       1.7687738 , 0.56854975, 1.489988  , 0.04115184, 0.7224752 ,
       1.9782856 , 0.08886492, 1.9677883 , 0.14606136, 0.93716717,
       1.6258281 , 0.85274565, 1.3679801 , 1.5129968 , 0.33627945,
       0.60201216, 1.3884543 , 0.3101784 , 1.991522  , 1.8285673 ,
       1.818654  , 0.73950076, 1.4834521 , 1.448699  , 0.8987263 ,
       0.8047962 , 1.6961998 ], dtype=float32)}
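
To make the composition of batching and mode-dependent repetition concrete, here is a minimal sketch in plain Python. Lists stand in for datasets, plain strings stand in for `tf.estimator.ModeKeys`, and the lowercase helpers `batch`, `repeat` and `serial` are hypothetical; they mimic the behaviour described above and are not deepr's implementation.

```python
from typing import Callable, List, Optional, Sequence

TRAIN, EVAL = "train", "eval"  # plain strings standing in for tf.estimator.ModeKeys


def batch(batch_size: int) -> Callable:
    """Group consecutive examples into lists of at most batch_size items."""

    def apply(dataset: List, mode: str) -> List:
        return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

    return apply


def repeat(count: int, modes: Optional[Sequence[str]] = None) -> Callable:
    """Repeat the dataset count times, but only in the given modes."""

    def apply(dataset: List, mode: str) -> List:
        if modes is not None and mode not in modes:
            return dataset  # skipped outside the listed modes
        return dataset * count

    return apply


def serial(*prepros: Callable) -> Callable:
    """Chain preprocessing functions, applying them from left to right."""

    def apply(dataset: List, mode: str) -> List:
        for prepro in prepros:
            dataset = prepro(dataset, mode)
        return dataset

    return apply


toy_prepro_fn = serial(batch(2), repeat(3, modes=[TRAIN]))
examples = [1, 2, 3, 4, 5]
train_batches = toy_prepro_fn(examples, TRAIN)  # batched, then repeated 3 times
eval_batches = toy_prepro_fn(examples, EVAL)    # batched only
print(len(train_batches), eval_batches)
```

Restricting the repetition to training mode means evaluation sees each example exactly once.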


## Model

Now that we have a preprocessed dataset, let's build the model. 

The dataset yields dictionaries of tensors.

The model is made of two main components:

1. `pred_fn(tensors: Dict, mode) -> Dict` operates on the dataset dictionaries, creates new tensors (the predictions).
2. `loss_fn(tensors: Dict, mode) -> Dict` operates on the dataset and `pred_fn` results, creates at least one new tensor `loss`.
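
The way these two functions share a dictionary can be sketched in plain Python. Lists stand in for tensors, `toy_pred_fn` and `toy_loss_fn` are hypothetical names, and merging each output back into the input dictionary is an assumption about how the `Trainer` combines them, not something taken from its source.

```python
from typing import Dict, List, Optional

Tensors = Dict[str, object]  # lists and floats stand in for tf.Tensor values


def toy_pred_fn(tensors: Tensors, mode: Optional[str] = None) -> Tensors:
    alpha = 2.0  # fixed here; in the real model alpha is a trainable variable
    return {"y_pred": [alpha * x for x in tensors["x"]]}


def toy_loss_fn(tensors: Tensors, mode: Optional[str] = None) -> Tensors:
    errors = [(p - y) ** 2 for p, y in zip(tensors["y_pred"], tensors["y"])]
    return {"loss": sum(errors)}


batch: Tensors = {"x": [0.5, 1.0], "y": [1.0, 2.0]}
tensors = {**batch, **toy_pred_fn(batch)}         # dataset keys plus predictions
tensors = {**tensors, **toy_loss_fn(tensors)}     # plus the loss
print(sorted(tensors), tensors["loss"])
```

Each function only reads the keys it needs and returns new keys, so components can be swapped without changing the others.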

We're going to use the `layer` module to quickly define those functions.

Make sure to check the [layer reference](https://criteo.github.io/deepr/API/core.html#layer) for more information.

### Pred function

The first part of the model is the prediction function.

Here it's pretty simple: it predicts `y_pred` using a single parameter `alpha`, such that `y_pred = alpha * x`.

We first define this as a `Multiply` layer:

In [10]:
@dprl.layer(n_in=1, n_out=1)
def Multiply(tensors):
    alpha = tf.get_variable(name="alpha", shape=(), dtype=tf.float32)
    return alpha * tensors

The `layer` decorator creates a `Layer` class from the function, roughly equivalent to

```python
from typing import Dict

import tensorflow as tf


class Multiply:

    def __init__(self, n_in=1, n_out=1, inputs=None, outputs=None, name=None):
        self.n_in = n_in
        self.n_out = n_out
        self.inputs = inputs
        self.outputs = outputs
        self.name = name

    def __call__(self, tensors, mode: str = None):
        # Dispatch on the input type: dictionary of tensors or bare tensor(s)
        if isinstance(tensors, dict):
            return self.forward_as_dict(tensors, mode)
        return self.forward(tensors, mode)

    def forward(self, tensors, mode: str = None):
        alpha = tf.get_variable(name="alpha", shape=(), dtype=tf.float32)
        return alpha * tensors

    def forward_as_dict(self, tensors: Dict, mode: str = None) -> Dict:
        # Read the input under the `inputs` key, store the result under `outputs`
        return {self.outputs: self.forward(tensors[self.inputs], mode)}
```

We can instantiate our `Layer` with

In [11]:
pred_fn = Multiply(inputs="x", outputs="y_pred")

The power of the base [Layer](https://criteo.github.io/deepr/API/_autosummary/deepr.layers.Layer.html#deepr.layers.Layer) class is that layers are actually functions that can operate on both dictionaries and tuples of tensors.

The `inputs` and `outputs` arguments, when given, specify the keys of the dictionaries to use for the layer.

Let's see how it works

In [12]:
tf.reset_default_graph()
print(pred_fn(tf.constant(1.0)))
tf.reset_default_graph()  # Remove alpha variable from the graph
print(pred_fn({"x": tf.constant(1.0)}))

Tensor("mul:0", shape=(), dtype=float32)
{'y_pred': <tf.Tensor 'mul:0' shape=() dtype=float32>}


Let's check the output of this model (`alpha` is initialized randomly):

In [13]:
tf.reset_default_graph()
y_pred = pred_fn(tf.constant(1.0))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y_pred))

-1.4690943


### Loss function

Next, let's define the loss function. A squared L2 loss works fine here, so let's create a layer for it:

In [14]:
@dprl.layer(n_in=2, n_out=1)
def SquaredL2(tensors):
    x, y = tensors
    return tf.reduce_sum((x-y)**2)

In [15]:
loss_fn = SquaredL2(inputs=("y_pred", "y"), outputs="loss")

Let's see if it works:

In [16]:
with tf.Session() as sess:
    print(sess.run(loss_fn((tf.constant(1.0), tf.constant(0.5)))))
    print(sess.run(loss_fn({"y_pred": tf.constant(1.0), "y": tf.constant(0.5)})))

0.25
{'loss': 0.25}
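
For reference, the loss and its minimizer can be written out explicitly. The notation is introduced here and is not from the original notebook: $B$ is a batch of examples $(x_i, y_i)$ and $\alpha$ is the parameter of the `Multiply` layer.

$$
\mathcal{L}(\alpha) = \sum_{i \in B} (\alpha x_i - y_i)^2,
\qquad
\frac{\partial \mathcal{L}}{\partial \alpha} = 2 \sum_{i \in B} x_i \, (\alpha x_i - y_i).
$$

With $y_i = 2 x_i$, the gradient becomes $2 (\alpha - 2) \sum_{i \in B} x_i^2$, which vanishes at $\alpha = 2$ as long as at least one $x_i$ is non-zero. The value printed above is the single-example case: $(1.0 - 0.5)^2 = 0.25$.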


### Optimizer

The last thing we need is the optimizer. See the [optimizer reference](https://criteo.github.io/deepr/API/core.html#optimizer).

In [17]:
optimizer_fn = dpr.optimizers.TensorflowOptimizer("Adam", 0.1)
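
To give an intuition of what the optimizer does with this loss, here is a sketch in plain Python. Note that it deliberately swaps Adam for plain gradient descent to stay short, and that the starting value, learning rate and number of steps are arbitrary choices for illustration, not the values used by the `Trainer`.

```python
import random

random.seed(0)
alpha = -1.5          # arbitrary starting value
learning_rate = 0.01  # small enough for every step to shrink the error on this loss

for _ in range(200):
    xs = [random.random() for _ in range(32)]  # one batch of 32 inputs
    ys = [2 * x for x in xs]                   # targets, as in generator_fn
    # Gradient of sum((alpha * x - y) ** 2) with respect to alpha
    grad = sum(2 * x * (alpha * x - y) for x, y in zip(xs, ys))
    alpha -= learning_rate * grad

print(alpha)
```

Since the targets are noise-free, the gradient is proportional to `alpha - 2`, and each update moves `alpha` closer to 2.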

## Trainer job

Since all these concepts are now defined, let's create a `Trainer` job. 

Make sure to check the [trainer reference](https://criteo.github.io/deepr/API/_autosummary/deepr.jobs.Trainer.html#deepr.jobs.Trainer)

In [18]:
job = dpr.jobs.Trainer(
    path_model="model", 
    pred_fn=pred_fn, 
    loss_fn=loss_fn,
    optimizer_fn=optimizer_fn,
    train_input_fn=train_input_fn,
    eval_input_fn=eval_input_fn,
    prepro_fn=prepro_fn
)

Creating the job is lazy and takes almost no time. To run it, call the `run` method:

In [19]:
job.run()

INFO:deepr.prepros.core:Not applying Repeat(10) (mode=eval)
INFO:deepr.jobs.trainer:Running final evaluation, using global_step = 640
INFO:deepr.prepros.core:Not applying Repeat(10) (mode=eval)
INFO:deepr.jobs.trainer:{'loss': 0.0, 'global_step': 640}


The loss is 0, great, we now know how to multiply by 2 :)

Let's check that `alpha` is indeed equal to 2:

In [20]:
experiment = job.create_experiment()
estimator = experiment.estimator
print(estimator.get_variable_value("alpha"))

2.0
