# Train a ready-to-use TensorFlow model with simple before, main and after pipelines

In [1]:
%load_ext autoreload
%autoreload 2

import os
import sys
import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

sys.path.append('../..')
from batchflow import Pipeline, V, D, B, C
from batchflow.opensets import MNIST
from batchflow.models.tf import VGG7

BATCH_SIZE can be increased on modern GPUs with plenty of memory (4 GB or more).

In [2]:
BATCH_SIZE = 32
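A quick sanity check on what the batch size means for the runs below. This assumes the standard MNIST split of 60,000 training and 10,000 test images, and that drop_last=True discards a trailing incomplete batch. It is plain Python arithmetic, not a batchflow call:

```python
import math


def batches_per_epoch(n_items, batch_size, drop_last):
    """Number of iterations in one pass over n_items."""
    if drop_last:
        # The trailing incomplete batch is skipped
        return n_items // batch_size
    return math.ceil(n_items / batch_size)


batch_size = 32  # same value as BATCH_SIZE above
train_iters = batches_per_epoch(60_000, batch_size, drop_last=True)
test_iters = batches_per_epoch(10_000, batch_size, drop_last=True)
skipped = 10_000 - test_iters * batch_size
print(train_iters, test_iters, skipped)  # 1875 312 16
```

Under these assumptions the test run evaluates 9,984 images and skips the last 16, which is worth keeping in mind when reading the final accuracy.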

# Create a dataset

[MNIST](http://yann.lecun.com/exdb/mnist/) is a dataset of handwritten digits frequently used as a baseline for machine learning tasks.

Downloading the MNIST database might take a few minutes.

In [3]:
dataset = MNIST()

# Define a pipeline config

A config lets you create flexible pipelines that take parameters.

For instance, if you put a model class into the config, you can run the same pipeline with different models.

See [a list of available models](https://analysiscenter.github.io/batchflow/intro/tf_models.html#ready-to-use-models) to choose the one which fits you best.

In [4]:
config = dict(model=VGG7)
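This config is consumed later through C('model') in init_model. The sketch below is a hypothetical illustration of the idea behind such a placeholder: it stores only a key and is looked up against the config when the pipeline runs, so swapping the config swaps the model. ConfigRef, build_model, ToyVGG7 and ToyResNet are illustrative names, not batchflow API:

```python
class ConfigRef:
    """Hypothetical placeholder: remembers a config key, resolves it later."""
    def __init__(self, key):
        self.key = key

    def resolve(self, config):
        return config[self.key]


def build_model(model_class, config):
    # Resolve the placeholder only now, against the config given at run time
    if isinstance(model_class, ConfigRef):
        model_class = model_class.resolve(config)
    return model_class()


class ToyVGG7:
    pass


class ToyResNet:
    pass


model_ref = ConfigRef('model')
for run_config in (dict(model=ToyVGG7), dict(model=ToyResNet)):
    print(type(build_model(model_ref, run_config)).__name__)  # ToyVGG7, then ToyResNet
```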

# Create a template pipeline

A template pipeline is not linked to any dataset. It's just an abstract sequence of actions, so it cannot be executed, but it serves as a convenient building block.
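One way to picture "an abstract sequence of actions" is a chain of method calls that are recorded rather than executed. The toy class below is a hypothetical sketch of that idea, not how batchflow implements Pipeline; RecordingTemplate is an illustrative name:

```python
class RecordingTemplate:
    """Hypothetical toy: chained method calls are recorded, not executed."""
    def __init__(self, actions=()):
        self.actions = tuple(actions)

    def __getattr__(self, name):
        # Only called for attributes that do not exist; private names stay errors
        if name.startswith('_'):
            raise AttributeError(name)

        def record(*args, **kwargs):
            # Return a new template with one more action appended
            return RecordingTemplate(self.actions + ((name, args, kwargs),))
        return record


recorded = RecordingTemplate().to_array().train_model(name='conv_nn', fetches='loss')
print([action[0] for action in recorded.actions])  # ['to_array', 'train_model']
```

Nothing has run at this point: the template only describes what should happen to each batch once data is attached.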

In [5]:
train_template = (Pipeline(config=config)
                .to_array()
                .train_model(name='conv_nn', 
                             fetches='loss', 
                             images=B('images'), 
                             labels=B('labels'),
                             save_to=V('current_loss'))
                .update_variable('loss_history', V('current_loss', mode='a'))
)

The before and after pipelines run only once, before and after the main pipeline respectively.
Use them, for example, to initialize variables and to save a model.
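The execution order can be sketched as a plain loop: one setup call, one call per batch, one teardown call. This is a hypothetical illustration of the order only; the hook names are made up, and the "loss" is just the batch mean standing in for a real training step:

```python
def run_with_hooks(batches, before, main, after):
    """Hypothetical sketch: before and after run once, main runs for every batch."""
    state = {}
    before(state)
    for batch in batches:
        main(state, batch)
    after(state)
    return state


def before_hook(state):
    state['loss_history'] = []    # compare: init_variable('loss_history', ...)
    state['model'] = 'toy-model'  # compare: init_model(...)


def main_step(state, batch):
    loss = sum(batch) / len(batch)      # stand-in for the loss of one training step
    state['loss_history'].append(loss)  # one value appended per batch


def after_hook(state):
    state['saved'] = True               # compare: save_model(...)


result = run_with_hooks([[4, 2], [3, 1], [2, 0]], before_hook, main_step, after_hook)
print(result['loss_history'], result['saved'])  # [3.0, 2.0, 1.0] True
```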

In [6]:
(train_template.before
 .init_variable('loss_history', init_on_each_run=list)
 .init_variable('current_loss')
 .init_model(mode='dynamic', 
             model_class=C('model'), 
             name='conv_nn', 
             config={'inputs/images/shape': B('image_shape'),
                     'inputs/labels/classes': D('num_classes'),
                     'initial_block/inputs': 'images'}))

<batchflow.once_pipeline.OncePipeline at 0x7f8c656bb940>
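Our reading of mode='dynamic' is that the model is built when the pipeline processes its first batch, which is why the config can refer to batch data such as B('image_shape'). Below is a minimal, hypothetical sketch of that lazy pattern in NumPy; LazyModel and the batch layout (32, 28, 28, 1) are assumptions for illustration only:

```python
import numpy as np


class LazyModel:
    """Hypothetical sketch: configuration is derived from the first batch seen."""
    def __init__(self, make_config):
        self.make_config = make_config  # callable: batch of images -> config dict
        self.config = None

    def train_step(self, images):
        if self.config is None:
            # Built exactly once, from whatever the first batch looks like
            self.config = self.make_config(images)
        # ... a real model would run a training step here ...
        return self.config


lazy = LazyModel(lambda images: {'inputs/images/shape': images.shape[1:]})
first_batch = np.zeros((32, 28, 28, 1), dtype=np.float32)  # assumed batch layout
print(lazy.train_step(first_batch))  # {'inputs/images/shape': (28, 28, 1)}
```

Later batches reuse the stored config, so the shape is fixed by the first batch.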

In [7]:
(train_template.after
 .save_model('conv_nn', path='./model/'))

<batchflow.once_pipeline.OncePipeline at 0x7f8c656bb978>

# Train the model

Apply a dataset to a template pipeline to create a runnable pipeline:

In [8]:
train_pipeline = (train_template << dataset.train)
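Conceptually, << pairs a template with a data source and yields something that can iterate over batches, while the template itself stays reusable. A hypothetical toy version with operator overloading follows; BindableTemplate and BoundPipeline are illustrative names, and the loop mimics drop_last=True without shuffling:

```python
class BindableTemplate:
    """Hypothetical: an ordered list of per-batch steps with no data attached."""
    def __init__(self, steps):
        self.steps = list(steps)

    def __lshift__(self, data):
        # Pair the steps with a data source; the template itself is not modified
        return BoundPipeline(self.steps, data)


class BoundPipeline:
    def __init__(self, steps, data):
        self.steps = steps
        self.data = data

    def run(self, batch_size):
        results = []
        # Only start positions that leave room for a full batch (like drop_last=True)
        for start in range(0, len(self.data) - batch_size + 1, batch_size):
            batch = self.data[start:start + batch_size]
            for step in self.steps:
                batch = step(batch)
            results.append(batch)
        return results


doubled_sum = BindableTemplate([lambda b: [x * 2 for x in b], sum])
print((doubled_sum << list(range(10))).run(batch_size=4))        # [12, 44]
print((doubled_sum << [100, 101, 102, 103]).run(batch_size=4))  # [812]
```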

In [9]:
train_pipeline.run(BATCH_SIZE, shuffle=True, n_epochs=1, drop_last=True, bar='n')

<batchflow.pipeline.Pipeline at 0x7f8c656bb160>

# Test the model

Testing is much faster than training, but without a GPU it still takes some patience.

In [10]:
test_template = (Pipeline(config=config)
                .to_array()
                .predict_model(name='loaded_nn', 
                               fetches='predictions', 
                               images=B('images'), 
                               save_to=V('predictions'))
                .gather_metrics(metrics_class='class', 
                                targets=B('labels'), 
                                predictions=V('predictions'),
                                fmt='logits', 
                                axis=-1, 
                                save_to=V('metrics', mode='a')))
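With fmt='logits' and axis=-1, the predictions are presumably raw class scores along the last axis, so the predicted label is their argmax (softmax is monotonic, so applying it first would not change the argmax). A NumPy sketch of accuracy computed that way on made-up numbers; accuracy_from_logits is a hypothetical helper, not the batchflow implementation:

```python
import numpy as np


def accuracy_from_logits(targets, logits, axis=-1):
    """Share of samples whose highest-scoring class equals the target label."""
    predicted = np.argmax(logits, axis=axis)
    return float(np.mean(predicted == targets))


toy_logits = np.array([[2.0, 0.1, -1.0],
                       [0.3, 0.2, 1.5],
                       [0.0, 3.0, 0.5],
                       [1.0, 0.9, 0.8]])
toy_targets = np.array([0, 2, 1, 2])
print(accuracy_from_logits(toy_targets, toy_logits))  # 0.75
```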

In [11]:
(test_template.before
 .init_variable('predictions')
 .init_variable('metrics', init_on_each_run=None)
 .load_model(mode='static',
             model_class=VGG7,
             name='loaded_nn',
             path='./model'))

<batchflow.once_pipeline.OncePipeline at 0x7f8c406bcc50>

In [12]:
test_pipeline = (test_template << dataset.test)

In [13]:
test_pipeline.run(BATCH_SIZE, shuffle=True, n_epochs=1, drop_last=True, bar='n', prefetch=1)

<batchflow.pipeline.Pipeline at 0x7f8c278245c0>

Let's get the [metrics](https://analysiscenter.github.io/batchflow/intro/models.html#model-metrics) accumulated over all test batches:

In [14]:
metrics = test_pipeline.get_variable('metrics')

Now we can easily calculate any metric we need:

In [15]:
metrics.evaluate('accuracy')

0.989082532051282
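To see how a single number like the accuracy above can come out of many batches, here is one common scheme, offered as an assumption rather than a description of batchflow internals: sum per-batch confusion matrices, then derive metrics from the total. The labels are toy data, not MNIST results, and batch_confusion is a hypothetical helper:

```python
import numpy as np


def batch_confusion(targets, predicted, num_classes):
    """Confusion matrix for one batch: rows are true classes, columns predicted."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (targets, predicted), 1)
    return cm


num_classes = 3
toy_batches = [
    (np.array([0, 1, 2]), np.array([0, 1, 1])),  # (targets, predicted labels)
    (np.array([2, 2, 0]), np.array([2, 2, 0])),
]
total = sum(batch_confusion(t, p, num_classes) for t, p in toy_batches)

accuracy = np.trace(total) / total.sum()      # correct predictions / all predictions
precision = np.diag(total) / total.sum(axis=0)  # per predicted class
print(accuracy)   # 0.8333333333333334
print(precision)  # [1.  0.5 1. ]
```

Because the counts are summed rather than per-batch accuracies averaged, batches of different sizes are weighted correctly.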