In this notebook we train a simple, 10-layer deep convolutional neural network to classify images of 5 particle types in a simulated LArTPC detector, using the [public dataset](http://deeplearnphysics.org/DataChallenge). We use tensorflow to train the network and `larcv_threadio` to fetch data from larcv files. If you are completely unfamiliar with `larcv_threadio`, take a look at this [quick start](http://deeplearnphysics.org/Blog/tutorial-04.html) first.

First, let's prepare the data samples. This example uses `practice_train_5k.root` and `practice_test_5k.root`. The cell below makes symbolic links to them in the current directory, named `train.root` and `test.root`.

In [1]:
%%bash
# Preparation: make symbolic links for practice_train_5k.root and practice_test_5k.root
PRACTICE_FILE_DIR=../..
ln -sf $PRACTICE_FILE_DIR/practice_train_5k.root ./train.root
ln -sf $PRACTICE_FILE_DIR/practice_test_5k.root ./test.root

## Imports

In [3]:
from larcv import larcv
from larcv.dataloader2 import larcv_threadio
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import os,sys,time

# tensorflow/gpu start-up configuration
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
%env CUDA_DEVICE_ORDER=PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES=2
import tensorflow as tf

env: CUDA_DEVICE_ORDER=PCI_BUS_ID
env: CUDA_VISIBLE_DEVICES=2


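We set `os.environ['TF_CPP_MIN_LOG_LEVEL']` before importing tensorflow to suppress most of its standard (*non-error*) output, which can otherwise overwhelm ipython's ability to capture the `stdout` stream. Note that level `3` also hides tensorflow's error messages (only fatal ones remain); use `2` if you want errors to stay visible.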

## Configurations
Next, let's define configuration variables.

In [5]:
TUTORIAL_DIR     = '..'
TRAIN_IO_CONFIG  = os.path.join(TUTORIAL_DIR, 'tf/io_train.cfg')
TEST_IO_CONFIG   = os.path.join(TUTORIAL_DIR, 'tf/io_test.cfg' )
TRAIN_BATCH_SIZE = 50
TEST_BATCH_SIZE  = 100
LOGDIR           = 'log'
ITERATIONS       = 5000
SAVE_SUMMARY     = 20
SAVE_WEIGHTS     = 100

# Check log directory is empty
train_logdir = os.path.join(LOGDIR,'train')
test_logdir  = os.path.join(LOGDIR,'test')
if not os.path.isdir(train_logdir): os.makedirs(train_logdir)
if not os.path.isdir(test_logdir):  os.makedirs(test_logdir)
if len(os.listdir(train_logdir)) or len(os.listdir(test_logdir)):
  sys.stderr.write('Error: train or test log dir not empty...\n')
  raise OSError

The top block defines a set of constants in capital letters. The bottom part checks that the directories where we will store the network training logs are empty, so that we do not mix this run with a previous attempt, and raises an error otherwise. So what do the constants do?

* `TUTORIAL_DIR` ... points to the top-level directory of the [larcv-tutorial](https://github.com/DeepLearnPhysics/larcv-tutorial) repository.
* `TRAIN_IO_CONFIG` ... a configuration file for `larcv_threadio` to read data for **training**.
* `TEST_IO_CONFIG` ... a configuration file for `larcv_threadio` to read data for **testing**.
* `TRAIN_BATCH_SIZE` ... the number of images (one batch) used to calculate the average gradients for each update of the network's weights.
* `TEST_BATCH_SIZE` ... the number of images used to calculate the average accuracy on the test data set.
* `LOGDIR` ... the top-level directory in which to save the tensorboard logs.
* `ITERATIONS` ... the total number of steps (batches) to train the network.
* `SAVE_SUMMARY` ... the period, in training steps, at which to save the log (tensorboard summaries).
* `SAVE_WEIGHTS` ... the period, in training steps, at which to save the network's weights.
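
To make the bookkeeping concrete, here is a small sketch of what these constants imply for one run. It reuses the values from the configuration cell and the `(i+1) % period == 0` condition used later in the training loop. The number of training images is an assumption: we read the `5k` in `practice_train_5k.root` as 5,000 images, which is not verified here.

```python
# Constants copied from the configuration cell above.
TRAIN_BATCH_SIZE = 50
ITERATIONS       = 5000
SAVE_SUMMARY     = 20
SAVE_WEIGHTS     = 100

# ASSUMPTION: the "5k" in practice_train_5k.root means 5,000 images.
N_TRAIN_IMAGES = 5000

# The training loop logs / saves whenever (i+1) is a multiple of the period.
summary_steps    = [i for i in range(ITERATIONS) if (i + 1) % SAVE_SUMMARY == 0]
checkpoint_steps = [i for i in range(ITERATIONS) if (i + 1) % SAVE_WEIGHTS == 0]

# Total number of images fed to the network, and the equivalent number
# of passes over the (assumed) training sample.
images_seen = ITERATIONS * TRAIN_BATCH_SIZE
epochs      = images_seen / float(N_TRAIN_IMAGES)

print('summaries written   : %d' % len(summary_steps))
print('checkpoints written : %d' % len(checkpoint_steps))
print('first/last checkpoint step : %d / %d' % (checkpoint_steps[0], checkpoint_steps[-1]))
print('images seen : %d (about %g passes over the training file)' % (images_seen, epochs))
```

Keep in mind that `tf.train.Saver` keeps only the 5 most recent checkpoints by default (`max_to_keep=5`), so not every checkpoint written during the run stays on disk.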

## Configure data reader
We prepare two data reader instances: one for training and another for testing the network. Testing is not strictly needed to train the network, but we include it here so that the example covers both. We don't go into the details of how `larcv_threadio` works since there is [a dedicated tutorial](http://deeplearnphysics.org/Blog/tutorial-04.html) for that.

In [6]:
#
# Step 0: IO
#
# for "train" data set
train_io = larcv_threadio()  # create io interface
train_io_cfg = {'filler_name' : 'TrainIO',
                'verbosity'   : 0,
                'filler_cfg'  : TRAIN_IO_CONFIG}
train_io.configure(train_io_cfg)   # configure
train_io.start_manager(TRAIN_BATCH_SIZE) # start read thread
time.sleep(2)
train_io.next()

# for "test" data set
test_io = larcv_threadio()   # create io interface
test_io_cfg = {'filler_name' : 'TestIO',
               'verbosity'   : 0,
               'filler_cfg'  : TEST_IO_CONFIG}
test_io.configure(test_io_cfg)   # configure
test_io.start_manager(TEST_BATCH_SIZE) # start read thread
time.sleep(2)
test_io.next()

setting verbosity 3
setting verbosity 3


## Defining a network
Let's construct a simple network for this exercise. We use 5 modules of 2 convolution layers each (10 convolution layers in total). Each module is followed by a max-pooling operation, except the last one, which is followed by average-pooling.
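
Before reading the `build` function in the next cell, it can help to trace how the feature map shrinks through the modules. The sketch below is plain Python (no tensorflow) and mirrors the strides and pooling used in `build`. It relies on two things to the best of our knowledge: `slim.conv2d` pads with `'SAME'` by default and `slim.max_pool2d` with `'VALID'`. The 256x256 input size is a hypothetical value for illustration only; the real size comes from `dim_data` at run time.

```python
import math

def conv_same(size, stride):
    # 'SAME' padding: output size is ceil(input size / stride)
    return int(math.ceil(size / float(stride)))

def pool_valid(size, kernel, stride):
    # 'VALID' padding: only windows that fit entirely inside the input
    return (size - kernel) // stride + 1

def trace_shapes(image_size, filters=32, num_modules=5):
    """Return a list of (step, spatial size, channels) after each module."""
    shapes = []
    size = image_size
    for step in range(num_modules):
        stride = 2 if step == 0 else 1       # only the very first convolution strides by 2
        size = conv_same(size, stride)       # first convolution of the module
        size = conv_same(size, 1)            # second convolution of the module
        if step + 1 < num_modules:
            size = pool_valid(size, 2, 2)    # 2x2 max-pooling with stride 2
        else:
            size = pool_valid(size, size, 1) # average-pooling over the whole map
        shapes.append((step, size, filters))
        filters *= 2                         # double the filter count for the next module
    return shapes

# ASSUMPTION: 256x256 input images (hypothetical, for illustration only)
for step, size, channels in trace_shapes(256):
    print('After step %d shape (%d, %d, %d)' % (step, size, size, channels))
```

Under these assumptions the last module ends with a 1x1 map of 512 channels, which is flattened and passed to the final fully connected layer. Calling `build` with `debug=True` prints the actual shapes for comparison.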

In [7]:
#
# Step 1: Define network
#
import tensorflow.contrib.slim as slim
import tensorflow.python.platform

def build(input_tensor, num_class=4, trainable=True, debug=True):

    net = input_tensor
    if debug: print('input tensor:', input_tensor.shape)

    filters = 32
    num_modules = 5
    with tf.variable_scope('conv'):
        for step in range(num_modules):
            stride = 2
            if step: stride = 1
            net = slim.conv2d(inputs        = net,        # input tensor
                              num_outputs   = filters,    # number of filters (neurons) = # of output feature maps
                              kernel_size   = [3,3],      # kernel size
                              stride        = stride,     # stride size
                              trainable     = trainable,  # train or inference
                              activation_fn = tf.nn.relu, # relu
                              scope         = 'conv%da_conv' % step)

            net = slim.conv2d(inputs        = net,        # input tensor
                              num_outputs   = filters,    # number of filters (neurons) = # of output feature maps
                              kernel_size   = [3,3],      # kernel size
                              stride        = 1,          # stride size
                              trainable     = trainable,  # train or inference
                              activation_fn = tf.nn.relu, # relu
                              scope         = 'conv%db_conv' % step)
            if (step+1) < num_modules:
                net = slim.max_pool2d(inputs      = net,    # input tensor
                                      kernel_size = [2,2],  # kernel size
                                      stride      = 2,      # stride size
                                      scope       = 'conv%d_pool' % step)

            else:
                net = tf.layers.average_pooling2d(inputs = net,
                                                  pool_size = [net.get_shape()[-3].value,net.get_shape()[-2].value], # [height, width]
                                                  strides = 1,
                                                  padding = 'valid',
                                                  name = 'conv%d_pool' % step)
            filters *= 2

            if debug: print('After step',step,'shape',net.shape)

    with tf.variable_scope('final'):
        net = slim.flatten(net, scope='flatten')

        if debug: print('After flattening', net.shape)

        net = slim.fully_connected(net, int(num_class), scope='final_fc')

        if debug: print('After final_fc', net.shape)

    return net

## Build the network
Build the network and define the loss, the accuracy metric and our solver. Any optimizer should work, but you may have to tune its parameters yourself. Here, we use `RMSPropOptimizer` with a base learning rate of `0.00005`, chosen with no particular justification. Note that we add a minimal set of tensorflow variables to `tf.summary` to demonstrate `tensorboard` later, a dedicated monitoring/visualization tool for network training with tensorflow.
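
To see what the loss and accuracy defined in the next cell compute, here is a NumPy sketch of the same quantities for one-hot labels. It is an illustration, not the tensorflow implementation; the helper names `softmax_cross_entropy` and `accuracy` are ours. One useful sanity check: a network that has learned nothing and assigns equal scores to all 5 classes has a loss of ln(5) ≈ 1.609, which is close to the first values logged in the training output further down.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits and one-hot labels are [batch, classes]."""
    shifted  = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_prob).sum(axis=1).mean())

def accuracy(logits, labels):
    """Fraction of rows where the highest score matches the one-hot label."""
    return float((logits.argmax(axis=1) == labels.argmax(axis=1)).mean())

# Six made-up one-hot labels over 5 classes
labels = np.eye(5)[[0, 1, 2, 3, 4, 0]]

# Case 1: equal scores for every class (an untrained network)
uniform_logits = np.zeros((6, 5))
print('uniform loss   : %g (ln 5 = %g)' % (softmax_cross_entropy(uniform_logits, labels), np.log(5)))

# Case 2: confident and correct scores
confident_logits = 10.0 * labels
print('confident loss : %g' % softmax_cross_entropy(confident_logits, labels))
print('confident acc  : %g' % accuracy(confident_logits, labels))
```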

In [8]:
#
# Step 2: Build network + define loss & solver
#
# retrieve dimensions of data for network construction
dim_data  = train_io.fetch_data('train_image').dim()
dim_label = train_io.fetch_data('train_label').dim()
# define place holders
data_tensor    = tf.placeholder(tf.float32, [None, dim_data[1] * dim_data[2] * dim_data[3]], name='image')
label_tensor   = tf.placeholder(tf.float32, [None, dim_label[1]], name='label')
data_tensor_2d = tf.reshape(data_tensor, [-1,dim_data[1],dim_data[2],dim_data[3]],name='image_reshape')

# Let's keep up to 10 input images in the log
tf.summary.image('input',data_tensor_2d,10)
# build net
net = build(input_tensor=data_tensor_2d, num_class=dim_label[1], trainable=True, debug=False)
# Define accuracy
with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(net,1), tf.argmax(label_tensor,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar('accuracy', accuracy)
# Define loss + backprop as training step
with tf.name_scope('train'):
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label_tensor, logits=net))
    tf.summary.scalar('cross_entropy',cross_entropy)
    train_step = tf.train.RMSPropOptimizer(0.00005).minimize(cross_entropy)

## Defining tensorflow IO
In the next cell we define tensorflow's IO:
* `merged_summary` ... a tensorflow operation that creates the summaries to be written into a _log file_ for `tensorboard`.
* `writer_train` ... writes monitoring data for the training data sample into a log file.
* `writer_test` ... the same as `writer_train`, except it is for the testing data sample.
* `saver` ... a handle to store the state of the network, i.e. the trained network variable values (weights, biases, etc.).

In [9]:
#
# Step 3: weight saver & summary writer
#
# Create a bundle of summaries
merged_summary=tf.summary.merge_all()
# Create a session
sess = tf.InteractiveSession()
# Initialize variables
sess.run(tf.global_variables_initializer())
# Create a summary writer handle
writer_train=tf.summary.FileWriter(train_logdir)
writer_train.add_graph(sess.graph)
writer_test=tf.summary.FileWriter(test_logdir)
writer_test.add_graph(sess.graph)
# Create weights saver
saver = tf.train.Saver()

## Train!


In [10]:
#
# Step 4: Run training loop
#
for i in range(ITERATIONS):

    train_data  = train_io.fetch_data('train_image').data()
    train_label = train_io.fetch_data('train_label').data()

    feed_dict = { data_tensor  : train_data,
                  label_tensor : train_label }

    loss, acc, _ = sess.run([cross_entropy, accuracy, train_step], feed_dict=feed_dict)

    if (i+1)%SAVE_SUMMARY == 0:
        # Save train log
        sys.stdout.write('Training in progress @ step %d loss %g accuracy %g          \n' % (i,loss,acc))
        sys.stdout.flush()
        s = sess.run(merged_summary, feed_dict=feed_dict)
        writer_train.add_summary(s,i)
    
        # Calculate & save test log
        test_data  = test_io.fetch_data('test_image').data()
        test_label = test_io.fetch_data('test_label').data()
        feed_dict  = { data_tensor  : test_data,
                       label_tensor : test_label }
        loss, acc = sess.run([cross_entropy, accuracy], feed_dict=feed_dict)
        sys.stdout.write('Testing in progress @ step %d loss %g accuracy %g          \n' % (i,loss,acc))
        sys.stdout.flush()
        s = sess.run(merged_summary, feed_dict=feed_dict)
        writer_test.add_summary(s,i)
        
        test_io.next()

    train_io.next()

    if (i+1)%SAVE_WEIGHTS == 0:
        ssf_path = saver.save(sess,'weights/toynet',global_step=i)
        print('saved @',ssf_path)

# inform log directory
print()
print('Run `tensorboard --logdir=%s` in terminal to see the results.' % LOGDIR)
train_io.reset()
test_io.reset()

Training in progress @ step 19 loss 1.60804 accuracy 0.22          
Testing in progress @ step 19 loss 1.61259 accuracy 0.19          
Training in progress @ step 39 loss 1.59938 accuracy 0.28          
Testing in progress @ step 39 loss 1.61911 accuracy 0.14          
Training in progress @ step 59 loss 1.60463 accuracy 0.1          
Testing in progress @ step 59 loss 1.60371 accuracy 0.27          
Training in progress @ step 79 loss 1.60484 accuracy 0.18          
Testing in progress @ step 79 loss 1.61089 accuracy 0.13          
Training in progress @ step 99 loss 1.56293 accuracy 0.28          
Testing in progress @ step 99 loss 1.57563 accuracy 0.24          
saved @ weights/toynet-99
Training in progress @ step 119 loss 1.58767 accuracy 0.24          
Testing in progress @ step 119 loss 1.61108 accuracy 0.19          
Training in progress @ step 139 loss 1.55079 accuracy 0.3          
Testing in progress @ step 139 loss 1.56534 accuracy 0.34          
Training in progress @ step

Training in progress @ step 1179 loss 0.842747 accuracy 0.62          
Testing in progress @ step 1179 loss 1.09458 accuracy 0.52          
Training in progress @ step 1199 loss 1.40838 accuracy 0.32          
Testing in progress @ step 1199 loss 0.960923 accuracy 0.61          
saved @ weights/toynet-1199
Training in progress @ step 1219 loss 0.837344 accuracy 0.74          
Testing in progress @ step 1219 loss 0.89485 accuracy 0.63          
Training in progress @ step 1239 loss 0.627333 accuracy 0.74          
Testing in progress @ step 1239 loss 0.902356 accuracy 0.57          
Training in progress @ step 1259 loss 0.889504 accuracy 0.68          
Testing in progress @ step 1259 loss 0.889847 accuracy 0.59          
Training in progress @ step 1279 loss 0.780373 accuracy 0.64          
Testing in progress @ step 1279 loss 0.807954 accuracy 0.66          
Training in progress @ step 1299 loss 0.746441 accuracy 0.62          
Testing in progress @ step 1299 loss 0.794658 accuracy 0.6

Testing in progress @ step 2299 loss 0.768911 accuracy 0.63          
saved @ weights/toynet-2299
Training in progress @ step 2319 loss 0.710802 accuracy 0.7          
Testing in progress @ step 2319 loss 0.895237 accuracy 0.61          
Training in progress @ step 2339 loss 0.771297 accuracy 0.64          
Testing in progress @ step 2339 loss 0.826053 accuracy 0.67          
Training in progress @ step 2359 loss 0.816656 accuracy 0.62          
Testing in progress @ step 2359 loss 0.868144 accuracy 0.63          
Training in progress @ step 2379 loss 0.665603 accuracy 0.72          
Testing in progress @ step 2379 loss 0.761712 accuracy 0.7          
Training in progress @ step 2399 loss 0.943084 accuracy 0.52          
Testing in progress @ step 2399 loss 0.89798 accuracy 0.59          
saved @ weights/toynet-2399
Training in progress @ step 2419 loss 0.6713 accuracy 0.68          
Testing in progress @ step 2419 loss 0.768299 accuracy 0.63          
Training in progress @ step 2439 

Testing in progress @ step 3419 loss 0.688503 accuracy 0.66          
Training in progress @ step 3439 loss 0.53446 accuracy 0.72          
Testing in progress @ step 3439 loss 0.713104 accuracy 0.62          
Training in progress @ step 3459 loss 0.641942 accuracy 0.72          
Testing in progress @ step 3459 loss 0.793349 accuracy 0.66          
Training in progress @ step 3479 loss 0.651127 accuracy 0.76          
Testing in progress @ step 3479 loss 0.697957 accuracy 0.7          
Training in progress @ step 3499 loss 0.811335 accuracy 0.56          
Testing in progress @ step 3499 loss 0.717807 accuracy 0.71          
saved @ weights/toynet-3499
Training in progress @ step 3519 loss 0.619678 accuracy 0.7          
Testing in progress @ step 3519 loss 0.823851 accuracy 0.66          
Training in progress @ step 3539 loss 0.858907 accuracy 0.66          
Testing in progress @ step 3539 loss 0.833918 accuracy 0.65          
Training in progress @ step 3559 loss 0.738354 accuracy 0.6

Training in progress @ step 4559 loss 0.59562 accuracy 0.72          
Testing in progress @ step 4559 loss 0.768135 accuracy 0.7          
Training in progress @ step 4579 loss 0.513815 accuracy 0.78          
Testing in progress @ step 4579 loss 1.30597 accuracy 0.69          
Training in progress @ step 4599 loss 0.846219 accuracy 0.58          
Testing in progress @ step 4599 loss 0.708973 accuracy 0.69          
saved @ weights/toynet-4599
Training in progress @ step 4619 loss 0.545314 accuracy 0.72          
Testing in progress @ step 4619 loss 0.704354 accuracy 0.71          
Training in progress @ step 4639 loss 0.508829 accuracy 0.74          
Testing in progress @ step 4639 loss 0.98429 accuracy 0.69          
Training in progress @ step 4659 loss 0.644264 accuracy 0.68          
Testing in progress @ step 4659 loss 0.600745 accuracy 0.74          
Training in progress @ step 4679 loss 0.46995 accuracy 0.82          
Testing in progress @ step 4679 loss 0.820408 accuracy 0.61 
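
The per-batch numbers above fluctuate a lot, so it is easier to judge progress from an average over several log entries. The sketch below parses a few lines copied from the end of the output above and averages the accuracy. It also estimates the statistical spread expected for a single batch, treating each image as an independent trial (a binomial approximation, which is our simplifying assumption). The helper names `parse_log`, `mean_accuracy` and `binomial_error` are illustrative.

```python
import math
import re

# Lines copied from the end of the training output above.
log_text = """\
Training in progress @ step 4559 loss 0.59562 accuracy 0.72
Testing in progress @ step 4559 loss 0.768135 accuracy 0.7
Training in progress @ step 4579 loss 0.513815 accuracy 0.78
Testing in progress @ step 4579 loss 1.30597 accuracy 0.69
Training in progress @ step 4599 loss 0.846219 accuracy 0.58
Testing in progress @ step 4599 loss 0.708973 accuracy 0.69
saved @ weights/toynet-4599
Training in progress @ step 4619 loss 0.545314 accuracy 0.72
Testing in progress @ step 4619 loss 0.704354 accuracy 0.71
Training in progress @ step 4639 loss 0.508829 accuracy 0.74
Testing in progress @ step 4639 loss 0.98429 accuracy 0.69
Training in progress @ step 4659 loss 0.644264 accuracy 0.68
Testing in progress @ step 4659 loss 0.600745 accuracy 0.74
Training in progress @ step 4679 loss 0.46995 accuracy 0.82
Testing in progress @ step 4679 loss 0.820408 accuracy 0.61
"""

pattern = re.compile(r'(Training|Testing) in progress @ step (\d+) loss (\S+) accuracy (\S+)')

def parse_log(text):
    """Collect (step, loss, accuracy) tuples, separately for train and test lines."""
    records = {'Training': [], 'Testing': []}
    for kind, step, loss, acc in pattern.findall(text):
        records[kind].append((int(step), float(loss), float(acc)))
    return records

def mean_accuracy(rows):
    return sum(acc for _, _, acc in rows) / len(rows)

def binomial_error(p, n):
    # Standard deviation of an accuracy measured on n independent images
    return math.sqrt(p * (1.0 - p) / n)

records    = parse_log(log_text)
train_mean = mean_accuracy(records['Training'])
test_mean  = mean_accuracy(records['Testing'])

# Batch sizes from the configuration cell: 50 for training, 100 for testing
print('train: mean accuracy %.2f, single-batch spread ~%.2f' % (train_mean, binomial_error(train_mean, 50)))
print('test : mean accuracy %.2f, single-batch spread ~%.2f' % (test_mean, binomial_error(test_mean, 100)))
```

With a spread of roughly 0.05 per test batch, differences of a few percent between neighboring log lines are not significant on their own.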

## Checking log on the tensorboard
As the last line of the cell above says, you can visualize your log using tensorboard. Running this command
```
tensorboard --logdir=log
```
in a terminal starts the tensorboard server and prints the localhost address to open in your web browser. If you trained on a remote machine, you can [ssh-tunnel](https://www.ssh.com/ssh/tunneling/) to that machine's _localhost_ and view the results in your local web browser. For the above training, here are screenshots of the loss and accuracy curves, where the *blue* line shows the metric measured on the training sample and the *orange* line shows the same metric on the test sample.

![loss](theme/img/tutorial05-training-classification-loss.png)

![accuracy](theme/img/tutorial05-training-classification-accuracy.png)

This notebook covered training a convolutional neural network to classify images of 5 LArTPC particle types using the practice files. We encourage you to design your own network and train it on our [public dataset](http://deeplearnphysics.org/DataChallenge)! We provide 50,000 entries of 5 particle images (10,000 per particle) for training and a separate 40,000 for testing your network. When you are confident, try our *data challenge*, yet another set of 40,000 events without _answers_ (i.e. no `particle` information). Share your awesome result in CSV format with [us](mailto:contact@deeplearnphysics.org), with your network architecture made available in a github repository.