In this notebook we train a simple, 10-layer convolutional neural network to classify images of 5 particle types in a simulated LArTPC detector, using samples available from the public dataset. We use tensorflow to train the network and larcv_threadio to fetch data from larcv files. If you are completely unfamiliar with larcv_threadio, go look at the quick start first.

First, let's prepare the data samples. This example needs practice_train_5k.root and practice_test_5k.root to be reachable from the current directory, so let us make symbolic links to them, named train.root and test.root.

In [1]:
%%bash
# Preparation: make symbolic links for practice_train_5k.root and practice_test_5k.root
PRACTICE_FILE_DIR=/home/dell/larcv2/C1/
ln -sf $PRACTICE_FILE_DIR/practice_train_5k.root ./train.root
ln -sf $PRACTICE_FILE_DIR/practice_test_5k.root ./test.root

# Imports

In [None]:
import ROOT

In [2]:
from larcv import larcv

Welcome to JupyROOT 6.19/01


In [3]:
from larcv.dataloader2 import larcv_threadio

In [4]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import os,sys,time
import numpy as np

In [5]:
# tensorflow/gpu start-up configuration
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
%env CUDA_DEVICE_ORDER=PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES=2
import tensorflow as tf

env: CUDA_DEVICE_ORDER=PCI_BUS_ID
env: CUDA_VISIBLE_DEVICES=2


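We set os.environ['TF_CPP_MIN_LOG_LEVEL'] before importing tensorflow to suppress most of the standard output from tensorflow's C++ backend, because it can overwhelm ipython's capability to fetch the stdout stream. Note that the value '3' filters out INFO, WARNING, and ERROR messages alike; Python-level messages such as deprecation warnings are not affected and still appear below.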

# Configurations
Next, let's define configuration variables.

In [6]:
TUTORIAL_DIR     = '/home/dell/larcv2/larcv-tutorial/'
TRAIN_IO_CONFIG  = os.path.join(TUTORIAL_DIR, 'tf/io_train.cfg')
TEST_IO_CONFIG   = os.path.join(TUTORIAL_DIR, 'tf/io_test.cfg' )
TRAIN_BATCH_SIZE = 50
TEST_BATCH_SIZE  = 100
LOGDIR           = 'log'
ITERATIONS       = 5000
SAVE_SUMMARY     = 20
SAVE_WEIGHTS     = 100

# Check log directory is empty
train_logdir = os.path.join(LOGDIR,'train')
test_logdir  = os.path.join(LOGDIR,'test')
if not os.path.isdir(train_logdir): os.makedirs(train_logdir)
if not os.path.isdir(test_logdir):  os.makedirs(test_logdir)
if len(os.listdir(train_logdir)) or len(os.listdir(test_logdir)):
  sys.stderr.write('Error: train or test log dir not empty...\n')
  raise OSError

The top block defines a set of constants in capital letters. The bottom part simply checks that the directories where we will store the network training logs are empty, so that logs from a previous attempt do not get mixed in. So what do the constants do?

TUTORIAL_DIR ... points to the top-level directory of the larcv-tutorial repository.
TRAIN_IO_CONFIG ... the configuration file for larcv_threadio to read data for training.
TEST_IO_CONFIG ... the configuration file for larcv_threadio to read data for testing.
TRAIN_BATCH_SIZE ... the number of images (one batch) used to calculate the average gradients for each update of the network's weights.
TEST_BATCH_SIZE ... the number of images used to calculate the average accuracy on the test data set.
LOGDIR ... the top-level directory in which to save the tensorboard logs.
ITERATIONS ... the total number of steps (batches) to train the network.
SAVE_SUMMARY ... the period, in training steps, at which to save the log (tensorboard summaries).
SAVE_WEIGHTS ... the period, in training steps, at which to save the network's weights.
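
To get a feeling for what these numbers imply, here is a small sketch that counts how often summaries and weights get written. It uses the same `(i+1) % period == 0` test as the training loop later in this notebook. `ASSUMED_TRAIN_EVENTS` is an assumption on my side (read off the "5k" in the file name), not something checked against the file.

```python
# Constants copied from the configuration cell above.
TRAIN_BATCH_SIZE = 50
ITERATIONS       = 5000
SAVE_SUMMARY     = 20
SAVE_WEIGHTS     = 100

# Assumption: the training file holds 5000 events, as its "5k" name suggests.
ASSUMED_TRAIN_EVENTS = 5000

# Steps (counted from zero) at which (i + 1) is a multiple of the period.
summary_steps = [i for i in range(ITERATIONS) if (i + 1) % SAVE_SUMMARY == 0]
weight_steps  = [i for i in range(ITERATIONS) if (i + 1) % SAVE_WEIGHTS == 0]

images_seen = ITERATIONS * TRAIN_BATCH_SIZE
epochs      = images_seen / ASSUMED_TRAIN_EVENTS

print('summaries written  :', len(summary_steps), 'first steps', summary_steps[:3])
print('checkpoints written:', len(weight_steps), 'first steps', weight_steps[:3])
print('images processed   :', images_seen, '(about %g passes over the file)' % epochs)
```

Under that assumption the network sees each training image roughly 50 times, and the first checkpoint is written at step 99.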

# Configure data reader
We prepare two data reader instances: one for training and another for testing the network. Testing is not strictly needed to train the network, but we include it here so that the example covers both. We don't go into the details of how larcv_threadio works here since there is a dedicated tutorial for that.
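
If the fetch/next call pattern is new to you, the toy reader below may help. It is a hypothetical in-memory stand-in written only for illustration: `ToyReader`, `ToyBatch`, and the `toy_image`/`toy_label` names are made up, and nothing here reflects how larcv_threadio is implemented. It only mimics the way this notebook calls `fetch_data(...).dim()`, `fetch_data(...).data()`, and `next()`.

```python
import numpy as np

class ToyBatch:
    """Hypothetical container mimicking how this notebook uses fetch_data()."""
    def __init__(self, array):
        self._array = array

    def dim(self):
        # full shape of the batch, e.g. (batch, rows, cols, channels) for images
        return self._array.shape

    def data(self):
        # one flattened row per entry, the layout fed to the placeholders
        return self._array.reshape(len(self._array), -1)


class ToyReader:
    """Hypothetical in-memory reader; NOT the larcv_threadio implementation."""
    def __init__(self, arrays, batch_size):
        self._arrays = arrays              # dict: name -> array, first axis = entry
        self._batch_size = batch_size
        self._num_entries = len(next(iter(arrays.values())))
        self._start = 0

    def fetch_data(self, name):
        # return the current batch without advancing
        stop = self._start + self._batch_size
        return ToyBatch(self._arrays[name][self._start:stop])

    def next(self):
        # move to the next batch, wrapping around at the end
        # (assumes the number of entries is a multiple of the batch size)
        self._start = (self._start + self._batch_size) % self._num_entries


# 8 fake single-channel 4x4 "images" with one-hot labels over 5 classes
images = np.arange(8 * 4 * 4, dtype=np.float32).reshape(8, 4, 4, 1)
labels = np.eye(5, dtype=np.float32)[np.arange(8) % 5]
toy_io = ToyReader({'toy_image': images, 'toy_label': labels}, batch_size=4)

first  = toy_io.fetch_data('toy_image').data()
again  = toy_io.fetch_data('toy_image').data()   # still the same batch
toy_io.next()
second = toy_io.fetch_data('toy_image').data()   # a new batch

print(toy_io.fetch_data('toy_image').dim())
print(first.shape)
print(np.array_equal(first, again), np.array_equal(first, second))
```

The point of the pattern is that image and label can be fetched separately and still belong to the same batch, as long as `next()` is called only once per step.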

In [7]:
#
# Step 0: IO
#
# for "train" data set
train_io = larcv_threadio()  # create io interface
train_io_cfg = {'filler_name' : 'TrainIO',
                'verbosity'   : 0,
                'filler_cfg'  : TRAIN_IO_CONFIG}
train_io.configure(train_io_cfg)   # configure
train_io.start_manager(TRAIN_BATCH_SIZE) # start read thread
time.sleep(2)
train_io.next()

# for "test" data set
test_io = larcv_threadio()   # create io interface
test_io_cfg = {'filler_name' : 'TestIO',
               'verbosity'   : 0,
               'filler_cfg'  : TEST_IO_CONFIG}
test_io.configure(test_io_cfg)   # configure
test_io.start_manager(TEST_BATCH_SIZE) # start read thread
time.sleep(2)
test_io.next()

 setting verbosity 3
 setting verbosity 3


# Defining a network
Let's construct a simple network for this exercise. We use 5 modules of 2 convolution layers each (10 convolution layers in total). Every module except the last one is followed by a max-pooling operation; the last one is followed by average-pooling over the whole feature map.
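
Before building the real thing, we can trace how the feature map shrinks through these modules with plain arithmetic. This is only a sketch: `trace_shapes` is a helper made up for this note, the 256-pixel input size is an assumption, and the padding rules assume slim's defaults (`SAME` for `conv2d`, `VALID` for `max_pool2d`). The stride and filter schedule follow the `build` function in this section.

```python
import math

def trace_shapes(size, filters=32, num_modules=5):
    """Return (height, width, channels) after each module for a square input."""
    shapes = []
    for step in range(num_modules):
        # first convolution: stride 2 in the first module only, SAME padding
        stride = 2 if step == 0 else 1
        size = math.ceil(size / stride)
        # second convolution: stride 1 with SAME padding keeps the size
        if step + 1 < num_modules:
            # 2x2 max-pooling with stride 2 and VALID padding
            size = (size - 2) // 2 + 1
        else:
            # average-pooling over the whole feature map
            size = 1
        shapes.append((size, size, filters))
        filters *= 2   # the number of filters doubles for the next module
    return shapes

for step, shape in enumerate(trace_shapes(256)):   # 256 is an assumed input size
    print('After step', step, 'shape', shape)
```

With these assumptions the last module leaves a 1x1 map with 512 channels, so the flattened vector entering the final fully connected layer has 512 elements.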

In [8]:
#
# Step 1: Define network
#
import tensorflow.contrib.slim as slim
import tensorflow.python.platform



def build(input_tensor, num_class=4, trainable=True, debug=True):

    net = input_tensor
    if debug: print('input tensor:', input_tensor.shape)

    filters = 32
    num_modules = 5
    with tf.variable_scope('conv'):
        for step in range(num_modules):
            # only the first module downsamples in its first convolution
            stride = 2 if step == 0 else 1
            net = slim.conv2d(inputs        = net,        # input tensor
                              num_outputs   = filters,    # number of filters (neurons) = # of output feature maps
                              kernel_size   = [3,3],      # kernel size
                              stride        = stride,     # stride size
                              trainable     = trainable,  # train or inference
                              activation_fn = tf.nn.relu, # relu
                              scope         = 'conv%da_conv' % step)

            net = slim.conv2d(inputs        = net,        # input tensor
                              num_outputs   = filters,    # number of filters (neurons) = # of output feature maps
                              kernel_size   = [3,3],      # kernel size
                              stride        = 1,          # stride size
                              trainable     = trainable,  # train or inference
                              activation_fn = tf.nn.relu, # relu
                              scope         = 'conv%db_conv' % step)
            if (step+1) < num_modules:
                net = slim.max_pool2d(inputs      = net,    # input tensor
                                      kernel_size = [2,2],  # kernel size
                                      stride      = 2,      # stride size
                                      scope       = 'conv%d_pool' % step)

            else:
                net = tf.layers.average_pooling2d(inputs = net,
                                                  pool_size = [net.get_shape()[-3].value, net.get_shape()[-2].value],  # [height, width] of the NHWC tensor
                                                  strides = 1,
                                                  padding = 'valid',
                                                  name = 'conv%d_pool' % step)
            filters *= 2

            if debug: print('After step',step,'shape',net.shape)

    with tf.variable_scope('final'):
        net = slim.flatten(net, scope='flatten')

        if debug: print('After flattening', net.shape)

        net = slim.fully_connected(net, int(num_class), scope='final_fc')

        if debug: print('After final_fc', net.shape)

    return net

In [9]:
#
# Step 2: Build network + define loss & solver
#
# retrieve dimensions of data for network construction
tf.reset_default_graph() 

dim_data  = train_io.fetch_data('train_image').dim()
dim_label = train_io.fetch_data('train_label').dim()
# define place holders
data_tensor    = tf.placeholder(tf.float32, [None, dim_data[1] * dim_data[2] * dim_data[3]], name='image')
label_tensor   = tf.placeholder(tf.float32, [None, dim_label[1]], name='label')
data_tensor_2d = tf.reshape(data_tensor, [-1,dim_data[1],dim_data[2],dim_data[3]],name='image_reshape')

# Let's keep up to 10 input images per summary in the log
tf.summary.image('input',data_tensor_2d,10)
# build net
net = build(input_tensor=data_tensor_2d, num_class=dim_label[1], trainable=True, debug=False)
# Define accuracy
with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(net,1), tf.argmax(label_tensor,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar('accuracy', accuracy)
# Define loss + backprop as training step
with tf.name_scope('train'):
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label_tensor, logits=net))
    tf.summary.scalar('cross_entropy',cross_entropy)
    train_step = tf.train.RMSPropOptimizer(0.00005).minimize(cross_entropy)

W0805 15:59:01.443275 139909278713664 deprecation.py:323] From <ipython-input-8-7d2490db2822>:46: average_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.AveragePooling2D instead.
W0805 15:59:01.446020 139909278713664 deprecation.py:323] From /home/dell/anaconda3/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1634: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
W0805 15:59:01.849423 139909278713664 deprecation.py:323] From <ipython-input-9-2db20bc27ed7>:25: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_en
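
To make the accuracy and cross-entropy nodes of Step 2 less of a black box, here is a NumPy sketch of the same two quantities. It is an independent re-implementation for illustration, not tensorflow's code; the function names and the toy logits are made up.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean softmax cross-entropy over a batch."""
    # subtract the row maximum for numerical stability (does not change softmax)
    shifted   = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())

def batch_accuracy(logits, one_hot_labels):
    """Fraction of entries whose largest logit matches the label."""
    return float((logits.argmax(axis=1) == one_hot_labels.argmax(axis=1)).mean())

num_class = 5
labels    = np.eye(num_class)[[0, 1, 2, 3, 4, 0]]   # 6 one-hot labels

# a network that knows nothing: identical logits for every class
untrained = np.zeros((6, num_class))
# a network that is confidently right on every entry
confident = 10.0 * labels

print('untrained loss', softmax_cross_entropy(untrained, labels), 'ln(5) =', np.log(num_class))
print('confident loss', softmax_cross_entropy(confident, labels),
      'accuracy', batch_accuracy(confident, labels))
```

A network that assigns equal probability to all 5 classes has a loss of ln(5), about 1.609, which is where the loss values printed at the start of training sit.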

In [10]:
#
# Step 3: weight saver & summary writer
#
# Create a bundle of summaries
merged_summary=tf.summary.merge_all()
# Create a session
sess = tf.InteractiveSession()
# Initialize variables
sess.run(tf.global_variables_initializer())
# Create a summary writer handle
writer_train=tf.summary.FileWriter(train_logdir)
writer_train.add_graph(sess.graph)
writer_test=tf.summary.FileWriter(test_logdir)
writer_test.add_graph(sess.graph)
# Create weights saver
saver = tf.train.Saver()

In [11]:
#
# Step 4: Run training loop
#
for i in range(ITERATIONS):

    train_data  = train_io.fetch_data('train_image').data()
    train_label = train_io.fetch_data('train_label').data()

    feed_dict = { data_tensor  : train_data,
                  label_tensor : train_label }

    loss, acc, _ = sess.run([cross_entropy, accuracy, train_step], feed_dict=feed_dict)

    if (i+1)%SAVE_SUMMARY == 0:
        # Save train log
        sys.stdout.write('Training in progress @ step %d loss %g accuracy %g          \n' % (i,loss,acc))
        sys.stdout.flush()
        s = sess.run(merged_summary, feed_dict=feed_dict)
        writer_train.add_summary(s,i)
    
        # Calculate & save test log
        test_data  = test_io.fetch_data('test_image').data()
        test_label = test_io.fetch_data('test_label').data()
        feed_dict  = { data_tensor  : test_data,
                       label_tensor : test_label }
        loss, acc = sess.run([cross_entropy, accuracy], feed_dict=feed_dict)
        sys.stdout.write('Testing in progress @ step %d loss %g accuracy %g          \n' % (i,loss,acc))
        sys.stdout.flush()
        s = sess.run(merged_summary, feed_dict=feed_dict)
        writer_test.add_summary(s,i)
        
        test_io.next()

    train_io.next()

    if (i+1)%SAVE_WEIGHTS == 0:
        ssf_path = saver.save(sess,'weights/toynet',global_step=i)
        print('saved @',ssf_path)

# report where the logs are and how to view them
print()
print('Run `tensorboard --logdir=%s` in terminal to see the results.' % LOGDIR)
train_io.reset()
test_io.reset()

Training in progress @ step 19 loss 1.60894 accuracy 0.28          
Testing in progress @ step 19 loss 1.60183 accuracy 0.23          
Training in progress @ step 39 loss 1.59624 accuracy 0.24          
Testing in progress @ step 39 loss 1.62453 accuracy 0.1          
Training in progress @ step 59 loss 1.58608 accuracy 0.26          
Testing in progress @ step 59 loss 1.61284 accuracy 0.18          
Training in progress @ step 79 loss 1.58555 accuracy 0.2          
Testing in progress @ step 79 loss 1.57619 accuracy 0.3          
Training in progress @ step 99 loss 1.60136 accuracy 0.18          
Testing in progress @ step 99 loss 1.59881 accuracy 0.19          
saved @ weights/toynet-99
Training in progress @ step 119 loss 1.59041 accuracy 0.26          
Testing in progress @ step 119 loss 1.59514 accuracy 0.33          
Training in progress @ step 139 loss 1.59309 accuracy 0.24          
Testing in progress @ step 139 loss 1.60593 accuracy 0.26          
Training in progress @ step 

W0805 16:39:45.732969 139909278713664 deprecation.py:323] From /home/dell/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.


saved @ weights/toynet-599
Training in progress @ step 619 loss 1.30643 accuracy 0.46          
Testing in progress @ step 619 loss 1.32582 accuracy 0.49          
Training in progress @ step 639 loss 1.28517 accuracy 0.48          
Testing in progress @ step 639 loss 1.63133 accuracy 0.29          
Training in progress @ step 659 loss 1.28601 accuracy 0.5          
Testing in progress @ step 659 loss 1.27028 accuracy 0.54          
Training in progress @ step 679 loss 1.34397 accuracy 0.46          
Testing in progress @ step 679 loss 1.49995 accuracy 0.32          
Training in progress @ step 699 loss 1.33476 accuracy 0.48          
Testing in progress @ step 699 loss 1.21273 accuracy 0.58          
saved @ weights/toynet-699
Training in progress @ step 719 loss 1.24346 accuracy 0.54          
Testing in progress @ step 719 loss 1.32053 accuracy 0.45          
Training in progress @ step 739 loss 1.35787 accuracy 0.42          
Testing in progress @ step 739 loss 1.32251 accuracy 0.5

Testing in progress @ step 1759 loss 1.36045 accuracy 0.42          
Training in progress @ step 1779 loss 1.31742 accuracy 0.44          
Testing in progress @ step 1779 loss 1.15408 accuracy 0.51          
Training in progress @ step 1799 loss 1.57415 accuracy 0.44          
Testing in progress @ step 1799 loss 1.44827 accuracy 0.35          
saved @ weights/toynet-1799
Training in progress @ step 1819 loss 1.15384 accuracy 0.52          
Testing in progress @ step 1819 loss 1.23731 accuracy 0.56          
Training in progress @ step 1839 loss 1.34757 accuracy 0.38          
Testing in progress @ step 1839 loss 1.34269 accuracy 0.42          
Training in progress @ step 1859 loss 1.47447 accuracy 0.44          
Testing in progress @ step 1859 loss 1.35923 accuracy 0.37          
Training in progress @ step 1879 loss 1.35056 accuracy 0.5          
Testing in progress @ step 1879 loss 1.42168 accuracy 0.43          
Training in progress @ step 1899 loss 1.19593 accuracy 0.52          


Testing in progress @ step 2899 loss 1.18955 accuracy 0.61          
saved @ weights/toynet-2899
Training in progress @ step 2919 loss 0.977047 accuracy 0.68          
Testing in progress @ step 2919 loss 1.28696 accuracy 0.55          
Training in progress @ step 2939 loss 1.26222 accuracy 0.56          
Testing in progress @ step 2939 loss 1.1124 accuracy 0.63          
Training in progress @ step 2959 loss 0.980403 accuracy 0.62          
Testing in progress @ step 2959 loss 1.13197 accuracy 0.59          
Training in progress @ step 2979 loss 0.845093 accuracy 0.58          
Testing in progress @ step 2979 loss 0.845257 accuracy 0.72          
Training in progress @ step 2999 loss 1.08444 accuracy 0.58          
Testing in progress @ step 2999 loss 0.850927 accuracy 0.72          
saved @ weights/toynet-2999
Training in progress @ step 3019 loss 0.782109 accuracy 0.7          
Testing in progress @ step 3019 loss 0.972546 accuracy 0.6          
Training in progress @ step 3039 loss

Training in progress @ step 4039 loss 0.869528 accuracy 0.7          
Testing in progress @ step 4039 loss 1.06138 accuracy 0.68          
Training in progress @ step 4059 loss 0.876087 accuracy 0.62          
Testing in progress @ step 4059 loss 1.07748 accuracy 0.6          
Training in progress @ step 4079 loss 1.11291 accuracy 0.58          
Testing in progress @ step 4079 loss 0.921137 accuracy 0.62          
Training in progress @ step 4099 loss 0.735105 accuracy 0.72          
Testing in progress @ step 4099 loss 1.16022 accuracy 0.58          
saved @ weights/toynet-4099
Training in progress @ step 4119 loss 0.669679 accuracy 0.76          
Testing in progress @ step 4119 loss 0.904578 accuracy 0.67          
Training in progress @ step 4139 loss 0.833822 accuracy 0.62          
Testing in progress @ step 4139 loss 1.07333 accuracy 0.73          
Training in progress @ step 4159 loss 0.859325 accuracy 0.66          
Testing in progress @ step 4159 loss 1.07189 accuracy 0.61    
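
The accuracies printed above jump around quite a bit from one line to the next. Part of that is simply statistics: each value is measured on a single batch. As a rough guide, the sketch below computes the binomial standard error of an accuracy estimate. It assumes the images in a batch are independent, and `accuracy_std_error` is a helper made up for this note.

```python
import math

def accuracy_std_error(accuracy, batch_size):
    """Binomial standard error of an accuracy measured on batch_size images."""
    return math.sqrt(accuracy * (1.0 - accuracy) / batch_size)

# batch sizes from the configuration cell: 50 for training, 100 for testing
for name, batch_size in (('train', 50), ('test', 100)):
    print(name, 'batch: accuracy 0.6 +/-', round(accuracy_std_error(0.6, batch_size), 3))
```

So for a true accuracy around 0.6, swings of several percent between consecutive batches are expected even if the network is not changing at all.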

In [1]:
%%bash
tensorboard --logdir=log

Traceback (most recent call last):
  File "/home/dell/anaconda3/bin/tensorboard", line 10, in <module>
    sys.exit(run_main())
  File "/home/dell/anaconda3/lib/python3.7/site-packages/tensorboard/main.py", line 64, in run_main
    app.run(tensorboard.main, flags_parser=tensorboard.configure)
  File "/home/dell/anaconda3/lib/python3.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/dell/anaconda3/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/dell/anaconda3/lib/python3.7/site-packages/tensorboard/program.py", line 228, in main
    server = self._make_server()
  File "/home/dell/anaconda3/lib/python3.7/site-packages/tensorboard/program.py", line 309, in _make_server
    self.assets_zip_provider)
  File "/home/dell/anaconda3/lib/python3.7/site-packages/tensorboard/backend/application.py", line 161, in standard_tensorboard_wsgi
    reload_task)
  File "/home/dell/anaconda3/lib/python3.7/site-pack

CalledProcessError: Command 'b'tensorboard --logdir=log\n'' returned non-zero exit status 1.