# ImageNet Preprocessing
Notebook for processing the ImageNet images. The goal is to build a batch input that, using the queue mechanism described in the <a href=https://www.tensorflow.org/programmers_guide/reading_data>TensorFlow</a> guide, provides batches of the desired size for the desired number of epochs.

The <a href=https://github.com/tensorflow/models/blob/master/slim/preprocessing/inception_preprocessing.py>Inception preprocessing</a> algorithm is also used to feed input images of the correct size, with the pre-training corrections provided by TensorFlow.

In [1]:
#Import
import pandas as pd
import numpy as np
import os
import tensorflow as tf
import random
from PIL import Image
#Inception preprocessing code from https://github.com/tensorflow/models/blob/master/slim/preprocessing/inception_preprocessing.py
#useful to maintain training dimension
from utils import inception_preprocessing
import sys

#from inception import inception
'''
Use of slim and nets_factory (as in TF-Slim, https://github.com/tensorflow/models/blob/master/slim/train_image_classifier.py)
to restore the network.

The networks must be registered in nets_factory (see the file structure in this notebook's directory)
'''

slim = tf.contrib.slim
from nets import nets_factory

In [2]:
#Global Variables
IMAGE_NET_ROOT_PATH = '/var/ifs/data/tiny-imagenet-200/'
#IMAGE_NET_ROOT_PATH = '/data/lgrazioli/'
IMAGE_NET_LABELS_PATH = IMAGE_NET_ROOT_PATH + 'words.txt'
IMAGE_NET_TRAIN_PATH = IMAGE_NET_ROOT_PATH + 'train/'
TRAINING_CHECKPOINT_DIR = '/tmp/ImageNetTrainTransfer'
#Transfer learning checkpoint path
#ckpt file of the network
CHECKPOINT_PATH = '/var/ifs/data/model-zoo/inceptionv4/tensorflow-1.2/inception_v4.ckpt'

### Reading the ImageNet words file
The ImageNet words file is read as a pandas DataFrame. Each id (the folder that contains the images of a class) is assigned its labels.

In [3]:
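#Reading label file as pandas DataFrame
#A plain tab separator is used; a regex separator such as '\\t' forces the slower python parser engine
labels_df = pd.read_csv(IMAGE_NET_LABELS_PATH, sep='\t', header=None, names=['id','labels'])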
labels_df.head(5)



| | id | labels |
|---|---|---|
| 0 | n00001740 | entity |
| 1 | n00001930 | physical entity |
| 2 | n00002137 | abstraction, abstract entity |
| 3 | n00002452 | thing |
| 4 | n00002684 | object, physical object |
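
Since the labels themselves contain commas, the tab is the only safe column separator. The same parsing can be sketched with the standard library alone, assuming a two-column, tab-separated words file. The `parse_words` helper and the in-memory sample are illustrative and not part of the notebook.

```python
import csv
import io

def parse_words(handle):
    """Return {id: label string} from a tab-separated words file."""
    reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    mapping = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # A missing label column becomes an empty string instead of NaN
        mapping[row[0]] = row[1] if len(row) > 1 else ''
    return mapping

# Two rows copied from the table above, used as an in-memory sample file
sample = io.StringIO(
    "n00001740\tentity\n"
    "n00002137\tabstraction, abstract entity\n"
)
words = parse_words(sample)
print(words["n00002137"])
```

The comma inside the second label is kept intact because only the tab splits the columns.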


In [4]:
labels_df.count()

id        82115
labels    82114
dtype: int64

Add a column with the label length (how many class names each label contains).

In [5]:
#new_labels = []
labels_lengths = []
for idx, row in labels_df.iterrows():
    #Convert to string because some values are floats
    current_labels = tuple(str(row['labels']).split(','))
    #new_labels.append(current_labels)
    labels_lengths.append(len(current_labels))

In [6]:
labels_df['labels_length'] = labels_lengths
labels_indices = [idx for idx, _ in labels_df.iterrows()]
labels_df['indices'] = labels_indices

In [7]:
labels_df.head(20)

| | id | labels | labels_length | indices |
|---|---|---|---|---|
| 0 | n00001740 | entity | 1 | 0 |
| 1 | n00001930 | physical entity | 1 | 1 |
| 2 | n00002137 | abstraction, abstract entity | 2 | 2 |
| 3 | n00002452 | thing | 1 | 3 |
| 4 | n00002684 | object, physical object | 2 | 4 |
| 5 | n00003553 | whole, unit | 2 | 5 |
| 6 | n00003993 | congener | 1 | 6 |
| 7 | n00004258 | living thing, animate thing | 2 | 7 |
| 8 | n00004475 | organism, being | 2 | 8 |
| 9 | n00005787 | benthos | 1 | 9 |
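
`labels_df.count()` reports one label fewer than ids, so one row has a missing label. With the `str()` conversion used above, that row becomes the string `'nan'` and gets a length of 1. A minimal sketch of a stricter count, assuming that missing labels should count as 0 (`label_length` is a hypothetical helper, not part of the notebook):

```python
def label_length(label):
    """Count the comma-separated names in one label; missing labels count as 0."""
    if not isinstance(label, str) or not label.strip():
        return 0
    return len([name for name in label.split(',') if name.strip()])

print(label_length("abstraction, abstract entity"))
print(label_length("entity"))
print(label_length(float("nan")))
```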


### Train DF
pandas DataFrame that contains the path of every image, its class id, the image id and the class label. The label is obtained through a lookup on labels_df (<b>this operation is expensive in terms of execution time</b>).

<b>It can take a while. To run on a sample, stop the loop at a given value of idx.</b>

In [8]:
train_paths = []
for idx, label_dir in enumerate(os.listdir(IMAGE_NET_TRAIN_PATH)):
    image_dir_path = IMAGE_NET_TRAIN_PATH + label_dir + '/images/'
    print("Processing label {0}".format(label_dir))
    for image in os.listdir(image_dir_path):
        #Extract the class_id
        class_id = image.split('.')[0].split('_')[0]
        #Lookup on labels_df
        target_label = labels_df[labels_df['id'] == class_id] #=> later passed to tf.one_hot
        #Extract the label
        target_label = target_label['labels'].values[0]
        train_paths.append((image_dir_path + image, 
                            class_id,
                            image.split('.')[0].split('_')[1],
                            target_label
                           ))
    #Sample run: stop after the first 6 class folders; remove this check to process every class
    if idx == 5:
        break
train_df = pd.DataFrame(train_paths, columns=['im_path','class', 'im_class_id', 'target_label'])
print(train_df.count())
train_df.head()

Processing label n07747607
Processing label n02917067
Processing label n03400231
Processing label n04179913
Processing label n03837869
Processing label n02074367
im_path         3000
class           3000
im_class_id     3000
target_label    3000
dtype: int64


| | im_path | class | im_class_id | target_label |
|---|---|---|---|---|
| 0 | /var/ifs/data/tiny-imagenet-200/train/n0774760... | n07747607 | 290 | orange |
| 1 | /var/ifs/data/tiny-imagenet-200/train/n0774760... | n07747607 | 427 | orange |
| 2 | /var/ifs/data/tiny-imagenet-200/train/n0774760... | n07747607 | 339 | orange |
| 3 | /var/ifs/data/tiny-imagenet-200/train/n0774760... | n07747607 | 11 | orange |
| 4 | /var/ifs/data/tiny-imagenet-200/train/n0774760... | n07747607 | 400 | orange |
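
The lookup is slow because every image triggers a boolean filter over the whole labels_df. One possible alternative is to build a plain dictionary once (in pandas, for instance, with `dict(zip(labels_df['id'], labels_df['labels']))`) and then look up each class id in constant time. The sketch below uses a small hand-written dictionary and an assumed file name of the form `<class_id>_<image_id>.JPEG`; both are illustrative.

```python
import os

# Stand-in for the id -> label mapping built once from labels_df
id_to_label = {"n07747607": "orange", "n00001740": "entity"}

def parse_image_name(file_name):
    """Split a name like 'n07747607_290.JPEG' into (class_id, image_id)."""
    stem = os.path.splitext(file_name)[0]
    class_id, image_id = stem.split('_', 1)
    return class_id, image_id

class_id, image_id = parse_image_name("n07747607_290.JPEG")
# Dictionary lookup instead of filtering the DataFrame for every image
print(class_id, image_id, id_to_label[class_id])
```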


Remove the images that are not in the format expected by inception_preprocessing (3 channels).
<b>Long operation!</b>

In [9]:
#Remove black and white images
uncorrect_images = 0
#Indices of the images to remove
to_remove_indexes = []
for idx, record in train_df.iterrows():
    #Read the image as np.array
    im_array = np.array(Image.open(record['im_path']))
    #If it does not have 3 channels, mark it for removal
    if im_array.shape[-1] != 3:
        uncorrect_images += 1
        to_remove_indexes.append(idx)
    if idx % 20 == 0:
        sys.stdout.write("\rProcessed {0} images".format(idx))
        sys.stdout.flush()

#Drop the marked rows (the idx values returned by iterrows are index labels)
train_df = train_df.drop(to_remove_indexes)

print("New size: {0}".format(len(train_df)))
print("Removed {0} images".format(uncorrect_images))

Processed 2980 imagesNew size: 2946
Removed 54 images
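
A grayscale image is read as a two-dimensional array, so its last axis is the image width rather than a channel count. Comparing only the last axis with 3 still rejects it, as long as no image is exactly 3 pixels wide. A slightly more explicit check, sketched here on shape tuples only (the 64x64 size is an assumption about the dataset, and `is_rgb_shape` is a hypothetical helper):

```python
def is_rgb_shape(shape):
    """True only for arrays shaped (height, width, 3)."""
    return len(shape) == 3 and shape[-1] == 3

print(is_rgb_shape((64, 64, 3)))  # colour image with three channels
print(is_rgb_shape((64, 64)))     # grayscale image, no channel axis
print(is_rgb_shape((64, 64, 4)))  # four channels, e.g. with alpha
print(is_rgb_shape((64, 3)))      # grayscale image that is 3 pixels wide
```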


In [10]:
#Optional sampling of the files to pass to the input generator
example_file_list = list(train_df.im_path)
print(len(example_file_list))


2946


Definition of the label dictionary
{label: index}

In [11]:
labels_dict = {}
unique_labels = set(labels_df['labels'])
for idx, target in enumerate(unique_labels):
    labels_dict[target] = idx
num_classes = len(labels_dict)
num_classes

76003
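
The dictionary is built from every row of labels_df, so num_classes is 76003 even though the sampled training set contains only six classes. Iterating a set also gives an order that can change between runs. A possible alternative, shown only as a sketch, indexes just the labels that occur in the training data and sorts them so the indices are reproducible. The `train_labels` list is a hypothetical stand-in for `train_df['target_label']`.

```python
def build_label_index(labels):
    """Map each distinct label to a stable integer index (sorted order)."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}

# Hypothetical target labels, standing in for train_df['target_label']
train_labels = ["orange", "orange", "entity", "thing", "entity"]
label_index = build_label_index(train_labels)
print(label_index)
print(len(label_index))
```

With this choice the one-hot vectors and the readout layer would have one entry per class that is actually present, rather than one per row of the words file.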

Build the list of labels (same order as the file list)

In [12]:
example_label_list = []
for idx, value in train_df.iterrows():
    example_label_list.append(labels_dict[value['target_label']])
len(example_label_list)

2946

### Transfer Learning
Restoring the Inception v4 model

In [13]:
'''
get_network_fn returns the corresponding network function.

If num_classes has to be changed, set is_training to True

It returns the function defined in the corresponding network file
'''
model_name = 'inception_v4'
inception_net_fn = nets_factory.get_network_fn(model_name,
                                               num_classes=1001,
                                               is_training = False
                                              )
with tf.device('/gpu:0'):
    sampl_input = tf.placeholder(tf.float32, [None, 300,300, 3], name='inception_input_placeholder')
    #Call the model fn to define the network variables
    #The model is built on this placeholder, so every batch must be fed through it
    #Needed to restore the graph
    inception_net_fn(sampl_input)

INFO:tensorflow:Scale of 0 disables regularizer.


### Input pipeline
Definition of the input pipeline of the TF model

<b>NB: GPU memory usage NEVER goes above 100 MB!</b>

In [14]:
EPOCHS = 50
BATCH_SIZE = 56
#Used to detect when the generator has moved on to batches that belong to a new epoch
BATCH_PER_EPOCH = int(np.ceil(len(example_file_list) / BATCH_SIZE))

def parse_single_image(filename_queue):
    #filename_queue is already the dequeued [filename, label] pair produced by
    #slice_input_producer, so dequeue() must not be called here
    filename, y = filename_queue[0], filename_queue[1]
    #y only needs the one-hot encoding
    y = tf.one_hot(y, num_classes)
    #Read image
    raw = tf.read_file(filename)
    #Decode the JPEG bytes (the op is placed on the device of the enclosing tf.device scope)
    jpeg_image = tf.image.decode_jpeg(raw)
    #Preprocessing with inception preprocessing
    jpeg_image = inception_preprocessing.preprocess_image(jpeg_image, 300, 300, is_training=True)
    return jpeg_image, y
#jpeg_image = parse_single_image(filename_queue)

def get_batch(filenames, labels, batch_size, num_epochs=None):
    
    #File-reading queue: slice_input_producer accepts a list of lists (all of the same length)
    #Each dequeue returns the current element of every list
    #The lists are, respectively, the file list and the label list
    filename_queue = tf.train.slice_input_producer([filenames, labels])
    
    #Read a single record
    jpeg_image,y = parse_single_image(filename_queue)
    
    # min_after_dequeue defines how big a buffer we will randomly sample
    #   from -- bigger means better shuffling but slower start up and more
    #   memory used.
    # capacity must be larger than min_after_dequeue and the amount larger
    #   determines the maximum we will prefetch.  Recommendation:
    #   min_after_dequeue + (num_threads + a small safety margin) * batch_size
    min_after_dequeue = 10
    capacity = min_after_dequeue + 3 * batch_size
    
    #tensors is the list of the single-example tensors (image and label); they are evaluated batch_size times to assemble a batch
    #num_threads does increase CPU usage (confirmed by the throughput visible in Cloudera Manager),
    #but the throughput remains slow
    example_batch = tf.train.shuffle_batch(
        tensors=[jpeg_image, y], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue, allow_smaller_final_batch=True, num_threads=2)
    
    return example_batch


#TF graph: for now it only fetches one batch
with tf.device('/cpu:0'):
    with tf.name_scope('preprocessing') as scope:
        x,y = get_batch(example_file_list, example_label_list, batch_size=BATCH_SIZE)
        #x = tf.contrib.layers.flatten(x)

#inception prelogits 
#prelogits = tf.placeholder(tf.float32, [None, 1536], name='prelogits_placeholder')
prelogits = tf.get_default_graph().get_tensor_by_name("InceptionV4/Logits/PreLogitsFlatten/Reshape:0")        

with tf.device('/gpu:0'):
    with tf.name_scope('hidden') as scope:
        hidden = tf.layers.dense(
            prelogits,
            units=128,
            activation=tf.nn.relu        
        )

    #kernel_initializer=None falls back to the Glorot uniform initializer (limit = sqrt(6 / (fan_in + fan_out)))
    with tf.name_scope('readout') as scope:
        output = tf.layers.dense(
            hidden,
            units=num_classes,
            activation=None
        )

    with tf.name_scope('train_op') as scope:
        # Define loss and optimizer
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y))
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)


    with tf.name_scope('train_metrics') as scope:
        # Accuracy
        correct_pred = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

tf.summary.scalar('accuracy', accuracy)
tf.summary.scalar('loss', cost)

init = tf.global_variables_initializer()

merged_summeries = tf.summary.merge_all()
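
The capacity recommendation quoted in the comments of `get_batch` can be written as a small formula. The sketch below assumes a safety margin of one batch, which reproduces the `min_after_dequeue + 3 * batch_size` used in the cell for two threads. `shuffle_queue_capacity` is a hypothetical helper, not a TensorFlow function.

```python
def shuffle_queue_capacity(min_after_dequeue, batch_size, num_threads, safety_margin=1):
    """capacity = min_after_dequeue + (num_threads + safety_margin) * batch_size"""
    return min_after_dequeue + (num_threads + safety_margin) * batch_size

# Values used in the cell above: min_after_dequeue=10, BATCH_SIZE=56, num_threads=2
capacity = shuffle_queue_capacity(min_after_dequeue=10, batch_size=56, num_threads=2)
print(capacity)
```

As the comments note, a larger min_after_dequeue gives better shuffling but a slower start-up and more memory used.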

In [15]:
#GPU config
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
#Saver used to restore the Inception net: it covers only the InceptionV4 model variables,
#because the new dense layers and the Adam slots are not stored in the checkpoint
saver = tf.train.Saver(slim.get_model_variables('InceptionV4'))

with tf.Session(config=config) as sess:
    sess.run(init)
    #Load the pre-trained Inception v4 weights after the initialization
    saver.restore(sess, CHECKPOINT_PATH)
    #The graph is written when the FileWriter is created
    writer = tf.summary.FileWriter(TRAINING_CHECKPOINT_DIR,
                                   sess.graph)
    #Start populating the filename queue.
    coord = tf.train.Coordinator()
    #Without this call the threads that fill the queue never start and the read blocks
    threads = tf.train.start_queue_runners(coord=coord)
    #Current epoch and step are used to know when to change epoch and when to stop
    current_epoch = 0
    current_step = 0
    while current_epoch < EPOCHS: 
        x_batch, y_batch = sess.run([x,y])
        #Training step: the batch goes through the Inception net via its input placeholder
        sess.run(optimizer, feed_dict={sampl_input: x_batch, y: y_batch})
        if current_step % 10 == 0:
            print("Current step: {0}".format(current_step))
            train_loss, train_accuracy, train_summ  = sess.run([cost,accuracy,merged_summeries],
                                                               feed_dict={sampl_input: x_batch, y: y_batch})
            print("Loss: {0} accuracy {1}".format(train_loss, train_accuracy))
            #Global step = batches of the completed epochs + step in the current epoch
            writer.add_summary(train_summ, int(current_epoch * BATCH_PER_EPOCH) + current_step)
        current_step += 1
        #Change epoch once all the batches of the current epoch have been consumed
        if current_step == BATCH_PER_EPOCH:
            current_epoch += 1
            current_step = 0
            print("EPOCH {0}".format(current_epoch))

    #Stop the coordinator (closes the reader threads)
    coord.request_stop()
    coord.join(threads)
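
The training loop above tracks epochs by counting batches. The bookkeeping can be checked in isolation: a summary step that grows monotonically can be derived as `epoch * batches_per_epoch + step`. The following sketch uses the 2946 images and the batch size of 56 from this notebook; `training_schedule` is a hypothetical helper.

```python
import math

def training_schedule(num_examples, batch_size, epochs):
    """Yield (epoch, step, global_step) for every batch of every epoch."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    for epoch in range(epochs):
        for step in range(batches_per_epoch):
            yield epoch, step, epoch * batches_per_epoch + step

schedule = list(training_schedule(num_examples=2946, batch_size=56, epochs=2))
print(len(schedule))
# Last batch of the first epoch and first batch of the second one
print(schedule[52], schedule[53])
```

Each epoch spans 53 batches here, so the global step moves from 52 to 53 across the epoch boundary without resetting.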

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Enqueue operation was cancelled
	 [[Node: preprocessing/input_producer/input_producer/input_producer_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](preprocessing/input_producer/input_producer, preprocessing/input_producer/input_producer/RandomShuffle)]]


ResourceExhaustedError: OOM when allocating tensor with shape[56,64,73,73]
	 [[Node: InceptionV4/InceptionV4/Mixed_4a/Branch_1/Conv2d_1a_3x3/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](InceptionV4/InceptionV4/Mixed_4a/Branch_1/Conv2d_0c_7x1/Relu, InceptionV4/Mixed_4a/Branch_1/Conv2d_1a_3x3/weights/read)]]

Caused by op 'InceptionV4/InceptionV4/Mixed_4a/Branch_1/Conv2d_1a_3x3/convolution', defined at:
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/ipykernel/__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
    ioloop.IOLoop.instance().start()
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tornado/ioloop.py", line 887, in start
    handler_func(fd_obj, events)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
    user_expressions, allow_stdin)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2698, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2802, in run_ast_nodes
    if self.run_code(code, result):
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-95e0d727faa1>", line 18, in <module>
    inception_net_fn(sampl_input)
  File "/data/lgrazioli/Transfer Learning/nets/nets_factory.py", line 114, in network_fn
    return func(images, num_classes, is_training=is_training)
  File "/data/lgrazioli/Transfer Learning/nets/inception_v4.py", line 282, in inception_v4
    net, end_points = inception_v4_base(inputs, scope=scope)
  File "/data/lgrazioli/Transfer Learning/nets/inception_v4.py", line 209, in inception_v4_base
    scope='Conv2d_1a_3x3')
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 947, in convolution
    outputs = layer.apply(inputs)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 492, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/layers/convolutional.py", line 158, in call
    data_format=utils.convert_data_format(self.data_format, self.rank + 2))
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 670, in convolution
    op=op)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 338, in with_space_to_batch
    return op(input, num_spatial_dims, padding)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 662, in op
    name=name)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 131, in _non_atrous_convolution
    name=name)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 399, in conv2d
    data_format=data_format, name=name)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/opt/continuum/anaconda/envs/deepEnv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[56,64,73,73]
	 [[Node: InceptionV4/InceptionV4/Mixed_4a/Branch_1/Conv2d_1a_3x3/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](InceptionV4/InceptionV4/Mixed_4a/Branch_1/Conv2d_0c_7x1/Relu, InceptionV4/Mixed_4a/Branch_1/Conv2d_1a_3x3/weights/read)]]
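
The tensor named in the error is a single float32 activation of shape [56, 64, 73, 73], where the first dimension is the batch size. Its size can be estimated with a few lines of arithmetic. The sketch below only computes the size of that one tensor and assumes 4 bytes per element. The network holds many such activations, so this is not an estimate of the total memory needed. `tensor_mebibytes` is a hypothetical helper.

```python
def tensor_mebibytes(shape, bytes_per_element=4):
    """Memory of one dense tensor in MiB (float32 = 4 bytes per element)."""
    elements = 1
    for dim in shape:
        elements *= dim
    return elements * bytes_per_element / (1024 ** 2)

# Same activation for the current batch size and for two smaller ones
for batch_size in (56, 28, 14):
    size = tensor_mebibytes((batch_size, 64, 73, 73))
    print("batch {0}: {1:.1f} MiB".format(batch_size, size))
```

The size scales linearly with the batch size, so lowering BATCH_SIZE is one common way to try to get past an out-of-memory error of this kind.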
