
OutOfRangeError (see above for traceback): FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0) [[Node: batch_join = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, _recv_batch_size_0)]] #338

Closed
uzair789 opened this issue Jun 20, 2017 · 26 comments

Comments

@uzair789

Hi,

I am trying to train a facenet model on my own dataset. My dataset consists of images which were obtained by using a face detector developed at our lab at CMU. There is no problem with the generated crops. I have used the same dataset for training different models in Caffe.

When I change the data_dir path to my own dataset, the training starts and aborts at the third iteration in the first epoch itself. This is the run command that I use:

    python src/train_softmax.py \
        --logs_base_dir /home/uzair/tensorflow/facenet/logs/ \
        --models_base_dir /home/uzair/tensorflow/facenet/models_base_dir/ \
        --image_width 96 --image_height 112 \
        --model_def models.face-resnet \
        --lfw_dir /home/uzair/Datasets/lfw_mtcnnpy_96_112 \
        --optimizer RMSPROP \
        --learning_rate -1 \
        --max_nrof_epochs 80 \
        --keep_probability 0.8 \
        --random_crop --random_flip \
        --learning_rate_schedule_file /home/uzair/tensorflow/facenet/data/learning_rate_schedule_classifier_casia.txt \
        --weight_decay 5e-5 \
        --center_loss_factor 1e-2 \
        --center_loss_alfa 0.9 \
        --lfw_pairs /home/uzair/tensorflow/facenet/data/pairs.txt \
        --embedding_size 512 \
        --batch_size 90 \
        --epoch_size 100 \
        --data_dir /home/uzair/caffe-face/datasets/CASIA/CAISAdataset_112X96_#2

I have looked at other solutions where people suggest reducing the --epoch_size value, but I see that in the code the call to

    index_queue = tf.train.range_input_producer(range_size, num_epochs=None,
                         shuffle=True, seed=None, capacity=32)

does not depend on num_epochs, so this is no longer a valid solution. Also, I am using 'jpeg' images in my dataset, and I have already changed the line

image = tf.image.decode_png(file_contents)

to

image = tf.image.decode_image(file_contents)

The exact error message and stack trace are below:

    2017-06-20 16:05:33.969081: W tensorflow/core/framework/op_kernel.cc:1152] Out of range: FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0)
         [[Node: batch_join = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, _recv_batch_size_0)]]
    (the same warning is repeated for each of the remaining dequeue attempts)
    Traceback (most recent call last):
      File "src/train_softmax.py", line 522, in <module>
        main(parse_arguments(sys.argv[1:]))
      File "src/train_softmax.py", line 259, in main
        cross_entropy_mean_backprop,reg_losses_without_ringloss,reg_losses_without_ringloss_backprop)
      File "src/train_softmax.py", line 338, in train
        err, _, step, reg_loss, R_val, norm_feat, raw_ring_loss, grad_softmax1, grad_ringloss1, ringloss_backprop1, total_loss_backprop1, R_backprop1, cross_entropy_mean_backprop1, reg_losses_without_ringloss1, reg_losses_without_ringloss_backprop1 = sess.run([loss, train_op, global_step, regularization_losses, Rval, mean_norm_features, prelogits_center_loss, grad_softmax, grad_ringloss, ringloss_backprop, total_loss_backprop, R_backprop, cross_entropy_mean_backprop, reg_losses_without_ringloss, reg_losses_without_ringloss_backprop], feed_dict=feed_dict)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 778, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 982, in _run
        feed_dict_string, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
        target_list, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0)
         [[Node: batch_join = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, _recv_batch_size_0)]]

    Caused by op u'batch_join', defined at:
      File "src/train_softmax.py", line 522, in <module>
        main(parse_arguments(sys.argv[1:]))
      File "src/train_softmax.py", line 153, in main
        allow_smaller_final_batch=True)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 1065, in batch_join
        name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 745, in _batch_join
        dequeued = queue.dequeue_up_to(batch_size, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 499, in dequeue_up_to
        self._queue_ref, n=n, component_types=self._dtypes, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1420, in _queue_dequeue_up_to_v2
        timeout_ms=timeout_ms, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
        self._traceback = _extract_stack()

    OutOfRangeError (see above for traceback): FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0)
         [[Node: batch_join = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, _recv_batch_size_0)]]

I'd really appreciate any help. I need to move past this error so that I can train on the different datasets available at my lab.

@davidsandberg
Owner

Hi @uzair789,
Not sure what causes this, but I have seen similar problems when there are corrupt files in the dataset. The queue runners that fill up the batch queue then crash one by one; each crash produces just an error message and training continues. But once this has happened a few times there are no queue runners left, and training crashes.
One way to find out is to check the training log for error messages related to decode_image. You could also make a test script with a very simple pipeline, i.e. just one thread, reading the files in your dataset and using decode_image. Then you should be able to troubleshoot this much quicker.
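As a starting point, something like the sketch below should do (this is not code from the repo, just a minimal example assuming TensorFlow 1.x and the usual one-subdirectory-per-class dataset layout):

    # Minimal single-threaded decode test (sketch): feeds every file through
    # tf.image.decode_image and reports the ones that fail to decode.
    import os
    import sys
    import tensorflow as tf

    def check_images(data_dir):
        bad = []
        with tf.Graph().as_default(), tf.Session() as sess:
            contents_ph = tf.placeholder(tf.string)
            decoded = tf.image.decode_image(contents_ph, channels=3)
            for cls in sorted(os.listdir(data_dir)):
                cls_dir = os.path.join(data_dir, cls)
                if not os.path.isdir(cls_dir):
                    continue
                for fname in os.listdir(cls_dir):
                    path = os.path.join(cls_dir, fname)
                    try:
                        with open(path, 'rb') as f:
                            sess.run(decoded, feed_dict={contents_ph: f.read()})
                    except Exception as e:
                        bad.append(path)
                        print('FAILED: %s (%s)' % (path, e))
        return bad

    if __name__ == '__main__':
        check_images(sys.argv[1])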

@bkj

bkj commented Jun 21, 2017

I was getting this error (I think) because:
a) I was using JPGs instead of PNGs
b) some greyscale images were mixed in with the RGB images

I fixed it by changing

image = tf.image.decode_png(file_contents)

to

# Support other formats; Force three channels
image = tf.image.decode_image(file_contents, channels=3) 

at https://github.com/davidsandberg/facenet/blob/master/src/train_softmax.py#L124.

Also, I commented out https://github.com/davidsandberg/facenet/blob/master/src/train_softmax.py#L135 (since I couldn't figure out what it was supposed to be doing).

@uzair789
Author

Thanks David and bkj. I was able to move past the iteration I was getting stuck at by following bkj's advice. I added the channels=3 parameter to the decode_image() call and it seems to work now. I'll still need to wait and see whether the whole training process runs without getting stuck at some other iteration.

@uzair789
Author

uzair789 commented Jul 3, 2017

Closing this, as the problem was solved by following bkj's advice.

@GHmaryam

I had a similar problem; I resolved it by changing the line:

filename_queue = tf.train.string_input_producer([tfrecords_filename], num_epochs=num_epochs)

to

filename_queue = tf.train.string_input_producer([tfrecords_filename])

In the TensorFlow documentation for tf.train.string_input_producer, it says:

num_epochs: .... If not specified, string_input_producer can cycle through the strings in string_tensor an unlimited number of times.

That fixed my issue, since I did not necessarily get this error in the first epoch; it appeared at random points in subsequent epochs!
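If you do want to keep num_epochs, note that string_input_producer then creates a local epoch counter, so the local variables have to be initialized too, and the queue is closed after that many epochs, which is exactly when the OutOfRangeError shows up. A minimal sketch of the initialization, assuming a TF 1.x session named sess:

    # With num_epochs set, the epoch counter is a *local* variable, so both
    # initializers must run before tf.train.start_queue_runners(sess).
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])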

@prats226

I was facing this error in another code sample. Setting num_epochs=2 doesn't throw this error. I haven't had time to debug the issue.

@aginpatrick

I got the same error with train_tripletloss. I'm already using the decode_image function with channels=3, all pictures are RGB, and string_input_producer is not used. Can anyone help?

@aginpatrick

For the record, I finally found my error. It was related to this damned hidden .DS_Store file that MacOS creates automatically. Removed it from my dataset directory and it works now.
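If anyone wants to check for this quickly, here is a small sketch (not from the repo) that lists any hidden files lurking in a dataset tree:

    # List hidden files (e.g. .DS_Store) anywhere under the dataset directory.
    import os
    import sys

    for root, dirs, files in os.walk(sys.argv[1]):
        for name in files:
            if name.startswith('.'):
                print(os.path.join(root, name))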

@maxisme

maxisme commented Apr 18, 2018

@aginpatrick how did you go about discovering that was the reason?

@aginpatrick

aginpatrick commented Apr 18, 2018

@maxisme I recreated a new directory by hand with a new dataset (a copy of half of the original, because I suspected something related to image format or image dimensions). It worked. I extended this dataset to include 3/4 of the original (it worked), and so on. With a copy of 100% of the old dataset, it still worked! Then I began to suspect something related to hidden files that I could have in my original directory/dataset. Bingo. It was .DS_Store. Dammit!

@maxisme

maxisme commented Apr 18, 2018

Haha. Couldn't this be solved by making get_dataset -> get_image_paths a bit tighter? I just tried:

image_paths = [os.path.join(facedir,img) for img in images if ".jpg" in img]

as a replacement for https://github.com/davidsandberg/facenet/blob/master/src/facenet.py#L336
and I'm still getting the error :(

@aginpatrick

Hmmm. I suggest trying what I did: run your code with a minimal dataset and augment it progressively.

@maxisme

maxisme commented Apr 18, 2018

Still throwing the error!

@maxisme

maxisme commented Apr 18, 2018

I have gone through the dataset looking for corrupt files (as @davidsandberg suggested) but I can't find any:

    import cv2

    # cv2.imread returns None for files it cannot decode
    for image in image_paths:
        if cv2.imread(image) is None:
            print(image)

@maxisme

maxisme commented Apr 19, 2018

Does this happen even when using the VGGFace dataset? I have noticed that the download code converts the images to 250px and saves them as PNGs, but the image is then resized to 160px at https://github.com/davidsandberg/facenet/blob/master/src/train_tripletloss.py#L118, and the file type is irrelevant here: https://github.com/davidsandberg/facenet/blob/master/src/train_tripletloss.py#L108. Please, can anyone else help? I have been trying to track down this bug forever!

@lechatthecat

I saw the same error, but it turned out that the path to the .record files and num_classes were wrong.

@yemenr

yemenr commented May 29, 2018

@maxisme Have you solved the problem? I ran into the same error...

@esterglez

In my case, the input images and the ground-truth images did not have the same dimensions (720 x 720 vs 360 x 360). (I'm working on the deeplab-resnet-master project, which does semantic segmentation.)

@shawkui

shawkui commented Sep 22, 2018

@maxisme, I also ran into this problem today and, just as @davidsandberg said, there were some corrupt images in my dataset, i.e., images that cannot be read for some unknown reason. I wrote a simple script to find those images. You can try it.
The script is not well organized, but you can write one based on the same idea. Hope it is useful.

import os
import shutil
import numpy as np
from matplotlib.image import imread

# os.remove(path)      # delete a single file
# os.removedirs(path)  # delete an empty folder

data_dir = 'Your data dir'
flds = os.listdir(data_dir)  # all class folders

for fld in flds:
    sub_flds = os.listdir(data_dir + '/' + fld)
    try:
        for i in sub_flds:
            i_path = data_dir + '/' + fld + '/' + i
            img = imread(i_path)  # raises an exception if the image cannot be read
            # print(np.shape(img))
    except:
        print(data_dir + '/' + fld)
        shutil.rmtree(data_dir + '/' + fld)  # delete the whole folder containing the bad image

@KrissLin

KrissLin commented Oct 1, 2018

@maxisme, I also ran into this problem today and, just as @davidsandberg said, there were some corrupt images in my dataset, i.e., images that cannot be read for some unknown reason. I wrote a simple script to find those images. You can try it.
The script is not well organized, but you can write one based on the same idea. Hope it is useful.

This actually solved my problem! Thanks a lot for sharing.
A better presentation for those who need this code:

import argparse
import os
import shutil
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread

#os.remove(path) #Delete file
#os.removedirs(path) #Delete empty folder


def find_corrupt(folder_path):
    data_dir = folder_path
    flds = os.listdir(data_dir)

    for fld in flds:
        sub_flds = os.listdir(data_dir + '/' + fld)
        try:
            for i in sub_flds:
                i_path = data_dir + '/' + fld + '/' + i
                img = imread(i_path)
                #print(np.shape(img))
        except:
            print(data_dir + '/' + fld)
            shutil.rmtree(data_dir + '/' + fld)  #Delete folders


if __name__ == "__main__":
    PARSER = argparse.ArgumentParser(description="____")
    PARSER.add_argument('-f', '--folder_path')
    ARGS = PARSER.parse_args()
    find_corrupt(str(ARGS.folder_path))
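For example, assuming the script above is saved as find_corrupt.py, you would run:

    python find_corrupt.py -f /path/to/your/dataset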

@jyhjana

jyhjana commented Dec 28, 2018

From the terminal, run: find yourdatasets/ -size -1 (this lists empty, zero-byte files in the dataset).

@danielkaifeng

For the record, I finally found my error. It was related to this damned hidden .DS_Store file that MacOS creates automatically. Removed it from my dataset directory and it works now.

I checked my dataset dir and surprisingly found a hidden .DS_Store!
I wondered for a while why my Ubuntu server would have this .DS_Store, until I realized the dataset had been uploaded from my local Mac!

Thanks a lot; removing it with rm .DS_Store solved the problem.
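For a whole dataset tree, something like this should remove them all in one go:

    find /path/to/dataset -name '.DS_Store' -delete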

@KanchanIIT

For the record, I finally found my error. It was related to this damned hidden .DS_Store file that MacOS creates automatically. Removed it from my dataset directory and it works now.

Yes, this is the solution that worked for me as well. Thanks @aginpatrick

@soveidadelgarmi

Hi, I have the same issue while running the open_pose training code on my own dataset, but my dataset consists of .mat files holding depth images (grayscale). I load them with the scipy.io module in Python and repeat the single channel on two other channels to get 3 channels, but I still receive this error during training.
Can someone help me? A sketch of my loading step is below for reference.
Thank you
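This is roughly what the loading does (the 'depth' key and file name are just placeholders for whatever your .mat files actually use; you can check the keys with scipy.io.whosmat):

    import numpy as np
    import scipy.io as sio

    mat = sio.loadmat('frame_0001.mat')       # hypothetical file name
    depth = mat['depth'].astype(np.float32)   # 'depth' is an assumed key name
    assert depth.ndim == 2                    # expecting a single-channel depth map
    img3 = np.repeat(depth[:, :, np.newaxis], 3, axis=2)  # HxW -> HxWx3
    print(img3.shape)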

@white-black333

    parser.add_argument('--epoch_size', type=int,
        help='Number of batches per epoch.', default=1000)

This is the relevant code in train_tripletloss.py of the facenet project.
