OutOfRangeError (see above for traceback): FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 90, current size 0) [[Node: batch_join = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_join/fifo_queue, _recv_batch_size_0)]] #338
Comments
Hi @uzair789,
I was getting this error (I think) for the same reason. I fixed it by changing the image-decoding line at https://github.com/davidsandberg/facenet/blob/master/src/train_softmax.py#L124. I also commented out https://github.com/davidsandberg/facenet/blob/master/src/train_softmax.py#L135, since I couldn't figure out what it was supposed to be doing.
Thanks David and bkj. I was able to move past the iteration I was getting stuck at by following bkj's advice: I added the channels=3 parameter to the decode_image() function and it seems to work now. I will still need to wait and see whether the whole training process runs without getting stuck at some other iteration.
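The channels=3 fix described above can be sketched roughly like this (a minimal sketch, not the exact train_softmax.py pipeline; the helper name `load_rgb_image` is mine, and I'm using the `tf.io.read_file` / `tf.image.decode_image` names, which also work in newer TensorFlow versions):

```python
import tensorflow as tf

def load_rgb_image(path):
    # Read the raw bytes of the image file.
    file_contents = tf.io.read_file(path)
    # channels=3 forces RGB output even for grayscale or RGBA files,
    # so every element fed to the batching queue has a consistent
    # [height, width, 3] shape instead of starving the queue.
    image = tf.image.decode_image(file_contents, channels=3)
    return image
```

Without channels=3, a single grayscale image in the dataset yields a 1-channel tensor and the batch join fails, which surfaces as the FIFOQueue "insufficient elements" error above.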
Closing this, as the problem was solved by following bkj's advice.
I had a similar problem; I resolved it by changing the line that creates the input queue with tf.train.string_input_producer. There is a note about this behavior in the TensorFlow documentation for tf.train.string_input_producer. That fixed my issue: I did not necessarily hit this error in the first epoch, but very randomly on subsequent epochs!
I was facing this error in another code sample; with num_epochs=2 it doesn't throw this error. I did not get time to debug the issue.
I got the same error with train_tripletloss. I'm already using the decode_image function with channels=3, all pictures are RGB, and the string_input_producer function is not used. Can anyone help?
For the record, I finally found my error. It was related to the damned hidden .DS_Store file that macOS creates automatically. I removed it from my dataset directory and it works now.
@aginpatrick how did you go about discovering that was the reason? |
@maxisme I recreated a new directory by hand with a new dataset (a copy of half of the original, because I suspected something related to image format or image dimensions). It worked. I extended this dataset to 3/4 of the original (it worked), and so on. With a copy of 100% of the old dataset it still worked! Then I began to suspect hidden files in my original dataset directory. Bingo: it was .DS_Store. Dammit!
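A faster way to check for this than rebuilding the dataset by hand is to scan for hidden files directly (a small stdlib-only sketch; `find_hidden_files` is my own helper name, and the one-folder-per-identity layout is assumed):

```python
import os

def find_hidden_files(data_dir):
    """Return paths of hidden files (such as .DS_Store) anywhere under data_dir."""
    hidden = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            # On macOS/Linux, hidden files start with a dot.
            if name.startswith("."):
                hidden.append(os.path.join(root, name))
    return hidden

# Example: list (and optionally delete) the offenders.
# for path in find_hidden_files("datasets/my_faces"):
#     print("hidden file:", path)
#     os.remove(path)
```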
Haha. Couldn't this be solved by making the image-path listing skip hidden files, in replacement for https://github.com/davidsandberg/facenet/blob/master/src/facenet.py#L336?
Hmmm. I suggest trying what I did: run your code with a minimal dataset and augment it progressively.
Still throwing the error! |
I have evaluated the dataset to look for corrupt files (as @davidsandberg suggested) but I can't find any.
Even using the VGGFace dataset it does this? I have noticed that the download code converts images to 250px and saves them as PNG, but the image is then resized to 160px at https://github.com/davidsandberg/facenet/blob/master/src/train_tripletloss.py#L118, and the file type is irrelevant here: https://github.com/davidsandberg/facenet/blob/master/src/train_tripletloss.py#L108. Please, can anyone else help? I have been attempting to find this bug forever!
I saw the same error, but found out that the path for the .record files and num_classes were wrong.
@maxisme Have you solved the problem? I ran into the same error...
In my case, the input images and the ground-truth images did not have the same dimensions (720 x 720 vs 360 x 360). I was working on the deeplab-resnet-master project, which is based on semantic segmentation.
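This kind of mismatch can be caught before training with a quick scan (a sketch assuming Pillow is installed and that images and ground-truth masks share file names in two directories; `find_size_mismatches` is my own helper name):

```python
import os
from PIL import Image  # assumes Pillow is available for reading image headers

def find_size_mismatches(image_dir, label_dir):
    """Report files whose input image and ground-truth mask differ in size."""
    mismatched = []
    for name in sorted(os.listdir(image_dir)):
        img_path = os.path.join(image_dir, name)
        lbl_path = os.path.join(label_dir, name)
        if not os.path.isfile(lbl_path):
            continue  # no matching ground-truth file; skip
        with Image.open(img_path) as img, Image.open(lbl_path) as lbl:
            if img.size != lbl.size:
                mismatched.append((name, img.size, lbl.size))
    return mismatched
```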
@maxisme, I also met this problem today. Just as @davidsandberg said, there were some corrupt images in my dataset, i.e., images which could not be read for some unknown reason, and I wrote a simple script to find those images. You can try it:

```python
import os
# Any image reader works here; Pillow is one option.
from PIL import Image

data_dir = 'Your data dir'
# All folders' paths
flds = [os.path.join(data_dir, fld) for fld in os.listdir(data_dir)]
for fld in flds:
    for name in os.listdir(fld):
        path = os.path.join(fld, name)
        try:
            Image.open(path).load()  # try to fully read the image
        except Exception:
            print('cannot read:', path)
```
This actually solved my problem! Thanks a lot for sharing.
|
The terminal input: `find yourdatasets/ -size -1` (this lists files smaller than one block, i.e., effectively empty files).
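The same check can be done in Python if you are not on a system with `find` (stdlib-only sketch; `find_empty_files` is my own helper name):

```python
import os

def find_empty_files(data_dir):
    """Return zero-byte files under data_dir, like `find data_dir -size -1`."""
    empty = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            # A zero-byte file can never be decoded and will starve the queue.
            if os.path.getsize(path) == 0:
                empty.append(path)
    return empty
```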
I checked my dataset dir and surprisingly found a hidden .DS_Store! Thanks a lot; by running rm .DS_Store I solved this problem.
Yes, this is the solution I found for this problem. Thanks @aginpatrick.
Hi, I have the same issue while running the open_pose training code on my own dataset, but my dataset consists of .mat files containing depth images (grayscale). I load them with the scipy.io module in Python and repeat the single channel across two other channels to get 3 channels.
|
Hi,
I am trying to train a facenet model on my own dataset. My dataset consists of images which were obtained by using a face detector developed at our lab at CMU. There is no problem with the generated crops. I have used the same dataset for training different models in Caffe.
When I change the data_dir path to my own dataset, the training starts but aborts at the third iteration of the first epoch. This is the run command that I use:
I have looked at other solutions where people suggest reducing the --epoch_size value, but I see in the code that the function in question does not depend on num_epochs, so this is no longer a valid solution. Also, I am using JPEG images in my dataset and I have already changed the decoding line accordingly.
I have pasted the exact error message with the stack trace below. I'd really appreciate any help that I can get; I really need to move past this error so that I can train on the different datasets available at my lab.