
FileNotFoundError: [Errno 2] No such file or directory: './cube/' #how to train the custom dataset? #97

Closed
avinashsen707 opened this issue Jan 31, 2020 · 14 comments


@avinashsen707

binpick@ncrai-Precision-7820-Tower:~/catkin_ws/src/dope/scripts$ python3 train.py --data ./cube/ --outf cube_1214  --gpuids 0 1 --epochs 120 --loginterval 1 --batchsize 32
start: 00:11:52.635052
load data
Traceback (most recent call last):
  File "train.py", line 1255, in <module>
    transforms.Scale(opt.imagesize//8),
  File "train.py", line 416, in __init__
    self.imgs = load_data(root)
  File "train.py", line 411, in load_data
    for name in os.listdir(str(path)):
FileNotFoundError: [Errno 2] No such file or directory: './cube/'


@TontonTremblay
Collaborator

Can you give it the absolute path, without the "."? Something like /home/jtremblay/data/cube/
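A quick way to fail early with a clearer message is to resolve the --data argument to an absolute path and check that it exists before the loader runs. A minimal sketch (the helper name is mine, not from train.py):

```python
import os
import sys

def resolve_data_root(path):
    """Expand and absolutize a dataset path, failing early with a clear message.

    A relative path like ./cube/ is resolved against the directory the
    script was launched from, which is a common source of this error.
    """
    root = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(root):
        sys.exit("Dataset directory not found: %s (cwd is %s)" % (root, os.getcwd()))
    return root
```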

@avinashsen707
Author

avinashsen707 commented Feb 1, 2020

Can you give it the absolute path, without the "."? Something like /home/jtremblay/data/cube/

/media/binpick/DATA/binpick_extended_user/Training$ python3 train.py --data ./fat/single/011_banana_16k/kitchen_0/ --outf banana --gpuids 0 1 --epochs 120 --loginterval 1 --batchsize 32
start: 11:08:10.511437
load data
training data: 7 batches
load models
Training network pretrained on imagenet.
Traceback (most recent call last):
  File "train.py", line 1306, in <module>
    net = torch.nn.DataParallel(net,device_ids=opt.gpuids).cuda()
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 102, in __init__
    _check_balance(self.device_ids)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 17, in _check_balance
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 17, in <listcomp>
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/cuda/__init__.py", line 292, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id
 python3
Python 3.5.2 (default, Oct  8 2019, 13:06:37) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> torch.cuda.device_count()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'torch' is not defined
>>> import torch
>>> torch.cuda.device_count()
1
>>> exit()
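The "Invalid device id" above comes from passing --gpuids 0 1 while torch.cuda.device_count() is 1. A hedged sketch of a guard (the helper name is mine, not from train.py; in practice device_count would come from torch.cuda.device_count()):

```python
def valid_gpu_ids(requested, device_count):
    """Keep only the requested GPU ids that actually exist on this machine.

    torch.nn.DataParallel raises "Invalid device id" when device_ids
    contains an id >= torch.cuda.device_count().
    """
    kept = [i for i in requested if 0 <= i < device_count]
    if not kept:
        raise ValueError("none of the requested GPU ids %r exist" % (requested,))
    return kept
```

With one GPU, --gpuids 0 1 would be clamped to [0].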

python3 train.py --data ./fat/single/011_banana_16k/kitchen_0/ --outf banana --gpuids 0 1 2 3 --epochs 120 --loginterval 1 --batchsize 32
Traceback (most recent call last):
  File "train.py", line 61, in <module>
    import torchvision.transforms as transforms
  File "/home/binpick/.local/lib/python3.5/site-packages/torchvision/__init__.py", line 2, in <module>
    from torchvision import datasets
  File "/home/binpick/.local/lib/python3.5/site-packages/torchvision/datasets/__init__.py", line 9, in <module>
    from .fakedata import FakeData
  File "/home/binpick/.local/lib/python3.5/site-packages/torchvision/datasets/fakedata.py", line 3, in <module>
    from .. import transforms
  File "/home/binpick/.local/lib/python3.5/site-packages/torchvision/transforms/__init__.py", line 1, in <module>
    from .transforms import *
  File "/home/binpick/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 16, in <module>
    from . import functional as F
  File "/home/binpick/.local/lib/python3.5/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION'
python3 train.py --data ./fat/single/011_banana_16k/kitchen_0/ --outf banana --gpuids 0 --epochs 120 --loginterval 1 --batchsize 32
start: 12:46:42.727720
load data
training data: 7 batches
load models
Training network pretrained on imagenet.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 1392, in <module>
    _runnetwork(epoch,trainingdata)
  File "train.py", line 1334, in _runnetwork
    output_belief, output_affinities = net(data)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 112, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "train.py", line 153, in forward
    out1 = self.vgg(x)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

Thank you sir. I looked at the path and corrected it, but it then ended up in the errors above!
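A side note on the PILLOW_VERSION ImportError in the middle log: that constant was removed in Pillow 7.0, and torchvision releases before 0.5 still import it, so the usual fix is to pin Pillow (pip install "pillow<7") or upgrade torchvision. A small sketch of the version check:

```python
def pillow_has_version_constant(pillow_version):
    """Return True if this Pillow version still exposes PILLOW_VERSION.

    The constant was removed in Pillow 7.0.0; torchvision < 0.5 imports it
    and fails with "ImportError: cannot import name 'PILLOW_VERSION'".
    """
    major = int(pillow_version.split(".")[0])
    return major < 7
```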

@TontonTremblay
Collaborator

The batchsize is too large. Try --batchsize 1.

@avinashsen707
Author

The batchsize is too large. Try --batchsize 1.

Thank you sir, it worked.
I am training now with the FAT dataset for each object. But my confusion is that there are almost 20 backgrounds for each object, so I would need to train on each folder to get weights. How can I select the final weight to place in the DOPE code, since I am getting a lot of weights (one per epoch) in each folder?

Could you also suggest the optimum number of epochs and log interval I need to give to get good results? I am using an NVIDIA Quadro P4000 GPU with 8 GB in a Dell Precision 7820 workstation.

Thank you in advance for your patient replies to beginners like me.

@TontonTremblay
Collaborator

With 8 GB you should be able to run a batch size of 8; the idea is to increase it until you have filled your GPU memory. You can check with nvidia-smi while it is running to see how much you are using.

You can give /fat/ to the training script with --objectofinterest banana; it will only use the banana information.

I hope this helps.

@avinashsen707
Author

avinashsen707 commented Feb 7, 2020

As per your suggestion, I ran it like this:

binpick@ncrai-Precision-7820-Tower:/media/binpick/DATA/binpick_userfiles/Training$ python3 train.py --data ./pvc_tee_800/ --outf tee_1214  --gpuids 0 --epochs 10 --loginterval 1 --batchsize 8
start: 09:55:13.370283
load data
training data: 100 batches
load models
Training network pretrained on imagenet.
Traceback (most recent call last):
  File "train.py", line 1392, in <module>
    _runnetwork(epoch,trainingdata)
  File "train.py", line 1330, in _runnetwork
    for batch_idx, targets in enumerate(loader):
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/binpick/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "train.py", line 491, in __getitem__
    cuboid = np.array(data['exported_objects'][0]['cuboid_dimensions'])
IndexError: list index out of range

I am attaching some of my dataset samples.
[attached: sample image 000467, depth image 000468, and a screenshot]

@avinashsen707
Author

Problem solved; the JSON files were missing the data the script needed to load!
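For anyone hitting the same IndexError: it means the 'exported_objects' list was missing or empty in an annotation settings file, so data['exported_objects'][0] fails. A hedged pre-flight check (the helper name is mine; it assumes the NDDS-style _object_settings.json filename):

```python
import json
import os

def find_bad_object_settings(root):
    """Return _object_settings.json files whose 'exported_objects' list
    is missing or empty. train.py indexes data['exported_objects'][0],
    so such files raise IndexError during training.
    """
    bad = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            if name == "_object_settings.json":
                path = os.path.join(dirpath, name)
                with open(path) as f:
                    data = json.load(f)
                if not data.get("exported_objects"):
                    bad.append(path)
    return bad
```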

@tuhinmallick

Traceback (most recent call last):
  File "train.py", line 1312, in <module>
    net = torch.nn.DataParallel(net,device_ids=opt.gpuids).cuda()
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 142, in __init__
    _check_balance(self.device_ids)
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 23, in _check_balance
    dev_props = _get_devices_properties(device_ids)
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\_utils.py", line 459, in _get_devices_properties
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\_utils.py", line 459, in <listcomp>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\_utils.py", line 442, in _get_device_attr
    return get_member(torch.cuda)
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\_utils.py", line 459, in <lambda>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\cuda\__init__.py", line 309, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

How did you get rid of this error ?

@TontonTremblay
Collaborator

how many GPUs do you have? What is the value of opt.gpuids?

@tuhinmallick

I have only one GPU (a Quadro RTX 6000), and I fixed the device id error by specifying the gpuid as only 0. Thanks for the help.
But now I am getting the following error:

Training network pretrained on imagenet.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\intraflyQuadro\anaconda3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\intraflyQuadro\anaconda3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\intraflyQuadro\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\intraflyQuadro\Desktop\Deep_Object_Pose\scripts\train.py", line 1398, in <module>
    _runnetwork(epoch,trainingdata)
  File "C:\Users\intraflyQuadro\Desktop\Deep_Object_Pose\scripts\train.py", line 1336, in _runnetwork
    for batch_idx, targets in enumerate(loader):
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\intraflyQuadro\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 914, in __init__
    w.start()
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\intraflyQuadro\anaconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
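This is the standard Windows multiprocessing pitfall: with the spawn start method, worker processes re-import the main module, so any script that starts workers (a DataLoader with num_workers > 0 does) must keep its top-level work behind the guard the error message describes. A minimal sketch of the idiom (the train function here is a stand-in for train.py's actual training loop):

```python
import multiprocessing

def train():
    # Stand-in for the real work in train.py (building the DataLoader,
    # calling _runnetwork, etc.).
    return "training started"

if __name__ == "__main__":
    # Without this guard, each spawned worker re-executes the module body
    # on Windows and tries to start workers of its own, raising the
    # RuntimeError shown above.
    multiprocessing.freeze_support()  # only matters for frozen executables
    print(train())
```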

Can you please help me out?

@TontonTremblay
Collaborator

I have never seen this error, and I have never tested a Quadro with this code. Can you try changing the number of workers on the dataloader to 1?

@tuhinmallick

tuhinmallick commented Jul 30, 2021

The problem did get sorted out by setting the number of workers on the dataloader to 0.

But a new error cropped up:

[screenshot of the error attached]

I am training on 150k images. Is it because of the large dataset? I didn't get the error when training on 10k images.

[second screenshot attached]

@TontonTremblay
Collaborator

TontonTremblay commented Jul 30, 2021 via email

@tuhinmallick

No, I am using NDDS
