
sample netadapt pruner fails #26

Closed
liamsun2019 opened this issue Jan 12, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@liamsun2019

I tried examples/pruner/netadapt/netadapt_prune.py and got the following errors:

INFO (tinynn.prune.netadapt_pruner) Global Target/Initial FLOPS: 437178624/582904832
INFO (tinynn.prune.netadapt_pruner) Start iteration 1
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'OneShotChannelPruner.__init__.<locals>.<lambda>'
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'OneShotChannelPruner.__init__.<locals>.<lambda>'
INFO (tinynn.prune.netadapt_pruner) Init pool process with cuda id 0

The process then blocks after displaying the messages above.
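[Editor's note] The AttributeError above is Python's standard complaint when `multiprocessing` tries to send a function defined inside another function or method (here, apparently a lambda created in `OneShotChannelPruner.__init__`) to a worker process: the default pickler can only serialize callables it can look up by a module-level qualified name. A minimal reproduction, with an illustrative class name (not TinyNN's actual code):

```python
import pickle


def module_level(x):
    return x * 2


class Pruner:
    def __init__(self):
        # A lambda defined inside a method is a "local object": its
        # qualified name contains '<locals>', so the default pickler
        # cannot resolve it by import at unpickling time.
        self.fn = lambda x: x * 2


# Module-level functions pickle fine (serialized by qualified name).
pickle.dumps(module_level)

# Instance attributes holding local lambdas do not.
try:
    pickle.dumps(Pruner().fn)
except AttributeError as err:
    message = str(err)
    print(message)
```

The usual fix, which the commit below presumably applies, is to replace the local lambda with a module-level function or a picklable callable object.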

@dinghuanghao dinghuanghao added the bug Something isn't working label Jan 12, 2022
@peterjc123
Collaborator

@liamsun2019 Fixed. Would you please try again?

@liamsun2019
Author

I tried the latest version:

python3.6 netadapt_prune.py --batch-size 16 --distributed False

INFO (tinynn.prune.netadapt_pruner) Global Target/Initial FLOPS: 437178624/582904832
INFO (tinynn.prune.netadapt_pruner) Start iteration 1
INFO (tinynn.prune.netadapt_pruner) Init pool process with cuda id 0
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 322, in reduce_storage
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in init
OSError: [Errno 24] Too many open files

Is there any config related to file-open limitations?

@peterjc123
Collaborator

Please use fewer workers in your DataLoaders.

@liamsun2019
Author

I tried that, but it still fails even with the worker count set to 1.

@liamsun2019
Author

I checked the number of open files: over 670 fd dup operations occur, with no close operations, before the error emerges.
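[Editor's note] On Linux, one quick way to observe the descriptor count the way the commenter describes is to count the entries under /proc/self/fd; each entry is one open descriptor of the current process. A small sketch (Linux-specific, illustrative only):

```python
import os


def open_fd_count():
    # Each entry in /proc/self/fd is one open file descriptor of
    # this process (Linux-specific pseudo-filesystem).
    return len(os.listdir("/proc/self/fd"))


before = open_fd_count()
files = [open(os.devnull) for _ in range(10)]  # hold 10 fds open
after = open_fd_count()
print(after - before)  # roughly 10

for f in files:
    f.close()
```

Shared CUDA tensors passed through torch.multiprocessing are dup'ed as file descriptors in the same way, which is why the count climbs with every queue transfer.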

@peterjc123
Collaborator

@liamsun2019 Could you please raise the limit? We usually run the experiment on a server, so we don't encounter this kind of problem.

@liamsun2019
Author

I understand the environment difference. Pruning works fine after I run ulimit -n 2048.
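[Editor's note] The limit being raised here is the per-process open-file-descriptor limit (RLIMIT_NOFILE), which ulimit -n reports and sets in the shell. The same check and bump can be done from Python with the stdlib resource module (Unix only); a sketch, using 2048 as the target from the comment above:

```python
import resource

# Query the current soft/hard limits on open file descriptors
# (the same limit that `ulimit -n` reports in the shell).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft}, hard={hard}")

# Raise the soft limit toward 2048. An unprivileged process may
# raise its soft limit only up to the hard limit.
if soft < 2048:
    ceiling = hard if hard != resource.RLIM_INFINITY else 2048
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(2048, ceiling), hard))
```

Doing this at the top of the training script avoids depending on each shell session having run ulimit -n first.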

@peterjc123
Collaborator

OK, glad it worked on your side. I'll close this for now.
