
sample netadapt pruner fails #26

Closed
liamsun2019 opened this issue Jan 12, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@liamsun2019

I tried examples/pruner/netadapt/netadapt_prune.py and got the following errors:

INFO (tinynn.prune.netadapt_pruner) Global Target/Initial FLOPS: 437178624/582904832
INFO (tinynn.prune.netadapt_pruner) Start iteration 1
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'OneShotChannelPruner.__init__.<locals>.<lambda>'
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'OneShotChannelPruner.__init__.<locals>.<lambda>'
INFO (tinynn.prune.netadapt_pruner) Init pool process with cuda id 0

The process then blocks after displaying the messages above.
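[Editor's note] The AttributeError above is Python's standard complaint when `multiprocessing` tries to send a function defined inside another function or method (here, apparently a lambda created in `OneShotChannelPruner.__init__`) to a worker process: the default pickler can only serialize callables it can look up by a module-level qualified name. A minimal reproduction, with an illustrative class name (not TinyNN's actual code):

```python
import pickle


def module_level(x):
    return x * 2


class Pruner:
    def __init__(self):
        # A lambda defined inside a method is a "local object": its
        # qualified name contains '<locals>', so the default pickler
        # cannot resolve it by import at unpickling time.
        self.fn = lambda x: x * 2


# Module-level functions pickle fine (serialized by qualified name).
pickle.dumps(module_level)

# Instance attributes holding local lambdas do not.
try:
    pickle.dumps(Pruner().fn)
except AttributeError as err:
    message = str(err)
    print(message)
```

The usual fix, which the commit below presumably applies, is to replace the local lambda with a module-level function or a picklable callable object.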

@dinghuanghao dinghuanghao added the bug Something isn't working label Jan 12, 2022
@peterjc123
Collaborator

@liamsun2019 Fixed. Would you please try again?

@liamsun2019
Author

I tried the latest version:

python3.6 netadapt_prune.py --batch-size 16 --distributed False

INFO (tinynn.prune.netadapt_pruner) Global Target/Initial FLOPS: 437178624/582904832
INFO (tinynn.prune.netadapt_pruner) Start iteration 1
INFO (tinynn.prune.netadapt_pruner) Init pool process with cuda id 0
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 322, in reduce_storage
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in init
OSError: [Errno 24] Too many open files

Is there any config related to file-open limitations?

@peterjc123
Collaborator

Please use fewer workers in your DataLoaders.

@liamsun2019
Author

I tried that, but it still fails even with the worker count set to 1.

@liamsun2019
Author

I checked the number of open files: over 670 fd dup operations occur, with no close operations, before the error emerges.
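[Editor's note] On Linux, one quick way to observe the descriptor count the way the commenter describes is to count the entries under /proc/self/fd; each entry is one open descriptor of the current process. A small sketch (Linux-specific, illustrative only):

```python
import os


def open_fd_count():
    # Each entry in /proc/self/fd is one open file descriptor of
    # this process (Linux-specific pseudo-filesystem).
    return len(os.listdir("/proc/self/fd"))


before = open_fd_count()
files = [open(os.devnull) for _ in range(10)]  # hold 10 fds open
after = open_fd_count()
print(after - before)  # roughly 10

for f in files:
    f.close()
```

Shared CUDA tensors passed through torch.multiprocessing are dup'ed as file descriptors in the same way, which is why the count climbs with every queue transfer.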

@peterjc123
Collaborator

@liamsun2019 Could you please raise the limit? We usually run the experiment on a server, so we don't encounter this kind of problem.

@liamsun2019
Author

I understand the environment difference. Pruning works fine after I run ulimit -n 2048.
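[Editor's note] The limit being raised here is the per-process open-file-descriptor limit (RLIMIT_NOFILE), which ulimit -n reports and sets in the shell. The same check and bump can be done from Python with the stdlib resource module (Unix only); a sketch, using 2048 as the target from the comment above:

```python
import resource

# Query the current soft/hard limits on open file descriptors
# (the same limit that `ulimit -n` reports in the shell).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft}, hard={hard}")

# Raise the soft limit toward 2048. An unprivileged process may
# raise its soft limit only up to the hard limit.
if soft < 2048:
    ceiling = hard if hard != resource.RLIM_INFINITY else 2048
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(2048, ceiling), hard))
```

Doing this at the top of the training script avoids depending on each shell session having run ulimit -n first.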

@peterjc123
Collaborator

OK, glad it worked on your side. I'll close this for now.
