TQDM Error with multi GPU Transducer #192

Open
bonham79 opened this issue Jun 8, 2024 · 1 comment · May be fixed by #197
Assignees
Labels
bug Something isn't working

Comments

@bonham79
Collaborator

bonham79 commented Jun 8, 2024

I hit the following error when running multi-GPU training with the edit-action transducer:

Traceback (most recent call last):
  File "/home/salamander/anaconda3/envs/sigmorphon2024/bin/yoyodyne-train", line 8, in <module>
    sys.exit(main())
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 390, in main
    model = get_model_from_argparse_args(args, datamodule)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 214, in get_model_from_argparse_args
    return model_cls(
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/transducer.py", line 43, in __init__
    super().__init__(*args, **kwargs)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/lstm.py", line 36, in __init__
    super().__init__(*args, **kwargs)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/base.py", line 155, in __init__
    self.save_hyperparameters(
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/core/mixins/hparams_mixin.py", line 110, in save_hyperparameters
    save_hyperparameters(self, *args, ignore=ignore, frame=frame)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/utilities/parsing.py", line 275, in save_hyperparameters
    obj._hparams_initial = copy.deepcopy(obj._hparams)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 297, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_io.TextIOWrapper' object
Exception ignored in: <function tqdm.__del__ at 0x7f96d86a6290>
Traceback (most recent call last):
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
    self.close()
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1267, in close
    if self.disable:
AttributeError: 'tqdm' object has no attribute 'disable'

From what I gather, the TQDM object held by the expert module can't be pickled for distribution across multiple GPUs (the traceback shows the failure inside the `copy.deepcopy` call made by `save_hyperparameters`). This is fixed by adding `expert` to the `ignore` argument when saving hyperparameters, but I wanted to get feedback on whether there is a less 'hacky' way to deal with it.

@kylebgorman thoughts?
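
For reference, here is a minimal standard-library sketch of the failure and the workaround described above. It does not use yoyodyne or Lightning: `FakeExpert` and `snapshot_hparams` are hypothetical stand-ins, and the filtering only imitates what the `ignore` argument of `save_hyperparameters` is assumed to achieve (keeping the unpicklable object out of the dict that gets deep-copied).

```python
import copy
import os


class FakeExpert:
    """Hypothetical stand-in for an object that keeps an open text stream."""

    def __init__(self):
        # A progress bar writes to a text stream; os.devnull is used here
        # so the sketch has no visible side effects.
        self.stream = open(os.devnull, "w")


def snapshot_hparams(hparams, ignore=()):
    """Deep-copies hparams, skipping any keys listed in ignore.

    Hypothetical helper; it only imitates the effect of an ignore list.
    """
    kept = {k: v for k, v in hparams.items() if k not in ignore}
    return copy.deepcopy(kept)


hparams = {"hidden_size": 512, "expert": FakeExpert()}

# Without the ignore list, deepcopy reaches the open stream and fails.
try:
    snapshot_hparams(hparams)
    failed = False
except TypeError:
    failed = True

# With "expert" ignored, the remaining hyperparameters copy cleanly.
safe = snapshot_hparams(hparams, ignore=("expert",))

hparams["expert"].stream.close()
```

In the actual model the equivalent change would presumably be a call along the lines of `self.save_hyperparameters(ignore=["expert"])`; the exact list of ignored names in `base.py` is not shown in this thread.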

@bonham79 bonham79 self-assigned this Jun 8, 2024
@kylebgorman
Contributor

When something doesn't pickle, you can usually just give it the necessary methods, but I don't want to hack into TQDM, so I think the hacky solution is fine.
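
As an illustration of the "give it the necessary methods" route, a minimal sketch on a class we control (not on TQDM itself). `StreamHolder` is a hypothetical name; the idea is to drop the stream in `__getstate__` and reopen one in `__setstate__`, which are the same hooks that both `copy.deepcopy` and `pickle` consult.

```python
import copy
import os


class StreamHolder:
    """Hypothetical class that owns an unpicklable text stream."""

    def __init__(self, label):
        self.label = label
        self.stream = open(os.devnull, "w")

    def __getstate__(self):
        # Copy the instance dict and drop the stream, which cannot be pickled.
        state = self.__dict__.copy()
        del state["stream"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then open a fresh stream.
        self.__dict__.update(state)
        self.stream = open(os.devnull, "w")


original = StreamHolder("expert")
# deepcopy now succeeds because the stream never enters the copied state.
clone = copy.deepcopy(original)
```

Doing the same for a third-party class would mean subclassing or patching it, which is the kind of hack the comment above prefers to avoid.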

@kylebgorman kylebgorman added the bug label Jun 9, 2024
@bonham79 bonham79 linked a pull request Jun 10, 2024 that will close this issue