
AttributeError: module 'torch.distributed' has no attribute '_reduce_scatter_base' #52

Closed
pipiwawa opened this issue Nov 17, 2022 · 2 comments

Comments

@pipiwawa

The following error occurs when running TSR_train.py:
File "TSR_train.py", line 7, in
from src.TSR_trainer import TrainerConfig, TrainerForContinuousEdgeLine, TrainerForEdgeLineFinetune
File "D:\AIworkspace\ZITS_inpainting-main\src\TSR_trainer.py", line 14, in
from apex import amp
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex_init_.py", line 27, in
from . import transformer
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer_init_.py", line 4, in
from apex.transformer import pipeline_parallel
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer\pipeline_parallel_init_.py", line 1, in
from apex.transformer.pipeline_parallel.schedules import get_forward_backward_func
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer\pipeline_parallel\schedules_init_.py", line 3, in
from apex.transformer.pipeline_parallel.schedules.fwd_bwd_no_pipelining import (
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer\pipeline_parallel\schedules\fwd_bwd_no_pipelining.py", line 10, in
from apex.transformer.pipeline_parallel.schedules.common import Batch
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer\pipeline_parallel\schedules\common.py", line 14, in
from apex.transformer.tensor_parallel.layers import (
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer\tensor_parallel_init_.py", line 21, in
from apex.transformer.tensor_parallel.layers import (
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer\tensor_parallel\layers.py", line 32, in
from apex.transformer.tensor_parallel.mappings import (
File "D:\Users\lcx\anaconda3\envs\train_env\lib\site-packages\apex\transformer\tensor_parallel\mappings.py", line 29, in
torch.distributed.reduce_scatter_tensor = torch.distributed._reduce_scatter_base
AttributeError: module 'torch.distributed' has no attribute '_reduce_scatter_base'
My environment is torch=1.9.0+cu111 with CUDA 11.1. How can this be resolved?
Thanks.
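
For context, the failing line in apex's mappings.py aliases the public reduce_scatter_tensor name to the private torch.distributed._reduce_scatter_base, which torch 1.9.0 does not provide. Below is a minimal sketch of a compatibility guard, assuming it replaces that unconditional alias; it is an illustrative workaround, not the maintainers' fix, and it only avoids the import-time AttributeError (later apex code may still require a newer torch):

```python
# Illustrative guard (assumption: placed where apex/transformer/tensor_parallel/mappings.py
# does the unconditional alias). Only creates the alias when the private collective
# actually exists in the installed torch build.
import torch.distributed as dist

if not hasattr(dist, "reduce_scatter_tensor") and hasattr(dist, "_reduce_scatter_base"):
    dist.reduce_scatter_tensor = dist._reduce_scatter_base
```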

@DQiaole
Owner

DQiaole commented Nov 18, 2022

Hi,
I have not run into this problem. You may need to reinstall the environment following README.md.
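
Before reinstalling, a quick check (a sketch using only standard torch APIs, run inside the same environment) can confirm whether the installed torch build exposes the symbol apex expects:

```python
# Quick environment check: prints the torch version and whether the private
# collective that apex's shim relies on is present in this build.
import torch
import torch.distributed as dist

print(torch.__version__)                      # e.g. "1.9.0+cu111"
print(hasattr(dist, "_reduce_scatter_base"))  # False here means apex's shim fails at import time
```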

@DQiaole DQiaole closed this as completed Dec 12, 2022
@Littlechickencub


I have the same problem, and I also set up the environment as instructed. Did you manage to solve it?
