[BUGFIX] Cnn_dailymail and xnli raise error when downloading in multi-gpus mode #1587

Merged: 7 commits, Jan 26, 2022

@gongel (Member) commented Jan 13, 2022

PR types

Bug fixes

PR changes

Others

Description

  • Fix: counting file_num while downloading raises an error in multi-GPU mode.

.trainer_endpoints[:])
if ParallelEnv().current_endpoint in unique_endpoints:
    file_num = len(os.listdir(fullname))
    if file_num != 15:
Member:

Magic numbers need an explanatory comment.

Member Author:

Done, thanks.
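
As an illustration of the review point above, a bare `15` can be replaced by a value derived from a named constant. This is a minimal sketch, not the PR's actual code: the contents of `ALL_LANGUAGES` are assumed here to be the 15 XNLI language codes, the idea that the directory holds one entry per language is also an assumption, and `is_download_complete` is a hypothetical helper.

```python
# Assumed contents: the 15 XNLI language codes. The PR itself only shows
# the name ALL_LANGUAGES, not its definition.
ALL_LANGUAGES = [
    "ar", "bg", "de", "el", "en", "es", "fr", "hi",
    "ru", "sw", "th", "tr", "ur", "vi", "zh",
]


def is_download_complete(file_num, expected=len(ALL_LANGUAGES)):
    """Hypothetical helper: compare a file count with the expected count.

    Assumes one extracted entry per language, so the expected count is
    tied to ALL_LANGUAGES instead of being hard-coded.
    """
    return file_num == expected
```

Deriving the expected count from the list means the check stays consistent if the list changes, and the name documents what the number stands for.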

.trainer_endpoints[:])
if ParallelEnv().current_endpoint in unique_endpoints:
    file_num = len(os.listdir(fullname))
    if file_num != len(ALL_LANGUAGES):
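Collaborator:

Some background: unlike the other datasets, this one adds file_num = len(os.listdir(os.path.join(dir_path, "stories"))), which causes problems when several processes run at once. The current multi-process download-and-extract mechanism assumes that the file operations in _get_data run only on designated nodes. The get_path_from_url function that our datasets rely on satisfies this assumption, which avoids the vast majority of multi-process download and extraction problems. The file_num = len(os.listdir(os.path.join(dir_path, "stories"))) call here does not. This PR is a temporary fix; the issue can be resolved at the dataset level later.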
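
The guard described in this comment can be sketched without Paddle as follows. This is a rough illustration under stated assumptions, not the PR's implementation: `pick_unique_endpoints` and `count_files_if_designated` are hypothetical names, endpoints are assumed to be `ip:port` strings, and "designated" is taken to mean the first endpoint listed for each IP.

```python
import os
import tempfile


def pick_unique_endpoints(endpoints):
    """Keep the first endpoint seen for each IP (one process per machine)."""
    seen_ips = set()
    unique = []
    for endpoint in endpoints:
        ip = endpoint.split(":")[0]
        if ip not in seen_ips:
            seen_ips.add(ip)
            unique.append(endpoint)
    return unique


def count_files_if_designated(directory, current_endpoint, endpoints):
    """Count files only on a designated process; return None elsewhere.

    Non-designated processes never touch the directory, so they cannot
    fail on a path that another process is still downloading or extracting.
    """
    if current_endpoint not in pick_unique_endpoints(endpoints):
        return None
    return len(os.listdir(directory))


# Two processes on one machine and one process on a second machine.
endpoints = ["10.0.0.1:6170", "10.0.0.1:6171", "10.0.0.2:6170"]
with tempfile.TemporaryDirectory() as data_dir:
    for name in ("a.story", "b.story"):
        open(os.path.join(data_dir, name), "w").close()
    designated = count_files_if_designated(data_dir, "10.0.0.1:6170", endpoints)
    other = count_files_if_designated(data_dir, "10.0.0.1:6171", endpoints)
```

In the PR itself the current endpoint and the endpoint list come from `ParallelEnv()`; the sketch passes them in as plain arguments so it can run on its own.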

Member Author:

OK, thanks.

@LiuChiachi merged commit 2197402 into PaddlePaddle:develop on Jan 26, 2022