@VictorSanh
Member

Some community datasets don't have a dataset_infos.json file, which causes an error in the helicopter view.
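
A minimal sketch of one way to guard against the missing file, assuming the metadata lives in a local directory per dataset. The function name `load_dataset_infos` and the directory layout are hypothetical and are not taken from this PR; only the file name `dataset_infos.json` comes from the comment above.

```python
import json
from pathlib import Path
from typing import Optional


def load_dataset_infos(dataset_dir: Path) -> Optional[dict]:
    """Return the parsed dataset_infos.json, or None if the file is absent.

    Hypothetical helper: the real code path in the PR may differ.
    """
    infos_path = Path(dataset_dir) / "dataset_infos.json"
    if not infos_path.is_file():
        # Some community datasets ship without this file; report "no infos"
        # instead of raising, so the caller can skip or fall back.
        return None
    with infos_path.open(encoding="utf-8") as f:
        return json.load(f)
```

A caller can then treat `None` as "no metadata available" and skip that dataset rather than letting the whole view fail.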

I also drastically reduced the number of multiprocessing workers; I think the high worker count was causing some issues.

@stephenbach
Member

@VictorSanh @tianjianjiang Does this PR also address the issue in #411? Is there anything from #411 that should be included here?

@tianjianjiang
Contributor

> @VictorSanh @tianjianjiang Does this PR also address the issue in #411? Is there anything from #411 that should be included here?

@stephenbach +CC @VictorSanh I will have to test it again to be sure. However, last time I tested it, cpu_count() wouldn't fix #411, and cpu_count() was much slower than len(en_datasets), probably because it was I/O bound.
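
To illustrate the pool-size trade-off mentioned above, here is a rough sketch in which the number of workers is a parameter. It uses a thread pool rather than the process pool discussed in the thread, and `fetch_info` is a hypothetical stand-in that only sleeps to simulate I/O; neither is taken from the PR.

```python
import time
from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool
from typing import Dict, List, Tuple


def fetch_info(name: str) -> Tuple[str, int]:
    """Hypothetical I/O-bound task: wait briefly, then return a dummy value."""
    time.sleep(0.05)  # stands in for a network or disk wait
    return name, len(name)


def fetch_all(names: List[str], n_workers: int) -> Dict[str, int]:
    """Run fetch_info over all names with a pool of n_workers threads."""
    with ThreadPool(processes=max(1, n_workers)) as pool:
        return dict(pool.map(fetch_info, names))


if __name__ == "__main__":
    en_datasets = [f"dataset_{i}" for i in range(32)]  # made-up names
    for workers in (cpu_count(), len(en_datasets)):
        start = time.perf_counter()
        fetch_all(en_datasets, workers)
        print(workers, "workers:", round(time.perf_counter() - start, 2), "s")
```

When each task mostly waits, a pool sized to the number of tasks can finish sooner than one capped at `cpu_count()`. That is consistent with the observation above, but it does not confirm I/O as the cause.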

@VictorSanh
Member Author

I removed the tentative fix for the pool. Let's have it in another PR.
I am merging this now because people are asking for this fix.

@VictorSanh merged commit e27351e into main on Oct 13, 2021
@VictorSanh deleted the fix_get_dataset_infos_community_datasets branch on October 13, 2021 17:40

4 participants