This has been fixed with #1758 in the way described in this issue. However, as explained below, it would probably still make sense to rethink random_split_ConcatDataset.
Describe the bug
When multiprocessing is disabled, preprocessing is done in one huge chunk:

`haystack/haystack/modeling/data_handler/data_silo.py`, line 156 (commit 8082549)
However, the dev split method only assigns whole chunks to the dev set and never divides a chunk:

`haystack/haystack/modeling/data_handler/data_silo.py`, lines 384 to 408 (commit 8082549)
This means that dev split cannot work when multiprocessing is disabled.
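To illustrate why this fails, here is a minimal sketch of a chunk-level dev split in the spirit of random_split_ConcatDataset (the function name and exact logic are illustrative, not Haystack's actual API): it can only move whole chunks into the dev set, so when all samples arrive as a single chunk, any dev_split > 0 would leave the train set empty and the assertion fires.

```python
import math
import random

def split_chunks_for_dev(chunks, dev_split):
    """Split a list of dataset chunks into (train, dev) at chunk granularity.

    Simplified sketch: like random_split_ConcatDataset, it assigns whole
    chunks to the dev set and cannot split inside a chunk.
    """
    n_dev = math.ceil(len(chunks) * dev_split)
    # If every chunk would land in dev, there is nothing left for training.
    assert n_dev < len(chunks), (
        "Dev_split ratio is too large, there is no data in train set. "
        f"Please lower dev_split = {dev_split}"
    )
    shuffled = chunks[:]
    random.shuffle(shuffled)
    return shuffled[n_dev:], shuffled[:n_dev]

# Multiprocessing enabled: data arrives in many chunks, so splitting works.
train, dev = split_chunks_for_dev([["s1"], ["s2"], ["s3"], ["s4"]], 0.25)

# Multiprocessing disabled: everything is one huge chunk, so even a small
# dev_split tries to move the only chunk into dev and the assertion fires.
try:
    split_chunks_for_dev([["s1", "s2", "s3", "s4"]], 0.1)
except AssertionError as e:
    print(e)
```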
It could be fixed by changing:
`haystack/haystack/modeling/data_handler/data_silo.py`, line 156 (commit 8082549)
to:
This would mean that there would be a chunk for each sample so dev split would be possible again. But perhaps it would be better to overthink random_split_ConcatDataset because it also causes other problems (#96).
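A rough sketch of the effect of that change, using a simplified grouper helper (illustrative only, not the exact Haystack code): chunking with a size of 1 instead of the full dataset length yields one chunk per sample, which makes chunk-level dev splitting possible again, at the cost of many tiny chunks.

```python
def grouper(iterable, n):
    """Yield successive chunks of up to n items (simplified helper)."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == n:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

dicts = [{"id": i} for i in range(10)]

# Before: multiprocessing disabled puts everything into one huge chunk,
# so a chunk-level dev split has nothing to divide.
single_chunk = list(grouper(dicts, len(dicts)))

# After the proposed change: one chunk per sample, so whole-chunk
# splitting can again carve out a dev set.
per_sample = list(grouper(dicts, 1))

print(len(single_chunk), len(per_sample))  # 1 10
```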
Error message
```
AssertionError: Dev_split ratio is too large, there is no data in train set. Please lower dev_split = 0.1
```
To Reproduce