Make sure eval subset is sampled without replacing (#3651)
Explicitly specify `replace=False` for
[numpy.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html).
The argument was missing, and the default is `replace=True`, which could
lead to duplicate examples in the evaluation set.
andreaskoepf committed Aug 14, 2023
1 parent c4c9f37 commit 90e442a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion model/model_training/custom_datasets/__init__.py
@@ -196,7 +196,7 @@ def get_one_dataset(
         train, eval = train_val_dataset(dataset, val_split=val_split)

     if eval and max_val_set and len(eval) > max_val_set:
-        subset_indices = np.random.choice(len(eval), max_val_set)
+        subset_indices = np.random.choice(len(eval), size=max_val_set, replace=False)
         eval = Subset(eval, subset_indices)

     return train, eval
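
The difference between the old and new call can be illustrated with a small standalone sketch. It is not part of the commit: the sizes, the seed, and the variable names `population`, `old_indices`, `new_indices`, and `oversampled` are assumptions chosen only for demonstration, with `population` standing in for `len(eval)`.

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, only to make the demo repeatable

population = 10   # stands in for len(eval)
max_val_set = 5   # stands in for the max_val_set argument

# Old behaviour: replace defaults to True, so an index may be drawn twice.
old_indices = np.random.choice(population, max_val_set)

# New behaviour: every drawn index is distinct.
new_indices = np.random.choice(population, size=max_val_set, replace=False)
unique_new = len(np.unique(new_indices))  # always equals max_val_set

# Drawing more items than exist forces a repeat when replace=True ...
oversampled = np.random.choice(3, size=4, replace=True)
has_duplicates = len(np.unique(oversampled)) < len(oversampled)

# ... and is rejected outright when replace=False.
try:
    np.random.choice(3, size=4, replace=False)
    oversample_rejected = False
except ValueError:
    oversample_rejected = True

print(old_indices, new_indices, has_duplicates, oversample_rejected)
```

In the patched function the `len(eval) > max_val_set` guard means the `ValueError` case should not arise, since the requested sample is always smaller than the evaluation set.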
