-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random Splitter #196
Random Splitter #196
Conversation
mottodora
commented
Jun 22, 2018
•
edited
Loading
edited
- RandomSplitter
- tests
- documents
Codecov Report
@@ Coverage Diff @@
## master #196 +/- ##
==========================================
+ Coverage 77.08% 79.01% +1.92%
==========================================
Files 89 95 +6
Lines 3875 4193 +318
==========================================
+ Hits 2987 3313 +326
+ Misses 888 880 -8 |
1.) | ||
if seed is not None: | ||
numpy.random.seed(seed) | ||
perm = numpy.random.permutation(len(dataset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting numpy random seed may have side effect to other places.
How about using numpy.random.RandomState(seed).permutation(len(dataset))
?
So far looks good to me for the design, I think it is ok to separate PR, and merge it in current status |
I also want to consider the case when we want to split only |
if return_index: | ||
return train_inds, valid_inds, test_inds | ||
else: | ||
return dataset[train_inds], dataset[valid_inds], dataset[test_inds] | ||
if type(dataset) == NumpyTupleDataset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use isinstance(dataset, NumpyTupleDataset)
.
I think the different way is to introduce converter
function as an argument, so that user can explicitly specify how to split based on dataset
with given indices
.
Default behavior would be...
def converter(dataset, indices):
return dataset[indices]
and for NumpyTupleDataset, user can explicitly specify...
def converter_numpy_tuple_dataset(dataset, indices):
return NumpyTupleDataset(*dataset.features[indices])
if type(dataset) == NumpyTupleDataset: | ||
train = NumpyTupleDataset(*dataset[train_inds]) | ||
valid = NumpyTupleDataset(*dataset[valid_inds]) | ||
test = NumpyTupleDataset(*dataset[test_inds]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use features, NumpyTupleDataset(*dataset.features[indices])
?
please merge after #200 |
Can you write docstring as in other chainer chemistry code (Google format)? https://github.com/pfnet-research/chainer-chemistry/blob/master/chainer_chemistry/dataset/converters.py#L5 other than that, LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM