Random Splitter #196

mottodora · 2018-06-22T08:46:49Z

RandomSplitter
tests
documents

codecov-io · 2018-06-22T08:55:27Z

Codecov Report

Merging #196 into master will increase coverage by 1.92%.
The diff coverage is 89.53%.

@@            Coverage Diff             @@
##           master     #196      +/-   ##
==========================================
+ Coverage   77.08%   79.01%   +1.92%     
==========================================
  Files          89       95       +6     
  Lines        3875     4193     +318     
==========================================
+ Hits         2987     3313     +326     
+ Misses        888      880       -8

corochann · 2018-06-22T09:14:51Z

chainer_chemistry/dataset/splitters/random_splitter.py

+                                          1.)
+        if seed is not None:
+            numpy.random.seed(seed)
+        perm = numpy.random.permutation(len(dataset))


Setting the global numpy random seed may have side effects in other places.
How about using numpy.random.RandomState(seed).permutation(len(dataset)) instead?
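A minimal sketch of the suggestion above, assuming a helper named `permute_indices` (hypothetical, not part of this PR). A local `numpy.random.RandomState` gives a reproducible permutation while leaving the global random state alone.

```python
import numpy


def permute_indices(n, seed=None):
    """Return a permutation of range(n) drawn from a local RandomState.

    Hypothetical helper for illustration only.
    """
    # RandomState(seed) owns its own generator state, so the global
    # numpy.random state used by other code is not modified.
    return numpy.random.RandomState(seed).permutation(n)


# The same seed reproduces the same permutation.
perm_a = permute_indices(10, seed=0)
perm_b = permute_indices(10, seed=0)
same_perm = bool((perm_a == perm_b).all())

# The key array of the global generator is identical before and after.
state_before = numpy.random.get_state()[1].copy()
permute_indices(10, seed=0)
state_after = numpy.random.get_state()[1]
global_untouched = bool((state_before == state_after).all())
```

By contrast, `numpy.random.seed(seed)` resets the shared generator, so any other code that draws from `numpy.random` afterwards would see a changed sequence.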

corochann · 2018-06-22T09:17:19Z

The design looks good to me so far. I think it is OK to split the rest into separate PRs and merge this one in its current state
(only BaseSplitter & RandomSplitter for this PR).

corochann · 2018-06-22T09:18:17Z

I also want to consider the case where we split only into train and val, without a test dataset.
How can we handle this case?
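One possible shape for a two-way split, as a sketch only. The function name `train_valid_split_indices`, its parameters, and the check that the fractions sum to 1 are all assumptions for illustration and may differ from what the PR ends up implementing.

```python
import numpy


def train_valid_split_indices(n, frac_train=0.9, frac_valid=0.1, seed=None):
    """Split range(n) into train and valid index arrays (no test set).

    Hypothetical helper; names and defaults are assumptions.
    """
    if not numpy.isclose(frac_train + frac_valid, 1.0):
        raise ValueError('frac_train + frac_valid must be 1')
    # Use a local RandomState so the global seed is not affected.
    perm = numpy.random.RandomState(seed).permutation(n)
    n_train = int(frac_train * n)
    # Everything after the train portion goes to validation.
    return perm[:n_train], perm[n_train:]


train_inds, valid_inds = train_valid_split_indices(10, 0.8, 0.2, seed=0)
```

The two index arrays are disjoint and together cover every example, so no data is silently dropped when the test portion is omitted.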

corochann · 2018-06-23T09:27:46Z

chainer_chemistry/dataset/splitters/base_splitter.py

        if return_index:
            return train_inds, valid_inds, test_inds
        else:
-            return dataset[train_inds], dataset[valid_inds], dataset[test_inds]
+            if type(dataset) == NumpyTupleDataset:


Use isinstance(dataset, NumpyTupleDataset) instead.

Another way is to introduce a converter function as an argument, so that the user can explicitly specify how to extract a subset of the dataset for the given indices.

The default behavior would be:

def converter(dataset, indices): return dataset[indices]

and for NumpyTupleDataset, the user can explicitly specify:

def converter_numpy_tuple_dataset(dataset, indices): return NumpyTupleDataset(*dataset.features[indices])
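The converter idea can be sketched in a self-contained way as below. `TupleDatasetStandIn`, its `take` method, and the `split` function are hypothetical stand-ins written for this example; they are not the chainer_chemistry `NumpyTupleDataset` API or the splitter from this PR.

```python
import numpy


class TupleDatasetStandIn(object):
    """Hypothetical stand-in for a dataset holding a tuple of arrays."""

    def __init__(self, *arrays):
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays[0])

    def take(self, indices):
        # Index every underlying array and rebuild the dataset.
        return TupleDatasetStandIn(*[a[indices] for a in self.arrays])


def default_converter(dataset, indices):
    # Works for anything that supports fancy indexing, e.g. ndarray.
    return dataset[indices]


def stand_in_converter(dataset, indices):
    # Dataset-specific extraction, chosen explicitly by the caller.
    return dataset.take(indices)


def split(dataset, train_inds, valid_inds, converter=default_converter):
    """Apply the converter to each index set; no type checks needed."""
    return converter(dataset, train_inds), converter(dataset, valid_inds)


# ndarray input with the default converter.
arr_train, arr_valid = split(numpy.arange(10), [0, 1, 2], [3, 4])

# Tuple-style dataset with an explicit converter.
ds = TupleDatasetStandIn(numpy.arange(6), numpy.arange(6) * 2)
ds_train, ds_valid = split(ds, numpy.array([0, 1, 2, 3]),
                           numpy.array([4, 5]), converter=stand_in_converter)
```

With this design the splitter does not need to branch on the dataset type at all; supporting a new dataset class only requires passing a different converter.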

corochann · 2018-06-23T09:28:13Z

chainer_chemistry/dataset/splitters/base_splitter.py

+            if type(dataset) == NumpyTupleDataset:
+                train = NumpyTupleDataset(*dataset[train_inds])
+                valid = NumpyTupleDataset(*dataset[valid_inds])
+                test = NumpyTupleDataset(*dataset[test_inds])


How about using features, i.e. NumpyTupleDataset(*dataset.features[indices])?

mottodora · 2018-06-25T07:25:15Z

Please merge this after #200.

corochann · 2018-06-27T02:31:08Z

Can you write the docstrings in the same style as the other Chainer Chemistry code (Google format)?

https://github.com/pfnet-research/chainer-chemistry/blob/master/chainer_chemistry/dataset/converters.py#L5
https://github.com/pfnet-research/chainer-chemistry/blob/master/chainer_chemistry/models/ggnn.py#L109

Other than that, LGTM.
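For reference, a Google-style docstring looks roughly like the sketch below. The function `split_sizes` and its parameters are hypothetical and are not taken from this PR or from the linked files.

```python
def split_sizes(n, frac_train=0.8):
    """Compute train and validation sizes for a dataset.

    Args:
        n (int): Number of examples in the dataset.
        frac_train (float): Fraction of examples assigned to training.

    Returns:
        tuple: ``(n_train, n_valid)`` with ``n_train + n_valid == n``.

    """
    n_train = int(frac_train * n)
    return n_train, n - n_train


sizes = split_sizes(10, 0.5)
```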

corochann

LGTM

implement RandomSplitter

ebc2d1e

corochann reviewed Jun 22, 2018


mottodora changed the title ~~[WIP] Splitter~~ Random Splitter Jun 23, 2018

mottodora added 4 commits June 23, 2018 10:02

fix seed setting

7a3ec70

add train valid split

db4cd36

add unit test

1fd2b0d

fix unit tests

578c1b5

mottodora changed the title ~~Random Splitter~~ [WIP] Random Splitter Jun 23, 2018

corochann reviewed Jun 23, 2018


mottodora force-pushed the splitter branch from 23ba920 to 578c1b5 Compare June 25, 2018 02:08

mottodora added 5 commits June 25, 2018 11:08

add converter

c5ea2a5

apply converter

aff23f7

add ndarray test

cf57b0a

delete unnecessary file

2403713

Fix indexing behavior

47b6371

mottodora added 4 commits June 25, 2018 16:26

use NumpyTupleDataset.features

e4d4b0b

use **kwargs for options

d66e494

fix arguments

c6ba4de

add documents

1b003ae

mottodora changed the title ~~[WIP] Random Splitter~~ Random Splitter Jun 26, 2018

fix API

fa1ebf8

fix docstring style

320799d

corochann approved these changes Jun 27, 2018


corochann merged commit 9bcc316 into chainer:master Jun 27, 2018

mottodora deleted the splitter branch June 27, 2018 07:23

mottodora added this to the 0.4.0 milestone Jul 3, 2018
