
Add an example of using own dataset #114

Merged
merged 5 commits into chainer:master from n-yoshikawa:example_own_dataset on Apr 2, 2018

Conversation

n-yoshikawa
Contributor

This PR adds an example of using your own dataset.

  • `train.py` shows how to use `CSVFileParser` and how to predict from SMILES.
  • `dataset.csv` is a sample CSV file, generated by extracting 100 molecules from QM9. `value1` and `value2` are HOMO and LUMO.
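A short sketch of the CSV layout described above may help; the SMILES strings and values below are made-up placeholders, not actual QM9 entries:

```python
import csv

# Hypothetical sketch of the dataset.csv layout: a header row of label
# names, then one molecule per row (SMILES string plus target values).
# The rows below are placeholders, not real QM9 data.
rows = [
    ['SMILES', 'value1', 'value2'],  # first line: label names
    ['C', '-0.25', '0.10'],          # placeholder molecule and values
    ['CO', '-0.27', '0.08'],
]
with open('dataset_example.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```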

@codecov-io

codecov-io commented Mar 13, 2018

Codecov Report

Merging #114 into master will not change coverage.
The diff coverage is n/a.

```
@@           Coverage Diff           @@
##           master     #114   +/-   ##
=======================================
  Coverage   73.27%   73.27%
=======================================
  Files          69       69
  Lines        2582     2582
=======================================
  Hits         1892     1892
  Misses        690      690
```

## How to use your own dataset
1. Prepare a CSV file which contains the list of SMILES and the values you want to train.
The first line of the CSV file should be label names.
See `dataset.csv` as an example.
Member

Could you write that dataset.csv is made by sampling from the QM9 dataset?

@delta2323
Member

Please add a link from the tutorial to this example, for example by adding the following comment at the bottom of the tutorial: "Once you have completed this tutorial, the next step is to use Chainer Chemistry to model your own dataset. See this example (link) for how to do it."

```
python train.py dataset.csv --label value1 value2
```

## How to use your own dataset
Member

This title suggests that this is the only, or preferred, way to use one's own dataset. But using CSVFileParser is just one way: Chainer Chemistry also has SDFFileParser, and users may even handle the dataset without the parsers, although we think that would be costly. So how about simply changing the section title to "Procedure"?

## Usage
```
python train.py dataset.csv --label value1 value2
```
Member

Please explain (at least some of) the options of the script. At minimum, --label needs a description. But I think that option is the only one that needs special treatment, so for the rest it would be enough to write "type `python train.py --help` to see the complete options".
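One hedged way such an option might be declared (this is an argparse sketch, not the actual train.py code):

```python
import argparse

# Hypothetical sketch of how train.py might declare --label:
# nargs='+' accepts one or more column names, e.g. "--label value1 value2".
parser = argparse.ArgumentParser()
parser.add_argument('datafile', help='path to the CSV file')
parser.add_argument('--label', nargs='+', default=[],
                    help='label column name(s) in the CSV to train on')
args = parser.parse_args(['dataset.csv', '--label', 'value1', 'value2'])
```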

@delta2323
Member

Could you change the test scripts to check that this example runs without errors?

from chainer_chemistry.dataset.converters import concat_mols
from chainer_chemistry.dataset.parsers import CSVFileParser
from chainer_chemistry.dataset.preprocessors import preprocess_method_dict
from chainer_chemistry.datasets import NumpyTupleDataset
Member

You do not have to separate the imports of chainer_chemistry, because Chainer Chemistry is a third-party library for this example and can therefore be treated in the same way as Chainer or NumPy.

Member

Could you sort import statements in alphabetical order?

from chainer_chemistry.dataset.preprocessors import preprocess_method_dict
from chainer_chemistry.datasets import NumpyTupleDataset

from rdkit import Chem
Member

Same as for the chainer_chemistry imports.

sys.exit("Error: No target label is specified.")

# Dataset preparation
dataset = None
Member

Do we need this line?

def postprocess_label(label_list):
    return numpy.asarray(label_list, dtype=numpy.float32)

print('preprocessing dataset...')
Member

I prefer to capitalize the first character, as we do in other places.

The first line of the CSV file should be label names.
See `dataset.csv` as an example.

2. Use `CSVFileParser` of Chainer Chemistry to feed data to the model.
Member

It would be better to add a link to the document of CSVFileParser.

preprocessor = preprocess_method_dict[method]()
parser = CSVFileParser(preprocessor,
                       postprocess_label=postprocess_label,
                       labels=labels, smiles_col='SMILES')
Member

Is it intentional that you hard-coded the value of smiles_col?

Contributor Author

Yes, I want to show that we can specify the column name by smiles_col in this example.
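A rough stdlib illustration of the idea behind smiles_col, selecting the SMILES column by header name rather than by position (this mimics the intent only; it is not the CSVFileParser API):

```python
import csv
import io

# Toy illustration of what a smiles_col-style argument selects: the CSV
# column holding SMILES strings, looked up by header name. This is not
# the CSVFileParser API, just the underlying idea.
data = "SMILES,value1,value2\nC,-0.25,0.10\nCO,-0.27,0.08\n"

def read_smiles(text, smiles_col='SMILES'):
    reader = csv.DictReader(io.StringIO(text))
    return [row[smiles_col] for row in reader]
```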

    return numpy.mean(numpy.absolute(diff), axis=0)[0]

classifier = L.Classifier(model, lossfun=F.mean_squared_error,
                          accfun=scaled_abs_error)
Member

Could you add a comment somewhere noting that the scaled errors are reported as main/accuracy and validation/main/accuracy, because using accfun to calculate a (scaled) error is not the ordinary usage of Classifier?
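A pure-Python sketch of the kind of metric being discussed (a mean absolute error routed through the accfun slot, so it is reported under main/accuracy); the scaling used by the example's actual scaled_abs_error is not reproduced here:

```python
# Hypothetical pure-Python sketch of an accfun-style error metric:
# a plain mean absolute error. When passed as accfun to Classifier,
# its value is reported as main/accuracy and validation/main/accuracy,
# even though it is an error, not an accuracy.
def mean_abs_error(predictions, targets):
    diffs = [abs(p - t) for p, t in zip(predictions, targets)]
    return sum(diffs) / len(diffs)
```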

@n-yoshikawa
Contributor Author

I could not figure out how to add "test scripts to check if this example runs without errors". Could you show me an example, or how other examples are tested?

@delta2323
Member

We have scripts that run the examples on CPU (example_test_cpu.sh) and GPU (example_test_gpu.sh) in the examples directory. So I had imagined adding your example to them.

@delta2323 (Member) left a comment

LGTM except comments and the example tests.

Using your own dataset
========================
You can use your own dataset in Chainer Chemistry.
`example/own_dataset <https://github.com/pfnet-research/chainer-chemistry/tree/master/examples/own_dataset/>`_ shows an example code.
Member

Code is uncountable (cf. here). So, "example code" or "an example" would be better.

@@ -4,10 +4,15 @@
```
python train.py dataset.csv --label value1 value2
```

## How to use your own dataset
The `--label` option specifies which row in `dataset.csv` is trained.
Member

I think it is columns, not rows that we have to specify.

Member

Also, as we can specify multiple columns, "is" should be substituted with "are".

@delta2323 delta2323 self-assigned this Mar 20, 2018
@delta2323
Member

Thank you for the update. I'm running the example test scripts.

@delta2323
Member

The example failed with this configuration. Could you check that?

python train.py dataset.csv --method nfp --label value1 --conv-layers 1 --gpu 0 --epoch 1 --unit-num 10

log

```
Preprocessing dataset...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1944.47it/s]
Train NFP model...
epoch       main/loss   main/accuracy  validation/main/loss  validation/main/accuracy  elapsed_time
1           0.884043    0.0152891      1.07675               0.0166278                 6.00279
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    main()
  File "train.py", line 200, in main
    prediction = model(atoms, adjs).data[0]
  File "train.py", line 55, in __call__
    x = self.graph_conv(atoms, adjs)
  File "/home/delta/dev/chainer-chemistry/chainer_chemistry/models/nfp.py", line 145, in __call__
    h = self.embed(atom_array)
  File "/home/delta/dev/chainer-chemistry/chainer_chemistry/links/embed_atom_id.py", line 47, in __call__
    h = super(EmbedAtomID, self).__call__(x)
  File "/home/delta/.pyenv/versions/anaconda3-4.3.0/envs/anaconda3/lib/python3.6/site-packages/chainer/links/connection/embed_id.py", line 70, in __call__
    return embed_id.embed_id(x, self.W, ignore_label=self.ignore_label)
  File "/home/delta/.pyenv/versions/anaconda3-4.3.0/envs/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/embed_id.py", line 170, in embed_id
    return EmbedIDFunction(ignore_label=ignore_label).apply((x, W))[0]
  File "/home/delta/.pyenv/versions/anaconda3-4.3.0/envs/anaconda3/lib/python3.6/site-packages/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/delta/.pyenv/versions/anaconda3-4.3.0/envs/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/embed_id.py", line 35, in forward
    .format(type(W), type(x)))
ValueError: numpy and cupy must not be used together
type(W): <class 'cupy.core.core.ndarray'>, type(x): <class 'numpy.ndarray'>
```

@n-yoshikawa
Contributor Author

I forgot to specify the device number in concat_mols. Neither example fails in my environment now.

@delta2323
Member

Thank you for fixing the problem. I'll run the scripts again.

@delta2323 (Member) left a comment

LGTM except small comments.

The first line of the CSV file should be label names.
See `dataset.csv` as an example.
`dataset.csv` is made by sampling from the QM9 dataset.
`value1` is homo and `value2` is lumo.
Member

Capitalize homo and lumo.

Type `python train.py --help` to see complete options.

## Procedure
1. Prepare a CSV file which contains the list of SMILES and the values you want to train.
Member

I think "... contains a list of SMILES and values ..." would be better.

from chainer_chemistry.dataset.converters import concat_mols
from chainer_chemistry.dataset.parsers import CSVFileParser
from chainer_chemistry.dataset.preprocessors import preprocess_method_dict
from chainer_chemistry.datasets import NumpyTupleDataset
Member

Could you sort import statements in alphabetical order?

sys.exit("Error: No target label is specified.")

# Dataset preparation

Member

Remove this empty line.

@delta2323
Member

LGTM

@delta2323 delta2323 merged commit 91681bd into chainer:master Apr 2, 2018
@n-yoshikawa n-yoshikawa deleted the example_own_dataset branch April 3, 2018 08:47
@delta2323 delta2323 added this to the 0.3.0 milestone Apr 23, 2018