-
Notifications
You must be signed in to change notification settings - Fork 306
Conversation
Hi thanks for a timely PR. I am interested in supporting this feature.
Since MixUp is simpler, it is easier to include in a library. Having said that, I have several design comments.
|
|
||
""" | ||
|
||
def __init__(self, dataset, dtype=numpy.float32, label_dtype=numpy.float32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove dtype
and label_dtype
from arguments?
By default, images and soft-labels are dtype=np.float32
.
If different dtypes are needed, transforms can be used.
Thank you for your quick and attentive response. OK, I will change After merging this PR and if I can confirm advantage fo BC, I may suggest BC-like mixing as option.
Don't mind. Actually, I will take time to restart this PR. |
chainercv/datasets/mixup_dataset.py
Outdated
@@ -0,0 +1,60 @@ | |||
import numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use np
in ChainerCV. Could you change it?
chainercv/datasets/__init__.py
Outdated
@@ -15,6 +15,7 @@ | |||
from chainercv.datasets.cub.cub_utils import cub_label_names # NOQA | |||
from chainercv.datasets.directory_parsing_label_dataset import directory_parsing_label_names # NOQA | |||
from chainercv.datasets.directory_parsing_label_dataset import DirectoryParsingLabelDataset # NOQA | |||
from chainercv.datasets.mixup_dataset import MixUpSoftLabelDataset # NOQA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the name of the file to mixup_soft_label_dataset
?
Also, could you change the name of the test file as well?
chainercv/datasets/mixup_dataset.py
Outdated
|
||
""" | ||
|
||
def __init__(self, dataset, max_label): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using n_class
instead of max_label
?
n_class
is widely used in other parts of ChainerCV.
In that case the doc would be
n_class (int): The number of classes in the base dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove `+1 in line 57 if you do this.
chainercv/datasets/mixup_dataset.py
Outdated
>>> mixed_image, mixed_label = dataset[0] | ||
|
||
Args: | ||
dataset: The underlying dataset. dataset should returns two image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
daset should return ...
can be written more explicitly.
The dataset returns :obj:`img_0, label_0, img_1, label_1`, which is a tuple containing two pairs of an image and a label.
chainercv/datasets/mixup_dataset.py
Outdated
Args: | ||
dataset: The underlying dataset. dataset should returns two image | ||
and their label. Typically, dataset is `SiameseDataset`. | ||
More over, each element of each dataset should have same shape. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, ...
can be rewritten as...
The shapes of images and labels should be constant.
chainercv/datasets/mixup_dataset.py
Outdated
>>> mnist, _ = get_mnist() | ||
>>> base_dataset = SiameseDataset(mnist, mnist) | ||
>>> dataset = MixUpSoftLabelDataset(base_dataset) | ||
>>> mixed_image, mixed_label = dataset[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about printing the shape? This is an important details.
>>> print(mixed_label.shape, mixed_label.dtype) # ((10,), numpy.float32)
self.assertEqual(example[1].dtype, np.float32) | ||
self.assertEqual(example[1].ndim, 1) | ||
self.assertEqual(example[1].shape[0], self.n_class + 1) | ||
self.assertAlmostEqual(example[1].sum(), 1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check that the label is nonnegative?
chainercv/datasets/mixup_dataset.py
Outdated
dataset respectively by weighted average. | ||
|
||
Unlike `LabeledImageDatasets`, label is a one-dimensional float array with | ||
at most two nonzero weights (i.e. soft label). The summed weights is one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sum of the two weights is one
chainercv/datasets/mixup_dataset.py
Outdated
Unlike `LabeledImageDatasets`, label is a one-dimensional float array with | ||
at most two nonzero weights (i.e. soft label). The summed weights is one. | ||
|
||
The base dataset `__getitem__` should return image and label. Please see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect. We can remove this.
chainercv/datasets/mixup_dataset.py
Outdated
|
||
"""Dataset which returns mixed images and labels for mixup learning[1]. | ||
|
||
`MixUpSoftLabelDataset` mix two images and labels which is chosen by base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:class:`MixUpSoftLabelDataset` mixes two pairs of labeled images fetched from the base dataset.
Thank you for a nice PR! |
chainercv/datasets/mixup_dataset.py
Outdated
>>> from chainercv.datasets import MixUpSoftLabelDataset | ||
>>> mnist, _ = get_mnist() | ||
>>> base_dataset = SiameseDataset(mnist, mnist) | ||
>>> dataset = MixUpSoftLabelDataset(base_dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to supply the number of class to MixupSoftLabelDataset
Thank you for your attentive review. I addressed your comments. |
docs/source/reference/datasets.rst
Outdated
@@ -18,6 +18,10 @@ SiameseDataset | |||
~~~~~~~~~~~~~~ | |||
.. autoclass:: SiameseDataset | |||
|
|||
MixUpSoftLabelDataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this before SiameseDataset
?
We are ordering docs in alphabetical order.
return len(self._dataset) | ||
|
||
def get_example(self, i): | ||
image1, label1, image2, label2 = self._dataset[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing the names of the variables to img_0, label_0, img_1, label_1
so that they are consistent with the documentation?
Thanks! I addressed your comments. |
LGTM. Thank you for your contribution! |
Between class learning is simple and effective technique, I think it is valuable to merge to chainercv.
https://arxiv.org/abs/1711.10284