Add return_bb option to CUBDatasets and add a test #399

yuyu2172 · 2017-08-18T05:18:27Z

No description provided.

Hakuyume · 2017-08-18T13:47:37Z

tests/datasets_tests/cub_tests/test_cub_label_dataset.py

+    def test_cub_label_dataset(self):
+        assert_is_classification_dataset(
+            self.dataset, len(cub_label_names), n_example=10)
+


Is it possible to check the effect of crop_bbox?

It is a totally different thing, but what do you think about changing this dataset to return bbox instead of a cropped image?
This is more flexible because in some cases users want to crop an image by a padded bbox.

This can be done by replacing the crop_bbox option with return_bbox. This style of interface is similar to VOCDetectionDataset, which can also return extra data.

Returning bbox looks good to me. As you said, it will be more flexible. Perhaps adding transforms.image.crop_with_bbox (or simply crop) would be helpful. With this function, we can write a cropped dataset easily:
TransformDataset(CUBLabelDataset(return_bbox=True), lambda in_data: (crop_with_bbox(in_data[0], in_data[2]), in_data[1])).
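A minimal sketch of the helper suggested here, assuming it were written with plain NumPy slicing. `crop_with_bbox` is only a name proposed in this thread, not an existing ChainerCV function, and the sketch assumes a CHW image and a single box of shape (4,) ordered (y_min, x_min, y_max, x_max).

```python
import numpy as np


def crop_with_bbox(img, bb):
    """Crop a CHW image to one bounding box (hypothetical helper).

    Assumes bb has shape (4,) ordered (y_min, x_min, y_max, x_max).
    """
    y_min, x_min, y_max, x_max = (int(v) for v in bb)
    # Keep all channels, slice rows and columns by the box.
    return img[:, y_min:y_max, x_min:x_max]


img = np.zeros((3, 10, 12), dtype=np.float32)  # toy CHW image
bb = np.array([2, 3, 6, 8], dtype=np.float32)  # toy box
cropped = crop_with_bbox(img, bb)
print(cropped.shape)
```

Padding the box before cropping, as mentioned earlier in the thread, would then be a matter of adjusting `bb` before calling the helper.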

I think return_bbox is an inaccurate name because we defined bbox as a set of bounding boxes.

I was thinking of returning a set of bbox (shape=(1,4)).

Yes, it is redundant.

But, I think it is better to keep data type consistent with other bboxes, so that we can use tools for bboxes.

I think naming the option return_bb and returning an array of shape (4,) is better. Users can still use the bbox utils via bb[np.newaxis] / bb[None].
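As a small illustration of that last point, the following sketch shows how a single box of shape (4,) can be turned into a set of one box with shape (1, 4) using standard NumPy indexing. The coordinate values are made up for the example.

```python
import numpy as np

bb = np.array([10, 20, 50, 80], dtype=np.float32)  # one box, shape (4,)
bbox = bb[np.newaxis]                              # set of one box, shape (1, 4)

print(bb.shape, bbox.shape)
# bb[None] is the same operation, since np.newaxis is an alias for None.
same = np.array_equal(bb[None], bbox)
```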

I reflected this change.

Hakuyume · 2017-08-19T01:09:44Z

I noticed we have both LabelDataset (CUBLabelDataset) and ClassificationDataset (DirectoryParsingClassificationDataset). We should unify these names because their tasks are the same.

yuyu2172 · 2017-08-19T01:18:07Z

You are right.
ClassificationDataset is good because label is overloaded too much.
Also, it is consistent with other names of datasets which contain task names.

On the other hand, LabelDataset is good because this can be used for tasks other than Classification.

Hakuyume · 2017-08-19T03:22:16Z

From my understanding, Classification in the class name indicates the main task for which the dataset was designed. Its name does not limit the usage of the dataset.

As you pointed out, we can use annotation type instead of task type. In this case, we should change class names as follows:

ClassificationDataset -> ImagewiseLabelDataset
DetectionDataset -> BBoxWithLabelDataset (?)
SemanticSegmentationDataset -> PixelwiseLabelDataset

yuyu2172 · 2017-08-20T00:11:52Z

In my opinion, there are three options. I slightly prefer 2 or 3.

1. Name by the main tasks for which the dataset was designed.
2. Name the dataset by the most distinctive annotation.

In this case, I would also suggest the following names:

ClassificationDataset -> LabelDataset
DetectionDataset -> BboxDataset
SemanticSegmentationDataset -> SemanticSegmentationDataset

One advantage with this naming convention is that it only uses notations that have already been used in ChainerCV.

3. Treat LabelDataset as a special case. LabelDataset can be used in multiple tasks such as Classification and Image Retrieval. This is the primary reason why ClassificationDataset sounds wrong. On the other hand, there is one to one correspondence between a task and a dataset type for Detection and SemanticSegmentation. There is no problem with assigning a task name in this case.

Hakuyume · 2017-08-20T03:09:05Z

ClassificationDataset -> LabelDataset
DetectionDataset -> BboxDataset
SemanticSegmentationDataset -> SemanticSegmentationDataset.

SemanticSegmentationDataset looks inconsistent. This is the name of task.

On the other hand, there is one to one correspondence between a task and a dataset type for Detection and SemanticSegmentation.

This is not true. For example, a detection dataset can be used for object counting task.

yuyu2172 · 2017-08-20T03:12:53Z

This is not true. For example, a detection dataset can be used for object counting task.

I see. So it seems that option 3 is too arbitrary.

SemanticSegmentationDataset looks inconsistent. This is the name of task.

Although it is the name of the task, it is the name of the output. This can be improved.

Hakuyume · 2017-08-20T03:16:33Z

Although it is the name of the task, it is the name of the output. This can be improved.

How about SemanticMask?

yuyu2172 · 2017-08-20T03:21:40Z

I am not sure if that name is common in the field.

Hakuyume · 2017-08-20T03:26:18Z

Personally, I prefer "Name by the main tasks for which the dataset was designed". Is there anyone who thinks "I cannot use this dataset for image retrieval task because this is named Classification"?

yuyu2172 · 2017-08-20T23:17:43Z

On top of the inherent inconsistency problem, the task name is longer (Label -> Classification).
I observe this phenomenon a lot.

Keypoint -> Pose Estimation
Caption -> Question Answering (Although in this case, we can abbreviate the task name to QA)
Scene Graph -> Scene Graph Generation

I took a quick look at COCO and Visual Genome, which are prominent datasets that cover multiple tasks.

SemanticSegmentationDataset looks inconsistent. This is the name of task.

I think a user would not get confused about the name of Bbox/Detection dataset just because there is SemanticSegmentationDataset.

Hakuyume · 2017-08-21T03:33:59Z

I think a user would not get confused about the name of Bbox/Detection dataset just because there is SemanticSegmentationDataset.

What do you mean?

yuyu2172 · 2017-08-21T03:38:40Z

It is totally fine to name SemanticSegmentationDataset together with BboxDataset.
We can have a task and an annotation whose names are the same.

Hakuyume · 2017-08-21T03:52:04Z

We can have a task and an annotation whose names are the same.

Yes, that is not the problem. However, Segmentation sounds like separating a thing into some pieces. Do we call the separated pieces a segmentation? Aren't they segments? This is the reason why SemanticSegmentationDataset looks strange to me.

yuyu2172 · 2017-08-21T04:00:59Z

segmentation can have the same meaning as segment.

I googled the word, and it is used in two ways.
https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST

Hakuyume · 2017-08-21T04:25:59Z

segmentation can have the same meaning as segment.

I see. So, SemanticSegmentationDataset is OK.
The names of datasets will be LabelDataset, BboxDataset and SemanticSegmentationDataset, right?
Perhaps, it is better to rename (pixel-wise) label to segmentation (seg/segm?). For example, vis_label -> vis_segmentation.

yuyu2172 · 2017-08-21T05:41:23Z

Perhaps, it is better to rename (pixel-wise) label to segmentation (seg/segm?)

Looking back at our internal discussion, label was preferred over segm because there is no similar name for score.
label/score pair is used throughout the library and it was suggested to be used for Semantic Segmentation as well.

However, vis_segmentation is much more informative than vis_label.
Since the situation has changed, I think segm is good.

By the way, taking instance_segmentation into consideration, I think vis_semantic_segmentation is better.

Hakuyume · 2017-08-21T05:52:22Z

label/score pair is used throughout the library and it was suggested to be used for Semantic Segmentation as well.

I see.

By the way, taking instance_segmentation into consideration, I think vis_semantic_segmentation is better.

I thought the functionality of vis_instance_segmentation is very similar to that of vis_semantic_segmentation and that we could handle both tasks with one function. But vis_semantic_segmentation looks better now.

Hakuyume · 2017-08-21T05:56:36Z

Let me summarize.

datasets -> <Identity><Main-annotation-type>Dataset
dataset assertions -> assert_is_<main-annotation-type>_dataset
visualizations -> vis_<main-annotation-type>
model assertions -> assert_is_<task>_link

yuyu2172 · 2017-08-21T05:59:11Z

I think it is OK to change label to segm.

Hakuyume · 2017-08-21T06:02:34Z

I think it is OK to change label to segm.

LabelDataset -> img, label (per image)
BboxDataset -> img, bbox, label (per bounding box)
SemanticSegmentationDataset -> img, segm (per pixel)

yuyu2172 · 2017-08-21T06:03:29Z

Adding to that, there are the following objects:

Evaluator: <task><metric>Evaluator (edited)
VisReport: <task>VisReport
evaluations: eval_<metric>
transforms: <operation>_<annotation-type> (e.g. resize_bbox)

Hakuyume · 2017-08-21T06:07:48Z

VisReport: <task>VisReport

Considering the consistency with vis_*, shouldn't it be <Main-annotation-type>VisReport? Do you intend to specify the type of target?

transforms: <operation>_<annotation-type> (e.g. resize_bbox)

If the <annotation-type> is image, we omit it, right? (e.g. random_crop_image -> random_crop)

yuyu2172 · 2017-08-21T06:14:10Z

So there are three conventions:

1. Annotation convention (e.g. img is CHW).
2. Dataset convention: the set of annotations returned by a dataset (e.g. img, label is returned by LabelDataset).
3. Task convention: the input and output of a network (e.g. the Classification task handles a network that takes img, label as input during training. It takes img and outputs score (or prob) during testing.)

3 depends on 2 and 2 depends on 1.
For example, Detection task assumes the BboxDataset as a dataset.

Considering the consistency with vis_*, shouldn't it be <Main-annotation-type>VisReport? Do you intend to specify the type of target?

Since VisReport assumes the input and output of a network, I think its behavior is selected by a task.

If the <annotation-type> is image, we omit it, right? (e.g. random_crop_image -> random_crop)

Yes. This is for convenience.

yuyu2172 · 2017-09-12T05:05:06Z

@Hakuyume
Let's finish this.

Hakuyume · 2017-09-12T05:18:57Z

Do you mean this?

label_names (not imagewise_label_names)
LabelDataset instead of ImagewiseLabelDataset.

It looks OK. This can be better because it is shorter.

I mean both.

| dataset | values | additional info |
| --- | --- | --- |
| LabelDataset | img, label | label_names |
| BoundingboxDataset | img, bbox, label | boundingbox_label_names |
| SemanticSegmentationDataset | img, label | semantic_segmentation_label_names, semantic_segmentation_label_colors |

yuyu2172 · 2017-09-12T05:21:30Z

I would prefer bbox_label_names over boundingbox_label_names.
bbox is used throughout the library.

Other than that, it looks ok.

Hakuyume · 2017-09-12T05:55:08Z

I would prefer bbox_label_names over boundingbox_label_names.
bbox is used throughout the library.

Sorry, it's my mistake.

| dataset | values | additional info | visualizer | related tasks |
| --- | --- | --- | --- | --- |
| LabelDataset | img, label | label_names | - | classification, image retrieval |
| BboxDataset | img, bbox, label | bbox_label_names | vis_bbox | object detection, object counting |
| SemanticSegmentationDataset | img, label | semantic_segmentation_label_names, semantic_segmentation_label_colors | vis_semantic_segmentation | semantic segmentation |
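To make the agreed return values concrete, here is a toy sketch of what one example from each dataset type could look like. The shapes and dtypes are assumptions for illustration (CHW float32 images, an (R, 4) array of boxes with one label per box, an (H, W) integer array for per-pixel labels); the sketch uses plain NumPy arrays rather than any ChainerCV class.

```python
import numpy as np

H, W, R = 4, 6, 2  # toy image size and number of boxes

img = np.zeros((3, H, W), dtype=np.float32)  # CHW image

# LabelDataset-style example: one label per image.
label_example = (img, np.int32(1))

# BboxDataset-style example: a set of boxes and one label per box.
bbox = np.array([[0, 0, 2, 3], [1, 2, 4, 6]], dtype=np.float32)
bbox_example = (img, bbox, np.array([0, 1], dtype=np.int32))

# SemanticSegmentationDataset-style example: one label per pixel.
segm_example = (img, np.zeros((H, W), dtype=np.int32))

print(len(label_example), len(bbox_example), len(segm_example))
```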

Hakuyume · 2017-09-12T05:56:00Z

If we choose to use LabelDataset for datasets with imagewise annotations, DirectoryParsingLabelDataset would be a good name.

I agree with you.

yuyu2172 · 2017-09-12T06:15:51Z

Sorry. I forgot to point out, but I prefer the dataset name for bbox to be BboxDataset.

I will start implementing these changes.

Hakuyume · 2017-09-12T07:35:04Z

Sorry. I forgot to point out, but I prefer the dataset name for bbox to be BboxDataset.

Sorry, I fixed it. And I added two columns 'visualizer' and 'related tasks'.

yuyu2172 · 2017-09-14T05:49:01Z

The new API (#399 (comment)) is implemented:

yuyu2172 · 2017-09-20T01:21:28Z

Please merge this after #405.

Hakuyume

Why don't you support mask in CUBLabelDataset?

yuyu2172 · 2017-10-05T05:28:53Z

Good idea.

yuyu2172 · 2017-10-05T05:57:09Z

@Hakuyume
I found that mask is not really an appropriate name for this data because it can take any value in [0, 255].
I think prob_map is a better name for this.
Since this is a relatively big change, how about making another PR?

Hakuyume · 2017-10-05T06:41:45Z

I found that mask is not really an appropriate name for this data because it can take any value in [0, 255].
I think prob_map is a better name for this.

Is it a value of probability? If so, can we scale it to [0, 1)?
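A short sketch of the scaling question, assuming the raw data is stored as uint8 values in [0, 255]. Dividing by 255 yields the closed range [0, 1], while dividing by 256 yields the half-open range [0, 1). Which one the dataset should use is not settled in this thread; the values below are toy data.

```python
import numpy as np

raw = np.array([[0, 128, 255]], dtype=np.uint8)  # toy values in [0, 255]

closed = raw.astype(np.float32) / 255.0     # range [0, 1]; 255 maps to 1.0
half_open = raw.astype(np.float32) / 256.0  # range [0, 1); 255 stays below 1.0

print(closed.max(), half_open.max())
```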

Since this is a relatively big change, how about making another PR?

Yes, it is better.

yuyu2172 · 2017-10-05T06:57:37Z

Yes.

Please review this first.

Hakuyume

LGTM except mask

add a test for cub_label_dataset

0a51de8

yuyu2172 assigned Hakuyume Aug 18, 2017

yuyu2172 added 2 commits August 18, 2017 14:21

fix test

1e27522

fix flake8

4614247

yuyu2172 added the test label Aug 18, 2017

yuyu2172 added this to the v0.7 milestone Aug 18, 2017

Hakuyume reviewed Aug 18, 2017

This was referenced Sep 14, 2017

Use bbox* instead of detection* for datasets and label_names #419

Merged

Change function name: vis_label -> vis_semantic_segmentation #420

Merged

yuyu2172 added 6 commits September 20, 2017 09:58

Merge remote-tracking branch 'origin/master' into cub-label-test

16e9b98

use return_bb options

3b2437d

Merge branch 'cub-label-test' of https://github.com/yuyu2172/chainercv into cub-label-test

f8e5929

flake8

5fd288f

Merge remote-tracking branch 'yuyu2172/label-dataset' into cub-label-test

409f6c8

updated based on chainer#405

e60d3b3

style

5ba70c7

yuyu2172 changed the title ~~Add a test for CUBLabelDataset~~ Add return_bb option to CUBDatasets and add a test Sep 20, 2017

yuyu2172 added the no-compat label Sep 20, 2017

yuyu2172 added 2 commits October 5, 2017 11:40

Merge remote-tracking branch 'origin/master' into cub-label-test

9daf105

add the

ca519f5

Hakuyume reviewed Oct 5, 2017

Hakuyume approved these changes Oct 5, 2017

Hakuyume merged commit 82f8ef8 into chainer:master Oct 5, 2017

yuyu2172 mentioned this pull request Oct 5, 2017

Add prob_map option to CUBDataset #443

Merged
