-
Notifications
You must be signed in to change notification settings - Fork 306
Add Cityscapes semantic segmentation dataset #392
Conversation
|
||
self.label_fns, self.img_fns = [], [] | ||
resol = os.path.basename(label_dir) | ||
for dname in glob.glob('{}/*'.format(label_dir)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using os.path.join(label_dir, '*')
instead?
This will work even if there is /
at the end of img_dir
.
for dname in glob.glob('{}/*'.format(label_dir)): | ||
if split in dname: | ||
for label_fn in glob.glob( | ||
'{}/*/*_labelIds.png'.format(dname)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
|
||
""" | ||
|
||
def __init__(self, img_dir, label_dir, split='train', ignore_labels=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about setting a default directory names?
if img_dir
is None,
how about setting the img_dir as CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes/leftImg8bit
?
Same for label_dir
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but for label_dir, we can't assume which label users use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By having a default directory value, users do not need to specify the directory path once they set up their files properly. I think that this feature is very useful.
How about changing the options to data_dir
and label_mode
.
(i.e. def __init__(self, data_dir=None, label_mode=None, split='train', ignore_labels=True):
)
The data_dir
would be the path to the root dir whose default value is CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes
.
Below the root directory, we expect at least two folders (leftImg8bit
and a label directory that is going to be used).
The label_mode
should raise an error when unspecified. It should be either fine
or coarse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll reflect your suggestion and add a test for it.
for label_fn in glob.glob( | ||
'{}/*/*_labelIds.png'.format(dname)): | ||
self.label_fns.append(label_fn) | ||
for label_fn in self.label_fns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn
is not used in ChainerCV.
However, I think that filenames
is too long. An alternative would be fnames
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry for the same mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK to use fnames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer filenames
to fnames
. It is not so long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filenames
is OK, but label_filenames
and img_filenames
are bit too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about label_paths
and img_paths
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between *_filenames
and *_paths
is unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*_paths
is shorter than *_filenames
and _fnames
. If the problem of *_filenames
is its length, *_paths
will be a good solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
I think changing all *_filenames
to *_paths
is good. Leaving both is bad.
I will update other datasets accordingly.
@mitmul
Can you use *_paths
?
self.label_fns[i], dtype=np.int32, color=False)[0] | ||
H, W = label_orig.shape | ||
if self.ignore_labels: | ||
label_out = np.ones((H, W), dtype=np.int32) * -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can optimize this part by
- not initializing
label_out
. - for loop only labels in ignore lists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When make the loop only over the ignore id list, how can I replace the label ids which are not marked as an ignoreInEval
with trainId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh. I see. My bad.
It looks OK.
Is it OK to remove line 79?
Also, np.where is not necessary.
img_dir = os.path.join(img_dir, split) | ||
self.ignore_labels = ignore_labels | ||
|
||
self.label_fns, self.img_fns = [], [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using two lines and use list
.
self.label_fnames = list()
self.img_fnames = list()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a check of []/list()
to our coding style checker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
img_dir = os.path.join(self.temp_dir, 'leftImg8bit') | ||
label_dir = os.path.join(self.temp_dir, 'gtFine') | ||
if self.split == 'test': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary.
for i in range(10): | ||
img = np.random.randint(0, 255, size=(128, 160, 3)) | ||
img = Image.fromarray(img.astype(np.uint8)) | ||
img.save(os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use write_image
.
#382
|
||
|
||
@testing.parameterize( | ||
{'split': 'train'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test ignore_labels
(True, False).
|
||
.. note:: | ||
|
||
Please download the data by yourself because Cityscapes dataset doesn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this.
Please manually downalod the data because it is not allowed to re-distribute Cityscapes dataset.
Thank you for a great PR.
instead of
This is because I initially thought that the red highlight emphasizes a word too much. |
Also, could you add this to the docs? |
self.label_fns, self.img_fns = [], [] | ||
resol = os.path.basename(label_dir) | ||
for dname in glob.glob('{}/*'.format(label_dir)): | ||
if split in dname: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work for Coarse dataset as well?
From the Github page, it seems that there is train_extra split.
https://github.com/mcordts/cityscapesScripts
I have not yet looked at the dataset by myself, so please tell me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I used if split in dname
at L:59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add that to the doc and test?
Fixed. Replaced *_fnames with *_paths. |
data_dir (string): Path to the dataset directory. The directory should | ||
contain at least two directories, :obj:`leftImg8bit` and either | ||
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses | ||
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` as default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default.
'argment.') | ||
elif label_mode != 'fine' and label_mode != 'coarse': | ||
raise ValueError('\'label_name\' argment should be eighter ' | ||
'\'fine\' or \'coarse\'. But {} was ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK to simplify this. (delete line 45 and line 46)
if label_mode not in ['fine', 'coarse']:
raise ValueError('\'label_name\' argment should be eighter '
'\'fine\' or \'coarse\'.')
for dname in glob.glob(os.path.join(label_dir, '*')): | ||
if split in dname: | ||
for city_dname in glob.glob(os.path.join(dname, '*')): | ||
for label_fname in glob.glob( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fname -> path
os.path.join(city_dname, '*_labelIds.png')): | ||
self.label_paths.append(label_fname) | ||
city_dnames.append(os.path.basename(city_dname)) | ||
for city_dname, label_fname in zip(city_dnames, self.label_paths): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
city_dnames.append(os.path.basename(city_dname)) | ||
for city_dname, label_fname in zip(city_dnames, self.label_paths): | ||
label_fname = os.path.basename(label_fname) | ||
img_fname = label_fname.replace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@yuyu2172 Thanks, fixed. |
|
||
img_dir = os.path.join(data_dir, os.path.join('leftImg8bit', split)) | ||
resol = 'gtFine' if label_mode == 'fine' else 'gtCoarse' | ||
label_dir = os.path.join(data_dir, resol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can easily anticipate users to instantiate this object expecting that ChainerCV would download the dataset.
This happens with CityscapesSemanticSegmentationDataset()
without properly setting files under the default locations.
We should give a clear guidance to the users on where they should prepare the datasets when the error occurs.
This can be done in the similar way to ResNet in Chainer. https://github.com/chainer/chainer/blob/master/chainer/links/model/vision/resnet.py#L695
Perhaps raise an error here when either of the two necessary directories do not exist?
The error message can be something like this.
'Cityscapes dataset does not exist at the expected location.'
'Please download it from https://www.cityscapes-dataset.com/.'
'Then place directory leftImg8bit at {} and {} at {}.'.format(
img_dir, resol, label_dir)
contain at least two directories, :obj:`leftImg8bit` and either | ||
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses | ||
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` by default. | ||
label_mode (string): The resolution of the labels. It should be either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string -> {'fine', 'coarse'}
defined in the original | ||
`cityscapesScripts<https://github.com/mcordts/cityscapesScripts>_` | ||
will be replaced with :obj:`-1` in the :meth:`get_example` method. | ||
The default value is :obj:`True` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgetting a period.
Great. Few more comments. |
Thanks for the comments. Updated it. |
contain at least two directories, :obj:`leftImg8bit` and either | ||
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses | ||
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` by default. | ||
label_mode ({'fine', 'coarse'}): The resolution of the labels. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label_mode -> label_resol
I think this is more specific and better. Also this is consistent with the variable name used inside the code.
Sorry for the extra work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think resol
is difficult to understand. How about using use_fine_label
, which takes a boolean? If you want to support the case of three resolutions, I think label_resolution
is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you feedback.
+1 for label_resolution
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using type
?
type
the type/modality of data, e.g. gtFine for fine ground truth, or leftImg8bit for left 8-bit images.
So, I feel label_type
is appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label_resolution=coarse
is more explicit than use_fine_label=False
, and it makes the code more readable.
It is not obvious that the opposite of use_fine_label
is use_coarse_label
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label_level
is not intuitive to me.
label_type
can be confused as dtype.
It seems that label_resolution
is fine. The length does not bother me much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, label_resolution
works for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with label_type
is that it is not specific to labels in the original code.
It accepts leftImg8bit.
This is nonsense. Sorry...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, just keeping the consistency of naming from the official scripts is better I think.
This rule may break the consistency in ChainerCV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Well, label_resolution
is OK for you?
|
||
""" | ||
|
||
def __init__(self, data_dir=None, label_mode=None, split='train', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label_mode -> label_resol
data_dir = download.get_dataset_directory( | ||
'pfnet/chainercv/cityscapes') | ||
if label_mode not in ['fine', 'coarse']: | ||
raise ValueError('\'label_mode\' argment should be eighter ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label_mode -> label_resol
img = read_image(self.img_paths[i]) | ||
label_orig = read_image( | ||
self.label_paths[i], dtype=np.int32, color=False)[0] | ||
H, W = label_orig.shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can delete this and use np.ones(label_orgi.shape, dtype=np.int32) * -1
instead for line 102.
label_out = np.ones((H, W), dtype=np.int32) * -1 | ||
for label in cityscapes_labels: | ||
if not label.ignoreInEval: | ||
label_out[np.where(label_orig == label.id)] = label.trainId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove np.where
.
|
@yuyu2172 Thanks for the good catches. I fixed them. |
contain at least two directories, :obj:`leftImg8bit` and either | ||
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses | ||
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` by default. | ||
label_resolutionution ({'fine', 'coarse'}): The resolution of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
LGTM! |
This PR adds a dataset class for Cityscapes dataset. It replaces this PR: #230