Skip to content
This repository has been archived by the owner on Jul 2, 2021. It is now read-only.

Add Cityscapes semantic segmentation dataset #392

Merged
merged 18 commits into from
Aug 19, 2017

Conversation

mitmul
Copy link
Member

@mitmul mitmul commented Aug 15, 2017

This PR adds a dataset class for Cityscapes dataset. It replaces this PR: #230

@mitmul mitmul mentioned this pull request Aug 15, 2017
3 tasks
@yuyu2172 yuyu2172 self-assigned this Aug 16, 2017

self.label_fns, self.img_fns = [], []
resol = os.path.basename(label_dir)
for dname in glob.glob('{}/*'.format(label_dir)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using os.path.join(label_dir, '*') instead?
This will work even if there is / at the end of img_dir.

for dname in glob.glob('{}/*'.format(label_dir)):
if split in dname:
for label_fn in glob.glob(
'{}/*/*_labelIds.png'.format(dname)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.


"""

def __init__(self, img_dir, label_dir, split='train', ignore_labels=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about setting a default directory names?
if img_dir is None,
how about setting the img_dir as CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes/leftImg8bit?
Same for label_dir.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but for label_dir, we can't assume which label users use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By having a default directory value, users do not need to specify the directory path once they set up their files properly. I think that this feature is very useful.

How about changing the options to data_dir and label_mode.
(i.e. def __init__(self, data_dir=None, label_mode=None, split='train', ignore_labels=True):)
The data_dir would be the path to the root dir whose default value is CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes.
Below the root directory, we expect at least two folders (leftImg8bit and a label directory that is going to be used).
The label_mode should raise an error when unspecified. It should be either fine or coarse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll reflect your suggestion and add a test for it.

for label_fn in glob.glob(
'{}/*/*_labelIds.png'.format(dname)):
self.label_fns.append(label_fn)
for label_fn in self.label_fns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#161 (comment)

fn is not used in ChainerCV.
However, I think that filenames is too long. An alternative would be fnames.

@Hakuyume

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry for the same mistake.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to use fnames?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer filenames to fnames. It is not so long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filenames is OK, but label_filenames and img_filenames are bit too long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about label_paths and img_paths?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between *_filenames and *_paths is unclear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*_paths is shorter than *_filenames and _fnames. If the problem of *_filenames is its length, *_paths will be a good solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.
I think changing all *_filenames to *_paths is good. Leaving both is bad.

I will update other datasets accordingly.

@mitmul
Can you use *_paths?

self.label_fns[i], dtype=np.int32, color=False)[0]
H, W = label_orig.shape
if self.ignore_labels:
label_out = np.ones((H, W), dtype=np.int32) * -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize this part by

  1. not initializing label_out.
  2. for loop only labels in ignore lists

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When make the loop only over the ignore id list, how can I replace the label ids which are not marked as an ignoreInEval with trainId?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. I see. My bad.
It looks OK.
Is it OK to remove line 79?
Also, np.where is not necessary.

img_dir = os.path.join(img_dir, split)
self.ignore_labels = ignore_labels

self.label_fns, self.img_fns = [], []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using two lines and use list.
self.label_fnames = list()
self.img_fnames = list()

Copy link
Member

@Hakuyume Hakuyume Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a check of []/list() to our coding style checker?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


img_dir = os.path.join(self.temp_dir, 'leftImg8bit')
label_dir = os.path.join(self.temp_dir, 'gtFine')
if self.split == 'test':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary.

for i in range(10):
img = np.random.randint(0, 255, size=(128, 160, 3))
img = Image.fromarray(img.astype(np.uint8))
img.save(os.path.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use write_image.
#382



@testing.parameterize(
{'split': 'train'},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test ignore_labels (True, False).


.. note::

Please download the data by yourself because Cityscapes dataset doesn't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this.

Please manually downalod the data because it is not allowed to re-distribute Cityscapes dataset.

@yuyu2172
Copy link
Member

Thank you for a great PR.
One thing about the code block in ChainerCV.
We have been using

:obj: `XXX`

instead of

``XX``

This is because I initially thought that the red highlight emphasizes a word too much.
Could you follow this convention for consistency?

@yuyu2172
Copy link
Member

Also, could you add this to the docs?
docs/source/reference/dataset.rst

self.label_fns, self.img_fns = [], []
resol = os.path.basename(label_dir)
for dname in glob.glob('{}/*'.format(label_dir)):
if split in dname:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work for Coarse dataset as well?
From the Github page, it seems that there is train_extra split.
https://github.com/mcordts/cityscapesScripts

I have not yet looked at the dataset by myself, so please tell me if I am wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I used if split in dname at L:59

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add that to the doc and test?

@mitmul
Copy link
Member Author

mitmul commented Aug 18, 2017

Fixed. Replaced *_fnames with *_paths.

data_dir (string): Path to the dataset directory. The directory should
contain at least two directories, :obj:`leftImg8bit` and either
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` as default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default.

'argment.')
elif label_mode != 'fine' and label_mode != 'coarse':
raise ValueError('\'label_name\' argment should be eighter '
'\'fine\' or \'coarse\'. But {} was '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is OK to simplify this. (delete line 45 and line 46)

if label_mode not in ['fine', 'coarse']:
    raise ValueError('\'label_name\' argment should be eighter '
                               '\'fine\' or \'coarse\'.')

for dname in glob.glob(os.path.join(label_dir, '*')):
if split in dname:
for city_dname in glob.glob(os.path.join(dname, '*')):
for label_fname in glob.glob(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fname -> path

os.path.join(city_dname, '*_labelIds.png')):
self.label_paths.append(label_fname)
city_dnames.append(os.path.basename(city_dname))
for city_dname, label_fname in zip(city_dnames, self.label_paths):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

city_dnames.append(os.path.basename(city_dname))
for city_dname, label_fname in zip(city_dnames, self.label_paths):
label_fname = os.path.basename(label_fname)
img_fname = label_fname.replace(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@mitmul
Copy link
Member Author

mitmul commented Aug 18, 2017

@yuyu2172 Thanks, fixed.


img_dir = os.path.join(data_dir, os.path.join('leftImg8bit', split))
resol = 'gtFine' if label_mode == 'fine' else 'gtCoarse'
label_dir = os.path.join(data_dir, resol)
Copy link
Member

@yuyu2172 yuyu2172 Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can easily anticipate users to instantiate this object expecting that ChainerCV would download the dataset.
This happens with CityscapesSemanticSegmentationDataset() without properly setting files under the default locations.
We should give a clear guidance to the users on where they should prepare the datasets when the error occurs.
This can be done in the similar way to ResNet in Chainer. https://github.com/chainer/chainer/blob/master/chainer/links/model/vision/resnet.py#L695

Perhaps raise an error here when either of the two necessary directories do not exist?

The error message can be something like this.

'Cityscapes dataset does not exist at the expected location.'
'Please download it from https://www.cityscapes-dataset.com/.'
'Then place directory leftImg8bit at {} and {} at {}.'.format(
    img_dir, resol, label_dir)

contain at least two directories, :obj:`leftImg8bit` and either
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` by default.
label_mode (string): The resolution of the labels. It should be either
Copy link
Member

@yuyu2172 yuyu2172 Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string -> {'fine', 'coarse'}

defined in the original
`cityscapesScripts<https://github.com/mcordts/cityscapesScripts>_`
will be replaced with :obj:`-1` in the :meth:`get_example` method.
The default value is :obj:`True`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgetting a period.

@yuyu2172
Copy link
Member

Great. Few more comments.

@yuyu2172 yuyu2172 added this to the v0.7 milestone Aug 18, 2017
@mitmul
Copy link
Member Author

mitmul commented Aug 18, 2017

Thanks for the comments. Updated it.

contain at least two directories, :obj:`leftImg8bit` and either
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` by default.
label_mode ({'fine', 'coarse'}): The resolution of the labels. It
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label_mode -> label_resol

I think this is more specific and better. Also this is consistent with the variable name used inside the code.
Sorry for the extra work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think resol is difficult to understand. How about using use_fine_label, which takes a boolean? If you want to support the case of three resolutions, I think label_resolution is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you feedback.
+1 for label_resolution.

Copy link
Member Author

@mitmul mitmul Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using type?

type the type/modality of data, e.g. gtFine for fine ground truth, or leftImg8bit for left 8-bit images.

from https://github.com/mcordts/cityscapesScripts/blob/8815d4643222abc0f0a41614745ffa1637734335/README.md#dataset-structure

So, I feel label_type is appropriate.

Copy link
Member

@yuyu2172 yuyu2172 Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label_resolution=coarse is more explicit than use_fine_label=False, and it makes the code more readable.
It is not obvious that the opposite of use_fine_label is use_coarse_label.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label_level is not intuitive to me.

label_type can be confused as dtype.
It seems that label_resolution is fine. The length does not bother me much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, label_resolution works for me.

Copy link
Member

@yuyu2172 yuyu2172 Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with label_type is that it is not specific to labels in the original code.
It accepts leftImg8bit.

This is nonsense. Sorry...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, just keeping the consistency of naming from the official scripts is better I think.

This rule may break the consistency in ChainerCV.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Well, label_resolution is OK for you?


"""

def __init__(self, data_dir=None, label_mode=None, split='train',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label_mode -> label_resol

data_dir = download.get_dataset_directory(
'pfnet/chainercv/cityscapes')
if label_mode not in ['fine', 'coarse']:
raise ValueError('\'label_mode\' argment should be eighter '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label_mode -> label_resol

img = read_image(self.img_paths[i])
label_orig = read_image(
self.label_paths[i], dtype=np.int32, color=False)[0]
H, W = label_orig.shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can delete this and use np.ones(label_orgi.shape, dtype=np.int32) * -1 instead for line 102.

label_out = np.ones((H, W), dtype=np.int32) * -1
for label in cityscapes_labels:
if not label.ignoreInEval:
label_out[np.where(label_orig == label.id)] = label.trainId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove np.where.

@yuyu2172
Copy link
Member

label_resol seems better. Sorry for the extra work. label_mode was my idea...

@mitmul
Copy link
Member Author

mitmul commented Aug 19, 2017

@yuyu2172 Thanks for the good catches. I fixed them.

contain at least two directories, :obj:`leftImg8bit` and either
:obj:`gtFine` or :obj:`gtCoarse`. If :obj:`None` is given, it uses
:obj:`$CHAINER_DATSET_ROOT/pfnet/chainercv/cityscapes` by default.
label_resolutionution ({'fine', 'coarse'}): The resolution of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@yuyu2172
Copy link
Member

LGTM!

@yuyu2172 yuyu2172 merged commit 2696e82 into chainer:master Aug 19, 2017
@mitmul mitmul deleted the add-cityscapes branch May 18, 2018 09:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants