Segmentation mask convertor #117

Merged
merged 18 commits into from
Jan 25, 2021

Conversation

hlydecker
Contributor

WIP. Some questions need answering.

  • It currently works only with hard-coded categories and single-category images. It will need adapting to link categories to masks by colour code.

  • It is currently missing the license and info objects.

Elevn Li and others added 4 commits October 9, 2020 13:26
- added some more documentation and TODOs
- changes "contours" to "segmentation" to fit within COCO terminology
@codecov

codecov bot commented Oct 9, 2020

Codecov Report

Merging #117 into master will decrease coverage by 5.28%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
- Coverage   51.15%   45.86%   -5.29%     
==========================================
  Files           8        9       +1     
  Lines         477      532      +55     
==========================================
  Hits          244      244              
- Misses        233      288      +55     
Flag Coverage Δ
#weedcoco 45.86% <0.00%> (-5.29%) ⬇️

Impacted Files Coverage Δ
weedcoco/importers/mask.py 0.00% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d57ff5...6eed412. Read the comment docs.

@hlydecker
Contributor Author

hlydecker commented Oct 9, 2020

@jnothman this has been built based on working with the ginger images/masks. These are binary (black = nothing, white = plant), and there are usually one or two main objects with a bunch of super tiny ones, created as a result of the masking/annotation process. How should we deal with these? Subtract the tiny polygons, or potentially subsume them with the main ones?

@hlydecker hlydecker linked an issue Oct 9, 2020 that may be closed by this pull request
@hlydecker hlydecker added this to the October TCG milestone Oct 9, 2020
@hlydecker hlydecker added this to To Do in Current priorities via automation Oct 9, 2020
@jnothman
Contributor

jnothman commented Oct 9, 2020

this has been built based on working with the carrots images/masks

Do you mean ginger?
Why not add a test image and test mask to the repo, and design a test case?

These are binary (black = nothing, white = plant),

I had imagined a config file color-category-map.yml:

FFFFFF: "weed: UNSPECIFIED"

or

FF0000: "weed: lolium perenne"
00FF00: "weed: rapistrum rugosum"
0000FF: "weed: sonchus oleraceus"
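
A minimal sketch of how such a colour-to-category mapping might be consumed once loaded. The dictionary below stands in for the parsed contents of the proposed color-category-map.yml, and the helper names `hex_to_rgb` and `build_rgb_lookup` are hypothetical, not part of weedcoco. One caveat worth checking with whichever YAML loader is used: an all-digit code such as 000000 may be read as an integer unless it is quoted.

```python
# Stand-in for the parsed config file; in practice this would come from a YAML loader.
COLOR_CATEGORY_MAP = {
    "FF0000": "weed: lolium perenne",
    "00FF00": "weed: rapistrum rugosum",
    "0000FF": "weed: sonchus oleraceus",
}


def hex_to_rgb(hex_color):
    """Convert an 'RRGGBB' string into an (r, g, b) tuple of ints."""
    hex_color = hex_color.lstrip("#")
    if len(hex_color) != 6:
        raise ValueError(f"Expected a 6-digit hex colour, got {hex_color!r}")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def build_rgb_lookup(color_map):
    """Key the category names by RGB tuple so mask pixels can be looked up directly."""
    return {hex_to_rgb(code): name for code, name in color_map.items()}


lookup = build_rgb_lookup(COLOR_CATEGORY_MAP)
print(lookup[(255, 0, 0)])
```

Keying by RGB tuple keeps the config file human-readable while making per-pixel lookups cheap.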

Subtract the tiny polygons, or potentially subsume them with the main ones?

I don't think so. We should be authentic to the input. On this, it's not our job to be opinionated.

COCO assumes that there are multiple (or, more precisely, one or more) polygons. #90 codifies this in the schema: https://github.com/Sydney-Informatics-Hub/Weed-ID-Interchange/blob/6ef3a168215627c039b74224a73f2782a98a4b63/weedcoco/schema/Annotation.yaml#L33-L42.

Note that an alternative representation is as a mask (a 2d binary array) encoded with RLE and special encoding that only seems to be handled by COCO API (https://github.com/cocodataset/cocoapi/blob/8c9bcc3cf640524c4c20a9c40e89cb6a2f2fa0e9/common/maskApi.c#L204-L231).

Turning the image into a mask, based on known annotation colours, and then using pycocotools, may provide more straightforward solutions than thresholding and opencv, which is designed more for photography than discrete masking.
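
The first half of that suggestion, splitting a colour-coded mask image into one binary mask per known annotation colour, could be sketched as below. This is a pure-Python illustration under assumptions: the function name `masks_by_color`, the `background` parameter, and the nested-list pixel grid are all hypothetical. With NumPy the same comparison is usually written as `(image == color).all(axis=-1)`, and the resulting array would then be passed to pycocotools as a Fortran-ordered uint8 array.

```python
def masks_by_color(pixels, colors, background=(0, 0, 0)):
    """Split an RGB pixel grid into one binary mask per known colour.

    pixels: list of rows, each row a list of (r, g, b) tuples.
    colors: iterable of (r, g, b) tuples that correspond to categories.
    Returns (masks, unknown): masks maps colour -> 2D list of 0/1, and
    unknown is the set of non-background colours not listed in `colors`.
    """
    masks = {color: [[0] * len(row) for row in pixels] for color in set(colors)}
    unknown = set()
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            if pixel in masks:
                masks[pixel][y][x] = 1
            elif pixel != background:
                unknown.add(pixel)
    return masks, unknown


BLACK, RED, GREEN, GREY = (0, 0, 0), (255, 0, 0), (0, 255, 0), (128, 128, 128)
pixels = [
    [BLACK, RED, GREY],
    [GREEN, RED, BLACK],
]
masks, unknown = masks_by_color(pixels, [RED, GREEN])
print(masks[RED])
print(unknown)
```

Exact colour matching avoids thresholding altogether, and collecting the unknown colours gives the caller something concrete to warn about.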

@hlydecker
Contributor Author

I have no idea why I wrote carrots; yes I meant ginger!!!

Contributor

@jnothman jnothman left a comment

I don't think coco_from_mask.json should be included in the repo. Rather:

  • there should be at least one pytest test case checking that the converter works;
  • we might add a script search/scripts/index_rds_images.sh which is given the path to the RDS root, and converts and loads data from there.



def generate_masks_contours(mask_path):

Contributor

This blank line violates PEP257. I'm surprised black lets it through


image_id = 0
for filename in os.listdir(image_dir):
    if filename.endswith(".png") or filename.endswith(".jpg"):

Contributor

there should be an else clause that warns or raises an error if the file type is unexpected

Contributor

jpeg and tif might also be possible extensions

Contributor

A neat shorthand

Suggested change
- if filename.endswith(".png") or filename.endswith(".jpg"):
+ if filename.endswith((".png", ".jpg", ".jpeg", ".tif", ".tiff")):
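
Taken together, the three comments on this line (warn on unexpected types, accept more extensions, pass a tuple to `endswith`) might look like the sketch below. The function name `select_image_files` and the constant `IMAGE_EXTENSIONS` are hypothetical, and lowercasing the filename before matching is an added assumption that was not discussed in the review.

```python
import warnings

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def select_image_files(filenames):
    """Return the filenames with a recognised image extension, warning about the rest."""
    selected = []
    for filename in sorted(filenames):
        # str.endswith accepts a tuple of suffixes, so one call covers every extension.
        # Lowercasing is an assumption here, so that "photo.JPG" is also accepted.
        if filename.lower().endswith(IMAGE_EXTENSIONS):
            selected.append(filename)
        else:
            warnings.warn(f"Skipping file with unexpected type: {filename}")
    return selected
```

Whether an unexpected file should trigger a warning or an error is a policy choice; a warning lets a directory containing stray files still be processed.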

### crop_type ###
# description: 'Crop type.

# One of several strings describing the crop grown in the image.'

Contributor

these comments aren't needed in a test file. You can do sed -E '/^$|^#/d' on the file to get only the content lines

Similar in concept to the original convertor. Bootstrapped from a blog post with changes. Still not completely functional, but parts of it will run and seem to behave how we want.
@hlydecker
Contributor Author

Major changes are afoot. I am using this blog post as a template to redesign this convertor to work with colour category mapping. In some ways this is reinventing aspects of the opencv contour generator, but it may be a better direction for our use case.

@jnothman
Contributor

I'm not yet happy with the sufficiency of the tests, but this is otherwise ready for review.

@jnothman jnothman marked this pull request as ready for review January 11, 2021 11:55
@jnothman
Contributor

Maybe I should open a new PR so that Henry can review. @hlydecker would you like to and are you available to do so?

@hlydecker
Contributor Author

Happy to review this sometime this afternoon!

Contributor Author

@hlydecker hlydecker left a comment

This looks good and the existing tests are sensible enough. Most of my comments are just some suggestions for making warnings and errors more clear to potential users.

I do wonder what sort of other tests could be included. test_basic is indeed basic but it does test the basic functionality!

"image_id": len(images),
"category_id": cat_idx,
"segmentation": rle,
# "is_crowd": 0, # TODO: how should we define this?

Contributor Author

When using RLE, we should probably set is_crowd: 1. From my understanding, RLE is really made for situations where we have a field of several objects of the same category but we aren't annotating each individual one. So in terms of our data, we aren't annotating individual plants; instead we are simply annotating any visible stuff that falls within that category, which could potentially be multiple plants.
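
For reference, a sketch of what an RLE annotation flagged as a crowd could look like. The COCO data format spells the key `iscrowd` (no underscore) and pairs `iscrowd: 1` with RLE segmentations and `iscrowd: 0` with polygons, which fits the reasoning above. The helper name `make_rle_annotation` and the `id` field are illustrative assumptions; which spelling the WeedCOCO schema expects is not shown in this thread.

```python
def make_rle_annotation(annotation_id, image_id, category_id, rle, iscrowd=1):
    """Build a COCO-style annotation whose segmentation is an RLE dict.

    iscrowd defaults to 1 on the assumption that each mask region may cover
    several plants of one category rather than a single instance.
    """
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": rle,
        "iscrowd": iscrowd,
    }


# A 2x2 mask whose right-hand column is set, in uncompressed RLE form.
annotation = make_rle_annotation(0, 0, 1, {"size": [2, 2], "counts": [2, 2]})
print(annotation["iscrowd"])
```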

@hlydecker
Contributor Author

Ah of course now I realise the awkwardness here; it makes sense that I cannot be a reviewer for my own pull request even if the actual content is not my progeny.

jnothman and others added 4 commits January 20, 2021 18:01
Co-authored-by: Henry Lydecker <henry.lydecker@gmail.com>
Co-authored-by: Henry Lydecker <henry.lydecker@gmail.com>
Co-authored-by: Henry Lydecker <henry.lydecker@gmail.com>
@jnothman
Contributor

Want to check out the changes since last review, @hlydecker and gimme a tick if possible?

@hlydecker
Contributor Author

Will take a look this afternoon!

Contributor Author

@hlydecker hlydecker left a comment

LGTM; not much was changed. Documentation changes have improved clarity.

The testing plan sounds good to me as well.

@@ -30,7 +30,7 @@ def generate_segmentations(mask_path, color_map, colors_not_found):
     Yields
     ------
     segmentation : str
-        COCO segmentation string
+        COCO segmentation string in compressed RLE format

Contributor Author

Good addition to the documentation

# * check segmentation RLE string can be read back in and reproduces the mask
# * check handling of missing correspondence between mask and image files
# * check handling of different image file formats
# * check handling of agcontext

Contributor Author

This would be good. The agcontext ingestion + testing would probably make sense as something that is shared as a utility called by each individual converter.
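
The first TODO in that list, checking that an RLE segmentation can be read back and reproduces the mask, might be tested along these lines. This is a pure-Python sketch under assumptions: it uses COCO's uncompressed RLE (run lengths over the mask flattened column by column, starting with the run of zeros) rather than the compressed string the importer yields, and `rle_encode`, `rle_decode` and the test name are hypothetical. A real test would more likely decode with pycocotools.

```python
def rle_encode(mask):
    """Encode a 2D binary mask (list of rows) as uncompressed COCO-style RLE."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts = []
    previous = 0  # by convention the first run counts zeros, so it may be 0 long
    run_length = 0
    for col in range(width):  # column-major traversal
        for row in range(height):
            value = 1 if mask[row][col] else 0
            if value == previous:
                run_length += 1
            else:
                counts.append(run_length)
                previous = value
                run_length = 1
    counts.append(run_length)
    return {"size": [height, width], "counts": counts}


def rle_decode(rle):
    """Rebuild the 2D mask from the run lengths produced by rle_encode."""
    height, width = rle["size"]
    flat = []
    value = 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    # flat is in column-major order, so pixel (row, col) sits at col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


def test_rle_round_trip():
    mask = [[0, 1, 1], [0, 0, 1]]
    rle = rle_encode(mask)
    assert rle["size"] == [2, 3]
    assert rle_decode(rle) == mask


test_rle_round_trip()
```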

@@ -163,7 +163,7 @@ def _image_name_to_mask(name):
         warnings.warn(
             f"{len(categories)} categories defined, but only "
             f"{len(categories_found)} of these are present in masks. "
-            f"Missing are {missing_category_colors}"
+            f"These categories were not found: {missing_category_colors}"

Contributor Author

Both this change and the one at 118 are great improvements in the clarity of the messages to users.

Contributor

Both this change and the one at 118 are great improvements in the clarity of the messages to users.

They were both your explicit contributions! :D
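
For context, the reworded warning could be produced by a check like the one sketched here. The variable names `categories`, `categories_found` and `missing_category_colors` come from the diff above; the wrapper function `warn_missing_categories` and the assumption that both collections are keyed by colour code are hypothetical.

```python
import warnings


def warn_missing_categories(categories, categories_found):
    """Warn when some configured category colours never appear in any mask.

    categories: dict mapping colour code -> category name (assumed shape).
    categories_found: set of colour codes actually seen in the masks.
    """
    missing_category_colors = sorted(set(categories) - set(categories_found))
    if missing_category_colors:
        warnings.warn(
            f"{len(categories)} categories defined, but only "
            f"{len(categories_found)} of these are present in masks. "
            f"These categories were not found: {missing_category_colors}"
        )
    return missing_category_colors
```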

@jnothman
Contributor

Okay to merge as is, despite TODOs?

@hlydecker
Contributor Author

I'd say so :)

@jnothman jnothman merged commit 63b8af5 into master Jan 25, 2021
Current priorities automation moved this from To Do to Done Jan 25, 2021
@jnothman jnothman deleted the segmentation-mask-convertor branch January 25, 2021 04:54
@jnothman
Contributor

Thanks for the review @hlydecker

Successfully merging this pull request may close these issues.

Convertor for dataset with images and segmentation mask
2 participants