Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add converter from Pascal VOC to Weed COCO #49

Closed
wants to merge 0 commits into from
Closed

Conversation

jnothman
Copy link
Contributor

@jnothman jnothman commented Sep 1, 2020

A first draft. Not fully tested.

Fixes #33

def voc_to_coco(
xml_dir: Path, image_dir: Path, category_mapping: Optional[Mapping[str, int]] = None
):
"""Convert VOC to MS COCO images and annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice documentation; would be good to update the existing ones to be similar.

obj = json.load(path)


def main():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked this up because I didn't understand what the heck the main function is. Looks like I have some reading to do; this seems like something I need to adopt to help me fully transition to writing scripts vs interactively using notebooks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It basically serves two purposes:

  • allows the user to use this file both as a library and as a script
  • encapsulates the scope to avoid errors

)

if use_default_category_mapping:
# TODO: define role somehow??
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now role will have to be hard coded. Perhaps if we have a filled out agcontext yaml we can use that to populate the category field as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, needs some way to be specified. I wonder whether "name" should include species and role somehow...

@jnothman
Copy link
Contributor Author

jnothman commented Sep 3, 2020

I can now run the following:

python conversion_tools/voc_to_json/voc_to_json.py --voc-dir /Volumes/research-data/PRJ-iweeds/Narrabri/artificial_illumination/wheat/20190728_Z16/VOC --image-dir /Volumes/research-data/PRJ-iweeds/Narrabri/artificial_illumination/wheat/20190728_Z16/images --agcontext /Volumes/research-data/PRJ-iweeds/agcontext-junk-example.yaml  -o /tmp/coco_from_voc-test.json
cat /tmp/coco_from_voc-test.json | python conversion_tools/draft\ export\ to\ elastic.py --thumbnail-dir artificial | curl -X POST localhost:9200/_bulk  -H 'Content-Type: application/json' --data-binary @-

@hlydecker
Copy link
Contributor

I can now run the following:

python conversion_tools/voc_to_json/voc_to_json.py --voc-dir /Volumes/research-data/PRJ-iweeds/Narrabri/artificial_illumination/wheat/20190728_Z16/VOC --image-dir /Volumes/research-data/PRJ-iweeds/Narrabri/artificial_illumination/wheat/20190728_Z16/images --agcontext /Volumes/research-data/PRJ-iweeds/agcontext-junk-example.yaml  -o /tmp/coco_from_voc-test.json
cat /tmp/coco_from_voc-test.json | python conversion_tools/draft\ export\ to\ elastic.py --thumbnail-dir artificial | curl -X POST localhost:9200/_bulk  -H 'Content-Type: application/json' --data-binary @-

I like this; it seems like a pretty clean and relatively straightforward process. It helps me imagine more of what our pipeline for user provided data might look like.

@@ -31,6 +31,12 @@
id_lookup[key, obj["id"]] = obj


for agcontext in coco["agcontexts"]:
# massage for ES
if agcontext.get("camera_fov") == "variable":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it looks like we will need to stick with this "massaging" to make this work. Elasticsearch will not allow a field to have multiple potential types, and I think it would be a bit silly to make this into some sort of nested field array where we have something like:

camera_fov: {
 is_fixed: False
 fov: null
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this massaging, as long as any slider over FOV that we have also allows us to retrieve NULLs.

"id": annotation_id,
"image_id": image_id,
"category_id": category_mapping[voc_object.find("name").text],
# TODO: handle truncated, difficult?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by truncated annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VOC allows an annotation to be tagged as "truncated". I think that means "only part of the object being labelled is visible in the image; it is cut off by the edge of the picture". I don't know what "difficult" is suppose to mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh I wonder if people would consider our rye grass "difficult"...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "difficult" here means "so difficult it's not worth demeriting a system for failing it"

Copy link
Contributor Author

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not quite finished this PR, btw. I'll mark it as draft. I felt #82 and #84 would support it well.

@@ -31,6 +31,12 @@
id_lookup[key, obj["id"]] = obj


for agcontext in coco["agcontexts"]:
# massage for ES
if agcontext.get("camera_fov") == "variable":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this massaging, as long as any slider over FOV that we have also allows us to retrieve NULLs.

"id": annotation_id,
"image_id": image_id,
"category_id": category_mapping[voc_object.find("name").text],
# TODO: handle truncated, difficult?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VOC allows an annotation to be tagged as "truncated". I think that means "only part of the object being labelled is visible in the image; it is cut off by the edge of the picture". I don't know what "difficult" is suppose to mean.

"id": annotation_id,
"image_id": image_id,
"category_id": category_mapping[voc_object.find("name").text],
# TODO: handle truncated, difficult?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnothman
Copy link
Contributor Author

Writing well-tested code from scratch takes quite a lot of work! This is now ready for review.

@jnothman
Copy link
Contributor Author

Oh no! CI is using a recent beta version of black, which differs on how it formats weedcoco.utils!

@@ -26,10 +26,6 @@ jobs:
- uses: actions/checkout@v2
- name: Install
run: |
pip install . pytest pytest-cov
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these changes should be undone by the present branch. Failed merge?

cd ../deepweeds_to_json
curl -L https://github.com/AlexOlsen/DeepWeeds/archive/51e3fab.tar.gz | tar xzv
mv DeepWeeds-51e3fab*/* .
# fake some images: make them all be a pic from cwfid
mkdir deepweeds_images_full
for fname in $(cut -d, -f1 < labels.csv); do ln -s ../../cwfid_to_json/cwfid_images/001_image.png deepweeds_images_full/$fname; done
python -m weedcoco.importers.deepweeds --labels-dir labels --image-dir deepweeds_images_full -o deepweeds_imageinfo.json
- name: Validate WeedCOCO
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also shouldn't have been removed in merge.

@codecov
Copy link

codecov bot commented Oct 8, 2020

Codecov Report

Merging #49 into master will decrease coverage by 19.39%.
The diff coverage is 31.42%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #49       +/-   ##
===========================================
- Coverage   31.48%   12.08%   -19.40%     
===========================================
  Files           6        7        +1     
  Lines         343      364       +21     
===========================================
- Hits          108       44       -64     
- Misses        235      320       +85     
Flag Coverage Δ
#weedcoco 12.08% <31.42%> (-19.40%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
weedcoco/importers/cwfid.py 0.00% <0.00%> (ø)
weedcoco/importers/deepweeds.py 0.00% <ø> (ø)
weedcoco/importers/voc.py 0.00% <0.00%> (ø)
weedcoco/utils.py 0.00% <0.00%> (ø)
weedcoco/validation.py 0.00% <0.00%> (-80.83%) ⬇️
weedcoco/tests/importers/test_voc.py 97.77% <97.77%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 113df9e...b6cb484. Read the comment docs.

@jnothman
Copy link
Contributor Author

jnothman commented Oct 8, 2020

I'll reopen this. Trying to revert the merge, I pushed the wrong parent, which closed this PR irreversibly, it seems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Converters for Pascal VOC datasets into Weed-COCO
2 participants