New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add converter from Pascal VOC to Weed COCO #49
Conversation
def voc_to_coco( | ||
xml_dir: Path, image_dir: Path, category_mapping: Optional[Mapping[str, int]] = None | ||
): | ||
"""Convert VOC to MS COCO images and annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice documentation; would be good to update the existing ones to be similar.
obj = json.load(path) | ||
|
||
|
||
def main(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked this up because I didn't understand what the heck the main function is. Looks like I have some reading to do; this seems like something I need to adopt to help me fully transition to writing scripts vs interactively using notebooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It basically serves two purposes:
- allows the user to use this file both as a library and as a script
- encapsulates the scope to avoid errors
) | ||
|
||
if use_default_category_mapping: | ||
# TODO: define role somehow?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now role will have to be hard coded. Perhaps if we have a filled out agcontext yaml we can use that to populate the category field as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, needs some way to be specified. I wonder whether "name" should include species and role somehow...
I can now run the following: python conversion_tools/voc_to_json/voc_to_json.py --voc-dir /Volumes/research-data/PRJ-iweeds/Narrabri/artificial_illumination/wheat/20190728_Z16/VOC --image-dir /Volumes/research-data/PRJ-iweeds/Narrabri/artificial_illumination/wheat/20190728_Z16/images --agcontext /Volumes/research-data/PRJ-iweeds/agcontext-junk-example.yaml -o /tmp/coco_from_voc-test.json
cat /tmp/coco_from_voc-test.json | python conversion_tools/draft\ export\ to\ elastic.py --thumbnail-dir artificial | curl -X POST localhost:9200/_bulk -H 'Content-Type: application/json' --data-binary @- |
I like this; it seems like a pretty clean and relatively straightforward process. It helps me imagine more of what our pipeline for user provided data might look like. |
weedcoco/draft export to elastic.py
Outdated
@@ -31,6 +31,12 @@ | |||
id_lookup[key, obj["id"]] = obj | |||
|
|||
|
|||
for agcontext in coco["agcontexts"]: | |||
# massage for ES | |||
if agcontext.get("camera_fov") == "variable": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it looks like we will need to stick with this "massaging" to make this work. Elasticsearch will not allow a field to have multiple potential types, and I think it would be a bit silly to make this into some sort of nested field array where we have something like:
camera_fov: {
is_fixed: False
fov: null
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this massaging, as long as any slider over FOV that we have also allows us to retrieve NULLs.
weedcoco/importers/voc.py
Outdated
"id": annotation_id, | ||
"image_id": image_id, | ||
"category_id": category_mapping[voc_object.find("name").text], | ||
# TODO: handle truncated, difficult? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by truncated annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VOC allows an annotation to be tagged as "truncated". I think that means "only part of the object being labelled is visible in the image; it is cut off by the edge of the picture". I don't know what "difficult" is suppose to mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh I wonder if people would consider our rye grass "difficult"...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "difficult" here means "so difficult it's not worth demeriting a system for failing it"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weedcoco/draft export to elastic.py
Outdated
@@ -31,6 +31,12 @@ | |||
id_lookup[key, obj["id"]] = obj | |||
|
|||
|
|||
for agcontext in coco["agcontexts"]: | |||
# massage for ES | |||
if agcontext.get("camera_fov") == "variable": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this massaging, as long as any slider over FOV that we have also allows us to retrieve NULLs.
weedcoco/importers/voc.py
Outdated
"id": annotation_id, | ||
"image_id": image_id, | ||
"category_id": category_mapping[voc_object.find("name").text], | ||
# TODO: handle truncated, difficult? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VOC allows an annotation to be tagged as "truncated". I think that means "only part of the object being labelled is visible in the image; it is cut off by the edge of the picture". I don't know what "difficult" is suppose to mean.
weedcoco/importers/voc.py
Outdated
"id": annotation_id, | ||
"image_id": image_id, | ||
"category_id": category_mapping[voc_object.find("name").text], | ||
# TODO: handle truncated, difficult? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing well-tested code from scratch takes quite a lot of work! This is now ready for review. |
Oh no! CI is using a recent beta version of black, which differs on how it formats weedcoco.utils! |
.github/workflows/pytest.yml
Outdated
@@ -26,10 +26,6 @@ jobs: | |||
- uses: actions/checkout@v2 | |||
- name: Install | |||
run: | | |||
pip install . pytest pytest-cov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these changes should be undone by the present branch. Failed merge?
.github/workflows/validate_json.yml
Outdated
cd ../deepweeds_to_json | ||
curl -L https://github.com/AlexOlsen/DeepWeeds/archive/51e3fab.tar.gz | tar xzv | ||
mv DeepWeeds-51e3fab*/* . | ||
# fake some images: make them all be a pic from cwfid | ||
mkdir deepweeds_images_full | ||
for fname in $(cut -d, -f1 < labels.csv); do ln -s ../../cwfid_to_json/cwfid_images/001_image.png deepweeds_images_full/$fname; done | ||
python -m weedcoco.importers.deepweeds --labels-dir labels --image-dir deepweeds_images_full -o deepweeds_imageinfo.json | ||
- name: Validate WeedCOCO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also shouldn't have been removed in merge.
Codecov Report
@@ Coverage Diff @@
## master #49 +/- ##
===========================================
- Coverage 31.48% 12.08% -19.40%
===========================================
Files 6 7 +1
Lines 343 364 +21
===========================================
- Hits 108 44 -64
- Misses 235 320 +85
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
I'll reopen this. Trying to revert the merge, I pushed the wrong parent, which closed this PR irreversibly, it seems. |
A first draft. Not fully tested.Fixes #33