Skip to content

Add support for bounding boxes and instance masks to VectorDataset #2819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mayrajeo
Copy link

@mayrajeo mayrajeo commented Jun 6, 2025

Related to #2505 , a proposal to add bbox_xyxy, segmentation and label as the return values for VectorDataset.__getitem__. After this, VectorDataset.__getitem__ returns

        sample = {
            'mask': masks, # As before
            'bbox_xyxy': boxes_xyxy, # N, 4 tensor containing the bounding boxes in xmin, ymin, xmax, ymax
            'segmentation': segmentations, # A list of N tensors containing the COCO format segmentations
            'crs': self.crs,
            'label': labels, # N tensor containing the int labels for each object
            'bounds': query,
        }

Major downside: as the object detection related things are returned by default, VectorDataset can't be used with most of the common collate_fns, such as stack_samples if the batch size is larger than 1.

Example gist: https://gist.github.com/mayrajeo/1de0497dd82d2c9a2b6381ef482face8

@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Jun 6, 2025
@mayrajeo
Copy link
Author

mayrajeo commented Jun 6, 2025

@microsoft-github-policy-service agree

@adamjstewart adamjstewart added this to the 0.8.0 milestone Jun 6, 2025
@isaaccorley
Copy link
Collaborator

@adamjstewart Thoughts on adding an "output_type" arg to VectorDataset? My only concern is computing mask/instance mask/boxes for each polygon can add unwanted overhead in the case that I just want boxes, or I just want the mask.

@adamjstewart
Copy link
Collaborator

I'm not completely opposed. In fact, this is one of my implementation questions in #2505.

I guess I would be curious to know exactly how much overhead we are talking about. As long as the overhead is noticeable (10% slower) and the change is minimal (a few lines of code, plus a new parameter in all subclasses) then let's do it. Only downside of a parameter is that the return type becomes even more dynamic, and it's difficult to statically type check.

Note that Mask R-CNN (our only instance segmentation model) requires both boxes and masks.

@isaaccorley
Copy link
Collaborator

isaaccorley commented Jun 6, 2025

Yeah I was thinking output types would be [semantic, instance, boxes].

My concern is that instance masks have the ability to take up a ton of memory.

  • Semantic returns a mask of shape (num_classes, h, w)

  • Boxes aren't memory intensive (num_objects, 4)

  • for Instance we are turning boxes, and instance masks of shape (num_objects, h, w)

If num_objects is large (which is common for small object detection) this can take up quite a bit of memory in my experience, so I imagine we shouldn't return all output types and only the one the user wants.

@adamjstewart
Copy link
Collaborator

I wasn't thinking about the difference between semantic and instance masks, that's a good point. Yeah, we should probably have a knob to control this then.

@mayrajeo
Copy link
Author

mayrajeo commented Jun 9, 2025

Should the sample keys be consistent so that masks is an empty list or a tensor if semantic segmentation masks are not desired, or depend on the flags specified? Maybe the latter, because it could be possible to add both oriented bounding boxes either as instance masks ([x1, y1, x2, y2, x3, y3, x4, y4]) or detectron2-style ([x_center, y_center, w, h, rotation], -180 <= rotation < 180) annotation format, or keypoints if needed.

Something like

class VectorDataset(GeoDataset):
    ...
    def __init    def __init__(
        self,
        paths: Path | Iterable[Path] = 'data',
        crs: CRS | None = None,
        res: float | tuple[float, float] = (0.0001, 0.0001),
        transforms: Callable[[dict[str, Any]], dict[str, Any]] | None = None,
        masks: bool = True,
        boxes: bool = False,
        instances: bool = False,
        label_name: str | None = None,
        gpkg_layer: str | int | None = None,
    ) -> None:
    ...
    self.masks = masks
    self.boxes = boxes
    self.instances = instances
    ...

    def __getitem__(self, query: BoundingBox) -> dict[str, Any]:
         # Create things depending on self.masks, self.boxes and self.instances
         ...
        sample = {
            'crs': self.crs,
            'bounds': query,
        }
        if self.masks:
            sample['mask'] = masks
       # etc...

maybe?

@isaaccorley
Copy link
Collaborator

I think it should depend on the flags specified although I don't think we should have a flag, it should be a list of output types that a user can specify.

For OBB I prefer the polygon approach like (x1, y1, ...,x4, y4) instead of using rotation angles just because there's several definitions of the rotation angle and keeping it in polygon format makes easy conversion to<->from shapely/geopandas.

@adamjstewart
Copy link
Collaborator

The sample keys aren't up to us, they need to follow the format expected by Kornia. Note that Kornia does not yet support OBB, so we need to add this to Kornia first.

@mayrajeo
Copy link
Author

For the output types, would output_type: List[str] = ['masks'] with possible keys being ['masks', 'boxes', 'instances'] be good? Though masks and instances are mutually exclusive, as kornia seems to expect masks key for both semantic and instance masks correct? Another option might be to specify task, which is one of segmentation, object_detection, instance_segmentation or something like that, since there isn't many cases where only instance masks are needed right?.

Note that Kornia does not yet support OBB, so we need to add this to Kornia first.

They can be thought as polygons, so they need to be converted into binary masks. Getting them from polygons with shapely is easy (oriented_envelope) but they can be omitted for now.

@adamjstewart
Copy link
Collaborator

See https://github.com/kornia/kornia/blob/v0.8.1/kornia/constants.py#L142 for a list of valid keys. Our TorchGeo object detection trainer is setup to look for bbox_xyxy and label, and our instance segmentation trainer is setup to look for bbox_xyxy, label, and mask. So we should use these keys in the return dict. I'm leaning towards:

task: Literal['object_detection', 'semantic_segmentation', 'instance_segmentation'] = 'semantic_segmentation'`

for the __init__ parameter and default, although those names are quite long.

@isaaccorley
Copy link
Collaborator

See kornia/kornia@v0.8.1/kornia/constants.py#L142 for a list of valid keys. Our TorchGeo object detection trainer is setup to look for bbox_xyxy and label, and our instance segmentation trainer is setup to look for bbox_xyxy, label, and mask. So we should use these keys in the return dict. I'm leaning towards:

task: Literal['object_detection', 'semantic_segmentation', 'instance_segmentation'] = 'semantic_segmentation'`

for the __init__ parameter and default, although those names are quite long.

They are long but it's likely better to be explicit here tbh. In the future we can add support for multiple output types but let's keep it simple for now. A user can always make separate datasets with different output types and join them.

@mayrajeo
Copy link
Author

Added tasks, examples here: https://gist.github.com/mayrajeo/596c43c0a0c815e6fc09c1f8e20136cd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants