Add support for bounding boxes and instance masks to VectorDataset #2819
Conversation
@adamjstewart Thoughts on adding an "output_type" arg to VectorDataset? My only concern is that computing masks/instance masks/boxes for each polygon can add unwanted overhead in the case where I just want boxes, or just the mask.
I'm not completely opposed. In fact, this is one of my implementation questions in #2505. I guess I would be curious to know exactly how much overhead we are talking about. As long as the overhead is noticeable (10% slower) and the change is minimal (a few lines of code, plus a new parameter in all subclasses), then let's do it. The only downside of a parameter is that the return type becomes even more dynamic, and it's difficult to statically type check. Note that Mask R-CNN (our only instance segmentation model) requires both boxes and masks.
Yeah, I was thinking the output types would be [semantic, instance, boxes]. My concern is that instance masks can take up a ton of memory.
If num_objects is large (which is common for small-object detection), this can take up quite a bit of memory in my experience, so I imagine we shouldn't return all output types, only the one the user wants.
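For a rough sense of scale, here is a hypothetical sketch comparing a single semantic mask to per-object instance masks (the tile size and object count are made-up numbers, not from the discussion):

```python
import torch

# Hypothetical tile size and object count, just to illustrate the scaling.
height, width = 1024, 1024
num_objects = 500  # small-object detection can easily reach this per tile

semantic = torch.zeros(height, width, dtype=torch.uint8)               # one mask for the tile
instance = torch.zeros(num_objects, height, width, dtype=torch.uint8)  # one binary mask per object

print(semantic.nelement() * semantic.element_size() / 2**20)  # ~1 MiB
print(instance.nelement() * instance.element_size() / 2**20)  # ~500 MiB
```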
I wasn't thinking about the difference between semantic and instance masks; that's a good point. Yeah, we should probably have a knob to control this then.
Should the returned sample depend on the flags? Something like:

```python
class VectorDataset(GeoDataset):
    ...
    def __init__(
        self,
        paths: Path | Iterable[Path] = 'data',
        crs: CRS | None = None,
        res: float | tuple[float, float] = (0.0001, 0.0001),
        transforms: Callable[[dict[str, Any]], dict[str, Any]] | None = None,
        masks: bool = True,
        boxes: bool = False,
        instances: bool = False,
        label_name: str | None = None,
        gpkg_layer: str | int | None = None,
    ) -> None:
        ...
        self.masks = masks
        self.boxes = boxes
        self.instances = instances
        ...

    def __getitem__(self, query: BoundingBox) -> dict[str, Any]:
        # Create things depending on self.masks, self.boxes and self.instances
        ...
        sample = {
            'crs': self.crs,
            'bounds': query,
        }
        if self.masks:
            sample['mask'] = masks
        # etc... maybe?
```
I think it should depend on what is specified, although I don't think we should have a flag; it should be a list of output types that a user can specify. For OBB I prefer the polygon approach, like (x1, y1, ..., x4, y4), instead of using rotation angles, just because there are several definitions of the rotation angle and keeping it in polygon format makes conversion to/from shapely/geopandas easy.
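A minimal sketch of the polygon-style OBB representation, assuming shapely's `minimum_rotated_rectangle` is used to get the oriented box (the input polygon below is made up):

```python
from shapely.geometry import Polygon

# Hypothetical polygon; its oriented bounding box is itself a polygon.
poly = Polygon([(0, 0), (4, 1), (3, 5), (-1, 4)])
obb = poly.minimum_rotated_rectangle

# The exterior ring repeats the first point at the end, so keep only the first 4 corners.
corners = list(obb.exterior.coords)[:4]
x1, y1, x2, y2, x3, y3, x4, y4 = [c for xy in corners for c in xy]
```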
The sample keys aren't up to us, they need to follow the format expected by Kornia. Note that Kornia does not yet support OBB, so we need to add this to Kornia first.
For the output types: they can be thought of as polygons, so they need to be converted into binary masks. Getting them from polygons with shapely is easy.
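A rough sketch of one way to build per-object binary masks from shapely polygons with `rasterio.features.rasterize` (the bounds, raster size, and polygons below are made-up example values, not from the discussion):

```python
import numpy as np
from rasterio import features
from rasterio.transform import from_bounds
from shapely.geometry import Polygon

# Hypothetical query window and output raster size.
height, width = 256, 256
transform = from_bounds(0.0, 0.0, 10.0, 10.0, width, height)

polygons = [
    Polygon([(1, 1), (4, 1), (4, 4), (1, 4)]),
    Polygon([(5, 5), (9, 5), (9, 9), (5, 9)]),
]

# Rasterize each polygon separately to get one binary mask per object.
instance_masks = np.stack([
    features.rasterize([poly], out_shape=(height, width), transform=transform, fill=0, dtype='uint8')
    for poly in polygons
])  # shape: (num_objects, height, width)
```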
See https://github.com/kornia/kornia/blob/v0.8.1/kornia/constants.py#L142 for a list of valid keys. Our TorchGeo object detection trainer is set up to look for bbox_xyxy and label, and our instance segmentation trainer is set up to look for bbox_xyxy, label, and mask. So we should use these keys in the return dict. I'm leaning towards `task: Literal['object_detection', 'semantic_segmentation', 'instance_segmentation'] = 'semantic_segmentation'` for the new parameter.
They are long but it's likely better to be explicit here tbh. In the future we can add support for multiple output types but let's keep it simple for now. A user can always make separate datasets with different output types and join them. |
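A minimal, hypothetical sketch of how the return dict could be assembled for each value of the proposed `task` parameter, using the keys named above (the helper function and its arguments are illustrative only, not part of TorchGeo):

```python
from typing import Any, Literal

Task = Literal['object_detection', 'semantic_segmentation', 'instance_segmentation']

def build_sample(task: Task, semantic_mask, boxes, labels, instance_masks) -> dict[str, Any]:
    """Assemble the per-task return dict with the keys the TorchGeo trainers expect."""
    sample: dict[str, Any] = {}
    if task == 'semantic_segmentation':
        sample['mask'] = semantic_mask       # (H, W)
    elif task == 'object_detection':
        sample['bbox_xyxy'] = boxes          # (N, 4)
        sample['label'] = labels             # (N,)
    elif task == 'instance_segmentation':
        sample['bbox_xyxy'] = boxes          # (N, 4)
        sample['label'] = labels             # (N,)
        sample['mask'] = instance_masks      # (N, H, W)
    return sample
```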
Added tasks, examples here: https://gist.github.com/mayrajeo/596c43c0a0c815e6fc09c1f8e20136cd |
Related to #2505, a proposal to add `bbox_xyxy`, `segmentation` and `label` as the return values for `VectorDataset.__getitem__`. After this, `VectorDataset.__getitem__` returns all of these by default.

Major downside: as the object detection related things are returned by default, `VectorDataset` can't be used with most of the common `collate_fn`s, such as `stack_samples`, if the batch size is larger than 1.

Example gist: https://gist.github.com/mayrajeo/1de0497dd82d2c9a2b6381ef482face8
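A small sketch of why batching breaks: a stacking collate (which is roughly what `stack_samples` does per key) needs equal-sized tensors, but the number of boxes varies per sample (the box counts below are made up):

```python
import torch

boxes_a = torch.zeros(3, 4)  # sample with 3 boxes
boxes_b = torch.zeros(5, 4)  # sample with 5 boxes

try:
    torch.stack([boxes_a, boxes_b])
except RuntimeError as err:
    # stack expects each tensor to be equal size, but got [3, 4] and [5, 4]
    print(err)
```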