-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/sg 729 abstract cocodataset #777
Conversation
…work with different classes
src/super_gradients/training/datasets/detection_datasets/coco_format_detection.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
But I also wanted to leave my 5 cents here:
I think at some point we would need to make something DatasetReader class with logic to parse a dataset in whatever format into List of samples and use that to instantiate dataset classes. What I mean is this:
class COCODetectionDatasetReader:
def __init__(images_dir, annotation_file, with_crowd, ...):
...
def load_samples() -> List[DetectionSample]:
...
reader = COCODetectionDatasetReader(...)
samples = reader.load_samples()
dataset = DetectionDataset(samples, transforms)
So the idea is DetectionDataset
has zero knowledge about where the data is coming from. E.g instead of sub classing we use composition, which is more flexible:
reader = COCODetectionDatasetReader(...)
reader = PascalVOCDetectionDatasetReader(...)
reader = DetectionDatasetReader(...)
And DetectionSample
to provide all necessary information for a single image sample:
@dataclass
class DetectionSample
image_path: str
bboxes: np.ndarray # [N, 4] in XYXY format
labels: np.ndarray # [N]
is_crowd: np.ndarray # [N]
Note, let's say for some customer we have to have additional label attached to bbox. No problem:
@dataclass
class CustomerSpecificDetectionSample(DetectionSample):
additional_property: np.ndarray # [N]
class CustomerSpecificDetectionDatasetReader:
def load_samples() -> List[CustomerSpecificDetectionSample]:
...
I realize this is much of work and def. goes out of the scope of this PR.
So this is more to raise an awareness and suggest this for future sprints.
It sounds like a great idea, I guess that we could also make it so that load_samples would be a generator to provide an option not to cache labels. |
Abstracting Coco dataset.
Changes bewteen the coco and its abstraction: