Prevent folder name conflicts in label() call #93

capjamesg · 2023-11-21T12:44:50Z

This PR appends the timestamp of the label() call to the output folder name when the provided folder already exists and contains images.

This prevents the scenario where label() labels an entire dataset (potentially hundreds or thousands of images) and only then raises an error, because the existing folder already contains an image with the same name as one in the newly labeled dataset.

capjamesg · 2023-11-21T23:31:31Z

NB: This PR fixes #53.

yeldarby

Note: the comments made on detection_base_model likely also apply to classification_base_model.

yeldarby · 2023-11-30T14:52:06Z

autodistill/core/base_model.py

+        if output_folder is None:
+            output_folder = input_folder + "_labeled"
+
+        os.makedirs(output_folder, exist_ok=True)


Logic should be:

- If output_folder already exists, check whether it was created with the same config (same base model, same ontology).
  - If the config is the same, continue.
  - If the config is different, move the old folder to a backup location ({output_folder}-{old_timestamp}).

To do this we need to store a hash of the ontology in the config file.

We should also rename this config.json to .autodistill.json so it doesn't conflict with other files and it's clear where it came from and what it's used for.

To do the hashing we'll add a .hash() method to the Ontology base class that's just an md5 of the JSON of the ontology. Subclasses can override this to define what it means for one ontology to be "different" from another.
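
A minimal sketch of the flow described in this comment, not the code in this PR. It assumes the ontology can be represented as a plain dict (the real Ontology class may serialize differently), and the names `ontology_hash` and `prepare_output_folder` are hypothetical. It also assumes that an existing folder with no config file is treated as "different" and backed up, and that the folder's modification time stands in for `old_timestamp`; the comment does not specify either.

```python
import hashlib
import json
import os
import shutil

CONFIG_NAME = ".autodistill.json"  # filename proposed in the review comment


def ontology_hash(ontology: dict) -> str:
    """Return the md5 hex digest of the ontology's JSON form.

    sort_keys=True makes the hash independent of dict insertion order.
    """
    payload = json.dumps(ontology, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def prepare_output_folder(output_folder: str, base_model: str, ontology: dict) -> str:
    """Create, resume, or back up output_folder; return which action was taken."""
    config = {"base_model": base_model, "ontology_hash": ontology_hash(ontology)}
    config_path = os.path.join(output_folder, CONFIG_NAME)
    status = "created"

    if os.path.isdir(output_folder):
        old_config = None
        if os.path.exists(config_path):
            with open(config_path) as f:
                old_config = json.load(f)
        if old_config == config:
            return "resumed"  # same base model and ontology: continue in place
        # Different (or missing) config: move the old folder out of the way.
        # Assumption: the folder's mtime is used as the "old timestamp".
        old_timestamp = int(os.path.getmtime(output_folder))
        shutil.move(output_folder, f"{output_folder}-{old_timestamp}")
        status = "backed_up"

    os.makedirs(output_folder, exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(config, f)
    return status
```

Calling `prepare_output_folder` twice with the same arguments returns `"created"` and then `"resumed"`; changing the ontology on a later call returns `"backed_up"` and leaves the previous results in the timestamped folder.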

yeldarby · 2023-11-30T15:00:42Z

autodistill/detection/detection_base_model.py

+
+            if not os.path.exists(annotation_path):
+                detections = self.predict(f_path)
+                detections_map[f_path_short] = detections


Save these to disk as we go rather than holding them in memory, so that if you cancel or crash partway through you don't lose all your progress.
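
One way this could look, as a sketch under stated assumptions rather than the project's implementation: `label_incrementally` is a hypothetical helper, `predict` is assumed to be any callable that returns `(class_id, x, y, w, h)` tuples for an image path, and annotations are assumed to be one YOLO-style text file per image. Each file is written to a temporary path and then renamed, so an interrupted write never leaves a partial file that would be mistaken for finished work on the next run.

```python
import os


def label_incrementally(image_paths, annotations_dir, predict):
    """Write one annotation file per image as soon as it is predicted.

    Images whose annotation file already exists are skipped, so a
    cancelled or crashed run can resume where it stopped.
    Returns the list of images labeled during this call.
    """
    os.makedirs(annotations_dir, exist_ok=True)
    newly_labeled = []
    for image_path in image_paths:
        stem = os.path.splitext(os.path.basename(image_path))[0]
        annotation_path = os.path.join(annotations_dir, stem + ".txt")
        if os.path.exists(annotation_path):
            continue  # already labeled in a previous run

        boxes = predict(image_path)  # hypothetical: (class_id, x, y, w, h) tuples

        # Write to a temp file, then rename it into place in one step.
        tmp_path = annotation_path + ".tmp"
        with open(tmp_path, "w") as f:
            for class_id, x, y, w, h in boxes:
                f.write(f"{class_id} {x} {y} {w} {h}\n")
        os.replace(tmp_path, annotation_path)
        newly_labeled.append(image_path)
    return newly_labeled
```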

yeldarby · 2023-11-30T15:01:13Z

autodistill/detection/detection_base_model.py


        images_map = {}
        detections_map = {}

+        # if output_folder/autodistill.json exists
+        if os.path.exists(output_folder + "/data.yaml"):
+            dataset = sv.DetectionDataset.from_yolo(


Don't load all the images into memory here in case the dataset is huge; we just need the list of filenames
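
A possible sketch of getting just the filenames, assuming a YOLO-style layout in which each labeled image has a matching `.txt` file inside a `labels` directory somewhere under the output folder. The helper names `labeled_image_stems` and `pending_images` are hypothetical. Only directory listings are read; no image is opened.

```python
import glob
import os


def labeled_image_stems(output_folder):
    """Return the filename stems that already have a label file on disk."""
    # "**" with recursive=True matches zero or more directories, so this
    # finds both <output>/labels/*.txt and <output>/train/labels/*.txt.
    pattern = os.path.join(output_folder, "**", "labels", "*.txt")
    return {
        os.path.splitext(os.path.basename(path))[0]
        for path in glob.glob(pattern, recursive=True)
    }


def pending_images(image_paths, output_folder):
    """Filter image_paths down to those without an existing label file."""
    done = labeled_image_stems(output_folder)
    return [
        path
        for path in image_paths
        if os.path.splitext(os.path.basename(path))[0] not in done
    ]
```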

yeldarby · 2023-11-30T15:01:50Z

autodistill/detection/detection_base_model.py

        files = glob.glob(input_folder + "/*" + extension)
        progress_bar = tqdm(files, desc="Labeling images")
-        # iterate through images in input_folder
+


Save the data.yaml file first instead of at the end so it's there next run

yeldarby · 2023-11-30T15:02:02Z

autodistill/detection/detection_base_model.py

+        config["roboflow_tags"] = roboflow_tags
+        config["task"] = "detection"
+
+        with open(os.path.join(output_folder, "config.json"), "w+") as f:


Change the filename to match (.autodistill.json).

prevent folder name conflicts in label() call

54c43cb

capjamesg requested a review from yeldarby November 21, 2023 12:44

capjamesg self-assigned this Nov 21, 2023

capjamesg added 3 commits November 21, 2023 18:01

add intelligent detections loading and resumption

128bcd5

fix dataset splitting bug, add resume labeling logic

f230c09

add new config options

0183177

yeldarby reviewed Nov 30, 2023
