pip install git+https://github.com/amikelive/coco-annotator.git#egg=COCOAnnotator

"""

Here is an example of how you could write such a Python script using the coco-annotator package together with `pycocotools`. The code assumes that your input data is in a directory called "input_data" and saves its output files to another folder called "output_data". You can change these names as needed for your own use case.

1) First, install the coco-annotator package, if it's not already installed on your system, by running `pip install git+https://github.com/amikelive/coco-annotator.git#egg=COCOAnnotator` in your terminal or command prompt.

2) Create two folders: one named 'input_data' (or whatever name you prefer), which will contain all your original images along with their annotations in COCO format, and another named 'output_data', where the script will store the processed datasets.

3) Inside the 'input_data' folder, create one subfolder per dataset category, each containing that category's image files and annotation JSON files. For instance, if there are three categories ('car', 'person', and 'traffic light'), create three subfolders with those names inside 'input_data'.

4) Copy some sample images and their corresponding annotation JSON files for each category into the matching subfolder of 'input_data'. Each image and its annotation JSON should share the same base filename so that they can be matched up during the conversion process.

5) Next, open your favorite IDE or text editor and start writing your Python script. The script begins with the following imports:

"""

import json
import os

import cv2
import numpy as np
from tqdm import tqdm

# Importing required libraries for working with COCO formats
from pycocotools.coco import COCO
from pycocotools.mask import decode

# Importing required classes from the coco-annotator library
# (these module paths assume the coco-annotator source tree is on the import path)
from core.annotation import AnnotationController
from utils.logger import get_logger
logger = get_logger()



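# Define the paths to the input and output directories:

root_dir = '/path/to/your/root/directory'
input_folder = os.path.join(root_dir, 'input_data')
output_folder = os.path.join(root_dir, 'output_data')

# Create the output folder if it does not exist yet
os.makedirs(output_folder, exist_ok=True)
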


# Let's define a function that recursively searches a given directory and returns a list of all .json files found:

def find_files(directory):
    """Recursively find all .json files under a given directory."""
    results = []

    for entry in os.scandir(directory):
        if entry.is_file():
            if entry.name.endswith(".json"):
                results.append(entry.path)
        elif entry.is_dir():
            # Descend into subfolders (one per category in the layout described above)
            results += find_files(entry.path)

    return results
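
# The same recursive search can be written more compactly with pathlib. This is an
# alternative sketch rather than the function used in the rest of the script; the name
# find_json_files is hypothetical, and the demo runs on a temporary directory tree.

```python
import tempfile
from pathlib import Path


def find_json_files(directory):
    """Return the paths of all .json files below `directory`, sorted."""
    return sorted(str(p) for p in Path(directory).rglob("*.json") if p.is_file())


# Build a throwaway tree to show which files are picked up
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "car").mkdir()
    (Path(tmp) / "car" / "a.json").write_text("{}")
    (Path(tmp) / "car" / "a.jpg").write_text("")
    (Path(tmp) / "top.json").write_text("{}")
    found = [Path(p).relative_to(tmp).as_posix() for p in find_json_files(tmp)]
```

# Only the two .json files end up in `found`; the image file is skipped.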

# Use the above function to find all .json files in the input_data folder and load each one into memory as a COCO object:
json_files = find_files(input_folder)

# Load each .json file as a COCO object
coco_objs = [COCO(f) for f in json_files]

# Define a dictionary that maps the human-readable label names to the integer IDs used internally by Mask R-CNN models:

label_map = {
    'car': 0, 
    'person': 1,  
    'traffic light': 2  
}
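
# COCO annotations refer to categories by integer id, not by name, so a name-keyed mapping
# like the one above has to be translated through each file's "categories" list before it
# can be used for filtering. A minimal sketch of that translation follows; the helper name
# build_id_remap and the sample data are illustrative assumptions.

```python
def build_id_remap(categories, label_map):
    """Map each COCO category id to the label_map id of the same name."""
    return {
        cat["id"]: label_map[cat["name"]]
        for cat in categories
        if cat["name"] in label_map  # categories outside label_map are dropped
    }


# Sample data in the shape of a COCO "categories" list
sample_categories = [
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"},
    {"id": 10, "name": "traffic light"},
    {"id": 18, "name": "dog"},
]
sample_label_map = {"car": 0, "person": 1, "traffic light": 2}

remap = build_id_remap(sample_categories, sample_label_map)
```

# Here 'dog' has no entry in the mapping, so its id does not appear in `remap`.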


# Finally, loop through every loaded COCO object and keep only the annotations whose category name is a key in the label_map defined earlier. The filtered annotations of each image are saved to an individual .json file in the output_data folder, with bounding box coordinates converted to normalized form (values between 0 and 1). Depending on the size of your dataset, this step may take some time, since every annotation has to be parsed and rewritten:
for i, coco_obj in enumerate(coco_objs):

    # COCO stores integer category ids, so build an id -> name lookup for this file
    cats = coco_obj.loadCats(coco_obj.getCatIds())
    cat_id_to_name = {c['id']: c['name'] for c in cats}

    # Iterate through all images in the current COCO object
    for img_id in tqdm(coco_obj.getImgIds()):

        # Fetch all annotations associated with the current image
        anns = coco_obj.loadAnns(coco_obj.getAnnIds(imgIds=[img_id]))

        # Keep only annotations whose category name appears in label_map
        filtered_anns = [a for a in anns
                         if cat_id_to_name.get(a['category_id']) in label_map]

        # Normalize bbox coords by the longer image side and remap category ids
        img_info = coco_obj.imgs[img_id]
        scale = float(max(img_info['width'], img_info['height']))
        converted_anns = []
        for ann in filtered_anns:
            converted = dict(ann)
            converted['bbox'] = [c / scale for c in ann['bbox']]
            converted['category_id'] = label_map[cat_id_to_name[ann['category_id']]]
            converted_anns.append(converted)

        # Write the filtered annotations to disk in .json format
        filename = f'{i}_{img_id}.json'
        filepath = os.path.join(output_folder, filename)
        with open(filepath, 'w') as outfile:
            json.dump({'categories': [{'id': idx, 'name': name}
                                      for name, idx in label_map.items()],
                       'annotations': converted_anns},
                      outfile)
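
# The bounding box normalization can be isolated into a small function and checked on its
# own. The sketch below assumes COCO-style boxes of the form [x, y, width, height] and
# divides every coordinate by the longer image side, following the max(width, height)
# scaling used in the loop; the function name normalize_bbox is hypothetical.

```python
def normalize_bbox(bbox, width, height):
    """Scale a COCO [x, y, w, h] box by the longer image side, giving values in [0, 1]."""
    scale = float(max(width, height))
    return [coord / scale for coord in bbox]


# A 30x40 box at (10, 20) in a 100x200 image is scaled by 200
normalized = normalize_bbox([10, 20, 30, 40], width=100, height=200)
```

# Dividing x and w by the image width and y and h by the image height instead is another
# common convention; which one is appropriate depends on what the downstream model expects.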