# COCO Instance Segmentation

COCO is a large image dataset designed for object detection, segmentation, person keypoint detection, stuff segmentation, and caption generation. We focus on instance segmentation: predicting a category for each object and a separate mask for every individual instance, even when several instances belong to the same category. It can be regarded as performing object detection and semantic segmentation at the same time. We use a GPU on the Hyperion server at City, University of London. The data was pre-downloaded from https://www.kaggle.com/datasets/awsaf49/coco-2017-dataset to Hyperion under the directory '/mnt/data/public/coco2017'. Original data source: https://cocodataset.org/
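
To make the difference between semantic and instance segmentation concrete, here is a toy sketch in plain Python. The 4x6 grids and the class id 1 are made up for illustration and are not taken from COCO.

```python
# Toy 4x6 scene containing two objects of the same class (class 1).
# Semantic segmentation: each pixel carries a class label only.
semantic = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Instance segmentation: pixels are additionally grouped by object.
instance_ids = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

# One binary mask and one class label per instance.
instances = sorted({v for row in instance_ids for v in row if v != 0})
masks = [[[int(v == k) for v in row] for row in instance_ids] for k in instances]
labels = [1 for _ in instances]

print(len(masks), labels)  # 2 [1, 1]
```

The semantic map cannot tell the two objects apart, while the instance representation keeps one mask per object.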

In [None]:
!pip install pycocotools
from pycocotools.coco import COCO
import cython
from torch.utils import data
import numpy as np
import skimage.io as io
import random
import os
import cv2
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import torch, torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from torchvision.ops.boxes import box_convert

### For visualizing the outputs ###
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
%matplotlib inline

device = torch.device('cpu')
if torch.cuda.is_available():
   device = torch.device('cuda')

print(device)

## Check Dataset File Structure

In [None]:
path = '/mnt/data/public/coco2017/coco2017'

files = os.listdir(path)

for f in files:
	print(f)

In [None]:
path_a = path + '/annotations'

files = os.listdir(path_a)

for f in files:
	print(f)

## Check Categories

In [None]:
dataDir='/mnt/data/public/coco2017/coco2017/annotations'
dataType='val'
annFile='{}/instances_{}2017.json'.format(dataDir,dataType)

# Initialize the COCO api for instance annotations
coco_val=COCO(annFile)

# Load the categories in a variable
catIDs = coco_val.getCatIds()
cats = coco_val.loadCats(catIDs)

print(cats)

In [None]:
annotations_file='/mnt/data/public/coco2017/coco2017/annotations/instances_train2017.json' #file of coco dataset annotations 

# Initialize the COCO api for instance annotations
coco_train=COCO(annotations_file)

# Load the categories in a variable
catIDs = coco_train.getCatIds()
cats = coco_train.loadCats(catIDs)

print(cats)

We can see that the category id of person is 1.

Let's see the number of images for each category.

In [None]:
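for cat in cats:
    # Query by the real COCO category id; the ids are not contiguous, so the loop index cannot be used
    n_images = len(coco_train.getImgIds(catIds=[cat['id']]))
    print('{} - category id: {}, count of training images: {}'.format(cat['name'], cat['id'], n_images))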
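
COCO category ids are not contiguous (the 80 instance categories use ids between 1 and 90), so the position of a category in the list is not its id. A minimal sketch of a name-to-id lookup follows; `cats_excerpt` is a hand-written excerpt standing in for the list returned by `loadCats`, and `name_to_id` and `focus_ids` are illustrative names.

```python
# Hand-written excerpt of the category records; the real list has 80 entries.
cats_excerpt = [
    {'supercategory': 'person', 'id': 1, 'name': 'person'},
    {'supercategory': 'animal', 'id': 16, 'name': 'bird'},
    {'supercategory': 'animal', 'id': 25, 'name': 'giraffe'},
]

# Look categories up by name instead of relying on list positions.
name_to_id = {c['name']: c['id'] for c in cats_excerpt}
focus_ids = [name_to_id[n] for n in ('bird', 'giraffe')]

print(focus_ids)  # [16, 25]
```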

# Mask R-CNN
The Mask R-CNN code uses the torchvision model and is largely based on the official pytorch.org tutorial on fine-tuning Mask R-CNN, adapted to the server environment (which has no direct Internet access) and to our use case.

Reference:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html and Lecture notes, INM705 DEEP LEARNING FOR IMAGE ANALYSIS - Lab5 by Dr Alex Ter-Sarkisov@City, University of London

From https://github.com/pytorch/vision.git, pre-download engine.py, utils.py, and transforms.py. They depend on coco_eval.py and coco_utils.py, so those also need to be downloaded.

Let's see one of the images.

In [None]:
from PIL import Image
Image.open(path+'/test2017/000000112691.jpg')

# Finetuning from a pretrained model

In [None]:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# load a model pre-trained on COCO; the weights must be downloaded manually beforehand because Hyperion has no direct Internet access
# Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth"
# Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth"
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False,)
pretrained_weights = torch.load('fasterrcnn_resnet50_fpn_coco-258fb6c6.pth', map_location='cpu')
# copy only the backbone weights
for _n, _par in model.state_dict().items():
     if 'backbone' in _n:
        _par.requires_grad = False
        _par.copy_(pretrained_weights[_n])
        _par.requires_grad = True

if device == torch.device('cuda'):
    model = model.to(device)

# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2  # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) 

## Helper function

In [None]:
#import python modules downloaded from pytorch github https://github.com/pytorch/vision/tree/main/references/detection  
from engine import train_one_epoch, evaluate
import utils
import transforms as T


def get_transform(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)

## Explore annotations

Let's explore the annotations and data structure a bit before creating the dataset.

In [None]:
AnnIds = coco_train.getAnnIds(catIds=[1], areaRng=[], iscrowd=False)
anns_obj1 = coco_train.loadAnns(AnnIds)
print(anns_obj1[0])
anns_obj2 = [ann for ann in anns_obj1 if coco_train.annToMask(ann).any()]  #keep only annotations whose mask has at least one foreground pixel
print(anns_obj2[0])

In [None]:
len(anns_obj1)

In [None]:
len(anns_obj2)

If the two counts match, the non-crowd person category has no empty masks. Let's look at an annotation and its image object.

In [None]:
print(coco_train.annToMask(anns_obj1[0]))

In [None]:
image_id = anns_obj1[0]['image_id']   #'id' is the annotation id; 'image_id' identifies the image
img_obj = coco_train.loadImgs(image_id)[0]
print(img_obj['file_name'])

In [None]:
anns_obj1[0]['bbox'][3]

In [None]:
img_obj
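
In a non-crowd annotation, each entry of `segmentation` is a flat polygon `[x1, y1, x2, y2, ...]`, and `bbox` is `[x, y, width, height]`. The sketch below shows how the two relate; `polygon_bbox` is an illustrative helper and the triangle is made up, not taken from the dataset. Annotated boxes are not guaranteed to match the polygon extent exactly.

```python
def polygon_bbox(polygon):
    """Return [x, y, width, height] enclosing a flat [x1, y1, x2, y2, ...] polygon."""
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]

# Made-up triangle for illustration
triangle = [10.0, 20.0, 50.0, 20.0, 30.0, 60.0]
print(polygon_bbox(triangle))  # [10.0, 20.0, 40.0, 40.0]
```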

# Define COCO dataset

Dataset code examples are available online, but we need to customize them: selecting the categories to focus on, excluding crowd annotations for better data quality, and cleaning the data by removing annotations without masks.

A note on labels: the model treats class 0 as background, so no object category should be given label 0. Object labels therefore start at 1.

Let's focus on bird and giraffe.
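
As a minimal sketch of the labeling rule, the COCO category ids can be mapped to contiguous labels starting at 1, which also fixes the number of classes the model heads need. `cat_to_label` is an illustrative name.

```python
focus_category = [16, 25]  # COCO ids for bird and giraffe

# Label 0 is reserved for background, so object labels start at 1.
cat_to_label = {cat_id: i + 1 for i, cat_id in enumerate(focus_category)}

# The model needs one output per object class plus one for background.
num_classes = len(focus_category) + 1

print(cat_to_label, num_classes)  # {16: 1, 25: 2} 3
```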



Let's define the COCO dataset class used by the model's data loader.

In [None]:
#inspired and adapt from reference below, with customization and correction
#reference: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
           #https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/coco.py
           #https://stackoverflow.com/questions/68513782/use-ms-coco-format-as-input-to-pytorch-maskrcnn
           #Lecture notes, INM705 DEEP LEARNING FOR IMAGE ANALYSIS - Lab5 by Dr. Alex Ter-Sarkisov @ City, University of London

dataset_dir='/mnt/data/public/coco2017/coco2017'
focus_category = [16, 25]   #16: bird category; 25: giraffe

class CocoDataset(torch.utils.data.Dataset):
    def __init__(self, dataset_dir, subset='train', transforms=None, focus_category=focus_category, areaRng=[], iscrowd=False):
        ann_file = '{}/annotations/instances_{}2017.json'.format(dataset_dir, subset)
        self.imgs_dir = os.path.join(dataset_dir, subset+'2017')
        self.coco = COCO(ann_file)
        AnnIds = self.coco.getAnnIds(catIds=focus_category, areaRng=areaRng, iscrowd=iscrowd)
        self.anns_obj = self.coco.loadAnns(AnnIds)
        self.image_id_all = [ ann['image_id'] for ann in self.anns_obj ]
        self.image_id_all = np.unique(np.array(self.image_id_all, dtype=int))
        self.catlabel = {}
        for i in range(len(focus_category)):
            self.catlabel[focus_category[i]] = i + 1   #note: label 0 is reserved for the background category in the model
        self.transforms = transforms
        self.focus_category = focus_category
        self.areaRng = areaRng
        self.iscrowd = iscrowd
        

        
    def __len__(self):
        self.length = len(self.image_id_all)
        return self.length  

    def mabic(self, ann, masks, areas, boxes, image_id_instance, catIds):
        #print('ann type: {}; ann: {}'.format(type(ann),ann))
        #print('ann area: {}'.format(ann['area']))
        areas.append(ann['area'])
        masks.append(self.coco.annToMask(ann))
        #1e-7 solves float num rounding issue
        boxes.append([ann['bbox'][0], ann['bbox'][1], ann['bbox'][0]+ann['bbox'][2]+1e-7, ann['bbox'][1]+ann['bbox'][3]+1e-7]) 
        image_id_instance.append(ann['image_id'])
        catIds.append(ann['category_id'])
        return masks, areas, boxes, image_id_instance, catIds
    
    def __getitem__(self, idx):
        '''
        Args:
            idx: index of the sample to return
        Returns:
            tuple (img, target):
            - img: the image, after self.transforms if given (otherwise a PIL Image of size (W, H))
            - target (dict) containing:
                - boxes:    FloatTensor[N, 4], bounding boxes of the N instances in
                  [x0, y0, x1, y1] format, ranging from 0 to W and 0 to H;
                - labels:   Int64Tensor[N], class label (0 is background);
                - image_id: Int64Tensor, id of the image;
                - area:     Tensor[N], annotated area of each instance;
                - iscrowd:  Int64Tensor[N], all zeros because crowd annotations are excluded;
                - masks:    UInt8Tensor[N, H, W], segmentation masks;
        '''
    
        #get image id and then get annotation object
        image_id = self.image_id_all[idx]
        AnnIds = self.coco.getAnnIds(imgIds = image_id, catIds=self.focus_category, areaRng=self.areaRng, iscrowd=self.iscrowd)
        anns = self.coco.loadAnns(AnnIds)  
 
        # fix old format issue: convert [xmin, ymin, width, height] in pycocotools to [xmin, ymin, xmax, ymax] format required by model
        boxes=[]
        masks=[]
        areas=[]
        image_id_instance=[]
        catIds=[]
        

        #collect masks, areas, boxes, image ids and category ids for every annotation of this image
        for ann in anns: 
            masks, areas, boxes, image_id_instance, catIds = self.mabic(ann, masks, areas, boxes, image_id_instance, catIds)
        
 
        #get image for return
        if isinstance(image_id, np.integer):
            image_id = int(image_id)           #convert numpy integer to python native integer for loadImgs
        img_obj = self.coco.loadImgs(image_id)[0]
        #convert to RGB so that grayscale COCO images also give 3-channel tensors
        img = Image.open(os.path.join(self.imgs_dir, img_obj['file_name'])).convert('RGB')
        if self.transforms is not None:
            img = self.transforms(img)
        
        #note: label 0 is reserved for background in the model
        labels = [ self.catlabel[i] for i in catIds]
        
        labels = torch.as_tensor(labels, dtype=torch.int64)   
        iscrowd = torch.zeros(len(anns), dtype=torch.int64) #because we have excluded crowd
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)  #empty box causes error, reshape from 0 to (0,4) for empty box
        masks = torch.as_tensor(np.array(masks), dtype=torch.uint8)
        image_id_instance = torch.tensor([image_id], dtype=torch.int64)  #single id per image; the reference evaluate() calls .item() on it
        area = torch.as_tensor(areas)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id_instance
        target["area"] = area
        target["iscrowd"] = iscrowd

        return img, target
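
The box conversion performed in `mabic` can be sketched in isolation. COCO stores boxes as `[x, y, width, height]`, while the torchvision detection models expect `[x0, y0, x1, y1]` with strictly positive width and height. The helper names `xywh_to_xyxy` and `is_valid_box` and the sample boxes are illustrative only.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x0, y0, x1, y1]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def is_valid_box(box):
    """A box is usable for training only if it has positive width and height."""
    x0, y0, x1, y1 = box
    return x1 > x0 and y1 > y0

box = xywh_to_xyxy([10.0, 20.0, 30.0, 40.0])
print(box, is_valid_box(box))  # [10.0, 20.0, 40.0, 60.0] True

# A zero-width annotation converts to a degenerate box
degenerate = xywh_to_xyxy([5.0, 5.0, 0.0, 12.0])
print(is_valid_box(degenerate))  # False
```

On tensors, `torchvision.ops.boxes.box_convert(boxes, 'xywh', 'xyxy')` performs the same conversion.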


In [None]:
# reuse the already loaded validation annotations instead of parsing the JSON file three times
anns_obj = coco_val.loadAnns(coco_val.getAnnIds(imgIds=[536]))

# Instance segmentation model

In [None]:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

      
def get_instance_segmentation_model(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False,  pretrained_backbone=False,)
    pretrained_weights = torch.load('maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth', map_location='cpu')
    # # copy only backbone weights
    for _n, _par in model.state_dict().items():
         if 'backbone' in _n:
            _par.requires_grad = False
            _par.copy_(pretrained_weights[_n])
            _par.requires_grad = True

    if device == torch.device('cuda'):
        model = model.to(device)
    
    
    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model

## Putting everything together

In [None]:
# use our dataset and defined transformations
import torchvision.transforms as transforms
dataset = CocoDataset('/mnt/data/public/coco2017/coco2017', 'train', transforms.Compose([transforms.ToTensor()]))
dataset_test = CocoDataset('/mnt/data/public/coco2017/coco2017', 'val', transforms.Compose([transforms.ToTensor()]))

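# take random subsets: all but 50 training images for training, 50 validation images for testing
torch.manual_seed(1)
indices = torch.randperm(len(dataset)).tolist()
indices_test = torch.randperm(len(dataset_test)).tolist()   #separate permutation: the val set is smaller than the train set
dataset = torch.utils.data.Subset(dataset, indices[:-50])
dataset_test = torch.utils.data.Subset(dataset_test, indices_test[-50:])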

# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)

data_loader_test = torch.utils.data.DataLoader(
    dataset_test, batch_size=1, shuffle=False, num_workers=4,
    collate_fn=utils.collate_fn)
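
Detection batches cannot be stacked into a single tensor because images differ in size and targets differ in the number of instances. The `collate_fn` helper in the torchvision detection references handles this by transposing the batch, which the sketch below reproduces with placeholder strings and dicts in place of real tensors.

```python
def collate_fn(batch):
    # Turn [(img_a, tgt_a), (img_b, tgt_b)] into ((img_a, img_b), (tgt_a, tgt_b))
    return tuple(zip(*batch))

# Placeholder samples; in the notebook these are image tensors and target dicts
batch = [('img_a', {'labels': [1]}), ('img_b', {'labels': [1, 2, 2]})]
images, targets = collate_fn(batch)

print(images)  # ('img_a', 'img_b')
print(len(targets))  # 2
```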

## Instantiate the model and optimizer

In [None]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# our dataset has three classes - background, bird and giraffe
num_classes = len(focus_category) + 1

# get the model using our helper function
model = get_instance_segmentation_model(num_classes)
# move model to the right device
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
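
The schedule that `StepLR` follows can be written in closed form as `base_lr * gamma ** (epoch // step_size)`. The sketch below evaluates it in plain Python; `step_lr` is an illustrative helper, not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate at a given epoch under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in range(7):
    print(epoch, step_lr(0.005, epoch))
```

With `step_size=3` the first decay happens at epoch 3, so a run of only two epochs trains at the initial learning rate throughout.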

## Training

In [None]:
# let's train it for 2 epochs
from torch.optim.lr_scheduler import StepLR
num_epochs = 2

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)

## Prediction

In [None]:
# pick one image from the test set
img, _ = dataset_test[0]
# put the model in evaluation mode
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])
    
prediction

Convert the image tensor back to a PIL image. It was rescaled to 0-1 and stored in [C, H, W] format, so we multiply by 255 and move the channel axis to the end.

In [None]:
Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())

Let's now visualize the top predicted segmentation mask. The masks are predicted as [N, 1, H, W], where N is the number of predictions, and are probability maps between 0-1.

In [None]:
Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())
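
Because the predicted masks are probability maps, a binary mask is usually obtained by thresholding. The sketch below uses a tiny made-up probability map in plain Python and assumes a threshold of 0.5, which is a common choice rather than something fixed by the model. On the real output, the equivalent would be `prediction[0]['masks'][0, 0] > 0.5`.

```python
def binarize(prob_map, threshold=0.5):
    """Turn a 2D map of probabilities into a 0/1 mask."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]

# Made-up 2x3 probability map for illustration
prob_map = [
    [0.02, 0.40, 0.91],
    [0.10, 0.75, 0.98],
]
binary = binarize(prob_map)
print(binary)  # [[0, 0, 1], [0, 1, 1]]
```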

# debug...

In [None]:
anns = [{'segmentation': [[443.87, 249.81, 438.71, 267.35, 438.71, 284.9, 434.58, 290.06, 425.29, 285.94, 432.52, 261.16, 428.39, 260.13, 426.32, 272.52, 424.26, 279.74, 417.03, 279.74, 421.16, 260.13, 409.81, 260.13, 397.42, 275.61, 396.39, 292.13, 385.03, 313.81, 378.84, 328.26, 374.71, 349.94, 376.77, 360.26, 373.68, 372.65, 359.23, 367.48, 361.29, 349.94, 361.29, 328.26, 365.42, 314.84, 370.58, 299.35, 375.74, 289.03, 349.94, 290.06, 344.77, 308.65, 340.65, 327.23, 340.65, 347.87, 336.52, 355.1, 329.29, 356.13, 330.32, 344.77, 332.39, 335.48, 332.39, 320.0, 330.32, 300.39, 334.45, 282.84, 318.97, 253.94, 313.81, 233.29, 313.81, 211.61, 308.65, 188.9, 307.61, 174.45, 284.9, 172.39, 306.58, 158.97, 322.06, 151.74, 339.61, 145.55, 357.16, 150.71, 367.48, 160.0, 371.61, 161.03, 381.94, 152.77, 395.35, 144.52, 409.81, 138.32, 423.23, 134.19, 436.65, 130.06, 451.1, 131.1, 461.42, 137.29, 464.52, 147.61, 464.52, 157.94, 466.58, 168.26, 466.58, 184.77, 463.48, 209.55, 457.29, 220.9, 452.13, 231.23, 445.94, 242.58]], 'area': 21895.6492, 'iscrowd': 0, 'image_id': 99026, 'bbox': [284.9, 130.06, 181.68, 242.59], 'category_id': 20, 'id': 63673}, {'segmentation': [[105.87, 193.33, 178.84, 178.94, 235.37, 176.89, 272.38, 164.55, 267.24, 154.27, 266.21, 138.86, 255.93, 130.63, 285.74, 122.41, 300.13, 115.22, 327.88, 113.16, 339.19, 125.5, 352.55, 137.83, 358.72, 154.27, 339.19, 150.16, 284.71, 172.78, 306.3, 175.86, 315.55, 228.28, 307.32, 243.7, 300.13, 268.36, 289.85, 292.01, 277.52, 319.76, 269.29, 333.12, 268.27, 360.87, 271.35, 375.26, 268.27, 380.4, 260.04, 376.29, 254.9, 371.15, 253.88, 350.59, 259.01, 332.09, 260.04, 313.59, 257.99, 296.12, 245.65, 301.26, 243.6, 314.62, 236.4, 332.09, 236.4, 347.51, 237.43, 358.81, 233.32, 361.9, 227.15, 358.81, 230.24, 342.37, 227.15, 318.73, 200.43, 329.01, 179.87, 336.2, 164.45, 336.2, 156.23, 339.29, 148.01, 357.79, 139.79, 371.15, 124.37, 391.71, 113.06, 415.35, 115.12, 425.62, 96.62, 420.48, 96.62, 377.32, 99.7, 363.95, 76.06, 
353.68, 41.11, 327.98, 24.67, 295.09, 23.64, 265.28, 30.84, 242.67, 48.31, 220.06, 84.28, 205.67, 109.98, 195.39]], 'area': 46748.43474999999, 'iscrowd': 0, 'image_id': 99026, 'bbox': [23.64, 113.16, 335.08, 312.46], 'category_id': 20, 'id': 64255}, {'segmentation': [[20.65, 157.94, 131.1, 146.58, 155.87, 134.19, 190.97, 134.19, 222.97, 141.42, 230.19, 145.55, 215.74, 153.81, 230.19, 175.48, 139.35, 184.77, 99.1, 192.0, 108.39, 195.1, 47.48, 222.97, 40.26, 229.16, 26.84, 217.81]], 'area': 10007.109449999998, 'iscrowd': 0, 'image_id': 99026, 'bbox': [20.65, 134.19, 209.54, 94.97], 'category_id': 20, 'id': 66125}, {'segmentation': [[115.61, 132.13, 126.97, 101.16, 166.19, 99.1, 213.68, 116.65], [164.13, 136.26, 242.58, 118.71, 265.29, 115.61, 271.48, 122.84, 263.23, 132.13, 266.32, 144.52, 267.35, 152.77, 260.13, 161.03, 252.9, 155.87, 247.74, 165.16, 231.23, 172.39, 216.77, 152.77, 228.13, 144.52]], 'area': 4435.479550000002, 'iscrowd': 0, 'image_id': 99026, 'bbox': [115.61, 99.1, 155.87, 73.29], 'category_id': 20, 'id': 66283}, {'segmentation': [[513.73, 75.34, 535.71, 73.45, 541.99, 77.22, 543.88, 88.53, 539.48, 94.18, 536.97, 111.14, 540.74, 125.58, 541.99, 149.45, 536.34, 172.06, 525.66, 195.29, 522.52, 217.27, 523.78, 225.44, 523.78, 232.97, 518.76, 233.6, 514.36, 227.95, 515.62, 217.27, 516.87, 205.34, 515.62, 202.2, 507.45, 219.16, 501.17, 234.23, 495.52, 245.54, 491.75, 255.58, 488.61, 263.12, 486.1, 273.17, 483.59, 280.71, 477.93, 283.85, 472.91, 281.96, 471.03, 278.19, 476.05, 265.0, 480.45, 254.33, 480.45, 242.4, 484.21, 227.95, 487.35, 221.67, 484.84, 218.53, 488.61, 205.34, 486.1, 202.2, 477.93, 199.69, 474.17, 198.43, 457.84, 229.21, 454.07, 254.33, 455.32, 264.38, 451.56, 268.77, 448.42, 268.77, 443.39, 263.12, 446.53, 242.4, 459.72, 208.48, 467.89, 165.15, 460.35, 134.37, 474.17, 119.3, 496.15, 109.88, 502.43, 105.48, 506.82, 90.41, 503.68, 87.27, 496.15, 84.13, 504.31, 80.36, 509.96, 77.85, 513.73, 75.34]], 'area': 9881.042499999998, 'iscrowd': 0, 
'image_id': 99026, 'bbox': [443.39, 73.45, 100.49, 210.4], 'category_id': 20, 'id': 66737}, {'segmentation': [[203.19, 36.86, 203.4, 32.91, 201.61, 26.17, 201.69, 16.6, 207.89, 16.72, 207.11, 27.33, 206.26, 32.28, 208.05, 34.38, 208.05, 36.81], [212.85, 35.74, 213.48, 26.58, 211.27, 16.37, 218.53, 16.26, 219.48, 23.42, 218.85, 26.16, 217.59, 36.06], [201.5, 13.83, 199.5, 4.46, 199.19, 0.0, 218.03, 0.0, 219.08, 4.57, 218.35, 13.41]], 'area': 470.5772, 'iscrowd': 0, 'image_id': 99026, 'bbox': [199.19, 0.0, 20.29, 36.86], 'category_id': 1, 'id': 208606}, {'segmentation': [[75.23, 47.14, 74.13, 40.92, 79.62, 38.36, 76.69, 25.92, 65.35, 23.72, 69.37, 48.96, 77.06, 47.5], [57.67, 18.24, 62.42, 45.31, 64.25, 43.48, 59.5, 19.7], [55.11, 26.28, 54.74, 41.65, 58.4, 44.21, 56.57, 46.04, 51.45, 47.5, 49.99, 50.06, 61.69, 47.87, 55.84, 26.65], [49.97, 12.36, 46.95, 4.28, 45.26, 4.62, 42.23, 8.66, 42.91, 12.02, 36.51, 7.99, 41.56, 0.0, 76.91, 0.58, 77.92, 10.34, 75.22, 15.73, 77.24, 20.44, 66.13, 20.44, 63.78, 12.7, 50.98, 14.04]], 'area': 881.2295999999999, 'iscrowd': 0, 'image_id': 99026, 'bbox': [36.51, 0.0, 43.11, 50.06], 'category_id': 1, 'id': 209655}, {'segmentation': [[545.45, 99.77, 565.38, 87.68, 586.02, 87.68, 603.1, 95.5, 615.9, 106.89, 620.17, 125.39, 620.89, 145.32, 616.62, 179.48, 598.83, 209.37, 594.56, 230.0, 590.29, 252.07, 579.61, 250.64, 592.42, 194.42, 581.75, 181.61, 573.92, 177.34, 575.34, 197.98, 572.49, 204.39, 567.51, 205.81, 566.8, 201.54, 568.22, 178.77, 551.86, 186.59, 541.89, 210.08, 532.64, 232.14, 522.68, 232.14, 525.53, 221.47, 532.64, 202.96, 536.2, 183.75, 536.2, 170.23, 541.18, 150.3, 544.03, 134.64, 539.05, 109.03]], 'area': 8651.279750000002, 'iscrowd': 0, 'image_id': 99026, 'bbox': [522.68, 87.68, 98.21, 164.39], 'category_id': 20, 'id': 275887}, {'segmentation': [[294.43, 46.68, 297.72, 33.51, 294.43, 18.14, 292.96, 5.7, 295.52, 0.58, 276.5, 0.58, 274.67, 6.07, 275.77, 14.85, 273.57, 35.33, 274.67, 40.82, 267.72, 45.21, 276.13, 48.5, 
280.89, 49.6, 284.55, 28.38, 287.11, 29.12, 284.91, 38.63, 286.01, 50.33, 293.69, 49.97]], 'area': 958.73165, 'iscrowd': 0, 'image_id': 99026, 'bbox': [267.72, 0.58, 30.0, 49.75], 'category_id': 1, 'id': 1229705}, {'segmentation': [[66.42, 139.07, 30.56, 134.31, 17.63, 132.15, 14.68, 136.87, 22.89, 144.22, 36.28, 148.1, 46.56, 146.44], [147.94, 133.72, 44.58, 154.0, 111.37, 146.84, 131.24, 146.05, 148.74, 137.7]], 'area': 900.1238000000003, 'iscrowd': 0, 'image_id': 99026, 'bbox': [14.68, 132.15, 134.06, 21.85], 'category_id': 20, 'id': 1406046}, {'segmentation': [[241.13, 35.86, 241.13, 26.65, 241.13, 21.94, 241.13, 18.57, 240.9, 15.2, 240.45, 12.06, 240.45, 8.24, 239.56, 5.32, 236.19, 0.61, 258.19, 0.38, 259.09, 1.73, 258.87, 3.75, 256.62, 8.02, 255.27, 12.28, 255.27, 17.22, 256.17, 34.74, 248.76, 34.51, 249.66, 18.79, 247.86, 17.45, 245.84, 35.19]], 'area': 504.05975000000063, 'iscrowd': 0, 'image_id': 99026, 'bbox': [236.19, 0.38, 22.9, 35.48], 'category_id': 1, 'id': 1743445}]
for idx, ann in enumerate(anns):
    print(idx)
    print(ann)
    print(coco_train.annToMask(ann))

In [None]:
AnnIds = [63673, 64255, 66125, 66283, 66737, 208606, 209655, 275887, 1229705, 1406046, 1743445]
if isinstance(AnnIds, int):
    print('int')
elif len(AnnIds) == 1:
    print(1)
else:
    print('else')

In [None]:
boxes=[]
masks=[]
areas=[]
image_id_instance=[]
for ann in anns:
    print(type(ann['image_id']))
    masks.append(coco_train.annToMask(ann))
    areas.append(ann['area'])
    boxes.append([ann['bbox'][0], ann['bbox'][1], ann['bbox'][0]+ann['bbox'][2]+1e-7, ann['bbox'][1]+ann['bbox'][3]+1e-7]) #1e-7 solves float num rounding issue
    image_id_instance.append(ann['image_id'])

In [None]:
for ann in {'segmentation': [[241.13, 35.86, 241.13, 26.65, 241.13, 21.94, 241.13, 18.57, 240.9, 15.2, 240.45, 12.06, 240.45, 8.24, 239.56, 5.32, 236.19, 0.61, 258.19, 0.38, 259.09, 1.73, 258.87, 3.75, 256.62, 8.02, 255.27, 12.28, 255.27, 17.22, 256.17, 34.74, 248.76, 34.51, 249.66, 18.79, 247.86, 17.45, 245.84, 35.19]], 'area': 504.05975000000063, 'iscrowd': 0, 'image_id': 99026, 'bbox': [236.19, 0.38, 22.9, 35.48], 'category_id': 1, 'id': 1743445}:
    print(ann)
    coco_train.annToMask(ann)

In [None]:
coco_train.annToMask('segmentation')

In [None]:
coco_train.imgs[49]

In [None]:
last = {'segmentation': [[241.13, 35.86, 241.13, 26.65, 241.13, 21.94, 241.13, 18.57, 240.9, 15.2, 240.45, 12.06, 240.45, 8.24, 239.56, 5.32, 236.19, 0.61, 258.19, 0.38, 259.09, 1.73, 258.87, 3.75, 256.62, 8.02, 255.27, 12.28, 255.27, 17.22, 256.17, 34.74, 248.76, 34.51, 249.66, 18.79, 247.86, 17.45, 245.84, 35.19]], 'area': 504.05975000000063, 'iscrowd': 0, 'image_id': 99026, 'bbox': [236.19, 0.38, 22.9, 35.48], 'category_id': 1, 'id': 1743445}
last['image_id']

In [None]:
coco_train.imgs[99062]

In [None]:
type(coco_train.imgs)

In [None]:
type(last)

In [None]:
coco_train.info()

In [None]:
len([37909, 50503, 126550, 1898727])

In [None]:
anns = coco_train.loadAnns([37909, 50503, 126550, 1898727])
ann=anns[-1]
coco_train.annToMask(ann)