# 🦠 Sartorius - Torch Mask R-CNN
### A self-contained, Torch Mask R-CNN implementation

Adapted from https://www.kaggle.com/julian3833/sartorius-starter-torch-mask-r-cnn-lb-0-202

Main differences to Julian's notebook: 
 - use 3 classes for model training
 - use different thresholds for each class
 - use IOUmAP score to select best model

### Changelog


| Version | Comments | Validation | LB |
| --- | --- | --- | --- |
|51| use CV2 for image processing, set random state in train_test_split | 0.275 | 0.278 |
|48| fix combine_masks mistake | 0.267 | 0.291 |
|46| revert cutoffs to V43 | 0.247 | 0.288 |
|45| update cutoffs | 0.242 | 0.281 |
|43| update cutoffs | 0.249 | 0.29 |
|42| BOX_DETECTIONS_PER_IMG = 540 (from Julian's notebook) | 0.245 | 0.281 |
|40| BOX_DETECTIONS_PER_IMG = 450 | 0.245 | 0.28 |
|39| use different thresholds for each class | 0.242 | 0.279|
|37| use cell_type as class labels, use best validation epoch using IOU score | 0.241 | 0.274 |
|28| use cell_type as class labels, use best validation epoch | | 0.265 |
|26| same as V 16, select correct best model (best_epoch+1) | | 0.274 |
|16| with `MIN_SCORE=0.5`, use best validation epoch (19) | | 0.263 |
|11| 30 epochs, use best validation (17) | | 0.203 |
|5| 10 epochs, Adam optimizer | | 0.135 | 
|1| 8 epochs. With Scheduler. | | 0.197 | 

[Julian's](https://www.kaggle.com/julian3833/sartorius-starter-torch-mask-r-cnn-lb-0-202) log:

| Version | Comments | LB |
| --- | --- | --- |
|30| Version 18 with `MIN_SCORE=0.5`. Remove validation. | `0.273` |
|28| V27 but pick best epoch using mask-only validation loss. 18 epochs. | `0.205` |
|27| V18 + 7.5% validation (`PCT_IMAGES_VALIDATION`) w/best epoch for pred. Added `BOX_DETECTIONS_PER_IMG` and `MIN_SCORE` but not used yet. | `0.178` |
|24| 8 epochs. With Scheduler. | `0.195` |
|23| 8 epochs. Mask loss only. | `0.036` |
|22| 8 epochs. Normalize. (7 epochs = `0.189`) | `0.202` |
|19| 3 epochs size 25%. 3 epochs size 50%. 6 epochs full sized | `0.178` |
|18| 8 epochs. Full sized. Tidied-up code. | `0.202` |
|15| 12 -> 15 epochs. Setup classification head with classes. Bugfix in `analyze_train_sample` | `0.172` |
| *14* | *12 epochs. Full sized* | `0.173` |
| 8 | 12 epochs. Resize to (256, 256) | `0.057` |



## Imports

In [None]:
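# No external dependencies (only the model weights)
# The model itself relies only on PyTorch and torchvision
# Import the standard-library and third-party packages used throughout the notebook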
import os
import random
import time
import collections
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import cv2
from sklearn.model_selection import train_test_split
from tqdm.notebook import tqdm

import torch
import torchvision
from torchvision.transforms import ToPILImage
from torchvision.transforms import functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

In [None]:
from collections import Counter

In [None]:
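# Fix randomness
# Seed every random number generator used by Python, NumPy and PyTorch
def fix_all_seeds(seed):
    np.random.seed(seed)
    random.seed(seed)            # seed Python's built-in random module
    os.environ['PYTHONHASHSEED'] = str(seed)       # hash seed; only takes effect for Python processes started afterwards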
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
    
fix_all_seeds(2021)

## Configuration

In [None]:
# If True, only the first 5000 rows of train.csv are read (quick test run)
TEST = False

if os.path.exists("../input/sartorius-cell-instance-segmentation"):     # the Kaggle input directory exists
    # running on kaggle
    data_directory = '../input/sartorius-cell-instance-segmentation'     # directory that holds the competition data
    DEVICE = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')  # use the GPU when available, otherwise the CPU
    BATCH_SIZE = 2   # the loss is averaged over two images per step, which smooths the gradient updates
    NUM_EPOCHS = 2   # number of training epochs

elif 'google.colab' in str(get_ipython()):
    # running on CoLab
    from google.colab import drive
    drive.mount('/content/drive')
    data_directory = '/content/drive/MyDrive/input'
    DEVICE = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    BATCH_SIZE = 1
    NUM_EPOCHS = 5
    
else:
    data_directory = 'input'
    DEVICE = torch.device('cpu')
    BATCH_SIZE = 2
    NUM_EPOCHS = 1
    TEST = True
# The three branches above select the data directory, device, batch size and number of epochs for a Kaggle, Colab or local run

TRAIN_CSV = f"{data_directory}/train.csv"   # annotation table (one row per annotated cell)
TRAIN_PATH = f"{data_directory}/train"    # directory with the training images
TEST_PATH = f"{data_directory}/test"      # directory with the test images

# Width and height of the original images in pixels; masks are decoded at this size
WIDTH = 704
HEIGHT = 520

# Scale factor applied to every image and mask before it reaches the model
resize_factor = 0.5 # 0.5 halves both width and height; set to False to keep the original size

# Normalize to resnet mean and std if True.
NORMALIZE = False    # normalization is switched off by default
RESNET_MEAN = (0.485, 0.456, 0.406)
RESNET_STD = (0.229, 0.224, 0.225)     # per-channel standard deviations

# No changes tried with the optimizer yet.
MOMENTUM = 0.9   # momentum term for the optimizer
# Momentum carries over a fraction of the previous update, like inertia, which helps the descent avoid getting stuck in poor local minima
LEARNING_RATE = 0.001  # step size of each gradient update
WEIGHT_DECAY = 0.01

# Changes the confidence required for a pixel to be kept for a mask. 
# Only used 0.5 till now.
# MASK_THRESHOLD = 0.5
# MIN_SCORE = 0.5
# cell type specific thresholds
cell_type_dict = {"astro": 1, "cort": 2, "shsy5y": 3}     # cell type name -> class id
mask_threshold_dict = {1: 0.6, 2: 0.8, 3:  0.7}     # per-class pixel probability threshold
min_score_dict = {1: 0.6, 2: 0.8, 3: 0.7}     # per-class minimum detection score

# Use a StepLR scheduler if True. 
USE_SCHEDULER = False

PCT_IMAGES_VALIDATION = 0.075

BOX_DETECTIONS_PER_IMG = 540
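
The class-specific thresholds can be read as two filters: an instance is kept only if its detection score exceeds the minimum score of its class, and a pixel belongs to the instance only if its probability exceeds the mask threshold of that class. The following is a minimal sketch of that idea on toy data; the names `min_score_by_class`, `mask_threshold_by_class` and `keep_and_binarize` are illustrative and not part of the notebook.

```python
import numpy as np

# Thresholds copied from the configuration above (class id -> threshold)
min_score_by_class = {1: 0.6, 2: 0.8, 3: 0.7}
mask_threshold_by_class = {1: 0.6, 2: 0.8, 3: 0.7}

def keep_and_binarize(label, score, mask_probs):
    """Return a boolean mask for a kept instance, or None if its score is too low."""
    if score <= min_score_by_class[label]:
        return None
    return mask_probs > mask_threshold_by_class[label]

# Toy 2x2 probability map (hypothetical values)
toy_probs = np.array([[0.65, 0.75],
                      [0.85, 0.10]])
astro = keep_and_binarize(label=1, score=0.65, mask_probs=toy_probs)  # kept: 0.65 > 0.6
cort = keep_and_binarize(label=2, score=0.65, mask_probs=toy_probs)   # dropped: 0.65 <= 0.8
```

The same score passes for class 1 and fails for class 2, which is the effect of using one threshold per cell type instead of a single global value.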

## Utilities

In [None]:
# ref: https://www.kaggle.com/inversion/run-length-decoding-quick-start
def rle_decode(mask_rle, shape, color=1):
    '''
    mask_rle: run-length as string formatted (start length)
    shape: (height, width, channels) of array to return
    color: color for the mask
    Returns numpy array (mask)

    '''
    s = mask_rle.split()

    starts = list(map(lambda x: int(x) - 1, s[0::2]))
    lengths = list(map(int, s[1::2]))
    ends = [x + y for x, y in zip(starts, lengths)]
    if len(shape)==3:
        img = np.zeros((shape[0] * shape[1], shape[2]), dtype=np.float32)
    else:
        img = np.zeros(shape[0] * shape[1], dtype=np.float32)
    for start, end in zip(starts, ends):
        img[start : end] = color

    return img.reshape(shape)   # reshape the flat array back to the requested image shape


def rle_encoding(x):
    dots = np.where(x.flatten() == 1)[0]
    run_lengths = []
    prev = -2
    for b in dots:
        if (b>prev+1): run_lengths.extend((b + 1, 0))
        run_lengths[-1] += 1
        prev = b
    return ' '.join(map(str, run_lengths))


def remove_overlapping_pixels(mask, other_masks):
    for other_mask in other_masks:
        if np.sum(np.logical_and(mask, other_mask)) > 0:
            mask[np.logical_and(mask, other_mask)] = 0
    return mask

def combine_masks(masks, mask_threshold):
    """
    combine masks into one image
    """
    maskimg = np.zeros((HEIGHT, WIDTH))
    # print(len(masks.shape), masks.shape)
    for m, mask in enumerate(masks,1):
        maskimg[mask>mask_threshold] = m
    return maskimg


def get_filtered_masks(pred):
    """
    filter masks using min_score_dict for instance scores and mask_threshold_dict for pixel probabilities (both per class)
    """
    use_masks = []   
    for i, mask in enumerate(pred["masks"]):                                   

        # Filter out low-scoring instances using the class-specific minimum score
        scr = pred["scores"][i].cpu().item()
        label = pred["labels"][i].cpu().item()
        if scr > min_score_dict[label]:
            mask = mask.cpu().numpy().squeeze()
            # Keep only highly likely pixels
            binary_mask = mask > mask_threshold_dict[label]
            binary_mask = remove_overlapping_pixels(binary_mask, use_masks)
            use_masks.append(binary_mask)

    return use_masks
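
As a quick illustration of the run-length format handled by `rle_decode` and `rle_encoding`, the sketch below decodes a toy string and encodes it back. It assumes the same convention as the functions above (1-indexed start positions, row-major pixel order); the helper names `rle_decode_sketch` and `rle_encode_sketch` and the toy string are made up for this example.

```python
import numpy as np

def rle_decode_sketch(mask_rle, shape):
    """Decode 'start length start length ...' (1-indexed, row-major) into a 0/1 mask."""
    tokens = mask_rle.split()
    starts = [int(t) - 1 for t in tokens[0::2]]   # convert to 0-indexed positions
    lengths = [int(t) for t in tokens[1::2]]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape)

def rle_encode_sketch(mask):
    """Encode a 0/1 mask back into the same run-length string."""
    flat = mask.flatten()
    padded = np.concatenate([[0], flat, [0]])
    # Positions where the value changes mark the start and end of each run
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

toy_rle = "2 3 9 2"                       # two runs: pixels 2-4 and pixels 9-10
toy_mask = rle_decode_sketch(toy_rle, (3, 4))
round_trip = rle_encode_sketch(toy_mask)  # should reproduce toy_rle
```

A round trip like this is a cheap sanity check before writing a submission file, since an off-by-one in the start index silently shifts every mask.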


### Metric: mean of the precision values at each IoU threshold

Ref: https://www.kaggle.com/theoviel/competition-metric-map-iou
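
To make the heading concrete, here is a minimal sketch of a per-image score of this kind, assuming precision at one threshold is defined as TP / (TP + FP + FN) and the thresholds run from 0.5 to 0.95 in steps of 0.05. That definition is an assumption based on the referenced notebook, the function names are hypothetical, and the code in the cells below computes box-level AP rather than this quantity.

```python
import numpy as np

def precision_at(iou_matrix, threshold):
    """TP / (TP + FP + FN) for one image at one IoU threshold.

    iou_matrix[i, j] is the IoU between ground-truth object i and prediction j.
    """
    matches = iou_matrix > threshold
    tp = int(np.sum(matches.sum(axis=1) >= 1))   # ground truths hit by some prediction
    fn = int(np.sum(matches.sum(axis=1) == 0))   # ground truths that were missed
    fp = int(np.sum(matches.sum(axis=0) == 0))   # predictions that match nothing
    total = tp + fp + fn
    return tp / total if total else 0.0

def mean_precision_over_thresholds(iou_matrix):
    thresholds = np.linspace(0.5, 0.95, 10)      # 0.50, 0.55, ..., 0.95
    return float(np.mean([precision_at(iou_matrix, t) for t in thresholds]))

# Toy case: 2 ground-truth cells, 3 predictions (the third overlaps nothing)
toy_iou = np.array([[0.92, 0.0, 0.0],
                    [0.0, 0.62, 0.0]])
toy_score = mean_precision_over_thresholds(toy_iou)
```

In the toy case both cells are found at the loosest thresholds, only the well-localized one survives the stricter ones, and none survives 0.95, so the score rewards tight masks rather than mere detection.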

In [None]:
def apply_nms(orig_prediction, iou_thresh):
    """
    Applies non-max suppression, dropping boxes that overlap a higher-scoring box.

      Args:
        orig_prediction: the model output. A dictionary containing element scores and boxes.
        iou_thresh: Intersection over Union threshold. Every bbox prediction with an IoU greater than this value
                      gets deleted in NMS.

      Returns:
        final_prediction: Resulting prediction
    """

    # torchvision returns the indices of the bboxes to keep
    keep = torchvision.ops.nms(orig_prediction['boxes'], orig_prediction['scores'], iou_thresh)

    # Keep indices from nms
    final_prediction = orig_prediction
    final_prediction['boxes'] = final_prediction['boxes'][keep]
    final_prediction['scores'] = final_prediction['scores'][keep]
    final_prediction['labels'] = final_prediction['labels'][keep]

    return final_prediction

def IOU(box1, box2):

    # Coordinates of the intersection rectangle
    xmin_inter = max(box1[0], box2[0])
    ymin_inter = max(box1[1], box2[1])
    xmax_inter = min(box1[2], box2[2])
    ymax_inter = min(box1[3], box2[3])

    # Area of the intersection; the +1 treats the coordinates as inclusive pixel indices
    inter_area = max(0, xmax_inter - xmin_inter + 1) * max(0, ymax_inter - ymin_inter + 1)

    # Area of each box
    area1 = (box1[2] - box1[0] + 1) * (box1[3] - box1[1] + 1)
    area2 = (box2[2] - box2[0] + 1) * (box2[3] - box2[1] + 1)

    # Intersection over union
    iou = inter_area / float(area1 + area2 - inter_area)
    assert iou >= 0
    return iou

def compute_AP(ground_truth, predictions, iou_thresh=0.5, n_classes=3,score_thred=0.5):
    
    """
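    Compute the average precision (AP), averaged over all classes.

    Args:
        ground_truth: list of ground-truth objects. Needs to have the following format: [sequence, frame, obj, [xmin, ymin, xmax, ymax], label, score]
        predictions: list of predicted objects. Needs to have the following format: [sequence, frame, obj, [xmin, ymin, xmax, ymax], label, score]
        iou_thresh: minimum IoU with a ground-truth box for a prediction to count as correct.
        n_classes: number of existing classes
    Returns:
        Average precision for the specified threshold.
    """
    # Per-class AP values
    APs = []

    # Compute the AP of each class
    # NOTE: this loop visits labels 0..n_classes-1, whereas cell_type_dict uses labels 1..3
    for c in range(n_classes):
        # Start from empty lists for every class so boxes of earlier classes are not counted again
        class_gt = []
        class_predictions = []
        # Find gt and predictions of the class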
        for gt in ground_truth:
            if gt[4] == c:
                class_gt.append(gt)
        for predict in predictions:
            if predict[4] == c:
                class_predictions.append(predict)

        # For each image, create an array of zeros with one entry per ground-truth box
        gt_amount_bb = Counter([gt[1] for gt in class_gt])
        for key, val in gt_amount_bb.items():
            gt_amount_bb[key] = np.zeros(val)

        # Sort the class predictions by score, highest first
        class_predictions = sorted(class_predictions, key=lambda x: x[5], reverse=True)

        # Arrays that flag each prediction as a true or false positive
        TP = np.zeros(len(class_predictions))
        FP = np.zeros(len(class_predictions))
        # Number of true boxes
        truth = len(class_gt)

        # Small constant that avoids division by zero
        epsilon = 1e-6

        # Go through the predictions and compare each one with the ground truths of its image
        for predict_idx, prediction in enumerate(class_predictions):
            # Filter prediction image ground truths
            image_gt = [obj for obj in class_gt if obj[1] == prediction[1]]

            # Track the best-matching ground truth for this prediction
            best_iou = -1
            best_gt_iou_idx = -1

            # Loop over the ground truths of the image and compute the IoUs
            for gt_idx, gt in enumerate(image_gt):
                iou = IOU(prediction[3], gt[3])
                if iou > best_iou:
                    best_iou = iou
                    best_gt_iou_idx = gt_idx

            # If the best IoU exceeds the threshold, the prediction is a candidate true positive
            if best_iou > iou_thresh and best_gt_iou_idx > -1:
                # Check if gt box was already covered
                if  gt_amount_bb[prediction[1]][best_gt_iou_idx] == 0:
                    gt_amount_bb[prediction[1]][best_gt_iou_idx] = 1  # set as covered
                    TP[predict_idx] = 1  # Count as true positive
                else:
                    FP[predict_idx] = 1
            else:
                FP[predict_idx] = 1

        # Compute recall and precision
        TP_cumsum = np.cumsum(TP)
        FP_cumsum = np.cumsum(FP)
        recall = np.append([0], TP_cumsum / (truth + epsilon))
        precision = np.append([1], np.divide(TP_cumsum, (TP_cumsum + FP_cumsum + epsilon)))

        # Integrate precision over recall (area under the curve) and store the class AP
        APs.append(np.trapz(precision, recall))

    return sum(APs)/len(APs) # average of class precisions


def compute_mAP(ground_truth, predictions, n_classes):
    """
    Calls AP computation for different levels of IoUs, [0.5:.05:0.95].

    Args:
        ground_truth: list with ground-truth objects. Needs to have the following format: [sequence, frame, obj, [xmin, ymin, xmax, ymax], label, score]
        predictions: list with predictions objects. Needs to have the following format: [sequence, frame, obj, [xmin, ymin, xmax, ymax], label, score]
        n_classes: number of existent classes.

    Returns:
        mAP and list with APs for each IoU threshold.
    """
    score_thresh = 0.8
    APs = []

    # IoU thresholds 0.50, 0.55, ..., 0.95; np.arange(0.5, 0.95, 0.05) would stop at 0.90
    for iou_thresh in np.linspace(0.5, 0.95, 10):
        APs.append(compute_AP(ground_truth, predictions, iou_thresh, n_classes, score_thresh))

    return np.mean(APs), APs


In [None]:
"""
Author: Roman Solovyev, IPPM RAS
URL: https://github.com/ZFTurbo

Code based on: https://github.com/fizyr/keras-retinanet/blob/master/keras_retinanet/utils/eval.py
"""

import numpy as np
import pandas as pd
# try:
#     import pyximport
#     pyximport.install(setup_args={"include_dirs": np.get_include()}, reload_support=False)
#     from .compute_overlap import compute_overlap
# except:
#     print("Couldn't import fast version of function compute_overlap, will use slow one. Check cython installation")
#     from .compute_overlap_slow import compute_overlap
def compute_overlap(boxes, query_boxes):
    """
    Args
        boxes: (N, 4) ndarray of float
        query_boxes: (K, 4) ndarray of float

    Returns
        overlaps: (N, K) ndarray of overlap between boxes and query_boxes
    """
    N = boxes.shape[0]
    K = query_boxes.shape[0]
    overlaps = np.zeros((N, K), dtype=np.float64)
    for k in range(K):
        box_area = (
            (query_boxes[k, 2] - query_boxes[k, 0]) *
            (query_boxes[k, 3] - query_boxes[k, 1])
        )
        for n in range(N):
            iw = (
                min(boxes[n, 2], query_boxes[k, 2]) -
                max(boxes[n, 0], query_boxes[k, 0])
            )
            if iw > 0:
                ih = (
                    min(boxes[n, 3], query_boxes[k, 3]) -
                    max(boxes[n, 1], query_boxes[k, 1])
                )
                if ih > 0:
                    ua = np.float64(
                        (boxes[n, 2] - boxes[n, 0]) *
                        (boxes[n, 3] - boxes[n, 1]) +
                        box_area - iw * ih
                    )
                    overlaps[n, k] = iw * ih / ua
    return overlaps

def get_real_annotations(table):
    res = dict()
    ids = table['ImageID'].values.astype(str)  # np.str was removed in NumPy 1.24; the builtin str is equivalent
    labels = table['LabelName'].values.astype(str)
    xmin = table['XMin'].values.astype(np.float32)
    xmax = table['XMax'].values.astype(np.float32)
    ymin = table['YMin'].values.astype(np.float32)
    ymax = table['YMax'].values.astype(np.float32)

    for i in range(len(ids)):
        id = ids[i]
        label = labels[i]
        if id not in res:
            res[id] = dict()
        if label not in res[id]:
            res[id][label] = []
        box = [xmin[i], ymin[i], xmax[i], ymax[i]]
        res[id][label].append(box)

    return res


def get_detections(table):
    res = dict()
    ids = table['ImageID'].values.astype(str)  # np.str was removed in NumPy 1.24; the builtin str is equivalent
    labels = table['LabelName'].values.astype(str)
    scores = table['Conf'].values.astype(np.float32)
    xmin = table['XMin'].values.astype(np.float32)
    xmax = table['XMax'].values.astype(np.float32)
    ymin = table['YMin'].values.astype(np.float32)
    ymax = table['YMax'].values.astype(np.float32)

    for i in range(len(ids)):
        id = ids[i]
        label = labels[i]
        if id not in res:
            res[id] = dict()
        if label not in res[id]:
            res[id][label] = []
        box = [xmin[i], ymin[i], xmax[i], ymax[i], scores[i]]
        res[id][label].append(box)

    return res


def _compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves.

    Code originally from https://github.com/rbgirshick/py-faster-rcnn.

    # Arguments
        recall:    The recall curve (list).
        precision: The precision curve (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # correct AP calculation
    # first append sentinel values at the end
    mrec = np.concatenate(([0.], recall, [1.]))
    mpre = np.concatenate(([0.], precision, [0.]))

    # compute the precision envelope
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    i = np.where(mrec[1:] != mrec[:-1])[0]

    # and sum (\Delta recall) * prec
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap


def mean_average_precision_for_boxes(ann, pred, iou_threshold=0.5, exclude_not_in_annotations=False, verbose=True):
    """

    :param ann: path to CSV-file with annotations or numpy array of shape (N, 6)
    :param pred: path to CSV-file with predictions (detections) or numpy array of shape (N, 7)
    :param iou_threshold: IoU between boxes which count as 'match'. Default: 0.5
    :param exclude_not_in_annotations: exclude image IDs that do not exist in the annotations. Default: False
    :param verbose: print detailed run info. Default: True
    :return: tuple, where the first value is mAP and the second value is a dict with AP for each class.
    """
    
    if isinstance(ann, str):
        valid = pd.read_csv(ann)
    else:
        valid = pd.DataFrame(ann, columns=['ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax'])

    if isinstance(pred, str):
        preds = pd.read_csv(pred)
    else:
        preds = pd.DataFrame(pred, columns=['ImageID', 'LabelName', 'Conf', 'XMin', 'XMax', 'YMin', 'YMax'])

    ann_unique = valid['ImageID'].unique()
    preds_unique = preds['ImageID'].unique()

    if verbose:
        print('Number of files in annotations: {}'.format(len(ann_unique)))
        print('Number of files in predictions: {}'.format(len(preds_unique)))

    # Exclude files not in annotations!
    if exclude_not_in_annotations:
        preds = preds[preds['ImageID'].isin(ann_unique)]
        preds_unique = preds['ImageID'].unique()
        if verbose:
            print('Number of files in detection after reduction: {}'.format(len(preds_unique)))

    unique_classes = valid['LabelName'].unique().astype(str)
    if verbose:
        print('Unique classes: {}'.format(len(unique_classes)))

    all_detections = get_detections(preds)
    all_annotations = get_real_annotations(valid)
    if verbose:
        print('Detections length: {}'.format(len(all_detections)))
        print('Annotations length: {}'.format(len(all_annotations)))

    average_precisions = {}
    for zz, label in enumerate(sorted(unique_classes)):

        # Negative class
        if str(label) == 'nan':
            continue

        false_positives = []
        true_positives = []
        scores = []
        num_annotations = 0.0

        for i in range(len(ann_unique)):
#             print('iter_{} '.format(i))
            detections = []
            annotations = []
            id = str(ann_unique[i])
            if id in all_detections:
                if label in all_detections[id]:
                    detections = all_detections[id][label]
            if id in all_annotations:
                if label in all_annotations[id]:
                    annotations = all_annotations[id][label]

            if len(detections) == 0 and len(annotations) == 0:
                continue

            num_annotations += len(annotations)
            detected_annotations = []

            annotations = np.array(annotations, dtype=np.float64)
            for d in detections:
                scores.append(d[4])

                if len(annotations) == 0:
                    false_positives.append(1)
                    true_positives.append(0)
                    continue

                overlaps = compute_overlap(np.expand_dims(np.array(d, dtype=np.float64), axis=0), annotations)
                assigned_annotation = np.argmax(overlaps, axis=1)
                max_overlap = overlaps[0, assigned_annotation]

                if max_overlap >= iou_threshold and assigned_annotation not in detected_annotations:
                    false_positives.append(0)
                    true_positives.append(1)
                    detected_annotations.append(assigned_annotation)
                else:
                    false_positives.append(1)
                    true_positives.append(0)

        if num_annotations == 0:
            average_precisions[label] = 0, 0
            continue

        false_positives = np.array(false_positives)
        true_positives = np.array(true_positives)
        scores = np.array(scores)
#         print('ddd1 '*10)
        # sort by score
        indices = np.argsort(-scores)
        false_positives = false_positives[indices]
        true_positives = true_positives[indices]
#         print('ddd2 '*10)
        # compute false positives and true positives
        false_positives = np.cumsum(false_positives)
        true_positives = np.cumsum(true_positives)
#         print('ddd3 '*10)
        # compute recall and precision
        recall = true_positives / num_annotations
        precision = true_positives / np.maximum(true_positives + false_positives, np.finfo(np.float64).eps)
#         print('ddd4 '*10)
        # compute average precision
        average_precision = _compute_ap(recall, precision)
        average_precisions[label] = average_precision, num_annotations
        if verbose:
            s1 = "{:30s} | {:.6f} | {:7d}".format(label, average_precision, int(num_annotations))
            print(s1)

    present_classes = 0
    precision = 0
    for label, (average_precision, num_annotations) in average_precisions.items():
        if num_annotations > 0:
            present_classes += 1
            precision += average_precision
    mean_ap = precision / present_classes
    if verbose:
        print('mAP: {:.6f}'.format(mean_ap))
    return mean_ap, average_precisions


In [None]:
@torch.no_grad()  # gradients are not needed during evaluation
def evaluate(model, data_loader, device, sequences=1):
   
    # Set evaluation mode flag
    model.eval()
    # Create list with all object detection -> [set, frame, obj, [xmin,ymin,xmax,ymax], label, score]
    ground_truth = []
    predictions = []

    # Gather all targets and outputs on test set
    for image, targets in data_loader:
        image = [img.to(device) for img in image]
        outputs = model(image)
        for idx in range(len(outputs)):
            outputs[idx] = apply_nms(outputs[idx], iou_thresh=0.3)

        # create list for targets and outputs to pass to compute_mAP()
        # lists have the following structure:  [sequence, frame, obj_idx, [xmin, ymin, xmax, ymax], label, score]
        for s in range(sequences):
            obj_gt = 0
            obj_target = 0
            for out, target in zip(outputs, targets):

                for i in range(len(target['boxes'])):
                    ground_truth.append([s, target['image_id'].detach().cpu().numpy()[0], obj_target,
                                         target['boxes'].detach().cpu().numpy()[i],
                                         target['labels'].detach().cpu().numpy()[i], 1])
                    obj_target += 1

                for j in range(len(out['boxes'])):
                    predictions.append([s, target['image_id'].detach().cpu().numpy()[0], obj_gt,
                                        out['boxes'].detach().cpu().numpy()[j],
                                        out['labels'].detach().cpu().numpy()[j],
                                        out['scores'].detach().cpu().numpy()[j]])
                    obj_gt += 1
#     print(ground_truth[:5])
#     print(predictions[:5])
    ann=[]
    pre=[]
    for gt_info in ground_truth:
        se,imageid,_,box,label_indx,confi=gt_info
        box_XMin,box_YMin,box_XMax,box_YMax=box.tolist()
        ann.append([imageid,str(label_indx),box_XMin,box_XMax,box_YMin,box_YMax])
    for predict_info in predictions:
        se,imageid,_,box,label_indx,confi=predict_info
        box_XMin,box_YMin,box_XMax,box_YMax=box.tolist()
        pre.append([imageid,str(label_indx),confi,box_XMin,box_XMax,box_YMin,box_YMax])
#     print('$$'*10)
#     print(ann)
#     print('$$'*10)
#     print(pre)
    mAP, AP = mean_average_precision_for_boxes(ann, pre,iou_threshold=0.3)
#     print('*'*10)
    print("mAP:{:.3f}".format(mAP))
    print("AP:{}".format(AP))#AP is dict with AP for each class
   # for ap_metric, iou in zip(AP, np.arange(0.0, 1, 0.1)):
     #    print("\tAP at IoU level [{:.2f}]: {:.3f}".format(iou, ap_metric))

    return mAP, AP
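
The AP computation used by `mean_average_precision_for_boxes` can be followed on a tiny example. The sketch below repeats the steps of `_compute_ap` (cumulative TP/FP counts, precision envelope, sum over recall steps) for detections that are already sorted by score and flagged as true or false positives; the function name `average_precision_from_flags` and the toy flags are hypothetical.

```python
import numpy as np

def average_precision_from_flags(is_true_positive, num_ground_truth):
    """AP for detections sorted by descending score, flagged 1 (TP) or 0 (FP)."""
    flags = np.asarray(is_true_positive, dtype=float)
    tp = np.cumsum(flags)
    fp = np.cumsum(1.0 - flags)
    recall = tp / num_ground_truth
    precision = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)

    # Sentinel values at both ends, then the monotone precision envelope
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])

    # Sum precision over the points where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Three detections (TP, FP, TP) against two ground-truth boxes
toy_ap = average_precision_from_flags([1, 0, 1], num_ground_truth=2)
```

Here recall reaches 0.5 at precision 1.0 and 1.0 at precision 2/3, so the AP is 0.5 * 1.0 + 0.5 * 2/3 = 5/6; the false positive in the middle costs precision only for the second half of the recall range.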

### Transformations
Only horizontal and vertical flips for now.

Normalization to ResNet's mean and std can be enabled with the `NORMALIZE` parameter in the configuration cell.

The first 3 transformations come from [this](https://www.kaggle.com/abhishek/maskrcnn-utils) utils package by Abhishek; `VerticalFlip` is my adaptation of `HorizontalFlip`, and `Normalize` is my own.
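
When an image is flipped, its boxes have to be mirrored with it: for a horizontal flip the new `xmin` is `width - xmax` and the new `xmax` is `width - xmin`. The following NumPy sketch shows that relationship on a toy image; it is only an illustration (the notebook's transforms operate on torch tensors), and `hflip_with_boxes` is a hypothetical helper.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Flip an (H, W) image left-right and mirror [xmin, ymin, xmax, ymax] boxes."""
    width = image.shape[-1]
    flipped = image[..., ::-1].copy()
    out = boxes.copy()
    # Swap and mirror the x coordinates so that xmin stays smaller than xmax
    out[:, [0, 2]] = width - boxes[:, [2, 0]]
    return flipped, out

toy_image = np.zeros((4, 6), dtype=np.uint8)
toy_image[1:3, 0:2] = 1                        # object in columns 0-1, rows 1-2
toy_boxes = np.array([[0.0, 1.0, 2.0, 3.0]])   # box edges around that object
flipped_image, flipped_boxes = hflip_with_boxes(toy_image, toy_boxes)
```

After the flip the object sits in columns 4-5 and the box becomes `[4, 1, 6, 3]`, so image and target stay aligned. A vertical flip works the same way on the y coordinates with `height`.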

In [None]:
# These are slight redefinitions of torchvision.transforms classes
# The difference is that they handle the target (boxes, and optionally masks) together with the image
# Copied from Abhishek, added new ones
class Compose:
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

class VerticalFlip:               # vertical flip
    def __init__(self, prob):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[-2:]
            image = image.flip(-2)
            bbox = target["boxes"]
            bbox[:, [1, 3]] = height - bbox[:, [3, 1]]
            target["boxes"] = bbox
           # target["masks"] = target["masks"].flip(-2)   # disabled: CellDataset (next cell) no longer returns masks, so the target has no "masks" key
        return image, target

class HorizontalFlip:         # horizontal flip
    def __init__(self, prob):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[-2:]
            image = image.flip(-1)
            bbox = target["boxes"]
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
            target["boxes"] = bbox
           # target["masks"] = target["masks"].flip(-1)   # disabled: CellDataset (next cell) no longer returns masks, so the target has no "masks" key
        return image, target

class Normalize:
    def __call__(self, image, target):
        image = F.normalize(image, RESNET_MEAN, RESNET_STD)
        return image, target

class ToTensor:
    def __call__(self, image, target):
        image = F.to_tensor(image)
        return image, target
    

def get_transform(train):
    transforms = [ToTensor()]
    if NORMALIZE:
        transforms.append(Normalize())
    
    # Data augmentation, applied to the training set only
    if train: 
        transforms.append(HorizontalFlip(0.5))
        transforms.append(VerticalFlip(0.5))

    return Compose(transforms)

## Training Dataset and DataLoader

In [None]:
cell_type_dict = {"astro": 1, "cort": 2, "shsy5y": 3}

class CellDataset(Dataset):
    def __init__(self, image_dir, df, transforms=None, resize=False):
        self.transforms = transforms
        self.image_dir = image_dir
        self.df = df
        
        self.should_resize = resize is not False
        if self.should_resize:
            self.height = int(HEIGHT * resize)
            self.width = int(WIDTH * resize)
            print("image size used:", self.height, self.width)
        else:
            self.height = HEIGHT
            self.width = WIDTH
        
        self.image_info = collections.defaultdict(dict)
        temp_df = self.df.groupby(["id", "cell_type"])['annotation'].agg(lambda x: list(x)).reset_index()
        for index, row in temp_df.iterrows():
            self.image_info[index] = {
                    'image_id': row['id'],
                    'image_path': os.path.join(self.image_dir, row['id'] + '.png'),
                    'annotations': list(row["annotation"]),
                    'cell_type': cell_type_dict[row["cell_type"]]
                    }
            
    def get_box(self, a_mask):
        ''' Get the bounding box of a given mask '''
        pos = np.where(a_mask)
        xmin = np.min(pos[1])
        xmax = np.max(pos[1])
        ymin = np.min(pos[0])
        ymax = np.max(pos[0])
        return [xmin, ymin, xmax, ymax]

    def __getitem__(self, idx):
        ''' Get the image and the target'''
        
        img_path = self.image_info[idx]["image_path"]
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        
        if self.should_resize:
            img = cv2.resize(img, (self.width, self.height))

        info = self.image_info[idx]

        n_objects = len(info['annotations'])
        masks = np.zeros((len(info['annotations']), self.height, self.width), dtype=np.uint8)
        boxes = []
        labels = []
        for i, annotation in enumerate(info['annotations']):
            a_mask = rle_decode(annotation, (HEIGHT, WIDTH))
            
            if self.should_resize:
                a_mask = cv2.resize(a_mask, (self.width, self.height))
            
            a_mask = np.array(a_mask) > 0
            masks[i, :, :] = a_mask
            
            boxes.append(self.get_box(a_mask))

        # labels
        labels = [int(info["cell_type"]) for _ in range(n_objects)]
        #labels = [1 for _ in range(n_objects)]
        
        
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)  # built here but not returned below: only boxes are predicted

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        iscrowd = torch.zeros((n_objects,), dtype=torch.int64)

        # This is the required target for the Mask R-CNN
        target = {
            'boxes': boxes,
            'labels': labels,
          #  'masks': masks,   # the "masks" key is left out of the target, so the flip transforms above must not touch masks either
            'image_id': image_id,
            'area': area,
            'iscrowd': iscrowd
        }

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.image_info)

In [None]:
df_base = pd.read_csv(TRAIN_CSV, nrows=5000 if TEST else None)

In [None]:
df_images = df_base.groupby(["id", "cell_type"]).agg({'annotation': 'count'}).sort_values("annotation", ascending=False).reset_index()

for ct in cell_type_dict:
    ctdf = df_images[df_images["cell_type"]==ct].copy()
    if len(ctdf)>0:
        ctdf['quantiles'] = pd.qcut(ctdf['annotation'], 5)
        display(ctdf.head())

In [None]:
df_images.groupby("cell_type").annotation.describe().astype(int)

In [None]:
# We used this as a reference to fill BOX_DETECTIONS_PER_IMG=540
df_images[['annotation']].describe().astype(int)

In [None]:
# Split at the image level, stratified by cell type
df_images_train, df_images_val = train_test_split(df_images, stratify=df_images['cell_type'], 
                                                  test_size=PCT_IMAGES_VALIDATION,
                                                  random_state=1234)
df_train = df_base[df_base['id'].isin(df_images_train['id'])]
df_val = df_base[df_base['id'].isin(df_images_val['id'])]
print(f"Images in train set:           {len(df_images_train)}")
print(f"Annotations in train set:      {len(df_train)}")
print(f"Images in validation set:      {len(df_images_val)}")
print(f"Annotations in validation set: {len(df_val)}")
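
The split above is done per image, stratified by cell type, and only afterwards are the annotations assigned to whichever side their image landed on, so no image contributes annotations to both sets. As a rough illustration of what stratification does, here is a standard-library sketch; `stratified_split` is a hypothetical helper and not scikit-learn's algorithm, and the image ids and label counts are made up.

```python
import random
from collections import defaultdict

def stratified_split(ids, labels, test_size, seed=1234):
    """Split ids so every label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for id_, label in zip(ids, labels):
        by_label[label].append(id_)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = max(1, round(len(group) * test_size))
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val

# Toy data: 20 image ids with an imbalanced label distribution
ids = [f"img{i}" for i in range(20)]
labels = ["astro"] * 4 + ["cort"] * 8 + ["shsy5y"] * 8

train_ids, val_ids = stratified_split(ids, labels, test_size=0.25)
print(len(train_ids), len(val_ids))
```

Every label appears in the validation part, which a plain random split cannot guarantee for a rare class.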

In [None]:
ds_train = CellDataset(TRAIN_PATH, df_train, resize=resize_factor, transforms=get_transform(train=True))
dl_train = DataLoader(ds_train, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True,
                      num_workers=2, collate_fn=lambda x: tuple(zip(*x)))

ds_val = CellDataset(TRAIN_PATH, df_val, resize=resize_factor, transforms=get_transform(train=False))
dl_val = DataLoader(ds_val, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True,
                    num_workers=2, collate_fn=lambda x: tuple(zip(*x)))
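
Both loaders pass `collate_fn=lambda x: tuple(zip(*x))`. Each image has a different number of boxes, so the targets cannot be stacked into a single tensor; the lambda instead regroups a list of `(image, target)` pairs into one tuple of images and one tuple of targets. A small sketch with plain Python objects standing in for tensors:

```python
def collate_fn(batch):
    """Turn [(img, target), (img, target), ...] into ((img, ...), (target, ...))."""
    return tuple(zip(*batch))

# Strings stand in for image tensors; each target has a different number of boxes
batch = [
    ("img_a", {"boxes": [[0, 0, 4, 4]]}),
    ("img_b", {"boxes": [[1, 1, 3, 3], [2, 2, 5, 5]]}),
]

images, targets = collate_fn(batch)
print(images)
print([len(t["boxes"]) for t in targets])
```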

# Train model

## setup model

In [None]:
# Override the PyTorch checkpoint with an "offline" version of the file
!mkdir -p /root/.cache/torch/hub/checkpoints/
!cp ../input/cocopre/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth /root/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth

In [None]:
def get_model(num_classes, model_chkpt=None):
    # This is just a dummy value for the classification head
    
    if NORMALIZE:
#         model = torchvision.models.detection.maskrcnn_resnet50_fpn(
#             pretrained=True, box_detections_per_img=BOX_DETECTIONS_PER_IMG,
#             image_mean=RESNET_MEAN,image_std=RESNET_STD
#         )
        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(  # <-- Mask R-CNN replaced with Faster R-CNN here
            pretrained=True, box_detections_per_img=BOX_DETECTIONS_PER_IMG,
            image_mean=RESNET_MEAN,image_std=RESNET_STD
        )                                                          
    else:
#         model = torchvision.models.detection.maskrcnn_resnet50_fpn(
#             pretrained=True, box_detections_per_img=BOX_DETECTIONS_PER_IMG
#         )
        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(  # <-- Mask R-CNN replaced with Faster R-CNN here
            pretrained=True, box_detections_per_img=BOX_DETECTIONS_PER_IMG
        )                        
    
    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes+1)

    
#     # now get the number of input features for the mask classifier    
#     in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
#     hidden_layer = 256
#     # and replace the mask predictor with a new one
#     model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes+1)  # <-- not needed: Faster R-CNN has no mask branch
    
#     if model_chkpt:
#         model.load_state_dict(torch.load(model_chkpt, map_location=DEVICE))     # <-- loads saved weights from a checkpoint; the pretrained weights are loaded automatically
    return model


# ============================================================================================================
# Build the Faster R-CNN model with get_model above (the original notebook built a Mask R-CNN)
# The model does classification and bounding boxes for individual cells at the same time
# (this Faster R-CNN variant has no mask branch)
model = get_model(num_classes=len(cell_type_dict))  # num_classes is the number of cell types in cell_type_dict, i.e. 3
model.to(DEVICE)  # move the model to the CPU or GPU

# TODO: try removing this for
for param in model.parameters():  # mark every parameter as trainable
    param.requires_grad = True
    
model.train();  # put the model in training mode

In [None]:
#pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI

## Training loop!

In [None]:
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=LEARNING_RATE, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)

lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

n_batches, n_batches_val = len(dl_train), len(dl_val)

for epoch in range(1, NUM_EPOCHS + 1):  # NUM_EPOCHS was set to 30 earlier, so this loop runs 30 epochs
    print(f"Starting epoch {epoch} of {NUM_EPOCHS}")  # report the current epoch
    model.train()
    time_start = time.time()  # start time of this epoch
    loss_accum = 0.0  # running sum of the total loss
    loss_classifier_accum = 0.0  # running sum of the classification loss
    
    for batch_idx, (images, targets) in enumerate(dl_train, 1): 
    
        # Predict
        images = list(image.to(DEVICE) for image in images)  # move each image tensor to the device
        targets = [{k: v.to(DEVICE) for k, v in t.items()} for t in targets]  # move the target tensors to the device; the model then returns a dict of losses

        loss_dict = model(images, targets)
        loss = sum(loss for loss in loss_dict.values())
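        # loss_dict.values() yields every loss term in the dict; their sum is the total loss for this batch
        # The per-batch losses are accumulated below and averaged at the end of the epoch
        
        # Backprop: adjust the model according to the loss (the error between predictions and ground truth) before the next batch
        optimizer.zero_grad()  # reset the gradients
        loss.backward()  # backpropagate the loss
        optimizer.step()  # update the parameters from the gradients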
        
        # Logging
        loss_accum += loss.item()
        loss_classifier_accum += loss_dict['loss_classifier'].item()  # accumulate the classification loss
        
    #    if batch_idx % 500 == 0:  # with a large dataset, print the training losses every 500 batches
     #       print(f"[Batch {batch_idx:3d} / {n_batches:3d}] Batch train loss: {loss.item():7.3f}. Mask-only loss: {loss_mask:7.3f}.")
                        
    if USE_SCHEDULER:
        lr_scheduler.step()

    # Train losses
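    train_loss = loss_accum / n_batches  # summed batch losses divided by the number of batches: this epoch's training loss
    train_loss_classifier = loss_classifier_accum / n_batches  # same for the classification loss

    # Validation: measure the trained model on the validation set
    val_loss_accum = 0  # running sum of the validation loss
    val_loss_classifier_accum = 0  # running sum of the validation classification loss
    
    # No gradients are tracked here; the model stays in train mode so that it still returns losses.
    # The validation losses are summed and averaged in the same way as the training losses.
    with torch.no_grad():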
        for batch_idx, (images, targets) in enumerate(dl_val, 1):
            images = list(image.to(DEVICE) for image in images)
            targets = [{k: v.to(DEVICE) for k, v in t.items()} for t in targets]

            val_loss_dict = model(images, targets)
            val_batch_loss = sum(loss for loss in val_loss_dict.values())
            val_loss_accum += val_batch_loss.item()
            val_loss_classifier_accum += val_loss_dict['loss_classifier'].item()

    # Validation losses
    val_loss = val_loss_accum / n_batches_val
    val_loss_classifier = val_loss_classifier_accum / n_batches_val
    elapsed = time.time() - time_start  # time taken by one epoch of training plus validation
    
    torch.save(model.state_dict(), f"pytorch_model-e{epoch}.bin")  # save this epoch's weights
    # Prefix with the current epoch and the total number of epochs, used in the log line below
    prefix = f"[Epoch {epoch:2d} / {NUM_EPOCHS:2d}]"

    print(f"{prefix} Train loss: {train_loss:7.3f}. Val loss: {val_loss:7.3f} [{elapsed:.0f} secs]")
    
    evaluate(model,dl_val, device=DEVICE)
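
The loop above steps a `StepLR(optimizer, step_size=5, gamma=0.1)` scheduler once per epoch when `USE_SCHEDULER` is set, which multiplies the learning rate by 0.1 after every 5 scheduler steps. A plain-Python sketch of that decay, assuming a made-up base rate (the notebook's `LEARNING_RATE` is defined outside this section); `step_lr` is a hypothetical helper, not a PyTorch function:

```python
def step_lr(base_lr, steps_taken, step_size=5, gamma=0.1):
    """Learning rate after `steps_taken` scheduler steps under step decay."""
    return base_lr * gamma ** (steps_taken // step_size)

base_lr = 0.001  # assumption for illustration only

# Epoch k (1-indexed) trains after k - 1 scheduler steps have been taken
for epoch in (1, 5, 6, 10, 11, 30):
    print(f"epoch {epoch:2d}: lr = {step_lr(base_lr, epoch - 1):.1e}")
```

With 30 epochs the rate is cut five times, so the last five epochs would train at one hundred-thousandth of the base rate under these settings.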

# Analyze prediction results for train set

In [None]:
# Plots: the image, the image + the ground-truth boxes, the image + the predicted boxes

def analyze_train_sample(model, ds_train, sample_index):  # plot one training sample with its ground-truth and predicted boxes
     
    img, targets = ds_train[sample_index]  # the dataset returns the image and its targets
    #print(img.shape)  # print the targets, i.e. the numeric arrays shown below
    l = np.unique(targets["labels"])
    fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20,60), facecolor="#fefefe")
    ax[0].imshow(img.numpy().transpose((1,2,0)))
    ax[0].set_title(f"cell type {l}")   # cell type
    ax[0].axis("off")
    # First panel: the original image that inference is run on
    
    

    #img = img_tensor.cpu().data
    ax[1].imshow(img.numpy().transpose((1,2,0)))
    for i, box in enumerate(targets["boxes"]):
        xmin, ymin, xmax, ymax = box
        rect = patches.Rectangle((xmin, ymin), (xmax - xmin), (ymax - ymin), linewidth=1, edgecolor='green',
                                     facecolor='none')
        ax[1].add_patch(rect)
    #plt.show()
    
   # masks = combine_masks(targets['boxes'], 0.5)
    #plt.imshow(img.numpy().transpose((1,2,0)))
 #   ax[1].imshow(masks)
   # ax[1].set_title(f"Ground truth, {len(targets['area'])} cells")
 #   ax[1].axis("off")
    # Second panel: the original image with its ground-truth annotations drawn on top
    model.eval()
    with torch.no_grad():
        preds = model([img.to(DEVICE)])[0]
    
#     print('---------------------------')
#     print(targets)
#     print(preds)
#     print('--------------------------')
    
    
    l = pd.Series(preds['labels'].cpu().numpy()).value_counts()              
    lstr = ""
    for i in l.index:
        lstr += f"{l[i]}x{i} "
    #print(l, l.sort_values().index[-1])
    
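    ax[2].imshow(img.numpy().transpose((1,2,0)))                      # third panel: the predicted boxes
    s_boxes=[]                                                       # boxes whose score is above the threshold
    s_scores=[]                                                      # scores above the threshold
    s_classes=[]                                                     # classes of the boxes above the threshold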
    for i, box in enumerate(preds["boxes"]):
        scr = preds["scores"][i].cpu().item()
        label = preds["labels"][i].cpu().item()
        if scr > min_score_dict[label]:
            #mask = mask.cpu().numpy().squeeze()
            # Keep only highly likely pixels
            #binary_mask = mask > mask_threshold_dict[label]
            s_boxes.append(box.cpu().numpy())
            s_scores.append(scr)
            s_classes.append(label)
#             xmin, ymin, xmax, ymax = box
#             rect = patches.Rectangle((xmin, ymin), (xmax - xmin), (ymax - ymin), linewidth=1, edgecolor='red',
#                                          facecolor='none')
        
        #ax[2].add_patch(rect)
    s_boxes=torch.tensor(s_boxes)
    s_scores=torch.tensor(s_scores)
    s_classes=torch.tensor(s_classes)
#     print(s_boxes)
#     print(s_scores)
#     print(s_classes)
    thre=0.6                    # IoU threshold for NMS
    output_boxes=torchvision.ops.boxes.batched_nms(s_boxes,s_scores,s_classes,thre)               # class-aware (batched) NMS
    for index in output_boxes:
        xmin, ymin, xmax, ymax = s_boxes[index]
        rect = patches.Rectangle((xmin, ymin), (xmax - xmin), (ymax - ymin), linewidth=1, edgecolor='red',
                                         facecolor='none')
        ax[2].add_patch(rect)
    print('---------------')
    print(output_boxes)
    print('---------------')
    
    plt.show()
    
#     plt.imshow(img.cpu().numpy().transpose((1,2,0)))
#     mask_threshold = mask_threshold_dict[l.sort_values().index[-1]]
#     print(mask_threshold)
#     pred_masks = combine_masks(get_filtered_masks(preds), mask_threshold)
#     ax[2].imshow(pred_masks)
#     ax[2].set_title(f"Predictions, labels: {lstr}")
#     ax[2].axis("off")
#     plt.show() 
#     # Plot the model's inference output, i.e. the predicted result
#     print(masks.shape, pred_masks.shape)
#     score = iou_map([masks],[pred_masks])
#     print("Score:", score)    
    
    
# NOTE: It puts the model in eval mode!! Revert for re-training
analyze_train_sample(model, ds_train, 20)
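# Call analyze_train_sample; 20 is the index of the sample taken from ds_train and can be changed freely. Each index should show a different image.
# Open problem: the inference output of the Mask R-CNN model is a dict whose keys do not match those of Faster R-CNN, and it is unclear what input the remaining code expects, so that code has not been adapted yet.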

In [None]:
analyze_train_sample(model, ds_train, 102)

In [None]:
analyze_train_sample(model, ds_train, 7)
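
`analyze_train_sample` keeps the predictions that pass a per-class score threshold and then calls `torchvision.ops.boxes.batched_nms`, which suppresses overlapping boxes only within the same class. To make that behaviour concrete, here is a NumPy re-implementation of class-aware NMS; it is an illustrative sketch, not torchvision's code, and `iou_one_to_many` and `class_aware_nms` are hypothetical names.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one [xmin, ymin, xmax, ymax] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def class_aware_nms(boxes, scores, labels, iou_threshold):
    """Return kept indices, highest score first; different classes never suppress each other."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        ious = iou_one_to_many(boxes[best], boxes[rest])
        suppressed = (ious > iou_threshold) & (labels[rest] == labels[best])
        order = rest[~suppressed]
    return keep

boxes = np.array([[0, 0, 10, 10],     # class 1, best score
                  [1, 1, 11, 11],     # class 1, overlaps the first box heavily
                  [0, 0, 10, 10],     # class 2, same place as the first box
                  [50, 50, 60, 60]],  # class 1, far away
                 dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
labels = np.array([1, 1, 2, 1])

keep = class_aware_nms(boxes, scores, labels, iou_threshold=0.6)
print(keep)
```

The second box is dropped because it overlaps a higher-scoring box of the same class, while the third survives despite a perfect overlap because its class differs.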

## Get the model from the best epoch

In [None]:
# Epochs with their losses and IoU scores
val_scores = pd.DataFrame()
for e, val_loss in enumerate(validation_mask_losses):
    model_chk = f"pytorch_model-e{e+1}.bin"
    print("Loading:", model_chk)
    model = get_model(len(cell_type_dict), model_chk)
    model.load_state_dict(torch.load(model_chk))
    model = model.to(DEVICE)
    val_scores.loc[e,"mask_loss"] = val_loss
    val_scores.loc[e,"score"] = get_score(ds_val, model)
    
    
display(val_scores.sort_values("score", ascending=False))

# best_epoch is a 0-based index; the matching checkpoint is pytorch_model-e{best_epoch+1}.bin
best_epoch = np.argmax(val_scores["score"])
print(best_epoch+1)
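
The checkpoints are saved with 1-based epoch numbers (`pytorch_model-e1.bin`, `pytorch_model-e2.bin`, ...), while the loop above indexes scores from 0, hence the `e+1` and `best_epoch+1` offsets. A small sketch of that mapping, using made-up scores; `best_checkpoint` is a hypothetical helper:

```python
def best_checkpoint(scores):
    """Return the 0-based index of the highest score and its 1-based checkpoint name."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, f"pytorch_model-e{best_index + 1}.bin"

scores = [0.21, 0.26, 0.24]  # made-up validation scores for epochs 1, 2 and 3
best_index, path = best_checkpoint(scores)
print(best_index, path)
```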

# Prediction

## Test Dataset and DataLoader

In [None]:
class CellTestDataset(Dataset):
    def __init__(self, image_dir, transforms=None, resize=False):
        self.transforms = transforms
        self.image_dir = image_dir
        self.image_ids = [f[:-4] for f in os.listdir(self.image_dir)]
        self.should_resize = resize is not False
        if self.should_resize:
            self.height = int(HEIGHT * resize)
            self.width = int(WIDTH * resize)
            print("image size used:", self.height, self.width)
            
    def __getitem__(self, idx):
        image_id = self.image_ids[idx]
        image_path = os.path.join(self.image_dir, image_id + '.png')
        image = cv2.imread(image_path, cv2.IMREAD_COLOR)
        if self.should_resize:
            image = cv2.resize(image, (self.width, self.height))

        if self.transforms is not None:
            image, _ = self.transforms(image=image, target=None)
        return {'image': image, 'image_id': image_id}

    def __len__(self):
        return len(self.image_ids)

In [None]:
ds_test = CellTestDataset(TEST_PATH, transforms=get_transform(train=False))

In [None]:
model_chk = f"pytorch_model-e{best_epoch+1}.bin"
print("Loading:", model_chk)
model = get_model(len(cell_type_dict))
model.load_state_dict(torch.load(model_chk))
model = model.to(DEVICE)

for param in model.parameters():
    param.requires_grad = False

model.eval();

submission = []
for sample in ds_test:
    img = sample['image']
    image_id = sample['image_id']
    with torch.no_grad():
        result = model([img.to(DEVICE)])[0]
    
    previous_masks = []
    for i, mask in enumerate(result["masks"]):

        # Filter-out low-scoring results.
        score = result["scores"][i].cpu().item()
        label = result["labels"][i].cpu().item()
        if score > min_score_dict[label]:
            mask = mask.cpu().numpy()
            # Keep only highly likely pixels
            binary_mask = mask > mask_threshold_dict[label]
            binary_mask = remove_overlapping_pixels(binary_mask, previous_masks)
            previous_masks.append(binary_mask)
            rle = rle_encoding(binary_mask)
            submission.append((image_id, rle))

    # Add empty prediction if no RLE was generated for this image
    all_images_ids = [image_id for image_id, rle in submission]
    if image_id not in all_images_ids:
        submission.append((image_id, ""))

df_sub = pd.DataFrame(submission, columns=['id', 'predicted'])
df_sub.to_csv("submission.csv", index=False)
df_sub.head()
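
The submission loop turns each binary mask into a run-length string with `rle_encoding`, which is defined outside this section. The sketch below shows one common way to do it, assuming row-major flattening and 1-indexed run starts; `rle_encode` is a hypothetical name, and the competition's exact pixel ordering should be checked before relying on it.

```python
import numpy as np

def rle_encode(mask):
    """Encode a binary mask as 'start length start length ...' with 1-indexed starts."""
    # Pad with zeros so runs touching either end still produce a start and an end
    pixels = np.concatenate([[0], mask.flatten().astype(np.uint8), [0]])
    changes = np.where(pixels[1:] != pixels[:-1])[0] + 1
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

mask = np.array([[0, 1, 1, 0],
                 [1, 1, 0, 0]], dtype=bool)
print(rle_encode(mask))
```

An all-zero mask encodes to an empty string, which matches the empty prediction the loop appends for images without detections.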