#### This notebook was inspired by [this notebook](https://www.kaggle.com/code/ammarnassanalhajali/layout-parser-model-training)

# Detectron2

[Detectron2](https://detectron2.readthedocs.io/en/latest/index.html) is an open-source library developed by Facebook AI Research (FAIR) for building computer vision models. Built on top of PyTorch, it provides modular, flexible pipelines for object detection, instance segmentation, and keypoint detection.

Detectron2 provides a collection of pretrained models for these tasks in its [model zoo](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md). We can also use Detectron2 to train pre-implemented state-of-the-art models on new datasets, as we do in this notebook.

Read the [documentation](https://detectron2.readthedocs.io/en/latest/index.html).

# 1 Install detectron2

## 1.1 Recommended Way (does not work on Kaggle)

In [None]:
# !python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

In [2]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

## 1.2 Fast Way
Ignore the warnings.

In [3]:
%%capture
import sys, os, distutils.core
# Note: This is a faster way to install detectron2 in Colab, but it does not include all functionalities (e.g. compiled operators).
# See https://detectron2.readthedocs.io/tutorials/install.html for full installation instructions
!git clone 'https://github.com/facebookresearch/detectron2'
dist = distutils.core.run_setup("./detectron2/setup.py")
!python -m pip install {' '.join([f"'{x}'" for x in dist.install_requires])}
sys.path.insert(0, os.path.abspath('./detectron2'))

# 2 Notebook Config

In [None]:

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"


from datetime import datetime

# if False, model is set to `PRETRAINED_PATH` model
is_train = True

# if True, evaluate on validation dataset
is_evaluate = True

# if True, run inference on test dataset
is_inference = True

# if True and `is_train` == True, `PRETRAINED_PATH` model is trained further
is_resume_training = False

# Perform augmentation
is_augment = False

SEED = 42
import random
import os
import numpy as np
import torch
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    # Benchmark mode lets cuDNN choose algorithms by timing them, which can differ between runs
    torch.backends.cudnn.benchmark = False

seed_everything(SEED)
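
A quick way to see what seeding buys is to reseed a generator and check that the same draws come back. The sketch below does this with Python's standard `random` module only; `draw_sample` is a hypothetical helper for illustration and is not part of this notebook.

```python
import random


def draw_sample(seed: int, n: int = 5) -> list[int]:
    """Reseed the global RNG, then draw n integers in [0, 99]."""
    random.seed(seed)
    return [random.randint(0, 99) for _ in range(n)]


first = draw_sample(42)
second = draw_sample(42)

# The same seed reproduces the same sequence of draws.
print(first == second)
```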

"""## 2.2 Paths"""

from pathlib import Path

TRAIN_IMG_DIR = Path("/kaggle/input/dlsprint2/badlad/images/train")

TRAIN_COCO_PATH = Path("/kaggle/input/dlsprint2/badlad/labels/coco_format/train/badlad-train-coco.json")

TEST_IMG_DIR = Path("/kaggle/input/dlsprint2/badlad/images/test")

TEST_METADATA_PATH = Path("/kaggle/input/dlsprint2/badlad/badlad-test-metadata.json")

# Training output directory
OUTPUT_DIR = Path("./output")
OUTPUT_MODEL = OUTPUT_DIR/"model_final.pth"

# Path to your pretrained model weights
PRETRAINED_PATH = Path("")

"""## 2.3 imports"""

# detectron2
from detectron2.utils.memory import retry_if_cuda_oom
from detectron2.utils.logger import setup_logger
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling import build_model
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
import detectron2.data.transforms as T
from detectron2.data import detection_utils as utils
from detectron2.data import DatasetCatalog, MetadataCatalog, build_detection_test_loader, build_detection_train_loader, DatasetMapper
from detectron2.utils.visualizer import Visualizer
from detectron2.structures import BoxMode
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.config import get_cfg
from detectron2 import model_zoo

import pandas as pd
import numpy as np
from tqdm.notebook import tqdm  # progress bar
import matplotlib.pyplot as plt
import json
import cv2
import copy
from typing import Optional

from IPython.display import FileLink

# torch
import torch
import os

import gc

import warnings
# Suppress all warnings, including FutureWarning and pandas slicing warnings.
warnings.filterwarnings('ignore')

setup_logger()


"""# 3 COCO Annotations Data

## 3.1 Load
"""

with TRAIN_COCO_PATH.open() as f:
    train_dict = json.load(f)

with TEST_METADATA_PATH.open() as f:
    test_dict = json.load(f)

print("#### LABELS AND METADATA LOADED ####")

"""## 3.2 Observe"""

def organize_coco_data(data_dict: dict):
    thing_classes: list[str] = []

    # Map Category Names to IDs
    for cat in data_dict['categories']:
        thing_classes.append(cat['name'])

    print(thing_classes)

    # thing_classes = ['paragraph', 'text_box', 'image', 'table']
    # Images
    images_metadata: list[dict] = data_dict['images']

    # Convert COCO annotations to detectron2 annotations format
    data_annotations = []
    for ann in data_dict['annotations']:
        # coco format -> detectron2 format
        annot_obj = {
            # Annotation ID
            "id": ann['id'],

            # Segmentation Polygon (x, y) coords
            "gt_masks": ann['segmentation'],

            # Image ID for this annotation (Which image does this annotation belong to?)
            "image_id": ann['image_id'],

            # Category Label (0: paragraph, 1: text box, 2: image, 3: table)
            "category_id": ann['category_id'],

            "x_min": ann['bbox'][0],  # left
            "y_min": ann['bbox'][1],  # top
            "x_max": ann['bbox'][0] + ann['bbox'][2],  # left+width
            "y_max": ann['bbox'][1] + ann['bbox'][3]  # top+height
        }
        data_annotations.append(annot_obj)

    return thing_classes, images_metadata, data_annotations

thing_classes, images_metadata_train, train_data_annotations = organize_coco_data(
    train_dict
)

thing_classes_test, images_metadata_test, _ = organize_coco_data(
    test_dict
)

train_metadata = pd.DataFrame(images_metadata_train)
train_metadata = train_metadata[['id', 'file_name', 'width', 'height']]
train_metadata = train_metadata.rename(columns={"id": "image_id"})
print("train_metadata size=", len(train_metadata))
train_metadata.head(5)

train_annot_df = pd.DataFrame(train_data_annotations)
print("train_annot_df size=", len(train_annot_df))
train_annot_df.head(5)

"""Here `gt_masks` are the sequence of `(x, y)` coordinates of vertices of the polygon surrounding the target object."""
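
To make the two coordinate conventions concrete, here is a minimal sketch that unpacks a flat COCO polygon into `(x, y)` pairs and converts a COCO `[left, top, width, height]` box into the `x_min, y_min, x_max, y_max` form used in `organize_coco_data`. The helper names and the sample annotation are made up for illustration.

```python
def polygon_to_points(flat: list[float]) -> list[tuple[float, float]]:
    """Turn a flat COCO polygon [x1, y1, x2, y2, ...] into (x, y) pairs."""
    if len(flat) % 2 != 0:
        raise ValueError("polygon must contain an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


def xywh_to_xyxy(bbox: list[float]) -> list[float]:
    """Convert [left, top, width, height] to [x_min, y_min, x_max, y_max]."""
    left, top, width, height = bbox
    return [left, top, left + width, top + height]


# Made-up annotation for illustration only; not taken from the dataset.
example_ann = {
    "segmentation": [[10.0, 20.0, 110.0, 20.0, 110.0, 70.0, 10.0, 70.0]],
    "bbox": [10.0, 20.0, 100.0, 50.0],
}

points = polygon_to_points(example_ann["segmentation"][0])
box = xywh_to_xyxy(example_ann["bbox"])
print(points)
print(box)
```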

# `test_metadata` is needed when the test dataset is registered later on
test_metadata = pd.DataFrame(images_metadata_test)
test_metadata = test_metadata[['id', 'file_name', 'width', 'height']]
test_metadata = test_metadata.rename(columns={"id": "image_id"})
print("test_metadata size=", len(test_metadata))
test_metadata.head(5)

"""These are the categories we are going to detect."""

print(thing_classes)

DATA_REGISTER_TRAINING = "badlad_train"
DATA_REGISTER_VALID    = "badlad_valid"
DATA_REGISTER_TEST     = "badlad_test"

train_df_with_annotations = train_metadata.merge(train_annot_df, on='image_id')
train_df_with_annotations.head()

# Group annotations by image_id and aggregate into a list
train_df_with_annotations_grouped = train_df_with_annotations.groupby('image_id').agg(lambda x: x.tolist()).reset_index()
train_df_with_annotations_grouped.head()

torch.cuda.empty_cache()
gc.collect()

"""# 4 Preparing Data for Training

## 4.1 Train-Validation Split
"""

"""## 4.2 Formatting Data for `detectron2`"""

def convert_coco_to_detectron2_format(
    imgdir: Path,
    metadata_df: pd.DataFrame,
    annot_df: Optional[pd.DataFrame] = None,
    target_indices: Optional[np.ndarray] = None,
):

    dataset_dicts = []
    for _, train_meta_row in metadata_df.iterrows():
        # One row per image; columns are (image_id, file_name, width, height)
        image_id, filename, width, height = train_meta_row.values

        annotations = []

        # If train/validation data, then there will be annotations
        if annot_df is not None:
            for _, ann in annot_df.query("image_id == @image_id").iterrows():
                # Get annotations of current iteration's image
                class_id = ann["category_id"]
                gt_masks = ann["gt_masks"]
                bbox_resized = [
                    float(ann["x_min"]),
                    float(ann["y_min"]),
                    float(ann["x_max"]),
                    float(ann["y_max"]),
                ]

                annotation = {
                    "bbox": bbox_resized,
                    "bbox_mode": BoxMode.XYXY_ABS,
                    "segmentation": gt_masks,
                    "category_id": class_id,
                }

                annotations.append(annotation)

        # coco format -> detectron2 format dict
        record = {
            "file_name": str(imgdir/filename),
            "image_id": image_id,
            "width": width,
            "height": height,
            "annotations": annotations
        }

        dataset_dicts.append(record)

    if target_indices is not None:
        dataset_dicts = [dataset_dicts[i] for i in target_indices]

    return dataset_dicts
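
`convert_coco_to_detectron2_format` filters the annotation table once per image, which can become slow when there are many images. A minimal sketch of one alternative is to group the annotations by `image_id` in a single pass and then look them up. It uses plain Python with made-up records; `index_by_image` is a hypothetical helper, not something this notebook defines.

```python
from collections import defaultdict


def index_by_image(annotations: list[dict]) -> dict[int, list[dict]]:
    """Group annotation dicts by their image_id in one pass."""
    index: dict[int, list[dict]] = defaultdict(list)
    for ann in annotations:
        index[ann["image_id"]].append(ann)
    return dict(index)


# Made-up records for illustration only.
annotations = [
    {"image_id": 1, "category_id": 0},
    {"image_id": 2, "category_id": 3},
    {"image_id": 1, "category_id": 2},
]

by_image = index_by_image(annotations)

# Images without annotations have no key, so look them up with a default.
print(by_image.get(1, []))
print(by_image.get(99, []))
```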

# Create empty lists to store images with tables and images without tables
images_with_tables = []
images_without_tables = []

# Loop through each row in the DataFrame
for _, row in train_df_with_annotations_grouped.iterrows():
    # Check if category_id for "table" (3) is present in the list
    if 3 in row['category_id']:
        images_with_tables.append(row['image_id'])
    else:
        images_without_tables.append(row['image_id'])

# Create DataFrames for images with tables and images without tables
images_with_tables_df = train_metadata[train_metadata['image_id'].isin(images_with_tables)]
images_without_tables_df = train_metadata[train_metadata['image_id'].isin(images_without_tables)]

# Convert the DataFrames to detectron2 format:
dataset_dicts_with_tables = convert_coco_to_detectron2_format(TRAIN_IMG_DIR, images_with_tables_df, train_annot_df)
dataset_dicts_without_tables = convert_coco_to_detectron2_format(TRAIN_IMG_DIR, images_without_tables_df, train_annot_df)

# Now, you have two separate datasets: dataset_dicts_with_tables containing images with tables annotated,
# and dataset_dicts_without_tables containing images without tables annotated.




# Calculate the number of images with tables and without tables
num_images_with_tables = len(dataset_dicts_with_tables)
num_images_without_tables = len(dataset_dicts_without_tables)

from sklearn.model_selection import train_test_split
import random

# Split each group separately so that the share of images with tables is
# roughly the same in the training and validation sets.
dataset_with_tables_train, dataset_with_tables_valid = train_test_split(
    dataset_dicts_with_tables, test_size=0.2, random_state=SEED
)

dataset_without_tables_train, dataset_without_tables_valid = train_test_split(
    dataset_dicts_without_tables, test_size=0.2, random_state=SEED
)

# Concatenate the datasets to create the final balanced training and validation sets
balanced_dataset_train = dataset_with_tables_train + dataset_without_tables_train
balanced_dataset_valid = dataset_with_tables_valid + dataset_without_tables_valid

# Shuffle the datasets to further randomize the order of images
random.shuffle(balanced_dataset_train)
random.shuffle(balanced_dataset_valid)
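
The idea behind splitting the two groups separately can be sketched in plain Python: each group contributes the same fraction to validation, so the share of table images stays about equal on both sides. This is an illustration with placeholder ids and a hypothetical `split_group` helper, not a reimplementation of `train_test_split` (whose rounding of the validation size may differ).

```python
import random


def split_group(items: list, valid_fraction: float, rng: random.Random) -> tuple[list, list]:
    """Shuffle a copy of items and cut off valid_fraction of them for validation."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


rng = random.Random(42)  # local generator so the split is repeatable
with_tables = [f"table_{i}" for i in range(20)]     # placeholder image ids
without_tables = [f"plain_{i}" for i in range(80)]  # placeholder image ids

train, valid = [], []
for group in (with_tables, without_tables):
    group_train, group_valid = split_group(group, 0.2, rng)
    train += group_train
    valid += group_valid

# Share of table images in each split.
share_train = sum(x.startswith("table_") for x in train) / len(train)
share_valid = sum(x.startswith("table_") for x in valid) / len(valid)
print(len(train), len(valid), share_train, share_valid)
```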

def check_category_statistics(dataset):
    category_counts = {}
    for data in dataset:
        annotations = data["annotations"]
        for annotation in annotations:
            category_id = annotation["category_id"]
            if category_id not in category_counts:
                category_counts[category_id] = 0
            category_counts[category_id] += 1

    category_statistics = {
        "category_id": [],
        "category_name": [],
        "num_instances": []
    }
    for category_id, num_instances in category_counts.items():
        category_statistics["category_id"].append(category_id)
        category_statistics["category_name"].append(thing_classes[category_id])
        category_statistics["num_instances"].append(num_instances)

    category_statistics_df = pd.DataFrame(category_statistics)
    return category_statistics_df

train_category_statistics = check_category_statistics(balanced_dataset_train)
valid_category_statistics = check_category_statistics(balanced_dataset_valid)

print("Training Set Category Statistics:")
print(train_category_statistics)

print("\nValidation Set Category Statistics:")
print(valid_category_statistics)
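
`check_category_statistics` only lists categories that occur at least once. If a class is missing from a split entirely, it may be more useful to see an explicit zero. A minimal sketch with `collections.Counter` and made-up records follows; `count_categories` is a hypothetical helper.

```python
from collections import Counter


def count_categories(dataset: list[dict], class_names: list[str]) -> dict[str, int]:
    """Count annotation instances per class name across detectron2-style records."""
    counts = Counter(
        ann["category_id"] for record in dataset for ann in record["annotations"]
    )
    # Report every class, including ones with zero instances.
    return {name: counts.get(idx, 0) for idx, name in enumerate(class_names)}


class_names = ["paragraph", "text_box", "image", "table"]

# Made-up records for illustration only.
dataset = [
    {"annotations": [{"category_id": 0}, {"category_id": 0}, {"category_id": 2}]},
    {"annotations": [{"category_id": 1}]},
    {"annotations": []},
]

stats = count_categories(dataset, class_names)
print(stats)
```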



"""## 4.3 Registering and Loading Data for `detectron2`"""


# Register Training data
if is_train:
    DatasetCatalog.register(DATA_REGISTER_TRAINING, lambda: balanced_dataset_train)
    MetadataCatalog.get(DATA_REGISTER_TRAINING).set(thing_classes=thing_classes)
    metadata_dicts_train = MetadataCatalog.get(DATA_REGISTER_TRAINING)
    print("dicts training size=", len(balanced_dataset_train))
    print("################")

# Register Validation data
if is_train or is_evaluate:
    DatasetCatalog.register(DATA_REGISTER_VALID, lambda: balanced_dataset_valid)
    MetadataCatalog.get(DATA_REGISTER_VALID).set(thing_classes=thing_classes)
    metadata_dicts_valid = MetadataCatalog.get(DATA_REGISTER_VALID)
    print("dicts valid size=", len(balanced_dataset_valid))
    print("################")

# Register Test Inference data
DatasetCatalog.register(
    DATA_REGISTER_TEST,
    lambda: convert_coco_to_detectron2_format(
        TEST_IMG_DIR,
        test_metadata,
    )
)

# Set Test data categories
MetadataCatalog.get(DATA_REGISTER_TEST).set(
    thing_classes=thing_classes_test
)

# dataset_dicts_test = DatasetCatalog.get(DATA_REGISTER_TEST)
metadata_dicts_test = MetadataCatalog.get(DATA_REGISTER_TEST)

#### LABELS AND METADATA LOADED ####
['paragraph', 'text_box', 'image', 'table']
['paragraph', 'text_box', 'image', 'table']
train_metadata size= 20365
train_annot_df size= 425101
['paragraph', 'text_box', 'image', 'table']
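
`DatasetCatalog.register` takes a function rather than the data itself, and that function is only called when the dataset is later requested by name. The toy registry below illustrates this lazy pattern; `ToyCatalog` is a hypothetical stand-in written for this note, not detectron2's implementation.

```python
from typing import Callable


class ToyCatalog:
    """Minimal name -> loader registry; loaders run only when get() is called."""

    def __init__(self) -> None:
        self._loaders: dict[str, Callable[[], list[dict]]] = {}

    def register(self, name: str, loader: Callable[[], list[dict]]) -> None:
        if name in self._loaders:
            raise ValueError(f"dataset '{name}' is already registered")
        self._loaders[name] = loader

    def get(self, name: str) -> list[dict]:
        return self._loaders[name]()


calls = []


def load_records() -> list[dict]:
    calls.append("loaded")  # track when loading actually happens
    return [{"file_name": "page_0.png", "annotations": []}]  # placeholder record


catalog = ToyCatalog()
catalog.register("toy_test", load_records)
print(len(calls))  # registration alone does not run the loader

records = catalog.get("toy_test")
print(len(calls), len(records))
```

One practical consequence of this pattern is that a mistake inside a registered function (for example, a missing variable) only surfaces when the dataset is first loaded. detectron2 also refuses to register the same name twice, so re-running a registration cell in the same session typically fails.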


In [None]:
! git clone https://github.com/microsoft/unilm.git --depth=1 --quiet
! sed -i 's/from collections import Iterable/from collections.abc import Iterable/' unilm/dit/object_detection/ditod/table_evaluation/data_structure.py

In [None]:
import sys
sys.path.append("unilm")

import cv2

from unilm.dit.object_detection.ditod import add_vit_config

In [None]:
%%writefile cascade_dit_base.yaml
_BASE_: "/kaggle/input/dit-publay-finetuned/Base-RCNN-FPN.yaml"
MODEL:
  PIXEL_MEAN: [ 127.5, 127.5, 127.5 ]
  PIXEL_STD: [ 127.5, 127.5, 127.5 ]
  WEIGHTS: "https://layoutlm.blob.core.windows.net/dit/dit-pts/dit-base-224-p16-500k-62d53a.pth"
  VIT:
    NAME: "dit_base_patch16"
  ROI_HEADS:
    NAME: CascadeROIHeads
  ROI_BOX_HEAD:
    CLS_AGNOSTIC_BBOX_REG: True
  RPN:
    POST_NMS_TOPK_TRAIN: 2000
SOLVER:
  WARMUP_ITERS: 1000
  IMS_PER_BATCH: 16
  MAX_ITER: 60000
  CHECKPOINT_PERIOD: 2000
TEST:
  EVAL_PERIOD: 2000
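
The solver settings above imply a few derived quantities that are handy when judging a schedule. Below is a small sketch of the arithmetic. The training-set size passed in is an assumed round number for illustration, not the size of the split used here, and `schedule_summary` is a hypothetical helper.

```python
def schedule_summary(
    max_iter: int,
    ims_per_batch: int,
    checkpoint_period: int,
    num_train_images: int,
) -> dict[str, float]:
    """Derive rough schedule figures from solver settings."""
    images_seen = max_iter * ims_per_batch
    return {
        "images_seen": images_seen,
        "approx_epochs": images_seen / num_train_images,
        "periodic_checkpoints": max_iter // checkpoint_period,
    }


summary = schedule_summary(
    max_iter=60000,
    ims_per_batch=16,
    checkpoint_period=2000,
    num_train_images=16000,  # assumed value, for illustration only
)
print(summary)
```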


In [None]:
torch.cuda.empty_cache()
gc.collect()

In [None]:
models = [
          '/kaggle/input/dit-publay-finetuned/dit-pub-25000.pth', 
          '/kaggle/input/dit-publay-finetuned/dit-pub-30000.pth',
          '/kaggle/input/dit-publay-finetuned/dit-pub-35000.pth',
          '/kaggle/input/dit-publay-finetuned/dit-pub-40000.pth',
          '/kaggle/input/dit-publay-finetuned/dit-pub-45000.pth',
          '/kaggle/input/dit-publay-finetuned/dit-pub-50000.pth'
         ]
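
Rather than relying on a fixed list position, a checkpoint can be picked by the number in its file name. The sketch below assumes that the trailing number in names like `dit-pub-25000.pth` is the training iteration, which the file names suggest but the notebook does not state. `iteration_of` is a hypothetical helper.

```python
import re
from pathlib import Path


def iteration_of(path: str) -> int:
    """Extract the trailing number from a name like 'dit-pub-25000.pth'."""
    match = re.search(r"(\d+)$", Path(path).stem)
    if match is None:
        raise ValueError(f"no trailing number in {path!r}")
    return int(match.group(1))


# A few of the paths from the list above, deliberately out of order.
checkpoints = [
    "/kaggle/input/dit-publay-finetuned/dit-pub-45000.pth",
    "/kaggle/input/dit-publay-finetuned/dit-pub-25000.pth",
    "/kaggle/input/dit-publay-finetuned/dit-pub-50000.pth",
]

latest = max(checkpoints, key=iteration_of)
print(latest)
```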

In [None]:
def rebuild_model(inf_cfg):
    model = build_model(inf_cfg)
    _ = DetectionCheckpointer(model).load(inf_cfg.MODEL.WEIGHTS)
    return model

In [None]:
inf_cfg = get_cfg()
add_vit_config(inf_cfg)
inf_cfg.merge_from_file("/kaggle/working/cascade_dit_base.yaml")
inf_cfg.SOLVER.IMS_PER_BATCH = 64
inf_cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
inf_cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4
inf_cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.25
inf_cfg.MODEL.DEVICE = "cuda"
inf_cfg.DATALOADER.NUM_WORKERS = 2  # lower this if the data loader runs out of memory
inf_cfg.MODEL.WEIGHTS = str(models[5])
inf_cfg.OUTPUT_DIR = str(OUTPUT_DIR)
print("creating cfg.OUTPUT_DIR -> ", inf_cfg.OUTPUT_DIR)
OUTPUT_DIR.mkdir(exist_ok=True)
model = rebuild_model(inf_cfg)
model.eval()
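
`SCORE_THRESH_TEST` sets the minimum confidence a detection needs to be kept at inference time. The effect can be sketched in plain Python as below. The detections are made up, `filter_by_score` is a hypothetical helper, and the sketch assumes a strict `>` comparison, so a score exactly equal to the threshold is dropped.

```python
def filter_by_score(detections: list[dict], threshold: float) -> list[dict]:
    """Keep detections whose confidence score exceeds the threshold."""
    return [det for det in detections if det["score"] > threshold]


# Made-up detections for illustration only.
detections = [
    {"label": "paragraph", "score": 0.91},
    {"label": "table", "score": 0.25},
    {"label": "image", "score": 0.10},
]

kept = filter_by_score(detections, threshold=0.25)
print([det["label"] for det in kept])
```

A lower threshold keeps more candidate boxes (higher recall, more false positives); a higher one keeps fewer.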

In [None]:
evaluator = COCOEvaluator(
    DATA_REGISTER_VALID, inf_cfg, False, output_dir=inf_cfg.OUTPUT_DIR, use_fast_impl=True
)

val_loader = build_detection_test_loader(inf_cfg, DATA_REGISTER_VALID)

results = inference_on_dataset(
    model, val_loader, evaluator=evaluator
)
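
With a `COCOEvaluator`, `inference_on_dataset` returns a nested mapping of the form `{task: {metric: value}}`, for example with a `"bbox"` entry holding `AP`, `AP50` and `AP75`. Which tasks appear depends on what the model predicts. The sketch below flattens such a mapping into rows for printing. The numbers are placeholders, not results from this notebook, and `flatten_results` is a hypothetical helper.

```python
# Placeholder values only; these are NOT real evaluation results.
example_results = {
    "bbox": {"AP": 0.0, "AP50": 0.0, "AP75": 0.0},
    "segm": {"AP": 0.0, "AP50": 0.0, "AP75": 0.0},
}


def flatten_results(results: dict[str, dict[str, float]]) -> list[tuple[str, str, float]]:
    """Flatten {task: {metric: value}} into (task, metric, value) rows."""
    return [
        (task, metric, value)
        for task, metrics in results.items()
        for metric, value in metrics.items()
    ]


rows = flatten_results(example_results)
for task, metric, value in rows:
    print(f"{task:>5} {metric:<5} {value:6.2f}")
```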