# This notebook trains a visual content recognition model on modified *Beyond Words* and DFKV bounding box annotations. 


This notebook finetunes a pre-trained object detection model (Faster-RCNN R50-FPN) to predict bounding boxes around illustrations in images of historical newspapers, journals and books.

This notebook is based on the following notebook:

- https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5

and on the work of:

*Benjamin Charles Germain Lee*
(LOC Innovator-in-Residence)



In [1]:
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Collecting git+https://github.com/facebookresearch/detectron2.git
  Cloning https://github.com/facebookresearch/detectron2.git to /tmp/pip-req-build-d3l1tj27
  Running command git clone -q https://github.com/facebookresearch/detectron2.git /tmp/pip-req-build-d3l1tj27


In [2]:
#!git clone https://github.com/facebookresearch/detectron2.git

In [3]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [4]:
%cd gdrive/My Drive/newspaper-navigator/newspaper-navigator/notebooks

/content/gdrive/My Drive/newspaper-navigator/newspaper-navigator/notebooks


# First, we handle imports and data formatting.

This cell imports libraries and constructs a COCO instance using the training and validation JSON files; essentially, this enables the model to handle the data loading using the COCO standard:

In [5]:
# to display images inline
%matplotlib inline

# import some common libraries
import cv2
import random
import glob
import os
import shutil
import json
import math
import numpy as np
import matplotlib.pyplot as plt

# import detectron2, etc.
import detectron2
from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer
from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator
from detectron2.utils.visualizer import Visualizer
from detectron2.utils.visualizer import ColorMode
from detectron2.utils.logger import setup_logger
setup_logger()

# cd into the beyond words dataset
os.chdir("../beyond_words_data")

# we now register the dataset
register_coco_instances("beyond_words_train", {}, "train_annos.json", "images")
register_coco_instances("beyond_words_val", {}, "test_annos.json", "images")
register_coco_instances("beyond_words_combined", {}, "anno_complete.json", "images")
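
`register_coco_instances` expects each JSON file to follow the COCO detection layout: a list of images, a list of annotations that point at those images, and a list of categories. The sketch below builds a minimal annotation dictionary and converts a COCO `[x, y, width, height]` box into corner coordinates. The file name, image size, box values, and both helper names (`xywh_to_xyxy`, `check_annotations`) are hypothetical and chosen only for illustration; they are not taken from the Beyond Words files.

```python
# Hypothetical minimal COCO-style annotation structure; every value here is
# illustrative and NOT taken from train_annos.json.
coco = {
    "images": [
        {"id": 1, "file_name": "example_page.jpg", "width": 1000, "height": 1500},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 200.0, 300.0, 400.0],  # [x, y, width, height]
            "area": 300.0 * 400.0,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 1, "name": "Illustration"}],
}


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def check_annotations(data):
    """Return True if every annotation refers to a known image and category."""
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    return all(
        ann["image_id"] in image_ids and ann["category_id"] in category_ids
        for ann in data["annotations"]
    )


corners = xywh_to_xyxy(coco["annotations"][0]["bbox"])
print(corners)
print(check_annotations(coco))
```

A check like this can catch annotations that refer to missing images or categories before the dataset is handed to the trainer.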

# Next, we visualize some examples.

This cell visualizes some examples from the training set:

In [6]:
# sets random seed for reproducibility
random.seed(42)

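dataset_dicts = DatasetCatalog.get("beyond_words_train")
# use the metadata of the same split; its class names are populated by the call above
my_metadata = MetadataCatalog.get("beyond_words_train")

n_examples_to_display = 5
for d in random.sample(dataset_dicts, n_examples_to_display):
    print(d["file_name"])
    img = cv2.imread(d["file_name"])
    # OpenCV loads images as BGR; the Visualizer expects RGB, hence the channel flip
    visualizer = Visualizer(img[:, :, ::-1], metadata=my_metadata, scale=0.5)
    v = visualizer.draw_dataset_dict(d)
    plt.figure(figsize=(15,12))
    # get_image() already returns RGB, which is what plt.imshow expects
    plt.imshow(v.get_image())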



# Next, we finetune the Faster-RCNN implementation from Detectron2's Model Zoo.

We can pick our choice of pre-trained model here: https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md#coco-object-detection-baselines.

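The configuration below sets `cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1`, so the detector trained here predicts a single class. For reference, the 7 classes of visual content used by the notebook this one is based on are: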

- Photograph
- Illustration
- Map
- Comics/Cartoon
- Editorial Cartoon
- Headline
- Advertisement

Below, we finetune `faster_rcnn_R_50_FPN_3x` and evaluate mean average precision on the held-out data (80%-20% train-val split). For this demo we run for 10 epochs; feel free to run for more epochs and watch the performance on the validation set improve!

Note that there are lines of code below (currently commented out) for saving intermediate model weights and predictions on the validation set to an S3 bucket for later analysis.
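
The mean average precision mentioned above is built on intersection over union (IoU) between predicted and ground-truth boxes: a prediction only counts as a match when its IoU with a ground-truth box exceeds a chosen threshold. The following is a minimal sketch of that underlying measure, not the evaluator's actual implementation. It assumes boxes in `[x_min, y_min, x_max, y_max]` form, and the helper name `box_iou` is introduced here purely for illustration.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # width and height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# two 10x10 boxes overlapping in a 5x5 region: 25 / (100 + 100 - 25)
print(box_iou([0, 0, 10, 10], [5, 5, 15, 15]))
# disjoint boxes have no overlap
print(box_iou([0, 0, 10, 10], [20, 20, 30, 30]))
```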


In [7]:
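# sets batch size; this must match cfg.SOLVER.IMS_PER_BATCH below,
# otherwise the iteration count computed here does not correspond to one epoch
batch_size = 8
# sets epoch size accordingly (to convert iterations to epochs);
# 2748 is the number of training images
epoch = math.ceil(2748/float(batch_size))
# sets total number of epochs to train for
epoch_num = 10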

cfg = get_cfg()
# loads in correct pre-trained model parameters
cfg.merge_from_file("../../detectron2/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
# loads pre-trained model weights (from Model Zoo)
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"
# loads in training/val data using the registered COCO instance
cfg.DATASETS.TRAIN = ("beyond_words_train",)
cfg.DATASETS.TEST = ("beyond_words_val",)
# sets number of object classes
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

# makes output directory for weights, etc.
os.makedirs("../model_weights/", exist_ok=True)

# sets output directory for model weights, checkpoints, etc.
cfg.OUTPUT_DIR = '../model_weights/'

# some hyperparameters
cfg.SOLVER.BASE_LR = 0.00025
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64
cfg.SOLVER.MAX_ITER = epoch
cfg.DATALOADER.NUM_WORKERS = 2
cfg.SOLVER.IMS_PER_BATCH = 8
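
Detectron2's solver counts iterations rather than epochs, which is why the cell above converts between the two and why `cfg.SOLVER.MAX_ITER` is raised before each further round of training. A minimal sketch of that arithmetic follows, assuming 2748 training images (the figure hard-coded above) and 8 images per iteration (the value of `cfg.SOLVER.IMS_PER_BATCH`). The helper names `iterations_per_epoch` and `max_iter_schedule` are hypothetical and not part of Detectron2.

```python
import math


def iterations_per_epoch(num_images, images_per_batch):
    """Number of solver iterations needed to see every training image once."""
    return math.ceil(num_images / images_per_batch)


def max_iter_schedule(num_images, images_per_batch, num_epochs):
    """Cumulative MAX_ITER value to set before each epoch when resuming."""
    per_epoch = iterations_per_epoch(num_images, images_per_batch)
    return [per_epoch * (e + 1) for e in range(num_epochs)]


print(iterations_per_epoch(2748, 8))   # 344
print(max_iter_schedule(2748, 8, 3))   # [344, 688, 1032]
```

Because `MAX_ITER` is cumulative, resuming from a checkpoint at iteration 344 with `MAX_ITER = 688` trains for exactly one more epoch.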

In [None]:
print("EPOCH 1")


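# trains for one epoch
trainer = DefaultTrainer(cfg)
# resume=True continues from the last checkpoint in cfg.OUTPUT_DIR if one exists;
# otherwise it loads cfg.MODEL.WEIGHTS. Set resume=False to always start from the pre-trained weights.
trainer.resume_or_load(resume=True)
trainer.train()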

######## FOR SAVING MODEL WEIGHTS AFTER EACH EPOCH TO AN S3 BUCKET #########
# with open("../model_weights/model_final.pth", "rb") as f:
#     s3.upload_fileobj(f, "BUCKET-NAME-HERE", "new_val_model_weights/model_epoch_1.pth")
############################################################################

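# evaluates on validation data after one epoch
# metrics are printed out to console
# (passing the config object as the second argument is deprecated in recent detectron2 versions)
trainer.test(trainer.cfg, trainer.model, COCOEvaluator("beyond_words_val", tasks=("bbox",), distributed=False, output_dir=trainer.cfg.OUTPUT_DIR))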

######## FOR SAVING PREDICTIONS ON VALIDATION SET AFTER EACH EPOCH TO AN S3 BUCKET ########
# with open("../model_weights/coco_instances_results.json", "rb") as f:
#     s3.upload_fileobj(f, "BUCKET-NAME-HERE", "new_val_model_weights/coco_results_epoch_1.json")
###########################################################################################

# trains then evaluates on validation data iteratively for desired number of epochs
for i in range(0, epoch_num-1):
    
    print("EPOCH " + str(i+2))

    # trains again
    cfg.SOLVER.MAX_ITER = epoch*(i+2)
    trainer = DefaultTrainer(cfg) 
    trainer.resume_or_load(resume=True)
    trainer.train()
    
    ######## FOR SAVING MODEL WEIGHTS AFTER EACH EPOCH TO AN S3 BUCKET #########
#     with open("../model_weights/model_final.pth", "rb") as f:
#         s3.upload_fileobj(f, "BUCKET-NAME-HERE", "new_val_model_weights/model_epoch_" + str(i+2) + ".pth")
    ############################################################################   
    
    # evaluates on validation data after this epoch
    trainer.test(trainer.cfg, trainer.model, COCOEvaluator("beyond_words_val", tasks=("bbox",), distributed=False, output_dir=trainer.cfg.OUTPUT_DIR))

    ######## FOR SAVING PREDICTIONS ON VALIDATION SET AFTER EACH EPOCH TO AN S3 BUCKET ########
#     with open("../model_weights/coco_instances_results.json", "rb") as f:
#         s3.upload_fileobj(f, "BUCKET-NAME-HERE", "new_val_model_weights/coco_results_epoch_" + str(i+2) + ".json")
    ###########################################################################################

EPOCH 1
[03/16 08:46:58 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
 

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]


[03/16 08:47:11 d2.evaluation.evaluator]: Inference done 11/644. Dataloading: 0.0482 s/iter. Inference: 0.0917 s/iter. Eval: 0.0002 s/iter. Total: 0.1402 s/iter. ETA=0:01:28
[03/16 08:47:16 d2.evaluation.evaluator]: Inference done 44/644. Dataloading: 0.0582 s/iter. Inference: 0.0920 s/iter. Eval: 0.0002 s/iter. Total: 0.1505 s/iter. ETA=0:01:30
[03/16 08:47:21 d2.evaluation.evaluator]: Inference done 76/644. Dataloading: 0.0606 s/iter. Inference: 0.0930 s/iter. Eval: 0.0002 s/iter. Total: 0.1542 s/iter. ETA=0:01:27
[03/16 08:47:26 d2.evaluation.evaluator]: Inference done 107/644. Dataloading: 0.0630 s/iter. Inference: 0.0937 s/iter. Eval: 0.0002 s/iter. Total: 0.1572 s/iter. ETA=0:01:24
[03/16 08:47:31 d2.evaluation.evaluator]: Inference done 139/644. Dataloading: 0.0627 s/iter. Inference: 0.0938 s/iter. Eval: 0.0002 s/iter. Total: 0.1570 s/iter. ETA=0:01:19
[03/16 08:47:36 d2.evaluation.evaluator]: Inference done 172/644. Dataload

# Next, we create a predictor for predicting on examples from the validation set.

This cell generates a predictor for performing predictions on the validation examples:

In [10]:
# loads the base config first, so that the overrides below are not reset by the merge
cfg.merge_from_file("../../detectron2/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
# loads the finetuned weights saved by the training cell
cfg.MODEL.WEIGHTS = "../model_weights/model_final.pth"
cfg.DATASETS.TEST = ("beyond_words_val", )
# sets the testing confidence threshold
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
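
`SCORE_THRESH_TEST` discards detections whose confidence score does not exceed the threshold at inference time. Conceptually this amounts to the filter sketched below. The detection dictionaries, their values, and the `filter_by_score` helper are hypothetical stand-ins for illustration, not Detectron2's internal data structures.

```python
def filter_by_score(detections, score_thresh=0.5):
    """Keep only detections whose score is strictly above the threshold."""
    return [det for det in detections if det["score"] > score_thresh]


# hypothetical raw detections for one page
detections = [
    {"bbox": [10, 20, 110, 220], "score": 0.92},
    {"bbox": [300, 40, 380, 90], "score": 0.47},
    {"bbox": [50, 400, 250, 600], "score": 0.61},
]

kept = filter_by_score(detections)
print([det["score"] for det in kept])  # [0.92, 0.61]
```

Raising the threshold trades recall for precision: fewer spurious boxes are drawn, but faint or unusual illustrations are more likely to be missed.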

# Lastly, we display the predictions.

This cell shows some sample predictions in the notebook itself:

In [14]:
n_test_to_display = 20

for d in random.sample(DatasetCatalog.get("beyond_words_val"), n_test_to_display):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
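    # OpenCV loads images as BGR; the Visualizer expects RGB, hence the channel flip
    v = Visualizer(im[:, :, ::-1],
                   metadata=MetadataCatalog.get("beyond_words_val"),
                   scale=1.2)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    #v = v.draw_dataset_dict(d)
    plt.figure(figsize=(15,12))
    # get_image() already returns RGB, which is what plt.imshow expects
    plt.imshow(v.get_image())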
    
    # if we want to save the images:
    # cv2.imwrite(filepath_here, v.get_image()[:, :, ::-1])

