# Post-training Optimization Toolkit. Advanced topics.

This notebook shows how to quantize a custom model with POT using the Accuracy Checker backend. Accuracy Checker is a fundamental part of POT: it is responsible for data reading, pre- and postprocessing, inference launching and metrics collection.

Accuracy Checker is a tool for validating model accuracy. It has many predefined configuration options for dataset conversion, preprocessing, postprocessing and metric evaluation. However, sometimes its capabilities are not enough to run a particular model.

Let's discuss what to do in this case.

## Prerequisites


In [None]:
!git clone https://github.com/opencv/open_model_zoo.git

## First of all, a couple of words about Accuracy Checker architecture.

Accuracy Checker has a modular structure. The tool is flexible and easy to extend with new components.

The common approach to adding new functionality includes the following steps:
1. Create a new class for the component.
2. Derive the new class from the abstract base class shared by all objects of the module being extended.
3. Implement all abstract methods of the base class in the new component.
4. Define the name used in the configuration in the `__provider__` field.
5. Optionally, if the component should have configurable parameters, specify them in the `parameters()` method.
6. Finally, register the new functionality inside `__init__.py` of the extended module.
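
The steps above can be illustrated with a minimal, self-contained sketch of a provider registry. It is not the Accuracy Checker implementation: `BaseComponent`, `provide`, and `Scale` are hypothetical names, and registration happens automatically on subclassing instead of through an import in `__init__.py`. It only shows how a `__provider__` name can map a config entry to a class.

```python
class BaseComponent:
    """Hypothetical base class; subclasses are looked up by their __provider__ name."""
    __provider__ = None
    providers = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # register only classes that define their own provider name
        name = cls.__dict__.get('__provider__')
        if name is not None:
            BaseComponent.providers[name] = cls

    @classmethod
    def parameters(cls):
        # configurable parameters, keyed by config field name
        return {}

    @classmethod
    def provide(cls, name, config=None):
        # look up the class by its configuration name and instantiate it
        return cls.providers[name](config or {})

    def __init__(self, config):
        unknown = set(config) - set(self.parameters())
        if unknown:
            raise ValueError('unknown config fields: {}'.format(sorted(unknown)))
        self.config = config

    def process(self, data):
        raise NotImplementedError


class Scale(BaseComponent):
    # name used in the configuration
    __provider__ = 'scale'

    @classmethod
    def parameters(cls):
        params = super().parameters()
        params.update({'factor': 'multiplier applied to every value'})
        return params

    def process(self, data):
        factor = self.config.get('factor', 1)
        return [value * factor for value in data]


component = BaseComponent.provide('scale', {'factor': 2})
print(component.process([1, 2, 3]))
```

The converter, preprocessor and adapter implemented later in this notebook follow the same shape: a base class, a `__provider__` name, and an optional `parameters()` method.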

## Case 1: Emotion Recognition

Now we'll go through all these steps using the [Emotion Recognition model](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus) from the ONNX Model Zoo as an example.

### Step 0 - Analyze the Model
First, we need to understand which components we should modify.
Converting the model to IR is not enough for that; we need to get to know the model better.

During the analysis you should find answers to the following questions:

*   What is the model use case? What is it used for?
*   Which dataset should I use for evaluation?
*   Which preprocessing steps should be performed?
*   How to retrieve results from the model and postprocess them?
*   Which metric should be used?

Let's come back to our emotion recognition model. We can find all the information we need in the readme - https://github.com/onnx/models/blob/master/vision/body_analysis/emotion_ferplus/README.md

*   What is the model use case? What is it used for? - **emotion recognition**, **classification**
*   Which dataset should I use for evaluation? **FER+ stored in csv format**
*   Which preprocessing steps should be performed? **resize the image to 64x64 with antialias interpolation**
*   How to retrieve results from the model and postprocess them? **the model returns a tensor with shape [N, 8], where N is the number of images in the batch. It contains class scores as raw logits, so a softmax operation may need to be applied to obtain probabilities.**
*   Which metric should be used? **emotion recognition is a particular case of the classification task, so classification accuracy is suitable for model evaluation**
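
As a small illustration of the postprocessing answer, the sketch below applies softmax to one row of 8 raw logits. The values are made up for the example and the helper is plain Python rather than anything from Accuracy Checker.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# made-up output row for a single image: one score per emotion class
logits = [2.0, 0.5, -1.0, 0.0, 0.1, -2.0, -0.5, -3.0]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(sum(probs), 6))
```

Softmax is monotonic, so the index of the largest value is the same before and after it. Top-1 classification accuracy can therefore be computed on the raw logits directly.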


### Step 1. Support new dataset.

The emotion recognition model was trained on the FER+ dataset, which can be downloaded from https://www.kaggle.com/deadskull7/fer2013

The dataset is represented as a csv table with 3 fields:
* *emotion* - ground truth label
* *pixels* - array of pixel intensities which represents a grayscale image of size 48x48. The values are stored flattened, row by row.
* *Usage* - the dataset split the image belongs to (`Training`, `PublicTest`). For validation and quantization we are only interested in the validation part, so we should implement image filtering by this field.
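
Before looking at the converter, here is a standalone sketch of the parsing logic using only the standard library. The two-row csv text is synthetic and `rows_to_images` is a hypothetical helper; it only demonstrates filtering by split and restoring the 48x48 layout from the flattened pixel string.

```python
import csv
import io

SIDE = 48
# synthetic pixel string: 48 * 48 values separated by spaces
pixels = ' '.join(str(i % 256) for i in range(SIDE * SIDE))
csv_text = 'emotion,pixels,Usage\n0,{p},Training\n3,{p},PublicTest\n'.format(p=pixels)


def rows_to_images(text, subset='PublicTest', side=SIDE):
    """Return (row index, label, image rows) for records of the requested split."""
    result = []
    for index, row in enumerate(csv.DictReader(io.StringIO(text))):
        if row['Usage'] != subset:
            continue
        flat = [int(value) for value in row['pixels'].split()]
        if len(flat) != side * side:
            raise ValueError('row {} has {} pixels'.format(index, len(flat)))
        # values are stored row by row, so cut the flat list into rows
        image = [flat[r * side:(r + 1) * side] for r in range(side)]
        result.append((index, int(row['emotion']), image))
    return result


records = rows_to_images(csv_text)
print(len(records), records[0][0], records[0][1])
```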

Keeping all these details in mind, let's implement a converter for the dataset.

In [None]:
%%writefile advanced_materials/fer_plus_converter.py
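try:
    from PIL import Image
except ImportError:
    # Pillow is optional: it is only needed when images are extracted from the csv file
    Image = None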
import numpy as np
# classes which represent configuration parameters in the code
from ..config import PathField, BoolField
# data type which was generated during annotation conversion
from ..representation import ClassificationAnnotation
from ..utils import read_csv

from .format_converter import BaseFormatConverter, ConverterReturn



class FERPlusFormatConverter(BaseFormatConverter):
    """
    FER+ dataset converter. All annotation converters should be derived from BaseFormatConverter class.
    Annotation data for conversion can be found here https://www.kaggle.com/deadskull7/fer2013
    """

    # register name for this converter
    # this name will be used for converter class look up
    __provider__ = 'fer_plus'
    # specify a hint about generated data type
    annotation_types = (ClassificationAnnotation, )

    @classmethod
    def parameters(cls):
        """
        describe config parsing template for this converter
        :return: dictionary, where config fields used as keys and helpers for config parsing as values.
        """
        # get basic parameters from parent class
        configuration_parameters = super().parameters()
        # update them with new
        configuration_parameters.update({
            'annotation_file': PathField(description="Path to csv file which contains the dataset."),
            'convert_images': BoolField(
                optional=True,
                default=False,
                description="Allows to convert images from csv file to user specified directory."
            ),
            'converted_images_dir': PathField(
                optional=True, is_directory=True, check_exists=False, description="Path to converted images location."
            )
        })

        return configuration_parameters

    def configure(self):
        """
        This method is responsible for obtaining the necessary parameters
        for converting from the command line or config.
        """
        self.csv_file = self.get_value_from_config('annotation_file')
        self.converted_images_dir = self.get_value_from_config('converted_images_dir')
        self.convert_images = self.get_value_from_config('convert_images')
        if self.convert_images and not self.converted_images_dir:
            # by default, store extracted images next to the csv file
            self.converted_images_dir = self.csv_file.parent / 'converted_images'
            if not self.converted_images_dir.exists():
                self.converted_images_dir.mkdir(parents=True)

        if self.convert_images and Image is None:
            raise ValueError(
                "conversion of FER+ images requires Pillow installation, please install it before usage"
            )

    def convert(self, check_content=False, progress_callback=None, progress_interval=100, **kwargs):
        """
        This method is executed automatically when convert.py is started.
        All arguments are obtained from command line arguments or the config file in the configure method.

        Returns:
            annotations: list of annotation representation objects.
            meta: dictionary with additional dataset level metadata.
            content errors: service field for errors handling
        """
        annotations = []
        # read original dataset annotation
        annotation_table = read_csv(self.csv_file)
        # process object by object
        for index, annotation in enumerate(annotation_table):
            # ignore data not from testing subset
            if annotation['Usage'] != 'PublicTest':
                continue
            # identifier is unique name of data in the dataset. For images usually file name used
            identifier = '{}.png'.format(index)
            # getting label
            label = int(annotation['emotion'])
            # since our annotation contains pixels intensity inside the table,
            # we need to get images from it for more convenient usage.
            # convert images once, we can turn off this flag in the config and use pregenerated images
            if self.convert_images:
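                pixels_array = [int(y) for y in annotation['pixels'].split()]
                # pixel intensities are 8-bit values; Pillow cannot build an image from a default int64 array
                pixels = np.array(pixels_array, dtype=np.uint8).reshape(48, 48)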
                image = Image.fromarray(pixels)
                image = image.convert("L")
                image.save(str(self.converted_images_dir / identifier))
            # create a new instance of the annotation representation
            # different representations can have different set of parameters required for metric calculation
            # for ClassificationAnnotation, identifier and label used.
            annotations.append(ClassificationAnnotation(identifier, label))

        # metadata contains specific info about dataset which can help during the evaluation
        # (e.g. mapping of labels, has background label in the dataset or not)
        # for tasks where additional info is not required, it can be left empty or None
        meta = self.get_meta()

        # finally, this method should return the named tuple with fields annotations, meta and content errors
        return ConverterReturn(annotations, meta, None)


    def get_meta(self):
        # use original labels from dataset
        emotion_table = {'neutral': 0, 'happiness': 1, 'surprise': 2, 'sadness': 3, 'anger': 4, 'disgust': 5, 'fear': 6,
                         'contempt': 7}
        # inside Accuracy Checker we use class_id as label, so label map should be represented as class_id: class_name
        label_map = {class_id: class_name for class_name, class_id in emotion_table.items()}

        # dataset meta should be represented like dictionary, label_map key used for storing label mapping
        return {'label_map': label_map}


Now, let's integrate it into Accuracy Checker.

In [None]:
!cp advanced_materials/fer_plus_converter.py open_model_zoo/tools/accuracy_checker/accuracy_checker/annotation_converters

Registering the new class means importing it inside `__init__.py`.

In [None]:
%%writefile open_model_zoo/tools/accuracy_checker/accuracy_checker/annotation_converters/__init__.py
"""
Copyright (c) 2019 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

from .format_converter import BaseFormatConverter
from .convert import make_subset, save_annotation, analyze_dataset
from .market1501 import Market1501Converter
from .mars import MARSConverter
from .pascal_voc import PascalVOCDetectionConverter
from .sample_converter import SampleConverter
from .wider import WiderFormatConverter
from .detection_opencv_storage import DetectionOpenCVStorageFormatConverter
from .lfw import LFWConverter
from .vgg_face_regression import VGGFaceRegressionConverter
from .super_resolution_converter import SRConverter, SRMultiFrameConverter
from .imagenet import ImageNetFormatConverter
from .icdar import ICDAR13RecognitionDatasetConverter, ICDAR15DetectionDatasetConverter
from .kondate_nakayosi import KondateNakayosiRecognitionDatasetConverter
from .ms_coco import MSCocoDetectionConverter, MSCocoKeypointsConverter, MSCocoSingleKeypointsConverter
from .cityscapes import CityscapesConverter
from .ncf_converter import MovieLensConverter
from .brats import BratsConverter, BratsNumpyConverter
from .cifar import CifarFormatConverter
from .mnist import MNISTCSVFormatConverter
from .wmt import WMTConverter
from .common_semantic_segmentation import CommonSegmentationConverter
from .camvid import CamVidConverter
from .lpr import LPRConverter
from .image_retrieval import ImageRetrievalConverter
from .cvat_object_detection import CVATObjectDetectionConverter
from .cvat_attributes_recognition import CVATAttributesRecognitionConverter
from .cvat_age_gender_recognition import CVATAgeGenderRecognitionConverter
from .cvat_facial_landmarks import CVATFacialLandmarksRecognitionConverter
from .cvat_text_recognition import CVATTextRecognitionConverter
from .cvat_multilabel_recognition import CVATMultilabelAttributesRecognitionConverter
from .cvat_human_pose import CVATPoseEstimationConverter
from .cvat_person_detection_action_recognition import CVATPersonDetectionActionRecognitionConverter
from .squad import SQUADConverter
from .text_classification import (
    XNLIDatasetConverter,
    BertXNLITFRecordConverter,
    IMDBConverter,
    MRPCConverter,
    CoLAConverter
)
from .cmu_panoptic import CmuPanopticKeypointsConverter
from .action_recognition import ActionRecognitionConverter
from .ms_asl_continuous import MSASLContiniousConverter

from .monocular_depth_perception import ReDWebDatasetConverter

from .fashion_mnist import FashionMnistConverter
from .inpainting import InpaintingConverter
from .fer_plus_converter import FERPlusFormatConverter

__all__ = [
    'BaseFormatConverter',
    'make_subset',
    'save_annotation',
    'analyze_dataset',

    'ImageNetFormatConverter',
    'Market1501Converter',
    'SampleConverter',
    'PascalVOCDetectionConverter',
    'WiderFormatConverter',
    'MARSConverter',
    'DetectionOpenCVStorageFormatConverter',
    'LFWConverter',
    'VGGFaceRegressionConverter',
    'SRConverter',
    'SRMultiFrameConverter',
    'ICDAR13RecognitionDatasetConverter',
    'ICDAR15DetectionDatasetConverter',
    'KondateNakayosiRecognitionDatasetConverter',
    'MSCocoKeypointsConverter',
    'MSCocoSingleKeypointsConverter',
    'MSCocoDetectionConverter',
    'CityscapesConverter',
    'MovieLensConverter',
    'BratsConverter',
    'BratsNumpyConverter',
    'CifarFormatConverter',
    'MNISTCSVFormatConverter',
    'WMTConverter',
    'CommonSegmentationConverter',
    'CamVidConverter',
    'LPRConverter',
    'ImageRetrievalConverter',
    'CVATObjectDetectionConverter',
    'CVATAttributesRecognitionConverter',
    'CVATAgeGenderRecognitionConverter',
    'CVATFacialLandmarksRecognitionConverter',
    'CVATTextRecognitionConverter',
    'CVATMultilabelAttributesRecognitionConverter',
    'CVATPoseEstimationConverter',
    'CVATPersonDetectionActionRecognitionConverter',
    'SQUADConverter',
    'XNLIDatasetConverter',
    'BertXNLITFRecordConverter',
    'IMDBConverter',
    'MRPCConverter',
    'CoLAConverter',
    'CmuPanopticKeypointsConverter',
    'ActionRecognitionConverter',
    'MSASLContiniousConverter',
    'ReDWebDatasetConverter',
    'FashionMnistConverter',
    'InpaintingConverter',
    'FERPlusFormatConverter'
]


### Step 2. Implement the preprocessing

Adding a new preprocessor looks similar.

In [None]:
# %load advanced_materials/emotion_recognition_preprocessing.py
from PIL import Image
import numpy as np

# base class for all preprocessors is Preprocessor
from ..preprocessor import Preprocessor
# helpers for configuration parsing
from ..config import NumberField


class EmotionRecognitionResize(Preprocessor):
    # name of preprocessor for configuration
    __provider__ = 'emotion_recognition_preprocessing'

    # definition of important configuration parameters
    # for image resizing we need to know target size
    @classmethod
    def parameters(cls):
        parameters = super().parameters()
        parameters.update({
            'size': NumberField(
                value_type=int, optional=False, min_value=1, description="Destination sizes for both dimensions."
            ),
        })

        return parameters

    def configure(self):
        # getting parameters from config
        self.size = self.get_value_from_config('size')

    def process(self, image, annotation_meta=None):
        """
        Preprocessor realization function, which will be called for each image in the input dataset
        :param image: DataRepresentation entry which include read image and related metadata for it.
        :param annotation_meta: Dictionary with info from  annotation.
                                optional, used in case when we need to use or update some info about image
        :return: DataRepresentation with updated image
        """
        # image data stored inside DataRepresentation in data field
        data = image.data
        # internally we work with numpy arrays, so we need to convert it to pillow image object for resizing
        # Image.LANCZOS is the same filter as Image.ANTIALIAS, which was removed in Pillow 10
        resized_data = Image.fromarray(data).resize((self.size, self.size), Image.LANCZOS)
        # convert data back to numpy array
        data = np.array(resized_data)
        # expand dims for gray scale image
        if len(data.shape) == 2:
            data = np.expand_dims(data, axis=-1)
        image.data = data
        # returns updated DataRepresentation
        return image


In [None]:
!cp advanced_materials/emotion_recognition_preprocessing.py open_model_zoo/tools/accuracy_checker/accuracy_checker/preprocessor

In [None]:
%%writefile open_model_zoo/tools/accuracy_checker/accuracy_checker/preprocessor/__init__.py
"""
Copyright (c) 2019 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

from .preprocessing_executor import PreprocessingExecutor
from .preprocessor import Preprocessor
from .color_space_conversion import BgrToRgb, RgbToBgr, BgrToGray, RgbToGray, TfConvertImageDType, SelectInputChannel
from .normalization import Normalize, Normalize3d
from .geometric_transformations import (
    GeometricOperationMetadata,
    Flip,
    Crop,
    CropRect,
    ExtendAroundRect,
    PointAligner,
    Tiling,
    Crop3D,
    TransformedCropWithAutoScale,
    ImagePyramid
)
from .resize import Resize, AutoResize
from .nlp_preprocessors import DecodeByVocabulary, PadWithEOS
from .centernet_preprocessing import CenterNetAffineTransformation
from .brats_preprocessing import Resize3D, NormalizeBrats, CropBraTS, SwapModalitiesBrats
from .inpainting_preprocessor import FreeFormMask, RectMask, CustomMask
from .emotion_recognition_preprocessing import EmotionRecognitionResize

__all__ = [
    'PreprocessingExecutor',

    'Preprocessor',
    'GeometricOperationMetadata',

    'Resize',
    'Resize3D',
    'AutoResize',
    'Flip',
    'Crop',
    'CropRect',
    'ExtendAroundRect',
    'PointAligner',
    'Tiling',
    'Crop3D',
    'CropBraTS',
    'TransformedCropWithAutoScale',
    'ImagePyramid',

    'BgrToGray',
    'BgrToRgb',
    'RgbToGray',
    'RgbToBgr',
    'TfConvertImageDType',
    'SelectInputChannel',

    'Normalize3d',
    'Normalize',
    'NormalizeBrats',

    'SwapModalitiesBrats',

    'DecodeByVocabulary',
    'PadWithEOS',

    'CenterNetAffineTransformation',

    'FreeFormMask',
    'RectMask',
    'CustomMask',
    
    'EmotionRecognitionResize'
]


### Step 3. Implement output parsing and postprocessing.
An adapter is responsible for converting raw model output to a format suitable for metric calculation. Accuracy Checker also has a postprocessor entity. The main difference between the two is that a postprocessor is an optional step in preparing predicted data: filtering, NMS, casting to integer, clipping and so on. A postprocessor can also work with annotation content in some cases (e.g. in popular datasets for semantic segmentation, the annotation is a png mask where each class is represented by a specific color, so for metric evaluation we need to convert colors to class ids).

In our task, we do not need additional postprocessing steps, so an adapter will be enough.
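
To make the role of the adapter output concrete, here is a minimal sketch of what a top-1 classification accuracy metric does with per-image score vectors. `top1_accuracy` is a hypothetical helper written for this example, not the Accuracy Checker metric, and the scores and labels are made up.

```python
def top1_accuracy(predictions, labels):
    """Share of samples whose highest-scoring class equals the ground truth label."""
    if len(predictions) != len(labels):
        raise ValueError('predictions and labels must have the same length')
    if not labels:
        raise ValueError('at least one sample is required')
    correct = 0
    for scores, label in zip(predictions, labels):
        # index of the maximum score is the predicted class id
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# made-up batch: three images, three classes
batch_scores = [[0.1, 2.3, -0.4], [1.5, 0.2, 0.1], [0.0, 0.1, 0.9]]
ground_truth = [1, 0, 1]
accuracy = top1_accuracy(batch_scores, ground_truth)
print(accuracy)
```

This is why the adapter only has to pair each identifier with its score vector: the comparison with the annotation label happens later, in the metric.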

In [None]:
!cp advanced_materials/emotion_recognition_adapter.py open_model_zoo/tools/accuracy_checker/accuracy_checker/adapters

In [None]:
# %load advanced_materials/emotion_recognition_adapter.py
# base class for all adapters
from ..adapters import Adapter
# output format
from ..representation import ClassificationPrediction


class EmotionRecognitionAdapter(Adapter):
    """
    Class for converting output of emotion recognition model to ClassificationPrediction representation
    """
    # new adapter name in the config
    __provider__ = 'emotion_recognition'
    # like other components adapter might have parameters for configuration, but in our case they are not used
    # we can use default implementation of these parameters

    def process(self, raw, identifiers=None, frame_meta=None):
        """
        Args:
            identifiers: list of input data identifiers
            raw: output of model
            frame_meta: list of meta information about each frame
        Returns:
            list of ClassificationPrediction objects
        """
        # in some cases output can be returned as a list of dictionaries, while a dictionary is expected.
        # _extract_predictions handles this case
        prediction = self._extract_predictions(raw, frame_meta)[self.output_blob]

        # define container to store a batch of predictions as independent entities
        result = []
        # go through the batch dimension and extract results for each image
        for identifier, output in zip(identifiers, prediction):
            # depending on the task, the output representation can differ and has its own parameters
            # for classification, identifier and class probabilities are required
            single_prediction = ClassificationPrediction(identifier, output)
            result.append(single_prediction)

        # return list of prediction representations
        return result


In [None]:
%%writefile open_model_zoo/tools/accuracy_checker/accuracy_checker/adapters/__init__.py
"""
Copyright (c) 2019 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

from .adapter import Adapter, AdapterField, create_adapter

from .action_recognition import ActionDetection
from .text_detection import (
    TextDetectionAdapter,
    TextProposalsDetectionAdapter,
    EASTTextDetectionAdapter
)

from .text_recognition import (
    BeamSearchDecoder,
    CTCGreedySearchDecoder,
    LPRAdapter
)

from .image_processing import SuperResolutionAdapter
from .attributes_recognition import (
    HeadPoseEstimatorAdapter,
    VehicleAttributesRecognitionAdapter,
    PersonAttributesAdapter,
    AgeGenderAdapter,
    LandmarksRegressionAdapter,
    GazeEstimationAdapter
)

from .reidentification import ReidAdapter
from .detection import (
    SSDAdapter,
    FacePersonAdapter,
    TFObjectDetectionAPIAdapter,
    SSDAdapterMxNet,
    PyTorchSSDDecoder,
    SSDONNXAdapter,
    MTCNNPAdapter,
    RetinaNetAdapter,
    ClassAgnosticDetectionAdapter
)
from .yolo import TinyYOLOv1Adapter, YoloV2Adapter, YoloV3Adapter
from .classification import ClassificationAdapter
from .segmentation import SegmentationAdapter, BrainTumorSegmentationAdapter
from .pose_estimation import HumanPoseAdapter
from .pose_estimation_3d import HumanPose3dAdapter

from .dummy_adapters import XML2DetectionAdapter

from .hit_ratio import HitRatioAdapter

from .mask_rcnn import MaskRCNNAdapter
from .mask_rcnn_with_text import MaskRCNNWithTextAdapter

from .nlp import MachineTranslationAdapter, QuestionAnsweringAdapter

from .centernet import CTDETAdapter

from .mono_depth import MonoDepthAdapter

from .image_inpainting import ImageInpaintingAdapter
from .emotion_recognition_adapter import EmotionRecognitionAdapter

__all__ = [
    'Adapter',
    'AdapterField',
    'create_adapter',

    'XML2DetectionAdapter',

    'ClassificationAdapter',

    'SSDAdapter',
    'FacePersonAdapter',
    'TFObjectDetectionAPIAdapter',
    'SSDAdapterMxNet',
    'SSDONNXAdapter',
    'PyTorchSSDDecoder',
    'MTCNNPAdapter',
    'CTDETAdapter',
    'RetinaNetAdapter',
    'ClassAgnosticDetectionAdapter',

    'SegmentationAdapter',
    'BrainTumorSegmentationAdapter',

    'ReidAdapter',

    'SuperResolutionAdapter',

    'HeadPoseEstimatorAdapter',
    'VehicleAttributesRecognitionAdapter',
    'PersonAttributesAdapter',
    'AgeGenderAdapter',
    'LandmarksRegressionAdapter',
    'GazeEstimationAdapter',

    'TextDetectionAdapter',
    'TextProposalsDetectionAdapter',
    'EASTTextDetectionAdapter',

    'BeamSearchDecoder',
    'LPRAdapter',
    'CTCGreedySearchDecoder',

    'HumanPoseAdapter',
    'HumanPose3dAdapter',

    'ActionDetection',

    'HitRatioAdapter',

    'MaskRCNNAdapter',
    'MaskRCNNWithTextAdapter',

    'MachineTranslationAdapter',
    'QuestionAnsweringAdapter',

    'MonoDepthAdapter',

    'ImageInpaintingAdapter',
    
    'EmotionRecognitionAdapter'
]


### Step 4. Reinstall Accuracy Checker to apply the new changes.

In [None]:
!pip3 install open_model_zoo/tools/accuracy_checker --user

In [None]:
!export PYTHONPATH="$PWD/open_model_zoo/tools/accuracy_checker:$PYTHONPATH"

### Step 5. Creating POT configuration file and running the quantization.

Now, let's use the new functionality inside a POT config.
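
The sketch below shows how the components registered in the previous steps might be referenced from the model and engine parts of a POT config. The provider names (`fer_plus`, `emotion_recognition_preprocessing`, `emotion_recognition`) and their parameters come from the code above; everything else, including file paths, the dataset name and the exact layout of the `engine` section, is an assumption and should be checked against the POT and Accuracy Checker documentation for your release. A real config may need further settings, for example the image reader.

```python
import json

# assumed layout: Accuracy Checker launcher and dataset settings inside "engine"
engine = {
    "launchers": [
        {"framework": "dlsdk", "device": "CPU", "adapter": "emotion_recognition"}
    ],
    "datasets": [
        {
            "name": "fer_plus_validation",  # hypothetical dataset name
            "data_source": "fer_plus/converted_images",  # hypothetical path
            "annotation_conversion": {
                "converter": "fer_plus",
                "annotation_file": "fer_plus/fer2013.csv",  # hypothetical path
                "convert_images": True,
                "converted_images_dir": "fer_plus/converted_images",
            },
            "preprocessing": [
                {"type": "emotion_recognition_preprocessing", "size": 64}
            ],
            "metrics": [{"type": "accuracy", "top_k": 1}],
        }
    ],
}

config = {
    "model": {
        "model_name": "emotion-ferplus-8",
        "model": "emotion-ferplus-8.xml",  # assumed Model Optimizer output name
        "weights": "emotion-ferplus-8.bin",
    },
    "engine": engine,
}

print(json.dumps(config, indent=4))
```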

In [None]:
!wget https://github.com/onnx/models/raw/master/vision/body_analysis/emotion_ferplus/model/emotion-ferplus-8.onnx

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model emotion-ferplus-8.onnx --input_shape "[1, 1, 64, 64]"

In [None]:
cat advanced_materials/emotion_recognition.json

In [None]:
!pot -c advanced_materials/emotion_recognition.json -e

It works! Now we can modify the contents of the `compression` section to find a suitable quantized model.
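
As a rough sketch of what such a modification might look like, the helper below builds two variants of the `compression` section. `compression_section` is a hypothetical helper and the parameter values are illustrative assumptions; check the algorithm names and parameters against the POT documentation for your release.

```python
import json


def compression_section(algorithm="DefaultQuantization", preset="performance",
                        stat_subset_size=300, maximal_drop=None):
    """Build a `compression` section with a single quantization algorithm."""
    params = {"preset": preset, "stat_subset_size": stat_subset_size}
    if maximal_drop is not None:
        # accuracy drop limit, only meaningful for accuracy-aware quantization
        params["maximal_drop"] = maximal_drop
    return {
        "target_device": "CPU",
        "algorithms": [{"name": algorithm, "params": params}],
    }


# fast quantization without accuracy control
default_variant = compression_section()
# slower variant that tries to keep the accuracy drop within a limit
accuracy_aware_variant = compression_section(
    algorithm="AccuracyAwareQuantization", maximal_drop=0.01
)

print(json.dumps(accuracy_aware_variant, indent=4))
```

Comparing the accuracy reported for each variant is one way to choose between them.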

## Case 2: Yolo V3

Yolo V3 is complicated from the configuration perspective:
1. Labels start at 0, while the majority of detection models use datasets with a background class.
2. Model evaluation requires some details about the output layers (anchors, cell size, etc.).
3. The model has several outputs.
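
The first point can be illustrated with a small sketch of how a label map changes when a dataset reserves id 0 for background. `build_label_map` is a hypothetical helper written for this example and the class names are placeholders.

```python
def build_label_map(class_names, has_background=False):
    """Map class ids to names, optionally reserving id 0 for background."""
    offset = 1 if has_background else 0
    label_map = {index + offset: name for index, name in enumerate(class_names)}
    if has_background:
        label_map[0] = 'background'
    return label_map


names = ['person', 'bicycle', 'car']
# labels start at 0, as described for Yolo V3 above
print(build_label_map(names))
# labels shifted by one, as in datasets with a background class
print(build_label_map(names, has_background=True))
```

If the annotation and the model disagree on this convention, every predicted class is off by one, which shows up as a near-zero metric value.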

Information about all necessary parameters can be found in the Accuracy Checker readme.
Besides that, OMZ also contains a predefined config for the model. It can be used as a template for evaluating your own model.
The Accuracy Checker config for yolo_v3 can be found [here](https://github.com/opencv/open_model_zoo/blob/master/tools/accuracy_checker/configs/yolo-v3-tf.yml)

So you do not need to modify Accuracy Checker to get a correct configuration.

## Case 3: DCSCN

DCSCN is a super resolution network with a 5D input tensor, which represents a sequence of 5 frames.

At first glance, it looks like an unusual case where we need to create new entities for AC, but the [annotation conversion guide](https://github.com/opencv/open_model_zoo/blob/master/tools/accuracy_checker/accuracy_checker/annotation_converters/README.md) contains the following converter description:

`multi_frame_super_resolution` - converts dataset for super resolution task with multiple input frames usage.
* `data_dir` - path to folder, where images in low and high resolution are located.
* `lr_suffix` - low resolution file name's suffix (default lr).
* `hr_suffix` - high resolution file name's suffix (default hr).
* `annotation_loader` - the library which will be used for ground truth image reading. Supported: opencv, pillow (Optional. Default value is pillow). Note, color space of an image depends on the loader (OpenCV uses BGR, Pillow uses RGB for image reading).
* `number_input_frames` - the number of input frames per inference.

It sounds very similar to our use case, doesn't it?
It can be adapted for our use case:
* handle more complicated image names
* use the middle frame as the ground truth reference
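
The second adaptation can be sketched as follows. This is an illustration only, not the code from the pull request: `group_frames` is a hypothetical helper, the file names are made up, and a sliding window with step 1 is an assumption about how frames are grouped.

```python
def group_frames(frame_names, number_input_frames=5):
    """Group sorted frames into windows and pick the middle frame as the reference."""
    if number_input_frames < 1 or number_input_frames % 2 == 0:
        raise ValueError('number_input_frames must be a positive odd number')
    ordered = sorted(frame_names)
    half = number_input_frames // 2
    samples = []
    # slide over every position that has enough neighbours on both sides
    for center in range(half, len(ordered) - half):
        window = ordered[center - half:center + half + 1]
        samples.append((window, ordered[center]))
    return samples


frames = ['f_{:03d}.png'.format(i) for i in range(6)]
samples = group_frames(frames)
for window, reference in samples:
    print(window, '->', reference)
```

With 6 frames and a window of 5, this produces two samples whose references are the third and fourth frames.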

More details are provided in OMZ [pull request 1083](https://github.com/opencv/open_model_zoo/pull/1083).