YOLOv7 Encapsulation

The implementation of ObjectDetector exemplifies an encapsulated design. As noted in SNPEPipeline, the Inference SDK is largely universal across models; the only real difference lies in how the input and output data are handled, referred to as pre/post-processing. This task is carried out by ObjectDetector.cpp. A rough sketch of the interface is shown below.
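
To make the encapsulation concrete, the public interface of the detector might look roughly like the following sketch. The method and member names are taken from the snippets on this page (Detector::init, Detector::preprocess, m_snpe_task, m_isInit); everything else, such as the exact signature of postprocess and the Detection struct, is an assumption for illustration.

#include <memory>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

// Assumed result type; the real project may expose detections differently.
struct Detection {
    cv::Rect box;       // bounding box in pixel coordinates
    float    score;     // confidence score
    int      class_id;  // predicted class index
};

class Detector {
public:
    bool init(const std::string &model_path);          // create and initialize the SNPEPipeline
    bool preprocess(cv::Mat &frame);                    // resize + normalize, copy into the input tensor
    bool postprocess(std::vector<Detection> &results);  // decode boxes and scores from the output tensors
private:
    std::unique_ptr<snpe::SNPEPipeline> m_snpe_task;    // wrapped, model-agnostic inference engine
    bool m_isInit = false;
};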

Initialize

bool
Detector::init(const std::string &model_path) {
    // Create the model-agnostic SNPE pipeline and load the model container.
    m_snpe_task = std::unique_ptr<snpe::SNPEPipeline>(new snpe::SNPEPipeline());
    if (!m_snpe_task->init(model_path)) {
        std::cerr << "[error] failed to initialize snpe instance" << std::endl;
        return false;
    }
    // Cache the initialization state so later calls can verify the detector is ready.
    m_isInit = m_snpe_task->isInit();
    return m_isInit;
}

The basic initialization steps include:

  1. Initializing an instance of SNPEPipeline.
  2. Allocating memory for post-processing (see the sketch after this list).
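
The second step is not visible in the snippet above. A minimal sketch of what it could look like is given below; the member m_output_buffer, the helper name, and the size constants are assumptions made for illustration, not part of the original code.

// Hypothetical helper, assuming a YOLO-style head that produces a fixed number of
// candidate boxes, each with (x, y, w, h, objectness, per-class scores).
void Detector::allocOutputBuffer() {
    const size_t num_boxes      = 3 * (52 * 52 + 26 * 26 + 13 * 13); // assumption: three anchor scales for a 416x416 input
    const size_t values_per_box = 85;                                // assumption: 4 box coords + 1 objectness + 80 classes
    m_output_buffer.resize(num_boxes * values_per_box);              // reused every frame to avoid per-inference allocation
}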

Preprocess

After preparing the image, it can be passed to the SNPE instance for forward inference. However, to minimize influencing factors, the image is not only resized but also normalized before inference.

bool
Detector::preprocess(cv::Mat &frame) {
    // Resize to the network input resolution (416x416 for this YOLOv7 model).
    cv::resize(frame, frame, cv::Size(416, 416), 0, 0, cv::INTER_LINEAR);

    // Convert BGR (OpenCV default) to RGB, normalize to [0, 1], and flatten to NHWC float.
    std::vector<float> input_vec;
    input_vec.reserve(416 * 416 * 3);
    for (int y = 0; y < 416; y++) {
        for (int x = 0; x < 416; x++) {
            const cv::Vec3b value = frame.at<cv::Vec3b>(y, x);
            const float r = static_cast<float>(value[2]) / 255.0f;
            const float g = static_cast<float>(value[1]) / 255.0f;
            const float b = static_cast<float>(value[0]) / 255.0f;
            input_vec.push_back(r);
            input_vec.push_back(g);
            input_vec.push_back(b);
        }
    }
    // Hand the flattened tensor to the SNPE pipeline, which copies it into the input layer.
    m_snpe_task->loadInputTensor(input_vec);
    return true;
}

The core of the pre-processing is the loadInputTensor() interface: it obtains the starting address of the input layer and copies the prepared image data to that address.
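
Inside SNPEPipeline, loadInputTensor() might be implemented along the following lines using the standard SNPE ITensor API. This is a sketch under the assumption that the pipeline keeps the zdl::SNPE::SNPE handle in m_snpe and the input tensor in m_input_tensor; the actual member names and layout in SNPEPipeline may differ.

#include <algorithm>
#include <vector>
#include "SNPE/SNPE.hpp"
#include "SNPE/SNPEFactory.hpp"
#include "DlSystem/ITensor.hpp"
#include "DlSystem/ITensorFactory.hpp"

bool snpe::SNPEPipeline::loadInputTensor(const std::vector<float> &input_vec) {
    // Query the model's input dimensions and create an ITensor with that shape.
    const auto &shape_opt = m_snpe->getInputDimensions();   // assumption: m_snpe is the zdl::SNPE::SNPE instance
    m_input_tensor = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(*shape_opt);

    if (input_vec.size() != m_input_tensor->getSize()) {
        return false;                                        // the flattened image must match the input layer size
    }
    // Copy the normalized image into the tensor; this is the "starting address" mentioned above.
    std::copy(input_vec.begin(), input_vec.end(), m_input_tensor->begin());
    return true;
}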

Postprocess

The output of forward propagation varies from network to network, but for most applications the primary information consists of landmarks and confidence scores; together they constitute the final result of inference. For ordinary users, however, this raw data is not visually intuitive, so a certain level of interpretation has to be performed on the outputs, a step commonly referred to as post-processing.

In Python, libraries such as NumPy make operations on high-dimensional matrix data very straightforward: an operation can be applied to an entire array along a specific dimension. In C++, in the absence of such libraries, matrix operations often require nested for loops. Post-processing is therefore typically a matter of translating the Python reference code into C++, and for the sake of convenience some data copying may be involved, as in the sketch below.
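
As a rough illustration of such a translation, the sketch below decodes a flattened YOLO-style output with nested loops, applies a confidence threshold, and then runs non-maximum suppression via OpenCV's cv::dnn::NMSBoxes. The output layout (center x, center y, w, h, objectness, then class scores per box), the thresholds, and the function name are assumptions; the actual layout depends on the exported YOLOv7 model.

#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>

// Detection as declared in the interface sketch above.
struct Detection { cv::Rect box; float score; int class_id; };

// Hypothetical decoder: 'output' holds num_boxes rows of
// [cx, cy, w, h, objectness, class_0 ... class_{num_classes-1}] in input-image coordinates.
std::vector<Detection> decodeOutput(const std::vector<float> &output,
                                    int num_boxes, int num_classes,
                                    float conf_thres = 0.25f, float nms_thres = 0.45f) {
    const int stride = 5 + num_classes;
    std::vector<cv::Rect> boxes;
    std::vector<float> scores;
    std::vector<int> class_ids;

    for (int i = 0; i < num_boxes; i++) {
        const float *row = output.data() + i * stride;
        const float objectness = row[4];
        if (objectness < conf_thres) continue;

        // Nested loop over class scores replaces NumPy's argmax along the last axis.
        int best_class = 0;
        float best_score = 0.0f;
        for (int c = 0; c < num_classes; c++) {
            if (row[5 + c] > best_score) { best_score = row[5 + c]; best_class = c; }
        }
        const float conf = objectness * best_score;
        if (conf < conf_thres) continue;

        // Convert center/size format to a top-left based cv::Rect.
        boxes.emplace_back(static_cast<int>(row[0] - row[2] / 2),
                           static_cast<int>(row[1] - row[3] / 2),
                           static_cast<int>(row[2]), static_cast<int>(row[3]));
        scores.push_back(conf);
        class_ids.push_back(best_class);
    }

    // Non-maximum suppression keeps only the highest-scoring box among overlapping ones.
    std::vector<int> keep;
    cv::dnn::NMSBoxes(boxes, scores, conf_thres, nms_thres, keep);

    std::vector<Detection> results;
    for (int idx : keep) {
        results.push_back({boxes[idx], scores[idx], class_ids[idx]});
    }
    return results;
}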

See ObjectDetector.cpp for the full implementation.
