
Inference


According to the SNPE C++ Tutorial - Build the Sample, a basic inference SDK should include the components shown in the diagram below.

[Figure: basic SNPE inference SDK workflow]

This part of the pipeline is model-agnostic: it stays essentially the same regardless of which model is deployed. It is encapsulated in SNPEPipeline.cpp.

Check Available Runtime

Before checking the runtime, the constructor calls the SNPE interface to query the library's version information, which serves as a basic sanity check.

static zdl::DlSystem::Version_t version = zdl::SNPE::SNPEFactory::getLibraryVersion();
std::cout << "[info] SNPE version: " << version.asString().c_str() << std::endl;
  • This log line is printed while the program executes.
  • The SNPE version used here is 2.5.0.4052, as reported by the log above.
if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU)) {
    m_runtime = zdl::DlSystem::Runtime_t::GPU;
    std::cout << "[info] SNPE runtime: GPU" << std::endl;
} else {
    m_runtime = zdl::DlSystem::Runtime_t::CPU;
    std::cout << "[info] SNPE runtime: CPU" << std::endl;
}

It calls the corresponding interface to check whether the current platform supports the runtime requested by the user. The runtimes SNPE supports are platform-specific; the official documentation lists the runtimes supported by specific SoC models:

[Figure: runtimes supported by each SoC model, from the official SNPE documentation]

Knowing the details of a runtime requires understanding the datasheet of the SoC in use, along with the necessary hardware and driver support. Typically, the CPU and GPU (Adreno) runtimes work without issue. Therefore, if an unsupported runtime is detected, the code falls back to the CPU runtime.

  • You can also use the snpe-net-run tool to check whether a runtime is available; a small in-code alternative is sketched below.
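As a rough in-code alternative, each runtime can be probed with isRuntimeAvailable(). A minimal sketch; the header paths assume the SDK's usual layout, and CPU/GPU/DSP are standard Runtime_t members:

#include "SNPE/SNPEFactory.hpp"
#include "DlSystem/DlEnums.hpp"
#include <iostream>
#include <utility>

// Probe each runtime and report whether this device supports it.
static void printRuntimeAvailability() {
    const std::pair<zdl::DlSystem::Runtime_t, const char *> runtimes[] = {
        {zdl::DlSystem::Runtime_t::CPU, "CPU"},
        {zdl::DlSystem::Runtime_t::GPU, "GPU"},
        {zdl::DlSystem::Runtime_t::DSP, "DSP"},
    };
    for (const auto &rt : runtimes) {
        std::cout << "[info] " << rt.second << ": "
                  << (zdl::SNPE::SNPEFactory::isRuntimeAvailable(rt.first)
                          ? "available" : "not available")
                  << std::endl;
    }
}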

Load Network

The process of loading the model is relatively straightforward. Simply call the corresponding interface, passing the absolute path of the DLC, and it will automatically parse the DLC to obtain the basic network information.

m_container = zdl::DlContainer::IDlContainer::open(model_path);
  • DLC (Deep Learning Container), as the name suggests, is simply a container for a deep learning network.
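Since open() returns a null pointer when the file cannot be read or parsed, guarding the call is worthwhile. A minimal sketch, assuming model_path is a std::string and the surrounding function returns a bool:

// Open the DLC and fail early if parsing did not succeed.
m_container = zdl::DlContainer::IDlContainer::open(model_path);
if (m_container == nullptr) {
    std::cerr << "[error] failed to open DLC: " << model_path << std::endl;
    return false;  // hypothetical: propagate the failure to the caller
}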

Set Network Builder Options

  • Note that SNPE v2.5, because it is built on the C API, no longer supports the chained setter calls of the deprecated C++ API from older versions; a non-chained equivalent is sketched after the code block below.
zdl::SNPE::SNPEBuilder snpe_builder(m_container.get());
m_snpe = snpe_builder.setOutputLayers(output_layers)
                     .setRuntimeProcessor(m_runtime)
                     .setCPUFallbackMode(true)
                     .setUseUserSuppliedBuffers(false)
                     .setPerformanceProfile(zdl::DlSystem::PerformanceProfile_t::HIGH_PERFORMANCE)
                     .build();
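If the chained form fails to build on v2.5, the same configuration can be expressed as separate statements, since each setter can be invoked on the builder directly. A minimal sketch:

// Non-chained equivalent: call each setter as its own statement.
zdl::SNPE::SNPEBuilder snpe_builder(m_container.get());
snpe_builder.setOutputLayers(output_layers);
snpe_builder.setRuntimeProcessor(m_runtime);
snpe_builder.setCPUFallbackMode(true);
snpe_builder.setUseUserSuppliedBuffers(false);
snpe_builder.setPerformanceProfile(zdl::DlSystem::PerformanceProfile_t::HIGH_PERFORMANCE);
m_snpe = snpe_builder.build();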

SetOutputLayers

Setting the output layers of the current model means SNPE can expose the output of any layer in the network, provided you configure it accordingly. If you do not specify particular output layers, the default behavior is to use the model's last layer as the output. Single-output networks can rely on that default, but for networks like YOLO, which typically have three output layers, the default behavior is not sufficient.

// Layer names come from the model itself; they can be inspected with the
// snpe-dlc-info tool.
zdl::DlSystem::StringList output_layers;
const char *layers_name[3] = {"Conv_134", "Conv_148", "Conv_162"};
for (auto &name : layers_name) {
    output_layers.append(name);
}

ITensors & UserBuffers

ITensors and UserBuffers represent two kinds of memory. ITensors correspond to ordinary user-space memory (such as memory allocated with malloc/new), while UserBuffers correspond to DMA-capable (ION) memory. The most noticeable difference in usage is that ITensors involve an extra std::copy compared to UserBuffers. See the official SNPE documentation for more information.

  • This project uses ITensors.
  • loadInputTensor() and getOutputTensor() are exposed as the external API; the sketch below shows roughly what the ITensor flow looks like.
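A minimal sketch of what a loadInputTensor()-style helper might do; the signature and the float-vector input are illustrative assumptions, not the project's exact code, and the header paths assume the SDK's usual layout:

#include "DlSystem/ITensorFactory.hpp"
#include "SNPE/SNPE.hpp"
#include "SNPE/SNPEFactory.hpp"
#include <algorithm>
#include <memory>
#include <vector>

// Allocate an ITensor matching the network's input shape, then copy the user
// data into it (this extra std::copy is the cost relative to UserBuffers).
std::unique_ptr<zdl::DlSystem::ITensor> loadInputTensor(
        std::unique_ptr<zdl::SNPE::SNPE> &snpe, const std::vector<float> &data) {
    // Query the input shape the network expects.
    const auto &input_shape_opt = snpe->getInputDimensions();
    std::unique_ptr<zdl::DlSystem::ITensor> input =
        zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(*input_shape_opt);
    std::copy(data.begin(), data.end(), input->begin());
    return input;
}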

Execute

Invoking the inference interface is straightforward: once the prepared input data is in place, simply call it directly.

bool SNPEPipeline::execute() {
    ...
    m_snpe->execute(input_tmp, output_tmp);
    ...
}
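For the ITensor path, the surrounding code might look roughly like this; input_tensor and the output iteration are illustrative, not the project's exact implementation:

// Run inference on one input tensor and walk the named outputs.
zdl::DlSystem::TensorMap output_map;
if (!m_snpe->execute(input_tensor.get(), output_map)) {
    std::cerr << "[error] SNPE execute failed" << std::endl;
    return false;
}
for (const char *name : output_map.getTensorNames()) {
    zdl::DlSystem::ITensor *out = output_map.getTensor(name);
    std::cout << "[info] output " << name << " has "
              << out->getSize() << " elements" << std::endl;
}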