Inference
According to the SNPE C++ Tutorial - Build the Sample, a basic inference SDK should include the components shown in the diagram below.
This part of the pipeline is the same for all models and generally does not change as the model varies. It is encapsulated in SNPEPipeline.cpp.
Before performing runtime checks, the constructor calls the SNPE interface to query the library version, which serves as a basic sanity check.
```cpp
static zdl::DlSystem::Version_t version = zdl::SNPE::SNPEFactory::getLibraryVersion();
std::cout << "[info] SNPE version: " << version.asString().c_str() << std::endl;
```
- This log is printed during execution.
- The SNPE version used here is:

```
[info] SNPE version: 2.5.0.4052
```
```cpp
if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU)) {
    m_runtim = zdl::DlSystem::Runtime_t::GPU;
    std::cout << "[info] SNPE runtime: GPU" << std::endl;
} else {
    m_runtim = zdl::DlSystem::Runtime_t::CPU;
    std::cout << "[info] SNPE runtime: CPU" << std::endl;
}
```
It calls the corresponding interface to check whether the current platform supports the runtime requested by the user. The runtimes supported by SNPE are platform-specific, and the official documentation lists the runtimes supported by certain SoC models:
Knowing which runtimes are available requires understanding the datasheet of the platform's SoC, along with the necessary hardware and driver support. Typically, the CPU and GPU (Adreno) runtimes work. Therefore, if an unsupported runtime is detected, the pipeline falls back to the CPU runtime.
- You can also use the snpe-net-run tool to check whether a runtime is available.
The process of loading the model is relatively straightforward. Simply call the corresponding interface, passing the absolute path of the DLC, and it will automatically parse the DLC to obtain the basic network information.
```cpp
m_container = zdl::DlContainer::IDlContainer::open(model_path);
```
- A DLC (Deep Learning Container), as the name suggests, is merely a container for a deep learning network.
- Note that SNPE v2.5, because it uses the C API internally, no longer supports the chained setter calls available in the deprecated C++ API of older versions.
```cpp
zdl::SNPE::SNPEBuilder snpe_builder(m_container.get());
m_snpe = snpe_builder.setOutputLayers(output_layers)
                     .setRuntimeProcessor(m_runtim)
                     .setCPUFallbackMode(true)
                     .setUseUserSuppliedBuffers(false)
                     .setPerformanceProfile(zdl::DlSystem::PerformanceProfile_t::HIGH_PERFORMANCE)
                     .build();
```
Setting the output layers for the current model means that SNPE can expose the output of any layer in the network, provided you configure it accordingly. If you have not specified a particular output layer, the default behavior is to use the model's last layer as the output. Single-output networks can rely on this default, but for networks like YOLO, which typically have three output layers, the default is not sufficient, so the output layers must be set explicitly.
```cpp
const char *layers_name[3] = {"Conv_134", "Conv_148", "Conv_162"};
for (auto &name : layers_name) {
    output_layers.append(name);
}
```
ITensors and UserBuffers represent two types of memory. ITensors correspond to regular user-space memory (such as memory allocated with malloc/new), while UserBuffers correspond to DMA (ION) memory. The most notable difference in usage is that the ITensor path involves an additional std::copy compared to UserBuffers. You can find more information here.
- In this project we use ITensors.
- loadInputTensor() and getOutputTensor() are exposed as the external API.
The invocation of the inference interface is straightforward: once the prepared input data is in place, simply call it directly.
```cpp
bool SNPEPipeline::execute() {
    ...
    m_snpe->execute(input_tmp, output_tmp);
    ...
}
```