



MNN is a lightweight deep neural network inference engine. It loads models and runs inference on devices. At present, MNN is integrated into more than 20 Alibaba apps, such as Taobao, Tmall, and Youku, covering scenarios including live broadcast, short video capture, search recommendation, product search by image, interactive marketing, equity distribution, and security risk control. MNN is also used on embedded devices, such as IoT devices.



Lightweight

  • Optimized for devices with no dependencies; it can be easily deployed to mobile devices and a variety of embedded devices.
  • iOS platform: the static library for armv7+arm64 is about 5MB, linked executables grow by about 620KB, and the metallib file is about 600KB.
  • Android platform: the core .so is about 400KB, the OpenCL .so is about 400KB, and the Vulkan .so is about 400KB.


Versatility

  • Supports TensorFlow, Caffe, and ONNX models, and common neural network types such as CNN, RNN, and GAN.
  • Supports 86 TensorFlow ops and 34 Caffe ops; MNN ops: 71 for CPU, 55 for Metal, 29 for OpenCL, and 31 for Vulkan.
  • Supports iOS 8.0+, Android 4.3+, and embedded devices with a POSIX interface.
  • Supports hybrid computing on multiple devices. Currently supports CPU and GPU. A GPU op plugin can be loaded dynamically to replace the default (CPU) op implementation.
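
The idea of a GPU op replacing a default CPU op can be pictured with a small lookup table. The sketch below is only an illustration: `OpRegistry`, `Backend`, `resolve`, and `reluCpu` are hypothetical names, not MNN's API, and tensors are reduced to plain float vectors. It assumes every op has a CPU implementation, and that a GPU implementation, when registered at runtime, takes precedence.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not MNN's API.
enum class Backend { CPU, GPU };

using Tensor = std::vector<float>;
using OpFn = std::function<Tensor(const Tensor&)>;

class OpRegistry {
public:
    // Register (or replace) the implementation of `op` for one backend.
    void add(Backend backend, const std::string& op, OpFn fn) {
        table_[{backend, op}] = std::move(fn);
    }

    // Return the preferred backend's implementation when one exists,
    // otherwise fall back to the CPU implementation. The first element
    // of the result reports which backend was actually chosen.
    std::pair<Backend, OpFn> resolve(Backend preferred, const std::string& op) const {
        auto it = table_.find({preferred, op});
        if (it != table_.end()) return {preferred, it->second};
        it = table_.find({Backend::CPU, op});
        assert(it != table_.end() && "every op needs a CPU implementation");
        return {Backend::CPU, it->second};
    }

private:
    std::map<std::pair<Backend, std::string>, OpFn> table_;
};

// Reference CPU ReLU used as the default implementation.
inline Tensor reluCpu(const Tensor& x) {
    Tensor y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] > 0.0f ? x[i] : 0.0f;
    return y;
}
```

With this layout, an op that only has a CPU entry still resolves when the GPU is preferred, which is one simple way to mix devices within a single graph.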

High performance

  • Implements core computing with lots of optimized assembly code to make full use of the ARM CPU.
  • For iOS, GPU acceleration (Metal) can be turned on, which is faster than Apple's native CoreML.
  • For Android, OpenCL, Vulkan, and OpenGL are available and deep tuned for mainstream GPUs (Adreno and Mali).
  • Convolution and transposed convolution algorithms are efficient and stable. The Winograd convolution algorithm is widely used to speed up symmetric convolutions with kernel sizes from 3x3 to 7x7.
  • Additional optimizations for the new architecture ARM v8.2 with half-precision calculation support.
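
To give a feel for why Winograd helps, here is a minimal sketch of the smallest textbook case, F(2,3): two outputs of a 3-tap 1D filter computed with 4 multiplications instead of 6. This is not MNN's implementation, which targets 2D kernels and uses hand-tuned code; the function names are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Direct form: two outputs from four inputs and a 3-tap filter (6 multiplies).
inline std::array<float, 2> convDirect(const std::array<float, 4>& d,
                                       const std::array<float, 3>& g) {
    return {d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]};
}

// Winograd F(2,3): the same two outputs with 4 multiplies between the
// transformed input and the transformed filter. The filter transform
// (u0..u3) can be precomputed once per filter.
inline std::array<float, 2> convWinogradF23(const std::array<float, 4>& d,
                                            const std::array<float, 3>& g) {
    // Filter transform.
    const float u0 = g[0];
    const float u1 = (g[0] + g[1] + g[2]) * 0.5f;
    const float u2 = (g[0] - g[1] + g[2]) * 0.5f;
    const float u3 = g[2];
    // Input transform followed by the element-wise products.
    const float m1 = (d[0] - d[2]) * u0;
    const float m2 = (d[1] + d[2]) * u1;
    const float m3 = (d[2] - d[1]) * u2;
    const float m4 = (d[1] - d[3]) * u3;
    // Output transform.
    return {m1 + m2 + m3, m2 - m3 - m4};
}

// Sanity check: both forms must agree up to rounding error.
inline void verifyWinograd(const std::array<float, 4>& d,
                           const std::array<float, 3>& g) {
    const auto a = convDirect(d, g);
    const auto b = convWinogradF23(d, g);
    assert(std::fabs(a[0] - b[0]) < 1e-4f);
    assert(std::fabs(a[1] - b[1]) < 1e-4f);
}
```

The saving grows for 2D tiles, at the cost of extra additions and some loss of numerical precision for larger tile sizes.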

Easy to use

  • Efficient image processing module, speeding up affine transform and color space transform without libyuv or opencv.
  • Provides callbacks throughout the workflow to extract data or control the execution precisely.
  • Provides options for selecting an inference branch and for running branches in parallel on CPU and GPU.
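
As a rough picture of how callbacks can be used to extract data or control execution, the sketch below runs a chain of ops with a hook before and after each one. Everything here is hypothetical (`Op`, `Callback`, `runWithCallbacks`), and the data is a single float rather than a tensor. The convention assumed for this example is that a `before` hook returning false skips the op and an `after` hook returning false stops the run.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not MNN's API.
struct Op {
    std::string name;
    std::function<float(float)> run;
};

// Receives the op name and the current value; returns false to intervene.
using Callback = std::function<bool(const std::string& opName, float value)>;

// Run ops in order. `before` returning false skips that op;
// `after` returning false stops execution early. Either may be empty.
inline float runWithCallbacks(const std::vector<Op>& ops, float input,
                              const Callback& before, const Callback& after) {
    float value = input;
    for (const Op& op : ops) {
        assert(op.run && "op must have an implementation");
        if (before && !before(op.name, value)) continue;  // skip this op
        value = op.run(value);
        if (after && !after(op.name, value)) break;       // stop here
    }
    return value;
}
```

An `after` hook that records `value` for a given op name is enough to dump intermediate results, while returning false from it stops the run at that layer.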



MNN can be divided into two parts: Converter and Interpreter.

Converter consists of Frontends and Graph Optimize. The former is responsible for supporting different training frameworks; MNN currently supports TensorFlow, TensorFlow Lite, Caffe, and ONNX (PyTorch/MXNet). The latter optimizes graphs by operator fusion, operator substitution, and layout adjustment.
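
Operator fusion can be illustrated on a toy graph. The sketch below assumes a linear chain of nodes and merges each `Convolution` that is directly followed by a `ReLU` into one node with a `fusedRelu` flag. The `Node` type, the op type strings, and `fuseConvRelu` are invented for this example and do not reflect MNN's schema or its actual optimization passes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph node; a real graph would also carry weights and edges.
struct Node {
    std::string type;
    bool fusedRelu;
    explicit Node(std::string t, bool relu = false)
        : type(std::move(t)), fusedRelu(relu) {}
};

// Fuse every Convolution that is immediately followed by a ReLU into a
// single Convolution node with fusedRelu set. Other nodes pass through.
inline std::vector<Node> fuseConvRelu(const std::vector<Node>& graph) {
    std::vector<Node> out;
    for (std::size_t i = 0; i < graph.size(); ++i) {
        Node n = graph[i];
        const bool nextIsRelu = i + 1 < graph.size() && graph[i + 1].type == "ReLU";
        if (n.type == "Convolution" && !n.fusedRelu && nextIsRelu) {
            n.fusedRelu = true;
            ++i;  // the ReLU is absorbed, so skip it
        }
        out.push_back(n);
    }
    assert(out.size() <= graph.size());  // fusion never adds nodes
    return out;
}
```

The fused node lets a backend apply the activation while the convolution result is still in registers, avoiding one full pass over the output tensor.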

Interpreter consists of Engine and Backends. The former is responsible for loading the model and scheduling the computation graph; the latter includes the memory allocation and the op implementations for each computing device. In Engine and Backends, MNN applies a variety of optimizations, including the Winograd algorithm for convolution and deconvolution, the Strassen algorithm for matrix multiplication, low-precision calculation, NEON optimization, hand-written assembly, multi-threading, memory reuse, and heterogeneous computing.
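
For readers unfamiliar with Strassen's method, the following sketch shows its core step on a 2x2 matrix of scalars: 7 multiplications instead of 8. In practice the same formulas are applied recursively to matrix blocks rather than scalars. How MNN decides when to use it is not described here, and the names below are illustrative only.

```cpp
#include <array>
#include <cassert>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Textbook 2x2 product with 8 multiplications, used as a reference.
inline Mat2 naive2x2(const Mat2& a, const Mat2& b) {
    Mat2 c{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            c[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
    return c;
}

// Strassen's scheme: 7 products m1..m7, then additions only.
inline Mat2 strassen2x2(const Mat2& a, const Mat2& b) {
    const double m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
    const double m2 = (a[1][0] + a[1][1]) * b[0][0];
    const double m3 = a[0][0] * (b[0][1] - b[1][1]);
    const double m4 = a[1][1] * (b[1][0] - b[0][0]);
    const double m5 = (a[0][0] + a[0][1]) * b[1][1];
    const double m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
    const double m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);
    Mat2 c{};
    c[0][0] = m1 + m4 - m5 + m7;
    c[0][1] = m3 + m5;
    c[1][0] = m2 + m4;
    c[1][1] = m1 - m2 + m3 + m6;
    return c;
}

// Sanity check: the two methods must produce the same product here.
inline void verifyStrassen(const Mat2& a, const Mat2& b) {
    const Mat2 x = naive2x2(a, b);
    const Mat2 y = strassen2x2(a, b);
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            assert(x[i][j] == y[i][j]);
}
```

When the entries are large blocks, trading one block multiplication for several block additions pays off because multiplication dominates the cost.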

Quick start




Scan QR code to join DingDing discussion group.


Apache 2.0


MNN participants: Taobao Technology Department, Search Engineering Team, DAMO Team, Youku and other group employees.

MNN refers to the following projects:
