Commit: beta 0.1.1.3
- fix benchmark script for older adb versions
- add FAQ.md
- add environment requirements in Install.md
- add coeff in Eltwise Op
- fix bugs in Strassen 1x1 data preparation
- add download-failure handling in get_model.sh
liqing committed May 17, 2019
1 parent ab7a871 commit 2fc8a20
Showing 13 changed files with 245 additions and 55 deletions.
3 changes: 2 additions & 1 deletion benchmark/bench_android.sh
@@ -62,7 +62,8 @@ function bench_android() {
find . -name "*.so" | while read solib; do
adb push $solib $ANDROID_DIR
done
-    adb push benchmark.out timeProfile.out $ANDROID_DIR
+    adb push benchmark.out $ANDROID_DIR
+    adb push timeProfile.out $ANDROID_DIR
adb shell chmod 0777 $ANDROID_DIR/benchmark.out

if [ "" != "$PUSH_MODEL" ]; then
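
(The single `adb push` with two local files is split into two calls because older adb versions accept only one local file per `push`.)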
Binary file removed benchmark/models/vgg16.mnn
111 changes: 111 additions & 0 deletions doc/FAQ.md
@@ -0,0 +1,111 @@
## Compiling FAQ
### Environment Requirement

- cmake 3.10+
- gcc 4.9+
- protobuf 3.0+

__Remember to run cmake again after upgrading gcc.__
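
You can sanity-check the toolchain with ordinary version queries:

``` shell
cmake --version    # expect 3.10 or newer
gcc --version      # expect 4.9 or newer
protoc --version   # expect libprotoc 3.0 or newer
```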


### schema/generate.sh Related Errors

``` shell
*** building flatc ***
CMake Error: Could not find CMAKE_ROOT !!!
```

If the script fails with the error above, your CMake was not installed correctly.

Try `sudo apt install extra-cmake-modules` or `export CMAKE_ROOT=/path/to/where_cmake_installed` to fix it.

__Remember to run schema/generate.sh after editing schema (*.proto).__


### tools/script/get_model.sh Related Errors

``` shell
Could NOT find Protobuf (missing: Protobuf_INCLUDE_DIR)
```

``` shell
Unrecognized syntax identifier "proto3". This parser only recognizes "proto2".
```

If the script fails with the errors above, your protobuf was not installed correctly. Follow [Protobuf's Installation Instructions](https://github.com/protocolbuffers/protobuf/blob/master/src/README.md) to install it.

If multiple protobuf versions are installed and conflict with each other, you could try the solutions below:

``` shell
which protoc
# if the path printed above does NOT point to the correct protoc,
# comment out the corresponding PATH entry in .bashrc
source .bashrc
sudo ldconfig
```

or

``` shell
# uninstall
sudo apt-get remove libprotobuf-dev
sudo apt-get remove protobuf-compiler
sudo apt-get remove python-protobuf
sudo rm -rf /usr/local/bin/protoc
sudo rm -rf /usr/bin/protoc
sudo rm -rf /usr/local/include/google
sudo rm -rf /usr/local/include/protobuf*
sudo rm -rf /usr/include/google
sudo rm -rf /usr/include/protobuf*

# install
sudo apt-get update
sudo ldconfig
sudo apt-get install libprotobuf* protobuf-compiler python-protobuf
```

### Cross-compile on Windows

Cross-compiling on Windows is not supported currently. You may try building inside the Windows Subsystem for Linux, e.g. with https://github.com/microsoft/Terminal.


### Quantized Models

Only TensorFlow quantized models are supported for now. We plan to provide a training-free model quantization tool based on the MNN model format.


### Unsupported Operations

``` shell
opConverter ==> MNN Converter NOT_SUPPORTED_OP: [ ANY_OP_NAME ]
```

If MNNConverter fails with the error above, one or more operations are not supported by MNN. You could submit an issue or leave a comment at the pinned issue. If you want to implement the missing operation yourself, you can follow [our guide](AddOp_EN.md). Pull requests are always welcome.


__The TensorFlow SSD model is not supported: the TensorFlow Object Detection API produces unsupported control-flow operations in the post-processing part. The TensorFlow SSD model is also less efficient than the Caffe SSD model, so using the Caffe SSD model is recommended.__


## Runtime FAQ

### What is the NC4HW4 Format?

The difference between NCHW and NC4HW4 is like the difference between the planar and chunky (interleaved) color storage methods. Imagine a 2x2 RGBA image: in planar representation (NCHW), its storage is `RRRRGGGGBBBBAAAA`; in chunky representation (NC4HW4), its storage is `RGBARGBARGBARGBA`. In MNN, we pack every 4 channels for floats (or 8 channels for int8) to gain better performance with SIMD.

You can obtain a tensor's format through `TensorUtils::getDescribe(tensor)->dimensionFormat`. If it returns `MNN_DATA_FORMAT_NC4HW4`, the channel dimension is packed (padded up to a multiple of 4), which may make the tensor's elementSize greater than the product of its dimensions.
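
For illustration, a sketch (not MNN source) of where element `(n, c, h, w)` lives in each layout:

``` c++
#include <cstddef>

// NCHW: planar layout.
static inline size_t indexNCHW(int n, int c, int h, int w, int C, int H, int W) {
    return (((size_t)n * C + c) * H + h) * W + w;
}

// NC4HW4: channels packed in groups of 4, i.e. layout [N, C/4, H, W, 4].
// C is padded up to a multiple of 4, so elementSize can exceed N*C*H*W.
static inline size_t indexNC4HW4(int n, int c, int h, int w, int C, int H, int W) {
    const int C4 = (C + 3) / 4; // number of 4-channel packs
    return ((((size_t)n * C4 + c / 4) * H + h) * W + w) * 4 + c % 4;
}
```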

### How to Convert Between Formats?

You can convert a tensor's format using code like the following:


``` c++
auto srcTensor = Tensor::create({1, 224, 224, 3}, Tensor::TENSORFLOW);
// ... set srcTensor data
auto dstTensor = net->getSessionInput(session, NULL);
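// copyFromHostTensor converts the NHWC host data into the
// input tensor's internal layout (e.g. NC4HW4) if the formats differ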
dstTensor->copyFromHostTensor(srcTensor);
```
### Why is copying output tensor data so slow on the GPU backend?
If you do not wait for GPU inference to finish (e.g. through runSessionWithCallback with sync enabled), copyToHostTensor has to wait for it before copying the data.
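
For reference, a minimal sketch of copying the output out of a session (assuming `net` and `session` were created as usual; names are illustrative):

``` c++
auto outputTensor = net->getSessionOutput(session, NULL);
// A host tensor with the same shape receives the copy.
MNN::Tensor hostTensor(outputTensor, outputTensor->getDimensionType());
// On GPU backends this call blocks until inference has finished,
// which is why it can look slow without prior synchronization.
outputTensor->copyToHostTensor(&hostTensor);
auto result = hostTensor.host<float>();
```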
11 changes: 8 additions & 3 deletions doc/Install_CN.md
@@ -27,7 +27,7 @@
## Linux|arm|aarch64|Darwin
### Build on Host
Steps:
-1. Install cmake (version 3.10 or above is recommended)
+1. Install cmake (version 3.10 or above is recommended), protobuf (version 3.0 or above) and gcc (version 4.9 or above)
2. `cd /path/to/MNN`
3. `./schema/generate.sh`
4. `./tools/script/get_model.sh` (optional; models are needed only by the demo projects)
@@ -72,7 +72,7 @@ make -j4
## Android

Steps:
-1. Install cmake (version 3.10 or above is recommended)
+1. Install cmake (version 3.10 or above is recommended), protobuf (version 3.0 or above) and gcc (version 4.9 or above)
2. Download and install the NDK from `https://developer.android.com/ndk/downloads/`; prefer r17 or earlier (with r18 and later, gcc can no longer be used, and clang has a bug when building 32-bit .so files)
3. Set the NDK environment variable in .bashrc or .bash_profile, e.g.: export ANDROID_NDK=/Users/username/path/to/android-ndk-r14b
4. `cd /path/to/MNN`
@@ -84,4 +84,9 @@ make -j4

## iOS

-On macOS, open project/ios/MNN.xcodeproj with Xcode and build.
+Steps:
+1. Install protobuf (version 3.0 or above)
+2. `cd /path/to/MNN`
+3. `./schema/generate.sh`
+4. `./tools/script/get_model.sh` (optional; models are needed only by the demo projects)
+5. On macOS, open project/ios/MNN.xcodeproj with Xcode and build.
14 changes: 9 additions & 5 deletions doc/Install_EN.md
@@ -25,10 +25,10 @@ Defaults `OFF`, When `ON`, build the Metal backend, apply GPU according to setti
## Linux|arm|aarch64|Darwin

### Build on Host
-1. Install cmake(cmake version >=3.10 is recommended)
+1. Install cmake (version >= 3.10 is recommended), protobuf (version >= 3.0 is required) and gcc (version >= 4.9 is required)
2. `cd /path/to/MNN`
3. `./schema/generate.sh`
-4. `./tools/script/get_model.sh`(optional, models are required only in demo project)
+4. `./tools/script/get_model.sh` (optional; models are needed only by the demo projects)
5. `mkdir build && cd build && cmake .. && make -j4`

Then you will get the MNN library (libMNN.so)
@@ -70,16 +70,20 @@ make -j4

## Android

-1. Install cmake(cmake version >=3.10 is recommended)
+1. Install cmake (version >= 3.10 is recommended), protobuf (version >= 3.0 is required) and gcc (version >= 4.9 is required)
2. [Download and Install NDK](https://developer.android.com/ndk/downloads/); a version no later than r17 is strongly recommended (later versions cannot build with gcc, and building armv7 with clang may fail)
3. Set ANDROID_NDK path, eg: `export ANDROID_NDK=/Users/username/path/to/android-ndk-r14b`
4. `cd /path/to/MNN`
5. `./schema/generate.sh`
-6. `./tools/script/get_model.sh`(optional, models are required only in demo project)
+6. `./tools/script/get_model.sh` (optional; models are needed only by the demo projects)
7. `cd project/android`
8. Build armv7 library: `mkdir build_32 && cd build_32 && ../build_32.sh`
9. Build armv8 library: `mkdir build_64 && cd build_64 && ../build_64.sh`

## iOS

-Open [MNN.xcodeproj](../project/ios/) with Xcode on macOS, then build.
+1. Install protobuf (version >= 3.0 is required)
+2. `cd /path/to/MNN`
+3. `./schema/generate.sh`
+4. `./tools/script/get_model.sh` (optional; models are needed only by the demo projects)
+5. Open [MNN.xcodeproj](../project/ios/) with Xcode on macOS, then build.
1 change: 1 addition & 0 deletions schema/default/CaffeOp.fbs
@@ -162,6 +162,7 @@ enum EltwiseType : byte {

table Eltwise {
    type:EltwiseType;
+   coeff:[float];
}

table Flatten {
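
For context: in Caffe's Eltwise layer, the coefficients apply only to the SUM operation, computing `output = coeff[0] * input[0] + coeff[1] * input[1] + ...`, and an empty `coeff` means all coefficients are 1.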
37 changes: 31 additions & 6 deletions source/backend/cpu/CPUEltwise.cpp
@@ -20,14 +20,40 @@

namespace MNN {

+CPUEltwise::CPUEltwise(Backend *b, const MNN::Op *op) : Execution(b) {
+    auto eltwiseParam = op->main_as_Eltwise();
+    mType             = eltwiseParam->type();
+
+    // keep compatible with old model
+    if (eltwiseParam->coeff()) {
+        const int size = eltwiseParam->coeff()->size();
+        mCoeff.resize(size);
+        memcpy(mCoeff.data(), eltwiseParam->coeff()->data(), size * sizeof(float));
+    }
+}
+
ErrorCode CPUEltwise::onExecute(const std::vector<Tensor *> &inputs, const std::vector<Tensor *> &outputs) {
    auto inputTensor  = inputs[0];
    const int size    = inputTensor->elementSize();
    auto sizeQuad     = size / 4;

    auto outputTensor = outputs[0];
    auto outputHost   = outputTensor->host<float>();
-    auto proc         = MNNMatrixProd;
+    const auto input0Ptr = inputs[0]->host<float>();
+
+    const int coeffSize = mCoeff.size();
+    bool isIdentity     = coeffSize >= 2;
+    if (isIdentity) {
+        // when Eltwise has coeff
+        if (mCoeff[0] == 1.0f && mCoeff[1] == 0.0f) {
+            memcpy(outputHost, input0Ptr, inputs[0]->size());
+            return NO_ERROR;
+        } else {
+            return NOT_SUPPORT;
+        }
+    }
+
+    auto proc = MNNMatrixProd;
    switch (mType) {
        case EltwiseType_PROD:
            proc = MNNMatrixProd;
@@ -44,7 +70,7 @@ ErrorCode CPUEltwise::onExecute(const std::vector<Tensor *> &inputs, const std::
    }

    auto inputT1 = inputs[1];
-    proc(outputHost, inputs[0]->host<float>(), inputT1->host<float>(), sizeQuad, 0, 0, 0, 1);
+    proc(outputHost, input0Ptr, inputT1->host<float>(), sizeQuad, 0, 0, 0, 1);
    for (int i = 2; i < inputs.size(); ++i) {
        proc(outputHost, outputHost, inputs[i]->host<float>(), sizeQuad, 0, 0, 0, 1);
    }
@@ -55,8 +81,7 @@ class CPUEltwiesCreator : public CPUBackend::Creator {
public:
virtual Execution *onCreate(const std::vector<Tensor *> &inputs, const std::vector<Tensor *> &outputs,
const MNN::Op *op, Backend *backend) const {
-        auto elt = op->main_as_Eltwise();
-        return new CPUEltwise(backend, elt->type());
+        return new CPUEltwise(backend, op);
}
};
REGISTER_CPU_OP_CREATOR(CPUEltwiesCreator, OpType_Eltwise);
5 changes: 2 additions & 3 deletions source/backend/cpu/CPUEltwise.hpp
@@ -15,14 +15,13 @@
namespace MNN {
class CPUEltwise : public Execution {
public:
-    CPUEltwise(Backend *b, MNN::EltwiseType type) : Execution(b), mType(type) {
-        // nothing to do
-    }
+    CPUEltwise(Backend *b, const MNN::Op *op);
virtual ~CPUEltwise() = default;
virtual ErrorCode onExecute(const std::vector<Tensor *> &inputs, const std::vector<Tensor *> &outputs) override;

private:
EltwiseType mType;
+    std::vector<float> mCoeff;
};

} // namespace MNN
4 changes: 2 additions & 2 deletions source/backend/cpu/compute/Convolution1x1Strassen.cpp
@@ -118,13 +118,13 @@ ErrorCode Convolution1x1Strassen::onResize(const std::vector<Tensor *> &inputs,
for (oyStart = 0; oyStart * strideY - padY < 0; ++oyStart) {
// do nothing
}
-    for (oyEnd = oh - 1; oyEnd * strideY - padY >= ih - 1; --oyEnd) {
+    for (oyEnd = oh - 1; oyEnd * strideY - padY >= ih; --oyEnd) {
// do nothing
}
for (oxStart = 0; oxStart * strideX - padX < 0; ++oxStart) {
// do nothing
}
-    for (oxEnd = oh - 1; oxEnd * strideX - padX >= iw - 1; --oxEnd) {
+    for (oxEnd = ow - 1; oxEnd * strideX - padX >= iw; --oxEnd) {
// do nothing
}
int oyCount = oyEnd - oyStart + 1;
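
The loops above find the output rows/columns whose sampled input coordinate `o * stride - pad` lies inside the input; the fix keeps the last valid row (`>= ih` instead of `>= ih - 1`) and iterates `oxEnd` from `ow - 1` instead of `oh - 1`. A closed-form equivalent of those loops, as a sketch (not MNN source):

``` c++
#include <algorithm>

// Valid outputs along one axis satisfy 0 <= o * stride - pad <= inSize - 1.
static void validOutputRange(int outSize, int inSize, int stride, int pad,
                             int *start, int *end) {
    *start = (pad + stride - 1) / stride;                    // ceil(pad / stride)
    *end   = std::min(outSize - 1, (inSize - 1 + pad) / stride);
}
```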
2 changes: 2 additions & 0 deletions tools/converter/source/MNNDump2Json.cpp
@@ -40,6 +40,8 @@ int main(int argc, const char** argv) {
} else if (type == MNN::OpParameter::OpParameter_MatMul) {
opParam->main.AsMatMul()->weight.clear();
opParam->main.AsMatMul()->bias.clear();
+        } else if (type == MNN::OpParameter::OpParameter_PRelu) {
+            opParam->main.AsPRelu()->slope.clear();
}
}
flatbuffers::FlatBufferBuilder newBuilder(1024);
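
Like the MatMul branch above, this clears the PRelu slope data so that large weight arrays do not bloat the JSON dump.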
11 changes: 9 additions & 2 deletions tools/converter/source/caffe/Eltwise.cpp
@@ -7,6 +7,7 @@
//

#include "OpConverter.hpp"
#include "logkit.h"

class EltWise : public OpConverter {
public:
@@ -26,8 +27,8 @@ class EltWise : public OpConverter {
void EltWise::run(MNN::OpT* dstOp, const caffe::LayerParameter& parameters, const caffe::LayerParameter& weight) {
auto elt = new MNN::EltwiseT;
dstOp->main.value = elt;
auto& c = parameters.eltwise_param();
switch (c.operation()) {
auto& caffeParam = parameters.eltwise_param();
switch (caffeParam.operation()) {
case caffe::EltwiseParameter_EltwiseOp_MAX:
elt->type = MNN::EltwiseType_MAXIMUM;
break;
@@ -41,5 +42,11 @@ void EltWise::run(MNN::OpT* dstOp, const caffe::LayerParameter& parameters, cons
default:
break;
}

+    const int coeffSize = caffeParam.coeff_size();
+    elt->coeff.resize(coeffSize);
+    for (int i = 0; i < coeffSize; ++i) {
+        elt->coeff[i] = caffeParam.coeff(i);
+    }
}
static OpConverterRegister<EltWise> a("Eltwise");