
[Runtime] EdgeTPU runtime for Coral Boards #4698

Merged: 16 commits merged into apache:master from tmoreau89:tflite_runtime on Jan 16, 2020
Conversation

@tmoreau89 (Contributor) commented Jan 13, 2020

This PR extends the TFLite runtime to support edgeTPU-equipped Coral boards in order to measure inference time of models on edgeTPU with TVM RPC.

Instructions to run the EdgeTPU runtime experiments

Coral Board setup

You'll need to follow these instructions: https://coral.ai/docs/dev-board/get-started/

# Clone TensorFlow, and prepare the library dir
# Note the older version of TF that we'll need to use
git clone https://github.com/tensorflow/tensorflow --recursive --branch=1.8.0
cd tensorflow
mkdir -p tensorflow/lite/tools/make/gen/generic-aarch64_armv8-a/lib

# TF dependency: flatbuffers
cd ~ && git clone https://github.com/google/flatbuffers.git
cd flatbuffers && cmake -G "Unix Makefiles" && make && sudo make install

# EdgeTPU lib
cd ~ && git clone https://github.com/google-coral/edgetpu.git

Cross-compile the TFLite static library on an x86 machine

# Prerequisites 
sudo apt-get update
sudo apt-get install crossbuild-essential-arm64

# Cross-compile the TFLite library (note: use the same older TF version as on the board)
git clone https://github.com/tensorflow/tensorflow.git --recursive --branch=1.8.0
cd tensorflow
./tensorflow/lite/tools/make/download_dependencies.sh
./tensorflow/lite/tools/make/build_aarch64_lib.sh
# Copy the tensorflow lib over to your coral board
scp tensorflow/lite/tools/make/gen/generic-aarch64_armv8-a/lib/libtensorflow-lite.a  mendel@coral:/home/mendel/tensorflow/tensorflow/lite/tools/make/gen/generic-aarch64_armv8-a/lib/

Build TVM runtime on Coral Board

cd ~ && git clone --recursive --branch=master https://github.com/apache/incubator-tvm.git tvm
cd tvm && mkdir build && cp cmake/config.cmake build
echo 'set(USE_GRAPH_RUNTIME_DEBUG ON)' >> build/config.cmake
echo 'set(USE_TFLITE ON)' >> build/config.cmake
echo 'set(USE_TENSORFLOW_PATH /home/mendel/tensorflow)' >> build/config.cmake
echo 'set(USE_EDGETPU /home/mendel/edgetpu)' >> build/config.cmake
cd build && cmake ..
make runtime -j4

Execute the RPC server on Coral

First, follow this guide to set up a tracker for your remote devices: https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html#start-rpc-tracker.
On the Coral board, once the TVM runtime has been built, execute:

PYTHONPATH=/home/mendel/tvm/python:$PYTHONPATH python3 -m tvm.exec.rpc_server --tracker $TVM_TRACKER_HOST:$TVM_TRACKER_PORT --key coral
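
Once the RPC server is registered, you can sanity-check from the host that the board shows up under the "coral" key. This is a minimal check of my own (not part of the original instructions), assuming the tracker runs at host "tracker" on port 9191, matching the evaluation script below:

# Query the RPC tracker and print a summary of registered devices;
# a free device under the "coral" key should appear.
from tvm import rpc

tracker = rpc.connect_tracker("tracker", 9191)
print(tracker.text_summary())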

Evaluate MobileNet on Coral board

Execute the following Python script:

import numpy as np

import tvm
from tvm import autotvm, relay
from tvm.contrib import tflite_runtime

target = "cpu"

# Note: replace "tracker" and 9191 with your tracker host name and port
remote = autotvm.measure.request_remote("coral", "tracker", 9191, timeout=60)
ctx = remote.cpu(0)

if target == "edge_tpu":
    tflite_fp = "mobilenet_v2_1.0_224_quant_edgetpu.tflite"
else:
    tflite_fp = "mobilenet_v2_1.0_224_quant.tflite"
input_data = np.random.randint(0, 256, size=(1, 224, 224, 3)).astype("uint8")
with open(tflite_fp, 'rb') as f:
    runtime = tflite_runtime.create(f.read(), ctx, runtime_target=target)
    runtime.set_input(0, tvm.nd.array(input_data, ctx))
    ftimer = runtime.module.time_evaluator("invoke", ctx,
            number=10,
            repeat=3)
    times = np.array(ftimer().results) * 1000
    print("It took {0:.2f}ms to run mobilenet".format(np.mean(times)))

Upon running it, you'll get:
It took 143.74ms to run mobilenet

Now, set target = "edge_tpu" and you'll get:
It took 3.22ms to run mobilenet
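
Beyond latency, you may want to sanity-check the actual prediction. The following is a hedged sketch (not part of the original script) that assumes the runtime module exposes invoke() and get_output() alongside set_input(); append it after the timing code while the remote session is still open:

# Run one inference explicitly and fetch the (quantized) output tensor.
runtime.invoke()
out = runtime.get_output(0).asnumpy()
print("top-1 class index:", int(out.flatten().argmax()))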

Notable interface changes

  • The TFLite runtime API no longer exposes the allocate() method; tensor allocation is now done as part of the initialization process (see the sketch below).
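
For illustration, a minimal sketch of the call pattern implied by this change; model_bytes and data are placeholder names, and the removed allocate() call is the one described in the bullet above:

# Before this PR, allocation was an explicit, exposed step:
#   runtime = tflite_runtime.create(model_bytes, ctx)
#   runtime.allocate()
# After this PR, tensors are allocated inside create():
runtime = tflite_runtime.create(model_bytes, ctx, runtime_target="cpu")
runtime.set_input(0, data)
runtime.invoke()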

@@ -18,7 +18,7 @@
 from .._ffi.function import get_global_func
 from ..rpc import base as rpc_base

-def create(tflite_model_bytes, ctx):
+def create(tflite_model_bytes, ctx, target_edgetpu=False):
Contributor (review comment):
instead of a boolean argument, try to use a target string for future expansion: target='edge_tpu'/'cpu'

Contributor Author (tmoreau89):
thanks for the suggestion, I've made the changes

return TFLiteModule(fcreate(bytearray(tflite_model_bytes), ctx))
fcreate = get_global_func("tvm.tflite_runtime.create")
if target_edgetpu:
fcreate = get_global_func("tvm.edgetpu_runtime.create")
Contributor (review comment):
if these two create functions share the same arguments, we can unify them into one create function that returns a different runtime

Contributor Author (tmoreau89):
Unification here is less desirable because we won't always want to build the edgeTPU runtime when building the TFLite runtime. The limitation is that we need to build TVM with the edgeTPU library, which comes in a separate repo; it's an extra software dependency that is not always wanted for users of vanilla TFLite.
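
Putting the two threads above together, here is an illustrative reconstruction (not the verbatim merged code) of what the Python-side create() could look like once the boolean flag is replaced by the string-based runtime_target used in the evaluation script:

from tvm._ffi.function import get_global_func
from tvm.contrib.tflite_runtime import TFLiteModule

def create(tflite_model_bytes, ctx, runtime_target="cpu"):
    # Pick the packed creation function; the EdgeTPU variant only exists
    # when TVM was built with set(USE_EDGETPU <path-to-edgetpu-repo>).
    if runtime_target == "edge_tpu":
        fcreate = get_global_func("tvm.edgetpu_runtime.create")
    else:
        fcreate = get_global_func("tvm.tflite_runtime.create")
    return TFLiteModule(fcreate(bytearray(tflite_model_bytes), ctx))

In the actual source this function lives inside tvm/python/tvm/contrib/tflite_runtime.py, so the TFLiteModule import above is only needed to make the standalone sketch self-contained.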

ctx_ = ctx;
}
// Build interpreter
if (tflite::InterpreterBuilder(*model, resolver)(&interpreter_) != kTfLiteOk) {
Contributor (review comment):
we can define a macro for TFLite error checking: CHECK_STATUS(cond, msg)

Contributor Author (tmoreau89):
thanks for the suggestion, this should be fixed by now

@ZihengJiang (Contributor) commented:
For allocate: did the TFLite runtime remove the AllocateTensors API, or is it just that EdgeTPU does not need it?

@tmoreau89 (Contributor Author) commented:
@ZihengJiang thanks for the feedback! The TFLite Interpreter still has AllocateTensors; however, I wasn't sure if we'd ever need to call it separately from interpreter initialization. If you believe we need to decouple them, I can revert the interface change.

@tmoreau89 (Contributor Author) commented:
@ZihengJiang I should have addressed all of your comments by now; let me know if you're happy with the changes.

@ZihengJiang (Contributor) commented:
Looks good! Thanks! @tmoreau89

@tqchen tqchen merged commit 31021d2 into apache:master Jan 16, 2020
@tmoreau89 tmoreau89 deleted the tflite_runtime branch February 13, 2020 21:26
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020
alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020
@Msabih commented Aug 5, 2020:
@tmoreau89 I have tried the setup with the same versions of TVM/TensorFlow on the host and the board, and the "cpu" part of the inference works fine. But when I set the target to edge_tpu, I get this error on the RPC server:

ERROR: Internal: Unsupported data type: 0
ERROR: Node number 0 (edgetpu-custom-op) failed to prepare

And on the host machine, it says

 File "tvm_inference.py", line 21, in <module>
    runtime = tflite_runtime.create(f.read(), ctx, runtime_target=target)

  File "/home/sabih/Documents/phd_work/MAP_WORK/tvm_env/tvm/python/tvm/contrib/tflite_runtime.py", line 49, in create
    return TFLiteModule(fcreate(bytearray(tflite_model_bytes), ctx))

  File "/home/sabih/Documents/phd_work/MAP_WORK/tvm_env/tvm/python/tvm/_ffi/_ctypes/function.py", line 207, in __call__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (3) /tvm_env/tvm/build/libtvm.so(TVMFuncCall+0x69) [0x7f2fb63f8489]
  [bt] (2) /tvm_env/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::RPCModuleNode::WrapRemote(void*)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x46) [0x7f2fb644ad36]
  [bt] (1) /tvm_env/tvm/build/libtvm.so(tvm::runtime::RPCSession::CallFunc(void*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void* (*)(int, tvm::runtime::TVMArgValue const&), tvm::runtime::PackedFunc const*)+0x2c8) [0x7f2fb6454168]
  [bt] (0) /tvm_env/tvm/build/libtvm.so(+0xc21d6b) [0x7f2fb6450d6b]
  File "/tvm_env/tvm/src/runtime/rpc/rpc_session.cc", line 993
TVMError: Check failed: code == RPCCode: :kReturn: code=4

The inference directly on the edge TPU works fine.

@amai-gsu commented Nov 8, 2023:
(quoting @Msabih's report above)
Have you solved this issue? I got the same one.
