[RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models #3567

ajtulloch · 2019-07-18T00:24:01Z

Summary

This is an alternative implementation of a subset of the TVM runtime API (and
graph runtime) that focuses entirely on reducing code size, at the expense of
functionality (no tvm.extern(..) calls via PackedFunc, CPU only, etc). It might
be worth incrementally expanding the surface area if there's interest.

Motivation

The motivation for this work was seeing what the minimal useful subset of the
TVM runtime is. This is relevant for applications with tight code-size
constraints, e.g. embedded/mobile. The current runtime is on the order of
100KiB, so this might be compelling for some users.

The smaller surface area for auditing might make this relevant for
#3159, or the use cases I was thinking about in
#2523 (comment) re: the Rust
runtime.

Analysis

The symbols in the tvm::minimalruntime namespace (i.e. excluding std:: and
picojson::) are about 5KiB, so I think there's a bunch of room here: we
could replace picojson:: with jsmn or
something, and we could replace more of the std::unordered_map usage, etc. with
custom primitives as well (similar to the DynArray).

tqchen · 2019-07-18T00:37:37Z

This is a great step toward putting tvm into more resource-constrained devices. While we have another effort (uTVM, @weberlo) that aims to enable automatic optimizations, we still lack a minimal runtime that we can serve on the device.

This PR is a big step in that direction. One thing we can try to do is to consolidate it with uTVM and put it under the tvm/runtime/micro namespace later.

A fun challenge would be to iterate further and remove most dependencies on the OS (mainly alloc) so we can really run it on bare-metal devices.

ajtulloch · 2019-07-18T00:58:09Z

@tqchen yes absolutely - from talking to you yesterday I hadn't thought of the uTVM application, but it certainly could be interesting. One possible improvement in that direction could be to create a mmap'able representation of the parsed graph_json, i.e. these fields of MinimalGraphRuntime:

  DynArray<Node> nodes_;
  DynArray<uint32_t> input_nodes_;
  DynArray<uint32_t> node_row_ptr_;
  DynArray<NodeEntry> outputs_;

which would allow us to 'allocation-free' construct the GraphRuntime (and eliminate the code-size cost of the json parser), and then the remaining allocations are the NDArray tensor allocations themselves which could be handled via a static storage plan or similar?
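
As a rough illustration of the mmap'able idea above, here is a minimal sketch that reads two of the uint32_t arrays straight out of a flat buffer without allocating. The layout and every name in it (FlatGraphHeader, U32Span, FlatGraphView, ViewFlatGraph, kFlatGraphMagic) are assumptions made up for this example and are not part of the PR; Node and NodeEntry would need their own flat encodings, which are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical on-disk layout: a fixed header followed by raw uint32_t arrays.
// Offsets are byte offsets from the start of the buffer.
struct FlatGraphHeader {
  std::uint32_t magic;
  std::uint32_t num_input_nodes;
  std::uint32_t num_node_row_ptr;
  std::uint32_t input_nodes_offset;
  std::uint32_t node_row_ptr_offset;
};

constexpr std::uint32_t kFlatGraphMagic = 0x47525048u;  // arbitrary tag

// Non-owning view over a uint32_t array that lives inside the buffer.
struct U32Span {
  const std::uint32_t* data;
  std::uint32_t size;
  std::uint32_t operator[](std::uint32_t i) const { return data[i]; }
};

struct FlatGraphView {
  U32Span input_nodes;
  U32Span node_row_ptr;
};

// Validates the header and points `out` at the arrays inside `buf`.
// Performs no allocation; returns false if the buffer is malformed.
bool ViewFlatGraph(const void* buf, std::size_t len, FlatGraphView* out) {
  assert(out != nullptr);
  if (buf == nullptr || len < sizeof(FlatGraphHeader)) return false;
  // The arrays are read in place, so the buffer must be uint32_t-aligned
  // (an mmap'ed file is page-aligned, which satisfies this).
  if (reinterpret_cast<std::uintptr_t>(buf) % alignof(std::uint32_t) != 0) return false;

  FlatGraphHeader h;
  std::memcpy(&h, buf, sizeof(h));
  if (h.magic != kFlatGraphMagic) return false;

  // An array is valid if it starts aligned and lies fully inside the buffer.
  auto in_range = [len](std::uint32_t off, std::uint32_t n) {
    return off % alignof(std::uint32_t) == 0 && off <= len &&
           (len - off) / sizeof(std::uint32_t) >= n;
  };
  if (!in_range(h.input_nodes_offset, h.num_input_nodes)) return false;
  if (!in_range(h.node_row_ptr_offset, h.num_node_row_ptr)) return false;

  const unsigned char* base = static_cast<const unsigned char*>(buf);
  out->input_nodes = {
      reinterpret_cast<const std::uint32_t*>(base + h.input_nodes_offset),
      h.num_input_nodes};
  out->node_row_ptr = {
      reinterpret_cast<const std::uint32_t*>(base + h.node_row_ptr_offset),
      h.num_node_row_ptr};
  return true;
}
```

With a layout along these lines the JSON parser would only be needed offline, when the flat file is produced.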

tqchen · 2019-07-18T01:14:01Z

Most microcontrollers do have stacks (heaps), so we just need to pre-define a section in the memory space and implement an arena-style allocator (always allocate without de-allocating) that recycles all memory at an RAII point.
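
The arena idea described in this comment could be sketched roughly as follows, assuming a caller-provided memory section. Arena and ArenaScope are hypothetical names for illustration only; nothing here is taken from the PR or from an existing TVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump ("arena") allocator over a caller-provided memory region.
// It only moves a cursor forward; individual frees are not supported.
class Arena {
 public:
  Arena(void* region, std::size_t size)
      : base_(static_cast<unsigned char*>(region)), size_(size), used_(0) {}

  // Returns `bytes` of storage aligned to `align` (a power of two),
  // or nullptr if the region is exhausted.
  void* Alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(base_) + used_;
    const std::uintptr_t aligned =
        (cur + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    const std::size_t padding = static_cast<std::size_t>(aligned - cur);
    const std::size_t remaining = size_ - used_;
    if (padding > remaining || bytes > remaining - padding) return nullptr;
    used_ += padding + bytes;
    return reinterpret_cast<void*>(aligned);
  }

  std::size_t used() const { return used_; }

  // Moves the cursor back to an earlier position, releasing everything
  // allocated after it.
  void Rewind(std::size_t mark) {
    assert(mark <= used_);
    used_ = mark;
  }

 private:
  unsigned char* base_;
  std::size_t size_;
  std::size_t used_;
};

// RAII scope: remembers the cursor on entry and rewinds to it on exit,
// recycling everything allocated inside the scope in one step.
class ArenaScope {
 public:
  explicit ArenaScope(Arena* arena) : arena_(arena), mark_(arena->used()) {}
  ~ArenaScope() { arena_->Rewind(mark_); }
  ArenaScope(const ArenaScope&) = delete;
  ArenaScope& operator=(const ArenaScope&) = delete;

 private:
  Arena* arena_;
  std::size_t mark_;
};
```

Because nothing is freed individually, the region cannot fragment; the trade-off is that memory is only reclaimed when a scope ends.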

tmoreau89 · 2019-07-18T06:34:26Z

+ 1 on getting this integrated with uTVM. @weberlo, care to take a look at this PR and make some high-level comments?

tmoreau89

This is really cool work. I wonder if we could in addition provide a simple step-by-step guide to deploying a simple model on an ARMv7 device with this minimal runtime. It would certainly help bring people up to speed on using this runtime on their edge devices.

mshawcroft · 2019-07-18T08:14:29Z

This looks great. As mentioned above it potentially fits well with uTVM. For use with uTVM it would be useful to have this runtime or a derivative built in C rather than C++ in order to be deployable to the various embedded environments out there that don't have C++ runtime / tooling support.

ajtulloch · 2019-07-18T17:57:56Z

@mshawcroft oh interesting - yeah, I started off with a pure C API (https://github.com/dmlc/tvm/pull/3567/files#diff-cf8621d821243d3ba906f0d9154abcea), but internally it's implemented with C++ (although it's deliberately designed to be compiled with -fno-rtti, -fno-exceptions, etc) - is the constraint that any use of C++ makes this unsuitable for embedded environments?

mshawcroft · 2019-07-18T21:04:58Z

@ajtulloch the situation is not black and white. At one end of the scale is pure C; at the other end is C++ using the standard C++ libraries and all the language bells and whistles; in the middle is a bunch of intermediate restricted subsets of C++ with arbitrary subsets of the C++ standard library. The broadest reach and lowest friction for potential users is at the C end of the scale. Aside from the language subset used, other issues are the availability (and size!) of the standard C++ library on a platform, and the memory management strategy used (at the small end, memory fragmentation kills you, hence arbitrary use of the heap is undesirable). By way of example, last time I checked, Zephyr RTOS's C++ application support was broadly: no use of new/delete, no RTTI, no exceptions, no static global object destruction... (note that the new/delete ban has a significant impact on the standard C++ library available!) Some RTOS environments are richer, others are more constrained.

There is a limited cost to the tvm community to provide a C runtime rather than a C++ runtime, but doing so broadens tvm's reach.

BTW... I'm really excited to see all the current activity in the uTVM, small runtime, embedded space... ;-)

tqchen · 2019-07-18T22:05:46Z

To summarize some of the points.

No new/delete, but allow the use of custom allocators that do arena-like allocation.
C++ is fine, templates are fine, but maybe no STL.

The arena-style allocator may be fine for most of our cases. The idea is that we always allocate and de-allocate in bulk. This allows us to keep most of the allocations in a single user-defined stack on a memory region.

void MyApp() {
  // RAII: everything allocated within this function takes its space from the
  // context, and that space is released when MyApp returns.
  tvm::micro::AllocatorContext ctx;
}

A slight variation would be to have the allocator remember the number of objects it has allocated so far in the current context. When we call free, it only decrements the counter, and we recycle everything when the counter reaches zero. This should work for most cases we care about (where the allocation/free pattern is stack-like).
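
A minimal sketch of this counting variation, under the assumption of a single fixed memory region, might look like the following. CountingArena is a hypothetical name used only for illustration and does not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical arena that counts live allocations. Free() only decrements
// the counter; the whole region is recycled once the counter reaches zero.
class CountingArena {
 public:
  CountingArena(void* region, std::size_t size)
      : base_(static_cast<unsigned char*>(region)), size_(size), used_(0), live_(0) {}

  // Bump-allocates `bytes` aligned to `align` (a power of two);
  // returns nullptr if the region is exhausted.
  void* Alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(base_) + used_;
    const std::uintptr_t aligned =
        (cur + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    const std::size_t padding = static_cast<std::size_t>(aligned - cur);
    const std::size_t remaining = size_ - used_;
    if (padding > remaining || bytes > remaining - padding) return nullptr;
    used_ += padding + bytes;
    ++live_;
    return reinterpret_cast<void*>(aligned);
  }

  // Does not reclaim `ptr` itself; space comes back only when every
  // outstanding allocation has been freed.
  void Free(void* ptr) {
    assert(ptr != nullptr && live_ > 0);
    (void)ptr;  // the pointer value is not needed, only the count
    if (--live_ == 0) used_ = 0;
  }

  std::size_t used() const { return used_; }
  std::size_t live() const { return live_; }

 private:
  unsigned char* base_;
  std::size_t size_;
  std::size_t used_;
  std::size_t live_;
};
```

Note that a long-lived allocation keeps the whole region pinned in this scheme, which is why it suits stack-like allocation patterns best.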

tqchen · 2019-07-19T04:33:28Z

Given the current discussion, perhaps we can decide on the naming and make a few improvements, if you feel you can push some of them in a few days. Then we merge it in.

In terms of naming and code location, given the relation to uTVM, we could think about a good name for the minimal runtime. One example is "src/runtime/micro/standalone"; perhaps @mshawcroft @ajtulloch @weberlo have better ideas.

weberlo · 2019-07-20T02:34:12Z

@ajtulloch Awesome work on this! We'll need a runtime for uTVM when we want to try self-hosted models, so the timing on this is great.

My general understanding is that it's much more common for bare-metal devices to support C, so it'd be interesting to see if we could incrementally whittle this down to pure C, like @mshawcroft said. Even if not, this would be a nice bonus for users targeting devices that do have C++ support.

If we want to merge this into the µTVM namespace, src/runtime/micro/standalone seems fine. But since this is code that would be loaded onto the device, we could also put it in src/runtime/micro/device/standalone. Then we could move the current runtime in device into its own subfolder device/host_driven (or we could name it something else).

tqchen · 2019-07-22T17:06:32Z

To make this PR actionable, @ajtulloch can you decide on the namespace choices, make the changes, fix the CI and let us merge it in?

ajtulloch · 2019-07-22T21:36:08Z

OK, so changes planned are:

Move this to src/runtime/micro/standalone
Rename flag from MINIMAL_RUNTIME to MICRO_STANDALONE_RUNTIME
Fix CI

Will work on it right now, thank you folks.

ajtulloch · 2019-07-23T05:03:32Z

@tqchen does this look good to you?

tqchen

Mostly high-level naming convention changes. The overall code looks good.

include/tvm/runtime/micro/standalone/minimalruntime.h

src/runtime/micro/standalone/minimalgraphruntime.cc

tqchen · 2019-07-23T05:16:49Z

src/runtime/micro/standalone/minimalruntime_api.cc

+ * under the License.
+ */
+
+#include "minimalruntime_api.h"


utvm_runtime_api.cc

src/runtime/micro/standalone/minimalvector.h

tqchen · 2019-07-23T05:18:41Z

tests/cpp/runtime_micro_standalone_test.cc

+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file


utvm_runtime_standalone_test.cc

tqchen · 2019-07-23T05:19:45Z

@mshawcroft @weberlo @tmoreau89 please help to review if you have time and https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

tqchen · 2019-07-23T05:32:47Z

src/runtime/micro/standalone/minimalgraphruntime.cc

+  }
+}
+
+void parseAttrs(const picojson::object& jattr, GraphAttr* attr) {


ParseAttrs (to be consistent with Google C style)

tqchen · 2019-07-23T05:33:53Z

src/runtime/micro/standalone/minimalgraphruntime.h

+  void* lib_handle_{nullptr};
+};
+
+struct GraphAttr {


Document each struct, field, and function.

src/runtime/micro/standalone/picojson.h

mshawcroft · 2019-07-23T09:23:45Z

So I've not had the time to study the code in detail, sorry. I would like to, but it won't happen this week. Skimming the code does raise one immediate question:

Are we sure the memory management policy implemented in the module does not lead to fragmentation?

src/runtime/micro/standalone/minimalgraphruntime.cc

CMakeLists.txt

weberlo · 2019-07-23T23:48:35Z

@ajtulloch Which models have you been able to run on this runtime so far?

ajtulloch · 2019-07-24T11:37:10Z

@weberlo e.g. CPU CNNs like mobilenet, resnet, etc. One thing not supported is e.g. tvm.extern, since we don't support packed funcs.

weberlo · 2019-07-23T19:03:04Z

include/tvm/runtime/micro/standalone/minimalruntime.h

+ * under the License.
+ */
+
+#pragma once


I don't know if we allow #pragma once, for compatibility reasons. I hope I'm mistaken, because header guards are gross.

https://en.wikipedia.org/wiki/Pragma_once#Portability

Let us still use header guards, as per Google C style.

tqchen

some final nits

tqchen · 2019-07-25T17:22:08Z

src/runtime/micro/standalone/minimal_vector.h

+ * under the License.
+ */
+
+#pragma once


Let us still use header guard macro as per Google C style

RUNTIME_MICRO_STANDALONE_MINIMAL_VECTOR_H_

tqchen · 2019-07-25T17:25:43Z

@antinucleon @weberlo please https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

To be clear, the current set of changes does not yet meet the no-std requirement. It still depends on new/malloc, etc. Further refactoring will be necessary to make sure that the uTVM standalone runtime takes in a pre-allocated memory region and only uses memory from that region for most of its allocations.
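
One way the "takes in a pre-allocated memory region" requirement could look is sketched below: a fixed-size array whose storage is carved out of a caller-owned region instead of coming from new/malloc. Region, RegionAlloc and RegionArray are hypothetical names invented for this example; the PR's own DynArray is not shown here and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

// Caller-owned, pre-allocated memory region (hypothetical type).
struct Region {
  unsigned char* base;
  std::size_t size;
  std::size_t used;
};

// Bump-allocates from the region; returns nullptr when it is exhausted.
inline void* RegionAlloc(Region* r, std::size_t bytes, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  const std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(r->base) + r->used;
  const std::uintptr_t aligned =
      (cur + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  const std::size_t padding = static_cast<std::size_t>(aligned - cur);
  const std::size_t remaining = r->size - r->used;
  if (padding > remaining || bytes > remaining - padding) return nullptr;
  r->used += padding + bytes;
  return reinterpret_cast<void*>(aligned);
}

// Fixed-size array whose elements live inside the region. It never calls
// new/malloc and never frees; the region's owner reclaims the memory.
template <typename T>
class RegionArray {
  static_assert(std::is_trivially_destructible<T>::value,
                "elements are never destroyed individually");

 public:
  RegionArray(Region* r, std::size_t n) : data_(nullptr), size_(0) {
    if (n > SIZE_MAX / sizeof(T)) return;  // guard against size overflow
    void* p = RegionAlloc(r, n * sizeof(T), alignof(T));
    if (p == nullptr) return;  // region too small; ok() stays false
    data_ = static_cast<T*>(p);
    for (std::size_t i = 0; i < n; ++i) new (&data_[i]) T();  // placement new
    size_ = n;
  }

  bool ok() const { return data_ != nullptr; }
  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) { return data_[i]; }
  const T& operator[](std::size_t i) const { return data_[i]; }

 private:
  T* data_;
  std::size_t size_;
};
```

The only standard headers this relies on are ones that are normally available in freestanding builds, but that is an assumption worth checking per toolchain.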

tqchen · 2019-07-27T19:31:04Z

@ajtulloch can you act on the final comments and let us get it in:)

ajtulloch · 2019-07-29T08:29:16Z

Will do today @tqchen, my bad.

tqchen · 2019-08-01T19:53:26Z

@ajtulloch please look into the CI error and see if we can fix it.

ajtulloch · 2019-08-03T07:16:19Z

@tqchen sure, will do on the weekend.

tqchen · 2019-08-12T16:13:39Z

ping @ajtulloch

tqchen · 2019-09-12T19:32:46Z

Thanks @ajtulloch @weberlo @antinucleon @mshawcroft, this PR is now merged

ajtulloch changed the title ~~[RFC] [Contrib] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models~~ [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models Jul 18, 2019

ajtulloch force-pushed the minimal-runtime branch from 3efca52 to e51b271 Compare July 18, 2019 00:28

tqchen added the status: need review label Jul 18, 2019

tmoreau89 reviewed Jul 18, 2019

ajtulloch force-pushed the minimal-runtime branch 3 times, most recently from ab32dfa to 4ec5d16 Compare July 22, 2019 23:32

tqchen requested changes Jul 23, 2019

tmoreau89 reviewed Jul 23, 2019

ajtulloch force-pushed the minimal-runtime branch 4 times, most recently from b6e941a to dd6f59e Compare July 23, 2019 22:18

weberlo suggested changes Jul 24, 2019

tqchen requested changes Jul 25, 2019

antinucleon approved these changes Jul 25, 2019

ajtulloch force-pushed the minimal-runtime branch 4 times, most recently from b80a33d to 4a7c3f6 Compare July 29, 2019 21:42

ajtulloch force-pushed the minimal-runtime branch from 4a7c3f6 to 4344ca1 Compare September 11, 2019 22:51

tqchen approved these changes Sep 12, 2019

tqchen added the status: accepted label Sep 12, 2019

tqchen merged commit 1de52bb into apache:master Sep 12, 2019

liangfu mentioned this pull request Sep 21, 2019

[Runtime] MISRA-C compliant TVM runtime #3934

Merged

tqchen mentioned this pull request Nov 8, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed

liangfu mentioned this pull request Mar 13, 2020

[uTVM][Runtime] Deprecate uTVM Standalone Runtime #5060

Open

9 tasks
