[µTVM] Add virtual machine, test zephyr runtime on real hardware #6703

areusch · 2020-10-17T04:16:03Z

This PR adds two Vagrantfiles:

* A µTVM base box in tools/microtvm/base-box intended to support general µTVM development. It includes all the dependencies necessary to build the Zephyr runtime and test it with attached hardware (i.e. using USB port forwarding). This means it includes cross-compilers for RISC-V, ARM, and x86, among others (see Zephyr SDK).
* A specialization of the base box which mounts the local tvm directory using host-VM shared folders, builds your local copy of TVM inside the VM, then creates a poetry (Python) virtualenv containing all TVM and Zephyr dependencies. You can use this VM to test µTVM against real hardware, for example:
tvm@microtvm:/Users/andrew/ws/tvm2$ TVM_LIBRARY_PATH=build-microtvm poetry run python3 tests/micro/qemu/test_zephyr.py --microtvm-platforms=stm32f746xx -s

This PR also includes the additional transports needed to talk to real hardware: a pySerial-based RPC transport layer, plus utilities that invoke GDB to debug runtime problems and bad operator implementations and to help with porting to new architectures. Because µTVM aims to be platform-agnostic, it assumes only that some shell command exists to launch GDB and connect to the SoC's debug port. Because of this constraint, an additional RPC server is included: tvm.exec.microtvm_debug_shell, which uses the event-driven RPC server to host the debugger in a dedicated shell, so that signals can be forwarded to the inferior GDB.
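To make the idea of a pySerial-based transport concrete, here is a minimal sketch. It is not the transport class added by this PR: `SerialLikeTransport` and its method names are hypothetical, and the only assumption about the port object is that it behaves like pySerial's `serial.Serial`, i.e. `read(size)` may return fewer bytes than requested (or `b""`) when the port's own timeout expires, and `write(data)` returns the number of bytes written.

```python
import time


class SerialLikeTransport:
    """Hypothetical helper: timeout-aware read/write over a pySerial-style port."""

    def __init__(self, port):
        # `port` only needs read(size) -> bytes and write(data) -> int,
        # which is the interface serial.Serial exposes.
        self._port = port

    def write(self, data, timeout_sec=None):
        """Write all of `data`, retrying after partial writes until the deadline."""
        deadline = None if timeout_sec is None else time.monotonic() + timeout_sec
        view = memoryview(data)
        while len(view) > 0:
            written = self._port.write(bytes(view))
            view = view[written:]
            if len(view) > 0 and deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("write timed out")

    def read(self, n, timeout_sec=None):
        """Return between 1 and `n` bytes, or raise TimeoutError."""
        deadline = None if timeout_sec is None else time.monotonic() + timeout_sec
        while True:
            chunk = self._port.read(n)
            if chunk:
                return chunk
            # An empty result means the port's own timeout expired; give up
            # only once the caller's deadline has passed as well.
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("read timed out")
```

With real hardware the port would be something like `serial.Serial("/dev/ttyACM0", baudrate=115200, timeout=0.1)`; the device path and baud rate there are placeholders, not values taken from this PR.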

cc @tmoreau89 @tqchen @u99127 @tom-gall @liangfu @mshawcroft

…ysical HW (#6789)
* [BUGFIX] Respect infinite-timed session start timeouts.
  * When debugging, the intended behavior is to set the session start timeout to infinite to allow the user to configure the debugger.
  * At present, if a session start retry timeout is defined, the current logic will bail after the retry timeout expires.
  * This change makes the session start logic retry forever, once per retry timeout.
* Document RPCEndpoint::Create.
* Add stm32f746xx to tvm.target.micro() call; fix parameter name.
  * This API is expected to just be used with positional args, not kwargs, so this change isn't expected to cause any breakage.
  * model is more inline with the rest of the file, given TVM Target Specification RFC.
* [BUGFIX] If session start fails, exit transport context manager.
  * If an error occurred during session setup, then complex transports e.g. DebugWrapperTransport would not de-initialize.
* Align transport writes/reads in TransportLogger
* fix syntax errors which were not exercised in previous PR
* Remove microTVM logic from standard RPC server, add debug shell.
  * microTVM uses the host RPC server as a way to launch a debugger in a dedicated, separate terminal window. microTVM needs to be able to launch the debugger itself, because its model of the device flash/debug flow separates these two things into distinct operations implemented by shell commands (for maximum portability across frameworks).
  * microTVM can be configured to launch the debugger (e.g. GDB) in the same terminal as is used for flashing, but this is sub-optimal because then it hides any logs emitted by the device.
  * Using the standard RPC server was hard because GDB expects the user to issue SIGINT to interrupt program flow, but due to the RPC server's necessary use of multiprocessing, multiple signal handlers needed to be SIG_IGN'd, and further, because libtvm.so is intentionally frontend-agnostic, it's difficult to include signal handling directly in that binary (Python expects you to call PyErr_CheckSignals, but we don't require and don't want to require python-dev to compile libtvm.so, and this is the only such case where libtvm.so is expected to block the main thread for a long period of time).
  * Here we implement a separate microTVM debug shell python script using the non-blocking server implementation.
* Add serial transport, parameterize test_zephyr to work on real hardware
* add pytest test fixture, missed from previous change.
  * this test fixture helps to parameterize the test case
* address leandron@ comment from #6703
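The first bugfix in the commit message above (retry forever, once per retry timeout, when the session start timeout is infinite) can be sketched roughly as follows. This is an illustration of the described behavior, not the code from #6789: `start_session`, `try_start`, and `SessionStartTimeout` are hypothetical names, and `None` is assumed to stand for an infinite overall timeout.

```python
import time


class SessionStartTimeout(Exception):
    """Raised when a finite overall session-start deadline passes."""


def start_session(try_start, retry_timeout_sec, total_timeout_sec=None):
    """Call try_start(retry_timeout_sec) until it succeeds.

    try_start: hypothetical callable returning True once the device answers,
        or False if nothing arrived within retry_timeout_sec.
    total_timeout_sec: overall budget in seconds; None means wait forever,
        which is what a user attaching a debugger wants.
    Returns the number of attempts that were made.
    """
    deadline = None if total_timeout_sec is None else time.monotonic() + total_timeout_sec
    attempts = 0
    while True:
        attempts += 1
        if try_start(retry_timeout_sec):
            return attempts
        # A failed attempt ends the loop only if a finite deadline has passed;
        # with an infinite timeout we simply retry, once per retry timeout.
        if deadline is not None and time.monotonic() >= deadline:
            raise SessionStartTimeout(f"no session after {attempts} attempt(s)")
```

The point of the sketch is the last `if`: expiry of the retry timeout alone never aborts the loop, which is the behavior the bugfix describes.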

areusch · 2020-11-02T21:03:51Z

@leandron @u99127 @manupa-arm please take a look when you have a minute and explicitly approve if you're good w/ this change

leandron

One minor comment, mainly to reduce the size of the target VM created here and keep it cleaner.

leandron · 2020-11-03T10:35:43Z

apps/microtvm/reference-vm/zephyr/base-box/setup.sh

+# nrfjprog
+cd ~
+mkdir -p nrfjprog
+wget --no-verbose -O nRFCommandLineTools1090Linuxamd64.tar.gz https://www.nordicsemi.com/-/media/Software-and-other-downloads/Desktop-software/nRF-command-line-tools/sw/Versions-10-x-x/10-9-0/nRFCommandLineTools1090Linuxamd64tar.gz


I suggest adding something here to clean up the files/packages downloaded by this script.

great suggestion! i've done that and we saved about 500MB.

areusch · 2020-11-05T02:43:27Z

@leandron please take a look when you have a minute and explicitly approve if you're good w/ this change

leandron · 2020-11-05T09:02:44Z

> @leandron please take a look when you have a minute and explicitly approve if you're good w/ this change

I'm happy with the current version. I would also like to hear from @manupa-arm and @u99127, if possible.

manupak

[Post-RFC opinion] We might want to just use one of the proposed requirement.txt files here -- at least locally -- until such time as the codebase is refactored to use requirement.txt.

Other than that LGTM.

manupak · 2020-11-05T09:23:20Z

apps/microtvm/reference-vm/zephyr/pyproject.toml

+[tool.poetry.dependencies]
+attrs = "^19"
+decorator = "^4.4"
+numpy = "~1.19"
+psutil = "^5"
+scipy = "^1.4"
+python = "^3.6"
+tornado = "^6"
+typed_ast = "^1.4"


@areusch, after reading the RFC, do we want to align with one of the requirement.txt files, so we can have synergy between this and ci-qemu? I would suggest introducing poetry after the next RFC, if and when that happens, so that those changes go here as well.

areusch · 2020-11-05T15:56:02Z

@manupa-arm i'm not sure which requirement.txt you're referring to--we have not yet created any requirements.txt in the spirit of the RFC. i'd prefer to just merge this change and treat this PR as separate from the RFC, since it was opened before I released the RFC. i think it's better to merge what we have now to make progress and then align the two after the RFC is implemented.

manupak

@areusch, sounds good, and agreed that it can be done later -- I was actually referring to creating a requirement.txt that replicates the toml, but that might then need to be re-adjusted based on what is agreed for the CI. Thus, agreed that it could be a separate effort.

Thanks for the work!
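For readers wondering what "a requirement.txt being a replica of the toml" might look like, here is a small sketch that translates the caret (`^`) and tilde (`~`) constraints quoted from pyproject.toml above into pip-style ranges. The helper names are hypothetical, the translation covers only plain numeric versions as used in that snippet, and nothing like this was actually added in this PR.

```python
def _bump(parts, index):
    """Return parts[:index] followed by the component at `index` plus one."""
    return parts[:index] + [parts[index] + 1]


def poetry_to_pip(constraint):
    """Translate a caret (^) or tilde (~) constraint into a pip version range."""
    op, version = constraint[0], constraint[1:]
    parts = [int(p) for p in version.split(".")]
    if op == "^":
        # Caret: bump the left-most non-zero component (the last one if all are zero).
        index = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    elif op == "~":
        # Tilde: bump the minor component if one is given, else the major one.
        index = min(1, len(parts) - 1)
    else:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    upper = ".".join(str(p) for p in _bump(parts, index))
    return f">={version},<{upper}"


def to_requirements(deps):
    """Build requirements.txt lines from a [tool.poetry.dependencies] mapping."""
    # "python" constrains the interpreter rather than a package, so skip it.
    return [f"{name}{poetry_to_pip(spec)}" for name, spec in deps.items() if name != "python"]


# Constraints copied from the pyproject.toml excerpt in this review thread.
DEPS = {
    "attrs": "^19",
    "decorator": "^4.4",
    "numpy": "~1.19",
    "psutil": "^5",
    "scipy": "^1.4",
    "python": "^3.6",
    "tornado": "^6",
    "typed_ast": "^1.4",
}

print("\n".join(to_requirements(DEPS)))
```

A generated file like this would still need to be kept in step with whatever the CI images install, which is the re-adjustment concern raised in the comment above.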

u99127

LGTM. Thanks for your patience and perseverance with this one :)

regards
Ramana

tqchen · 2020-11-05T16:14:56Z

Thanks @areusch @u99127 @leandron @manupa-arm !

…che#6703)
* Split transport classes into transport package.
* Introduce transport timeouts.
* black format
* Add metadata-only artifacts
* Simplify utvm rpc server API and ease handling of short packets.
* add zephyr test against qemu
* Add qemu build config
* fix typo
* cleanup zephyr main
* fix nonblocking piping on some linux kernels
* don't double-open transport
* validate FD are in non-blocking mode
* gitignore test debug files
* cleanup zephyr compiler
* re-comment serial until added
* remove logging
* add zephyr exclusions to check_file_type
* add asf header
* lint
* black format
* more pylint
* kill utvm rpc_server bindings, which don't work anymore and fail pylint
* fix compiler warning
* fixes related to pylint
* clang-format again
* more black format
* add qemu regression
* Fix paths for qemu/ dir
* fix typo
* fix SETFL logic
* export SessionTerminatedError and update except after moving
* fix test_micro_artifact
* retrigger staging CI
* fix jenkins syntax hopefully
* one last syntax error
* Add microTVM VM setup scripts
* obliterate USE_ANTLR from cmake.config
* add poetry deps to pyproject.toml - mainly taken from output of `pip freeze` in ci-gpu and ci-lint
* initial attempt at setup.py + autodetect libtvm_runtime SO path
* hack to hardcode in build
* make pyproject lock
* Add ci_qemu to Jenkinsfile
* build in qemu
* checkpoint
* create diff for jared
* add missing stuff
* address liangfu comments
* fix new bug with list passing
* release v0.0.2
* works on hardware
* switch to pytest for zephyr tests
* add missing import
* fix option parsing
* remove extraneous changes
* lint
* asf lint, somehow local pass didn't work
* file type lint
* black-format
* try to fix ARMTargetParser.h #include in LLVM < 8.0
* rm misspelled deamon lines
* move to apps/microtvm-vm
* fetch keys from kitware server
* fix path exclusions in check_file_type
* retrigger CI
* reorganize vm, add tutorial
* fixes for reorganization - enable vagrant ssh
* update ssh instructions
* rm commented code
* standardize reference VM release process, add prerelease test
* remove -mfpu from this change
* fix exit code of test_zephyr
* rm unneeded files, update check_file_type
* add asf header
* git-black
* git-black against main
* git-black with docker
* fixes for virtualbox
* black format
* install python3.8, for zephyr gdb
* timestamp zephyr vm name, permits launching multiple VMs
* log warning when initial vagrant destroy fails
* revert changes moved into apache#6789
* address leandron@ comments
* black format
* black format
* add --skip-build to test subcommand, detach device from other VMs
* black format
* address leandron@ comments
* don't rm release test when building only 1 provider
* revert pyproject.toml
* remove need to copy pyproject.toml to root
  * this often contributes to erroneous changes to that file

areusch added 30 commits September 28, 2020 15:15

Split transport classes into transport package.

3a4ce80

Introduce transport timeouts.

18bc0b9

black format

709e963

Add metadata-only artifacts

6550674

Simplify utvm rpc server API and ease handling of short packets.

1d1cb54

add zephyr test against qemu

3469cde

Add qemu build config

7307c89

fix typo

3ff0fa1

cleanup zephyr main

4dd93cc

fix nonblocking piping on some linux kernels

f522d4f

don't double-open transport

cf30739

validate FD are in non-blocking mode

1160695

gitignore test debug files

818928a

cleanup zephyr compiler

053bca7

re-comment serial until added

f1fdaaf

remove logging

a18e26b

add zephyr exclusions to check_file_type

e646f7c

add asf header

1f25b58

lint

c291443

black format

43efa91

more pylint

75a7b48

kill utvm rpc_server bindings, which don't work anymore and fail pylint

a9538c7

fix compiler warning

4d2fe58

fixes related to pylint

9860e72

clang-format again

bcd5b64

more black format

2ae251c

add qemu regression

8c65383

Fix paths for qemu/ dir

f58ec63

fix typo

3d2ede3

fix SETFL logic

a65a554

areusch added 5 commits October 30, 2020 15:46

log warning when initial vagrant destroy fails

5a433ab

revert changes moved into apache#6789

e11d676

address leandron@ comments

1cd37a0

black format

6af85ec

black format

62d8405

areusch added 3 commits November 2, 2020 08:06

Merge remote-tracking branch 'origin/main' into utvm-vm

b0beb45

add --skip-build to test subcommand, detach device from other VMs

25fb98e

black format

ea61f7f

leandron reviewed Nov 3, 2020

areusch added 4 commits November 3, 2020 13:04

address leandron@ comments

01f22ac

don't rm release test when building only 1 provider

18958ca

revert pyproject.toml

acc8630

remove need to copy pyproject.toml to root

45744cc

* this often contributes to erroneous changes to that file

leandron approved these changes Nov 5, 2020

manupak reviewed Nov 5, 2020

manupak approved these changes Nov 5, 2020

u99127 approved these changes Nov 5, 2020

tqchen merged commit 7291a92 into apache:main Nov 5, 2020
