[µTVM] Add virtual machine, test zephyr runtime on real hardware #6703

Merged on Nov 5, 2020 (98 commits)

@areusch areusch commented Oct 17, 2020

This PR adds two Vagrantfiles:

  • a µTVM base box in tools/microtvm/base-box intended to support general µTVM development. It includes all the dependencies necessary to build the Zephyr runtime and test it with attached hardware (i.e. using USB port forwarding). This means it includes cross-compilers for RISC-V, ARM, and x86, among others (see Zephyr SDK).
  • a specialization of the base box that mounts the local tvm directory using host-VM shared folders, builds your local copy of TVM inside the VM, and then creates a poetry (Python) virtualenv containing all TVM and Zephyr dependencies. You can use this VM to test µTVM against real hardware, for example:
    tvm@microtvm:/Users/andrew/ws/tvm2$ TVM_LIBRARY_PATH=build-microtvm poetry run python3 tests/micro/qemu/test_zephyr.py --microtvm-platforms=stm32f746xx -s

This PR also includes the additional transports needed to talk to real hardware: a pySerial-based RPC transport layer, plus utilities that invoke GDB to debug runtime problems and bad operator implementations and to help with porting to new architectures. Because µTVM aims to be platform-agnostic, it assumes only that some shell command exists to launch GDB and connect to the SoC's debug port. Due to this constraint, an additional RPC server is included: tvm.exec.microtvm_debug_shell, which uses the event-driven RPC server to host the debugger in a dedicated shell so that signals can be forwarded to the inferior GDB.

cc @tmoreau89 @tqchen @u99127 @tom-gall @liangfu @mshawcroft

tqchen pushed a commit that referenced this pull request Oct 31, 2020
…ysical HW (#6789)

* [BUGFIX] Respect infinite-timed session start timeouts.

 * When debugging, the intended behavior is to set the session start
   timeout to infinite to allow the user to configure the debugger.
 * At present, if a session start retry timeout is defined, the
   current logic will bail after the retry timeout expires.
 * This change makes the session start logic retry forever, once per
   retry timeout.
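
The retry behavior described in this item can be sketched roughly as follows. This is a minimal illustration and not the TVM implementation: `start_session_with_retry`, `try_start`, and both timeout parameter names are hypothetical, and `None` is assumed to stand for an infinite session start timeout.

```python
import time


def start_session_with_retry(try_start, retry_timeout_sec, start_timeout_sec=None):
    """Call try_start(timeout) until it succeeds or the overall budget runs out.

    try_start is a hypothetical callable that returns True on success and
    False if no session was established within the timeout it was given.
    start_timeout_sec=None means "wait forever" (for example while the user
    configures a debugger); attempts are then repeated once per
    retry_timeout_sec instead of bailing after the first one.
    """
    deadline = None if start_timeout_sec is None else time.monotonic() + start_timeout_sec
    attempts = 0
    while True:
        if deadline is None:
            # Infinite overall timeout: every attempt gets the full retry timeout.
            attempt_timeout = retry_timeout_sec
        else:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"session did not start after {attempts} attempts")
            attempt_timeout = min(retry_timeout_sec, remaining)
        attempts += 1
        if try_start(attempt_timeout):
            return attempts
```

The function returns the number of attempts it took, which is only there to make the behavior easy to observe in a test.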

* Document RPCEndpoint::Create.

* Add stm32f746xx to tvm.target.micro() call; fix parameter name.

 * This API is expected to just be used with positional args, not
   kwargs, so this change isn't expected to cause any breakage.
 * model is more in line with the rest of the file, given the TVM Target
   Specification RFC.

* [BUGFIX] If session start fails, exit transport context manager.

 * If an error occurred during session setup, then complex transports
   e.g. DebugWrapperTransport would not de-initialize.
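
One way to picture this fix: when `__enter__` raises, Python never calls the matching `__exit__`, so a session that opens its transport there has to close it itself on failure. The sketch below assumes a hypothetical `Session` class with a `_start_session` placeholder; it is not the actual TVM session code.

```python
import sys


class Session:
    """Hypothetical session that owns a transport context manager."""

    def __init__(self, transport):
        self.transport = transport

    def _start_session(self):
        """Placeholder for the session handshake; may raise on failure."""

    def __enter__(self):
        self.transport.__enter__()
        try:
            self._start_session()
        except BaseException:
            # Setup failed, so Python will not call our __exit__.
            # Close the transport explicitly, then re-raise the error.
            self.transport.__exit__(*sys.exc_info())
            raise
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.transport.__exit__(exc_type, exc_value, traceback)
```

With this shape, a wrapping transport (such as one that launches a debugger) gets the chance to tear itself down even if the handshake never completes.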

* Align transport writes/reads in TransportLogger

* fix syntax errors which were not exercised in previous PR

* Remove microTVM logic from standard RPC server, add debug shell.

 * microTVM uses the host RPC server as a way to launch a debugger in
   a dedicated, separate terminal window. microTVM needs to be able to
   launch the debugger itself, because its model of the device
   flash/debug flow separates these two things into distinct
   operations implemented by shell commands (for maximum portability
   across frameworks).
 * microTVM can be configured to launch the debugger (e.g. GDB) in the
   same terminal as is used for flashing, but this is sub-optimal
   because then it hides any logs emitted by the device.
 * Using the standard RPC server was hard because GDB expects the user
   to issue SIGINT to interrupt program flow. Due to the RPC server's
   necessary use of multiprocessing, multiple signal handlers needed
   to be SIG_IGN'd. Further, because libtvm.so is intentionally
   frontend-agnostic, it's difficult to include signal handling
   directly in that binary: Python expects you to call
   PyErr_CheckSignals, but we don't require, and don't want to
   require, python-dev to compile libtvm.so, and this is the only case
   where libtvm.so is expected to block the main thread for a long
   period of time.
 * Here we implement a separate microTVM debug shell python script
   using the non-blocking server implementation.
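
The signal-forwarding idea behind a dedicated debug shell can be sketched as below. This is an illustration under stated assumptions, not the code of tvm.exec.microtvm_debug_shell: `run_debugger` is a hypothetical helper, a POSIX-like platform is assumed, and it must run in the main thread because Python only installs signal handlers there.

```python
import signal
import subprocess
import sys


def run_debugger(argv):
    """Run argv as a child process and forward SIGINT (Ctrl+C) to it.

    Assumption: the interrupt is delivered to this process only. If the
    child shares the terminal's foreground process group, the terminal
    may already deliver SIGINT to it directly.
    """
    child = subprocess.Popen(argv)

    def forward_sigint(signum, frame):
        # Hand the interrupt to the debugger instead of killing this shell.
        if child.poll() is None:
            child.send_signal(signum)

    previous = signal.signal(signal.SIGINT, forward_sigint)
    try:
        return child.wait()
    finally:
        # Restore whatever handler was installed before the debugger ran.
        signal.signal(signal.SIGINT, previous)
```

Because the handler lives in a plain Python script rather than in libtvm.so, no Python-specific signal checks are needed inside the shared library.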

* Add serial transport, parameterize test_zephyr to work on real hardware

* add pytest test fixture, missed from previous change.

 * this test fixture helps to parameterize the test case
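
A conftest.py along these lines is one way such parameterization could look. The `--microtvm-platforms` flag appears in the command shown in the PR description; the `"host"` default and the `platform` argument name are assumptions made for this sketch.

```python
def pytest_addoption(parser):
    """Register the command-line option (pytest calls this hook in conftest.py)."""
    parser.addoption(
        "--microtvm-platforms",
        default="host",  # assumed default, used when no hardware is named
        help="Comma-separated list of platforms to run the tests against.",
    )


def pytest_generate_tests(metafunc):
    """Run each test that takes a `platform` argument once per platform."""
    if "platform" in metafunc.fixturenames:
        platforms = metafunc.config.getoption("microtvm_platforms").split(",")
        metafunc.parametrize("platform", platforms)
```

A test written as `def test_compile_runtime(platform): ...` (a hypothetical name) would then run once for each entry passed on the command line.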

* address leandron@ comment from #6703

areusch commented Nov 2, 2020

@leandron @u99127 @manupa-arm please take a look when you have a minute and explicitly approve if you're good w/ this change

@leandron leandron left a comment

One minor comment, mainly to reduce the size of the target VM created here and keep it cleaner.

# nrfjprog
cd ~
mkdir -p nrfjprog
wget --no-verbose -O nRFCommandLineTools1090Linuxamd64.tar.gz https://www.nordicsemi.com/-/media/Software-and-other-downloads/Desktop-software/nRF-command-line-tools/sw/Versions-10-x-x/10-9-0/nRFCommandLineTools1090Linuxamd64tar.gz

I suggest adding something here to clean up the files/packages downloaded by this script.

great suggestion! i've done that and we saved about 500MB.

areusch commented Nov 5, 2020

@leandron please take a look when you have a minute and explicitly approve if you're good w/ this change

leandron commented Nov 5, 2020

> @leandron please take a look when you have a minute and explicitly approve if you're good w/ this change

I'm happy with the current version. I would also like to hear from @manupa-arm and @u99127, if possible.

@manupak manupak left a comment

[Post-RFC opinion] we might want to just use one of the proposed requirements.txt files here, at least locally, until the codebase is refactored to use requirements.txt.

Other than that LGTM.

Comment on lines +58 to +66
[tool.poetry.dependencies]
attrs = "^19"
decorator = "^4.4"
numpy = "~1.19"
psutil = "^5"
scipy = "^1.4"
python = "^3.6"
tornado = "^6"
typed_ast = "^1.4"

@manupak manupak Nov 5, 2020

@areusch, after reading the RFC, do we want to align with one of the requirements.txt files, so we can have synergy between this and ci-qemu? I would suggest introducing poetry after the next RFC, if and when that happens, so the changes will go here as well.

areusch commented Nov 5, 2020

@manupa-arm i'm not sure which requirements.txt you're referring to--we have not yet created any requirements.txt in the spirit of the RFC. i'd prefer to just merge this change and treat this PR as separate from the RFC, since it was posted before I released the RFC. i think it's better to merge what we have now to make progress and then align the two after the RFC is implemented.

@manupak manupak left a comment

@areusch, sounds good! Agreed that this can be done later. I was actually referring to creating a requirements.txt that replicates the toml, but that might then need to be readjusted based on what is agreed for the CI. So I agree that this could be a separate effort.

Thanks for the work!

@u99127 u99127 left a comment

LGTM. Thanks for your patience and perseverance with this one :)

regards
Ramana

@tqchen tqchen merged commit 7291a92 into apache:main Nov 5, 2020

tqchen commented Nov 5, 2020

Thanks @areusch @u99127 @leandron @manupa-arm !

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
…ysical HW (apache#6789)

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
…che#6703)

* Split transport classes into transport package.

* Introduce transport timeouts.

* black format

* Add metadata-only artifacts

* Simplify utvm rpc server API and ease handling of short packets.

* add zephyr test against qemu

* Add qemu build config

* fix typo

* cleanup zephyr main

* fix nonblocking piping on some linux kernels

* don't double-open transport

* validate FD are in non-blocking mode

* gitignore test debug files

* cleanup zephyr compiler

* re-comment serial until added

* remove logging

* add zephyr exclusions to check_file_type

* add asf header

* lint

* black format

* more pylint

* kill utvm rpc_server bindings, which don't work anymore and fail pylint

* fix compiler warning

* fixes related to pylint

* clang-format again

* more black format

* add qemu regression

* Fix paths for qemu/ dir

* fix typo

* fix SETFL logic

* export SessionTerminatedError and update except after moving

* fix test_micro_artifact

* retrigger staging CI

* fix jenkins syntax hopefully

* one last syntax error

* Add microTVM VM setup scripts

* obliterate USE_ANTLR from cmake.config

* add poetry deps to pyproject.toml

 - mainly taken from output of `pip freeze` in ci-gpu and ci-lint

* initial attempt at setup.py + autodetect libtvm_runtime SO path

* hack to hardcode in build

* make pyproject lock

* Add ci_qemu to Jenkinsfile

* build in qemu

* checkpoint

* create diff for jared

* add missing stuff

* address liangfu comments

* fix new bug with list passing

* release v0.0.2

* works on hardware

* switch to pytest for zephyr tests

* add missing import

* fix option parsing

* remove extraneous changes

* lint

* asf lint, somehow local pass didn't work

* file type lint

* black-format

* try to fix ARMTargetParser.h #include in LLVM < 8.0

* rm misspelled deamon lines

* move to apps/microtvm-vm

* fetch keys from kitware server

* fix path exclusions in check_file_type

* retrigger CI

* reorganize vm, add tutorial

* fixes for reorganization

 - enable vagrant ssh

* update ssh instructions

* rm commented code

* standardize reference VM release process, add prerelease test

* remove -mfpu from this change

* fix exit code of test_zephyr

* rm unneeded files, update check_file_type

* add asf header

* git-black

* git-black against main

* git-black with docker

* fixes for virtualbox

* black format

* install python3.8, for zephyr gdb

* timestamp zephyr vm name, permits launching multiple VMs

* log warning when initial vagrant destroy fails

* revert changes moved into apache#6789

* address leandron@ comments

* black format

* black format

* add --skip-build to test subcommand, detach device from other VMs

* black format

* address leandron@ comments

* don't rm release test when building only 1 provider

* revert pyproject.toml

* remove need to copy pyproject.toml to root

 * this often contributes to erroneous changes to that file
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
…ysical HW (apache#6789)

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
…che#6703)

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
…ysical HW (apache#6789)

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
…che#6703)

6 participants