[RPC] Fix tracker connection termination #13420

Merged: 3 commits, Nov 21, 2022

@Icemist Icemist commented Nov 17, 2022

When a device is connected wirelessly (Android example below), the device key contains a colon character (:).

ANDROID_SERIAL_NUMBER = 192.168.0.143:5555

C:\Users\icemist>docker exec -it ice_tvm_container adb devices
List of devices attached
192.168.0.143:5555     device

In this case we get an error:

ERROR:asyncio:Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/venv/apache-tvm-py3.8/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 206, in _handle_events
    handler_func(fileobj, events)
  File "/git/tvm/python/tvm/rpc/tornado_util.py", line 41, in _event_handler
    self._event_handler(events)
  File "/git/tvm/python/tvm/rpc/tornado_util.py", line 79, in _event_handler
    if self._update_read() and (events & self._ioloop.WRITE):
  File "/git/tvm/python/tvm/rpc/tornado_util.py", line 121, in _update_read
    self.close()
  File "/git/tvm/python/tvm/rpc/tornado_util.py", line 67, in close
    self.on_close()
  File "/git/tvm/python/tvm/rpc/tracker.py", line 298, in on_close
    self._tracker.close(self)
  File "/git/tvm/python/tvm/rpc/tracker.py", line 353, in close
    self._scheduler_map[rpc_key].remove(value)
KeyError: '127.0.0.1'

For example, test_rpc_tracker_via_proxy hangs if the device key format contains a colon.

The root cause is that the keys are split on a colon, but before the split they already contain the colon from the device key:
conn.put_value has: hexagon-dev.127.0.0.1:5555:0.513619
_tracker_pending_puts has: 127.0.0.1:5555:0.2809967317812382
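
To illustrate why such keys break the lookup, here is a minimal sketch, not the TVM code itself. The `scheduler_map` dict and the `remove_value` helper are hypothetical stand-ins for the tracker's `_scheduler_map` handling, and splitting once from the right is only one possible way to recover the device key, not necessarily the change made in this PR.

```python
# Key in the format shown above: "<device key>:<random suffix>", where the
# device key itself may contain a colon (e.g. a wireless adb serial "host:port").
pending_key = "127.0.0.1:5555:0.2809967317812382"

# Splitting from the left cuts the device key at its own colon.
left = pending_key.split(":")[0]

# Splitting once from the right strips only the random suffix.
right = pending_key.rsplit(":", 1)[0]

# Hypothetical stand-in for the tracker's scheduler map, keyed by device key.
scheduler_map = {"127.0.0.1:5555": [pending_key]}


def remove_value(rpc_key, value):
    """Remove value from the list registered under rpc_key."""
    scheduler_map[rpc_key].remove(value)


# The left-split key is not in the map, so the lookup raises KeyError.
try:
    remove_value(left, pending_key)
    left_error = None
except KeyError as err:
    left_error = err.args[0]

# The right-split key matches the registered device key.
remove_value(right, pending_key)
```

With the left split, the lookup key is `127.0.0.1`, which is the same key reported in the traceback above.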

tvm-bot commented Nov 17, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: rpc. See #10317 for details.
  • Built docs for commit 9757c94 can be found here.

Generated by tvm-bot

Icemist commented Nov 17, 2022

@echuraev @apeskov @areusch

Icemist force-pushed the avoronov/fix_tracker_con_term branch 2 times, most recently from b560753 to 55cca31 (November 19, 2022 12:36)
Icemist force-pushed the avoronov/fix_tracker_con_term branch from 55cca31 to 6153200 (November 19, 2022 18:11)

@echuraev echuraev left a comment

Please carefully update all usages of the updated functions.

python/tvm/rpc/base.py (resolved)
python/tvm/rpc/proxy.py (outdated, resolved)

@echuraev echuraev left a comment

LGTM. Thanks!

@echuraev echuraev merged commit 1b3d77a into apache:main Nov 21, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [RPC] Fix tracker connection termination

* [RPC] Unify work with random key

* additional usage of the random_key API change
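
As an illustration of what unified random-key handling could look like, the sketch below pairs a key generator with its matching splitter, so that every caller builds and parses keys the same way. The names `make_random_key` and `split_random_key` and their signatures are assumptions made for this example and may not match the functions changed in this PR.

```python
import random


def make_random_key(prefix, delimiter=":"):
    """Return "<prefix><delimiter><random float>" (hypothetical helper)."""
    return f"{prefix}{delimiter}{random.random()}"


def split_random_key(key, delimiter=":"):
    """Split a key built by make_random_key into (prefix, suffix).

    Splitting once from the right keeps any delimiter characters that
    are part of the prefix itself, such as the colon in "host:port".
    """
    prefix, suffix = key.rsplit(delimiter, 1)
    return prefix, suffix


# Round trip with a device key that contains a colon.
key = make_random_key("192.168.0.143:5555")
prefix, suffix = split_random_key(key)
```

Keeping generation and parsing in one place means a change to the key format only has to be made once, rather than at every call site that splits a key by hand.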