
Fix "Task does not exist" caused by connection problems between the container and the host controller #120

Merged
merged 1 commit into from
Mar 5, 2024


Tangent-90C
Contributor

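I spent quite a while debugging while reproducing this myself before I found the cause of this bug...

Fix "Task does not exist" caused by connection problems between the container and the host controller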
@zhc7
Collaborator

zhc7 commented Feb 27, 2024

Hi @Tangent-90C, could you explain in more detail why, or under what conditions, this problem occurs? In principle, --network host should let the worker and the controller connect correctly.

@Tangent-90C
Contributor Author

> Hi @Tangent-90C, could you explain in more detail why, or under what conditions, this problem occurs? In principle, --network host should let the worker and the controller connect correctly.

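My system is Windows 11 with the WSL2-based Docker Desktop. I run AgentBench on the host (Windows itself, not inside WSL2) and hit the same problem as #119. I tried entering the running worker and also starting the worker container manually, then tested with curl. In both cases, with --network host the container could not reach the controller on port 5000 of the host via localhost. So I fell back on past experience: I explicitly specified the host address and set up port forwarding, after which it worked.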
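
The curl check described above can also be sketched in Python. This is only a minimal illustration, not code from the PR: the helper name `can_connect` and the candidate list are hypothetical, port 5000 is the controller port mentioned above, and `host.docker.internal` is assumed to be the host name that Docker Desktop provides for reaching the host from inside a container.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Assumption: the controller listens on port 5000, as in the comment above.
# The host names are illustrative candidates, not values taken from the PR.
CANDIDATES = [("localhost", 5000), ("host.docker.internal", 5000)]

if __name__ == "__main__":
    for host, port in CANDIDATES:
        status = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} is {status}")
```

Running this inside the worker container should show which address, if any, reaches the controller. If only the explicit host address works, that would be consistent with the behaviour reported here.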

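As for the underlying cause of this bug, my guess is that Docker on Windows runs on WSL2, so although host mode shares the host's network namespace, what is actually shared is the WSL2 network, not the Windows network.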

image

You could run an experiment: when using --network host on Windows, does the container share WSL's network? (I have not tried this yet, so I do not know.)
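
One possible way to run that experiment is to print a small "network fingerprint" in three places (inside the container, in a WSL shell, and on Windows) and compare the results. The sketch below is hypothetical and not part of the PR. It assumes that WSL2 kernel release strings contain the word "microsoft", which is commonly the case but is not guaranteed.

```python
import platform
import socket


def network_fingerprint() -> dict:
    """Collect a few facts about the network stack this process sees."""
    return {
        "hostname": socket.gethostname(),
        "kernel": platform.release(),
        # if_nameindex() yields (index, name) pairs for each network interface.
        "interfaces": sorted(name for _, name in socket.if_nameindex()),
    }


def looks_like_wsl2_kernel(release: str) -> bool:
    """Heuristic only: assumes WSL2 kernel releases mention 'microsoft'."""
    return "microsoft" in release.lower()


if __name__ == "__main__":
    info = network_fingerprint()
    for key, value in info.items():
        print(f"{key}: {value}")
    print("WSL2-like kernel:", looks_like_wsl2_kernel(info["kernel"]))
```

If the container's interface list matches the WSL side rather than the Windows side, that would support the guess above. It is a rough check, not a proof.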

@Tangent-90C
Contributor Author

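I just tried it: on my machine, Docker's --network host on Windows has no effect. The container and the host cannot reach each other, including via the WSL address.
Also, my Docker version is 25.0.2.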

@Longin-Yu Longin-Yu requested review from Longin-Yu and removed request for Longin-Yu March 5, 2024 12:15
@Longin-Yu Longin-Yu merged commit 9274b74 into THUDM:main Mar 5, 2024