
gRPC cannot be proxied with dialerProxy to freedom #2232

Closed
HirbodBehnam opened this issue Jun 20, 2023 · 18 comments

Comments

@HirbodBehnam
Contributor

Hello,
I found quite a strange bug in the gRPC transport. Take a look at this config:

{
  "log": {
    "loglevel": "debug"
  },
  "inbounds": [
    {
      "listen": "127.0.0.1",
      "port": "10808",
      "protocol": "socks",
      "settings": {
        "udp": true
      }
    },
    {
      "listen": "127.0.0.1",
      "port": "10809",
      "protocol": "http"
    }
  ],
  "outbounds": [
    {
      "protocol": "trojan",
      "settings": {
        "servers": [
          {
            "address": "example.com",
            "port": 443,
            "password": "pass"
          }
        ]
      },
      "streamSettings": {
        "network": "gun",
        "security": "tls",
        "grpcSettings": {
          "serviceName": "servername"
        },
        "sockopt": {
          "dialerProxy": "direct"
        }
      }
    },
    {
      "protocol": "freedom",
      "tag": "direct"
    }
  ]
}

This config should simply route the gRPC traffic through a freedom outbound, so adding the dialerProxy should make no difference. However, I couldn't get this config to work; if I remove the sockopt from the config, it works fine. The logs look like this:

2023/06/20 16:13:41 [Warning] core: Xray 1.8.3 started
2023/06/20 16:13:43 [Info] [2221018126] proxy/socks: TCP Connect request to tcp:[2a00:1450:4001:827::200e]:80
2023/06/20 16:13:43 [Info] [2221018126] app/dispatcher: default route for tcp:[2a00:1450:4001:827::200e]:80
2023/06/20 16:13:45 tcp:127.0.0.1:36732 accepted tcp:[2a00:1450:4001:827::200e]:80
2023/06/20 16:13:49 [Info] [2221018126] transport/internet/grpc: creating connection to tcp:104.21.2.133:443
2023/06/20 16:13:49 [Debug] transport/internet/grpc: using gRPC tun mode service name: `...` stream name: `Tun`
2023/06/20 16:13:49 [Info] [2221018126] transport/internet: redirecting request tcp:104.21.2.133:443 to fragment
2023/06/20 16:13:49 [Info] [2221018126] transport/internet/tcp: dialing TCP to tcp:104.21.2.133:443
2023/06/20 16:13:49 [Debug] transport/internet: dialing to tcp:104.21.2.133:443
2023/06/20 16:13:49 [Info] [2221018126] proxy/freedom: connection opened to tcp:104.21.2.133:443, local endpoint 172.16.0.2:48020, remote endpoint 104.21.2.133:443
2023/06/20 16:13:54 [Info] [2221018126] proxy/trojan: tunneling request to tcp:[2a00:1450:4001:827::200e]:80 via 104.21.2.133:443
multi read transport/internet/grpc/encoding: failed to fetch hunk from gRPC tunnel > rpc error: code = Unavailable desc = error reading from server: EOF
transport/internet/grpc/encoding: failed to send data over gRPC tunnel > EOF
2023/06/20 16:14:36 [Info] [2221018126] app/proxyman/outbound: failed to process outbound traffic > proxy/trojan: connection ends > transport/internet/grpc/encoding: failed to fetch hunk from gRPC tunnel > rpc error: code = Unavailable desc = error reading from server: EOF

And Wireshark shows that my own PC sends a FIN to the server:
[image: Wireshark capture of the client sending a FIN to the server]
I tried digging into the Xray and Google gRPC source code with a debugger, watching when the pipes get closed, but I couldn't figure it out. HOWEVER, I found an alternative way to forward any traffic to a specific outbound: use dokodemo-door with a dedicated routing rule and point the outbound at the dokodemo-door address. Consider the following config file:

{
  "log": {
    "loglevel": "debug"
  },
  "inbounds": [
    {
      "listen": "127.0.0.1",
      "port": "10808",
      "protocol": "socks",
      "settings": {
        "udp": true
      }
    },
    {
      "listen": "127.0.0.1",
      "port": "10809",
      "protocol": "http"
    },
    {
      "listen": "127.0.0.1",
      "port": "28111",
      "protocol": "dokodemo-door",
      "settings": {
        "address": "104.21.2.133",
        "port": 443,
        "network": "tcp"
      },
      "tag": "fragmentedinbound"
    }
  ],
  "outbounds": [
    {
      "protocol": "trojan",
      "settings": {
        "servers": [
          {
            "address": "127.0.0.1",
            "port": 28111,
            "password": "password"
          }
        ]
      },
      "streamSettings": {
        "network": "gun",
        "security": "tls",
        "grpcSettings": {
          "serviceName": "..."
        },
        "tlsSettings": {
          "serverName": "..."
        }
      }
    },
    {
      "protocol": "freedom",
      "settings": {
        "fragment": {
          "length": "1-2",
          "interval": "0-1",
          "packets": "1"
        }
      },
      "tag": "fragment"
    },
    {
      "protocol": "freedom",
      "tag": "direct"
    }
  ],
  "routing": {
    "domainMatcher": "mph",
    "domainStrategy": "IPIfNonMatch",
    "rules": [
      {
        "domain": [
          "regexp:.*\\.ir$",
          "ext:iran.dat:ir",
          "ext:iran.dat:other"
        ],
        "outboundTag": "direct",
        "type": "field"
      },
      {
        "ip": [
          "geoip:private",
          "geoip:ir"
        ],
        "outboundTag": "direct",
        "type": "field"
      },
      {
        "inboundTag": [
          "fragmentedinbound"
        ],
        "outboundTag": "fragment",
        "type": "field"
      }
    ]
  }
}

This is basically the config that I'm currently using to connect. I'm not expecting this to be fixed anytime soon, considering that there is a neat workaround.

@RPRX
Member

RPRX commented Jun 20, 2023

Thanks for testing. This is probably not a problem inside gRPC itself, but in how Xray calls gRPC.

dialerProxy and the gRPC transport layer were both introduced in Xray-core v1.4.0, so they may never have been adapted to each other. You could take a look at the code and then send a PR.

@RPRX
Member

RPRX commented Jun 25, 2023

Has this been fixed yet?

@ghost

ghost commented Aug 5, 2023

Has this been fixed yet?

@RPRX

I checked this with xray v1.8.3 and I can confirm that as @HirbodBehnam mentioned, it doesn't work with gRPC. On the other hand, WS works fine.

@cty123
Contributor

cty123 commented Aug 21, 2023

I was able to reproduce the problem, but couldn't figure out the root cause either. It seems to be a real problem: the outbound connection is being dropped somewhere inside the freedom proxy.

@RPRX
Member

RPRX commented Aug 26, 2023

@cty123 Care to share your findings?

@RPRX
Member

RPRX commented Aug 26, 2023

Solving this shouldn't be hard; insert some logs, see where the connection breaks, and you'll find it.

@cty123
Contributor

cty123 commented Aug 26, 2023

I've already tried that. The earliest break is in freedom, here: https://github.com/XTLS/Xray-core/blob/main/proxy/freedom/freedom.go#L205. The error shown is `use of closed network connection`, but it's not clear why the connection gets closed. I read the gRPC docs (https://github.com/grpc/grpc-go) and enabled all logging as described there. The server side shows that the client closed the connection first, but the client's gRPC gives EOF as the reason for the close, so the root cause of the disconnect is still unknown.

@RPRX
Member

RPRX commented Aug 26, 2023

There was a bug before where cancelling the ctx of one gRPC sub-connection would cancel the entire gRPC connection; this EOF may have a similar cause.
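
For readers following along, here is a minimal standalone Go sketch (not Xray code) of that failure mode: a long-lived connection whose ctx is derived from one sub-connection's ctx dies the moment that sub-connection's ctx is cancelled.

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// subCtx stands in for the ctx of the first proxied sub-connection.
	subCtx, cancelSub := context.WithCancel(context.Background())

	// The long-lived tunnel is (incorrectly) derived from subCtx.
	tunnelCtx, cancelTunnel := context.WithCancel(subCtx)
	defer cancelTunnel()

	done := make(chan struct{})
	go func() {
		<-tunnelCtx.Done() // fires as soon as subCtx is cancelled
		fmt.Println("tunnel torn down:", tunnelCtx.Err())
		close(done)
	}()

	cancelSub() // the sub-connection ends and cancels its own ctx...
	select {
	case <-done: // ...and the whole tunnel goes down with it
	case <-time.After(time.Second):
	}
}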

@RPRX
Member

RPRX commented Aug 26, 2023

The same applies to h2: if h2 with dialerProxy shows the same problem, we can be fairly sure that's it.

@cty123
Contributor

cty123 commented Aug 26, 2023

I can give that a try.

@RPRX
Member

RPRX commented Aug 26, 2023

Then I'll leave it to you. Your feedback was very helpful: the symptom "the server side shows the client closed the connection first, but the client's gRPC gives EOF as the reason for the close" matches that bug very well. The fix is not to pass the original ctx through, but to copy over only some key information (if necessary). For reference:

grpc.WithContextDialer(func(gctx context.Context, s string) (gonet.Conn, error) {
	gctx = session.ContextWithID(gctx, session.IDFromContext(ctx))
	gctx = session.ContextWithOutbound(gctx, session.OutboundFromContext(ctx))

@RPRX
Member

RPRX commented Aug 26, 2023

There was a bug before where cancelling the ctx of one gRPC sub-connection would cancel the entire gRPC connection; this EOF may have a similar cause.

Let me correct that description: the original ctx was passed to the sub-connection, and when the sub-connection ended it called cancel, tearing down the entire gRPC connection (showing up as the stream being cut off).

@RPRX
Member

RPRX commented Aug 26, 2023

There was a bug before where cancelling the ctx of one gRPC sub-connection would cancel the entire gRPC connection; this EOF may have a similar cause.

But my blind guess is that this bug fits that description: gRPC passed the first sub-connection's gctx to the dialerProxy, and then that gctx got cancelled...

@cty123
Contributor

cty123 commented Aug 26, 2023

It really is just as you said, and I'd been debugging this for days. It's exactly the place you pointed to:

grpc.WithContextDialer(func(gctx context.Context, s string) (gonet.Conn, error) {
	gctx = session.ContextWithID(gctx, session.IDFromContext(ctx))
	gctx = session.ContextWithOutbound(gctx, session.OutboundFromContext(ctx))

I created a new context and passed that in instead; it worked immediately and works perfectly.
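
For illustration, a standalone sketch of that shape of fix: copy only the key values onto a fresh context so that cancelling the caller's ctx cannot propagate (the key type below is hypothetical, standing in for Xray's session keys).

package main

import (
	"context"
	"fmt"
)

// sessionIDKey is a hypothetical key, standing in for Xray's session keys.
type sessionIDKey struct{}

// detachedWithSession copies only the values we care about onto a fresh
// context, so cancelling the parent cannot tear the new context down.
func detachedWithSession(parent context.Context) context.Context {
	fresh := context.Background()
	if id := parent.Value(sessionIDKey{}); id != nil {
		fresh = context.WithValue(fresh, sessionIDKey{}, id)
	}
	return fresh
}

func main() {
	parent, cancel := context.WithCancel(context.Background())
	parent = context.WithValue(parent, sessionIDKey{}, 42)

	detached := detachedWithSession(parent)
	cancel() // cancelling the original ctx...

	fmt.Println(detached.Err())                 // <nil>: detached survives
	fmt.Println(detached.Value(sessionIDKey{})) // 42: values were carried over
}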

@RPRX
Member

RPRX commented Aug 27, 2023

There was a bug before where cancelling the ctx of one gRPC sub-connection would cancel the entire gRPC connection; this EOF may have a similar cause.

Let me correct that description: the original ctx was passed to the sub-connection, and when the sub-connection ended it called cancel, tearing down the entire gRPC connection (showing up as the stream being cut off).
But my blind guess is that this bug fits that description: gRPC passed the first sub-connection's gctx to the dialerProxy, and then that gctx got cancelled...

Correcting once more & summarizing:

  1. The ctx parameter of getGrpcClient is the ctx of each proxied connection. The earlier bug was that there was no gctx; since gRPC dials only once, it effectively honored only the first proxied connection's ctx, and the whole gRPC connection broke when that ctx was cancelled.
  2. With this bug, I looked into what this gctx is actually for. grpc.WithContextDialer is the successor of grpc.WithDialer, which took a time.Duration parameter; reading the code confirms that this gctx exists only to control the dial timeout (see the sketch below). dialerProxy was treating it as the *ray ctx, so it broke even faster than in the previous bug.
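
To make point 2 concrete, here is a minimal hedged sketch of how grpc.WithContextDialer is meant to be consumed (target address and credentials are illustrative): the gctx handed to the dialer carries grpc-go's dial timeout and nothing more, so it must bound only the dial itself, never the lifetime of the returned net.Conn.

package main

import (
	"context"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial(
		"127.0.0.1:443", // illustrative target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithContextDialer(func(gctx context.Context, addr string) (net.Conn, error) {
			// gctx carries only grpc-go's dial timeout: it bounds this
			// dial, and may be cancelled at any time after the net.Conn
			// is returned without that meaning anything for the tunnel.
			var d net.Dialer
			return d.DialContext(gctx, "tcp", addr)
		}),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}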

@RPRX
Member

RPRX commented Aug 27, 2023

Put that way, gRPC + dialerProxy has actually never been in a usable state.

While writing the enhanced XUDP I ran into the extremely convoluted requirement of "let the original ctx control only the dial timeout, without letting it cancel the Copy, while still letting the outbound's own timeout policy take effect". I made a few attempts, left two easter eggs, and the final solution was to mark the original ctx as TimeoutOnly and rework each outbound: be23d5d

So that can be reused directly. @cty123, try marking the gctx as TimeoutOnly; if it works, send a PR, and remember to include H2 as well.
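
As a rough illustration of the TimeoutOnly idea (the helper below is hypothetical; Xray's actual mechanism flags the original ctx and adapts each outbound, see be23d5d): keep only the parent's deadline while dropping its manual cancellation.

package main

import (
	"context"
	"fmt"
	"time"
)

// timeoutOnly returns a context that expires when the parent's deadline
// would, but is immune to the parent's manual cancellation.
func timeoutOnly(parent context.Context) (context.Context, context.CancelFunc) {
	if dl, ok := parent.Deadline(); ok {
		return context.WithDeadline(context.Background(), dl)
	}
	return context.WithCancel(context.Background())
}

func main() {
	parent, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	detached, stop := timeoutOnly(parent)
	defer stop()

	cancel() // the per-connection ctx is cancelled...
	fmt.Println(parent.Err())   // context canceled
	fmt.Println(detached.Err()) // <nil>: the tunnel survives, though the
	// dial would still have been bounded by the original 5s deadline.
}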

RPRX closed this as completed in d92002a on Aug 27, 2023
@RPRX
Member

RPRX commented Aug 27, 2023

I've made the fix. Please test whether gRPC in d92002a can use dialerProxy.

H2 shouldn't have had this problem originally, because it used context.Background(); this change switches it to DialTLSContext, so please test whether any new problems were introduced.

The ctx previously passed to the REALITY UClient was never actually used. I had been meaning to change that, and took the opportunity to switch it to uConn.HandshakeContext(ctx), which also needs testing for new problems. But this handshake timeout should eventually be tuned to match browser behavior; otherwise the GFW could fingerprint us precisely by deliberately making one handshake time out.
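
For reference, this is the context-bounded handshake pattern from the standard library's crypto/tls, which the uConn.HandshakeContext(ctx) change above mirrors; the endpoint and timeout here are illustrative only.

package main

import (
	"context"
	"crypto/tls"
	"net"
	"time"
)

func main() {
	raw, err := net.Dial("tcp", "example.com:443") // illustrative endpoint
	if err != nil {
		panic(err)
	}
	defer raw.Close()

	tlsConn := tls.Client(raw, &tls.Config{ServerName: "example.com"})

	// Bound the handshake with a context deadline: if it expires, the
	// handshake aborts instead of hanging indefinitely.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := tlsConn.HandshakeContext(ctx); err != nil {
		panic(err)
	}
}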

@RPRX
Member

RPRX commented Aug 27, 2023

There was a bug before where cancelling the ctx of one gRPC sub-connection would cancel the entire gRPC connection; this EOF may have a similar cause.

Correcting yet again: I mixed things up. It wasn't gRPC; it was a bug H2 had before, #289 (comment); see #289 (comment) for the fix.

arm64v8a pushed a commit to MatsuriDayo/Xray-core that referenced this issue Aug 27, 2023
XTLS/Xray-core#2232 (comment)

Thank @cty123 for testing

Fixes XTLS/Xray-core#2232

BTW: Use `uConn.HandshakeContext(ctx)` in REALITY
