Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grpc-proxy cannot handle requests from grpc-gateway correctly #18011

Open
4 tasks done
ximenzaoshi opened this issue May 15, 2024 · 13 comments · May be fixed by #18012
Open
4 tasks done

Grpc-proxy cannot handle requests from grpc-gateway correctly #18011

ximenzaoshi opened this issue May 15, 2024 · 13 comments · May be fixed by #18012

Comments

@ximenzaoshi
Copy link

Bug report criteria

What happened?

We use grpc-gateway and grpc-proxy to reduce watch requests from apisix to etcd, as apisix only provides http client for etcd for production use. Here is a simple architecture diagram:
image
The grpc-gateway is a simple program compiled by the code below:

package main

import (
	"context"
	"fmt"
	"golang.org/x/net/http2"
	"google.golang.org/grpc/credentials/insecure"
	"math"
	"net/http"
	"os"

	"github.com/grpc-ecosystem/grpc-gateway/runtime"
	"github.com/tmc/grpc-websocket-proxy/wsproxy"
	etcdservergw "go.etcd.io/etcd/api/v3/etcdserverpb/gw"
	"go.uber.org/zap"
	"google.golang.org/grpc"
	"google.golang.org/grpc/grpclog"
)

type wsProxyZapLogger struct {
	*zap.Logger
}

func (w wsProxyZapLogger) Warnln(i ...any) {
	w.Warn(fmt.Sprint(i...))
}

func (w wsProxyZapLogger) Debugln(i ...any) {
	w.Debug(fmt.Sprint(i...))
}

func main() {
	ctx := context.Background()
	ep := os.Getenv("ETCD")
	grpclog.SetLoggerV2(grpclog.NewLoggerV2(os.Stdout, os.Stderr, os.Stderr))

	opts := []grpc.DialOption{grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(math.MaxInt32))}
	opts = append(opts, grpc.WithTransportCredentials(insecure.NewCredentials()))

	conn, err := grpc.DialContext(ctx, ep, opts...)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	gwmux := runtime.NewServeMux()

	type registerHandlerFunc func(context.Context, *runtime.ServeMux, *grpc.ClientConn) error

	handlers := []registerHandlerFunc{
		etcdservergw.RegisterKVHandler,
		etcdservergw.RegisterWatchHandler,
		etcdservergw.RegisterLeaseHandler,
		etcdservergw.RegisterClusterHandler,
		etcdservergw.RegisterMaintenanceHandler,
		etcdservergw.RegisterAuthHandler,
	}
	for i := range handlers {
		if err = handlers[i](ctx, gwmux, conn); err != nil {
			panic(err)
		}
	}
	httpmux := http.NewServeMux()
	httpmux.Handle(
		"/v3/",
		wsproxy.WebsocketProxy(
			gwmux,
			wsproxy.WithRequestMutator(
				// Default to the POST method for streams
				func(_ *http.Request, outgoing *http.Request) *http.Request {
					outgoing.Method = "POST"
					return outgoing
				},
			),
			wsproxy.WithMaxRespBodyBufferSize(0x7fffffff),
			wsproxy.WithLogger(wsProxyZapLogger{}),
		),
	)
	srv := http.Server{
		Handler: httpmux,
		Addr:    "0.0.0.0:8887",
	}
	http2.ConfigureServer(&srv, &http2.Server{
		MaxConcurrentStreams: 16,
	})
	srv.ListenAndServe()
}

Start etcd(v3.5.13) and grpc-proxy locally by commands below:

nohup etcd&
nohup etcd grpc-proxy start --endpoints=localhost:2379 &

Then start grpc-gateway locally(compiled by the code above).
Send watch command to grpc-gateway with curl:

curl -X POST -d '{"create_request": {"key":"dGVzdGtleQo=","range_end":"dGVzdGtleTEK"}}' 'localhost:8887/v3/watch'

The curl command returns directly instead of waiting for watch response,with the error repsose below:

{"result":{"header":{"cluster_id":"14841639068965178418","member_id":"10276657743932975437","revision":"1","raft_term":"2"},"created":true}} {"error":{"grpc_code":1,"http_code":408,"message":"context canceled","http_status":"Request Timeout"}}

We set auth to etcd by command below:

etcdctl user add root
etcdctl auth enable

Then make a kv range call with curl:

curl -X POST -d '{"key": "dGVzdGtleQo=", "range_end": "dGVzdGtleTEK"}' http://localhost:8887/v3/kv/range

The response is as expected:

{"error":"etcdserver: user name is empty","code":2,"message":"etcdserver: user name is empty"}

However, after we get a token with the command below:

curl 'localhost:8887/v3/auth/authenticate' -X POST -d '{"name":"root","password":"123456"}
{"header"{"cluster_id":"14841639068965178418","member_id":"10276657743932975437","revision":"1","raft_term":"2"},"token":"ImhAoKYyjbUPSKoN.11"}

We make the same kv range call with auth header:

curl -H 'Authorization: ImhAoKYyjbUPSKoN.11' -X POST -d '{"key": "dGVzdGtleQo=", "range_end": "dGVzdGtleTEK"}' http://localhost:8887/v3/kv/range

We still get the user name is empty error.

{"error":"etcdserver: user name is empty","code":2,"message":"etcdserver: user name is empty"}

The expected response is invalid auth token error:

{"error":"etcdserver: invalid auth token","code":2,"message":"etcdserver: invalid auth token"}

After some digging in the source code, we found the grpc server of grpc proxy in etcd did not propagate grpc metadata properly.
And the watch cancled error may caused by the client early disconnection by grpc-gateway.
We propose a PR trying to fix the problem, expecting for reviewing.

What did you expect to happen?

Grpc proxy can handle grpc gateway requests correctly.

How can we reproduce it (as minimally and precisely as possible)?

As describe above.

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.13
Git SHA: 6bbccf4da
Go Version: go1.22.2
Go OS/Arch: darwin/amd64

$ etcdctl version
etcdctl version: 3.5.13
API version: 3.5

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response

@ximenzaoshi
Copy link
Author

Here is the related PR #18012.

@ahrtr
Copy link
Member

ahrtr commented May 17, 2024

It seems like a valid issue. We don't see much use case on grpcProxy, so I don't think this is a major issue.

  • etcdserver not only supports grpc requests, but also REST requests (see below), so we need to double check whether the build-in gateway has similar issue.

    etcd/server/embed/serve.go

    Lines 312 to 326 in 97a9af9

    handlers := []registerHandlerFunc{
    etcdservergw.RegisterKVHandler,
    etcdservergw.RegisterWatchHandler,
    etcdservergw.RegisterLeaseHandler,
    etcdservergw.RegisterClusterHandler,
    etcdservergw.RegisterMaintenanceHandler,
    etcdservergw.RegisterAuthHandler,
    v3lockgw.RegisterLockHandler,
    v3electiongw.RegisterElectionHandler,
    }
    for _, h := range handlers {
    if err := h(ctx, gwmux, conn); err != nil {
    return nil, err
    }
    }

Can anyone double check this? @ivanvc @siyuanfoundation @chaochn47 @tjungblu

  • For this issue, we should add an simple e2e test to reproduce the issue firstly in etcd_grpcproxy_test.go in the first commit (ensure you can reproduce the issue with the e2e test), then apply your fix as the second commit.

@ximenzaoshi
Copy link
Author

I also checked the build-in gateway, it seems OK. I think the implementation is different in etcdserver and grpcproxy server, which leads to the different behavior.
We mainly use grpcproxy to reduce the watch pressure from apisix to etcdserver.
I will add an simple e2e test soon.
Thanks for your reply.

@ivanvc
Copy link
Member

ivanvc commented May 18, 2024

Can anyone double check this? @ivanvc @siyuanfoundation @chaochn47 @tjungblu

I think the built-in gateway (if I'm correct, it should be in server/etcdserver/api/v3rpc/grpc.go), is forwarding the metadata. Refer to:

chainUnaryInterceptors := []grpc.UnaryServerInterceptor{
newLogUnaryInterceptor(s),
newUnaryInterceptor(s),
grpc_prometheus.UnaryServerInterceptor,
}

And newUnaryInterceptor(...):

func newUnaryInterceptor(s *etcdserver.EtcdServer) grpc.UnaryServerInterceptor {
return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
if !api.IsCapabilityEnabled(api.V3rpcCapability) {
return nil, rpctypes.ErrGRPCNotCapable
}
if s.IsMemberExist(s.MemberID()) && s.IsLearner() && !isRPCSupportedForLearner(req) {
return nil, rpctypes.ErrGRPCNotSupportedForLearner
}
md, ok := metadata.FromIncomingContext(ctx)
if ok {
ver, vs := "unknown", md.Get(rpctypes.MetadataClientAPIVersionKey)
if len(vs) > 0 {
ver = vs[0]
}
if !utf8.ValidString(ver) {
return nil, rpctypes.ErrGRPCInvalidClientAPIVersion
}
clientRequests.WithLabelValues("unary", ver).Inc()
if ks := md[rpctypes.MetadataRequireLeaderKey]; len(ks) > 0 && ks[0] == rpctypes.MetadataHasLeader {
if s.Leader() == types.ID(raft.None) {
return nil, rpctypes.ErrGRPCNoLeader
}
}
}
return handler(ctx, req)
}
}

The source code for release 3.5 is very similar, it doesn't appear to happen there either:

https://github.com/etcd-io/etcd/blob/v3.5.13/server/etcdserver/api/v3rpc/grpc.go#L45-L49 / https://github.com/etcd-io/etcd/blob/v3.5.13/server/etcdserver/api/v3rpc/interceptor.go#L46-L73

Caveats: I'm still not too familiar with etcdserver, so I apologize if my answer is misleading.

@ahrtr
Copy link
Member

ahrtr commented May 18, 2024

@ivanvc The source code you mentioned above is where etcdserver receives gRPC requests, this issue is about receiving REST(HTTP) requests.

Note that etcdserver receives REST(HTTP) request via the grpc-gateway.

I think the reason why the build-in gateway doesn't have such issue is that the gateway carries the http request header over to the following gRPC request,

Please read

The existing e2e test cases should have already verified it.

@ahrtr
Copy link
Member

ahrtr commented May 18, 2024

The existing e2e test cases should have already verified it.

https://github.com/etcd-io/etcd/blob/main/tests/e2e/v3_curl_auth_test.go

@ximenzaoshi
Copy link
Author

ximenzaoshi commented May 22, 2024

Thanks for your reply, Do we have any comments for the watch cancled error? This is another problem which I think is more important than auth. I try to fix the watch canceled error in #18012 . Looking forward for reviewing. The discussions above seem focusing on the auth problem. @ahrtr @ivanvc

@ahrtr
Copy link
Member

ahrtr commented May 22, 2024

Do we have any comments for the watch cancled error?

I am not sure what issue you are talking about here. Please feel free to raise another issue to track it if not present.

Please let's only fix one thing/issue in one PR. Please add an e2e test to reproduce the issue firstly, please read the second point in #18011 (comment)

@ximenzaoshi
Copy link
Author

Do we have any comments for the watch cancled error?

I am not sure what issue you are talking about here. Please feel free to raise another issue to track it if not present.

Please let's only fix one thing/issue in one PR. Please add an e2e test to reproduce the issue firstly, please read the second point in #18011 (comment)

Thanks for your reply, I'll add e2e tests in a fews days and recommit the PR.

The watch context cancled error is described as the first part of this issue, maybe not clearly. I copied the part below.

Start etcd(v3.5.13) and grpc-proxy locally by commands below:

nohup etcd&
nohup etcd grpc-proxy start --endpoints=localhost:2379 &

Then start grpc-gateway locally(compiled by the code above).
Send watch command to grpc-gateway with curl:

curl -X POST -d '{"create_request": {"key":"dGVzdGtleQo=","range_end":"dGVzdGtleTEK"}}' 'localhost:8887/v3/watch'

The curl command returns directly instead of waiting for watch response,with the error repsose below:

{"result":{"header":{"cluster_id":"14841639068965178418","member_id":"10276657743932975437","revision":"1","raft_term":"2"},"created":true}} {"error":{"grpc_code":1,"http_code":408,"message":"context canceled","http_status":"Request Timeout"}}

@ahrtr
Copy link
Member

ahrtr commented May 22, 2024

Then start grpc-gateway locally(compiled by the code above). Send watch command to grpc-gateway with curl:

curl -X POST -d '{"create_request": {"key":"dGVzdGtleQo=","range_end":"dGVzdGtleTEK"}}' 'localhost:8887/v3/watch'

The curl command returns directly instead of waiting for watch response,with the error repsose below:

{"result":{"header":{"cluster_id":"14841639068965178418","member_id":"10276657743932975437","revision":"1","raft_term":"2"},"created":true}} {"error":{"grpc_code":1,"http_code":408,"message":"context canceled","http_status":"Request Timeout"}}

The error is returned by your grpc-gateway, so you need to debug it firstly. Please let me know if you see any issue when you watch a key via etcd or grpc-proxy directly.

@ximenzaoshi
Copy link
Author

Then start grpc-gateway locally(compiled by the code above). Send watch command to grpc-gateway with curl:
curl -X POST -d '{"create_request": {"key":"dGVzdGtleQo=","range_end":"dGVzdGtleTEK"}}' 'localhost:8887/v3/watch'
The curl command returns directly instead of waiting for watch response,with the error repsose below:
{"result":{"header":{"cluster_id":"14841639068965178418","member_id":"10276657743932975437","revision":"1","raft_term":"2"},"created":true}} {"error":{"grpc_code":1,"http_code":408,"message":"context canceled","http_status":"Request Timeout"}}

The error is returned by your grpc-gateway, so you need to debug it firstly. Please let me know if you see any issue when you watch a key via etcd or grpc-proxy directly.

The watch error happens in this scenario:
image
Take grpc proxy away, all worked OK:
image
So I guess this indicates that the problem is with the grpc proxy.

@ahrtr
Copy link
Member

ahrtr commented May 24, 2024

The watch error happens in this scenario:

Do you still see the issue after applying the #18012?

@ximenzaoshi
Copy link
Author

The watch error happens in this scenario:

Do you still see the issue after applying the #18012?

After applying the #18012, all worked OK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants