THRIFT-5073: Optional handler interceptor for processor #1992

voodoo-dn · 2020-01-23T12:52:05Z

Hi.

Thank for this great project. But I think we can introduce some improvements which will be are useful for other developers.

Inspired by https://github.com/grpc/grpc-go/blob/master/examples/features/interceptor/server/main.go.

Interceptors give opportunities for:

Logging input request and response
We can log panic situation and return TApplicationException to client instead of EOF.
We can intercept request and do stuff like some validation, etc.
Usage example: https://gist.github.com/voodoo-dn/6af6cf54b8145cc3d4d87d888c4bf907

voodoo-dn · 2020-01-23T16:11:31Z

@dcelasun hello. Test created, but some jobs are failed because of network problems.
mvn.ant.tasks.download: [get] Getting: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] To: /thrift/src/lib/json/test/build/tools/maven-ant-tasks-2.1.3.jar [get] Error opening connection java.io.IOException: Server returned HTTP response code: 403 for URL: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] Error opening connection java.io.IOException: Server returned HTTP response code: 403 for URL: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] Error opening connection java.io.IOException: Server returned HTTP response code: 403 for URL: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] Can't get https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar to /thrift/src/lib/json/test/build/tools/maven-ant-tasks-2.1.3.jar

dcelasun · 2020-01-23T16:13:12Z

No worries, that's a temporary failure with Maven Central. I've restarted the failed tasks.

voodoo-dn · 2020-01-23T16:46:29Z

@dcelasun Thanks, but the same problem. Maybe it start working tomorrow.

compiler/cpp/src/thrift/generate/t_go_generator.cc

lib/go/test/tests/processor_interceptor_test.go

lib/go/thrift/handler_interceptor.go

lib/go/thrift/processor_options.go

compiler/cpp/src/thrift/generate/t_go_generator.cc

voodoo-dn · 2020-01-24T10:16:25Z

@Jens-G Thanks, fixes are commited. Who must resolve conversation? Reviewer of author of PR?

dcelasun · 2020-01-24T10:17:35Z

@voodoo-dn you can resolve them yourself :)

voodoo-dn · 2020-01-24T12:13:20Z

Could you restart failed jobs, please?

voodoo-dn · 2020-01-24T14:18:03Z

What is next steps?

dcelasun · 2020-01-24T15:25:53Z

Tests look fine now (the remaining failure is unrelated to this PR). I'll take a closer look at the PR over the weekend.

Jens-G · 2020-01-27T21:50:31Z

Looks good to me, no further objections from my side. Not tested though, only code review.

voodoo-dn · 2020-01-29T10:12:11Z

Hello. Sorry, but we have dependencies on this PR and do not want to use fork. When you can merge this PR? Thanks.

dcelasun · 2020-01-29T10:21:19Z

Sorry, I still haven't got a chance to properly review this PR. I'll try to make some time for it over the weekend.

Also, while not necessarily a deal breaker, it would be great if you'd implement this for at least one more language that Thrift supports. Single language "optional features" like these cause great divergence between libraries and is painful to maintain in the long term.

voodoo-dn · 2020-01-29T10:48:52Z

Thanks for the answer. Our team will think about this feature for other language.

voodoo-dn · 2020-02-11T12:26:19Z

@dcelasun Hello. Sorry, but we cannot create feature for another languages, because we have no time. Can you say what I must to do to speedup merging PR to the master branch? Thanks.

dcelasun · 2020-02-12T15:13:17Z

OK, finally got a chance to look into this. The lack of another language is unfortunate, but it's not a deal breaker.

My only concern is NullInterceptor. Can we get rid of it? If the user doesn't pass any interceptors, we shouldn't call any. There should be no default interceptors.

voodoo-dn · 2020-02-13T13:28:17Z

Hello. Without NullObject(NullInterceptor) code will be more complicated with if/else conditions(if interceptor set do first case, if not - second case). So, NullInterceptor is only a proxy to original handler, and do code more readable and easy to support/understand.

dcelasun · 2020-02-13T13:29:53Z

I understand that, but the additional if/else will better for performance in the default case which is no interceptors.

voodoo-dn · 2020-02-13T13:35:34Z

Okay, I understand your position. Maybe you have performance tests to check degradation after my feature? Is 1-2ns a critical impact on performance?

dcelasun · 2020-02-13T13:36:21Z

It's not a critical impact, but it's an avoidable impact just the same.

voodoo-dn · 2020-02-13T13:41:19Z

So without removing NullInterceptor you will not merge PR?

dcelasun · 2020-02-13T13:52:32Z

I don't see why this needs to be so confrontational. Every change to the codebase is something we have to maintain, usually indefinitely. So the barrier for entry for any new feature must be high. This PR is already weakened by its single language, out-of-spec nature so the least it could do is have zero impact on users who are not using this feature (which is the default case). When zero impact requires such a small and basic change (if/else) I don't understand your resistance towards it.

So yes, please make the change. Thank you.

voodoo-dn · 2020-02-13T14:18:05Z

Okay, thanks, I will fix it. But it will be like spagettie code :)

voodoo-dn · 2020-02-20T08:24:33Z

Hi, @dcelasun. Code was rewrited on if/else instead of NullInterceptor. But pipelines are failed and I cant find info in log about any errors for golang tests. Can you help me? Thanks.

voodoo-dn · 2020-10-08T19:06:00Z

In your interceptor implementations both requests and responses are interface{}, e.g. typeless. Why can they not be thrift.TStruct instead?

You are totally right, it this TStruct instead of interface{} is very good solution.

Middlewares can do everything in the list you mentioned, but without the requests and responses. And looking at your example code, they are only used in logging. So it seems to me that they are not something required to function, but more like nice to have things.

Are you sure? Logging is like "nice to have"? How you will develop something, where request and responses in binary data and you do not know what data creates errors in you application? Do you think that is better to add 100 calls to logger for every thrift handler?

So to summarize my impression is that this is a relatively high price to pay (with compiler changes and weaker type system) for a relatively low gain (as I mentioned in the discussion in the ticket, it's doable in middleware as well, just more complex). I'm also doubtful of the usefulness of blindly logging all requests/responses for every request in every endpoint.

Weak type system will be fixed if you are agree to merge this MR.
Guys, your project is great, but let's face it:
gRPC is more popular(without great things as Exception, that your project supports), there is bigger community, it's contains such things as Interceptor(what is from your vision is useless) from the box. Without new features which scares you, your project become less popular and will die without community supporting. I don't injure you, but your relation to new features reflect attitude to community. I do not say: if you do not merge it, we will move our projects to gRPC, not, i'm trying to understand why such great feature as Interceptor, which is done not by you, you didn't spent time on coding and testing(this Interceptor is using half year in production without any problems by our company) makes such problems.

Also I'm curious: how do you implement your maskArg function so it works for all your different request types? (Serhii mentioned that they have 100 endpoints, and I assume they are from the same company as you?)

What is maskArg do: marshal args to json, then we cut from json some sensitive data and log this json.
Yes, we work with Serhii in one team.

Sorry if I offended you in some way. Have a good time.

fishy · 2020-10-08T19:26:55Z

Logging is certainly important, but I'm not so sure how important it is to log all requests/responses for every request.

Granted everyone's situation is different, but my experience is that we only log errors in production, and in most cases the error messages have enough information to debug the issue. And if that's not enough, in most of (our) cases it's not something in the request, more logging/metrics near the place the error happens are much more useful than just logging the whole request.

"Weaker type system" is in refer to the usage of interface{} as the type of the requests/responses.

What is maskArg do: marshal args to json, then we cut from json some sensitive data and log this json.

The cutting part sounds very inefficient if you are doing that after marshaled to json. Do you unmarshal json again to strip the sensitive data, or use some regexp to manipulate the json string directly? (this is purely my curiosity, nothing to do with this PR)

dcelasun · 2020-10-08T19:59:50Z

I'm not so sure how important it is to log all requests/responses for every request.

Agreed. In my experience this is only needed rarely for debugging and when that happens something like this works just fine inside the handler:

func (s *Server) ExampleHandler(ctx context.Context, req *gen.ExampleRequest) (resp *gen.ExampleResponse, respErr error) {
	t := trace.New(ctx, "example").Input(req)
	defer func() { t.Output(resp, respErr).Send() }()
	
	// rest of the handler code...
}

So yeah, I think the cost of this change is way too high to avoid those 2 lines of code.

voodoo-dn · 2020-10-08T20:07:05Z

Logging is certainly important, but I'm so sure how important it is to log all requests/responses for every request.

Logs collecting is cheap, but allows you to reproduce system behavior. When you works with microservices, it's must have feature to log everything.

Granted everyone's situation is different, but my experience is that we only log errors in production, and in most cases the error messages have enough information to debug the issue. And if that's not enough, in most of (our) cases it's not something in the request, more logging/metrics near the place the error happens are much more useful than just logging the whole request.

For example: some your endpoints starts return internal errors. You do not have middlewares, you haven't added any logging to thrift handler. What will you do? Do hotfix on production and add logger?)

"Weaker type system" is in refer to the usage of interface{} as the type of the requests/responses.

I understand. I can make arg and result as TStruct, it's not a problem.

What is maskArg do: marshal args to json, then we cut from json some sensitive data and log this json.

The cutting part sounds very inefficient if you are doing that after marshaled to json. Do you unmarshal json again to strip the sensitive data, or use some regexp to manipulate the json string directly? (this is purely my curiosity, nothing to do with this PR)

Yes, there is not the efficient algorithm, we will fix it when we might have problems with performance.

fishy · 2020-10-08T20:18:07Z

For example: some your endpoints starts return internal errors. You do not have middlewares, you haven't added any logging to thrift handler. What will you do? Do hotfix on production and add logger?)

How did they start to return internal errors? If it's panic, the log will automatically have the stack trace so you know on which line it panicked. If it's an error returned by your handler code, sure you know why you would return an error there?

Please pardon my lack of imagination, but the only situation I can imagine that logging the whole request would help, is that there are some corner case from the input you didn't handle well, like having problem to handle negative number for an unsigned int, or having problems with string input being too short. But even in those cases, logging the callstack at the time you return error/panic are usually much more useful than having the whole request logged.

voodoo-dn · 2020-10-08T20:34:38Z

So yeah, I think the cost of this change is way too high to avoid those 2 lines of code.

I think better to say: 2 line on code in 100 places.

Your middleware cannot log information, haven't information about which handler method was called.
If you want take data from request and you protocol is Binary, you should copy values from in TProtocol and deserialize data, lol, very efficient when do it twice.

For metrics - the same, you do not know which handler method was called, you know only service name.
How you will collect metrics without handler method name?
But I can answer instead of you: you can't do it at all.

How do you collect traces/metrics in real project? Where your jaeger span starts and how you take "serviceName:methodName" to collect metrics?

dcelasun · 2020-10-08T20:41:44Z

How do you collect traces/metrics in real project? Where your jaeger span starts and how you take "serviceName:methodName" to collect metrics?

With middlewares, the handler name is available. See the original commit for examples.

How do you collect traces/metrics in real project? Where your jaeger span starts and how you take "serviceName:methodName" to collect metrics?

Either with middlewares (the new way), or by a 2-line statement inside the handler (the old way).

fishy · 2020-10-08T20:42:02Z

you do not know which handler method was called

That's not true. the name in ProcessorMiddleware is the name of the endpoint. Or do you have a different definition of "service name" and "handler name" from me?

fishy · 2020-10-08T20:44:23Z

To clarify, in the example of https://github.com/apache/thrift/blob/daf620915714b76fce517b376b963440d1f34089/test/Service.thrift: the "service name" is Service and the "endpoint name"/"handler name" is testEpisode (this is the one you get from the name arg from middleware)

voodoo-dn · 2020-10-08T21:00:15Z

I have mistaken with naming. Service has N methods. Middleware doesn't know anything about method name.

voodoo-dn · 2020-10-08T21:01:56Z

you do not know which handler method was called

That's not true. the name in ProcessorMiddleware is the name of the endpoint. Or do you have a different definition of "service name" and "handler name" from me?

It's true. Name is only name of service, not name of the method.

fishy · 2020-10-08T21:20:27Z

you do not know which handler method was called

That's not true. the name in ProcessorMiddleware is the name of the endpoint. Or do you have a different definition of "service name" and "handler name" from me?

It's true. Name is only name of service, not name of the method.

Can we take a step back to get on the same page first?

Could you please take a look at this concrete example: https://github.com/apache/thrift/blob/daf620915714b76fce517b376b963440d1f34089/test/Service.thrift, and tell me what is the "service name" and "method/handler/endpoint name" in your definition?

voodoo-dn · 2020-10-08T21:30:45Z

Service name is Service. Method name is testEpisode.

fishy · 2020-10-08T21:31:59Z

middleware gets testEpisode as the name arg, not Service

fishy · 2020-10-09T00:55:20Z

Since the only gap between interceptor idea and already existing middleware are the requests/responses, here's an idea to add that ability, without any compiler changes (this can even be from a third party library, it doesn't use any unexported function from the thrift library).

First you just need to implement a RecordingTProtocol (name TBD, it can also be InterceptorTProtocol if you like):

type RecordingTProtocol struct {
  Recording, Actual TProtocol
}

func (rp RecordingTProtocol) ReadMessageBegin() (name string, typeId TMessageType, seqid int32, err error) {
  name, typeId, seqid, err = rp.Actual.ReadMessageBegin()
  if err != nil {
    err = rp.Recording.WriteMessageBegin(name, typeId, seqid)
  }
}

// TODO: implement other TProtocol read functions

func (rp RecordingTProtocol) WriteMessageBegin(name string, typeId TMessageType, seqid int32) (err error) {
  rp.Recording.WriteMessageBegin(name, typeId, seqid)
  err = rp.Actual.WriteMessageBegin(name, typeId, seqid)
}

// TODO: implement other TProtocol write functions

Then you can just use it in a middleware implementation and combine it with TMemoryBuffer as the underlying TTransport for the recording one. For example for json logging:

var pool = sync.Pool{
  New: func() interface{} {
    return thrift.NewTMemoryBuffer()
  },
}

func loggingMiddleware(name string, next thrift.TProcessorFunction) thrift.TProcessorFunction {
  return thrift.WrappedTProcessorFunction{
    Wrapped: func(ctx context.Context, seqID int32, in, out thrift.TProtocol) (_ bool, err thrift.TException) {
      inBuf := pool.Get().(*thrift.TMemoryBuffer)
      inBuf.Reset()
      iproto := thrift.NewTSimpleJSONProtocol(inBuf)
      // Since in thrift compiled Processor functions, we call ReadMessageBegin
      // before calling the actual processor, we will have the ReadMessageEnd at
      // the end without the pairing ReadMessageBegin call, and that will cause
      // TSimpleJSONProtocol to panic.
      iproto.WriteMessageBegin(ctx, "" /* name */, thrift.CALL, seqID)
      in = TRecordingProtocol{
        Actual:    in,
        Recording: iproto,
      }

      outBuf := pool.Get().(*thrift.TMemoryBuffer)
      outBuf.Reset()
      oproto := thrift.NewTSimpleJSONProtocol(outBuf)
      out = TRecordingProtocol{
        Actual:    out,
        Recording: oproto,
      }

      defer func() {
        iproto.Flush(ctx)
        logger("method", name, "request", inBuf.String(), "response", outBuf.String(), "err", err)
      }()

      return next.Process(ctx, seqID, in, out)
    },
  }
}

And I can also use it to record the size on the wire (also I think this is inefficient and it's better done in envoy/etc.):

func sizeRecordingMiddleware(name string, next thrift.TProcessorFunction) thrift.TProcessorFunction {
  return thrift.WrappedTProcessorFunction{
    Wrapped: func(ctx context.Context, seqID int32, in, out thrift.TProtocol) (bool, thrift.TException) {
      inBuf := pool.Get().(*thrift.TMemoryBuffer)
      inBuf.Reset()
      outBuf := pool.Get().(*thrift.TMemoryBuffer)
      outBuf.Reset()

      defer func() {
        requestSizeHistogram.Observe(inBuf.Len())
        responseSizeHistogram.Observe(outBuf.Len())
        pool.Put(inBuf)
        pool.Put(outBuf)
      }()

      iproto := factory.GetProtocol(inBuf)
      in = TRecordingProtocol{
        Actual:    in,
        Recording: iproto,
      }
      oproto := factory.GetProtocol(outBuf)
      out = TRecordingProtocol{
        Actual:    out,
        Recording: oproto,
      }

      return next.Process(ctx, seqID, in, out)
    },
  }
}

If people find it useful I can totally just implement this RecordingTProtocol in thrift library.

voodoo-dn · 2020-10-09T05:08:20Z

Will it work with TBinaryProtocol?

fishy · 2020-10-09T17:46:31Z

It works for any TProtocol.

voodoo-dn · 2020-10-09T18:09:23Z

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

fishy · 2020-10-09T18:10:22Z

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

No. For human readable data you should use TSimpleJSONProtocol or TJSONProtocol.

voodoo-dn · 2020-10-09T18:18:16Z

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

No. For human readable data you should use TSimpleJSONProtocol or TJSONProtocol.

Are you think that is ok approach?)

fishy · 2020-10-09T18:21:55Z

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

No. For human readable data you should use TSimpleJSONProtocol or TJSONProtocol.

Are you think that is ok approach?)

What do you mean? In my personal opinion json logging all requests and responses are very inefficient and not ok (I tried that loggingMiddleware in our staging, and because our requests and responses are so large our log ingester cannot handle those super large log entries). But it's as ok as your implementation of json marshal all requests and responses with interceptor.

voodoo-dn · 2020-10-09T18:34:39Z

I meant that you have chosen approach, where information logging allowed only when you use TJsonProtocol, instead of interceptor, that can work with any protocol.

But I understand your position about code that grows in generator.

I think that I haven't arguments and I think you can close MR. Thanks for discussion. Good luck and be healthy!

This also comes from the discussion of apache#1992 (comment). I think TDebugProtocol is a better fit for this feature than creating a new TProtocol implementation. The Dump field is not added to TDebugProtocolFactory because I don't think it makes sense from the factory setup. In vast majority cases users would need direct access to the underlying TMemoryBuffer to make it useful, which is easier this way than an additional TTransportFactory plus TProtocolFactory to make TDebugProtocolFactory way too complex.

This also comes from the discussion of apache#1992 (comment). I think TDebugProtocol is a better fit for this feature than creating a new TProtocol implementation. The DuplicateTo field is not added to TDebugProtocolFactory because I don't think it makes sense from the factory setup. In vast majority cases users would need direct access to the underlying TMemoryBuffer to make it useful, which is easier this way than an additional TTransportFactory plus TProtocolFactory to make TDebugProtocolFactory way too complicated.

This also comes from the discussion of #1992 (comment). I think TDebugProtocol is a better fit for this feature than creating a new TProtocol implementation. The DuplicateTo field is not added to TDebugProtocolFactory because I don't think it makes sense from the factory setup. In vast majority cases users would need direct access to the underlying TMemoryBuffer to make it useful, which is easier this way than an additional TTransportFactory plus TProtocolFactory to make TDebugProtocolFactory way too complicated.

stale · 2020-12-13T01:58:20Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2020-12-20T14:28:10Z

This issue has been automatically closed due to inactivity. Thank you for your contributions.

voodoo-dn force-pushed the feature/handler-processor-with-options branch 2 times, most recently from 274e722 to a314be7 Compare January 23, 2020 15:23

Jens-G requested changes Jan 23, 2020

View reviewed changes

voodoo-dn force-pushed the feature/handler-processor-with-options branch from a314be7 to 829c674 Compare January 24, 2020 10:12

voodoo-dn requested a review from Jens-G January 24, 2020 15:21

incu6us force-pushed the feature/handler-processor-with-options branch from 0451a77 to 829c674 Compare February 11, 2020 12:19

THRIFT-5073

6b47e4c

voodoo-dn force-pushed the feature/handler-processor-with-options branch from 27d3e03 to 6b47e4c Compare February 19, 2020 16:28

fishy mentioned this pull request Oct 14, 2020

Add DuplicateTo protocol to go's TDebugProtocol #2261

Merged

stale bot added the wontfix label Dec 13, 2020

stale bot closed this Dec 20, 2020

THRIFT-5073: Optional handler interceptor for processor #1992

THRIFT-5073: Optional handler interceptor for processor #1992

Conversation

voodoo-dn commented Jan 23, 2020

voodoo-dn commented Jan 23, 2020

dcelasun commented Jan 23, 2020

voodoo-dn commented Jan 23, 2020

voodoo-dn commented Jan 24, 2020

dcelasun commented Jan 24, 2020

voodoo-dn commented Jan 24, 2020

voodoo-dn commented Jan 24, 2020

dcelasun commented Jan 24, 2020

Jens-G commented Jan 27, 2020

voodoo-dn commented Jan 29, 2020

dcelasun commented Jan 29, 2020

voodoo-dn commented Jan 29, 2020

voodoo-dn commented Feb 11, 2020

dcelasun commented Feb 12, 2020 • edited

voodoo-dn commented Feb 13, 2020

dcelasun commented Feb 13, 2020

voodoo-dn commented Feb 13, 2020

dcelasun commented Feb 13, 2020

voodoo-dn commented Feb 13, 2020

dcelasun commented Feb 13, 2020

voodoo-dn commented Feb 13, 2020

voodoo-dn commented Feb 20, 2020

voodoo-dn commented Oct 8, 2020 • edited

fishy commented Oct 8, 2020 • edited

dcelasun commented Oct 8, 2020

voodoo-dn commented Oct 8, 2020

fishy commented Oct 8, 2020

voodoo-dn commented Oct 8, 2020

dcelasun commented Oct 8, 2020

fishy commented Oct 8, 2020

fishy commented Oct 8, 2020

voodoo-dn commented Oct 8, 2020

voodoo-dn commented Oct 8, 2020

fishy commented Oct 8, 2020

voodoo-dn commented Oct 8, 2020

fishy commented Oct 8, 2020

fishy commented Oct 9, 2020 • edited

voodoo-dn commented Oct 9, 2020

fishy commented Oct 9, 2020

voodoo-dn commented Oct 9, 2020

fishy commented Oct 9, 2020

voodoo-dn commented Oct 9, 2020

fishy commented Oct 9, 2020

voodoo-dn commented Oct 9, 2020

stale bot commented Dec 13, 2020

stale bot commented Dec 20, 2020

dcelasun commented Feb 12, 2020 •

edited

voodoo-dn commented Oct 8, 2020 •

edited

fishy commented Oct 8, 2020 •

edited

fishy commented Oct 9, 2020 •

edited