Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THRIFT-5073: Optional handler interceptor for processor #1992

Conversation

voodoo-dn
Copy link

Hi.

Thank for this great project. But I think we can introduce some improvements which will be are useful for other developers.

Inspired by https://github.com/grpc/grpc-go/blob/master/examples/features/interceptor/server/main.go.

Interceptors give opportunities for:

Logging input request and response
We can log panic situation and return TApplicationException to client instead of EOF.
We can intercept request and do stuff like some validation, etc.
Usage example: https://gist.github.com/voodoo-dn/6af6cf54b8145cc3d4d87d888c4bf907

@voodoo-dn voodoo-dn force-pushed the feature/handler-processor-with-options branch 2 times, most recently from 274e722 to a314be7 Compare January 23, 2020 15:23
@voodoo-dn
Copy link
Author

@dcelasun hello. Test created, but some jobs are failed because of network problems.
mvn.ant.tasks.download: [get] Getting: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] To: /thrift/src/lib/json/test/build/tools/maven-ant-tasks-2.1.3.jar [get] Error opening connection java.io.IOException: Server returned HTTP response code: 403 for URL: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] Error opening connection java.io.IOException: Server returned HTTP response code: 403 for URL: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] Error opening connection java.io.IOException: Server returned HTTP response code: 403 for URL: https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar [get] Can't get https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar to /thrift/src/lib/json/test/build/tools/maven-ant-tasks-2.1.3.jar

@dcelasun
Copy link
Member

No worries, that's a temporary failure with Maven Central. I've restarted the failed tasks.

@voodoo-dn
Copy link
Author

@dcelasun Thanks, but the same problem. Maybe it start working tomorrow.

@voodoo-dn voodoo-dn force-pushed the feature/handler-processor-with-options branch from a314be7 to 829c674 Compare January 24, 2020 10:12
@voodoo-dn
Copy link
Author

@Jens-G Thanks, fixes are commited. Who must resolve conversation? Reviewer of author of PR?

@dcelasun
Copy link
Member

@voodoo-dn you can resolve them yourself :)

@voodoo-dn
Copy link
Author

Could you restart failed jobs, please?

@voodoo-dn
Copy link
Author

What is next steps?

@voodoo-dn voodoo-dn requested a review from Jens-G January 24, 2020 15:21
@dcelasun
Copy link
Member

Tests look fine now (the remaining failure is unrelated to this PR). I'll take a closer look at the PR over the weekend.

@Jens-G
Copy link
Member

Jens-G commented Jan 27, 2020

Looks good to me, no further objections from my side. Not tested though, only code review.

@voodoo-dn
Copy link
Author

Hello. Sorry, but we have dependencies on this PR and do not want to use fork. When you can merge this PR? Thanks.

@dcelasun
Copy link
Member

Sorry, I still haven't got a chance to properly review this PR. I'll try to make some time for it over the weekend.

Also, while not necessarily a deal breaker, it would be great if you'd implement this for at least one more language that Thrift supports. Single language "optional features" like these cause great divergence between libraries and is painful to maintain in the long term.

@voodoo-dn
Copy link
Author

Thanks for the answer. Our team will think about this feature for other language.

@incu6us incu6us force-pushed the feature/handler-processor-with-options branch from 0451a77 to 829c674 Compare February 11, 2020 12:19
@voodoo-dn
Copy link
Author

@dcelasun Hello. Sorry, but we cannot create feature for another languages, because we have no time. Can you say what I must to do to speedup merging PR to the master branch? Thanks.

@dcelasun
Copy link
Member

dcelasun commented Feb 12, 2020

OK, finally got a chance to look into this. The lack of another language is unfortunate, but it's not a deal breaker.

My only concern is NullInterceptor. Can we get rid of it? If the user doesn't pass any interceptors, we shouldn't call any. There should be no default interceptors.

@voodoo-dn
Copy link
Author

Hello. Without NullObject(NullInterceptor) code will be more complicated with if/else conditions(if interceptor set do first case, if not - second case). So, NullInterceptor is only a proxy to original handler, and do code more readable and easy to support/understand.

@dcelasun
Copy link
Member

I understand that, but the additional if/else will better for performance in the default case which is no interceptors.

@voodoo-dn
Copy link
Author

Okay, I understand your position. Maybe you have performance tests to check degradation after my feature? Is 1-2ns a critical impact on performance?

@dcelasun
Copy link
Member

It's not a critical impact, but it's an avoidable impact just the same.

@voodoo-dn
Copy link
Author

So without removing NullInterceptor you will not merge PR?

@dcelasun
Copy link
Member

I don't see why this needs to be so confrontational. Every change to the codebase is something we have to maintain, usually indefinitely. So the barrier for entry for any new feature must be high. This PR is already weakened by its single language, out-of-spec nature so the least it could do is have zero impact on users who are not using this feature (which is the default case). When zero impact requires such a small and basic change (if/else) I don't understand your resistance towards it.

So yes, please make the change. Thank you.

@voodoo-dn
Copy link
Author

Okay, thanks, I will fix it. But it will be like spagettie code :)

@voodoo-dn voodoo-dn force-pushed the feature/handler-processor-with-options branch from 27d3e03 to 6b47e4c Compare February 19, 2020 16:28
@voodoo-dn
Copy link
Author

Hi, @dcelasun. Code was rewrited on if/else instead of NullInterceptor. But pipelines are failed and I cant find info in log about any errors for golang tests. Can you help me? Thanks.

@voodoo-dn
Copy link
Author

voodoo-dn commented Oct 8, 2020

In your interceptor implementations both requests and responses are interface{}, e.g. typeless. Why can they not be thrift.TStruct instead?

You are totally right, it this TStruct instead of interface{} is very good solution.

Middlewares can do everything in the list you mentioned, but without the requests and responses. And looking at your example code, they are only used in logging. So it seems to me that they are not something required to function, but more like nice to have things.

Are you sure? Logging is like "nice to have"? How you will develop something, where request and responses in binary data and you do not know what data creates errors in you application? Do you think that is better to add 100 calls to logger for every thrift handler?

So to summarize my impression is that this is a relatively high price to pay (with compiler changes and weaker type system) for a relatively low gain (as I mentioned in the discussion in the ticket, it's doable in middleware as well, just more complex). I'm also doubtful of the usefulness of blindly logging all requests/responses for every request in every endpoint.

Weak type system will be fixed if you are agree to merge this MR.
Guys, your project is great, but let's face it:
gRPC is more popular(without great things as Exception, that your project supports), there is bigger community, it's contains such things as Interceptor(what is from your vision is useless) from the box. Without new features which scares you, your project become less popular and will die without community supporting. I don't injure you, but your relation to new features reflect attitude to community. I do not say: if you do not merge it, we will move our projects to gRPC, not, i'm trying to understand why such great feature as Interceptor, which is done not by you, you didn't spent time on coding and testing(this Interceptor is using half year in production without any problems by our company) makes such problems.

Also I'm curious: how do you implement your maskArg function so it works for all your different request types? (Serhii mentioned that they have 100 endpoints, and I assume they are from the same company as you?)

What is maskArg do: marshal args to json, then we cut from json some sensitive data and log this json.
Yes, we work with Serhii in one team.

Sorry if I offended you in some way. Have a good time.

@fishy
Copy link
Member

fishy commented Oct 8, 2020

Logging is certainly important, but I'm not so sure how important it is to log all requests/responses for every request.

Granted everyone's situation is different, but my experience is that we only log errors in production, and in most cases the error messages have enough information to debug the issue. And if that's not enough, in most of (our) cases it's not something in the request, more logging/metrics near the place the error happens are much more useful than just logging the whole request.

"Weaker type system" is in refer to the usage of interface{} as the type of the requests/responses.

What is maskArg do: marshal args to json, then we cut from json some sensitive data and log this json.

The cutting part sounds very inefficient if you are doing that after marshaled to json. Do you unmarshal json again to strip the sensitive data, or use some regexp to manipulate the json string directly? (this is purely my curiosity, nothing to do with this PR)

@dcelasun
Copy link
Member

dcelasun commented Oct 8, 2020

I'm not so sure how important it is to log all requests/responses for every request.

Agreed. In my experience this is only needed rarely for debugging and when that happens something like this works just fine inside the handler:

func (s *Server) ExampleHandler(ctx context.Context, req *gen.ExampleRequest) (resp *gen.ExampleResponse, respErr error) {
	t := trace.New(ctx, "example").Input(req)
	defer func() { t.Output(resp, respErr).Send() }()
	
	// rest of the handler code...
}

So yeah, I think the cost of this change is way too high to avoid those 2 lines of code.

@voodoo-dn
Copy link
Author

Logging is certainly important, but I'm so sure how important it is to log all requests/responses for every request.

Logs collecting is cheap, but allows you to reproduce system behavior. When you works with microservices, it's must have feature to log everything.

Granted everyone's situation is different, but my experience is that we only log errors in production, and in most cases the error messages have enough information to debug the issue. And if that's not enough, in most of (our) cases it's not something in the request, more logging/metrics near the place the error happens are much more useful than just logging the whole request.

For example: some your endpoints starts return internal errors. You do not have middlewares, you haven't added any logging to thrift handler. What will you do? Do hotfix on production and add logger?)

"Weaker type system" is in refer to the usage of interface{} as the type of the requests/responses.

I understand. I can make arg and result as TStruct, it's not a problem.

What is maskArg do: marshal args to json, then we cut from json some sensitive data and log this json.

The cutting part sounds very inefficient if you are doing that after marshaled to json. Do you unmarshal json again to strip the sensitive data, or use some regexp to manipulate the json string directly? (this is purely my curiosity, nothing to do with this PR)

Yes, there is not the efficient algorithm, we will fix it when we might have problems with performance.

@fishy
Copy link
Member

fishy commented Oct 8, 2020

For example: some your endpoints starts return internal errors. You do not have middlewares, you haven't added any logging to thrift handler. What will you do? Do hotfix on production and add logger?)

How did they start to return internal errors? If it's panic, the log will automatically have the stack trace so you know on which line it panicked. If it's an error returned by your handler code, sure you know why you would return an error there?

Please pardon my lack of imagination, but the only situation I can imagine that logging the whole request would help, is that there are some corner case from the input you didn't handle well, like having problem to handle negative number for an unsigned int, or having problems with string input being too short. But even in those cases, logging the callstack at the time you return error/panic are usually much more useful than having the whole request logged.

@voodoo-dn
Copy link
Author

So yeah, I think the cost of this change is way too high to avoid those 2 lines of code.

I think better to say: 2 line on code in 100 places.

Your middleware cannot log information, haven't information about which handler method was called.
If you want take data from request and you protocol is Binary, you should copy values from in TProtocol and deserialize data, lol, very efficient when do it twice.

For metrics - the same, you do not know which handler method was called, you know only service name.
How you will collect metrics without handler method name?
But I can answer instead of you: you can't do it at all.

How do you collect traces/metrics in real project? Where your jaeger span starts and how you take "serviceName:methodName" to collect metrics?

@dcelasun
Copy link
Member

dcelasun commented Oct 8, 2020

How do you collect traces/metrics in real project? Where your jaeger span starts and how you take "serviceName:methodName" to collect metrics?

With middlewares, the handler name is available. See the original commit for examples.

How do you collect traces/metrics in real project? Where your jaeger span starts and how you take "serviceName:methodName" to collect metrics?

Either with middlewares (the new way), or by a 2-line statement inside the handler (the old way).

@fishy
Copy link
Member

fishy commented Oct 8, 2020

you do not know which handler method was called

That's not true. the name in ProcessorMiddleware is the name of the endpoint. Or do you have a different definition of "service name" and "handler name" from me?

@fishy
Copy link
Member

fishy commented Oct 8, 2020

To clarify, in the example of https://github.com/apache/thrift/blob/daf620915714b76fce517b376b963440d1f34089/test/Service.thrift: the "service name" is Service and the "endpoint name"/"handler name" is testEpisode (this is the one you get from the name arg from middleware)

@voodoo-dn
Copy link
Author

I have mistaken with naming. Service has N methods. Middleware doesn't know anything about method name.

@voodoo-dn
Copy link
Author

you do not know which handler method was called

That's not true. the name in ProcessorMiddleware is the name of the endpoint. Or do you have a different definition of "service name" and "handler name" from me?

It's true. Name is only name of service, not name of the method.

@fishy
Copy link
Member

fishy commented Oct 8, 2020

you do not know which handler method was called

That's not true. the name in ProcessorMiddleware is the name of the endpoint. Or do you have a different definition of "service name" and "handler name" from me?

It's true. Name is only name of service, not name of the method.

Can we take a step back to get on the same page first?

Could you please take a look at this concrete example: https://github.com/apache/thrift/blob/daf620915714b76fce517b376b963440d1f34089/test/Service.thrift, and tell me what is the "service name" and "method/handler/endpoint name" in your definition?

@voodoo-dn
Copy link
Author

Service name is Service. Method name is testEpisode.

@fishy
Copy link
Member

fishy commented Oct 8, 2020

middleware gets testEpisode as the name arg, not Service

@fishy
Copy link
Member

fishy commented Oct 9, 2020

Since the only gap between interceptor idea and already existing middleware are the requests/responses, here's an idea to add that ability, without any compiler changes (this can even be from a third party library, it doesn't use any unexported function from the thrift library).

First you just need to implement a RecordingTProtocol (name TBD, it can also be InterceptorTProtocol if you like):

type RecordingTProtocol struct {
  Recording, Actual TProtocol
}

func (rp RecordingTProtocol) ReadMessageBegin() (name string, typeId TMessageType, seqid int32, err error) {
  name, typeId, seqid, err = rp.Actual.ReadMessageBegin()
  if err != nil {
    err = rp.Recording.WriteMessageBegin(name, typeId, seqid)
  }
}

// TODO: implement other TProtocol read functions

func (rp RecordingTProtocol) WriteMessageBegin(name string, typeId TMessageType, seqid int32) (err error) {
  rp.Recording.WriteMessageBegin(name, typeId, seqid)
  err = rp.Actual.WriteMessageBegin(name, typeId, seqid)
}

// TODO: implement other TProtocol write functions

Then you can just use it in a middleware implementation and combine it with TMemoryBuffer as the underlying TTransport for the recording one. For example for json logging:

var pool = sync.Pool{
  New: func() interface{} {
    return thrift.NewTMemoryBuffer()
  },
}

func loggingMiddleware(name string, next thrift.TProcessorFunction) thrift.TProcessorFunction {
  return thrift.WrappedTProcessorFunction{
    Wrapped: func(ctx context.Context, seqID int32, in, out thrift.TProtocol) (_ bool, err thrift.TException) {
      inBuf := pool.Get().(*thrift.TMemoryBuffer)
      inBuf.Reset()
      iproto := thrift.NewTSimpleJSONProtocol(inBuf)
      // Since in thrift compiled Processor functions, we call ReadMessageBegin
      // before calling the actual processor, we will have the ReadMessageEnd at
      // the end without the pairing ReadMessageBegin call, and that will cause
      // TSimpleJSONProtocol to panic.
      iproto.WriteMessageBegin(ctx, "" /* name */, thrift.CALL, seqID)
      in = TRecordingProtocol{
        Actual:    in,
        Recording: iproto,
      }

      outBuf := pool.Get().(*thrift.TMemoryBuffer)
      outBuf.Reset()
      oproto := thrift.NewTSimpleJSONProtocol(outBuf)
      out = TRecordingProtocol{
        Actual:    out,
        Recording: oproto,
      }

      defer func() {
        iproto.Flush(ctx)
        logger("method", name, "request", inBuf.String(), "response", outBuf.String(), "err", err)
      }()

      return next.Process(ctx, seqID, in, out)
    },
  }
}

And I can also use it to record the size on the wire (also I think this is inefficient and it's better done in envoy/etc.):

func sizeRecordingMiddleware(name string, next thrift.TProcessorFunction) thrift.TProcessorFunction {
  return thrift.WrappedTProcessorFunction{
    Wrapped: func(ctx context.Context, seqID int32, in, out thrift.TProtocol) (bool, thrift.TException) {
      inBuf := pool.Get().(*thrift.TMemoryBuffer)
      inBuf.Reset()
      outBuf := pool.Get().(*thrift.TMemoryBuffer)
      outBuf.Reset()

      defer func() {
        requestSizeHistogram.Observe(inBuf.Len())
        responseSizeHistogram.Observe(outBuf.Len())
        pool.Put(inBuf)
        pool.Put(outBuf)
      }()

      iproto := factory.GetProtocol(inBuf)
      in = TRecordingProtocol{
        Actual:    in,
        Recording: iproto,
      }
      oproto := factory.GetProtocol(outBuf)
      out = TRecordingProtocol{
        Actual:    out,
        Recording: oproto,
      }

      return next.Process(ctx, seqID, in, out)
    },
  }
}

If people find it useful I can totally just implement this RecordingTProtocol in thrift library.

@voodoo-dn
Copy link
Author

Will it work with TBinaryProtocol?

@fishy
Copy link
Member

fishy commented Oct 9, 2020

It works for any TProtocol.

@voodoo-dn
Copy link
Author

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

@fishy
Copy link
Member

fishy commented Oct 9, 2020

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

No. For human readable data you should use TSimpleJSONProtocol or TJSONProtocol.

@voodoo-dn
Copy link
Author

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

No. For human readable data you should use TSimpleJSONProtocol or TJSONProtocol.

Are you think that is ok approach?)

@fishy
Copy link
Member

fishy commented Oct 9, 2020

It works for any TProtocol.

I meant: can I receive human readable data from TBinaryProtocol and log it?

No. For human readable data you should use TSimpleJSONProtocol or TJSONProtocol.

Are you think that is ok approach?)

What do you mean? In my personal opinion json logging all requests and responses are very inefficient and not ok (I tried that loggingMiddleware in our staging, and because our requests and responses are so large our log ingester cannot handle those super large log entries). But it's as ok as your implementation of json marshal all requests and responses with interceptor.

@voodoo-dn
Copy link
Author

I meant that you have chosen approach, where information logging allowed only when you use TJsonProtocol, instead of interceptor, that can work with any protocol.

But I understand your position about code that grows in generator.

I think that I haven't arguments and I think you can close MR. Thanks for discussion. Good luck and be healthy!

fishy added a commit to fishy/thrift that referenced this pull request Oct 14, 2020
This also comes from the discussion of
apache#1992 (comment).
I think TDebugProtocol is a better fit for this feature than creating a
new TProtocol implementation.

The Dump field is not added to TDebugProtocolFactory because I don't
think it makes sense from the factory setup. In vast majority cases
users would need direct access to the underlying TMemoryBuffer to make
it useful, which is easier this way than an additional TTransportFactory
plus TProtocolFactory to make TDebugProtocolFactory way too complex.
fishy added a commit to fishy/thrift that referenced this pull request Oct 19, 2020
This also comes from the discussion of
apache#1992 (comment).
I think TDebugProtocol is a better fit for this feature than creating a
new TProtocol implementation.

The DuplicateTo field is not added to TDebugProtocolFactory because I
don't think it makes sense from the factory setup. In vast majority
cases users would need direct access to the underlying TMemoryBuffer to
make it useful, which is easier this way than an additional
TTransportFactory plus TProtocolFactory to make TDebugProtocolFactory
way too complicated.
fishy added a commit that referenced this pull request Oct 19, 2020
This also comes from the discussion of
#1992 (comment).
I think TDebugProtocol is a better fit for this feature than creating a
new TProtocol implementation.

The DuplicateTo field is not added to TDebugProtocolFactory because I
don't think it makes sense from the factory setup. In vast majority
cases users would need direct access to the underlying TMemoryBuffer to
make it useful, which is easier this way than an additional
TTransportFactory plus TProtocolFactory to make TDebugProtocolFactory
way too complicated.
@stale
Copy link

stale bot commented Dec 13, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 13, 2020
@stale
Copy link

stale bot commented Dec 20, 2020

This issue has been automatically closed due to inactivity. Thank you for your contributions.

@stale stale bot closed this Dec 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants