Timeout while sending batch #100

Closed
e96wic opened this issue Apr 1, 2019 · 6 comments · Fixed by #105
@e96wic
Contributor

e96wic commented Apr 1, 2019

We have a microservice that sends one message to an Event Hub every second. With a timeout context of 5 seconds, we see timeouts (context.DeadlineExceeded) several times per day. Sometimes creating a new hub object helps; at other times the subsequent send also fails with the same timeout error. Another microservice of ours, written in Java with similar logic, does not show this behaviour.

So far we have seen these errors:

  • amqp: connection closed
  • amqp: link closed

Are there any options we can try? How should we handle reconnecting?

Environment

  • OS: Linux
  • Go version: 1.11
  • Version of Library: 1.1.3
@devigned
Member

devigned commented Apr 4, 2019

The sender should handle reestablishing a connection and link upon failure.

Have you noticed any network partitions between your service and Event Hubs?

If possible, it would be great to run the client with tracing on. The library is instrumented using OpenCensus, so you should be able to visualize a trace and see whether there are any errors, similar to what was done in #81.

Also, you can run the service with env DEBUG_LEVEL=3 and -tags debug, which will log AMQP information to STDOUT.

With that information, we should be able to quickly determine the root cause.

@devigned devigned self-assigned this Apr 4, 2019
@devigned devigned added the bug Something isn't working label Apr 4, 2019
@yngveh

yngveh commented Apr 11, 2019

We are also seeing the errors "amqp: link closed" and "amqp: connection closed".

After some digging into the code, it seems that recovery does not happen in these cases. In the trySend method in sender.go, the errors above hit the default case and are not recovered, because they are created with ErrLinkClosed = errors.New("amqp: link closed") and ErrConnClosed = errors.New("amqp: connection closed") (from vcabbage/amqp) and are therefore plain "string errors" rather than one of the types listed in the switch:

switch err.(type) {
case *amqp.Error, *amqp.DetachError, net.Error:
	if netErr, ok := err.(net.Error); ok {
		if !netErr.Temporary() {
			return netErr
		}
	}

	duration := s.recoveryBackoff.Duration()
	// Note: string() on an integer yields a single Unicode code point, not decimal digits.
	log.For(ctx).Debug("amqp error, delaying " + string(duration/time.Millisecond) + " millis: " + err.Error())
	time.Sleep(duration)
	err = s.Recover(ctx)
	if err != nil {
		log.For(ctx).Debug("failed to recover connection")
	} else {
		log.For(ctx).Debug("recovered connection")
		s.recoveryBackoff.Reset()
	}
default:
	// amqp.ErrLinkClosed and amqp.ErrConnClosed end up here and are returned without recovery.
	return err
}
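
To illustrate the point about "string errors", here is a small self-contained sketch showing that a value created with errors.New matches none of the cases in a type switch like the one above, while an explicit comparison against the sentinel values catches it. The names `errLinkClosed`, `errConnClosed`, `protocolError`, `byTypeOnly`, and `withSentinels` are local stand-ins invented for this example; this is not the library's code and not necessarily how the issue was eventually fixed.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Local stand-ins for the sentinel errors quoted above; the real ones live in the amqp package.
var (
	errLinkClosed = errors.New("amqp: link closed")
	errConnClosed = errors.New("amqp: connection closed")
)

// protocolError is a hypothetical struct error playing the role of *amqp.Error.
type protocolError struct{ msg string }

func (e *protocolError) Error() string { return e.msg }

// byTypeOnly mirrors the shape of the quoted switch: only the listed types count as recoverable.
func byTypeOnly(err error) bool {
	switch err.(type) {
	case *protocolError, net.Error:
		return true
	default:
		return false
	}
}

// withSentinels additionally compares against the sentinel values before the type switch.
func withSentinels(err error) bool {
	if err == errLinkClosed || err == errConnClosed {
		return true
	}
	return byTypeOnly(err)
}

func main() {
	for _, err := range []error{&protocolError{"detached"}, errLinkClosed, errConnClosed} {
		fmt.Printf("%-28q typeOnly=%v withSentinels=%v\n", err.Error(), byTypeOnly(err), withSentinels(err))
	}
}
```

The sentinel values have an unexported concrete type from the errors package, so a type switch in another package cannot name it; comparing by value is the only way to recognize them.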

@devigned
Member

I think this is related: Azure/azure-service-bus-go#116

I'll get a fix in by the end of the week.

@BassOfLion

Hi @devigned, any updates on the fix?

@devigned
Member

Sorry for the delay. Should get this pulled in shortly.

@devigned
Member

Please grab v1.1.5. If the issue persists, please reopen.
