
NATS transport infinite reconnect race condition #1311

@truecooler

Description


Hi everyone, I have encountered a race condition in the transport reconnect logic, and I can reproduce it reliably.

Problem details
It looks like the NATS transport occasionally loses its network connection.

Investigation showed a race condition between the consumer restart process and a callback from the native NATS client.

  1. TransportCheckProcessor periodically checks whether the transport is healthy.

  2. IConsumerRegister.Default.ReStart calls Pulse, which cancels the task cancellation token _cts, sets _isHealthy to true, and calls Execute to start a new consumer.

  3. Meanwhile, the already running consumer thread, which listens with that same token, observes the _cts cancellation and calls Dispose, which closes and disposes the native NATS client:

    client.Listening(_pollingDelay, _cts.Token);

  4. When the native NATS client closes, it raises the ConnectedEventHandler event (this looks like a typo; Disconnected is implied), which calls OnLogCallback, which in turn sets _isHealthy back to false:

    OnLogCallback!(logArgs);
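The interleaving in steps 2 to 4 can be modeled as a small deterministic sketch. It is written in Python purely as a language-neutral illustration: ConsumerRegister, NatsClientModel, execute, restart and dispose are hypothetical stand-ins for CAP's C# types, and the losing thread order (the old consumer disposes its client only after Execute has already started the new one) is forced rather than timed.

```python
# Simplified, single-threaded model of the race described in steps 2-4.
# All names are hypothetical stand-ins for CAP's C# code; only event order is modeled.


class NatsClientModel:
    """Stand-in for the native NATS client owned by one consumer run."""

    def __init__(self, on_log):
        self.on_log = on_log
        self.closed = False

    def dispose(self):
        # Closing the native client fires its (dis)connection event,
        # which is forwarded to OnLogCallback (step 4).
        self.closed = True
        self.on_log("disconnected")


class ConsumerRegister:
    def __init__(self):
        self.is_healthy = False
        self.client = None

    def on_log_callback(self, event):
        # Step 4: a connection-error log marks the transport unhealthy.
        self.is_healthy = False

    def execute(self):
        # Starts a new consumer run with a fresh native client.
        self.client = NatsClientModel(self.on_log_callback)

    def restart(self):
        # Step 2: Pulse cancels the old run, marks healthy, starts a new run.
        old_client = self.client
        self.is_healthy = True
        self.execute()
        # Step 3: the old consumer thread reacts to the cancellation late and
        # disposes its own client, whose callback then runs.
        if old_client is not None:
            old_client.dispose()


register = ConsumerRegister()
register.execute()
register.restart()
print(register.is_healthy)     # False: the stale callback overwrote the new state
print(register.client.closed)  # False: the new client is actually still open
```

In this model the new client is still open, yet the register reports the transport as unhealthy, which is the state the next health check reacts to.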

As a result of this race, _isHealthy appears to be false almost all the time, which causes an infinite reconnect loop.
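A hedged sketch of why this turns into a loop, using a further simplified model: a periodic check in the spirit of TransportCheckProcessor restarts the consumer whenever it looks unhealthy, and every restart is followed by the stale disconnect event from the previous run. The model assumes the first check finds the transport unhealthy. The generation counter and the guard_stale_events switch are assumptions added for illustration, one possible way to ignore events from an already-replaced client; they are not CAP's API or its actual fix.

```python
# Hypothetical model of the reconnect loop and of one possible guard against it.
# ConsumerRegister, health_check and the generation counter are illustrative
# assumptions, not CAP's API or its actual fix.


class ConsumerRegister:
    def __init__(self, guard_stale_events):
        self.guard_stale_events = guard_stale_events
        self.is_healthy = False  # assume the first check finds it unhealthy
        self.generation = 0      # incremented for every consumer run

    def on_log_callback(self, generation):
        # With the guard, a disconnect reported by an already-replaced run is ignored.
        if self.guard_stale_events and generation != self.generation:
            return
        self.is_healthy = False

    def restart(self):
        old_generation = self.generation
        self.is_healthy = True
        self.generation += 1  # Execute: start a new consumer run
        # The old run disposes its client late, and its disconnect event fires.
        self.on_log_callback(old_generation)


def health_check(register, checks):
    """Periodic check in the spirit of TransportCheckProcessor; returns restart count."""
    restarts = 0
    for _ in range(checks):
        if not register.is_healthy:
            register.restart()
            restarts += 1
    return restarts


print(health_check(ConsumerRegister(guard_stale_events=False), checks=5))  # 5
print(health_check(ConsumerRegister(guard_stale_events=True), checks=5))   # 1
```

Without the guard, every check triggers another restart; with it, a single restart is enough.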

Reproduction
See my repository for a reproduction: just configure your NATS server and run Worker2:
https://github.com/truecooler/CAPTest/tree/master/Worker2

Expected: the consumer connects without problems and there are no network issues.

Actual: the consumer loses the connection and occasionally prints:
"NATS server connection error."
