WIP - DO NOT MERGE - Avoid a race condition when creating child actors #1320

Closed
wants to merge 8 commits into from


@huntc huntc commented Nov 11, 2018

This PR requires #1315 to be merged and then re-basing. For now, review just the last commit.

We observed an `InvalidActorNameException` when creating child actors under load. This could have been due to the following sequence when receiving a “Publish Received Locally” (PRL) message:

  1. A PRL is received and a producer actor is created.
  2. The producer actor terminates and sends a termination message.
  3. A second PRL is received before the termination message is received, so the parent actor creates another producer.
  4. The termination message is received, and the parent then attempts to create another producer with the same name.

The solution is to explicitly track active consumers and producers rather than rely on another data structure such as `context.children`, which is updated in response to other events.
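
To illustrate the idea, here is a minimal, Akka-free sketch of a parent that keeps its own set of active producers and only changes it when it processes a message. Every name in it (`ProducerTracking`, `PublishReceivedLocally`, `ProducerTerminated`, `State`, `spawn`, the producer name) is hypothetical and not taken from this PR, and the choice to defer a publish until the termination message arrives is an assumption about how the tracking could be used, not a description of the actual change.

```scala
// Hypothetical, Akka-free model of the parent's bookkeeping.
// None of these names come from the PR itself.
object ProducerTracking {
  sealed trait Event
  // A publish was received locally for the given (hypothetical) producer name.
  final case class PublishReceivedLocally(name: String) extends Event
  // The termination message a producer sends when it is done.
  final case class ProducerTerminated(name: String) extends Event

  final case class State(
      active: Set[String] = Set.empty,        // producers the parent tracks as alive
      pending: Vector[String] = Vector.empty, // publishes waiting for a name to free up
      spawned: Vector[String] = Vector.empty  // log of spawn calls, for inspection
  )

  // Stand-in for creating a named child: rejects a name that is still tracked.
  private def spawn(state: State, name: String): State = {
    require(!state.active(name), s"actor name [$name] is not unique")
    state.copy(active = state.active + name, spawned = state.spawned :+ name)
  }

  def handle(state: State, event: Event): State = event match {
    case PublishReceivedLocally(name) if state.active(name) =>
      // Termination message not processed yet: defer instead of spawning.
      state.copy(pending = state.pending :+ name)
    case PublishReceivedLocally(name) =>
      spawn(state, name)
    case ProducerTerminated(name) =>
      // Only now does the parent consider the name free again.
      val freed = state.copy(active = state.active - name)
      val idx = freed.pending.indexOf(name)
      if (idx < 0) freed
      else spawn(freed.copy(pending = freed.pending.patch(idx, Nil, 1)), name)
  }

  // The sequence from the list above: two publishes, then the late termination message.
  def replay(): State =
    List(
      PublishReceivedLocally("producer-1"),
      PublishReceivedLocally("producer-1"),
      ProducerTerminated("producer-1")
    ).foldLeft(State())(handle)
}
```

Because the tracked set changes only inside `handle`, the second publish is deferred rather than spawned, and the termination message frees the name before it is reused, so `replay()` completes without hitting the uniqueness check.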

Previously, client publications would not withstand a connection being lost while publishing data, whereas server publications would. I’ve now aligned the client publication handling with the server publication handling, as they should be exactly the same.
Session objects can now be told to perform a command directly. The command at this point is just PUBLISH, and the rationale is that the session object is the only thing that can guarantee the QoS requirements of a PUBLISH.

Prior to this work, PUBLISH commands were communicated with a command flow, which is often associated with a network connection. The problem with that approach was that PUBLISH commands could be lost if the network connection was lost, i.e. the command is consumed and sent via TCP, TCP fails, and the command is lost.
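
As a rough illustration of why handing the command to the session helps, the sketch below models a session that records each PUBLISH before attempting to send it and sends unacknowledged ones again after a reconnect. It is an assumption about one way a session could meet the QoS requirement, not the code in this PR, and `SessionSketch`, `Publish`, `Session`, and all method names are hypothetical.

```scala
// Hypothetical, Akka-free model of a session that owns PUBLISH delivery.
object SessionSketch {
  final case class Publish(packetId: Int, topic: String, payload: String)

  final case class Session(
      connected: Boolean = false,
      unacked: Vector[Publish] = Vector.empty, // retained until acknowledged
      sent: Vector[Publish] = Vector.empty     // what went out on the wire
  ) {
    // The command is given to the session directly, so it is recorded
    // before any attempt to write to the connection.
    def publish(p: Publish): Session = {
      val kept = copy(unacked = unacked :+ p)
      if (connected) kept.copy(sent = kept.sent :+ p) else kept
    }

    def connectionLost: Session = copy(connected = false)

    // On reconnect, everything still unacknowledged is sent again.
    def connectionEstablished: Session =
      copy(connected = true, sent = sent ++ unacked)

    def acked(packetId: Int): Session =
      copy(unacked = unacked.filterNot(_.packetId == packetId))
  }
}
```

In this model a PUBLISH leaves `unacked` only when `acked` is called, so losing the connection between sending and acknowledgement does not lose the command.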

huntc commented Nov 13, 2018

Handled by #1327

@huntc huntc closed this Nov 13, 2018
@huntc huntc deleted the publication-race-fix branch November 13, 2018 17:34