Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure, that Result-ack is sent before Completion-ack. #4115

Merged
merged 2 commits into from
Nov 23, 2018

Conversation

cbickel
Copy link
Contributor

@cbickel cbickel commented Nov 15, 2018

Description

On invoking an action blocking, a ResultMessage and afterwards a CompletionMessage will be sent from the invoker to the controller. The CompletionMessage is always very small. Currently those two messages are sent asynchronously. If log-collection is very fast, they are sent to Kafka at the same point in time.
This could result into the condition, that the loadbalancer receives the completion-ack before the result ack. This will make the response to the customer slower, as the controller first needs to go against Kafka.

This PR will ensure, that the completion message will be sent after the reult message has been published successfully to Kafka.

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@cbickel cbickel requested a review from dubee November 15, 2018 16:19
@cbickel cbickel added the review Review for this PR has been requested and yet needs to be done. label Nov 15, 2018
// should have arrived earlier and already resolved it. If it didn't arrive yet (what could happen if log
// collection is really fast and the result is very large), it will most likely arrive in a few milliseconds.
// If that message really got lost, the mechanism of DB-polling will take place.
if (!entry.blocking) entry.promise.trySuccess(Left(aid))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a problem with the line activations.remove(aid) as well. Since the ID is removed from the activations TrieMap, the processResult method may not be able to find the ID when activations.get(aid).map is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. That's a very good point.

if (job.msg.blocking) {
activation.foreach(
val sendResult = if (job.msg.blocking) {
activation.map(
sendActiveAck(tid, _, job.msg.blocking, job.msg.rootControllerIndex, job.msg.user.namespace.uuid, false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shoudl reconcile any errors here into a Future.successful or use onComplete below. Otherwise you'll end up not sending the second message if the first failed. Unlikely, but just a teeny bit more defensive. I'd go for onComplete below, because that makes it more clear on the calling side, that you don't care about errors.

FWIW I like this solution a lot better than the one before.

@codecov-io
Copy link

codecov-io commented Nov 20, 2018

Codecov Report

Merging #4115 into master will decrease coverage by 4.94%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4115      +/-   ##
==========================================
- Coverage   86.25%   81.31%   -4.95%     
==========================================
  Files         151      151              
  Lines        7263     7269       +6     
  Branches      468      470       +2     
==========================================
- Hits         6265     5911     -354     
- Misses        998     1358     +360
Impacted Files Coverage Δ
...pache/openwhisk/core/invoker/InvokerReactive.scala 83.17% <ø> (+0.93%) ⬆️
.../openwhisk/core/containerpool/ContainerProxy.scala 93.42% <100%> (+0.17%) ⬆️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.54%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-83.34%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-62.5%) ⬇️
...in/scala/org/apache/openwhisk/common/Counter.scala 40% <0%> (-20%) ⬇️
.../scala/org/apache/openwhisk/utils/Exceptions.scala 20% <0%> (-20%) ⬇️
...whisk/connector/kafka/KafkaProducerConnector.scala 72.5% <0%> (-15%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e97b6ce...c8b5215. Read the comment docs.

@@ -530,8 +530,16 @@ class ContainerProxy(
activationWithLogs
.map(_.fold(_.activation, identity))
.foreach { activation =>
// Sending the completionMessage to the controller asynchronously.
sendActiveAck(tid, activation, job.msg.blocking, job.msg.rootControllerIndex, job.msg.user.namespace.uuid, true)
// Sending the completionMessage to the controller asynchronously. But not before result message is sent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe im confused by the view on github - is the active ack sent twice for blocking invokes?

Copy link
Contributor

@jiangpengcheng jiangpengcheng Nov 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, one is ComplectionMessage and the other is ResultMessage

@cbickel cbickel changed the title Don't react on Completion message of blocking requests. Ensure, that Result-ack arrives before Completion-ack in Loadbalancer. Nov 21, 2018
@cbickel cbickel changed the title Ensure, that Result-ack arrives before Completion-ack in Loadbalancer. Ensure, that Result-ack is sent before Completion-ack. Nov 21, 2018
@cbickel
Copy link
Contributor Author

cbickel commented Nov 22, 2018

PG3#3673 is 🔵

Copy link
Member

@rabbah rabbah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbickel I added a commit which adds some comments (that helped me understand the changes). What do you think?

Also you'll notice I added a type alias so the active ack method is easier to find. I'd suggest we do that for some of the other function closures.

…ion messages.

Adds a type alias for the active ack messages, and document the interface.
@cbickel
Copy link
Contributor Author

cbickel commented Nov 23, 2018

@rabbah Thanks a lot for your proposed changes :)

Using these type aliases in more places makes totally sense to me.

@rabbah rabbah merged commit b803c64 into apache:master Nov 23, 2018
@cbickel cbickel deleted the ack branch November 23, 2018 22:17
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
Improves comments to clarify the ordering of result and completion messages.
Adds a type alias for the active ack messages, and document the interface.

Co-authored-by: Christian Bickel <cbickel@de.ibm.com>
Co-authored-by: Rodric Rabbah <rodric@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
review Review for this PR has been requested and yet needs to be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants