Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the processAcknowledgement may process CompletionMessage first for blocking incovation even the message is sent ahead of ResultMessage #4134

Closed
jiangpengcheng opened this issue Nov 26, 2018 · 7 comments
Assignees

Comments

@jiangpengcheng
Copy link
Contributor

Environment details:

  • local deployment
  • latest master c045b42

Steps to reproduce the issue:

  1. create a simple action
function main(){
  return {"payload": "hello world"};
}
  1. invoke actions
for i in `seq 1 10000`; do wsk -i action invoke hello -r; sleep 1s; done

Provide the expected results and outputs:

I think the expected result is not to fetch activation from database if ResultMessage contains that data:

[2018-11-26T09:31:51.910Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] POST /api/v1/namespaces/whisk.system/actions/hello blocking=true&result=true
[2018-11-26T09:31:51.911Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [BasicAuthenticationDirective] authenticate: 789c46b1-71f6-4ed5-8c54-816aa4f8c502
[2018-11-26T09:31:51.912Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [Identity] [GET] serving from cache: CacheKey(789c46b1-71f6-4ed5-8c54-816aa4f8c502) [marker:database_cacheHit_count:2]
[2018-11-26T09:31:51.915Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [WhiskActionMetaData] [GET] serving from cache: CacheKey(whisk.system/hello) [marker:database_cacheHit_count:6]
[2018-11-26T09:31:51.915Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ActionsApi]  [marker:controller_blockingActivation_start:6]
[2018-11-26T09:31:51.915Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ActionsApi] action activation id: d208f976626f429388f976626fb2937a [marker:controller_loadbalancer_start:6]
[2018-11-26T09:31:51.916Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ShardingContainerPoolBalancer] posting to 'invoker0/0' with activation id 'd208f976626f429388f976626fb2937a' [marker:controller_kafka_start:7]
[2018-11-26T09:31:51.918Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ShardingContainerPoolBalancer] posted to invoker0[0][22] [marker:controller_kafka_finish:9:2]
[2018-11-26T09:31:51.918Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ActionsApi]  [marker:controller_loadbalancer_finish:9:3]
[2018-11-26T09:31:51.946Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ShardingContainerPoolBalancer] received result ack for 'd208f976626f429388f976626fb2937a'
[2018-11-26T09:31:51.948Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ActionsApi]  [marker:controller_blockingActivation_finish:37:31]
[2018-11-26T09:31:51.948Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [BasicHttpService] [marker:http_post.200_count:39:39]
[2018-11-26T09:31:51.949Z] [INFO] [#tid_f8c1c812c5052456ad7b026025b222da] [ShardingContainerPoolBalancer] received completion ack for 'd208f976626f429388f976626fb2937a'

Provide the actual results and outputs:

but actually it's not

[2018-11-26T09:31:49.772Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] POST /api/v1/namespaces/whisk.system/actions/hello blocking=true&result=true
[2018-11-26T09:31:49.773Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [BasicAuthenticationDirective] authenticate: 789c46b1-71f6-4ed5-8c54-816aa4f8c502
[2018-11-26T09:31:49.774Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [Identity] [GET] serving from cache: CacheKey(789c46b1-71f6-4ed5-8c54-816aa4f8c502) [marker:database_cacheHit_count:2]
[2018-11-26T09:31:49.776Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [WhiskActionMetaData] [GET] serving from cache: CacheKey(whisk.system/hello) [marker:database_cacheHit_count:4]
[2018-11-26T09:31:49.777Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ActionsApi]  [marker:controller_blockingActivation_start:5]
[2018-11-26T09:31:49.777Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ActionsApi] action activation id: 83662d410643423fa62d410643323f7e [marker:controller_loadbalancer_start:5]
[2018-11-26T09:31:49.777Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ShardingContainerPoolBalancer] posting to 'invoker0/0' with activation id '83662d410643423fa62d410643323f7e' [marker:controller_kafka_start:5]
[2018-11-26T09:31:49.785Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ShardingContainerPoolBalancer] posted to invoker0[0][21] [marker:controller_kafka_finish:12:7]
[2018-11-26T09:31:49.785Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ActionsApi]  [marker:controller_loadbalancer_finish:12:7]
[2018-11-26T09:31:49.814Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ShardingContainerPoolBalancer] received completion ack for '83662d410643423fa62d410643323f7e'
[2018-11-26T09:31:49.815Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ShardingContainerPoolBalancer] received result ack for '83662d410643423fa62d410643323f7e'
[2018-11-26T09:31:50.832Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [CouchDbRestStore] [GET] 'whisk_local_activations' finding document: 'id: whisk.system/83662d410643423fa62d410643323f7e' [marker:database_getDocument_start:1060]
[2018-11-26T09:31:50.842Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [CouchDbRestStore]  [marker:database_getDocument_finish:1070:10]
[2018-11-26T09:31:50.843Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ActionsApi] retrieved activation for blocking invocation via DB polling [marker:controller_blockingActivationDatabaseRetrieval_count:1071]
[2018-11-26T09:31:50.843Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [ActionsApi]  [marker:controller_blockingActivation_finish:1071:1066]
[2018-11-26T09:31:50.844Z] [INFO] [#tid_057f382dc49a732e93c8dfcfbb8429f6] [BasicHttpService] [marker:http_post.200_count:1072:1072]

Additional information you deem important:

@jiangpengcheng jiangpengcheng changed the title the processAcknowledgement may process CompletionMessage for blocking incovation even the message is sent ahead of ResultMessage the processAcknowledgement may process CompletionMessage first for blocking incovation even the message is sent ahead of ResultMessage Nov 26, 2018
@rabbah
Copy link
Member

rabbah commented Nov 26, 2018

The DB poll is periodic and there is a race between getting the results from the poller vs active ack. Until the poll is cancelled (which is scheduled by the active ack), it remains active. Further, the future which continues with its callback could be executed in any order. Because of how close the log statements appear in time, I don't think the effect is necessarily what you might be hinting at: that the result message is ignored for completing the blocking request --- which you can assert with a test otherwise.

I saw you opened a PR so will take a look at that as well.

@jiangpengcheng
Copy link
Contributor Author

jiangpengcheng commented Nov 27, 2018

add logging.warn in PrimitiveActions.scala:

    // 1. Wait for the active-ack to happen. Either immediately resolve the promise or poll the database quickly
    //    in case of an incomplete active-ack (record too large for example).
    activeAckResponse.foreach {
      case Right(activation)     => result.trySuccess(Right(activation))
      case Left(aid) if volatile => result.trySuccess(Left(aid))
      case _ =>
        logging.warn(this, s"Received a left for activation: $docid") // added log
        pollActivation(docid, context, result, i => 1.seconds + (2.seconds * i), maxRetries = 4)
    }

and found the warn log:

[2018-11-27T10:04:02.054Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] GET /api/v1/web/whisk.system/default/hello.json __ow_volatile=true
[2018-11-27T10:04:02.057Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [Identity] [GET] serving from cache: CacheKey(whisk.system) [marker:database_cacheHit_count:2]
[2018-11-27T10:04:02.057Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [WhiskActionMetaData] [GET] serving from cache: CacheKey(whisk.system/hello) [marker:database_cacheHit_count:3]
[2018-11-27T10:04:02.058Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [WebActionsApi]  [marker:controller_blockingActivation_start:4]
[2018-11-27T10:04:02.059Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [WebActionsApi] action activation id: e098c63507c14b5498c63507c15b548f [marker:controller_loadbalancer_start:4]
[2018-11-27T10:04:02.059Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [ShardingContainerPoolBalancer] posting to 'invoker0/0' with activation id 'e098c63507c14b5498c63507c15b548f' [marker
:controller_kafka_start:5]
[2018-11-27T10:04:02.061Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [ShardingContainerPoolBalancer] posted to invoker0[0][3968] [marker:controller_kafka_finish:7:2]
[2018-11-27T10:04:02.061Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [WebActionsApi]  [marker:controller_loadbalancer_finish:7:3]
[2018-11-27T10:04:02.097Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [ShardingContainerPoolBalancer] received completion ack for 'e098c63507c14b5498c63507c15b548f'
[2018-11-27T10:04:02.097Z] [WARN] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [WebActionsApi] Received a left for activation: whisk.system/e098c63507c14b5498c63507c15b548f
[2018-11-27T10:04:02.098Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [ShardingContainerPoolBalancer] received result ack for 'e098c63507c14b5498c63507c15b548f'
[2018-11-27T10:04:03.113Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [CouchDbRestStore] [GET] 'whisk_local_activations' finding document: 'id: whisk.system/e098c63507c14b5498c63507c15b54
8f' [marker:database_getDocument_start:1058]
[2018-11-27T10:04:03.123Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [CouchDbRestStore]  [marker:database_getDocument_finish:1069:10]
[2018-11-27T10:04:03.124Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [WebActionsApi] retrieved activation for blocking invocation via DB polling [marker:controller_blockingActivationData
baseRetrieval_count:1070]
[2018-11-27T10:04:03.124Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [WebActionsApi]  [marker:controller_blockingActivation_finish:1070:1066]
[2018-11-27T10:04:03.125Z] [INFO] [#tid_fc7c0e23fc514d8d98c69fedb4cba8e2] [BasicHttpService] [marker:http_get.200_count:1071:1071]

the reason why I found this is that we have a custom feature which can skip saving activations into DB when some request parameters are set, then some of these requests will hang until timeout and return a 202 response

@tysonnorris
Copy link
Contributor

I ran into similar, and think it is due to:

Future[Unit] = {
  AcknowledegmentMessage.parse(raw) match {
   ...<message handling in any order>...
  }
}

In my case, I'm trying to ensure the activation db lookup is not done at controller, and disabling that db lookup locally makes the "first" activation fail - seems like the initial message parse is slower than subsequent ones.
Moving the Future below AcknowledegmentMessage.parse(raw) fixes it I think. I will see if #4135 fixes it as well.

@rabbah
Copy link
Member

rabbah commented Nov 29, 2018

@tysonnorris also please take a look at #4145 (I think 4135 had a bug wrt that can affect blocking requests).

@tysonnorris
Copy link
Contributor

Will take a look - I'm not sure these are related to the root symptom of result message sent before completion message, but completion message is processed first? Will test out both of these.

@tysonnorris
Copy link
Contributor

In #4145 it looks like the result is used properly for blocking activations now; however, there is still suspicious/inappropriate order of kafka message handling, at least on the first activation in the controller; I'll create a separate PR for that, since in general I think we expect kafka messages processing to at least be triggered in order (not necessarily completed in order).

@rabbah
Copy link
Member

rabbah commented Nov 29, 2018

Can you open an issue first to document the symptom - the info is scattered in a couple of prs I think. 🙏🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants