Productionize: GetShardRegionStats returns empty shard set on ask timeout #27100

migesok · 2019-06-06T10:22:42Z

Akka version: 2.5.22

When some of the shards are too busy, GetShardRegionStats starts to return an empty shard set even though I can see that some shards are handling requests. The cluster/shards endpoint of the Akka Management HTTP API shows the same behaviour, since it uses GetShardRegionStats.
The cause is visible in the implementation:

  def replyToRegionStatsQuery(ref: ActorRef): Unit = {
    askAllShards[Shard.ShardStats](Shard.GetShardStats)
      .map { shardStats =>
        ShardRegionStats(shardStats.map {
          case (shardId, stats) => (shardId, stats.entityCount)
        }.toMap)
      }
      .recover {
        case x: AskTimeoutException => ShardRegionStats(Map.empty)
      }
      .pipeTo(ref)
  }

The ask timeout is also hardcoded, with no way to override it:

  def askAllShards[T: ClassTag](msg: Any): Future[Seq[(ShardId, T)]] = {
    implicit val timeout: Timeout = 3.seconds
    Future.sequence(shards.toSeq.map {
      case (shardId, ref) => (ref ? msg).mapTo[T].map(t => (shardId, t))
    })
  }

Judging by the code, the same applies to other sharding state inspection messages such as GetShardRegionState.
I haven't found this behaviour documented anywhere; it took me some time to figure out that when the management API says I have 0 shards allocated, that is not always true.
It would be great to at least document it, but better to change it so that the inability to collect data for some shards is reported to the user of the sharding inspection API.
It would also be great if the ask timeout were configurable.
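
One way to report shards that did not answer, rather than collapsing the whole reply to an empty map, could look roughly like the sketch below. It uses only the Scala standard library so that it runs without Akka. `RegionStats`, its `failed` field, and `collect` are hypothetical names chosen for illustration, not the Akka API, and plain `Future[Int]` values stand in for the per-shard ask results.

```scala
import scala.concurrent.{ExecutionContext, Future, TimeoutException}
import scala.util.{Failure, Success, Try}

object PartialShardStats {
  type ShardId = String

  // Hypothetical reply type (an assumption, not the Akka class):
  // entity counts for shards that answered, plus the ids of those that timed out.
  final case class RegionStats(stats: Map[ShardId, Int], failed: Set[ShardId])

  // Wrap each per-shard result in a Try so that one timeout
  // cannot fail (or empty out) the whole batch.
  def collect(queries: Map[ShardId, Future[Int]])(
      implicit ec: ExecutionContext): Future[RegionStats] = {
    val settled: Seq[Future[(ShardId, Try[Int])]] = queries.toSeq.map {
      case (shardId, f) =>
        f.map(count => (shardId, Success(count): Try[Int]))
          .recover { case e: TimeoutException => (shardId, Failure(e): Try[Int]) }
    }
    Future.sequence(settled).map { results =>
      val answered = results.collect { case (id, Success(count)) => id -> count }.toMap
      val failed = results.collect { case (id, Failure(_)) => id }.toSet
      RegionStats(answered, failed)
    }
  }
}
```

With a reply shaped like this, a caller can tell "no shards allocated" (both collections empty) apart from "some shards were too busy to answer" (a non-empty `failed` set).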

johanandren · 2019-06-16T13:59:26Z

Both not producing an answer on failure and making the timeout configurable sound good to me. I think the reason it is hardcoded is that it initially wasn't meant as a message protocol for production but for tests, and then grew to be used in production scenarios.

Would you be up for doing a PR with the two changes?

helena · 2019-06-28T10:15:35Z

I'll take this. My understanding of what we want to do is:

1. Handle the case where GetShardRegionStats is issued while some of the shards are too busy: fix the current fallback of case x: AskTimeoutException => ShardRegionStats(Map.empty), because those shards are not empty, just busy handling requests.
2. Investigate whether a similar modification to the GetShardRegionState response is needed for the same reason.
3. Make the timeout configurable for askAllShards.
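
Making the timeout configurable could be sketched as follows, with the duration passed in by the caller instead of being fixed at 3 seconds. This is a standard-library stand-in rather than the Akka implementation: `ConfigurableQuery`, `withTimeout`, and `askAll` are hypothetical names, and each shard is modelled as a function returning a `Future` in place of an actor ask.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise, TimeoutException}
import scala.concurrent.duration._

// Illustrative stand-in for askAllShards; none of these names are the Akka API.
object ConfigurableQuery {
  type ShardId = String

  // Daemon thread so the scheduler never keeps the JVM alive.
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "query-timeout")
      t.setDaemon(true)
      t
    }
  })

  // Complete with the result of `f`, or fail with a TimeoutException after `timeout`.
  def withTimeout[T](f: Future[T], timeout: FiniteDuration)(
      implicit ec: ExecutionContext): Future[T] = {
    val p = Promise[T]()
    val task = scheduler.schedule(new Runnable {
      def run(): Unit = {
        p.tryFailure(new TimeoutException(s"no reply within $timeout"))
        ()
      }
    }, timeout.toMillis, TimeUnit.MILLISECONDS)
    f.onComplete { result =>
      task.cancel(false) // the reply arrived first, so drop the pending timeout
      p.tryComplete(result)
    }
    p.future
  }

  // The timeout is a parameter, so callers (or settings) decide how long to wait.
  def askAll[T](shards: Map[ShardId, () => Future[T]], timeout: FiniteDuration)(
      implicit ec: ExecutionContext): Future[Seq[(ShardId, T)]] =
    Future.sequence(shards.toSeq.map {
      case (shardId, ask) => withTimeout(ask(), timeout).map(t => (shardId, t))
    })
}
```

In a real deployment the duration would more plausibly be read once from configuration and passed down, so operators can raise it for clusters whose shards are regularly busy.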

helena · 2019-07-01T16:47:30Z

Regarding returning either all stats or an empty Map, the scaladoc says:

If the timeout is reached without answers from all shard regions the reply will contain an empty map of regions.

So the empty map is used as a flag for whether the sharded cluster is "ready", yet that is a problem when querying an already running sharded cluster, where some shards are busy and cannot respond within the overall timeout.

I'll update this as well.

helena · 2019-07-01T21:15:01Z

Once this is merged I will push a PR against Akka Management to make the routeGetShardInfo route use the new configurable timeout.

johanandren added the 1 - triaged, help wanted, and t:cluster-tools labels Jun 16, 2019

helena added the 3 - in progress label and removed the help wanted label Jun 28, 2019

helena self-assigned this Jun 28, 2019

helena changed the title ~~GetShardRegionStats returns empty shard set on ask timeout~~ Productionize: GetShardRegionStats returns empty shard set on ask timeout Jun 28, 2019

This was referenced Jul 4, 2019

Cluster Sharding: configurable shard query timeout for askAllShards #27273

Closed

Cluster Sharding: make askAllShards not a hard coded timeout #27273 #27274

Merged

Cluster Sharding: separate cluster sharding readiness from stats query #27299

Closed

helena added t:cluster-sharding and removed t:cluster-tools labels Jul 8, 2019

patriknw added t:cluster-sharding and removed t:cluster-sharding labels Jul 9, 2019

helena mentioned this issue Jul 11, 2019

Cluster Sharding: add the new shard and region meta for opt-in logging or other #27328

Closed

helena pushed a commit to helena/akka that referenced this issue Jul 15, 2019

Cluster Sharding: separate cluster sharding readiness from stats query …

7b3542c

…akka#27299 - precursor to work coming in: Productionize: GetShardRegionStats returns empty shard set on ask timeout akka#27100

helena mentioned this issue Jul 15, 2019

Prepare to separate cluster sharding readiness from shards queries #27360

Closed

helena pushed a commit to helena/akka that referenced this issue Jul 15, 2019

Cluster Sharding: separate cluster sharding readiness from stats query …

b2637a4

…akka#27299 - precursor to work coming in: Productionize: GetShardRegionStats returns empty shard set on ask timeout akka#27100

helena pushed a commit to helena/akka that referenced this issue Jul 16, 2019

Removed protocol changes here, will only add a new field in the parent …

159ff51

…akka#27100. Includes all PR review suggestions.

helena pushed a commit to helena/akka that referenced this issue Jul 17, 2019

Removed protocol changes here, will only add a new field in the parent …

545f807

…akka#27100. Includes all PR review suggestions.

helena pushed a commit to helena/akka that referenced this issue Jul 17, 2019

Removed protocol changes here, will only add a new field in the parent …

7e61e91

…akka#27100. Includes all PR review suggestions.

helena mentioned this issue Jul 22, 2019

Productionize: GetShardRegionStats returns empty shard set on ask timeout #27395

Merged

helena pushed a commit to helena/akka that referenced this issue Jul 24, 2019

Productionize: GetShardRegionStats returns empty shard set on ask tim…

6c24926

…eout akka#27100

helena pushed a commit to helena/akka that referenced this issue Jul 24, 2019

Productionize: GetShardRegionStats returns empty shard set on ask tim…

7c8d317

…eout akka#27100

helena pushed a commit to helena/akka that referenced this issue Jul 24, 2019

Productionize: GetShardRegionStats returns empty shard set on ask tim…

5e8d261

…eout akka#27100

helena added this to the 2.6.0-M5 milestone Jul 25, 2019

helena closed this as completed Jul 25, 2019
