Skip to content

Remove caching of CurrentAdjacentCells in Cargo#21514

Merged
Mailaender merged 1 commit into
OpenRA:bleedfrom
RoosterDragon:cargo-uncache
Aug 1, 2024
Merged

Remove caching of CurrentAdjacentCells in Cargo#21514
Mailaender merged 1 commit into
OpenRA:bleedfrom
RoosterDragon:cargo-uncache

Conversation

@RoosterDragon
Copy link
Copy Markdown
Member

@RoosterDragon RoosterDragon commented Aug 1, 2024

In #21507 we stopped caching the values with ToArray to resolve a desync. But even caching the enumerable can lead to a desync, so remove the caching entirely.

Follow up to #21506. Fixes #17863


Let's explain how the code that cached values via ToArray could desync.

Usually, the cell given by self.Location matches with the cell given by self.GetTargetablePositions(). However if the unit is moving and close to the boundary between two cells, it is possible for the targetable position to be an adjacent cell instead.

Combined with the fact hovering over the unit will evaluate CurrentAdjacentCells only for the local player and not everybody, the following sequence becomes possible to induce a desync:

  • As the APC is moving into the last cell before unloading, the local player hovers over it. self.Location is the last cell, but self.GetTargetablePositions() gives the previous cell (as the unit is close to the boundary between the cells)
  • The local player then caches CurrentAdjacentCells. The cache key of self.Location is the final cell, but the values are calculated for self.GetTargetablePositions() of an adjacent cell.
  • When the order to unload is resolved, the cache key of CurrentAdjacentCells is already self.Location and so CurrentAdjacentCells is not updated.
  • The units unload into cells based on the adjacent cell.

Then, for other players in the game:

  • The hover does nothing for these players.
  • When the order is resolved, CurrentAdjacentCells is out of date and is re-evaluated.
  • self.Location and self.GetTargetablePositions() are both the last cell, because the unit has finished moving.
  • So the cache is updated with a key of self.Location and values from the same cell.
  • The units unload into cells based on the current cell.

As the units unload into different cells, a desync occurs. Ultimately the cause here is that cache key is insufficient - self.Location can have the same value but the output can differ. The function isn't a pure function so memoizing the result via ToArray() isn't sound.

Reverting it to cache the enumerable, which is then lazily re-evaluated reduces the scope of possible desyncs but is NOT a full solve. The cached enumerable caches the result of Actor.GetTargetablePositions() which isn't a fully lazy sequence. A different result is returned depending on EnabledTargetablePositions.Any(). Therefore, if the traits were to enable/disable inbetween, then we can still end up with different results. Memoizing the enumerable isn't sound either!

Currently our only trait is HitShape which is enabled based on conditions. A condition that enables/disables it based on movement would be one way to trigger this scenario. Let's say you have a unit where you toggle between two hit shapes when it is moving and when it stops moving. That would allow you to replicate the above scenario once again.

Instead of trying to come up with a sound caching mechanism in the face of a series of complex inputs, we just give up on trying to cache this information at all.

In 05ed9d9 we stopped caching the values with ToArray to resolve a desync. But even caching the enumerable can lead to a desync, so remove the caching entirely.

----

Let's explain how the code that cached values via ToArray could desync.

Usually, the cell given by `self.Location` matches with the cell given by `self.GetTargetablePositions()`. However if the unit is moving and close to the boundary between two cells, it is possible for the targetable position to be an adjacent cell instead.

Combined with the fact hovering over the unit will evaluate `CurrentAdjacentCells` only for the local player and not everybody, the following sequence becomes possible to induce a desync:
- As the APC is moving into the last cell before unloading, the local player hovers over it. `self.Location` is the last cell, but `self.GetTargetablePositions()` gives the *previous* cell (as the unit is close to the boundary between the cells)
- The local player then caches `CurrentAdjacentCells`. The cache key of `self.Location` is the final cell, but the values are calculated for `self.GetTargetablePositions()` of an *adjacent* cell.
- When the order to unload is resolved, the cache key of `CurrentAdjacentCells` is already `self.Location` and so `CurrentAdjacentCells` is *not* updated.
- The units unload into cells based on the *adjacent* cell.

Then, for other players in the game:
- The hover does nothing for these players.
- When the order is resolved, `CurrentAdjacentCells` is out of date and is re-evaluated.
- `self.Location` and `self.GetTargetablePositions()` are both the last cell, because the unit has finished moving.
- So the cache is updated with a key of `self.Location` and values from the *same* cell.
- The units unload into cells based on the *current* cell.

As the units unload into different cells, a desync occurs. Ultimately the cause here is that cache key is insufficient - `self.Location` can have the same value but the output can differ. The function isn't a pure function so memoizing the result via `ToArray()` isn't sound.

Reverting it to cache the enumerable, which is then lazily re-evaluated reduces the scope of possible desyncs but is NOT a full solve. The cached enumerable caches the result of `Actor.GetTargetablePositions()` which isn't a fully lazy sequence. A different result is returned depending on `EnabledTargetablePositions.Any()`. Therefore, if the traits were to enable/disable inbetween, then we can still end up with different results. Memoizing the enumerable isn't sound either!

Currently our only trait is `HitShape` which is enabled based on conditions. A condition that enables/disables it based on movement would be one way to trigger this scenario. Let's say you have a unit where you toggle between two hit shapes when it is moving and when it stops moving. That would allow you to replicate the above scenario once again.

Instead of trying to come up with a sound caching mechanism in the face of a series of complex inputs, we just give up on trying to cache this information at all.
Copy link
Copy Markdown
Member

@Mailaender Mailaender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not show any signs of desyncs relating to ejecting and ejecting on death.

@Mailaender Mailaender merged commit 88fb83b into OpenRA:bleed Aug 1, 2024
@Mailaender
Copy link
Copy Markdown
Member

Changelog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cargo>EjectOnDeath can cause desyncs

2 participants