Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: scene switch stream corruption #504

Closed
wants to merge 9 commits into from

Conversation

LukeStampfli
Copy link
Contributor

@LukeStampfli LukeStampfli commented Feb 26, 2021

This is a fix for a rare race condition which causes an exception and screws over the entire simulation on the client side.

The Bug

  1. New client connects to server.

  2. Server send a packet (p1) containing spawn information of all objects (o1, o2, ....) to new client (reliable sequenced)

  3. Server code invokes RPC on existing object (o1) shortly after creating a reliable unsequenced message (p2)

  4. Client receives p2 first and stores it as a buffered message.

  5. Client receives p1 sets the buffer of m_InputStreamWrapper to received message and starts spawning object o1. While spawning object o1 it checks for buffered messages and finds p2.

  6. Client applies p2 to newly spawned object. It does so by calling HandleIncommingData with the buffered message. This overides the buffer of m_InputStreamWrapper with the content of p2.

  7. Client continues spawning objects (o2) but m_InputStreamWrapper now contains the wrong buffer of the message p2 instead of. Clients reads PrefabHash and receives a corrupted value (most of the time ulong.maxvalue) and throws.

Standard RPC fixes this to some extend because it uses the reliable sequenced channel by default (I think). I still think we should fix this because if we ever end up sending something unsequenced we will get these weird stream corruptions again.

This fix works for the current implementation of MLAPI and as long as we don't end up nesting HandleIncommingData more then twice. If that's the case we will need to switch a more sophisticated pool. PooledBitStream cannot be used for this because we override the target array of the stream which is not supported.

@@ -1030,6 +1030,10 @@ private void HandleRawTransportPoll(NetEventType eventType, ulong clientId, Chan
}

private readonly BitStream m_InputStreamWrapper = new BitStream(new byte[0]);
bool m_InputStreamWrapperUsed;
// The fallback wrapper is used in case we have to handle incoming data but the InputStreamWrapper is already being used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment

Copy link
Contributor

@mattwalsh-unity mattwalsh-unity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 comments

Copy link
Contributor

@mattwalsh-unity mattwalsh-unity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...just pls hoist that comment into a code comment

@mattwalsh-unity mattwalsh-unity changed the title fix:scene switch stream corruption fix: scene switch stream corruption Feb 26, 2021
# Conflicts:
#	com.unity.multiplayer.mlapi/Runtime/Core/NetworkManager.cs
@@ -918,6 +918,16 @@ private void HandleRawTransportPoll(NetworkEvent networkEvent, ulong clientId, N
}

private readonly NetworkBuffer m_InputBufferWrapper = new NetworkBuffer(new byte[0]);
bool m_InputBufferWrapperUsed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool m_InputBufferWrapperUsed;
private bool m_InputBufferWrapperUsed;

@ThusSpokeNomad
Copy link
Contributor

ThusSpokeNomad commented Mar 23, 2021

@LukeStampfli — we recently merged release back into develop and also it's been a while since we last tested this. I wonder if you could merge latest develop into this PR then also implement a simple test that exposes the underlying issue consistently so that we could make sure that this fix is indeed fixes the problem (we should have some form of a test going with this regardless IMHO) 🙂

@LukeStampfli
Copy link
Contributor Author

I think there is a better solution to this (improving the underlying code rather then hotfixing this). Will start a discussion once I get time.

@ThusSpokeNomad ThusSpokeNomad deleted the fix/scene-switch-stream-corruption branch April 1, 2021 13:07
@NoelStephensUnity
Copy link
Collaborator

NoelStephensUnity commented Apr 1, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Bug Report
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants