New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed race conditions in cClientHandle's State. #3439

Merged
merged 1 commit into from Nov 22, 2016

Conversation

Projects
None yet
6 participants
@madmaxoft
Copy link
Member

madmaxoft commented Nov 19, 2016

Hopefully this PR should fix all the issues seen lately with cClientHandle. There were race conditions that caused keeping the player object in the game after the player has disconnected, and more.

Fixes #3418.
Fixes #3383.
Fixes #3345.
Fixes #3182.
Fixes #2893.
Fixes #2839.
Fixes #2835.
Fixes #2682.
Fixes #2644.
Fixes #2608.
Fixes #2588.
Fixes #2529.
Fixes #1895.

@madmaxoft madmaxoft added this to the cClientHandle stability milestone Nov 19, 2016

@madmaxoft

This comment has been minimized.

Copy link
Member

madmaxoft commented Nov 19, 2016

Mostly, the change is about adding a cCriticalSection wrapped around cClientHandle::m_State, so that each write is protected. If such a write depends on a previous read, the read is included in the CS lock. Still, there are places where the value of m_State is not as important (say, when deciding what chunks to send next, it's not exactly important if the client has just disconnected - the chunk will be chosen and then the clienthandle destroyed anyway), so there are places where it makes sense to read the m_State value without holding the CS. For that reason, m_State is kept as a std::atomic.

@Seadragon91

This comment has been minimized.

Copy link
Contributor

Seadragon91 commented Nov 21, 2016

No crash on my tests. Ready for merge?

@NiLSPACE
Copy link
Member

NiLSPACE left a comment

I haven't done allot of C++ lately, but I couldn't find anything I found strange.

@@ -318,84 +322,100 @@ void cClientHandle::Kick(const AString & a_Reason)

void cClientHandle::Authenticate(const AString & a_Name, const AString & a_UUID, const Json::Value & a_Properties)
{
if (m_State != csAuthenticating)
cWorld * World;
{

This comment has been minimized.

@sphinxc0re

sphinxc0re Nov 21, 2016

Contributor

Why is this scope here? Does it do anything?

This comment has been minimized.

@madmaxoft

madmaxoft Nov 21, 2016

Member

It limits the lock(m_CSState) on the next line to lock only within this scope, rather than till the function end. Specifically, I want to avoid calling plugins (line 433) with this lock held as much as possible.

This comment has been minimized.

@sphinxc0re

sphinxc0re Nov 21, 2016

Contributor

Ah, I didn't know cpp had scope detetection

This comment has been minimized.

@madmaxoft

madmaxoft Nov 22, 2016

Member

It's simple RAII - acquire resource in constructor, release in destructor. C++ guarantees the destructor to be called upon scope exit.

@sphinxc0re

This comment has been minimized.

Copy link
Contributor

sphinxc0re commented Nov 21, 2016

Looks good to me

@tigerw

This comment has been minimized.

Copy link
Member

tigerw commented Nov 22, 2016

It is, of course, my duty to inform you of both #3115 and the available evidence suggesting that multithreaded access is rarely conducive to a clean and verifiably bug-free codebase.

But yeah, if it closes so many issues, why not? (are you sure they're all to do with the client handle? We shouldn't accidentally bury an unrelated bug)

@madmaxoft

This comment has been minimized.

Copy link
Member

madmaxoft commented Nov 22, 2016

Yeah, 3115 is our Lord Savior, Hallelujah! :D But the same as the Messiah, it's coming in the future and no-one knows when. Not to mention it's a super-complicated change that no-one trusts.

Yes, multi-threading is rarely clean, but it IS needed. And this PR actually cleans it up, since it clearly defines the conditions for how the clienthandle is to be accessed by multiple threads. And yes, it needs to be accessed by multiple threads - we don't want to do world ticking and chunk generating and network manipulation all in the same thread, do we?

Yes, I've manually gone through all the issues and they all can be tracked back to that race condition.

@madmaxoft

This comment has been minimized.

Copy link
Member

madmaxoft commented Nov 22, 2016

@tigerw, are you against a merge, or not?

@tigerw

This comment has been minimized.

Copy link
Member

tigerw commented Nov 22, 2016

It's not that I'm hoping for a single thread to do everything, just multiple threads which have obvious places for data input and output, are self-contained, and which do not interfere with each other's state in the middle of their operation.

As they say, if xoft's a dinosaur, then I'm a novice, and novices don't have the experience or ability to trace through the intricacies of complex thread interactions when we want make some protocol change.

Definitely not against a merge. As you say, the other change is coming in the far future. :P

@Seadragon91 Seadragon91 merged commit 59463be into master Nov 22, 2016

4 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Seadragon91 Seadragon91 deleted the ClientHandleThreading branch Nov 22, 2016

@tigerw

This comment has been minimized.

Copy link
Member

tigerw commented Dec 19, 2016

Could you expound a bit on where exactly m_State was being improperly accessed concurrently?

@madmaxoft

This comment has been minimized.

Copy link
Member

madmaxoft commented Dec 19, 2016

It was accessed from both the tick thread and the networking thread.

@SafwatHalaby

This comment has been minimized.

Copy link
Member

SafwatHalaby commented Dec 26, 2016

Thanks!

@SafwatHalaby

This comment has been minimized.

Copy link
Member

SafwatHalaby commented Dec 28, 2016

Yes, I've manually gone through all the issues and they all can be tracked back to that race condition.

Could you explain how this fixes #2835? It was really painful to debug and I found nothing.

@madmaxoft

This comment has been minimized.

Copy link
Member

madmaxoft commented Dec 29, 2016

I don't remember all the details anymore, but I know the issue fixed by this PR was about race conditions in cClientHandle - the object was being used from multiple threads and it could be used from one thread while another thread was already deleting the object. In the special case of 2835, I suspect that there was a concurrent read / write of the m_State member (which is undefined behavior in C++11) which caused one of the reads to report a value that has never been there, thus taking the wrong code path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment