Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add thread safe guard to throw exception when share DBConnection cross threads #635

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

liuh-80
Copy link
Contributor

@liuh-80 liuh-80 commented Jun 20, 2022

Why I did it

For performance erason, swsscommon is not thread safe.
Code using swsscommon may make mistake to share db connection cross thread, and it's difficult to debug.
To slove this issue, add ThreadSafeGuard class to detect and throw exception when db connection shared by multiple thread.

How I did it

Add ThreadSafeCheck class to check and throw exception when DBConnect shared by multiple thread.

How to verify it

Pass all test case.
Add new test case to cover new code.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Add ThreadSafeCheck class to check and throw exception when DBConnect shared by multiple thread.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@liuh-80 liuh-80 requested a review from qiluo-msft June 20, 2022 08:14
@liuh-80 liuh-80 marked this pull request as ready for review June 20, 2022 08:14
@@ -33,6 +33,7 @@ inline void guard(FUNC func, const char* command)

RedisReply::RedisReply(RedisContext *ctx, const RedisCommand& command)
{
ThreadSafeGuard safeguard(ctx->getRunningFlag(), string(command.c_str()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use std::mutex ? and mutex guard ? instead of implementing yourself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I know the libswsscommon is not thread safe by design, so we add this class to catch the multi-thread scenario and throw exception.

If we add lock here, it will impact the critical performance, like route sync or warm reboot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think std::mutext will impact performace here, mutext is really quick, implementation internal is on atomic exchange, its 1 assembly instruction, so this is not performance critical here, and using already given tools make it simpler and better understanding for someone that is new to the code

if you are throwing here then how this will be handled in production scenario? just crash ?

Copy link
Contributor Author

@liuh-80 liuh-80 Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with QI, I have better undertsand about whu not use mutex, mutext need context switch which take microseconds, and there a some critical scenario can't accept this delay for example update thousands of routes.

Here is my closed PR for add a mutex, which contains more information:
#634

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that depends on the mutex, you could have potentially spin lock since this operation could be very fast, and you dont need to use recursive_mutex as in showed PR, since you will not have recursive call here, still for me using std::mutex here is better than implementing it from scratch, also why crash on multiple thread access? and not wait ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ThreadSafeGuard is not a lock, it's just check race condition and throw exception when it happens.
Also, the code inside RedisContext ctor sometimes very slowly, for example run a LUA script.

Here is an example how crash happen:
caclmgrd currently share same RedisContext with multiple thread:
Thread 1: run 'keys' command
Thread 2: run 'hgetall' command

Then when thread 1 read command result, it read the 'keys' result, however the thread 1 process the result as 'hgetall' result, so the code crash.

@liuh-80
Copy link
Contributor Author

liuh-80 commented Jun 23, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liuh-80
Copy link
Contributor Author

liuh-80 commented Jul 14, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ZhaohuiS
Copy link

@liuh-80 Can this PR fix the issue sonic-net/sonic-buildimage#10883?
If multiple threads run different commands for same DBConnection at same time in caclmgrd, will this change avoid crash and just return exception?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants