Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msg/async: fix crash that writing char to nonblock-fd gets EAGAIN in EventCenter::wakeup #13822

Merged
merged 1 commit into from Mar 20, 2017

Conversation

Projects
None yet
4 participants
@Liuchang0812
Copy link
Contributor

Liuchang0812 commented Mar 7, 2017

Signed-off-by: liuchang0812 liuchang0812@gmail.com

@yuyuyu101 take a look, please

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 7, 2017

EWOULDBLOCK is the same value as EAGAIN

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 7, 2017

@Liuchang0812 Liuchang0812 force-pushed the Liuchang0812:wip-event-center-bug branch from b4e2348 to 8caa989 Mar 8, 2017

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

Change Log:

  1. remove EWOULDBLOCK check
  2. rebase and repush commit

@yuyuyu101 thanks

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

hmm, I don't think it will return EAGAIN for eventfd... if did, it should be bug....

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

@yuyuyu101 We benchmark H-release with async-messager, and get one core-dump that assert failed at https://github.com/ceph/ceph/blob/hammer/src/msg/async/Event.cc#L246 with errno EAGAIN.

IMO, nonblocking pipe's writing would gets EAGAIN.

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

some log that we added for this

Mar  3 20:38:14  bash: write failed: -1 errno 11 msg/async/Event.cc: In function 'void EventCenter::wakeup()' thread 7
f1d64b89700 time 2017-03-03 20:38:14.861197
Mar  3 20:38:14  bash: msg/async/Event.cc: 278: FAILED assert(n == 1)
@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

hammer? I don't think hammer's issue will apply to master... They're totally different... Do you backport master's async msgr to hammer or?

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

@yuyuyu101 I have read the code on master, there are same problem seems. It would be better to backport all changes to Hammer.

you added EAGAIN check in read pipe in this commit f62f28c

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

@yuyuyu101 sorry, release is Jewel

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

@Liuchang0812 read is different to write, it make sense that we would like to get EAGAIN from reading. but writing must be successful. Otherwise it means potential thread wakeup leaking risk.

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

do you mean you still meet the crash in master branch?

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

no, I don't test it on master code.

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

Jewel release exists already known issues about async msgr. Since K release, we make it default.

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

thanks for your explaining. It seems that wakeup logical is same between Jewel and Master branch. We should take a deep look and try to fix this problem.

Here is libev's evpipe impletation:
https://github.com/enki/libev/blob/93823e6ca699df195a6c7b8bfa6006ec40ee0003/ev.c#L2466

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

... AFAK I haven't see any bug report related to this in stable version. If met, it must be a bug we need to handle instead of ignore this.

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

ok, do you have any suggestion to fix above crash in jewel release? We have to use Jewel now. appreciated!

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

@tchaikov take a look please?

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

we don't have plan to backport now. why not simple msgr until L release?

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

We have too many OSDs, simple msgr will issue pthread_create failed. 🤣

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

@Liuchang0812 what about upgrading to K?

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

@yuyuyu101 i will have a try, need to discusse with others

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

@yuyuyu101 I found the reason. We use epool ET mode on pipe, but async msgr read 256 chars one-time. so we will not read pipe forever. Need we backport related patches to Jewel.

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 8, 2017

ok, but keep in mine.. it still exists several problems...

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 8, 2017

ldout(cct, 1) << __func__ << " write notify pipe failed: " << cpp_strerror(errno) << dendl;
ceph_abort();
if (errno != EAGAIN)
{

This comment has been minimized.

Copy link
@yuyuyu101

yuyuyu101 Mar 9, 2017

Member

make this if () {
}

style

msg/async: fix crash that writing char to nonblock-fd gets EAGAIN in …
…EventCenter::wakeup

Signed-off-by: liuchang0812 <liuchang0812@gmail.com>

@Liuchang0812 Liuchang0812 force-pushed the Liuchang0812:wip-event-center-bug branch from 8caa989 to 55884c5 Mar 9, 2017

@Liuchang0812

This comment has been minimized.

Copy link
Contributor Author

Liuchang0812 commented Mar 9, 2017

Changelog:

  1. corrected code style
  2. rebased and repushed

@yuyuyu101 I would like to back-port some commits to Jewel, is it ok?

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Mar 9, 2017

@Liuchang0812 of course

@tchaikov tchaikov merged commit 6554403 into ceph:master Mar 20, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.