msg/async: avoid requeue racing with handle_write #15324

Merged
liewegas merged 2 commits into ceph:master from yuyuyu101:wip-20093 on May 29, 2017

3 participants
@yuyuyu101
Member

yuyuyu101 commented May 27, 2017

No description provided.

yuyuyu101 added some commits May 27, 2017

msg/async: avoid requeue racing with handle_write
While one thread is calling AsyncConnection::handle_write, another thread may
replace the connection and requeue all messages. Because write_lock protection
was removed from the handle_write caller, the send path can race with out_q.

Fix: http://tracker.ceph.com/issues/20093

Signed-off-by: Haomai Wang <haomai@xsky.com>
msg/async: keep _has_next_outgoing calling under write_lock
Signed-off-by: Haomai Wang <haomai@xsky.com>
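
To illustrate the idea behind the two commits, here is a minimal sketch, not Ceph's actual code. `ToyConnection`, `write_one`, `pending`, and `requeue_preserves_order` are hypothetical names invented for this example; only `write_lock`, `out_q`, and the notion of requeueing sent messages come from the commit messages. The sketch assumes that the "is there a next outgoing message" check and the dequeue must happen under the same lock that the requeue path takes, so the writer never acts on a stale view of the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a connection with an outgoing queue.
// This is NOT Ceph's AsyncConnection; it only mirrors the locking idea.
class ToyConnection {
 public:
  void enqueue(std::string msg) {
    std::lock_guard<std::mutex> guard(write_lock_);
    out_q_.push_back(std::move(msg));
  }

  // Writer path: the emptiness check and the dequeue share one critical
  // section, so a concurrent requeue cannot slip in between them.
  bool write_one() {
    std::lock_guard<std::mutex> guard(write_lock_);
    if (out_q_.empty()) return false;
    sent_.push_back(std::move(out_q_.front()));  // remembered for a possible requeue
    out_q_.pop_front();
    return true;
  }

  // Replacement path: put every already-sent message back at the front of
  // the outgoing queue, preserving the original order.
  void requeue_sent() {
    std::lock_guard<std::mutex> guard(write_lock_);
    while (!sent_.empty()) {
      out_q_.push_front(std::move(sent_.back()));
      sent_.pop_back();
    }
  }

  // Snapshot of the messages still waiting to be written.
  std::vector<std::string> pending() const {
    std::lock_guard<std::mutex> guard(write_lock_);
    return std::vector<std::string>(out_q_.begin(), out_q_.end());
  }

 private:
  mutable std::mutex write_lock_;
  std::deque<std::string> out_q_;
  std::deque<std::string> sent_;
};

// Single-threaded walk-through: write two of three messages, requeue, and
// check that nothing was lost or reordered.
bool requeue_preserves_order() {
  ToyConnection conn;
  conn.enqueue("a");
  conn.enqueue("b");
  conn.enqueue("c");
  assert(conn.write_one());
  assert(conn.write_one());
  conn.requeue_sent();
  const std::vector<std::string> expected = {"a", "b", "c"};
  return conn.pending() == expected;
}
```

If the emptiness check ran outside `write_lock_`, a requeue between the check and the dequeue could change the queue under the writer, which is the kind of interleaving the first commit message describes.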
@yuyuyu101

Member

yuyuyu101 commented May 27, 2017

Please run multiple QA rounds to verify.

@yuyuyu101

Member

yuyuyu101 commented May 27, 2017

@liewegas liewegas merged commit 676fc82 into ceph:master May 29, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details

@yuyuyu101 yuyuyu101 deleted the yuyuyu101:wip-20093 branch May 30, 2017

@ue90

ue90 commented Jun 14, 2017

After I apply this patch, Ceph with RDMA has problems at startup: on one of the three nodes, the OSDs consume all of the CPU.

@yuyuyu101

Member

yuyuyu101 commented Jun 15, 2017

@ue90 could you tell us more?

@ue90

ue90 commented Jun 16, 2017

top info:
%Cpu(s): 41.5 us, 53.8 sy, 0.0 ni, 2.3 id, 1.2 wa, 0.0 hi, 1.3 si, 0.0 st
KiB Mem : 32990886+total, 25883768 free, 94821912 used, 20920316+buff/cache
KiB Swap: 65535996 total, 65075712 free, 460284 used. 23342251+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2562447 root 20 0 4148284 2.918g 676864 S 115.9 0.9 5:47.60 ceph-osd
2562408 root 20 0 3722232 2.734g 570288 S 112.0 0.9 5:03.81 ceph-osd
2562372 root 20 0 3821256 2.774g 608568 S 110.0 0.9 4:59.57 ceph-osd
2562433 root 20 0 3724424 2.695g 627028 S 109.4 0.9 4:52.28 ceph-osd
2562441 root 20 0 3741676 2.690g 639128 S 108.7 0.9 5:14.24 ceph-osd
2562445 root 20 0 3835860 2.740g 623344 S 106.8 0.9 5:44.17 ceph-osd
2562423 root 20 0 3661768 2.671g 588396 S 106.1 0.8 4:50.35 ceph-osd
2562377 root 20 0 3838764 2.732g 600336 S 105.5 0.9 5:50.05 ceph-osd
2562429 root 20 0 3697716 2.659g 610716 S 103.2 0.8 4:46.08 ceph-osd
2562394 root 20 0 3553988 2.558g 553712 S 101.0 0.8 4:03.84 ceph-osd
2562437 root 20 0 3727560 2.707g 637364 S 101.0 0.9 5:20.65 ceph-osd
2562431 root 20 0 3666944 2.628g 601100 S 100.0 0.8 4:49.74 ceph-osd
2562439 root 20 0 3642332 2.639g 578344 S 99.4 0.8 4:35.22 ceph-osd
2562417 root 20 0 3708912 2.677g 579968 S 95.8 0.9 4:54.17 ceph-osd
2562421 root 20 0 3600244 2.563g 586032 S 95.1 0.8 4:34.28 ceph-osd
2562435 root 20 0 3862480 2.735g 647488 S 95.1 0.9 5:37.73 ceph-osd
2562410 root 20 0 3717312 2.658g 627440 S 92.9 0.8 5:19.62 ceph-osd
2562379 root 20 0 3575584 2.565g 563276 S 90.9 0.8 4:17.35 ceph-osd
2562378 root 20 0 3566408 2.567g 528648 S 90.6 0.8 4:16.89 ceph-osd
2562443 root 20 0 3873532 2.779g 625532 S 88.3 0.9 5:02.54 ceph-osd
2562368 root 20 0 3787604 2.714g 582700 S 87.1 0.9 4:59.55 ceph-osd
2562419 root 20 0 3659756 2.657g 592552 S 86.7 0.8 4:24.10 ceph-osd
2562449 root 20 0 3670984 2.591g 559400 S 85.4 0.8 4:46.28 ceph-osd
2562380 root 20 0 3632080 2.573g 603100 S 84.8 0.8 5:02.58 ceph-osd
2562427 root 20 0 3696568 2.569g 590360 S 84.8 0.8 4:16.68 ceph-osd
2562381 root 20 0 3697688 2.598g 592816 S 82.2 0.8 5:01.01 ceph-osd
2562415 root 20 0 3640388 2.598g 543696 S 82.2 0.8 4:42.81 ceph-osd
2562425 root 20 0 3719220 2.652g 603084 S 79.0 0.8 5:15.26 ceph-osd
2562451 root 20 0 3762876 2.677g 582408 S 77.0 0.9 5:10.26 ceph-osd
2562375 root 20 0 3627732 2.564g 586604 S 76.1 0.8 4:32.15 ceph-osd
2562413 root 20 0 3668932 2.610g 627484 S 75.7 0.8 4:38.99 ceph-osd
2561844 root 20 0 1557340 1.006g 106104 S 5.5 0.3 0:37.92 ceph-mon
2562376 root 20 0 2819160 1.903g 103836 S 1.6 0.6 0:15.30 ceph-osd
2565459 root 20 0 1421048 0.988g 237080 S 1.3 0.3 0:15.58 ceph-mds

ceph info:
===> ceph -s Thu Jun 15 20:30:18 CST 2017
grant_status:trial, left_days:60
cluster 6dd275f6-2f68-4e6b-99d4-6ff5630c8524
health HEALTH_OK
23 requests are blocked > 32 sec
monmap e1: 3 mons at {ceph236=192.168.1.16:6789/0,ceph237=192.168.1.17:6789/0,ceph238=192.168.1.18:6789/0}
election epoch 26, quorum 0,1,2 ceph236,ceph237,ceph238
fsmap e34: 1/1/1 up {0=1=up:active}, 1 up:standby
osdmap e528: 104 osds: 104 up, 104 in
flags sortbitwise
pgmap v4091: 4224 pgs, 2 pools, 872 GB data, 218 kobjects
1308 GB used, 366 TB / 368 TB avail
4224 active+clean

@yuyuyu101

Member

yuyuyu101 commented Jun 16, 2017

the problem?

@ue90

ue90 commented Jun 16, 2017

yes
