msg/async/rdma: fix a coredump introduced by PR #18053, #18204
Conversation
@yuyuyu101 @tchaikov pls help review this fix; thanks.
@@ -244,17 +244,20 @@ void RDMADispatcher::polling()
     perf_logger->set(l_msgr_rdma_inflight_tx_chunks, inflight);
     if (num_dead_queue_pair) {
       Mutex::Locker l(lock); // FIXME reuse dead qp because creating one qp costs 1 ms
-      for (auto &i : dead_queue_pairs) {
+      auto it = dead_queue_pairs.begin();
my bad, i didn't realize that we were removing elements while iterating thru the vector.
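The crash class being fixed here is a standard C++ pitfall: calling `erase()` on a `std::vector` while iterating it (including via a range-for, whose hidden iterator is never refreshed) invalidates the iterator and is undefined behavior. A minimal sketch of the correct pattern, using the next-element iterator that `erase()` returns (illustrative code only, not the Ceph source):

```cpp
#include <cassert>
#include <vector>

// Erase matching elements while walking the vector safely.
// erase() invalidates the erased position, but returns a valid
// iterator to the element after it -- so we resume from that,
// and only ++it when nothing was erased.
static int drain_even(std::vector<int>& v) {
    int erased = 0;
    auto it = v.begin();
    while (it != v.end()) {
        if (*it % 2 == 0) {
            it = v.erase(it);   // resume from the returned iterator
            ++erased;
        } else {
            ++it;               // advance only when we did not erase
        }
    }
    return erased;
}
```

A range-for (`for (auto &i : v) { ... v.erase(...); ... }`) has no way to express this resume step, which is why the fix above switches to an explicit iterator loop.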
Thanks for your quick review, anyway.
src/msg/async/rdma/RDMAStack.cc (outdated)
      perf_logger->dec(l_msgr_rdma_active_queue_pair);
      --num_dead_queue_pair;
      if (i->get_tx_wr()) {
        ldout(cct, 10) << __func__ << " bypass qp=" << i << " tx_wr=" << i->get_tx_wr() << dendl;
I suggest raising this to log level 20.
ok
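Putting the two review points together, the fixed reaping loop skips any dead queue pair that still has outstanding tx work requests and destroys the rest, advancing via the iterator returned by `erase()`. A minimal sketch under assumed, illustrative names (`FakeQP`, `reap_dead_qps` are stand-ins, not Ceph's types):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a queue pair that may still have
// outstanding tx work requests (tx_wr).
struct FakeQP {
    int tx_wr;
    int get_tx_wr() const { return tx_wr; }
};

// Sketch of the fixed loop: bypass qps with pending tx work
// requests (their completions must drain first), destroy the rest,
// and continue from the iterator erase() hands back.
static int reap_dead_qps(std::vector<FakeQP*>& dead) {
    int destroyed = 0;
    auto it = dead.begin();
    while (it != dead.end()) {
        if ((*it)->get_tx_wr()) {  // bypass: completions still pending
            ++it;
            continue;
        }
        delete *it;
        it = dead.erase(it);       // safe: erase() returns next iterator
        ++destroyed;
    }
    return destroyed;
}
```

Bypassed qps simply stay in the dead list and are reconsidered on the next polling pass.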
@yuyuyu101 Log level is bumped up, pls help double check; thanks.
@tchaikov could you pls help me clarify the needs-qa label? Does that mean this commit will go through some automatic QA tests before merging? Thanks.
@ownedu see http://docs.ceph.com/docs/master/dev/#integration-tests-aka-ceph-qa-suite. Also, could you squash your commits into a single one? i think you've addressed @yuyuyu101 's concern.
@tchaikov Regarding the needs-qa, got it and thanks; the commits have been squashed into one.
where the iterator is not working properly after erase(). Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
(force-pushed from 8f90d71 to 322f87f)
@yuyuyu101 @tchaikov The latest commit failed "make check", probably due to bad connections; could you pls help take a look? And if so, how do I re-run "make check"? The following tests FAILED:
jenkins, retest this please.
Passed; thanks @tchaikov
@yuyuyu101 @tchaikov Will the QA teuthology testing start automatically once it is scheduled?
@ownedu please define "schedule".
@tchaikov I see "Job stats for this page: 2080 queued, 158 failed, 72 waiting, 102 running, 47 dead, 1348 passed, (3807 total)" in http://pulpito.ceph.com; if I understand correctly, this PR's QA is queued and waiting for test. Please correct me if I misunderstood anything.
AFAICT, pulpito is not triggered by any label on github.
no
yes. some kind developer will collect the "needs-qa" PRs for testing them in batch, and analyze the test result, then hopefully merge the tested PRs if all goes well. and it does not happen automatically. normally, this takes up to one week.
@tchaikov gotcha; thanks.
where the iterator is not working properly after erase().
introduced by #18053
Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>