msg/async/rdma: fix a coredump introduced by PR #18053 (#18204)

Merged
merged 1 commit into from Oct 13, 2017

ownedu
Contributor

@ownedu ownedu commented Oct 10, 2017

Fix a coredump in RDMADispatcher::polling() where the loop iterator is no longer valid after erase().

introduced by #18053

Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
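
For readers unfamiliar with the failure mode described above: erasing from a std::vector inside a range-based for loop invalidates the loop's hidden iterator, which can crash. The usual remedy is to drive the loop with an explicit iterator and continue from the value erase() returns. The following is only a minimal sketch of that idiom, not the code from this PR; FakeQP and reap_dead are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a queue pair; only the pending Tx count matters here.
struct FakeQP {
  int tx_wr;  // outstanding Tx work requests
};

// Deletes every entry with no outstanding Tx work requests and returns
// how many were removed. Entries that are still busy stay in the vector.
int reap_dead(std::vector<FakeQP*>& dead) {
  int removed = 0;
  auto it = dead.begin();
  while (it != dead.end()) {
    FakeQP* qp = *it;
    if (qp->tx_wr > 0) {
      ++it;  // still busy: keep it and advance manually
    } else {
      delete qp;
      it = dead.erase(it);  // erase() returns the next valid iterator
      ++removed;
    }
  }
  return removed;
}
```

The key point is that the iterator is advanced in exactly one of the two branches: either by `++it` or by the return value of `erase()`, never both.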

@ownedu
Contributor Author

ownedu commented Oct 10, 2017

@yuyuyu101 @tchaikov pls help review this fix; thanks.

@@ -244,17 +244,20 @@ void RDMADispatcher::polling()
       perf_logger->set(l_msgr_rdma_inflight_tx_chunks, inflight);
       if (num_dead_queue_pair) {
         Mutex::Locker l(lock); // FIXME reuse dead qp because creating one qp costs 1 ms
-        for (auto &i : dead_queue_pairs) {
+        auto it = dead_queue_pairs.begin();
Contributor

@tchaikov tchaikov Oct 10, 2017

my bad, i didn't realize that we were removing elements while iterating thru the vector.

Contributor Author

Thanks for your quick review, anyway.

@tchaikov tchaikov changed the title from "msg/async/rdma: fix a coredump bug which is introduced by PR #18053," to "msg/async/rdma: fix a coredump introduced by PR #18053," on Oct 10, 2017
perf_logger->dec(l_msgr_rdma_active_queue_pair);
--num_dead_queue_pair;
if (i->get_tx_wr()) {
ldout(cct, 10) << __func__ << " bypass qp=" << i << " tx_wr=" << i->get_tx_wr() << dendl;
Member

I suggest raising this to level 20.

Contributor Author

ok

@ownedu
Contributor Author

ownedu commented Oct 11, 2017

@yuyuyu101 Log level is bumped up, pls help double check; thanks.
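
As background for the level change discussed here: in Ceph's debug logging a larger number means a more verbose message, so a message logged at level 20 appears only when the configured debug level is at least 20, while a level 10 message appears at 10 and above. The sketch below illustrates that gating with a hypothetical miniature logger; MiniLog is an invented name and is not Ceph's ldout implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature logger used only to illustrate level gating.
struct MiniLog {
  int threshold;                   // configured verbosity
  std::vector<std::string> lines;  // messages that passed the gate

  explicit MiniLog(int t) : threshold(t) {}

  // Records msg only if its level is at or below the configured threshold.
  // Returns true when the message was recorded.
  bool log(int level, const std::string& msg) {
    if (level > threshold) {
      return false;  // too verbose for the current setting
    }
    lines.push_back(msg);
    return true;
  }
};
```

With a threshold of 10, a frequently repeated per-poll message logged at level 20 would be suppressed, which is the usual reason for moving noisy messages to a higher level.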

@ownedu
Contributor Author

ownedu commented Oct 11, 2017

@tchaikov could you pls help me clarify the needs-qa label? Does that mean this commit will go through some automatic QA tests before merging? Thanks.

@tchaikov
Contributor

tchaikov commented Oct 11, 2017

@ownedu see http://docs.ceph.com/docs/master/dev/#integration-tests-aka-ceph-qa-suite

also, could you squash your commits into a single one? i think you've addressed @yuyuyu101 's concern.

@ownedu
Contributor Author

ownedu commented Oct 11, 2017

@tchaikov Regarding needs-qa: got it, thanks. The commits are now squashed into one.

where the iterator is not working properly after erase().

Signed-off-by: Yan Lei <yongyou.yl@alibaba-inc.com>
@ownedu ownedu force-pushed the wip-fix-async-rdma-coredump branch from 8f90d71 to 322f87f Compare October 11, 2017 06:33
@ownedu
Contributor Author

ownedu commented Oct 11, 2017

@yuyuyu101 @tchaikov The latest commit failed "make check", probably because of a broken connection to the build agent. Could you pls take a look? If that is the cause, how do I redo "make check"?

The following tests FAILED:
2 - run-cli-tests (Failed)
Errors while running CTest
Build step 'Execute shell' marked build as failure
[PostBuildScript] - Execution post build scripts.
[ceph-pull-requests] $ /bin/sh -xe /tmp/jenkins7902536043534808187.sh

+ sudo reboot
Agent went offline during the build
ERROR: Connection was broken: java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)

@tchaikov
Contributor

> how to redo "make check"

jenkins, retest this please.

@ownedu
Contributor Author

ownedu commented Oct 11, 2017

Passed; thanks @tchaikov

@ownedu
Contributor Author

ownedu commented Oct 11, 2017

@yuyuyu101 @tchaikov Will the teuthology QA testing start automatically once it is scheduled?

@tchaikov
Contributor

@ownedu please define "schedule".

@ownedu
Contributor Author

ownedu commented Oct 11, 2017

@tchaikov I see "Job stats for this page: 2080 queued, 158 failed, 72 waiting, 102 running, 47 dead, 1348 passed, (3807 total)" on http://pulpito.ceph.com. If I understand correctly, this PR's QA is queued and waiting for test.

Plz correct me if I misunderstood anything.

@tchaikov
Contributor

AFAICT, pulpito is not triggered by any label on github.

> is queued

no

> and waiting for test.

yes.

some kind developer will collect the "needs-qa" PRs for testing them in batch, and analyze the test result, then hopefully merge the tested PRs if all goes well. and it does not happen automatically. normally, this takes up to one week.

@ownedu
Contributor Author

ownedu commented Oct 11, 2017

@tchaikov gotcha; thanks.

@tchaikov tchaikov merged commit 5f021b2 into ceph:master Oct 13, 2017