msg/async/rdma: Fix memory leak of OSD #13101

Merged
yuyuyu101 merged 1 commit into ceph:master from Adirl:fix_mem_leak on Jan 26, 2017

Adirl commented Jan 25, 2017

A QP can be deleted only from RDMADispatcher::handle_async_event(), which calls
"erase_qpn" to enable the deletion.

Signed-off-by: Sarit Zubakov <saritz@mellanox.com>

msg/async/rdma: Fix memory leak of OSD
A QP can be deleted only from RDMADispatcher::handle_async_event(), which calls
"erase_qpn" to enable the deletion.

issue: 959004

Change-Id: Iab69cb365b37a09e9608d4b3c595e05278bbe021
Signed-off-by: Sarit Zubakov <saritz@mellanox.com>

Adirl commented Jan 25, 2017

@saritz

saritz commented Jan 25, 2017

@yuyuyu101, @Adirl

We noticed that the leak is mainly caused by creating QPs without releasing them.
Calling delete qp in the destructor of Infiniband::QueuePair caused a segmentation fault (two threads tried to delete the same QP). The right way to release them, according to Haomai, is to call "handle_async_event()" from void RDMADispatcher::polling() in /src/msg/async/rdma/RDMAStack.cc, that is, to uncomment the existing handle_async_event(); call there.
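
The ownership idea described above can be sketched roughly as follows. This is a minimal illustration, not the Ceph code: `Dispatcher`, `register_qp`, `demo` and the stand-in `QueuePair` are hypothetical, and only the names `handle_async_event` and `erase_qpn` are borrowed from the PR (their bodies here are assumptions). The point is that a single owner releases each QP from one code path, so a second release attempt finds nothing to delete.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a queue pair; counts live instances so that
// leaks (count never drops) and double deletes would be visible.
struct QueuePair {
  explicit QueuePair(uint32_t n) : qpn(n) { ++live; }
  ~QueuePair() { --live; }
  QueuePair(const QueuePair&) = delete;
  QueuePair& operator=(const QueuePair&) = delete;

  uint32_t qpn;
  static std::atomic<int> live;
};
std::atomic<int> QueuePair::live{0};

// Hypothetical dispatcher: the only owner of every registered QP.
class Dispatcher {
 public:
  void register_qp(uint32_t qpn) {
    std::lock_guard<std::mutex> l(lock_);
    qps_.emplace(qpn, std::make_unique<QueuePair>(qpn));
  }

  // Erasing the map entry destroys the QP exactly once; a repeated call
  // for the same qpn returns false instead of deleting again.
  bool erase_qpn(uint32_t qpn) {
    std::lock_guard<std::mutex> l(lock_);
    return qps_.erase(qpn) > 0;
  }

  // Assumed shape of the event path: the polling loop passes in the QPNs
  // whose connections have finished, and only this path releases them.
  void handle_async_event(const std::vector<uint32_t>& finished_qpns) {
    for (uint32_t qpn : finished_qpns) erase_qpn(qpn);
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> l(lock_);
    return qps_.size();
  }

 private:
  mutable std::mutex lock_;
  std::unordered_map<uint32_t, std::unique_ptr<QueuePair>> qps_;
};

// Usage: two QPs are registered and both are released via the event path.
inline int demo() {
  Dispatcher d;
  d.register_qp(1);
  d.register_qp(2);
  d.handle_async_event({1, 2});
  assert(!d.erase_qpn(1));  // already released, so no double delete
  return QueuePair::live.load();
}
```

If the event path is never invoked (as when the call is commented out), nothing in this sketch ever erases an entry and the live count only grows, which mirrors the leak pattern reported here.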

With this fix we noticed a significant reduction of the leak, from ~110MB per 10 minutes to ~20MB per 10 minutes. We are still analysing what is left.
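
As a quick sanity check on these approximate figures, the relative reduction can be computed directly. The helper name `leak_reduction` is hypothetical; the inputs are the rounded rates quoted above, so the result is only indicative.

```cpp
#include <cassert>
#include <cmath>

// Fractional reduction in leak rate; both rates must use the same units
// (here: MB per 10 minutes) and `before` must be non-zero.
inline double leak_reduction(double before, double after) {
  return (before - after) / before;
}
```

With the quoted numbers, `leak_reduction(110.0, 20.0)` is 90/110, or roughly 0.82, which is in line with the "80%" figure mentioned later in this thread.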

yuyuyu101 commented Jan 25, 2017

Could this pass ceph_test_msgr?

Adirl commented Jan 26, 2017

Not tested with the unit test, but we verified that it resolves about 80% of the memory leak on a real cluster with 3 nodes (1 mon + 3 OSDs in total),
running traffic from an FIO client and monitoring QP count and memory usage.

Looks good.
There is still a small leak; its root cause is unknown.

yuyuyu101 merged commit 8548eda into ceph:master on Jan 26, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodifed Submodules: submodules for project are unmodified
default: Build finished.

Adirl deleted the Adirl:fix_mem_leak branch on Apr 18, 2017
