Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gdamore/win io lock #1831

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Gdamore/win io lock #1831

wants to merge 4 commits into from

Conversation

gdamore
Copy link
Contributor

@gdamore gdamore commented May 5, 2024

This is a trial run, and it may or may not work.

When closing pipes, we defer them to be reaped, but also leave
them in the match list where they might be picked up by ep_match,
or leak.  It's best to reap these proactively and ensure that they
are not allowed to life longer once they have errored during the
negotiation phase.
Copy link

codecov bot commented May 5, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 79.50%. Comparing base (aac4dc3) to head (a5f1a8f).

Files Patch % Lines
src/sp/transport/tcp/tcp.c 0.00% 1 Missing ⚠️
src/sp/transport/tls/tls.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1831      +/-   ##
==========================================
+ Coverage   79.32%   79.50%   +0.17%     
==========================================
  Files          95       95              
  Lines       21491    21484       -7     
==========================================
+ Hits        17048    17081      +33     
+ Misses       4443     4403      -40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -23,6 +23,8 @@ static int win_io_nthr = 0;
static HANDLE win_io_h = NULL;
static nni_thr *win_io_thrs;

static SRWLock win_io_lock = SRWLOCK_INIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static SRWLock win_io_lock = SRWLOCK_INIT;
static SRWLOCK win_io_lock = SRWLOCK_INIT;

DWORD num;
// Make absolutely sure there is no I/O running.
if (io->f != INVALID_HANDLE) {
CancelIoEx(io->f, &o->olpd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CancelIoEx(io->f, &o->olpd);
CancelIoEx(io->f, &io->olpd);

CloseHandle((HANDLE) io->olpd.hEvent);
DWORD num;
// Make absolutely sure there is no I/O running.
if (io->f != INVALID_HANDLE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (io->f != INVALID_HANDLE) {
if (io->f != INVALID_HANDLE_VALUE) {

@@ -331,7 +331,8 @@ nni_ipc_listener_alloc(nng_stream_listener **lp, const nng_url *url)
if ((l = NNI_ALLOC_STRUCT(l)) == NULL) {
return (NNG_ENOMEM);
}
if ((rv = nni_win_io_init(&l->io, ipc_accept_cb, l)) != 0) {
if ((rv = nni_win_io_init(&l->io, INVALID_HANDLE, ipc_accept_cb, l)) !=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ((rv = nni_win_io_init(&l->io, INVALID_HANDLE, ipc_accept_cb, l)) !=
if ((rv = nni_win_io_init(&l->io, INVALID_HANDLE_VALUE, ipc_accept_cb, l)) !=

Copy link
Contributor

@alzix alzix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code does compile after applying the suggestions - but the use-after-free is not solved

@@ -61,26 +69,44 @@ nni_win_io_register(HANDLE h)
}

int
nni_win_io_init(nni_win_io *io, nni_win_io_cb cb, void *ptr)
nni_win_io_init(nni_win_io *io, HANDLE h, nni_win_io_cb cb, void *ptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update the header file as well

Comment on lines 451 to 452
if (((rv = nni_win_io_init(&c->recv_io, s, tcp_recv_cb, c)) != 0) ||
((rv = nni_win_io_init(&c->send_io, s, tcp_send_cb, c)) != 0) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (((rv = nni_win_io_init(&c->recv_io, s, tcp_recv_cb, c)) != 0) ||
((rv = nni_win_io_init(&c->send_io, s, tcp_send_cb, c)) != 0) ||
if (((rv = nni_win_io_init(&c->recv_io, (HANDLE) s, tcp_recv_cb, c)) != 0) ||
((rv = nni_win_io_init(&c->send_io, (HANDLE) s, tcp_send_cb, c)) != 0) ||

@@ -448,8 +448,8 @@ nni_win_tcp_init(nni_tcp_conn **connp, SOCKET s)
c->ops.s_get = tcp_get;
c->ops.s_set = tcp_set;

if (((rv = nni_win_io_init(&c->recv_io, tcp_recv_cb, c)) != 0) ||
((rv = nni_win_io_init(&c->send_io, tcp_send_cb, c)) != 0) ||
if (((rv = nni_win_io_init(&c->recv_io, s, tcp_recv_cb, c)) != 0) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remember to update the function declaration in win_impl.h:125

@@ -322,7 +322,7 @@ nni_tcp_listener_accept(nni_tcp_listener *l, nni_aio *aio)
c->listener = l;
c->conn_aio = aio;
nni_aio_set_prov_data(aio, c);
if (((rv = nni_win_io_init(&c->conn_io, tcp_accept_cb, c)) != 0) ||
if (((rv = nni_win_io_init(&c->conn_io, s, tcp_accept_cb, c)) != 0) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (((rv = nni_win_io_init(&c->conn_io, s, tcp_accept_cb, c)) != 0) ||
if (((rv = nni_win_io_init(&c->conn_io, (HANDLE) s, tcp_accept_cb, c)) != 0) ||

@@ -208,7 +208,7 @@ nni_tcp_dial(nni_tcp_dialer *d, const nni_sockaddr *sa, nni_aio *aio)

c->peername = ss;

if ((rv = nni_win_io_init(&c->conn_io, tcp_dial_cb, c)) != 0) {
if ((rv = nni_win_io_init(&c->conn_io, s, tcp_dial_cb, c)) != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ((rv = nni_win_io_init(&c->conn_io, s, tcp_dial_cb, c)) != 0) {
if ((rv = nni_win_io_init(&c->conn_io, (HANDLE) s, tcp_dial_cb, c)) != 0) {

@@ -67,7 +67,7 @@ nni_plat_udp_open(nni_plat_udp **udpp, nni_sockaddr *sa)
(void) setsockopt(
u->s, IPPROTO_IPV6, IPV6_V6ONLY, (char *) &no, sizeof(no));

if (((rv = nni_win_io_init(&u->rxio, udp_recv_cb, u)) != 0) ||
if (((rv = nni_win_io_init(&u->rxio, u->s, udp_recv_cb, u)) != 0) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (((rv = nni_win_io_init(&u->rxio, u->s, udp_recv_cb, u)) != 0) ||
if (((rv = nni_win_io_init(&u->rxio, (HANDLE) u->s, udp_recv_cb, u)) != 0) ||

@gdamore
Copy link
Contributor Author

gdamore commented May 6, 2024

Hi @alzix - obviously I didn't get a chance to try building this on Windows myself (except via CI/CID) before I ran out of time. I'll fix these .. .but if you've already done so, can you tell me if the approach at least resolves the problems you've been observing?

@alzix
Copy link
Contributor

alzix commented May 7, 2024

no i still got a crash
image
image

@KnappSas
Copy link

KnappSas commented May 7, 2024

For the sake of completeness I want to mention that the reqrep code provided by @alzix also sometimes crashes in nni_stat_inc. It also looks like the code still hangs sometimes.

@alzix
Copy link
Contributor

alzix commented May 7, 2024

I also saw the crash in the nni_stat_inc especially when running with a debugger and stopping threads for some time.
IMO we better concentrate on this critical issue first.
Statistics can be compiled out if it cause problems. @KnappSas - can you please open a different issue for it - so it won't get forgotten

@mikisch81
Copy link

HI @gdamore, I also tried this branch on a Windows machine and got the same crash:
Screenshot 2024-05-13 at 3 44 57 PM

It seems that when calling nni_list_node_remove(&aio->a_prov_node) than a_prov_node has the next&prev pointers point to free or uninitialized memory, I wasn't able to understand where is this node gets initialized/freed.

It is very easy to reproduce as it happens almost immediately when running the modified reqrep demo.

@itayzafrir
Copy link

itayzafrir commented May 22, 2024

Hi @gdamore and all.

I've been looking into this and it seems as if the pipe is being deallocated/freed (in ipc_pipe_fini) but after that a call to ipc_send_cb is received (from win_io_handler).
Inside ipc_send_cb the nni_aio pointer has the same address of the tx_aio member of ipc_pipe. Since tx_aio is a by value member and since the the ipc_pipe has been deallocated then this pointer is a dangling pointer.

image

Also worth mentioning that sometimes, not very often though there's no crash but a deadlock, where both client and server are waiting on signals.

@gdamore
Copy link
Contributor Author

gdamore commented May 23, 2024

THanks -- I've been looking at this in spare cycles, but very few such cycles exist, which is why it's taking time. I'll try to get to it in a day or so.

@gdamore
Copy link
Contributor Author

gdamore commented May 25, 2024

So the design is meant to ensure that all callbacks are done, and the pipe is totally unregistered before we freed it. I've missed something. I'm going to resolve this one way the other this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants