Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deregister() sometimes fails with "No such file or directory" on MacOS #911

Closed
rom1v opened this issue Jan 23, 2019 · 19 comments · Fixed by #1021
Closed

deregister() sometimes fails with "No such file or directory" on MacOS #911

rom1v opened this issue Jan 23, 2019 · 19 comments · Fixed by #1021
Milestone

Comments

@rom1v
Copy link
Contributor

rom1v commented Jan 23, 2019

Several users reported this error on MacOS.

Today, I got more logs (see discussion from Genymobile/gnirehtet#136 (comment)) in which poll.register() returns a Token(3), but passing this token to poll.deregister() fails with No such file or directory.

This happens only on MacOS (apparently), so I guess it is related to the kqueue backend.

@Thomasdezeeuw
Copy link
Collaborator

The only way I can think of dergister returning ENOENT is by either double deregistering, or not registering in the first place.

If I'm correct https://github.com/Genymobile/gnirehtet/blob/2bf8a3cfbe2168031897b8f917c088fe6d47710c/relay-rust/src/relay/tcp_connection.rs#L788 is the line that causes this. Could you add an assertion/check that ensure that closed == false? This way we can make sure that it isn't double deregistered.

@rom1v
Copy link
Contributor Author

rom1v commented Jan 24, 2019

Thank you for your answer.

Could you add an assertion/check that ensure that closed == false? This way we can make sure that it isn't double deregistered.

I will ask to the user who can reproduce the problem. But with the provided logs (with this patch Genymobile/gnirehtet#136 (comment)), we already see that Token(3) is registered once and not deregistered twice.

@sile
Copy link

sile commented Feb 20, 2019

I encountered a similar problem with FreeBSD.
It can be reproduced as follows:

//
// FreeBSD
//
$ uname -a
FreeBSD freebsd-11-1 11.2-RELEASE-p9 FreeBSD 11.2-RELEASE-p9 #0: Tue Feb  5 15:30:36 UTC 2019     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

$ cargo new miotest
$ cd miotest/
$ echo 'mio = "0.6.16"' >> Cargo.toml
$ vi src/main.rs   // See below code
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/miotest`
thread 'main' panicked at 'This expression fails on FreeBSD, but doesn't on Linux: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

//
// Debian
//
$ uname -a
Linux instance-1 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
// .. abbrev ..
$ cargo run
Succeeded
// src/main.rs: This code is copied from https://docs.rs/mio/0.6.16/mio/#example.
use mio::net::{TcpListener, TcpStream};
use mio::*;

const SERVER: Token = Token(0);
const CLIENT: Token = Token(1);

fn main() {
    let addr = "127.0.0.1:13265".parse().unwrap();

    let server = TcpListener::bind(&addr).unwrap();

    let poll = Poll::new().unwrap();

    poll.register(&server, SERVER, Ready::readable(), PollOpt::edge())
        .unwrap();

    let sock = TcpStream::connect(&addr).unwrap();

    // NOTE(sile): `PollOpt::oneshot()` is added by me.
    poll.register(
        &sock,
        CLIENT,
        Ready::readable(),
        PollOpt::edge() | PollOpt::oneshot(),
    )
    .unwrap();

    let mut events = Events::with_capacity(1024);

    loop {
        poll.poll(&mut events, None).unwrap();

        for event in events.iter() {
            match event.token() {
                SERVER => {
                    let _ = server.accept();
                }
                CLIENT => {
                    // NOTE(sile): The following two lines are added by me.
                    poll.deregister(&sock)
                        .expect("This expression fails on FreeBSD, but doesn't on Linux");
                    println!("Succeeded");
                    return;
                }
                _ => unreachable!(),
            }
        }
    }
}

@asomers
Copy link
Collaborator

asomers commented Feb 20, 2019

This looks like a bug in how mio handles oneshot on kqueue-based systems:

With kqueue there are only two possible states for a kevent: enabled or disabled. EV_ONESHOT means that the kevent will become disabled after the first occurrence. However, Linux's epoll has three states: registered, not registered, or registered-but-disabled. EPOLLONESHOT means that a file will transition to the disabled state after it returns a single event. Here's a description of how EPOLLONESHOT should be used.

But mio seems to be treating EPOLLONESHOT and EV_ONESHOT identically.

Does mio even allow multiple threads to poll the same file descriptor at the same time? If not, then EPOLLONESHOT doesn't make sense and we should remove that option. If so, then we should add EPOLLEXCLUSIVE because it's better than EPOLLONESHOT. In any case, perhaps we should rename one or both of the oneshots since they aren't equivalent?

@carllerche
Copy link
Member

@asomers I'm not exactly following your proposal.

@asomers
Copy link
Collaborator

asomers commented Feb 20, 2019

I'm saying that since EPOLLONESHOT and EV_ONESHOT do different things, they shouldn't use the same apparently cross-platform API. Instead, mio should do one of three things:

  1. PollOpt::oneshot should emulate EPOLLONESHOT behavior on kqueue-based systems.
  2. PollOpt::oneshot should emulate EV_ONESHOT behavior on epoll-based systems
  3. PollOpt::oneshot should be deprecated, and new non-portable options added, like PollOpt::epolloneshot and PollOpt::kqoneshot, or something like that.

@carllerche
Copy link
Member

Unfortunately, things will get more complicated as the windows side of things is reworked. I'd like to switch the windows impl to something like: https://github.com/piscisaureus/wepoll

The downside is this strategy only supports oneshot.

@carllerche
Copy link
Member

Searching around, I also found: https://github.com/jiixyj/epoll-shim/blob/master/src/epoll.c

@asomers
Copy link
Collaborator

asomers commented Feb 20, 2019

That epoll-shim looks awfully heavyweight, and it doesn't even handle EPOLLONESHOT.

@carllerche
Copy link
Member

I think the general strategy is to use some of the user data associated with the FD to track enough state to map the epoll API. We should be able to do this as well?

@carllerche
Copy link
Member

@asomers I haven't looked in detail, what looks heavyweight?

@asomers
Copy link
Collaborator

asomers commented Feb 20, 2019

That whole file you just linked. It looks like it's trying to emulate the epoll C API on kqueue-based systems.

@carllerche
Copy link
Member

@asomers it also sounds like the problem stems from using deregister. One option might be to remove deregister from the portable API.

It already is not possible to move a FD between Poll instances, so deregister shouldn't be needed?

@asomers
Copy link
Collaborator

asomers commented Feb 20, 2019

That would be another option. But are there any other legitimate use cases for using deregister? Does anybody ever decide to ignore a socket without closing it?

@przygienda
Copy link

przygienda commented Feb 20, 2019 via email

@carllerche
Copy link
Member

@asomers @przygienda well, ideally we could do something like remove deregister from the "portable" API but keep it exposed when you opt into platform specific behavior.

We probably need to make a list of the various inconsistencies across platforms and see how we can unify.

@carllerche
Copy link
Member

So, Poll could have a deregister_fd on platforms that support it...

That said, the biggest problem with using oneshot as the portable strategy is that it adds additional epoll_ctl calls on linux. I wonder how much that would impact performance.

@Thomasdezeeuw
Copy link
Collaborator

I just tried this with mio_st and it worked, I'll try to figure what is different on mio.

@Thomasdezeeuw
Copy link
Collaborator

So mio_st actually cheats a bit and simply ignores ENOENT errors. I think I went with this approach because the result is the same, the fd is no longer registered. Would that be an acceptable solution to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants