fix: fix replica closing socket #1167

Merged
merged 1 commit into from May 1, 2023


dranikpg
Contributor

@dranikpg dranikpg commented Apr 30, 2023

This bug was found inside test_cancel_replication_immediately of the regression tests

https://github.com/dragonflydb/dragonfly/actions/runs/4828930130/jobs/8603358782

The stacktrace suggests that we try to shut down a socket that is already closed or not initialized. What is more,

replica.cc:181] Error generic:125 while calling check_connection_error(ec, kConnErr)

indicates that Replica::Start must have been cancelled after the socket was initialized (the pointer is valid) but before it was connected (fd = -1).

I first looked only at the io_uring socket's code and saw that fd is not changed even on errors, which made me think of many different causes for this bug. But as the pytests run with epoll, it simply looks like the socket failed to connect, set fd_ to -1, and the context was cancelled.
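
To illustrate the failure mode described above, here is a minimal sketch of a guarded close. It is not the actual change in this PR: `FakeSocket` and `CloseSocketGuarded` are hypothetical names made up for the example, and the `assert` in `Shutdown()` only stands in for the `fd_ >= 0` check that fired in the stacktrace. The idea is to skip the shutdown when the socket object exists but never got a valid descriptor.

```cpp
#include <cassert>

// Hypothetical stand-in for a fiber socket; not the real helio/Dragonfly class.
class FakeSocket {
 public:
  // Simulates a successful connect by assigning a valid descriptor.
  void Connect(int fd) { fd_ = fd; }

  // Simulates a failed connect: the descriptor is reset to -1.
  void FailConnect() { fd_ = -1; }

  bool IsOpen() const { return fd_ >= 0; }

  // Mirrors the precondition from the stacktrace: fd_ must be valid.
  void Shutdown() {
    assert(fd_ >= 0);
    ++shutdown_calls_;
  }

  int shutdown_calls() const { return shutdown_calls_; }

 private:
  int fd_ = -1;
  int shutdown_calls_ = 0;
};

// Shuts the socket down only if it exists and holds a valid descriptor.
// Returns true if Shutdown() was actually invoked.
bool CloseSocketGuarded(FakeSocket* sock) {
  if (sock == nullptr || !sock->IsOpen()) {
    return false;
  }
  sock->Shutdown();
  return true;
}
```

With this guard, a socket that was created but failed to connect (or was cancelled before connecting) is simply skipped, while a connected socket is still shut down exactly once.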

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg marked this pull request as ready for review April 30, 2023 19:53
@romange
Collaborator

romange commented May 1, 2023

the stacktrace, for reference:

F20230428 09:20:14.401660 15791 fiber_socket_base.cc:144] Check failed: fd_ >= 0 (-1 vs. 0) 
*** Check failure stack trace: ***
*** SIGABRT received at time=1682673614 on cpu 1 ***
PC: @     0xffff86b12d78  (unknown)  raise
    @     0xaaaabc3f2eec         64  absl::lts_20230125::WriteFailureInfo()
    @     0xaaaabc3f314c       4816  absl::lts_20230125::AbslFailureSignalHandler()
    @     0xffff87d788bc        304  (unknown)
    @     0xffff86affaac        336  abort
    @     0xaaaabc3727fc        176  google::DumpStackTraceAndExit()
    @     0xaaaabc368580         16  google::LogMessage::Fail()
    @     0xaaaabc36848c        160  google::LogMessage::SendToLog()
    @     0xaaaabc367cf8         80  google::LogMessage::Flush()
    @     0xaaaabc36b4c8         32  google::LogMessageFatal::~LogMessageFatal()
    @     0xaaaabc1f2cf0        176  util::LinuxSocketBase::Shutdown()
    @     0xaaaabbc175ac        176  dfly::Replica::CloseSocket()::{lambda()#1}::operator()()
    @     0xaaaabbc2d6dc         32  util::detail::ResultMover<>::Apply<>()
W20230428 09:20:14.457376 15790 uring_proactor.cc:190] CQE error: 125 cqe.flags=0
    @     0xaaaabbc2aa84         48  util::fb2::ProactorBase::Await<>()::{lambda()#1}::operator()()
    @     0xaaaabbc44dc8         48  std::__invoke_impl<>()
    @     0xaaaabbc42034         48  std::__invoke<>()
    @     0xaaaabbc3fbac         48  std::__apply_impl<>()
    @     0xaaaabbc3fc38         48  std::apply<>()
    @     0xaaaabbc3fe50         96  util::fb2::detail::WorkerFiberImpl<>::run_()

@dranikpg dranikpg merged commit 3fd4e27 into dragonflydb:main May 1, 2023
6 checks passed
@dranikpg dranikpg deleted the fix-replica-bug branch May 28, 2023 09:17

3 participants