Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible issues with IST donors going back to Synced too soon #106

Closed
jayjanssen opened this issue Sep 3, 2014 · 18 comments
Closed

Possible issues with IST donors going back to Synced too soon #106

jayjanssen opened this issue Sep 3, 2014 · 18 comments

Comments

@jayjanssen
Copy link

I noticed an IST donor only stayed in the Donor state for a very brief period, but the IST took much longer in this log:

2014-09-03 17:44:31 3263 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 184675)
2014-09-03 17:44:31 3263 [Note] WSREP: Member 0.0 (node1) synced with group.
2014-09-03 17:44:31 3263 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 184675)
2014-09-03 17:44:32 3263 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.70.4:4444/xtrabackup_sst' --auth 'sst:secret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf'   '' --gtid '6841ce1a-3362-11e4-8898-466a773de7c8:162353' --bypass'
2014-09-03 17:44:32 3263 [Note] WSREP: sst_donor_thread signaled with 0
2014-09-03 17:44:32 3263 [Note] WSREP: (6836b14c, 'tcp://0.0.0.0:4567') turning message relay requesting off
2014-09-03 17:44:32 3263 [Note] WSREP: Synchronized with group, ready for connections
2014-09-03 17:44:32 3263 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-09-03 17:44:32 3263 [Note] WSREP: async IST sender starting to serve tcp://192.168.70.4:4568 sending 162354-184650
WSREP_SST: [INFO] Streaming with xbstream (20140903 17:44:33.322)
WSREP_SST: [INFO] Using socat as streamer (20140903 17:44:33.326)
WSREP_SST: [INFO] Bypassing the SST for IST (20140903 17:44:34.792)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} ${IST_FILE} | socat -u stdio TCP:192.168.70.4:4444; RC=( ${PIPESTATUS[@]} ) (20140903 17:44:34.817)
WSREP_SST: [INFO] Total time on donor: 0 seconds (20140903 17:44:34.967)
2014-09-03 17:44:34 3263 [ERROR] WSREP: sst sent called when not SST donor, state SYNCED
WSREP_SST: [INFO] Cleaning up temporary directories (20140903 17:44:35.035)
2014-09-03 17:44:55 3263 [Warning] WSREP: Protocol violation. JOIN message sender 0.0 (node1) is not in state transfer (SYNCED). Message ignored.
2014-09-03 17:44:55 3263 [Note] WSREP: async IST sender served
2014-09-03 17:44:55 3263 [Note] WSREP: 1.0 (node3): State transfer from 0.0 (node1) complete.
2014-09-03 17:44:56 3263 [Note] WSREP: Member 1.0 (node3) synced with group.

My concerns are:

  1. The error in the log
  2. If it is safe for such a Donor to be in the middle of an IST if another Joiner decides it needs a state transfer from the same node.
@ayurchen
Copy link
Member

ayurchen commented Sep 3, 2014

Looks like a leftover from the recent intelligent node selection refactoring. Node tries to join two times.

@ayurchen ayurchen added the bug label Sep 3, 2014
@ayurchen ayurchen added this to the 3.7 milestone Sep 3, 2014
@dirtysalt
Copy link
Contributor

I think this issue is introduced here c2b489f#diff-3500af29aec4b4fd6a26e6e9194792c0R374

About (2), it's safe. A node could serve IST for several nodes, but could only serve SST for one node. And that's why we add gcs_.join(donor_seq) (to switch to SYNCED state ASAP, so we can serve for other nodes) there if it does IST.

I think the whole process is safe. It's a optimization. Our intention is to offer more donors by instructing node goes to SYNCED state ASAP if it does IST. But seems it breaks some contracts.

@ayurchen I think maybe it's better to delete this line because perhaps it's just a premature optimization. What do you think ?

@dirtysalt
Copy link
Contributor

About (1), a possible scenario:

  • node2 and node3 want to join to node1 at the same time. they both need IST.
  • node1 responses to node2 first, calls gcs_.join(donor_seq), then does IST to node2.
  • node1 goes to SYNCED state.
2014-09-03 17:44:31 3263 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 184675)
2014-09-03 17:44:31 3263 [Note] WSREP: Member 0.0 (node1) synced with group.
2014-09-03 17:44:31 3263 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 184675)
  • node1 responses to node3 then, calls gcs_.join(donor_seq), then does IST to node3. But unfortunately when SST is finished, node1 is already in SYNCED state.
2014-09-03 17:44:34 3263 [ERROR] WSREP: sst sent called when not SST donor, state SYNCED

@temeo
Copy link
Contributor

temeo commented Sep 5, 2014

Maybe related #101.

@ayurchen
Copy link
Member

ayurchen commented Sep 5, 2014

Yan, To be frank I never got to the bottom of those lines and comments there. And now it still seems unclear. But the thing is that in the case of IST the node should join ASAP - i.e. right after it does empty SST. And should not try to join afterwards.

Check line replicator_smm.cpp:1182. I think the whole ist_sst_ member should be disposed of. It was a bad workaround for something long ago. In any case, be it IST or SST, the donor still does SST and always calls sst_sent() - and that's where it should call gcs_.join(). So what we need IMO:

  • get rid of ist_sst_
  • remove the line you pointed to
  • remove calling gcs_.join() from IST code: galera::ist::AsyncSenderMap::remove()

@dirtysalt
Copy link
Contributor

In any case, be it IST or SST, the donor still does SST and always calls sst_sent() - and that's where it should call gcs_.join().

Yes, my point is, in normal process, gcs_.join() should be called only when sst_sent.

dirtysalt added a commit that referenced this issue Sep 17, 2014
…ter when it exits. it is introduced because sometimes sst_sent may not be called.
dirtysalt added a commit that referenced this issue Oct 1, 2014
…IST or SST for only one node, which is not we expect
dirtysalt added a commit that referenced this issue Oct 1, 2014
@temeo
Copy link
Contributor

temeo commented Oct 2, 2014

Fix merged to 3.x, closing.

@dirtysalt
Copy link
Contributor

General Rule: normally, gcs_join should be called in sst_sent function, which is called when sst is done. But there is an exception, that if we just do IST, then we can call gcs_join right after we fork IST thread. In that way, one node could serve IST for serval nodes. (see ReplicatorSMM::process_state_req in replicator_str.cpp)

There are some possible error cases.

  1. SST fails. (at slave thread)
  2. IST fails when connecting to receiver. (at slave thread)
  3. IST fails when sending trxs. (at IST thread)

About case 1, if SST fails, then sst_donate_cb_ returns negative code, gcs_join will be called immediately at the exit of process_state_req. (so either joiner or donor don't hang)

About case 2, if IST fails when connecting to receiver. Sender connects to receiver at Sender's ctor. So if IST fails when connecting to receiver, then ist_senders_.run will fail too, and gcs_join will be called immediately at the exit of process_state_req too. (so either joiner or donor don't hang)

About case 3, if IST fails when sending trxs. Joiner will detect error at Receiver::run(), and will escalate this error to ReplicatorSMM::recv_IST. So at last joiner will be aware of IST failure. Donor side will detect error too at run_async_sender (ist.cpp), but unfortunately it does not call gcs_join, which will cause donor side will stay at DONOR state forever instead of changing to JOINED/SYNCED state.

So to fix case 3 but don't break general rule, we need to call gcs_join in IST thread, but only handling error cases (to help donor change back to JOINED/SYNCED state)

@ayurchen
Copy link
Member

ayurchen commented Oct 6, 2014

Yan, I think you're wrong here. Precisely to allow the donor to serve several ISTs at once it should not be in DONOR state when IST is served. So it should join right after SST is done. That is:

  1. either in sst_sent() (if SST was served)
  2. at the end of process_state_req() (if SST request was empty, i.e. joiner was not waiting for SST and SST was not served)

So this is all there is.

The issue of the JOINER hanging in case IST fails (at whatever stage) should be reported and fixed separately, as it affects only joiners and only in case of misconfiguration.

@dirtysalt
Copy link
Contributor

@ayurchen

either in sst_sent() (if SST was served)
at the end of process_state_req() (if SST request was empty, i.e. joiner was not
waiting for SST and SST was not served)

Right. And that's current implementation

The issue of the JOINER hanging in case IST fails (at whatever stage) should be
reported and fixed separately, as it affects only joiners and only in case of
misconfiguration.

JOINER won't hang, but DONOR will. If donor fails to send trxs(for unknown reason) when only doing IST(so SST request is empty), it will be in DONOR state forever and can not serve state transfer any more. The commit b83db28 is to fix that problem. (call gcs_join to change itself back to SYNCED state)

@philip-galera
Copy link
Contributor

I have an MTR test case for this bug. Please let me know when I can push it.

@ayurchen
Copy link
Member

ayurchen commented Oct 7, 2014

Yan,

Right. And that's current implementation

But

The commit b83db28 is to fix that problem. (call gcs_join to change itself back to SYNCED state)

calls gcs_.join() not from process_state_req(), but from IST code. And it seems to happen AFTER IST is complete. Please correct me if I'm wrong.

@dirtysalt
Copy link
Contributor

Alex, I have written the general rule and error handling of state transfer in this comment #106 (comment).

calls gcs_.join() not from process_state_req(), but from IST code. And it seems to happen AFTER IST is complete. Please correct me if I'm wrong.

if we call gcs_join after IST is complete, then donor could serve IST for only one node, the node will stay at DONOR state until IST completes. So we have to call gcs_join after IST thread has been forked(that's gcs_join in process_state_req), instead of IST completes.

@dirtysalt
Copy link
Contributor

Alex, sorry, I must misunderstand you.

It does call gcs_.join in process_state_req(). The reason you don't see it is because gh106 has been merged into 3.x before. You can see the whole gh106 tree here. https://github.com/codership/galera/blob/gh106/galera/src/replicator_str.cpp#L307

@ayurchen
Copy link
Member

ayurchen commented Oct 8, 2014

Yes, so my concern is this method:

void galera::ist::AsyncSenderMap::remove(AsyncSender* as, wsrep_seqno_t seqno)
{
    gu::Critical crit(monitor_);
    std::set<AsyncSender*>::iterator i(senders_.find(as));
    if (i == senders_.end())
    {
        throw gu::NotFound();
    }
    senders_.erase(i);
    if (seqno < 0) {
        gcs_.join(seqno);
    }
}

why is it calling gcs_.join()?

@dirtysalt
Copy link
Contributor

ok. seems I'm wrong. The reason why calls gcs_.join in IST is because of following case:

We are gonna do IST and SST is not empty, so we expect gcs_.join is called in sst_sent to join back to cluster. If some exception happens when we are transferring trxs, I worry that sst_sent won't be called. So gcs_.join won't be called either, and donor node can not be changed back to 'SYNCED' state.

But seems I'm wrong. If we are gonna do IST and SST is not empty, sst_donate_cb_ is called with by_pass == true. And actually if by_pass == true, sst_sent will be called definitely. So node will be chnged back to 'SYNCED' state definitely, and will be not affected the case that some exception happens when transferring trxs.

@ayurchen
Copy link
Member

ayurchen commented Oct 9, 2014

Yan, I think you're still looking at it from the wrong perspective. The
matter is simple: to serve multiple ISTs at once, donor must be in
JOINED/SYNCED state while doing that. Therefore, sending JOIN after
IST is incorrect. In any case donor must send JOIN way before that.

And so far cases are 4

  1. donor sends full SST
  2. donor sends bypass SST
  3. donor sends no SST (empty SST request)
  4. some sort of error while processing STR

In the first two cases, sst_sent() will be called and send JOIN

In the last two cases JOIN must be sent from process_state_req()

IST code should not be involved in this in any way.

On 2014-10-09 04:37, yan.zhang wrote:

ok. seems I'm wrong. The reason why calls gcs_.join in IST is
because of following case:

We are gonna do IST and SST is not empty, so we expect gcs_.join
is called in sst_sent to join back to cluster. If some exception
happens when we are transferring trxs, I worry that sst_sent
won't be called. So gcs_.join won't be called either, and donor
node can not be changed back to 'SYNCED' state.

But seems I'm wrong. If we are gonna do IST and SST is not empty,
sst_donate_cb_ is called with by_pass == true. And
actually if by_pass == true, sst_sent will be called
definitely. So node will be chnged back to 'SYNCED' state definitely,
and will be not affected the case that some exception happens when
transferring trxs.


Reply to this email directly or view it on GitHub:
#106 (comment)

@dirtysalt
Copy link
Contributor

OK. I was too focusing on details. But you are right. Thanks for explanation.

janlindstrom pushed a commit to MariaDB/galera that referenced this issue Oct 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants