Possible issues with IST donors going back to Synced too soon #106

jayjanssen · 2014-09-03T18:06:45Z

I noticed an IST donor only stayed in the Donor state for a very brief period, but the IST took much longer in this log:

2014-09-03 17:44:31 3263 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 184675)
2014-09-03 17:44:31 3263 [Note] WSREP: Member 0.0 (node1) synced with group.
2014-09-03 17:44:31 3263 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 184675)
2014-09-03 17:44:32 3263 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.70.4:4444/xtrabackup_sst' --auth 'sst:secret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf'   '' --gtid '6841ce1a-3362-11e4-8898-466a773de7c8:162353' --bypass'
2014-09-03 17:44:32 3263 [Note] WSREP: sst_donor_thread signaled with 0
2014-09-03 17:44:32 3263 [Note] WSREP: (6836b14c, 'tcp://0.0.0.0:4567') turning message relay requesting off
2014-09-03 17:44:32 3263 [Note] WSREP: Synchronized with group, ready for connections
2014-09-03 17:44:32 3263 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-09-03 17:44:32 3263 [Note] WSREP: async IST sender starting to serve tcp://192.168.70.4:4568 sending 162354-184650
WSREP_SST: [INFO] Streaming with xbstream (20140903 17:44:33.322)
WSREP_SST: [INFO] Using socat as streamer (20140903 17:44:33.326)
WSREP_SST: [INFO] Bypassing the SST for IST (20140903 17:44:34.792)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} ${IST_FILE} | socat -u stdio TCP:192.168.70.4:4444; RC=( ${PIPESTATUS[@]} ) (20140903 17:44:34.817)
WSREP_SST: [INFO] Total time on donor: 0 seconds (20140903 17:44:34.967)
2014-09-03 17:44:34 3263 [ERROR] WSREP: sst sent called when not SST donor, state SYNCED
WSREP_SST: [INFO] Cleaning up temporary directories (20140903 17:44:35.035)
2014-09-03 17:44:55 3263 [Warning] WSREP: Protocol violation. JOIN message sender 0.0 (node1) is not in state transfer (SYNCED). Message ignored.
2014-09-03 17:44:55 3263 [Note] WSREP: async IST sender served
2014-09-03 17:44:55 3263 [Note] WSREP: 1.0 (node3): State transfer from 0.0 (node1) complete.
2014-09-03 17:44:56 3263 [Note] WSREP: Member 1.0 (node3) synced with group.

My concerns are:

The error in the log
If it is safe for such a Donor to be in the middle of an IST if another Joiner decides it needs a state transfer from the same node.

The text was updated successfully, but these errors were encountered:

ayurchen · 2014-09-03T19:30:26Z

Looks like a leftover from the recent intelligent node selection refactoring. Node tries to join two times.

dirtysalt · 2014-09-05T09:44:57Z

I think this issue is introduced here c2b489f#diff-3500af29aec4b4fd6a26e6e9194792c0R374

About (2), it's safe. A node could serve IST for several nodes, but could only serve SST for one node. And that's why we add gcs_.join(donor_seq) (to switch to SYNCED state ASAP, so we can serve for other nodes) there if it does IST.

I think the whole process is safe. It's a optimization. Our intention is to offer more donors by instructing node goes to SYNCED state ASAP if it does IST. But seems it breaks some contracts.

@ayurchen I think maybe it's better to delete this line because perhaps it's just a premature optimization. What do you think ?

dirtysalt · 2014-09-05T09:53:10Z

About (1), a possible scenario:

node2 and node3 want to join to node1 at the same time. they both need IST.
node1 responses to node2 first, calls gcs_.join(donor_seq), then does IST to node2.
node1 goes to SYNCED state.

2014-09-03 17:44:31 3263 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 184675)
2014-09-03 17:44:31 3263 [Note] WSREP: Member 0.0 (node1) synced with group.
2014-09-03 17:44:31 3263 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 184675)

node1 responses to node3 then, calls gcs_.join(donor_seq), then does IST to node3. But unfortunately when SST is finished, node1 is already in SYNCED state.

2014-09-03 17:44:34 3263 [ERROR] WSREP: sst sent called when not SST donor, state SYNCED

temeo · 2014-09-05T13:24:07Z

Maybe related #101.

ayurchen · 2014-09-05T15:54:17Z

Yan, To be frank I never got to the bottom of those lines and comments there. And now it still seems unclear. But the thing is that in the case of IST the node should join ASAP - i.e. right after it does empty SST. And should not try to join afterwards.

Check line replicator_smm.cpp:1182. I think the whole ist_sst_ member should be disposed of. It was a bad workaround for something long ago. In any case, be it IST or SST, the donor still does SST and always calls sst_sent() - and that's where it should call gcs_.join(). So what we need IMO:

get rid of ist_sst_
remove the line you pointed to
remove calling gcs_.join() from IST code: galera::ist::AsyncSenderMap::remove()

dirtysalt · 2014-09-07T05:17:15Z

In any case, be it IST or SST, the donor still does SST and always calls sst_sent() - and that's where it should call gcs_.join().

Yes, my point is, in normal process, gcs_.join() should be called only when sst_sent.

…ter when it exits. it is introduced because sometimes sst_sent may not be called.

…IST or SST for only one node, which is not we expect

temeo · 2014-10-02T10:46:21Z

Fix merged to 3.x, closing.

dirtysalt · 2014-10-04T14:26:55Z

General Rule: normally, gcs_join should be called in sst_sent function, which is called when sst is done. But there is an exception, that if we just do IST, then we can call gcs_join right after we fork IST thread. In that way, one node could serve IST for serval nodes. (see ReplicatorSMM::process_state_req in replicator_str.cpp)

There are some possible error cases.

SST fails. (at slave thread)
IST fails when connecting to receiver. (at slave thread)
IST fails when sending trxs. (at IST thread)

About case 1, if SST fails, then sst_donate_cb_ returns negative code, gcs_join will be called immediately at the exit of process_state_req. (so either joiner or donor don't hang)

About case 2, if IST fails when connecting to receiver. Sender connects to receiver at Sender's ctor. So if IST fails when connecting to receiver, then ist_senders_.run will fail too, and gcs_join will be called immediately at the exit of process_state_req too. (so either joiner or donor don't hang)

About case 3, if IST fails when sending trxs. Joiner will detect error at Receiver::run(), and will escalate this error to ReplicatorSMM::recv_IST. So at last joiner will be aware of IST failure. Donor side will detect error too at run_async_sender (ist.cpp), but unfortunately it does not call gcs_join, which will cause donor side will stay at DONOR state forever instead of changing to JOINED/SYNCED state.

So to fix case 3 but don't break general rule, we need to call gcs_join in IST thread, but only handling error cases (to help donor change back to JOINED/SYNCED state)

ayurchen · 2014-10-06T21:02:16Z

Yan, I think you're wrong here. Precisely to allow the donor to serve several ISTs at once it should not be in DONOR state when IST is served. So it should join right after SST is done. That is:

either in sst_sent() (if SST was served)
at the end of process_state_req() (if SST request was empty, i.e. joiner was not waiting for SST and SST was not served)

So this is all there is.

The issue of the JOINER hanging in case IST fails (at whatever stage) should be reported and fixed separately, as it affects only joiners and only in case of misconfiguration.

dirtysalt · 2014-10-07T01:56:42Z

@ayurchen

either in sst_sent() (if SST was served)
at the end of process_state_req() (if SST request was empty, i.e. joiner was not
waiting for SST and SST was not served)

Right. And that's current implementation

The issue of the JOINER hanging in case IST fails (at whatever stage) should be
reported and fixed separately, as it affects only joiners and only in case of
misconfiguration.

JOINER won't hang, but DONOR will. If donor fails to send trxs(for unknown reason) when only doing IST(so SST request is empty), it will be in DONOR state forever and can not serve state transfer any more. The commit b83db28 is to fix that problem. (call gcs_join to change itself back to SYNCED state)

philip-galera · 2014-10-07T10:38:00Z

I have an MTR test case for this bug. Please let me know when I can push it.

ayurchen · 2014-10-07T21:51:34Z

Yan,

Right. And that's current implementation

But

The commit b83db28 is to fix that problem. (call gcs_join to change itself back to SYNCED state)

calls gcs_.join() not from process_state_req(), but from IST code. And it seems to happen AFTER IST is complete. Please correct me if I'm wrong.

dirtysalt · 2014-10-08T06:02:51Z

Alex, I have written the general rule and error handling of state transfer in this comment #106 (comment).

calls gcs_.join() not from process_state_req(), but from IST code. And it seems to happen AFTER IST is complete. Please correct me if I'm wrong.

if we call gcs_join after IST is complete, then donor could serve IST for only one node, the node will stay at DONOR state until IST completes. So we have to call gcs_join after IST thread has been forked(that's gcs_join in process_state_req), instead of IST completes.

dirtysalt · 2014-10-08T13:44:54Z

Alex, sorry, I must misunderstand you.

It does call gcs_.join in process_state_req(). The reason you don't see it is because gh106 has been merged into 3.x before. You can see the whole gh106 tree here. https://github.com/codership/galera/blob/gh106/galera/src/replicator_str.cpp#L307

ayurchen · 2014-10-08T14:07:32Z

Yes, so my concern is this method:

void galera::ist::AsyncSenderMap::remove(AsyncSender* as, wsrep_seqno_t seqno)
{
    gu::Critical crit(monitor_);
    std::set<AsyncSender*>::iterator i(senders_.find(as));
    if (i == senders_.end())
    {
        throw gu::NotFound();
    }
    senders_.erase(i);
    if (seqno < 0) {
        gcs_.join(seqno);
    }
}

why is it calling gcs_.join()?

dirtysalt · 2014-10-09T01:37:01Z

ok. seems I'm wrong. The reason why calls gcs_.join in IST is because of following case:

We are gonna do IST and SST is not empty, so we expect gcs_.join is called in sst_sent to join back to cluster. If some exception happens when we are transferring trxs, I worry that sst_sent won't be called. So gcs_.join won't be called either, and donor node can not be changed back to 'SYNCED' state.

But seems I'm wrong. If we are gonna do IST and SST is not empty, sst_donate_cb_ is called with by_pass == true. And actually if by_pass == true, sst_sent will be called definitely. So node will be chnged back to 'SYNCED' state definitely, and will be not affected the case that some exception happens when transferring trxs.

ayurchen · 2014-10-09T06:29:33Z

Yan, I think you're still looking at it from the wrong perspective. The
matter is simple: to serve multiple ISTs at once, donor must be in
JOINED/SYNCED state while doing that. Therefore, sending JOIN after
IST is incorrect. In any case donor must send JOIN way before that.

And so far cases are 4

donor sends full SST
donor sends bypass SST
donor sends no SST (empty SST request)
some sort of error while processing STR

In the first two cases, sst_sent() will be called and send JOIN

In the last two cases JOIN must be sent from process_state_req()

IST code should not be involved in this in any way.

On 2014-10-09 04:37, yan.zhang wrote:

ok. seems I'm wrong. The reason why calls gcs_.join in IST is
because of following case:

We are gonna do IST and SST is not empty, so we expect gcs_.join
is called in sst_sent to join back to cluster. If some exception
happens when we are transferring trxs, I worry that sst_sent
won't be called. So gcs_.join won't be called either, and donor
node can not be changed back to 'SYNCED' state.

But seems I'm wrong. If we are gonna do IST and SST is not empty,
sst_donate_cb_ is called with by_pass == true. And
actually if by_pass == true, sst_sent will be called
definitely. So node will be chnged back to 'SYNCED' state definitely,
and will be not affected the case that some exception happens when
transferring trxs.

Reply to this email directly or view it on GitHub:
#106 (comment)

dirtysalt · 2014-10-09T06:52:15Z

OK. I was too focusing on details. But you are right. Thanks for explanation.

Refs codership/galera-features#101 - make sure service thread is flushed

ayurchen added the bug label Sep 3, 2014

ayurchen added this to the 3.7 milestone Sep 3, 2014

ayurchen assigned dirtysalt Sep 3, 2014

dirtysalt added the 2 - Working label Sep 15, 2014

dirtysalt mentioned this issue Sep 15, 2014

sst sent called when not SST donor, state SYNCED #82

Closed

dirtysalt added a commit that referenced this issue Sep 15, 2014

Refs #106: remove ist_sst_ and unnecessary gcs_.join

88a0928

dirtysalt added a commit that referenced this issue Sep 17, 2014

Refs #106: remove if condition. call gcs_.join anyway when sst is done

4a26f94

dirtysalt added a commit that referenced this issue Sep 17, 2014

Refs #106: add join flags to ist sender which means join back to clus…

70c690c

…ter when it exits. it is introduced because sometimes sst_sent may not be called.

dirtysalt added a commit that referenced this issue Oct 1, 2014

Refs #106: call gcs_join right after IST. otherwise node could serve …

cdd8255

…IST or SST for only one node, which is not we expect

dirtysalt added a commit that referenced this issue Oct 1, 2014

Refs #106: remove duplicate condition

34e21a5

temeo closed this as completed Oct 2, 2014

dirtysalt reopened this Oct 2, 2014

dirtysalt added a commit that referenced this issue Oct 4, 2014

Refs #106: call gcs_join in IST thread when seqno is negative

b83db28

dirtysalt added a commit that referenced this issue Oct 9, 2014

Refs #106: remove redundant gcs_join in IST

d1726d2

dirtysalt mentioned this issue Oct 15, 2014

gcs protocol violation with JOIN and SYNC messages #101

Open

dirtysalt closed this as completed Oct 25, 2014

jayjanssen unassigned dirtysalt Dec 10, 2014

janlindstrom pushed a commit to MariaDB/galera that referenced this issue Oct 13, 2018

Merge pull request codership#106 from codership/gh101

fb4f24d

Refs codership/galera-features#101 - make sure service thread is flushed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible issues with IST donors going back to Synced too soon #106

Possible issues with IST donors going back to Synced too soon #106

jayjanssen commented Sep 3, 2014

ayurchen commented Sep 3, 2014

dirtysalt commented Sep 5, 2014

dirtysalt commented Sep 5, 2014

temeo commented Sep 5, 2014

ayurchen commented Sep 5, 2014

dirtysalt commented Sep 7, 2014

temeo commented Oct 2, 2014

dirtysalt commented Oct 4, 2014

ayurchen commented Oct 6, 2014

dirtysalt commented Oct 7, 2014

philip-galera commented Oct 7, 2014

ayurchen commented Oct 7, 2014

dirtysalt commented Oct 8, 2014

dirtysalt commented Oct 8, 2014

ayurchen commented Oct 8, 2014

dirtysalt commented Oct 9, 2014

ayurchen commented Oct 9, 2014

dirtysalt commented Oct 9, 2014

Possible issues with IST donors going back to Synced too soon #106

Possible issues with IST donors going back to Synced too soon #106

Comments

jayjanssen commented Sep 3, 2014

ayurchen commented Sep 3, 2014

dirtysalt commented Sep 5, 2014

dirtysalt commented Sep 5, 2014

temeo commented Sep 5, 2014

ayurchen commented Sep 5, 2014

dirtysalt commented Sep 7, 2014

temeo commented Oct 2, 2014

dirtysalt commented Oct 4, 2014

ayurchen commented Oct 6, 2014

dirtysalt commented Oct 7, 2014

philip-galera commented Oct 7, 2014

ayurchen commented Oct 7, 2014

dirtysalt commented Oct 8, 2014

dirtysalt commented Oct 8, 2014

ayurchen commented Oct 8, 2014

dirtysalt commented Oct 9, 2014

ayurchen commented Oct 9, 2014

dirtysalt commented Oct 9, 2014