This should never happen ... data_receiver.pony line 204 #3053

Closed
slfritchie opened this issue Nov 8, 2019 · 4 comments

@slfritchie
Contributor

Is this a bug, feature request, or feedback?

Bug

What is the current behavior?

Intermittent failure during crash & restart of both workers + connector sink in a 2-worker cluster:

1573183454.396979,Process Barrier CheckpointRollbackBarrierToken(Rollback 35, Checkpoint 41) at Step 118220738956116809284192365394298960305 from 153334998190390531865055671443648537309
1573183454.396995,DataReceiver: forward_barrier to step (or step group) 62968662664975620516753261006049468147 from 162316623258918752529117274718862584432 -> seq id 1, last_seen: 0
1573183454.397002,Rcvd pipeline msg at DataReceiver
1573183454.397008,NONE,none,seq_id 450 _last_id_seen 1
This should never happen: failure in /home/vagrant/wallaroo/lib/wallaroo/core/data_receiver/data_receiver.pony at line 204

The second-to-last log message is emitted by this small patch: https://gist.github.com/slfritchie/652d3765e0fb4cd6ad59b55899014f81 (the patch also shifts the line number reported by the Fail() statement).

What is the expected behavior?

No crash

What OS and version of Wallaroo are you using?

Ubuntu 16.04/Xenial + master branch at commit 4323b71 + connector-sink-fixes branch/PR #3042

Steps to reproduce?

$ export WALLAROO_TOP=/your/path/to/top/of/wallaroo/repo
$ cd $WALLAROO_TOP
$ dd if=$HOME/wallaroo/testing/data/market_spread/nbbo/r3k-symbols_nbbo-fixish.msg bs=1000000 count=1 | od -x | sed 's/^/T/' | sed -n '1,/T3641060/p' | perl -ne 'print "\0\0\0"; print "1"; print' > /tmp/input-file.txt
$ cd testing/correctness/scripts/effectively-once
$ export WALLAROO_THRESHOLDS='*.8'
$ export WALLAROO_BIN=$WALLAROO_TOP/examples/pony/passthrough/passthrough
$ . ./sample-env-vars.sh
$ make -C ../../../.. PONYCFLAGS="--verbose=1 -d -Dresilience -Dtrace -Dcheckpoint_trace -Didentify_routing_ids" build-examples-pony-passthrough build-testing-tools-external_sender build-utils-cluster_shrinker build-utils-data_receiver build-testing-tools-fixed_length_message_blaster
$ ./master-crasher.sh 2 no-sanity crash0 crash1 crash-sink
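
For readers unfamiliar with the `dd | od | sed | perl` pipeline above, here is a minimal Python sketch of what it appears to produce. The interpretation is an assumption, not something the issue states: each `od -x` line is 47 characters, so with the `T` prefix and the trailing newline it is 49 bytes, and the four bytes that `perl` prepends (`\0\0\0` followed by the character `1`, ASCII 0x31 = 49) read like a 4-byte big-endian length header. The names `od_x_line` and `frame` are hypothetical, and the sketch assumes a little-endian host and input without repeated 16-byte rows (which `od` would collapse into `*`).

```python
import struct


def od_x_line(offset: int, chunk: bytes) -> str:
    """Format one 16-byte row roughly as `od -x` does on a little-endian host:
    a 7-digit octal offset followed by eight 4-hex-digit words."""
    words = struct.unpack("<8H", chunk)
    return "%07o" % offset + "".join(" %04x" % w for w in words)


def frame(data: bytes) -> bytes:
    """Emulate the pipeline for full 16-byte rows only.

    Each record is the literal bytes 00 00 00 '1' followed by 'T' + od row + newline.
    Reading those four bytes as a length header is an assumption.
    """
    out = bytearray()
    full = len(data) - len(data) % 16  # drop any trailing partial row
    for off in range(0, full, 16):
        payload = ("T" + od_x_line(off, data[off:off + 16]) + "\n").encode("ascii")
        out += b"\0\0\0" + b"1" + payload
    return bytes(out)


if __name__ == "__main__":
    framed = frame(bytes(range(32)))
    header_len = struct.unpack(">I", framed[:4])[0]
    # The header value and the payload length should agree for every record.
    print(header_len, len(framed[4:4 + header_len]))
```

The `sed -n '1,/T3641060/p'` step fits this reading: octal 3641060 is byte offset 999984, the last full 16-byte row of the 1,000,000 bytes read by `dd`, so the final offset-only line that `od` prints is dropped and every record keeps the same length.
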
@slfritchie
Contributor Author

@jtfmumm
Contributor

jtfmumm commented Nov 8, 2019 via email

@slfritchie
Contributor Author

The gist mentioned in the description is the small patch that shifts the reported line numbers:

diff --git a/lib/wallaroo/core/boundary/boundary.pony b/lib/wallaroo/core/boundary/boundary.pony
index 6c947cbd3..46778acfd 100644
--- a/lib/wallaroo/core/boundary/boundary.pony
+++ b/lib/wallaroo/core/boundary/boundary.pony
@@ -352,6 +352,9 @@ actor OutgoingBoundary is (Consumer & TCPActor)
 
   fun ref receive_ack(acked_seq_id: SeqId) =>
     ifdef debug then
+      if not (acked_seq_id > _lowest_queue_id) then
+        @printf[I32]("BUMMER: receive_ack: acked_seq_id %s _lowest_queue_id %s\n".cstring(), acked_seq_id.string().cstring(), _lowest_queue_id.string().cstring()) /////// SLF TODO NEW BUG, make new GH ticket?
+      end
       Invariant(acked_seq_id > _lowest_queue_id)
     end

@slfritchie
Contributor Author

I've messed up the description of this problem, sorry, closing.
