Invalid call to checkpoint_complete on _CpRbOps state _CpRbWaitingForCheckpoint #3123

Open
slfritchie opened this issue Apr 20, 2020 · 0 comments

slfritchie commented Apr 20, 2020

Is this a bug, feature request, or feedback?

Bug

What is the current behavior?

When Wallaroo is compiled with resilience=on and uses a connector sink, starting Wallaroo before the sink is accepting connections causes Wallaroo to crash at its second checkpoint. The log leading up to the crash:

1587415964.414633,DEBUG,ConnectorSink,ECTransition:: _ExtConnConnected -> _ExtConnTwoPCReady
1587415964.414637,DEBUG,2PC,_ExtConnTwoPCReady _advertise_status true
1587415964.414641,DEBUG,ConnectorSink,Send conn_ready to CpRb
1587415964.414646,DEBUG,ConnectorSink,_CpRbInit.conn_ready: acking CheckpointBarrierToken(2)
1587415964.414651,DEBUG,ConnectorSink,Send commit to barrier coordinator for CheckpointBarrierToken(2)
1587415964.414659,DEBUG,ConnectorSink,CpRbTransition:: _CpRbInit -> _CpRbWaitingForCheckpoint
1587415964.414663,DEBUG,ConnectorSink,open_valve
1587415964.414667,DEBUG,ConnectorSink,open_valve: already open
1587415964.414671,DEBUG,ConnectorSink,open_valve: size = 0
1587415964.414675,DEBUG,ConnectorSink,CpRbTransition:: _CpRbInit -> _CpRbWaitingForCheckpoint DONE
1587415964.414714,Checkpoint_Initiator: Checkpoint Barrier CheckpointBarrierToken(2) Complete
1587415964.414719,CheckpointInitiator: event_log_write_checkpoint_id()
1587415964.414793,EventLog: write_checkpoint_id 2!!
1587415964.414802,EventLog: Writing CheckpointId for CheckpointId 2
1587415964.414823,Transition to _WaitingForCheckpointInitiationEventLogPhase: checkpoint_id 3
1587415964.414840,_WaitingForEventLogIdWrittenPhase: event_log_id_written from initializer for 2
1587415964.415090,_WaitingForEventLogIdWrittenPhase: acked_workers: 1, workers: 1
1587415964.415100,CheckpointInitiator: Checkpoint 2 is complete!
1587415964.415104,Saving current CheckpointId 2 last complete CheckpointId 2 RollbackId 0
1587415964.415764,Fulfilling checkpoint promise
1587415964.415856,DEBUG,2PC,2PC: Checkpoint complete 2 _twopc.txn_id is 
1587415964.415864,CRITICAL,ConnectorSink,Invalid call to checkpoint_complete on _CpRbOps state _CpRbWaitingForCheckpoint
This should never happen: failure in /home/vagrant/wallaroo/lib/wallaroo/core/sink/connector_sink/connector_sink_checkpoint_rollback.pony at line 48
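
To make the sequence easier to follow in a full debug log, the sink's state transitions and the fatal line can be filtered out of the surrounding noise. The following is a minimal sketch, not part of Wallaroo: `filter_sink_log` is a hypothetical helper name, and the patterns are taken only from the log lines shown above, so they may need adjusting for other log output.

```shell
# Hypothetical helper: read a Wallaroo debug log on stdin and print only
# the connector sink state-machine transitions (ECTransition / CpRbTransition)
# and any CRITICAL lines. Patterns are based on the excerpt in this issue.
filter_sink_log() {
    grep -E 'ECTransition::|CpRbTransition::|,CRITICAL,'
}
```

Usage, assuming the initializer's output was captured to a file of your choosing: `filter_sink_log < wallaroo.log`. For the excerpt above, this should show the `_CpRbInit -> _CpRbWaitingForCheckpoint` transition followed by the CRITICAL `checkpoint_complete` line.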

What is the expected behavior?

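No crash: Wallaroo should keep running when the connector sink starts accepting connections after Wallaroo has started.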

What OS and version of Wallaroo are you using?

Ubuntu 18.04/Bionic LTS + master branch @ commit 9b86161

Steps to reproduce?

cd testing/correctness/scripts/effectively-once
export WALLAROO_TOP=$HOME/wallaroo ; # Edit to fit your environment
export WALLAROO_BIN=$WALLAROO_TOP/examples/pony/passthrough/passthrough
export PATH=${PATH}:.
export PYTHONPATH=$WALLAROO_TOP/machida/lib
export WALLAROO_THRESHOLDS='*.8' # Turns on verbose logging @ debug level
make -C ../../../.. \
    PONYCFLAGS="--verbose=1 -d -Dresilience -Dtrace -Dcheckpoint_trace -Didentify_routing_ids" \
    build-examples-pony-passthrough build-testing-tools-external_sender \
    build-utils-cluster_shrinker build-utils-data_receiver \
    build-testing-tools-fixed_length_message_blaster

. ./sample-env-vars.sh
./reset.sh
./start-initializer.sh -n 1
sleep 3
$WALLAROO_TOP/testing/correctness/tests/aloc_sink/aloc_sink /tmp/sink-out.output /tmp/sink-out.abort 7200 > /tmp/sink-out.stdout-stderr 2>&1 &