-
Notifications
You must be signed in to change notification settings - Fork 597
Introduce stateful restorer in stmgr. #2091
Conversation
all checkpoint save/restore as well as exactly once messages
CHECK(stateful_restorer_); | ||
|
||
// Start the restore process | ||
stateful_restorer_->StartRestore(_checkpoint_id, _restore_txid, pplan_); | ||
} | ||
|
||
// Called by TmasterClient when it receives directive from tmaster | ||
// to restore the topology to _checkpoint_id checkpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment here is duplicated with the one in the above method. Could you update it to correctly describe the actual behavior of this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the comment
@@ -587,6 +610,11 @@ const proto::system::PhysicalPlan* StMgr::GetPhysicalPlan() const { return pplan | |||
|
|||
void StMgr::HandleStreamManagerData(const sp_string&, | |||
proto::stmgr::TupleStreamMessage2* _message) { | |||
if (stateful_restorer_ && stateful_restorer_->InProgress()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have some metrics for the amount of data being dropped for both stmgr data and instance data for future tracking and investigation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
// in case we are not in 2pc | ||
if (stateful_restorer_) { | ||
if (!stateful_restorer_->InProgress() && tmaster_client_) { | ||
LOG(INFO) << "We lost connection withi stmgr " << _stmgr_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/withi/with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
if (stateful_restorer_->InProgress()) { | ||
// We are in the middle of a restore | ||
stateful_restorer_->HandleAllInstancesConnected(); | ||
} else if (tmaster_client_ && tmaster_client_->IsConnected()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you elaborate why in this case the ResetTopologyState
message needs to be sent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the description
* Introduce stateful restorer in stmgr. This allows stmgr to handle all checkpoint save/restore as well as exactly once messages * Added metrics to keep track of bytes/tuples discarded during restore
This allows stmgr to handle all checkpoint save/restore as well as exactly once messages.