Improve feedback when a partition is starting #9963

Closed
npepinpe opened this issue Aug 3, 2022 · 3 comments · Fixed by #10707
Labels
kind/toil: Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
version:8.2.0-alpha1: Marks an issue as being completely or in parts released in 8.2.0-alpha1
version:8.2.0: Marks an issue as being completely or in parts released in 8.2.0

Comments

@npepinpe
Member

npepinpe commented Aug 3, 2022

Description

If a follower is far behind the leader, starting the Raft server may take a long time. Right now, the startup simply logs "Raft server is waiting to be READY" and provides no further feedback.

We should provide more feedback here to let users know something is happening, or at least point them to where they can find out what is happening (e.g. the metrics dashboard).

Solution to be discussed in a kickoff.

npepinpe added the kind/toil label Aug 3, 2022
@menski
Contributor

menski commented Aug 5, 2022

Next step: discuss a possible solution, then reevaluate the priority and effort.

@deepthidevaki deepthidevaki self-assigned this Oct 12, 2022
@deepthidevaki
Contributor

I tried the following: until the server is ready, it logs the current commit index every 30 seconds. Here is an example:

2022-10-12 17:43:02.248 CEST zeebe RaftServer{raft-partition-partition-3} - Setting firstCommitIndex to 5112060. RaftServer is ready only after it has committed events upto this index
2022-10-12 17:43:02.250 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 4897865. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:43:32.250 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 4923669. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:44:02.254 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 4949424. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:44:32.255 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 4977335. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:45:02.288 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 5005669. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:45:32.291 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 5034895. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:46:02.292 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 5065788. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:46:32.292 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 5096333. RaftServer is ready only after it has committed events up to index 5112060
2022-10-12 17:46:47.973 CEST zeebe RaftServer{raft-partition-partition-3} - Commit index is 5112062. RaftServer is ready
2022-10-12 17:46:47.974 CEST zeebe RaftPartitionServer{raft-partition-partition-3} - Successfully started server for partition PartitionId{id=3, group=raft-partition} in 228674ms

The message is logged only when the commit index is updated. If no new events are committed, nothing is logged. In that case, we can manually increase the log level to debug to see why it is stuck.
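
For readers who want to see the idea in code, here is a minimal sketch of the time-based throttling described above. It is not the actual Zeebe change: the class name `ThrottledCommitLogger`, the `onCommit` method, and the injected clock and sink are hypothetical names chosen for illustration. Only the 30-second interval and the message wording are taken from the log excerpt.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not Zeebe's actual implementation. */
final class ThrottledCommitLogger {
  private final long targetIndex;     // index that must be committed before the server is ready
  private final long intervalMillis;  // minimum time between two progress messages
  private final LongSupplier clock;   // injected so the throttling can be tested
  private final Consumer<String> sink; // where messages go (a real logger in practice)
  private long lastLoggedAt;
  private boolean loggedOnce;
  private boolean ready;

  ThrottledCommitLogger(
      long targetIndex, long intervalMillis, LongSupplier clock, Consumer<String> sink) {
    this.targetIndex = targetIndex;
    this.intervalMillis = intervalMillis;
    this.clock = clock;
    this.sink = sink;
  }

  /** Assumed to be called only when the commit index advances, so a stuck server logs nothing. */
  void onCommit(long commitIndex) {
    if (ready) {
      return; // nothing more to report once the server is ready
    }
    if (commitIndex >= targetIndex) {
      ready = true;
      sink.accept("Commit index is " + commitIndex + ". RaftServer is ready");
      return;
    }
    long now = clock.getAsLong();
    // Log the first update immediately, then at most once per interval.
    if (!loggedOnce || now - lastLoggedAt >= intervalMillis) {
      loggedOnce = true;
      lastLoggedAt = now;
      sink.accept(
          "Commit index is " + commitIndex
              + ". RaftServer is ready only after it has committed events up to index "
              + targetIndex);
    }
  }

  public static void main(String[] args) {
    ThrottledCommitLogger logger =
        new ThrottledCommitLogger(
            5112060L, 30_000L, System::currentTimeMillis, System.out::println);
    logger.onCommit(4897865L); // first update is logged
    logger.onCommit(4897900L); // suppressed: less than 30 seconds since the last message
    logger.onCommit(5112062L); // target reached, readiness is logged regardless of the interval
  }
}
```

Injecting the clock keeps the throttling deterministic in tests. The readiness message bypasses the interval so the final state is never swallowed.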

Is this enough? Do you have other ideas?

@npepinpe
Member Author

No, I think it's a good idea :) I can also imagine making use of this technique in other places 👍 (either throttling by time or number of calls 🤷)
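
The alternative mentioned here, throttling by number of calls rather than by time, could be sketched roughly as follows. `CallCountThrottle` and `shouldLog` are hypothetical names used only for illustration and are not part of Zeebe.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper for illustration: lets one call through out of every N. */
final class CallCountThrottle {
  private final long every;
  private final AtomicLong calls = new AtomicLong();

  CallCountThrottle(long every) {
    if (every <= 0) {
      throw new IllegalArgumentException("every must be positive");
    }
    this.every = every;
  }

  /** Returns true on the first call and then on every 'every'-th call after it. */
  boolean shouldLog() {
    return calls.getAndIncrement() % every == 0;
  }

  public static void main(String[] args) {
    CallCountThrottle throttle = new CallCountThrottle(1000);
    for (long commitIndex = 1; commitIndex <= 2500; commitIndex++) {
      if (throttle.shouldLog()) {
        System.out.println("Commit index is " + commitIndex);
      }
    }
  }
}
```

Count-based throttling ties the log volume to the amount of work done instead of elapsed time, so it stays quiet while nothing happens but gives no regular heartbeat when progress is slow.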

korthout added the version:8.2.0-alpha1 label Nov 1, 2022
npepinpe added the version:8.2.0 label Apr 5, 2023

4 participants