Flusher target to flush WAL #2075
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
We're exposing a bunch of stuff and changing a ton of things. What's the downside of just exposing ingester.sweepUsers?
pkg/flusher/flusher.go
Outdated
// Sleeping to give a chance to Prometheus
// to collect the metrics.
time.Sleep(1 * time.Minute)
We have a flag to control this duration already. Can we reuse it?
That sleep comes inside lifecycler. We won't initialize lifecycler here.
I have simplified this quite a bit so that it does not reimplement existing functionality. I have tested it and, judging from the logs, it seems to work. Flushing happens before the server starts (in the init phase), so we cannot capture any metrics. I am waiting for #2119 to be merged so that I can run the flushing in parallel with starting the server.
@gouthamve rebased, and the metrics are now visible thanks to the async start from the new services. @pstibrany I have made some changes in the services so that the flusher target can work as a job.
From what I understand, we still block during WAL replay and might be running the metrics endpoint. Can you double check?
@pstibrany addressed all your comments. I also moved the WAL replay into the starting function to be in sync with #2222.
Please rebase on master. Cortex now returns error from |
Looks good, thanks for addressing my feedback. I've left some non-blocking comments.
👍
This is part of the WAL work: if the /shutdown endpoint was not hit to flush the chunks during a scale-down, this flusher target can be run as a job to flush those chunks. Supersedes #1747
I have tested this, and the logs show that the chunks are flushed properly. (I was not able to get the metrics; I am investigating.)
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]