Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flusher target to flush WAL #2075

Merged
merged 10 commits into from Mar 10, 2020
Merged

Conversation

codesome
Copy link
Contributor

@codesome codesome commented Feb 4, 2020

This is part of the WAL work where during scaling down if the /shutdown endpoint was not hit to flush the chunks, this flusher target would be used as a job to flush those chunks.

Supersedes #1747

I have already tested and according to the logs I can say that it is being flushed properly. (I was not able to get the metrics, investigating it)

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@codesome codesome mentioned this pull request Feb 4, 2020
@codesome codesome force-pushed the flusher-target branch 3 times, most recently from a86f05b to 88fca44 Compare February 4, 2020 13:07
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're exposing a bunch of stuff and changing a ton of things. Whats the downside of just exposing ingester.sweepUsers?

CHANGELOG.md Outdated Show resolved Hide resolved

// Sleeping to give a chance to Prometheus
// to collect the metrics.
time.Sleep(1 * time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a flag to control this duration already. Can we reuse it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sleep comes inside lifecycler. We won't initialize lifecycler here.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@codesome codesome force-pushed the flusher-target branch 6 times, most recently from 4b4c64f to 9d1f38d Compare February 14, 2020 10:13
@codesome
Copy link
Contributor Author

I have simplified this quite a bit to not reimplement existing stuff. I have tested it and seems to be working (from the logs). Flushing happens before starting the server (in the init phase), so, we are not able to capture any metrics. I am waiting for #2119 to get in so that I can start the flushing in parallel with starting the server.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@gouthamve
Copy link
Contributor

@codesome Can you rebase and make it a service now that #2166 is merged?

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@codesome codesome force-pushed the flusher-target branch 4 times, most recently from 2abf660 to f94fd65 Compare March 5, 2020 17:34
@codesome
Copy link
Contributor Author

codesome commented Mar 5, 2020

@gouthamve rebased and now the metrics are visible with the async start from the new services.

@pstibrany I have made some changes in the services for the flusher target to work as a job.

Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, we still block during WAL replay and might be running the metrics endpoint. Can you double check?

pkg/util/services/basic_service.go Outdated Show resolved Hide resolved
pkg/flusher/flusher.go Outdated Show resolved Hide resolved
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
pkg/cortex/modules.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/flusher/flusher.go Outdated Show resolved Hide resolved
pkg/flusher/flusher.go Outdated Show resolved Hide resolved
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@codesome
Copy link
Contributor Author

@pstibrany addressed all your comments. Also, I put the replay of WAL in the starting function to be in sync with #2222.

@pstibrany
Copy link
Contributor

Please rebase on master. Cortex now returns error from Run method in cortex.go if any service has failed, but it should ignore util.ErrStopCortex while doing so.

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for addressing my feedback. I've left some non-blocking comments.

pkg/cortex/modules.go Outdated Show resolved Hide resolved
pkg/flusher/flusher.go Outdated Show resolved Hide resolved
pkg/flusher/flusher.go Outdated Show resolved Hide resolved
pkg/flusher/flusher.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@pstibrany pstibrany merged commit 4f5c545 into cortexproject:master Mar 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants