Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise alertmanager config loading #3898

Merged

Conversation

pracucci
Copy link
Contributor

@pracucci pracucci commented Mar 3, 2021

What this PR does:
There are two bottlenecks in the alertmanager configs loading from the object storage:

  1. Configs are loaded sequentially
  2. When alertmanager sharding is enabled, we load configs for all users, regardless they're owned by the alertmanager replica

This PR addresses both issues. To do it I had to refactor the AlertStore interface to split between the listing of users and loading of configs for the owned users:

  1. Removed ListAlertConfigs
  2. Added ListAllUsers and GetAlertConfigs (reason why I've added GetAlertConfigs instead of calling GetAlertConfig for every user is because with GetAlertConfigs the configdb implementation is optimised)
  3. Added concurrent fetching to GetAlertConfigs in the object store client implementations

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…tenantAlertmanager

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@@ -125,7 +123,7 @@ template_files:
}

am := &MultitenantAlertmanager{
store: noopAlertStore{},
store: prepareFilesystemAlertStore(t),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use an empty real storage than a mocked one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to use bucketclient.NewBucketAlertStore on in-memory bucket? No need to cleanup anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently don't allow to configure the in-memory bucket via the bucket.Config. Instead of doing changes to use it, I just used the filesystem-based one which is few lines more of code because of the temporarily directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to use bucket.Config, do we?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func prepareFilesystemAlertStore(t *testing.T) alertstore.AlertStore {
	obj := objstore.NewInMemBucket()
	return bucketclient.NewBucketAlertStore(obj, nil, log.NewNopLogger())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, done. I was smoking some good weed when I first replied you.


alertmanagers, err := am.ring.Get(ringHasher.Sum32(), RingOp, nil, nil, nil)
alertmanagers, err := am.ring.Get(shardByUser(userID), RingOp, nil, nil, nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have shardByUser(): let's use it.

ringHasher := fnv.New32a()
// Hasher never returns err.
_, _ = ringHasher.Write([]byte(userID))
func (am *MultitenantAlertmanager) isUserOwned(userID string) bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's easier to use if it can't return error and the error case is handled internally.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good, but I have doubts about using concurrency.ForEachUser in its current form.

pkg/alertmanager/alertstore/store.go Outdated Show resolved Hide resolved
@@ -125,7 +123,7 @@ template_files:
}

am := &MultitenantAlertmanager{
store: noopAlertStore{},
store: prepareFilesystemAlertStore(t),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to use bucketclient.NewBucketAlertStore on in-memory bucket? No need to cleanup anything.

cfgs = make(map[string]alertspb.AlertConfigDesc, len(userIDs))
)

err := concurrency.ForEachUser(ctx, userIDs, fetchConcurrency, func(ctx context.Context, userID string) error {
Copy link
Contributor

@pstibrany pstibrany Mar 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concurrency.ForEachUser continues fetching configs even if some user returns error. But this method returns nil in case of errors. There seems to be a mismatch.

I don't think using approach based on errgroup would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've created a generic utility concurrency.ForEach() which works with []interface{} and used it. What do you think?

pkg/alertmanager/alertstore/bucketclient/bucket_client.go Outdated Show resolved Hide resolved
cfgs = make(map[string]alertspb.AlertConfigDesc, len(userIDs))
)

err := concurrency.ForEachUser(ctx, userIDs, fetchConcurrency, func(ctx context.Context, userID string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about using concurrency.ForEachUser vs returning nil applies here.

CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! All my comments are nits and completely optional.

pkg/alertmanager/multitenant.go Outdated Show resolved Hide resolved
pkg/alertmanager/multitenant.go Show resolved Hide resolved
pkg/alertmanager/alertstore/store_test.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
… used it

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my feedback. Nice work.

@pracucci pracucci merged commit 2b94533 into cortexproject:master Mar 4, 2021
@pracucci pracucci deleted the optimise-alertmanager-config-loading branch March 4, 2021 08:28
roystchiang pushed a commit to roystchiang/cortex that referenced this pull request Apr 6, 2022
* Added ListAllUsers() to AlertStore

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added local store unit tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added GetAlertConfigs to AlertStore

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Replace ListAlertConfigs with ListAllUsers + GetAlertConfigs in MultitenantAlertmanager

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Removed unused ListAlertConfigs

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Replace noopAlertStore with a the filesystem-based storage

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Concurrently load alertmanager configs from object storage

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed PR number in CHANGELOG entry

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Addressed nits in reviews

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved unit tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Created concurrency.ForEach() utility which breaks on first error and used it

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Simplify alert store used in unit tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Simplified ForEachUser() and ForEach() utilities

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants