Optimise alertmanager config loading #3898
Conversation
pkg/alertmanager/api_test.go
```diff
@@ -125,7 +123,7 @@ template_files:
 	}

 	am := &MultitenantAlertmanager{
-		store: noopAlertStore{},
+		store: prepareFilesystemAlertStore(t),
```
Better to use an empty real storage than a mocked one.
Wouldn't it be simpler to use bucketclient.NewBucketAlertStore on an in-memory bucket? No need to clean up anything.
We currently don't allow configuring the in-memory bucket via bucket.Config. Instead of making changes to use it, I just used the filesystem-based one, which is a few more lines of code because of the temporary directory.
We don't need to use bucket.Config, do we?
```go
func prepareFilesystemAlertStore(t *testing.T) alertstore.AlertStore {
	obj := objstore.NewInMemBucket()
	return bucketclient.NewBucketAlertStore(obj, nil, log.NewNopLogger())
}
```
Yeah, done. I was smoking some good weed when I first replied to you.
```diff
-	alertmanagers, err := am.ring.Get(ringHasher.Sum32(), RingOp, nil, nil, nil)
+	alertmanagers, err := am.ring.Get(shardByUser(userID), RingOp, nil, nil, nil)
```
We have shardByUser(): let's use it.
```diff
-	ringHasher := fnv.New32a()
-	// Hasher never returns err.
-	_, _ = ringHasher.Write([]byte(userID))
+func (am *MultitenantAlertmanager) isUserOwned(userID string) bool {
```
It's easier to use if it can't return an error and the error case is handled internally.
Change looks good, but I have doubts about using concurrency.ForEachUser in its current form.
```diff
 	cfgs = make(map[string]alertspb.AlertConfigDesc, len(userIDs))
 )

+err := concurrency.ForEachUser(ctx, userIDs, fetchConcurrency, func(ctx context.Context, userID string) error {
```
concurrency.ForEachUser continues fetching configs even if some user returns an error, but this method returns nil in case of errors. There seems to be a mismatch. Wouldn't an approach based on errgroup, which stops on the first error, be better?
Good point. I've created a generic utility concurrency.ForEach() which works with []interface{} and used it. What do you think?
```diff
 	cfgs = make(map[string]alertspb.AlertConfigDesc, len(userIDs))
 )

+err := concurrency.ForEachUser(ctx, userIDs, fetchConcurrency, func(ctx context.Context, userID string) error {
```
Same comment about using concurrency.ForEachUser vs returning nil applies here.
LGTM! All my comments are nits and completely optional.
Thanks for addressing my feedback. Nice work.
* Added ListAllUsers() to AlertStore
* Added local store unit tests
* Added GetAlertConfigs to AlertStore
* Replace ListAlertConfigs with ListAllUsers + GetAlertConfigs in MultitenantAlertmanager
* Removed unused ListAlertConfigs
* Replace noopAlertStore with the filesystem-based storage
* Concurrently load alertmanager configs from object storage
* Added CHANGELOG entry
* Fixed PR number in CHANGELOG entry
* Fixed linter
* Addressed nits in reviews
* Improved unit tests
* Created concurrency.ForEach() utility which breaks on first error and used it
* Simplify alert store used in unit tests
* Simplified ForEachUser() and ForEach() utilities

Signed-off-by: Marco Pracucci <marco@pracucci.com>
What this PR does:
There are two bottlenecks in the alertmanager configs loading from the object storage:

1. Configs are loaded for all users, including tenants not owned by the replica.
2. Configs are loaded sequentially.

This PR addresses both issues. To do it I had to refactor the AlertStore interface to split the listing of users from the loading of configs for the owned users:

* Replaced ListAlertConfigs with ListAllUsers and GetAlertConfigs (the reason I've added GetAlertConfigs instead of calling GetAlertConfig for every user is that with GetAlertConfigs the configdb implementation is optimised).
* Concurrently load configs in the GetAlertConfigs object store client implementations.

Which issue(s) this PR fixes:
N/A
Checklist
* CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]