change alertmanager-svc-headless from http to grpc port + expose 9094 TCP and UDP #435
Conversation
Signed-off-by: AlexandreRoux <alexandreroux42@protonmail.com>
Hello, it would be great if we could add some logic for populating the peers list. I think it would look something like this (not tested):
This looks to be the way that the prometheus-community alertmanager is handling peers.
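The original (untested) snippet did not survive the export. As a rough stand-in, peer seeding along the lines of the prometheus-community chart might look like the following in cortex terms; the `cortex.alertmanagerFullname` helper and the values paths are assumptions, and `9094` is the gossip port under discussion:

```yaml
{{/* Sketch only: build a comma-separated peers list, one entry per replica,
     resolved through the headless service. "cortex.alertmanagerFullname"
     is an assumed helper name, not an existing template. */}}
{{- $replicas := int .Values.alertmanager.replicas }}
{{- if gt $replicas 1 }}
{{- $name := include "cortex.alertmanagerFullname" . }}
{{- $peers := list }}
{{- range $i := until $replicas }}
{{- $peers = append $peers (printf "%s-%d.%s-headless.%s.svc.cluster.local:9094" $name $i $name $.Release.Namespace) }}
{{- end }}
- "-alertmanager.cluster.peers={{ join "," $peers }}"
{{- end }}
```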
I'm not really sure how this is supposed to be configured correctly.
What do you think about the following? @AlexandreRoux
CC: @kd7lxl
Is seeding the cluster peers necessary when we have memberlist enabled?
```yaml
- name: grpc
  containerPort: {{ .Values.config.server.grpc_listen_port }}
  protocol: TCP
- containerPort: 9094
```
I'm conflicted on this. The magic number `9094` bothers me. It would be nice to be able to support the configuration and only default to `9094`, but that's awkward because the port number is embedded at the end of the listen string. Something like this (but this is kind of ugly):
```diff
- - containerPort: 9094
+ - containerPort: {{ (.Values.config.alertmanager.cluster.listen_address | default "0.0.0.0:9094" | split ":")._1 }}
```
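One wrinkle with that approach: Sprig's `split` returns a dict keyed `_0`, `_1`, and so on, so the port component is addressed as `._1`; for example, `{{ (split ":" "0.0.0.0:9094")._1 }}` renders `9094`.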
What about introducing an extra variable with a docstring in `values.yaml`?
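Perhaps something like this, assuming a new (hypothetical) `clusterGossipPort` key with a helm-docs style docstring:

```yaml
alertmanager:
  # -- Port used by alertmanager for gossip clustering, exposed as both TCP
  # and UDP on the statefulset. Must match the port at the end of
  # config.alertmanager.cluster.listen_address; it is kept as a separate
  # value because the port cannot be read out of that string without
  # splitting it.
  clusterGossipPort: 9094
```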
I haven't managed to figure out a way, other than @kd7lxl's suggestion above, to make 9094 configurable, since it will always come from `config.alertmanager.cluster.listen_address` and the port has to be split off the end.
I will push that for now (to progress on the PR), but suggestions are welcome.
I believe gossip memberlist and gossip cluster are two separate things here; please correct me if I am wrong, but I asked about that specific question in the Slack channel:
Pretty sure you are correct with this assessment. That's probably also why it's giving us so much grief.
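For anyone following along, the two gossip layers live under different keys in the cortex config. A sketch, with the usual default ports shown:

```yaml
# Ring/KV gossip shared by cortex components (memberlist)
memberlist:
  bind_port: 7946

# Alertmanager's own notification-deduplication gossip cluster;
# this is what port 9094 belongs to
alertmanager:
  cluster:
    listen_address: "0.0.0.0:9094"
```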
I redeployed my alertmanagers by removing
so we do indeed need to seed peers (too many "ee"s in that phrase), and the two available options are:
To reply to @dpericaxon's comment: isn't this a `values.yaml` templating problem rather than a pod/args templating problem, since we are passing
Signed-off-by: AlexandreRoux <alexandreroux42@protonmail.com>
Where are we with this?
Signed-off-by: Niclas Schad <niclas.schad@gmail.com>
I've consolidated the proposed changes here: #460. Feel free to try it out and let me know if that works.
Signed-off-by: AlexandreRoux <alexandreroux42@protonmail.com>
@nschad / @kd7lxl - I thought I was mostly done here, but I noticed some very strange behavior: each replica of my alertmanager cluster logs debug messages from cluster.go about not finding the old replicas:
but my current peers are as follows:
I don't know how to explain that 😕 (yet). I am also very confused about port 9094, which should be cluster-related only, not memberlist. I may post that on Slack; let's see.
It might be normal that the ring temporarily contains old instances. Do these logs disappear after a few minutes?
In my case it seems to last abnormally long. I am curious to know if someone else has the same behavior.
Signed-off-by: AlexandreRoux <alexandreroux42@protonmail.com>
… disable
Signed-off-by: AlexandreRoux <alexandreroux42@protonmail.com>
This would be a breaking change; I would rather not remove it. Also, the configuration below would then make little sense, and the `{{- if .Values.alertmanager.statefulSet.enabled -}}` would become obsolete.
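For readers following along, the conditional under discussion gates the headless service on the statefulset deployment mode, roughly like this (a sketch; the `cortex.alertmanagerFullname` helper name is assumed):

```yaml
{{- if .Values.alertmanager.statefulSet.enabled -}}
# Headless service, only rendered when alertmanager runs as a statefulset.
apiVersion: v1
kind: Service
metadata:
  name: {{ include "cortex.alertmanagerFullname" . }}-headless
spec:
  clusterIP: None
  ports:
    - name: grpc
      port: {{ .Values.config.server.grpc_listen_port }}
      targetPort: grpc
{{- end }}
```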
Also, currently if you set `statefulSet` to false (the default) and deploy cortex, then you only get the non-headless service of alertmanager, which makes no sense.
Suggestion:
Keep the alertmanager deployment (maybe clean that up too if necessary), but don't enable support for clustering out of the box. Let's keep it as it is.
Signed-off-by: AlexandreRoux <alexandreroux42@protonmail.com>
Can you run `helm-docs` in the repo root to update the README.md?
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Superseded by #494.
What this PR does:
This PR corrects the alertmanager-headless service by exposing the gRPC port instead of HTTP.
It also exposes port 9094 TCP and UDP for the gossip cluster in the alertmanager StatefulSet.
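In chart terms the intent is roughly the following (a sketch, not the exact diff; the port names are illustrative):

```yaml
# alertmanager headless service: expose gRPC instead of HTTP
ports:
  - name: grpc
    port: {{ .Values.config.server.grpc_listen_port }}
    targetPort: grpc
    protocol: TCP

# alertmanager statefulset: expose the gossip port on both protocols
ports:
  - name: cluster-tcp
    containerPort: 9094
    protocol: TCP
  - name: cluster-udp
    containerPort: 9094
    protocol: UDP
```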
Which issue(s) this PR fixes:
This PR does not fix an issue, but it is related to the conversation in #420.
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`