-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect network scope in prometheus plugin #58
Conversation
This commit adds a test case which tests prometheus-labeled ports in all three network modes: host, bridge and container. Add ports and labels to test case
fe5c045
to
808537d
Compare
The failing test on this PR is a flake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✨
@@ -202,7 +202,7 @@ func (p *Prometheus) GetAllURLs() (map[string]URLAndAddress, error) { | |||
return allURLs, err | |||
} | |||
|
|||
for _, url := range getMesosTaskPrometheusURLs(tasks, p.mesosHostname) { | |||
for _, url := range getMesosTaskPrometheusURLs(tasks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to call getTaskIP
above this line and pass in the IP, keeping this and the other function signatures the same? It would also only run the getTaskIP
logic once as opposed to in multiple places.
I also did a cursory search of p.mesosHostname
and it doesn't look like it's used anywhere anymore so maybe we can do a clean up and remove that field and getMesosHostname
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change this PR makes is that instead of using the hostname to form the metrics source URL, we now use the task's ip address. So we no longer need to pass the hostname in.
Each task can potentially have a different IP address. Since this method retrieves URLs from multiple tasks, passing it a single IP address won't work - it needs to use the IP address from each task.
WRT your second point, you're quite right - I had thought it was used in getMesosClient
, but it isn't. I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see, missed that loop the first time round :). It looks like both getEndpointsFromTaskPorts
and getEndpointFromTaskLabels
get passed the same task, so maybe getTaskIP
can just be called once on the task and passed into both of those functions to clean that up a bit and avoid the extra call - but it's not a big deal and I won't block on it 😸
The DC/OS integration tests have caught an issue! In bridge mode, we need either to use |
@@ -202,7 +202,7 @@ func (p *Prometheus) GetAllURLs() (map[string]URLAndAddress, error) { | |||
return allURLs, err | |||
} | |||
|
|||
for _, url := range getMesosTaskPrometheusURLs(tasks, p.mesosHostname) { | |||
for _, url := range getMesosTaskPrometheusURLs(tasks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see, missed that loop the first time round :). It looks like both getEndpointsFromTaskPorts
and getEndpointFromTaskLabels
get passed the same task, so maybe getTaskIP
can just be called once on the task and passed into both of those functions to clean that up a bit and avoid the extra call - but it's not a big deal and I won't block on it 😸
This reverts commit 1a80b57.
The correct hostname for a task is determined by the network-scope label on the port. If set to 'container', the hostname is the first IP address in the last container status. If set to 'host' or absent, the hostname is the same as the local node.
retest this please |
the |
I wonder if networks isolation (e.g. some iptables rules) in calico will not interfere here - i.e. it will not be possible to make a connection from host to the container that is attached to the SDN network. |
@vespian thanks for pointing this out. We can't readily work around it, so we'll document this as a limitation: https://jira.mesosphere.com/browse/DCOS-56350 |
This PR fixes an issue where metrics were pulled via the local agent IP address instead of the task's IP, which could prevent it working in container mode.
Instead of joining the port up to the local IP, we now use the ip_addresses field in container status to collect the container's declared ip address, which is present in all networking modes.
Unit tests: 7eb7ee4
Implementation: ce332c6
The final five commits update the existing tests to use a sample IP address instead of their original value.