Metrics Endpoint does not work with Istio Sidecar #2720

Closed
domechn opened this issue Nov 30, 2020 · 11 comments

@domechn

domechn commented Nov 30, 2020

Describe the bug

As described in the doc:

Note: we do not use prometheus.io/port annotation in this configuration.

But the Istio sidecar overwrites the Prometheus-related annotations, and the metrics for the business service end up being collected internally by the sidecar.

Because prometheus.io/port is missing, the sidecar defaults to scraping http://127.0.0.1:80/prometheus for metrics.
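
For context, Istio's metrics merging works by reading the standard Prometheus annotations on the pod. A rough sketch of the annotations in question, assuming Seldon sets the scrape and path annotations but, per the doc quoted above, not the port:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/prometheus"
    # prometheus.io/port is not set, so the sidecar falls back to port 80
    # and tries http://127.0.0.1:80/prometheus, where nothing is listening.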

Environment

  • Cloud Provider:
AlibabaCloud ACK
  • Kubernetes Cluster Version [Output of kubectl version]
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.9-aliyun.1", GitCommit:"4f7ea78", GitTreeState:"", BuildDate:"2020-05-08T07:29:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Deployed Seldon System Images: [Output of kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio]

Alternatively run echo "#### Kubernetes version:\n $(kubectl version) \n\n#### Seldon Images:\n$(kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio)"

 value: docker.io/seldonio/engine:1.5.0
 value: docker.io/seldonio/seldon-core-executor:1.5.0
 image: docker.io/seldonio/seldon-core-operator:1.5.0
  • Istio version
1.8.0

Sidecar Log

2020-11-30T13:38:37.446855Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:38:52.446774Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:39:07.446908Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
domechn added the bug and triage labels Nov 30, 2020
@ryandawsonuk

So the problem is that we require the scrape path to be /prometheus and the Istio sidecar doesn't use that.

We normally suggest disabling Istio sidecars by using:

spec:
  predictors:
  - annotations:
      sidecar.istio.io/inject: "false"

Or do you want to be able to use the sidecar? Is it only the sidecar that fails to scrape? Does the scraping work for the main model?

@domechn

domechn commented Dec 3, 2020

Or do you want to be able to use the sidecar?

Yes, I'd like to use the sidecar for traffic management and other tasks.

Is it only the sidecar that fails to scrape? Does the scraping work for the main model?

I can't say for certain, but when I ran the example provided by Seldon, I found no other problems.

@ukclivecox

So we need to update the sidecar to allow the correct Prometheus scrape path? Is it an http/https issue?

ukclivecox added the awaiting-feedback label and removed the triage label Dec 3, 2020
@domechn

domechn commented Dec 3, 2020

I think it is an issue with the operator.

Maybe it just needs to set the prometheus.io/port annotation when it creates the Deployment:

https://github.com/SeldonIO/seldon-core/blob/master/operator/controllers/seldondeployment_engine.go#L496
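
For illustration, a rough sketch of what the operator could add to the pod template of the Deployment it creates; the port 8000 here is hypothetical and would need to match whatever port the executor/engine actually serves metrics on:

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/prometheus"
        prometheus.io/port: "8000"  # hypothetical; must match the real metrics port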

@ericandrewmeadows

ericandrewmeadows commented Dec 7, 2020

@domgoer and @cliveseldon - I can confirm that @domgoer's suggestion is correct.

Previously, we had the following error:

2020-12-07T17:28:21.670730Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused

Now we see no more of these log lines. Istio automatically merges all the metrics on its own, so this resolves the issue.

i.e., you only need to uncomment the "prometheus.io/port" line, and it's solved.

@ukclivecox

Closing. Please reopen if still an issue.

@szczeles

It looks like this issue is not solved yet.

For anyone else struggling with it, setting the following two envs in the operator's pod seems to help:

EXECUTOR_SERVER_PORT: '80'
ENGINE_SERVER_PORT: '80'

The first causes the executor to listen on port 80, so the metrics are available where Istio expects them (without the need for the annotation, which is the main issue here). The second env points Istio's VirtualService at the right port.

This doesn't solve the issue completely, as the annotation would, but it at least allows the metrics to be scraped without recompilation.

If you're using the Helm chart, these are the overrides you need to set: --set executor.user=0 --set executor.port=80 --set engine.port=80.
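
For reference, a rough sketch of how those envs would appear in the manager container of the operator Deployment, if set directly instead of via the Helm overrides:

env:
- name: EXECUTOR_SERVER_PORT
  value: "80"
- name: ENGINE_SERVER_PORT
  value: "80"

(Presumably executor.user=0 is needed because binding to port 80 requires running as root.)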

@Shahard2

Thanks mate, @szczeles.
And what if I installed with istioctl?

istioctl manifest apply --set meshConfig.enablePrometheusMerge=true --set executor.user=0 --set executor.port=80 --set engine.port=80

It didn't work.

@szczeles

@Shahard2 I don't know for sure, but I think istioctl is used to deploy Istio itself. These three parameters should be passed to the Seldon setup (no need to touch Istio) via helm install/helm upgrade.

@spi-x-i

spi-x-i commented Mar 30, 2022

The issue does not look solved yet. @Shahard2's solution worked, although in my case at least it caused a few issues with Istio's overall health. What really solved it was adding the annotation to my SeldonDeployment:

...
spec:
  name: my-model
  ....
  predictors:
  - annotations:
      prometheus.io/port: '8000'

This doesn't affect the operator's overall cooperation with Istio.

I'm working with seldon-core:1.13.1.

So I believe this solution would be the recommended one.

@cliveseldon might reopening the issue be beneficial, in your opinion?

ukclivecox reopened this Mar 30, 2022
@agrski

agrski commented Mar 30, 2022

Another alternative is to not use metrics merging - this allows the use of Istio sidecars, but also allows all metrics to be exposed as if the sidecar weren't there:

prometheus.istio.io/merge-metrics: "false"

The main issue is that Istio's metrics merging assumes a single application endpoint to be exposed for metrics, because that's what the Prometheus annotations approach expects. For pods with multiple containers producing metrics (e.g. a Seldon executor and a Seldon microservice or MLServer), this assumption breaks.

Metrics merging docs for reference.
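
For example, a rough sketch of disabling the merge per model by adding that annotation at the predictor level of a SeldonDeployment, the same place the other annotations in this thread go (this assumes predictor annotations propagate to the pods, as they do for the annotations above):

spec:
  predictors:
  - annotations:
      prometheus.istio.io/merge-metrics: "false"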
