
Bug in elasticsearch index of metrics server #2971

Closed
fg91 opened this issue Feb 16, 2021 · 0 comments · Fixed by #2972
Labels: bug, triage (Needs to be triaged and prioritised accordingly)

Comments


fg91 commented Feb 16, 2021

Describe the bug

Here, the elasticsearch_index appears to be constructed incorrectly:

This:

# Currently only supports SELDON inference type (not kfserving)
elasticsearch_index = f"inference-log-{seldon_namespace}-seldon-{SELDON_DEPLOYMENT_ID}-{SELDON_PREDICTOR_ID}"

should rather be:

# Currently only supports SELDON inference type (not kfserving)
elasticsearch_index = f"inference-log-seldon-{seldon_namespace}-{SELDON_DEPLOYMENT_ID}-{SELDON_PREDICTOR_ID}"

Reason:
When a model is deployed to the namespace production, the elasticsearch index is actually named:
'inference-log-seldon-production-multiclass-model-default'
so the metrics server's lookup of 'inference-log-production-seldon-multiclass-model-default' fails with a 404 (see the logs below).
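The mismatch can be seen by evaluating both orderings with the values from the logs below (a minimal sketch; the variable names mirror the snippet above):

```python
# Illustrative values, taken from the 404 in the metrics-server logs below.
seldon_namespace = "production"
SELDON_DEPLOYMENT_ID = "multiclass-model"
SELDON_PREDICTOR_ID = "default"

# Buggy ordering: namespace comes before "seldon".
buggy = f"inference-log-{seldon_namespace}-seldon-{SELDON_DEPLOYMENT_ID}-{SELDON_PREDICTOR_ID}"

# Fixed ordering: "seldon" comes before the namespace, matching the index
# that actually exists in elasticsearch.
fixed = f"inference-log-seldon-{seldon_namespace}-{SELDON_DEPLOYMENT_ID}-{SELDON_PREDICTOR_ID}"

print(buggy)  # inference-log-production-seldon-multiclass-model-default
print(fixed)  # inference-log-seldon-production-multiclass-model-default
```

Note that the two expressions coincide only when seldon_namespace == "seldon", which is why the bug goes unnoticed in the default example namespace.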

To reproduce

The easiest way to reproduce this bug is to follow this notebook from the seldon examples but, instead of the namespace seldon, choose a different one so that the error can manifest:

kubectl create namespace seldon-debug || echo "namespace already created"
kubectl config set-context $(kubectl config current-context) --namespace=seldon-debug

Expected behaviour

When deploying a metrics server that gets triggered when feedback is sent, it is supposed to look up the respective document in elasticsearch. This, however, only works by chance when the namespace is seldon, because then both orderings produce the identical index name and the bug does not manifest.
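The lookup the metrics server performs can be sketched as follows (the helper name and structure are hypothetical, not the actual seldon-core code; the host and request id are taken from the logs below), assuming the corrected index ordering:

```python
def elasticsearch_doc_url(es_host, namespace, deployment_id, predictor_id, request_id):
    """Build the URL of the logged inference document for a feedback event.

    Hypothetical helper for illustration: with the corrected ordering,
    "seldon" comes before the namespace in the index name.
    """
    index = f"inference-log-seldon-{namespace}-{deployment_id}-{predictor_id}"
    return f"{es_host}/{index}/_doc/{request_id}"

url = elasticsearch_doc_url(
    "http://elasticsearch-master.seldon-logs.svc.cluster.local:9200",
    "production",
    "multiclass-model",
    "default",
    "abc71989-f625-4ff4-bc9b-6b02967968d0",  # Ce-Requestid from the feedback event
)
```

With the buggy ordering the same GET targets a non-existent index and returns the 404 shown in the logs.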

Environment

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.14-gke.1600", GitCommit:"7c407f5cc8632f9af5a2657f220963aa7f1c46e7", GitTreeState:"clean", BuildDate:"2020-12-07T09:22:27Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}

Seldon Images:

      value: docker.io/seldonio/engine:1.6.0
      value: docker.io/seldonio/seldon-core-executor:1.6.0
    image: docker.io/seldonio/seldon-core-operator:1.6.0

Model Details

Logs of the metrics server:

[I 210216 16:33:11 __main__:111] Extra args: {}
[I 210216 16:33:11 server:122] Registering model:multiclassserver
[I 210216 16:33:11 server:113] Listening on port 8080
[I 210216 16:33:17 web:2243] 200 GET /v1/metrics (10.212.1.18) 3.39ms
[I 210216 16:33:32 web:2243] 200 GET /v1/metrics (10.212.1.18) 1.37ms
[I 210216 16:33:42 cm_model:103] PROCESSING Feedback Event.
[I 210216 16:33:42 cm_model:104] {'Host': 'seldon-multiclass-model-metrics.production.svc.cluster.local:80', 'User-Agent': 'Go-http-client/1.1', 'Content-Length': '156', 'Ce-Endpoint': 'default', 'Ce-Id': 'e3f56826-ad3a-4638-9059-0d550c6186e9', 'Ce-Inferenceservicename': 'multiclass-model', 'Ce-Knativearrivaltime': '2021-02-16T16:33:41.195465611Z', 'Ce-Modelid': 'classifier', 'Ce-Namespace': 'production', 'Ce-Requestid': 'abc71989-f625-4ff4-bc9b-6b02967968d0', 'Ce-Source': 'http://:8000/', 'Ce-Specversion': '1.0', 'Ce-Time': '2021-02-16T16:33:40.82106025Z', 'Ce-Traceparent': '00-11bf7dd2281de401dfdb00f41b9c6cc7-89240c365d3cefb2-00', 'Ce-Type': 'io.seldon.serving.feedback', 'Content-Type': 'application/json', 'Traceparent': '00-11bf7dd2281de401dfdb00f41b9c6cc7-b017a64a85547fde-00', 'Accept-Encoding': 'gzip'}
[I 210216 16:33:42 cm_model:105] ----
[W 210216 16:33:42 base:269] GET http://elasticsearch-master.seldon-logs.svc.cluster.local:9200/inference-log-production-seldon-multiclass-model-default/_doc/abc71989-f625-4ff4-bc9b-6b02967968d0 [status:404 request:0.331s]
[E 210216 16:33:42 web:1793] Uncaught exception POST / (10.212.1.24)
    HTTPServerRequest(protocol='http', host='seldon-multiclass-model-metrics.production.svc.cluster.local:80', method='POST', uri='/', version='HTTP/1.1', remote_ip='10.212.1.24')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/tornado/web.py", line 1702, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "/microservice/adserver/server.py", line 242, in post
        response = self.model.process_event(request, headers)
      File "/microservice/adserver/cm_model.py", line 159, in process_event
        error, status_code=400, reason="METRICS_SERVER_ERROR"
    seldon_core.flask_utils.SeldonMicroserviceException
fg91 added the bug and triage labels on Feb 16, 2021