
Grafana Dashboard not updating the deployments #1854

Closed
Deunitato-sentient opened this issue May 20, 2020 · 15 comments

@Deunitato-sentient

I have encountered a problem whereby newly created deployments are not being captured by the Grafana dashboard for seldon-core-analytics. I am able to view the old deployments I made before creating the Grafana dashboard, which I set up following the metrics helm example in the documentation.

Is this a bug, or did I not do something correctly?

Deunitato-sentient added the triage label May 20, 2020
@axsaucedo
Contributor

What version of seldon-core-operator and seldon-core-analytics are you running? Master was recently updated and requires #1809 to match. Otherwise you need to make sure you are aligned with the compatibility table: https://docs.seldon.io/projects/seldon-core/en/latest/reference/upgrading.html#wrapper-compatibility-table.

Once you share the versions of both libraries we'll have a better insight on what the issue is.
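
If it helps, one quick way to surface the installed versions is something like the commands below; the deployment name is an assumption (the chart default), and helm list will show whichever release names were actually used at install time:

helm list --all-namespaces

kubectl get deploy seldon-controller-manager -n seldon-system -o jsonpath='{.spec.template.spec.containers[0].image}'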

@Deunitato-sentient
Author

Currently, I am using seldon-core 1.1.0 and seldon-core-analytics 1.1.0 as well.
Also, I realised that I left the namespace of my deployments and ambassador as default. Does this affect anything?
Prometheus, however, does reflect the deployments.

seldon-core-analytics is configured under the seldon-system namespace and I did a port-forward using the command kubectl port-forward svc/seldon-core-analytics-grafana 3000:80 -n seldon-system

@ryandawsonuk
Contributor

ryandawsonuk commented May 21, 2020

Could you try port-forwarding to Prometheus instead of Grafana, and in Prometheus run each of:

seldon_api_executor_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

seldon_api_engine_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

seldon_api_executor_client_requests_seconds_count{deployment_name=~".*",kubernetes_namespace=~".*",predictor_name=~".*"}

seldon_api_engine_client_requests_seconds_count{deployment_name=~"iris",kubernetes_namespace=~".*",predictor_name=~".*"}

That should help us determine the problem. I'm guessing you have left the executor enabled, as that's the default now instead of the Java engine. The metrics changed when that switch was made. But with the new analytics I'd expect the new models to be showing metrics, not the old ones.
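
If the Prometheus UI is awkward to use over the port-forward, the same checks can be run with curl against the Prometheus HTTP API. The service name below is an assumption (use whatever kubectl get svc -n seldon-system shows for the analytics Prometheus), and the query is just the executor count query above with the namespace left wildcarded:

kubectl port-forward svc/seldon-core-analytics-prometheus-seldon 3001:80 -n seldon-system &

curl -G 'http://localhost:3001/api/v1/query' --data-urlencode 'query=seldon_api_executor_client_requests_seconds_count{deployment_name=~".*",kubernetes_namespace=~".*"}'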

@Deunitato-sentient
Author

Deunitato-sentient commented May 21, 2020

I ran the following command to port-forward to Prometheus:
kubectl port-forward svc/seldon-core-analytics-prometheus-seldon 3001:80 -n seldon-system

Since my namespace is default, I changed every "seldon" to "default" in the following test queries.

Test queries run in Prometheus:

seldon_api_executor_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

Result: Shows all the old models but not the new models

seldon_api_engine_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

Results: Shows nothing

seldon_api_executor_client_requests_seconds_count{deployment_name=~".*",kubernetes_namespace=~".*",predictor_name=~".*"}
Result: Shows all the old models but not the new one

seldon_api_engine_client_requests_seconds_count{deployment_name=~"iris",kubernetes_namespace=~".*",predictor_name=~".*"}

Result: Shows nothing

Is there something that needs to be set up first, or am I doing something wrong?

ukclivecox removed the triage label May 21, 2020
@ryandawsonuk
Contributor

Oh, in that first query kubernetes_namespace=~"seldon" should be kubernetes_namespace=~".*". But I guess you figured that out as you got results. You can experiment with replacing values with ".*" and removing fields to broaden the query. Then you might find the new model data. Strange that the data for the new models is missing.
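
For example, the broadest version of the first bucket query, with the namespace wildcarded and the other label matchers dropped, would be something like:

seldon_api_executor_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~".*"}

If the new models still don't show up with a query that broad, their metrics aren't reaching Prometheus at all.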

What version of seldon core were the old models created with?

Is it an option for you to uninstall and reinstall seldon core analytics? Is that how you upgraded seldon core analytics?

@Deunitato-sentient
Author

Deunitato-sentient commented May 21, 2020

It is a fresh installation of seldon-core-analytics; all the models were created with 1.1.0 and no upgrading was done.

Just to clarify, by old and new models I mean:
old = models deployed before installing seldon-core-analytics
new = models deployed after installing seldon-core-analytics

Grafana will not show any newly deployed models and will not show any models that were re-deployed.

@ryandawsonuk
Contributor

I guess all the pods are up and responding to requests. Would you be able to copy a working and non-working pod's manifest so we can compare?

@ryandawsonuk
Contributor

FWIW, it has been working for me, so I am just thinking about possible differences: 1) I've been running latest master, and 2) I've been installing by cloning the repo and doing a helm install from the local path rather than from the published/hosted version of the chart.
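
For reference, the local-path install looks roughly like this from the root of the cloned repo; the release name and namespace are assumptions, and the path assumes the charts live under helm-charts/ in the repo:

helm install seldon-core-analytics helm-charts/seldon-core-analytics --namespace seldon-system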

@ericandrewmeadows
Contributor

I ended up having this same problem when using a router; all pods in the graph are functional, and when I bash into a pod and run the following, the data is there:

from requests import get
# Query the local Prometheus metrics endpoint from inside the pod and print the response
print(get("http://localhost:6000/prometheus").text)

When running non-router models, nothing comes in at all. This is consistent across all 1.1.0 deployments, using the Helm charts (seldon-core-operator, and seldon-core-analytics from the 1.1.0 repo tag).
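
A quick way to compare a router and a non-router pod in that situation is to dump their annotations and check whether the prometheus.io/scrape, prometheus.io/path and prometheus.io/port annotations (the convention the analytics Prometheus typically discovers scrape targets by) differ; the pod name and namespace below are placeholders:

kubectl get pod <pod-name> -n default -o jsonpath='{.metadata.annotations}'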

@Deunitato-sentient
Author

Deunitato-sentient commented May 22, 2020

This is my deployment.yaml

Non-working:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-metrics
  labels:
    model_name: times2-plus2
    api_type: microservice
    microservice_type: ai
spec:
  annotations:
    seldon.io/executor: "true"
  name: times2-plus2
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: model
          image: <My image 2>
          imagePullPolicy: Always
        - name: transformer
          image: <My image 1>
          imagePullPolicy: Always
        imagePullSecrets: 
          - name: gcr-json-key
    graph:
      name: transformer
      type: TRANSFORMER
      endpoint:
        type: REST
      children: 
      - name: model
        type: MODEL
        endpoint:
          type: REST
    name: t1-t2
    replicas: 1

Working:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: seldon-model-combiner
  labels:
    model_name: times2-plus2
    api_type: microservice
    microservice_type: ai
spec:
  name: times2-plus2
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: combiner
          image: <Image-combiner>
          imagePullPolicy: IfNotPresent
        - name: subprocess-times2
          image: <Image 3>
          imagePullPolicy: Always
        - name: subprocess-plus2
          image: <Image 4>
          imagePullPolicy: Always
    graph:
        name: combiner
        type: COMBINER
        endpoint:
          type: REST
        children: 
        - name: subprocess-times2
          type: MODEL
          endpoint:
            type: REST
        - name: subprocess-plus2
          type: MODEL
          endpoint:
            type: REST
    name: times2-plus2-pod
    replicas: 1

I did download the seldon-core repo locally in my cloud environment before running helm install.
I only cloned the stable 1.1 version rather than master.
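
For anyone trying to reproduce this, cloning at the stable 1.1 tag rather than master is roughly (the exact tag name is an assumption; git tag --list in the clone shows the real one):

git clone --branch v1.1.0 https://github.com/SeldonIO/seldon-core.git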

@ryandawsonuk
Contributor

@Deunitato-sentient Interesting. So the working and non-working ones are quite different. Are you sure the failure is related to the model being newly deployed and not to differences in the manifest? Have you been able to test that (e.g. by removing a working one and adding it in again)?

@Deunitato-sentient
Author

Deunitato-sentient commented May 29, 2020

So I managed to see some new deployments in my Grafana when I made a different deployment and used a different container name.
However, if I redeploy with a different image tag, it does not show up in Grafana.

May I ask if this is related to issue #618, and if there are any fixes for this bug?

@Deunitato-sentient
Author

Deunitato-sentient commented May 29, 2020

@Deunitato-sentient Interesting. So the working and non-working ones are quite different. Are you sure the failure is related to the model being newly-deployed and not differences in the manifest? Have you been able to test that (e.g. by removing a working one and adding it in again)?

If I delete the deployment yaml file and reapply it, it does not show up in Grafana anymore.
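
In other words, a cycle like the following (the file name is just a placeholder for the manifest above) is enough to make the model stop showing up in Grafana:

kubectl delete -f deployment.yaml
kubectl apply -f deployment.yaml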

@axsaucedo
Contributor

@Deunitato-sentient could you test with the latest version of Seldon Core and with an updated wrapper? We have done some testing of the Grafana metrics and it all seems to work.
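
If it helps, an in-place upgrade to the latest charts would look roughly like this; the release names are assumptions (helm list shows the real ones), and the chart repo URL should match the one in the Seldon install docs:

helm upgrade seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --namespace seldon-system

helm upgrade seldon-core-analytics seldon-core-analytics --repo https://storage.googleapis.com/seldon-charts --namespace seldon-system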

axsaucedo added this to To do in 1.3 via automation Jul 15, 2020
axsaucedo added this to the 1.3 milestone Jul 15, 2020
@ukclivecox
Contributor

Please reopen if this can be replicated on the latest version.

1.3 automation moved this from To do to Done Aug 20, 2020