
Grafana Dashboard not updating the deployments #1854

Closed
Deunitato-sentient opened this issue May 20, 2020 · 15 comments

@Deunitato-sentient

I have encountered a problem whereby newly created deployments are not being captured by the Grafana dashboard for seldon-core-analytics. I am able to view the old deployments I made before creating the Grafana dashboard, which I set up following the metrics helm example in the documentation.

Is this a bug, or did I not do something correctly?

Deunitato-sentient added the triage label May 20, 2020
@axsaucedo
Contributor

What version of seldon-core-operator and seldon-core-analytics are you running? Master was recently updated and requires #1809 to match. Otherwise you need to make sure you are aligned with the compatibility table: https://docs.seldon.io/projects/seldon-core/en/latest/reference/upgrading.html#wrapper-compatibility-table.

Once you share the versions of both libraries we'll have a better insight on what the issue is.
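
If it helps, one quick way to surface the installed versions is something like the commands below; the deployment name is an assumption (the chart default), and helm list will show whichever release names were actually used at install time:

helm list --all-namespaces

kubectl get deploy seldon-controller-manager -n seldon-system -o jsonpath='{.spec.template.spec.containers[0].image}'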

@Deunitato-sentient
Author

Currently, I am using seldon-core 1.1.0 and seldon-core-analytics 1.1.0 as well.
Also, I realised that I left the namespace of my deployments and ambassador as default. Does this affect anything?
Prometheus, however, does reflect the deployments.

seldon-core-analytics is configured under the seldon-system namespace and I did a port-forward using the command kubectl port-forward svc/seldon-core-analytics-grafana 3000:80 -n seldon-system

@ryandawsonuk
Contributor

ryandawsonuk commented May 21, 2020

Could you try port-forwarding to Prometheus instead of Grafana, and in Prometheus run each of:

seldon_api_executor_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

seldon_api_engine_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

seldon_api_executor_client_requests_seconds_count{deployment_name=~".*",kubernetes_namespace=~".*",predictor_name=~".*"}

seldon_api_engine_client_requests_seconds_count{deployment_name=~"iris",kubernetes_namespace=~".*",predictor_name=~".*"}

That should help us determine the problem. I'm guessing you have left the executor enabled, as that's the default now instead of the Java engine. The metrics changed when that switch was made. But with the new analytics I'd expect the new models to be showing metrics, not the old ones.
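
If the Prometheus UI is awkward to use over the port-forward, the same checks can be run with curl against the Prometheus HTTP API. The service name below is an assumption (use whatever kubectl get svc -n seldon-system shows for the analytics Prometheus), and the query is just the executor count query above with the namespace left wildcarded:

kubectl port-forward svc/seldon-core-analytics-prometheus-seldon 3001:80 -n seldon-system &

curl -G 'http://localhost:3001/api/v1/query' --data-urlencode 'query=seldon_api_executor_client_requests_seconds_count{deployment_name=~".*",kubernetes_namespace=~".*"}'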

@Deunitato-sentient
Author

Deunitato-sentient commented May 21, 2020

I ran the following command to port-forward to Prometheus:
kubectl port-forward svc/seldon-core-analytics-prometheus-seldon 3001:80 -n seldon-system

Since my namespace is default, I changed every "seldon" to "default" in the following test queries.

Test queries run in Prometheus:

seldon_api_executor_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

Result: Shows all the old models but not the new models

seldon_api_engine_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~"seldon",service="/predict",model_image=~".*",predictor_name=~".*",predictor_version=~".*",model_name=~".*",model_version=~".*"}

Results: Shows nothing

seldon_api_executor_client_requests_seconds_count{deployment_name=~".*",kubernetes_namespace=~".*",predictor_name=~".*"}
Result: Shows all the old models but not the new one

seldon_api_engine_client_requests_seconds_count{deployment_name=~"iris",kubernetes_namespace=~".*",predictor_name=~".*"}

Result: Shows nothing

Is there something that needs to be set up first, or am I doing something wrong?

ukclivecox removed the triage label May 21, 2020
@ryandawsonuk
Contributor

Oh, in that first query kubernetes_namespace=~"seldon" should be kubernetes_namespace=~".*". But I guess you figured that out as you got results. You can experiment with replacing values with ".*" and removing fields to broaden the query. Then you might find the new model data. Strange that the data for the new models is missing.
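
For example, the broadest version of the first bucket query, with the namespace wildcarded and the other label matchers dropped, would be something like:

seldon_api_executor_client_requests_seconds_bucket{deployment_name=~".*",kubernetes_namespace=~".*"}

If the new models still don't show up with a query that broad, their metrics aren't reaching Prometheus at all.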

What version of seldon core were the old models created with?

Is it an option for you to uninstall and reinstall seldon core analytics? Is that how you upgraded seldon core analytics?

@Deunitato-sentient
Author

Deunitato-sentient commented May 21, 2020

It is a fresh installation of seldon-core-analytics; all the models were created with 1.1.0 and no upgrading was done.

Just to clarify, by old and new models I mean:
old = models deployed before installing seldon-core-analytics
new = models deployed after installing seldon-core-analytics

Grafana will not show any newly deployed models and will not show any models that were re-deployed.

@ryandawsonuk
Contributor

I guess all the pods are up and responding to requests. Would you be able to copy a working and non-working pod's manifest so we can compare?

@ryandawsonuk
Contributor

FWIW, it has been working for me, so I am just thinking about possible differences: 1) I've been running latest master, and 2) I've been installing by cloning the repo and doing a helm install from the local path rather than from the published/hosted version of the chart.
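
For reference, the local-path install looks roughly like this from the root of the cloned repo; the release name and namespace are assumptions, and the path assumes the charts live under helm-charts/ in the repo:

helm install seldon-core-analytics helm-charts/seldon-core-analytics --namespace seldon-system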

@ericandrewmeadows
Contributor

I ended up having this same problem when using a router; all pods in the graph are functional, and when I bash into a pod and run the following, the data is there:

from requests import get
# Query the local Prometheus metrics endpoint from inside the pod and print the response
print(get("http://localhost:6000/prometheus").text)

When running non-router models, nothing comes in at all. This is consistent across all 1.1.0 deployments, using the Helm charts (seldon-core-operator, and seldon-core-analytics from the 1.1.0 repo tag).
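
A quick way to compare a router and a non-router pod in that situation is to dump their annotations and check whether the prometheus.io/scrape, prometheus.io/path and prometheus.io/port annotations (the convention the analytics Prometheus typically discovers scrape targets by) differ; the pod name and namespace below are placeholders:

kubectl get pod <pod-name> -n default -o jsonpath='{.metadata.annotations}'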

@Deunitato-sentient
Author

Deunitato-sentient commented May 22, 2020

This is my deployment.yaml

Non-working:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-metrics
  labels:
    model_name: times2-plus2
    api_type: microservice
    microservice_type: ai
spec:
  annotations:
    seldon.io/executor: "true"
  name: times2-plus2
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: model
          image: <My image 2>
          imagePullPolicy: Always
        - name: transformer
          image: <My image 1>
          imagePullPolicy: Always
        imagePullSecrets: 
          - name: gcr-json-key
    graph:
      name: transformer
      type: TRANSFORMER
      endpoint:
        type: REST
      children: 
      - name: model
        type: MODEL
        endpoint:
          type: REST
    name: t1-t2
    replicas: 1

Working:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: seldon-model-combiner
  labels:
    model_name: times2-plus2
    api_type: microservice
    microservice_type: ai
spec:
  name: times2-plus2
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - name: combiner
          image: <Image-combiner>
          imagePullPolicy: IfNotPresent
        - name: subprocess-times2
          image: <Image 3>
          imagePullPolicy: Always
        - name: subprocess-plus2
          image: <Image 4>
          imagePullPolicy: Always
    graph:
        name: combiner
        type: COMBINER
        endpoint:
          type: REST
        children: 
        - name: subprocess-times2
          type: MODEL
          endpoint:
            type: REST
        - name: subprocess-plus2
          type: MODEL
          endpoint:
            type: REST
    name: times2-plus2-pod
    replicas: 1

I did download the seldon-core repo locally in my cloud environment before running helm install.
I only cloned the stable 1.1 version rather than master.
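
For anyone trying to reproduce this, cloning at the stable 1.1 tag rather than master is roughly (the exact tag name is an assumption; git tag --list in the clone shows the real one):

git clone --branch v1.1.0 https://github.com/SeldonIO/seldon-core.git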

@ryandawsonuk
Contributor

@Deunitato-sentient Interesting. So the working and non-working ones are quite different. Are you sure the failure is related to the model being newly deployed and not to differences in the manifest? Have you been able to test that (e.g. by removing a working one and adding it in again)?

@Deunitato-sentient
Author

Deunitato-sentient commented May 29, 2020

So I managed to see some new deployments in my Grafana when I made a different deployment and used a different container name.
However, if I redeploy with a different image tag, it does not show up in Grafana.

May I ask if this is related to issue #618, and if there are any fixes for this bug?

@Deunitato-sentient
Author

Deunitato-sentient commented May 29, 2020

@Deunitato-sentient Interesting. So the working and non-working ones are quite different. Are you sure the failure is related to the model being newly-deployed and not differences in the manifest? Have you been able to test that (e.g. by removing a working one and adding it in again)?

If I delete the deployment yaml file and reapply it, it does not show up in Grafana anymore.
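
In other words, a cycle like the following (the file name is just a placeholder for the manifest above) is enough to make the model stop showing up in Grafana:

kubectl delete -f deployment.yaml
kubectl apply -f deployment.yaml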

@axsaucedo
Contributor

@Deunitato-sentient could you test with the latest version of Seldon Core and with an updated wrapper? We have done some testing of the Grafana metrics and it all seems to work.
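
If it helps, an in-place upgrade to the latest charts would look roughly like this; the release names are assumptions (helm list shows the real ones), and the chart repo URL should match the one in the Seldon install docs:

helm upgrade seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --namespace seldon-system

helm upgrade seldon-core-analytics seldon-core-analytics --repo https://storage.googleapis.com/seldon-charts --namespace seldon-system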

axsaucedo added this to To do in 1.3 via automation Jul 15, 2020
axsaucedo added this to the 1.3 milestone Jul 15, 2020
@ukclivecox
Contributor

Please reopen if this can be replicated on the latest version.

1.3 automation moved this from To do to Done Aug 20, 2020