
Accessing custom metrics in our Python model #245

Closed
bbarn3y opened this issue Oct 3, 2018 · 12 comments · Fixed by #281

We are using Seldon to wrap our Python model images, and we deploy these images with Kubeflow's Seldon serving. We also have seldon-core-analytics set up on our Kubernetes cluster, and we can see the default Prometheus metrics in Grafana.

However, we need to extend these default metrics with our own metrics defined in the model, and if possible the new metrics should behave the same way as the default ones, i.e. they should be queryable and visualizable in Grafana.

We're thinking about extending the seldon-core Python wrapper's Flask REST endpoint with our own route for querying custom metrics. How could we set this up so that the custom metrics become accessible in Grafana? How can we tell seldon-core-analytics' Prometheus to scrape them?

Or maybe there is another way to add custom metrics that we didn't find?
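For illustration, whatever route is added would have to return Prometheus' text-based exposition format for the scrape to work. A minimal sketch of that format (the metric name and helper function are hypothetical, and this is not Seldon's API):

```python
# Sketch only: render a single metric in Prometheus' text exposition
# format, i.e. the body a custom Flask route would need to return.
def render_exposition(name, help_text, metric_type, value):
    """Return one metric as HELP/TYPE comments plus a sample line."""
    return (f"# HELP {name} {help_text}\n"
            f"# TYPE {name} {metric_type}\n"
            f"{name} {value}\n")

body = render_exposition("my_model_predictions_total",
                         "Number of predictions served", "counter", 42)
```

A real route would serve this body with `Content-Type: text/plain`, and Prometheus would need a scrape config (or annotations) pointing at it.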

@gsunner (Member) commented Oct 12, 2018

@bbarn3y I've checked in PR #252.

If you can get the latest code on master, take a look at the example in

examples/models/mean_classifier_with_custom_endpoints/

MeanClassifier.py shows how to add your own endpoints; in this example the custom endpoint can be used for Prometheus metrics.
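As a rough illustration of the pattern (a sketch, not the actual MeanClassifier.py; the method names and port are assumptions), a model class could track a custom metric and serve it from a side endpoint:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class MeanClassifier:
    """Sketch of a model tracking a custom metric; not the real example code."""

    def __init__(self):
        self.prediction_count = 0

    def predict(self, X, feature_names):
        self.prediction_count += 1
        return X  # identity "model" to keep the sketch short

    def serve_metrics(self, port=5055):
        # Hypothetical custom endpoint: expose the counter in a
        # Prometheus-friendly plain-text format on a side port.
        model = self

        class MetricsHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                body = f"predictions_total {model.prediction_count}\n".encode()
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(body)

        server = HTTPServer(("0.0.0.0", port), MetricsHandler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        return server
```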

@bbarn3y (Author) commented Oct 15, 2018

We reviewed the code, and we'd like to ask whether it would be possible to move the endpoint logic into microservice.py.

We think that handling the endpoints is usually not the responsibility of the model developer. The model developer should only have to define a callback function in the model code for querying the custom model metrics. Would this be possible?

By the way, are there plans to implement similar functionality for the other supported languages (e.g. Java)?

@gsunner (Member) commented Oct 16, 2018

Initially we did think about a simple callback function, but it seemed a bit limited.

The optional "custom_service" gives users a lot of flexibility to create multiple endpoints and to use multiple types of method calls such as GET/POST, depending on their use case.

I've added a notebook in a recent check-in:

examples/models/mean_classifier_with_custom_endpoints/mean_classifier_with_custom_endpoints.ipynb

It is a full example of how to set up a custom endpoint for metrics, create a Seldon deployment, and then check the values scraped by Prometheus.

No plans for a Java version just yet, but it is on the roadmap.

@sasvaritoni (Contributor) commented:

I agree that the flexibility of creating multiple endpoints may be useful in some scenarios.

However, I think that adding custom metrics to the model is such a basic requirement (especially for detecting model degradation) that it should really be supported in a more native way by Seldon.

You cannot always expect the data scientist who develops the model to add this kind of code. Also, as much of the implementation as possible should live in the generic microservice code, with as little as needed in the model code (just as is currently the case with the microservice endpoints). A simple callback function would be a much cleaner and more uniform way to achieve this, specifically for Prometheus metrics (while potentially keeping the current custom service for other, more complex use cases).

@sasvaritoni (Contributor) commented:

One more question: will the custom metrics exposed from the model be visible with the same attributes (deployment_name, predictor_name, predictor_version, model_name, model_image) as the generic metrics currently available for all models (e.g. seldon_api_engine_server_requests_, etc.)?

@ukclivecox (Contributor) commented Oct 16, 2018

I think one solution could be to allow users to send back a "metrics" dict in their response, which the Seldon orchestrator would pick up, pushing each key/value out to Prometheus with the standard naming, the user's key, and the given value. If the value is a number it will be incremented, e.g.:

{
  "meta": {
    "metrics": {
      "mymetric1": 1,
      "mymetric2": 2
    }
  },
  "data": {
    "names": ["proba"],
    "tensor": {
      "shape": [1, 1],
      "values": [0.08298236340068435]
    }
  }
}
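A sketch of how the orchestrator side of this proposal might consume such a response (the counter store here is a stand-in, not Seldon code):

```python
from collections import defaultdict

# Stand-in for real Prometheus counters, keyed by metric name.
counters = defaultdict(float)

def record_metrics(response):
    """Pull the proposed "metrics" dict out of a response and increment counters."""
    for key, value in response.get("meta", {}).get("metrics", {}).items():
        if isinstance(value, (int, float)):
            counters[key] += value

record_metrics({"meta": {"metrics": {"mymetric1": 1, "mymetric2": 2}}})
record_metrics({"meta": {"metrics": {"mymetric1": 1}}})
```

After the two calls, mymetric1 has accumulated across responses, which is the "incremented" behaviour described above.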

@sasvaritoni (Contributor) commented:

I really like this approach: the model developer would just need to send back the "metrics" dict in the metadata as you described.
I think this would give a "centralized" solution, requiring only minimal effort from the model developer.

@ukclivecox (Contributor) commented:

Here is a proposal; feedback welcome.

@sasvaritoni (Contributor) commented:

I like the proposal.

Just one comment regarding the MetricType enum: I think it should align with the metric types used by Prometheus as far as possible:
https://prometheus.io/docs/concepts/metric_types/

Potentially, metrics of type histogram and summary can be represented by several metrics of type gauge or counter, in which case we might not need separate types for histogram and summary.
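For example, Prometheus itself exports a summary as a pair of cumulative series, `<name>_sum` and `<name>_count`. A sketch of that decomposition using plain counter-like values (the class is illustrative, not part of any proposal):

```python
class SimpleSummary:
    """Sketch: a summary represented by two counter-like series."""

    def __init__(self, name):
        self.name = name
        self._sum = 0.0
        self._count = 0

    def observe(self, value):
        # Both series only ever increase, like Prometheus counters.
        self._sum += value
        self._count += 1

    def expose(self):
        # The same shape Prometheus uses when exporting a summary.
        return {f"{self.name}_sum": self._sum,
                f"{self.name}_count": self._count}

latency = SimpleSummary("request_latency_seconds")
for v in (0.1, 0.2, 0.3):
    latency.observe(v)
```

The average latency can then be derived at query time as `_sum / _count`, so a dedicated summary type is not strictly required.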

@ukclivecox (Contributor) commented:

I have updated the proposal with some rationale for not following Prometheus exactly. It would be good to get your opinions.

@bbarn3y (Author) commented Oct 20, 2018

Thanks for the proposal, it sounds good to me.

@sasvaritoni (Contributor) commented:

Thanks for the clarifications! Sounds good to me too.

@ukclivecox ukclivecox assigned ukclivecox and unassigned gsunner Nov 4, 2018
@ukclivecox ukclivecox added this to To do in 0.2.4 Nov 5, 2018
@ukclivecox ukclivecox added this to In progress in 0.2.5 Nov 8, 2018
0.2.5 automation moved this from In progress to Done Nov 9, 2018
agrski pushed a commit that referenced this issue Dec 2, 2022