
Add Benchmark test for PredictionEngine #1014

Merged
merged 1 commit into dotnet:master from the predictor branch on Sep 26, 2018

Conversation

najeeb-kazmi (Member)

Adds a benchmark test to measure the performance of making many single predictions with PredictionEngine.

Closes #1013

dnfclas commented Sep 25, 2018

CLA assistant check
All CLA requirements met.

justinormont (Contributor) commented Sep 25, 2018

Can you build the single prediction benchmarks from the two models produced in the Scoring Speed section of #711?

These will be more representative of user models by being a bit larger. You can take the SetupScoringSpeedTests() code to produce these on demand (and we should). The current models in this PR are quite small. We can also include a very small one; the small model focuses on overhead in the scoring process beyond the featurization and learners.

Having only tiny models would focus our energy on improving the speed of components that are not very representative of where users actually see time spent in their prediction pipelines.

I would recommend measuring (a rough sketch follows the list):

  • Time to first prediction (cold start time in ms)
  • Throughput of prediction (# predictions / sec) -- note: multi-threaded is recommended
  • Latency of prediction (ms)
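
A rough sketch of how these could map onto BenchmarkDotNet; the delegate and data below are placeholders for the real model / PredictionEngine setup rather than code from this PR:

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

// Latency / throughput: the default job reports mean time per PredictOne call,
// which is the per-prediction latency; throughput is roughly 1 / latency.
public class PredictionLatencyBench
{
    private Func<float[], float> _predict;   // placeholder for the real PredictionEngine
    private float[] _example;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder for training/loading the model and creating the engine.
        var weights = new float[] { 0.1f, -0.2f, 0.3f };
        _example = new float[] { 1f, 2f, 3f };
        _predict = x => weights[0] * x[0] + weights[1] * x[1] + weights[2] * x[2];
    }

    [Benchmark]
    public float PredictOne() => _predict(_example);
}

// Time to first prediction (cold start): fresh processes, no warmup, and the model
// construction plus the first Predict call all inside the measured method.
[SimpleJob(RunStrategy.ColdStart, launchCount: 5)]
public class PredictionColdStartBench
{
    [Benchmark]
    public float FirstPrediction()
    {
        // Placeholder: the real benchmark would load the model, create the
        // PredictionEngine, and make a single prediction here.
        var weights = new float[] { 0.1f, -0.2f, 0.3f };
        var example = new float[] { 1f, 2f, 3f };
        return weights[0] * example[0] + weights[1] * example[1] + weights[2] * example[2];
    }
}
```

The cold-start numbers will naturally be noisier; the latency benchmark is the one whose mean should be stable enough to track over time.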

cc: @markusweimer

justinormont added the perf (Performance and Benchmarking related) label on Sep 25, 2018
Anipik (Contributor) commented Sep 25, 2018

@justinormont why are we storing these trained models in the repo; shouldn't we produce them in GlobalSetup?

justinormont (Contributor)

> @justinormont why are we storing these trained models in the repo; shouldn't we produce them in GlobalSetup?

I agree. Perhaps we could put the new code within the existing files (so we don't have to replicate the GlobalSetup sections)?

Anipik (Contributor) commented Sep 25, 2018

> Time to first prediction (cold start time in ms)

This can be measured using the launchCount feature, similar to the way we measure the time to train the model.

> Perhaps we could put the new code within the existing files (so we don't have to replicate the GlobalSetup sections)?

How about putting the common code in https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Benchmarks/Helpers.cs, and then calling it from the setup?
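
Roughly what that could look like; BenchmarkModels and TrainIrisModel are hypothetical names, not existing code in Helpers.cs:

```csharp
// Sketch for Helpers.cs: one shared place that trains (or caches) the models the
// benchmarks need, so nothing has to be checked into the repo.
internal static class BenchmarkModels
{
    private static readonly object Sync = new object();
    private static string _irisModelPath;

    public static string GetIrisModelPath()
    {
        lock (Sync)
        {
            // Train once per benchmark run and reuse the saved model afterwards.
            return _irisModelPath ?? (_irisModelPath = TrainIrisModel());
        }
    }

    private static string TrainIrisModel()
    {
        // Placeholder for the real pipeline: fit the model, save it to a temp file,
        // and return the path.
        throw new System.NotImplementedException();
    }
}

// In each benchmark class:
// [GlobalSetup]
// public void Setup() => _modelPath = BenchmarkModels.GetIrisModelPath();
```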

> Latency of prediction (ms)

Which latency are we talking about here?

TomFinley (Contributor) commented Sep 25, 2018

> the small model focuses on overhead in the scoring process beyond the featurization and learners.

I think you might be right. :) A PR titled "Benchmark test for PredictionEngine", where the main file added is PredictionEngineBench.cs, is benchmarking the PredictionEngine and trying as hard as possible to keep the benchmark focused on that piece of code. From my perspective that's good, not bad: the major reason I want this is to highlight the issues in what has historically been, and still is, a troublesome piece of code. More complex scenarios are fine, but getting the simple scenario right is terribly important. I have an excuse for why a more complex pipeline might be slow, but I have no excuse for a user who asks why it takes microseconds to do a prediction on BC, for example -- 9 multiply-adds. That's nanoseconds, not microseconds. As the person who has most recently worked on this in #973, the benchmarks you're suggesting would simply not have been as useful to me, while analyzing the hotspots in PredictionEngine, as the ones currently under review here.

Your other notes about timings and whatnot seem OK, except for the note about "multi-threaded is recommended." That seems more like a scenario best served by measuring the batch prediction engine, or even the dataview pipelines themselves, not a simple prediction engine. Let's focus on getting the simple things right; subsequent PRs can refine.
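
To make the "nanoseconds, not microseconds" point concrete: the arithmetic a linear model actually does for one BC row is just a short dot product, so everything above that in the measured time is engine overhead. A sketch of that floor (weights and feature values are illustrative):

```csharp
using BenchmarkDotNet.Attributes;

public class LinearScoreFloor
{
    // Nine features, as in the BC example; the numbers themselves are illustrative.
    private readonly float[] _weights = { .1f, .2f, .3f, .4f, .5f, .6f, .7f, .8f, .9f };
    private readonly float[] _features = { 5f, 1f, 1f, 1f, 2f, 1f, 3f, 1f, 1f };

    [Benchmark]
    public float DotProduct()
    {
        // ~9 multiply-adds: a few nanoseconds on a modern core. The gap between this
        // and the measured PredictionEngine time is the overhead worth chasing.
        float score = 0f;
        for (int i = 0; i < _weights.Length; i++)
            score += _weights[i] * _features[i];
        return score;
    }
}
```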


Anipik (Contributor) left a comment

LGTM, just post the numbers on the PR or the issue.

TomFinley (Contributor) left a comment

Looks great now, thanks much @najeeb-kazmi.

codemzs (Member) left a comment

:shipit:

Zruty0 (Contributor) commented Sep 26, 2018

My suggestion, after reading the above, is:

  1. Keep this test as is, so that we have a good benchmark of the prediction engine in isolation.
  2. Write another benchmark (or several) to exercise the bigger models. This will essentially test the transform performance for prediction.
  3. I don't think we should run anything multi-threaded. I would rather run many one-time predictions in one thread, to get more accurate benchmarks (see the sketch after this list).
  4. I don't think any benchmarks we have in ML.NET should be used as any form of user-facing 'validation', etc. The whole reason for benchmarks here is for us to observe perf, find bottlenecks, and improve perf. User-facing perf benchmarking is a separate topic.
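
For point 3, a rough sketch of running many one-time predictions on a single thread while still reporting time per prediction, via OperationsPerInvoke (the delegate stands in for the real engine call):

```csharp
using System;
using BenchmarkDotNet.Attributes;

public class ManySinglePredictionsBench
{
    private const int PredictionsPerInvoke = 10_000;

    private Func<float[], float> _predict;   // placeholder for the real PredictionEngine
    private float[] _row;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder for training/loading the model and creating the engine.
        var weights = new float[] { 0.1f, -0.2f, 0.3f };
        _row = new float[] { 1f, 2f, 3f };
        _predict = x => weights[0] * x[0] + weights[1] * x[1] + weights[2] * x[2];
    }

    // One invocation makes many sequential predictions on one thread; BenchmarkDotNet
    // divides the measured time by OperationsPerInvoke, so the report stays per prediction.
    [Benchmark(OperationsPerInvoke = PredictionsPerInvoke)]
    public float ManyPredictions()
    {
        float last = 0f;
        for (int i = 0; i < PredictionsPerInvoke; i++)
            last = _predict(_row);
        return last;
    }
}
```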

Zruty0 (Contributor) commented Sep 26, 2018

Number 2 should not happen in this PR



justinormont merged commit 36c75d9 into dotnet:master on Sep 26, 2018
justinormont (Contributor) commented Sep 26, 2018

Thanks @najeeb-kazmi for the great benchmarks.

The multi-threaded case I was envisioning was to test how well we scale for the user scenario of running a web server that handles ML prediction requests. When building a web service like this, it's good to be able to handle concurrent requests across multiple threads (multiple worker processes are another route). I haven't read the new PredictionEngine code to know whether we do anything to either help or hinder multi-threaded predictions.
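
A rough sketch of that scenario, assuming each worker thread gets its own engine instance (whether instances can safely be shared is exactly the open question above); the createEngine and predict delegates are placeholders:

```csharp
using System;
using System.Threading.Tasks;

public static class ConcurrentPredictionSketch
{
    // One engine per worker thread via Parallel.For's thread-local state; if sharing a
    // single engine turns out to be safe, that would be the simpler baseline to measure.
    public static void PredictAll<TEngine, TInput>(
        Func<TEngine> createEngine,
        Action<TEngine, TInput> predict,
        TInput[] requests)
    {
        Parallel.For(
            0, requests.Length,
            localInit: createEngine,
            body: (i, state, engine) => { predict(engine, requests[i]); return engine; },
            localFinally: engine => { });
    }
}
```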

najeeb-kazmi deleted the predictor branch on September 26, 2018
najeeb-kazmi restored the predictor branch on October 2, 2018
najeeb-kazmi deleted the predictor branch on January 30, 2020
dotnet locked as resolved and limited the conversation to collaborators on Mar 28, 2022