
Add Benchmark test for PredictionEngine #1014

Merged
merged 1 commit into dotnet:master from the predictor branch on Sep 26, 2018

Conversation

najeeb-kazmi (Member)

Adds a benchmark test to measure the performance of making many single predictions with PredictionEngine.

Closes #1013

dnfclas commented Sep 25, 2018

CLA assistant check
All CLA requirements met.

justinormont (Contributor) commented Sep 25, 2018

Can you build the single prediction benchmarks from the two models produced in the Scoring Speed section of #711?

These will be more representative of user models by being a bit larger. You can take the SetupScoringSpeedTests() code to produce these on demand (and we should). The current models in this PR are quite small. We can also include a very small one; the small model focuses on overhead in the scoring process beyond the featurization and learners.

Having only tiny models would focus our energy on improving the speed of components that are not very representative of where users actually see time spent in their prediction pipelines.

I would recommend measuring (a rough sketch follows the list):

  • Time to first prediction (cold start time in ms)
  • Throughput of prediction (# predictions / sec) -- note: multi-threaded is recommended
  • Latency of prediction (ms)
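
A rough sketch of how these could map onto BenchmarkDotNet; the delegate and data below are placeholders for the real model / PredictionEngine setup rather than code from this PR:

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

// Latency / throughput: the default job reports mean time per PredictOne call,
// which is the per-prediction latency; throughput is roughly 1 / latency.
public class PredictionLatencyBench
{
    private Func<float[], float> _predict;   // placeholder for the real PredictionEngine
    private float[] _example;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder for training/loading the model and creating the engine.
        var weights = new float[] { 0.1f, -0.2f, 0.3f };
        _example = new float[] { 1f, 2f, 3f };
        _predict = x => weights[0] * x[0] + weights[1] * x[1] + weights[2] * x[2];
    }

    [Benchmark]
    public float PredictOne() => _predict(_example);
}

// Time to first prediction (cold start): fresh processes, no warmup, and the model
// construction plus the first Predict call all inside the measured method.
[SimpleJob(RunStrategy.ColdStart, launchCount: 5)]
public class PredictionColdStartBench
{
    [Benchmark]
    public float FirstPrediction()
    {
        // Placeholder: the real benchmark would load the model, create the
        // PredictionEngine, and make a single prediction here.
        var weights = new float[] { 0.1f, -0.2f, 0.3f };
        var example = new float[] { 1f, 2f, 3f };
        return weights[0] * example[0] + weights[1] * example[1] + weights[2] * example[2];
    }
}
```

The cold-start numbers will naturally be noisier; the latency benchmark is the one whose mean should be stable enough to track over time.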

cc: @markusweimer

justinormont added the perf (Performance and Benchmarking related) label on Sep 25, 2018
Anipik (Contributor) commented Sep 25, 2018

@justinormont why are we storing these trained models in the repo; shouldn't we produce them in GlobalSetup?

justinormont (Contributor)

> @justinormont why are we storing these trained models in the repo; shouldn't we produce them in GlobalSetup?

I agree. Perhaps we could put the new code within the existing files (so we don't have to replicate the GlobalSetup sections)?

Anipik (Contributor) commented Sep 25, 2018

> Time to first prediction (cold start time in ms)

This can be measured using the launchCount feature, similar to the way we measure the time to train the model.

> Perhaps we could put the new code within the existing files (so we don't have to replicate the GlobalSetup sections)?

How about putting the common code in https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Benchmarks/Helpers.cs, and then calling it from the setup?
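
Roughly what that could look like; BenchmarkModels and TrainIrisModel are hypothetical names, not existing code in Helpers.cs:

```csharp
// Sketch for Helpers.cs: one shared place that trains (or caches) the models the
// benchmarks need, so nothing has to be checked into the repo.
internal static class BenchmarkModels
{
    private static readonly object Sync = new object();
    private static string _irisModelPath;

    public static string GetIrisModelPath()
    {
        lock (Sync)
        {
            // Train once per benchmark run and reuse the saved model afterwards.
            return _irisModelPath ?? (_irisModelPath = TrainIrisModel());
        }
    }

    private static string TrainIrisModel()
    {
        // Placeholder for the real pipeline: fit the model, save it to a temp file,
        // and return the path.
        throw new System.NotImplementedException();
    }
}

// In each benchmark class:
// [GlobalSetup]
// public void Setup() => _modelPath = BenchmarkModels.GetIrisModelPath();
```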

> Latency of prediction (ms)

Which latency are we talking about here?

TomFinley (Contributor) commented Sep 25, 2018

> the small model focuses on overhead in the scoring process beyond the featurization and learners.

I think you might be right. :) A PR titled "Benchmark test for PredictionEngine", where the main file added is PredictionEngineBench.cs, is benchmarking the PredictionEngine and trying as hard as possible to keep the benchmark focused on that piece of code. From my perspective that's good, not bad: the major reason I want this is to highlight the issues in what has historically been, and still is, a troublesome piece of code. More complex scenarios are fine, but getting the simple scenario right is terribly important. I have an excuse for why a more complex pipeline might be slow, but I have no excuse for a user who asks why it takes microseconds to do a prediction on BC, for example -- 9 multiply-adds. That's nanoseconds, not microseconds. As the person who has most recently worked on this in #973, the benchmarks you're suggesting would simply not have been as useful to me, while analyzing the hotspots in PredictionEngine, as the ones currently under review here.

Your other notes about timings and whatnot seem OK, except for the note about "multi-threaded is recommended." That seems more like a scenario best served by measuring the batch prediction engine, or even the dataview pipelines themselves, not a simple prediction engine. Let's focus on getting the simple things right; subsequent PRs can refine.
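
To make the "nanoseconds, not microseconds" point concrete: the arithmetic a linear model actually does for one BC row is just a short dot product, so everything above that in the measured time is engine overhead. A sketch of that floor (weights and feature values are illustrative):

```csharp
using BenchmarkDotNet.Attributes;

public class LinearScoreFloor
{
    // Nine features, as in the BC example; the numbers themselves are illustrative.
    private readonly float[] _weights = { .1f, .2f, .3f, .4f, .5f, .6f, .7f, .8f, .9f };
    private readonly float[] _features = { 5f, 1f, 1f, 1f, 2f, 1f, 3f, 1f, 1f };

    [Benchmark]
    public float DotProduct()
    {
        // ~9 multiply-adds: a few nanoseconds on a modern core. The gap between this
        // and the measured PredictionEngine time is the overhead worth chasing.
        float score = 0f;
        for (int i = 0; i < _weights.Length; i++)
            score += _weights[i] * _features[i];
        return score;
    }
}
```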


Anipik (Contributor) left a comment

LGTM, just post the numbers on the PR or the issue.

TomFinley (Contributor) left a comment

Looks great now, thanks much @najeeb-kazmi.

codemzs (Member) left a comment

:shipit:

Zruty0 (Contributor) commented Sep 26, 2018

My suggestion, after reading the above, is:

  1. Keep this test as is, so that we have a good benchmark of the prediction engine in isolation.
  2. Write another benchmark (or several) to exercise the bigger models. This will essentially test the transform performance for prediction.
  3. I don't think we should run anything multi-threaded. I would rather run many one-time predictions in one thread, to get more accurate benchmarks (see the sketch after this list).
  4. I don't think any benchmarks we have in ML.NET should be used as any form of user-facing 'validation', etc. The whole reason for benchmarks here is for us to observe perf, find bottlenecks, and improve perf. User-facing perf benchmarking is a separate topic.
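
For point 3, a rough sketch of running many one-time predictions on a single thread while still reporting time per prediction, via OperationsPerInvoke (the delegate stands in for the real engine call):

```csharp
using System;
using BenchmarkDotNet.Attributes;

public class ManySinglePredictionsBench
{
    private const int PredictionsPerInvoke = 10_000;

    private Func<float[], float> _predict;   // placeholder for the real PredictionEngine
    private float[] _row;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder for training/loading the model and creating the engine.
        var weights = new float[] { 0.1f, -0.2f, 0.3f };
        _row = new float[] { 1f, 2f, 3f };
        _predict = x => weights[0] * x[0] + weights[1] * x[1] + weights[2] * x[2];
    }

    // One invocation makes many sequential predictions on one thread; BenchmarkDotNet
    // divides the measured time by OperationsPerInvoke, so the report stays per prediction.
    [Benchmark(OperationsPerInvoke = PredictionsPerInvoke)]
    public float ManyPredictions()
    {
        float last = 0f;
        for (int i = 0; i < PredictionsPerInvoke; i++)
            last = _predict(_row);
        return last;
    }
}
```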

Zruty0 (Contributor) commented Sep 26, 2018

Number 2 should not happen in this PR



justinormont merged commit 36c75d9 into dotnet:master on Sep 26, 2018
justinormont (Contributor) commented Sep 26, 2018

Thanks @najeeb-kazmi for the great benchmarks.

The multi-threaded case I was envisioning was to test how well we scale for the user scenario of running a web server that handles ML prediction requests. When building a web service like this, it's good to be able to handle concurrent requests across multiple threads (multiple worker processes are another route). I haven't read the new PredictionEngine code to know whether we do anything to either help or hinder multi-threaded predictions.
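
A rough sketch of that scenario, assuming each worker thread gets its own engine instance (whether instances can safely be shared is exactly the open question above); the createEngine and predict delegates are placeholders:

```csharp
using System;
using System.Threading.Tasks;

public static class ConcurrentPredictionSketch
{
    // One engine per worker thread via Parallel.For's thread-local state; if sharing a
    // single engine turns out to be safe, that would be the simpler baseline to measure.
    public static void PredictAll<TEngine, TInput>(
        Func<TEngine> createEngine,
        Action<TEngine, TInput> predict,
        TInput[] requests)
    {
        Parallel.For(
            0, requests.Length,
            localInit: createEngine,
            body: (i, state, engine) => { predict(engine, requests[i]); return engine; },
            localFinally: engine => { });
    }
}
```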

najeeb-kazmi deleted the predictor branch on September 26, 2018
najeeb-kazmi restored the predictor branch on October 2, 2018
najeeb-kazmi deleted the predictor branch on January 30, 2020
dotnet locked as resolved and limited the conversation to collaborators on Mar 28, 2022