
Reduce time taken for first prediction, Fixes #4428 (#4456)

Closed

Conversation

@ashbhandare (Contributor) commented Nov 7, 2019

Fixes #4428
As reported in the issue, the first prediction takes much longer than the subsequent predictions.
This is due to the TensorFlow initialization and graph optimizations happening at the time of the first prediction.
In order to mask that time within the PredictionEngine creation, this change runs a prediction
on the TensorFlow graph with a dummy tensor of zeros of the same size as an ImageNet image.
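For illustration, a minimal sketch of what this warm-up looks like with the TensorFlow.NET calls used in this code; the method and variable names, the 224x224x3 input size, and the long[] shape type are assumptions for the sketch (the real dimensions and shape type must match the model and the Tensor constructor used in the change), and a session runner field (_runner) is assumed to be already wired to the loaded graph:

// Assumes: using System; using System.Runtime.InteropServices; plus the TensorFlow.NET namespaces.
private void WarmUpTensorFlowGraph()
{
    const int batchSize = 1, height = 224, width = 224, channels = 3;
    var fakeBatch = new float[batchSize * height * width * channels]; // all zeros

    // Pin the managed buffer so TensorFlow can read it through a raw pointer.
    var fakeBatchHandle = GCHandle.Alloc(fakeBatch, GCHandleType.Pinned);
    try
    {
        var fakeTensorShape = new long[] { batchSize, height, width, channels };
        var fakeInput = new Tensor(fakeBatchHandle.AddrOfPinnedObject(), fakeTensorShape,
            TF_DataType.TF_FLOAT, sizeof(float) * fakeBatch.Length);

        // One throw-away run forces TensorFlow's one-time graph initialization and
        // optimization, so the first real prediction no longer pays that cost.
        _runner.AddInput(fakeInput, 0).Run();
    }
    finally
    {
        // Unpin the buffer once the dummy run has completed.
        fakeBatchHandle.Free();
    }
}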

@codecov bot commented Nov 7, 2019

Codecov Report

Merging #4456 into master will increase coverage by 0.69%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4456      +/-   ##
==========================================
+ Coverage   74.03%   74.72%   +0.69%     
==========================================
  Files         905      906       +1     
  Lines      159107   159307     +200     
  Branches    17128    17145      +17     
==========================================
+ Hits       117794   119042    +1248     
+ Misses      36545    35456    -1089     
- Partials     4768     4809      +41
| Flag | Coverage Δ |
|------|------------|
| #Debug | 74.72% <100%> (+0.69%) ⬆️ |
| #production | 70.09% <100%> (+0.9%) ⬆️ |
| #test | 90.11% <100%> (-0.01%) ⬇️ |

| Impacted Files | Coverage Δ |
|----------------|------------|
| .../Microsoft.ML.Vision/ImageClassificationTrainer.cs | 92.29% <100%> (+14.61%) ⬆️ |
| ...cenariosWithDirectInstantiation/TensorflowTests.cs | 91.19% <100%> (+0.04%) ⬆️ |
| ...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs | 79.48% <0%> (-20.52%) ⬇️ |
| ...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs | 59.6% <0%> (-5.97%) ⬇️ |
| test/Microsoft.ML.AutoML.Tests/DatasetUtil.cs | 88.59% <0%> (-2.64%) ⬇️ |
| src/Microsoft.ML.Maml/MAML.cs | 24.75% <0%> (-1.46%) ⬇️ |
| ...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs | 39.77% <0%> (ø) |
| src/Microsoft.ML.AutoML/Sweepers/Parameters.cs | 83.89% <0%> (+1.69%) ⬆️ |
| ...osoft.ML.KMeansClustering/KMeansPlusPlusTrainer.cs | 84.64% <0%> (+2.69%) ⬆️ |
| src/Microsoft.ML.OnnxTransformer/OnnxTypeParser.cs | 59.13% <0%> (+5.37%) ⬆️ |
| ... and 17 more | |
@ashbhandare ashbhandare marked this pull request as ready for review Nov 8, 2019
@ashbhandare ashbhandare requested a review from dotnet/mlnet-core as a code owner Nov 8, 2019
@ashbhandare ashbhandare requested review from codemzs and CESARDELATORRE Nov 8, 2019
// Pin the zero-filled fake batch so TensorFlow can read it through a raw pointer.
var fakeBatchHandle = GCHandle.Alloc(fakeBatch, GCHandleType.Pinned);
IntPtr fakeBatchPtr = fakeBatchHandle.AddrOfPinnedObject();
int fakeBatchSizeInBytes = sizeof(float) * fakeBatch.Length;
// Run the dummy tensor through the graph once so TensorFlow's one-time
// initialization happens here instead of on the first real prediction.
var fakeOutput = _runner.AddInput(new Tensor(fakeBatchPtr, fakeTensorShape, TF_DataType.TF_FLOAT, fakeBatchSizeInBytes), 0).Run();

@eerhardt (Member) commented Nov 8, 2019

Are we sure doing this is a good idea overall?

I can think of the following places where this is actually worse for a user:

  1. The total time taken is now longer because we are making this "fake" prediction.
  2. The user creates multiple PredictionEngines in their application - we will be issuing "fake" requests even for the 2nd, 3rd, etc.
  3. The case where they create a PredictionEngine, but never make a prediction. (not very common, but still calling it out)

If some customers want this, they can "warm up" TensorFlow themselves. We shouldn't make every user pay.

I think the intention of the issue was to profile and investigate if there is something unnecessary happening, or if we could do something another way to speed it up. I'm not sure "masking" the time taken is a good solution.
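For example, a user who does want this can warm the engine up themselves when creating it. A minimal sketch, assuming hypothetical ModelInput / ModelOutput classes (the user's own input/output types) and a representative throw-away sampleInput, using only the public MLContext / PredictionEngine APIs:

using Microsoft.ML;

public static class PredictionWarmup
{
    // Creates the engine and immediately runs one disposable prediction so that
    // TensorFlow's one-time initialization happens before real traffic arrives.
    public static PredictionEngine<ModelInput, ModelOutput> CreateWarmEngine(
        MLContext mlContext, ITransformer trainedModel, ModelInput sampleInput)
    {
        var engine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(trainedModel);
        engine.Predict(sampleInput); // result is ignored; this call only pays the startup cost
        return engine;
    }
}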

@codemzs (Member) commented Nov 8, 2019

I agree with you @eerhardt. I have been looking into whether TF has ways to warm things up; one thing to try is to save the graph in SavedModel format and see if that fixes this issue. Thanks for the investigation @ashbhandare, but as @eerhardt says, the point was to root-cause the delay, which you did; let's try not to put hacks in.

@ashbhandare (Author, Contributor) commented Nov 8, 2019

I agree that if there is a better way to do this, that should go in instead, but the issue specifically asked for masking this time as an alternative, which is why this PR exists.
The fastest way of inferring in TensorFlow, from what I found, is to freeze the model and apply certain graph optimizations such as constant folding and removing unused nodes from the graph. This can be done using the graph transform tools (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms). I did not find a straightforward way of using these in ML.NET when we save our trained model.



@codemzs (Member) commented Nov 8, 2019

I would try the SavedModel format first before spending too many cycles on anything too fancy.

@ashbhandare (Author, Contributor) commented Nov 11, 2019

The SavedModel format is not supported by TF.NET (SciSharp/TensorFlow.NET#445). The one way that is supported is tf.train.saver.save(), which saves a checkpoint. Since we are already freezing our trained model (in the process converting variable nodes to constants and extracting the inference graph), I doubt that the SavedModel format would give much benefit, since the primary intuition behind using it is that one can directly load the inference meta-graph (https://www.tensorflow.org/guide/saved_model).

@Oceania2018 commented Nov 11, 2019

Running the sessions in a long-running process is better for the end user.
But there is another solution provided by TFX: https://www.tensorflow.org/tfx/serving/saved_model_warmup @codemzs @eerhardt

@codemzs (Member) commented Nov 11, 2019

@Oceania2018 Thanks! @ashbhandare, this is what I was talking about. And yes, SavedModel is supported in the TF.NET code; we have tests that exercise that code path, please check the retrain tests.

@ashbhandare (Author, Contributor) commented Nov 11, 2019

@codemzs I tried and failed to find a way of saving using TensorFlow's Python-like API (https://www.tensorflow.org/api_docs/python/tf/saved_model/save), which produces a SavedModel with the following directory structure: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md#components. I might be missing something.

@ashbhandare ashbhandare force-pushed the ashbhandare:PredictionTime branch from f7ab356 to 11607ab Nov 11, 2019