Execute is blocking main thread #236
Comments
Hi @matkg, you can try profiling and increasing the number of layers scheduled per frame to balance the time.
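To make the "layers per frame" trade-off concrete, here is a minimal sketch with no Unity or Barracuda dependency. `FakeSchedule` is a hypothetical stand-in for the enumerator that `worker.StartManualSchedule(...)` returns in this thread, and `RunBatched` and `FramesWaited` are illustrative helpers, not part of any library. The layer count of 200 used below is made up.

```csharp
using System;
using System.Collections;

public static class LayerBatching
{
    // Hypothetical stand-in for the schedule enumerator:
    // each MoveNext() represents scheduling one layer.
    public static IEnumerator FakeSchedule(int layerCount)
    {
        for (int i = 0; i < layerCount; i++)
            yield return null;
    }

    // Steps the schedule in batches of layersPerFrame and yields once per batch.
    // In a Unity coroutine, each yield would be a `yield return null` (wait one frame).
    public static IEnumerator RunBatched(IEnumerator schedule, int layersPerFrame)
    {
        int inBatch = 0;
        while (schedule.MoveNext())
        {
            if (++inBatch == layersPerFrame)
            {
                inBatch = 0;
                yield return null; // hand control back until the next frame
            }
        }
    }

    // Counts how many frames are waited before the whole schedule has been stepped.
    public static int FramesWaited(int layerCount, int layersPerFrame)
    {
        int frames = 0;
        IEnumerator run = RunBatched(FakeSchedule(layerCount), layersPerFrame);
        while (run.MoveNext()) frames++;
        return frames;
    }
}
```

With 200 layers, `FramesWaited(200, 20)` waits 10 frames and `FramesWaited(200, 50)` waits 4. A larger batch means fewer frames of latency but more work inside each frame, which is the balance the profiler helps to find.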
Hey @Aurimasp, thanks for the profiler suggestion, I hadn't thought of it! I reread the docs and changed my code a little:
IEnumerator EvalCameraImageCoroutine()
{
    dh.sw.Start();
    Destroy(tex);
    tex = Utils.renderTextureToTexture2D(mainCameraTexture);
    Tensor textureTensor = new Tensor(tex, channels: 3);
    var it = worker.StartManualSchedule(textureTensor);
    workerbusy = true;
    int count = 0;
    while (it.MoveNext())
    {
        ++count;
        if (count % 20 == 0)
        {
            Task.Run(() => worker.FlushSchedule(false));
            yield return null;
        }
    }
    worker.FlushSchedule(true);
    Tensor output = worker.PeekOutput("output");
    var results = Utils.getBoundingBoxFromTensor(output, 1920, 1088);
    results.ForEach(res => dh.log(res.ToString()));
    textureTensor.Dispose();
    output.Dispose();
    workerbusy = false;
    dh.stopSW("");
}
Now I flush only every 20th iteration, which keeps my fps around 30 with an inference time of 200 ms. But I noticed that the [...] Here is a picture of the profiler (the spikes are almost all due to MoveNext()).
Hi Matkg, may I ask which backend you are using? I would expect Burst work to happen on threads and compute work to happen on the GPU. If you are using one of those backends, this might indeed be a bug. In that case, could you share a small repro please? Thanks!
Hey there, thanks for the reply. I am using the ComputePrecompiled setting, which should run on the GPU. Anyway, I've managed to shrink the execution time by reducing the input size of my yolov5n model (the model input was 1920x1088 and is now 192x192). That seemed to be the real issue, which I had overlooked. It now runs with an inference time of about 10 ms, which gives about 120 fps. Thanks for the replies, but I think the issue was on my side.
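For a sense of scale, the input-size change above can be worked out directly. This is a small self-contained sketch; `InputSizeMath` and its methods are illustrative names, not part of Barracuda.

```csharp
using System;

public static class InputSizeMath
{
    // Number of pixels in a width x height input.
    public static long PixelCount(int width, int height) => (long)width * height;

    // How many times more pixels the first input has than the second.
    public static double PixelRatio(int w1, int h1, int w2, int h2) =>
        (double)PixelCount(w1, h1) / PixelCount(w2, h2);
}
```

`PixelCount(1920, 1088)` is 2,088,960 and `PixelCount(192, 192)` is 36,864, so the original input had roughly 56.7 times as many pixels. The inference times reported in this thread went from about 200 ms to about 10 ms, roughly a 20x drop, so the time did not scale one-to-one with pixel count here.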
Hey! OK, thanks for the feedback! I still find it quite unexpected that the ComputePrecompiled backend would take 65 ms of CPU time even at a high input resolution. Something might be fishy somewhere :) I'm closing the bug for now, but please feel free to reopen as needed. Florent
Hey there, I'm trying to use Barracuda with the YOLOv5s network and I'm running into performance issues. Inside the Update function I'm calling
worker.Execute(textureTensor);
The problem is that the Execute call takes approximately 100 milliseconds to complete, which results in 8 to 9 fps at runtime.
I already tried using a coroutine calling
FlushSchedule
each frame and then yielding, but then the model takes 3 seconds to execute, which is not ideal. Here is the coroutine:
As you can see, I already tried to run
worker.FlushSchedule(false);
asynchronously, which resulted in a race condition inside the Barracuda framework. It seems that calling Execute() asynchronously would solve my problem, but that results in an error telling me that Execute can only be called from the main thread. Also, calling FlushSchedule multiple times per frame made the performance even worse.
Any help is appreciated.
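The frame-rate numbers in the post above follow from simple frame-time arithmetic, sketched here with illustrative helper names (`FrameTiming` is not part of Unity or Barracuda).

```csharp
using System;

public static class FrameTiming
{
    // Upper bound on frames per second when each frame takes frameMs milliseconds.
    public static double FpsFromFrameMs(double frameMs) => 1000.0 / frameMs;

    // Total time per frame, in milliseconds, at a given frame rate.
    public static double FrameMsFromFps(double fps) => 1000.0 / fps;
}
```

A blocking 100 ms call caps the frame rate at `FpsFromFrameMs(100)`, which is 10 fps. The observed 8 to 9 fps corresponds to about 111 to 125 ms per frame, so the blocking call accounts for most of each frame and the remainder would be the rest of the per-frame work.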