This can happen because GEMM takes almost all of the time under FP32.
In that case, small noise in the GEMM timing can visibly affect the measured latency. In your case, the relative difference in latency is about 2%, which may just be noise. For such cases, FT and PyTorch should have similar latency.
We don't suggest using FP32 for transformer models because FP16 brings a large speedup without an accuracy drop.
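To separate measurement noise from the real FP32/FP16 gap, a minimal PyTorch timing sketch is shown below: it times a single large GEMM in FP32 and FP16 with warmup and CUDA synchronization. The matrix shapes and iteration counts are arbitrary placeholders, not the configuration from this issue.

```python
# Minimal sketch: time one GEMM in FP32 vs FP16 on the GPU.
# Shapes and iteration counts are placeholders, not the ones from this issue.
import torch

def time_gemm(dtype, m=4096, n=4096, k=4096, iters=50):
    a = torch.randn(m, k, device="cuda", dtype=dtype)
    b = torch.randn(k, n, device="cuda", dtype=dtype)
    # Warmup so cuBLAS setup and CUDA context creation don't skew the timing.
    for _ in range(10):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per GEMM

if __name__ == "__main__":
    fp32_ms = time_gemm(torch.float32)
    fp16_ms = time_gemm(torch.float16)
    print(f"FP32: {fp32_ms:.3f} ms  FP16: {fp16_ms:.3f} ms  "
          f"speedup: {fp32_ms / fp16_ms:.2f}x")
```

On a GPU with Tensor Cores, the FP16 GEMM is typically several times faster than FP32, while run-to-run variation on the order of a couple of percent is normal, which is why a ~2% FP32 difference between FT and PyTorch is hard to distinguish from noise.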
@byshiue I found that I hadn't run the GEMM search, so the default GEMM algorithm was used. Would searching for the best algorithm give a further speedup? And can the GEMM info file be reused across different PCs with the same GPU model?
Sorry for the delayed reply. Searching for the best algorithm may improve the speed; it is case by case.
In general, the GEMM info file can be used on different devices with the same GPU model.
I am running ViT and got an unexpected result:
it's even slower than PyTorch.
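For comparisons like this, it helps to first confirm that the PyTorch baseline itself is measured with warmup and CUDA synchronization. The sketch below does that for a stand-in ViT from torchvision; vit_b_16, batch size 8, and 224x224 inputs are placeholder assumptions, since the actual model and batch size in this issue are not stated.

```python
# Sketch of a PyTorch ViT latency measurement with warmup and CUDA sync.
# vit_b_16, batch size 8, and 224x224 inputs are placeholder assumptions,
# not the configuration used in this issue.
import torch
from torchvision.models import vit_b_16

def bench(model, x, iters=50):
    with torch.no_grad():
        for _ in range(10):  # warmup iterations
            model(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per forward pass

if __name__ == "__main__":
    model = vit_b_16(weights=None).cuda().eval()
    x = torch.randn(8, 3, 224, 224, device="cuda")
    print(f"FP32: {bench(model, x):.2f} ms")
    print(f"FP16: {bench(model.half(), x.half()):.2f} ms")
```

If the FP32 numbers for FT and PyTorch measured this way are within a few percent of each other, that matches the earlier comment that the difference is likely noise; the FP16 path is where FT is expected to show a clear gain.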