How to calculate TFLOPS in LSTM.cu #7
The vast majority of FLOPs in an LSTM are in the matrix multiplications. A single matrix multiplication requires 2MN(K+1) FLOPs. There are 8 matrix multiplications per layer per timestep, and in this case M = K = hiddenSize, N = miniBatch. Therefore the total FLOPs are approximately:

2 × hiddenSize × (hiddenSize + 1) × miniBatch × 8 × numLayers × seqLength
In reality there are a few more FLOPs due to biases and activation functions. These are only significant if hiddenSize is very small, as they scale linearly with hiddenSize rather than with its square. In any case, this gives you the approximate total number of FLOPs. Divide by the runtime in seconds and multiply by 10^-12, and you have TFLOPS.
@JAppleyard Hi, why does a single matrix multiplication require 2MN(K+1) FLOPs? I mean, it should be MN(2K-1), right?
The output of this code is the runtime, but what I want to compare is throughput. How do I convert the runtime into TFLOPS?
I mean, how is the computation related to the other parameters?