How to calculate TFLOPS in LSTM.cu #7

Closed
yhuanghamu opened this issue May 3, 2016 · 2 comments

Comments

@yhuanghamu

yhuanghamu commented May 3, 2016

The output of this code is the runtime, but what I want to compare is throughput. How do I convert the runtime into TFLOPS?
In other words, how does the amount of computation relate to the other parameters?

yhuanghamu changed the title from "How to calculate TFLOPS" to "How to calculate TFLOPS in LSTM.cu" on May 3, 2016
@JAppleyard
Contributor

JAppleyard commented May 4, 2016

The vast majority of FLOPs in an LSTM are in the matrix multiplications. A single matrix multiplication requires 2MN(K+1) FLOPs. There are 8 matrix multiplications per layer per timestep (4 gates, each with an input and a recurrent multiplication), and in this case M=K=hiddenSize and N=minibatch. Therefore the total FLOPs are:

layers * timesteps * 8 * 2 * hiddenSize * minibatch * (hiddenSize + 1).

In reality there are a few more FLOPs due to biases and activation functions. These are only significant if hiddenSize is very small, as they scale linearly with hiddenSize rather than quadratically.
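To get a feel for how quickly the linear terms become negligible, here is a rough sketch. The per-element cost `flops_per_element=20` is an assumed, illustrative figure for biases and activations, not something measured or stated in this thread; only the matrix-multiplication term comes from the formula above.

```python
def elementwise_overhead(hidden_size, flops_per_element=20):
    """Ratio of elementwise FLOPs to matrix-multiplication FLOPs.

    Both terms are per layer, per timestep, per sample in the minibatch.
    flops_per_element is an assumed, illustrative cost per hidden unit
    for biases and activations; it is not a measured value.
    """
    # Matrix multiplications: 8 * 2 * hiddenSize * (hiddenSize + 1).
    matmul = 8 * 2 * hidden_size * (hidden_size + 1)
    # Biases and activations: linear in hiddenSize.
    elementwise = flops_per_element * hidden_size
    return elementwise / matmul


for h in (4, 64, 512, 4096):
    print(f"hiddenSize={h:5d}: overhead ~ {elementwise_overhead(h):.2%}")
```

Under this assumption the ratio falls roughly as 1/hiddenSize, so it drops well below 1% once hiddenSize reaches a few hundred.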

In any case, the formula above gives you the approximate total number of FLOPs. Divide by the runtime in seconds and multiply by 10^-12, and you have TFLOPS.
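The conversion described in this comment can be sketched as follows. The function names and the example configuration and runtime are hypothetical, chosen only for illustration. If the benchmark reports its time in milliseconds, divide by 1000 first to get seconds.

```python
def lstm_flops(layers, timesteps, hidden_size, minibatch):
    """Approximate LSTM FLOPs, counting matrix multiplications only."""
    # 8 matrix multiplications per layer per timestep, each 2*M*N*(K+1)
    # with M = K = hidden_size and N = minibatch.
    return layers * timesteps * 8 * 2 * hidden_size * minibatch * (hidden_size + 1)


def tflops(total_flops, runtime_seconds):
    """Convert a FLOP count and a runtime in seconds to TFLOPS."""
    return total_flops / runtime_seconds * 1e-12


# Hypothetical configuration and runtime, for illustration only.
flops = lstm_flops(layers=4, timesteps=100, hidden_size=512, minibatch=64)
print(f"{flops:.3e} FLOPs, {tflops(flops, runtime_seconds=0.05):.2f} TFLOPS")
```

Because biases and activations are ignored, the result slightly undercounts the work actually done, which matches the "approximate" caveat above.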

harrism closed this as completed on May 10, 2016
@xiezhq-hermann

xiezhq-hermann commented Jan 8, 2019

@JAppleyard Hi, why does a single matrix multiplication require 2MN(K+1) FLOPs? Shouldn't it be MN(2K-1)?
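
The thread does not say which counting convention lies behind 2MN(K+1), so the following is only one possible reconciliation, not a confirmed answer. Each of the M×N output elements is a dot product of length K, and the count depends on what else is charged to it. The scaled-GEMM row assumes the scalings by α and β and the final addition are counted as separate operations:

```latex
\begin{align*}
\text{plain product } C = AB:\quad
  & MN\,\bigl(K + (K-1)\bigr) = MN(2K-1) \\
\text{accumulating product } C \leftarrow C + AB:\quad
  & MN\,(K + K) = 2MNK \\
\text{scaled GEMM } C \leftarrow \alpha AB + \beta C:\quad
  & MN\,\bigl((2K-1) + 3\bigr) = 2MN(K+1)
\end{align*}
```

For large K all three are approximately 2MNK, so the choice of convention barely changes the resulting TFLOPS figure.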
