FP8 vs FP16 performance (seq2seq transformer with te.Linear replacing nn.Linear layers) #230
Same problem. The only performance gain I got came from a bigger batch size. But implementation problems in Accelerate (model conversion takes much more memory) prevent me from using it.
Hi vince62s, could you share your benchmark script so we can reproduce the issue? :)
Well, I don't know if you really want to check the code, but here is my branch with the FP8 changes.
Here is what I am getting (logs below):
FP8 is slower than FP16.
For FP16, multiples of 16 make things slower than multiples of 8.
Am I missing something?
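One possible (unconfirmed) factor behind the FP16 observation: rounding the padded sequence length up to a larger multiple adds more pad positions per batch. Alignment to 16 is commonly recommended for FP8 GEMMs, but check the Transformer Engine documentation for the exact constraint. The minimal sketch below assumes a simple padding model in which every sequence is padded to the batch maximum rounded up to the multiple. `round_up`, `padding_fraction` and the example lengths are hypothetical and do not reflect OpenNMT-py's actual batching code.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n (n, multiple > 0)."""
    return -(-n // multiple) * multiple


def padding_fraction(seq_lens, multiple):
    """Fraction of positions in a padded batch that are padding tokens.

    Hypothetical model: every sequence is padded to the batch's longest
    length, rounded up to `multiple`. This is an illustration only, not
    how OpenNMT-py necessarily builds its batches.
    """
    padded_len = round_up(max(seq_lens), multiple)
    total = padded_len * len(seq_lens)
    return (total - sum(seq_lens)) / total


lens = [33, 30, 29]  # made-up example sequence lengths
for m in (8, 16):
    print(m, round_up(max(lens), m), round(padding_fraction(lens, m), 3))
# 8 40 0.233
# 16 48 0.361
```

Padding to 16 never yields fewer positions than padding to 8, because every multiple of 16 is also a multiple of 8. Whether the extra positions outweigh any alignment benefit is what the timings below actually measure.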
Batch_size_multiple 16 // Seqlen multiple 16
FP8 (adam)
[2023-05-17 22:20:28,534 INFO] Step 100/300000; acc: 16.1; ppl: 6038.0; xent: 8.7; lr: 0.00002; sents: 31328; bsz: 2145/2545/78; 14043/16656 tok/s; 61 sec;
[2023-05-17 22:21:06,060 INFO] Step 200/300000; acc: 20.6; ppl: 1059.6; xent: 7.0; lr: 0.00005; sents: 26736; bsz: 2164/2561/67; 23063/27297 tok/s; 99 sec;
[2023-05-17 22:21:43,862 INFO] Step 300/300000; acc: 25.3; ppl: 466.3; xent: 6.1; lr: 0.00007; sents: 27760; bsz: 2181/2576/69; 23082/27262 tok/s; 136 sec;
[2023-05-17 22:22:21,180 INFO] Step 400/300000; acc: 27.6; ppl: 315.5; xent: 5.8; lr: 0.00010; sents: 24400; bsz: 2138/2526/61; 22912/27074 tok/s; 174 sec;
[2023-05-17 22:22:58,740 INFO] Step 500/300000; acc: 30.4; ppl: 236.7; xent: 5.5; lr: 0.00012; sents: 26688; bsz: 2148/2535/67; 22880/27001 tok/s; 211 sec;
FP16 (adam)
[2023-05-17 22:24:39,883 INFO] Step 100/300000; acc: 16.2; ppl: 6127.8; xent: 8.7; lr: 0.00002; sents: 31328; bsz: 2145/2545/78; 18771/22265 tok/s; 46 sec;
[2023-05-17 22:25:04,966 INFO] Step 200/300000; acc: 20.6; ppl: 1061.8; xent: 7.0; lr: 0.00005; sents: 26736; bsz: 2164/2561/67; 34504/40838 tok/s; 71 sec;
[2023-05-17 22:25:30,067 INFO] Step 300/300000; acc: 25.3; ppl: 467.8; xent: 6.1; lr: 0.00007; sents: 27760; bsz: 2181/2576/69; 34760/41057 tok/s; 96 sec;
[2023-05-17 22:25:55,069 INFO] Step 400/300000; acc: 27.4; ppl: 320.1; xent: 5.8; lr: 0.00010; sents: 24400; bsz: 2138/2526/61; 34199/40411 tok/s; 121 sec;
[2023-05-17 22:26:19,589 INFO] Step 500/300000; acc: 30.1; ppl: 241.5; xent: 5.5; lr: 0.00012; sents: 26688; bsz: 2148/2535/67; 35048/41359 tok/s; 145 sec;
FP16 (fusedadam)
[2023-05-17 22:28:29,266 INFO] Step 100/300000; acc: 16.1; ppl: 6160.6; xent: 8.7; lr: 0.00002; sents: 31328; bsz: 2145/2545/78; 20312/24092 tok/s; 42 sec;
[2023-05-17 22:28:49,956 INFO] Step 200/300000; acc: 20.6; ppl: 1063.8; xent: 7.0; lr: 0.00005; sents: 26736; bsz: 2164/2561/67; 41830/49509 tok/s; 63 sec;
[2023-05-17 22:29:11,128 INFO] Step 300/300000; acc: 25.3; ppl: 468.3; xent: 6.1; lr: 0.00007; sents: 27760; bsz: 2181/2576/69; 41213/48678 tok/s; 84 sec;
[2023-05-17 22:29:32,063 INFO] Step 400/300000; acc: 27.4; ppl: 320.2; xent: 5.8; lr: 0.00010; sents: 24400; bsz: 2138/2526/61; 40842/48260 tok/s; 105 sec;
[2023-05-17 22:29:52,720 INFO] Step 500/300000; acc: 30.2; ppl: 241.3; xent: 5.5; lr: 0.00012; sents: 26688; bsz: 2148/2535/67; 41603/49095 tok/s; 126 sec;
Batch_size_multiple 8 // Seqlen multiple 8
FP16 (fusedadam)
[2023-05-17 22:32:08,412 INFO] Step 100/300000; acc: 16.0; ppl: 6256.0; xent: 8.7; lr: 0.00002; sents: 34120; bsz: 2337/2766/85; 22346/26446 tok/s; 42 sec;
[2023-05-17 22:32:29,029 INFO] Step 200/300000; acc: 20.9; ppl: 1047.4; xent: 7.0; lr: 0.00005; sents: 31128; bsz: 2349/2772/78; 45571/53777 tok/s; 62 sec;
[2023-05-17 22:32:49,643 INFO] Step 300/300000; acc: 24.6; ppl: 482.1; xent: 6.2; lr: 0.00007; sents: 26808; bsz: 2346/2776/67; 45523/53867 tok/s; 83 sec;
[2023-05-17 22:33:10,198 INFO] Step 400/300000; acc: 27.0; ppl: 326.7; xent: 5.8; lr: 0.00010; sents: 28448; bsz: 2341/2771/71; 45563/53917 tok/s; 104 sec;
[2023-05-17 22:33:30,629 INFO] Step 500/300000; acc: 30.0; ppl: 242.5; xent: 5.5; lr: 0.00012; sents: 27072; bsz: 2338/2764/68; 45773/54123 tok/s; 124 sec;
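To reduce each run to one number, the tok/s field can be averaged after the first report. Step 100 is much slower in every run and presumably includes start-up time. The minimal sketch below assumes the `Step N/M; ... SRC/TGT tok/s; T sec;` layout of the log lines above. `mean_src_tok_per_sec` is a hypothetical helper, not part of OpenNMT-py, and the embedded strings are trimmed excerpts of the FP8 (adam) and FP16 (adam) runs with multiples of 16.

```python
import re

# Hypothetical helper (not part of OpenNMT-py): parses progress lines of the
# form "Step N/M; ...; SRC/TGT tok/s; T sec;".
STEP_RE = re.compile(r"Step (\d+)/\d+;")
TOKS_RE = re.compile(r"(\d+)/(\d+) tok/s;")


def mean_src_tok_per_sec(log_text, warmup_steps=100):
    """Average source tokens/s over logged steps after `warmup_steps`."""
    values = []
    for line in log_text.splitlines():
        step = STEP_RE.search(line)
        toks = TOKS_RE.search(line)
        if step is None or toks is None:
            continue  # skip lines that are not training-progress lines
        if int(step.group(1)) <= warmup_steps:
            continue  # the first report presumably includes start-up time
        values.append(int(toks.group(1)))
    if not values:
        raise ValueError("no progress lines found after warm-up")
    return sum(values) / len(values)


# Trimmed excerpts of the logs above (middle fields elided as "...").
FP8_ADAM = """
Step 100/300000; ... 14043/16656 tok/s; 61 sec;
Step 200/300000; ... 23063/27297 tok/s; 99 sec;
Step 300/300000; ... 23082/27262 tok/s; 136 sec;
Step 400/300000; ... 22912/27074 tok/s; 174 sec;
Step 500/300000; ... 22880/27001 tok/s; 211 sec;
"""

FP16_ADAM = """
Step 100/300000; ... 18771/22265 tok/s; 46 sec;
Step 200/300000; ... 34504/40838 tok/s; 71 sec;
Step 300/300000; ... 34760/41057 tok/s; 96 sec;
Step 400/300000; ... 34199/40411 tok/s; 121 sec;
Step 500/300000; ... 35048/41359 tok/s; 145 sec;
"""

fp8 = mean_src_tok_per_sec(FP8_ADAM)
fp16 = mean_src_tok_per_sec(FP16_ADAM)
print(f"FP8 {fp8:.0f} tok/s, FP16 {fp16:.0f} tok/s, ratio {fp8 / fp16:.2f}")
# prints: FP8 22984 tok/s, FP16 34628 tok/s, ratio 0.66
```

On these excerpts, FP8 reaches about 66% of the FP16 source-token throughput with the same optimizer and the same multiples of 16.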