
[Codegen][LLVM] Add ability to turn on fast math flags #9223

Merged: 16 commits merged into apache:main on Oct 19, 2021

Conversation

@AndrewZhaoLuo (Contributor) commented Oct 7, 2021

Benchmarked on GCP n1-standard-4 (Broadwell) and n2-standard-4 (Cascade Lake) instances:

[image: benchmark results table]

Overall, Broadwell showed many improvements, while Cascade Lake had more regressions. Runtime is normalized by the mean of the baseline trial (without fast math flags). Each trial was repeated with n=5.

Test code:

See https://github.com/AndrewZhaoLuo/TVM-Sandbox/blob/fd08f88c12c9562a0e0f72dd7ff60f398452de35/codegen/test_export_to_ll.py#L8

"""Exporting to an LLVM assembly language file (.ll)"""

import os

import tvm
from tvm import te

if __name__ == "__main__":
    TARGET = "llvm -O=3 --fast-math"

    n = 128
    A = te.placeholder((n,), name="A", dtype="float32")
    B = te.placeholder((n,), name="B", dtype="float32")
    C = te.compute(A.shape, lambda *i: A(*i) + B(*i), name="C")
    s = tvm.te.create_schedule(C.op)
    m = tvm.lower(s, [A, B, C], name="test_add")
    rt_mod = tvm.build(m, target=TARGET)
    rt_mod.save("test.ll")

With --fast-math set in the target string, the generated LLVM IR is different:

Without fastmath:

; Function Attrs: nofree noinline norecurse nounwind
define internal fastcc void @test_add_compute_(i8* noalias nocapture align 128 %0, i8* noalias nocapture readonly align 128 %1, i8* noalias nocapture readonly align 128 %2) unnamed_addr #1 {
entry:
  %3 = bitcast i8* %1 to <2 x float>*
  %4 = load <2 x float>, <2 x float>* %3, align 128, !tbaa !114
  %5 = bitcast i8* %2 to <2 x float>*
  %6 = load <2 x float>, <2 x float>* %5, align 128, !tbaa !117
  %7 = fadd <2 x float> %4, %6
  %8 = bitcast i8* %0 to <2 x float>*
  store <2 x float> %7, <2 x float>* %8, align 128, !tbaa !120
  ret void
}

With fastmath:

; Function Attrs: nofree noinline norecurse nounwind
define internal fastcc void @test_add_compute_(i8* noalias nocapture align 128 %0, i8* noalias nocapture readonly align 128 %1, i8* noalias nocapture readonly align 128 %2) unnamed_addr #1 {
entry:
  %3 = bitcast i8* %1 to <2 x float>*
  %4 = load <2 x float>, <2 x float>* %3, align 128, !tbaa !114
  %5 = bitcast i8* %2 to <2 x float>*
  %6 = load <2 x float>, <2 x float>* %5, align 128, !tbaa !117
  %7 = fadd fast <2 x float> %6, %4
  %8 = bitcast i8* %0 to <2 x float>*
  store <2 x float> %7, <2 x float>* %8, align 128, !tbaa !120
  ret void
}

Note the fast flag on the fadd operation now.
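As an aside on why LLVM gates these reorderings behind the fast flag: IEEE-754 floating-point addition is not associative, so reassociating a sum can change its result. A minimal illustration (plain Python, not TVM code; the values are chosen only to force catastrophic cancellation):

```python
# IEEE-754 addition is not associative, which is why LLVM only
# reassociates floating-point sums when the `fast` flag is present.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # cancellation happens first, then + 1.0
right = a + (b + c)  # 1.0 is absorbed by -1e16 before cancellation

print(left, right)   # 1.0 0.0
```

With the fast flag, LLVM is free to pick either grouping (and either operand order, as seen in the swapped fadd operands above), trading strict IEEE semantics for speed.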

@masahi (Member) commented Oct 8, 2021

I think it is better to make use of a target attribute, e.g. target = "llvm -fastmath"

@AndrewZhaoLuo AndrewZhaoLuo changed the title [WIP][Codegen][LLVM] Add ability to turn on fast math flags [Codegen][LLVM] Add ability to turn on fast math flags Oct 12, 2021
@AndrewZhaoLuo AndrewZhaoLuo marked this pull request as ready for review October 12, 2021 22:36
@AndrewZhaoLuo:

This is ready for review now.

// semantics. These just enable these optimizations if the proper IR flags
// are set.
opt.UnsafeFPMath = true;
opt.NoInfsFPMath = true;
Member:

Better not to change the default values unless there is a good reason.

Contributor Author:

I need to look deeper at the LLVM code, but I think these optimizations respect the fast math flags. So if we turn these optimizations on and run on generated IR without fast math flags, the behavior should be the same as before.

But yes, let me double check.

Contributor Author:

Hmm, yes, these settings in Clang are passed in from LangOpts, which describes the dialect of C or C++ being accepted. I don't understand it fully and don't want to change the specification here, so I turned it off.

llvm::TargetMachine* tm =
llvm_target->createTargetMachine(target_triple, mcpu, mattr, opt, llvm::Reloc::PIC_);

Integer llvm_opt_level = target->GetAttr<Integer>("O").value_or(Integer(2));
@masahi (Member) commented Oct 14, 2021:

See:

builder.OptLevel = 3;

I think they refer to the same opt level.

I don't see why users would want to choose an opt level other than 3. However, internally we may want to prefer faster compile time for the constant folding use case (which currently compiles every subgraph with opt level = 3).

Contributor Author:

Hmm good catch. Need to see what the difference between the two optimization settings is.

As for the right opt level, I think 3 can lead to slowdowns in some situations (granted, this is about GCC, but the same idea applies):
https://wiki.gentoo.org/wiki/GCC_optimization#-O

-O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended. However, it also enables -ftree-vectorize so that loops in the code get vectorized and will use AVX YMM registers.

I did run a trial of fast math plus changing the TargetMachine opt level to O2, and some models were faster while some were slower.

So we should add the flag to make it easy to test.

Contributor Author:

Hmm, so the linked code is for the PassManager; a higher opt level means more passes, much like Relay's opt level. It appears to correspond to -O3 in Clang. However, CodeGenOpts appears to be a separate thing that is also set in Clang: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/BackendUtil.cpp#L935. It looks like this emits the assembly, and it is also associated with Clang's -O3.

So the flag should control everything.

Contributor Author:

I made it based on the added -O flag. The default is still 2, however (which I believe is Clang's default).
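The attr lookup quoted earlier, target->GetAttr<Integer>("O").value_or(Integer(2)), simply falls back to 2 when the target string does not set the flag. A hypothetical Python mirror of that defaulting logic (the dict shape and function name here are illustrative, not TVM API):

```python
# Hypothetical mirror of the C++ lookup
#   target->GetAttr<Integer>("O").value_or(Integer(2))
# from this PR: the opt level falls back to 2 (Clang's default)
# when the target string does not specify it.
def get_opt_level(target_attrs):
    return int(target_attrs.get("opt-level", 2))

print(get_opt_level({}))                # 2
print(get_opt_level({"opt-level": 3}))  # 3
```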

@AndrewZhaoLuo:

PTAL @masahi


default:
// CodeGenOpt::Level::Aggressive
builder.OptLevel = 3;
Member:

A related comment: this code path should be hit by default; otherwise this change would introduce a regression.

Contributor Author:

Hmm, I see, fair enough. I think OptLevel 3 should not be the default, but let me get some data to support this first. Changed to make OptLevel 3 the default.

Member:

I think we just inherited the default opt level of 3 from Halide: https://github.com/halide/Halide/blob/ed87acb466f13144be235a33a30f242e30a6a74f/src/CodeGen_LLVM.cpp#L1145

@masahi (Member) commented Oct 18, 2021

@AndrewZhaoLuo I think the -O=3 syntax is weird; how about -opt-level=3?

@AndrewZhaoLuo:

Done, renamed -O --> -opt-level

masahi merged commit f095595 into apache:main Oct 19, 2021
ccjoechou pushed a commit to ccjoechou/tvm that referenced this pull request Oct 19, 2021
* flags to turn off and on

* turn fast math on always

* llvm more opts

* move to default codegen opt

* TODO

* add fast math options to llvm target

* move to using new target attributes

* llvm fast math target opt code

* add -O flags

* fix todo lint

* support llvm 4.0, 5.0

* use same opt level as target machine

* revert TargetOptions

* fix thing

* prevent regression in llvm

* togglable opt-levels

Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022