Compiler FE: Providing optimization group options #5784

Open
seanshpark opened this issue Jan 22, 2021 · 12 comments

Comments

@seanshpark
Contributor

Original from #5780 (comment)


As a suggestion, what if we provide -O1, -O2, and -O3 options like a C compiler?

As you might expect, each one is a group of optimization options. The most stable optimizations would probably go in -O1, the slightly challenging ones in -O2, and the risky ones, perhaps the current --all items, in -O3. Of course, we can change what each group contains at any time at our discretion, so users can use them without confusion. The long form of this option could be something like --optimization-level={1,2,3}.

In addition, options such as -O4 or --experimental could be added.
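A minimal sketch of how such an option could be parsed, assuming a Python front end built on `argparse`. Only the spellings `-O1`/`-O2`/`-O3` and `--optimization-level` come from the proposal above; the program name, `build_parser`, and `parse_level` are hypothetical names used for illustration.

```python
import argparse


def build_parser():
    # Hypothetical front-end parser; only the option spelling follows the proposal.
    parser = argparse.ArgumentParser(prog="optimize-sketch")
    parser.add_argument(
        "-O",
        "--optimization-level",
        dest="optimization_level",
        type=int,
        choices=[1, 2, 3],
        default=None,
        help="optimization group to enable (1 = most stable, 3 = riskiest)",
    )
    return parser


def parse_level(argv):
    """Return the requested level, or None when no level was given."""
    return build_parser().parse_args(argv).optimization_level
```

With this layout, `-O2` and `--optimization-level=2` are two spellings of the same option, and any value outside 1 to 3 is rejected by `argparse` itself.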

@mhs4670go
Contributor

mhs4670go commented Sep 17, 2021

The options can be categorized as below.

from b346889

Based on luci::CircleOptimizer::Options::Algorithm

Fusing

FuseActivationFunction
FuseBatchNormWithConv
FuseAddWithTConv
FuseBatchNormWithDwConv
FuseBatchNormWithTConv
FuseBCQ
FuseInstanceNorm
FuseMeanWithMean
FusePreActivationBatchNorm
FuseTransposeWithMean

Replace, Substitute

ReplaceMulAddWithDepthwiseConv
ReplaceSubWithAdd
ResolveCustomOpAdd
ResolveCustomOpBatchMatMul
ResolveCustomOpMatMul
ResolveCustomOpMaxPoolWithArgmax
SubstitutePackToReshape
SubstitutePadV2ToPad
SubstituteSplitVToSplit
SubstituteSqueezeToReshape
SubstituteStridedSliceToReshape
SubstituteTransposeToReshape
TransformMinMaxToRelu6Pass
TransformMinReluToRelu6Pass

Remove

RemoveFakeQuant
RemoveQuantDequantSeq
RemoveRedundantReshape
RemoveRedundantTranspose
RemoveUnnecessaryReshape
RemoveUnnecessarySlice
RemoveUnnecessaryStridedSlice
RemoveUnnecessarySplit

Constant folding

FoldAddV2
FoldCast
FoldDequantize
FoldDepthwiseConv2D
FoldSparseToDense
ForwardReshapeToUnaryOp

Value modification

MakeBatchNormGammaPositive
ExpandBroadcastConst # not sure

Need user decision

ShuffleWeightTo16x1Float32
convert_nchw_to_nhwc
nchw_to_nhwc_input_shape
nchw_to_nhwc_output_shape


How about this?

O1

Fusing
Replace, Substitute
Remove
Constant folding

O2

O1 + Value modification
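One way to read the proposal above is that levels are cumulative: O2 enables everything in O1 plus its own categories. The sketch below illustrates that reading. The pass names are taken from the lists above but abbreviated to two per category, and `CATEGORIES`, `LEVEL_ADDS`, and `passes_for_level` are hypothetical names, not part of luci.

```python
# Abbreviated category table; pass names come from the lists in this comment.
CATEGORIES = {
    "Fusing": ["FuseActivationFunction", "FuseBatchNormWithConv"],
    "Replace, Substitute": ["SubstitutePackToReshape", "SubstitutePadV2ToPad"],
    "Remove": ["RemoveRedundantReshape", "RemoveRedundantTranspose"],
    "Constant folding": ["FoldCast", "FoldDequantize"],
    "Value modification": ["MakeBatchNormGammaPositive", "ExpandBroadcastConst"],
}

# Each level lists only the categories it adds on top of the previous level.
LEVEL_ADDS = {
    1: ["Fusing", "Replace, Substitute", "Remove", "Constant folding"],
    2: ["Value modification"],
}


def passes_for_level(level):
    """Collect the passes of every level up to and including `level`."""
    passes = []
    for lvl in sorted(LEVEL_ADDS):
        if lvl > level:
            break
        for category in LEVEL_ADDS[lvl]:
            passes.extend(CATEGORIES[category])
    return passes
```

Keeping the per-level additions separate from the category table means a category can be moved between levels by editing a single entry.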

@seanshpark @llFreetimell @jinevening

@seanshpark
Contributor Author

How about this?

Seems OK at first look :)

  • but didn't think much -_-;;;

@jinevening
Contributor

@mhs4670go In #5784 (comment), I think ForwardReshapeToUnaryOp can be enabled by default, but it is not constant folding (this helps RemoveRedundantReshape)

IMHO, the passes below should not be enabled by default, because they are not always beneficial.

FusePreActivationBatchNorm - Not used due to low quantization accuracy
MakeBatchNormGammaPositive - Not used due to low quantization accuracy

ReplaceMulAddWithDepthwiseConv - Backend specific (useful when we have an immature backend)
ReplaceSubWithAdd - Backend specific
ExpandBroadcastConst - Backend specific
RemoveQuantDequantSeq - Backend specific
RemoveFakeQuant - Backend specific

@mhs4670go
Contributor

@jinevening Thank you for the comment.

I've applied your comment.

from b346889

Based on luci::CircleOptimizer::Options::Algorithm

Fusing

FuseActivationFunction
FuseBatchNormWithConv
FuseAddWithTConv
FuseBatchNormWithDwConv
FuseBatchNormWithTConv
FuseBCQ
FuseInstanceNorm
FuseMeanWithMean
FuseTransposeWithMean

Replace, Substitute

ReplaceMulAddWithDepthwiseConv
ReplaceSubWithAdd
ResolveCustomOpAdd
ResolveCustomOpBatchMatMul
ResolveCustomOpMatMul
ResolveCustomOpMaxPoolWithArgmax
SubstitutePackToReshape
SubstitutePadV2ToPad
SubstituteSplitVToSplit
SubstituteSqueezeToReshape
SubstituteStridedSliceToReshape
SubstituteTransposeToReshape
TransformMinMaxToRelu6Pass
TransformMinReluToRelu6Pass
ForwardReshapeToUnaryOp # moved from Constant folding

Remove

RemoveFakeQuant
RemoveQuantDequantSeq
RemoveRedundantReshape
RemoveRedundantTranspose
RemoveUnnecessaryReshape
RemoveUnnecessarySlice
RemoveUnnecessaryStridedSlice
RemoveUnnecessarySplit

Constant folding

FoldAddV2
FoldCast
FoldDequantize
FoldDepthwiseConv2D
FoldSparseToDense

Value modification

ExpandBroadcastConst


O1

Fusing
Replace, Substitute
Remove
Constant folding

O2

Need user decision

ShuffleWeightTo16x1Float32
convert_nchw_to_nhwc
nchw_to_nhwc_input_shape
nchw_to_nhwc_output_shape
FusePreActivationBatchNorm # Not used due to low quantization accuracy
MakeBatchNormGammaPositive # Not used due to low quantization accuracy
Replace, Substitute # backend specific
Value modification # backend specific


I'm gonna post a PR with these categories.

@lemmaa
Member

lemmaa commented Sep 23, 2021

FYI, gcc already has a widely used convention for this:

-O, -O1

  • With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.

-O2

  • -O2 Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.

-O0

  • -O0 Reduce compilation time and make debugging produce the expected results. This is the default. <-- At least we need to accept this option.

 -O
 -O1 Optimize.  Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

     With -O, the compiler tries to reduce code size and execution time, without performing any optimizations
     that take a great deal of compilation time.

     -O turns on the following optimization flags:

     -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce
     -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion
     -fif-conversion2 -finline-functions-called-once -fipa-profile -fipa-pure-const -fipa-reference
     -fipa-reference-addressable -fmerge-constants -fmove-loop-invariants -fomit-frame-pointer
     -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt
     -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce
     -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-pta -ftree-scev-cprop
     -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -funit-at-a-time

 -O2 Optimize even more.  GCC performs nearly all supported optimizations that do not involve a space-speed
     tradeoff.  As compared to -O, this option increases both compilation time and the performance of the
     generated code.

     -O2 turns on all optimization flags specified by -O.  It also turns on the following optimization flags:

     -falign-functions  -falign-jumps -falign-labels  -falign-loops -fcaller-saves -fcode-hoisting
     -fcrossjumping -fcse-follow-jumps  -fcse-skip-blocks -fdelete-null-pointer-checks -fdevirtualize
     -fdevirtualize-speculatively -fexpensive-optimizations -fgcse  -fgcse-lm -fhoist-adjacent-loads
     -finline-small-functions -findirect-inlining -fipa-bit-cp  -fipa-cp  -fipa-icf -fipa-ra  -fipa-sra
     -fipa-vrp -fisolate-erroneous-paths-dereference -flra-remat -foptimize-sibling-calls -foptimize-strlen
     -fpartial-inlining -fpeephole2 -freorder-blocks-algorithm=stc -freorder-blocks-and-partition
     -freorder-functions -frerun-cse-after-loop -fschedule-insns  -fschedule-insns2 -fsched-interblock
     -fsched-spec -fstore-merging -fstrict-aliasing -fthread-jumps -ftree-builtin-call-dce -ftree-pre
     -ftree-switch-conversion  -ftree-tail-merge -ftree-vrp

     Please note the warning under -fgcse about invoking -O2 on programs that use computed gotos.

     NOTE: In Ubuntu 8.10 and later versions, -D_FORTIFY_SOURCE=2 is set by default, and is activated when -O
     is set to 2 or higher.  This enables additional compile-time and run-time checks for several libc
     functions.  To disable, specify either -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.

 -O3 Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on the following
     optimization flags:

     -fgcse-after-reload -finline-functions -fipa-cp-clone -floop-interchange -floop-unroll-and-jam
     -fpeel-loops -fpredictive-commoning -fsplit-paths -ftree-loop-distribute-patterns
     -ftree-loop-distribution -ftree-loop-vectorize -ftree-partial-pre -ftree-slp-vectorize -funswitch-loops
     -fvect-cost-model -fversion-loops-for-strides

 -O0 Reduce compilation time and make debugging produce the expected results.  This is the default.

@jinevening
Contributor

Replace, Substitute # backend specific

IMHO, all Replace/Substitute passes except the two below can be turned on in O1 (if compilation time is acceptable), since they are beneficial in most cases.

ReplaceMulAddWithDepthwiseConv - Backend specific (useful when we have an immature backend)
ReplaceSubWithAdd - Backend specific

@mhs4670go
Contributor

@jinevening Then, I'll make them O2. Actually, all replaced ops can be backend specific.
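The split discussed here can be sketched as a simple partition. This assumes "them" refers to the two backend-specific passes named by @jinevening, which is an interpretation of the reply rather than something the thread states. `REPLACE_SUBSTITUTE` copies the Replace/Substitute list from the earlier comment; `BACKEND_SPECIFIC` and `split_by_level` are hypothetical names.

```python
# Replace/Substitute passes as listed earlier in this thread.
REPLACE_SUBSTITUTE = [
    "ReplaceMulAddWithDepthwiseConv",
    "ReplaceSubWithAdd",
    "ResolveCustomOpAdd",
    "ResolveCustomOpBatchMatMul",
    "ResolveCustomOpMatMul",
    "ResolveCustomOpMaxPoolWithArgmax",
    "SubstitutePackToReshape",
    "SubstitutePadV2ToPad",
    "SubstituteSplitVToSplit",
    "SubstituteSqueezeToReshape",
    "SubstituteStridedSliceToReshape",
    "SubstituteTransposeToReshape",
    "TransformMinMaxToRelu6Pass",
    "TransformMinReluToRelu6Pass",
    "ForwardReshapeToUnaryOp",
]

# Assumption: only these two passes are deferred to O2.
BACKEND_SPECIFIC = {"ReplaceMulAddWithDepthwiseConv", "ReplaceSubWithAdd"}


def split_by_level(passes, backend_specific):
    """Return (o1, o2): generally useful passes first, backend-specific ones second."""
    o1 = [p for p in passes if p not in backend_specific]
    o2 = [p for p in passes if p in backend_specific]
    return o1, o2
```

Keeping the backend-specific names in one set makes it easy to widen that set later if, as noted above, more replaced ops turn out to be backend specific.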
