Compiler FE: Providing optimization group options #5784

seanshpark · 2021-01-22T01:05:42Z

As a suggestion, what if we provide -O1, -O2, and -O3 options, as C compilers do?

As you might expect, each one is a group of optimization options. Probably the most stable optimizations go into -O1, the slightly more challenging ones into -O2, and the risky ones, perhaps the current --all items, into -O3. Of course, we can change the contents of each group at any time at our discretion, so users can keep using the same option without confusion. The long form of this option would be something like --optimization-level={1,2,3}.
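A minimal sketch of how such an option could be parsed, written with Python's standard `argparse`. The program name `optimize-sketch` and the helper names `build_parser` and `parse_level` are hypothetical and only illustrate the proposal; they are not an existing ONE interface.

```python
import argparse


def build_parser():
    # Hypothetical parser: the option names follow the proposal in this issue,
    # not an existing command-line tool.
    parser = argparse.ArgumentParser(prog="optimize-sketch")
    parser.add_argument(
        "-O",
        "--optimization-level",
        dest="optimization_level",
        type=int,
        choices=[1, 2, 3],
        default=None,  # None means no optimization group was requested
        help="optimization group to enable (1 = most stable, 3 = riskiest)",
    )
    return parser


def parse_level(argv):
    """Return the requested optimization level, or None if none was given."""
    return build_parser().parse_args(argv).optimization_level
```

With this layout both the short form (`-O2`) and the long form (`--optimization-level=2`) resolve to the same integer, and values outside `{1, 2, 3}` are rejected by `argparse` itself.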

In addition, options such as -O4 or --experimental could be added.

mhs4670go · 2021-09-17T09:56:58Z

Options can be categorized as below.

from b346889

Based on luci::CircleOptimizer::Options::Algorithm

Fusing

FuseActivationFunction
FuseBatchNormWithConv
FuseAddWithTConv
FuseBatchNormWithDwConv
FuseBatchNormWithTConv
FuseBCQ
FuseInstanceNorm
FuseMeanWithMean
FusePreActivationBatchNorm
FuseTransposeWithMean

Replace, Substitute

ReplaceMulAddWithDepthwiseConv
ReplaceSubWithAdd
ResolveCustomOpAdd
ResolveCustomOpBatchMatMul
ResolveCustomOpMatMul
ResolveCustomOpMaxPoolWithArgmax
SubstitutePackToReshape
SubstitutePadV2ToPad
SubstituteSplitVToSplit
SubstituteSqueezeToReshape
SubstituteStridedSliceToReshape
SubstituteTransposeToReshape
TransformMinMaxToRelu6Pass
TransformMinReluToRelu6Pass

Remove

RemoveFakeQuant
RemoveQuantDequantSeq
RemoveRedundantReshape
RemoveRedundantTranspose
RemoveUnnecessaryReshape
RemoveUnnecessarySlice
RemoveUnnecessaryStridedSlice
RemoveUnnecessarySplit

Constant folding

FoldAddV2
FoldCast
FoldDequantize
FoldDepthwiseConv2D
FoldSparseToDense
ForwardReshapeToUnaryOp

Value modification

MakeBatchNormGammaPositive
ExpandBroadcastConst # not sure

Need user decision

ShuffleWeightTo16x1Float32
convert_nchw_to_nhwc
nchw_to_nhwc_input_shape
nchw_to_nhwc_output_shape

How about this?

O1

Fusing
Replace, Substitute
Remove
Constant folding

O2

O1 + Value modification

@seanshpark @llFreetimell @jinevening
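The O1/O2 grouping proposed above can be sketched as two lookup tables: one from category to passes, and one from level to categories. This is only an illustration. The pass lists are abbreviated to two entries per category, and `PASS_GROUPS`, `LEVEL_GROUPS`, and `passes_for_level` are hypothetical names, not part of luci.

```python
# Category -> passes, abbreviated from the list in this comment.
PASS_GROUPS = {
    "Fusing": ["FuseActivationFunction", "FuseBatchNormWithConv"],
    "Replace, Substitute": ["SubstitutePackToReshape", "SubstitutePadV2ToPad"],
    "Remove": ["RemoveRedundantReshape", "RemoveRedundantTranspose"],
    "Constant folding": ["FoldAddV2", "FoldCast"],
    "Value modification": ["MakeBatchNormGammaPositive", "ExpandBroadcastConst"],
}

# Level -> categories, following "O1" and "O2 = O1 + Value modification".
LEVEL_GROUPS = {
    1: ["Fusing", "Replace, Substitute", "Remove", "Constant folding"],
}
LEVEL_GROUPS[2] = LEVEL_GROUPS[1] + ["Value modification"]


def passes_for_level(level):
    """Return the pass names enabled at the given level, in category order."""
    if level not in LEVEL_GROUPS:
        raise ValueError(f"unsupported optimization level: {level}")
    return [name for group in LEVEL_GROUPS[level] for name in PASS_GROUPS[group]]
```

Keeping the categories and the levels in separate tables means a pass can be moved between categories, or a category between levels, without touching the other table.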

seanshpark · 2021-09-17T09:59:44Z

How about this?

Seems OK at first look :)

but didn't think much -_-;;;

jinevening · 2021-09-23T00:45:14Z

@mhs4670go In #5784 (comment), I think ForwardReshapeToUnaryOp can be enabled by default, but it is not constant folding (this helps RemoveRedundantReshape)

IMHO, the passes below should not be enabled by default, because they are not always beneficial.

FusePreActivationBatchNorm - Not used due to low quantization accuracy
MakeBatchNormGammaPositive - Not used due to low quantization accuracy

ReplaceMulAddWithDepthwiseConv - Backend specific (useful when we have an immature backend)
ReplaceSubWithAdd - Backend specific
ExpandBroadcastConst - Backend specific
RemoveQuantDequantSeq - Backend specific
RemoveFakeQuant - Backend specific
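One way to record these exclusions, as a rough sketch: keep the held-back passes in a table together with the reason given above, and filter any candidate list against it. The names `NOT_ENABLED_BY_DEFAULT` and `split_by_default` are hypothetical.

```python
# Passes that should not be on by default, with the reason stated in this comment.
NOT_ENABLED_BY_DEFAULT = {
    "FusePreActivationBatchNorm": "low quantization accuracy",
    "MakeBatchNormGammaPositive": "low quantization accuracy",
    "ReplaceMulAddWithDepthwiseConv": "backend specific",
    "ReplaceSubWithAdd": "backend specific",
    "ExpandBroadcastConst": "backend specific",
    "RemoveQuantDequantSeq": "backend specific",
    "RemoveFakeQuant": "backend specific",
}


def split_by_default(candidates):
    """Split candidate pass names into (enabled by default, held back with reason)."""
    enabled = [name for name in candidates if name not in NOT_ENABLED_BY_DEFAULT]
    held_back = {
        name: NOT_ENABLED_BY_DEFAULT[name]
        for name in candidates
        if name in NOT_ENABLED_BY_DEFAULT
    }
    return enabled, held_back
```

Storing the reason next to each pass keeps the rationale visible if the grouping is revisited later.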

mhs4670go · 2021-09-23T11:41:55Z

@jinevening Thank you for the comment.

I've applied your comment.

from b346889

Based on luci::CircleOptimizer::Options::Algorithm

Fusing

FuseActivationFunction
FuseBatchNormWithConv
FuseAddWithTConv
FuseBatchNormWithDwConv
FuseBatchNormWithTConv
FuseBCQ
FuseInstanceNorm
FuseMeanWithMean
FuseTransposeWithMean

Replace, Substitute

ReplaceMulAddWithDepthwiseConv
ReplaceSubWithAdd
ResolveCustomOpAdd
ResolveCustomOpBatchMatMul
ResolveCustomOpMatMul
ResolveCustomOpMaxPoolWithArgmax
SubstitutePackToReshape
SubstitutePadV2ToPad
SubstituteSplitVToSplit
SubstituteSqueezeToReshape
SubstituteStridedSliceToReshape
SubstituteTransposeToReshape
TransformMinMaxToRelu6Pass
TransformMinReluToRelu6Pass
ForwardReshapeToUnaryOp # moved from Constant folding

Remove

RemoveFakeQuant
RemoveQuantDequantSeq
RemoveRedundantReshape
RemoveRedundantTranspose
RemoveUnnecessaryReshape
RemoveUnnecessarySlice
RemoveUnnecessaryStridedSlice
RemoveUnnecessarySplit

Constant folding

FoldAddV2
FoldCast
FoldDequantize
FoldDepthwiseConv2D
FoldSparseToDense

Value modification

ExpandBroadcastConst

O1

Fusing
~~Replace, Substitute~~
Remove
Constant folding

O2

Need user decision

ShuffleWeightTo16x1Float32
convert_nchw_to_nhwc
nchw_to_nhwc_input_shape
nchw_to_nhwc_output_shape
FusePreActivationBatchNorm # Not used due to low quantization accuracy
MakeBatchNormGammaPositive # Not used due to low quantization accuracy
Replace, Substitute # backend specific
Value modification # backend specific

I'm gonna post a PR with these categories.

lemmaa · 2021-09-23T16:13:04Z

FYI, here is how gcc defines its levels, as a convention that is already widely used:

-O, -O1

With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.

-O2

-O2 Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.

-O0

-O0 Reduce compilation time and make debugging produce the expected results. This is the default. <-- At least we need to accept this option.

 -O
 -O1 Optimize.  Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

     With -O, the compiler tries to reduce code size and execution time, without performing any optimizations
     that take a great deal of compilation time.

     -O turns on the following optimization flags:

     -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce
     -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion
     -fif-conversion2 -finline-functions-called-once -fipa-profile -fipa-pure-const -fipa-reference
     -fipa-reference-addressable -fmerge-constants -fmove-loop-invariants -fomit-frame-pointer
     -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt
     -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce
     -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-pta -ftree-scev-cprop
     -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -funit-at-a-time

 -O2 Optimize even more.  GCC performs nearly all supported optimizations that do not involve a space-speed
     tradeoff.  As compared to -O, this option increases both compilation time and the performance of the
     generated code.

     -O2 turns on all optimization flags specified by -O.  It also turns on the following optimization flags:

     -falign-functions  -falign-jumps -falign-labels  -falign-loops -fcaller-saves -fcode-hoisting
     -fcrossjumping -fcse-follow-jumps  -fcse-skip-blocks -fdelete-null-pointer-checks -fdevirtualize
     -fdevirtualize-speculatively -fexpensive-optimizations -fgcse  -fgcse-lm -fhoist-adjacent-loads
     -finline-small-functions -findirect-inlining -fipa-bit-cp  -fipa-cp  -fipa-icf -fipa-ra  -fipa-sra
     -fipa-vrp -fisolate-erroneous-paths-dereference -flra-remat -foptimize-sibling-calls -foptimize-strlen
     -fpartial-inlining -fpeephole2 -freorder-blocks-algorithm=stc -freorder-blocks-and-partition
     -freorder-functions -frerun-cse-after-loop -fschedule-insns  -fschedule-insns2 -fsched-interblock
     -fsched-spec -fstore-merging -fstrict-aliasing -fthread-jumps -ftree-builtin-call-dce -ftree-pre
     -ftree-switch-conversion  -ftree-tail-merge -ftree-vrp

     Please note the warning under -fgcse about invoking -O2 on programs that use computed gotos.

     NOTE: In Ubuntu 8.10 and later versions, -D_FORTIFY_SOURCE=2 is set by default, and is activated when -O
     is set to 2 or higher.  This enables additional compile-time and run-time checks for several libc
     functions.  To disable, specify either -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.

 -O3 Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on the following
     optimization flags:

     -fgcse-after-reload -finline-functions -fipa-cp-clone -floop-interchange -floop-unroll-and-jam
     -fpeel-loops -fpredictive-commoning -fsplit-paths -ftree-loop-distribute-patterns
     -ftree-loop-distribution -ftree-loop-vectorize -ftree-partial-pre -ftree-slp-vectorize -funswitch-loops
     -fvect-cost-model -fversion-loops-for-strides

 -O0 Reduce compilation time and make debugging produce the expected results.  This is the default.
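The cumulative structure in the gcc excerpt (each level turns on everything from the level below plus its own additions, and -O0 turns on nothing) can be sketched as follows. Only a few flags from the excerpt are listed, and `ADDED_AT_LEVEL` and `flags_enabled` are hypothetical names used for illustration.

```python
# Flags newly turned on at each level; a small sample from the excerpt above.
ADDED_AT_LEVEL = {
    0: [],  # -O0 is accepted but enables nothing
    1: ["-fdce", "-fdse", "-ftree-ccp"],
    2: ["-fgcse", "-ftree-vrp"],
    3: ["-fpeel-loops", "-funswitch-loops"],
}


def flags_enabled(level):
    """Return all flags enabled at `level`, including those from lower levels."""
    if level not in ADDED_AT_LEVEL:
        raise ValueError(f"unsupported optimization level: {level}")
    enabled = []
    for lower in range(level + 1):
        enabled.extend(ADDED_AT_LEVEL[lower])
    return enabled
```

The same shape would carry over to pass groups: each level only declares what it adds, and the effective set is computed by accumulation.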

jinevening · 2021-09-24T01:16:48Z

Replace, Substitute # backend specific

IMHO, all Replace/Substitute passes except the ones below can be turned on in O1 (if compilation time is acceptable). Those passes are beneficial in most cases.

ReplaceMulAddWithDepthwiseConv - Backend specific (useful when we have an immature backend)
ReplaceSubWithAdd - Backend specific

mhs4670go · 2021-09-24T01:58:06Z

@jinevening Then, I'll make them O2. Actually, all replaced ops can be backend specific.

mhs4670go mentioned this issue Jan 28, 2021

Compiler FE: Remove optimization "--all" #5780

Closed

seanshpark mentioned this issue Feb 1, 2021

DRAFT: Compiler FE: Revise --all to --O1 #5885

Closed

mhs4670go mentioned this issue Aug 18, 2021

[one-cmds] Introduce config file for optimization options #7513

Closed

This was referenced Mar 30, 2022

ONE-vscode/compile 3/30 YongseopKim/ONE-vscode#15

Closed

[UX/Compile] Organize the compile steps and each step's options Samsung/ONE-vscode#303

Closed

YongseopKim mentioned this issue May 24, 2022

[UX/Compile] Make default config files with default compile options Samsung/ONE-vscode#691

Closed

2 tasks

YongseopKim mentioned this issue Jun 7, 2022

[one-cmds] define optimizations #9191

Open

hyunsik-yoon mentioned this issue Jun 30, 2022

Analyze onecc options #9369

Open

jinevening mentioned this issue Jan 27, 2023

[one-optimize] Make a basic option sets for safe optimization (O1) #10381

Closed
