
[AutoScheduler] Improve tuning with random cost model #6835

Merged
merged 7 commits into apache:main on Nov 11, 2020

Conversation

@comaniac (Contributor) commented Nov 3, 2020

When tuning a task whose operators are all injective on GPU, Ansor inlines all of them for better performance. However, this makes the expression complex and causes high overhead when lowering the TE schedule, which is required for cost model feature extraction. Since the tuning space of such tasks is relatively small, a random cost model is sufficient and lets us avoid the lowering overhead.
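
For context, here is a minimal sketch (not code from this PR) of how such a task can be tuned with the random cost model. It assumes the tvm.auto_scheduler Python API from around the time of this PR (names may differ in newer releases), and the `elemwise_add` workload below is purely hypothetical, used only for illustration.

```python
import tvm
from tvm import auto_scheduler, te


@auto_scheduler.register_workload
def elemwise_add(n):
    # A tiny all-injective workload; on GPU Ansor will inline such ops.
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute((n,), lambda i: A[i] + B[i], name="C")
    return [A, B, C]


task = auto_scheduler.create_task(elemwise_add, (1 << 20,), tvm.target.Target("cuda"))

# RandomModel skips cost-model feature extraction, so the expensive lowering of
# the heavily inlined TE schedule is avoided during the search.
policy = auto_scheduler.SketchPolicy(
    task, program_cost_model=auto_scheduler.RandomModel()
)

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile("elemwise_add.json")],
)
sch, args = auto_scheduler.auto_schedule(
    task, search_policy=policy, tuning_options=tune_option
)
```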

However, the flow of InitPopulation -> RandomStates only gave me one state. After digging into the details, I found that all states generated by the initial population are identical in terms of state.ToStr(). The reasons I can think of are that either 2,048 samples are insufficient to produce two different states, or ToStr() does not reflect the differences between states. Here is an example output of initial population sampling with de-duplication (a state is thrown away if it is already in out_states):

------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Generate Sketches               #s: 1
Sample Iter: 5  #Pop: 1 #Target: 50     fail_ct: 10239  Time elapsed: 46.81
#Target has been reduced to 25 due to too many failures or duplications
Sample Iter: 10 #Pop: 1 #Target: 25     fail_ct: 20479  Time elapsed: 92.52
#Target has been reduced to 12 due to too many failures or duplications
Sample Iter: 15 #Pop: 1 #Target: 12     fail_ct: 30719  Time elapsed: 138.50
#Target has been reduced to 6 due to too many failures or duplications
Sample Iter: 20 #Pop: 1 #Target: 6      fail_ct: 40959  Time elapsed: 184.33
#Target has been reduced to 3 due to too many failures or duplications
Sample Iter: 25 #Pop: 1 #Target: 3      fail_ct: 51199  Time elapsed: 229.81
#Target has been reduced to 1 due to too many failures or duplications
Sample Initial Population       #s: 1   fail_ct: 53247  Time elapsed: 239.00

As can be seen, even 50K samples cannot produce a second state. I added a set to check the number of unique states before and after infer bound, and here are the results:

------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Generate Sketches               #s: 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Sample Iter: 5  #Pop: 1 #Target: 25     fail_ct: 10239  Time elapsed: 95.63
Unique before InferBound 1
Unique after InferBound 1
#Target has been reduced to 12 due to too many failures or duplications
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Sample Iter: 10 #Pop: 1 #Target: 12     fail_ct: 20479  Time elapsed: 190.91
Unique before InferBound 1
Unique after InferBound 1
#Target has been reduced to 6 due to too many failures or duplications
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Sample Iter: 15 #Pop: 1 #Target: 6      fail_ct: 30719  Time elapsed: 285.54
Unique before InferBound 1
Unique after InferBound 1
#Target has been reduced to 3 due to too many failures or duplications
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Unique before InferBound 1
Unique after InferBound 1
Sample Iter: 20 #Pop: 1 #Target: 3      fail_ct: 40959  Time elapsed: 380.32
Unique before InferBound 1
Unique after InferBound 1
#Target has been reduced to 1 due to too many failures or duplications
Sample Initial Population       #s: 1   fail_ct: 43007  Time elapsed: 399.26
Unique before InferBound 1
Unique after InferBound 1
GA Iter: 0      Max score: 0.0121       Min score: 0.0121       #Pop: 1 #M+: 0  #M-: 0
Unique before InferBound 1
Unique after InferBound 7
GA Iter: 1      Max score: 0.0121       Min score: 0.0121       #Pop: 7 #M+: 0  #M-: 0
Unique before InferBound 7
Unique after InferBound 12
GA Iter: 2      Max score: 0.0121       Min score: 0.0121       #Pop: 12 #M+: 0  #M-: 0
Unique before InferBound 12
...

This log implies two points:

  1. We should call ToStr() after infer bound to make sure we can differentiate states. As the log shows, 2,048 candidates share the same ToStr() before infer bound, but yield 7 different ToStr() results after infer bound (a short sketch of this de-duplication idea follows this list).

  2. Systematically mutating tile sizes is far more efficient than random sampling. As the log shows, we get 7 different states from only 2,048 mutated states, but only 1 state from 43,007 random samples.
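
As a hedged illustration of point 1 (this is not the PR's actual C++ implementation in sketch_policy.cc), de-duplication keyed on a state's string form only works reliably after bound inference. The helper below assumes `task` is an auto_scheduler.SearchTask and `candidates` is a list of State objects obtained elsewhere.

```python
def dedup_states(task, candidates):
    """Keep one representative per distinct state, comparing after InferBound."""
    seen = set()
    unique_states = []
    for state in candidates:
        # InferBound fills in the concrete loop extents; states that only differ
        # in tile sizes may print identically before this step.
        bounded = task.compute_dag.infer_bound_from_state(state)
        key = str(bounded)  # Python counterpart of the C++ state.ToStr()
        if key not in seen:
            seen.add(key)
            unique_states.append(bounded)
    return unique_states
```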

Accordingly, this PR runs evolutionary search even when the random cost model is used. Here is the new log (initial population is set to 1 and retry is set to 2):

------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Generate Sketches               #s: 1
Sample Initial Population       #s: 1   fail_ct: 2047   Time elapsed: 19.30
GA iteration number has been adjusted to 3 due to random cost model
GA Iter: 0      Max score: 0.3863       Min score: 0.3863       #Pop: 1 #M+: 0  #M-: 0
GA Iter: 3      Max score: 0.9023       Min score: 0.0289       #Pop: 12        #M+: 1422       #M-: 161
EvolutionarySearch              #s: 12  Time elapsed: 31.20
------------------------------------------------------------
-------------------------  [ Measure ]
------------------------------------------------------------
Get 12 programs for measure. (This may take a while)
............************
// skip
------------------------------------------------------------
-------------------------  [ Train cost model ]
------------------------------------------------------------
------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Sample Initial Population       #s: 1   fail_ct: 2047   Time elapsed: 19.09
GA iteration number has been adjusted to 3 due to random cost model
GA Iter: 0      Max score: N/A  Min score: N/A  #Pop: 0 #M+: 0  #M-: 0
GA Iter: 3      Max score: N/A  Min score: N/A  #Pop: 0 #M+: 1425       #M-: 155
EvolutionarySearch              #s: 0   Time elapsed: 32.12
------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Sample Initial Population       #s: 1   fail_ct: 2047   Time elapsed: 19.14
GA iteration number has been adjusted to 3 due to random cost model
GA Iter: 0      Max score: N/A  Min score: N/A  #Pop: 0 #M+: 0  #M-: 0
GA Iter: 3      Max score: N/A  Min score: N/A  #Pop: 0 #M+: 1431       #M-: 166
EvolutionarySearch              #s: 0   Time elapsed: 30.53
------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Sample Initial Population       #s: 1   fail_ct: 2047   Time elapsed: 19.34
GA iteration number has been adjusted to 3 due to random cost model
GA Iter: 0      Max score: N/A  Min score: N/A  #Pop: 0 #M+: 0  #M-: 0
GA Iter: 3      Max score: N/A  Min score: N/A  #Pop: 0 #M+: 1427       #M-: 157
EvolutionarySearch              #s: 0   Time elapsed: 29.05
It seems all candidates in the search space have been measured.
------------------------------------------------------------
-------------------------  [ Done ]
------------------------------------------------------------

cc @merrymercy @jcf94

@comaniac (Contributor, Author) commented Nov 3, 2020

@jcf94 please take a look at the failed test. It fails because all measured schedules are invalid, and the root cause is that this PR reduces the GA iteration count to 3 when the random cost model is used, which makes this test potentially flaky.

@merrymercy self-assigned this on Nov 4, 2020
@jcf94 (Contributor) left a comment

Needs to re-trigger CI?

@comaniac (Contributor, Author) commented Nov 9, 2020

> Needs to re-trigger CI?

The problem is that after this PR, that unit test will only try very few candidates, and it will become flaky if all candidates are invalid (e.g., PTX error).

@comaniac (Contributor, Author) commented:
@merrymercy @jcf94 I increased the measure trials from 2 to 10 to reduce the likelihood of flakiness (please see the latest commit).
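
In terms of the test configuration, the change roughly amounts to the following (a hedged sketch; the actual test file, workload, and log-file name are not shown in this thread):

```python
from tvm import auto_scheduler

log_file = "test_search_policy.json"  # placeholder name, not the actual test's

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=10,  # raised from 2 so that a few invalid candidates
                            # (e.g. PTX errors) cannot exhaust the whole budget
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
```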

@merrymercy merrymercy merged commit d03c0c0 into apache:main Nov 11, 2020
@comaniac comaniac deleted the ansor_fix_simple branch November 11, 2020 02:33
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020

* fix

* more fix

* fix

* revert

* format

* Update sketch_policy.cc

* increase measure trial to avoid flaky

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020