
[MetaSchedule] Add Profiler Support For Tuning Efficiency Optimization #11486

Merged
merged 6 commits into apache:main on Jun 13, 2022

Conversation

@zxybazh (Member) commented May 27, 2022

Similar to our current performance statistics table, a tuning efficiency statistics table could be extremely helpful. The output looks like the following example:

Tuning Efficiency Statistics:
 ID |                   Module Name | Time Usage(s) | Time Ratio(%) 
--------------------------------------------------------------------
  0 |                    Total Time |        2.2824 |      100.0000 
  1 | TaskScheduler::InitializeTask |        2.2819 |       99.9786 
--------------------------------------------------------------------
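
For illustration, a minimal sketch of how a table in this format could be rendered from accumulated (module name, seconds) pairs. PrintTuningEfficiencyTable is a hypothetical helper, not part of this PR; the Time Ratio(%) column is simply each entry's time divided by the total time, which matches the numbers in the example above.

// Sketch only: render a "Tuning Efficiency Statistics" table from
// (module name, seconds) pairs. Time Ratio(%) = time / total time * 100.
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

void PrintTuningEfficiencyTable(const std::vector<std::pair<std::string, double>>& entries,
                                double total_seconds) {
  std::printf("Tuning Efficiency Statistics:\n");
  std::printf(" ID | %29s | Time Usage(s) | Time Ratio(%%)\n", "Module Name");
  std::printf("--------------------------------------------------------------------\n");
  std::printf("%3d | %29s | %13.4f | %13.4f\n", 0, "Total Time", total_seconds, 100.0);
  int id = 1;
  for (const auto& entry : entries) {
    std::printf("%3d | %29s | %13.4f | %13.4f\n", id++, entry.first.c_str(), entry.second,
                entry.second / total_seconds * 100.0);
  }
  std::printf("--------------------------------------------------------------------\n");
}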

@junrushao (Member) commented:

The annoying logging messages are removed by #11478, so don't worry about it.

@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch from 59decb7 to e9626bc on May 28, 2022 04:57
@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch 2 times, most recently from a9badbc to 3c925c8 on June 9, 2022 08:06
@zxybazh (Member Author) commented Jun 9, 2022

The profiler now supports being used as a context manager from the Python side.
Example:

"""Test sleep"""
import time
from tvm.meta_schedule import Profiler

with Profiler() as profiler:
    with profiler.timeit("Level0"):
        time.sleep(2)
        with profiler.timeit("Level1"):
            time.sleep(3)

for t, v in profiler.get().items():
    print(f"{t}: {v * 60} sec")  # Profiler.get() reports minutes; convert to seconds

Output:

Level1: 3.00305f64 sec
Level0: 5.00411f64 sec

Note that Level0 reports roughly 5 seconds: the scope-based timer includes the 3 seconds spent in the nested Level1 scope on top of Level0's own 2-second sleep.

@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch from 3c925c8 to 716e15d on June 10, 2022 05:35
@zxybazh (Member Author) commented Jun 10, 2022

Hi, the PR is ready for another round of review. I added a lot of scope timers; the result now looks like the following table:

Name TimeCost(s)
EvolutionarySearch::GenerateMeasureCandidates: 475.862
EvolutionarySearch::GenerateMeasureCandidates::EvolveWithCostModel: 445.189
TaskScheduler::Tune::_SendToRunner: 231.624
PyRunner::Run: 231.623
TaskScheduler::Tune::_SendToBuilder: 165.24
PyBuilder::Build: 165.238
EvolutionarySearch::GenerateMeasureCandidates::EvolveWithCostModel::PredictNormalizedScore: 82.9979
PyCostModel::Predict: 82.5159
PerStoreFeature::ExtractFrom: 73.8773
RewriteParallelVectorizeUnroll::Apply: 58.5822
MutateComputeLocation::Apply: 31.0864
RewriteReductionBlock::Apply: 28.7786
EvolutionarySearch::GenerateMeasureCandidates::SampleInitPopulation: 24.7748
UpdateCostModel::Apply: 18.361
MutateParallel::Apply: 10.0166
AddToDatabase::Apply: 8.6062
JSONDatabase::CommitTuningRecord: 8.59636
EvolutionarySearch::PreTuning: 2.16647
GradientBased::GradientCalculation: 1.4008
EchoStatistics::Apply: 0.475711
EvolutionarySearch::GenerateMeasureCandidates::PickBestFromDatabase: 0.279633
EvolutionarySearch::GenerateMeasureCandidates::PickWithEpsGreedy: 0.257396
MutateTileSize::Apply: 0.220804
PyCostModel::Save: 0.174295
DisallowDynamicLoop::Apply: 0.118625
EvolutionarySearch::PostTuning: 0.113904
PostOrderApply::GenerateDesignSpace: 0.110077
ApplyHistoryBest::Query: 0.0800386
RemoveBuildArtifact::Apply: 0.0746311
RandomComputeLocation::Apply: 0.063535
JSONDatabase::CommitWorkload: 0.0499996
PyRunnerFuture::Result: 0.0243004
MultiLevelTiling::Apply: 0.0148662
JSONDatabase::GetTopK: 0.0119345
MutateUnroll::Apply: 0.00650915
JSONDatabase::HasWorkload: 0.00433721
AutoInline::Apply: 0.00333696
ParallelizeVectorizeUnroll::Apply: 0.00166319
AddRFactor::Apply: 0.000863387
EvolutionarySearch::InitializeWithTuneContext: 1.99E-05
PyRunnerFuture::Done: 8.99E-06
PostOrderApply::InitializeWithTuneContext: 7.97E-06
EvolutionarySearch::NotifyRunnerResults: 7.28E-06

@zxybazh (Member Author) commented Jun 11, 2022

Introduced a new macro TVMTimeScope to make it easy to use ScopedTimer on the C++ side.
Usage:

TVMTimeScope(TuneContextNode::_SendToBuilder);
// same as
ScopedTimer timer = ProfilerNode::TimeIt("TuneContextNode::_SendToBuilder");
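
For context, here is a self-contained sketch of the RAII idea behind such a macro. This is an illustration under assumptions, not the actual implementation in this PR; only the names TVMTimeScope, ScopedTimer, and ProfilerNode::TimeIt are taken from the snippet above.

#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// RAII timer: measures from construction until the enclosing scope exits.
class ScopedTimer {
 public:
  explicit ScopedTimer(std::string name)
      : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    double sec = std::chrono::duration<double>(std::chrono::steady_clock::now() - start_).count();
    std::cout << name_ << ": " << sec << " s\n";  // a real profiler would accumulate this instead
  }

 private:
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

struct ProfilerNode {
  static ScopedTimer TimeIt(std::string name) { return ScopedTimer(std::move(name)); }
};

// Stringize the argument and bind the timer to a local variable so that its
// destructor fires (and records the elapsed time) at the end of the scope.
#define TVMTimeScope(Name) ScopedTimer _tvm_scope_timer = ProfilerNode::TimeIt(#Name)

Note that with a fixed local variable name, two TVMTimeScope invocations in the same scope would collide; the actual macro may generate a unique name instead.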

Updated the timers to make sure we cover all the parts under the task scheduler, without overlapping or missing parts (except minor ones like O(1)-level operations). The current table looks like the following (rendered with pandas DataFrame & tabulate):

+----+-------------------------------------------------------------------------------+-----------------+
|    | Name                                                                          |   TimeCost(Min) |
|----+-------------------------------------------------------------------------------+-----------------|
| 11 | EvolutionarySearchNode::NotifyRunnerResults                                   |     1.09333e-07 |
| 13 | GradientBased::GradientCalculation                                            |     1.68533e-07 |
| 27 | GradientBasedNode::JoinRunningTask                                            |     2.67667e-07 |
|  5 | EvolutionarySearchNode::State::PickBestFromDatabase                           |     2.78533e-07 |
| 15 | JSONDatabaseNode::HasWorkload                                                 |     8.60367e-07 |
| 20 | JSONDatabaseNode::GetTopK                                                     |     9.2985e-07  |
| 32 | TuneContextNode::_SendToRunner                                                |     1.01758e-06 |
| 19 | TuneContextNode::_SendToBuilder                                               |     1.98973e-06 |
| 18 | EvolutionarySearchNode::State::PickBestFromDatabase::ApplyPostProcs           |     1.27156e-05 |
|  7 | PyRunnerFutureNode::Result                                                    |     2.03548e-05 |
| 36 | EvolutionarySearchNode::State::GenerateMeasureCandidates                      |     2.37273e-05 |
|  8 | RemoveBuildArtifactNode::Apply                                                |     3.77088e-05 |
|  0 | ApplyHistoryBestNode::Query                                                   |     5.09462e-05 |
|  6 | TaskSchedulerNode::Tune                                                       |     5.84982e-05 |
| 24 | GradientBasedNode::TuningStatistics                                           |     9.66866e-05 |
| 30 | EchoStatisticsNode::Apply                                                     |     0.000178363 |
| 31 | EvolutionarySearchNode::PostTuning                                            |     0.000304334 |
| 23 | TuneContextNode::Initialize                                                   |     0.000716826 |
| 28 | JSONDatabaseNode::CommitWorkload                                              |     0.00073313  |
|  4 | PyCostModelNode::Save                                                         |     0.000864831 |
|  3 | PyCostModelNode::Predict                                                      |     0.000907736 |
|  9 | EvolutionarySearchNode::State::PickWithEpsGreedy                              |     0.000951621 |
| 14 | TuneContextNode::_ClearMeasureState                                           |     0.00118936  |
| 22 | TaskSchedulerNode::InitializeTask                                             |     0.00155487  |
| 35 | PostOrderApplyNode::GenerateDesignSpace                                       |     0.00172804  |
| 34 | PerStoreFeatureNode::ExtractFrom                                              |     0.00301362  |
|  1 | JSONDatabaseNode::CommitTuningRecord                                          |     0.00362623  |
| 26 | AddToDatabaseNode::Apply                                                      |     0.00371405  |
| 21 | EvolutionarySearchNode::AssembleCandidates                                    |     0.00616255  |
| 10 | EvolutionarySearchNode::PredictNormalizedScore                                |     0.00617349  |
| 33 | PyCostModelNode::Update                                                       |     0.0346064   |
| 25 | UpdateCostModelNode::Apply                                                    |     0.0346072   |
| 16 | EvolutionarySearchNode::PreTuning                                             |     0.0350402   |
| 29 | PyRunner::Run                                                                 |     0.0763377   |
|  2 | EvolutionarySearchNode::State::SampleInitPopulation                           |     0.358492    |
| 12 | PyBuilderNode::Build                                                          |     0.369279    |
| 17 | EvolutionarySearchNode::State::EvolveWithCostModel                            |     1.23361     |
| 37 | EvolutionarySearchNode::State::EvolveWithCostModel::FindCandidatesViaMutation |     3.19194     |
+----+-------------------------------------------------------------------------------+-----------------+

@zxybazh (Member Author) commented Jun 11, 2022

Ready for review.

@zxybazh (Member Author) commented Jun 11, 2022

An alternative would be a preemptive interface: instead of counting time based solely on scopes, each timer would only count the time elapsed in its own scope, excluding any nested timer created within that scope.

For example, if we want to count func A's and func B's time cost, and func A calls B inside its timed scope, then with the current scope-based design we would have to implement it as:

void FuncA() {
  {
    TVMTimeScope(XXX::A);
    // ... first part of A's own work ...
  }
  FuncB();
  {
    TVMTimeScope(XXX::A);
    // ... second part of A's own work ...
  }
}

void FuncB() {
  TVMTimeScope(XXX::B);
  // ...
}

With a preemptive timer we could simply implement it as:

void FuncA() {
  TVMTimeScope(XXX::A);
  // ... A's own work ...
  FuncB();
  // ... more of A's own work ...
}

void FuncB() {
  TVMTimeScope(XXX::B);
  // ...
}

The implementation wouldn't be hard: we only need to store the current timestamps in a separate stack (I think we cannot reuse the context-management timestamp stack, because this method would change the semantics of the profiler context).

The disadvantage is that this method introduces some ambiguity in how the profiler works and needs an extra stack.
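
To make the idea concrete, here is a minimal sketch of such a preemptive profiler. It is an illustration of the proposal above, not code from this PR, and all names below are hypothetical: entering a nested timer pauses the parent frame, and exiting resumes it, so each timer only accumulates the time during which it is the innermost active scope.

#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Frame {
  std::string name;
  Clock::time_point resumed_at;  // when this frame last became the innermost active scope
  double self_seconds = 0.0;     // time spent here, excluding nested timers
};

class PreemptiveProfiler {
 public:
  void Enter(const std::string& name) {
    Clock::time_point now = Clock::now();
    if (!stack_.empty()) {
      // Pause the parent: bank the time accumulated since it was last resumed.
      stack_.back().self_seconds +=
          std::chrono::duration<double>(now - stack_.back().resumed_at).count();
    }
    stack_.push_back(Frame{name, now, 0.0});
  }

  void Exit() {
    Clock::time_point now = Clock::now();
    Frame frame = stack_.back();
    stack_.pop_back();
    frame.self_seconds += std::chrono::duration<double>(now - frame.resumed_at).count();
    totals_[frame.name] += frame.self_seconds;
    if (!stack_.empty()) {
      stack_.back().resumed_at = now;  // resume the parent scope's clock
    }
  }

  const std::map<std::string, double>& totals() const { return totals_; }

 private:
  std::vector<Frame> stack_;             // the separate timestamp stack mentioned above
  std::map<std::string, double> totals_;
};

With this bookkeeping, FuncA's total no longer includes FuncB's time, at the cost of the semantic change mentioned above.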

@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch from f1ebebc to 427f62f on June 11, 2022 07:58
@zxybazh changed the title from "[MetaSchedule] Add Tuning Efficiency Statistcs" to "[MetaSchedule] Add Profiler Support For Tuning Efficiency Optimization" on Jun 12, 2022
@junrushao force-pushed the feature/2022-05-25/tuning-statistics branch 2 times, most recently from 5e33189 to b921a16 on June 13, 2022 00:27
@junrushao force-pushed the feature/2022-05-25/tuning-statistics branch from b921a16 to b8cfc67 on June 13, 2022 00:29
@junrushao merged commit e61ad7a into apache:main on Jun 13, 2022
juda pushed a commit to juda/tvm that referenced this pull request on Jun 21, 2022
# specific language governing permissions and limitations
# under the License.
"""A context manager that profiles tuning time cost for different parts."""
from __future__ import annotations
A Contributor commented on the "from __future__ import annotations" line above:

This is only available in Python 3.7+, and fails with 3.6.

This breaks our downstream builds. @zxybazh @junrushao1994

@zxybazh (Member Author) replied:

Thanks for reporting this issue! Will remove for compatibility.
