
[MetaSchedule] Add Profiler Support For Tuning Efficiency Optimization #11486

Merged
merged 6 commits into apache:main on Jun 13, 2022

Conversation

@zxybazh (Member) commented May 27, 2022

Similar to our current performance statistics table, a tuning efficiency statistics table could be extremely helpful. The output looks like the following example:

Tuning Efficiency Statistics:
 ID |                   Module Name | Time Usage(s) | Time Ratio(%) 
--------------------------------------------------------------------
  0 |                    Total Time |        2.2824 |      100.0000 
  1 | TaskScheduler::InitializeTask |        2.2819 |       99.9786 
--------------------------------------------------------------------
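
For illustration, a minimal sketch of how a table in this format could be rendered from accumulated (module name, seconds) pairs. PrintTuningEfficiencyTable is a hypothetical helper, not part of this PR; the Time Ratio(%) column is simply each entry's time divided by the total time, which matches the numbers in the example above.

// Sketch only: render a "Tuning Efficiency Statistics" table from
// (module name, seconds) pairs. Time Ratio(%) = time / total time * 100.
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

void PrintTuningEfficiencyTable(const std::vector<std::pair<std::string, double>>& entries,
                                double total_seconds) {
  std::printf("Tuning Efficiency Statistics:\n");
  std::printf(" ID | %29s | Time Usage(s) | Time Ratio(%%)\n", "Module Name");
  std::printf("--------------------------------------------------------------------\n");
  std::printf("%3d | %29s | %13.4f | %13.4f\n", 0, "Total Time", total_seconds, 100.0);
  int id = 1;
  for (const auto& entry : entries) {
    std::printf("%3d | %29s | %13.4f | %13.4f\n", id++, entry.first.c_str(), entry.second,
                entry.second / total_seconds * 100.0);
  }
  std::printf("--------------------------------------------------------------------\n");
}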

@junrushao (Member) commented:

The annoying logging messages are removed by #11478, so don't worry about it.

@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch from 59decb7 to e9626bc on May 28, 2022 04:57
@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch 2 times, most recently from a9badbc to 3c925c8 on June 9, 2022 08:06
@zxybazh (Member Author) commented Jun 9, 2022

The profiler now supports being used as a context manager from the Python side.
Example:

"""Test sleep"""
import time
from tvm.meta_schedule import Profiler

with Profiler() as profiler:
    with profiler.timeit("Level0"):
        time.sleep(2)
        with profiler.timeit("Level1"):
            time.sleep(3)

for t, v in profiler.get().items():
    print(f"{t}: {v * 60} sec")  # Profiler.get() reports minutes; convert to seconds

Output:

Level1: 3.00305f64 sec
Level0: 5.00411f64 sec

Note that Level0 reports roughly 5 seconds: the scope-based timer includes the 3 seconds spent in the nested Level1 scope on top of Level0's own 2-second sleep.

@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch from 3c925c8 to 716e15d on June 10, 2022 05:35
@zxybazh (Member Author) commented Jun 10, 2022

Hi, the PR is ready for another round of review. I added a lot of scope timers; the result now looks like the following table:

Name TimeCost(s)
EvolutionarySearch::GenerateMeasureCandidates: 475.862
EvolutionarySearch::GenerateMeasureCandidates::EvolveWithCostModel: 445.189
TaskScheduler::Tune::_SendToRunner: 231.624
PyRunner::Run: 231.623
TaskScheduler::Tune::_SendToBuilder: 165.24
PyBuilder::Build: 165.238
EvolutionarySearch::GenerateMeasureCandidates::EvolveWithCostModel::PredictNormalizedScore: 82.9979
PyCostModel::Predict: 82.5159
PerStoreFeature::ExtractFrom: 73.8773
RewriteParallelVectorizeUnroll::Apply: 58.5822
MutateComputeLocation::Apply: 31.0864
RewriteReductionBlock::Apply: 28.7786
EvolutionarySearch::GenerateMeasureCandidates::SampleInitPopulation: 24.7748
UpdateCostModel::Apply: 18.361
MutateParallel::Apply: 10.0166
AddToDatabase::Apply: 8.6062
JSONDatabase::CommitTuningRecord: 8.59636
EvolutionarySearch::PreTuning: 2.16647
GradientBased::GradientCalculation: 1.4008
EchoStatistics::Apply: 0.475711
EvolutionarySearch::GenerateMeasureCandidates::PickBestFromDatabase: 0.279633
EvolutionarySearch::GenerateMeasureCandidates::PickWithEpsGreedy: 0.257396
MutateTileSize::Apply: 0.220804
PyCostModel::Save: 0.174295
DisallowDynamicLoop::Apply: 0.118625
EvolutionarySearch::PostTuning: 0.113904
PostOrderApply::GenerateDesignSpace: 0.110077
ApplyHistoryBest::Query: 0.0800386
RemoveBuildArtifact::Apply: 0.0746311
RandomComputeLocation::Apply: 0.063535
JSONDatabase::CommitWorkload: 0.0499996
PyRunnerFuture::Result: 0.0243004
MultiLevelTiling::Apply: 0.0148662
JSONDatabase::GetTopK: 0.0119345
MutateUnroll::Apply: 0.00650915
JSONDatabase::HasWorkload: 0.00433721
AutoInline::Apply: 0.00333696
ParallelizeVectorizeUnroll::Apply: 0.00166319
AddRFactor::Apply: 0.000863387
EvolutionarySearch::InitializeWithTuneContext: 1.99E-05
PyRunnerFuture::Done: 8.99E-06
PostOrderApply::InitializeWithTuneContext: 7.97E-06
EvolutionarySearch::NotifyRunnerResults: 7.28E-06

@zxybazh (Member Author) commented Jun 11, 2022

Introduced a new macro TVMTimeScope to make it easy to use ScopedTimer on the C++ side.
Usage:

TVMTimeScope(TuneContextNode::_SendToBuilder);
// same as
ScopedTimer timer = ProfilerNode::TimeIt("TuneContextNode::_SendToBuilder");
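
For context, here is a self-contained sketch of the RAII idea behind such a macro. This is an illustration under assumptions, not the actual implementation in this PR; only the names TVMTimeScope, ScopedTimer, and ProfilerNode::TimeIt are taken from the snippet above.

#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// RAII timer: measures from construction until the enclosing scope exits.
class ScopedTimer {
 public:
  explicit ScopedTimer(std::string name)
      : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    double sec = std::chrono::duration<double>(std::chrono::steady_clock::now() - start_).count();
    std::cout << name_ << ": " << sec << " s\n";  // a real profiler would accumulate this instead
  }

 private:
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

struct ProfilerNode {
  static ScopedTimer TimeIt(std::string name) { return ScopedTimer(std::move(name)); }
};

// Stringize the argument and bind the timer to a local variable so that its
// destructor fires (and records the elapsed time) at the end of the scope.
#define TVMTimeScope(Name) ScopedTimer _tvm_scope_timer = ProfilerNode::TimeIt(#Name)

Note that with a fixed local variable name, two TVMTimeScope invocations in the same scope would collide; the actual macro may generate a unique name instead.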

Updated the timers to make sure we cover all the parts under the task scheduler, without overlapping or missing parts (except minor ones like O(1)-level operations). The current table looks like the following (rendered with pandas DataFrame & tabulate):

+----+-------------------------------------------------------------------------------+-----------------+
|    | Name                                                                          |   TimeCost(Min) |
|----+-------------------------------------------------------------------------------+-----------------|
| 11 | EvolutionarySearchNode::NotifyRunnerResults                                   |     1.09333e-07 |
| 13 | GradientBased::GradientCalculation                                            |     1.68533e-07 |
| 27 | GradientBasedNode::JoinRunningTask                                            |     2.67667e-07 |
|  5 | EvolutionarySearchNode::State::PickBestFromDatabase                           |     2.78533e-07 |
| 15 | JSONDatabaseNode::HasWorkload                                                 |     8.60367e-07 |
| 20 | JSONDatabaseNode::GetTopK                                                     |     9.2985e-07  |
| 32 | TuneContextNode::_SendToRunner                                                |     1.01758e-06 |
| 19 | TuneContextNode::_SendToBuilder                                               |     1.98973e-06 |
| 18 | EvolutionarySearchNode::State::PickBestFromDatabase::ApplyPostProcs           |     1.27156e-05 |
|  7 | PyRunnerFutureNode::Result                                                    |     2.03548e-05 |
| 36 | EvolutionarySearchNode::State::GenerateMeasureCandidates                      |     2.37273e-05 |
|  8 | RemoveBuildArtifactNode::Apply                                                |     3.77088e-05 |
|  0 | ApplyHistoryBestNode::Query                                                   |     5.09462e-05 |
|  6 | TaskSchedulerNode::Tune                                                       |     5.84982e-05 |
| 24 | GradientBasedNode::TuningStatistics                                           |     9.66866e-05 |
| 30 | EchoStatisticsNode::Apply                                                     |     0.000178363 |
| 31 | EvolutionarySearchNode::PostTuning                                            |     0.000304334 |
| 23 | TuneContextNode::Initialize                                                   |     0.000716826 |
| 28 | JSONDatabaseNode::CommitWorkload                                              |     0.00073313  |
|  4 | PyCostModelNode::Save                                                         |     0.000864831 |
|  3 | PyCostModelNode::Predict                                                      |     0.000907736 |
|  9 | EvolutionarySearchNode::State::PickWithEpsGreedy                              |     0.000951621 |
| 14 | TuneContextNode::_ClearMeasureState                                           |     0.00118936  |
| 22 | TaskSchedulerNode::InitializeTask                                             |     0.00155487  |
| 35 | PostOrderApplyNode::GenerateDesignSpace                                       |     0.00172804  |
| 34 | PerStoreFeatureNode::ExtractFrom                                              |     0.00301362  |
|  1 | JSONDatabaseNode::CommitTuningRecord                                          |     0.00362623  |
| 26 | AddToDatabaseNode::Apply                                                      |     0.00371405  |
| 21 | EvolutionarySearchNode::AssembleCandidates                                    |     0.00616255  |
| 10 | EvolutionarySearchNode::PredictNormalizedScore                                |     0.00617349  |
| 33 | PyCostModelNode::Update                                                       |     0.0346064   |
| 25 | UpdateCostModelNode::Apply                                                    |     0.0346072   |
| 16 | EvolutionarySearchNode::PreTuning                                             |     0.0350402   |
| 29 | PyRunner::Run                                                                 |     0.0763377   |
|  2 | EvolutionarySearchNode::State::SampleInitPopulation                           |     0.358492    |
| 12 | PyBuilderNode::Build                                                          |     0.369279    |
| 17 | EvolutionarySearchNode::State::EvolveWithCostModel                            |     1.23361     |
| 37 | EvolutionarySearchNode::State::EvolveWithCostModel::FindCandidatesViaMutation |     3.19194     |
+----+-------------------------------------------------------------------------------+-----------------+

@zxybazh (Member Author) commented Jun 11, 2022

Ready for review.

@zxybazh (Member Author) commented Jun 11, 2022

An alternative would be a preemptive interface: instead of counting time based solely on scopes, each timer would only count the time elapsed in its own scope, excluding any nested timer created within that scope.

For example, if we want to count func A's and func B's time cost, and func A calls B inside its timed scope, then with the current scope-based design we would have to implement it as:

void FuncA() {
  {
    TVMTimeScope(XXX::A);
    // ... first part of A's own work ...
  }
  FuncB();
  {
    TVMTimeScope(XXX::A);
    // ... second part of A's own work ...
  }
}

void FuncB() {
  TVMTimeScope(XXX::B);
  // ...
}

With a preemptive timer we could simply implement it as:

void FuncA() {
  TVMTimeScope(XXX::A);
  // ... A's own work ...
  FuncB();
  // ... more of A's own work ...
}

void FuncB() {
  TVMTimeScope(XXX::B);
  // ...
}

The implementation wouldn't be hard: we only need to store the current timestamps in a separate stack (I think we cannot reuse the context-management timestamp stack, because this method would change the semantics of the profiler context).

The disadvantage is that this method introduces some ambiguity in how the profiler works and needs an extra stack.
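
To make the idea concrete, here is a minimal sketch of such a preemptive profiler. It is an illustration of the proposal above, not code from this PR, and all names below are hypothetical: entering a nested timer pauses the parent frame, and exiting resumes it, so each timer only accumulates the time during which it is the innermost active scope.

#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Frame {
  std::string name;
  Clock::time_point resumed_at;  // when this frame last became the innermost active scope
  double self_seconds = 0.0;     // time spent here, excluding nested timers
};

class PreemptiveProfiler {
 public:
  void Enter(const std::string& name) {
    Clock::time_point now = Clock::now();
    if (!stack_.empty()) {
      // Pause the parent: bank the time accumulated since it was last resumed.
      stack_.back().self_seconds +=
          std::chrono::duration<double>(now - stack_.back().resumed_at).count();
    }
    stack_.push_back(Frame{name, now, 0.0});
  }

  void Exit() {
    Clock::time_point now = Clock::now();
    Frame frame = stack_.back();
    stack_.pop_back();
    frame.self_seconds += std::chrono::duration<double>(now - frame.resumed_at).count();
    totals_[frame.name] += frame.self_seconds;
    if (!stack_.empty()) {
      stack_.back().resumed_at = now;  // resume the parent scope's clock
    }
  }

  const std::map<std::string, double>& totals() const { return totals_; }

 private:
  std::vector<Frame> stack_;             // the separate timestamp stack mentioned above
  std::map<std::string, double> totals_;
};

With this bookkeeping, FuncA's total no longer includes FuncB's time, at the cost of the semantic change mentioned above.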

@zxybazh force-pushed the feature/2022-05-25/tuning-statistics branch from f1ebebc to 427f62f on June 11, 2022 07:58
@zxybazh changed the title from "[MetaSchedule] Add Tuning Efficiency Statistcs" to "[MetaSchedule] Add Profiler Support For Tuning Efficiency Optimization" on Jun 12, 2022
@junrushao force-pushed the feature/2022-05-25/tuning-statistics branch 2 times, most recently from 5e33189 to b921a16 on June 13, 2022 00:27
@junrushao force-pushed the feature/2022-05-25/tuning-statistics branch from b921a16 to b8cfc67 on June 13, 2022 00:29
@junrushao merged commit e61ad7a into apache:main on Jun 13, 2022
juda pushed a commit to juda/tvm that referenced this pull request on Jun 21, 2022
# specific language governing permissions and limitations
# under the License.
"""A context manager that profiles tuning time cost for different parts."""
from __future__ import annotations
A Contributor commented on the "from __future__ import annotations" line above:

This is only available in Python 3.7+, and fails with 3.6.

This breaks our downstream builds. @zxybazh @junrushao1994

@zxybazh (Member Author) replied:

Thanks for reporting this issue! Will remove for compatibility.
