[AUTOTVM] Core part of auto-tuning module #1312
Conversation
Force-pushed from 786008e to 8f0c806
docs/api/python/autotvm.rst (Outdated)

@@ -0,0 +1,83 @@
Auto-tuning API
just use tvm.autotvm
docs/api/python/autotvm.rst (Outdated)

~~~~~~~~~~~~~~~~~~~~
.. automodule:: tvm.autotvm.measure.measure

.. class:: tvm.autotvm.MeasureInput(target, task, config)
Let us keep the named tuple documentation in place in the source. Do:

class X(namedtuple("X", fields)):
    """docstring
    """
    __slots__ = ()
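The suggested pattern, as a runnable sketch (the field names come from the `MeasureInput(target, task, config)` signature quoted above; the example values are made up):

```python
from collections import namedtuple

class MeasureInput(namedtuple("MeasureInput", ["target", "task", "config"])):
    """Stores the inputs of a measurement.

    Keeping the docstring next to the definition lets Sphinx pick it up
    via automodule instead of duplicating it in the .rst file.
    """
    __slots__ = ()  # no per-instance __dict__, keeps instances lightweight

inp = MeasureInput("llvm", "matmul", {"tile": 8})
print(inp.target)  # llvm
```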
docs/api/python/autotvm.rst (Outdated)

   :members:

tvm.autotvm.task
~~~~~~~~~~~~~~~~~~~~~
rst requires the underline to have the same length as the title
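For reference, a well-formed section in the style this file already uses (the automodule target mirrors the `tvm.autotvm.task` section quoted above):

```rst
tvm.autotvm.task
~~~~~~~~~~~~~~~~

.. automodule:: tvm.autotvm.task
   :members:
```

If the underline is shorter than the title, Sphinx emits a "Title underline too short" warning.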
docs/conf.py (Outdated)

@@ -189,6 +189,7 @@ def run_doxygen(folder):
subsection_order = ExplicitOrder(
    ['../tutorials/language',
     '../tutorials/optimize',
     '../tutorials/autotuning',
autotuning->autotvm
python/tvm/autotvm/database.py (Outdated)

"""
partial_results = [None] * len(measure_inputs)
unsaved = list()
for i in range(len(measure_inputs)):
for i, inp in enumerate(measure_inputs)
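The suggested rewrite, sketched with placeholder data (in the PR, `measure_inputs` holds `MeasureInput` objects; strings are used here just to keep the example runnable):

```python
measure_inputs = ["a", "b", "c"]  # placeholder; real entries are MeasureInput objects
partial_results = [None] * len(measure_inputs)

# iterate over index and element together instead of range(len(...))
for i, inp in enumerate(measure_inputs):
    partial_results[i] = inp.upper()

print(partial_results)  # ['A', 'B', 'C']
```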
python/tvm/autotvm/task/task.py (Outdated)

    return _do_reg


def create(func_name, args):
create and create_task can be super confusing; I would prefer to rename create_task -> create and remove the old create.
Allow passing None as the target, and call init_space later, which returns a new task.
Its argument should be hashable values.
Its return value should be a Tuple(Schedule, Array of Tensor)

Returns
need an example block here.
simple_template deserves its own file.
Need to explain why it is not in template
python/tvm/autotvm/template/space.py (Outdated)

@@ -0,0 +1,830 @@
# pylint: disable=too-few-public-methods,invalid-name,unused-argument,arguments-differ
be careful about arguments-differ
python/tvm/autotvm/template/space.py (Outdated)

    pass

"""
We can regard our schedule code as a transformation graph of axes.
put this as docstring of Transform space
def has_next(self):
    return len(self.visited) < len(self.space)

def save_state(self, filename):
save_state/load_state are not pythonic.
Instead, directly make the objects picklable: https://docs.python.org/2/library/pickle.html#object.__getstate__
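A minimal sketch of the `__getstate__`/`__setstate__` approach, on a hypothetical tuner class (the attribute names are made up; the point is dropping unpicklable members like open file handles while keeping the tunable state):

```python
import pickle

class Tuner:
    def __init__(self, space):
        self.space = space
        self.visited = set()
        self._log_file = None  # e.g. an open file handle, which cannot be pickled

    def __getstate__(self):
        # copy the instance dict, but drop the unpicklable handle
        state = self.__dict__.copy()
        state["_log_file"] = None
        return state

    def __setstate__(self, state):
        # restore everything; the caller reopens the log file if needed
        self.__dict__.update(state)

t = Tuner(space=list(range(10)))
t.visited.update([1, 2, 3])
t2 = pickle.loads(pickle.dumps(t))  # round-trips without save_state/load_state
print(sorted(t2.visited))  # [1, 2, 3]
```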
Please always make the docker change first, as a separate PR; this is used to avoid the out-of-space error we might encounter.
# In autotvm, we can define a tunable parameter, or a "knob", for such kind of value.

# Matmul V1: List candidate values
@autotvm.simple_template  # 1. use a decorator
need a simple template testcase
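Independent of the autotvm API, a "knob" boils down to a named list of candidate values, and the search space is the cartesian product over all knobs. A standalone sketch (knob names and candidate values are made up for illustration):

```python
from itertools import product

# each knob maps a name to its candidate values (illustrative numbers)
knobs = {
    "tile_y": [1, 2, 4, 8],
    "tile_x": [1, 2, 4, 8],
}

# every point in the space is one concrete assignment of all knobs
names = list(knobs)
space = [dict(zip(names, vals)) for vals in product(*(knobs[n] for n in names))]
print(len(space))  # 16
```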
# ---------------------------------
# There are plenty of useful schedule primitives in tvm. You can also find
# some tutorials that describe them in more details, such as
# (1). `Optimizing Conv2d on CUDA GPU <https://docs.tvm.ai/tutorials/optimize/opt_conv_cuda.html#sphx-glr-tutorials-optimize-opt-conv-cuda-py>`_
do internal doc references using :ref: instead of hard-coded URLs
Force-pushed from e960356 to 260d67c
psutil is missing on one node: http://mode-gpu.cs.washington.edu:8080/blue/organizations/jenkins/dmlc%2Ftvm/detail/PR-1312/17/pipeline/139/
    return names


_get_buffer_curve_sample_flatten = get_global_func(
Avoid getting the global function eagerly at the root namespace; it won't work for the runtime-only env.
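One common way to defer the lookup, sketched with a stand-in resolver (`_lookup` and the registered name are made up here; in TVM the real resolver would be the global-function lookup, which can fail in a runtime-only build if called at import time):

```python
_cached_func = None  # module-level cache, resolved on first use

def _lookup(name):
    """Stand-in for the real packed-function lookup; returns a dummy
    callable so this sketch is self-contained."""
    return lambda: name

def get_curve_sample_flatten():
    """Resolve the function on first call rather than at module import,
    so importing this module never fails in a runtime-only env."""
    global _cached_func
    if _cached_func is None:
        _cached_func = _lookup("hypothetical.feature.func")  # name is illustrative
    return _cached_func

f = get_curve_sample_flatten()
print(f())  # hypothetical.feature.func
```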
keep += 1
ret[i] = keep + 1
return ret / len(trial_ranks)
two lines between global functions
@tqchen python package
@were please review
python/tvm/autotvm/feature.py (Outdated)

    i.e. Do not eliminate loop with extent of 1, do not vectorize, unroll or inject virtual threads
    """
    binds, _ = build_module.get_binds(args, binds)
    # cfg = current_build_config()
Why do you still keep these comments? Unused code should be eliminated; if you really want it, you can find it in version control.
# back to use low level API.

@autotvm.template
def matmul(N, L, M, dtype):
Every time you need to rewrite the whole algorithm part, which is redundant. Why don't you get it from topi.dense (or wrap it up in some function of your own in this .py file) and apply some default schedule first?
I think the intent of the simple template is to provide a standalone example e.g., to tune an operator that has not been upstreamed into topi yet. The example uses a common operator, but that is just for illustration purposes.
My point is that rewriting the op weakens the point you want to make. We can just take advantage of decoupling the algorithm description from the scheduling.
The core part of autotvm is independent of topi. We don't want to introduce topi for this basic tutorial.
new_scores = model.predict(new_points)

ac_prob = np.exp((new_scores - scores) / t)
I am a little bit curious how you normalize this score and temperature. I wrote an SA before: it seems that when the temperature is high, it just acts like a random search; the acceptance ratio is too high, nearly 100% every time.
I normalize the score to [0,1]
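With scores in [0, 1], the Metropolis rule `exp((new - old) / t)` from the diff is bounded: the worst regression is -1, so even at t = 1.0 a worst-case move is accepted with probability exp(-1) ≈ 0.37. A quick standalone check (the score deltas and temperatures below are illustrative, not from the PR):

```python
import math
import random

def accept(old_score, new_score, t):
    """Metropolis rule: always accept improvements, accept regressions
    with probability exp((new - old) / t)."""
    if new_score >= old_score:
        return True
    return random.random() < math.exp((new_score - old_score) / t)

# a regression of 0.2 at high vs. low temperature
p_hot = math.exp(-0.2 / 1.0)    # high t: regression usually accepted (random-walk-like)
p_cold = math.exp(-0.2 / 0.05)  # low t: regression almost never accepted (greedy)
print(round(p_hot, 2), round(p_cold, 2))  # 0.82 0.02
```

This is consistent with the observation above: at high temperature the acceptance probability approaches 1 for any move, so the search degenerates toward random search unless scores (or the cooling schedule) are scaled.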
def update(self, inputs, results):
    for inp, res in zip(inputs, results):
        if res.error_no == 0:
            y = inp.task.flop / np.mean(res.costs)
Finally, relying on the parameter of the hardware mode is resolved.
I cannot understand this comment, can you elaborate?
ready for merge
if there are no further comments, I am going to merge this in an hour
msg = msg[:msg.index("Stack trace returned")]
res_pack.append(MeasureResult((RuntimeError(msg),),
                              MeasureErrorNo.COMPILE_HOST,
                              tstamp - tic, tstamp))
When gpu_verify_pass() raises InstantiationError, we reach here, but it looks a bit confusing.
How about doing something like the following to create a more appropriate result?

try:
    # extract error information
    _e, _msg = exc.message.split('\n')[-2].split(': ', 1)
    if _e == "InstantiationError":
        res_pack.append(MeasureResult(_msg,
                                      MeasureErrorNo.INSTANTIATION_ERROR,
                                      tstamp - tic, tstamp))
        continue
except Error as _:
    pass
def verify_pass(stmt):
    valid = ir_pass.VerifyGPUCode(stmt, kwargs)
    if not valid:
        raise InstantiationError("invalid gpu kernel")
I think the message should be more informative: "Skip execution because of invalid gpu kernel config." or something like that?
OK, @merrymercy please act on @kazum's comments
**Author**: `Lianmin Zheng <https://https://github.com/merrymercy>`_

This is an advanced tutorial for writing high performance tunable template for
CUDA GPU. By runing auto-tuner on this template, we can outperform the
Replacing CUDA GPU with Nvidia GPUs is more accurate.
# the techniques used in these tutorials. Then we rely on the efficient auto-tuner
# to search through this space and pick some good configurations.
#
# If you are familiar with wring cude schedule, you can find the following
Spelling typos.
ready for merge
Looks good to me, thanks!
@merrymercy can you rebase against master to resolve the conflict?
This PR is step 1 of #1311
It includes
Some code is contributed by @eqy and @tqchen