[AUTOTVM] Core part of auto-tuning module #1312
Conversation
Force-pushed from 786008e to 8f0c806
docs/api/python/autotvm.rst (Outdated)

@@ -0,0 +1,83 @@
Auto-tuning API
just use tvm.autotvm
docs/api/python/autotvm.rst (Outdated)

~~~~~~~~~~~~~~~~~~~~
.. automodule:: tvm.autotvm.measure.measure

.. class:: tvm.autotvm.MeasureInput(target, task, config)
Let us keep the named tuple documentation in place in the source. Do:

class X(namedtuple("X", fields)):
    """docstring
    """
    __slots__ = ()
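The suggested pattern, as a runnable sketch (the field names come from the `MeasureInput(target, task, config)` signature quoted above; the example values are made up):

```python
from collections import namedtuple

class MeasureInput(namedtuple("MeasureInput", ["target", "task", "config"])):
    """Stores the inputs of a measurement.

    Keeping the docstring next to the definition lets Sphinx pick it up
    via automodule instead of duplicating it in the .rst file.
    """
    __slots__ = ()  # no per-instance __dict__, keeps instances lightweight

inp = MeasureInput("llvm", "matmul", {"tile": 8})
print(inp.target)  # llvm
```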
docs/api/python/autotvm.rst (Outdated)

   :members:

tvm.autotvm.task
~~~~~~~~~~~~~~~~~~~~~
rst requires the underline to have the same length as the title
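For reference, a well-formed section in the style this file already uses (the automodule target mirrors the `tvm.autotvm.task` section quoted above):

```rst
tvm.autotvm.task
~~~~~~~~~~~~~~~~

.. automodule:: tvm.autotvm.task
   :members:
```

If the underline is shorter than the title, Sphinx emits a "Title underline too short" warning.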
docs/conf.py (Outdated)

@@ -189,6 +189,7 @@ def run_doxygen(folder):
subsection_order = ExplicitOrder(
    ['../tutorials/language',
     '../tutorials/optimize',
     '../tutorials/autotuning',
autotuning->autotvm
python/tvm/autotvm/database.py (Outdated)

"""
partial_results = [None] * len(measure_inputs)
unsaved = list()
for i in range(len(measure_inputs)):
for i, inp in enumerate(measure_inputs)
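The suggested rewrite, sketched with placeholder data (in the PR, `measure_inputs` holds `MeasureInput` objects; strings are used here just to keep the example runnable):

```python
measure_inputs = ["a", "b", "c"]  # placeholder; real entries are MeasureInput objects
partial_results = [None] * len(measure_inputs)

# iterate over index and element together instead of range(len(...))
for i, inp in enumerate(measure_inputs):
    partial_results[i] = inp.upper()

print(partial_results)  # ['A', 'B', 'C']
```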
python/tvm/autotvm/task/task.py (Outdated)

    return _do_reg


def create(func_name, args):
create and create_task can be super confusing; I would prefer to rename create_task -> create and remove the old create.
Allow passing None as the target, and call init_space later, which returns a new task.
Its argument should be hashable values.
Its return value should be a Tuple(Schedule, Array of Tensor)

Returns
need an example block here.
simple_template deserves its own file.
Need to explain why it is not in template
python/tvm/autotvm/template/space.py (Outdated)

@@ -0,0 +1,830 @@
# pylint: disable=too-few-public-methods,invalid-name,unused-argument,arguments-differ
be careful about arguments-differ
python/tvm/autotvm/template/space.py (Outdated)

    pass

"""
We can regard our schedule code as a transformation graph of axes.
put this as docstring of Transform space
def has_next(self):
    return len(self.visited) < len(self.space)

def save_state(self, filename):
save_state/load_state are not pythonic.
Instead, directly make the objects picklable: https://docs.python.org/2/library/pickle.html#object.__getstate__
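A minimal sketch of the `__getstate__`/`__setstate__` approach, on a hypothetical tuner class (the attribute names are made up; the point is dropping unpicklable members like open file handles while keeping the tunable state):

```python
import pickle

class Tuner:
    def __init__(self, space):
        self.space = space
        self.visited = set()
        self._log_file = None  # e.g. an open file handle, which cannot be pickled

    def __getstate__(self):
        # copy the instance dict, but drop the unpicklable handle
        state = self.__dict__.copy()
        state["_log_file"] = None
        return state

    def __setstate__(self, state):
        # restore everything; the caller reopens the log file if needed
        self.__dict__.update(state)

t = Tuner(space=list(range(10)))
t.visited.update([1, 2, 3])
t2 = pickle.loads(pickle.dumps(t))  # round-trips without save_state/load_state
print(sorted(t2.visited))  # [1, 2, 3]
```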
Please always make the docker change first, as a separate PR; this is used to avoid the out-of-space error we might encounter.
# In autotvm, we can define a tunable parameter, or a "knob", for such kind of value.

# Matmul V1: List candidate values
@autotvm.simple_template  # 1. use a decorator
need a simple template testcase
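Independent of the autotvm API, a "knob" boils down to a named list of candidate values, and the search space is the cartesian product over all knobs. A standalone sketch (knob names and candidate values are made up for illustration):

```python
from itertools import product

# each knob maps a name to its candidate values (illustrative numbers)
knobs = {
    "tile_y": [1, 2, 4, 8],
    "tile_x": [1, 2, 4, 8],
}

# every point in the space is one concrete assignment of all knobs
names = list(knobs)
space = [dict(zip(names, vals)) for vals in product(*(knobs[n] for n in names))]
print(len(space))  # 16
```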
# ---------------------------------
# There are plenty of useful schedule primitives in tvm. You can also find
# some tutorials that describe them in more details, such as
# (1). `Optimizing Conv2d on CUDA GPU <https://docs.tvm.ai/tutorials/optimize/opt_conv_cuda.html#sphx-glr-tutorials-optimize-opt-conv-cuda-py>`_
do internal doc references using :ref: instead of hard-coded URLs
Force-pushed from e960356 to 260d67c
psutil is missing on one node: http://mode-gpu.cs.washington.edu:8080/blue/organizations/jenkins/dmlc%2Ftvm/detail/PR-1312/17/pipeline/139/
    return names


_get_buffer_curve_sample_flatten = get_global_func(
Avoid getting the global function eagerly at the root namespace; it won't work for the runtime-only env.
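One common way to defer the lookup, sketched with a stand-in resolver (`_lookup` and the registered name are made up here; in TVM the real resolver would be the global-function lookup, which can fail in a runtime-only build if called at import time):

```python
_cached_func = None  # module-level cache, resolved on first use

def _lookup(name):
    """Stand-in for the real packed-function lookup; returns a dummy
    callable so this sketch is self-contained."""
    return lambda: name

def get_curve_sample_flatten():
    """Resolve the function on first call rather than at module import,
    so importing this module never fails in a runtime-only env."""
    global _cached_func
    if _cached_func is None:
        _cached_func = _lookup("hypothetical.feature.func")  # name is illustrative
    return _cached_func

f = get_curve_sample_flatten()
print(f())  # hypothetical.feature.func
```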
keep += 1
ret[i] = keep + 1
return ret / len(trial_ranks)
two lines between global functions
@tqchen python package
@were please review
python/tvm/autotvm/feature.py (Outdated)

    i.e. Do not eliminate loop with extent of 1, do not vectorize, unroll or inject virtual threads
    """
    binds, _ = build_module.get_binds(args, binds)
    # cfg = current_build_config()
Why do you still keep these comments? Unused code should be eliminated; if you really want it, you can find it in version control.
# back to use low level API.

@autotvm.template
def matmul(N, L, M, dtype):
Every time you need to rewrite the whole algorithm part, which is redundant. Why don't you get it from topi.dense (or wrap it up in some function of your own in this .py file) and apply some default schedule first?
I think the intent of the simple template is to provide a standalone example e.g., to tune an operator that has not been upstreamed into topi yet. The example uses a common operator, but that is just for illustration purposes.
My point is that rewriting the op weakens the point you want to make. We can just take advantage of decoupling the algorithm description from the scheduling.
The core part of autotvm is independent of topi. We don't want to introduce topi for this basic tutorial.
new_scores = model.predict(new_points)

ac_prob = np.exp((new_scores - scores) / t)
I am a little bit curious how you normalize this score and temperature. I wrote an SA before: it seems that when the temperature is high, it just acts like a random search; the acceptance ratio is too high, nearly 100% every time.
I normalize the score to [0,1]
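With scores in [0, 1], the Metropolis rule `exp((new - old) / t)` from the diff is bounded: the worst regression is -1, so even at t = 1.0 a worst-case move is accepted with probability exp(-1) ≈ 0.37. A quick standalone check (the score deltas and temperatures below are illustrative, not from the PR):

```python
import math
import random

def accept(old_score, new_score, t):
    """Metropolis rule: always accept improvements, accept regressions
    with probability exp((new - old) / t)."""
    if new_score >= old_score:
        return True
    return random.random() < math.exp((new_score - old_score) / t)

# a regression of 0.2 at high vs. low temperature
p_hot = math.exp(-0.2 / 1.0)    # high t: regression usually accepted (random-walk-like)
p_cold = math.exp(-0.2 / 0.05)  # low t: regression almost never accepted (greedy)
print(round(p_hot, 2), round(p_cold, 2))  # 0.82 0.02
```

This is consistent with the observation above: at high temperature the acceptance probability approaches 1 for any move, so the search degenerates toward random search unless scores (or the cooling schedule) are scaled.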
def update(self, inputs, results):
    for inp, res in zip(inputs, results):
        if res.error_no == 0:
            y = inp.task.flop / np.mean(res.costs)
Finally, relying on the parameter of the hardware mode is resolved.
I cannot understand this comment, can you elaborate?
ready for merge
if there are no further comments, I am going to merge this in an hour
msg = msg[:msg.index("Stack trace returned")]
res_pack.append(MeasureResult((RuntimeError(msg),),
                              MeasureErrorNo.COMPILE_HOST,
                              tstamp - tic, tstamp))
When gpu_verify_pass() raises InstantiationError, we reach here, but it looks a bit confusing.
How about doing something like the following to create a more appropriate result?

try:
    # extract error information
    _e, _msg = exc.message.split('\n')[-2].split(': ', 1)
    if _e == "InstantiationError":
        res_pack.append(MeasureResult(_msg,
                                      MeasureErrorNo.INSTANTIATION_ERROR,
                                      tstamp - tic, tstamp))
        continue
except Error as _:
    pass
def verify_pass(stmt):
    valid = ir_pass.VerifyGPUCode(stmt, kwargs)
    if not valid:
        raise InstantiationError("invalid gpu kernel")
I think the message should be more informative: "Skip execution because of invalid gpu kernel config." or something like that?
OK, @merrymercy please act on @kazum's comments
**Author**: `Lianmin Zheng <https://https://github.com/merrymercy>`_

This is an advanced tutorial for writing high performance tunable template for
CUDA GPU. By runing auto-tuner on this template, we can outperform the
Replacing CUDA GPU with Nvidia GPUs is more accurate.
# the techniques used in these tutorials. Then we rely on the efficient auto-tuner
# to search through this space and pick some good configurations.
#
# If you are familiar with wring cude schedule, you can find the following
Spelling typos.
ready for merge
Looks good to me, thanks!
@merrymercy can you rebase against master to resolve the conflict?
This PR is step 1 of #1311
It includes
Some code is contributed by @eqy and @tqchen