
[TVM PyTorch Integration] optimized_torch & as_torch how-to guide #12318

Merged: 26 commits into apache:main on Sep 27, 2022

Conversation

juda (Contributor) commented on Aug 5, 2022:

This PR provides two how-to guides showing the usage of (a combined sketch follows below):

  1. optimize_torch: tuning a PyTorch model/function with MetaSchedule
  2. as_torch: wrapping TVMScript as a PyTorch model/function

@yelite @junrushao1994 @masahi
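
For orientation, a combined sketch of the two entry points; the toy model and shapes here are illustrative, not taken from the tutorials themselves:

# Illustrative sketch of the two entry points documented by this PR.
import torch
from tvm.contrib.torch import optimize_torch

# optimize_torch mirrors torch.jit.trace: pass the model plus an example
# input, and get back a module tuned by MetaSchedule.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
tuned_model = optimize_torch(model, torch.randn(2, 8))

# as_torch is a decorator stacked on top of @T.prim_func; it wraps the
# TVMScript function as a torch.nn.Module (full example later in the thread).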

# Write your own PyTorch operator by TVMscript
# -------------------------------
# PyTorch is a very popular machine learning framework in which
# it highly optimizes most commonly used operators.
Member:

PyTorch is a very popular machine learning framework which contains optimized implementations of most commonly used operators

# PyTorch is a very popular machine learning framework in which
# it highly optimizes most commonly used operators.
# Nevertheless, sometimes you might want to write your own operators
# in PyTorch, but the performance could be not satisfactory.
Member:

Nevertheless, sometimes you might want to write your own operators in PyTorch. In that case, the performance of such custom operators might not be satisfactory for your needs.

# For example, assume you are writing a variance of MobileNet,
# and you need to define a 1-d depthwise convolution operator.
# Assume the number of in_channel and out_channel are both 700,
# the width is 800 and the kernel size is 50,
Member:

the code uses kernel size 20


def torch_depthwise(inputs, filters):
global out_channel
global kernel_size
Member:

do you need global here?
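
For what it's worth, the globals can be avoided; a minimal sketch, assuming the quoted torch_depthwise implements a naive per-channel 1-d convolution (names taken from the quoted snippet):

import torch

def make_torch_depthwise(out_channel: int, kernel_size: int):
    # Close over the constants instead of reading module-level globals.
    def torch_depthwise(inputs, filters):
        out_width = inputs.shape[1] - kernel_size + 1
        outputs = torch.zeros(out_channel, out_width)
        for c in range(out_channel):
            for w in range(out_width):
                outputs[c, w] = torch.dot(inputs[c, w : w + kernel_size], filters[c])
        return outputs
    return torch_depthwise

torch_depthwise = make_torch_depthwise(out_channel=70, kernel_size=20)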

# Nevertheless, sometimes you might want to write your own operators
# in PyTorch, but the performance could be not satisfactory.
#
# For example, assume you are writing a variance of MobileNet,
Member:

It doesn't make a lot of sense to talk about a variant of Mobilenet (where only 2d convolution is used) but then suddenly bring up 1D convolution.

)

# We can tune the TVMscript code by providing a target device.
# The model will deploy on CPU, and the optimization (e.g. tiling) will conduct automatically.
Member:

There are basic grammar issues here.

Instead of "deploy", just use "run". But more importantly, we want to say that the model will be "tuned" for CPU.


print(tvm_depthwise.script())

# Hint: If user plan to deploy on GPU, the GPU target should be provided,
Member:

Again, there is no "target" in this tutorial.

print(tvm_depthwise.script())

# Hint: If user plan to deploy on GPU, the GPU target should be provided,
# and all the PyTorch tensors should convert into GPU version.
Member:

This sentence alone doesn't make sense.
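
For context, presumably the hint means something like the following; the tune() signature and the target string here are assumptions, not confirmed in this thread:

# Hypothetical sketch of the GPU path the hint describes: tune for a CUDA
# target, then move every PyTorch tensor onto the GPU before calling.
import torch
import tvm

tvm_depthwise.tune(target=tvm.target.Target("cuda"))  # assumed signature
A = torch.rand(70, 80).cuda()
B = torch.rand(70, 20).cuda()
C = torch.zeros(70, 61).cuda()
tvm_depthwise(A, B, C)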


# In the working machine, the average inference time of `tvm_depthwise` is 120.0 us (TVM version is 0.9.0),
# while the average inference time of `torch_depthwise` is 210.0 us (PyTorch version is 1.11.0),
# showing the performance arises by around 43%.
Member:

showing the speedup of around 43%

compare = benchmark.Compare(results)
compare.print()

# In the working machine, the average inference time of `tvm_depthwise` is 120.0 us (TVM version is 0.9.0),
Member:

In author's environment,

@as_torch
@T.prim_func
def tvm_depthwise(
A: T.Buffer((70, 80), "float32"),
Contributor:

Will it be useful to show a follow-up example of how to make the shape a variable by having a nested function?

Contributor Author:

If we pass shape variables, then we need match_buffer operators, which might confuse users.
Currently, I chose a minimal subset of the grammar.
@masahi what's your idea?

Member:

I think the Buffer syntax sugar can be extended for dynamic shapes. But currently we cannot tune over dynamic shapes, so the performance will probably be slower than PT.
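
For context, a sketch of the full operator under discussion, reconstructed from the quoted signature; the loop body is illustrative and may differ in detail from the merged tutorial:

import tvm
from tvm.script import tir as T
from tvm.contrib.torch import as_torch

@as_torch
@T.prim_func
def tvm_depthwise(
    A: T.Buffer((70, 80), "float32"),   # inputs: 70 channels, width 80
    B: T.Buffer((70, 20), "float32"),   # one length-20 filter per channel
    C: T.Buffer((70, 61), "float32"),   # output width = 80 - 20 + 1
) -> None:
    for j, i, k in T.grid(70, 61, 20):
        with T.block("C"):
            vj, vi, vk = T.axis.remap("SSR", [j, i, k])
            with T.init():
                C[vj, vi] = T.float32(0)
            C[vj, vi] = C[vj, vi] + A[vj, vi + vk] * B[vj, vk]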

# We can build the TVMscript code by calling the `tune` method.
# Without providing more information, the model will be tuned for CPU.

tvm_depthwise.tune()
Contributor:

Would it be better to explicitly write down the default TuneConfig and target here, so that the reader has a better idea of how to customize this?
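
Presumably something along these lines; the values shown and the tune() signature are assumptions based on the MetaSchedule API of that era, not the actual library defaults:

import tvm
from tvm import meta_schedule as ms

# Hypothetical explicit form of the default tuning call.
config = ms.TuneConfig(
    strategy="evolutionary",
    num_trials_per_iter=64,
    max_trials_per_task=2000,
    max_trials_global=2000,
)
tvm_depthwise.tune(config, target=tvm.target.Target("llvm --num-cores=16"))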

juda (Contributor Author) commented on Sep 1, 2022:

@masahi I have improved the text, could you please review it again?

# Nevertheless, sometimes you might want to write your own operators in PyTorch.
# In that case, the performance of such custom operators might not be satisfactory for your needs.
#
# One of the examples is to define a 1-d depthwise convolution operator.
Member:

"For example, suppose we want to define..."


# Then, we plan to optimize the `depthwise` function by leveraging the power of TVM.
# TVM community proposes an embedded Domain Specific Language on Python call TVMscript,
# which serves for an abstraction of program on various hardware backends.
Member:

I think calling TVMScript "an abstraction of program on various hardware backends" is a bit of a stretch. I think it is a much more high-level, concrete thing.

# The computations and machine learning compilation analysis will be defined around them.
# The last 3 lines are computation statements, including an initialization of `C[vj, vi]` and the summing up along the axis k.
# Finally, we place 2 decorators `T.prim_func` and `as_torch` above the definition of function,
# which converts the Python AST to TVMscript AST and then converts to PyTorch's `nn.Module`.
Member:

These sentences might be too detailed for a tutorial intended for PT users. I prefer a more succinct summary of what TVMScript is about, not necessarily explaining all the syntactic constructs used in the example.

======================
**Author**: `Yaoda Zhou <https://github.com/juda/>`_
This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
Member:

I think you copied "For us to follow this tutorial" from other tutorials, but this is not a good English phrase. We can just say "To follow this tutorial".

# Optimized SimpleModel by TVM MetaSchedule
# ------------------------------
# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
# The optimized function/model and example input are required to provide by users.
Member:

The PyTorch model to optimize, along with its example input, are provided by users.
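
A minimal sketch of that usage; the toy model below stands in for the tutorial's SimpleModel:

import torch
from tvm.contrib.torch import optimize_torch

class SimpleModel(torch.nn.Module):  # stand-in for the tutorial's model
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 16, 3)

    def forward(self, x):
        return torch.relu(self.conv(x))

example_input = torch.randn(1, 3, 64, 64)
# Like torch.jit.trace: the user supplies the model and an example input.
optimized_model = optimize_torch(SimpleModel(), example_input)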

# ------------------------------
# Besides, let us define a resnet18 model in a standard way.
# TorchScript also provides a built-in "optimize_for_inference" function to accelerate the inference,
# we will compare the performance of those two optimizers later.
Member:

Same comment as "Define the resnet18 optimized by MetaSchedule" above.
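
For reference, the TorchScript baseline being compared against is presumably built like this (a sketch; the exact options may differ in the tutorial):

import torch
from torchvision.models import resnet18

model = resnet18().eval()
# Script the eval-mode model, then apply the built-in inference optimizer.
jit_optimized = torch.jit.optimize_for_inference(torch.jit.script(model))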

# we will compare the performance of those two optimizers later.


class JitModule(torch.nn.Module):
Member:

Drop JitModule boilerplate.

Member:

Please address this comment. There is no need to have JitModule.

jit_module_resnet18 = JitModule()

######################################################################
# Compare the performance between two scheduling approaches.
Member:

What are "two scheduling approaches"? torch.jit.optimize_for_inference is not a scheduling approach.
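
The comparison itself uses PyTorch's benchmark toolkit; roughly the following, as a self-contained sketch with stand-in models in place of the tuned resnet18 and the TorchScript baseline:

import torch
import torch.utils.benchmark as benchmark

# Stand-ins for the MetaSchedule-tuned model and the TorchScript baseline.
model_a = torch.nn.Linear(64, 64).eval()
model_b = torch.jit.optimize_for_inference(
    torch.jit.script(torch.nn.Linear(64, 64).eval())
)
x = torch.randn(8, 64)

results = []
for sub_label, model in [("optimized_torch", model_a), ("jit_optimized", model_b)]:
    results.append(
        benchmark.Timer(
            stmt="model(x)",
            globals={"model": model, "x": x},
            label="inference",
            sub_label=sub_label,
        ).blocked_autorange()
    )

benchmark.Compare(results).print()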

).blocked_autorange()
)

# We can print the results on screen.
Member:

Drop this sentence


# In the working machine, the average inference time by `optimized_torch` is 860.5 us,
# while the average inference time of `jit_optimized` is 1156.3 us,
# showing the performance arises by around 1/4.
Member:

Apply my comment from as_torch tutorial here too. I won't repeat the same comment.

juda (Contributor Author) commented on Sep 8, 2022:

@tvm-bot rerun

juda (Contributor Author) commented on Sep 9, 2022:

@masahi I have finished another round of polishing. Could you please have a look?

======================
**Author**:
`Yaoda Zhou <https://github.com/juda>`_,
`Masahiro Masuda <https://github.com/masahi>`_
Member:

No need to add me as an author.

compare = benchmark.Compare(results)
compare.print()

# In author's environment, the average inference time of `tvm_depthwise` is 120.0 us (TVM version is 0.9.0),
Member:

0.9.0 is the released version; I don't think this is the one you are using for development. There is no need to mention the TVM version.



# Then, we plan to optimize the `depthwise` function by leveraging the power of TVM.
# TVM community proposes an embedded Domain Specific Language on Python called TVMscript,
Member:

in Python


# Then, we plan to optimize the `depthwise` function by leveraging the power of TVM.
# TVM community proposes an embedded Domain Specific Language on Python called TVMscript,
# serving for a high-level abstraction of TVM intermediate representative,
Member:

which serves as the high-level frontend for TVM's Tensor IR.

# Then, we plan to optimize the `depthwise` function by leveraging the power of TVM.
# TVM community proposes an embedded Domain Specific Language on Python called TVMscript,
# serving for a high-level abstraction of TVM intermediate representative,
# which is easy to impose transformations and optimizations and deploy on various hardware backends.
Member:

This sentence can be dropped

# In such a way, we obtain a new resnet18 model optimized by MetaSchedule.


class MyResNet18(torch.nn.Module):
Member:

Please address this comment. There is no need to have MyResNet18.

# we will compare the performance of those two optimizers later.


class JitModule(torch.nn.Module):
Member:

Please address this comment. There is no need to have JitModule.

######################################################################
# Compare the performance between two approaches.
# ------------------------------
# Using PyTorch's benchmark Compare class, we can have a direct comparison result between two inference models.
Member:

Drop this sentence.

compare = benchmark.Compare(results)
compare.print()

# In author's environment, the average inference time of `tvm_module_resnet18` is 620.0 us (TVM version is 0.9.0),
Member:

Drop the reference to TVM version (see the same comment to using_as_torch.py)

######################################################################
# Benchmark
# -------------------------------
# We will compare two operators by using PyTorch's benchmark toolkit.
Member:

Drop this sentence. It is not useful.

# specific language governing permissions and limitations
# under the License.
"""
Wrap Your TVMscript with PyTorch Module
Member:

Is "Wrap ... with " the right phrase? I think "Wrap ... as PyTorch Module" is more correct.

`Yaoda Zhou <https://github.com/juda>`_,
`Masahiro Masuda <https://github.com/masahi>`_

This article is an introductory tutorial on wrapping the TVMscript code with the PyTorch module.
Member:

This article is a tutorial on wrapping TVMScript code as a PyTorch module.

`Masahiro Masuda <https://github.com/masahi>`_

This article is an introductory tutorial on wrapping the TVMscript code with the PyTorch module.
By the decorator `as_torch`, users can wrap a TVMscript code into a PyTorch nn.Module naturally.
Member:

"Using the decorator..."

Drop "a" before TVMScript

`Yaoda Zhou <https://github.com/juda>`_,
`Masahiro Masuda <https://github.com/masahi>`_

This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
Member:

an introductory tutorial -> a tutorial

juda (Contributor Author) commented on Sep 22, 2022:

Hi @masahi, I polished the tutorial according to your feedback. Could you please read it one more time?

masahi (Member) left a review:

This looks very good now.

masahi merged commit b61f633 into apache:main on Sep 27, 2022
masahi (Member) commented on Sep 27, 2022:

@juda Thank you for your patience. I think the tutorial is now very clean and simple, without unnecessary things.

xinetzone pushed a commit to daobook/tvm that referenced this pull request on Nov 25, 2022:
[TVM PyTorch Integration] optimized_torch & as_torch how-to guide (apache#12318)

* how-to use optmized_torch

* as_torch

* format

* one more comment

* improve doc

* improve code

* fix text

* SSR

* CPU model

* whitespace

* improve document

* small edit

* retrigger ci

* using_as_torch polish

* using_optimized_torch

* fix errors

* one more author

* small edit

* polish as_torch

* save progress

* more edit

* small edit

Co-authored-by: juda <yzhou@octoml.ai>