
Improve separation of PyTorch dtypes and TVM dtypes in relay PyTorch frontend #5779

Closed
t-vi wants to merge 1 commit from the pytorch_frontend_type_fix branch

Conversation

t-vi (Contributor) commented Jun 11, 2020

Previously, we sometimes used type strings taken from PyTorch (e.g. float) in TVM. Now we aim to convert PyTorch dtypes early and only work with TVM dtypes afterwards.

Also fix arange/linspace type semantics.

A follow-up will then further improve type support (at which point we will be able to convert e.g. double models).
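
For reference (not part of the diff), the arange/linspace semantics in question are PyTorch's own; a quick illustration of the standard behavior:

```python
import torch

# arange with integral arguments yields int64 ...
assert torch.arange(5).dtype == torch.int64
# ... but a float argument switches it to the default dtype (float32 unless changed).
assert torch.arange(5.0).dtype == torch.get_default_dtype()
# linspace defaults to the default float dtype when no dtype is given.
assert torch.linspace(0, 1, steps=5).dtype == torch.get_default_dtype()
```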

t-vi (Contributor, Author) commented Jun 11, 2020

@masahi

So what I did here, in addition to what is visible, was to locally assert that no bare "float" is passed as a dtype to const/var creation. This is related to the forum discussion here: https://discuss.tvm.ai/t/discuss-the-meaning-of-float-in-relay/6949/
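
A minimal sketch of the early-conversion idea (the mapping below is illustrative and abridged, not the frontend's actual table): translate PyTorch scalar types to TVM dtype strings once, at the boundary, and fail loudly so an ambiguous string like "float" can never reach const/var creation.

```python
import torch

# Illustrative PyTorch-to-TVM dtype table (abridged).
_TORCH_TO_TVM_DTYPE = {
    torch.float16: "float16",
    torch.float32: "float32",
    torch.float64: "float64",
    torch.int32: "int32",
    torch.int64: "int64",
    torch.uint8: "uint8",
    torch.bool: "bool",
}

def to_tvm_dtype(torch_dtype):
    # Raising here is the "assert" mentioned above: ambiguous or
    # unsupported dtypes never leak into Relay as type strings.
    try:
        return _TORCH_TO_TVM_DTYPE[torch_dtype]
    except KeyError:
        raise NotImplementedError(f"unsupported dtype {torch_dtype}")

assert to_tvm_dtype(torch.float64) == "float64"
```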

The next step will be to apply the data type of input tensors to scalar inputs, but I'll keep that for a separate PR.
In addition to my primary use case (allowing doubles to facilitate testing of conversions), this likely helps with mixed precision etc.

t-vi force-pushed the pytorch_frontend_type_fix branch from f01a143 to f66d9f0 on June 11, 2020 21:00

t-vi (Contributor, Author) commented Jun 12, 2020

The quantized part isn't there yet, sorry about that. I'll have a fix in a bit.

Commit pushed: Improve separation of PyTorch dtypes and TVM dtypes in relay PyTorch frontend

t-vi (Contributor, Author) commented Jun 12, 2020

Now it's all happy. :)

masahi (Member) commented Jun 12, 2020

It seems you are allowing input_shapes to be None. The input names that are passed as part of input_shapes are important: these are names of the user's choosing that will be needed at deploy time.

If we use Torch IR input names, users need to manually inspect the IR and somehow remember these names. The tricky part is that Torch sometimes changes these input names when copying or saving/loading the same module. So in the end, the input names TVM expects can be different from what users see as inputs in the Torch IR.

To work around this, we decided not to use the names chosen by Torch and instead let users choose and supply input names (something obvious like input0, input1 that doesn't require remembering) as part of input_shapes.

t-vi (Contributor, Author) commented Jun 12, 2020

Note that I don't remove the possibility of passing in names; as the thread suggests, people will find that useful. I'm not sure why you would insist on passing them if the user is fine with the TorchScript-provided ones. I'm not taking away passing input names, I'm just softening the mandate.

Passing the shapes should rarely be needed, and I am surprised that you would need the user to do that. Ignoring the dtypes of the inputs is actively terrible.

How about doing the following:

  • Allow passing nothing.
  • Allow passing names only. (A list of strings.)
  • Allow passing names and shapes (for backward compat).
  • Allow passing names and shapes and dtypes (as a list of triples).

If you insist, I could also live with just the last three.
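
A hypothetical sketch of how these call forms could be dispatched (from_pytorch here is a stand-in, not the actual frontend signature):

```python
def from_pytorch(script_module, inputs=None):
    """Hypothetical dispatch over the four proposed call forms."""
    if inputs is None:
        # 1. Nothing passed: fall back to TorchScript-provided names,
        #    with shapes/dtypes read from the (traced) graph.
        ...
    elif all(isinstance(i, str) for i in inputs):
        # 2. Names only: a list of strings.
        ...
    elif all(len(i) == 2 for i in inputs):
        # 3. (name, shape) pairs: the backward-compatible form.
        ...
    else:
        # 4. (name, shape, dtype) triples.
        ...
```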

t-vi (Contributor, Author) commented Jun 12, 2020

The other part is that splitting at a potential "." and taking the first part would reliably give the name from PyTorch. So in the end this is all about funny business.

masahi (Member) commented Jun 12, 2020

> I'm not sure why you would insist on passing them if the user is fine with the TorchScript-provided ones

If you want to use names chosen by Torch, how are you going to figure out the correct names to give to TVM at deploy time? The names are the ones attached to the graph after this line https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/frontend/pytorch.py#L2504, rather than those on the graph you supply to the frontend. You also need to remember whatever names Torch chooses until deploy time, since TVM doesn't export input names but they are needed to correctly set inputs.

I think most of the time the names don't change. Previously we were using names chosen by Torch, but due to the corner cases discussed in the thread above, we decided it is better to let users choose whatever names they like (ones that don't require remembering).

> Passing the shapes should rarely be needed, and I am surprised that you would need the user to do that.

Passing pairs of (name, shape) is common across other frontends. We initially started with the same API as the others, and we haven't found a good reason to deviate from it (we did change dict to list, since argument order matters in Torch). Yes, Torch knows the shapes when tracing, but it doesn't for scripted modules, and Relay itself is not ready for dynamic input shapes. For these reasons, we require input shapes to be passed explicitly. Since TVM users are supposed to know the input shapes, I don't think this is a problem.

Passing dtypes is not something we (not only PyTorch, but the other frontends too) thought about, since we always assume float32 inputs. We can discuss how to integrate them. But most of the time inputs are fp32, so I don't want to introduce breaking API changes to allow a dtype option.

t-vi (Contributor, Author) commented Jun 12, 2020

> If you want to use names chosen by Torch, how are you going to figure out the correct names to give to TVM at deploy time? The names are the ones attached to the graph after this line https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/frontend/pytorch.py#L2504, rather than those on the graph you supply to the frontend. You also need to remember whatever names Torch chooses until deploy time, since TVM doesn't export input names but they are needed to correctly set inputs.

Thank you for insisting on using stable names. The user-supplied(!) names are the part before the (last, ha, here the only) "." and they're stable. This is e.g. what PyTorch itself does when you print script_module.code, or to give you the name of an argument when you are missing an input.

The function ultimately doing this in PyTorch is DebugNameBase:
https://github.com/pytorch/pytorch/blob/a9aa6367c2b1647f1d2772678f9971740c598c7a/torch/csrc/jit/ir/ir.cpp#L735
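
A minimal sketch of the splitting idea for a traced module (stable_input_names is a hypothetical helper, not frontend code):

```python
import torch

def stable_input_names(script_module):
    # Skip the first graph input, which is `self` for module graphs.
    graph_inputs = list(script_module.graph.inputs())[1:]
    # debugName() may look like "x.1"; the part before the last "."
    # is the name the user gave the argument in Python.
    return [inp.debugName().rsplit(".", 1)[0] for inp in graph_inputs]

traced = torch.jit.trace(torch.nn.Linear(3, 4), torch.randn(2, 3))
print(stable_input_names(traced))  # the traced forward's argument name, e.g. ['input']
```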

> Passing dtypes is not something we (not only PyTorch, but the other frontends too) thought about, since we always assume float32 inputs. We can discuss how to integrate them. But most of the time inputs are fp32, so I don't want to introduce breaking API changes to allow a dtype option.

I have to strongly differ with the claim that most inputs are fp32, starting with anything NLP.
Again, I think it is a misunderstanding that any of this involves breaking API changes; the suggestion is to make things more optional. I do see that splitting off the disambiguation counter is a good idea, but then we should just take what the user supplied in the model definition.

masahi (Member) commented Jun 12, 2020

> The user-supplied(!) names are the part before the (last, ha, here the only) "." and they're stable.

Do you mean Torch allows users to set the argument name? If you also know when and how exactly Torch changes input names, then sure, I can see that passing other names for TVM would be annoying. But I'd argue that most users are not familiar with such details of TorchScript, so we shouldn't expect them to correctly deal with names chosen by Torch.

Requiring input names is common across other frontends. I think making them optional makes the API a bit confusing: we would need to explain what input names are expected when they are omitted, while benefiting only users who are intimately familiar with TorchScript internals. Making the API as close as possible to other frontends also applies to input shapes, so I don't want to make those optional either. Shapes are required because Relay assumes static input shapes.

So my opinion is to not make input_shapes optional, to keep the API straightforward, less confusing, and close to the other frontends.

t-vi (Contributor, Author) commented Jun 12, 2020

Well, I see that this is not going anywhere.

t-vi closed this on Jun 12, 2020

tqchen (Member) commented Jun 12, 2020

It would be great to have a constructive discussion about the technical choices and agree on the pros and cons before we reach a conclusion. Everyone is contributing to a common project (and we value everyone's opinion), and I think it would be great if we can have a clear discussion. We also need to acknowledge that engineering decisions have tradeoffs and there is no single true answer to the problem.

One common approach I find useful is to dissect the discussion, label each discussion point, and try to agree on sub-points and rationales.

In the conversation so far, I see a few choices:

  • Ways to handle names:
    • T0: Allow optional input names from TorchScript.
    • T1: Only allow the user to pass in name overrides.
    • T2: T0, and optionally allow T1.
  • Ways to handle data types:
    • D0: Assume most things are fp32.
    • D1: Be able to convert the right data types from TorchScript.

We can then discuss their pros and cons. For example:

T0 is certainly more convenient, but it also depends on the stability of TorchScript's ability to keep names. T1 is more explicit when a user intends to name the input.

D0 solves most of the common problems, but as machine learning models move to mixed precision, we will inevitably want to support more data types, which likely makes D1 more appealing.

Because the pros and cons are mainly technical, I hope that most of us can agree on the technical points. The main thing we might not agree on would be something like the prioritization of technical tradeoffs.

For example, I might favor a clear naming scheme over an implicit one and thus prefer T2. A different person might think simplicity is key and fp32 is fine, so D0 is OK. This should be the only part we disagree on.

When we find more common ground, it is much easier to reach agreement. In cases like this, one thing we can do is have a constructive discussion, and perhaps bring in more people to see everyone's thoughts. Having a clear summary and a dissected discussion also helps others quickly understand the situation and share their opinions. In many cases we find that we do not disagree that much after all. This could be a good discuss forum thread.

Regardless of the outcome, I want to say that we value good technical debates, and usually they lead to better code overall. Many parts of this PR are certainly valuable, like the better data type handling. So let us have a good conversation and bring better PyTorch support to everyone.

tqchen (Member) commented Jun 12, 2020

cc @masahi @t-vi it would be great if we can summarize and dissect the points a bit more :)

masahi (Member) commented Jun 12, 2020

I don't have any objection regarding the dtype handling in this PR. At the moment we assume fp32 everywhere by default, but to support doubles we need to somehow pass dtype information from the user. I think we can pass an optional list of dtypes corresponding to the entries in input_shapes (a required argument). If it is not passed, we can fill in the dtype list with fp32.
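
A hedged sketch of that scheme (normalize_inputs is a hypothetical helper, not the actual frontend API): input_shapes stays required, and an optional parallel list of dtypes defaults to fp32.

```python
def normalize_inputs(input_shapes, input_dtypes=None):
    # Pair each (name, shape) entry with a dtype, defaulting
    # everything to float32 when no dtypes are supplied.
    if input_dtypes is None:
        input_dtypes = ["float32"] * len(input_shapes)
    assert len(input_dtypes) == len(input_shapes)
    return [
        (name, shape, dtype)
        for (name, shape), dtype in zip(input_shapes, input_dtypes)
    ]

print(normalize_inputs([("input0", (1, 3, 224, 224))]))
# [('input0', (1, 3, 224, 224), 'float32')]
```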

I think allowing input_shapes to be optional is an orthogonal change to the dtype issues that we can discuss elsewhere. I've already explained my reasoning, but to reiterate: making it optional is technically possible, since Torch maintains names and traced Torch modules know their input shapes.

But it doesn't work for scripted modules. And if names are omitted, users need to be aware of (or we need to explain) how to correctly figure out the names Torch maintains. Since this is not trivial, and Torch may change the way they handle naming on a whim in the future, we shouldn't rely on the naming chosen by Torch.

On top of the above, I think it is better to keep the API as close as possible to the other frontends. They all require input names and shapes to be passed explicitly.

tqchen (Member) commented Jun 13, 2020

Seems the most contentious part is the name handling? It would be great if we can also list all the alternatives (in labeled form), discuss their pros and cons, and then talk about the reasoning :) It will make the reasoning clear.

@t-vi perhaps we can first go forward with dtype handling and discuss the name and shape handling in the forum?

t-vi (Contributor, Author) commented Jun 13, 2020

Sorry, but the dtypes discussion isn't for me. Working on NLP models like BERT, if I have to argue that non-fp32 inputs are important, then TVM is not a good choice for that work.

The "ways to handle names" discussion is equally sad, not for the outcome but for the type of arguments, I would prefer to leave the status quo unchanged over having a discussion.

I have pushed an update that keeps the requirement and exact layout of input_shapes from the original interface. However, it still has the dtype handling (and now this dtype handling is non-optional), and incidentally, a unit test for rsub relied on the incorrect dtype handling, so I had to implement proper type promotion as well (but only applied in rsub for now). I'll not reopen the PR though, because it still does the dtype handling and I don't want to preempt your discussion.
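
For context, the promotion semantics in question are PyTorch's own rules, illustrated here with torch.result_type (the frontend change mirrors these for rsub):

```python
import torch

t_f64 = torch.ones(2, dtype=torch.float64)
t_i32 = torch.ones(2, dtype=torch.int32)

# A Python scalar never upgrades a floating tensor's dtype ...
assert torch.result_type(t_f64, 1) == torch.float64
assert torch.result_type(t_f64, 1.0) == torch.float64
# ... but a float scalar promotes an integer tensor to the default float dtype.
assert torch.result_type(t_i32, 1.0) == torch.get_default_dtype()
# rsub follows the same rules: (1 - t) on an int32 tensor stays int32.
assert torch.rsub(t_i32, 1).dtype == torch.int32
```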

tqchen (Member) commented Jun 13, 2020

I don't think we disagree in many cases.

Let me try to rephrase the dtype part of @t-vi's argument:

  • F0: NLP and quantization workloads need mixed precisions (fp16, i8, i32).
  • F1: In order to support NLP workloads well, we need to be able to handle data types other than fp32 well.

I think we all agree to F0 and F1. Now one of @masahi's arguments is

  • K0: Most users use fp32 by default; it would be nice to keep the interface compatible with that when adding dtype support.

As we can see, K0 does not necessarily conflict with F0 and F1. As an outsider to the discussion, it is a bit harder for me to express my thoughts (without looking at the code). It would be easier if we listed interface candidates during the discussion and discussed their pros and cons. There is certainly a tradeoff we need to make, in terms of ease of use, level of compatibility, etc.

I need a bit more time to dig in order to understand the name argument. But at a high level (this is my opinion) I think it makes sense to inherit names from the source (if they are available) while allowing users to override them. The main reason certain names/shapes are required in frontends is that in many cases this information is incomplete. Again, having interface candidates will be helpful here.

The main reason for a discussion is not necessarily to argue for which side is right, but to clarify the intent and reach consensus (which sometimes both sides actually already agree to).

That way, others can look into it in the future. Also, in cases like this, discussions are important for finding out the best ways of integration (e.g. not whether or not we should do dtype handling, but how to best do it).

Discussions also serve the good purpose of learning. It would be great for everyone to share an understanding of the "status quo" and the contentious points. Sometimes the problem of the disagreement is not what we need to do to support these features, but how to best resolve the situation.

We can certainly create followup discussions in the forum.

t-vi (Contributor, Author) commented Jun 17, 2020

So I removed all input-handling changes from the branch; I did keep the dtype changes and made more (like automatic PyTorch type promotion).
@masahi @tqchen Should we take a detour via the forums, or is this part uncontroversial after all?

I can convert various modules in fp64 with the patch.
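
A hedged example of that use case (the input name and shape are illustrative; that the patched frontend carries the traced graph's float64 dtypes through, rather than forcing "float32", is the assumption being tested):

```python
import torch
from tvm import relay

model = torch.nn.Linear(3, 4).double()
inp = torch.randn(2, 3, dtype=torch.float64)
traced = torch.jit.trace(model, inp)

mod, params = relay.frontend.from_pytorch(traced, [("input0", (2, 3))])
```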

tqchen (Member) commented Jun 17, 2020

I think we can just do a PR :)

t-vi (Contributor, Author) commented Jun 17, 2020

As I cannot reopen this, I opened #5834. Thank you for the discussion.
