
[ONNX] export() with dynamic shapes fails where dynamo_export(dynamic_shapes=True) succeeds #126607

Open · borisfom opened this issue May 18, 2024 · 6 comments
Labels: module: onnx (Related to torch.onnx), oncall: export

borisfom (Contributor) commented May 18, 2024

🐛 Describe the bug

Here is an example where dynamo_export() succeeds when run directly on a Module with dynamic_shapes=True, but fails if I first call torch.export.export() on it. This happens most often with partially dynamic shapes, but it also happens when I mark all dimensions as fully dynamic.

Currently I have to call export() first due to another bug (I have to run decompositions manually).
So I am trying to pass an equivalent of dynamic_shapes=True to export() and get the same result as when I call dynamo_export directly. This example does work if I mark all axes as dynamic, but fails if I only mark one:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return nn.functional.softmax(x.view(x.size(1), -1), dim=0)

device = torch.device('cuda')
model = Model().to(device)
x = torch.rand(1024, 20, 16).to(device) 

batch = torch.export.Dim("batch")
x1 = torch.export.Dim("x1")
x2 = torch.export.Dim("x2")
# this one succeeds
dynamic_shapes={'x': {0: batch, 1: x1, 2: x2}}

# this one fails
dynamic_shapes={'x': {0: batch}}

model = torch.export.export(model, (x,), dynamic_shapes=dynamic_shapes, strict=False).run_decompositions()

options = torch.onnx.ExportOptions(dynamic_shapes=True)
onnx_program = torch.onnx.dynamo_export(model, x, export_options=options)
onnx_program.save('model.onnx')

Versions

PyTorch nightly 05/15/24

cc @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4

@borisfom borisfom changed the title [ONNX] There should be a way to run export() with dynamic shape settings equivalent to dynamo_export(.., dynamic_shapes=True) [ONNX] export() with dynamic shapes fails when only part of input dimensions are dynamic May 18, 2024
borisfom (Contributor, Author) commented:

This is what I get when I run the repro. Specializing some dimensions would result in more efficient code, so it would be nice either to make the guard calculation succeed in this case, or to be able to ignore the guards and treat this error as a warning (see the sketch after the log below for one possible model-level workaround).

V0518 04:08:06.019000 139943363598144 torch/fx/experimental/symbolic_shapes.py:2289] create_env
I0518 04:08:06.050000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3260] create_symbol s0 = 1024 for L['args'][0][0].size()[0] [2, 9223372036854775806] (_export/non_strict_utils.py:92 in fakify), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s0"
V0518 04:08:06.051000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval True == True [statically known]
V0518 04:08:06.052000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval False == False [statically known]
I0518 04:08:06.173000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4661] eval Ne(s0, 20) [guard added] (_refs/__init__.py:3685 in _reshape_view_helper), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Ne(s0, 20)"
I0518 04:08:06.177000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4661] eval Ne(Mod(s0, 20), 0) [guard added] (_refs/__init__.py:3694 in _reshape_view_helper), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Ne(Mod(s0, 20), 0)"
V0518 04:08:06.178000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval Eq(s0, 1) == False [statically known]
V0518 04:08:06.179000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval True == True [statically known]
V0518 04:08:06.183000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval Ne(20*s0, 20) == True [statically known]
V0518 04:08:06.184000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval False == False [statically known]
V0518 04:08:06.206000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval Ne(s0, 1) == True [statically known]
I0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3348] produce_guards
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].size()[0] s0 StrictMinMaxConstraint(warn_only=False, vr=ValueRanges(lower=0, upper=oo, is_bool=False))
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].size()[1] 20 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].size()[2] 16 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].stride()[0] 320 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].stride()[1] 16 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].stride()[2] 1 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].storage_offset() 0 None
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/export/_trace.py", line 1088, in _export
    produce_guards_and_solve_constraints(
  File "/usr/local/lib/python3.10/dist-packages/torch/_export/non_strict_utils.py", line 270, in produce_guards_and_solve_constraints
    raise constraint_violation_error
  File "/usr/local/lib/python3.10/dist-packages/torch/_export/non_strict_utils.py", line 238, in produce_guards_and_solve_constraints
    shape_env.produce_guards(
  File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/symbolic_shapes.py", line 3853, in produce_guards
    raise ConstraintViolationError(
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (batch)! For more information, run with TORCH_LOGS="+dynamic".
  - Not all values of batch = L['args'][0][0].size()[0] in the specified range satisfy the generated guard Ne(L['args'][0][0].size()[0], 20).
  - Not all values of batch = L['args'][0][0].size()[0] in the specified range satisfy the generated guard Ne(Mod(L['args'][0][0].size()[0], 20), 0).
Suggested fixes:
  batch = Dim('batch')
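
One possible model-level workaround, rather than an exporter fix: give the tracer the hints it needs before the view() so the generated guards can be discharged. A minimal sketch, assuming torch._check hints on these symbolic conditions are enough to satisfy Ne(s0, 20) and Ne(Mod(s0, 20), 0) (untested against this nightly):

import torch
import torch.nn as nn

class ModelWithHints(nn.Module):
    def forward(self, x):
        b = x.size(0)
        # Promise the tracer that the dynamic batch is never equal to,
        # nor a multiple of, the middle dimension, so the guards added by
        # _reshape_view_helper hold by assumption.
        torch._check(b != 20)
        torch._check(b % 20 != 0)
        return nn.functional.softmax(x.view(x.size(1), -1), dim=0)

Of course this only helps if those assumptions actually hold for all inputs at runtime.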
 

borisfom (Contributor, Author) commented:

By the way, the real-world model I distilled this repro from fails even when all dimensions are specified as dynamic, or when dynamo_export is called directly:

torch._dynamo.exc.UserError: Tried to use data-dependent value in the subsequent computation. This can happen when we encounter unbounded dynamic value that is unknown during tracing time. You will need to explicitly give hint to the compiler. Please take a look at torch._check OR torch._check_is_size APIs. Could not guard on data-dependent expression Eq(4*u0**2, 0) (unhinted: Eq(s0*u0**2, 0)). (Size-like symbols: u0)

ATTENTION: guard_size_oblivious would fix the error, evaluating expression to False.
Maybe you need to add guard_size_oblivious to framework code, see doc below for more guidance.

Potential framework code culprit (scroll up for full backtrace):
  File "/usr/local/lib/python3.10/dist-packages/torch/_decomp/decompositions.py", line 1126, in _softmax
    if x.numel() == 0:
...
  File "/git/NeMo/nemo/collections/tts/modules/transformer.py", line 115, in forward
    return self._forward(inp, attn_mask, conditioning)
  File "/git/NeMo/nemo/collections/tts/modules/transformer.py", line 148, in _forward
    attn_prob = F.softmax(attn_score, dim=2)

attn_prob undergoes transformations similar to the view() in my example. I am not sure what I should mark/check as a size.
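
For reference, the pattern those APIs expect looks roughly like this; a minimal sketch assuming the unbacked size comes from a data-dependent op such as nonzero() (the names here are illustrative, not taken from the NeMo code):

import torch

def softmax_over_selected(scores, mask):
    # nonzero() produces a data-dependent length (an unbacked symbol
    # like u0) that tracing cannot bound on its own.
    n = mask.nonzero().size(0)
    # Mark it as a valid, non-negative size and give the compiler a hint,
    # so expressions like Eq(4*u0**2, 0) can be decided during tracing.
    torch._check_is_size(n)
    torch._check(n > 0)
    return torch.nn.functional.softmax(scores[:n], dim=0)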

@pianpwk pianpwk self-assigned this May 18, 2024
borisfom (Contributor, Author) commented May 20, 2024

With the latest PyTorch nightly (05/20), I am also getting more failures in NeMo unit tests when running export() with all dimensions dynamic and no min/max, compared to running dynamo_export() directly.
In other words: when I pass Dim()s to export() that should be equivalent to dynamic_shapes=True (see the sketch below for what I mean), I still cannot export the same networks that are exportable by calling dynamo_export(model, ..., dynamic_shapes=True) directly.
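
To be concrete, this is roughly the spec I pass; a hypothetical helper (not a torch API) that marks every dimension of every tensor input as dynamic:

import torch

def fully_dynamic_spec(**example_tensors):
    # One fresh Dim per (input, axis). Note that axes of size 0 or 1 cannot
    # be made dynamic by export(), so a real helper would need to skip them.
    return {
        name: {i: torch.export.Dim(f"{name}_d{i}") for i in range(t.dim())}
        for name, t in example_tensors.items()
    }

# e.g. dynamic_shapes = fully_dynamic_spec(input_ids=ids, attention_mask=am)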

@borisfom borisfom changed the title [ONNX] export() with dynamic shapes fails when only part of input dimensions are dynamic [ONNX] export() with dynamic shapes fails where dynamo_export(dynamic_shapes=True) succeeds May 20, 2024
borisfom (Contributor, Author) commented May 20, 2024

Here is a repro case with fully dynamic Dims that used to work with last week's PyTorch nightly (05/15) but fails with 05/20: dynamo_export works, but torch.export.export() fails:

import torch
from nemo.core.classes import typecheck
from nemo.utils.export_utils import wrap_forward_method, parse_input_example
from nemo.collections.nlp.models import PunctuationCapitalizationModel
model = PunctuationCapitalizationModel.from_pretrained(model_name="punctuation_en_distilbert")
model.cuda().eval()
wrap_forward_method(model)
model._prepare_for_export()
typecheck.set_typecheck_enabled(enabled=False)

with torch.no_grad():
    input_example = model.input_module.input_example(max_batch=4)
    input_list, input_dict = parse_input_example(input_example)

    print("Running torch.onnx.dynamo_export ...")
    options = torch.onnx.ExportOptions(dynamic_shapes=True)
    ex = torch.onnx.dynamo_export(model, *input_list, **input_dict, export_options=options)

    print("Running torch.export.export ...")
    x1 = torch.export.Dim("x1")
    x2 = torch.export.Dim("x2")
    x3 = torch.export.Dim("x3")
    b1 = torch.export.Dim("b1")
    b2 = torch.export.Dim("b2")
    b3 = torch.export.Dim("b3")
    dynamic_shapes={'input_ids': {0: b1, 1: x1}, 'attention_mask': {0: b2, 1: x2}, 'token_type_ids': {0: b3, 1: x3}}
    ex_model = torch.export.export(
        model,
        tuple(input_list),
        kwargs=input_dict,
        dynamic_shapes=dynamic_shapes,
        strict=False
    )

justinchuby (Collaborator) commented:

So torch.export fails and torch.onnx.dynamo_export succeeds?

borisfom (Contributor, Author) commented May 24, 2024

Correct: torch.onnx.dynamo_export with dynamic_shapes=True succeeds, but when I have to run torch.export.export() first, it fails there, even with strict=False and even when I specify all axes as dynamic with no bounds (which should be equivalent to dynamic_shapes=True), because it cannot calculate the bounds properly.
I have to use export.export() first for large models to force the external-data ONNX format, as direct dynamo_export() produces a huge, non-parseable ONNX file in that case; I filed separate bugs for those issues.
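
For context, the two-step flow I mean is the one from the first repro above; condensed (same model/x/dynamic_shapes as in that snippet):

ep = torch.export.export(model, (x,), dynamic_shapes=dynamic_shapes, strict=False)
ep = ep.run_decompositions()
options = torch.onnx.ExportOptions(dynamic_shapes=True)
torch.onnx.dynamo_export(ep, x, export_options=options).save('model.onnx')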
