
[ONNX] export() with dynamic shapes fails where dynamo_export(dynamic_shapes=True) succeeds #126607

Open · borisfom opened this issue May 18, 2024 · 6 comments
Labels: module: onnx (Related to torch.onnx), oncall: export

borisfom (Contributor) commented May 18, 2024

🐛 Describe the bug

Here is an example where dynamo_export() succeeds when run directly on a Module with dynamic_shapes=True, but fails if I first call torch.export.export() on it. This happens most often with partially dynamic shapes, but it also happens when I mark all dimensions as fully dynamic.

Currently I have to call export() first due to another bug (I have to run decompositions manually).
So I am trying to pass an equivalent of dynamic_shapes=True to export() and get the same result as when I call dynamo_export directly. This example does work if I mark all axes as dynamic, but fails if I only mark one:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return nn.functional.softmax(x.view(x.size(1), -1), dim=0)

device = torch.device('cuda')
model = Model().to(device)
x = torch.rand(1024, 20, 16).to(device) 

batch = torch.export.Dim("batch")
x1 = torch.export.Dim("x1")
x2 = torch.export.Dim("x2")
# this one succeeds
dynamic_shapes={'x': {0: batch, 1: x1, 2: x2}}

# this one fails
dynamic_shapes={'x': {0: batch}}

model = torch.export.export(model, (x,), dynamic_shapes=dynamic_shapes, strict=False).run_decompositions()

options = torch.onnx.ExportOptions(dynamic_shapes=True)
onnx_program = torch.onnx.dynamo_export(model, x, export_options=options)
onnx_program.save('model.onnx')

Versions

PyTorch nightly 05/15/24

cc @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4

@borisfom borisfom changed the title [ONNX] There should be a way to run export() with dynamic shape settings equivalent to dynamo_export(.., dynamic_shapes=True) [ONNX] export() with dynamic shapes fails when only part of input dimensions are dynamic May 18, 2024
borisfom (Contributor, Author) commented:

This is what I get when I run the repro. Specializing some dimensions would result in more efficient code, so it would be nice either to make the guard calculation succeed in this case, or to be able to ignore the guards and treat this error as a warning (see the sketch after the log below for one possible model-level workaround).

V0518 04:08:06.019000 139943363598144 torch/fx/experimental/symbolic_shapes.py:2289] create_env
I0518 04:08:06.050000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3260] create_symbol s0 = 1024 for L['args'][0][0].size()[0] [2, 9223372036854775806] (_export/non_strict_utils.py:92 in fakify), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s0"
V0518 04:08:06.051000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval True == True [statically known]
V0518 04:08:06.052000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval False == False [statically known]
I0518 04:08:06.173000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4661] eval Ne(s0, 20) [guard added] (_refs/__init__.py:3685 in _reshape_view_helper), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Ne(s0, 20)"
I0518 04:08:06.177000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4661] eval Ne(Mod(s0, 20), 0) [guard added] (_refs/__init__.py:3694 in _reshape_view_helper), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Ne(Mod(s0, 20), 0)"
V0518 04:08:06.178000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval Eq(s0, 1) == False [statically known]
V0518 04:08:06.179000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval True == True [statically known]
V0518 04:08:06.183000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval Ne(20*s0, 20) == True [statically known]
V0518 04:08:06.184000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval False == False [statically known]
V0518 04:08:06.206000 139943363598144 torch/fx/experimental/symbolic_shapes.py:4746] eval Ne(s0, 1) == True [statically known]
I0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3348] produce_guards
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].size()[0] s0 StrictMinMaxConstraint(warn_only=False, vr=ValueRanges(lower=0, upper=oo, is_bool=False))
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].size()[1] 20 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].size()[2] 16 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].stride()[0] 320 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].stride()[1] 16 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].stride()[2] 1 None
V0518 04:08:06.234000 139943363598144 torch/fx/experimental/symbolic_shapes.py:3530] track_symint L['args'][0][0].storage_offset() 0 None
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/export/_trace.py", line 1088, in _export
    produce_guards_and_solve_constraints(
  File "/usr/local/lib/python3.10/dist-packages/torch/_export/non_strict_utils.py", line 270, in produce_guards_and_solve_constraints
    raise constraint_violation_error
  File "/usr/local/lib/python3.10/dist-packages/torch/_export/non_strict_utils.py", line 238, in produce_guards_and_solve_constraints
    shape_env.produce_guards(
  File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/symbolic_shapes.py", line 3853, in produce_guards
    raise ConstraintViolationError(
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (batch)! For more information, run with TORCH_LOGS="+dynamic".
  - Not all values of batch = L['args'][0][0].size()[0] in the specified range satisfy the generated guard Ne(L['args'][0][0].size()[0], 20).
  - Not all values of batch = L['args'][0][0].size()[0] in the specified range satisfy the generated guard Ne(Mod(L['args'][0][0].size()[0], 20), 0).
Suggested fixes:
  batch = Dim('batch')
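
One possible model-level workaround, rather than an exporter fix: give the tracer the hints it needs before the view() so the generated guards can be discharged. A minimal sketch, assuming torch._check hints on these symbolic conditions are enough to satisfy Ne(s0, 20) and Ne(Mod(s0, 20), 0) (untested against this nightly):

import torch
import torch.nn as nn

class ModelWithHints(nn.Module):
    def forward(self, x):
        b = x.size(0)
        # Promise the tracer that the dynamic batch is never equal to,
        # nor a multiple of, the middle dimension, so the guards added by
        # _reshape_view_helper hold by assumption.
        torch._check(b != 20)
        torch._check(b % 20 != 0)
        return nn.functional.softmax(x.view(x.size(1), -1), dim=0)

Of course this only helps if those assumptions actually hold for all inputs at runtime.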
 

borisfom (Contributor, Author) commented:

By the way, the real-world model I distilled this repro from fails even when all dimensions are specified as dynamic, or when dynamo_export is called directly:

torch._dynamo.exc.UserError: Tried to use data-dependent value in the subsequent computation. This can happen when we encounter unbounded dynamic value that is unknown during tracing time. You will need to explicitly give hint to the compiler. Please take a look at torch._check OR torch._check_is_size APIs. Could not guard on data-dependent expression Eq(4*u0**2, 0) (unhinted: Eq(s0*u0**2, 0)). (Size-like symbols: u0)

ATTENTION: guard_size_oblivious would fix the error, evaluating expression to False.
Maybe you need to add guard_size_oblivious to framework code, see doc below for more guidance.

Potential framework code culprit (scroll up for full backtrace):
  File "/usr/local/lib/python3.10/dist-packages/torch/_decomp/decompositions.py", line 1126, in _softmax
    if x.numel() == 0:
...
  File "/git/NeMo/nemo/collections/tts/modules/transformer.py", line 115, in forward
    return self._forward(inp, attn_mask, conditioning)
  File "/git/NeMo/nemo/collections/tts/modules/transformer.py", line 148, in _forward
    attn_prob = F.softmax(attn_score, dim=2)

attn_prob undergoes transformations similar to the view() in my example. I am not sure what I should mark/check as a size.
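
For reference, the pattern those APIs expect looks roughly like this; a minimal sketch assuming the unbacked size comes from a data-dependent op such as nonzero() (the names here are illustrative, not taken from the NeMo code):

import torch

def softmax_over_selected(scores, mask):
    # nonzero() produces a data-dependent length (an unbacked symbol
    # like u0) that tracing cannot bound on its own.
    n = mask.nonzero().size(0)
    # Mark it as a valid, non-negative size and give the compiler a hint,
    # so expressions like Eq(4*u0**2, 0) can be decided during tracing.
    torch._check_is_size(n)
    torch._check(n > 0)
    return torch.nn.functional.softmax(scores[:n], dim=0)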

@pianpwk pianpwk self-assigned this May 18, 2024
borisfom (Contributor, Author) commented May 20, 2024

With the latest PyTorch nightly (05/20), I am also getting more failures in NeMo unit tests when running export() with all dimensions dynamic and no min/max, compared to running dynamo_export() directly.
In other words: when I pass Dim()s to export() that should be equivalent to dynamic_shapes=True (see the sketch below for what I mean), I still cannot export the same networks that are exportable by calling dynamo_export(model, ..., dynamic_shapes=True) directly.
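
To be concrete, this is roughly the spec I pass; a hypothetical helper (not a torch API) that marks every dimension of every tensor input as dynamic:

import torch

def fully_dynamic_spec(**example_tensors):
    # One fresh Dim per (input, axis). Note that axes of size 0 or 1 cannot
    # be made dynamic by export(), so a real helper would need to skip them.
    return {
        name: {i: torch.export.Dim(f"{name}_d{i}") for i in range(t.dim())}
        for name, t in example_tensors.items()
    }

# e.g. dynamic_shapes = fully_dynamic_spec(input_ids=ids, attention_mask=am)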

@borisfom borisfom changed the title [ONNX] export() with dynamic shapes fails when only part of input dimensions are dynamic [ONNX] export() with dynamic shapes fails where dynamo_export(dynamic_shapes=True) succeeds May 20, 2024
borisfom (Contributor, Author) commented May 20, 2024

Here is a repro case with fully dynamic Dims that used to work with last week's PyTorch nightly (05/15) but fails with 05/20: dynamo_export works, but torch.export.export() fails:

import torch
from nemo.core.classes import typecheck
from nemo.utils.export_utils import wrap_forward_method, parse_input_example
from nemo.collections.nlp.models import PunctuationCapitalizationModel
model = PunctuationCapitalizationModel.from_pretrained(model_name="punctuation_en_distilbert")
model.cuda().eval()
wrap_forward_method(model)
model._prepare_for_export()
typecheck.set_typecheck_enabled(enabled=False)

with torch.no_grad():
    input_example = model.input_module.input_example(max_batch=4)
    input_list, input_dict = parse_input_example(input_example)

    print("Running torch.onnx.dynamo_export ...")
    options = torch.onnx.ExportOptions(dynamic_shapes=True)
    ex = torch.onnx.dynamo_export(model, *input_list, **input_dict, export_options=options)

    print("Running torch.export.export ...")
    x1 = torch.export.Dim("x1")
    x2 = torch.export.Dim("x2")
    x3 = torch.export.Dim("x3")
    b1 = torch.export.Dim("b1")
    b2 = torch.export.Dim("b2")
    b3 = torch.export.Dim("b3")
    dynamic_shapes={'input_ids': {0: b1, 1: x1}, 'attention_mask': {0: b2, 1: x2}, 'token_type_ids': {0: b3, 1: x3}}
    ex_model = torch.export.export(
        model,
        tuple(input_list),
        kwargs=input_dict,
        dynamic_shapes=dynamic_shapes,
        strict=False
    )

justinchuby (Collaborator) commented:

So torch.export fails and torch.onnx.dynamo_export succeeds?

borisfom (Contributor, Author) commented May 24, 2024

Correct: torch.onnx.dynamo_export with dynamic_shapes=True succeeds, but when I have to run torch.export.export() first, it fails there, even with strict=False and even when I specify all axes as dynamic with no bounds (which should be equivalent to dynamic_shapes=True), because it cannot calculate the bounds properly.
I have to use export.export() first for large models to force the external-data ONNX format, as direct dynamo_export() produces a huge, non-parseable ONNX file in that case; I filed separate bugs for those issues.
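
For context, the two-step flow I mean is the one from the first repro above; condensed (same model/x/dynamic_shapes as in that snippet):

ep = torch.export.export(model, (x,), dynamic_shapes=dynamic_shapes, strict=False)
ep = ep.run_decompositions()
options = torch.onnx.ExportOptions(dynamic_shapes=True)
torch.onnx.dynamo_export(ep, x, export_options=options).save('model.onnx')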
