
[MobileFormer]Dimensions must be equal [Add Layer] #103

Closed
kevinz8866 opened this issue Jan 8, 2023 · 4 comments
Labels
Keras (.h5), opset < 11 (Unsupported opset model), output signaturedefs (saved_model, output signaturedefs), Parameter replacement (Use Parameter replacement), Transformer

Comments

kevinz8866 commented Jan 8, 2023

Issue Type

Others

onnx2tf version number

1.4.2

onnx version number

1.12.0

tensorflow version number

2.10.1

Download URL for ONNX

https://drive.google.com/file/d/1vGzO9MZGX-yGz6ATm4yHVJMASZACuy2t/view?usp=share_link

Parameter Replacement JSON

{
  "format_version": 1,
  "operations": [
    {
      "op_name": "Add_282",
      "param_target": "pre_process_transpose_perm", 
      "param_name": "perm",
      "values": [0, 3, 1, 2]
    }
  ]
}

Description

Hi, thank you so much for actively maintaining this useful repo. I am trying to convert my custom PyTorch model to ONNX and then to Keras for further tuning. I exported the model using opset==12. The issue occurred during conversion at an Add layer when I ran convert('onnx/py_model.onnx').

In attempting to solve this problem, I provided the parameter replacement file to the converter but got the same error. It looks like, for op_name Add_282 (number 1 in the screenshot), one input is in NHWC and the other is in NCHW: from the error, one input to the Add is in the correct order (NCHW) but the other is in NHWC. So I looked back at the operation Add_263 (number 2 in the screenshot), which generates the wrong input, and found that the Keras input for this operation is already in NHWC. I did not trace further up because I think that illustrates my problem: one of the inputs to Add_282 appears to have an incorrect shape.
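The shape mismatch in the traceback can be reproduced outside onnx2tf with NumPy, whose broadcasting rules reject these two shapes the same way tf.math.add does. A minimal sketch, assuming only NumPy is installed and using the shapes from the error message: adding a [1,56,56,16] (NHWC) tensor to a [1,16,56,56] (NCHW) tensor fails, while permuting the NCHW operand with (0, 2, 3, 1) makes the shapes agree.

```python
import numpy as np

# Shapes taken from the traceback: x is NHWC, y is still NCHW.
x = np.zeros((1, 56, 56, 16), dtype=np.float32)
y = np.zeros((1, 16, 56, 56), dtype=np.float32)


def try_add(a, b):
    """Return a + b, or None if the shapes cannot be broadcast together."""
    try:
        return a + b
    except ValueError:
        return None


# Fails: trailing dimensions 16 and 56 are neither equal nor 1.
mismatched = try_add(x, y)

# NCHW -> NHWC: move the channel axis (axis 1) to the end.
y_nhwc = np.transpose(y, (0, 2, 3, 1))
aligned = try_add(x, y_nhwc)

print(mismatched is None)
print(aligned.shape)
```

This only illustrates the symptom; which operand is "wrong" depends on the layout the converter expects at that point in the graph.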

So I wonder if there is a quick fix for this: did I forget to set some parameter, do the parameter replacement incorrectly, or is it a version issue? I also found #17, but at the very end you said there is no need to do parameter replacement for Transpose operations after 1.1.38 with opset > 11. I apologize in advance if I missed something super obvious, but could you provide some tips on how to fix this? Thanks again!

Screenshot

Traceback (most recent call last):
  File "C:\Users\qz796\anaconda3\envs\phoodify\lib\site-packages\onnx2tf\utils\common_functions.py", line 267, in print_wrapper_func
    result = func(*args, **kwargs)
  File "C:\Users\qz796\anaconda3\envs\phoodify\lib\site-packages\onnx2tf\utils\common_functions.py", line 329, in inverted_operation_enable_disable_wrapper_func
    result = func(*args, **kwargs)
  File "C:\Users\qz796\anaconda3\envs\phoodify\lib\site-packages\onnx2tf\utils\common_functions.py", line 37, in get_replacement_parameter_wrapper_func
    func(*args, **kwargs)
  File "C:\Users\qz796\anaconda3\envs\phoodify\lib\site-packages\onnx2tf\ops\Add.py", line 126, in make_node
    tf.math.add(
  File "C:\Users\qz796\anaconda3\envs\phoodify\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\qz796\anaconda3\envs\phoodify\lib\site-packages\keras\layers\core\tf_op_layer.py", line 119, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "C:\Users\qz796\anaconda3\envs\phoodify\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.math.add_101" (type TFOpLambda).

Dimensions must be equal, but are 56 and 16 for '{{node tf.math.add_101/Add}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [1,56,56,16], [1,16,56,56].

Call arguments received by layer "tf.math.add_101" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 56, 56, 16), dtype=float32)
  • y=tf.Tensor(shape=(1, 16, 56, 56), dtype=float32)
  • name='Add_248'
kevinz8866 reopened this Jan 8, 2023
PINTO0309 (Owner) commented Jan 9, 2023

I exported the model using opset==12.

First, your model has opset=9. This is a trivial point that has nothing to do directly with this issue.

The first problem is that the output of Reshape_247 is still NCHW. This is due to the presence of a Transpose immediately before the Reshape that gets in the way of the tool's decision to automatically transpose to NHWC.

INFO: onnx_op_type: Reshape onnx_op_name: Reshape_247
INFO:  input_name.1: onnx::Reshape_747 shape: [1, 2, 8, 3136] dtype: float32
INFO:  input_name.2: onnx::Reshape_2838 shape: [4] dtype: <class 'numpy.int64'>
INFO:  output_name.1: onnx::Add_753 shape: [1, 16, 56, 56] dtype: float32
INFO: tf_op_type: reshape
INFO:  input.1.tensor: name: tf.compat.v1.transpose_21/transpose:0 shape: (1, 2, 8, 3136) dtype: <dtype: 'float32'> 
INFO:  input.2.shape: val: [1, 16, 56, 56] 
INFO:  output.1.output: name: tf.reshape_13/Reshape:0 shape: (1, 16, 56, 56) dtype: <dtype: 'float32'> 

INFO: onnx_op_type: Add onnx_op_name: Add_248
INFO:  input_name.1: onnx::Reshape_700 shape: [1, 16, 56, 56] dtype: float32
INFO:  input_name.2: onnx::Add_753 shape: [1, 16, 56, 56] dtype: float32
INFO:  output_name.1: input.84 shape: [1, 16, 56, 56] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
  File "/home/b920405/git/onnx2tf/onnx2tf/utils/common_functions.py", line 267, in print_wrapper_func
    result = func(*args, **kwargs)
  File "/home/b920405/git/onnx2tf/onnx2tf/utils/common_functions.py", line 329, in inverted_operation_enable_disable_wrapper_func
    result = func(*args, **kwargs)
  File "/home/b920405/git/onnx2tf/onnx2tf/utils/common_functions.py", line 37, in get_replacement_parameter_wrapper_func
    func(*args, **kwargs)
  File "/home/b920405/git/onnx2tf/onnx2tf/ops/Add.py", line 126, in make_node
    tf.math.add(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/keras/layers/core/tf_op_layer.py", line 119, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
ValueError: Exception encountered when calling layer "tf.math.add_33" (type TFOpLambda).

Dimensions must be equal, but are 56 and 16 for '{{node tf.math.add_33/Add}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [1,56,56,16], [1,16,56,56].

Call arguments received by layer "tf.math.add_33" (type TFOpLambda):
  • x=tf.Tensor(shape=(1, 56, 56, 16), dtype=float32)
  • y=tf.Tensor(shape=(1, 16, 56, 56), dtype=float32)
  • name='Add_248'
ERROR: Read this and deal with it. https://github.com/PINTO0309/onnx2tf#parameter-replacement
ERROR: Alternatively, if the input OP has a dynamic dimension, use the -b or -ois option to rewrite it to a static shape and try again.
ERROR: If the input OP of ONNX before conversion is NHWC, use the -kt option.

This Transpose makes it impossible to determine if a shape can be converted to NHWC.
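Why a Transpose feeding a Reshape is hard to convert can be sketched with NumPy: Reshape reinterprets elements in memory order, so reshaping NCHW data and then moving channels last is not the same as reshaping straight to the NHWC target shape. The shapes mirror the Reshape_247 log ([1, 2, 8, 3136] to [1, 16, 56, 56]); the random data is purely illustrative, and this is not a reproduction of onnx2tf's internal layout logic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Input to Reshape_247 as logged: [1, 2, 8, 3136] (2 * 8 * 3136 == 16 * 56 * 56)
x = rng.standard_normal((1, 2, 8, 3136)).astype(np.float32)

# What the ONNX graph means: reshape in NCHW, then view the result as NHWC.
nchw_then_nhwc = x.reshape(1, 16, 56, 56).transpose(0, 2, 3, 1)

# A naive conversion that only permutes the Reshape target shape to NHWC.
naive_nhwc = x.reshape(1, 56, 56, 16)

print(nchw_then_nhwc.shape == naive_nhwc.shape)    # same shape...
print(np.array_equal(nchw_then_nhwc, naive_nhwc))  # ...but different element order
```

Because the shapes match while the values do not, a converter cannot tell from shapes alone whether the Reshape output is already NHWC, which is why an explicit post-process transpose is needed here.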

For now, let's transpose the output of Reshape_247 to the expected shape, from [1,16,56,56] to [1,56,56,16].

  • replace_kevinz8866.json
    {
      "format_version": 1,
      "operations": [
        {
          "op_name": "Reshape_247",
          "param_target": "outputs",
          "param_name": "onnx::Add_753",
          "post_process_transpose_perm": [0,2,3,1]
        }
      ]
    }
onnx2tf -i py_model.onnx -prf replace_kevinz8866.json
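The perm value for post_process_transpose_perm can be derived from the current and expected shapes. A small sketch using a hypothetical helper, find_perm (it is not part of onnx2tf), that brute-forces axis permutations:

```python
from itertools import permutations


def find_perm(src_shape, dst_shape):
    """Return the first axis permutation p (lexicographic order) such that
    [src_shape[i] for i in p] == dst_shape, or None if no permutation fits.

    When several axes share a size (e.g. H == W == 56), more than one
    permutation matches, so the result still needs a sanity check.
    """
    if len(src_shape) != len(dst_shape):
        return None
    for p in permutations(range(len(src_shape))):
        if [src_shape[i] for i in p] == list(dst_shape):
            return list(p)
    return None


# Output of Reshape_247 (NCHW) vs. the NHWC shape the following Add expects.
print(find_perm([1, 16, 56, 56], [1, 56, 56, 16]))
```

Here the helper returns [0, 2, 3, 1], the value used in the JSON above; (0, 3, 2, 1) would also match the shapes, so the choice of NCHW-to-NHWC is a judgment about layout, not something shapes alone decide.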

The first Reshape shape error was avoided. But now another Reshape with a similar shape causes an error.

Append the new Reshape to the JSON.

  • replace_kevinz8866.json
    {
      "format_version": 1,
      "operations": [
        {
          "op_name": "Reshape_247",
          "param_target": "outputs",
          "param_name": "onnx::Add_753",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_418",
          "param_target": "outputs",
          "param_name": "onnx::Add_1015",
          "post_process_transpose_perm": [0,2,3,1]
        }
      ]
    }
onnx2tf -i py_model.onnx -prf replace_kevinz8866.json

The error occurs again. Your model has a structure that is rather tricky to convert automatically, so you will need to repeat the steps above several times, by trial and error, until no more errors occur.

It is not a difficult task, just tedious.
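Since every entry has the same form, the replacement file can be generated instead of edited by hand. A minimal sketch, assuming the field names shown in the JSON files in this thread; the (op_name, output name) pairs still have to be discovered by running the conversion, and build_replacement is a hypothetical convenience function, not an onnx2tf feature.

```python
import json

# (op_name, output tensor name) pairs found by trial and error in this thread.
RESHAPE_OUTPUTS = [
    ("Reshape_247", "onnx::Add_753"),
    ("Reshape_418", "onnx::Add_1015"),
    ("Reshape_588", "onnx::Add_1275"),
    ("Reshape_759", "onnx::Add_1537"),
    ("Reshape_929", "onnx::Add_1797"),
    ("Reshape_1098", "onnx::Add_2056"),
    ("Reshape_1269", "onnx::Add_2318"),
    ("Reshape_1439", "onnx::Add_2578"),
]


def build_replacement(pairs, perm=(0, 2, 3, 1)):
    """Build a parameter replacement dict that transposes each listed output."""
    return {
        "format_version": 1,
        "operations": [
            {
                "op_name": op_name,
                "param_target": "outputs",
                "param_name": output_name,
                "post_process_transpose_perm": list(perm),
            }
            for op_name, output_name in pairs
        ],
    }


config = build_replacement(RESHAPE_OUTPUTS)
# Write this text to replace_kevinz8866.json before running onnx2tf -prf.
print(json.dumps(config, indent=2))
```

Adding a newly failing Reshape then becomes a one-line change to RESHAPE_OUTPUTS.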

  • replace_kevinz8866.json
    {
      "format_version": 1,
      "operations": [
        {
          "op_name": "Reshape_247",
          "param_target": "outputs",
          "param_name": "onnx::Add_753",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_418",
          "param_target": "outputs",
          "param_name": "onnx::Add_1015",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_588",
          "param_target": "outputs",
          "param_name": "onnx::Add_1275",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_759",
          "param_target": "outputs",
          "param_name": "onnx::Add_1537",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_929",
          "param_target": "outputs",
          "param_name": "onnx::Add_1797",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_1098",
          "param_target": "outputs",
          "param_name": "onnx::Add_2056",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_1269",
          "param_target": "outputs",
          "param_name": "onnx::Add_2318",
          "post_process_transpose_perm": [0,2,3,1]
        },
        {
          "op_name": "Reshape_1439",
          "param_target": "outputs",
          "param_name": "onnx::Add_2578",
          "post_process_transpose_perm": [0,2,3,1]
        }
      ]
    }
onnx2tf -i py_model.onnx -prf replace_kevinz8866.json
         : 
         : 
         : 
 tf.math.multiply_226 (TFOpLambda)            (1, 101)                       0               ['tf.linalg.matmul_156[0][0]']                 
                                                                                                                                            
 tf.__operators__.add_51 (TFOpLambda)         (1, 101)                       0               ['tf.math.multiply_226[0][0]']                 
                                                                                                                                            
 tf.cast_84 (TFOpLambda)                      (1, 101)                       0               ['tf.__operators__.add_51[0][0]']              
                                                                                                                                            
============================================================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
____________________________________________________________________________________________________________________________________________

saved_model output started ==========================================================
saved_model output complete!
WARNING:absl:Please consider providing the trackable_obj argument in the from_concrete_functions. Providing without the trackable_obj argument is deprecated and it will use the deprecated conversion path.
Estimated count of arithmetic ops: 184.907 M  ops, equivalently 92.454 M  MACs
Float32 tflite output complete!
Estimated count of arithmetic ops: 184.907 M  ops, equivalently 92.454 M  MACs
Float16 tflite output complete!

Note that your model seems to contain Erf operations that are not supported by TFLite.

If you want to decompose Erf into primitive layers, you need to use the following command.

onnx2tf -i py_model.onnx -rerf -prf replace_kevinz8866.json
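To give a sense of what "decomposing Erf into primitive layers" means, here is a sketch of one classic elementwise approximation, Abramowitz and Stegun formula 7.1.26 (maximum absolute error about 1.5e-7), built only from add, multiply, divide, exp and a sign flip. This is an assumption for illustration; it is not necessarily the decomposition that -rerf produces.

```python
import math

# Coefficients of Abramowitz & Stegun 7.1.26 (formula valid for x >= 0).
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def erf_approx(x: float) -> float:
    """Approximate erf(x) using only +, *, /, exp and a sign flip."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)  # erf is odd, so evaluate on |x| and restore the sign
    t = 1.0 / (1.0 + P * x)
    # Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    poly = 0.0
    for a in reversed(A):
        poly = poly * t + a
    poly *= t
    return sign * (1.0 - poly * math.exp(-x * x))


for v in (-2.0, -0.5, 0.0, 0.3, 1.0, 2.5):
    print(v, erf_approx(v), math.erf(v))
```

Every step maps to an elementwise op that TFLite supports, which is the point of such a rewrite; the price is a small approximation error.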

The following is a sample command if you need output in Keras h5 format.

onnx2tf -i py_model.onnx -oh5 -rerf -prf replace_kevinz8866.json

But unfortunately your model does not seem to support output to h5 format.

h5 output started ===================================================================
[E] No function: __iter__ registered for opset: 9
[E] No function: __iter__ registered for opset: 9
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/b920405/.vscode/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/b920405/.vscode/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/b920405/.vscode/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/b920405/.vscode/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/b920405/.vscode/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/b920405/.vscode/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/b920405/git/onnx2tf/onnx2tf/onnx2tf.py", line 1385, in <module>
    main()
  File "/home/b920405/git/onnx2tf/onnx2tf/onnx2tf.py", line 1342, in main
    model = convert(
  File "/home/b920405/git/onnx2tf/onnx2tf/onnx2tf.py", line 636, in convert
    model.save(f'{output_folder_path}/model_float32.h5')
  File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
AttributeError: 'numpy.dtype[float32]' object has no attribute 'item'

As a workaround, you can use the -osd, --output_signaturedefs options to output a saved_model of the complete structure.

onnx2tf -i py_model.onnx -osd -rerf -prf replace_kevinz8866.json
saved_model_cli show --dir saved_model --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['x'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 224, 224, 3)
        name: serving_default_x:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['tf.cast_84'] tensor_info:
        dtype: DT_FLOAT
        shape: (1, 101)
        name: PartitionedCall:0
  Method name is: tensorflow/serving/predict
python
>>> import tensorflow as tf
>>> imported = tf.saved_model.load('saved_model')
>>>

Finally, if you don't mind me asking, could you tell me the official name of this model? If it is a publicly available model, I would like to use it in my research to enhance this tool.

Including the name of the model in the title of the issue would make it much easier for engineers around the world to find this information.

PINTO0309 added the Parameter replacement, Keras (.h5), opset < 11, and output signaturedefs labels Jan 9, 2023
kevinz8866 (Author) commented Jan 9, 2023

Hi Master PINTO0309,

Thank you so much for the timely response. Your explanation is very clear. Showing me one example would have been enough, but you completed all the tedious work and provided extra information on the downstream issues. I cannot thank you enough! I can replicate your result locally, so feel free to close this issue. The model is a fine-tuned MobileFormer by Chen et al., repo: https://github.com/AAboys/MobileFormer and paper: https://arxiv.org/abs/2108.05895. I was trying to replicate their results and see how it performs on some fine-grained datasets.

Most of my workflow is in TensorFlow, so since yesterday I have been fixated on finding a way to convert PyTorch to TF and get a Keras model for some extra tinkering. Nothing came close until I found your repo! I guess this Transpose/Reshape issue might be linked to the transformer architecture, which may become quite prevalent in production. This particular model is not very representative of vision transformers because it is specific to my research area. Swin Transformer is better known and has well-maintained repos in both PyTorch https://github.com/microsoft/Swin-Transformer and TensorFlow https://github.com/rishigami/Swin-Transformer-TF. The downside is that Swin is a much larger model than this one.

Actually, there is another issue with this converted model. The output values are a little off compared to the original pytorch model or the tensorflow representation from onnx_tf. I don't know if this is on your priority list, but if you are interested, I can provide more info or create a new issue. Converting to TF is not a must for me, so for now I am happy that all the code runs through. Feel free to close this issue. Thank you so much!
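To make "a little off" concrete in a follow-up report, it helps to quote numbers. A minimal sketch, assuming the PyTorch (or onnxruntime) output and the converted model's output for the same input have already been saved as NumPy arrays of shape (batch, 101); compare_outputs and its metrics are illustrative choices, not something onnx2tf provides. Note that the saved_model signature above takes NHWC input (1, 224, 224, 3), so the same image has to be transposed from NCHW before feeding it.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Summarize how far `candidate` logits drift from `reference` logits."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    diff = np.abs(reference - candidate)
    same_top1 = reference.argmax(axis=-1) == candidate.argmax(axis=-1)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "top1_agreement": float(same_top1.mean()),
    }


# Synthetic stand-ins for the two models' outputs (shape (1, 101) as in this issue).
rng = np.random.default_rng(0)
ref = rng.standard_normal((1, 101))
cand = ref + rng.normal(scale=1e-4, size=ref.shape)
print(compare_outputs(ref, cand))
```

Reporting the maximum and mean absolute difference alongside top-1 agreement distinguishes harmless float drift from a layout bug that changes predictions.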

kevinz8866 changed the title Dimensions must be equal [Add Layer] to [MobileFormer]Dimensions must be equal [Add Layer] Jan 9, 2023
PINTO0309 (Owner)

Thank you!

The output values are a little off compared to the original pytorch model or the tensorflow representation from onnx_tf.

I am very interested. To make this tool useful to other researchers, I need to fix all the little bugs you have pointed out. I am very sorry to make this a hassle, but I need your help to improve this tool. I would appreciate it if you could open another issue.

I will close this issue for now, but will continue to improve the tool.

Thanks again. 👍

kevinz8866 (Author)

Cool. I will do that later this week.
