new parameter `-cind`, `—custom_input_op_name_np_data_path` #296

On-JungWoan · 2023-04-07T08:48:32Z

1. Content and background

Added -cid, --custom_input_data option

Use custom input data instead of dummy inputs.

ex) onnx2tf -i model.onnx -cid input.pkl

2. Summary of corrections

Test usage

import torch
import pickle
from os import path
from torch import nn

save_dir = 'onnx2tf/test/'

# model
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()


    def forward(self, x_1, x_2):
        x_1[x_1==1] = 0
        
        return x_1+x_2


# input data
x_1 = torch.tensor([2,3,4,5])
x_2 = torch.tensor([0,0,0,0])
m = Model().eval()


# export input
with open(path.join(save_dir, 'input.pkl'), 'wb') as f:
    pickle.dump([x_1.numpy(),x_2.numpy()], f) # Input type is must be [np.array(), np.array(), np.array()... ]!


# export onnx
torch.onnx.export(
    m,
    (x_1, x_2),
    path.join(save_dir, 'model.onnx'),
    verbose=True,
    opset_version=16
)

onnx2tf -i test/model.onnx -cid test/input.pkl

Check function

no -cid

-cid

3. Before/After (If there is an operating log that can be used as a reference)

4. Issue number (only if there is a related issue)

#282 (comment)

I made it with my limited skills. Please let me know if there are any mistakes or areas that need improvement.

PINTO0309 · 2023-04-07T09:40:58Z

Thank you so much for reading the complex logic and issuing the pull request.

I would like to discuss a few technical aspects and pull request improvements.

Python's pickle form is a binary object that is dangerous enough to run by itself as a virtual machine. If possible, I would avoid using pickle to avoid making the spec cumbersome to implement checks during deserialization, etc.
https://lincolnloop.com/insights/playing-pickle-security/
Since there is already a mechanism to load Numpy data with the -qcind option, I feel that it would be a very good idea if it could be handled well by changing the use of -qcind and the option name. However, it may be a bit difficult to modify the logic so that it does not affect other quantization logic, etc.
I have not yet included my comments in the source code review, but I think a more detailed description of the options would make the content easier for users to understand. It is acceptable for the description to be a bit lengthy, but as much as possible, it would be appreciated if it included an explanation of when the option should be used, what data should be used, etc. The current description of the file is clear that it is a string parameter, but it is difficult to determine whether it is sufficient to specify the path to the file or whether something other than that needs to be specified.
```
    parser.add_argument(
        '-cid',
        '--custom_input_data',
        type=str,
        default=None,
        help=\
            'Use custom onnx input data instead of dummy inputs.'
    )  
```

Please let us discuss this so that we can improve the tool for the better.

On-JungWoan · 2023-04-07T10:12:21Z

If I understand correctly, you are suggesting to save the input data in the .npy format and use the logic of the -qcind option when loading it, is that right?

If that is right, could you briefly explain what I need to be careful about when using that logic?

PINTO0309 · 2023-04-07T10:19:07Z

Your understanding is mostly correct, but I will add an additional explanation to avoid a little misunderstanding.

Currently, -qcind is an option that is only intended for inputting INT8 quantization calibration data into the tool. However, as you have faced with the issue, I feel that it would be better to extend the implications of this option a bit and improve the functionality so that it can also be diverted as validation data for the -cotof option.

There are many ways to do this, but for example, I think it would be better to change the name -qcind itself to -cind, so that -cind is referenced by both the -cotof and -oiqt options, so that the meaning is consistent and easy to understand. I also feel that it would avoid confusing users by having multiple similar optional functions.

I am not insisting on the existence of the -qcind option.

On-JungWoan · 2023-04-07T10:38:24Z

I understand what you mean, and I'll do my best to complete the task to the best of my ability. However, since this task is very challenging for me, I'm not sure if I can complete the implementation in a short amount of time. I would appreciate it if you could understand this with an open mind.

PINTO0309 · 2023-04-07T10:39:06Z

Of course. I will cooperate.

On-JungWoan · 2023-04-07T17:54:41Z

1. Summary of corrections

1-1) New parameter `-cind`

First of all, I have changed the -qcind option to -cind. Additionally, I have removed the -cid option.

And the -cind option receives input_op_name, numpy_file_path, mean, and std from the user, where mean and std are optional.

1-2) Usage in `-cotof`

Challenging

However, the logic for cases where the input format of Tensorflow and the input format of ONNX are different (e.g. NCHW vs NHWC) has not been prepared yet.

1-3) Usage in `-oiqt`

2. Test

The test data is in onnx2tf/test_cind/model.py. You can simply test it by running the following command.

2-1) `-cotof`

onnx2tf -i test_cind/model.onnx -cind onnx::Equal_0 test_cind/x_1.npy -cind onnx::Add_1 test_cind/x_2.npy -cotof

2-2) `-oiqt`

onnx2tf -i test_cind/model.onnx -cind onnx::Equal_0 test_cind/x_1.npy MEAN STD -cind onnx::Add_1 test_cind/x_2.npy MEAN STD -oiqt

I haven't been able to write a description for the parameter yet. As soon as I have time, I will try to write it as soon as possible. Also, if there are any implementation errors, please let me know. As I am still a student, I need to attend classes or do assignments. Therefore, my response may be delayed. I would appreciate your understanding in this matter.

PINTO0309

I have reviewed the revised code you suggested as best I can. It is perfectly acceptable if it takes some time, so please help us consider a correction.

Note that I have granted you the authority to run the regression test by CI, so every time you execute a commit, the regression test will be run by CI. In the unlikely event that your commit affects some of the past conversion logic, CI will display an error log. Please make use of it.

onnx2tf/onnx2tf.py

onnx2tf/utils/common_functions.py

test/input.pkl

test/model.py

test_cind/model.py

PINTO0309 · 2023-04-08T15:05:30Z

This is the first time I have learned that CI is pending pending approval until I approve the initial pull request. I will run CI manually for a while when I get your commit notification.

PINTO0309 · 2023-04-08T15:08:07Z

Reviewed visually again. Looks good.

When the CI is all green, I'll merge it in, do a quick test on hand, and if there are no problems, I'll release it.

PINTO0309 · 2023-04-08T15:12:08Z

Oh, I needed to correct the description of your parameters. I'll look forward to it and wait for a while.

By the way, your current proposal is almost fine if no errors occur in the first 5 minutes of the CI.

On-JungWoan · 2023-04-08T15:14:20Z

Ah, I see. Please wait for a moment.

On-JungWoan · 2023-04-08T15:34:12Z

Would it be okay to write it like this? I would appreciate any advice as my writing skills are lacking!

            'Input name of OP and path of data file (Numpy) for custom input for -cotof or -oiqt, \n' + 
            'and mean (optional) and std (optional). \n' +

            '\n<Usage in -cotof> \n' +
            'When using -cotof, custom input defined by the user, instead of dummy data, is used. \n' +
            'In this case, mean and std are omitted from the input. \n' +
            '-cind {input_op_name} {numpy_file_path} \n' +
            'ex) -cind onnx::Equal_0 test_cind/x_1.npy -cind onnx::Add_1 test_cind/x_2.npy -cotof \n' +
            'The input_op_name must be the same as in ONNX, \n'
            'and it may not work if the input format is different between ONNX and TF. \n'

            '\n<Usage in -oiqt> \n' +
            'INPUT Name of OP and path of calibration data file (Numpy) for quantization \n' +
            'and mean and std. \n' +
            'The specification can be omitted only when the input OP is a single 4D tensor image data. \n' +
            'If omitted, it is automatically calibrated using 20 normalized MS-COCO images. \n' +
            'The type of the input OP must be Float32. \n' +
            'Data for calibration must be pre-normalized to a range of 0 to 1. \n' +
            '-cind {input_op_name} {numpy_file_path} {mean} {std} \n' +
            'Numpy file paths must be specified the same number of times as the number of input OPs. \n' +
            'Normalize the value of the input OP based on the tensor specified in mean and std. \n' +
            '(input_value - mean) / std \n' +
            'Tensors in Numpy file format must be in dimension order after conversion to TF. \n' +
            'Note that this is intended for deployment on low-resource devices, \n' +
            'so the batch size is limited to 1 only. \n\n' +
            'e.g. \n' +
            'The example below shows a case where there are three input OPs. \n' +
            'Assume input0 is 128x128 RGB image data. \n' +
            'In addition, input0 should be a value that has been divided by 255 \n' +
            'in the preprocessing and normalized to a range between 0 and 1. \n' +
            'input1 and input2 assume the input of something that is not an image. \n' +
            'Because input1 and input2 assume something that is not an image, \n' +
            'the divisor is not 255 when normalizing from 0 to 1. \n' +
            '"n" is the number of calibration data. \n\n' +
            'ONNX INPUT shapes: \n' +
            '   input0: [n,3,128,128] \n' +
            '       mean: [1,3,1,1] -> [[[[0.485]],[[0.456]],[[0.406]]]] \n' +
            '       std:  [1,3,1,1] -> [[[[0.229]],[[0.224]],[[0.225]]]] \n' +
            '   input1: [n,64,64] \n' +
            '       mean: [1,64] -> [0.1, ..., 0.64] \n' +
            '       std:  [1,64] -> [0.05, ..., 0.08] \n' +
            '   input2: [n,5] \n' +
            '       mean: [1] -> [0.3] \n' +
            '       std:  [1] -> [0.07] \n' +
            'TensorFlow INPUT shapes (Numpy file ndarray shapes): \n' +
            '   input0: [n,128,128,3] \n' +
            '       mean: [1,1,1,3] -> [[[[0.485, 0.456, 0.406]]]] \n' +
            '       std:  [1,1,1,3] -> [[[[0.229, 0.224, 0.225]]]] \n' +
            '   input1: [n,64,64] \n' +
            '       mean: [1,64] -> [0.1, ..., 0.64] \n' +
            '       std:  [1,64] -> [0.05, ..., 0.08] \n' +
            '   input2: [n,5] \n' +
            '       mean: [1] -> [0.3] \n' +
            '       std:  [1] -> [0.07] \n' +
            '-cind "input0" "../input0.npy" [[[[0.485, 0.456, 0.406]]]] [[[[0.229, 0.224, 0.225]]]] \n' +
            '-cind "input1" "./input1.npy" [0.1, ..., 0.64] [0.05, ..., 0.08] \n' +
            '-cind "input2" "input2.npy" [0.3] [0.07]'

PINTO0309 · 2023-04-08T15:37:37Z

It looks generally good. It would be perfect if you could add a capturing description of what happens when -cotof and -oiqt are specified at the same time.

PINTO0309 · 2023-04-08T15:49:12Z

Tomorrow I will go to bed today because I have to do some housework in the morning. I will check your corrections tomorrow when I have enough time. Please don't take it too hard.

On-JungWoan · 2023-04-08T15:49:39Z

Can I write it like this? Actually, I don't know exactly what happens when -cotof and -oiqt are specified at the same time.

            '\n<Using -cotof and -oiqt at the same time> \n' +
            'To use -cotof and -oiqt simultaneously, \n' + 
            'you need to enter the Input name of OP, path of data file, mean, and std all together. \n' +
            'And the data file must be in Float32 format. \n'

PINTO0309 · 2023-04-08T15:53:03Z

Your judgment is correct.

Ideally, when -cotof and -oiqt are specified at the same time, it would be better to print an error if the four parameters, i.e., MEAN and STD, are not specified. 👍

On-JungWoan · 2023-04-08T15:53:59Z

Thank you for your response. I will try to write the rest tomorrow when I wake up. Have a good night.

On-JungWoan · 2023-04-09T12:30:56Z

I just pushed it. Sorry for the delay.

PINTO0309 · 2023-04-09T12:45:03Z

Thanks. Probably not a problem, I'll merge and release when CI is all green. If users report problems after release, you can fix them each time. 👍

Perhaps there is a slight possibility that I will make a few tweaks in a few commits before release. For example, the README.

On-JungWoan · 2023-04-09T14:20:13Z

Thank you for merging it:)

PINTO0309 · 2023-04-09T14:22:56Z

If you notice any other problems or improvements, please feel free to submit a pull request.

new parameter -cid

8112d20

PINTO0309 added the discussion Specification Discussion label Apr 7, 2023

PINTO0309 added the feature request feature request label Apr 7, 2023

make parameter

4ce4555

remove custom input check function

3a3e302

On-JungWoan changed the title ~~new parameter -cid, -custom_input_data~~ new parameter -cind, —custom_input_op_name_np_data_path Apr 8, 2023

PINTO0309 reviewed Apr 8, 2023

View reviewed changes

On-JungWoan added 2 commits April 8, 2023 23:58

some changes

0ebfb8f

delete test file

caad539

add description

d1002dc

Fix typo

60917f4

PINTO0309 merged commit 67b73ed into PINTO0309:main Apr 9, 2023

PINTO0309 mentioned this pull request Apr 9, 2023

README Adjustments #305

Merged

On-JungWoan deleted the custom_input branch April 9, 2023 14:20

motokimura mentioned this pull request Apr 10, 2023

Correctly apply mean and std params of -cind option #306

Merged

PINTO0309 mentioned this pull request May 4, 2023

Several problems when INT8-quantizing with calibration data #339

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new parameter `-cind`, `—custom_input_op_name_np_data_path` #296

new parameter `-cind`, `—custom_input_op_name_np_data_path` #296

On-JungWoan commented Apr 7, 2023 •

edited

Loading

PINTO0309 commented Apr 7, 2023

On-JungWoan commented Apr 7, 2023 •

edited

Loading

PINTO0309 commented Apr 7, 2023

On-JungWoan commented Apr 7, 2023

PINTO0309 commented Apr 7, 2023

On-JungWoan commented Apr 7, 2023 •

edited

Loading

PINTO0309 left a comment

PINTO0309 commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023 •

edited

Loading

On-JungWoan commented Apr 8, 2023

On-JungWoan commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

On-JungWoan commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

On-JungWoan commented Apr 8, 2023

On-JungWoan commented Apr 9, 2023

PINTO0309 commented Apr 9, 2023 •

edited

Loading

On-JungWoan commented Apr 9, 2023

PINTO0309 commented Apr 9, 2023

new parameter -cind, —custom_input_op_name_np_data_path #296

new parameter -cind, —custom_input_op_name_np_data_path #296

Conversation

On-JungWoan commented Apr 7, 2023 • edited Loading

1. Content and background

2. Summary of corrections

3. Before/After (If there is an operating log that can be used as a reference)

4. Issue number (only if there is a related issue)

PINTO0309 commented Apr 7, 2023

On-JungWoan commented Apr 7, 2023 • edited Loading

PINTO0309 commented Apr 7, 2023

On-JungWoan commented Apr 7, 2023

PINTO0309 commented Apr 7, 2023

On-JungWoan commented Apr 7, 2023 • edited Loading

1. Summary of corrections

1-1) New parameter -cind

1-2) Usage in -cotof

1-3) Usage in -oiqt

2. Test

2-1) -cotof

2-2) -oiqt

PINTO0309 left a comment

Choose a reason for hiding this comment

PINTO0309 commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023 • edited Loading

On-JungWoan commented Apr 8, 2023

On-JungWoan commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

On-JungWoan commented Apr 8, 2023

PINTO0309 commented Apr 8, 2023

On-JungWoan commented Apr 8, 2023

On-JungWoan commented Apr 9, 2023

PINTO0309 commented Apr 9, 2023 • edited Loading

On-JungWoan commented Apr 9, 2023

PINTO0309 commented Apr 9, 2023

new parameter `-cind`, `—custom_input_op_name_np_data_path` #296

new parameter `-cind`, `—custom_input_op_name_np_data_path` #296

On-JungWoan commented Apr 7, 2023 •

edited

Loading

On-JungWoan commented Apr 7, 2023 •

edited

Loading

On-JungWoan commented Apr 7, 2023 •

edited

Loading

1-1) New parameter `-cind`

1-2) Usage in `-cotof`

1-3) Usage in `-oiqt`

2-1) `-cotof`

2-2) `-oiqt`

PINTO0309 commented Apr 8, 2023 •

edited

Loading

PINTO0309 commented Apr 9, 2023 •

edited

Loading