Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new parameter -cind, —custom_input_op_name_np_data_path #296

Merged
merged 7 commits into from
Apr 9, 2023

Conversation

On-JungWoan
Copy link
Contributor

@On-JungWoan On-JungWoan commented Apr 7, 2023

1. Content and background

  • Added -cid, --custom_input_data option
Use custom input data instead of dummy inputs.

ex) onnx2tf -i model.onnx -cid input.pkl

2. Summary of corrections

  • Test usage
import torch
import pickle
from os import path
from torch import nn

save_dir = 'onnx2tf/test/'

# model
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()


    def forward(self, x_1, x_2):
        x_1[x_1==1] = 0
        
        return x_1+x_2


# input data
x_1 = torch.tensor([2,3,4,5])
x_2 = torch.tensor([0,0,0,0])
m = Model().eval()


# export input
with open(path.join(save_dir, 'input.pkl'), 'wb') as f:
    pickle.dump([x_1.numpy(),x_2.numpy()], f) # Input type is must be [np.array(), np.array(), np.array()... ]!


# export onnx
torch.onnx.export(
    m,
    (x_1, x_2),
    path.join(save_dir, 'model.onnx'),
    verbose=True,
    opset_version=16
)
onnx2tf -i test/model.onnx -cid test/input.pkl
  • Check function

image

  • no -cid

image

  • -cid

image

3. Before/After (If there is an operating log that can be used as a reference)

4. Issue number (only if there is a related issue)

#282 (comment)

I made it with my limited skills. Please let me know if there are any mistakes or areas that need improvement.

@PINTO0309 PINTO0309 added the discussion Specification Discussion label Apr 7, 2023
@PINTO0309
Copy link
Owner

Thank you so much for reading the complex logic and issuing the pull request.

I would like to discuss a few technical aspects and pull request improvements.

  1. Python's pickle form is a binary object that is dangerous enough to run by itself as a virtual machine. If possible, I would avoid using pickle to avoid making the spec cumbersome to implement checks during deserialization, etc.
    https://lincolnloop.com/insights/playing-pickle-security/
  2. Since there is already a mechanism to load Numpy data with the -qcind option, I feel that it would be a very good idea if it could be handled well by changing the use of -qcind and the option name. However, it may be a bit difficult to modify the logic so that it does not affect other quantization logic, etc.
  3. I have not yet included my comments in the source code review, but I think a more detailed description of the options would make the content easier for users to understand. It is acceptable for the description to be a bit lengthy, but as much as possible, it would be appreciated if it included an explanation of when the option should be used, what data should be used, etc. The current description of the file is clear that it is a string parameter, but it is difficult to determine whether it is sufficient to specify the path to the file or whether something other than that needs to be specified.
        parser.add_argument(
            '-cid',
            '--custom_input_data',
            type=str,
            default=None,
            help=\
                'Use custom onnx input data instead of dummy inputs.'
        )  

Please let us discuss this so that we can improve the tool for the better.

@PINTO0309 PINTO0309 added the feature request feature request label Apr 7, 2023
@On-JungWoan
Copy link
Contributor Author

On-JungWoan commented Apr 7, 2023

If I understand correctly, you are suggesting to save the input data in the .npy format and use the logic of the -qcind option when loading it, is that right?

If that is right, could you briefly explain what I need to be careful about when using that logic?

@PINTO0309
Copy link
Owner

Your understanding is mostly correct, but I will add an additional explanation to avoid a little misunderstanding.

Currently, -qcind is an option that is only intended for inputting INT8 quantization calibration data into the tool. However, as you have faced with the issue, I feel that it would be better to extend the implications of this option a bit and improve the functionality so that it can also be diverted as validation data for the -cotof option.

There are many ways to do this, but for example, I think it would be better to change the name -qcind itself to -cind, so that -cind is referenced by both the -cotof and -oiqt options, so that the meaning is consistent and easy to understand. I also feel that it would avoid confusing users by having multiple similar optional functions.

I am not insisting on the existence of the -qcind option.

@On-JungWoan
Copy link
Contributor Author

I understand what you mean, and I'll do my best to complete the task to the best of my ability. However, since this task is very challenging for me, I'm not sure if I can complete the implementation in a short amount of time. I would appreciate it if you could understand this with an open mind.

@PINTO0309
Copy link
Owner

Of course. I will cooperate.

@On-JungWoan
Copy link
Contributor Author

On-JungWoan commented Apr 7, 2023

1. Summary of corrections

1-1) New parameter -cind

First of all, I have changed the -qcind option to -cind. Additionally, I have removed the -cid option.

And the -cind option receives input_op_name, numpy_file_path, mean, and std from the user, where mean and std are optional.

1-2) Usage in -cotof

  • Challenging

    However, the logic for cases where the input format of Tensorflow and the input format of ONNX are different (e.g. NCHW vs NHWC) has not been prepared yet.

1-3) Usage in -oiqt

2. Test

The test data is in onnx2tf/test_cind/model.py. You can simply test it by running the following command.

2-1) -cotof

onnx2tf -i test_cind/model.onnx -cind onnx::Equal_0 test_cind/x_1.npy -cind onnx::Add_1 test_cind/x_2.npy -cotof

2-2) -oiqt

onnx2tf -i test_cind/model.onnx -cind onnx::Equal_0 test_cind/x_1.npy MEAN STD -cind onnx::Add_1 test_cind/x_2.npy MEAN STD -oiqt

I haven't been able to write a description for the parameter yet. As soon as I have time, I will try to write it as soon as possible. Also, if there are any implementation errors, please let me know. As I am still a student, I need to attend classes or do assignments. Therefore, my response may be delayed. I would appreciate your understanding in this matter.

@On-JungWoan On-JungWoan changed the title new parameter -cid, -custom_input_data new parameter -cind, —custom_input_op_name_np_data_path Apr 8, 2023
Copy link
Owner

@PINTO0309 PINTO0309 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the revised code you suggested as best I can. It is perfectly acceptable if it takes some time, so please help us consider a correction.

Note that I have granted you the authority to run the regression test by CI, so every time you execute a commit, the regression test will be run by CI. In the unlikely event that your commit affects some of the past conversion logic, CI will display an error log. Please make use of it.

onnx2tf/onnx2tf.py Outdated Show resolved Hide resolved
onnx2tf/onnx2tf.py Outdated Show resolved Hide resolved
onnx2tf/onnx2tf.py Outdated Show resolved Hide resolved
onnx2tf/onnx2tf.py Outdated Show resolved Hide resolved
onnx2tf/onnx2tf.py Outdated Show resolved Hide resolved
onnx2tf/utils/common_functions.py Show resolved Hide resolved
test/input.pkl Outdated Show resolved Hide resolved
test/model.py Outdated Show resolved Hide resolved
test_cind/model.py Outdated Show resolved Hide resolved
@PINTO0309
Copy link
Owner

This is the first time I have learned that CI is pending pending approval until I approve the initial pull request. I will run CI manually for a while when I get your commit notification.

@PINTO0309
Copy link
Owner

Reviewed visually again. Looks good.

When the CI is all green, I'll merge it in, do a quick test on hand, and if there are no problems, I'll release it.

@PINTO0309
Copy link
Owner

PINTO0309 commented Apr 8, 2023

Oh, I needed to correct the description of your parameters. I'll look forward to it and wait for a while.

By the way, your current proposal is almost fine if no errors occur in the first 5 minutes of the CI.

@On-JungWoan
Copy link
Contributor Author

Ah, I see. Please wait for a moment.

@On-JungWoan
Copy link
Contributor Author

Would it be okay to write it like this? I would appreciate any advice as my writing skills are lacking!

            'Input name of OP and path of data file (Numpy) for custom input for -cotof or -oiqt, \n' + 
            'and mean (optional) and std (optional). \n' +

            '\n<Usage in -cotof> \n' +
            'When using -cotof, custom input defined by the user, instead of dummy data, is used. \n' +
            'In this case, mean and std are omitted from the input. \n' +
            '-cind {input_op_name} {numpy_file_path} \n' +
            'ex) -cind onnx::Equal_0 test_cind/x_1.npy -cind onnx::Add_1 test_cind/x_2.npy -cotof \n' +
            'The input_op_name must be the same as in ONNX, \n'
            'and it may not work if the input format is different between ONNX and TF. \n'

            '\n<Usage in -oiqt> \n' +
            'INPUT Name of OP and path of calibration data file (Numpy) for quantization \n' +
            'and mean and std. \n' +
            'The specification can be omitted only when the input OP is a single 4D tensor image data. \n' +
            'If omitted, it is automatically calibrated using 20 normalized MS-COCO images. \n' +
            'The type of the input OP must be Float32. \n' +
            'Data for calibration must be pre-normalized to a range of 0 to 1. \n' +
            '-cind {input_op_name} {numpy_file_path} {mean} {std} \n' +
            'Numpy file paths must be specified the same number of times as the number of input OPs. \n' +
            'Normalize the value of the input OP based on the tensor specified in mean and std. \n' +
            '(input_value - mean) / std \n' +
            'Tensors in Numpy file format must be in dimension order after conversion to TF. \n' +
            'Note that this is intended for deployment on low-resource devices, \n' +
            'so the batch size is limited to 1 only. \n\n' +
            'e.g. \n' +
            'The example below shows a case where there are three input OPs. \n' +
            'Assume input0 is 128x128 RGB image data. \n' +
            'In addition, input0 should be a value that has been divided by 255 \n' +
            'in the preprocessing and normalized to a range between 0 and 1. \n' +
            'input1 and input2 assume the input of something that is not an image. \n' +
            'Because input1 and input2 assume something that is not an image, \n' +
            'the divisor is not 255 when normalizing from 0 to 1. \n' +
            '"n" is the number of calibration data. \n\n' +
            'ONNX INPUT shapes: \n' +
            '   input0: [n,3,128,128] \n' +
            '       mean: [1,3,1,1] -> [[[[0.485]],[[0.456]],[[0.406]]]] \n' +
            '       std:  [1,3,1,1] -> [[[[0.229]],[[0.224]],[[0.225]]]] \n' +
            '   input1: [n,64,64] \n' +
            '       mean: [1,64] -> [0.1, ..., 0.64] \n' +
            '       std:  [1,64] -> [0.05, ..., 0.08] \n' +
            '   input2: [n,5] \n' +
            '       mean: [1] -> [0.3] \n' +
            '       std:  [1] -> [0.07] \n' +
            'TensorFlow INPUT shapes (Numpy file ndarray shapes): \n' +
            '   input0: [n,128,128,3] \n' +
            '       mean: [1,1,1,3] -> [[[[0.485, 0.456, 0.406]]]] \n' +
            '       std:  [1,1,1,3] -> [[[[0.229, 0.224, 0.225]]]] \n' +
            '   input1: [n,64,64] \n' +
            '       mean: [1,64] -> [0.1, ..., 0.64] \n' +
            '       std:  [1,64] -> [0.05, ..., 0.08] \n' +
            '   input2: [n,5] \n' +
            '       mean: [1] -> [0.3] \n' +
            '       std:  [1] -> [0.07] \n' +
            '-cind "input0" "../input0.npy" [[[[0.485, 0.456, 0.406]]]] [[[[0.229, 0.224, 0.225]]]] \n' +
            '-cind "input1" "./input1.npy" [0.1, ..., 0.64] [0.05, ..., 0.08] \n' +
            '-cind "input2" "input2.npy" [0.3] [0.07]'

@PINTO0309
Copy link
Owner

It looks generally good. It would be perfect if you could add a capturing description of what happens when -cotof and -oiqt are specified at the same time.

@PINTO0309
Copy link
Owner

Tomorrow I will go to bed today because I have to do some housework in the morning. I will check your corrections tomorrow when I have enough time. Please don't take it too hard.

@On-JungWoan
Copy link
Contributor Author

Can I write it like this? Actually, I don't know exactly what happens when -cotof and -oiqt are specified at the same time.

            '\n<Using -cotof and -oiqt at the same time> \n' +
            'To use -cotof and -oiqt simultaneously, \n' + 
            'you need to enter the Input name of OP, path of data file, mean, and std all together. \n' +
            'And the data file must be in Float32 format. \n'

@PINTO0309
Copy link
Owner

Your judgment is correct.

Ideally, when -cotof and -oiqt are specified at the same time, it would be better to print an error if the four parameters, i.e., MEAN and STD, are not specified. 👍

@On-JungWoan
Copy link
Contributor Author

Thank you for your response. I will try to write the rest tomorrow when I wake up. Have a good night.

@On-JungWoan
Copy link
Contributor Author

I just pushed it. Sorry for the delay.

@PINTO0309
Copy link
Owner

PINTO0309 commented Apr 9, 2023

Thanks. Probably not a problem, I'll merge and release when CI is all green. If users report problems after release, you can fix them each time. 👍

Perhaps there is a slight possibility that I will make a few tweaks in a few commits before release. For example, the README.

@PINTO0309 PINTO0309 merged commit 67b73ed into PINTO0309:main Apr 9, 2023
@PINTO0309 PINTO0309 mentioned this pull request Apr 9, 2023
@On-JungWoan
Copy link
Contributor Author

Thank you for merging it:)

@On-JungWoan On-JungWoan deleted the custom_input branch April 9, 2023 14:20
@PINTO0309
Copy link
Owner

If you notice any other problems or improvements, please feel free to submit a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Specification Discussion feature request feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants