
Hi, will export to QLinear save weights in int8? #81

Closed
lucasjinreal opened this issue Apr 18, 2022 · 10 comments
Labels
bug Something isn't working

Comments

@lucasjinreal

Using the TensorRT backend, will QLinear make the ONNX model smaller?
I got an error when trying to save to QLinear:

deploy/common.py", line 138, in optimize_model
    assert node_detect, "Graph is illegel, error occured!"
AssertionError: Graph is illegel, error occured!

@Tracin
Contributor

Tracin commented Apr 19, 2022

The ONNX_QNN backend will save model weights in INT8, so it makes the model smaller.

Do you have a model file or anything else to reproduce the bug?
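
For reference, a minimal sketch (my own snippet, not part of MQBench; it assumes the exported file is named r18.onnx) to check whether the exported weights are actually stored as INT8 initializers:

import onnx
from onnx import TensorProto

# Load the exported model and print every initializer's element type.
model = onnx.load("r18.onnx")  # hypothetical file name, adjust to your export
for init in model.graph.initializer:
    dtype = TensorProto.DataType.Name(init.data_type)
    print(f"{init.name}: {dtype}, shape={list(init.dims)}")
# For an ONNX_QNN / QLinear export the conv and fc weights should show up as
# INT8 tensors; if they are still FLOAT, the export kept fake-quant ops only.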

@Tracin Tracin added the bug Something isn't working label Apr 19, 2022
@lucasjinreal
Author

lucasjinreal commented Apr 19, 2022

@Tracin Hi, I am trying to quantize a simple resnet18 model using ONNX_QNN. It seems to trace and insert FakeQuantize ops successfully, but it ultimately cannot get through the ONNX_QNN pass:

deploy/deploy_onnx_qnn.py", line 262, in format_qlinear_dtype_pass
    scale_proto = self.onnx_model.initializer[scale][0]
KeyError: '::FixedPerTensorAffine_400'

What could be the reason?

More of the log:

WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerChannelAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerChannelAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
04.19 14:34:57 INFO convert_d...py]: <function deploy_qparams_tvm at 0x7fac38196790> BackendType.ONNX_QNN
04.19 14:34:57 INFO convert_d...py]: Convert to ONNX QNN.

BTW, can a model saved with ONNX_QNN be run via ORT?

@Tracin
Contributor

Tracin commented Apr 19, 2022

That is definitely not right. Do you have any simple code to reproduce the error?
The exported ONNX model uses the standard ONNX opset 11, so it should be possible to run it via ORT.
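
If it helps, a rough sketch of running such an exported model through onnxruntime (assuming a file named r18.onnx and a single 1x3x224x224 float input; adjust both to your export):

import numpy as np
import onnxruntime as ort

# Hypothetical file name; use whatever the deploy step actually wrote out.
sess = ort.InferenceSession("r18.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})
print(outputs[0].shape)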

@lucasjinreal
Author

@Tracin I am not sure whether my code can run on your side, since I modified some code in MQBench to support torch 1.11 (decent PyTorch support is the main reason I am trying MQBench).

Here is the config and code I ran:

extra_prepare_dict:
    extra_qconfig_dict:
        w_observer: MinMaxObserver
        a_observer: EMAMinMaxObserver
        w_fakequantize: FixedFakeQuantize
        a_fakequantize: FixedFakeQuantize
        w_qscheme:
            bit: 8
            # symmetry: False
            symmetry: true
            per_channel: True
            pot_scale: False
        a_qscheme:
            bit: 8
            # symmetry: False
            symmetry: true
            per_channel: False
            pot_scale: False
quantize:
    quantize_type: naive_ptq # support naive_ptq or advanced_ptq
    cali_batchsize: 16
    # backend: 'Tensorrt'
    backend: 'ONNX_QNN'
    # backend: 'PPLW8A16'
    deploy:
        model_name: 'r18.onnx'
        output_path: './'
        deploy_to_qlinear: true
model:                    # architecture details
    type: resnet18        # model name
    kwargs:
        num_classes: 1000
    path: /path-of-pretrained
data:
    path: /path-of-imagenet
    batch_size: 64
    num_workers: 4
    pin_memory: True
    input_size: 224
    test_resize: 256
process:
    seed: 1005

Code:

if __name__ == '__main__':
    train_loader, test_loader = prepare_dataloader()

    config_f = sys.argv[1]
    config = parse_config(config_f)
    print(config)
    # first finetune the model on CIFAR; we don't have ImageNet, so use CIFAR as the test set
    model = resnet18(pretrained=True)
    model.fc = nn.Linear(512, 10)
    if os.path.exists("r18_raw.pth"):
        model.load_state_dict(torch.load("r18_raw.pth", map_location="cpu"))
    else:
        # train_model(model, train_loader, test_loader, device)
        print("train finished.")
        # torch.save(model.state_dict(), "r18_raw.pth")
    model.to(device)
    model.eval()

    if hasattr(config, 'quantize'):
        model = get_quantize_model(model, config)
        print('now model in quantized mode.')
    
    model.to(device)
    evaluate_model(model, test_loader)

    # evaluate
    if not hasattr(config, 'quantize'):
        evaluate_model(model, test_loader)
    elif config.quantize.quantize_type == 'advanced_ptq':
        print('begin calibration now!')
        cali_data = load_calibrate_data(test_loader, cali_batchsize=config.quantize.cali_batchsize)
        from mqbench.utils.state import enable_quantization, enable_calibration_woquantization
        # do activation and weight calibration separately for quick MSE per-channel for the weight one
        model.eval()
        enable_calibration_woquantization(model, quantizer_type='act_fake_quant')
        for batch in cali_data:
            model(batch.cuda())
        enable_calibration_woquantization(model, quantizer_type='weight_fake_quant')
        model(cali_data[0].cuda())
        print('begin advanced PTQ now!')
        if hasattr(config.quantize, 'reconstruction'):
            model = ptq_reconstruction(
                model, cali_data, config.quantize.reconstruction)
        enable_quantization(model)
        evaluate_model(model, test_loader)
        if hasattr(config.quantize, 'deploy'):
            deploy(model, config)
    elif config.quantize.quantize_type == 'naive_ptq':
        print('begin calibration now!')
        cali_data = load_calibrate_data(test_loader, cali_batchsize=config.quantize.cali_batchsize)
        # do activation and weight calibration separately for quick MSE per-channel for the weight one
        model.eval()
        enable_calibration_woquantization(model, quantizer_type='act_fake_quant')
        for batch in cali_data:
            model(batch.to(device))
        enable_calibration_woquantization(model, quantizer_type='weight_fake_quant')
        model(cali_data[0].to(device))
        print('begin quantization now!')
        enable_quantization(model)
        # print(model)
        evaluate_model(model, test_loader)
        if hasattr(config.quantize, 'deploy'):
            deploy(model, config)
    else:
        print("The quantize_type must in 'naive_ptq' or 'advanced_ptq',")
        print("and 'advanced_ptq' need reconstruction configration.")

The traced model looks normal in the saved ONNX, with the fake-quant ops in place:

(screenshot: exported ONNX graph with FakeQuantize ops inserted)

But the problem is that the QNN pass cannot get through.

And I found that this commented code raises an assertion:

(screenshot: the commented code that raises the assertion)

Any solution to fix this?

@Tracin
Contributor

Tracin commented Apr 19, 2022

ResNet18 can pass the test in https://github.com/ModelTC/MQBench/blob/main/test/backend/test_backend.py#L120-L131
Try changing your model definition to the one used in the test case?
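
For example, something along these lines should run just that backend test in your own environment (the -k filter is a guess; use whatever matches the resnet18 case in test_backend.py):

import pytest

# Run only the backend export tests from the repo root; the keyword filter
# below is hypothetical, adjust it to the actual test name.
pytest.main(["test/backend/test_backend.py", "-k", "resnet18", "-v"])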

@lucasjinreal
Author

@Tracin May I ask when PyTorch 1.11 will be supported? I do believe your test can run; as I mentioned before, I am running on PyTorch 1.11 as part of an upgrade of our product.

However, I found the problem is caused by the initializers. The initializers still have the old names:

def format_qlinear_dtype_pass(self):
        print(type(self.onnx_model.initializer))
        print(self.onnx_model.initializer.keys())
        for node in self.onnx_model.graph.node:
            if node.op_type in FAKE_QUANTIZE_OP:
                print(node)
                scale, zero_point, qmin, qmax = node.input[1], node.input[2], node.input[3], node.input[4]
                logger.debug(f'scale, z, qmin, qmax: {scale}, {zero_point}, {qmin} {qmax}')
                qmin = self.onnx_model.get_constant(qmin)
                qmax = self.onnx_model.get_constant(qmax)
                assert qmax - qmin == 2 ** 8 - 1, "Only 8 bit quantization support deploy to QNN."
                scale_proto = self.onnx_model.initializer[scale][0]
                if scale_proto.raw_data != b'' and scale_proto.dims[0] == 1:
                    scale_data = self.onnx_model.get_initializer(scale)
                    self.onnx_model.set_initializer(scale, scale_data.astype(np.float32), raw=False)
                zero_point_proto = self.onnx_model.initializer[zero_point][0]
                zero_point_data = self.onnx_model.get_initializer(zero_point)
                # Align sym and asym scheme.
                zero_point_data = (zero_point_data - qmin).reshape((1,))
                self.onnx_model.set_initializer(zero_point, zero_point_data.astype(np.uint8), raw=False)

The initializers still have the old names:

dict_keys(['conv1.weight', 'conv1.bias', 'layer1.0.conv1.weight', 'layer1.0.conv1.bias', 'layer1.0.conv2.weight', 'layer1.0.conv2.bias', 'layer1.1.conv1.weight', 'layer1.1.conv1.bias', 'layer1.1.conv2.weight', 'layer1.1.conv2.bias', 'layer2.0.conv1.weight', 'layer2.0.conv1.bias', 'layer2.0.conv2.weight', 'layer2.0.conv2.bias', 'layer2.0.downsample.0.weight', 'layer2.0.downsample.0.bias', 'layer2.1.conv1.weight', 'layer2.1.conv1.bias', 'layer2.1.conv2.weight', 'layer2.1.conv2.bias', 'layer3.0.conv1.weight', 'layer3.0.conv1.bias', 'layer3.0.conv2.weight', 'layer3.0.conv2.bias', 'layer3.0.downsample.0.weight', 'layer3.0.downsample.0.bias', 'layer3.1.conv1.weight', 'layer3.1.conv1.bias', 'layer3.1.conv2.weight', 'layer3.1.conv2.bias', 'layer4.0.conv1.weight', 'layer4.0.conv1.bias', 'layer4.0.conv2.weight', 'layer4.0.conv2.bias', 'layer4.0.downsample.0.weight', 'layer4.0.downsample.0.bias', 'layer4.1.conv1.weight', 'layer4.1.conv1.bias', 'layer4.1.conv2.weight', 'layer4.1.conv2.bias', 'fc.weight', 'fc.bias'])

What might be missing here?
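
In case it helps narrow this down, here is a small debug sketch (my own snippet, not MQBench code; it assumes the intermediate fake-quant ONNX file is available as r18.onnx) to check whether the scale/zero_point inputs of the FixedPerTensorAffine / FixedPerChannelAffine nodes are graph initializers or outputs of Constant nodes:

import onnx

model = onnx.load("r18.onnx")  # hypothetical path to the fake-quant export
init_names = {init.name for init in model.graph.initializer}
const_outputs = {out for node in model.graph.node
                 if node.op_type == "Constant" for out in node.output}

for node in model.graph.node:
    if "FixedPerTensorAffine" in node.op_type or "FixedPerChannelAffine" in node.op_type:
        scale, zero_point = node.input[1], node.input[2]
        for name in (scale, zero_point):
            if name in init_names:
                where = "initializer"
            elif name in const_outputs:
                where = "Constant output"
            else:
                where = "other node output"
            print(f"{node.op_type}: {name} -> {where}")
# If scale/zero_point come from Constant nodes rather than initializers,
# format_qlinear_dtype_pass would hit exactly this kind of KeyError.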

@Tracin
Contributor

Tracin commented Apr 19, 2022

It is difficult to reproduce your code. To keep things simple, can the specific test case pass under your Python env?
If it does not, I can reproduce the bug easily; otherwise, the problem might be in other parts of your code.

@lucasjinreal
Author

@Tracin You can test on PyTorch 1.11. A lot of things changed, including ONNX export, FX, etc.

I think I know where the problem is.

@Tracin
Contributor

Tracin commented Apr 19, 2022

@Tracin You can test on PyTorch 1.11. A lot of things changed, including ONNX export, FX, etc.

I think I know where the problem is.

It will take some time to test.
Is the problem about the version or not?

@lucasjinreal
Author

@Tracin Yes. torch 1.11 removed a lot of APIs used in MQBench.
