
Hi, will export to QLinear save weights in int8? #81

Closed
lucasjinreal opened this issue Apr 18, 2022 · 10 comments
Labels
bug Something isn't working

Comments

@lucasjinreal

Using the TensorRT backend, will QLinear make the ONNX model smaller?
I got an error when trying to save to QLinear:

deploy/common.py", line 138, in optimize_model
    assert node_detect, "Graph is illegel, error occured!"
AssertionError: Graph is illegel, error occured!

@Tracin
Contributor

Tracin commented Apr 19, 2022

The ONNX_QNN backend will save model weights in INT8, so it makes the model smaller.

Do you have a model file or anything else to reproduce the bug?
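
For reference, a minimal sketch (my own snippet, not part of MQBench; it assumes the exported file is named r18.onnx) to check whether the exported weights are actually stored as INT8 initializers:

import onnx
from onnx import TensorProto

# Load the exported model and print every initializer's element type.
model = onnx.load("r18.onnx")  # hypothetical file name, adjust to your export
for init in model.graph.initializer:
    dtype = TensorProto.DataType.Name(init.data_type)
    print(f"{init.name}: {dtype}, shape={list(init.dims)}")
# For an ONNX_QNN / QLinear export the conv and fc weights should show up as
# INT8 tensors; if they are still FLOAT, the export kept fake-quant ops only.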

@Tracin Tracin added the bug Something isn't working label Apr 19, 2022
@lucasjinreal
Author

lucasjinreal commented Apr 19, 2022

@Tracin Hi, I am trying to quantize a simple resnet18 model using ONNX_QNN. It seems to trace and insert FakeQuantize ops successfully, but it ultimately cannot get through the ONNX_QNN pass:

deploy/deploy_onnx_qnn.py", line 262, in format_qlinear_dtype_pass
    scale_proto = self.onnx_model.initializer[scale][0]
KeyError: '::FixedPerTensorAffine_400'

What could be the reason?

More of the log:

WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerChannelAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerChannelAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of ::FixedPerTensorAffine type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
04.19 14:34:57 INFO convert_d...py]: <function deploy_qparams_tvm at 0x7fac38196790> BackendType.ONNX_QNN
04.19 14:34:57 INFO convert_d...py]: Convert to ONNX QNN.

BTW, can a model saved with ONNX_QNN be run via ORT?

@Tracin
Contributor

Tracin commented Apr 19, 2022

That is definitely not right. Do you have any simple code to reproduce the error?
The exported ONNX model uses the standard ONNX opset 11, so it should be possible to run it via ORT.
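
If it helps, a rough sketch of running such an exported model through onnxruntime (assuming a file named r18.onnx and a single 1x3x224x224 float input; adjust both to your export):

import numpy as np
import onnxruntime as ort

# Hypothetical file name; use whatever the deploy step actually wrote out.
sess = ort.InferenceSession("r18.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})
print(outputs[0].shape)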

@lucasjinreal
Author

@Tracin I am not sure whether my code can run on your side, since I modified some code in MQBench to support torch 1.11 (decent PyTorch support is the main reason I am trying MQBench).

Here is the config and code I ran:

extra_prepare_dict:
    extra_qconfig_dict:
        w_observer: MinMaxObserver
        a_observer: EMAMinMaxObserver
        w_fakequantize: FixedFakeQuantize
        a_fakequantize: FixedFakeQuantize
        w_qscheme:
            bit: 8
            # symmetry: False
            symmetry: true
            per_channel: True
            pot_scale: False
        a_qscheme:
            bit: 8
            # symmetry: False
            symmetry: true
            per_channel: False
            pot_scale: False
quantize:
    quantize_type: naive_ptq # support naive_ptq or advanced_ptq
    cali_batchsize: 16
    # backend: 'Tensorrt'
    backend: 'ONNX_QNN'
    # backend: 'PPLW8A16'
    deploy:
        model_name: 'r18.onnx'
        output_path: './'
        deploy_to_qlinear: true
model:                    # architecture details
    type: resnet18        # model name
    kwargs:
        num_classes: 1000
    path: /path-of-pretrained
data:
    path: /path-of-imagenet
    batch_size: 64
    num_workers: 4
    pin_memory: True
    input_size: 224
    test_resize: 256
process:
    seed: 1005

Code:

if __name__ == '__main__':
    train_loader, test_loader = prepare_dataloader()

    config_f = sys.argv[1]
    config = parse_config(config_f)
    print(config)
    # first finetune the model on CIFAR; we don't have ImageNet, so use CIFAR as the test set
    model = resnet18(pretrained=True)
    model.fc = nn.Linear(512, 10)
    if os.path.exists("r18_raw.pth"):
        model.load_state_dict(torch.load("r18_raw.pth", map_location="cpu"))
    else:
        # train_model(model, train_loader, test_loader, device)
        print("train finished.")
        # torch.save(model.state_dict(), "r18_raw.pth")
    model.to(device)
    model.eval()

    if hasattr(config, 'quantize'):
        model = get_quantize_model(model, config)
        print('now model in quantized mode.')
    
    model.to(device)
    evaluate_model(model, test_loader)

    # evaluate
    if not hasattr(config, 'quantize'):
        evaluate_model(model, test_loader)
    elif config.quantize.quantize_type == 'advanced_ptq':
        print('begin calibration now!')
        cali_data = load_calibrate_data(test_loader, cali_batchsize=config.quantize.cali_batchsize)
        from mqbench.utils.state import enable_quantization, enable_calibration_woquantization
        # do activation and weight calibration separately for quick MSE per-channel for the weight one
        model.eval()
        enable_calibration_woquantization(model, quantizer_type='act_fake_quant')
        for batch in cali_data:
            model(batch.cuda())
        enable_calibration_woquantization(model, quantizer_type='weight_fake_quant')
        model(cali_data[0].cuda())
        print('begin advanced PTQ now!')
        if hasattr(config.quantize, 'reconstruction'):
            model = ptq_reconstruction(
                model, cali_data, config.quantize.reconstruction)
        enable_quantization(model)
        evaluate_model(model, test_loader)
        if hasattr(config.quantize, 'deploy'):
            deploy(model, config)
    elif config.quantize.quantize_type == 'naive_ptq':
        print('begin calibration now!')
        cali_data = load_calibrate_data(test_loader, cali_batchsize=config.quantize.cali_batchsize)
        # do activation and weight calibration separately for quick MSE per-channel for the weight one
        model.eval()
        enable_calibration_woquantization(model, quantizer_type='act_fake_quant')
        for batch in cali_data:
            model(batch.to(device))
        enable_calibration_woquantization(model, quantizer_type='weight_fake_quant')
        model(cali_data[0].to(device))
        print('begin quantization now!')
        enable_quantization(model)
        # print(model)
        evaluate_model(model, test_loader)
        if hasattr(config.quantize, 'deploy'):
            deploy(model, config)
    else:
        print("The quantize_type must in 'naive_ptq' or 'advanced_ptq',")
        print("and 'advanced_ptq' need reconstruction configration.")

The traced model looks normal in the saved ONNX, with the fake-quant ops in place:

(screenshot: exported ONNX graph with FakeQuantize ops inserted)

But the problem is that the QNN pass cannot get through.

And I found that this commented code raises an assertion:

(screenshot: the commented code that raises the assertion)

Any solution to fix this?

@Tracin
Contributor

Tracin commented Apr 19, 2022

ResNet18 can pass the test in https://github.com/ModelTC/MQBench/blob/main/test/backend/test_backend.py#L120-L131
Try changing your model definition to the one used in the test case?
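
For example, something along these lines should run just that backend test in your own environment (the -k filter is a guess; use whatever matches the resnet18 case in test_backend.py):

import pytest

# Run only the backend export tests from the repo root; the keyword filter
# below is hypothetical, adjust it to the actual test name.
pytest.main(["test/backend/test_backend.py", "-k", "resnet18", "-v"])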

@lucasjinreal
Author

@Tracin May I ask when PyTorch 1.11 will be supported? I do believe your test can run; as I mentioned before, I am running on PyTorch 1.11 as part of an upgrade of our product.

However, I found the problem is caused by the initializers. The initializers still have the old names:

def format_qlinear_dtype_pass(self):
        print(type(self.onnx_model.initializer))
        print(self.onnx_model.initializer.keys())
        for node in self.onnx_model.graph.node:
            if node.op_type in FAKE_QUANTIZE_OP:
                print(node)
                scale, zero_point, qmin, qmax = node.input[1], node.input[2], node.input[3], node.input[4]
                logger.debug(f'scale, z, qmin, qmax: {scale}, {zero_point}, {qmin} {qmax}')
                qmin = self.onnx_model.get_constant(qmin)
                qmax = self.onnx_model.get_constant(qmax)
                assert qmax - qmin == 2 ** 8 - 1, "Only 8 bit quantization support deploy to QNN."
                scale_proto = self.onnx_model.initializer[scale][0]
                if scale_proto.raw_data != b'' and scale_proto.dims[0] == 1:
                    scale_data = self.onnx_model.get_initializer(scale)
                    self.onnx_model.set_initializer(scale, scale_data.astype(np.float32), raw=False)
                zero_point_proto = self.onnx_model.initializer[zero_point][0]
                zero_point_data = self.onnx_model.get_initializer(zero_point)
                # Align sym and asym scheme.
                zero_point_data = (zero_point_data - qmin).reshape((1,))
                self.onnx_model.set_initializer(zero_point, zero_point_data.astype(np.uint8), raw=False)

The initializers still have the old names:

dict_keys(['conv1.weight', 'conv1.bias', 'layer1.0.conv1.weight', 'layer1.0.conv1.bias', 'layer1.0.conv2.weight', 'layer1.0.conv2.bias', 'layer1.1.conv1.weight', 'layer1.1.conv1.bias', 'layer1.1.conv2.weight', 'layer1.1.conv2.bias', 'layer2.0.conv1.weight', 'layer2.0.conv1.bias', 'layer2.0.conv2.weight', 'layer2.0.conv2.bias', 'layer2.0.downsample.0.weight', 'layer2.0.downsample.0.bias', 'layer2.1.conv1.weight', 'layer2.1.conv1.bias', 'layer2.1.conv2.weight', 'layer2.1.conv2.bias', 'layer3.0.conv1.weight', 'layer3.0.conv1.bias', 'layer3.0.conv2.weight', 'layer3.0.conv2.bias', 'layer3.0.downsample.0.weight', 'layer3.0.downsample.0.bias', 'layer3.1.conv1.weight', 'layer3.1.conv1.bias', 'layer3.1.conv2.weight', 'layer3.1.conv2.bias', 'layer4.0.conv1.weight', 'layer4.0.conv1.bias', 'layer4.0.conv2.weight', 'layer4.0.conv2.bias', 'layer4.0.downsample.0.weight', 'layer4.0.downsample.0.bias', 'layer4.1.conv1.weight', 'layer4.1.conv1.bias', 'layer4.1.conv2.weight', 'layer4.1.conv2.bias', 'fc.weight', 'fc.bias'])

What might be missing here?
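
In case it helps narrow this down, here is a small debug sketch (my own snippet, not MQBench code; it assumes the intermediate fake-quant ONNX file is available as r18.onnx) to check whether the scale/zero_point inputs of the FixedPerTensorAffine / FixedPerChannelAffine nodes are graph initializers or outputs of Constant nodes:

import onnx

model = onnx.load("r18.onnx")  # hypothetical path to the fake-quant export
init_names = {init.name for init in model.graph.initializer}
const_outputs = {out for node in model.graph.node
                 if node.op_type == "Constant" for out in node.output}

for node in model.graph.node:
    if "FixedPerTensorAffine" in node.op_type or "FixedPerChannelAffine" in node.op_type:
        scale, zero_point = node.input[1], node.input[2]
        for name in (scale, zero_point):
            if name in init_names:
                where = "initializer"
            elif name in const_outputs:
                where = "Constant output"
            else:
                where = "other node output"
            print(f"{node.op_type}: {name} -> {where}")
# If scale/zero_point come from Constant nodes rather than initializers,
# format_qlinear_dtype_pass would hit exactly this kind of KeyError.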

@Tracin
Contributor

Tracin commented Apr 19, 2022

It is difficult to reproduce your code. To keep things simple, can the specific test case pass under your Python env?
If it does not, I can reproduce the bug easily; otherwise, the problem might be in other parts of your code.

@lucasjinreal
Author

@Tracin You can test on PyTorch 1.11. A lot of things changed, including ONNX export, FX, etc.

I think I know where the problem is.

@Tracin
Contributor

Tracin commented Apr 19, 2022

@Tracin You can test on PyTorch 1.11. A lot of things changed, including ONNX export, FX, etc.

I think I know where the problem is.

It will take some time to test.
Is the problem about the version or not?

@lucasjinreal
Author

@Tracin Yes. torch 1.11 removed a lot of APIs used in MQBench.
