
MQBench results are not bit-exact with SNPE DSP results #109

Closed

changewOw opened this issue Jun 8, 2022 · 7 comments

Labels: good first issue (Good for newcomers), Stale

Comments

changewOw commented Jun 8, 2022

MQBench is a very interesting project.

Environment
pytorch: 1.8.1
MQBench: branch main, e217520
SNPE: snpe-1.61.0.3358

Problem:
I ran a simple test with a model that has only two conv layers, comparing the result of the MQBench-quantized model with the SNPE DSP result, and found they are not bit-exact. Is this expected, or have I done something wrong?

Reproduction

  • MQBench quantization
import os
import random

import numpy as np
import torch
import torch.nn as nn

from mqbench.prepare_by_platform import prepare_by_platform, BackendType
from mqbench.utils.state import enable_calibration, enable_quantization
from mqbench.convert_deploy import convert_deploy


def seed_torchv2(seed: int = 42) -> None:
    # seed every RNG in play so the run is reproducible
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv2d(3, 128, 1, 1, bias=True)
        self.conv2 = nn.Conv2d(128, 20, 1, 1, bias=True)
        self.relu = nn.ReLU()
        self.flat = nn.Flatten(1)

    def forward(self, x):  # input: (1, 3, 20, 20)
        x = self.avg_pool(x)
        x = self.conv(x)
        x = self.conv2(x)
        x = self.flat(x)
        return x

    
SIZE = 20
backend = BackendType.SNPE

np.set_printoptions(suppress=True, precision=6)
torch.set_printoptions(6)
seed_torchv2(42)


def gen_input_data(length=100):
    data = []
    for _ in range(length):
        data.append(np.ones((1,3,SIZE,SIZE), dtype=np.float32) * 0.1 * np.random.randint(0, 10))
    return np.stack(data, axis=0)


model = Net()          # the simple two-conv test model defined above
model.eval()

train_data = gen_input_data(100)
dummy_input = np.zeros((1,3,SIZE,SIZE), dtype=np.float32) + 0.5


print("pytorch fp32 result")
print(model(torch.from_numpy(dummy_input.copy())).float())


# quant
model = prepare_by_platform(model, backend)

enable_calibration(model)

for i, d in enumerate(train_data):
    _ = model(torch.from_numpy(d).float())

enable_quantization(model)


print("quant sim result")
print(model(torch.from_numpy(dummy_input.copy())).float())


input_shape = {"image":[1,3,SIZE,SIZE]}
convert_deploy(model, backend, input_shape)

# save dummy input and test it on DSP
image = dummy_input.copy()
assert image.shape == (1,3,SIZE,SIZE)
assert image.dtype == np.float32
image.tofile("./tmp.raw")
print("#" * 50)
pytorch fp32 result
tensor([[-0.347889, -0.289117, -0.083191, -0.222827,  0.124699,  0.235278,
          0.434433, -0.302174, -0.047763,  0.229472, -0.037784,  0.082496,
         -0.150852, -0.170281,  0.130777,  0.146441, -0.494992, -0.182881,
          0.600709, -0.063706]], grad_fn=<ViewBackward>)

quant sim result
tensor([[-0.344930, -0.290467, -0.081694, -0.222389,  0.131618,  0.231466,
          0.435701, -0.299544, -0.049924,  0.226927, -0.036308,  0.081694,
         -0.149772, -0.172465,  0.131618,  0.149772, -0.494702, -0.181542,
          0.599088, -0.063540]], grad_fn=<ViewBackward>)
  • DLC conversion
    ./snpe-onnx-to-dlc --input_network mqbench_qmodel_deploy_model.onnx --output_path tmp.dlc --quantization_overrides mqbench_qmodel_clip_ranges.json
    ./snpe-dlc-quantize --input_dlc tmp.dlc --input_list tmp_file.txt --output_dlc tmp_quat_mq.dlc --override_params --bias_bitwidth 32
    tmp_file.txt and tmp_file_android.txt each contain a single entry, tmp.raw; tmp.raw is the 3x20x20 float file saved by the Python script above (see the sketch after this list)

  • SNPE DSP run
    ./snpe-net-run --container /sdcard/tmp_quat_mq.dlc --input_list /sdcard/tmp_file_android.txt --use_dsp
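
For reference, a plausible layout for the two input lists (the device-side path is an assumption, based on where the dlc is pushed above): tmp_file.txt, used on the host by snpe-dlc-quantize, would contain the single line

    tmp.raw

and tmp_file_android.txt, used on the device by snpe-net-run, the single line

    /sdcard/tmp.raw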

##################################################
74.raw
(20,)
[-0.34493 -0.285929 -0.081694 -0.222389 0.127079 0.236005 0.435701
-0.299544 -0.049924 0.226927 -0.036308 0.081694 -0.149772 -0.172465
0.131618 0.149772 -0.490163 -0.177003 0.599088 -0.068078]

Comparing the quant sim result with the DSP result, you can see the two disagree at several positions (the 2nd, 5th, 6th, 17th, 18th, and 20th values).

Tracin added the good first issue (Good for newcomers) label on Jun 8, 2022
Tracin (Contributor) commented Jun 8, 2022

Your pipeline is fine. In practice it is almost impossible to bit-align backend hardware from within PyTorch; there are too many unknown details of the backend's arithmetic.
MQBench aims to match the backend computation as closely as possible in terms of quantization scheme and quantization node placement.
We usually measure the discrepancy between the two with cosine similarity; 0.99+ can be taken as a guarantee that accuracy survives deployment.
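
A minimal sketch of that cosine check, assuming the DSP output is read back from the raw file written by snpe-net-run (the output path below is hypothetical; it depends on your run directory):

import numpy as np

def cosine(a, b):
    # flatten and compute in float64 to avoid precision noise in the metric
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# quantsim output from the prepared model, DSP output pulled back from the device
quantsim = model(torch.from_numpy(dummy_input.copy())).detach().numpy()
dsp = np.fromfile("output/Result_0/74.raw", dtype=np.float32)  # hypothetical path
print(cosine(quantsim, dsp))  # 0.99+ is considered acceptable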

changewOw (Author)

Thanks!

changewOw (Author)

@Tracin I ran a further test with mobilenetv3-small (num_classes=2). For some samples the quantsim result is [-0.6173174 -0.05346843]
while the DSP result is [-0.792305 -0.490937]; their cosine similarity is 0.8923229.

Could you suggest what can be improved to bring the quantsim and DSP results closer together? Or is mobilenetv3 simply not well suited to quantization?
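
For reference, the reported cosine can be reproduced directly from those two vectors:

import numpy as np

a = np.array([-0.6173174, -0.05346843])  # quantsim output
b = np.array([-0.792305, -0.490937])     # DSP output
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # ≈ 0.8923229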

Tracin reopened this on Jun 9, 2022
Tracin (Contributor) commented Jun 9, 2022

You can start by checking whether the quantization parameters are correct:

  • Inspect the ONNX model containing the quantization nodes and verify that the nodes are inserted in the right places (a sketch of this check follows the list)
  • snpe-dlc-quantize prints the overridden tensor clip values; check whether anything is missing from snpe_encodings
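
A minimal sketch of the first check, assuming the exported ONNX uses standard quantization ops (the op-name filter below is an assumption; adjust it to whatever ops the MQBench export actually emits):

import onnx

onnx_model = onnx.load("mqbench_qmodel_deploy_model.onnx")
for node in onnx_model.graph.node:
    # "Quant" matches QuantizeLinear/DequantizeLinear and similar ops
    if "Quant" in node.op_type:
        print(node.op_type, list(node.input), "->", list(node.output))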

changewOw (Author) commented Jun 9, 2022

My full model is a UNet-style network with three outputs:
encoder: mobilenetv3-small
decoder: upNearest2d -> upNearest2d -> upNearest2d

    def forward(self, x):
        feature_8x, feature_16x, feature_32x = self.model(x)

        logits_cls = self.head_cls(feature_32x)

        accu_radius, heatmaps_uv = self.decode_model(feature_8x, feature_16x, feature_32x)

        return logits_cls, accu_radius, heatmaps_uv

1. I checked the ONNX model with the quantization nodes; it looks fine.
2. This part looks a bit off:

        "1572": [
            {
                "bitwidth": 8,
                "min": -17.530067443847656,
                "max": 12.27104663848877
            }
        ],
        "1583": [
            {
                "bitwidth": 8,
                "min": -0.3967282474040985,
                "max": 12.248984336853027
            }
        ],
        "1591": [
            {
                "bitwidth": 8,
                "min": 0.0,
                "max": 5.7605085372924805
            }
        ],

The output of snpe-dlc-quantize is:

[INFO] InitializeStderr: DebugLog initialized.
[INFO] Writing intermediate model
[INFO] Setting activation for layer: image and buffer: image
[INFO] bw: 8, min: 0.000000, max: 1.000000, delta: 0.003922, offset: 0.000000
[INFO] Setting activation for layer: Conv_6 and buffer: 1572
[INFO] bw: 8, min: -17.530067, max: 12.271047, delta: 0.116867, offset: -150.000000
[INFO] Setting activation for layer: Add_11_Hswish and buffer: 1583
[INFO] bw: 8, min: -0.359582, max: 17.979096, delta: 0.071916, offset: -5.000000
[INFO] Setting activation for layer: Conv_24 and buffer: 1590
[INFO] bw: 8, min: -18.765767, max: 4.577016, delta: 0.091540, offset: -205.000000
[INFO] Setting activation for layer: Relu_25 and buffer: 1591
[INFO] bw: 8, min: 0.000000, max: 5.760509, delta: 0.022590, offset: 0.000000
[INFO] Setting activation for layer: GlobalAveragePool_29 and buffer: 1595
[INFO] bw: 8, min: 0.000000, max: 2.837885, delta: 0.011129, offset: 0.000000

I noticed a few issues:
a) For buffer 1583, the json gives -0.39 and 12.24, while the dlc reports -0.35 and 17.97.
b) For buffer 1583, the json provides no delta and offset; does the dlc compute these by itself?
c) Buffer 1590 does not appear in the json file at all.
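
A quick way to spot such gaps, assuming the clip-ranges json keys activation encodings by buffer name as in the excerpt above (the buffer set below is copied from the dlc log):

import json

with open("mqbench_qmodel_clip_ranges.json") as f:
    encodings = json.load(f)
# assumption: activation entries sit either at the top level or
# under an "activation_encodings" key, indexed by buffer name
acts = encodings.get("activation_encodings", encodings)
dlc_buffers = {"image", "1572", "1583", "1590", "1591", "1595"}  # from the log above
print("in dlc but missing from json:", sorted(dlc_buffers - set(acts)))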

I have put the onnx, the json, and the dlc log in the zip file below; I would appreciate it if you could take a look. Thanks!

detnet_center_unet.zip

Tracin (Contributor) commented Jun 10, 2022

Things to try:

  • Why was 1583 not overridden? From the log it looks like its range was computed from the input_list data instead
  • Why is 1590 missing from the json? That means MQBench did not insert a quantization node at that position
  • Are there any extra entries in the json?
  • num_classes is a bit small; consider increasing it and averaging the metric over more samples

github-actions bot commented Oct 9, 2022

This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!
