Added some preliminary unit tests to the CNNs 'quantize_model' #927

Open · wants to merge 7 commits into dev
Conversation

@OscarSavolainenDR (Author)

This PR is a work in progress, and I expect to add more tests.

As of commit 66e029b, we test some aspects of layerwise and FX quantization, as well as some invalid inputs, e.g. invalid strings and zero- and negative-valued bit widths.
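
For illustration, a minimal sketch of the kind of invalid-input test meant here (not the exact code in the PR): it assumes quantize_model raises on non-positive bit widths, reuses the minimal_model fixture from the test file, and mirrors the keyword arguments of the quantize_model call shown later in this thread.

    import pytest

    from brevitas_examples.imagenet_classification.ptq.ptq_common import quantize_model


    @pytest.mark.parametrize("weight_bit_width", [0, -1])
    def test_layerwise_invalid_weight_bit_width(minimal_model, weight_bit_width):
        # Zero and negative bit widths should be rejected by quantize_model.
        with pytest.raises(Exception):
            quantize_model(
                model=minimal_model,
                backend="layerwise",
                weight_bit_width=weight_bit_width,
                act_bit_width=8,
                bias_bit_width=32,
                weight_quant_granularity="per_tensor",
                act_quant_percentile=99.9,
                act_quant_type="sym",
                scale_factor_type="float_scale",
                quant_format="int")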

@OscarSavolainenDR OscarSavolainenDR changed the title Added some prelininary unit tests to the CNNs 'quantize_model' Added some preliminary unit tests to the CNNs 'quantize_model' Mar 30, 2024
tests/brevitas_examples/test_quantize_model.py (outdated review comments, resolved)
@Giuseppe5 (Collaborator)

#934

I missed this comment on the other PR. The bias bitwidth can be None, which means that the bias is not quantized. Leaving this here if it could be useful for some tests.

@OscarSavolainenDR (Author)

> #934
>
> I missed this comment on the other PR. The bias bitwidth can be None, which means that the bias is not quantized. Leaving this here if it could be useful for some tests.

I'll incorporate it!
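
As a hedged sketch of how that could be tested (the fixture name and keyword arguments are assumptions, mirroring the quantize_model call shown later in this thread):

    from brevitas_examples.imagenet_classification.ptq.ptq_common import quantize_model


    def test_layerwise_unquantized_bias(minimal_model):
        # bias_bit_width=None should be accepted and leave the bias unquantized.
        quant_model = quantize_model(
            model=minimal_model,
            backend="layerwise",
            weight_bit_width=8,
            act_bit_width=8,
            bias_bit_width=None,
            weight_quant_granularity="per_tensor",
            act_quant_percentile=99.9,
            act_quant_type="sym",
            scale_factor_type="float_scale",
            quant_format="int")
        assert quant_model is not None
        # A stronger assertion could inspect the layers' bias quant proxies, but the
        # exact attribute to check depends on Brevitas internals, so it is left out here.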

from brevitas.nn import QuantReLU
from brevitas.quant_tensor import QuantTensor
from brevitas_examples.imagenet_classification.ptq.ptq_common import quantize_model

@OscarSavolainenDR (Author)

Remaining ToDos, but we can add to it!

@Giuseppe5 (Collaborator)

Aside from the missing minifloat tests (see below), I think we can add the other two ToDos and then review/merge this.

Great work!

@OscarSavolainenDR (Author)

Sure, shall do!

)


def test_layerwise_valid_minifloat_bit_widths(minimal_model):
@OscarSavolainenDR (Author)

This PR is getting there, but I still need to work on this part. I am testing whether my explicit implementation of minifloat quantization matches the under-the-hood Brevitas one, but I could use some guidance on whether it is correctly implemented. I'll keep hacking at it either way!
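
For reference, this is roughly the kind of explicit implementation I mean: a minimal sketch that assumes an IEEE-like minifloat with no inf/NaN encodings, saturation at the largest representable value, and a default exponent bias of 2^(E-1) - 1. Whether these conventions match Brevitas' internals is exactly the open question.

    import torch


    def minifloat_quantize(x, exponent_bit_width, mantissa_bit_width, exponent_bias=None):
        if exponent_bias is None:
            exponent_bias = 2 ** (exponent_bit_width - 1) - 1
        # Largest representable magnitude: maximum exponent with a full mantissa.
        max_exponent = 2 ** exponent_bit_width - 1 - exponent_bias
        max_val = (2.0 - 2.0 ** (-mantissa_bit_width)) * 2.0 ** max_exponent
        x = torch.clamp(x, -max_val, max_val)
        # Per-element exponent, floored at the minimum (subnormal) exponent 1 - bias.
        eps = torch.finfo(x.dtype).tiny
        exponent = torch.floor(torch.log2(x.abs().clamp_min(eps)))
        exponent = torch.clamp(exponent, min=float(1 - exponent_bias))
        # Grid spacing at that exponent, then round-to-nearest onto the grid.
        scale = 2.0 ** (exponent - mantissa_bit_width)
        return torch.round(x / scale) * scale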

@Giuseppe5 (Collaborator)

If it can be helpful, there are two PRs (#922 and #919) where we are expanding support for minifloat to match the level of support we have for integer quantization. This means that a minifloat QuantTensor will have the correct metadata to properly characterize it, which I think could be helpful when writing the tests.

If you agree, I'm happy to leave the minifloat tests to another PR after those two have been merged, so as not to block this one.

@OscarSavolainenDR (Author)

That sounds good!

@@ -557,5 +557,8 @@ def check_positive_int(*args):
We check that every inputted value is positive, and an integer.
"""
for arg in args:
if not arg:
@OscarSavolainenDR (Author)

Redundant to PR #934; included here to make the tests pass.
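
For context, a hedged sketch of the kind of check being discussed; the version actually merged in #934 may differ, e.g. in how it treats a None bias bit width:

    def check_positive_int(*args):
        """
        Check that every inputted value is a positive integer.
        """
        for arg in args:
            assert isinstance(arg, int), f"Expected an int, got {type(arg)}"
            assert arg > 0, f"Expected a positive value, got {arg}"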

@Giuseppe5 (Collaborator)

That PR has been merged, so you can just rebase now.

@OscarSavolainenDR (Author) · Apr 23, 2024

Cool, have rebased!

@OscarSavolainenDR (Author)

I think all of the ToDos (for this PR) are done, subject to whatever changes are desired!

The two tests I added in the latest commit:

  • check that the percentiles used in stats calibration work as expected, i.e. that the quantization range ends up where we expect it to.
  • check that the MSE calibration method minimizes MSE (I perturbed the qparams slightly, and none of the perturbations registered a smaller MSE than the original ones); a sketch of this idea follows below.
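
A rough sketch of that second check, using a plain symmetric integer quantizer as a stand-in for the Brevitas one; calibrated_scale and data are hypothetical fixtures holding the MSE-calibrated scale and the calibration data, and the real test goes through the actual quantizer instead.

    import torch


    def quantization_mse(x, scale, bit_width=8):
        # Symmetric signed integer quantization followed by dequantization.
        q_min, q_max = -2 ** (bit_width - 1), 2 ** (bit_width - 1) - 1
        q = torch.clamp(torch.round(x / scale), q_min, q_max)
        return torch.mean((x - q * scale) ** 2)


    def test_mse_scale_is_local_minimum(calibrated_scale, data):
        base = quantization_mse(data, calibrated_scale)
        # Small perturbations of the calibrated scale should not reduce the MSE.
        for factor in (0.9, 0.95, 1.05, 1.1):
            assert quantization_mse(data, calibrated_scale * factor) >= base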

@OscarSavolainenDR (Author)

Some nox checks are failing: some Python/PyTorch versions throw an "Input scale required" error when I try to feed data through the FX-quantized model.

Some debugging:
If I use those versions (e.g. Python 3.8, PyTorch 1.9.1), it particularly fails at this line:

_0 = getattr(self, "0")(input_1);  input_1 = None  

inside the Graph Mode forward call.

getattr(self, "0") returns:

QuantConv2d(
  10, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
  (input_quant): ActQuantProxyFromInjector(
    (_zero_hw_sentinel): StatelessBuffer()
  )
  (output_quant): ActQuantProxyFromInjector(
    (_zero_hw_sentinel): StatelessBuffer()
  )
  (weight_quant): WeightQuantProxyFromInjector(
    (_zero_hw_sentinel): StatelessBuffer()
    (tensor_quant): RescalingIntQuant(
      (int_quant): IntQuant(
        (float_to_int_impl): RoundSte()
        (tensor_clamp_impl): TensorClampSte()
        (delay_wrapper): DelayWrapper(
          (delay_impl): _NoDelay()
        )
      )
      (scaling_impl): ParameterFromStatsFromParameterScaling(
        (parameter_list_stats): _ParameterListStats(
          (first_tracked_param): _ViewParameterWrapper(
            (view_shape_impl): OverTensorView()
          )
          (stats): _Stats(
            (stats_impl): AbsMax()
          )
        )
        (stats_scaling_impl): _StatsScaling(
          (affine_rescaling): Identity()
          (restrict_clamp_scaling): _RestrictClampValue(
            (clamp_min_ste): ScalarClampMinSte()
            (restrict_value_impl): FloatRestrictValue()
          )
          (restrict_scaling_pre): Identity()
        )
        (restrict_inplace_preprocess): Identity()
      )
      (int_scaling_impl): IntScaling()
      (zero_point_impl): ZeroZeroPoint(
        (zero_point): StatelessBuffer()
      )
      (msb_clamp_bit_width_impl): BitWidthConst(
        (bit_width): StatelessBuffer()
      )
    )
  )
  (bias_quant): BiasQuantProxyFromInjector(
    (_zero_hw_sentinel): StatelessBuffer()
    (tensor_quant): PrescaledRestrictIntQuant(
      (int_quant): IntQuant(
        (float_to_int_impl): RoundSte()
        (tensor_clamp_impl): TensorClamp()
        (delay_wrapper): DelayWrapper(
          (delay_impl): _NoDelay()
        )
      )
      (msb_clamp_bit_width_impl): BitWidthConst(
        (bit_width): StatelessBuffer()
      )
      (zero_point): StatelessBuffer()
    )
  )
)

I'm AFK for the next week, but will pick this up when I get back! I'll look into why the input scale is missing and possible solutions.

@OscarSavolainenDR (Author) · May 16, 2024

OK, I've narrowed down somewhat what the issue is with the Python/Torch versioning. I'll refer to the old setup that isn't working as 1.9.1, after Torch 1.9.1; it could also be a Python versioning issue, but I assume not.

In 1.9.1, we get an error when we try to feed data through the quantized model:

RuntimeError: Input scale required

where the quantized model is given by e.g.:

    quant_model = quantize_model(
        model=fx_model,
        backend="fx",
        weight_bit_width=weight_bit_width,
        act_bit_width=act_bit_width,
        bias_bit_width=bias_bit_width if bias_bit_width > 0 else None,
        weight_quant_granularity="per_tensor",
        act_quant_percentile=99.9,
        act_quant_type="sym",
        scale_factor_type="float_scale",
        quant_format="int",
        layerwise_first_last_bit_width=5,
    )

The issue is in compute_bias_scale for one of the first layers of the graph: in 1.9.1 it returns None for the scale because the input tensor is not of type QuantTensor.

Ultimately, this is because the model's graph is not being quantized correctly.

ipdb> quant_model.graph.print_tabular()
opcode       name             target           args        kwargs
-----------  ---------------  ---------------  ----------  --------
placeholder  input_1          input            ()          {}
call_module  input_1_quant    input_1_quant    (input_1,)  {}
call_module  _0               0                (input_1,)  {}
call_module  _1               1                (_0,)       {}
call_module  _2               2                (_1,)       {}
call_module  _3               3                (_2,)       {}
call_module  _4               4                (_3,)       {}
call_module  _5               5                (_4,)       {}
call_module  _6_input_quant   _6_input_quant   (_5,)       {}
call_module  _6               6                (_5,)       {}
call_module  _6_output_quant  _6_output_quant  (_6,)       {}
output       output           output           (_6,)       {}

_0 is not taking in the quantized tensor.

Whereas if I print out the graph in a later version of PyTorch (not 1.9.1), it uses the quantized tensor correctly:

ipdb> quant_model.graph.print_tabular()
opcode       name             target           args                kwargs
-----------  ---------------  ---------------  ------------------  --------
placeholder  input_1          input            ()                  {}
call_module  input_1_quant    input_1_quant    (input_1,)          {}
call_module  _0               0                (input_1_quant,)    {}
call_module  _1               1                (_0,)               {}
call_module  _2               2                (_1,)               {}
call_module  _3               3                (_2,)               {}
call_module  _4               4                (_3,)               {}
call_module  _5               5                (_4,)               {}
call_module  _6_input_quant   _6_input_quant   (_5,)               {}
call_module  _6               6                (_6_input_quant,)   {}
call_module  _6_output_quant  _6_output_quant  (_6,)               {}
output       output           output           (_6_output_quant,)  {}
ipdb>
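
To check that rewiring programmatically rather than by eye, a small hypothetical helper like this can be run under both setups (node and target names follow the printouts above):

    def first_conv_consumes_quant_input(quant_model):
        # Walk the FX graph and check that the node calling submodule "0" (the first
        # QuantConv2d) takes the output of the input quantization node as its argument.
        for node in quant_model.graph.nodes:
            if node.op == "call_module" and node.target == "0":
                return node.args[0].name == "input_1_quant"
        return False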

The issue specifically happens in src/brevitas/graph/quantize.py, in quantize > inp_placeholder_handler, when we try to rewrite the model.

There is a red herring: unlike later versions, 1.9.1 throws this warning (from inside InsertModuleCallAfter):

 ✘ Brevitas-3.8  oscar   tests-quantize-model -  python temp.py
> /home/oscar/Coding/OpenSource/Brevitas/src/brevitas/graph/quantize_impl.py(74)inp_placeholder_handler()
     73         ipdb.set_trace()
---> 74         model = rewriter.apply(model)
     75     return model

ipdb>         model = rewriter.apply(model)

/home/oscar/miniconda3/envs/Brevitas-3.8/lib/python3.8/site-packages/torch/fx/graph.py:606: UserWarning: Attempted to insert a call_module Node with no underlying reference in the owning GraphModule! Call GraphModule.add_submodule to add the necessary submodule
  warnings.warn("Attempted to insert a call_module Node with "
ipdb>

However, that warning is because of a bug in PyTorch, and they've since fixed it. I.e. in 1.9.1 they had:

        if (self.owning_module and
                self.owning_module.get_submodule(module_name) is not None):
            warnings.warn("Attempted to insert a call_module Node with "
                          "no underlying reference in the owning "
                          "GraphModule! Call "
                          "GraphModule.add_submodule to add the "
                          "necessary submodule")

Instead of:

        if (self.owning_module and
                self.owning_module.get_submodule(module_name) is None):
            warnings.warn("Attempted to insert a call_module Node with "
                          "no underlying reference in the owning "
                          "GraphModule! Call "
                          "GraphModule.add_submodule to add the "
                          "necessary submodule")

That is, is not None vs. is None. However, this doesn't seem relevant to the issue at hand.

The difference in the graph actually manifests here (from src/brevitas/graph/quantize.py, quantize > inp_placeholder_handler > InsertModuleCallAfter > replace_all_uses_except):

        replace_all_uses_except(
            self.node,
            quant_identity_node,
            [quant_identity_node] + list(self.node_to_exclude),
        )

For some reason, in 1.9.1 the graph doesn't change, but it does in later Torch versions. The issue may be in replace_all_uses_except, or upstream in the inputs to that function; I haven't figured it out yet.
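
For reference, a rough sketch of what replace_all_uses_except is expected to do in torch.fx terms (the actual Brevitas implementation may differ); stepping through something like this in ipdb should show which users are, or are not, being rewired in 1.9.1:

    def replace_all_uses_except_sketch(node, replacement, exclude):
        # Rewire every user of `node` to read from `replacement` instead, except the
        # users listed in `exclude`, which keep consuming the original node.
        for user in list(node.users):
            if user not in exclude:
                user.replace_input_with(node, replacement)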

Sorry for the piecemeal update; I'm working on this in the background, but I'm leaving an update so there's a record in case I get pulled onto something else. Hopefully I can figure this out tomorrow!

@Giuseppe5 (Collaborator)

Thanks for the update.
I tried running this branch locally but I didn't manage to do that.

I will try again using these insights. Let me know if you manage to find out more, and I'll do the same!

@OscarSavolainenDR (Author)

I haven't really been able to spend much time on this. The issue seems to be rooted in a specific Torch version, and I'm wondering if it wouldn't be simpler for Brevitas to state that this particular Torch version isn't supported: it might legitimately be a bug in an old version of PyTorch, from when FX graph mode was new.

If that's not an option I can try and debug this again.
