Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unifying C++ and Python ONNX parsers #727

Open
MatthewDaggitt opened this issue Feb 11, 2024 · 8 comments
Open

Unifying C++ and Python ONNX parsers #727

MatthewDaggitt opened this issue Feb 11, 2024 · 8 comments
Labels

Comments

@MatthewDaggitt
Copy link
Collaborator

Summary

At the moment, we have the deeply suboptimal solution of having two ONNX parsers, one in Python and one in C++. We should only have the C++ one, and expose Python bindings for it. However, before doing that we really need to ensure the two parsers actually agree.

Helper script

I've put together the following shell script that diffs the input query generated by both parsers.

#!/bin/bash
BASEDIR=${PWD}
DUMP_LOCATION=~/Downloads

network=$1
networkFilename=$(basename "$network")
networkName="${networkFilename%%.*}"

cppQuery="$DUMP_LOCATION/$networkName-queryCPP.txt"
pythonQuery="$DUMP_LOCATION/$networkName-queryPython.txt"

python_command="
import sys, os
sys.path.append(os.path.join(os.path.dirname('$BASEDIR')))
import maraboupy.Marabou as Marabou
import maraboupy.MarabouCore as Core
network = Marabou.read_onnx('$network')
input_query = network.getMarabouQuery()
Core.saveQuery(input_query, '$pythonQuery')
"

echo "Generating query for '$network' via C++."
errorMessage=$( ../build/Marabou $network --query-dump-file "$cppQuery" 2>&1)
if [[ $? -eq 0 ]]; then
echo "Succeeded. File at '$cppQuery'"
else
echo "Failed. Error: $errorMessage"
exit 1
fi

echo "Generating query for '$network' via Python"
errorMessage=$( python3 -c "$python_command" 2>&1)
if [[ $? -eq 0 ]]; then
echo "Succeeded. File at '$pythonQuery'"
else
echo "Failed. Error: $errorMessage"
exit 1
fi

diff $pythonQuery $cppQuery

I'm going to create issues for discrepancies in the files in resources/onnx that I find using this.

@MatthewDaggitt
Copy link
Collaborator Author

MatthewDaggitt commented Feb 11, 2024

Okay just ran them all. 1/109 isn't too bad right? 😭

  • onnx/model-german-traffic-sign-fast.onnx : C++ error
  • onnx/vnnlib/test_unsat_vnncomp.onnx : differs
  • onnx/vnnlib/test_tiny_vnncomp.onnx : differs
  • onnx/vnnlib/test_nano_vnncomp.onnx : differs
  • onnx/vnnlib/test_small_vnncomp.onnx : differs
  • onnx/vnnlib/test_sat_vnncomp.onnx : differs
  • onnx/cnn_max_mninst2.onnx : C++ error
  • onnx/fc_2-2-3.onnx : differs
  • onnx/conv_mp1.onnx : differs
  • onnx/tanh_test.onnx : differs
  • onnx/fc_2-2sigmoids-3.onnx : differs
  • onnx/fc2.onnx : differs
  • onnx/oneInput_twoBranches.onnx : C++ error
  • onnx/multiInput_add.onnx : C++ error
  • onnx/robust_model_sigmoid_linear.onnx : differs
  • onnx/traffic-classifier64.onnx : C++ error
  • onnx/resize/resize_4dims.onnx : C++ error
  • onnx/KJ_TinyTaxiNet.onnx : C++ error
  • onnx/conv_mp1_intermediateOutput.onnx : C++ error
  • onnx/self-attention-mnist-pgd-medium-sim.onnx : C++ error
  • onnx/acasxu/ACASXU_experimental_v2a_4_4.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_3.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_7.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_5.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_2.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_4.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_8.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_6.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_9.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_1.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_6.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_8.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_8.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_4.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_8.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_9.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_4.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_1.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_9.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_7.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_5.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_2.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_1.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_4.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_5.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_1.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_6.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_7.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_3.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_7.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_3.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_8.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_9.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_7.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_2.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_3.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_6.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_2.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_3_9.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_1_3.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_4_5.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_5.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_6.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_2_2.onnx : differs
  • onnx/acasxu/ACASXU_experimental_v2a_5_1.onnx : differs
  • onnx/cifar10/cifar_deep_kw.onnx : C++ error
  • onnx/cifar10/cifar_wide_kw_simp.onnx : C++ error
  • onnx/cifar10/cifar_base_kw.onnx : C++ error
  • onnx/cifar10/cifar_wide_kw.onnx : C++ error
  • onnx/cifar10/cifar_deep_kw_simp.onnx : C++ error
  • onnx/cifar10/cifar_base_kw_simp.onnx : C++ error
  • onnx/mnist2x10.onnx : differs
  • onnx/layer-zoo/relu.onnx : same
  • onnx/layer-zoo/flatten.onnx : differs
  • onnx/layer-zoo/maxpool.onnx : differs
  • onnx/layer-zoo/transpose.onnx : differs
  • onnx/layer-zoo/add.onnx : Python error
  • onnx/layer-zoo/sub.onnx : Python error
  • onnx/layer-zoo/identity.onnx : differs
  • onnx/layer-zoo/conv.onnx : Python error
  • onnx/layer-zoo/sigmoid.onnx : same
  • onnx/layer-zoo/batchnorm.onnx : Python error
  • onnx/layer-zoo/reshape_with_dimension_inference.onnx : differs
  • onnx/layer-zoo/matmul.onnx : Python error
  • onnx/layer-zoo/gemm.onnx : Python error
  • onnx/layer-zoo/reshape.onnx : differs
  • onnx/layer-zoo/constant.onnx : Python error
  • onnx/layer-zoo/tanh.onnx : differs
  • onnx/cnn_max_mninst3.onnx : C++ error
  • onnx/linear2-3_bn1-linear3-1.onnx : differs
  • onnx/split/split_2d_split-3_axis-1.onnx : C++ error
  • onnx/split/split_2d_split-2-4_axis-1.onnx : C++ error
  • onnx/split/split_1d_split-2-4_axis-0.onnx : C++ error
  • onnx/split/split_1d_split-2_axis-0.onnx : C++ error
  • onnx/split/split_5d_split-2-2-81_axis-4-mul-concat.onnx : C++ error
  • onnx/split/split_5d_split-2-2-81_axis-4.onnx : C++ error
  • onnx/concat/concat_axis_1.onnx : C++ error
  • onnx/concat/concat_axis_0.onnx : C++ error
  • onnx/concat/concat_axis_2.onnx : C++ error
  • onnx/softmax-last-layer.onnx : C++ error
  • onnx/mnist5x20_leaky_relu.onnx : C++ error
  • onnx/fc_matMul.onnx : C++ error
  • onnx/fc1.onnx : differs

@wu-haoze
Copy link
Collaborator

@MatthewDaggitt , I agree this needs to be done. I think the differing input query might be because the variable indexing. In python Onnx parser, we make output variable to follow immediately after the input variable.

I think to fully switch to C++ parser, we need to

  1. support in C++ all operations currently supported in python
  2. Make sure the following tests passes when we swtich to use C++ binding. https://github.com/NeuralNetworkVerification/Marabou/blob/master/maraboupy/test/test_onnx.py

@MatthewDaggitt
Copy link
Collaborator Author

In python Onnx parser, we make output variable to follow immediately after the input variable

Do you know why we do this? (I can't find any documentation as to the reason in the code). Does the C++ parser need to follow suit?

@wu-haoze
Copy link
Collaborator

In python Onnx parser, we make output variable to follow immediately after the input variable

Do you know why we do this? (I can't find any documentation as to the reason in the code). Does the C++ parser need to follow suit?

There is a reason but it's not very good :). The output variables follow immediately after the input variables in the legacy Nnet format. And we had this test to make sure Marabounetwork object created from parsing the .nnet format and the onnx format is the same. But I think we can probably get rid of that test or at least rewrite that tests.

In fact this reindexing can take many seconds on larger network, so I usually set the reindexing flag off in practice.

@MatthewDaggitt
Copy link
Collaborator Author

MatthewDaggitt commented Feb 13, 2024

Operations that need to add support for:

  • squeeze
  • unsqueeze
  • concat
  • mul
  • matmul
  • split
  • resize
  • leakyrelu
  • constant output check

@wu-haoze
Copy link
Collaborator

@MatthewDaggitt , I agree this needs to be done. I think the differing input query might be because the variable indexing. In python Onnx parser, we make output variable to follow immediately after the input variable.

I think to fully switch to C++ parser, we need to

  1. support in C++ all operations currently supported in python
  2. Make sure the following tests passes when we swtich to use C++ binding. https://github.com/NeuralNetworkVerification/Marabou/blob/master/maraboupy/test/test_onnx.py

In addition, I think we should also try to make sure there is no significant degradation in terms of parsing time.

@MatthewDaggitt
Copy link
Collaborator Author

And we had this test to make sure Marabounetwork object created from parsing the .nnet format and the onnx format is the same. But I think we can probably get rid of that test or at least rewrite that tests.

Do you know which tests these are? It occurs to me that in the short term we should just move the reassignOutputVariables logic to that test?

@wu-haoze
Copy link
Collaborator

And we had this test to make sure Marabounetwork object created from parsing the .nnet format and the onnx format is the same. But I think we can probably get rid of that test or at least rewrite that tests.

Do you know which tests these are? It occurs to me that in the short term we should just move the reassignOutputVariables logic to that test?

Hum.., I can't seem to find the test. Perhaps we already got rid of it. When I set the reassignOutputVariables to false, all python tests but one (test_onnx.py::test_errors() ) passes. And the failed test exposes a small bug that can be easily fixed. So I think it's actually safe to not reindex output variables. PR: #746.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants