[SYSTEMDS-263] ONNX graph importer (Python API, docs, tests)
This PR implements a first poc-implementation for an ONNX importer.

It adds support for the following operators: Add, Sub, MatMul, Neg, Xor,
Or, And, Relu, Tanh, Sigmoid, Softmax, Dropout, MaxPool, Conv, If; as
well as the logic for nested sub-graphs.

AMLS project SS 2020
Closes #904.
Lukas Timpl authored and mboehm7 committed May 14, 2020
1 parent 30d5c40 commit 0ac0c25
Showing 50 changed files with 2,520 additions and 49 deletions.
17 changes: 16 additions & 1 deletion .github/workflows/python.yml
@@ -72,9 +72,12 @@ jobs:
key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ hashFiles('src/main/python/setup.py') }}
restore-keys: |
${{ runner.os }}-pip-${{ matrix.python-version }}-
- name: Install protobuf
run: sudo apt-get install protobuf-compiler libprotoc-dev

- name: Install pip Dependencies
run: pip install numpy py4j wheel scipy sklearn
run: pip install numpy py4j wheel scipy sklearn jinja2 onnx

- name: Build Python Package
run: |
@@ -97,3 +100,15 @@ jobs:
cd src/main/python
python -m unittest tests/lineage/*.py
echo "Exit Status: " $?
- name: Run onnx-systemds python tests
run: |
export SYSTEMDS_ROOT=$(pwd)
export PATH=$SYSTEMDS_ROOT/bin:$PATH
cd src/main/python
echo "Creating models"
python tests/onnx/test_models/model_generate.py
ls tests/onnx/test_models/*.onnx
echo "Beginning tests"
python -m unittest tests/onnx/*.py
echo "Exit Status: " $?
3 changes: 3 additions & 0 deletions .gitignore
@@ -50,6 +50,9 @@ src/main/python/NOTICE
src/main/python/dist
src/main/python/docs/build
src/main/python/docs/source/_build
src/main/python/tests/onnx/output_test
src/main/python/tests/onnx/dml_output
src/main/python/tests/onnx/test_models/*.onnx

# User configuration files
conf/SystemDS-config.xml
3 changes: 2 additions & 1 deletion docs/Tasks.txt
@@ -213,7 +213,8 @@ SYSTEMDS-250 Extended Slice Finding
SYSTEMDS-260 Misc Tools
* 261 Stable marriage algorithm OK
* 262 Data augmentation tool for data cleaning OK
* 263 ONNX graph importer/exporter
* 263 ONNX graph importer (Python API, docs, tests) OK
* 264 ONNX graph exporter

SYSTEMDS-270 Compressed Matrix Blocks
* 271 Reintroduce compressed matrix blocks from SystemML OK
46 changes: 0 additions & 46 deletions docs/onnx-systemds-design.md

This file was deleted.

1 change: 1 addition & 0 deletions pom.xml
@@ -508,6 +508,7 @@
<exclude>**/*.libsvm</exclude>
<exclude>**/*.mtx</exclude>
<exclude>**/*.mtd</exclude>
<exclude>**/*.out</exclude>
<exclude>**/part-*</exclude>
<exclude>**/*.keep</exclude>
<exclude>**/target/**</exclude>
8 changes: 8 additions & 0 deletions src/main/python/docs/source/index.rst
@@ -59,3 +59,11 @@ tensors (multi-dimensional arrays) whose first dimension may have a heterogeneou
   :caption: Central Classes

   matrix.rst

.. toctree::
   :maxdepth: 1
   :hidden:
   :caption: onnx-systemds

   onnx_systemds.rst
   onnx_systemds_design.rst
59 changes: 59 additions & 0 deletions src/main/python/docs/source/onnx_systemds.rst
@@ -0,0 +1,59 @@
.. -------------------------------------------------------------
..
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
..
.. -------------------------------------------------------------
QuickStart
=============
onnx-systemds is a tool for importing/exporting onnx graphs into/from SystemDS DML scripts.

Prerequisites
---------------
To run onnx-systemds you need to:

- install `onnx <https://github.com/onnx/onnx>`_: `Installation instructions <https://github.com/onnx/onnx#installation>`_
- `set up the environment <https://github.com/apache/systemml/blob/master/bin/README.md>`_

Usage
------
An example call from the ``src/main/python`` directory of systemds::

    python -m systemds.onnx_systemds.convert tests/onnx/test_models/simple_mat_add.onnx


This will generate the dml script ``simple_mat_add.dml`` in the current directory.

Run Tests
---------
From the ``src/main/python`` directory of systemds:

First, generate the test models::

    python tests/onnx/test_models/model_generate.py

Then you can run the tests::

    python -m unittest tests/onnx/test_simple.py


Converter
---------
It is also possible to invoke the converter from within Python.

.. autofunction:: systemds.onnx_systemds.convert.onnx2systemds
217 changes: 217 additions & 0 deletions src/main/python/docs/source/onnx_systemds_design.rst
@@ -0,0 +1,217 @@
.. -------------------------------------------------------------
..
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
..
.. -------------------------------------------------------------
Design
======

This document describes the initial design of `onnx-systemds`.

To deal with different operator-set versions of onnx, the current strategy is to use the
`converter provided by onnx <https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md#converting-version-of-an-onnx-model-within-default-domain-aionnx>`_ to convert models to a common version.

However, the converter does not provide adapters for all op-sets/operators, so this conversion will fail for many models.
The onnx repository contains a list of
`currently supported adapters <https://github.com/onnx/onnx/blob/master/onnx/version_converter.py#L21>`_.


Goals
-----

- Support for importing `operators of the ONNX base definition <https://github.com/onnx/onnx/blob/master/docs/Operators.md>`_
- Support for importing `operators defined by ONNX-ML <https://github.com/onnx/onnx/blob/master/docs/Operators-ml.md>`_
- Support for exporting DML script to ONNX graphs


Limitations
------------

- Not all data types / operators can be supported, because SystemDS does not currently support all of them

Onnx - Operators
-----------------

Onnx includes both very simple and more complex operators.
When implementing an operator it's best to have a look at the
`operator schemas <https://github.com/onnx/onnx/blob/master/docs/Operators.md>`_,
which precisely define the inputs, outputs and attributes of the operation.

Besides the standard onnx definition, there also exists onnx-ML, whose operator schemas are defined in a
`separate document <https://github.com/onnx/onnx/blob/master/docs/Operators-ml.md>`_.
It is an extension of the standard onnx format; however, currently only standard onnx operators are supported.

Onnx - Files
-------------

Onnx uses the `ProtoBuf format <https://developers.google.com/protocol-buffers/>`_.
It specifies this representation in several ``.proto``/``.proto3``
`files <https://github.com/onnx/onnx/tree/master/onnx>`_, again with dedicated files for onnx-ML.
These files help in understanding the underlying structure and the values that are possible.

Protobuf creates the underlying structure such that you can access elements of the onnx graph as if they were
class members. For more information take a look at
`Google's protocol-buffer documentation
<https://developers.google.com/protocol-buffers/docs/pythontutorial#the-protocol-buffer-api>`_.

This is also why in its current form, this converter does not convert the protobuf-structure into an internal format,
as the provided protobuf structure can already be conveniently used. Instead,
there exist a number of onnx-helper functions/classes (see ``onnx_helper.py``).

Traversing the Graph
---------------------

To create the script, it is essential to insert the computations into the dml-script in the right order.
To do this, the converter builds a graph structure (DAG) from the protobuf-nodes
(see `render.gen_graph_functions`).

- For traversing the graph, we start from the bottom.
- The converter starts with the graph-outputs as available outputs.
- It generates the dml snippets in reverse order.

Graph traversal
^^^^^^^^^^^^^^^^

1. Find a node for which all outputs are available.

2. Process the node:

   - Generate the dml parts for this node
   - Add its inputs to the list of available outputs
   - Remove the node from the graph

3. If there are nodes left, restart at step 1.

Example
^^^^^^^

In the example below with the nodes ``Add``, ``MatMul`` and ``Sub``, we would start with ``F`` as the available output.
Therefore the first node to insert would be ``Sub``. After inserting ``Sub``, its inputs become available outputs,
so all outputs of ``MatMul`` become available. Finally, after removing ``MatMul`` from the graph, all outputs
of ``Add`` are available, and it can be removed from the graph as well.

.. image:: assets/sample_graph.png
   :width: 200px
   :align: center
   :alt: sample graph
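
The traversal described above can be sketched in plain Python. This is only an illustration and not the converter's actual code: the dictionary-based node representation, the function name ``reverse_traversal`` and the tensor names ``A`` to ``E`` and ``G`` are assumptions made for this example (only ``F`` and the node names ``Add``, ``MatMul`` and ``Sub`` come from the text).

```python
def reverse_traversal(nodes, graph_outputs):
    """Return node names in processing order, starting from the graph outputs."""
    available = set(graph_outputs)  # the graph outputs are available at the start
    remaining = list(nodes)
    processed = []
    while remaining:
        # Step 1: find a node for which all outputs are available.
        ready = next(
            (n for n in remaining if all(o in available for o in n["outputs"])),
            None,
        )
        if ready is None:
            raise ValueError("no processable node left; the graph is not a valid DAG")
        # Step 2: process the node (here we only record its name),
        # make its inputs available and remove it from the graph.
        processed.append(ready["name"])
        available.update(ready["inputs"])
        remaining.remove(ready)
        # Step 3: the loop restarts while nodes are left.
    return processed


# Hypothetical wiring of the sample graph: Add -> MatMul -> Sub -> F
sample_nodes = [
    {"name": "Add", "inputs": ["A", "B"], "outputs": ["C"]},
    {"name": "MatMul", "inputs": ["C", "D"], "outputs": ["E"]},
    {"name": "Sub", "inputs": ["E", "G"], "outputs": ["F"]},
]

reverse_order = reverse_traversal(sample_nodes, ["F"])
script_order = list(reversed(reverse_order))
print(reverse_order)  # ['Sub', 'MatMul', 'Add']
print(script_order)   # ['Add', 'MatMul', 'Sub']
```

Because the snippets are produced in reverse order, they have to be reversed again before they are placed in the script, which is what ``script_order`` shows.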


Rendering DML scripts
---------------------

The main idea of this converter is that the logic for generating the actual dml-syntax is handled by
`Jinja templates <https://jinja.palletsprojects.com/en/2.11.x/>`_ (located in ``/templates``).
The python code therefore stays uncluttered: it does not have to concatenate strings to produce valid
dml-syntax and instead simply provides the elements that are needed to render the script.

The template-engine then takes these inputs and renders a human-readable script with valid dml syntax.
To improve readability, the generator also automatically adds the doc-strings that are part of the onnx-definitions as
comments to the script.

When traversing the graph, a script part is generated for each node consisting of three elements:

- `dml_script` The actual script snippet for the node
- `imports` Imports required for the node
- `sub_graphs` Any sub_graphs of the node that need to be handled

The function that is called for rendering a specific operator is defined in the dictionary
``operator_generators`` in ``render.py``.
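
A dispatch dictionary of this kind could look roughly like the sketch below. Apart from the name ``operator_generators`` and the three element names, everything here is hypothetical: the class ``GeneratedScriptPart``, the generator functions, the dictionary-based node and the use of f-strings as a stand-in for template rendering are assumptions for illustration and do not reflect the actual contents of ``render.py``.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GeneratedScriptPart:
    """Hypothetical container for the three elements generated per node."""
    dml_script: str
    imports: List[str] = field(default_factory=list)
    sub_graphs: List[str] = field(default_factory=list)


def gen_function_call(node: dict) -> GeneratedScriptPart:
    # Stand-in for rendering a template: builds "out = fn(in1, in2)".
    args = ", ".join(node["inputs"])
    return GeneratedScriptPart(f'{node["output"]} = {node["function"]}({args})')


def gen_binary_operator(node: dict) -> GeneratedScriptPart:
    # Builds "out = left <symbol> right".
    left, right = node["inputs"]
    return GeneratedScriptPart(f'{node["output"]} = {left} {node["symbol"]} {right}')


# Every generator shares the same call structure: node in, script part out.
operator_generators: Dict[str, Callable[[dict], GeneratedScriptPart]] = {
    "Tanh": gen_function_call,
    "Add": gen_binary_operator,
}


def render_node(node: dict) -> GeneratedScriptPart:
    """Look up the generator for the node's operator type and call it."""
    generator = operator_generators.get(node["op_type"])
    if generator is None:
        raise NotImplementedError(f'operator {node["op_type"]} is not supported')
    return generator(node)


add_part = render_node(
    {"op_type": "Add", "inputs": ["A", "B"], "output": "C", "symbol": "+"}
)
tanh_part = render_node(
    {"op_type": "Tanh", "inputs": ["C"], "output": "T", "function": "tanh"}
)
print(add_part.dml_script)   # C = A + B
print(tanh_part.dml_script)  # T = tanh(C)
```

Keeping one uniform signature is what allows the traversal code to stay independent of the individual operators.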

1. `dml_script`
^^^^^^^^^^^^^^^^^^

Depending on the operator this can be a function call or a more complex dml-snippet.
This part is generated by the template-engine when the corresponding template is rendered.

Many onnx-operators can be handled by a single template file. There exists a ``function_call.dml.jinja``
template which should be able to handle a large number of operators.

2. `imports`
^^^^^^^^^^^^^

Some operators are handled by calling scripts provided by systemds located in ``$SYSTEMDS_ROOT/scripts``.
To enable these imports, the converter automatically resolves the ``$SYSTEMDS_ROOT``
environment variable and adds a ``setwd($SYSTEMDS_ROOT/scripts)`` to the script.
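
As a rough illustration of this step, the following sketch resolves the environment variable and builds a DML ``setwd`` statement. The function name ``render_setwd`` and the exact formatting of the emitted line are assumptions; the source does not show what the converter actually writes.

```python
import os


def render_setwd(environ=None):
    """Build a DML setwd statement pointing at $SYSTEMDS_ROOT/scripts (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    root = environ.get("SYSTEMDS_ROOT")
    if not root:
        raise EnvironmentError("SYSTEMDS_ROOT is not set")
    # Avoid a double slash if the variable ends with a path separator.
    scripts_dir = root.rstrip("/") + "/scripts"
    return f'setwd("{scripts_dir}");'


# The mapping is passed explicitly here so the example does not depend on the shell.
print(render_setwd({"SYSTEMDS_ROOT": "/opt/systemds"}))
```

With the hypothetical root ``/opt/systemds`` this prints ``setwd("/opt/systemds/scripts");``.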

3. `sub_graphs`
^^^^^^^^^^^^^^^^^

Since sub-graphs have their own variable scope and are independent, they are handled as separate functions.
The converter generates a function for each graph in the model.
In the main-graph, the sub-graph is replaced by a function call to the sub-graph function.
To handle this the function ``render.gen_graph_functions`` recursively calls itself to render sub-graph functions
(and also the sub-graph functions of sub-graphs and so on...).
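
The recursion can be pictured with the following simplified sketch. It is not the implementation of ``render.gen_graph_functions``: the function name ``collect_graph_functions``, the dictionary-based graphs and the graph names are hypothetical, and the "body" of each function is reduced to a list of labels instead of rendered dml code.

```python
def collect_graph_functions(graph, functions=None):
    """Produce one (function name, body) pair per graph, sub-graphs first."""
    if functions is None:
        functions = []
    body = []
    for node in graph["nodes"]:
        sub_graphs = node.get("sub_graphs", [])
        if sub_graphs:
            for sub_graph in sub_graphs:
                # Recurse so that nested sub-graphs get their own functions too.
                collect_graph_functions(sub_graph, functions)
                # In the parent graph the sub-graph becomes a function call.
                body.append("call " + sub_graph["name"])
        else:
            body.append(node["op_type"])
    functions.append((graph["name"], body))
    return functions


# Hypothetical model: an If node whose else-branch contains another sub-graph.
inner = {"name": "inner_branch", "nodes": [{"op_type": "Tanh"}]}
then_graph = {"name": "then_branch", "nodes": [{"op_type": "Relu"}]}
else_graph = {"name": "else_branch", "nodes": [{"op_type": "If", "sub_graphs": [inner]}]}
main_graph = {
    "name": "main_graph",
    "nodes": [
        {"op_type": "Add"},
        {"op_type": "If", "sub_graphs": [then_graph, else_graph]},
    ],
}

graph_functions = collect_graph_functions(main_graph)
for name, body in graph_functions:
    print(name, body)
```

In this sketch each sub-graph function is emitted before the function that calls it, and the main graph comes last.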

Final Script
------------

In the final render all required imports, the sub-functions and the main-function are combined in a single dml-file.

Implementing new operators
----------------------------

When implementing an operator, it's best to have a look at the
`operator schemas <https://github.com/onnx/onnx/blob/master/docs/Operators.md>`_,
which exactly define the inputs, outputs and attributes of the operation.

It is also helpful to have a test-model to work with. To generate one, refer to
``tests/onnx/test_models/model_generate.py``.

To implement a new operator, the function that handles the operator needs to be added to the ``operator_generators``
dictionary located in ``render.py``.
All functions listed in this dictionary need to have the same call structure.

If there exists a dml-script (in ``$SYSTEMDS_ROOT/scripts``) that provides the functionality, the operator
can be implemented by translating the arguments/inputs and adding the import-render and function-call-render to this script.

Testing models
---------------

onnx provides a convenient way for
`creating models <https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md#checking-an-onnx-model>`_
using helper functions in python. All current test-models are produced like this (see ``tests/onnx/test_models``).

Creating a Testcase
^^^^^^^^^^^^^^^^^^^^^

The current test-system takes a model, converts it to dml using the converter and then runs a
``dml_wrapper`` which calls the model-function using the script ``$SYSTEMDS_ROOT/bin/systemds``.
Finally, the output (stored by the dml-wrapper) is compared to a reference output.

When creating files, stick to the naming conventions of other files in the same folder.

Steps:
""""""""

1. Create a model in ``tests/onnx/test_models``, e.g. ``sample_model.onnx``

2. Create a dml wrapper that calls the model-function in ``tests/onnx/dml_wrapper/sample_model_wrapper.dml``

   - The wrapper needs to call the model-function and store the output to ``output_test/sample_model.out``
   - The name of the model-function is generated from the model-name (see ``util.generate_function_name``)

3. Provide a reference output in ``tests/onnx/output_reference/sample_model_reference.out``

4. Create the unit test function.
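
The naming conventions used in the steps above can be summarised in a small helper. The function ``expected_test_files`` is hypothetical and not part of the test-suite; it only derives the file names from the model name, following the ``sample_model`` example.

```python
from pathlib import PurePosixPath


def expected_test_files(model_name, test_dir="tests/onnx"):
    """Return the files that belong to a test case, following the naming conventions."""
    base = PurePosixPath(test_dir)
    return {
        "model": str(base / "test_models" / f"{model_name}.onnx"),
        "wrapper": str(base / "dml_wrapper" / f"{model_name}_wrapper.dml"),
        "output": str(base / "output_test" / f"{model_name}.out"),
        "reference": str(base / "output_reference" / f"{model_name}_reference.out"),
    }


sample_files = expected_test_files("sample_model")
for kind, path in sample_files.items():
    print(f"{kind}: {path}")
```

A unit test would then convert the model, run the wrapper, and compare the ``output`` file with the ``reference`` file.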

Tools
------

- `PyCharm <https://www.jetbrains.com/pycharm/>`_ in the professional version allows you to `debug template files <https://www.jetbrains.com/help/pycharm/templates.html#debug>`_, which can be handy
- `Netron <https://github.com/lutzroeder/netron>`_ is a nice free tool for viewing onnx-graphs
