New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
onnx-systemds implementation #904
Conversation
…edactor files, add license-header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a cursory check, the changeset looks good. Can the design be documented thoroughly in a .md
file.
Feel free to include as much information in the doc as possible with the uses of tables, graphs etc.
I am facing problems with building onnx
for now.
Thank you for the quick review, I've added a design document which describes the current implementation in detail. |
Thank you for the design documentation, @lukas-jkl . I would make a deep pass shortly. |
Sorry, @lukas-jkl - I may not be able to review PR anytime soon. But, I posted this on the list, so that someone would come forward for review. Thank you. |
thanks for the initial look @j143. @Baunsgaard could you please have a look. I'll also try it out later this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Great PR, some changes needed, but absolutely great addition.
I Like your testing and documentation
I would appreciate if you relocate the Onnx_systemds inside our src/main/python/systemds itself. such that when we release a new pip install this addition is included, and we don't need to provide an extra package.
The documentation is written in markdown, depending on the time you have, i would appreciate if you write it up as a rst (or more) instead inside src/main/python/docs
. This would make it integrate with the already existing documents.
Furthermore, there is already the document in /docs that contain another onnx-systemds-design.md. It would be great to consolidate these two.
If you want to make it absolutely perfect then you should add an github action inside .github/workflows/ maybe call it onnx.yml
, and if you merge the onnx implementation into systemds then extend the python.yml
instead.
A simple test you can have instead of an output file (like the ones in tests/output_reference
) is to finish your scripts with a sum(res) and then compare that one number with your result. (you can add more aggregates if you don't trust just sum).
Minor extra things Missing:
- init.py inside the tests folder.
to run onnx-systemds you need [onnx](https://github.com/onnx/onnx) | ||
|
||
* [Installation instructions](https://github.com/onnx/onnx#installation) | ||
* The conda install seems to work best |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is unessesary (line 14), since it depends on how the user want to install,
just having the link to install instructions is fine. Maybe add Install instructions Onnx.
|
||
* [Installation instructions](https://github.com/onnx/onnx#installation) | ||
* The conda install seems to work best | ||
* the environment variable `SYSTEMDS_ROOT` needs to be set to the root of the repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link to our /bin/README.md for setting up environment
For a more detailed description of this converter refer to the [description of the converter design](docs/onnx-systemds-design.md) | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of your documentation files, have many extra newlines before a new section. one empty line is sufficient
@@ -0,0 +1,37 @@ | |||
# onnx-systemds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Onnx onnx or ONNX?
It is in general not consistent throughout the PR,
@@ -0,0 +1,37 @@ | |||
# onnx-systemds | |||
|
|||
A tool for importing/exporting [ONNX](https://github.com/onnx/onnx/blob/master/docs/IR.md) graphs into/from SystemDS DML scripts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistency, here the link is ONNX, while later it is onnx.
@@ -0,0 +1,37 @@ | |||
# onnx-systemds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add License
""" Class for preparing onnx value structures for writing them to the dml script """ | ||
def __init__(self, value_info: onnx.ValueInfoProto, initializer: onnx.TensorProto = None): | ||
|
||
supported_types = ["int", "boolean", "double"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean int here ? or is it a long value?
I'm assuming this here maps to our internal representations of simple long double or boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those should be the types supported by systems (I've now renamed the variable to make this more clear). So I guess it should be "integer"
10: "double", # float16, | ||
11: "double", | ||
12: "int", # uint32 | ||
13: "int", # uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
13 is not possible for all values, to map if a long
12 && 7 is not possible if the int is not a long 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way of throwing an not implemented exception if a type is invalid, if so then i would suggest to make some appropriate error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct, I am however unsure how big the "Integer" type of systemds is? I guess it should be int32. I've now changed those translations and an exception is thrown if we do not support the type. I've also added an additional check to enforce that a matrix must only contain double values, as stated in the dml-language-reference.
# TODO: not sure this is the solution for every instance of this problem | ||
# Multiply all shapes right | ||
rows = self.shape[0] | ||
cols = reduce(lambda s0, s1: s0 * s1, self.shape[1:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place where the functools import is used, and the syntax is a little foreign to me.
- thing move the import to be an inline import since it is as i can see it an edge case if it is needed.
- consider if the import is necessary to execute this.
- If functools is a default python lib then completely ignore this comment 😄
return GeneratedScriptPart(imports=[import_render], dml_script=node_render) | ||
|
||
|
||
def gen_maxpool_call(env: jinja2.environment.Environment, graph: onnx.GraphProto, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gen conv and gen maxpool, contain much of the same code, maybe some of it could be alleviated using some private helper functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furter comment:
The Onnx files inside test_models, have an model_generate file.
I think it would be nice if those were not included in the PR, but have a note in the readme to generate them first before tests?
|
||
try: | ||
abspath_input = os.path.abspath(input_file) | ||
res = subprocess.run([systemds_root_path + "/bin/systemds.sh", abspath_input] + args, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the issue for why it fails now. we changed the name to /bin/systemds
without the .sh
.
i installed not using conda, with no problems. the only thing to remember is to install protobuf, but the Onnx guide specifies this as well. |
The thing to discuss in general for this implementation of Onnx in SystemDS is the use of jinja2.
The upside is somewhat hopefully simpler translation to DML. This said in the current implementation it would not be so difficult to simplify and construct the strings needed without jinja2, at least to my understanding, but maybe it is useful down the road. Any opinions? |
… on systemds-warnings in tests
… tests callable from main python directory
…uotes, unify naming, etc.
Thank you for the quick review @Baunsgaard ! I've implemented your suggested changes and have rewritten the documentation in rst files. I am however still a bit unsure about the yml files. I guess it would not be sufficient to simply add onnx and Jinja to the pip dependencies as protobuf would also need to be installed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick review @Baunsgaard ! I've implemented your suggested changes and have rewritten the documentation in rst files. I am however still a bit unsure about the yml files. I guess it would not be sufficient to simply add onnx and Jinja to the pip dependencies as protobuf would also need to be installed?
Since our python tests are executed on Ubuntu-latest it should hopefully be as simple as adding a step,
- name: install protobuf
run: {INSERT INSTALL COMMAND HERE }
Seems to work now! |
Cool many thanks, LGTM. 🥇 Up for discussion in the future is still #904 (comment) Only real thing left to refine in my opinion is the Output files for validation, that i think is to much, because the tests should not reflect a specific result, but a certain behavior. But I also acknowledge this is a highly personal thing, and that some of the python tests already in does not follow this principle. |
LGTM - thanks @lukas-jkl for this awesome patch, and @Baunsgaard for the detailed discussions. Overall, this is a great initial importer and completes the AMLS project. During the merge, I only added Down the the road, I would also prefer to remove these expected output files and rather generate them dynamically (e.g., via pytorch). Similarly, we should clean up the license headers to use a common text layout (beginning/end lines). Btw, there are additional rat problems in the data slicing implementation (which I fix when merging Svetlana's PR) and the |
This PR implements a first poc-implementation for an ONNX importer.
It adds support for the following operators:
As well as the logic for nested sub-graphs.