drop numpy dependency from runtime where possible #111

StrikerRUS · 2019-10-19T00:32:01Z

Closed #108.

See #108 for the rationale. With this PR numpy is used only for runtime code where some vector math is performed (that is for numpy was created).

I'm so sorry about the excess diff in linear and xgboost generated code examples for all languages. I guess they were caused by different package versions. Should I remove them from this PR or it's OK?

As a side note, I think there can be a CI job which will generate code examples with the latest packages and push them back into the repo after the each commit in master. So that examples will be always fresh.

coveralls · 2019-10-19T00:47:26Z

Coverage increased (+0.05%) to 95.238% when pulling 9b5fe1e on StrikerRUS:drop_numpy_runtime into 30abeef on BayesWitnesses:master.

krinart · 2019-10-19T02:58:27Z

m2cgen/interpreters/python/interpreter.py

-    tanh_function_name = "np.tanh"
+    exponent_function_name = "math.exp"
+    power_function_name = "math.pow"
+    tanh_function_name = "math.tanh"


I'm not sure I understand. It seems like now we make an assumption that expr here will always be scalar. And similar assumptions will be made for ExpExpr and PowExpr.

Maybe I'm missing something though.

Ohh.. I see, those are instances of NumExpr which means that they are scalars.

We should probably think about adding checks similar to what we have here, but for subclasses of NumExpr to verify that they only accept/produce scalars.

We should probably think about adding checks similar to what we have here, but for subclasses of NumExpr to verify that they only accept/produce scalars.

Yeah, seems to be so!
I guess all other languages rely on this implicit assumption now as well. For instance, these C functions work only with scalars:

m2cgen/m2cgen/interpreters/c/interpreter.py

Lines 20 to 22 in 30abeef

exponent_function_name = "exp"

power_function_name = "pow"

tanh_function_name = "tanh"

Should I add asserts for scalars in this PR? Or it's better to have a separate one?

I would say it's up to you as it's not really relevant to or required by this PR.

On the other hand I feel like it's just adding 4 asserts: 1 for ExpExpr, 1 for TanhExpr and 2 for PowExpr.

So whatever works for you.

@krinart I added asserts in this PR, if you do not mind 🙂 .

Thanks a lot for the guide where to add them!

Thanks for your effort! 👍

izeigerman

Thanks for your contribution 👍
I'm a bit confused that there are now to ways to define a vector in Python. One through vector_init and the other one - through array_convert_to_numpy. Furthermore I'm not sure I fully understand the motivation behind this approach.
Additionally can we please revert all generated code samples that are not related to Python?

izeigerman · 2019-10-21T14:51:24Z

m2cgen/interpreters/python/interpreter.py

+    power_function_name = "math.pow"
+    tanh_function_name = "math.tanh"
+
+    with_vector_operations = False


Not sure why can't we keep using with_vectors flag here that comes from ToCodeInterpreter?

with_vectors means that some vectors are used. Refer to https://github.com/BayesWitnesses/m2cgen/blob/master/generated_code_examples/python/classification/linear.py as an example.
with_vector_operations means that some vector math happens. Refer to

m2cgen/generated_code_examples/python/classification/random_forest.py

Line 29 in 30abeef

return ((var0) * (0.5)) + ((var1) * (0.5))

as an example.

@izeigerman Oh, just realized that with_vector_operations is the same as with_linear_algebra.
Updated in the latest commit.

izeigerman · 2019-10-21T14:54:09Z

m2cgen/interpreters/python/interpreter.py

        return self._cg.infix_expression(
-            left=self._do_interpret(expr.left, **kwargs),
+            left=self._cg.array_convert_to_numpy(


I'm not sure I understand this change. Why can't we keep using vector_init for this purpose?

Because [1, 2, 3] + [4, 5, 6] != [5, 7, 9] in Python.

My question is rather why do we need raw Python lists in a generated code in the first place. Why can't it always be a numpy array?

I'm guessing that this is the part where we may fallback to Python lists to eliminate the numpy dependency but I'd rather always have a consistent representation of the vector (especially considering that we may return it from the score function). So I'd argue that every time when we need vectors, it should be numpy arrays.

Yeah, this is about eliminating heavy redundant dependency from runtime (as well as reducing time and memory costs). I think that this is quite more important rather than have consistent return value. There are many frameworks which return either type1, or type2, depending on some conditions. BTW, score at present may return double and double[], so this can be treated as inconsistency as well.

I think that from the user side it's much easier to wrap score function call into np.array() only once, if they need consistent output, rather than manually edit generated function removing dozens of np.assarray() entries, if they do not have/want/able to install numpy.

Joke 😃 : It's time to change project's description:

Transform ML models into a native code (Java, C, Python, Go, JavaScript) with zero dependencies

BTW, score at present may return double and double[], so this can be treated as inconsistency as well.

Well there is a distinction between scalar and vector, but here we're introducing different flavors of vectors.

Additionally we break backward compatibility, which is not great.

I'm positive about the benefit of using math vs numpy in certain scenarios, but I'm not fully convinced about the whole vector thing.

@krinart you're a better Python expert than I am. What's your opinion?

Well there is a distinction between scalar and vector, but here we're introducing different flavors of vectors.
Additionally we break backward compatibility, which is not great.

Agree! I only want to say that the project has version 0.4.0, which means that, according to the semantic versioning, its API is subject to change.

After discussion with @krinart I think it may not be that bad in Python world. We also discussed the possibility of dropping numpy dependency altogether in favor of manually written vector multiplication like we do in other languages. This PR can be the first step in that direction. Let's merge it. @StrikerRUS thanks a lot for your effort 👍

@izeigerman Thanks a lot for considering this change!

We also discussed the possibility of dropping numpy dependency altogether in favor of manually written vector multiplication like we do in other languages.

It's a great step towards isolated environments where it's not so straightforward to install any dependency! I guess they are the majority of the target audience of the package. As of now only vector addition and multiplication by number should be supported, it will not hurt the performance too much.

However, with the growth of supported algorithms number there can be needed more non-trivial vector operations where numpy speed will be crucial. So I think there can be an option which will provide users the possibility to choose either they need a runtime with zero dependencies but slow, or one with a few number of dependencies but significantly faster.

cc @krinart

izeigerman · 2019-10-21T14:54:20Z

m2cgen/interpreters/python/interpreter.py

        return self._cg.infix_expression(
-            left=self._do_interpret(expr.left, **kwargs),
+            left=self._cg.array_convert_to_numpy(


Same as above

StrikerRUS · 2019-10-21T19:02:54Z

I reverted changes from generated examples that are not related to this PR. Also, I realized that with_vector_operations I introduced here is the same as with_linear_algebra instead, so I used it to determine when numpy is needed.

StrikerRUS added 2 commits October 19, 2019 01:53

drop numpy dependency from runtime where possible

4bc9352

updated Python tests

8a33926

krinart reviewed Oct 19, 2019

View reviewed changes

check that only scalars are used in math functions

76eb578

krinart approved these changes Oct 20, 2019

View reviewed changes

izeigerman requested changes Oct 21, 2019

View reviewed changes

StrikerRUS added 2 commits October 21, 2019 21:40

revert changes from generated examples

7b686bf

use with_linear_algebra instead of with_vector_operations

9b5fe1e

izeigerman approved these changes Oct 25, 2019

View reviewed changes

izeigerman merged commit cb933aa into BayesWitnesses:master Oct 25, 2019

krinart mentioned this pull request Oct 25, 2019

Remove numpy from default PythonInterpreter and potentially introduce PythonNumpyInterpreter #115

Closed

StrikerRUS deleted the drop_numpy_runtime branch November 13, 2019 19:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

drop numpy dependency from runtime where possible #111

drop numpy dependency from runtime where possible #111

StrikerRUS commented Oct 19, 2019 •

edited

Loading

coveralls commented Oct 19, 2019 •

edited

Loading

krinart Oct 19, 2019

krinart Oct 19, 2019

krinart Oct 19, 2019

StrikerRUS Oct 19, 2019

StrikerRUS Oct 19, 2019

krinart Oct 19, 2019

StrikerRUS Oct 20, 2019 •

edited

Loading

krinart Oct 20, 2019

izeigerman left a comment •

edited

Loading

izeigerman Oct 21, 2019

StrikerRUS Oct 21, 2019

StrikerRUS Oct 21, 2019 •

edited

Loading

izeigerman Oct 21, 2019 •

edited

Loading

StrikerRUS Oct 21, 2019

izeigerman Oct 21, 2019

izeigerman Oct 21, 2019 •

edited

Loading

StrikerRUS Oct 21, 2019 •

edited

Loading

StrikerRUS Oct 21, 2019

izeigerman Oct 22, 2019 •

edited

Loading

StrikerRUS Oct 22, 2019 •

edited

Loading

izeigerman Oct 25, 2019

StrikerRUS Oct 25, 2019

izeigerman Oct 21, 2019

StrikerRUS commented Oct 21, 2019

	exponent_function_name = "exp"
	power_function_name = "pow"
	tanh_function_name = "tanh"

drop numpy dependency from runtime where possible #111

drop numpy dependency from runtime where possible #111

Conversation

StrikerRUS commented Oct 19, 2019 • edited Loading

coveralls commented Oct 19, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

StrikerRUS Oct 20, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

izeigerman left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

StrikerRUS Oct 21, 2019 • edited Loading

Choose a reason for hiding this comment

izeigerman Oct 21, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

izeigerman Oct 21, 2019 • edited Loading

Choose a reason for hiding this comment

StrikerRUS Oct 21, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

izeigerman Oct 22, 2019 • edited Loading

Choose a reason for hiding this comment

StrikerRUS Oct 22, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

StrikerRUS commented Oct 21, 2019

StrikerRUS commented Oct 19, 2019 •

edited

Loading

coveralls commented Oct 19, 2019 •

edited

Loading

StrikerRUS Oct 20, 2019 •

edited

Loading

izeigerman left a comment •

edited

Loading

StrikerRUS Oct 21, 2019 •

edited

Loading

izeigerman Oct 21, 2019 •

edited

Loading

izeigerman Oct 21, 2019 •

edited

Loading

StrikerRUS Oct 21, 2019 •

edited

Loading

izeigerman Oct 22, 2019 •

edited

Loading

StrikerRUS Oct 22, 2019 •

edited

Loading