drop numpy dependency from Python code for cases without vectors #108

StrikerRUS · 2019-10-16T21:57:38Z

According to this line, it seems that numpy is used as a default math library for runtime even when we do not operate with vectors.

m2cgen/m2cgen/interpreters/python/interpreter.py

Lines 30 to 31 in 2475f3c

    
           if self.with_vectors or self.with_math_module: 
        
               self._cg.add_dependency("numpy", alias="np")

Let me describe two advantages of dropping numpy where it's possible.

The first one is excess dependence. Even though numpy is a sort of "classic" dependence and there should be no problems with installing it, it requires additional manipulation from a user side. Also, there are some companies with very strict security policies, which prohibit using pip (conda, brew, and other package managers). So, I guess, for them raw Python may be preferable solution in cases where it's possible.

The second one is speed. numpy is about efficient vector math, in other cases it only produces redundant computational cost. Consider the following example. Take this generated Python code from the repo, change return type from np.array to simple list, replace the following things in script:

numpy -> math
np.exp -> math.exp
np.power -> math.pow

Here what we get after removing numpy:

import math
def score_raw(input):
    var0 = (0) - (0.25)
    var1 = math.exp((var0) * ((((math.pow((5.4) - (input[0]), 2)) + (math.pow((3.0) - (input[1]), 2))) + (math.pow((4.5) - (input[2]), 2))) + (math.pow((1.5) - (input[3]), 2))))
    var2 = math.exp((var0) * ((((math.pow((6.2) - (input[0]), 2)) + (math.pow((2.2) - (input[1]), 2))) + (math.pow((4.5) - (input[2]), 2))) + (math.pow((1.5) - (input[3]), 2))))
    var3 = math.exp((var0) * ((((math.pow((5.0) - (input[0]), 2)) + (math.pow((2.3) - (input[1]), 2))) + (math.pow((3.3) - (input[2]), 2))) + (math.pow((1.0) - (input[3]), 2))))
    var4 = math.exp((var0) * ((((math.pow((5.9) - (input[0]), 2)) + (math.pow((3.2) - (input[1]), 2))) + (math.pow((4.8) - (input[2]), 2))) + (math.pow((1.8) - (input[3]), 2))))
    var5 = math.exp((var0) * ((((math.pow((5.0) - (input[0]), 2)) + (math.pow((2.0) - (input[1]), 2))) + (math.pow((3.5) - (input[2]), 2))) + (math.pow((1.0) - (input[3]), 2))))
    var6 = math.exp((var0) * ((((math.pow((6.7) - (input[0]), 2)) + (math.pow((3.0) - (input[1]), 2))) + (math.pow((5.0) - (input[2]), 2))) + (math.pow((1.7) - (input[3]), 2))))
    var7 = math.exp((var0) * ((((math.pow((7.0) - (input[0]), 2)) + (math.pow((3.2) - (input[1]), 2))) + (math.pow((4.7) - (input[2]), 2))) + (math.pow((1.4) - (input[3]), 2))))
    var8 = math.exp((var0) * ((((math.pow((4.9) - (input[0]), 2)) + (math.pow((2.4) - (input[1]), 2))) + (math.pow((3.3) - (input[2]), 2))) + (math.pow((1.0) - (input[3]), 2))))
    var9 = math.exp((var0) * ((((math.pow((6.3) - (input[0]), 2)) + (math.pow((2.5) - (input[1]), 2))) + (math.pow((4.9) - (input[2]), 2))) + (math.pow((1.5) - (input[3]), 2))))
    var10 = math.exp((var0) * ((((math.pow((6.0) - (input[0]), 2)) + (math.pow((2.7) - (input[1]), 2))) + (math.pow((5.1) - (input[2]), 2))) + (math.pow((1.6) - (input[3]), 2))))
    var11 = math.exp((var0) * ((((math.pow((5.7) - (input[0]), 2)) + (math.pow((2.6) - (input[1]), 2))) + (math.pow((3.5) - (input[2]), 2))) + (math.pow((1.0) - (input[3]), 2))))
    var12 = math.exp((var0) * ((((math.pow((5.1) - (input[0]), 2)) + (math.pow((3.8) - (input[1]), 2))) + (math.pow((1.9) - (input[2]), 2))) + (math.pow((0.4) - (input[3]), 2))))
    var13 = math.exp((var0) * ((((math.pow((4.4) - (input[0]), 2)) + (math.pow((2.9) - (input[1]), 2))) + (math.pow((1.4) - (input[2]), 2))) + (math.pow((0.2) - (input[3]), 2))))
    var14 = math.exp((var0) * ((((math.pow((5.7) - (input[0]), 2)) + (math.pow((4.4) - (input[1]), 2))) + (math.pow((1.5) - (input[2]), 2))) + (math.pow((0.4) - (input[3]), 2))))
    var15 = math.exp((var0) * ((((math.pow((5.8) - (input[0]), 2)) + (math.pow((4.0) - (input[1]), 2))) + (math.pow((1.2) - (input[2]), 2))) + (math.pow((0.2) - (input[3]), 2))))
    var16 = math.exp((var0) * ((((math.pow((5.1) - (input[0]), 2)) + (math.pow((3.3) - (input[1]), 2))) + (math.pow((1.7) - (input[2]), 2))) + (math.pow((0.5) - (input[3]), 2))))
    var17 = math.exp((var0) * ((((math.pow((5.7) - (input[0]), 2)) + (math.pow((3.8) - (input[1]), 2))) + (math.pow((1.7) - (input[2]), 2))) + (math.pow((0.3) - (input[3]), 2))))
    var18 = math.exp((var0) * ((((math.pow((4.3) - (input[0]), 2)) + (math.pow((3.0) - (input[1]), 2))) + (math.pow((1.1) - (input[2]), 2))) + (math.pow((0.1) - (input[3]), 2))))
    var19 = math.exp((var0) * ((((math.pow((4.5) - (input[0]), 2)) + (math.pow((2.3) - (input[1]), 2))) + (math.pow((1.3) - (input[2]), 2))) + (math.pow((0.3) - (input[3]), 2))))
    var20 = math.exp((var0) * ((((math.pow((6.3) - (input[0]), 2)) + (math.pow((2.7) - (input[1]), 2))) + (math.pow((4.9) - (input[2]), 2))) + (math.pow((1.8) - (input[3]), 2))))
    var21 = math.exp((var0) * ((((math.pow((6.0) - (input[0]), 2)) + (math.pow((3.0) - (input[1]), 2))) + (math.pow((4.8) - (input[2]), 2))) + (math.pow((1.8) - (input[3]), 2))))
    var22 = math.exp((var0) * ((((math.pow((6.3) - (input[0]), 2)) + (math.pow((2.8) - (input[1]), 2))) + (math.pow((5.1) - (input[2]), 2))) + (math.pow((1.5) - (input[3]), 2))))
    var23 = math.exp((var0) * ((((math.pow((5.8) - (input[0]), 2)) + (math.pow((2.8) - (input[1]), 2))) + (math.pow((5.1) - (input[2]), 2))) + (math.pow((2.4) - (input[3]), 2))))
    var24 = math.exp((var0) * ((((math.pow((6.1) - (input[0]), 2)) + (math.pow((3.0) - (input[1]), 2))) + (math.pow((4.9) - (input[2]), 2))) + (math.pow((1.8) - (input[3]), 2))))
    var25 = math.exp((var0) * ((((math.pow((7.7) - (input[0]), 2)) + (math.pow((2.6) - (input[1]), 2))) + (math.pow((6.9) - (input[2]), 2))) + (math.pow((2.3) - (input[3]), 2))))
    var26 = math.exp((var0) * ((((math.pow((6.9) - (input[0]), 2)) + (math.pow((3.1) - (input[1]), 2))) + (math.pow((5.1) - (input[2]), 2))) + (math.pow((2.3) - (input[3]), 2))))
    var27 = math.exp((var0) * ((((math.pow((6.3) - (input[0]), 2)) + (math.pow((3.3) - (input[1]), 2))) + (math.pow((6.0) - (input[2]), 2))) + (math.pow((2.5) - (input[3]), 2))))
    var28 = math.exp((var0) * ((((math.pow((4.9) - (input[0]), 2)) + (math.pow((2.5) - (input[1]), 2))) + (math.pow((4.5) - (input[2]), 2))) + (math.pow((1.7) - (input[3]), 2))))
    var29 = math.exp((var0) * ((((math.pow((6.0) - (input[0]), 2)) + (math.pow((2.2) - (input[1]), 2))) + (math.pow((5.0) - (input[2]), 2))) + (math.pow((1.5) - (input[3]), 2))))
    var30 = math.exp((var0) * ((((math.pow((7.9) - (input[0]), 2)) + (math.pow((3.8) - (input[1]), 2))) + (math.pow((6.4) - (input[2]), 2))) + (math.pow((2.0) - (input[3]), 2))))
    var31 = math.exp((var0) * ((((math.pow((7.2) - (input[0]), 2)) + (math.pow((3.0) - (input[1]), 2))) + (math.pow((5.8) - (input[2]), 2))) + (math.pow((1.6) - (input[3]), 2))))
    var32 = math.exp((var0) * ((((math.pow((7.7) - (input[0]), 2)) + (math.pow((3.8) - (input[1]), 2))) + (math.pow((6.7) - (input[2]), 2))) + (math.pow((2.2) - (input[3]), 2))))
    return [(((((((((((((((((((-0.08359187780790468) + ((var1) * (-0.0))) + ((var2) * (-0.0))) + ((var3) * (-0.4393498355605194))) + ((var4) * (-0.009465620856664334))) + ((var5) * (-0.16223369966927))) + ((var6) * (-0.26861888775075243))) + ((var7) * (-0.4393498355605194))) + ((var8) * (-0.4393498355605194))) + ((var9) * (-0.0))) + ((var10) * (-0.0))) + ((var11) * (-0.19673905328606292))) + ((var12) * (0.3340655283922188))) + ((var13) * (0.3435087305152051))) + ((var14) * (0.4393498355605194))) + ((var15) * (0.0))) + ((var16) * (0.28614124535416424))) + ((var17) * (0.11269159286168087))) + ((var18) * (0.0))) + ((var19) * (0.4393498355605194)), (((((((((((((((((((((-0.18563912331454907) + ((var20) * (-0.0))) + ((var21) * (-0.06014273244194299))) + ((var22) * (-0.0))) + ((var23) * (-0.031132453078851926))) + ((var24) * (-0.0))) + ((var25) * (-0.3893079321588921))) + ((var26) * (-0.06738007627290196))) + ((var27) * (-0.1225075748937126))) + ((var28) * (-0.3893079321588921))) + ((var29) * (-0.29402231709614085))) + ((var30) * (-0.3893079321588921))) + ((var31) * (-0.0))) + ((var32) * (-0.028242141062729226))) + ((var12) * (0.16634667752431267))) + ((var13) * (0.047772685163074764))) + ((var14) * (0.3893079321588921))) + ((var15) * (0.3893079321588921))) + ((var16) * (0.0))) + ((var17) * (0.0))) + ((var18) * (0.3893079321588921))) + ((var19) * (0.3893079321588921)), ((((((((((((((((((((((((0.5566649875797668) + ((var20) * (-25.563066587228416))) + ((var21) * (-38.35628154976547))) + ((var22) * (-38.35628154976547))) + ((var23) * (-0.0))) + ((var24) * (-38.35628154976547))) + ((var25) * (-0.0))) + ((var26) * (-0.0))) + ((var27) * (-0.0))) + ((var28) * (-6.2260303727828745))) + ((var29) * (-18.42781911624364))) + ((var30) * (-0.14775026537286423))) + ((var31) * (-7.169755983020096))) + ((var32) * (-0.0))) + ((var1) * (12.612328267927264))) + ((var2) * (6.565812506955159))) + ((var3) * (0.0))) + ((var4) * (38.35628154976547))) + ((var5) * (0.0))) + ((var6) * (38.35628154976547))) + ((var7) * (0.0))) + ((var8) * (0.0))) + ((var9) * (38.35628154976547))) + ((var10) * (38.35628154976547))) + ((var11) * (0.0))]

And here are some timings:

%%timeit -n 10000
score([1, 2, 3, 4])

310 µs ± 658 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit -n 10000
score_raw([1, 2, 3, 4])

39.4 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Results seems to be identical:

np.testing.assert_allclose(score([1, 2, 3, 4]), score_raw([1, 2, 3, 4]))

Please share your thoughts about this refactoring.

The text was updated successfully, but these errors were encountered:

izeigerman · 2019-10-17T00:35:39Z

Hey @StrikerRUS! Thanks for creating this issue.

Besides using pow and exp we also perform vector additions and multiplications by scalar. This can be potentially replaced with manually written functions similar to ones we introduced in other languages. However I don't see much benefit of doing this, especially considering that numpy is basically a de facto standard in the Data Science and Machine Learning community.

UPD However I don't have strong arguments against using exp and pow from math package. If there is an evidence that they perform faster than numpy (and apparently there is) then we should consider using them instead.

StrikerRUS · 2019-10-17T00:53:18Z

@izeigerman Thanks a lot for your prompt response!

Besides using pow and exp we also perform vector additions and multiplications by scalar.

Is self.with_vectors indicator used for that? I'm not suggesting replacing vector operators with manually written functions. I just suggested dropping numpy for fully scalar operations.

However I don't see much benefit of doing this,

I guess ~8x speedup worth it!

especially considering that numpy is basically a de facto standard in the Data Science and Machine Learning community.

Sure, but generated runtime Python module can be used far away from data science community 🙂 .

izeigerman · 2019-10-17T02:27:14Z

Hey @StrikerRUS, sorry, didn't expect you to reply so quickly :) I've edited my previous message right when you posted your reply. I've read your post carefully one more time and came to a conclusion that what you're saying makes sense.

izeigerman · 2019-10-17T02:32:26Z

Great research btw 👍

StrikerRUS · 2019-10-17T02:47:38Z

@izeigerman No problem! I'm sorry too: I read fast, type slowly and GitHub doesn't provide live updates for editions. 😃
I'm glad you liked the idea!

izeigerman added good first issue Good for newcomers help wanted Extra attention is needed labels Oct 17, 2019

StrikerRUS mentioned this issue Oct 19, 2019

drop numpy dependency from runtime where possible #111

Merged

izeigerman closed this as completed in #111 Oct 25, 2019

StrikerRUS mentioned this issue Jan 23, 2020

remove scikit-learn and scipy from dependencies #155

Merged

StrikerRUS mentioned this issue May 27, 2020

use math.log for scalars #226

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

drop numpy dependency from Python code for cases without vectors #108

drop numpy dependency from Python code for cases without vectors #108

StrikerRUS commented Oct 16, 2019 •

edited

Loading

izeigerman commented Oct 17, 2019 •

edited

Loading

StrikerRUS commented Oct 17, 2019

izeigerman commented Oct 17, 2019 •

edited

Loading

izeigerman commented Oct 17, 2019

StrikerRUS commented Oct 17, 2019

drop numpy dependency from Python code for cases without vectors #108

drop numpy dependency from Python code for cases without vectors #108

Comments

StrikerRUS commented Oct 16, 2019 • edited Loading

izeigerman commented Oct 17, 2019 • edited Loading

StrikerRUS commented Oct 17, 2019

izeigerman commented Oct 17, 2019 • edited Loading

izeigerman commented Oct 17, 2019

StrikerRUS commented Oct 17, 2019

StrikerRUS commented Oct 16, 2019 •

edited

Loading

izeigerman commented Oct 17, 2019 •

edited

Loading

izeigerman commented Oct 17, 2019 •

edited

Loading