cache handler names #207

StrikerRUS · 2020-05-08T01:27:00Z

This method (now a function, to avoid overcomplicating the code with the handling of two decorators) is called very frequently, so the benefit of caching it is clear.

For example,

import sys

from sklearn.datasets import load_boston

import lightgbm as lgb
import m2cgen as m2c

X, y = load_boston(return_X_y=True)
est = lgb.LGBMRegressor(n_estimators=1000, random_state=42).fit(X, y)

sys.setrecursionlimit(int(1e5))
_ = m2c.export_to_python(est)

m2c.interpreters.utils._get_handler_name.cache_info()

CacheInfo(hits=97666, misses=5, maxsize=16, currsize=5)
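
The hit/miss pattern above can be reproduced without m2cgen. The following is a minimal sketch under stated assumptions, not the code from this PR: `handler_name`, `normalize_expr_name`, and the three expression classes are hypothetical stand-ins, and only `functools.lru_cache` and `cache_info()` are real library API.

```python
import re
from functools import lru_cache


def normalize_expr_name(name):
    # Hypothetical CamelCase -> snake_case conversion, e.g. "IfExpr" -> "if_expr".
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


@lru_cache(maxsize=16)
def handler_name(expr_type):
    # The cache is keyed by the class, so each expression type is normalized once.
    return "interpret_" + normalize_expr_name(expr_type.__name__)


# Hypothetical expression types standing in for real AST nodes.
class IfExpr:
    pass


class NumVal:
    pass


class BinNumExpr:
    pass


# Simulate an interpreter visiting many nodes of only a few distinct types.
for _ in range(1000):
    for tpe in (IfExpr, NumVal, BinNumExpr):
        handler_name(tpe)

info = handler_name.cache_info()
print(info)
```

With only a handful of distinct expression types, nearly every call is a cache hit, which is the same shape as the `CacheInfo` reported above.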

coveralls · 2020-05-08T02:44:33Z

Coverage increased (+0.01%) to 95.614% when pulling 647f3ec on cache into 7626a60 on master.

izeigerman

TBH, I'm not quite convinced that this computation contributes significantly to the runtime, but I agree that caching it is better practice than recomputing it every time. One comment for your consideration; LGTM otherwise.

izeigerman · 2020-05-08T15:54:16Z

m2cgen/interpreters/utils.py

@@ -7,3 +10,13 @@
 def get_file_content(path):
    with open(path) as f:
        return f.read()
+
+
+@lru_cache(maxsize=16)


Does it make sense to use a larger number here to accommodate future AST extensions? I'd even suggest making it unbounded if possible, since the total number of expressions in the AST will always be fairly limited. Otherwise, perhaps we can set it to something like 32.

I think that for the sake of efficiency it should be as small as possible. I even considered setting maxsize=8.
For future AST extensions, we can easily update the maxsize here when adding new expressions.

Yeah, my concern is that one would always have to keep this number in the back of their mind when adding a new expression. Why do you say it should be as small as possible, though? The cache is super cheap in this case, isn't it? It's as if we just had a map with 16 pointer-to-string pairs.

Yeah, my concern is that one would always have to keep this number in the back of their mind when adding a new expression.

Agreed! It is not very convenient. Going in that direction, we could simply replace these two functions with a manual dict.

But according to the source code, lru_cache is based on a linked list, not a hashmap:
https://github.com/python/cpython/blob/81a5fc38e81b424869f4710f48e9371dfa2d3b77/Lib/functools.py#L781

So I believe size matters here.
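
To make the maintenance concern in this thread concrete, here is a small sketch of what happens when `maxsize` falls below the number of distinct keys. It only compares hit and miss counts and makes no claim about lookup cost; `make_cached`, `name_of`, and `run` are hypothetical names used for illustration.

```python
from functools import lru_cache


def make_cached(maxsize):
    # Build a fresh cached function so each experiment has its own cache.
    @lru_cache(maxsize=maxsize)
    def name_of(key):
        return key.lower()

    return name_of


def run(cached):
    # Cycle through three distinct keys three times.
    for _ in range(3):
        for key in ("IfExpr", "NumVal", "BinNumExpr"):
            cached(key)
    return cached.cache_info()


too_small = run(make_cached(2))     # fewer slots than distinct keys
unbounded = run(make_cached(None))  # maxsize=None never evicts
print(too_small)
print(unbounded)
```

In the undersized case, cycling through the keys evicts each entry just before it is needed again, so every call misses. That is the risk of forgetting to raise `maxsize` when new expression types are added.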

OK, maybe I'm over-optimizing things. I've pushed new changes to this PR so that we won't have to worry about keeping maxsize up to date: it now has enough capacity automatically. Please let me know what you think.

Looks good, thanks 👍
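
One way to size the cache automatically is to derive `maxsize` from the number of expression classes when the module is imported. The sketch below shows that idea under stated assumptions and is not necessarily what this PR implements: `Expr`, its subclasses, `all_subclasses`, and `TOTAL_EXPRESSIONS` are hypothetical names.

```python
from functools import lru_cache


class Expr:
    """Hypothetical base class for all AST expressions."""


class IfExpr(Expr):
    pass


class NumVal(Expr):
    pass


class BinNumExpr(Expr):
    pass


def all_subclasses(cls):
    # Collect direct and indirect subclasses; a set avoids double counting.
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found


# Evaluated once at import time, after the expression classes are defined.
TOTAL_EXPRESSIONS = len(all_subclasses(Expr))


@lru_cache(maxsize=TOTAL_EXPRESSIONS)
def handler_name(expr_type):
    return "interpret_" + expr_type.__name__.lower()


print(handler_name.cache_info())
```

Adding a new `Expr` subclass then grows the cache capacity with no manual update, at the cost of requiring all expression classes to be defined before the decorated function.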

StrikerRUS · 2020-05-08T16:28:00Z

TBH, I'm not quite convinced that this computation contributes significantly to the runtime, ...

But it still contributes due to the number of calls 🙂

import sys

from sklearn.datasets import load_boston

import lightgbm as lgb
import m2cgen as m2c

X, y = load_boston(return_X_y=True)
est = lgb.LGBMRegressor(n_estimators=1000, random_state=42).fit(X, y)

sys.setrecursionlimit(int(1e5))

%%prun
_ = m2c.export_to_python(est)

ncalls
    for the number of calls.
tottime
    for the total time spent in the given function (and excluding time made in calls to sub-functions)
percall
    is the quotient of tottime divided by ncalls
cumtime
    is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.
percall
    is the quotient of cumtime divided by primitive calls
filename:lineno(function)
    provides the respective data of each function

         3787673 function calls (3503124 primitive calls) in 31.945 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    78574   27.871    0.000   27.871    0.000 code_generator.py:60(add_code_line)
   254781    0.687    0.000    1.285    0.000 {method 'sub' of 're.Pattern' objects}
        1    0.572    0.572    0.749    0.749 basic.py:2543(dump_model)
   197378    0.226    0.000    0.293    0.000 string.py:121(convert)
39268/1000    0.216    0.000    0.251    0.000 boosting.py:219(_assemble_tree)
   157110    0.193    0.000    0.775    0.000 string.py:107(substitute)
  97689/1    0.176    0.000   30.878   30.878 interpreter.py:24(_do_interpret)
        2    0.171    0.086    0.171    0.086 decoder.py:343(raw_decode)
   157110    0.144    0.000    0.919    0.000 code_generator.py:13(__call__)
    98671    0.135    0.000    0.175    0.000 sre_parse.py:1036(expand_template)
97689/5526    0.126    0.000   27.017    0.005 mixins.py:25(_pre_interpret_hook)
38268/2000    0.093    0.000   29.779    0.015 interpreter.py:109(handle_nested_expr)
    97671    0.092    0.000    1.176    0.000 interpreter.py:56(_select_handler)
   493427    0.089    0.000    0.089    0.000 {method 'group' of 're.Match' objects}
19134/1000    0.088    0.000   30.816    0.031 interpreter.py:103(interpret_if_expr)
    97671    0.079    0.000    0.118    0.000 re.py:271(_compile)
    97671    0.078    0.000    0.078    0.000 re.py:307(_subx)
    97671    0.067    0.000    0.967    0.000 interpreter.py:68(_normalize_expr_name)
   233660    0.066    0.000    0.066    0.000 {built-in method builtins.isinstance}

----->    97671    0.061    0.000    1.028    0.000 interpreter.py:63(_handler_name)  <-----

    97671    0.052    0.000    0.888    0.000 re.py:185(sub)
    98671    0.051    0.000    0.226    0.000 re.py:313(filter)
    19134    0.048    0.000    1.068    0.000 interpreter.py:125(interpret_comp_expr)
    97671    0.043    0.000    0.043    0.000 {built-in method builtins.hasattr}
   1000/1    0.042    0.000    0.049    0.049 utils.py:59(_inner)
    19134    0.040    0.000    9.265    0.000 code_generator.py:100(add_else_statement)
    19134    0.037    0.000    9.227    0.000 code_generator.py:96(add_if_statement)
    39269    0.037    0.000    0.312    0.000 code_generator.py:119(num_value)
    20152    0.032    0.000    9.785    0.000 code_generator.py:109(add_var_assignment)
    39269    0.031    0.000    0.343    0.000 interpreter.py:138(interpret_num_val)
    19134    0.023    0.000    0.090    0.000 code_generator.py:105(add_block_termination)
    20134    0.021    0.000    0.184    0.000 code_generator.py:116(infix_expression)
    19134    0.021    0.000    0.171    0.000 code_generator.py:122(array_index_access)
    38268    0.020    0.000    0.020    0.000 code_generator.py:53(decrease_indent)
    38269    0.019    0.000    0.019    0.000 code_generator.py:50(increase_indent)
    98674    0.018    0.000    0.018    0.000 {method 'join' of 'str' objects}
    19134    0.016    0.000    0.034    0.000 code_generator.py:134(_comp_op_overwrite)
    19134    0.015    0.000    0.186    0.000 interpreter.py:141(interpret_feature_ref)
157156/157152    0.015    0.000    0.015    0.000 {built-in method builtins.len}
    20134    0.014    0.000    0.019    0.000 types.py:164(__get__)
    97672    0.013    0.000    0.013    0.000 {built-in method builtins.getattr}
    97671    0.013    0.000    0.013    0.000 interpreter.py:21(_pre_interpret_hook)
    97671    0.012    0.000    0.012    0.000 {method 'lower' of 'str' objects}
        1    0.010    0.010   31.945   31.945 exporters.py:36(export_to_python)
    39269    0.009    0.000    0.009    0.000 ast.py:33(__init__)
    19134    0.009    0.000    0.009    0.000 ast.py:219(__init__)
    19134    0.009    0.000    0.009    0.000 ast.py:199(__init__)
        1    0.006    0.006   31.935   31.935 exporters.py:329(_export)
   1000/1    0.005    0.000   30.877   30.877 interpreter.py:132(interpret_bin_num_expr)
    20134    0.005    0.000    0.005    0.000 enum.py:628(value)
    19134    0.005    0.000    0.005    0.000 ast.py:15(__init__)
     1000    0.004    0.000    0.004    0.000 ast.py:104(__init__)
    19134    0.003    0.000    0.003    0.000 ast.py:190(from_str_op)
        1    0.003    0.003    0.003    0.003 {method 'decode' of 'bytes' objects}
     1018    0.003    0.000    0.008    0.000 code_generator.py:87(add_var_declaration)
        2    0.003    0.001    0.003    0.001 __init__.py:47(create_string_buffer)
     1000    0.002    0.000    0.006    0.000 utils.py:37(apply_bin_op)
     1018    0.001    0.000    0.001    0.000 code_generator.py:45(get_var_name)
        1    0.001    0.001    0.750    0.750 boosting.py:200(__init__)
     1018    0.000    0.000    0.000    0.000 code_generator.py:131(_get_var_declare_type)
        1    0.000    0.000    0.251    0.251 boosting.py:103(<listcomp>)
        1    0.000    0.000    0.000    0.000 boosting.py:202(<listcomp>)
     1003    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.300    0.300 boosting.py:40(_assemble_single_output)
        1    0.000    0.000    0.049    0.049 utils.py:53(apply_op_to_expressions)
     18/1    0.000    0.000   27.005   27.005 mixins.py:38(bin_depth_threshold_hook)
      3/1    0.000    0.000    0.000    0.000 sre_parse.py:475(_parse)
        1    0.000    0.000   31.945   31.945 {built-in method builtins.exec}
        2    0.000    0.000    0.171    0.086 decoder.py:332(decode)
        1    0.000    0.000   30.879   30.879 interpreter.py:24(interpret)
      4/1    0.000    0.000    0.000    0.000 sre_compile.py:71(_compile)
        4    0.000    0.000    0.000    0.000 {method 'match' of 're.Pattern' objects}
       18    0.000    0.000    0.000    0.000 sre_parse.py:233(__next)
        1    0.000    0.000    0.000    0.000 contextlib.py:81(__init__)
        2    0.000    0.000    0.171    0.086 __init__.py:299(loads)
        1    0.000    0.000    0.300    0.300 boosting.py:28(assemble)
        1    0.000    0.000    0.000    0.000 encoder.py:204(iterencode)
        1    0.000    0.000    0.000    0.000 __init__.py:381(__getitem__)
        1    0.000    0.000    0.000    0.000 sre_compile.py:276(_optimize_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:759(compile)
      4/2    0.000    0.000    0.000    0.000 sre_parse.py:174(getwidth)
        1    0.000    0.000    0.000    0.000 __init__.py:183(dumps)
        1    0.000    0.000    0.000    0.000 sre_parse.py:951(parse_template)
      3/1    0.000    0.000    0.000    0.000 sre_parse.py:417(_parse_sub)
        1    0.000    0.000    0.000    0.000 contextlib.py:116(__exit__)
        1    0.000    0.000    0.000    0.000 encoder.py:182(encode)
       15    0.000    0.000    0.000    0.000 sre_parse.py:164(__getitem__)
        1    0.000    0.000    0.000    0.000 boosting.py:14(__init__)
        1    0.000    0.000    0.000    0.000 boosting.py:94(__init__)
        1    0.000    0.000    0.000    0.000 sre_parse.py:919(parse)
        1    0.000    0.000    0.000    0.000 interpreter.py:18(__init__)
        1    0.000    0.000    0.000    0.000 sre_compile.py:536(_compile_info)
        1    0.000    0.000    0.000    0.000 __init__.py:121(_get_full_model_name)
        1    0.000    0.000    0.000    0.000 code_generator.py:21(add_function_def)
        3    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        1    0.000    0.000    0.000    0.000 boosting.py:209(_final_transform)
        1    0.000    0.000   31.945   31.945 <string>:1(<module>)
       42    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 sre_parse.py:224(__init__)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.next}
        1    0.000    0.000    0.000    0.000 enum.py:836(__and__)
        1    0.000    0.000    0.000    0.000 __init__.py:374(__getattr__)
        1    0.000    0.000    0.000    0.000 contextlib.py:237(helper)
        2    0.000    0.000    0.000    0.000 code_generator.py:40(reset_state)
       12    0.000    0.000    0.000    0.000 sre_parse.py:254(get)
        4    0.000    0.000    0.000    0.000 sre_parse.py:172(append)
       10    0.000    0.000    0.000    0.000 sre_parse.py:249(match)
        1    0.000    0.000    0.000    0.000 contextlib.py:107(__enter__)
        1    0.000    0.000    0.001    0.001 code_generator.py:84(add_return_statement)
        2    0.000    0.000    0.000    0.000 basic.py:37(_safe_call)
        7    0.000    0.000    0.000    0.000 sre_parse.py:286(tell)
        1    0.000    0.000    0.000    0.000 encoder.py:104(__init__)
        1    0.000    0.000    0.000    0.000 interpreter.py:98(__init__)
        7    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        2    0.000    0.000    0.000    0.000 enum.py:284(__call__)
        4    0.000    0.000    0.000    0.000 sre_parse.py:111(__init__)
        1    0.000    0.000    0.000    0.000 re.py:297(_compile_repl)
        1    0.000    0.000    0.000    0.000 sre_parse.py:960(addgroup)
        1    0.000    0.000    0.000    0.000 interpreter.py:75(__init__)
        2    0.000    0.000    0.000    0.000 enum.py:526(__new__)
        1    0.000    0.000    0.000    0.000 sre_parse.py:84(opengroup)
        1    0.000    0.000    0.000    0.000 sre_parse.py:408(_uniq)
        1    0.000    0.000    0.000    0.000 __init__.py:127(get_assembler_cls)
        1    0.000    0.000    0.251    0.251 boosting.py:99(_assemble_estimators)
        1    0.000    0.000    0.000    0.000 {built-in method _sre.compile}
        2    0.000    0.000    0.000    0.000 {built-in method _ctypes.addressof}
        1    0.000    0.000    0.000    0.000 sre_compile.py:598(_code)
        4    0.000    0.000    0.000    0.000 {method 'end' of 're.Match' objects}
        6    0.000    0.000    0.000    0.000 sre_parse.py:160(__len__)
        2    0.000    0.000    0.000    0.000 {built-in method _ctypes.byref}
        1    0.000    0.000    0.000    0.000 sre_parse.py:76(__init__)
        3    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 sre_compile.py:461(_get_literal_prefix)
        1    0.000    0.000    0.000    0.000 code_generator.py:36(__init__)
        1    0.000    0.000    0.000    0.000 sre_compile.py:249(_compile_charset)
        1    0.000    0.000    0.000    0.000 sre_parse.py:903(fix_flags)
        2    0.000    0.000    0.000    0.000 code_generator.py:28(function_definition)
        2    0.000    0.000    0.000    0.000 sre_compile.py:595(isstring)
        1    0.000    0.000    0.000    0.000 sre_compile.py:423(_simple)
        1    0.000    0.000    0.000    0.000 sre_parse.py:96(closegroup)
        1    0.000    0.000    0.000    0.000 interpreter.py:53(_reset_reused_expr_cache)
        1    0.000    0.000    0.000    0.000 sre_compile.py:492(_get_charset_prefix)
        4    0.000    0.000    0.000    0.000 sre_parse.py:81(groups)
        1    0.000    0.000    0.000    0.000 base.py:3(__init__)
        1    0.000    0.000    0.000    0.000 sre_compile.py:65(_combine_flags)
        1    0.000    0.000    0.000    0.000 sre_parse.py:168(__setitem__)
        1    0.000    0.000    0.000    0.000 sklearn.py:711(booster_)
        1    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    0.000    0.000 interpreter.py:12(__init__)
        2    0.000    0.000    0.000    0.000 sre_compile.py:453(_get_iscased)
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 boosting.py:85(_final_transform)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
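
The `%%prun` magic used above is specific to IPython/Jupyter. A comparable report can be collected in a plain script with the standard `cProfile` and `pstats` modules. This is a minimal sketch in which `work` is a hypothetical stand-in for the `m2c.export_to_python(est)` call:

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in workload; replace with the call being profiled.
    return sum(len(str(i)) for i in range(10000))


profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Sort by internal time ("tottime"), the same ordering as the listing above.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print(report)
```

The `ncalls`, `tottime`, and `cumtime` columns in the printed report have the same meaning as in the listing above.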

izeigerman · 2020-05-08T16:36:37Z

This is a pretty deep analysis, well done 👍

cache handler names

486fa1d

izeigerman approved these changes May 8, 2020

compute maxsize for cache at runtime

647f3ec

StrikerRUS mentioned this pull request May 9, 2020

use string buffer to store generated code #211

Merged

izeigerman merged commit f02feaf into master May 11, 2020

izeigerman deleted the cache branch May 11, 2020 14:45

izeigerman restored the cache branch May 11, 2020 14:45

izeigerman deleted the cache branch May 11, 2020 14:45
