use string buffer to store generated code #211

StrikerRUS · 2020-05-09T01:29:59Z

While the changes in #207 arguably give a significant improvement, the speedup achieved by this PR can be seen when transpiling medium-size models even without any profiling tools.

While skimming the report from #207 (comment), I noticed that add_code_line() takes a huge amount of time (see also #209 for another string-related improvement).

Here is a simple benchmark.

import sys

from sklearn.datasets import load_boston

import lightgbm as lgb
import m2cgen as m2c

X, y = load_boston(return_X_y=True)
est = lgb.LGBMRegressor(n_estimators=1000, random_state=42).fit(X, y)

sys.setrecursionlimit(int(1e5))

# Run in a separate IPython/Jupyter cell (%%timeit is a cell magic):
%%timeit -n7 -r5
_ = m2c.export_to_python(est)

And the results are the following (first without this change, then with it):

30.6 s ± 16.9 ms per loop (mean ± std. dev. of 5 runs, 7 loops each)

2.41 s ± 4.84 ms per loop (mean ± std. dev. of 5 runs, 7 loops each)
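
The gap between the two timings is consistent with how repeated string concatenation behaves: each `code = code + line` can copy the whole accumulated string, while writes to an `io.StringIO` buffer append in place. A minimal sketch of the two approaches follows; the helper names `build_with_concat` and `build_with_buffer` are hypothetical and not part of m2cgen.

```python
from io import StringIO


def build_with_concat(lines):
    # Rebuilds the accumulated string on every iteration.
    code = ""
    for line in lines:
        code = code + line + "\n"
    return code


def build_with_buffer(lines):
    # Appends to an in-memory text buffer and reads it back once.
    buf = StringIO()
    for line in lines:
        buf.write(line + "\n")
    return buf.getvalue()


lines = [f"var{i} = {i}" for i in range(1000)]
# Both strategies produce the same text; only the cost differs.
assert build_with_concat(lines) == build_with_buffer(lines)
```

Timing the two functions on a large list of lines is one way to check whether the difference matters for a given workload.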

StrikerRUS · 2020-05-09T01:30:28Z

m2cgen/interpreters/code_generator.py

 from string import Template
+from weakref import finalize


https://docs.python.org/3/library/weakref.html#comparing-finalizers-with-del-methods
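
For readers unfamiliar with `weakref.finalize`, here is a minimal sketch of using it to close a buffer when its owner is collected. `BufferOwner` is a hypothetical class for illustration, not m2cgen's CodeGenerator.

```python
import gc
from io import StringIO
from weakref import finalize


class BufferOwner:
    """Hypothetical class used only to illustrate weakref.finalize."""

    def __init__(self):
        self.buf = StringIO()
        # Register the buffer's own close method. It is bound to the
        # buffer, not to self, so it does not keep this object alive.
        self._finalizer = finalize(self, self.buf.close)


owner = BufferOwner()
buf = owner.buf          # keep a handle so the buffer can be inspected
del owner                # drop the last strong reference to the owner
gc.collect()             # make collection explicit for the sketch
print(buf.closed)
```

The weakref documentation cautions that the callback should not hold a reference to the tracked object, and in particular should not be one of its bound methods, because the object would then never be garbage collected and the finalizer would run only at interpreter exit.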

StrikerRUS · 2020-05-09T01:31:01Z

m2cgen/interpreters/code_generator.py

+        self._finalize_buffer()
+        self._code_buf = StringIO()
+        self._code = None
+        self._finalizer = finalize(self, self._finalize_buffer)


Great post! https://forum.omz-software.com/topic/4473/python-question-class-cleanup-file-handles-etc/7

coveralls · 2020-05-09T04:26:42Z

Coverage decreased (-0.05%) to 95.479% when pulling fe069e7 on stringio into 3db1cc2 on master.

izeigerman · 2020-05-12T16:46:15Z

Oh, wow. This is a dramatic improvement indeed! Fantastic job 👍

izeigerman

Left a few points. Great improvement 🚀

izeigerman · 2020-05-12T16:49:28Z

m2cgen/interpreters/code_generator.py

+        if not self._code_buf.closed:
+            self._code = self._code_buf.getvalue()
+            self._finalize_buffer()
+        return self._code if self._code is not None else ""


[Minor] return self._code if self._code else ""

I always prefer to check for None explicitly according to the recommendation from PEP.

m2cgen/m2cgen/interpreters/code_generator.py

Line 41 in 2c2b174

self._code = None

Also, beware of writing if x when you really mean if x is not None -- e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
https://www.python.org/dev/peps/pep-0008/#programming-recommendations

If you only used if key: here, then an argument which evaluated to false would not be considered. Explicitly comparing with is None is the correct idiom to make this check. See Truth Value Testing.
https://stackoverflow.com/a/17117361

https://stackoverflow.com/a/7816439
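
The difference between the two checks can be sketched with a hypothetical function `describe` (not from the PR) whose argument defaults to None:

```python
def describe(value=None):
    # Truthiness check: treats every falsy value as "not provided".
    by_truthiness = "set" if value else "missing"
    # Identity check: only None counts as "not provided".
    by_identity = "set" if value is not None else "missing"
    return by_truthiness, by_identity


print(describe())      # no argument at all
print(describe([]))    # an empty list was passed explicitly
print(describe("x"))   # a truthy value
```

For the line under review, both spellings return the same string, because the only falsy str value is the empty string itself, so the choice there is about style and intent rather than behavior.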

izeigerman · 2020-05-12T16:49:44Z

m2cgen/interpreters/code_generator.py

+        self._finalizer = finalize(self, self._finalize_buffer)
+
+    def _finalize_buffer(self):
+        if self._code_buf is not None and not self._code_buf.closed:


[Minor] if self._code_buf and not self._code_buf.closed:

izeigerman · 2020-05-12T16:50:25Z

m2cgen/interpreters/code_generator.py

+                "Call reset_state() to allocate new buffer.")
+
+    def get_generated_code(self):
+        if not self._code_buf.closed:


Shall we do the if self._code here as well (similarly to _finalize_buffer check)?

Nope. We call reset_state() in __init__(), where _code_buf is set to StringIO(). So we have a guarantee here that _code_buf has already been created (it is not None after __init__()).

izeigerman · 2020-05-12T16:52:13Z

m2cgen/interpreters/code_generator.py

@@ -46,24 +69,34 @@ def decrease_indent(self):
    # All code modifications should be implemented via following methods.

    def add_code_line(self, line):
+        self._check_buf_closed()


The invocation of this method all over the place looks like boilerplate. Shall we create a method called _append_to_code_buf and do both checking and writing there?

Sounds good! I'll do it right now.
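
A minimal sketch of the helper suggested above, assuming a single private method does both the closed-buffer check and the write. The class name `CodeBufferSketch`, the exception type, and the error message are assumptions for illustration; indentation handling is left out.

```python
from io import StringIO


class CodeBufferSketch:
    """Hypothetical stand-in showing only the buffer-writing logic."""

    def __init__(self):
        self._code_buf = StringIO()

    def _check_buf_closed(self):
        # Exception type and wording are assumptions for this sketch.
        if self._code_buf.closed:
            raise RuntimeError("Buffer is closed; allocate a new one first.")

    def _append_to_code_buf(self, text):
        # Single place that both validates the buffer and writes to it.
        self._check_buf_closed()
        self._code_buf.write(text)

    def add_code_line(self, line):
        self._append_to_code_buf(line + "\n")

    def add_code_lines(self, lines):
        self._append_to_code_buf("\n".join(lines) + "\n")


gen = CodeBufferSketch()
gen.add_code_line("x = 1")
gen.add_code_lines(["y = 2", "z = x + y"])
print(gen._code_buf.getvalue())
```

With this shape, public methods no longer need to call the check themselves.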

izeigerman · 2020-05-12T16:53:47Z

m2cgen/interpreters/code_generator.py


    def prepend_code_line(self, line):
-        self.code = line + "\n" + self.code
+        self._check_buf_closed()


The content of this method now looks identical to prepend_code_lines. Can this method be expressed in terms of the other one? E.g., self.prepend_code_lines([line])?

Nice catch!
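
One way the delegation could look with a StringIO-backed buffer is sketched below. The class `PrependSketch` and the rewrite-in-place strategy are assumptions for illustration, not the code merged in this PR.

```python
from io import StringIO


class PrependSketch:
    """Hypothetical class showing prepend on top of a StringIO buffer."""

    def __init__(self):
        self._code_buf = StringIO()

    def add_code_line(self, line):
        self._code_buf.write(line + "\n")

    def prepend_code_lines(self, lines):
        # Read what has been written so far, empty the buffer, then
        # write the new lines followed by the previous contents.
        existing = self._code_buf.getvalue()
        self._code_buf.seek(0)
        self._code_buf.truncate()
        self._code_buf.write("\n".join(lines) + "\n" + existing)

    def prepend_code_line(self, line):
        # The single-line variant simply delegates to the list variant.
        self.prepend_code_lines([line])


gen = PrependSketch()
gen.add_code_line("return x")
gen.prepend_code_line("x = 1")
print(gen._code_buf.getvalue())
```

Prepending still copies the existing contents, so it stays comparatively expensive; the buffer mainly benefits the append path.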

izeigerman · 2020-05-12T16:59:35Z

m2cgen/interpreters/code_generator.py

+                "closing the underlying buffer!\n"
+                "Call reset_state() to allocate new buffer.")
+
+    def get_generated_code(self):


I'm a bit concerned about the method name which claims that it queries the state but which in fact modifies it (irreversibly). I suggest we be more explicit and call it something like finalize_generator or get_code_and_close or finalize_and_return_code, whichever you like.

That being said we don't necessarily have to support repeated invocations for this method and throw an exception on subsequent method calls. Please correct me if I'm wrong here.

Absolutely agree for changing the name!

That being said we don't necessarily have to support repeated invocations for this method and throw an exception on subsequent method calls. Please correct me if I'm wrong here.

That's true but only for now. Maybe in the future we will need something like

if cg.get_generated_code() != "":
    return cg.get_generated_code()
else:
    ...

Of course, we can save the result of the first call in a variable, but I don't see any reason to remove the ability to return the same value from this method on subsequent calls. I believe this is consistent with the functional programming paradigm 🙂 .
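
The behavior discussed in this thread, a method that closes the buffer on the first call and returns the same cached string on later calls, can be sketched as follows. The class `FinalizingSketch` and the method name `finalize_and_get_code` are hypothetical; the body mirrors the get_generated_code hunk shown in this review.

```python
from io import StringIO


class FinalizingSketch:
    """Hypothetical class illustrating a repeatable finalize-and-get."""

    def __init__(self):
        self._code_buf = StringIO()
        self._code = None

    def write(self, text):
        self._code_buf.write(text)

    def finalize_and_get_code(self):
        # First call: snapshot the buffer contents and close it.
        if not self._code_buf.closed:
            self._code = self._code_buf.getvalue()
            self._code_buf.close()
        # Later calls: return the cached snapshot.
        return self._code if self._code is not None else ""


gen = FinalizingSketch()
gen.write("x = 1\n")
first = gen.finalize_and_get_code()
second = gen.finalize_and_get_code()
assert first == second
```

The explicit name signals that the call changes state, while the cached value keeps repeated calls safe.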

izeigerman · 2020-05-12T17:01:08Z

m2cgen/interpreters/code_generator.py

+        self._finalize_buffer()
+        self._code_buf = StringIO()
+        self._code = None
+        self._finalizer = finalize(self, self._finalize_buffer)


Nice, yet another Python TIL for me!

StrikerRUS · 2020-05-12T22:54:07Z

Thanks a lot for the thoughtful review! I addressed review comments in the two latest commits and left some replies. Please give it another round of review.

izeigerman

Looks great, thanks for addressing comments 👍

izeigerman · 2020-05-13T16:11:42Z

@StrikerRUS does this PR require any additional updates considering that the Ruby PR has been merged?

StrikerRUS · 2020-05-13T20:51:51Z

does this PR require any additional updates considering that the Ruby PR has been merged?

Yeah, definitely! Just pushed the needed changes.

use string buffer to store generated code

2c2b174

StrikerRUS commented May 9, 2020

izeigerman requested changes May 12, 2020

StrikerRUS added 3 commits May 13, 2020 00:11

Merge branch 'master' into stringio

0fb2771

better name for get_code method

24d7c9f

refactor methods for writing lines according to review comments

955321b

izeigerman approved these changes May 13, 2020

StrikerRUS added 2 commits May 13, 2020 23:49

Merge branch 'master' into stringio

c90e3ac

use new API of CodeGenerator

ffd5f0a

Merge branch 'master' into stringio

fe069e7

izeigerman merged commit 182a53e into master May 19, 2020

izeigerman deleted the stringio branch May 19, 2020 17:12
