New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use string buffer to store generated code #211
Conversation
from string import Template | ||
from weakref import finalize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._finalize_buffer() | ||
self._code_buf = StringIO() | ||
self._code = None | ||
self._finalizer = finalize(self, self._finalize_buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, yet another Python TIL for me!
Oh, wow. This is a dramatic improvement indeed! Fantastic job 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few points. Great improvement 🚀
if not self._code_buf.closed: | ||
self._code = self._code_buf.getvalue() | ||
self._finalize_buffer() | ||
return self._code if self._code is not None else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Minor] return self._code if self._code else ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always prefer to check for None
explicitly according to the recommendation from PEP.
self._code = None |
Also, beware of writing if x when you really mean if x is not None -- e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
If you only used if key: here, then an argument which evaluated to false would not be considered. Explicitly comparing with is None is the correct idiom to make this check. See Truth Value Testing.
https://stackoverflow.com/a/17117361
self._finalizer = finalize(self, self._finalize_buffer) | ||
|
||
def _finalize_buffer(self): | ||
if self._code_buf is not None and not self._code_buf.closed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Minor] if self._code_buf and not self._code_buf.closed:
"Call reset_state() to allocate new buffer.") | ||
|
||
def get_generated_code(self): | ||
if not self._code_buf.closed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we do the if self._code
here as well (similarly to _finalize_buffer
check)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. We call reset_state()
in __init__()
where _code_buf
is set to StringIO()
. So we have a guarantee here that _code_buf
has been already created (not None
from __init__()
).
@@ -46,24 +69,34 @@ def decrease_indent(self): | |||
# All code modifications should be implemented via following methods. | |||
|
|||
def add_code_line(self, line): | |||
self._check_buf_closed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The invocation of this method all over the place looks like boilerplate. Shall we create a method called _append_to_code_buf
and do both checking and writing there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I'll do it right now.
|
||
def prepend_code_line(self, line): | ||
self.code = line + "\n" + self.code | ||
self._check_buf_closed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content of this method now looks identical to the prepend_code_lines
. Can this method be expressed in terms of the other one? Eg. self.prepend_code_lines([line])
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
"closing the underlying buffer!\n" | ||
"Call reset_state() to allocate new buffer.") | ||
|
||
def get_generated_code(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about the method name which claims that it queries the state but which in fact modifies it (irreversibly). I suggest we be more explicit and call it something like finalize_generator
or get_code_and_close
or finalize_and_return_code
, whichever you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said we don't necessarily have to support repeated invocations for this method and throw an exception on subsequent method calls. Please correct me if I'm wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely agree for changing the name!
That being said we don't necessarily have to support repeated invocations for this method and throw an exception on subsequent method calls. Please correct me if I'm wrong here.
That's true but only for now. Maybe in the future we will need something like
if cg.get_generated_code() != "":
return cg.get_generated_code()
else:
...
Of course, we can save the result from the first call into a variable, but I don't see any reasons to eliminate the opportunity to return the same value from this method in its subsequent calls. I believe it is in the consistency of functional programming paradigm 🙂 .
self._finalize_buffer() | ||
self._code_buf = StringIO() | ||
self._code = None | ||
self._finalizer = finalize(self, self._finalize_buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, yet another Python TIL for me!
Thanks a lot for the thoughtful review! I addressed review comments in the two latest commits and left some replies. Please give it another round of review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for addressing comments 👍
@StrikerRUS does this PR require any additional updates considering that the Ruby PR has been merged? |
Yeah, definitely! Just pushed the needed changes. |
While changes in #207 are arguably giving significant improvement, the speedup achieved by these updates can be seen on transpiling medium-size models without any profiling tools.
During skimming the report from #207 (comment), I noticed that
add_code_line()
takes a huge amount of time (also refer to #209 for another strings-related improvement).Here is a simple bench.
And the results are the following: