Binary Reader Performance Refactor (#267)
* Binary Reader Performance Refactor

This commit refactors the pure Python (non-extension) binary reader to
improve performance. My testing shows it to be roughly 2.5x faster for
the streaming event API. simpleion.loads has additional processing
overhead, so the gains there are smaller.

It ditches the coroutine dispatching overhead within the reader and
replaces the stateful buffer with an immutable buffer that uses
memoryviews to decrease memory allocations. The immutability of the
buffer is less important for the binary reader but will be important
for the text reader where we often need to look ahead.
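
As a rough illustration of the immutable-buffer idea, here is a minimal
sketch, not the code in this commit. `ImmutableBuffer` and its `read`
method are hypothetical names. It only shows that slicing a memoryview
gives a new view over the same bytes without copying, and that a read
returns a new buffer instead of mutating the old one, which makes
look-ahead cheap.

```python
from typing import NamedTuple, Tuple


class ImmutableBuffer(NamedTuple):
    """Hypothetical immutable buffer wrapping a memoryview (illustration only)."""
    view: memoryview

    @classmethod
    def of(cls, data: bytes) -> "ImmutableBuffer":
        return cls(memoryview(data))

    def read(self, n: int) -> Tuple[memoryview, "ImmutableBuffer"]:
        """Return the first n bytes and a NEW buffer for the remainder."""
        if n > len(self.view):
            raise ValueError("not enough data")
        # Slicing a memoryview does not copy the underlying bytes.
        return self.view[:n], ImmutableBuffer(self.view[n:])


# Ion binary version marker followed by two more bytes.
buf = ImmutableBuffer.of(b"\xe0\x01\x00\xea\x21\x05")
marker, rest = buf.read(4)
peeked, _ = rest.read(1)    # look ahead; `rest` itself is not consumed
again, tail = rest.read(2)  # so the same bytes can be read again
```

Because `read` never mutates, a caller that looks ahead and then backs
off simply keeps using the buffer it already had.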

It also changes IonThunkEvent so that the value is materialized on
first access and cached, which should have a positive impact on the
text reader as well. Applying this pattern to the text reader would be
more impactful still.
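
The caching behavior can be sketched in isolation as follows. This is
an assumption-laden toy, not the library code: `Event` and `ThunkEvent`
are hypothetical stand-ins for `IonEvent` and `IonThunkEvent`, reduced
to three fields so that the thunk sits in slot 2 as it does in the diff.

```python
from typing import Any, NamedTuple


class Event(NamedTuple):
    """Hypothetical stand-in for IonEvent with only three fields."""
    event_type: str
    ion_type: str
    value: Any = None


class ThunkEvent(Event):
    """Event whose slot 2 holds a zero-argument callable (a thunk)."""

    @property
    def value(self):
        # The property masks the tuple field, so read the raw slot via self[2].
        if hasattr(self, "cached_value"):
            return self.cached_value
        self.cached_value = self[2]()
        return self.cached_value


calls = []


def decode():
    calls.append(1)  # record each time the thunk actually runs
    return 42


event = ThunkEvent("SCALAR", "INT", decode)
raw = event[2]        # the callable itself; nothing is decoded yet
first = event.value   # runs decode() once
second = event.value  # served from the cached attribute
```

The subclass does not declare `__slots__`, so instances get a
`__dict__` and can hold the cached attribute even though the tuple
itself is immutable.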

I left the managed_reader unchanged, though de-coroutining it and
minimizing managed event construction could yield further gains for
both text and binary.
rmarrowstone committed May 25, 2023
1 parent 8cc4e16 commit 2694fb3
Showing 6 changed files with 646 additions and 412 deletions.
28 changes: 13 additions & 15 deletions amazon/ion/core.py
@@ -11,7 +11,6 @@
# OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the
# License.

from enum import IntEnum
from typing import NamedTuple, Optional, Any, Union, Sequence, Coroutine

@@ -272,21 +271,20 @@ def __repr__(self):


 class IonThunkEvent(IonEvent):
-    """An :class:`IonEvent` whose ``value`` field is a thunk."""
-    def __new__(cls, *args, **kwargs):
-        if len(args) >= 3:
-            args = list(args)
-            args[2] = MemoizingThunk(args[2])
-        else:
-            value = kwargs.get('value')
-            if value is not None:
-                kwargs['value'] = MemoizingThunk(kwargs['value'])
-        return super(IonThunkEvent, cls).__new__(cls, *args, **kwargs)
+    """
+    A lazy `IonEvent` whose ``value`` field is a thunk.
+    The `value` will be materialized on first access and cached.
+    Accessing the value by its slot: ``event[2]`` will avoid materialization.
+    """

     @property
     def value(self):
         # We're masking the value field, this gets around that.
-        return self[2]()
+        if hasattr(self, 'cached_value'):
+            return self.cached_value
+        self.cached_value = self[2]()
+        return self.cached_value

# Singletons for structural events
ION_STREAM_END_EVENT = IonEvent(IonEventType.STREAM_END)
@@ -314,11 +312,11 @@ class Transition(NamedTuple):
     This is generally used as a result of a state-machine.
     Args:
-        event (Optional[DataEvent]): The event associated with the transition.
+        event (Union[DataEvent, IonEvent, None]): The event associated with the transition.
         delegate (Coroutine): The co-routine delegate which can be the same routine from
             whence this transition came.
     """
-    event: Optional[DataEvent]
+    event: Union[DataEvent, IonEvent, None]
     delegate: Coroutine

