
Adds ReaderTokenStream #541

Merged: 5 commits merged into main from tokens-reader-stream on May 15, 2023
Conversation

@almann (Contributor) commented May 1, 2023

`ReaderTokenStream` adapts any user-level `IonReader` into a `TokenStream`. Ideally, this would work over any abstraction of the `IonReader` (raw/system), but for now we only support `StreamItem`- and `Symbol`-based readers.

Prerequisite for #520.


Note that this is dependent on #540 and is currently based off of that branch; when/if that is merged and this is approved, I will change the base to main.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

This is the basis for `TokenStream` and the implementations bridging to
and from `IonReader`.  The `TokenStream` is a minimal interface that
returns (potentially) lazy `Token` instances instead of exposing
`read_XXX` methods.  This allows us to compose streams of Ion values
more easily because we now have an iterator-like interface on which we
can model the computation of an Ion data stream.  Unlike an iterator,
`TokenStream` takes an `Instruction` as input, which allows a user to
control "skipping" to the end of a container, the analog of `step_in`
and `step_out` in the `IonReader` trait.

`Token` values have annotations and a field name (which could be
lazily evaluated).  They contain a `Content` which is either the
start/end of a container, a scalar value, a typed null, or the end of
a stream.  These values rely on the `thunk` module to defer the
computation of the value.

The `token` module also provides various specific subdomains of
`IonType` in `ContainerType` and `ScalarType` to avoid contexts where
an inapplicable `IonType` would otherwise be possible (e.g., in a
start/end container token).
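
To make the iterator-plus-`Instruction` model concrete, here is a minimal, self-contained sketch of how a consumer could drive such a stream. The trait, enum variants, and helper below are illustrative stand-ins, not the exact API introduced in this PR.

```rust
// Illustrative stand-ins only; the real `TokenStream`, `Instruction`, and
// `Content` in this PR differ in detail (lazy values, lifetimes, error types).
enum Instruction {
    Next,    // advance to the next token
    NextEnd, // skip to the end of the current container (the step_out analog)
}

enum Content {
    StartContainer,
    EndContainer,
    Scalar,
    Null,
    EndStream,
}

struct Token {
    content: Content,
}

trait TokenStream {
    fn next_token(&mut self, instruction: Instruction) -> Result<Token, String>;
}

// Iterator-like consumption: the caller drives the stream and can choose to
// skip a container's children instead of stepping into them.
fn count_top_level_values<S: TokenStream>(stream: &mut S) -> Result<usize, String> {
    let mut count = 0;
    loop {
        match stream.next_token(Instruction::Next)?.content {
            Content::StartContainer => {
                count += 1;
                // Jump straight past the container's contents.
                stream.next_token(Instruction::NextEnd)?;
            }
            Content::Scalar | Content::Null => count += 1,
            Content::EndContainer => {}
            Content::EndStream => return Ok(count),
        }
    }
}
```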

Also updates `rstest` and adds `rstest_reuse` for reusing table inputs
for unit tests.

Prerequisite for amazon-ion#520.
codecov bot commented May 1, 2023

Codecov Report

Patch coverage: 81.72% and project coverage change: +0.34% 🎉

Comparison is base (ddc0acf) 82.58% compared to head (0408bbd) 82.92%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #541      +/-   ##
==========================================
+ Coverage   82.58%   82.92%   +0.34%     
==========================================
  Files          84       85       +1     
  Lines       16018    16297     +279     
  Branches    16018    16297     +279     
==========================================
+ Hits        13228    13514     +286     
+ Misses       1617     1576      -41     
- Partials     1173     1207      +34     
Impacted Files Coverage Δ
src/tokens/mod.rs 84.29% <ø> (+18.18%) ⬆️
src/tokens/reader_stream.rs 81.72% <81.72%> (ø)

... and 6 files with indirect coverage changes


@almann almann mentioned this pull request May 1, 2023
@almann almann added the enhancement New feature or request label May 1, 2023
@almann almann (Contributor, Author) left a comment

🗺️ PR tour below.

Comment on lines +105 to +107
/// These generate the methods that produce lazy content for our stream against the underlying
/// reader. There is a lot of boilerplate and variants so this avoids copy/paste errors.
macro_rules! scalar_content_method {

🗺️ It makes me sad to do this, but it does eliminate a lot of boilerplate for generating the `Content` instances against the `Rc<RefCell<...>>`.
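
For illustration only, a self-contained sketch of the pattern (not the macro from this PR): each invocation stamps out one deferred-content method over the shared reader cell, so the near-identical bodies cannot drift apart through copy/paste.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical reader stand-in; the real code defers to the underlying IonReader.
struct Reader;
impl Reader {
    fn read_i64(&mut self) -> i64 { 42 }
    fn read_bool(&mut self) -> bool { true }
}

struct Stream {
    reader_cell: Rc<RefCell<Reader>>,
}

// One macro invocation per scalar kind generates a method returning a closure
// that lazily reads from the shared reader cell when (and if) it is evaluated.
macro_rules! scalar_content_method {
    ($name:ident, $read:ident, $ty:ty) => {
        impl Stream {
            fn $name(&self) -> impl FnOnce() -> $ty {
                let cell = Rc::clone(&self.reader_cell);
                move || cell.borrow_mut().$read()
            }
        }
    };
}

scalar_content_method!(int_content, read_i64, i64);
scalar_content_method!(bool_content, read_bool, bool);
```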

R: IonReader<Item = StreamItem, Symbol = Symbol> + 'a,
{
// XXX this is so we can have multiple closures to lazily evaluate tokens
reader_cell: Rc<RefCell<UnderlyingReader<R>>>,

🗺️ This is necessary because we need to maintain three closures (annotations/field name/content), and the borrow checker statically allows only a single exclusive borrow to be captured across closures. We use this shared pointer to give each closure what it needs to defer its calls to the underlying reader.
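
A stripped-down sketch of the idea, using a hypothetical reader stand-in: each closure clones the `Rc` handle and borrows the shared cell only at the moment it runs, so no single closure needs to capture an exclusive borrow of the reader for its whole lifetime.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for the wrapped reader state.
struct Reader {
    position: usize,
}

fn main() {
    let cell = Rc::new(RefCell::new(Reader { position: 0 }));

    // Capturing `&mut reader` in one closure would make a second closure
    // impossible; cloning the shared handle sidesteps that.
    let annotations_cell = Rc::clone(&cell);
    let annotations = move || format!("annotations at {}", annotations_cell.borrow().position);

    let content_cell = Rc::clone(&cell);
    let content = move || {
        let mut reader = content_cell.borrow_mut();
        reader.position += 1; // deferred work happens only when called
        format!("content at {}", reader.position)
    };

    println!("{}", annotations());
    println!("{}", content());
}
```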

Comment on lines +20 to +25
/// Internal tracking state of the returned token. This is necessary because the lifetime of the
/// token returned from the stream is bound to the stream itself.
///
/// This is essentially strongly typed [`Option`] with convenience to operate within the
/// context of a [`ReaderTokenStream`]
enum UnderlyingReader<R>

🗺️ The lifetime of the token returned from `next_token` is not bound to any lifetime we can statically track. When adapting `IonReader`, the calls to `next`/`read_XXX` are not associated with any lexical context that could serve as a single lifetime for our `Token`. This lets us make sure the `Reader` we "gave" to a `Token` can be invalidated when the `Reader` "moves on", even while the caller may still be holding that `Token`.

Comment on lines +158 to +168
#[inline]
fn invalidate_token(&mut self) {
    // only invalidate if we have something deferred
    if self.pending_token.is_deferred() {
        let underlying = {
            let mut original = self.reader_cell.borrow_mut();
            original.invalidate()
        };
        self.reader_cell = Rc::new(RefCell::new(underlying));
    }
}

🗺️ This lets us "garbage collect" outstanding tokens; such tokens will just error when trying to evaluate any deferred value within them.
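
A condensed sketch of the trick (type names are placeholders, not the PR's exact types): the live state is moved into a brand-new cell, so any outstanding `Rc` clones held by old tokens keep pointing at a cell that now holds an invalidated marker, and their deferred closures fail instead of reading a reader that has since moved on.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Placeholder for the real UnderlyingReader<R> state.
enum Underlying {
    Active(u32), // stands in for the wrapped reader
    Invalidated,
}

struct Stream {
    cell: Rc<RefCell<Underlying>>,
}

impl Stream {
    // Move the live reader into a fresh cell. Tokens still holding clones of
    // the old Rc now observe `Invalidated` and error when their deferred
    // closures try to evaluate.
    fn invalidate_outstanding(&mut self) {
        let live = std::mem::replace(&mut *self.cell.borrow_mut(), Underlying::Invalidated);
        self.cell = Rc::new(RefCell::new(live));
    }
}
```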

where
R: IonReader<Item = StreamItem, Symbol = Symbol> + 'a,
{
fn next_token(&mut self, instruction: Instruction) -> IonResult<Token<'a>> {

🗺️ Originally this was tied to the borrow of `&mut self`. The problem with that is that you must materialize the token if you want to implement `IonReader` over `IonStream` (see #542). This is a problematic design, since an `IonReader` that has to materialize the token on every call to `next` is not efficient.

@almann almann marked this pull request as ready for review May 1, 2023 05:35
@almann almann requested a review from zslayton May 3, 2023 04:51
Comment on lines 86 to 87
/// We are positioned on some value.
NonContainer,
A reviewer (Contributor) commented:

Is this "non-container" (vs "scalar") because it includes null containers, which are arguably scalar values even though their Ion type is a container? If so, that'd be helpful to have in the comments.

almann (Contributor, Author) replied:

Added to the doc comment.

src/tokens/reader_stream.rs (outdated review comment, resolved)
almann and others added 3 commits May 15, 2023 09:08
@almann almann changed the base branch from tokens-reader-stream-pr-base to main May 15, 2023 16:22
@almann almann merged commit f8edf82 into amazon-ion:main May 15, 2023
18 checks passed
@almann almann deleted the tokens-reader-stream branch May 15, 2023 16:41