-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds ReaderTokenStream #541
Conversation
This is the basis for `TokenStream` and the implementations bridging to and from `IonReader`. The `TokenStream` is a minimal interface where (potentially) lazy `Token` instances are returned instead of `read_XXX` methods. This allows us to compose streams of Ion values more easily because we now have an iterator-like interface on which we can model the computation of an Ion data stream. Unlike iterator, `TokenStream` has as its input an `Instruction` which allows a user to control "skipping" to the end of a container, which is the analog of `step_in` and `step_out` in the `IonReader` trait. `Token` values have annotations and a field name (which could be lazily evaluated). They contain a `Content` which is either the start/end of a container, a scalar value, typed null, or the end of a stream. These values rely on the `thunk` module to provide deferring the computation of the value. The `token` module also provides various specific subdomains of `IonType` in `ContainerType` and `ScalarType` to avoid contexts where inapplicable `IonType` are possible (e.g., in a start/end container token). Also updates `rstest` and adds `rstest_reuse` for reusing table inputs for unit tests. Prerequisite for amazon-ion#520.
`ReaderTokenStream` adapts any, effectively user-level, `IonReader` into a `TokenStream`. Ideally, this would work over any abstraction of the `IonReader` (raw/system), but for now we only support `StreamItem` and `Symbol` based readers. Prerequisite for amazon-ion#520.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #541 +/- ##
==========================================
+ Coverage 82.58% 82.92% +0.34%
==========================================
Files 84 85 +1
Lines 16018 16297 +279
Branches 16018 16297 +279
==========================================
+ Hits 13228 13514 +286
+ Misses 1617 1576 -41
- Partials 1173 1207 +34
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour below.
/// These generate the methods that produce lazy content for our stream against the underlying | ||
/// reader. There is a lot of boilerplate and variants so this avoids copy/paste errors. | ||
macro_rules! scalar_content_method { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ It makes me sad to do this, but it does eliminate a lot of boilerplate for generating the Content
instances against the Rc<RefCell<...>>
.
R: IonReader<Item = StreamItem, Symbol = Symbol> + 'a, | ||
{ | ||
// XXX this is so we can have multiple closures to lazily evaluate tokens | ||
reader_cell: Rc<RefCell<UnderlyingReader<R>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This is necessary because we need to maintain three closures (annotations/field names/content) and statically borrow checker only allows one universal borrow for closures. We use this shared pointer to give each closure what they need to defer the calls to the underlying reader.
/// Internal tracking state of the returned token. This is necessary because the lifetime of the | ||
/// token returned from the stream is bound to the stream itself. | ||
/// | ||
/// This is essentially strongly typed [`Option`] with convenience to operate within the | ||
/// context of a [`ReaderTokenStream`] | ||
enum UnderlyingReader<R> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The lifetime of next_token
is not bound to any static lifetime we can keep track of, this is because if we want to adapt IonReader
, next
/read_XXX
are not associated in any lexical context that it may be valid for as one lifetime that we can track for our Token
. This lets us make sure the Reader
we "gave" to a Token
can be invalidated when the Reader
"moves" one while the caller may have held on to that Token
.
#[inline] | ||
fn invalidate_token(&mut self) { | ||
// only invalidate if we have something deferred | ||
if self.pending_token.is_deferred() { | ||
let underlying = { | ||
let mut original = self.reader_cell.borrow_mut(); | ||
original.invalidate() | ||
}; | ||
self.reader_cell = Rc::new(RefCell::new(underlying)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This lets us "garbage collect" outstanding tokens, such tokens will just error when trying to evaluate any deferred value within them.
where | ||
R: IonReader<Item = StreamItem, Symbol = Symbol> + 'a, | ||
{ | ||
fn next_token(&mut self, instruction: Instruction) -> IonResult<Token<'a>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Originally this was tied to the borrow of &mut self
. The problem with this is that you must materialize the token if you want to implement IonReader
over IonStream
(see #542). This is a problematic design since having an IonReader
that has to materialize the token on every call to next
is not efficient.
src/tokens/reader_stream.rs
Outdated
/// We are positioned on some value. | ||
NonContainer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this "non-container" (vs "scalar") because it includes null containers, which are arguably scalar values even though their Ion type is a container? If so, that'd be helpful to have in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the doc comment.
Co-authored-by: Zack Slayton <zack.slayton@gmail.com>
ReaderTokenStream
adapts any, effectively user-level,IonReader
into aTokenStream
. Ideally, this would work over any abstraction of theIonReader
(raw/system), but for now we only supportStreamItem
andSymbol
based readers.Prerequisite for #520.
Note that the this is dependent on #540 and currently based off of that--when/if that is merged and this is approved I will change to base to main.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.