Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace closure-based Reader methods like string_ref_map #335

Closed
zslayton opened this issue Oct 12, 2021 · 3 comments
Closed

Replace closure-based Reader methods like string_ref_map #335

zslayton opened this issue Oct 12, 2021 · 3 comments
Labels
code-quality Issues of style, consistency, or factoring enhancement New feature or request good first issue Good for newcomers

Comments

@zslayton
Copy link
Contributor

Several methods in the Cursor/SystemReader trait were implemented before Non-Lexical Lifetimes (NLL) were stabilized in Rust 1.31. Back then, borrowing a reference to something could really complicate downstream code, requiring the programmer to add extra scopes to clarify when a reference would stop being used.

The methods below allowed the user to specify a closure that take a reference as a parameter; the user could read/copy/analyze the data and return it without scoping issues.

https://github.com/amzn/ion-rust/blob/739d74c57347746fdde3ab5a8844ed824c2b5662/src/cursor.rs#L75-L91

https://github.com/amzn/ion-rust/blob/739d74c57347746fdde3ab5a8844ed824c2b5662/src/cursor.rs#L99-L104

https://github.com/amzn/ion-rust/blob/739d74c57347746fdde3ab5a8844ed824c2b5662/src/cursor.rs#L109-L114

Now that we have NLL, this approach is no longer necessary. It'd be simpler to just give out a reference to the data.

@zslayton zslayton added enhancement New feature or request good first issue Good for newcomers code-quality Issues of style, consistency, or factoring labels Oct 12, 2021
@zslayton
Copy link
Contributor Author

zslayton commented Apr 21, 2022

In my work on #345 I found that NLL was not the only reason we had created the TYPE_ref_map methods.

The raw binary reader wraps a provided BufRead implementation. When it goes to read the next Ion value from the stream, it needs all of that value's bytes to be in a contiguous buffer. It uses two different approaches to achieve this:

  1. First, it checks to see if all of that value's bytes are already in the ReadBuf's input buffer. If so, we're ready to parse those bytes.
  2. On the other hand, if only some of the bytes are in the input buffer (which is relatively rare), it must load all of the bytes into a separate Vec<u8> that it owns. This can involve several calls to read() to get more bytes.

This second step is necessary because the Ion reader doesn't control the input buffer. If it's not big enough to hold the next input value, the reader can't reallocate the buffer to fix that. Instead, it manages its own backup.

Once the bytes are in-memory, the reader can apply the user's closure to the &str or &[u8]. Importantly, when the closure returns, the reader is then able to ask ReadBuf to discard the bytes it processed from the input buffer if applicable. This allowed the reader to only consume the bytes if they were successfully processed by the closure.

If we rewrite the IonDataSource trait to be a struct that controls its own buffer, we'll be able to replace methods like string_ref_map() with methods like read_str(&mut self) -> &str. This is also a prerequisite for addressing #351.

@zslayton
Copy link
Contributor Author

An addendum: because these methods (now named map_TYPENAME instead of TYPENAME_ref_map) use generics, the StreamReader trait is not object safe.

As a temporary workaround, these methods have added the where Self: Sized constraint. This allows StreamReader implementations to be used via dyn StreamReader.

However, it also means there's no good way to implement the constrained methods in our impl StreamReader for Box<dyn StreamReader>. For the moment, invoking them will lead to a panic.

The methods we'll implement in the near future will fix this problem because they do not require generics, only lifetimes.

// Replace this:
fn map_string<F, U>(&mut self, f: F) -> IonResult<U>
where
    Self: Sized,
    F: FnOnce(&str) -> U,
{
  // ...
}

// With this:
fn read_str(&mut self) -> IonResult<&str> {
  // ...
}

zslayton pushed a commit that referenced this issue Apr 10, 2023
Replaces `map_blob`, `map_clob`, and `map_string` with `read_` equivalents.

Fixes issue #335.
@zslayton
Copy link
Contributor Author

Fixed by #509.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code-quality Issues of style, consistency, or factoring enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

1 participant