Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables GATs iteration in the Element API #442

Merged
merged 11 commits into from
Oct 13, 2022
Merged

Enables GATs iteration in the Element API #442

merged 11 commits into from
Oct 13, 2022

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Oct 12, 2022

This PR replaces all uses of Box<dyn Iterator<Item=_> + 'a> in the Element and ElementRef APIs with generic associated types. This means that heap allocations are no longer required to iterate over annotations, sequence elements, or struct fields.

Because this change uses generic associated types, it also configures the CI tests to use the beta channel of cargo. GATs will be available in the next release of Rust (v1.65, out in November); once that happens, we can and should switch back to stable. (#443)

This change also removes the recently added dependency on the linkhash crate. That dependency was originally pulled in so structs could remember the order in which their fields were inserted for later iteration. However, it only worked if the struct did not contain duplicate field names. This PR changes the field storage for Struct and StructRef to store their fields in a Vec and also maintain a HashMap<Symbol, IndexVec> that remembers all of the indexes at which values for a given field can be found.

Further, it moves Sequence elements and Struct fields into an Rc to make clone() operations cheaper. A future PR will modify iterators over collection types to clone the Rc and thus remove one of their lifetime constraints, making it much easier to write a recursive iterator over an element tree without constant cloning.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented Oct 12, 2022

Codecov Report

Merging #442 (d427b7f) into main (1ab131a) will decrease coverage by 0.09%.
The diff coverage is 67.69%.

@@            Coverage Diff             @@
##             main     #442      +/-   ##
==========================================
- Coverage   83.64%   83.54%   -0.10%     
==========================================
  Files          84       85       +1     
  Lines       16376    16400      +24     
==========================================
+ Hits        13697    13701       +4     
- Misses       2679     2699      +20     
Impacted Files Coverage Δ
src/value/mod.rs 88.68% <ø> (ø)
src/binary/non_blocking/raw_binary_reader.rs 86.52% <50.00%> (ø)
src/value/element_stream_reader.rs 86.66% <50.00%> (ø)
src/value/borrowed.rs 65.96% <60.97%> (-1.67%) ⬇️
src/value/owned.rs 75.04% <64.07%> (-1.17%) ⬇️
src/binary/raw_binary_reader.rs 92.80% <66.66%> (ø)
src/value/iterators.rs 80.00% <80.00%> (ø)
src/binary/int.rs 95.50% <100.00%> (ø)
src/binary/var_uint.rs 96.84% <100.00%> (ø)
src/symbol_ref.rs 100.00% <100.00%> (ø)
... and 16 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@zslayton
Copy link
Contributor Author

I can't make any sense of the code coverage results; I don't think I trust codecov anymore. The ion-tests integration exercises the DOM thoroughly, however.

src/value/mod.rs Outdated Show resolved Hide resolved
src/value/mod.rs Outdated Show resolved Hide resolved
/// If a field name appears more than once, this method makes the arbitrary decision to return
/// the value associated with the last appearance. If your application uses structs that repeat
/// field names, you are encouraged to use [get_all] instead.
fn get_last<A: AsSymbolRef>(&self, field_name: A) -> Option<&Element> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why this is get_last instead of get_first (just get_any). My only concern is that get_last forces the implementation to read and map the entire struct. That may be unproblematic now, but I don't see a good reason to choose last over first or any.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my assumption is that we would want to build the map (and the indexing to it) lazily. That doesn't seem to be happening now (and I don't think it needs to now). I confess that reviewing rust code in github is still relatively new to me, so please let me know what I'm missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is that get_last forces the implementation to read and map the entire struct. ... I guess my assumption is that we would want to build the map (and the indexing to it) lazily.

This is a good instinct, and a LazyElement is something we're planning to implement in the future. As you mention, this implementation (Element/Struct) is fully materialized in memory, so the choice of first or last is truly arbitrary.

This particular method (get_last) lives on a private type (Fields) that isn't exposed to the end user. It's invoked by a public method called get that's defined on the IonStruct trait. If we were going to change the contract, that's the method we'd want to change.

At present, get promises to return the last field it found in the serialized struct. From an Ion data model perspective, this isn't really correct; struct fields don't have an order. The most correct thing to do would be to only offer a get_all() method that returned an iterator whose iteration order was not guaranteed. However, nearly all struct use cases don't use repeated fields and users expect to be able to access the one-and-only value for a given field trivially, so get is available as a convenience. If it turns out someone wants the first appearance of a given field, they can call self.get_all(name).first() and (now that GATs are enabled) that will be basically the same cost as get.

Comment on lines +287 to +288
let symbol = SymbolRef::with_unknown_text();
// ...and use the unknown text symbol to look up matching field values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to reason about what the behavior of this would be. What fields would match the unknown text symbol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields whose symbol ID intentionally maps to unknown text. This is the case for $0 as well as any symbol whose entry in the symbol table was non-text (null, 5, false, etc).

.github/workflows/rust.yml Outdated Show resolved Hide resolved
examples/read_all_values.rs Show resolved Hide resolved
src/value/borrowed.rs Outdated Show resolved Hide resolved
src/value/borrowed.rs Outdated Show resolved Hide resolved
/// If a field name appears more than once, this method makes the arbitrary decision to return
/// the value associated with the last appearance. If your application uses structs that repeat
/// field names, you are encouraged to use [get_all] instead.
fn get_last<'iter, A: AsSymbolRef>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you've made this explicitly get_last instead of get with the documented disclaimer that it gets the last one. Nice touch.

Comment on lines 360 to +361
fn get<T: AsSymbolRef>(&self, field_name: T) -> Option<&Self::Element> {
match field_name.as_symbol_ref().text() {
None => {
// Build a cheap, stack-allocated `SymbolRef` that represents unknown text
let symbol = SymbolRef::with_unknown_text();
// Use the unknown text symbol to look up matching field values
self.fields.get(&symbol)?.last()
}
Some(text) => {
// Otherwise, look it up by text
self.fields.get(text)?.last()
}
}
self.fields.get_last(field_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

src/value/iterators.rs Outdated Show resolved Hide resolved
src/value/owned.rs Outdated Show resolved Hide resolved
@rmarrowstone rmarrowstone self-requested a review October 13, 2022 20:26
@zslayton zslayton merged commit 9a6196e into main Oct 13, 2022
@zslayton zslayton deleted the gats_iterators branch October 13, 2022 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants