Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates symbol logic in the Element APIs #426

Merged
merged 4 commits into from
Sep 22, 2022
Merged

Updates symbol logic in the Element APIs #426

merged 4 commits into from
Sep 22, 2022

Conversation

zslayton
Copy link
Contributor

This patch:

  • Adds an Unknown variant to the inner representation of
    Symbol, allowing it to represent $0.
  • Removes the IonImportSource trait and its implementations.
    This type enabled readers to compare symbols in the event that
    the corresponding shared symbol table could not be found, a niche
    use case. Removing IonImportSource eliminates a surprising number
    of allocations.
  • Updates the IonSymbolToken trait. Now that import source is not
    being tracked/considered, implementations must update their
    behavior for equivalence testing.
  • Replaces usages of the owned::SymbolToken type with the Symbol
    type that is used in all of the IonReader APIs. The Symbol type
    takes advantage of String interning via the symbol table, which
    was still a TODO for owned::SymbolToken. This eliminates some
    conversion logic and allocations that were happening for each
    annotation, struct field name, and symbol value.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Zack Slayton added 2 commits September 22, 2022 07:57
This patch:
* Removes the `IonImportSource` trait and its implementations.
  This type enabled readers to compare symbols in the event that
  the corresponding shared symbol table could not be found, a niche
  use case. Removing `IonImportSource` eliminates a surprising number
  of allocations.
* Updates the `IonSymbolToken` trait. Now that import source is not
  being tracked/considered, implementations must update their
  behavior for equivalence testing.
* Replaces usages of the `owned::SymbolToken` type with the `Symbol`
  type that is used in all of the `IonReader` APIs. The `Symbol` type
  takes advantage of `String` interning via the symbol table, which
  was still a TODO for `owned::SymbolToken`. This eliminates some
  conversion logic and allocations that were happening for each
  annotation, struct field name, and symbol value.
@codecov
Copy link

codecov bot commented Sep 22, 2022

Codecov Report

Merging #426 (c8ad42c) into main (16382f4) will decrease coverage by 0.97%.
The diff coverage is 74.03%.

@@            Coverage Diff             @@
##             main     #426      +/-   ##
==========================================
- Coverage   84.81%   83.84%   -0.98%     
==========================================
  Files          83       83              
  Lines       15466    15532      +66     
==========================================
- Hits        13118    13023      -95     
- Misses       2348     2509     +161     
Impacted Files Coverage Δ
src/raw_symbol_token_ref.rs 48.48% <0.00%> (-8.66%) ⬇️
src/value/native_writer.rs 82.45% <0.00%> (ø)
src/symbol_table.rs 70.14% <33.33%> (-8.54%) ⬇️
src/binary/binary_writer.rs 81.95% <60.00%> (-0.63%) ⬇️
src/value/owned.rs 74.14% <62.85%> (-10.14%) ⬇️
src/symbol.rs 67.85% <71.64%> (-8.27%) ⬇️
src/text/text_formatter.rs 40.97% <83.33%> (ø)
src/value/reader.rs 93.70% <83.33%> (ø)
src/value/borrowed.rs 63.63% <93.75%> (-13.75%) ⬇️
src/binary/non_blocking/raw_binary_reader.rs 86.52% <100.00%> (ø)
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

/// The text of a fully resolved field name, annotation, or symbol value. If the symbol has known
/// text (that is: the symbol is not `$0`), it will be stored as either a `String` or a shared
/// reference to text in a symbol table.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone)]
pub struct Symbol {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Question) How does this handle Symbol with undefined text?

Undefined text - text in a SymbolToken structure which is undefined, e.g. null. SymbolTokens with undefined text do not necessarily refer to symbols with unknown text because the text may be resolvable through the SymbolToken’s importLocation

Undefined text reference from https://amzn.github.io/ion-docs/guides/symbols-guide.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Symbol represents a resolved symbol, which is to say that this part:

SymbolTokens with undefined text do not necessarily refer to symbols with unknown text because the text may be resolvable through the SymbolToken’s importLocation

has already been attempted. If it failed, then an error would have been returned by the reader and we wouldn't be constructing a Symbol. If we have a Symbol, then the resolution step has already succeeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If, after resolution, the Symbol is found to explicitly have undefined text ($0 is the main example), then the Symbol's inner SymbolText enum will be set to Unknown.

impl ImportSource for OwnedImportSource {
fn table(&self) -> &str {
&self.table
fn with_symbol_id(self, _local_sid: SymbolId) -> Self {
Copy link
Contributor

@desaikd desaikd Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Suggestion) Do we really require this method now that it just places Symbol::unknown_text() for all symbols having unknown texts?
Can we rather just have a new method called with_unknown_text()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question; with_symbol_id doesn't make much sense for Symbol, but it will make sense for RawSymbolToken, which represents a not-yet-resolved symbol from the input stream. The raw token might have a symbol ID or text.

Copy link
Contributor

@desaikd desaikd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed answers @zslayton. LGTM!

@zslayton zslayton merged commit 024cb2e into main Sep 22, 2022
@zslayton zslayton deleted the unknown-text branch September 22, 2022 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants