-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates symbol logic in the Element APIs #426
Conversation
This patch: * Removes the `IonImportSource` trait and its implementations. This type enabled readers to compare symbols in the event that the corresponding shared symbol table could not be found, a niche use case. Removing `IonImportSource` eliminates a surprising number of allocations. * Updates the `IonSymbolToken` trait. Now that import source is not being tracked/considered, implementations must update their behavior for equivalence testing. * Replaces usages of the `owned::SymbolToken` type with the `Symbol` type that is used in all of the `IonReader` APIs. The `Symbol` type takes advantage of `String` interning via the symbol table, which was still a TODO for `owned::SymbolToken`. This eliminates some conversion logic and allocations that were happening for each annotation, struct field name, and symbol value.
Codecov Report
@@ Coverage Diff @@
## main #426 +/- ##
==========================================
- Coverage 84.81% 83.84% -0.98%
==========================================
Files 83 83
Lines 15466 15532 +66
==========================================
- Hits 13118 13023 -95
- Misses 2348 2509 +161
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
/// The text of a fully resolved field name, annotation, or symbol value. If the symbol has known | ||
/// text (that is: the symbol is not `$0`), it will be stored as either a `String` or a shared | ||
/// reference to text in a symbol table. | ||
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone)] | ||
pub struct Symbol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Question) How does this handle Symbol with undefined text?
Undefined text - text in a SymbolToken structure which is undefined, e.g. null. SymbolTokens with undefined text do not necessarily refer to symbols with unknown text because the text may be resolvable through the SymbolToken’s importLocation
Undefined text reference from https://amzn.github.io/ion-docs/guides/symbols-guide.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Symbol
represents a resolved symbol, which is to say that this part:
SymbolTokens with undefined text do not necessarily refer to symbols with unknown text because the text may be resolvable through the SymbolToken’s importLocation
has already been attempted. If it failed, then an error would have been returned by the reader and we wouldn't be constructing a Symbol
. If we have a Symbol
, then the resolution step has already succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If, after resolution, the Symbol
is found to explicitly have undefined text ($0
is the main example), then the Symbol
's inner SymbolText
enum will be set to Unknown
.
impl ImportSource for OwnedImportSource { | ||
fn table(&self) -> &str { | ||
&self.table | ||
fn with_symbol_id(self, _local_sid: SymbolId) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Suggestion) Do we really require this method now that it just places Symbol::unknown_text()
for all symbols having unknown texts?
Can we rather just have a new method called with_unknown_text()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question; with_symbol_id
doesn't make much sense for Symbol
, but it will make sense for RawSymbolToken
, which represents a not-yet-resolved symbol from the input stream. The raw token might have a symbol ID or text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed answers @zslayton. LGTM!
This patch:
Unknown
variant to the inner representation ofSymbol
, allowing it to represent$0
.IonImportSource
trait and its implementations.This type enabled readers to compare symbols in the event that
the corresponding shared symbol table could not be found, a niche
use case. Removing
IonImportSource
eliminates a surprising numberof allocations.
IonSymbolToken
trait. Now that import source is notbeing tracked/considered, implementations must update their
behavior for equivalence testing.
owned::SymbolToken
type with theSymbol
type that is used in all of the
IonReader
APIs. TheSymbol
typetakes advantage of
String
interning via the symbol table, whichwas still a TODO for
owned::SymbolToken
. This eliminates someconversion logic and allocations that were happening for each
annotation, struct field name, and symbol value.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.