Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbol Token and Symbol Table #11

Closed
almann opened this issue Jan 31, 2020 · 7 comments · Fixed by #14
Closed

Symbol Token and Symbol Table #11

almann opened this issue Jan 31, 2020 · 7 comments · Fixed by #14
Assignees
Labels
enhancement New feature or request

Comments

@almann
Copy link
Contributor

almann commented Jan 31, 2020

Had a a whiteboard session with @pbcornell and @mrutter-amzn on 1/24/2020 with some consensus on how to model SymbolToken and ImportSource:

20200124_113115

The variants of local/system/shared symbol table are closed so we should consider implementing this through a struct rather than a interface type.

One thing that came out of this discussion that we should consider in implementing this is putting this into a symbols package and renaming the above APIs accordingly.

@almann almann added the enhancement New feature or request label Jan 31, 2020
@tgregg
Copy link
Contributor

tgregg commented Jan 31, 2020

Why is 'version' needed in 'ImportSource'?

@tgregg
Copy link
Contributor

tgregg commented Jan 31, 2020

Since 'LocalSid' is present in 'SymbolToken', it must be used as a caching mechanism within a particular local symbol table context. What is the mechanism for guaranteeing that 'LocalSid' does not leak into a different context?

@almann
Copy link
Contributor Author

almann commented Jan 31, 2020

Why is 'version' needed in 'ImportSource'?

You are right we don't need this here.

@almann
Copy link
Contributor Author

almann commented Jan 31, 2020

Since 'LocalSid' is present in 'SymbolToken', it must be used as a caching mechanism within a particular local symbol table context. What is the mechanism for guaranteeing that 'LocalSid' does not leak into a different context?

I am a little confused about the question. I do believe the SymbolToken data structure should have the local SID for completeness and optimization when transferring between different streams with compatible contexts.

That said, are you saying this should not be modeled or are you asking about a constraint that should exist elsewhere (or enforced by the data structure)?

@tgregg
Copy link
Contributor

tgregg commented Jan 31, 2020

Let's first agree that LocalSID, from a correctness perspective, is not necessary. That's because

  1. If the text is known, the LocalSID can always be calculated (or a new one assigned).
  2. If the text is not known, and there is an ImportSource, then the LocalSID can be calculated in any symbol table context that includes a compatible shared symbol table import.
  3. If the text is not known and there is no ImportSource, then this represents a symbol with unknown text from a local symbol table, which is always equivalent to symbol zero.

If we agree on that, then we can agree that LocalSID may be used to cache the calculations referred to in 1. and 2. above. This is great when the SymbolToken is always used in compatible symbol table contexts because it reduces repetitive recalculation of the local SID in that context.

However, SymbolTokens are often exposed via the public API. The reader may return a SymbolToken to the user, and that user subsequently may provide it to a writer that may have a completely different symbol table context than the reader that originally created the SymbolToken. If the writer were to use the LocalSID stored in the SymbolToken, the result is likely to be incorrect.

My question is: how to we prevent misuse of LocalSID? Do we clear it before returning SymbolTokens via public APIs? Do we somehow make a SymbolToken aware of its context to make sure localSID is only used within the same context?

We currently don't have a good story for this in ion-java. It would be great to start out on the right foot with ion-go to avoid repeating some of the same potential pitfalls.

@almann
Copy link
Contributor Author

almann commented Jan 31, 2020

Misuse aside, let us consider the case where I want an unmanaged parser (one that does not do resolution of symbols), having the LocalSID with unknown text and unknown location would be the only way to model this if I wanted to ever expose such a reader (like we do in Python).

Do you think the misuse issue obviates the unmanaged parser case?

@almann
Copy link
Contributor Author

almann commented Jan 31, 2020

Having had an offline chat with @tgregg, here is where I think we're at with respect to local SID:

  1. Local SIDs are necessary to model the variant of the API that operates without a symbol table context (e.g. the "raw" parser composed below the "managed" parser)
  2. This field should be made internal for now. I do think there are use cases outside of the package that will want this (e.g. efficient logging readers, object bindings), but we can defer exposing this advanced field until such time as we're ready to expose it.
  3. We should document an invariant with respect to the variance of this field:
  • If the Local SID is defined, the text must be unknown and the source must be undefined. This implies that the symbol token has yet to be resolved and we do not (yet) know the context of such a token.
  • In all other cases Local SID must be undefined as this implies that the symbol token has been resolved and the local identifier is not meaningful in this context.

This, of course, implies we model unknown and undefined values for the field domains above.

@tgregg, please feel free to add or correct anything I might've missed from our conversation.

@almann almann self-assigned this Feb 3, 2020
almann added a commit to almann/ion-go that referenced this issue Feb 23, 2020
Removes non-used `symbols.go` module which had a preliminary symbol
table implementation.

Adds SymbolToken as a pure data structure.

Adds SymbolTable as an abstract data type for
system/shared/local symbol tables.

None of the APIs are currently exported for making symbol tables.

Resolves amazon-ion#11
almann added a commit to almann/ion-go that referenced this issue Feb 23, 2020
Removes non-used `symbols.go` module which had a preliminary symbol
table implementation.

Adds SymbolToken as a pure data structure.

Adds SymbolTable as an abstract data type for
system/shared/local symbol tables.

None of the APIs are currently exported for making symbol tables.

Resolves amazon-ion#11
almann added a commit that referenced this issue Feb 25, 2020
Removes non-used `symbols.go` module which had a preliminary symbol
table implementation.

Adds SymbolToken as a pure data structure.

Adds SymbolTable as an abstract data type for
system/shared/local symbol tables.

None of the APIs are currently exported for making symbol tables.

Makes a `goimports -w` pass to clean up a few files.

Resolves #11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants