AVRO-3275: Clarify specification on naming #1439
Adds an example, and notes at every occurrence of the 'namespace' attribute that it is optional.
Minor addition
This LGTM! :D
I made the same suggestion on https://github.com/martin-g/avro-website/pull/7/files -- it's so much easier to review the markdown than the XML here!
they do not contain a dot, the namespace is the namespace of
the enclosing definition.
</p>
<p>Primitive type names have no namespace and their names may
Ugh -- this is another problem, isn't it? At our company we just ran into the inconsistent treatment of a record named "record" (a non-primitive type, OK in Java, forbidden in Python). I'll bring this up on the mailing list!
* AVRO-3275: Clarify specification on naming
* AVRO-3275: Further clarify specification on naming

  Adds an example, and mentions for all occurrences of the 'namespace' attribute that it is optional.
* AVRO-3275: Further clarify specification on naming

  Minor addition
* Apply suggestions from code review

Co-authored-by: Ryan Skraba <ryan@skraba.com>
"type": {
"type": "enum",
"name": "Understanding",
"doc": "A simple name (attribute) and no namespace attribute: inherit the namespace of the enclosing type 'a.full.Name'. The fullname is 'a.full.Understanding'.",
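The naming rule this example documents can be sketched in Python. `resolve_fullname` is a hypothetical helper (not part of any Avro SDK) that applies the cases the specification describes: a dotted name is already a fullname, an explicit `namespace` attribute qualifies a simple name, and otherwise the simple name inherits the enclosing namespace. The empty-string namespace is treated as the null namespace.

```python
from typing import Optional


def resolve_fullname(name: str, namespace: Optional[str], enclosing_ns: Optional[str]) -> str:
    """Hypothetical helper: compute the fullname of a named type (record, enum, fixed)."""
    # A name that contains a dot is already a fullname; any namespace attribute is ignored.
    if "." in name:
        return name
    # An explicit namespace attribute wins; otherwise inherit the enclosing namespace.
    ns = namespace if namespace is not None else enclosing_ns
    # An empty or absent namespace is the null namespace: the fullname is just the name.
    return f"{ns}.{name}" if ns else name
```

For the enum above, `resolve_fullname("Understanding", None, "a.full")` gives `"a.full.Understanding"`.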
@RyanSkraba I really appreciate this writeup as I'm reimplementing schema parsing for Elixir, and struggling with the inconsistencies of how existing libraries resolve references.
Can you answer a question for me?
Imagine the following schema:

```json
{
  "name": "a",
  "namespace": "top",
  "type": "record",
  "fields": [
    {"name": "b", "type": {"name": "c", "type": "string"}}
  ]
}
```
Would the fullnames here be either:

1. `top.a`, `top.b`, and `top.c`, or
2. `top.a`, `top.b`, and `c`?
It's unclear to me how namespaces should propagate. It seems like option 1 should be the correct answer, but I'm not sure, since most libraries don't seem to enforce this.
Hello! I think this is a good question, and probably could also be clarified in the specification.
- Field names don't have namespaces, and
- Unnamed types such as `string` don't have names and don't inherit namespaces.
So there is only one named type in your example, `top.a`, and one field name, `b`. The attribute `"name": "c"` on the `string` looks like it is accepted, but silently dropped in the Java SDK.
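To make this concrete, here is a minimal Python sketch, assuming the schema has already been parsed from JSON into dicts and lists. The hypothetical `collect_named` walks it and records only named types (`record`, `error`, `enum`, `fixed`), each mapped to its plain field names; a `"name"` on a primitive is skipped as metadata. It does not resolve type references or validate the schema.

```python
from typing import Any, Dict, List, Optional

NAMED = {"record", "error", "enum", "fixed"}


def collect_named(schema: Any, enclosing_ns: Optional[str], out: Dict[str, List[str]]) -> None:
    """Hypothetical walker: map each named type's fullname to its field names."""
    if isinstance(schema, list):  # a union: visit every branch
        for branch in schema:
            collect_named(branch, enclosing_ns, out)
        return
    if not isinstance(schema, dict):  # a bare type name such as "string"
        return
    kind = schema.get("type")
    if not isinstance(kind, str) or kind not in NAMED:
        # Primitives written as objects, arrays and maps have no name of their own;
        # a "name" attribute here is only metadata. Recurse into items/values.
        for key in ("items", "values"):
            if key in schema:
                collect_named(schema[key], enclosing_ns, out)
        return
    name = schema["name"]
    if "." in name:
        fullname = name
    else:
        ns = schema.get("namespace", enclosing_ns)
        fullname = f"{ns}.{name}" if ns else name
    # Nested definitions inherit the namespace part of this fullname.
    own_ns = fullname.rpartition(".")[0] or None
    fields = schema.get("fields", [])
    out[fullname] = [field["name"] for field in fields]  # field names stay unqualified
    for field in fields:
        collect_named(field["type"], own_ns, out)


question = {
    "name": "a",
    "namespace": "top",
    "type": "record",
    "fields": [{"name": "b", "type": {"name": "c", "type": "string"}}],
}
found: Dict[str, List[str]] = {}
collect_named(question, None, found)
```

Run on the schema from the question, `found` is `{"top.a": ["b"]}`: one named type and one unqualified field name.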
Where the specification is vague: "Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data." I think it makes sense to drop metadata that are applied to primitive schemas, if we consider that the type `string` is an immutable singleton instance in the SDK.
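The Parsing Canonical Form makes the same choice: its [PRIMITIVES] rule converts primitive schemas to their simple form, e.g. `{"type": "string"}` becomes `"string"`. Below is a rough Python sketch of just that step. `simplify_primitives` is a hypothetical name, and it only traverses records, unions, arrays and maps; it is not how any particular SDK parses schemas.

```python
from typing import Any

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def simplify_primitives(schema: Any) -> Any:
    """Hypothetical sketch: replace {"type": "<primitive>", ...} with the bare primitive name."""
    if isinstance(schema, list):  # union
        return [simplify_primitives(branch) for branch in schema]
    if not isinstance(schema, dict):
        return schema
    kind = schema.get("type")
    if isinstance(kind, str) and kind in PRIMITIVES:
        return kind  # any extra attributes, such as "name", are dropped
    out = dict(schema)  # shallow copy so the input is left untouched
    if "fields" in out:
        out["fields"] = [{**f, "type": simplify_primitives(f["type"])} for f in out["fields"]]
    for key in ("items", "values"):
        if key in out:
            out[key] = simplify_primitives(out[key])
    return out
```

Applied to the schema from the question, field `b` ends up with type `"string"` and the `"name": "c"` attribute is gone.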
I created AVRO-3430 for discussion on whether Java is correct to silently drop the `"name": "c"` attribute on the primitive `string`.
I believe it is correct since, as you pointed out, it's technically metadata here. The Parsing Canonical Form would reduce it to just the primitive.
Can you clarify where in the spec it specifies that Record fields do not have namespaces and thus do not have full names? Clearly they do not have a namespace field, but that nuance was not obvious.
Thank you so much for answering this out of band! This all seems like useful information to put back into the spec.
Also, if you're interested, this is the PR that brought up these questions: beam-community/avro_ex#62
Another question: if field names do not have namespaces, how do field aliases come into play?
It's all about context.
A namespace is a context for (named) schemata. Named schemata (of types `record`/`error`, `enum` or `fixed`) must have a unique name within their context. This results in globally unique full names (i.e., names prefixed with the namespace and a dot).
Fields cannot share a context with schemata, because they're not schemata. The context where fields are known is their `record` schema. Field names must be unique there.
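The two kinds of uniqueness can be sketched as a small Python registry. `NameRegistry` is hypothetical: fullnames share one table, so the same simple name may appear in different namespaces, while field names are only checked against the other fields of the same record.

```python
from typing import Dict


class NameRegistry:
    """Hypothetical registry: one table of fullnames, field names checked per record."""

    def __init__(self) -> None:
        self.types: Dict[str, dict] = {}

    def define(self, fullname: str, schema: dict) -> None:
        # Fullnames (namespace + "." + name) must be unique across the whole schema.
        if fullname in self.types:
            raise ValueError(f"duplicate type name: {fullname}")
        # Field names only need to be unique within their own record.
        names = [field["name"] for field in schema.get("fields", [])]
        dupes = sorted({n for n in names if names.count(n) > 1})
        if dupes:
            raise ValueError(f"duplicate field names in {fullname}: {dupes}")
        self.types[fullname] = schema


registry = NameRegistry()
record = {"type": "record", "name": "a", "fields": [{"name": "id", "type": "long"}]}
registry.define("top.a", record)    # ok
registry.define("other.a", record)  # ok: same simple name, different namespace
```

Defining `top.a` a second time, or a record with two fields named `x`, raises `ValueError` in this sketch.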
Aliases are alternate names for schemata and fields, and the same rules as for names apply. Aliases for schemata are interpreted in the namespace of the schema, or globally if the alias includes a namespace (i.e., it's a full name). Aliases for fields are interpreted within the record that contains the field (and like field names, cannot reference another context/`record` schema).
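A minimal sketch of these alias rules in Python, assuming schemas parsed into dicts; both helper names are hypothetical. Schema aliases are qualified with the schema's namespace unless they already contain a dot, while a field alias is only ever looked up in its own record's field list.

```python
from typing import List, Optional


def schema_alias_fullnames(aliases: List[str], schema_ns: Optional[str]) -> List[str]:
    """Hypothetical: qualify schema aliases with the schema's namespace unless already dotted."""
    return [a if "." in a or not schema_ns else f"{schema_ns}.{a}" for a in aliases]


def find_field(fields: List[dict], name: str) -> Optional[dict]:
    """Hypothetical: find a field by name or alias, searching only this record's fields."""
    for field in fields:
        if field["name"] == name or name in field.get("aliases", []):
            return field
    return None
```

For example, `schema_alias_fullnames(["Old", "legacy.Thing"], "top")` returns `["top.Old", "legacy.Thing"]`, and a field alias never has a namespace prefix to resolve.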
> Can you clarify where in the spec it specifies that Record fields do not have namespaces and thus do not have full names?
I agree that it's not great, and it's specified "by omission" here. To paraphrase: "Record, enum and fixed each have a fullname composed of namespace and name... The name portion of a fullname, record field names, and enum symbols must match [an alphanumeric regex without dot]."
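The regex being paraphrased is, per the specification, "start with `[A-Za-z_]`, subsequently contain only `[A-Za-z0-9_]`", and a namespace is a dot-separated sequence of such names. A Python sketch with hypothetical helper names:

```python
import re

# Simple name: the name portion of a fullname, a record field name, or an enum symbol.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Hypothetical: True if `name` is a valid simple name (no dots allowed)."""
    return NAME_RE.fullmatch(name) is not None


def is_valid_fullname(fullname: str) -> bool:
    """Hypothetical: a fullname is a dot-separated sequence of simple names."""
    return all(is_valid_name(part) for part in fullname.split("."))
```

Because `is_valid_name` rejects dots, a field name like `top.b` is invalid, which is another way of seeing that fields cannot carry a namespace.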
I raised AVRO-3436 because it would be simple to clarify!
Oh, I also agree with Oscar's interpretation: named types have namespaces because they are reused (as type references), while fields (and enum symbols) can only be used in the context of their enclosing record (or enum). If you have a use case for fields with a namespace, I'd be interested in hearing it. Maybe we could suggest an alternative? But
Make sure you have checked all steps below.

- Jira
- Dependencies: in case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- Tests: adds the following unit tests OR does not need testing for this extremely good reason: it's a text change in the specification.
- Commits
- Documentation