Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3275: Clarify specification on naming #1439

Merged
merged 4 commits into from Feb 7, 2022

Conversation

opwvhk
Copy link
Contributor

@opwvhk opwvhk commented Dec 30, 2021

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason: it's a text change in specification

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

Adds an example, and mentions for all occurrences of the 'namespace'
attribute that it is optional.
Copy link
Contributor

@RyanSkraba RyanSkraba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! :D

I made the same suggestion on https://github.com/martin-g/avro-website/pull/7/files -- it's so much easier to review the markdown than the XML here!

doc/src/content/xdocs/spec.xml Outdated Show resolved Hide resolved
they do not contain a dot, the namespace is the namespace of
the enclosing definition.
</p>
<p>Primitive type names have no namespace and their names may
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh -- this is another problem, isn't it? At our company we just ran into the inconsistent treatment of a record named "record" (a non-primitive type, OK in Java, forbidden in Python). I'll bring this up on the mailing list!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc/src/content/xdocs/spec.xml Outdated Show resolved Hide resolved
@RyanSkraba RyanSkraba merged commit 840dec6 into apache:master Feb 7, 2022
RyanSkraba added a commit that referenced this pull request Feb 7, 2022
* AVRO-3275: Clarify specification on naming

* AVRO-3275: Further clarify specification on naming

Adds an example, and mentions for all occurrences of the 'namespace'
attribute that it is optional.

* AVRO-3275: Further clarify specification on naming

Minor addition

* Apply suggestions from code review

Co-authored-by: Ryan Skraba <ryan@skraba.com>
@opwvhk opwvhk deleted the AVRO-3275-spec-update-naming branch February 22, 2022 14:13
"type": {
"type": "enum",
"name": "Understanding",
"doc": "A simple name (attribute) and no namespace attribute: inherit the namespace of the enclosing type 'a.full.Name'. The fullname is 'a.full.Understanding'.",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RyanSkraba I really appreciate this writeup as I'm reimplementing schema parsing for Elixir, and struggling with the inconsistencies of how existing libraries resolve references.

Can you answer a question for me?

Imagine the following schema

{
  "name": "a",
  "namespace": "top",
  "type": "record",
  "fields": [
      {"name": "b", "type": {"name": "c", "type": "string"}}
  ]
}

Would the fullnames here be either:

  1. top.a, top.b, and top.c
  2. top.a, top.b, c

It's unclear to me how namespaces should propagate. It seems like 1 should be the correct answer, but unclear as most libraries seem to not enforce this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! I think this is a good question, and probably could also be clarified in the specification.

  1. Field names don't have namespaces, and
  2. Unnamed types such as string don't have names and don't inherit namespaces.

So there is only one named type in your example top.a and one field name b. The attribute "name": "c" on the string looks like it is accepted, but silently dropped in the Java SDK.

Where the specification is vague: "Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data." I think it makes sense to drop metadata that are applied to primitive schemas, if we consider that the type string is an immutable singleton instance in the SDK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created AVRO-3430 for discussion on whether Java is correct to silently drop the "name": "c" attribute on the primitive string.

Copy link

@davydog187 davydog187 Mar 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is correct as you pointed out that it's technically metadata here. The canonical parsing form would drop it down to just the primitive.

Can you clarify where in the spec it specifies that Record fields do not have namespaces and thus do not have full names? Clearly they do not have a namespace field, but that nuance was not obvious.

Thank you so much for answering this out of band! This all seems like useful data to go back into the spec

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if you're interested, this is the PR that brought up these questions beam-community/avro_ex#62

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question is that if field names do not have namespaces, how do field aliases come into play?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all about context.

A namespace is a context for (named) schemata. Named schemata (of types record/error, enum or fixed) must have a unique name within their context. This results in globally unique full names (i.e., names prefixed with the namespace and a dot).

Fields cannot share a context with schemata, because they're not schemata. The context where fields are known is their record schema. Field names must be unique there.

Aliases are alternate names for schemata and fields, and the same rules as for names apply. Aliases for schemata are interpreted in the namespace of the schema, or globally if the alias includes a namespace (i.e., it's a full name). Aliases for fields are interpreted within the record that contains the field (and like field names, cannot reference another context/record schema).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify where in the spec it specifies that Record fields do not have namespaces and thus do not have full names?

I agree that it's not great, and specified "by omission" here To paraphrase: Record, enum and fixed each have a fullname composed of namespace and name... The name portion of a fullname, record field names, and enum symbols must match [an alphanumeric regex without dot] I raised AVRO-3436 because it would be simple to clarify!

@RyanSkraba
Copy link
Contributor

Oh, I also agree with Oscar's interpretation : named types have namespaces because they are reused (as type references), while fields (and enum symbols) can only be used in the context of their enclosing record (or enum).

If you have a use case for fields with a namespace, I'd be interested in hearing it. Maybe we could suggest an alternative? But user@avro.apache.org or a new JIRA might get more visibility!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants