AvroEx.Schema.Parser - Hand-rolled schema parser #62
I'm having some questions about the spec with respect to how full names are resolved, and what is and isn't allowed in terms of duplicates. Testing with avro-js, I get the following results:
Duplicate field names in a Record: ❌ Not allowed
> avro.parse({type: "record", name: "a", fields: [{name: "a", type: "string"}, {name: "a", type: "string"}]})
Uncaught Error: duplicate a field name
at new RecordType (/Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/schemas.js:1429:11)
at /Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/schemas.js:136:14
at Object.createType (/Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/schemas.js:137:7)
at Object.parse (/Users/dave/Development/avro_ex/javascript/node_modules/avro-js/lib/files.js:74:13)
Duplicate field names in separate Records: ✅ Allowed
> avro.parse({type: "record", name: "a", fields: [{name: "a", type: "string"}, {name: "b", type: {type: "record", name: "inner", fields: [{name: "a", type: "string"}]}}]})
RecordType {
_name: 'a',
_aliases: [],
_type: 'record',
_fields: [
Field { _name: 'a', _type: StringType {}, _aliases: [], _order: 1 },
Field { _name: 'b', _type: [RecordType], _aliases: [], _order: 1 }
],
_constructor: [Function: a] { getType: [Function (anonymous)] },
_read: [Function: reada],
_skip: [Function: skipa],
_write: [Function: writea],
_check: [Function: checka]
}
However, it does not seem to respect namespaces on Record fields, which tells me I'm either misunderstanding the spec, or this implementation has issues. The spec reads
This leads me to believe the following assertions:
❌ Record fields should not contain duplicates
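The two avro-js results above (duplicates rejected inside one record, the same field name allowed in separate records) can be sketched in Elixir as a per-record scope check. This is only an illustration: `DuplicateFieldSketch` and `validate/1` are hypothetical names, not part of AvroEx, the input is assumed to be an already-decoded map with string keys, and only directly nested records are visited (arrays, maps, and unions are ignored).

```elixir
defmodule DuplicateFieldSketch do
  # Hypothetical helper, not part of AvroEx.
  # Expects a decoded schema map with string keys.
  def validate(%{"type" => "record", "name" => name, "fields" => fields}) do
    names = Enum.map(fields, & &1["name"])

    # Whatever remains after removing one copy of each unique name is a duplicate.
    case names -- Enum.uniq(names) do
      [] ->
        # Each nested record is checked in its own scope, so reusing a
        # field name in a different record is not an error.
        Enum.reduce_while(fields, :ok, fn field, :ok ->
          case validate(field["type"]) do
            :ok -> {:cont, :ok}
            error -> {:halt, error}
          end
        end)

      [dup | _] ->
        {:error, "duplicate field name #{dup} in record #{name}"}
    end
  end

  # Non-record types (for example primitive names) have no fields to check.
  def validate(_other), do: :ok
end
```

Passing the first schema from the avro-js session returns an `{:error, message}` tuple, while the nested-record schema returns `:ok`.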
It seems that apache/avro#1439 and apache/avro#1573 clarify the rules around namespace resolution. I'll dig into that more.
Asking for feedback here: https://github.com/apache/avro/pull/1439/files#r818203665
Opening this for review while I get over the final hurdle, namespace propagation. @doomspork if you want to give some review, please do. Note that once the tests are passing and credo is happy, I will likely merge so I can move on to removing ecto and the rest of the cleanup, which will be very noisy.
{:ok, schema} -> schema
_ -> raise "Parsing schema failed"
def decode_schema!(schema) do
if is_binary(schema) and not Schema.Parser.primitive?(schema) do
@doomspork this ended up being very elegant. Now you can pass anything to AvroEx.decode_schema!/1 and it will figure out whether it's JSON or not.
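A rough sketch of that dispatch, based only on the `is_binary(schema) and not Schema.Parser.primitive?(schema)` condition visible in the diff: a binary that is not a bare primitive name is treated as JSON text, and everything else is handed to the parser as an Elixir term. `DecodeDispatchSketch`, `classify/1`, and the local `primitive?/1` are hypothetical stand-ins; the real `Schema.Parser.primitive?/1` may behave differently.

```elixir
defmodule DecodeDispatchSketch do
  # The eight Avro primitive type names.
  @primitives ~w(null boolean int long float double bytes string)

  # Assumed stand-in for Schema.Parser.primitive?/1.
  def primitive?(name), do: name in @primitives

  # Returns :json when the input should go through a JSON decoder first,
  # and :term when it can be passed to the schema parser directly.
  def classify(schema) do
    if is_binary(schema) and not primitive?(schema) do
      :json
    else
      :term
    end
  end
end
```

With this rule, `"string"` and `%{"type" => "string"}` are both classified as `:term`, while `~s({"type": "string"})` is classified as `:json`.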
@@ -305,13 +309,14 @@ defmodule AvroEx.Schema do
"Record<name=foo>"
"""
@spec type_name(schema_types()) :: String.t()
def type_name(%Primitive{type: nil}), do: "null"
def type_name(%Primitive{type: :null}), do: "null"
Maybe for another ticket, but this function needs a way to get the parent namespace and pass it down to full_name.
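To make the parent-namespace point concrete, here is a minimal sketch of fullname resolution as the Avro spec describes it: a name containing a dot is already a fullname, otherwise an explicit namespace is prepended, otherwise the enclosing (parent) namespace is inherited. `FullNameSketch.full_name/3` is a hypothetical helper and not the existing `full_name` in AvroEx. Treating an empty-string namespace the same as a missing one is an assumption made for brevity.

```elixir
defmodule FullNameSketch do
  # Hypothetical helper; the real AvroEx full_name may differ.
  def full_name(name, namespace, parent_namespace \\ nil) do
    cond do
      # A dotted name is already a fullname; namespaces are ignored.
      String.contains?(name, ".") -> name
      # An explicit namespace on the type wins over the inherited one.
      namespace not in [nil, ""] -> namespace <> "." <> name
      # Otherwise inherit the namespace of the enclosing definition.
      parent_namespace not in [nil, ""] -> parent_namespace <> "." <> name
      # No namespace anywhere: the name stands alone.
      true -> name
    end
  end
end
```

For example, `FullNameSketch.full_name("inner", nil, "outer")` yields `"outer.inner"`, which is the case `type_name` cannot handle without being given the parent namespace.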
lib/avro_ex/schema/parser.ex
Outdated
defp extract_data({data, rest, {type, raw}}) do
if rest != %{} do
# TODO this violates the spec
Unfortunately, I can't strictly validate here, as the spec says that any additional fields should be treated as metadata. Maybe we can add an option in the future for strict parsing. I'll remove this validation for this PR and add a ticket for future work to add the option.
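One way the proposed option could look, purely as a sketch: known attributes are split from the rest, the leftover keys are kept as metadata by default, and an opt-in `:strict` flag turns them into an error. Nothing here exists in AvroEx; `MetadataSketch`, `split/2`, the `:strict` option, and the list of known keys are all assumptions for illustration.

```elixir
defmodule MetadataSketch do
  # Assumed set of attributes a record definition recognizes.
  @known ~w(type name namespace doc aliases fields)

  # Hypothetical: splits a decoded schema map into known data and metadata.
  def split(raw, opts \\ []) do
    {data, rest} = Map.split(raw, @known)

    if Keyword.get(opts, :strict, false) and rest != %{} do
      # Strict mode (opt-in): unknown keys are rejected.
      {:error, {:unrecognized_fields, Map.keys(rest)}}
    else
      # Default: unknown keys are preserved as metadata.
      {:ok, data, rest}
    end
  end
end
```

Keeping the lenient behavior as the default keeps unknown attributes as metadata, while callers who want early failure can pass `strict: true`.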
lib/avro_ex/schema/parser.ex
Outdated
put_in(context.names[name], schema)
end

defp aliases(%{aliases: aliases, namespace: namespace} = record, parent_namespace)
I stole this from AvroEx.Schema. I think we will remove that function in a future PR, as it is only needed in this context.
lib/avro_ex/schema/record.ex
Outdated
field(:qualified_names, {:array, :string}, default: [])
# TODO remove
metadata is valid
The goal of this PR is to
Closes #40
Closes #60
Closes #61
Closes #55
Schema features