(WIP) Implement external mapping of IDs #80
@@ -26,6 +26,7 @@
 import org.apache.avro.LogicalType;
 import org.apache.avro.LogicalTypes;
 import org.apache.avro.Schema;
+
Nit: non-functional change.
👍
@@ -87,7 +98,7 @@ public static Schema buildAvroProjection(Schema schema, com.netflix.iceberg.Sche

   public static boolean isTimestamptz(Schema schema) {
     LogicalType logicalType = schema.getLogicalType();
-    if (logicalType != null && logicalType instanceof LogicalTypes.TimestampMicros) {
+    if (logicalType instanceof LogicalTypes.TimestampMicros) {
I'm all for cleaning up minor issues like this, but I would prefer to keep these in separate commits so that the project is easier to maintain. Could you move this sort of improvement into a separate PR that we can get in separately (and probably more quickly)?
Sure, will create a separate PR
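For context on the cleanup discussed in this thread: dropping the explicit null check is behavior-preserving because Java's `instanceof` evaluates to false when its left operand is null. A minimal sketch of that equivalence follows; the class and method names are hypothetical, and `CharSequence` merely stands in for `LogicalTypes.TimestampMicros`.

```java
// Hypothetical illustration; not part of the PR.
final class InstanceofNullSketch {

    // Mirrors the original condition: explicit null check, then instanceof.
    static boolean withNullCheck(Object value) {
        return value != null && value instanceof CharSequence;
    }

    // Mirrors the simplified condition: instanceof alone already rejects null.
    static boolean withoutNullCheck(Object value) {
        return value instanceof CharSequence;
    }

    public static void main(String[] args) {
        Object[] samples = {null, "text", 42};
        for (Object sample : samples) {
            // Both columns agree for every sample, including null.
            System.out.println(sample + ": " + withNullCheck(sample) + " / " + withoutNullCheck(sample));
        }
    }
}
```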
import java.util.HashMap; | ||
import java.util.List; | ||
|
||
public class AvroToIcebergSchemaVisitor extends AvroNamedSchemaVisitor<com.netflix.iceberg.Schema> { |
There is already a SchemaToType implementation that does most of the work done here. I think it would be better to extend that class with logic for assigning IDs.
I don't think it would be difficult to extend that to detect when an ID is missing and make a function call to assign it. That way, ID assignment is external to the conversion logic. That would make it possible to plug in something that assigns IDs like this, or something that maps IDs using a configured field mapping.
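To make the suggestion concrete, here is a rough sketch of what "ID assignment external to the conversion logic" could look like. It deliberately uses simplified stand-in types rather than the real Avro or Iceberg classes: `SourceField`, `ConvertedField`, `convert`, `sequentialFrom`, and `fromMapping` are all hypothetical names, and the actual `SchemaToType` API is not shown. The conversion only detects a missing ID and calls the supplied function, so either a counter or a configured field mapping can be plugged in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

// Hypothetical stand-ins; these are NOT Avro or Iceberg classes.
final class ExternalIdAssignmentSketch {

    /** A source field whose ID may be missing (null). */
    static final class SourceField {
        final String name;
        final Integer id;
        SourceField(String name, Integer id) { this.name = name; this.id = id; }
    }

    /** A converted field, which always carries an ID. */
    static final class ConvertedField {
        final String name;
        final int id;
        ConvertedField(String name, int id) { this.name = name; this.id = id; }
    }

    /** Conversion logic: keeps existing IDs and delegates missing ones to assignId. */
    static List<ConvertedField> convert(List<SourceField> fields, ToIntFunction<String> assignId) {
        List<ConvertedField> result = new ArrayList<>();
        for (SourceField field : fields) {
            int id = field.id != null ? field.id : assignId.applyAsInt(field.name);
            result.add(new ConvertedField(field.name, id));
        }
        return result;
    }

    /** Strategy 1: hand out fresh IDs from a counter. */
    static ToIntFunction<String> sequentialFrom(int first) {
        AtomicInteger next = new AtomicInteger(first);
        return name -> next.getAndIncrement();
    }

    /** Strategy 2: look IDs up in a configured name-to-ID mapping. */
    static ToIntFunction<String> fromMapping(Map<String, Integer> mapping) {
        return name -> {
            Integer id = mapping.get(name);
            if (id == null) {
                throw new IllegalArgumentException("No ID mapped for field: " + name);
            }
            return id;
        };
    }

    public static void main(String[] args) {
        List<SourceField> fields = Arrays.asList(
            new SourceField("id", 1), new SourceField("data", null));

        Map<String, Integer> mapping = new HashMap<>();
        mapping.put("data", 7);

        for (ConvertedField f : convert(fields, sequentialFrom(100))) {
            System.out.println("sequential: " + f.name + " -> " + f.id);
        }
        for (ConvertedField f : convert(fields, fromMapping(mapping))) {
            System.out.println("mapped: " + f.name + " -> " + f.id);
        }
    }
}
```

The point of the design is that `convert` never needs to change when the ID strategy does; only the function passed in differs.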
@YuvalItzchakov, I think the main goal of this PR is to take a mapping and an Avro schema without IDs and produce something that can be passed into The work here produces an Iceberg schema with IDs from an Avro schema, but that's not exactly what is needed to be passed into
@rdblue Thank you for taking the time to review everything! Frankly, I was not sure this code was in the right direction at all, because as you've mentioned, I'll start working on the changes.
@YuvalItzchakov, if you want to continue working on this, please re-open it in the apache/incubator-iceberg repository. That's the project's new home. Thanks!
Initial fix for #71
This implementation introduces AvroNamedSchemaVisitor<T> and the AvroToIcebergSchemaVisitor<iceberg.Schema> types in order to overcome the fact that AvroSchemaVisitor only passes the schema at the field level, neglecting to pass in the field names, which are needed to generate the schema.

There are a few "workarounds" in the implementation of the record method which I don't really like; I still need to iterate on that.

There are still open issues which need to be resolved:
[...] 0, and they also map structs in a table with IDs (is this desired?)
[...] Record is tested, need to add tests for Map, Union, Array, etc.)