Add external schema mappings for data without field IDs #71

rdblue · 2018-10-18T19:45:49Z

Files written by Iceberg writers contain Iceberg field IDs that are used for column projection. Iceberg doesn't currently support data files that were written by other systems and added to Iceberg tables through the API, because those files lack field IDs. To support files written by non-Iceberg writers, Iceberg could support a table-level mapping from a source schema's field names to Iceberg field IDs.

For example, a table with 2 columns might have an Avro schema mapping like this one, encoded as JSON in table properties:

[ {"field-id": 1, "names": ["id"]},
  {"field-id": 2, "names": ["data"]} ]
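As an illustration, a reader could turn this JSON into a lookup from every listed name to its field ID. This is a minimal sketch, assuming the flat list-of-objects format shown above, with "field-id" and "names" keys and no nested fields. The function name parse_name_mapping is hypothetical and is not part of any Iceberg API.

```python
import json
from typing import Dict


def parse_name_mapping(mapping_json: str) -> Dict[str, int]:
    """Build a lookup from every field name (including aliases) to its field ID.

    Hypothetical helper: assumes a flat JSON list of {"field-id": int, "names": [str]}
    objects. Nested structs are not handled.
    """
    lookup: Dict[str, int] = {}
    for entry in json.loads(mapping_json):
        field_id = entry["field-id"]
        for name in entry["names"]:
            # A name bound to two different IDs would make projection ambiguous.
            if name in lookup and lookup[name] != field_id:
                raise ValueError(f"name {name!r} maps to both {lookup[name]} and {field_id}")
            lookup[name] = field_id
    return lookup


mapping = '[ {"field-id": 1, "names": ["id"]}, {"field-id": 2, "names": ["data"]} ]'
print(parse_name_mapping(mapping))  # {'id': 1, 'data': 2}
```

Rejecting a name that points at two IDs is a design choice in this sketch; the proposal itself doesn't say how conflicts should be handled.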

When reading an Avro file, the reader would build its read schema from the file's schema, assigning each field the ID from the matching entry in the mapping. The names in each field mapping are a list to handle aliasing: any of the listed names resolves to the same field ID.
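The read-schema step might look roughly like the sketch below. It assumes a flat Avro record schema held as a Python dict, and it records the ID as a "field-id" attribute on each Avro field; that attribute name is an assumption chosen to match the mapping's key. The function name apply_name_mapping and the policy of leaving unmatched fields without an ID are hypothetical choices for illustration only.

```python
import copy
from typing import Any, Dict, List


def apply_name_mapping(file_schema: Dict[str, Any],
                       mapping: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of a flat Avro record schema with a "field-id" attribute
    added to each top-level field whose name appears in the mapping.

    Hypothetical helper: fields with no match get no ID, so a reader that
    projects by ID would not select them. The input schema is not modified.
    """
    # Every name in a mapping entry (the primary name and any aliases) maps to one ID.
    name_to_id = {name: m["field-id"] for m in mapping for name in m["names"]}
    read_schema = copy.deepcopy(file_schema)
    for field in read_schema["fields"]:
        field_id = name_to_id.get(field["name"])
        if field_id is not None:
            field["field-id"] = field_id
    return read_schema


# A file written by another system that used "payload" as an alias for "data".
mapping = [{"field-id": 1, "names": ["id"]},
           {"field-id": 2, "names": ["data", "payload"]}]
file_schema = {"type": "record", "name": "r", "fields": [
    {"name": "id", "type": "long"},
    {"name": "payload", "type": "string"},
    {"name": "extra", "type": "string"},
]}
read_schema = apply_name_mapping(file_schema, mapping)
print([f.get("field-id") for f in read_schema["fields"]])  # [1, 2, None]
```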


govi20 · 2018-10-20T17:29:55Z

I would like to work on this issue.

YuvalItzchakov · 2018-10-20T19:26:41Z

@govi20 I have already started working on this issue; I'd love to pair up if you want :)

rdblue · 2018-12-08T00:26:04Z

I've moved this to apache/iceberg#40

rdblue changed the title ~~Add schema mappings for data files without Iceberg field IDs~~ Add external schema mappings for data without field IDs Oct 18, 2018

rdblue mentioned this issue Oct 18, 2018

Add an API to maintain external schema mappings #72

Closed

rdblue added the good first issue label Oct 19, 2018

YuvalItzchakov mentioned this issue Oct 24, 2018

(WIP) Implement external mapping of IDs #80

Closed

omervk mentioned this issue Nov 13, 2018

Allow Specifying Partitioning Function for External Mappings #100

Open

rdblue closed this as completed Dec 8, 2018

Parth-Brahmbhatt pushed a commit to Parth-Brahmbhatt/iceberg that referenced this issue Apr 12, 2019

Allow passing the unpartitioned spec to DataFiles.builder. (Netflix#71)

de29e34
