From 4614fdaf4c1c959da4eef11d741d0d7f2c94f9b3 Mon Sep 17 00:00:00 2001 From: emkornfield Date: Wed, 31 Jul 2024 23:01:34 -0700 Subject: [PATCH 01/11] clarify_projection --- format/spec.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/format/spec.md b/format/spec.md index 5a90f6fd978d..b09ab5670e97 100644 --- a/format/spec.md +++ b/format/spec.md @@ -241,7 +241,9 @@ Struct evolution requires the following rules for default values: #### Column Projection -Columns in Iceberg data files are selected by field id. The table schema's column names and order may change after a data file is written, and projection must be done using field ids. If a field id is missing from a data file, its value for each row should be `null`. +Columns in Iceberg data files are selected by field id. The table schema's column names and order may change after a data file is written, and projection must be done using field ids. + +When a projected column has an [identity partition transform](#partition-transforms) applied to it for a data file, the value from the [manifest file](#manifests) must be used for that column (i.e. the column should not be read from the data file). This is to support tables that were migrated from other table formats (notably Hive) that do not write partition values to data files. Otherwise, if a field id is missing from a data file, its value for each row should be `null`. For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order. @@ -399,6 +401,9 @@ Sorting floating-point numbers should produce the following behavior: `-NaN` < ` A data or delete file is associated with a sort order by the sort order's id within [a manifest](#manifests). Therefore, the table must declare all the sort orders for lookup. 
A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes. +#### Writing with Identity transform + +When writing data files, all columns including those with an identity transforms should be written to data files. This provides redundancy in case of corruption or bugs in the metadata layer. Due to [column projection rules](#column-projection) readers can still properly scan the table if columns that have an indentity partition transform applied are ommitted. This is not the case for any other transform type. ### Manifests @@ -591,11 +596,10 @@ For example, an `events` table with a timestamp column named `ts` that is partit Scan predicates are also used to filter data and delete files using column bounds and counts that are stored by field id in manifests. The same filter logic can be used for both data and delete files because both store metrics of the rows either inserted or deleted. If metrics show that a delete file has no rows that match a scan predicate, it may be ignored just as a data file would be ignored [2]. -Data files that match the query filter must be read by the scan. +Data files that match the query filter must be read by the scan. Note that for any snapshot, all file paths marked with "ADDED" or "EXISTING" may appear at most once across all manifest files in the snapshot. If a file path appears more than once, the results of the scan are undefined. Reader implementations may raise an error in this case, but are not required to do so. - Delete files that match the query filter must be applied to data files at read time, limited by the scope of the delete file using the following rules. 
* A _position_ delete file must be applied to a data file when all of the following are true: From db9f35728759539489697391ee436ec6ba0e74ac Mon Sep 17 00:00:00 2001 From: emkornfield Date: Wed, 31 Jul 2024 23:02:58 -0700 Subject: [PATCH 02/11] typo --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index b09ab5670e97..0f5fd65af06f 100644 --- a/format/spec.md +++ b/format/spec.md @@ -403,7 +403,7 @@ A data or delete file is associated with a sort order by the sort order's id wit #### Writing with Identity transform -When writing data files, all columns including those with an identity transforms should be written to data files. This provides redundancy in case of corruption or bugs in the metadata layer. Due to [column projection rules](#column-projection) readers can still properly scan the table if columns that have an indentity partition transform applied are ommitted. This is not the case for any other transform type. +When writing data files, all columns including those with an identity transforms should be written to data files. This provides redundancy in case of corruption or bugs in the metadata layer. Due to [column projection rules](#column-projection) readers can still properly scan the table if columns that have an indentity partition transforms applied are ommitted. This is not the case for any other transform type. ### Manifests From 2b2e5957ffbd874f6a9fac92c1850b45d44aefe1 Mon Sep 17 00:00:00 2001 From: emkornfield Date: Wed, 31 Jul 2024 23:06:52 -0700 Subject: [PATCH 03/11] remove whitespace --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index 0f5fd65af06f..da1428e55ff9 100644 --- a/format/spec.md +++ b/format/spec.md @@ -241,7 +241,7 @@ Struct evolution requires the following rules for default values: #### Column Projection -Columns in Iceberg data files are selected by field id. 
The table schema's column names and order may change after a data file is written, and projection must be done using field ids. +Columns in Iceberg data files are selected by field id. The table schema's column names and order may change after a data file is written, and projection must be done using field ids. When a projected column has an [identity partition transform](#partition-transforms) applied to it for a data file, the value from the [manifest file](#manifests) must be used for that column (i.e. the column should not be read from the data file). This is to support tables that were migrated from other table formats (notably Hive) that do not write partition values to data files. Otherwise, if a field id is missing from a data file, its value for each row should be `null`. From 020a800db46a5bfb307ee82702bbfac8e3f9fa75 Mon Sep 17 00:00:00 2001 From: emkornfield Date: Wed, 31 Jul 2024 23:08:15 -0700 Subject: [PATCH 04/11] remove white space --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index da1428e55ff9..0e162050b76c 100644 --- a/format/spec.md +++ b/format/spec.md @@ -243,7 +243,7 @@ Struct evolution requires the following rules for default values: Columns in Iceberg data files are selected by field id. The table schema's column names and order may change after a data file is written, and projection must be done using field ids. -When a projected column has an [identity partition transform](#partition-transforms) applied to it for a data file, the value from the [manifest file](#manifests) must be used for that column (i.e. the column should not be read from the data file). This is to support tables that were migrated from other table formats (notably Hive) that do not write partition values to data files. Otherwise, if a field id is missing from a data file, its value for each row should be `null`. 
+When a projected column has an [identity partition transform](#partition-transforms) applied to it for a data file, the value from the [manifest file](#manifests) must be used for that column (i.e. the column should not be read from the data file). This is to support tables that were migrated from other table formats (notably Hive) that do not write partition values to data files. Otherwise, if a field id is missing from a data file, its value for each row should be `null`. For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order. From e41fb87ed58e80f9bc9827d9681107771f2c84b7 Mon Sep 17 00:00:00 2001 From: emkornfield Date: Thu, 1 Aug 2024 07:52:53 -0700 Subject: [PATCH 05/11] Update format/spec.md Co-authored-by: Ajantha Bhat --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index 0e162050b76c..127333c1791a 100644 --- a/format/spec.md +++ b/format/spec.md @@ -403,7 +403,7 @@ A data or delete file is associated with a sort order by the sort order's id wit #### Writing with Identity transform -When writing data files, all columns including those with an identity transforms should be written to data files. This provides redundancy in case of corruption or bugs in the metadata layer. Due to [column projection rules](#column-projection) readers can still properly scan the table if columns that have an indentity partition transforms applied are ommitted. This is not the case for any other transform type. +When writing data files, all columns including those with an identity transforms should be written to data files. This provides redundancy in case of corruption or bugs in the metadata layer. 
Due to [column projection rules](#column-projection) readers can still properly scan the table if columns that have an identity partition transforms applied are omitted. This is not the case for any other transform type. ### Manifests From c4e39519615bee0217e1af82e0f97fdc0e5c8b8a Mon Sep 17 00:00:00 2001 From: emkornfield Date: Thu, 1 Aug 2024 09:02:00 -0700 Subject: [PATCH 06/11] address comments --- format/spec.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/format/spec.md b/format/spec.md index 127333c1791a..6fae79b7ca57 100644 --- a/format/spec.md +++ b/format/spec.md @@ -243,7 +243,11 @@ Struct evolution requires the following rules for default values: Columns in Iceberg data files are selected by field id. The table schema's column names and order may change after a data file is written, and projection must be done using field ids. -When a projected column has an [identity partition transform](#partition-transforms) applied to it for a data file, the value from the [manifest file](#manifests) must be used for that column (i.e. the column should not be read from the data file). This is to support tables that were migrated from other table formats (notably Hive) that do not write partition values to data files. Otherwise, if a field id is missing from a data file, its value for each row should be `null`. +Values for Field ids which are not present in a data file must be resolved according the following rules: + +* Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field. +* Return the default value as defined in [Default values](#default-values) if it exists. +* Return `null` in all other cases. For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. 
This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order. @@ -401,9 +405,6 @@ Sorting floating-point numbers should produce the following behavior: `-NaN` < ` A data or delete file is associated with a sort order by the sort order's id within [a manifest](#manifests). Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes. -#### Writing with Identity transform - -When writing data files, all columns including those with an identity transforms should be written to data files. This provides redundancy in case of corruption or bugs in the metadata layer. Due to [column projection rules](#column-projection) readers can still properly scan the table if columns that have an identity partition transforms applied are omitted. This is not the case for any other transform type. ### Manifests @@ -1397,4 +1398,8 @@ This section covers topics not required by the specification but recommendations Iceberg supports two types of histories for tables. A history of previous "current snapshots" stored in ["snapshot-log" table metadata](#table-metadata-fields) and [parent-child lineage stored in "snapshots"](#table-metadata-fields). These two histories might indicate different snapshot IDs for a specific timestamp. The discrepancies can be caused by a variety of table operations (e.g. updating the `current-snapshot-id` can be used to set the snapshot of a table to any arbitrary snapshot, which might have a lineage derived from a table branch or no lineage at all). 
-When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. \ No newline at end of file +When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. + +### Writing data files + +All columns should be written to data files even if they introduce redundancy with metadata stored in manifest file (e.g. columns with identity partition transforms). Writing all columns provides redundancy in case of corruption or bugs in the metadata layer. 
\ No newline at end of file From 5522c0c42fff37b4517050eebdf7d0415618d31a Mon Sep 17 00:00:00 2001 From: emkornfield Date: Thu, 1 Aug 2024 14:31:13 -0700 Subject: [PATCH 07/11] address comments --- format/spec.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/format/spec.md b/format/spec.md index 6fae79b7ca57..d44a844081bc 100644 --- a/format/spec.md +++ b/format/spec.md @@ -150,6 +150,10 @@ Readers should be more permissive because v1 metadata files are allowed in v2 ta Readers may be more strict for metadata JSON files because the JSON files are not reused and will always match the table version. Required v2 fields that were not present in v1 or optional in v1 may be handled as required fields. For example, a v2 table that is missing `last-sequence-number` can throw an exception. +##### Writing data files + +All columns must be written to data files even if they introduce redundancy with metadata stored in manifest file (e.g. columns with identity partition transforms). Writing all columns provides a backup in case of corruption or bugs in the metadata layer. + ### Schemas and Data Types A table's **schema** is a list of named columns. All data types are either primitives or nested types, which are maps, lists, or structs. A table schema is also a struct type. @@ -243,10 +247,11 @@ Struct evolution requires the following rules for default values: Columns in Iceberg data files are selected by field id. The table schema's column names and order may change after a data file is written, and projection must be done using field ids. -Values for Field ids which are not present in a data file must be resolved according the following rules: +Values for field ids which are not present in a data file must be resolved according the following rules: -* Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field. 
-* Return the default value as defined in [Default values](#default-values) if it exists. +* Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field and the partition value is present in the `partitition` struct on `data_file` object in the manifest. +* Use `schema.name-mapping.default` metadata to map field id to columns without field id as described below and use the column if it is present. +* Return the default value if it has a defined in `initial-default` (See [Default values](#default-values) section for more details). * Return `null` in all other cases. For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order. @@ -597,10 +602,11 @@ For example, an `events` table with a timestamp column named `ts` that is partit Scan predicates are also used to filter data and delete files using column bounds and counts that are stored by field id in manifests. The same filter logic can be used for both data and delete files because both store metrics of the rows either inserted or deleted. If metrics show that a delete file has no rows that match a scan predicate, it may be ignored just as a data file would be ignored [2]. -Data files that match the query filter must be read by the scan. +Data files that match the query filter must be read by the scan. Note that for any snapshot, all file paths marked with "ADDED" or "EXISTING" may appear at most once across all manifest files in the snapshot. If a file path appears more than once, the results of the scan are undefined. Reader implementations may raise an error in this case, but are not required to do so. 
+ Delete files that match the query filter must be applied to data files at read time, limited by the scope of the delete file using the following rules. * A _position_ delete file must be applied to a data file when all of the following are true: @@ -1398,8 +1404,4 @@ This section covers topics not required by the specification but recommendations Iceberg supports two types of histories for tables. A history of previous "current snapshots" stored in ["snapshot-log" table metadata](#table-metadata-fields) and [parent-child lineage stored in "snapshots"](#table-metadata-fields). These two histories might indicate different snapshot IDs for a specific timestamp. The discrepancies can be caused by a variety of table operations (e.g. updating the `current-snapshot-id` can be used to set the snapshot of a table to any arbitrary snapshot, which might have a lineage derived from a table branch or no lineage at all). -When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. - -### Writing data files - -All columns should be written to data files even if they introduce redundancy with metadata stored in manifest file (e.g. columns with identity partition transforms). Writing all columns provides redundancy in case of corruption or bugs in the metadata layer. 
\ No newline at end of file +When processing point in time queries implementations should use "snapshot-log" metadata to lookup the table state at the given point in time. This ensures time-travel queries reflect the state of the table at the provided timestamp. For example a SQL query like `SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00Z';` would find the snapshot of the Iceberg table just prior to '1986-10-26 01:21:00 UTC' in the snapshot logs and use the metadata from that snapshot to perform the scan of the table. If no snapshot exists prior to the timestamp given or "snapshot-log" is not populated (it is an optional field), then systems should raise an informative error message about the missing metadata. \ No newline at end of file From bfa9ea87e280646989e5e4aca72fcb5820081189 Mon Sep 17 00:00:00 2001 From: emkornfield Date: Fri, 2 Aug 2024 01:03:23 -0700 Subject: [PATCH 08/11] Update format/spec.md Co-authored-by: Ajantha Bhat --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index d44a844081bc..51c75a03f4b1 100644 --- a/format/spec.md +++ b/format/spec.md @@ -249,7 +249,7 @@ Columns in Iceberg data files are selected by field id. The table schema's colum Values for field ids which are not present in a data file must be resolved according the following rules: -* Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field and the partition value is present in the `partitition` struct on `data_file` object in the manifest. +* Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field and the partition value is present in the `partition` struct on `data_file` object in the manifest. * Use `schema.name-mapping.default` metadata to map field id to columns without field id as described below and use the column if it is present. 
* Return the default value if it has a defined in `initial-default` (See [Default values](#default-values) section for more details). * Return `null` in all other cases. From 654772af7514fc9b0efdbada05a1ebc9dbf53aca Mon Sep 17 00:00:00 2001 From: emkornfield Date: Fri, 2 Aug 2024 13:57:11 -0700 Subject: [PATCH 09/11] remove white space --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index 51c75a03f4b1..1cc575a6131f 100644 --- a/format/spec.md +++ b/format/spec.md @@ -249,7 +249,7 @@ Columns in Iceberg data files are selected by field id. The table schema's colum Values for field ids which are not present in a data file must be resolved according the following rules: -* Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field and the partition value is present in the `partition` struct on `data_file` object in the manifest. +* Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field and the partition value is present in the `partition` struct on `data_file` object in the manifest. This allows for metadata only migrations of Hive tables. * Use `schema.name-mapping.default` metadata to map field id to columns without field id as described below and use the column if it is present. * Return the default value if it has a defined in `initial-default` (See [Default values](#default-values) section for more details). * Return `null` in all other cases. 
From bd4614946ba72f9c4fa8e863d69147f9e63a8d5f Mon Sep 17 00:00:00 2001 From: emkornfield Date: Sun, 4 Aug 2024 13:02:31 -0700 Subject: [PATCH 10/11] remove unnecessary "in" --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index 1cc575a6131f..67d528186eea 100644 --- a/format/spec.md +++ b/format/spec.md @@ -251,7 +251,7 @@ Values for field ids which are not present in a data file must be resolved accor * Return the value from partition metadata if an [Identity Transform](#partition-transforms) exists for the field and the partition value is present in the `partition` struct on `data_file` object in the manifest. This allows for metadata only migrations of Hive tables. * Use `schema.name-mapping.default` metadata to map field id to columns without field id as described below and use the column if it is present. -* Return the default value if it has a defined in `initial-default` (See [Default values](#default-values) section for more details). +* Return the default value if it has a defined `initial-default` (See [Default values](#default-values) section for more details). * Return `null` in all other cases. For example, a file may be written with schema `1: a int, 2: b string, 3: c double` and read using projection schema `3: measurement, 2: name, 4: a`. This must select file columns `c` (renamed to `measurement`), `b` (now called `name`), and a column of `null` values called `a`; in that order. 
From 34c422f594a38e11b8cb16e798f31d7b4044dc47 Mon Sep 17 00:00:00 2001 From: emkornfield Date: Mon, 5 Aug 2024 17:51:54 -0700 Subject: [PATCH 11/11] fix typo --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index 67d528186eea..07adcb63d0f0 100644 --- a/format/spec.md +++ b/format/spec.md @@ -152,7 +152,7 @@ Readers may be more strict for metadata JSON files because the JSON files are no ##### Writing data files -All columns must be written to data files even if they introduce redundancy with metadata stored in manifest file (e.g. columns with identity partition transforms). Writing all columns provides a backup in case of corruption or bugs in the metadata layer. +All columns must be written to data files even if they introduce redundancy with metadata stored in manifest files (e.g. columns with identity partition transforms). Writing all columns provides a backup in case of corruption or bugs in the metadata layer. ### Schemas and Data Types
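
Reviewer note: the resolution order that this series adds to the Column Projection section can be sketched as below. This is a minimal illustration, not how any Iceberg reader implements it. `FileContext`, `resolve_column`, and every field name are hypothetical, columns are modeled as plain Python lists, and name mapping is simplified to a lookup of id-less file columns by name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FileContext:
    """Hypothetical per-data-file view used only for this sketch."""
    row_count: int
    # Columns in the data file that carry a field id.
    columns_by_id: Dict[int, List[Any]] = field(default_factory=dict)
    # Columns in the data file written without field ids, keyed by name.
    columns_by_name: Dict[str, List[Any]] = field(default_factory=dict)
    # Identity-transform values from the manifest's `partition` struct,
    # keyed by source field id (assumed representation).
    identity_partition_values: Dict[int, Any] = field(default_factory=dict)
    # Simplified stand-in for `schema.name-mapping.default`: id -> names.
    name_mapping: Dict[int, List[str]] = field(default_factory=dict)
    # `initial-default` values keyed by field id.
    initial_defaults: Dict[int, Any] = field(default_factory=dict)


def resolve_column(field_id: int, ctx: FileContext) -> List[Any]:
    """Return the projected values for one field id, following the patch's order."""
    # Field id present in the file: read it directly.
    if field_id in ctx.columns_by_id:
        return ctx.columns_by_id[field_id]
    # Rule 1: identity partition value recorded in the manifest.
    if field_id in ctx.identity_partition_values:
        return [ctx.identity_partition_values[field_id]] * ctx.row_count
    # Rule 2: name mapping to a column that has no field id.
    for name in ctx.name_mapping.get(field_id, []):
        if name in ctx.columns_by_name:
            return ctx.columns_by_name[name]
    # Rule 3: the field's initial-default, if one is defined.
    if field_id in ctx.initial_defaults:
        return [ctx.initial_defaults[field_id]] * ctx.row_count
    # Rule 4: null in all other cases.
    return [None] * ctx.row_count
```

With the spec's own example (file schema `1: a int, 2: b string, 3: c double`, projection ids `3, 2, 4`), calling `resolve_column` for ids 3, 2, and 4 in turn yields the `c` column, the `b` column, and a column of `None` values, in that order.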