[SPEC] Add relative paths to v4 spec #15630
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -57,6 +57,14 @@ Version 3 of the Iceberg spec extends data types and existing metadata structure | |
|
|
||
| The full set of changes is listed in [Appendix E](#version-3). | ||
|
|
||
| ### Version 4: Metadata Structure and Representation | ||
|
|
||
| Version 4 of the Iceberg spec restructures metadata for improved performance and new capabilities: | ||
|
|
||
| * Support for [relative locations](#file-locations-in-metadata) in metadata fields | ||
|
|
||
| The full set of changes is listed in [Appendix E](#version-4). | ||
|
|
||
| ## Goals | ||
|
|
||
| * **Serializable isolation** -- Reads will be isolated from concurrent writes and always use a committed snapshot of a table’s data. Writes will support removing and adding files in a single operation and are never partially visible. Readers will not acquire locks. | ||
|
|
@@ -123,9 +131,15 @@ Tables do not require random-access writes. Once written, data and metadata file | |
|
|
||
| Tables do not require rename, except for tables that use atomic rename to implement the commit operation for new metadata files. | ||
|
|
||
| ### File Locations in Metadata | ||
|
|
||
| All location fields in format versions 3 and prior contain fully-qualified paths. | ||
|
|
||
| Version 4 of the Iceberg spec adds support for relative locations in metadata, enabling tables to be relocated without rewriting metadata files. Relative locations are allowed in all metadata-tracked location fields and are resolved against the table's base location. The table's location may be fixed in table metadata or inferred, but is intended to be managed and supplied by a catalog. Requirements for relativization and resolution are in [Relative Paths](#path-resolution). | ||
|
|
||
| ## Specification | ||
|
|
||
| #### Terms | ||
| ### Terms | ||
|
|
||
| * **Schema** -- Names and types of fields in a table. | ||
| * **Partition spec** -- A definition of how partition values are derived from data fields. | ||
|
|
@@ -134,8 +148,10 @@ Tables do not require rename, except for tables that use atomic rename to implem | |
| * **Manifest** -- A file that lists data or delete files; a subset of a snapshot. | ||
| * **Data file** -- A file that contains rows of a table. | ||
| * **Delete file** -- A file that encodes rows of a table that are deleted by position or data values. | ||
| * **Absolute path** -- A path string that includes a [URI](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) scheme and can be used directly. | ||
| * **Relative path** -- A path string without a URI scheme that must be [resolved](#path-resolution) against the table location. | ||
|
|
||
|
Contributor
The spec defines Absolute path and Relative path in the bullets above, but fully-qualified path is used here (and on lines 136, 206, 1741, and as a row label in the example table at line 214) without a definition. The term is doing work that the two defined terms can't quite cover. Line 206 says "Any path from a manifest produced prior to v4 is a fully-qualified path and must be produced with a URI scheme if the scheme was omitted to be consistent with V4 paths." That implies a fully-qualified path may not have a URI scheme — which conflicts with the v4 Absolute path definition (which requires one). The "Missing scheme" example row at line 215 reinforces this: a pre-v4 path can lack a URI scheme yet still be treated as a usable, non-relative path. Two cleanup options:
Option 2 feels cleaner — one fewer term, and the migration rule on lines 1741-1742 reads more naturally as "v3 paths are absolute paths (with the scheme prepended if missing)."
Contributor
Author
I disagree with this (I'm not sure if this was AI generated or your opinion). Versions prior to V4 were defined in the spec already as either "URI with scheme" or "fully-qualified". Those were the existing terms in the spec. I don't think we should go back to further define those terms as we may introduce new requirements on older versions. The new terms apply to V4 and the behaviors being introduced in this revision. You need to take into consideration backward compatibility, which means that we cannot apply option 2 as it would redefine prior versions of the spec.
Contributor
In the end, every comment is my opinion because I have reviewed or edited the comment before posting. I checked the current spec. There is one mention of fully qualified for the data file fields. I was saying that it was not explicitly defined in the terms list. Does it make sense to formally define the term fully-qualified path in the terms list? |
||
| #### Writer requirements | ||
| ### Writer requirements | ||
|
|
||
| Some tables in this spec have columns that specify requirements for tables by version. These requirements are intended for writers when adding metadata files (including manifest files and manifest lists) to a table with the given version. | ||
|
|
||
|
|
@@ -168,6 +184,48 @@ All columns must be written to data files even if they introduce redundancy with | |
|
|
||
| Writers are not allowed to commit files with a partition spec that contains a field with an unknown transform. | ||
|
|
||
| ### Paths in Metadata | ||
|
|
||
| Path strings stored in Iceberg metadata location fields are classified as one of two types: | ||
|
|
||
| * **Absolute path** -- A path string that starts with a [URI scheme](https://datatracker.ietf.org/doc/html/rfc3986#section-3.1) (e.g., `s3:`, `gs:`, `hdfs:`, `file:`). Absolute paths are used as-is without modification. | ||
| * **Relative path** -- A path string that does not start with a URI scheme. Relative paths must be resolved against the table's base location before use. | ||
|
|
||
| Prior to v4, all path fields must contain fully-qualified paths. Starting with v4, path fields may contain either absolute or relative paths. [Relative resolution within a URI](https://datatracker.ietf.org/doc/html/rfc3986#section-5.2) (e.g. `.` and `..`) and other file system navigation conventions are not supported in relative paths. | ||
|
Contributor
This seems to imply that resolution was previously allowed, which isn't the case. I like the older wording that things like I would bring back the older version and then add the link to the RFC as a clarification: The relative resolution components defined by the RFC have no special meaning and are opaque to Iceberg.
Contributor
Author
We never explicitly prohibited it and I don't feel like we should be adding requirements that weren't there previously. I didn't change the wording, so I'm not sure what you're referring to. I added the reference to the URI RFC that describes these behaviors since other people were confused about what kind of path evaluations we were referring to. |
||
|
|
||
| #### Path Resolution | ||
|
|
||
| Path resolution is the process of producing an absolute path from a relative path by combining it with the table's base location: | ||
|
|
||
| * If the path contains a URI scheme, it is absolute and is used without modification. | ||
|
danielcweeks marked this conversation as resolved.
|
||
| * If the path does not contain a URI scheme, the resolved path is the table location followed by the relative path joined by the URI separator character `/`. | ||
|
|
||
| The relative portion is joined to the prefix (table location) without consideration of any additional separator characters. The recommended convention for table location is to not end in a path separator because the join process would add a second separator character. (See example below). | ||
|
|
||
| Any path from a manifest produced prior to v4 is a fully-qualified path and must be produced with a URI scheme if the scheme was omitted to be consistent with V4 paths. | ||
|
Contributor
This sentence reads a bit awkwardly. Is this clearer? "Paths in pre-v4 manifests are fully-qualified. When a pre-v4 path omits a URI scheme, readers must prepend a scheme to produce a v4-consistent absolute path." |
||
|
|
||
| Examples of path resolution: | ||
|
|
||
| | | Format Version | Table Location | File Path | Resolved Path | Description | | ||
| |---------------------|----------------|-----------------------|-------------------------------------------|--------------------------------------------|-------------------------------------| | ||
| | Relative Path | v4 | s3://bucket/db/table | data/00000-0.parquet | s3://bucket/db/table/data/00000-0.parquet | Path parts are joined on `/` | | ||
| | Absolute Path | v4 | s3://bucket/db/table | hdfs:/wh/db/table/data/00000-0.parquet | hdfs:/wh/db/table/data/00000-0.parquet | Absolute path is used | | ||
| | Duplicate separator | v4 | s3://bucket/db/table/ | data/00000-0.parquet | s3://bucket/db/table//data/00000-0.parquet | Join results in duplicate `//` | | ||
| | Duplicate separator | v4 | s3://bucket/db/table | /data/00000-0.parquet | s3://bucket/db/table//data/00000-0.parquet | Join results in duplicate `//` | | ||
| | Fully-qualified | v3 and earlier | s3://bucket/db/table | s3://bucket/db/table/data/00000-0.parquet | s3://bucket/db/table/data/00000-0.parquet | Fully-qualified path is used | | ||
| | Missing scheme | v3 and earlier | /wh/db/table | /wh/db/table/data/00000-0.parquet | hdfs:/wh/db/table/data/00000-0.parquet | Scheme is prepended for consistency | | ||
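The v4 resolution rules above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names `is_absolute` and `resolve` are hypothetical, and detecting a scheme with a regular expression built from the RFC 3986 `scheme` grammar is an assumption about how a reader might implement the check. The pre-v4 "missing scheme" case is not covered here.

```python
import re

# Assumption: a path "starts with a URI scheme" when it matches the RFC 3986
# section 3.1 grammar: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute(path: str) -> bool:
    """Return True when the path string starts with a URI scheme."""
    return _SCHEME.match(path) is not None


def resolve(table_location: str, path: str) -> str:
    """Resolve a v4 location field against the table location."""
    if is_absolute(path):
        # Absolute paths are used without modification.
        return path
    # Plain join on "/": existing separators are not inspected or collapsed,
    # and "." or ".." segments get no special treatment.
    return table_location + "/" + path
```

Calling `resolve` with the relative-path and duplicate-separator inputs from the examples table reproduces the listed resolved paths, including the doubled `//` when either side already carries a separator.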
|
|
||
| #### Path Relativization | ||
|
|
||
| Path relativization is the process of converting an absolute path to a relative path by removing the table location prefix. This is used when persisting paths to metadata files. | ||
|
|
||
| * If an absolute path starts with the table location immediately followed by a separator character, the relative path is the remainder of the string after the separator character. | ||
| * If an absolute path does not start with the table location immediately followed by the separator character, it is stored as an absolute path. | ||
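The two relativization rules above amount to a boundary-aware prefix check. The sketch below is illustrative only; the function name `relativize` is hypothetical, and it assumes the separator character is `/`, as in the resolution rules.

```python
def relativize(table_location: str, absolute_path: str) -> str:
    """Return the form of absolute_path to persist in v4 metadata."""
    # The prefix must be the table location immediately followed by "/",
    # so a sibling such as ".../table2/..." is not treated as being
    # under ".../table".
    prefix = table_location + "/"
    if absolute_path.startswith(prefix):
        # Relative path: the remainder of the string after the separator.
        return absolute_path[len(prefix):]
    # Not under the table location: store the absolute path unchanged.
    return absolute_path
```

Because the prefix includes the separator, joining the table location and the returned relative path with `/` yields the original absolute path again, which is what allows the table to be relocated without rewriting metadata.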
|
|
||
| #### Table Location Specification | ||
|
|
||
| When the `location` field is present in table metadata, it is used directly as the table's base location. When the `location` field is not present (v4 and later), the table location must be provided. How the table location is persisted or determined when not specified in metadata is not a table-level concern; catalogs should provide a table's location. | ||
|
|
||
| ### Schemas and Data Types | ||
|
|
||
| A table's **schema** is a list of named columns. All data types are either primitives or nested types, which are maps, lists, or structs. A table schema is also a struct type. | ||
|
|
@@ -954,6 +1012,34 @@ Table metadata consists of the following fields: | |
| | _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). | | ||
| | | | _required_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot’s `first-row-id`. See [Row Lineage](#row-lineage). | | ||
| | | | _optional_ | **`encryption-keys`** | A list (optional) of [encryption keys](#encryption-keys) used for table encryption. | | ||
| === "v4" | ||
|
Contributor
I didn't expect this to add a new table for v4. Is this the same as the old table, except with a v4 requirement column?
Contributor
Author
The purpose of adding the tabs was to separate V4 as we move to the new structure. Why would we update the old table and add tabs with no additional tabs if we're going to just remove that and add the tab later? |
||
| | v4 | Field | Description | | ||
| |------------|-----------------------------|-------------| | ||
| | _required_ | **`format-version`** | An integer version number for the format. Implementations must throw an exception if a table's version is higher than the supported version. | | ||
| | _required_ | **`table-uuid`** | A UUID that identifies the table, generated when the table is created. Implementations must throw an exception if a table's UUID does not match the expected UUID after refreshing metadata. | | ||
| | _optional_ | **`location`** | The table's base location. This is used by writers to determine where to store data files, manifest files, and table metadata files. Must be an absolute path when present. See [Table Locations](#table-location-specification). | | ||
|
danielcweeks marked this conversation as resolved.
|
||
| | _required_ | **`last-sequence-number`** | The table's highest assigned sequence number, a monotonically increasing long that tracks the order of snapshots in a table. | | ||
| | _required_ | **`last-updated-ms`** | Timestamp in milliseconds from the unix epoch when the table was last updated. Each table metadata file should update this field just before writing. | | ||
| | _required_ | **`last-column-id`** | An integer; the highest assigned column ID for the table. This is used to ensure columns are always assigned an unused ID when evolving schemas. | | ||
| | | **`schema`** | The table’s current schema. (**Deprecated**: use `schemas` and `current-schema-id` instead) | | ||
| | _required_ | **`schemas`** | A list of schemas, stored as objects with `schema-id`. | | ||
| | _required_ | **`current-schema-id`** | ID of the table's current schema. | | ||
| | | **`partition-spec`** | The table’s current partition spec, stored as only fields. Note that this is used by writers to partition data, but is not used when reading because reads use the specs stored in manifest files. (**Deprecated**: use `partition-specs` and `default-spec-id` instead) | | ||
| | _required_ | **`partition-specs`** | A list of partition specs, stored as full partition spec objects. | | ||
| | _required_ | **`default-spec-id`** | ID of the "current" spec that writers should use by default. | | ||
| | _required_ | **`last-partition-id`** | An integer; the highest assigned partition field ID across all partition specs for the table. This is used to ensure partition fields are always assigned an unused ID when evolving specs. | | ||
| | _optional_ | **`properties`** | A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, `commit.retry.num-retries` is used to control the number of commit retries. | | ||
| | _optional_ | **`current-snapshot-id`** | `long` ID of the current table snapshot; must be the same as the current ID of the `main` branch in `refs`. | | ||
| | _optional_ | **`snapshots`** | A list of valid snapshots. Valid snapshots are snapshots for which all data files exist in the file system. A data file must not be deleted from the file system until the last snapshot in which it was listed is garbage collected. | | ||
| | _optional_ | **`snapshot-log`** | A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed. | | ||
| | _optional_ | **`metadata-log`** | A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit. | | ||
| | _required_ | **`sort-orders`** | A list of sort orders, stored as full sort order objects. | | ||
| | _required_ | **`default-sort-order-id`** | Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. | | ||
| | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. | | ||
| | _optional_ | **`statistics`** | A list (optional) of [table statistics](#table-statistics). | | ||
| | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). | | ||
| | _required_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). | | ||
| | _optional_ | **`encryption-keys`** | A list (optional) of [encryption keys](#encryption-keys) used for table encryption. | | ||
|
|
||
| For serialization details, see Appendix C. | ||
|
|
||
|
|
@@ -1647,6 +1733,30 @@ The binary single-value serialization can be used to store the lower and upper b | |
|
|
||
| ## Appendix E: Format version changes | ||
|
|
||
| ### Version 4 | ||
|
|
||
| Relative path support is added in v4. | ||
|
Contributor
I feel we probably need to introduce subheaders for Version 4, as we have so many changes scoped for V4. If each feature has multiple paragraphs, it is hard to track. The alternative is that each feature only has bullet points. E.g. the row lineage in the V3 section below has a large bullet list. I slightly favor the former. |
||
|
|
||
| Reading v3 metadata for v4: | ||
|
|
||
| * All location fields are fully-qualified paths and interpreted as absolute paths for v4 | ||
| * Any location field without a URI scheme prefix must prepend a scheme component consistent with v4 absolute paths | ||
|
|
||
| Writing v4 metadata: | ||
|
|
||
| * Table metadata JSON: | ||
| * `location` is now optional and must be absolute when present | ||
| * When not present, the table location must be managed externally and provided when loading the metadata | ||
| * Location fields in all metadata structures may contain relative paths | ||
| * Writers should produce relative paths by default for files that reside under the table location | ||
| * Absolute paths must be used for files that do not share a common prefix with the table location | ||
|
Contributor
Same "shares a common prefix" ambiguity flagged in the inline at line 1905. Line 1750 above ("files that reside under the table location") is also informal terminology for what is now formally defined in Path Relativization. Suggested rewording for lines 1750–1751:
This way the writer rule is exactly the Path Relativization rule applied — no chance of producing a relative form that doesn't round-trip safely under table relocation. |
||
|
|
||
| Reading v4 metadata: | ||
|
|
||
| * Readers must check whether location fields contain a URI scheme to determine if a path is absolute or relative | ||
| * Relative paths must be resolved against the table location before use (see [Path Resolution](#path-resolution)) | ||
| * When `location` is omitted, the table location must be provided (see [Table Location Specification](#table-location-specification)) | ||
|
|
||
| ### Version 3 | ||
|
|
||
| Default values are added to struct fields in v3. | ||
|
|
@@ -1777,6 +1887,24 @@ Note that these requirements apply when writing data to a v2 table. Tables that | |
|
|
||
| This section covers topics that are not required by the specification but are recommended for systems implementing the Iceberg specification to help maintain a uniform experience. | ||
|
|
||
| ### Path Construction | ||
|
|
||
| Path construction is the process by which new file locations are created for output files referenced by metadata. While the specific construction logic is not strictly required by the spec, the following guidance is provided for reference implementations to encourage consistency. | ||
|
|
||
| The table properties `write.metadata.path` and `write.data.path` control where metadata and data files are written. When not specified, these default to the values `metadata` and `data` respectively. | ||
|
|
||
| For all metadata files: | ||
|
|
||
| * If `write.metadata.path` is an absolute path, it is used directly as the base for new metadata files. | ||
| * If `write.metadata.path` is a relative path, the metadata base is the table location joined to the `write.metadata.path` value with a URI separator `/`. | ||
|
|
||
| For data files: | ||
|
|
||
| * If `write.data.path` is an absolute path, it is used directly as the base for new data files. | ||
| * If `write.data.path` is a relative path, the base is the table location joined to the `write.data.path` value with a URI separator `/`. | ||
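The base-path rules above for metadata and data files follow the same pattern, sketched below. This is an illustration under assumptions rather than the behavior of any particular implementation: the helper names `base_path`, `metadata_base`, and `data_base` are hypothetical, table properties are modeled as a plain string map, and scheme detection uses a regular expression derived from RFC 3986.

```python
import re

# Assumption: "absolute" means the value starts with an RFC 3986 URI scheme.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def base_path(table_location: str, properties: dict, key: str, default: str) -> str:
    """Compute the base location for new files governed by a table property."""
    value = properties.get(key, default)
    if _SCHEME.match(value):
        # Absolute property value: used directly as the base.
        return value
    # Relative property value: joined to the table location with "/".
    return table_location + "/" + value


def metadata_base(table_location: str, properties: dict) -> str:
    return base_path(table_location, properties, "write.metadata.path", "metadata")


def data_base(table_location: str, properties: dict) -> str:
    return base_path(table_location, properties, "write.data.path", "data")
```

With no properties set and a table location of `s3://bucket/db/table`, the sketch yields `s3://bucket/db/table/metadata` and `s3://bucket/db/table/data` as the two bases.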
|
|
||
| When persisting paths into metadata, writers should relativize paths against the table location (see [Path Relativization](#path-relativization)). If a file's absolute path shares a common prefix with the table location, the relative portion should be stored. Otherwise, the absolute path should be stored. | ||
|
Contributor
This recommendation wasn't updated to match the new boundary-aware rule in Path Relativization. "Shares a common prefix" is the original ambiguous phrasing — the same one that allows Suggested rewording to mirror the normative section:
Contributor
If allowed by the table version? |
||
|
|
||
| ### Point in Time Reads (Time Travel) | ||
|
|
||
| Iceberg supports two types of histories for tables. A history of previous "current snapshots" stored in ["snapshot-log" table metadata](#table-metadata-fields) and [parent-child lineage stored in "snapshots"](#table-metadata-fields). These two histories | ||
|
|
||