38 commits
* 8c5d276: add materialzied view to view spec (JanKaul, Aug 21, 2024)
* afc4a0d: add uuid to source table (JanKaul, Aug 29, 2024)
* b2d0b68: Update view-spec.md (JanKaul, Sep 5, 2024)
* 27783c7: improve refresh-state description (JanKaul, Sep 7, 2024)
* e85ab16: remove identifier from refresh-state (JanKaul, Sep 19, 2024)
* cff9596: fix comments (JanKaul, Oct 4, 2024)
* a8b52b2: incorporate comments (JanKaul, Nov 21, 2024)
* 521477f: fix MV introduction (JanKaul, Nov 25, 2024)
* ed85e95: Update format/view-spec.md (JanKaul, Dec 4, 2024)
* 3bc583c: fix comments (JanKaul, Dec 4, 2024)
* 49d5da8: fix spelling (JanKaul, Dec 10, 2024)
* d18c9da: fix view-version-id in refresh-state (JanKaul, Dec 15, 2024)
* eb7d71b: Update format/view-spec.md (JanKaul, Jan 10, 2025)
* 8ffff63: Update format/view-spec.md (JanKaul, Jan 10, 2025)
* 6b065f5: fix introduction wording (JanKaul, Jan 10, 2025)
* 0e17881: rename full identifier to table identifier (JanKaul, Feb 12, 2025)
* 7e9dc11: fix comments (JanKaul, Feb 18, 2025)
* efed628: Update format/view-spec.md (JanKaul, May 13, 2025)
* 0673113: Update format/view-spec.md (JanKaul, May 13, 2025)
* 476aced: Update format/view-spec.md (JanKaul, May 13, 2025)
* a02ff98: Merge branch 'apache:main' into materialized-view-spec (JanKaul, Jul 6, 2025)
* 9a377e0: clarify storage "fresh", "stale" and "invalid" (JanKaul, Jul 6, 2025)
* 3fec943: clarify that refresh-state is set on every storage table snapshot (JanKaul, Jul 6, 2025)
* 413ceb4: Add reference to from table to MV spec (Jul 6, 2025)
* 295035e: Remove optional catalog field from storage table identifier (Jul 6, 2025)
* 25adf5b: fix typo (Jul 6, 2025)
* ae0b005: Update format/view-spec.md (stevenzwu, Jul 28, 2025)
* 95740e0: Update format/view-spec.md (JanKaul, Aug 20, 2025)
* 9b492c9: fix refresh metadata (JanKaul, Aug 20, 2025)
* 878b66b: fix comment (JanKaul, Aug 20, 2025)
* fe2dec7: fix duplication (JanKaul, Aug 21, 2025)
* a2ca4b2: fix view-version-id (JanKaul, Oct 22, 2025)
* 6dffbee: update refresh-state (JanKaul, Oct 22, 2025)
* ec2ac6a: fix comments (JanKaul, Oct 29, 2025)
* 48c1553: add max-staleness (JanKaul, Nov 26, 2025)
* d154f8d: fix lint errors (JanKaul, Nov 26, 2025)
* 25d70dd: Add the case for max-staleness being null (JanKaul, Nov 26, 2025)
* e07b7ef: remove last line (JanKaul, Nov 26, 2025)
4 changes: 4 additions & 0 deletions format/spec.md
@@ -1846,3 +1846,7 @@ The Geometry and Geography class hierarchy and its Well-known text (WKT) and Wel
Points are always defined by the coordinates X, Y, Z (optional), and M (optional), in this order. X is the longitude/easting, Y is the latitude/northing, and Z is usually the height, or elevation. M is a fourth optional dimension, for example a linear reference value (e.g., highway milepost value), a timestamp, or some other value as defined by the CRS.

The version of the OGC standard first used here is 1.2.1, but future versions may also be used if the WKB representation remains wire-compatible.

## Appendix H: Materialized Views

Iceberg tables can be used as storage tables for [Iceberg Materialized Views](view-spec.md#materialized-views). The Materialized View specification is an extension of the [View Spec](view-spec.md) that defines how precomputed query results are stored and maintained using Iceberg tables as the underlying storage layer.
72 changes: 72 additions & 0 deletions format/view-spec.md
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another provides the basis for maki

Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location.

### Materialized Views

Materialized views are a type of view with precomputed results from the view query stored as a table.
When queried, engines may return the precomputed data of the materialized view, shifting the cost of query execution to the precomputation step.

Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, the "storage table", which stores the precomputed data.
Materialized view metadata is a superset of view metadata with an additional pointer to the storage table. The storage table is an Iceberg table with additional materialized view refresh state metadata.
Refresh metadata contains information about the "source tables" and "source views", which are the tables and views referenced in the query definition of the materialized view.
At read time, a materialized view (storage table) is considered "fresh", "stale" or "invalid", depending on the following conditions:

Review comment:

Since there are many use cases for allowing an engine to use a stale materialization, can we add a "warm" situation with the description:

warm - The snapshot_ids do not match for at least one source table and the snapshot was committed after the current time minus materialization.max-staleness-ms

* **fresh** -- The `snapshot-id`s recorded at the last refresh operation match the current `snapshot-id`s of all source tables.
* **stale** -- The `snapshot-id`s do not match for at least one source table, indicating that a refresh operation needs to be performed to capture the latest source table changes.
* **invalid** -- The `current-version-id` of the materialized view does not match the `view-version-id` of the refresh state.
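The three states above can be expressed as a small decision function. The following is a minimal, non-normative Python sketch: `classify_storage_table` and its `current_snapshot_ids` argument (a mapping from source table `uuid` to that table's current `snapshot-id`, which an engine would obtain from its catalog) are hypothetical, the refresh state is assumed to be the decoded JSON record described under "Refresh state", and only source tables are checked, as in the definitions above.

```python
from typing import Mapping


def classify_storage_table(
    current_version_id: int,
    refresh_state: Mapping,
    current_snapshot_ids: Mapping[str, int],
) -> str:
    """Return "fresh", "stale", or "invalid" (hypothetical helper, not spec API)."""
    # invalid: the view definition changed after the storage table was refreshed
    if refresh_state["view-version-id"] != current_version_id:
        return "invalid"
    # stale: at least one source table no longer points at the recorded snapshot
    # (a source table missing from current_snapshot_ids also counts as changed)
    for source in refresh_state["source-table-states"]:
        if current_snapshot_ids.get(source["uuid"]) != source["snapshot-id"]:
            return "stale"
    # fresh: every recorded snapshot-id matches the current one
    return "fresh"
```

The `invalid` check runs first because a storage table built from an older view version should not be served even if its source snapshots still match.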

## Specification

### Terms

* **Schema** -- Names and types of fields in a view.
* **Version** -- The state of a view at some point in time.
* **Storage table** -- Iceberg table that stores the precomputed data of the materialized view.
* **Source table** -- A table reference that occurs in the query definition of the materialized view. The materialized view depends on the data from the source tables.
* **Source view** -- A view reference that occurs in the query definition of the materialized view. The materialized view depends on the definitions from the source views.

### View Metadata

@@ -63,11 +79,13 @@ The view version metadata file has the following fields:
| _required_ | `versions` | A list of known [versions](#versions) of the view [1] |
| _required_ | `version-log` | A list of [version log](#version-log) entries with the timestamp and `version-id` for every change to `current-version-id` |
| _optional_ | `properties` | A string to string map of view properties [2] |
| _optional_ | `max-staleness-ms` | The maximum time interval in milliseconds after a refresh operation during which the materialized view's data is considered fresh [3] |

Notes:

1. The number of versions to retain is controlled by the view property: `version.history.num-entries`.
2. Properties are used for metadata such as `comment` and for settings that affect view maintenance. This is not intended to be used for arbitrary metadata.
3. The `max-staleness-ms` field only applies to materialized views and must be set to `null` for common views. If `max-staleness-ms` is not `null` and the time elapsed since the last refresh operation is less than `max-staleness-ms`, the query engine may return data directly from the `storage-table` without evaluating freshness based on the source tables and views. If `max-staleness-ms` is `null` for a materialized view, the data in the `storage-table` is always considered fresh.
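Note 3 amounts to a time-window check. This is a minimal Python sketch, not a reference implementation: the function name is hypothetical, and it assumes the "last refresh operation" time is taken from the `refresh-start-timestamp-ms` field of the storage table's current refresh state (the spec text does not name the exact timestamp).

```python
from typing import Optional


def may_serve_without_check(
    max_staleness_ms: Optional[int],
    last_refresh_ms: int,
    now_ms: int,
) -> bool:
    """True if the engine may read the storage table without a freshness check.

    Hypothetical helper; last_refresh_ms is assumed to be the refresh state's
    refresh-start-timestamp-ms.
    """
    if max_staleness_ms is None:
        # Per note 3, a null max-staleness-ms means the data is always considered fresh
        return True
    elapsed_ms = now_ms - last_refresh_ms
    # Strictly less than: at exactly max-staleness-ms the window has expired
    return elapsed_ms < max_staleness_ms
```

When this returns `False`, an engine would fall back to comparing source table snapshots to decide whether the storage table is fresh.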

#### Versions

@@ -82,9 +100,12 @@ Each version in `versions` is a struct with the following fields:
| _required_ | `representations` | A list of [representations](#representations) for the view definition |
| _optional_ | `default-catalog` | Catalog name to use when a reference in the SELECT does not contain a catalog |
| _required_ | `default-namespace` | Namespace to use when a reference in the SELECT is a single identifier |
| _optional_ | `storage-table` | A [storage table identifier](#storage-table-identifier) of the storage table |

Review comment (@talatuyarer, Contributor, Oct 29, 2025):

max-staleness is an optional field
The maximum time-lag allowed for the view's data, in milliseconds. A query planner will consider the view 'stale' if its data is older than this duration. A value of 0 or null implies no staleness target is defined, and the view must be refreshed manually.

Reply (Author):

I initially thought of max-staleness as a configuration property more than a metadata field. I think we touched on this in the last sync but didn't come to a conclusion.

Anyway, I like your description and I can add it either as a property or a metadata field, if there is consensus.

When `default-catalog` is `null` or not set, the catalog in which the view is stored must be used as the default catalog.

When `storage-table` is `null` or not set, the entity is a common view; otherwise, it is a materialized view.
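The two fallback rules above can be sketched as follows. This is an illustrative Python sketch that treats a version entry as a decoded JSON dict; `default_catalog`, `is_materialized_view`, and the `view_catalog` parameter (the name of the catalog that stores the view) are hypothetical.

```python
from typing import Mapping, Optional


def default_catalog(version: Mapping, view_catalog: str) -> str:
    """Catalog used for references without a catalog part (hypothetical helper)."""
    catalog: Optional[str] = version.get("default-catalog")
    # A null or missing default-catalog falls back to the view's own catalog
    return catalog if catalog is not None else view_catalog


def is_materialized_view(version: Mapping) -> bool:
    """A version with a non-null storage-table describes a materialized view."""
    return version.get("storage-table") is not None
```

Using `.get(...) is not None` treats an absent key and an explicit JSON `null` the same way, which matches the "`null` or not set" wording.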

#### Summary

Summary is a string to string map of metadata about a view version. Common metadata keys are documented here.
@@ -160,6 +181,57 @@ Each entry in `version-log` is a struct with the following fields:
| _required_ | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
| _required_ | `version-id` | ID that `current-version-id` was set to |

#### Storage Table Identifier

The table identifier for the storage table that stores the precomputed results.

| Requirement | Field name | Description |
|-------------|----------------|-------------|
| _required_ | `namespace` | A list of strings for namespace levels |
| _required_  | `name`           | A string specifying the name of the storage table |
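A storage table identifier has only the two required fields in the table above. The following minimal Python sketch parses and validates one from decoded JSON; the `StorageTableIdentifier` class is hypothetical and not part of any Iceberg library.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class StorageTableIdentifier:
    """Hypothetical in-memory form of the storage-table identifier."""

    namespace: Tuple[str, ...]
    name: str

    @classmethod
    def from_json(cls, obj: Mapping) -> "StorageTableIdentifier":
        namespace = obj["namespace"]  # required: list of namespace levels
        name = obj["name"]  # required: storage table name
        if not isinstance(namespace, list) or not all(isinstance(level, str) for level in namespace):
            raise ValueError("namespace must be a list of strings")
        if not isinstance(name, str):
            raise ValueError("name must be a string")
        return cls(tuple(namespace), name)
```

A missing required field surfaces as a `KeyError`, and a wrongly typed field as a `ValueError`.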

### Storage table metadata

This section describes additional metadata for the storage table that supplements the regular table metadata and is required for materialized views.
The `refresh-state` property is set in the [snapshot summary](https://iceberg.apache.org/spec/#snapshots) of every storage table snapshot and is used to determine the freshness of the storage table's precomputed data.

| Requirement | Field name | Description |
|-------------|-----------------|-------------|
| _required_ | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string |
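Because a snapshot summary is a string-to-string map, the refresh state record has to be serialized to JSON before it is stored. A minimal Python sketch using only the standard `json` module; the helper names are hypothetical, and the summary is modeled as a plain dict.

```python
import json
from typing import Dict, Mapping


def with_refresh_state(summary: Mapping[str, str], refresh_state: Mapping) -> Dict[str, str]:
    """Return a copy of a snapshot summary with refresh-state set (hypothetical helper)."""
    updated = dict(summary)
    # Summary values must be strings, so the record is stored as compact JSON
    updated["refresh-state"] = json.dumps(refresh_state, separators=(",", ":"))
    return updated


def read_refresh_state(summary: Mapping[str, str]) -> dict:
    """Decode refresh-state; raises KeyError if the snapshot has none."""
    return json.loads(summary["refresh-state"])
```

Other summary entries, such as `operation`, are copied through unchanged.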

#### Refresh state

The refresh state record captures the state of all source tables, views, and materialized views in the materialized view's fully expanded query tree at refresh time. Source table states are stored in `source-table-states` and source view states in `source-view-states`. Both lists include indirect references: tables or views that are nested within other source views (excluding materialized views) but not directly referenced in the query.
For source materialized views, both the source materialized view and its storage table are included in the refresh state. Indirect references are excluded for materialized view sources; at read time, query engines may recursively expand the query tree to determine freshness. The refresh state has the following fields:
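The expansion rules above resemble a graph walk that descends into regular views but stops at materialized views. This Python sketch illustrates them under stated assumptions: the `catalog` dict, its `"kind"`, `"refs"`, and `"storage-table"` keys, and `collect_sources` itself are hypothetical stand-ins for whatever an engine uses to resolve references.

```python
from typing import List, Mapping, Set, Tuple


def collect_sources(
    direct_refs: List[str], catalog: Mapping[str, Mapping]
) -> Tuple[Set[str], Set[str]]:
    """Return (table uuids, view uuids) to record in a refresh state.

    `catalog` is a hypothetical lookup from a reference name to an entry whose
    "kind" is "table", "view", or "materialized-view".
    """
    table_uuids: Set[str] = set()
    view_uuids: Set[str] = set()
    pending = list(direct_refs)
    seen: Set[str] = set()
    while pending:
        name = pending.pop()
        if name in seen:
            continue  # record each entity once, even if referenced repeatedly
        seen.add(name)
        entry = catalog[name]
        kind = entry["kind"]
        if kind == "table":
            table_uuids.add(entry["uuid"])
        elif kind == "view":
            view_uuids.add(entry["uuid"])
            # Regular views are expanded: their references become indirect sources
            pending.extend(entry["refs"])
        elif kind == "materialized-view":
            view_uuids.add(entry["uuid"])
            # Record the MV and its storage table, but do not expand the MV's query
            table_uuids.add(catalog[entry["storage-table"]]["uuid"])
        else:
            raise ValueError(f"unknown kind {kind!r} for {name!r}")
    return table_uuids, view_uuids
```

For example, a query over a regular view `v_orders` (which reads table `orders`), a materialized view `mv_daily`, and table `customers` records `orders`, `customers`, and the storage table of `mv_daily` as tables, and `v_orders` and `mv_daily` as views.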

| Requirement | Field name | Description |
|-------------|----------------|-------------|
| _required_ | `view-version-id` | The `version-id` of the materialized view when the refresh operation was performed |
| _required_ | `source-table-states` | A list of [source table](#source-table) records for all tables that are directly or indirectly referenced in the materialized view query |
| _required_ | `source-view-states` | A list of [source view](#source-view) records for all views that are directly or indirectly referenced in the materialized view query |
| _required_ | `refresh-start-timestamp-ms` | Timestamp when the refresh operation was started (ms from epoch) |
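A refresh operation could assemble the record from the fields in the table above roughly as follows. This is a hedged Python sketch: `build_refresh_state` and its mapping arguments (source `uuid` to `snapshot-id` or `version-id`, captured when the refresh starts) are hypothetical, and the optional `ref` field of source table records is omitted, so every table is assumed to be read from `main`.

```python
from typing import Dict, Mapping


def build_refresh_state(
    view_version_id: int,
    table_snapshots: Mapping[str, int],
    view_versions: Mapping[str, int],
    refresh_start_timestamp_ms: int,
) -> Dict:
    """Assemble a refresh state record (hypothetical helper, not spec API)."""
    return {
        "view-version-id": view_version_id,
        # Sorted by uuid only to make the output deterministic
        "source-table-states": [
            {"uuid": uuid, "snapshot-id": snapshot_id}
            for uuid, snapshot_id in sorted(table_snapshots.items())
        ],
        "source-view-states": [
            {"uuid": uuid, "version-id": version_id}
            for uuid, version_id in sorted(view_versions.items())
        ],
        "refresh-start-timestamp-ms": refresh_start_timestamp_ms,
    }
```

Capturing the snapshot and version IDs at the start of the refresh, rather than at commit, ensures that changes committed to a source during the refresh make the result stale instead of being silently missed.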

#### Source table

A source table record captures the state of a source table at the time of the last refresh operation.

Review comment (Contributor):

the state of a source table (including source MV's storage table)?


| Requirement | Field name | Description |
|-------------|----------------|-------------|
| _required_ | `uuid` | The uuid of the source table |
| _required_ | `snapshot-id`  | The `snapshot-id` of the source table at the time of the last refresh operation |
| _optional_ | `ref`          | Branch name of the source table referenced in the view query |

When `ref` is `null` or not set, it defaults to "main".
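Comparing a source table record with the table's current state means resolving its branch first. This is a minimal Python sketch, assuming the table metadata is available as a decoded dict with the table spec's `table-uuid`, `refs`, and `current-snapshot-id` fields; the helper names are hypothetical.

```python
from typing import Mapping, Optional


def current_snapshot_for(record: Mapping, table_metadata: Mapping) -> Optional[int]:
    """Current snapshot-id of the branch a source table record refers to."""
    if record["uuid"] != table_metadata["table-uuid"]:
        raise ValueError("record does not describe this table")
    ref_name = record.get("ref")
    if ref_name is None:
        ref_name = "main"  # a null or missing ref defaults to main
    refs = table_metadata.get("refs") or {}
    ref = refs.get(ref_name)
    if ref is not None:
        return ref["snapshot-id"]
    if ref_name == "main":
        # main always tracks current-snapshot-id, even if refs is absent
        return table_metadata.get("current-snapshot-id")
    return None  # branch no longer exists


def source_table_changed(record: Mapping, table_metadata: Mapping) -> bool:
    """True if the referenced branch has moved since the last refresh."""
    return current_snapshot_for(record, table_metadata) != record["snapshot-id"]
```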

#### Source view

A source view record captures the state of a source view at the time of the last refresh operation.

| Requirement | Field name | Description |
|-------------|----------------|-------------|
| _required_ | `uuid` | The uuid of the source view |
| _required_ | `version-id`   | The `version-id` of the source view at the time of the last refresh operation |
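The corresponding check for a source view compares the recorded `version-id` with the view's current version. This is a short, hypothetical Python sketch, assuming the view metadata is a decoded dict with the view spec's `view-uuid` and `current-version-id` fields.

```python
from typing import Mapping


def source_view_changed(record: Mapping, view_metadata: Mapping) -> bool:
    """True if the source view's definition changed since the last refresh."""
    if record["uuid"] != view_metadata["view-uuid"]:
        raise ValueError("record does not describe this view")
    return view_metadata["current-version-id"] != record["version-id"]
```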

## Appendix A: An Example

The JSON metadata file format is described using an example below.