75 changes: 70 additions & 5 deletions mkdocs/docs/api.md
@@ -1527,17 +1527,40 @@ def cleanup_old_snapshots(table_name: str, snapshot_ids: list[int]):
cleanup_old_snapshots("analytics.user_events", [12345, 67890, 11111])
```

## Views
## Create a view

PyIceberg supports view operations.

### Check if a view exists
To create a view from a catalog:

```python
import time
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyiceberg.view import SQLViewRepresentation, ViewVersion

catalog = load_catalog("default")
Contributor commented:

I would add a section above "Create a view" specifying that, in order to use the view endpoints from the catalog, the catalog configuration needs to have `"view-endpoints-supported": "true"`:

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "docs",
    **{
        "uri": "http://127.0.0.1:8181",
        "s3.endpoint": "http://127.0.0.1:9000",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
        "view-endpoints-supported": "true",
    }
)

catalog.view_exists("default.bar")

identifier = "default.some_view"
Contributor commented:

Nit: I would just put the string directly in the `create_view` call.

schema = pa.schema([pa.field("some_col", pa.int32())])
Contributor commented:

Nit: for the example, I believe it would be best to show an Iceberg schema type instead of an Arrow one, while also noting that `create_view` accepts an Arrow schema.

view_version = ViewVersion(
    version_id=1,
    schema_id=1,
    timestamp_ms=int(time.time() * 1000),
    summary={},
Contributor commented:

Suggested change:
- summary={},
+ summary={"spark-version": "4.1"},

Contributor commented:

Just to show what the summary is usually used for.

    representations=[
        SQLViewRepresentation(
            type="sql",
            sql="SELECT 1 as some_col",
            dialect="spark",
        )
    ],
    default_namespace=["default"],
)

catalog.create_view(
    identifier=identifier,
    schema=schema,
    view_version=view_version,
)
```

## Register a view
@@ -1551,6 +1574,48 @@ catalog.register_view(
)
```

## Load a view

Loading the `some_view` view:

```python
view = catalog.load_view("default.some_view")
# Equivalent to:
view = catalog.load_view(("default", "some_view"))
# The tuple syntax can be used if the namespace or view contains a dot.
```

This returns a `View` that represents an Iceberg view. You can access the SQL representation for a specific dialect:

```python
sql_representation = view.sql_for("spark")
print(sql_representation.sql)
```
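
As a side note on the tuple syntax mentioned for `load_view`, the following standalone sketch shows why a dotted string can be ambiguous. `split_dotted` is a hypothetical helper written for this illustration; it is not a PyIceberg function and only mimics naive splitting on dots.

```python
def split_dotted(identifier: str) -> tuple[str, ...]:
    """Naively split a dotted identifier (hypothetical helper, not PyIceberg API)."""
    return tuple(identifier.split("."))


# A plain name splits cleanly into (namespace, view).
simple = split_dotted("default.some_view")

# A view name that itself contains a dot splits into three parts,
# so the boundary between namespace and view name is lost.
ambiguous = split_dotted("default.some.view")

# A tuple states the boundary explicitly: namespace "default", view "some.view".
explicit = ("default", "some.view")

print(simple)
print(ambiguous)
print(explicit)
```

The `explicit` tuple is the kind of value that would be passed to `catalog.load_view(...)` when the view name contains a dot.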

## Check if a view exists

To check whether the `some_view` view exists:

```python
catalog.view_exists("default.some_view")
```

## List views

To list views in the `default` namespace:

```python
catalog.list_views("default")
```

## Drop a view

To drop a view:

```python
catalog.drop_view("default.some_view")
```
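
If the view may not exist, one option is to combine `view_exists` and `drop_view`. The sketch below is an illustration only: `drop_view_if_exists` and `_StubCatalog` are hypothetical names, and the stub stands in for a real catalog so the snippet is runnable on its own. Note that the check and the drop are two separate calls, so this is not atomic.

```python
def drop_view_if_exists(catalog, identifier: str) -> bool:
    """Drop the view if the catalog reports it exists; return True if dropped."""
    if catalog.view_exists(identifier):
        catalog.drop_view(identifier)
        return True
    return False


class _StubCatalog:
    """Stand-in for a real catalog (hypothetical, for illustration only)."""

    def __init__(self, views):
        self._views = set(views)

    def view_exists(self, identifier):
        return identifier in self._views

    def drop_view(self, identifier):
        self._views.remove(identifier)


stub = _StubCatalog(["default.some_view"])
first = drop_view_if_exists(stub, "default.some_view")   # view present, gets dropped
second = drop_view_if_exists(stub, "default.some_view")  # already gone, nothing to do
print(first, second)
```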

## Table Statistics Management

Manage table statistics with operations through the `Table` API: