A DataFrame-like library for working with Apache Iceberg tables through REST catalogs, with local execution.
IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.
- DataFrame API: Familiar interface for working with tables
- Local Execution: Uses PyIceberg, PyArrow, and Polars for efficient local processing
- Catalog Support: Works with REST catalogs (including Dremio, Tabular, etc.) and supports credential vending
- CRUD Operations: Create, Read, Update, Delete tables and data
- Maintenance: Expire snapshots, remove orphan files, compact data files
- Export: Export data to Parquet, CSV, and JSON
```bash
pip install iceframe
```

For cloud storage support:

```bash
pip install "iceframe[aws]"    # AWS S3
pip install "iceframe[gcs]"    # Google Cloud Storage
pip install "iceframe[azure]"  # Azure Data Lake Storage
```

- Create a `.env` file with your catalog credentials (see `.env.example`):

```
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
```

- Use IceFrame in your code:
```python
from datetime import datetime

from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env
import polars as pl

# Initialize
config = load_catalog_config_from_env()
ice = IceFrame(config)

# Create a table
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp",
}
ice.create_table("my_table", schema)

# Append data
data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 1, 2)],
})
ice.append_to_table("my_table", data)

# Read data
df = ice.read_table("my_table")
print(df)

# Query Builder API
from iceframe.expressions import col
from iceframe.functions import sum

df = (ice.query("my_table")
      .select("name", sum(col("id")).alias("total_id"))
      .group_by("name")
      .execute())
print(df)
```
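The comparison table below also lists `filter` and `join` on the query builder. A minimal sketch of filtering, assuming `filter` accepts the same `col` expressions used in the Quick Start (the comparison operator syntax is an assumption, not documented here):

```python
from iceframe.expressions import col

# Hedged sketch: only the method name "filter" is listed in the comparison
# table below; the col("id") > 1 comparison syntax is assumed.
df = (ice.query("my_table")
      .filter(col("id") > 1)   # keep rows where id is greater than 1
      .select("id", "name")
      .execute())
print(df)
```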
IceFrame builds on top of PyIceberg, adding high-level abstractions and missing features; a hedged branching sketch follows the comparison table.

| Feature | PyIceberg (Native) | IceFrame (Enhanced) |
|---|---|---|
| Table CRUD | Low-level API | Simplified `create_table`, `drop_table` |
| Data Writing | Arrow/Pandas integration | Polars integration, auto-schema inference |
| Branching | Basic support (WIP) | `create_branch`, `fast_forward`, WAP pattern |
| Compaction | `rewrite_data_files` (limited) | `bin_pack`, `sort` strategies (Polars-based) |
| Views | Catalog-dependent | Unified `ViewManager` abstraction |
| Maintenance | `expire_snapshots` | `GarbageCollector`, native `remove_orphan_files` |
| SQL Support | None | Fluent query builder (`select`, `filter`, `join`) |
| Ingestion | `add_files` | `add_files` wrapper + incremental ingestion recipes |
| Rollback | `manage_snapshots` | `rollback_to_snapshot`, `rollback_to_timestamp` |
| Async | None | `AsyncIceFrame` for non-blocking I/O |
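To illustrate the branching row, here is a minimal Write-Audit-Publish sketch. Only the names `create_branch`, `fast_forward`, `read_table`, and `append_to_table` come from this README; the exact signatures and the `branch=` keyword are assumptions for illustration:

```python
from datetime import datetime
import polars as pl

# Hedged Write-Audit-Publish sketch; the branch= keyword and the
# create_branch / fast_forward signatures are assumed, not documented here.
ice.create_branch("my_table", "audit")                      # Write: branch off main

new_rows = pl.DataFrame({
    "id": [3],
    "name": ["Carol"],
    "created_at": [datetime(2024, 1, 3)],
})
ice.append_to_table("my_table", new_rows, branch="audit")   # stage data on the branch

staged = ice.read_table("my_table", branch="audit")         # Audit: validate the staged rows
assert staged.filter(pl.col("name").is_null()).is_empty()

ice.fast_forward("my_table", "main", "audit")               # Publish: fast-forward main to the branch
```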
- Architecture
- Creating Tables
- Reading Tables
- Updating Tables
- Deleting Tables
- Query Builder API
- Namespace Management
- Schema Evolution
- Partition Management
- Data Quality
- Table Maintenance
- Exporting Data
- CLI Usage
- Dependencies
- Incremental Processing
- Table Statistics
- Scalability Features
- Advanced Iceberg Features
- JOIN Support
- Branching & Tagging
- Rollback & History
- Bulk Ingestion
- Catalog Operations
- Native Maintenance
- Async Operations
- AI Agent
- Pydantic Integration
- Notebook Integration
- Data Ingestion
- ETL Pipeline - Simple Extract-Transform-Load workflow (see the sketch after this list)
- SCD Type 2 - Handling slowly changing dimensions
- Incremental Ingestion - Processing only new data
- Data Quality Gate - Write-Audit-Publish pattern
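As a starting point for the ETL Pipeline example, here is a minimal sketch that uses only the Quick Start API shown above (`read_table`, `append_to_table`) plus Polars; `raw_events` and `daily_counts` are placeholder table names, and both tables are assumed to already exist in the catalog:

```python
import polars as pl

from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env

# Minimal Extract-Transform-Load sketch; "raw_events" and "daily_counts"
# are placeholder tables that must already exist in the catalog.
ice = IceFrame(load_catalog_config_from_env())

raw = ice.read_table("raw_events")                 # Extract: read the source table

daily = (raw                                       # Transform: aggregate locally with Polars
         .with_columns(pl.col("created_at").dt.date().alias("day"))
         .group_by("day")
         .agg(pl.len().alias("events")))

ice.append_to_table("daily_counts", daily)         # Load: append into the target table
```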