DataFrame-like library and AI agent for working with Apache Iceberg in Python, using PyIceberg plus natively implemented procedure extensions

AlexMercedCoder/iceframe

IceFrame (Alpha)

A DataFrame-like library for working with Apache Iceberg tables using REST catalogs with local execution.

IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.

Features

  • DataFrame API: Familiar interface for working with tables
  • Local Execution: Uses PyIceberg, PyArrow, and Polars for efficient local processing
  • Catalog Support: Works with REST catalogs (including Dremio, Tabular, etc.) and supports credential vending
  • CRUD Operations: Create, Read, Update, Delete tables and data
  • Maintenance: Expire snapshots, remove orphan files, compact data files
  • Export: Export data to Parquet, CSV, and JSON

Installation

pip install iceframe

For cloud storage support:

pip install "iceframe[aws]"   # AWS S3
pip install "iceframe[gcs]"   # Google Cloud Storage
pip install "iceframe[azure]" # Azure Data Lake Storage

Quick Start

  1. Create a .env file with your catalog credentials (see .env.example):
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
  2. Use IceFrame in your code:
from datetime import datetime

from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env
import polars as pl

# Initialize
config = load_catalog_config_from_env()
ice = IceFrame(config)

# Create a table
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp"
}
ice.create_table("my_table", schema)

# Append data (plain Python datetime values; pl.datetime() builds an expression, not a literal)
data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 1, 2)]
})
ice.append_to_table("my_table", data)

# Read data
df = ice.read_table("my_table")
print(df)

# Query Builder API
from iceframe.expressions import col
from iceframe.functions import sum

df = (ice.query("my_table")
      .select("name", sum(col("id")).alias("total_id"))
      .group_by("name")
      .execute())
print(df)
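
Step 1 of the Quick Start stores the catalog settings in environment variables. As a rough illustration of how such variables could be turned into catalog properties, here is a minimal sketch. The helper name `catalog_properties_from_env` and the mapping table are hypothetical and are not IceFrame's actual `load_catalog_config_from_env` implementation. The property names `type`, `uri`, `token`, and `warehouse` are the ones PyIceberg uses for REST catalogs.

```python
import os

# Hypothetical mapping from the .env variable names shown above
# to PyIceberg REST catalog property names.
ENV_TO_PROPERTY = {
    "ICEBERG_CATALOG_TYPE": "type",
    "ICEBERG_CATALOG_URI": "uri",
    "ICEBERG_TOKEN": "token",
    "ICEBERG_WAREHOUSE": "warehouse",
}


def catalog_properties_from_env(environ=None):
    """Collect catalog properties from environment variables (illustrative helper)."""
    environ = os.environ if environ is None else environ
    if not environ.get("ICEBERG_CATALOG_URI"):
        raise ValueError("ICEBERG_CATALOG_URI is required")

    props = {}
    for env_name, prop_name in ENV_TO_PROPERTY.items():
        value = environ.get(env_name)
        if value:  # skip unset or empty variables
            props[prop_name] = value

    # Assumption: default to a REST catalog when no type is given.
    props.setdefault("type", "rest")
    return props
```

A dictionary like this could then be passed on as keyword properties, for example to `pyiceberg.catalog.load_catalog("default", **props)`. Whether IceFrame does exactly this internally is not stated in this README.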

Feature Comparison: IceFrame vs PyIceberg

IceFrame builds on top of PyIceberg, adding higher-level abstractions and features that PyIceberg does not provide natively.

| Feature | PyIceberg (Native) | IceFrame (Enhanced) |
|---|---|---|
| Table CRUD | Low-level API | Simplified create_table, drop_table |
| Data Writing | Arrow/Pandas integration | Polars integration, Auto-schema inference |
| Branching | Basic support (WIP) | create_branch, fast_forward, WAP Pattern |
| Compaction | rewrite_data_files (limited) | bin_pack, sort strategies (Polars-based) |
| Views | Catalog-dependent | Unified ViewManager abstraction |
| Maintenance | expire_snapshots | GarbageCollector, Native remove_orphan_files |
| SQL Support | None | Fluent Query Builder (select, filter, join) |
| Ingestion | add_files | add_files wrapper + Incremental Ingestion recipes |
| Rollback | manage_snapshots | rollback_to_snapshot, rollback_to_timestamp |
| Async | None | AsyncIceFrame for non-blocking I/O |
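
The comparison lists a bin_pack compaction strategy. To give a feel for the general idea, the sketch below groups file sizes into bins that stay at or under a target size, using a first-fit-decreasing heuristic. This is only an illustration of bin packing in general. The function name `plan_bin_pack` is hypothetical, and the README does not say which heuristic IceFrame actually uses.

```python
def plan_bin_pack(file_sizes, target_size):
    """Group file sizes into bins of at most target_size (first-fit decreasing).

    Illustrative only: a file larger than target_size gets a bin of its own.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")

    bins = []    # each bin is a list of file sizes to be rewritten together
    totals = []  # running total size of each bin

    # Placing the largest files first tends to leave fewer half-empty bins.
    for size in sorted(file_sizes, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target_size:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([size])
            totals.append(size)
    return bins


# Five small files (sizes in MB) packed toward 100 MB output files.
print(plan_bin_pack([60, 50, 40, 30, 20], target_size=100))
# → [[60, 40], [50, 30, 20]]
```

Each resulting group corresponds to a set of small data files that a compaction job could rewrite into a single larger file.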

Documentation

Advanced Features

Scalability

Recipes & Patterns
