# From JSON to time travel in a second

This notebook demonstrates the power of Apache Iceberg in just a few lines of code:

* Read JSON data, create an Iceberg table and commit the data to the table
* Query the table instantly with SQL
* Append more data and query again
* Travel back in time to the first commit

All of this happens in seconds, with full ACID transactions and automatic versioning. Iceberg relies on a catalog to keep track of each table's current state; for this demo, a simple SQLite-based catalog is enough.


In [None]:
import daft
import pyarrow as pa
from pathlib import Path
from pyiceberg.catalog.sql import SqlCatalog
from IPython.display import display

warehouse_path = Path('../data/warehouse_quick_demo').absolute()
warehouse_path.mkdir(parents=True, exist_ok=True)
catalog_db = warehouse_path / 'catalog.db'
catalog_db.unlink(missing_ok=True)  # Fresh start

catalog = SqlCatalog(
    'demo',
    **{'uri': f'sqlite:///{catalog_db}', 'warehouse': f'file://{warehouse_path}'}
)
catalog.create_namespace('demo')
print("✅ Catalog ready")
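
To make the catalog's role more concrete, here is a minimal sketch of the idea behind it: a table name maps to a pointer to the current metadata file, and a commit only succeeds if that pointer is still the one the writer started from. This is a toy model built on the standard-library `sqlite3` module, not PyIceberg's actual implementation; the `toy_catalog` table, the `commit` and `current` helpers, and the metadata file names are all made up for illustration.

```python
import sqlite3

# Toy stand-in for a catalog: one row per table, pointing at its current metadata file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_catalog ("
    "table_name TEXT PRIMARY KEY, metadata_location TEXT NOT NULL)"
)


def commit(conn, table_name, expected, new_location):
    """Swap the metadata pointer only if it still equals `expected`."""
    with conn:  # one SQLite transaction: commits on success, rolls back on error
        if expected is None:
            # First commit: register the table; an existing row is left untouched.
            cur = conn.execute(
                "INSERT OR IGNORE INTO toy_catalog VALUES (?, ?)",
                (table_name, new_location),
            )
        else:
            # Later commits: compare-and-swap on the current pointer.
            cur = conn.execute(
                "UPDATE toy_catalog SET metadata_location = ? "
                "WHERE table_name = ? AND metadata_location = ?",
                (new_location, table_name, expected),
            )
    return cur.rowcount == 1


def current(conn, table_name):
    """Return the current metadata pointer, or None if the table is unknown."""
    row = conn.execute(
        "SELECT metadata_location FROM toy_catalog WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None


created = commit(conn, "demo.events", None, "v1.metadata.json")
advanced = commit(conn, "demo.events", "v1.metadata.json", "v2.metadata.json")
# A writer that still believes v1 is current loses the race and has to retry.
stale = commit(conn, "demo.events", "v1.metadata.json", "v3.metadata.json")
print(created, advanced, stale, current(conn, "demo.events"))
```

The third call returns `False` because the pointer has already moved on to `v2.metadata.json`. That all-or-nothing pointer swap is, roughly, what lets readers always see a complete table version.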

Now we read some JSON data, create an Iceberg table, append the data to it, and query the result with SQL.

In [None]:
# Read first 100K events from JSON
df_events = daft.read_json('../data/input/events.jsonl')
df_batch1 = df_events.limit(100000)

# Convert to Arrow, create Iceberg table and append the data
arrow_table = df_batch1.to_arrow()
iceberg_table = catalog.create_table('demo.events', schema=pa.schema(arrow_table.schema))
iceberg_table.append(arrow_table)

# Query with Daft
df = daft.read_iceberg(iceberg_table)
daft.sql("SELECT COUNT(*) as total FROM df").show()

Let's add some more data and query the table again.

In [None]:
# Append next 100K events
df_batch2 = df_events.offset(100000).limit(100000)
arrow_table = df_batch2.to_arrow()
iceberg_table.append(arrow_table)

# Query again - now includes both batches
df = daft.read_iceberg(iceberg_table)
print("\nTotal events after append:")
daft.sql("SELECT COUNT(*) as total FROM df").show()

Each commit to an Iceberg table creates a new version, called a snapshot, and you can read the table as it was at an older snapshot ("time travel").
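
Why can an older version be read without copying any data? The following toy model may help. It assumes that data files are immutable and that each snapshot simply records which files belong to it. `ToyTable`, `Snapshot`, and the `data-N.parquet` names are hypothetical and heavily simplified; real Iceberg metadata is considerably richer.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # names of the immutable data files visible in this snapshot


@dataclass
class ToyTable:
    files: dict = field(default_factory=dict)      # file name -> rows
    snapshots: list = field(default_factory=list)  # oldest first
    _ids: count = field(default_factory=lambda: count(1))

    def append(self, rows):
        """Write one new file and add a snapshot that references old + new files."""
        snapshot_id = next(self._ids)
        name = f"data-{snapshot_id}.parquet"  # hypothetical file name
        self.files[name] = list(rows)
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        self.snapshots.append(Snapshot(snapshot_id, previous + (name,)))
        return snapshot_id

    def scan(self, snapshot_id=None):
        """Read the rows visible in the given snapshot (default: the latest)."""
        if snapshot_id is None:
            snap = self.snapshots[-1]
        else:
            snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        return [row for name in snap.data_files for row in self.files[name]]


table = ToyTable()
first = table.append(range(3))   # first commit
table.append(range(3, 5))        # second commit adds a file, rewrites nothing
print(len(table.scan()), len(table.scan(snapshot_id=first)))
```

In this model, the second append leaves the first file untouched, so scanning the first snapshot still sees exactly the rows that were committed at that point.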

In [None]:
# Time travel: go back to first snapshot
history = iceberg_table.history()
first_snapshot_id = history[0].snapshot_id

# Read data as it was in the first snapshot
df_past = daft.read_iceberg(iceberg_table, snapshot_id=first_snapshot_id)
daft.sql("SELECT COUNT(*) as total FROM df_past").show()

print("\nYou just:")
print("  • Created a versioned table from JSON")
print("  • Queried it with SQL")
print("  • Appended new data")
print("  • Traveled back in time")
print("\nAll with ACID transactions, no data rewrites, in seconds!")
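
The cell above picks the first snapshot by position. If you would rather travel to a point in time, one possible approach is to search the history for the newest entry committed at or before a given moment. The helper below is a sketch, not part of PyIceberg or Daft: `snapshot_id_as_of` is a hypothetical name, and it only assumes that each history entry exposes `snapshot_id` and `timestamp_ms` (milliseconds since the epoch), as the entries returned by `iceberg_table.history()` do. The ids and timestamps in the example are made-up stand-ins so that the snippet runs on its own.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def snapshot_id_as_of(history, when):
    """Return the id of the newest entry committed at or before `when`, else None.

    `when` should be a timezone-aware datetime; a naive one is read as local time.
    """
    cutoff_ms = int(when.timestamp() * 1000)
    eligible = [entry for entry in history if entry.timestamp_ms <= cutoff_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda entry: entry.timestamp_ms).snapshot_id


# Stand-in entries with made-up values; in the notebook, pass iceberg_table.history().
history = [
    SimpleNamespace(snapshot_id=111, timestamp_ms=1_700_000_000_000),
    SimpleNamespace(snapshot_id=222, timestamp_ms=1_700_000_060_000),
]
between = datetime.fromtimestamp(1_700_000_030, tz=timezone.utc)
print(snapshot_id_as_of(history, between))
```

The returned id can then be passed as `snapshot_id` to `daft.read_iceberg`, just like `first_snapshot_id` above.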