# Expiring Data

Over time, an Icechunk Repository will accumulate many snapshots, not all of which need to be kept around.

"Expiration" allows you to mark snapshots as expired, and "garbage collection" deletes all data (manifests, chunks, snapshots, etc.) associated with expired snapshots.

First create a Repository, configured so that there are no "inline" chunks. This will help illustrate that data is actually deleted.

In [1]:
import icechunk

repo = icechunk.Repository.create(
    icechunk.in_memory_storage(),
    # Threshold of 0 bytes: no chunk is small enough to be stored inline.
    config=icechunk.RepositoryConfig(inline_chunk_threshold_bytes=0),
)

## Generate a few snapshots

Let us generate a sequence of snapshots by overwriting the same array ten times.

In [2]:
import zarr
import time

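for i in range(10):
    session = repo.writable_session("main")
    # Overwrite the array on every iteration so that each commit writes new data.
    array = zarr.create_array(
        session.store, name="array", shape=(10,), fill_value=-1, dtype=int, overwrite=True
    )
    array[:] = i
    session.commit(f"snap {i}")
    # Pause briefly so consecutive snapshots get clearly distinct `written_at` times.
    time.sleep(0.1)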

There are now 10 new snapshots on `main`, in addition to the initial snapshot created with the repository:

In [3]:
ancestry = list(repo.ancestry(branch="main"))
ancestry

[SnapshotInfo(id="SMQNJ3NVGKSMXFVB6QQG", parent_id=9S8E190XB39873W9VPSG, written_at=datetime.datetime(2025,3,21,21,40,39,295880, tzinfo=datetime.timezone.utc), message="snap 9..."),
 SnapshotInfo(id="9S8E190XB39873W9VPSG", parent_id=0EQPVS288RF079JFTYX0, written_at=datetime.datetime(2025,3,21,21,40,39,191270, tzinfo=datetime.timezone.utc), message="snap 8..."),
 SnapshotInfo(id="0EQPVS288RF079JFTYX0", parent_id=GD5AQP077SSSFZG072QG, written_at=datetime.datetime(2025,3,21,21,40,39,81612, tzinfo=datetime.timezone.utc), message="snap 7..."),
 SnapshotInfo(id="GD5AQP077SSSFZG072QG", parent_id=5B1B4NBNZ8FSK0QYM05G, written_at=datetime.datetime(2025,3,21,21,40,38,972363, tzinfo=datetime.timezone.utc), message="snap 6..."),
 SnapshotInfo(id="5B1B4NBNZ8FSK0QYM05G", parent_id=C4NMFFAF1R973AKN6SS0, written_at=datetime.datetime(2025,3,21,21,40,38,864720, tzinfo=datetime.timezone.utc), message="snap 5..."),
 SnapshotInfo(id="C4NMFFAF1R973AKN6SS0", parent_id=9F1D0ZM3Q6GWKVNDGSNG, written_at=datetim

## Expire snapshots

!!! danger
    Expiring snapshots is an irreversible operation. Use it with care. 

First we must expire snapshots. Here we expire every snapshot older than `ancestry[5]`, which is the fifth commit made above (`snap 4`).

In [4]:
expiry_time = ancestry[5].written_at
expiry_time

datetime.datetime(2025, 3, 21, 21, 40, 38, 760523, tzinfo=datetime.timezone.utc)

In [5]:
repo.expire_snapshots(older_than=expiry_time)

{'0SR4GZQC0ETJG0X3ZKN0',
 '9F1D0ZM3Q6GWKVNDGSNG',
 'QWE7M59HGZVKSCWZYE60',
 'XNGF9QYWDS1JSR79KE3G'}

This returns the set of IDs of the snapshots that were expired.

!!! note
    The first snapshot is never expired!


Confirm that these are the right snapshots (remember that `ancestry` lists commits in decreasing order of `written_at` time):

In [6]:
[a.id for a in ancestry[-5:-1]]

['9F1D0ZM3Q6GWKVNDGSNG',
 'QWE7M59HGZVKSCWZYE60',
 'XNGF9QYWDS1JSR79KE3G',
 '0SR4GZQC0ETJG0X3ZKN0']
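
To see why `ancestry[-5:-1]` is the matching slice, here is a minimal sketch that uses stand-in records instead of a real repository. `FakeSnapshot`, `fake_ancestry`, and the timestamps are hypothetical and exist only for illustration. The sketch assumes, consistent with the output above, that only snapshots strictly older than the cutoff are expired and that the initial snapshot is always kept.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class FakeSnapshot(NamedTuple):
    """Hypothetical stand-in for SnapshotInfo with only the fields used here."""

    message: str
    written_at: datetime


start = datetime(2025, 3, 21, 21, 40, 38, tzinfo=timezone.utc)

# Oldest first: the initial snapshot, then "snap 0" .. "snap 9", 0.1 s apart.
oldest_first = [FakeSnapshot("initial", start)] + [
    FakeSnapshot(f"snap {i}", start + timedelta(seconds=0.1 * (i + 1)))
    for i in range(10)
]

# Ancestry is listed newest first, so reverse the list.
fake_ancestry = oldest_first[::-1]

# Index 5 counted from the newest entry is "snap 4".
cutoff = fake_ancestry[5].written_at

# Strictly older than the cutoff, and never the initial (last) entry.
expired = [s for s in fake_ancestry[:-1] if s.written_at < cutoff]

print([s.message for s in expired])
print(expired == fake_ancestry[-5:-1])
```

The last entry of the list is the initial snapshot, which is why the slice stops at `-1`.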

Note that ancestry is now shorter:

In [7]:
list(repo.ancestry(branch="main"))

[SnapshotInfo(id="SMQNJ3NVGKSMXFVB6QQG", parent_id=9S8E190XB39873W9VPSG, written_at=datetime.datetime(2025,3,21,21,40,39,295880, tzinfo=datetime.timezone.utc), message="snap 9..."),
 SnapshotInfo(id="9S8E190XB39873W9VPSG", parent_id=0EQPVS288RF079JFTYX0, written_at=datetime.datetime(2025,3,21,21,40,39,191270, tzinfo=datetime.timezone.utc), message="snap 8..."),
 SnapshotInfo(id="0EQPVS288RF079JFTYX0", parent_id=GD5AQP077SSSFZG072QG, written_at=datetime.datetime(2025,3,21,21,40,39,81612, tzinfo=datetime.timezone.utc), message="snap 7..."),
 SnapshotInfo(id="GD5AQP077SSSFZG072QG", parent_id=5B1B4NBNZ8FSK0QYM05G, written_at=datetime.datetime(2025,3,21,21,40,38,972363, tzinfo=datetime.timezone.utc), message="snap 6..."),
 SnapshotInfo(id="5B1B4NBNZ8FSK0QYM05G", parent_id=C4NMFFAF1R973AKN6SS0, written_at=datetime.datetime(2025,3,21,21,40,38,864720, tzinfo=datetime.timezone.utc), message="snap 5..."),
 SnapshotInfo(id="C4NMFFAF1R973AKN6SS0", parent_id=0DXDJGNVAAS1HRF398FG, written_at=datetim
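
The cutoff above was taken from an existing snapshot. A cutoff can also be computed relative to the current time, for example "everything older than 30 days". The sketch below shows one way to do that with the standard library. `cutoff_for_age` is a hypothetical helper, not part of Icechunk, and it produces a timezone-aware UTC value because the `written_at` timestamps shown above are timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cutoff_for_age(max_age: timedelta, now: Optional[datetime] = None) -> datetime:
    """Return the timezone-aware UTC instant that lies `max_age` before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now.tzinfo is None:
        # Refuse naive datetimes so the cutoff is never ambiguous.
        raise ValueError("`now` must be timezone-aware")
    return now.astimezone(timezone.utc) - max_age


# Example: a cutoff that keeps roughly the last 30 days of history.
relative_cutoff = cutoff_for_age(timedelta(days=30))

# The value can then be passed to the same call used above:
# repo.expire_snapshots(older_than=relative_cutoff)
```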

## Delete expired data

Use `Repository.garbage_collect` to delete data associated with expired snapshots:

In [8]:
repo.garbage_collect(expiry_time)

GCSummary(bytes_deleted=3838, chunks_deleted=4, manifests_deleted=4, snapshots_deleted=4, attributes_deleted=0, transaction_logs_deleted=4)
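
Because expiration and garbage collection are run back to back with the same cutoff, the two calls can be wrapped in one function. The following is a minimal sketch, not an Icechunk API: `expire_and_collect` is a hypothetical helper, and it assumes only that `repo` exposes `expire_snapshots(older_than=...)` and `garbage_collect(...)` as they are called above.

```python
from datetime import datetime
from typing import Any, Set, Tuple


def expire_and_collect(repo: Any, older_than: datetime) -> Tuple[Set[str], Any]:
    """Expire snapshots older than `older_than`, then garbage collect.

    `repo` is expected to behave like the Repository used above. The same
    cutoff is passed to both calls, mirroring the cells in this page.
    """
    # Step 1: mark old snapshots as expired; returns the expired snapshot IDs.
    expired = set(repo.expire_snapshots(older_than=older_than))
    # Step 2: delete the data that is no longer reachable.
    summary = repo.garbage_collect(older_than)
    return expired, summary


# Usage with the objects defined earlier in this page:
# expired, summary = expire_and_collect(repo, expiry_time)
```

Both steps are irreversible, so a wrapper like this is best called only after the cutoff has been checked against `repo.ancestry(...)` as shown earlier.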