# Getting information about data store collections (datasets) using `ecmwf-datastores-client`

## Description

This notebook retrieves and lists all available data store datasets (identified by their *collection IDs*), sorted by their last update date.

The request returns a `Collections` instance. Once the list is retrieved, the details of a specific dataset can be explored individually.

## What You Need Before Starting

1. You have an **active internet connection**.
2. You are using **Python 3.12.8** (or a compatible version).
3. The latest version of the `ecmwf-datastores-client` package is installed and configured on your system, along with the extra libraries this notebook uses (`pandas` and `itables`). To install them, uncomment and run the install cell below.
4. You have a **CDS account** or **ADS account** with your API credentials stored in a `.ecmwfdatastoresrc` file. If not, please refer to the notebook "Getting started".

In [None]:
# Install the required libraries to run this notebook
# !pip install -U ecmwf-datastores-client itables
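
Before going further, it can help to confirm which versions are actually installed. The sketch below uses only the standard library. The helper name `installed_versions` is introduced here for illustration and is not part of `ecmwf-datastores-client`.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip command above
checked = installed_versions(["ecmwf-datastores-client", "itables"])
for name, version in checked.items():
    print(f"{name}: {version or 'not installed'}")
```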

In [None]:
import pandas as pd

from ecmwf.datastores import Client

**Tip:** If you don’t want to see warnings while running your notebook, you can uncomment the following cell:

In [None]:
# import warnings
# warnings.filterwarnings("ignore")

## Set Up the API Client

Connect to the Copernicus Data Store using your saved credentials.

In [None]:
client = Client()
client.check_authentication()
print("✅ Connected successfully to the Data Store!")
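
If the authentication check fails, one of the first things to rule out is a missing credentials file. The sketch below looks for it using only the standard library. It rests on two assumptions that this notebook does not confirm: that the file lives in your home directory by default, and that an `ECMWF_DATASTORES_RC_FILE` environment variable can point to a different location. The helper `find_credentials_file` is hypothetical and not part of the client.

```python
import os
from pathlib import Path


def find_credentials_file(env=None, home=None):
    """Return the path of the credentials file if it exists, else None.

    Assumptions: the default location is the home directory, and the
    ECMWF_DATASTORES_RC_FILE environment variable overrides it.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    default_path = home / ".ecmwfdatastoresrc"
    candidate = Path(env.get("ECMWF_DATASTORES_RC_FILE", default_path))
    return candidate if candidate.is_file() else None


path = find_credentials_file()
print(f"Credentials file: {path}" if path else "No credentials file found.")
```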

## Retrieving Collection Information

We will now explore how to search and browse the collections in the catalogue.

For that, we use the `get_collections()` method, which finds and lists all collections in the catalogue.

In [None]:
# Requests collections, sorting them by most recently updated
collections = client.get_collections(sortby="update")

In [None]:
# Create an empty list to store the collection details we extract
collection_info = []

# Loop through the collections and get the information,
# as long as there are more collections to retrieve
while collections is not None:  # Loop over pages
    collection_info.extend(
        [collection["id"], collection["title"]]
        for collection in collections.json["collections"]
    )
    collections = collections.next  # Move to the next page

# After this code runs, collection_info contains ID and title pairs
# for all collections, starting with the most recently updated ones.
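
The paging loop above can be tried offline with stand-in objects. The sketch below is a minimal illustration, assuming only what the loop itself relies on: each page exposes a `json` dictionary with a `"collections"` list and a `next` attribute that is `None` on the last page. `FakePage` and `iter_collections` are hypothetical names, not part of the client, and the IDs and titles are made up.

```python
class FakePage:
    """Hypothetical stand-in for one page of results (not the real client class)."""

    def __init__(self, items, next_page=None):
        self.json = {"collections": items}
        self.next = next_page  # None on the last page


def iter_collections(page):
    """Yield every collection dictionary, following `next` until it is None."""
    while page is not None:
        yield from page.json["collections"]
        page = page.next


# Two fake pages with made-up IDs and titles
last = FakePage([{"id": "dataset-c", "title": "Dataset C"}])
first = FakePage(
    [
        {"id": "dataset-a", "title": "Dataset A"},
        {"id": "dataset-b", "title": "Dataset B"},
    ],
    next_page=last,
)

pairs = [[c["id"], c["title"]] for c in iter_collections(first)]
print(pairs)
```

Wrapping the loop in a generator keeps the paging logic in one place, so the same function can feed a list, a table, or a filter.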

In [None]:
print(f"There are {len(collection_info)} collections available from {client.url}.")
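
With the ID and title pairs in memory, a simple keyword search is one way to narrow the list down. The sketch below is a plain-Python illustration. `search_collections` is a hypothetical helper, and `sample_info` holds made-up entries with the same shape as `collection_info`.

```python
def search_collections(pairs, keyword):
    """Return the [id, title] pairs whose ID or title contains `keyword`.

    The match is case-insensitive.
    """
    needle = keyword.lower()
    return [
        [collection_id, title]
        for collection_id, title in pairs
        if needle in collection_id.lower() or needle in title.lower()
    ]


# Made-up sample with the same shape as collection_info
sample_info = [
    ["example-reanalysis-hourly", "Example reanalysis, hourly data"],
    ["example-seasonal-forecast", "Example seasonal forecast"],
    ["example-satellite-ozone", "Example ozone from satellite"],
]

for collection_id, title in search_collections(sample_info, "REANALYSIS"):
    print(f"{collection_id}: {title}")
```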

### Organise the collections' data into a convenient table format

In [None]:
# Empty list that will hold our formatted collection information
collections_data = []

# The loop goes through each pair of collection ID and title we gathered earlier
# For each collection, it creates a small "package" of information (a dictionary)
# Each package contains the collection's ID and title with clear labels
for collection_id, collection_title in collection_info:
    collections_data.append(
        {
            "collection_id": collection_id,
            "collection_title": collection_title,
        }
    )

# Finally, convert our list of dictionaries into a table (a DataFrame),
# where each row represents one collection from our system
df_collections = pd.DataFrame(collections_data)

In [None]:
# Display the collections data as an interactive table
from itables import init_notebook_mode, show

# Initialize interactive table mode
init_notebook_mode(all_interactive=True)

# Rename columns for clarity
df_collections_display = df_collections.rename(
    columns={"collection_id": "Collection ID", "collection_title": "Collection Name"}
)

# Configure table options for better appearance and functionality
show(
    df_collections_display,
    columnDefs=[
        {"width": "35%", "targets": 0},  # ID column width
        {"width": "65%", "targets": 1},  # Name column width
    ],
    dom="Bfrtip",  # Buttons, filter box, processing indicator, table, info, pagination
    buttons=["copy", "csv", "excel"],  # Add export options
    paging=True,
    pageLength=15,  # Show 15 rows per page
    lengthMenu=[
        [10, 15, 25, 50, -1],
        ["10 rows", "15 rows", "25 rows", "50 rows", "All"],
    ],
    order=[[1, "asc"]],  # Default sort by Collection Name
    style="display: inline-block; width: 100%",
)

print("\nTip: You can search, sort, and export this table using the controls above.")

## Working with Individual Collections

Once you've found a collection that interests you, you can get more detailed information about it using the `get_collection()` method.

### How to Use the Function

In [None]:
# Retrieve detailed information about a specific collection.
# You need to provide the collection ID
# (you can find collection IDs in the table we created above).
collection = client.get_collection(collection_id="reanalysis-era5-pressure-levels")

### What Information You'll Get

When you retrieve a collection, you'll have access to many useful details:

#### Basic Information
- **Title**: The name of the collection (`collection.title`)
- **Description**: A detailed explanation of what the collection contains (`collection.description`)
- **ID**: The unique identifier for the collection (`collection.id`)

#### Geographic Coverage
- **Bounding Box**: The geographic area covered by the collection (`collection.bbox`)
  - This is given as (West, South, East, North) coordinates
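
As an illustration of the (West, South, East, North) ordering, the sketch below checks whether a point falls inside a bounding box. It assumes the box is a sequence of four numbers with west less than or equal to east, and that the point uses the same longitude convention as the box. `bbox_contains` is a hypothetical helper and the box values are made up, not read from a real collection.

```python
def bbox_contains(bbox, lon, lat):
    """Check whether (lon, lat) lies inside a (west, south, east, north) box.

    Assumes west <= east and that `lon` uses the same longitude convention
    as the box (for example 0..360 or -180..180).
    """
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


# Illustrative box, not read from a real collection
example_bbox = (-10.0, 35.0, 30.0, 60.0)

inside = bbox_contains(example_bbox, lon=2.35, lat=48.85)  # within all four limits
outside = bbox_contains(example_bbox, lon=-74.0, lat=40.7)  # west of the box
print(inside, outside)
```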

#### Time Period
- **Begin Date**: When the data in this collection starts (`collection.begin_datetime`)
- **End Date**: When the data in this collection ends (`collection.end_datetime`)
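
The two dates can be combined to estimate how long a period the collection covers. The sketch below assumes that both attributes are `datetime` objects, or `None` when a date is not set. This is an assumption about the client, not something this notebook confirms. `coverage_days` is a hypothetical helper and the dates are illustrative.

```python
from datetime import datetime, timezone


def coverage_days(begin, end):
    """Return the whole days between two datetimes, or None if either is missing."""
    if begin is None or end is None:
        return None
    return (end - begin).days


# Illustrative values, not read from a real collection
begin = datetime(2000, 1, 1, tzinfo=timezone.utc)
end = datetime(2000, 12, 31, tzinfo=timezone.utc)

print(coverage_days(begin, end))   # 365, because 2000 is a leap year
print(coverage_days(begin, None))  # None, because the end date is missing
```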

#### Publication Information
- **Published Date**: When the collection was first made available (`collection.published_at`)
- **Last Updated**: When the collection was last modified (`collection.updated_at`)

#### Access Information
- **URL**: Where the collection can be accessed (`collection.url`)

### Example: Getting and Displaying Collection Details

In [None]:
# Extract and display the key information
print("\n📊 DATASET DETAILS")
print("=" * 50)
print(f"Name: {collection.title}")
print(f"ID: {collection.id}")
print("-" * 50)
print("Description:")
print(collection.description)
print("-" * 50)
print(f"Published at: {collection.published_at}")
print(f"Last updated at: {collection.updated_at}")
print("-" * 50)
print(f"Coverage starts at: {collection.begin_datetime}")
print(f"Coverage ends at: {collection.end_datetime}")
print("-" * 50)
print(f"Spatial coverage (W, S, E, N): {collection.bbox}")
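
If you want to keep these details for later use rather than only print them, one option is to gather them into a dictionary. The sketch below is a minimal illustration that works on any object exposing the attributes listed above. `summarize` and `FIELDS` are hypothetical names, and the stand-in object uses made-up values so that the example runs without a connection.

```python
from types import SimpleNamespace

# Attribute names taken from the lists in this notebook
FIELDS = (
    "id",
    "title",
    "published_at",
    "updated_at",
    "begin_datetime",
    "end_datetime",
    "bbox",
)


def summarize(collection, fields=FIELDS):
    """Collect the listed attributes into a dictionary, using None for absent ones."""
    return {name: getattr(collection, name, None) for name in fields}


# Hypothetical stand-in object with made-up values, so the sketch runs offline
fake = SimpleNamespace(
    id="example-dataset",
    title="Example dataset",
    bbox=(0, -90, 360, 90),
)

summary = summarize(fake)
print(summary["title"])
print(summary["updated_at"])
```

With a real connection, passing the `collection` object retrieved earlier in place of `fake` should give the same kind of dictionary.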