# Getting information about data store collections (datasets) using `ecmwf-datastores-client`

## Description

This notebook retrieves and lists all available data store datasets (identified by their *collection IDs*), sorted by their last update date.

It returns a `Collections` instance. Once retrieved, the details of a specific dataset can be explored individually.

## What you need before starting

1. You have an **active internet connection**.
2. You are using **Python 3.12.8** (or a compatible version).
3. The latest version of the `ecmwf-datastores-client` package is installed and configured on your system, along with the extra libraries this notebook uses. To install them, uncomment and run the `pip` command in the next cell.
4. You have a **CDS account** or **ADS account** with your API credentials stored in a `.ecmwfdatastoresrc` file. If not, please refer to the notebook "Getting started".

In [None]:
# Install the required libraries to run this notebook
# !pip install -U ecmwf-datastores-client itables

from ecmwf.datastores import Client

## Set up the API client

Connect to the Data Store service using the credentials saved in your `.ecmwfdatastoresrc` file. We also include a check to ensure that the credentials found are valid.

In [None]:
client = Client()
client.check_authentication()
print("✅ Connected successfully to the Data Store!")
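
If the authentication check above fails, a quick look at the credentials file can help narrow down the cause. The following is a minimal sketch, not part of the client: it assumes the file holds plain `name: value` lines with a `url` and a `key` entry, and the helpers `read_rc_file` and `missing_settings` are hypothetical names introduced here. The demo writes a throwaway file with placeholder values rather than touching your real credentials.

```python
import tempfile
from pathlib import Path


def read_rc_file(path: Path) -> dict[str, str]:
    """Parse simple 'name: value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs keep their 'https:' part
        name, value = line.split(":", 1)
        settings[name.strip()] = value.strip()
    return settings


def missing_settings(path: Path, required=("url", "key")) -> list[str]:
    """Return the required entries that are absent or empty (assumed names)."""
    if not path.is_file():
        return list(required)
    settings = read_rc_file(path)
    return [name for name in required if not settings.get(name)]


# Demo on a temporary file with placeholder values (not real credentials)
with tempfile.TemporaryDirectory() as tmp:
    rc_path = Path(tmp) / ".ecmwfdatastoresrc"
    rc_path.write_text(
        "url: https://example.invalid/api\nkey: placeholder-token\n",
        encoding="utf-8",
    )
    complete = missing_settings(rc_path)
    rc_path.write_text("url: https://example.invalid/api\n", encoding="utf-8")
    incomplete = missing_settings(rc_path)

print("Missing in complete file:", complete)
print("Missing in incomplete file:", incomplete)
```

To check your own file, you could call `missing_settings(Path.home() / ".ecmwfdatastoresrc")`, assuming the file sits in your home directory.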

## Retrieving collections information

We are going to explore how to search and browse through our catalogue collections.

For that we are going to use the `get_collections()` method, which allows you to find and list all collections from our catalogue.

In [None]:
# Request collections, sorted by most recent update
collections = client.get_collections(sortby="update")

In [None]:
# Create an empty list to store the collection details we extract
collections_info = []

# Loop through the collections and get the information,
# as long as there are more collections to retrieve
while collections is not None:  # Loop over pages
    collections_info.extend(
        {
            "Collection ID": collection["id"],
            "Collection Name": collection["title"],
        }
        for collection in collections.json["collections"]
    )
    collections = collections.next  # Move to the next page

# After this code runs, collections_info contains ID and title pairs
# for all collections, starting with the most recently updated ones.
print(
    "There are %i collections available from %s." % (len(collections_info), client.url)
)
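
The paging loop above can also be wrapped in a small generator, so that the page-walking logic is separate from what is done with each page. This is a sketch under stated assumptions: `iter_pages` is a hypothetical helper, and `FakePage` is a stand-in that only mimics the `.json` and `.next` attributes used above so the example runs without a network connection.

```python
from typing import Any, Iterator, Optional


def iter_pages(first_page: Optional[Any]) -> Iterator[Any]:
    """Yield each page in turn, following `.next` until it is None."""
    page = first_page
    while page is not None:
        yield page
        page = page.next


class FakePage:
    """Hypothetical stand-in for one page of results (not part of the client)."""

    def __init__(self, ids, next_page=None):
        self.json = {"collections": [{"id": i, "title": i.upper()} for i in ids]}
        self.next = next_page


# Two chained pages: the first holds two entries, the second holds one
pages = FakePage(["a", "b"], FakePage(["c"]))

collections_info = [
    {"Collection ID": c["id"], "Collection Name": c["title"]}
    for page in iter_pages(pages)
    for c in page.json["collections"]
]
print(len(collections_info), "entries collected across all pages")
```

With a real client, the object returned by `get_collections()` would take the place of `pages`.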

### Organise the collections metadata into a convenient table format

We use `pandas` and `itables` to present the collections in an interactive table that can be filtered and sorted.

In [None]:
# Display the collections data as an interactive table
import pandas as pd
from itables import init_notebook_mode, show

# Initialize interactive table mode
init_notebook_mode(all_interactive=True)

# Turn the collections_info list into a DataFrame for display
df_collections_display = pd.DataFrame(collections_info)

# Configure table options for better appearance and functionality
show(
    df_collections_display,
    columnDefs=[
        {"width": "35%", "targets": 0},  # ID column width
        {"width": "65%", "targets": 1},  # Name column width
    ],
    dom="Bfrtip",  # Layout: buttons, filter, processing indicator, table, info, pagination
    buttons=["copy", "csv", "excel"],  # Add export options
    paging=True,
    pageLength=15,  # Show 15 rows per page
    lengthMenu=[
        [10, 15, 25, 50, -1],
        ["10 rows", "15 rows", "25 rows", "50 rows", "All"],
    ],
    order=[[1, "asc"]],  # Default sort by Collection Name
    style="display: inline-block; width: 100%",
)

print("\nTip: You can search, sort, and export this table using the controls above.")
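
Outside a notebook, where the interactive table is not available, the same kind of search can be done directly on the list of dictionaries. The sketch below is illustrative only: `search_collections` is a hypothetical helper, and the sample rows use placeholder titles rather than real catalogue entries.

```python
def search_collections(rows, keyword):
    """Return rows where any value contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [
        row
        for row in rows
        if any(needle in str(value).casefold() for value in row.values())
    ]


# Placeholder rows in the same shape as collections_info
sample_rows = [
    {"Collection ID": "reanalysis-era5-pressure-levels", "Collection Name": "Placeholder title A"},
    {"Collection ID": "example-dataset-two", "Collection Name": "Placeholder ERA5 title B"},
    {"Collection ID": "example-dataset-three", "Collection Name": "Placeholder title C"},
]

matches = search_collections(sample_rows, "era5")
for row in matches:
    print(row["Collection ID"], "|", row["Collection Name"])
```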

## Working with individual collections

Once you've found a collection that interests you, you can get more detailed information about it using the `get_collection()` method.

In [None]:
# Retrieve detailed information about a specific collection.
# You need to provide the Collection ID
# (you can find collection IDs in the table we created above).
collection = client.get_collection(collection_id="reanalysis-era5-pressure-levels")

### What information is accessible

When you retrieve a collection, you have access to many useful details:

#### Basic information
- **Title**: The name of the collection (`collection.title`)
- **Description**: A detailed explanation of what the collection contains (`collection.description`)
- **ID**: The unique identifier for the collection (`collection.id`)

#### Geographic coverage
- **Bounding box**: The geographic area covered by the collection (`collection.bbox`)
  - This is given as (West, South, East, North) coordinates
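
Because the bounding box is a plain sequence of four numbers, it can be easier to work with once the coordinates have names. This is a small sketch, assuming the (West, South, East, North) order described above; the `BoundingBox` type and the sample values are made up for illustration, and boxes that cross the antimeridian are not handled.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Hypothetical named wrapper around (West, South, East, North)."""

    west: float
    south: float
    east: float
    north: float

    @property
    def width_deg(self) -> float:
        """Longitudinal extent in degrees."""
        return self.east - self.west

    @property
    def height_deg(self) -> float:
        """Latitudinal extent in degrees."""
        return self.north - self.south


# Made-up values; with a real collection you could pass *collection.bbox
bbox = BoundingBox(*(-10.0, 35.0, 30.0, 70.0))
print(f"West {bbox.west}, South {bbox.south}, East {bbox.east}, North {bbox.north}")
print(f"Extent: {bbox.width_deg} x {bbox.height_deg} degrees")
```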

#### Time period
- **Begin date**: When the data in this collection starts (`collection.begin_datetime`)
- **End date**: When the data in this collection ends (`collection.end_datetime`)
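
The begin and end dates can be combined to get the length of the covered period. The sketch below assumes both attributes are `datetime` objects (or `None` when unknown); `coverage_days` is a hypothetical helper and the sample dates are made up.

```python
import datetime as dt
from typing import Optional


def coverage_days(
    begin: Optional[dt.datetime], end: Optional[dt.datetime]
) -> Optional[int]:
    """Return the number of whole days between begin and end, or None if unknown."""
    if begin is None or end is None:
        return None
    return (end - begin).days


# Made-up dates; with a real collection you could pass
# collection.begin_datetime and collection.end_datetime instead
begin = dt.datetime(2000, 1, 1, tzinfo=dt.timezone.utc)
end = dt.datetime(2000, 12, 31, tzinfo=dt.timezone.utc)

days = coverage_days(begin, end)
print(f"Coverage spans {days} days")
```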

#### Publication information
- **Published date**: When the collection was first made available (`collection.published_at`)
- **Last updated**: When the collection was last modified (`collection.updated_at`)

#### Access information
- **URL**: Where the collection can be accessed (`collection.url`)

### Example: Displaying collection details

In [None]:
# Extract and display the key information
print(f"""
📊 DATASET DETAILS\n
{'='*50}\n
Name: {collection.title}
ID: {collection.id}
{'-'*50}\n
Description:\n
{collection.description}
{'-'*50}\n
Published at: {collection.published_at}
Last updated at: {collection.updated_at}
{'-'*50}\n
Coverage starts at: {collection.begin_datetime}
Coverage ends at: {collection.end_datetime}
{'-'*50}\n
Spatial coverage (W, S, E, N): {collection.bbox}
""")