# Advanced Data Retrieval with `ecmwf-datastores-client`

This guide shows several ways to download climate and atmosphere data from the Copernicus Climate Data Store (CDS) and Atmosphere Data Store (ADS). These methods give you more control over how you submit requests, track their progress, and download the results.

## What You Need Before Starting

1. An **active internet connection**
2. **Python** installed on your computer
3. The `ecmwf-datastores-client` package installed (if it is not, uncomment and run the next cell)
4. A **CDS account** or **ADS account** with your API key set up (see the "Getting Started" notebook)

In [None]:
# !pip install -U ecmwf-datastores-client

In [None]:
# Libraries
import os
import time

from ecmwf.datastores import Client

**Tip:** To hide warnings while running your notebook, uncomment the following cell:

In [None]:
# import warnings
# warnings.filterwarnings("ignore")

## Connect to the Data Store

First, we'll connect to the data store:

In [None]:
client = Client()

# Check that the credentials are accepted; the success message below
# is only reached if this call completes without raising an error
connection_info = client.check_authentication()
print("✅ Connected successfully to the Data Store!")
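
If the connection check fails, it can help to see where your credentials would come from. The sketch below assumes the client reads an `ECMWF_DATASTORES_KEY` environment variable and otherwise falls back to a `~/.ecmwfdatastoresrc` file, as the package README describes; please verify this against your installed version. `credential_source` is a hypothetical helper, not part of the client.

```python
import os
from pathlib import Path


def credential_source(env=None, home=None):
    """Report where API credentials would most likely be picked up from.

    Assumption: the client honours the ECMWF_DATASTORES_KEY environment
    variable and a ~/.ecmwfdatastoresrc file. Check the package docs.
    """
    env = os.environ if env is None else env
    home = Path(os.path.expanduser("~")) if home is None else Path(home)
    if env.get("ECMWF_DATASTORES_KEY"):
        return "environment"
    if (home / ".ecmwfdatastoresrc").is_file():
        return "config file"
    return "not found"


print(f"Credentials: {credential_source()}")
```

A result of `not found` suggests revisiting the API key setup from the "Getting Started" notebook.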

## Prepare Your Data Request

Let's define what data we want to download:

In [None]:
# Define the ERA5 dataset and request parameters
collection_id = "reanalysis-era5-single-levels"
request = {
    "product_type": ["reanalysis"],
    "variable": ["2m_temperature"],
    "year": ["2022"],
    "month": ["01"],
    "day": ["01"],
    "time": ["00:00"],
    "data_format": "grib",
    "download_format": "unarchived",
}

# Alternative: CAMS global atmospheric composition forecasts.
# Uncomment the lines below to use this request instead.
# collection_id = "cams-global-atmospheric-composition-forecasts"
# request = {
#     "variable": ["2m_temperature"],
#     "date": ["2025-05-15/2025-05-15"],
#     "time": ["00:00"],
#     "leadtime_hour": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"],
#     "type": ["forecast"],
#     "data_format": "grib",
# }
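
Before submitting, a rough way to gauge how big a request is would be to count the fields it selects. The sketch below assumes every list-valued key is an independent selection axis. That fits requests shaped like the ERA5 one above, but it undercounts range strings such as the `date` entry in the CAMS example. `count_fields` and `NON_SELECTION_KEYS` are hypothetical names, not part of the client.

```python
from math import prod

# Keys that do not multiply the number of fields
# (assumption: output packaging keys and the bounding-box "area" key).
NON_SELECTION_KEYS = {"data_format", "download_format", "area"}


def count_fields(request):
    """Estimate how many fields a request selects.

    Multiplies the number of values chosen for each selection key.
    """
    sizes = []
    for key, value in request.items():
        if key in NON_SELECTION_KEYS:
            continue
        sizes.append(len(value) if isinstance(value, (list, tuple)) else 1)
    return prod(sizes)


# Illustrative request: 2 variables x 2 days x 4 times
example_request = {
    "product_type": ["reanalysis"],
    "variable": ["2m_temperature", "total_precipitation"],
    "year": ["2022"],
    "month": ["01"],
    "day": ["01", "02"],
    "time": ["00:00", "06:00", "12:00", "18:00"],
    "data_format": "grib",
    "download_format": "unarchived",
}
print(f"Fields selected: {count_fields(example_request)}")
```

The count is only a relative indicator: the actual file size also depends on the grid, the area, and the data format.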

## Option 1: Submit and Wait For Results

With this approach, you submit your request and wait until it's processed before deciding what to do with the data:

In [None]:
# Submit the request and wait until it's processed
print("Submitting data request and waiting for it to process...")
results = client.submit_and_wait_on_results(collection_id, request)
print("✅ Request complete! Data is ready for download.")

In [None]:
# Now you can look at information about the data before downloading
print(f"File size: {results.content_length / 1024 / 1024:.2f} MB")
print(f"File type: {results.content_type}")
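
Knowing the size up front lets you check that the file will fit on disk before you download it. This is a minimal sketch using only the standard library; `format_size` and `fits_on_disk` are hypothetical helpers, and the safety factor of 2 is an arbitrary assumption.

```python
import shutil


def format_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 B)."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.2f} TB"


def fits_on_disk(num_bytes, directory=".", safety_factor=2.0):
    """Return True if the directory has room for the file plus a margin."""
    free = shutil.disk_usage(directory).free
    return num_bytes * safety_factor <= free


print(format_size(1536))
print(fits_on_disk(1536))
```

With the results object from the previous cell, you could pass `results.content_length` to either function.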

In [None]:
# When you're ready, download the data
results.download(target="sample_submit_wait.grib")
print("✅ File downloaded successfully!")
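
After downloading, a quick sanity check can catch truncated or non-GRIB files. GRIB messages begin with the bytes `GRIB` and end with `7777`, so the sketch below only inspects the first and last four bytes. It does not decode the data, and a file with extra padding around its messages would fail this check. `looks_like_grib` is a hypothetical helper.

```python
import os
from pathlib import Path


def looks_like_grib(path):
    """Return True if the file starts with b'GRIB' and ends with b'7777'."""
    data_path = Path(path)
    if data_path.stat().st_size < 8:
        return False
    with data_path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last four bytes
        tail = f.read(4)
    return head == b"GRIB" and tail == b"7777"


target = Path("sample_submit_wait.grib")
if target.exists():
    print(f"{target} looks like GRIB: {looks_like_grib(target)}")
else:
    print(f"{target} not found; download it first.")
```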

## Option 2: Submit Request Without Waiting

Sometimes you want to submit a request and check on it later, especially for large data requests:

In [None]:
# Just submit the request without waiting
print("Submitting data request...")
remote_job = client.submit(collection_id, request)
print(f"✅ Request submitted! Job ID: {remote_job.request_id}")

In [None]:
# You can check the status of your request
print(f"Current status: {remote_job.status}")

In [None]:
# When you know the data is ready, you can download it
if remote_job.results_ready:
    remote_job.download(target="sample_submit.grib")
    print("✅ Download complete!")
else:
    print("The data is still being processed. Check back later.")

In [None]:
# If needed, you can update the status information
remote_job.update()
print(f"Updated status: {remote_job.status}")
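
To make the status string easier to act on, you could map it to a short message. The sketch assumes the service reports the job status values defined by OGC API - Processes (`accepted`, `running`, `successful`, `failed`, `dismissed`); only `failed` appears in this notebook, so treat the other names as an assumption. `describe_status` is a hypothetical helper.

```python
# Assumed status values, following the OGC API - Processes job status list
IN_PROGRESS = {"accepted", "running"}
FINISHED_WITHOUT_RESULTS = {"failed", "dismissed"}


def describe_status(status):
    """Translate a job status string into a short human-readable hint."""
    if status == "successful":
        return "ready to download"
    if status in IN_PROGRESS:
        return "still processing"
    if status in FINISHED_WITHOUT_RESULTS:
        return "finished without results"
    return "unknown status"


for example in ("accepted", "successful", "failed"):
    print(f"{example}: {describe_status(example)}")
```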

## Monitoring Your Data Request

For large requests that might take a while, you can monitor progress:

In [None]:
print("Checking on data request until it's ready...")
while not remote_job.results_ready:
    # Wait for 10 seconds before checking again
    print("Waiting 10 seconds before checking again...")
    time.sleep(10)

    # Update the status information
    remote_job.update()

    # Show the current status
    print(f"Status: {remote_job.status}")

    # Stop polling if the job finished with an error
    if remote_job.status == "failed":
        print("❌ The request failed.")
        break

# Download the data if it's ready
if remote_job.results_ready:
    remote_job.download(target="sample_submit_time.grib")
    print("✅ Download complete!")
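
A fixed 10-second interval works for short jobs, but long-running requests are often polled with a growing delay and an overall timeout. The following is a minimal sketch of that pattern using only the standard library; `wait_until_ready` is a hypothetical helper, and the default delays are arbitrary assumptions. It takes any zero-argument callable, so it is demonstrated here with a stand-in instead of a real job.

```python
import time


def wait_until_ready(is_ready, timeout=600.0, initial_delay=2.0,
                     max_delay=60.0, sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() with exponential backoff.

    Returns True as soon as is_ready() is true, or False once the
    timeout (in seconds) has elapsed.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # double the wait, up to a cap


# Stand-in for a job that becomes ready on the third check
checks = iter([False, False, True])
print(wait_until_ready(lambda: next(checks), initial_delay=0.01))
```

For the job submitted above, the callable could update the job and then return `remote_job.results_ready`, mirroring the loop in the previous cell.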

## Downloading By Job ID

If you have a job ID from a previous request, you can download its results directly:

In [None]:
# Reuse the ID of the job submitted above; in a new session,
# replace this with a job ID string you saved earlier
job_id = remote_job.request_id

# Download using the job ID
client.download_results(job_id, target="sample_job.grib")
print("✅ Downloaded data using the job ID!")
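
Downloading by job ID only helps if the ID survives your notebook session. One simple option is to keep the IDs in a small JSON file. This is a sketch, not a feature of the client: `save_job_id`, `load_job_id`, the file name `job_ids.json`, and the labels are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_job_id(label, job_id, path="job_ids.json"):
    """Store job_id under label in a JSON file and return all entries."""
    store = Path(path)
    jobs = json.loads(store.read_text()) if store.exists() else {}
    jobs[label] = job_id
    store.write_text(json.dumps(jobs, indent=2))
    return jobs


def load_job_id(label, path="job_ids.json"):
    """Return the job ID saved under label, or None if there is none."""
    store = Path(path)
    if not store.exists():
        return None
    return json.loads(store.read_text()).get(label)


# Demo in a temporary directory with a placeholder ID (not a real job)
with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp) / "job_ids.json"
    save_job_id("era5-t2m-2022-01-01", "placeholder-job-id", path=registry)
    print(load_job_id("era5-t2m-2022-01-01", path=registry))
```

In practice you would save `remote_job.request_id` right after submitting, then pass the loaded ID to `client.download_results` in a later session.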

## Tips for Working with Large Data Requests

- For large datasets, consider using Option 2 (submit without waiting)
- Limit your time range (fewer years/months) in each request
- Request fewer variables at once
- Consider reducing the geographic area if you only need a specific region
- Save your job IDs so you can come back to them later if needed
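
Limiting the time range per request can be automated by splitting one large request into monthly chunks. The sketch below assumes the request uses `year` and `month` lists, as the ERA5 example in this notebook does; `split_by_month` is a hypothetical helper.

```python
from copy import deepcopy


def split_by_month(request):
    """Yield one copy of the request per (year, month) combination."""
    for year in request["year"]:
        for month in request["month"]:
            chunk = deepcopy(request)  # leave the original untouched
            chunk["year"] = [year]
            chunk["month"] = [month]
            yield chunk


# Illustrative request covering 2 years x 3 months
big_request = {
    "product_type": ["reanalysis"],
    "variable": ["2m_temperature"],
    "year": ["2021", "2022"],
    "month": ["01", "02", "03"],
    "day": ["01"],
    "time": ["00:00"],
    "data_format": "grib",
    "download_format": "unarchived",
}

chunks = list(split_by_month(big_request))
print(f"Split into {len(chunks)} requests")
for chunk in chunks:
    print(chunk["year"][0], chunk["month"][0])
```

Each chunk could then be passed to `client.submit(collection_id, chunk)`, with the returned job IDs saved for later download.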