# Amazon S3 Buckets
---
ECMWF open data can be retrieved from the Amazon S3 buckets using the [`earthkit`](https://earthkit-data.readthedocs.io/en/latest/guide/sources.html#s3) or `ecmwf-opendata` Python libraries.

## The `earthkit` and `ecmwf-opendata` packages

Below are two examples of downloading data from the **Amazon AWS** location.

In [None]:
# !pip3 install earthkit ecmwf-opendata

In [None]:
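import earthkit.data as ekd
from ecmwf.opendata import Client

# Download the 24-hour forecast of 2 m temperature from the AWS mirror
client = Client(source="aws")
request = {
    "time": 0,
    "type": "fc",
    "step": 24,
    "param": "2t",
}
client.retrieve(request, "aws_2t_data.grib2")

# Open the downloaded GRIB file and list its fields
da_2t = ekd.from_source("file", "aws_2t_data.grib2")
da_2t.ls()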

In [2]:
import earthkit.data as ekd

data = ekd.from_source("s3", {
    "endpoint": "s3.amazonaws.com",
    "region": "eu-central-1",
    "bucket": "ecmwf-forecasts",
    "objects": "20230118/00z/0p4-beta/oper/20230118000000-0h-oper-fc.grib2"
}, anon=True)
ds = data.to_xarray()
ds

                                                                                

::::{important}
When you need to download historical data, bear in mind the file-naming convention.
:::{dropdown} File-naming convention
The examples below (for HH=`00z` and stream=`oper`) show how the file path has changed since 2023:
* 20230118/00z/0p4-beta/oper/20230118000000-0h-oper-fc.grib2, <br> in February 2024 one can choose between the resolutions `0p4-beta` and `0p25`
* 20240201/00z/$\color{red}{\text{0p4-beta}}$/oper/20240201000000-0h-oper-fc.grib2
* 20240201/00z/$\color{red}{\text{0p25}}$/oper/20240201000000-0h-oper-fc.grib2, <br> in March 2024 one can choose between `aifs` and `ifs` (only `ifs` is available in both `0p4-beta` and `0p25`)
* 20240301/00z/$\color{red}{\text{aifs}}$/0p25/oper/20240301000000-0h-oper-fc.grib2
* 20240301/00z/$\color{red}{\text{ifs}}$/0p4-beta/oper/20240301000000-0h-oper-fc.grib2, <br> in February 2025 one can choose between `aifs-single` and `aifs` (the `ifs` paths are unchanged)
* 20250210/00z/$\color{red}{\text{aifs-single}}$/0p25/$\color{red}{\text{experimental}}$/oper/20250210000000-0h-oper-fc.grib2
* 20250210/00z/$\color{red}{\text{aifs}}$/0p25/oper/20250210000000-0h-oper-fc.grib2, <br> in March 2025 the file-naming convention is the same as we know it today
* 20250301/00z/aifs-single/0p25/oper/20250301000000-0h-oper-fc.grib2
* 20250301/00z/ifs/0p25/oper/20250301000000-0h-oper-fc.grib2
:::
::::
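
The path patterns in the dropdown above can also be assembled programmatically. The following is a minimal sketch and not part of either library: `build_object_key` and its parameters are hypothetical names chosen for illustration. The function does not know when each layout came into effect, so the caller selects the layout by passing or omitting `model`, and the temporary `experimental` directory is not covered.

```python
from datetime import date


def build_object_key(day, hour, step, model=None, resol="0p25",
                     stream="oper", type_="fc"):
    """Assemble an object key following the path patterns listed above.

    Hypothetical helper: `model=None` reproduces the older layout
    that has no model directory.
    """
    ymd = day.strftime("%Y%m%d")
    # Directory part: <date>/<HH>z/[<model>/]<resol>/<stream>
    parts = [ymd, f"{hour:02d}z"]
    if model is not None:
        parts.append(model)
    parts += [resol, stream]
    # File name: <date><HH>0000-<step>h-<stream>-<type>.grib2
    filename = f"{ymd}{hour:02d}0000-{step}h-{stream}-{type_}.grib2"
    return "/".join(parts + [filename])


old_key = build_object_key(date(2023, 1, 18), 0, 0, resol="0p4-beta")
new_key = build_object_key(date(2025, 3, 1), 0, 0, model="aifs-single")
print(old_key)
print(new_key)
```

The resulting string can be passed as the `objects` value of the `s3` source shown earlier.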

## Retrieve data for only one parameter

To download the single `2t` parameter, we read the `_offset` and `_length` values from the corresponding index file.

In [3]:
index_file = ekd.from_source("s3",
                             {"endpoint": "s3.amazonaws.com",
                              "region": "eu-central-1",
                              "bucket": "ecmwf-forecasts",
                              "objects": "20250430/12z/aifs-single/0p25/oper/20250430120000-12h-oper-fc.index",
                             }, anon=True)
index_file = index_file.to_pandas()
value = index_file.iloc[[42]].to_string(index=False, header=False)
value

                                                                                

'{"domain": "g"  "date": "20250430"  "time": "1200"  "expver": "0001"  "class": "ai"  "type": "fc"  "stream": "oper"  "step": "12"  "levtype": "sfc"  "param": "2t"  "_offset": 34015908  "_length": 560208} NaN'
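
Picking row 42 by position only works if we already know where `2t` sits in the index. As a sketch, assuming the index file holds one JSON object per line (as the output above suggests), the matching entry can be found by its keys instead. `find_parts` is a hypothetical helper, and the first sample line is a made-up entry that exists only so the filter has something to skip.

```python
import json

INDEX_LINES = [
    # Hypothetical entry, not taken from a real index file.
    '{"type": "fc", "step": "12", "levtype": "sfc", "param": "msl", '
    '"_offset": 0, "_length": 100}',
    # Entry shown in the output above.
    '{"domain": "g", "date": "20250430", "time": "1200", "expver": "0001", '
    '"class": "ai", "type": "fc", "stream": "oper", "step": "12", '
    '"levtype": "sfc", "param": "2t", "_offset": 34015908, "_length": 560208}',
]


def find_parts(lines, **criteria):
    """Return (offset, length) tuples for index entries matching all criteria."""
    parts = []
    for line in lines:
        entry = json.loads(line)
        if all(entry.get(key) == value for key, value in criteria.items()):
            parts.append((entry["_offset"], entry["_length"]))
    return parts


parts_2t = find_parts(INDEX_LINES, param="2t", levtype="sfc")
print(parts_2t)
```

Each returned tuple has the (`_offset`, `_length`) form that the `parts` option expects.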

In [None]:
# Request only the bytes of the `2t` field: parts = (_offset, _length)
req = {
    "endpoint": "s3.amazonaws.com",
    "region": "eu-central-1",
    "bucket": "ecmwf-forecasts",
    "objects": {
        "object": "20250430/12z/aifs-single/0p25/oper/20250430120000-12h-oper-fc.grib2",
        "parts": (34015908, 560208),
    },
}

data = ekd.from_source("s3", req, anon=True)
data.ls()

:::{note}
The `parts` option (byte ranges) lets us retrieve only a specific parameter from the selected file. Each part is a list or tuple of the form (`_offset`, `_length`), where `_offset` is the start byte position and `_length` is the number of bytes to read from that offset.
:::
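
To make the (`_offset`, `_length`) arithmetic concrete, the sketch below converts a part into the inclusive byte range used by an HTTP `Range` header. It only illustrates the arithmetic: `to_http_range` is a hypothetical helper, and nothing here is a claim about how `earthkit` performs the read internally.

```python
def to_http_range(offset, length):
    """Translate an (offset, length) part into an HTTP Range header value."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # The end position of an HTTP byte range is inclusive, hence the -1.
    last_byte = offset + length - 1
    return f"bytes={offset}-{last_byte}"


header = to_http_range(34015908, 560208)
print(header)
```

The range covers exactly `_length` bytes, starting at `_offset`.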

:::{warning}
If we omit the `parts` option and convert the result to an `xarray.Dataset`, the following error is raised: <br>
ValueError: Variable "sot" has inconsistent dimension "levelist" compared to other variables. Expected values: (13) \[\[, 5, 0, ,,  , 1, 0, 0, ,,  , 1, 5, 0, ,,  , 2, 0, 0, ,,  , 2, 5, 0, ,,  , 3, 0, 0, ,,  , 4, 0, 0, ,,  , 5, 0, 0, ,,  , 6, 0, 0, \]..., 1000\], got: (2) \[1, 2\]. Length mismatch: 13 != 2
<br>
<br>
When we specify the `stream` option, the following error is thrown: <br>
NotImplementedError: earthkit.data.sources.stream.StreamFieldList.\__len\__()
:::