# Introduction to the ERA5 Data

The ERA5 dataset is the fifth generation of the ECMWF ReAnalysis, covering 1940 to the present (the ERA5-Land product used later in this notebook starts in 1950). ECMWF is the "European Centre for Medium-Range Weather Forecasts".
The dataset provides comprehensive, high-resolution historical weather and climate data. The source data comes from the [Copernicus Climate Data Store (CDS)](https://cds.climate.copernicus.eu/#!/home). A comprehensive data documentation guide is available [here](https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation). In total, the ERA5 data held in the CDS exceeds 10 petabytes.

Fortunately for us, there are existing [Python](https://github.com/Climate-CAFE/era5-daily-heat-aggregation-python) and [R](https://github.com/Climate-CAFE/era5-daily-heat-aggregation) repositories that demonstrate extracting the data through the API, so we are going to use those to develop our workflow. Specifically, we're trying to understand the following characteristics of the data:

* size
* how to download it
* the key transformations needed to map the gridded data onto the health sheds
* two important variables:
    * 2 m air temperature
    * 2 m dew point temperature

Let's get started.


Important: we need to install the CDS API first (`pip install cdsapi`), and you'll need an API key. First, register for an account and accept the T&Cs, after which the page [here](https://ecmwf-projects.github.io/copernicus-training-c3s/cds-tutorial.html#install-the-cds-api-key) will autopopulate an API key for you. The following code is a test request to make sure your API key works.

In [2]:
import cdsapi

client = cdsapi.Client()

dataset = 'reanalysis-era5-pressure-levels'
request = {
  'product_type': ['reanalysis'],
  'variable': ['geopotential'],
  'year': ['2024'],
  'month': ['03'],
  'day': ['01'],
  'time': ['13:00'],
  'pressure_level': ['1000'],
  'data_format': 'grib',
}
target = 'download.grib'

client.retrieve(dataset, request, target)

2025-03-03 13:31:07,682 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 13:31:07,959 INFO Request ID is 07de689d-b7df-439b-b303-2214b8f3eec0
2025-03-03 13:31:08,091 INFO status has been updated to accepted
2025-03-03 13:34:00,781 INFO status has been updated to successful


1fd5a2b7ad40b8c614c78061a75d30d0.grib:   0%|          | 0.00/1.98M [00:00<?, ?B/s]

'download.grib'
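
If the test request fails with an authentication error, one thing worth checking is the credentials file. The sketch below assumes the usual setup in which `cdsapi` reads `url` and `key` entries from `~/.cdsapirc`; the helper names `read_cdsapirc` and `missing_settings` are hypothetical and not part of `cdsapi`.

```python
import os


def read_cdsapirc(path):
    """Parse 'name: value' lines from a CDS API credentials file into a dict."""
    settings = {}
    with open(path) as handle:
        for line in handle:
            if ":" not in line:
                continue  # skip blank or malformed lines
            # Split on the first colon only, because the URL itself contains one
            name, value = line.strip().split(":", 1)
            settings[name.strip()] = value.strip()
    return settings


def missing_settings(settings, required=("url", "key")):
    """Return the required entries that are absent or empty."""
    return [name for name in required if not settings.get(name)]


# Assumed default location of the credentials file
rc_path = os.path.expanduser("~/.cdsapirc")
if os.path.exists(rc_path):
    print("missing entries:", missing_settings(read_cdsapirc(rc_path)))
else:
    print("no credentials file found at", rc_path)
```

An empty list of missing entries only says that the file is well formed; whether the key itself is valid is still decided by the test request above.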

This demonstration is expected to produce about 9 GB of raw raster files (24 years, 12 files per year, so 288 files). It generates 24 years of heat measures across Kenyan administrative boundaries from ERA5-Land data, downloaded in 1-month chunks with three variables (2 m temperature, 2 m dew point temperature, and skin temperature).

In [3]:
# imports as recommended by the github repo
import cdsapi
import geopandas as gpd
import os


I'll use pyprojroot to locate the project root and specify a data path relative to it.

In [4]:
from pyprojroot.here import here

ecmw_dir = here("data")

In [17]:
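def create_dir(path):
    """Create `path` (and any missing parent directories), then return it."""
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(path, exist_ok=True)
    return path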

In [6]:
create_dir(ecmw_dir)

In [7]:
# create a directory for the kenya data
create_dir(os.path.join(ecmw_dir, "Kenya_GADM"))

Next, we need to manually download the GADM GeoPackage for Kenya from https://gadm.org/download_country.html and place it in the `Kenya_GADM` directory created above.

This is a boundaries GeoPackage. GADM is a global database of administrative boundaries (e.g., countries, states, provinces, districts), so this file provides the boundaries for Kenya and its regions.

In [8]:
# layer "ADM_ADM_0" holds the country-level (level 0) boundary
kenya_shape = gpd.read_file(os.path.join(ecmw_dir, "Kenya_GADM/gadm41_KEN.gpkg"), layer="ADM_ADM_0")


In [9]:
kenya_shape

|   | GID_0 | COUNTRY | geometry |
|---|-------|---------|----------|
| 0 | KEN | Kenya | MULTIPOLYGON (((39.38014 -4.71792, 39.37986 -4... |


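The bounding box is the rectangular extent of the boundary geometry, returned by `total_bounds` as [xmin, ymin, xmax, ymax] in degrees of longitude and latitude. It is what we'll use to tell Copernicus which region to return. Think of it as a rectangular mask around the country.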

In [10]:
kenya_bbox = kenya_shape.total_bounds

In [11]:
kenya_bbox

array([33.909588  , -4.720417  , 41.92621613,  5.06116581])

Technical: Add a small buffer around the bounding box to ensure the whole region
is queried, and round the coordinates to 0.1 degrees. A 0.1 degree rounding
is used because ERA5-Land, the dataset queried below, is on a 0.1 x 0.1 degree grid
(ERA5 itself is on a 0.25 x 0.25 degree grid); see
https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference


In [12]:
kenya_bbox[0] = round(kenya_bbox[0], 1) - 0.1
kenya_bbox[1] = round(kenya_bbox[1], 1) - 0.1
kenya_bbox[2] = round(kenya_bbox[2], 1) + 0.1
kenya_bbox[3] = round(kenya_bbox[3], 1) + 0.1

In [13]:
# total_bounds gives [xmin, ymin, xmax, ymax], but the CDS "area" key
# expects [North, West, South, East], i.e. [ymax, xmin, ymin, xmax]
query_area = [kenya_bbox[3], kenya_bbox[0], kenya_bbox[1], kenya_bbox[2]]
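
The rounding, buffering and reordering steps can also be wrapped in one helper so that they are applied the same way for any region. This is a minimal sketch: the name `bbox_to_cds_area` and its parameters are hypothetical, and it assumes the CDS `area` key takes `[North, West, South, East]`.

```python
def bbox_to_cds_area(bounds, buffer=0.1, ndigits=1):
    """Turn [xmin, ymin, xmax, ymax] into a buffered [north, west, south, east] list.

    Each coordinate is rounded to `ndigits` decimals and pushed outward by
    `buffer` degrees, mirroring the cell above. The final round() removes
    floating-point noise such as 33.800000000000004.
    """
    xmin, ymin, xmax, ymax = (float(value) for value in bounds)
    west = round(round(xmin, ndigits) - buffer, ndigits)
    south = round(round(ymin, ndigits) - buffer, ndigits)
    east = round(round(xmax, ndigits) + buffer, ndigits)
    north = round(round(ymax, ndigits) + buffer, ndigits)
    return [north, west, south, east]


# The Kenya bounds printed earlier in this notebook
area = bbox_to_cds_area([33.909588, -4.720417, 41.92621613, 5.06116581])
print(area)  # [5.2, 33.8, -4.8, 42.0]
```

Converting to plain `float` also keeps NumPy scalar types out of the request dictionary, which is a tidiness choice, not a requirement established here.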

In [14]:
query_years = list(range(2000, 2024))
query_years_str = [str(x) for x in query_years]

query_months = list(range(1, 13))
query_months_str = [str(x).zfill(2) for x in query_months]
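
The same zero-padding pattern extends to the day and hour strings, which the request further down spells out by hand. A small sketch using only the standard library; `days_in_month` and `ALL_HOURS` are hypothetical names. The download log shows the server accepting a February request that lists days up to "31", so this is a tidiness option and not a required fix.

```python
import calendar


def days_in_month(year, month):
    """Zero-padded day strings that actually exist in the given month."""
    # monthrange returns (weekday of the first day, number of days in the month)
    n_days = calendar.monthrange(int(year), int(month))[1]
    return [str(day).zfill(2) for day in range(1, n_days + 1)]


# Every hour of the day in the "HH:00" form used by the request
ALL_HOURS = ["{:02d}:00".format(hour) for hour in range(24)]

print(len(days_in_month("2000", "02")), ALL_HOURS[0], ALL_HOURS[-1])
```

With these in place, the `"day"` and `"time"` entries of a request could be written as `days_in_month(year_str, month_str)` and `ALL_HOURS`.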

In [18]:
output_dir = create_dir(os.path.join(ecmw_dir, "ERA5_out"))

In [None]:
for year_str in query_years_str:
    # Track progress
    print("Now processing year ", year_str, "\n")

    # Each year's query is split into monthly requests.
    # The CDS servers reject requests that are too large,
    # so this division is required.

    for month_str in query_months_str:
        # Track progress
        print("Now processing month ", month_str, "\n")

        # The below is the formatted API request language. All of the inputs
        # specified below in proper formatting can be identified by forming a 
        # request using the Copernicus CDS point-and-click interface for data
        # requests. https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form
        # Select the variables, timing, and netcdf as the output format, and then 
        # select "Show API Request" at the bottom of the screen. 
    
        # Note that the argument to download() is the file path and
        # file name that the data will be written to. Inside a loop,
        # make sure each request's year and month appear in the file name.

        # Note: output_dir (the "ERA5_out" subfolder) was created above
 
        dataset = "reanalysis-era5-land"
        request = {
                    "product_type": "reanalysis",
                    "variable": ["2m_dewpoint_temperature",
                                 "2m_temperature",
                                 "skin_temperature"], 
                    "year": year_str,
                    "month": month_str,
                    "day": [  
                            "01", "02", "03",
                            "04", "05", "06",
                            "07", "08", "09",
                            "10", "11", "12",
                            "13", "14", "15",
                            "16", "17", "18",
                            "19", "20", "21",
                            "22", "23", "24",
                            "25", "26", "27",
                            "28", "29", "30",
                            "31"],
                    "time": [
                             "00:00", "01:00", "02:00",
                             "03:00", "04:00", "05:00",
                             "06:00", "07:00", "08:00",
                             "09:00", "10:00", "11:00",
                             "12:00", "13:00", "14:00",
                             "15:00", "16:00", "17:00",
                             "18:00", "19:00", "20:00",
                             "21:00", "22:00", "23:00"],
                    "data_format": "netcdf",
                    "download_format": "unarchived",
                    "area": query_area
        }

        client = cdsapi.Client()
        client.retrieve(dataset, request).download(os.path.join(output_dir, 
                                                                "{}_{}.nc".format(year_str, month_str)))


Now processing year  2000 

Now processing month  01 



2025-03-03 13:52:06,896 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 13:52:07,181 INFO Request ID is 56d97887-22e9-441c-b33c-2236e5feaa87
2025-03-03 13:52:07,308 INFO status has been updated to accepted
2025-03-03 13:52:15,943 INFO status has been updated to successful


3f0a8829f1720f8fa1289e11eedada58.nc:   0%|          | 0.00/23.4M [00:00<?, ?B/s]

Now processing month  02 



2025-03-03 13:52:26,335 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 13:52:26,730 INFO Request ID is 02e31b6b-e047-4f4b-bf56-4eda01d5d08a
2025-03-03 13:52:27,070 INFO status has been updated to accepted
2025-03-03 13:52:36,181 INFO status has been updated to running
2025-03-03 14:02:50,214 INFO status has been updated to successful


bdfda68cb5ba4affec5b1f592ec6ed5f.nc:   0%|          | 0.00/20.3M [00:00<?, ?B/s]

Now processing month  03 



2025-03-03 14:02:56,179 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 14:02:56,555 INFO Request ID is 3e4ebcbc-7038-43b4-927b-b4ce291fd60f
2025-03-03 14:02:56,703 INFO status has been updated to accepted
2025-03-03 14:03:18,825 INFO status has been updated to running
2025-03-03 14:11:18,070 INFO status has been updated to successful


eadfd863f7f6a6f82d1f077b6f137cdf.nc:   0%|          | 0.00/24.4M [00:00<?, ?B/s]

Now processing month  04 



2025-03-03 14:11:27,837 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 14:11:28,199 INFO Request ID is f2a01b32-4853-4d2d-a873-9986d4c668fa
2025-03-03 14:11:28,358 INFO status has been updated to accepted
2025-03-03 14:11:33,798 INFO status has been updated to running
2025-03-03 14:17:48,663 INFO status has been updated to successful


7790b627e1ed17c07398e4943ddc66f8.nc:   0%|          | 0.00/20.2M [00:00<?, ?B/s]

Now processing month  05 



2025-03-03 14:17:54,131 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 14:17:54,453 INFO Request ID is 1cf8b1bc-ba8b-4865-a895-1634b22b0cb0
2025-03-03 14:17:54,572 INFO status has been updated to accepted
2025-03-03 14:18:03,340 INFO status has been updated to running
2025-03-03 14:24:14,619 INFO status has been updated to successful


af8da5f83b8f4edfcfdc9e9b01e85167.nc:   0%|          | 0.00/22.7M [00:00<?, ?B/s]

Now processing month  06 



2025-03-03 14:24:18,644 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 14:24:18,893 INFO Request ID is 8d9297fa-d2d6-4cf8-8473-669175f205a6
2025-03-03 14:24:19,004 INFO status has been updated to accepted
2025-03-03 14:24:27,645 INFO status has been updated to running
2025-03-03 14:30:40,248 INFO status has been updated to successful


f222a9f45eec1093306d1f8bbf9bbdd2.nc:   0%|          | 0.00/21.5M [00:00<?, ?B/s]

Now processing month  07 



2025-03-03 14:30:45,830 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 14:30:46,161 INFO Request ID is 13f963a5-df07-42b0-9535-40f5dd511890
2025-03-03 14:30:46,284 INFO status has been updated to accepted
2025-03-03 14:30:54,964 INFO status has been updated to running
2025-03-03 14:37:06,377 INFO status has been updated to successful


a56370ec686f31437a5c9328b74e48da.nc:   0%|          | 0.00/22.8M [00:00<?, ?B/s]

Now processing month  08 



2025-03-03 14:37:13,309 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 14:37:13,713 INFO Request ID is 33d25bd3-d6a3-4faa-8b21-7e9d93faf9f2
2025-03-03 14:37:13,869 INFO status has been updated to accepted
2025-03-03 14:37:22,628 INFO status has been updated to running
2025-03-03 14:41:33,768 INFO status has been updated to successful


a393f6cb418b557a044ef0b4e0fb3b68.nc:   0%|          | 0.00/22.7M [00:00<?, ?B/s]

Now processing month  09 



2025-03-03 14:41:37,733 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-03-03 14:41:38,015 INFO Request ID is bc09a8e9-669b-4d05-85c4-e4cf6f0cc897
2025-03-03 14:41:38,137 INFO status has been updated to accepted
2025-03-03 14:41:46,811 INFO status has been updated to running
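
The timestamps and file sizes in the log above can be turned into a rough projection for the full 24 years. The sketch below copies the "Request ID" and "successful" times (truncated to whole seconds) and the file sizes for the eight completed months. It is a very small sample, and the first request finished in seconds, possibly because the result was already cached on the server (an assumption), so the numbers are indicative only.

```python
from datetime import datetime

# (request submitted, status "successful", file size in MB), from the log above
LOG_ENTRIES = [
    ("13:52:07", "13:52:15", 23.4),  # 2000-01
    ("13:52:26", "14:02:50", 20.3),  # 2000-02
    ("14:02:56", "14:11:18", 24.4),  # 2000-03
    ("14:11:28", "14:17:48", 20.2),  # 2000-04
    ("14:17:54", "14:24:14", 22.7),  # 2000-05
    ("14:24:18", "14:30:40", 21.5),  # 2000-06
    ("14:30:46", "14:37:06", 22.8),  # 2000-07
    ("14:37:13", "14:41:33", 22.7),  # 2000-08
]


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Elapsed seconds between two same-day clock times."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


durations = [seconds_between(start, end) for start, end, _ in LOG_ENTRIES]
sizes_mb = [size for _, _, size in LOG_ENTRIES]

n_requests = 24 * 12  # 24 years, one request per month
mean_minutes = sum(durations) / len(durations) / 60
mean_size_mb = sum(sizes_mb) / len(sizes_mb)

est_hours = n_requests * mean_minutes / 60
est_gb = n_requests * mean_size_mb / 1000

print("requests:", n_requests)
print("mean minutes per request: {:.1f}".format(mean_minutes))
print("projected sequential hours: {:.0f}".format(est_hours))
print("projected total size in GB: {:.1f}".format(est_gb))
```

On this sample the mean is about 6 minutes per request, which projects to roughly 29 hours and 6.4 GB for 288 requests. The slowest request in the log took a little over 10 minutes, so a per-request figure of 10 minutes is a reasonable upper bound.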


From the logs, one month of data takes up to roughly 10 minutes. At that rate, the 288 monthly requests for the full 24 years would take about 48 hours of downloading. Clearly the requests will need to be parallelised (for example with multiple threads) to be efficient.
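
One way this could be approached is a thread pool that only submits months whose output file is missing, so an interrupted run can be resumed. This is a sketch under stated assumptions, not a tested CDS workflow: `pending_requests` and `download_all` are hypothetical helpers, `download_one` stands in for a function that wraps the request dictionary and `client.retrieve(...).download(path)` from the loop above, and `max_workers=4` is an arbitrary choice. The CDS may queue or limit concurrent requests per user, so the real speed-up could be smaller than the worker count suggests.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pending_requests(years, months, output_dir):
    """Return (year, month, path) tuples whose output file does not exist yet."""
    pending = []
    for year in years:
        for month in months:
            path = os.path.join(output_dir, "{}_{}.nc".format(year, month))
            if not os.path.exists(path):
                pending.append((year, month, path))
    return pending


def download_all(jobs, download_one, max_workers=4):
    """Run download_one(year, month, path) for every job on a thread pool.

    Returns a dict mapping (year, month) to None on success, or to the
    exception that was raised, so one failed month does not stop the rest.
    """
    def run(job):
        year, month, path = job
        try:
            download_one(year, month, path)
            return (year, month), None
        except Exception as exc:  # record the failure and carry on
            return (year, month), exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, jobs))
```

A call would look like `download_all(pending_requests(query_years_str, query_months_str, output_dir), download_one)`. Threads suit this job because each worker spends nearly all of its time waiting on the server. Whether a single `cdsapi.Client` can safely be shared between threads is not established here, so creating one client inside `download_one` is the cautious choice.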