# Daily and Monthly statistics

This notebook demonstrates how to use the earthkit libraries to access some ERA5 data, reduce the data to daily (or monthly) time steps and plot the results.

For this exercise we will use the earthkit-data package to access the data, earthkit-climate to calculate the daily (or monthly) statistics and earthkit-maps to plot the results.

The earthkit-climate routines are *currently* based on xarray, hence they return raw xarray objects. *This may change in future versions of Earthkit*.

In [1]:
import numpy as np # Everyone loves a numpy!

from earthkit import data as ek_data
from earthkit.climate import aggregate as ek_aggregate
from earthkit import maps as ek_maps


## Request some data from the CDS

For this example we are going to use the first 3 months of 2015.

In [12]:
cds_dataset_name = 'reanalysis-era5-single-levels'

# We use an earthkit bounding box object to describe our area with named edges,
# which avoids any ambiguity about the order of north, south, east and west.
# Note: `area` is not included in the request below, so the full global field is retrieved.
area = ek_data.utils.bbox.BoundingBox(north=80, south=20, west=-30, east=100)
cds_request = {
        'product_type': 'reanalysis',
        'variable': '2m_temperature',
        'year': '2015',
        'month': [
            '01', '02', '03',
            # '04', '05', '06',
            # '07', '08', '09',
            # '10', '11', '12',
        ],
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'time': [
            '00:00', '01:00', '02:00',
            '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00',
            '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00',
            '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00',
            '21:00', '22:00', '23:00',
        ],
}
era5_T2M_data = ek_data.from_source("cds", cds_dataset_name, cds_request)

cds_request.update({"variable": "total_precipitation"})
era5_TP_data = ek_data.from_source("cds", cds_dataset_name, cds_request)
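The long `day` and `time` lists in the request can also be generated rather than typed out. The sketch below is only an illustration: `compact_request`, `bounds` and `area_list` are hypothetical names, and passing `area` as a plain `[North, West, South, East]` list follows the usual CDS API convention, which is assumed (not verified here) to be accepted when the request is forwarded by earthkit-data.

```python
# Build the repetitive request lists programmatically (illustrative sketch).
months = [f"{month:02d}" for month in range(1, 4)]   # '01', '02', '03'
days = [f"{day:02d}" for day in range(1, 32)]        # '01' ... '31'
times = [f"{hour:02d}:00" for hour in range(24)]     # '00:00' ... '23:00'

# Named bounds avoid ordering mistakes; the CDS API convention for a
# sub-region is [North, West, South, East].
bounds = {"north": 80, "south": 20, "west": -30, "east": 100}
area_list = [bounds[key] for key in ("north", "west", "south", "east")]

compact_request = {
    "product_type": "reanalysis",
    "variable": "2m_temperature",
    "year": "2015",
    "month": months,
    "day": days,
    "time": times,
    # Assumption: a plain list is accepted here; the original request
    # in this notebook does not set an area at all.
    "area": area_list,
}
```

Restricting the area in this way would reduce the size of the download considerably compared with the global retrieval.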



View the data object in the format that you prefer:

In [3]:
# As an xarray:
era5_T2M_data.to_xarray()
# As a fieldlist:
# era5_T2M_data.ls()
# As a numpy array:
# era5_T2M_data.to_numpy()

## Calculate the daily statistics

### Daily mean

First we calculate the daily mean of the 2m air temperature.

In [4]:
era5_daily_mean = ek_aggregate.temporal.daily_mean(era5_T2M_data)
era5_daily_mean
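To make the reduction concrete, the following is a minimal pure-Python sketch of what a daily mean does conceptually: group hourly samples by calendar date and average each group. It is not the earthkit-climate implementation; `daily_mean_sketch` and the synthetic hourly values are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def daily_mean_sketch(timestamps, values):
    """Group values by calendar date and return the mean of each group."""
    groups = defaultdict(list)
    for stamp, value in zip(timestamps, values):
        groups[stamp.date()].append(value)
    return {day: mean(vals) for day, vals in sorted(groups.items())}


# Two days of synthetic hourly data: the value equals the hour index.
start = datetime(2015, 1, 1)
hourly_times = [start + timedelta(hours=h) for h in range(48)]
hourly_values = [float(h) for h in range(48)]

daily = daily_mean_sketch(hourly_times, hourly_values)
for day, value in daily.items():
    print(day, value)
```

Each output day is the average of the 24 hourly samples that share its date, which is why the time axis of the result has one step per day.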

### Different time zone

ERA5 data is provided on a UTC time axis, so the daily statistics are calculated over UTC days and are
generally most meaningful for locations close to that time zone. If you are more interested in a time zone
away from UTC, it is possible to provide a time shift.

The following calculates the daily mean relative to a time zone 12 hours ahead of UTC.
Be aware that your data request should normally be extended to include the
additional time steps needed for the first and last days.
In the example below the returned object contains one additional day, because the first
and last days are built from partial days; these should be omitted from further analysis.

In [5]:
era5_daily_mean_12 = ek_aggregate.temporal.daily_mean(era5_T2M_data, time_shift={'hours': 12})
era5_daily_mean_12
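The extra, partial days can be illustrated with a small pure-Python sketch that counts how many hourly samples fall on each calendar day before and after a shift. This is an assumption-laden illustration, not earthkit-climate code: `samples_per_day` is a hypothetical helper, and the shift is applied by adding hours to the UTC timestamps, which is assumed to match the meaning of "12 hours ahead of UTC".

```python
from collections import Counter
from datetime import datetime, timedelta


def samples_per_day(timestamps, shift_hours=0):
    """Count samples per calendar day after shifting the timestamps."""
    shifted = [stamp + timedelta(hours=shift_hours) for stamp in timestamps]
    return Counter(stamp.date() for stamp in shifted)


# Three complete UTC days of hourly timestamps.
start = datetime(2015, 1, 1)
hourly_times = [start + timedelta(hours=h) for h in range(72)]

utc_counts = samples_per_day(hourly_times)
shifted_counts = samples_per_day(hourly_times, shift_hours=12)

print(sorted(utc_counts.items()))      # three days, 24 samples each
print(sorted(shifted_counts.items()))  # four days, first and last with 12 samples
```

The first and last shifted days contain only 12 of the 24 hourly samples, so their means are not comparable with those of the complete days in between.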

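### Daily accumulation

Total precipitation is an accumulated quantity, so we sum the hourly values over each day rather than averaging them.

In [9]: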
daily_accumulation = ek_aggregate.temporal.daily_sum(era5_TP_data)
daily_accumulation
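As a rough illustration of the daily sum, the sketch below adds up hourly precipitation values per calendar day and converts the totals from metres (the unit ERA5 uses for total precipitation) to millimetres. `daily_sum_sketch` and the synthetic hourly values are hypothetical; this is not how earthkit-climate is implemented.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_sum_sketch(timestamps, values):
    """Group values by calendar date and return the sum of each group."""
    totals = defaultdict(float)
    for stamp, value in zip(timestamps, values):
        totals[stamp.date()] += value
    return dict(sorted(totals.items()))


# Synthetic data: 0.5 mm (0.0005 m) every hour on day one, dry on day two.
start = datetime(2015, 1, 1)
hourly_times = [start + timedelta(hours=h) for h in range(48)]
hourly_tp_m = [0.0005] * 24 + [0.0] * 24

daily_tp_m = daily_sum_sketch(hourly_times, hourly_tp_m)
# Convert metres to millimetres for easier reading.
daily_tp_mm = {day: total * 1000.0 for day, total in daily_tp_m.items()}
print(daily_tp_mm)
```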