In [2]:
import cdsapi
import os
import shutil
import xarray as xr
import pandas as pd

# Downloading and processing of ERA5 data

This notebook downloads **hourly** ERA5 data (for all days and months) for the specified variables, geographic area and years. Results are combined into a single netCDF for further processing. Download volumes are restricted by the API and so, to avoid errors, the code here downloads one year of data at a time and then stitches the results together using xarray. This is slow, but usually reliable. The API calls used in the code below are based on the examples originally provided by Daniel [here](https://82.223.43.150/watexr/pl/back3zg8z788ddspzjaxnnhprw).

**Note:** You must first follow the instructions [here](https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key) to create a Copernicus user account and install the `cdsapi` client. Then fill in the details in section 1, below, and run the notebook.

For the "common papers" component of WATExR, I believe the variables currently of interest are listed in `05_era5_s5_vars_of_interest.xlsx` (at least, for the Norwegian case study).

## 1. User input

In [3]:
# Area of interest. The CDS API expects north/west/south/east, i.e.
# lat_max/lon_min/lat_min/lon_max
area = '59.86/10.68/59.34/11.13'

# Variables of interest. See here for details:
# https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form
variables = ['10m_u_component_of_wind', 
             '10m_v_component_of_wind', 
             '2m_dewpoint_temperature',
             '2m_temperature', 
             'surface_pressure', 
             'surface_solar_radiation_downwards',
             'surface_thermal_radiation_downwards', 
             'total_cloud_cover', 
             'total_precipitation',
            ]

# Years of interest
years = range(1980, 2020)

# Folder in which to save annual netCDFs
raw_fold = '/home/jovyan/shared/WATExR/ERA5/raw_netcdf'

# Path for final/tidied netCDF
merged_nc = '/home/jovyan/shared/WATExR/ERA5/morsa_era5_merged.nc'
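
The CDS API reads `area` in north/west/south/east order, so the latitudes are easy to swap by mistake. As a minimal sketch, the string could be built from a conventional bounding box. `make_area` is a hypothetical helper introduced here for illustration; it is not part of `cdsapi`.

```python
def make_area(lat_min, lon_min, lat_max, lon_max):
    """Build a CDS 'area' string (north/west/south/east) from a bounding box.

    Hypothetical helper for illustration; not part of cdsapi.
    """
    if lat_min > lat_max or lon_min > lon_max:
        raise ValueError('Expected lat_min <= lat_max and lon_min <= lon_max')
    return f'{lat_max}/{lon_min}/{lat_min}/{lon_max}'


# Reproduces the area string used in this notebook
area = make_area(lat_min=59.34, lon_min=10.68, lat_max=59.86, lon_max=11.13)
```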

## 2. Download ERA5

**Note:** The cell below may take a *long* time to run - **expect > 24 hours** for the full dataset.

In [3]:
%%time

c = cdsapi.Client()

# Loop over years
for year in years:
    # Build request
    request = {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'area': area,
        'variable': variables,
        'year': year,
        'month': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
        ],
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'time': [
            '00:00', '01:00', '02:00',
            '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00',
            '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00',
            '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00',
            '21:00', '22:00', '23:00',
        ],
    }
    
    # Get nc file
    out_path = os.path.join(raw_fold, f'era5_{year}.nc')
    c.retrieve('reanalysis-era5-single-levels',
               request,
               out_path,
              )

2020-01-17 16:37:50,646 INFO Welcome to the CDS
2020-01-17 16:37:50,647 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
2020-01-17 16:37:50,958 INFO Request is completed
2020-01-17 16:37:50,959 INFO Downloading http://136.156.132.201/cache-compute-0004/cache/data6/adaptor.mars.internal-1579180476.430084-929-5-dfae99a7-e15f-4f11-8e49-c39ee72b1fdf.nc to /home/jovyan/shared/WATExR/ERA5/raw_netcdf/era5_1980.nc (963.9K)
2020-01-17 16:37:51,106 INFO Download rate 6.4M/s
2020-01-17 16:37:51,400 INFO Welcome to the CDS
2020-01-17 16:37:51,401 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
2020-01-17 16:37:51,595 INFO Downloading http://136.156.133.41/cache-compute-0013/cache/data3/adaptor.mars.internal-1579185688.9987106-17248-24-e7583da9-e478-442c-a3f3-e1dce6e70af0.nc to /home/jovyan/shared/WATExR/ERA5/raw_netcdf/era5_1981.nc (961.3K)
2020-01-17 16:37:51,745 INFO Download rate 6.

CPU times: user 18.9 s, sys: 2.67 s, total: 21.6 s
Wall time: 1d 3h 19min 2s
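
Because the full download takes more than a day, it may be worth making the loop resumable. The sketch below lists the years that do not yet have an annual file in the output folder, so a restarted run could loop over those instead of `years`. `missing_years` is a hypothetical helper, and it assumes that any existing file is complete; a partially downloaded file would need to be deleted by hand first.

```python
import os


def missing_years(years, folder, template='era5_{year}.nc'):
    """Return the years whose annual netCDF is not yet present in `folder`.

    Hypothetical helper for illustration. Only checks that the file
    exists, not that the download finished successfully.
    """
    return [
        year for year in years
        if not os.path.exists(os.path.join(folder, template.format(year=year)))
    ]
```

For example, replacing `for year in years:` with `for year in missing_years(years, raw_fold):` would skip years that have already been saved.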


## 3. Merge and tidy annual netCDFs

The code below merges the annual netCDFs into a single file and renames the ERA5 variables to match those in the S5 dataset from UNICAN. It also resolves an inconsistency in the ERA5 variable names: netCDFs for recent years (i.e. 2019) use a different naming convention from earlier data (see [here](https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Dataupdatefrequency) for details).

In [4]:
# Read var name lookup
df = pd.read_excel('05_era5_s5_vars_of_interest.xlsx')
df

| Unnamed: 0 | era5_name | era5_code | era5_unit | c4r_s5_code | c4r_s5_unit | agg_func | notes |
|---|---|---|---|---|---|---|---|
| 0 | Surface pressure | sp | Pa | psl | Pa | mean | |
| 1 | Total cloud cover | tcc | | tcc | | mean | |
| 2 | 10 metre U wind component | u10 | m/s | uas | m/s | mean | |
| 3 | 10 metre V wind component | v10 | m/s | vas | m/s | mean | |
| 4 | 2 metre temperature | t2m | K | tas | K | mean | Convert to C |
| 5 | 2 metre dewpoint temperature | d2m | K | tdps | K | mean | Convert to C |
| 6 | Surface solar radiation downwards | ssrd | J/m2 | rsds | W/m2 | sum | |
| 7 | Surface thermal radiation downwards | strd | J/m2 | rlds | W/m2 | sum | |
| 8 | Total precipitation | tp | m | tp | m | sum | Convert to mm |


In [6]:
# Read all .nc files
nc_paths = os.path.join(raw_fold, '*.nc')
ds = xr.open_mfdataset(nc_paths, combine='by_coords')

# Tidy variables
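# Iterate over a copy of the names, because variables are deleted in the loop
for var_name in list(ds.variables):
    if var_name[-4:] == '0005':
        # Not needed
        del ds[var_name]
    elif var_name[-4:] == '0001':
        # Recent data (2019 only). Merge with other series
        # Split on the last underscore only, e.g. 't2m_0001' -> 't2m', '0001'
        var, idx = var_name.rsplit('_', 1)
        ds[var] = ds[var].combine_first(ds[var_name])
        del ds[var_name]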

# Rename vars to match S5
names_dict = dict(zip(df['era5_code'], df['c4r_s5_code']))
ds = ds.rename(names_dict)

# Save
ds.to_netcdf(merged_nc)

ds
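
The lookup table above flags several unit changes (K to C, m to mm, J/m2 to W/m2) that the merge step does not apply. The sketch below shows what those conversions might look like. The function names are hypothetical, and the radiation conversion assumes each hourly value is an accumulation over a 3600 second period, which should be checked against the ERA5 documentation before use.

```python
SECONDS_PER_HOUR = 3600.0  # Assumed accumulation period for hourly ERA5 fluxes


def kelvin_to_celsius(temp_k):
    """Convert temperature from K to degrees C (e.g. tas, tdps)."""
    return temp_k - 273.15


def metres_to_mm(depth_m):
    """Convert a water depth from m to mm (e.g. tp)."""
    return depth_m * 1000.0


def hourly_joules_to_watts(energy_j_m2):
    """Convert an hourly accumulation in J/m2 to a mean flux in W/m2.

    Assumes the value was accumulated over exactly one hour.
    """
    return energy_j_m2 / SECONDS_PER_HOUR
```

The functions use only arithmetic operators, so they should work on plain floats as well as on array-like objects such as `ds['tas']`.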