# 002. Using the data_downloader.py methods for easy data retrieval

The data downloader is designed to be easy to use while retrieving data as efficiently as possible. In practice this means that each request asks for all needed parameters, levels, etc., but for only one full month, because the MARS archive stores one month of data per tape (for more information see: https://confluence.ecmwf.int/display/CKB/How+to+download+ERA5).

### 1) import the downloader

In [None]:
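import link_src
# CDS_Dataset (used below) must also be imported here from data_downloader.py;
# that import line is not shown in this notebook.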

### 2) specify the CDS dataset name and the directory to save the data in
The `dataset_name` may be either
- `"reanalysis-era5-pressure-levels"` to retrieve data at pressure levels, or
- `"reanalysis-era5-single-levels"` to request fields that exist on a single level only, e.g. precipitation.

In [None]:
ds = CDS_Dataset(dataset_name='reanalysis-era5-pressure-levels',
                 save_to_folder='./example_download/'  # path to where datasets shall be stored
                )

### 3) define the request's content
- to retrieve a spatial subset of the globe, use the `area` keyword. It takes a list of latitude/longitude values in degrees giving the northern, western, southern and eastern bounds of the area, in that order.

- to get a grid resolution other than 0.25 degrees, use the `grid` keyword.

- the `format` of the data can be `"netcdf"` or `"grib"`.

- the available `variable`s are listed in the ERA5 catalogue (https://apps.ecmwf.int/data-catalogues/era5), but must be written here in lower case with underscores (`_`) instead of spaces (` `).

- if requesting pressure levels, supply them with the `pressure_level` keyword.
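
The naming and `area` conventions above can be checked with a few lines of plain Python before anything is sent off. The following is only a sketch: `to_cds_name` and `check_area` are hypothetical helpers that are not part of `data_downloader.py`, and the `grid=[0.5, 0.5]` value assumes that the keyword is passed through unchanged as a pair of step sizes in degrees.

```python
def to_cds_name(catalogue_name):
    """Turn a catalogue-style name into lower case with underscores."""
    return "_".join(catalogue_name.strip().lower().split())


def check_area(area):
    """Check the latitude bounds of [north, west, south, east] (degrees).

    Longitudes are not checked, since an area may cross the date line.
    """
    north, west, south, east = area
    if not -90 <= south < north <= 90:
        raise ValueError("expected -90 <= south < north <= 90")
    return [north, west, south, east]


catalogue_names = ["Geopotential", "Temperature", "Specific humidity"]

example_request = dict(
    product_type="reanalysis",
    format="netcdf",
    area=check_area([50, -125, 25, -70]),  # north, west, south, east
    grid=[0.5, 0.5],                       # assumed form: step sizes in degrees
    variable=[to_cds_name(name) for name in catalogue_names],
    pressure_level=["850", "700", "500"],
)

print(example_request["variable"])
# ['geopotential', 'temperature', 'specific_humidity']
```

Catching a swapped north/south bound locally is cheaper than finding out after a request has waited in the queue.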

In [None]:
# define areas of interest as [north, west, south, east] in degrees
area_dict = dict(danube=[50, 7, 47, 20],
                 asia=[55, -140, 0, 35],
                 usa=[50, -125, 25, -70])

# define time frame
year_start = 2000
year_end = 2000
month_start = 1
month_end = 12

# define requested variables
request = dict(product_type='reanalysis', 
               format='netcdf',
               area=area_dict['usa'],
               variable=['geopotential', 'temperature', 'specific_humidity'], 
               pressure_level=['850', '700', '500'])

### 4) start the data request
To speed up the retrieval of the dataset, we can send multiple requests at the same time. A request ideally covers exactly one month of data, because that is how the data is stored in the tape archive. The recommended number of parallel requests is probably around 20. It is set with the `N_parallel_requests` argument.



In [None]:
# one entry per year and per zero-padded month, passed as strings
ds.get(years=[str(y) for y in range(year_start, year_end + 1)],
       months=[str(m).zfill(2) for m in range(month_start, month_end + 1)],
       request=request,
       N_parallel_requests=12)
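
To make the "one request per month, several at once" idea concrete, here is a minimal sketch using only the standard library. It is not how `CDS_Dataset.get` is implemented: `monthly_requests` and `fake_download` are hypothetical names, the use of a thread pool is an assumption, and `fake_download` only returns the file name it would write instead of contacting the CDS.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def monthly_requests(request, years, months):
    """Return one request dict per (year, month); the input dict is not modified."""
    return [dict(request, year=y, month=m) for y, m in product(years, months)]


def fake_download(req):
    """Hypothetical stand-in for a real download call."""
    return "era5_{}_{}.nc".format(req["year"], req["month"])


base_request = dict(product_type="reanalysis",
                    format="netcdf",
                    variable=["temperature"],
                    pressure_level=["850"])

years = [str(y) for y in range(2000, 2001)]
months = [str(m).zfill(2) for m in range(1, 13)]

jobs = monthly_requests(base_request, years, months)

# up to 12 requests in flight at once; map() returns results in input order
with ThreadPoolExecutor(max_workers=12) as pool:
    files = list(pool.map(fake_download, jobs))

print(len(files), files[0], files[-1])
# 12 era5_2000_01.nc era5_2000_12.nc
```

Threads are a natural fit in this sketch because each job would spend almost all of its time waiting for the archive rather than computing.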