
# RADOLAN RW download and upload to metacatalog, including creation of metadata

This is the final solution: the data is downloaded with `wget`, converted with `radolan_to_netcdf`, and split into daily netCDF files for the upload to metacatalog.

All available RADOLAN RW (hourly resolution) data: **2005 - 2021**

In [1]:
import tarfile
import gzip
from glob import glob
import os

import tqdm
import xarray as xr

import radolan_to_netcdf as rtn
#import cf

from metacatalog import api, ext

In [2]:
%%time

!wget -q --show-progress -r -np -A .tar.gz -R "index.html*" https://opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/

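Because `wget -q` reports very little, it can help to check afterwards that every monthly archive actually arrived. The following is a minimal sketch, not part of the original workflow. It assumes the `<year>/RW-<YYYYMM>.tar.gz` layout that the download creates, and the default end month `202112` is an assumption based on the stated 2005 - 2021 range. Both function names are hypothetical helpers.

```python
import os


def expected_archives(first="200506", last="202112"):
    """Return the archive names 'RW-YYYYMM.tar.gz' for every month from first to last."""
    year, month = int(first[:4]), int(first[4:])
    end = (int(last[:4]), int(last[4:]))
    names = []
    while (year, month) <= end:
        names.append(f"RW-{year}{month:02d}.tar.gz")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def missing_archives(bin_dir, first="200506", last="202112"):
    """List the expected archives that are not present under bin_dir/<year>/."""
    missing = []
    for name in expected_archives(first, last):
        year = name[3:7]  # 'RW-200506.tar.gz' -> '2005'
        if not os.path.exists(os.path.join(bin_dir, year, name)):
            missing.append(name)
    return missing
```

Calling `missing_archives` with the local `.../hourly/radolan/historical/bin` folder should return an empty list after a complete download.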

`RW-200508.tar.gz` contains corrupted binary files, so we delete `RW-200506.tar.gz`, `RW-200507.tar.gz` and `RW-200508.tar.gz`.  
Our RADOLAN data therefore starts with `RW-200509.tar.gz`. Fixing the binary files by hand would have to be repeated every time the data is re-downloaded, so we simply do not use the first three months.

Delete the three archives, then define a function that extracts the downloaded binary files into netCDF files.

In [2]:
# delete RW-200506.tar.gz, RW-200507.tar.gz, RW-200508.tar.gz
!rm -v opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/2005/RW-200506.tar.gz opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/2005/RW-200507.tar.gz opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/2005/RW-200508.tar.gz

removed 'opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/2005/RW-200506.tar.gz'
removed 'opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/2005/RW-200507.tar.gz'
removed 'opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/2005/RW-200508.tar.gz'

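As an alternative to deleting the downloaded files, the archives could be filtered by name before extraction, so that a fresh download needs no manual cleanup. This is only a sketch: `select_archives` is a hypothetical helper, not part of `radolan_to_netcdf` or metacatalog, and it relies on the `RW-<YYYYMM>.tar.gz` naming seen above.

```python
import os
import re

# matches e.g. 'RW-200509.tar.gz' and captures the month '200509'
_ARCHIVE = re.compile(r"RW-(\d{6})\.tar\.gz$")


def select_archives(paths, first_month="200509"):
    """Keep only monthly archives from first_month (YYYYMM) onwards, sorted by path."""
    selected = []
    for path in sorted(paths):
        match = _ARCHIVE.search(os.path.basename(path))
        # YYYYMM strings of equal length compare correctly as plain strings
        if match and match.group(1) >= first_month:
            selected.append(path)
    return selected
```

The result could then be looped over instead of the plain `glob` listing of each year folder.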

In [10]:
def tar2netcdf(input_path: str, output_path: str):
    """
    Untar DWD binary downloads and store them as daily netCDF files under output_path.

    Note
    ----
    This function is not restartable at the moment!

    Parameters
    ----------
    input_path : str
        path to the folder where binary DWD downloads are stored (yearly folders).
        Usually something like *"./opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin"*
    output_path : str
        where to store generated netCDF files
    """
    # get the absolute output_path, where netCDF files are saved
    output_path = os.path.abspath(output_path)

    # create folder in output path
    os.makedirs(output_path, exist_ok=True)

    # loop over binary files
    for year in sorted(glob(f"{input_path}/*")):
        print(f"Extracting data for the year {year[-4:]}")
        for month in tqdm.tqdm(sorted(glob(year + '/*'))):
            with gzip.open(month, 'r') as fd:
                with tarfile.open(fileobj = fd) as tar_month:
                    fn_list_hour = sorted([f.name for f in tar_month.getmembers()])

                    for fn in fn_list_hour:
                        # fn: 'raa01-rw_10000-0506010050-dwd---bin.gz'
                        # netCDF file name
                        fn_netcdf = f"{output_path}/{year[-4:]}{fn[-21:-17]}_radolan_rw.nc" # fn[-21:-17] -> %m%d

                        # create a new (empty) daily netCDF only if it does not exist yet,
                        # i.e. whenever the file name (-> day) changes
                        if not os.path.exists(fn_netcdf):
                            rtn.create_empty_netcdf(fn=fn_netcdf, product_name='RW')
                        
                        # extract hourly file
                        f_hour = tar_month.extractfile(fn)

                        with gzip.open(f_hour) as gz_hour:
                            # extract hourly data, append to previously created daily netCDF
                            data, metadata = rtn.read_in_one_bin_file(gz_hour)
                            rtn.append_to_netcdf(
                                fn_netcdf, 
                                data_list=[data, ], 
                                metadata_list=[metadata, ],
                            )
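The daily file name above is built with negative string slices (`fn[-21:-17]`), which silently produce wrong names if a member name deviates from the expected pattern. A slightly more defensive variant could parse the timestamp explicitly. The sketch below assumes member names look like the example `raa01-rw_10000-0506010050-dwd---bin.gz`. `parse_rw_timestamp` and `daily_netcdf_name` are hypothetical helpers introduced only for illustration.

```python
import re
from datetime import datetime

# captures the 10-digit timestamp yymmddHHMM of an hourly RW file name
_RW_NAME = re.compile(r"raa01-rw_10000-(\d{10})-dwd---bin(\.gz)?$")


def parse_rw_timestamp(fn):
    """Return the timestamp encoded in an hourly RW file name as a datetime."""
    match = _RW_NAME.search(fn)
    if match is None:
        raise ValueError(f"unexpected RADOLAN RW file name: {fn!r}")
    return datetime.strptime(match.group(1), "%y%m%d%H%M")


def daily_netcdf_name(output_path, fn):
    """Build the daily netCDF path '<output_path>/<YYYYmmdd>_radolan_rw.nc'."""
    return f"{output_path}/{parse_rw_timestamp(fn):%Y%m%d}_radolan_rw.nc"
```

With this, unexpected members (for example directories or stray files inside the tar archive) raise a clear error instead of ending up in a wrongly named netCDF file.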

                

In [11]:
tar2netcdf(input_path="/data/qt7760/radolan_rw_binary/opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/",
           output_path="/data/qt7760/radolan_rw/")


Extracting data for the year 2005
100%|██████████| 4/4 [02:21<00:00, 35.28s/it]
Extracting data for the year 2006
100%|██████████| 12/12 [07:12<00:00, 36.02s/it]
Extracting data for the year 2007
100%|██████████| 12/12 [07:23<00:00, 36.96s/it]
Extracting data for the year 2008
100%|██████████| 12/12 [08:22<00:00, 41.89s/it]
Extracting data for the year 2009
 83%|████████▎ | 10/12 [07:50<01:34, 47.08s/it]

KeyboardInterrupt: 
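Since `tar2netcdf` is not restartable, an interrupted run like the one above has to start from scratch. One possible way to make the extraction resumable is to record each finished month with a marker file and to discard the daily files of a month that was only partly processed. This is a sketch under assumptions: `run_restartable` and the `.done` marker folder are hypothetical, and `process_month` stands for a callable that performs the per-archive part of `tar2netcdf` for a single `RW-<YYYYMM>.tar.gz`.

```python
import os
from glob import glob


def run_restartable(archives, output_path, process_month):
    """Process each monthly archive once, tracking progress with marker files."""
    marker_dir = os.path.join(output_path, ".done")
    os.makedirs(marker_dir, exist_ok=True)

    processed = []
    for archive in sorted(archives):
        month = os.path.basename(archive)[3:9]  # 'RW-200509.tar.gz' -> '200509'
        marker = os.path.join(marker_dir, month)
        if os.path.exists(marker):
            continue  # this month was completed in an earlier run

        # remove daily files left behind by an interrupted run of this month
        for partial in glob(os.path.join(output_path, f"{month}??_radolan_rw.nc")):
            os.remove(partial)

        process_month(archive)
        open(marker, "w").close()  # mark the month as done only after success
        processed.append(month)
    return processed
```

The marker is written only after `process_month` returns, so a `KeyboardInterrupt` in the middle of a month leaves no marker and that month is redone cleanly on the next run.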

In [13]:
xr.open_mfdataset("./data/*.nc", engine="h5netcdf")


Metadata: 
- https://opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/DESCRIPTION_gridsgermany-hourly-radolan-historical-bin_en.pdf
- https://opendata.dwd.de/climate_environment/CDC/grids_germany/hourly/radolan/historical/bin/BESCHREIBUNG_gridsgermany-hourly-radolan-historical-bin_de.pdf