# Working with the AgERA5 dataset

In this notebook we will:
- Download several parameters from the AgERA5 dataset for a chosen period (the code below is set to 2019)
- Select one point representing the city of Graz
- Save the result as monthly netCDF files

Besides the imported libraries, this notebook needs the **dask** and **netcdf4** Python libraries. **xarray** uses them internally to work with netCDF files.

Before starting, make sure you have an account on the [Climate Data Store](https://cds.climate.copernicus.eu/cdsapp#!/home) and follow the instructions on [How to install CDSAPI key](https://cds.climate.copernicus.eu/api-how-to).
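
As a quick sanity check before the first request, one could verify that the key file is in place. The sketch below assumes the file lives at `~/.cdsapirc` and holds `url:` and `key:` lines, as the CDS instructions describe. `read_cdsapirc` and `has_credentials` are hypothetical helper names and are not part of `cdsapi`.

```python
from pathlib import Path


def read_cdsapirc(rc_path=None):
    """Return the 'name: value' pairs found in a CDS API config file."""
    # Assumption: the default location is ~/.cdsapirc
    rc_path = Path(rc_path) if rc_path else Path.home() / '.cdsapirc'
    settings = {}
    for line in rc_path.read_text().splitlines():
        if ':' in line:
            # split only on the first colon, the value may contain colons too
            name, value = line.split(':', 1)
            settings[name.strip()] = value.strip()
    return settings


def has_credentials(settings):
    """True if both a non-empty url and key are present."""
    return bool(settings.get('url')) and bool(settings.get('key'))


# Usage (only works once the file exists):
# print(has_credentials(read_cdsapirc()))
```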

In [7]:
# pip install --upgrade metpy
# pip install cdsapi
# conda install -c conda-forge xarray dask netCDF4 bottleneck

In [1]:
import xarray as xr
import cdsapi
import zipfile
from pathlib import Path
import os
import shutil

c = cdsapi.Client()

The first function downloads one month of data as daily netCDF files, delivered in a zip archive.  
It takes as arguments: the path to the directory where the data should be saved, the year, the month, the variable, and the statistic computed over the 24-hour period (minimum, maximum, mean, etc.).  

You can find which statistics are available for which parameters in the web download application for this dataset:  
https://cds.climate.copernicus.eu/terms#!/dataset/sis-agrometeorological-indicators?tab=form  
On this page you can select parameters and see an example API request.

In [9]:
def retrieve_a_month(path, year, month, variable, statistic):
    # the zip name holds only year and month, so each variable overwrites the previous download
    filename = path + year + '_' + month + '.zip'
    c.retrieve(
        'sis-agrometeorological-indicators',
        {
            'format': 'zip',
            'variable': variable,
            'statistic': statistic,
            'year': year,
            'month': month,
            # all possible days of a month, '01' to '31'
            'day': [f'{day:02d}' for day in range(1, 32)],
        },
        filename)
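
The request above always lists days `'01'` to `'31'`. If one prefers to send only the days that exist in a given month, the day list could be built with the standard `calendar` module. This is a minimal sketch: `days_in_month` and `build_request` are hypothetical helpers, and the request keys are simply the ones used in `retrieve_a_month`.

```python
import calendar


def days_in_month(year, month):
    """Return the zero-padded day strings that exist in the given month."""
    n_days = calendar.monthrange(int(year), int(month))[1]
    return [f'{day:02d}' for day in range(1, n_days + 1)]


def build_request(year, month, variable, statistic):
    """Assemble the request dictionary with only valid days."""
    return {
        'format': 'zip',
        'variable': variable,
        'statistic': statistic,
        'year': year,
        'month': month,
        'day': days_in_month(year, month),
    }


request = build_request('2019', '02', '2m_temperature', '24_hour_mean')
print(len(request['day']))  # 28
```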

This function unzips the file into a directory named after the year. This was for my convenience, because I had other netCDF files in the parent directory.

In [10]:
def unzip_file(path,year,month):
    filename = path + year + '_' + month  + '.zip'
    new_dir = path + year
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall(new_dir)
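
To see which files an archive actually contains, the extraction step can also return the member names. The following is only a sketch using `pathlib` and `zipfile`; `unzip_to` is a hypothetical variant of `unzip_file` that takes the zip path and the target directory directly.

```python
import zipfile
from pathlib import Path


def unzip_to(zip_path, target_dir):
    """Extract zip_path into target_dir and return the sorted member names."""
    target_dir = Path(target_dir)
    # create the target directory if it does not exist yet
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(target_dir)
        names = zip_ref.namelist()
    return sorted(names)


# Usage, assuming './2019_01.zip' has been downloaded:
# print(unzip_to('./2019_01.zip', './2019'))
```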

This function deletes all the netCDF files in the year directory as well as the zip file (the next downloaded file will have the same name, because the name does not contain the variable name).  
Again, this was just for my convenience.

In [11]:
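def delete_month_files(path, year, month):
    path_to_year = path + year
    # remove the extracted daily netCDF files
    for f in Path(path_to_year).glob('*.nc'):
        try:
            f.unlink()
        except OSError as e:
            print("Error: %s : %s" % (f, e.strerror))
    # remove the downloaded zip; do nothing if it is already gone (Python 3.8+)
    zip_to_delete = Path(path + year + '_' + month + '.zip')
    zip_to_delete.unlink(missing_ok=True)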
    


Here we define the years and months we want to download, the path where the data will be saved, and the parameters with their computed statistics.

In [2]:
years = [str(i) for i in range(2019, 2020)]
months = [
    '01', '02', '03',
    '04', '05', '06',
    '07', '08', '09',
    '10', '11', '12',
]

# path = (r'C:Users\vpetric\Desktop\test')
path = './'

# (variable, statistic) pairs to download
parameter = [
    ('cloud_cover', '24_hour_mean'),
    ('2m_temperature', '24_hour_mean'),
    ('10m_wind_speed', '24_hour_mean'),
    ('2m_dewpoint_temperature', '24_hour_mean'),
    ('vapour_pressure', '24_hour_mean'),
]
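
Before starting a long download, it can help to list what will be requested and which output files will result. The sketch below mirrors the year, parameter, month loop order of the workflow. `list_tasks` is a hypothetical helper, and the `out_dir` default is an assumption based on the `graz_data/` directory used in this notebook.

```python
from itertools import product


def list_tasks(years, months, parameter, out_dir='graz_data/'):
    """Return one (year, month, variable, statistic, nc_name) tuple per download."""
    tasks = []
    for year, (variable, statistic), month in product(years, parameter, months):
        nc_name = (out_dir + 'graz_' + variable + '_' + statistic
                   + '_' + year + '_' + month + '.nc')
        tasks.append((year, month, variable, statistic, nc_name))
    return tasks


demo_tasks = list_tasks(
    ['2019'],
    ['01', '02'],
    [('cloud_cover', '24_hour_mean'), ('2m_temperature', '24_hour_mean')],
)
print(len(demo_tasks))   # 4
print(demo_tasks[0][4])  # graz_data/graz_cloud_cover_24_hour_mean_2019_01.nc
```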



We create an output directory where the final data will be saved.

In [13]:
graz_dir = path + 'graz_data/'
Path(graz_dir).mkdir(parents=True, exist_ok=True)

The parameters and computed statistics we want are the ones defined in `parameter` above.  
**Note** that not all combinations of parameters and statistics are allowed. You can see which statistics are available for each parameter in the CDS web download application.

And finally we run the whole workflow:

In [16]:
destination_folder = r'C:\Users\vpetric\Skidanje podataka s postaja\Podaci'
for year in years:
    nc_directory = path + year
    Path(nc_directory).mkdir(parents=True, exist_ok=True)
    for variable, statistic in parameter:
        for month in months:
            retrieve_a_month(path, year, month, variable, statistic)
            unzip_file(path, year, month)
            # open all daily files of this month as one dataset
            with xr.open_mfdataset(nc_directory + '/*.nc') as data:
                # select the grid point nearest to the target location
                # (note: lat 45.81, lon 15.96 lies near Zagreb; Graz is at roughly lat 47.07, lon 15.44)
                graz_data = data.sel(lat=45.81, lon=15.96, method='nearest')
                # build the output filename
                nc_name = graz_dir + 'graz_' + variable + '_' + statistic + '_' + year + '_' + month + '.nc'
                # save as netCDF
                graz_data.to_netcdf(nc_name)

#             source_dir = r'C:\Users\vpetric\Skidanje podataka s postaja'
#             target_dir = f'C:/Users/vpetric/Skidanje podataka s postaja/Podaci/file_{variable}_{month}'

#             file_names = os.listdir(source_dir)
#             for file_name in file_names:
#                 shutil.move(os.path.join(source_dir, file_name), target_dir)

            # delete the raw files of this month: they take a lot of space, and
            # leftover files would also match '*.nc' on the next iteration
            delete_month_files(path, year, month)

2022-03-15 12:58:07,234 INFO Welcome to the CDS
2022-03-15 12:58:07,241 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/sis-agrometeorological-indicators
2022-03-15 12:58:07,341 INFO Request is completed
2022-03-15 12:58:07,343 INFO Downloading https://download-0011.copernicus-climate.eu/cache-compute-0011/cache/data6/dataset-sis-agrometeorological-indicators-369ad45b-e05b-4d8c-8929-741598efbb14.zip to ./2019_01.zip (197.9M)
2022-03-15 12:58:45,740 INFO Download rate 5.2M/s                                                                      
  indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
  indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
2022-03-15 12:58:49,331 INFO Welcome to the CDS
2022-03-15 12:58:49,333 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/sis-agrometeorological-indicators
2022-03-15 12:58:49,457 INFO Downloading https://download-0011.copernicus-climate.eu/cache-co

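
The selection step relies on `method='nearest'`, which, roughly speaking, picks for each coordinate the grid value closest to the requested one. The plain-Python sketch below illustrates that idea on a hypothetical 0.1° grid; the real AgERA5 cell centres may differ, and `nearest_index` is an illustrative helper, not an xarray function.

```python
def nearest_index(values, target):
    """Return the index of the entry in values that is closest to target."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


# Hypothetical 0.1 degree grid around the point used in this notebook
lats = [round(45.0 + 0.1 * i, 1) for i in range(21)]  # 45.0 ... 47.0
lons = [round(15.0 + 0.1 * i, 1) for i in range(21)]  # 15.0 ... 17.0

i = nearest_index(lats, 45.81)
j = nearest_index(lons, 15.96)
print(lats[i], lons[j])  # 45.8 16.0
```

Latitude and longitude are handled independently here, so the selected cell is the nearest one along each axis rather than the nearest by great-circle distance.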

In [14]:
# Manual cleanup of leftover raw files. No loop over `parameter` is needed here,
# because the zip and netCDF file names do not contain the variable name.
for year in years:
    for month in months:
        try:
            delete_month_files(path, year, month)
        except FileNotFoundError as e:
            # the original run stopped here with
            # FileNotFoundError: [WinError 2] The system cannot find the file specified: './2019_01.zip'
            print(f'Already deleted: {e.filename}')

Now we can open the new small netcdf files using xarray again. From there we can convert to pandas dataframe as well.

In [3]:
import os
import xarray as xr
import pandas as pd

# graz_dir = '../../output_intermediate/95_graz_data'
# graz_dir = '../data2019'
graz_dir = r'C:\Users\vpetric\Skidanje podataka s postaja\graz_data'

df = pd.DataFrame()
for variable, statistic in parameter:
    tmp2 = pd.DataFrame()
#     for year in range(2010, 2022):
    for year in years:
        tmp1 = pd.DataFrame()
        for month in months:
            # use a separate name so the download `path` is not overwritten
            nc_path = os.path.join(graz_dir,
                                   'graz_' + variable + '_' + statistic + f'_{year}_' + month + '.nc')
            try:
                tmp = xr.open_mfdataset(nc_path).to_dataframe().drop(['lon', 'lat'], axis=1).drop_duplicates()
                tmp1 = pd.concat([tmp1, tmp], axis=0)
            except Exception as e:
                print(f'failed: {nc_path} ({e})')

        tmp2 = pd.concat([tmp2, tmp1], axis=0)
        print(f'Done year: {year}')

    # join inside the variable loop so that every variable becomes a column;
    # keep one row per date and add only columns that are not present yet
    print(f'Done: {variable}')
    tmp2 = tmp2[~tmp2.index.duplicated(keep='first')]
    df = df.join(tmp2[tmp2.columns.difference(df.columns)], how='outer')

df = df.drop_duplicates()
df

Done year: 2019
Done year: 2019
Done year: 2019
Done year: 2019
Done year: 2019
Done: vapour_pressure


| time | Cloud_Cover_Mean | Dew_Point_Temperature_2m_Mean | Temperature_Air_2m_Mean_24h | Vapour_Pressure_Mean | Wind_Speed_10m_Mean |
|---|---|---|---|---|---|
| 2019-01-01 | 0.513621 | 270.852692 | 273.349487 | 5.318070 | 1.645342 |
| 2019-01-02 | 0.413169 | 268.585144 | 274.684113 | 4.603710 | 2.528899 |
| 2019-01-03 | 0.550000 | 264.981750 | 272.861450 | 3.553739 | 3.902927 |
| 2019-01-04 | 0.591312 | 264.868134 | 271.227051 | 3.496798 | 2.558257 |
| 2019-01-05 | 0.886479 | 268.008911 | 271.696198 | 4.365333 | 1.676115 |
| ... | ... | ... | ... | ... | ... |
| 2019-12-27 | 0.642538 | 273.451904 | 275.346069 | 6.335325 | 1.330873 |
| 2019-12-28 | 0.287067 | 267.957886 | 274.620728 | 4.369819 | 3.989184 |
| 2019-12-29 | 0.371120 | 266.773010 | 272.846161 | 3.985858 | 3.488879 |
| 2019-12-30 | 0.019747 | 267.687042 | 270.926544 | 4.257966 | 1.243226 |
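
The temperature and dew point columns appear to be in kelvin (values around 273 in January). If degrees Celsius are easier to read, a conversion along these lines could be applied; `kelvin_to_celsius` is a hypothetical helper and the input is the first temperature value of the table.

```python
def kelvin_to_celsius(t_kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    return t_kelvin - 273.15


# first value of Temperature_Air_2m_Mean_24h in the table
print(round(kelvin_to_celsius(273.349487), 2))  # 0.2
```

With a DataFrame, the same subtraction can be applied to a whole column, for example `df['Temperature_Air_2m_Mean_24h'] - 273.15`.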


In [4]:
df.head(365)


| time | Cloud_Cover_Mean | Dew_Point_Temperature_2m_Mean | Temperature_Air_2m_Mean_24h | Vapour_Pressure_Mean | Wind_Speed_10m_Mean |
|---|---|---|---|---|---|
| 2019-01-01 | 0.513621 | 270.852692 | 273.349487 | 5.318070 | 1.645342 |
| 2019-01-02 | 0.413169 | 268.585144 | 274.684113 | 4.603710 | 2.528899 |
| 2019-01-03 | 0.550000 | 264.981750 | 272.861450 | 3.553739 | 3.902927 |
| 2019-01-04 | 0.591312 | 264.868134 | 271.227051 | 3.496798 | 2.558257 |
| 2019-01-05 | 0.886479 | 268.008911 | 271.696198 | 4.365333 | 1.676115 |
| ... | ... | ... | ... | ... | ... |
| 2019-12-27 | 0.642538 | 273.451904 | 275.346069 |  | 1.330873 |
| 2019-12-28 | 0.287067 | 267.957886 | 274.620728 |  | 3.989184 |
| 2019-12-29 | 0.371120 | 266.773010 | 272.846161 |  | 3.488879 |
| 2019-12-30 | 0.019747 | 267.687042 | 270.926544 |  | 1.243226 |


In [18]:
df.to_csv(r'C:\Users\vpetric\Skidanje podataka s postaja\Zagreb.csv')


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# names of the df columns to plot, e.g. 'Cloud_Cover_Mean' (left empty in the original)
factors = []

In [None]:
for factor in factors:
    for year in years:
        # .copy() so that adding columns does not modify a view of df
        df_f = df[[factor]].loc[year].copy()
        df_f['month'] = df_f.index.month
        df_f['day'] = df_f.index.day
        # rows are days of the month, columns are months
        df_pivot = df_f.pivot(index='day', columns='month', values=factor)

        plt.figure(figsize=(8, 6), dpi=80)
        ax = sns.heatmap(df_pivot, linewidths=.5, cmap="YlGnBu")
        #plt.title(year+'\n'+factor, fontdict={'fontsize':15})
        plt.title(year + ', ' + factor, fontdict={'fontsize': 15})
        plt.xlabel('Months', fontsize=15)  # x-axis label with fontsize 15
        plt.ylabel('Days', fontsize=15)  # y-axis label with fontsize 15
        #plt.yticks(rotation=90)
        plt.show()
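
The heatmap input is a day-by-month table. To see what the pivot step produces without downloading anything, here is a small sketch on synthetic data; `demo`, its `value` column, and `day_month_table` are made up for illustration.

```python
import pandas as pd

# Synthetic daily series for January to March 2019 (90 days)
index = pd.date_range('2019-01-01', '2019-03-31', freq='D')
demo = pd.DataFrame({'value': range(len(index))}, index=index)


def day_month_table(frame, column):
    """Reshape a daily series into a table with days as rows and months as columns."""
    out = frame[[column]].copy()
    out['month'] = out.index.month
    out['day'] = out.index.day
    return out.pivot(index='day', columns='month', values=column)


table = day_month_table(demo, 'value')
print(table.shape)  # (31, 3)
```

Days that do not exist in a month (for example 30 February) show up as NaN, which the heatmap leaves blank.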