# Retrieve ERA5 Reanalysis Data from Copernicus API

# Table of Contents
 1. [Purpose](#Purpose)
 1. [Instructions](#Instructions)
   1. [Inputs](#Inputs)
   1. [Outputs](#Outputs)
   1. [Requirements](#Requirements)
 1. [Additional resources](#Additional-resources)
 1. [Import packages](#Import-packages)
 1. [User Inputs](#User-Inputs)
 1. [Download data](#Download-data)
 1. [Ingest data](#Ingest-data)
 1. [Transform data](#Transform-data)
 1. [Export data](#Export-data)
 1. [Wrap-up](#Wrap-up)

# Purpose
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

Queries ERA5 data through the Copernicus CDS API and returns the data for the 9 grid points nearest to the site in .txt format.

# Instructions
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

1. Verify that the cdsapi package is installed and that your CDS API authentication token is configured (https://cds.climate.copernicus.eu/api-how-to). Verify that the xarray and dask packages are installed as well.
2. Fill out all inputs in the User Inputs cell
3. Run the relevant cells in this notebook
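
The check in step 1 can be scripted. The sketch below is only one possible approach: it assumes the default cdsapi setup, where credentials come either from the `CDSAPI_URL` and `CDSAPI_KEY` environment variables or from a `.cdsapirc` file in your home directory. The helper names `missing_packages` and `cds_credentials_configured` are illustrative and are not part of huracan or cdsapi.

```python
import importlib.util
import os
from pathlib import Path


def missing_packages(names):
    """Return the package names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cds_credentials_configured(home=None):
    """Return True if CDS API credentials appear to be configured.

    Assumption: credentials live in the CDSAPI_URL / CDSAPI_KEY environment
    variables or in a .cdsapirc file in the home directory (cdsapi defaults).
    """
    if os.environ.get("CDSAPI_URL") and os.environ.get("CDSAPI_KEY"):
        return True
    home_dir = Path(home) if home is not None else Path.home()
    return (home_dir / ".cdsapirc").is_file()


if __name__ == "__main__":
    missing = missing_packages(["cdsapi", "xarray", "dask"])
    print("Missing packages:", missing or "none")
    print("CDS credentials found:", cds_credentials_configured())
```

This only confirms that the packages and a credentials source exist; it does not verify that the token itself is valid.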

**To check the hornet job queue for the status of previously submitted jobs:**
1. Re-run the User Inputs cell
2. Re-run the cell that checks the status of your request within the hornet queue

## Inputs
- site:  Name of the site (used for naming files generated in Outputs)
- lat:   Latitude of the site (decimal deg.)
- lon:   Longitude of the site (decimal deg.)
- outputDir:   Output directory where the .txt files should be saved
- start_date:   The start date for the data query ('yyyy-mm-dd')
- end_date:   The end date for the data query ('yyyy-mm-dd'), or None to use the current date
- windData: Choose True or False (no quotation marks) to request wind data
- TandPData: Choose True or False (no quotation marks) to request separate temperature and pressure data files
- otherData: Select other datasets that may be relevant. Variable names may be found in the data documentation at https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation
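
Before submitting a request, it can help to validate the inputs. The sketch below assumes dates are written as 'yyyy-mm-dd' strings, as in the User Inputs cell of this notebook, and that `end_date = None` means "today". The function `validate_inputs` is a hypothetical helper and is not part of `get_era5_cdsapi`.

```python
import datetime


def validate_inputs(lat, lon, start_date, end_date=None, date_format="%Y-%m-%d"):
    """Check coordinates and dates; return (start, end) as datetime.date objects."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"lat must be within [-90, 90], got {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"lon must be within [-180, 180], got {lon}")

    # strptime raises ValueError if the string does not match the format.
    start = datetime.datetime.strptime(start_date, date_format).date()
    if end_date is None:
        end = datetime.date.today()
    else:
        end = datetime.datetime.strptime(end_date, date_format).date()

    if start > end:
        raise ValueError(f"start_date {start} is after end_date {end}")
    return start, end


if __name__ == "__main__":
    print(validate_inputs(40.9678, 26.2941, "2021-10-01", "2022-03-25"))
```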

## Outputs
A .txt file for each of the selected variables. Each file contains the hourly time series of that specific variable and period, for the ERA5 cell that corresponds to the input coordinates and the 8 adjacent cells.
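
To illustrate which 9 cells this refers to, the sketch below snaps the site coordinates to the nearest node of a regular 0.25° latitude-longitude grid (the resolution at which CDS serves ERA5) and lists that node plus its 8 neighbours. How `download_era5` actually selects the points is not shown in this notebook, so treat `nearest_grid_points` as a hypothetical illustration only.

```python
def nearest_grid_points(lat, lon, resolution=0.25):
    """Return 9 (lat, lon) grid points centred on the node nearest to the site.

    Points are ordered north to south, and west to east within each row.
    """
    center_lat = round(lat / resolution) * resolution
    center_lon = round(lon / resolution) * resolution
    offsets = (-1, 0, 1)
    return [
        (center_lat - i * resolution, center_lon + j * resolution)
        for i in offsets
        for j in offsets
    ]


if __name__ == "__main__":
    for point in nearest_grid_points(40.9678, 26.2941):
        print(point)
```

For the example site in this notebook (40.9678, 26.2941), the central node is (41.0, 26.25) and the grid spans latitudes 40.75 to 41.25 and longitudes 26.0 to 26.5.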


## Requirements
Must be run in the huracan_env environment. May require a connection to the Houston, Singapore or UK VPN.

## Note
This notebook is mainly intended for downloading ERA5 reanalysis data from recent months or years that is not available from the Green Power Monitoring servers. Because CDS handles a high volume of requests, data requests take some time to be fulfilled, and downloading many years of data or many variables may take a long time. In that case, running multiple instances of this notebook lets you place several requests in the CDS queue simultaneously, which can significantly reduce the time needed to complete the download.
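
One way to divide the work between notebook instances is to split the date range by calendar year and give each instance its own `start_date` and `end_date`. The sketch below is an assumption about how you might do this by hand; `split_by_year` is a hypothetical helper and not part of this notebook's modules.

```python
import datetime


def split_by_year(start, end):
    """Split [start, end] into (chunk_start, chunk_end) pairs, one per calendar year."""
    if start > end:
        raise ValueError("start must not be after end")
    chunks = []
    for year in range(start.year, end.year + 1):
        chunk_start = max(start, datetime.date(year, 1, 1))
        chunk_end = min(end, datetime.date(year, 12, 31))
        chunks.append((chunk_start, chunk_end))
    return chunks


if __name__ == "__main__":
    for chunk_start, chunk_end in split_by_year(
        datetime.date(2021, 10, 1), datetime.date(2022, 3, 31)
    ):
        # Each pair could be used as start_date / end_date in a separate instance.
        print(chunk_start.isoformat(), chunk_end.isoformat())
```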

# Additional resources
 - [Huracan Training - Wind](https://dev.azure.com/energy-innovation/huracan/_wiki/wikis/huracan.wiki/74/Huracan-Training-Wind) - training videos and other resources
 - [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

# Import packages
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

In [1]:

import huracan as hn
import numpy as np
import pandas as pd
import xarray as xr
import math
import os
#import time
import datetime

from get_era5_cdsapi import *
from process_era5 import *

import logging
logger = logging.getLogger("cdsapi")
logger.setLevel(logging.INFO)


verit_user = hn.utils.login.get_user()
min_hn_version = '2020.12.23'
hn_version = hn.__version__
hn.utils.tests.assert_huracan_is_current(min_hn_version)

# User Inputs
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

In [2]:
# Project info
site = "Test Site"
project_number = None
lat =  40.9678
lon =  26.2941
outputDir = r"."


start_date = '2021-10-01'
#end_date = '2022-03-25'
end_date = None # Set end date to None to use current date  

if end_date is None:
    d = datetime.datetime.now()
    end_date = d.date().strftime("%Y-%m-%d")
    print(f'Setting end date to : {end_date}')
    
# define the datasets required
windData = True
TandPData = False
#otherData = None # Set otherData to None to request no additional variables
otherData = ['forecast_surface_roughness','large_scale_snowfall_rate_water_equivalent']


Setting end date to : 2022-03-31


# Download data
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

This function downloads ERA5 data for the selected variables from Copernicus Data Store via its API (cdsapi). First it prepares the latitude and longitude inputs and then passes those inputs along with the rest to the function that retrieves the data from CDS. After calling this function, the output will be a series of .nc files containing the downloaded data for each variable and year.

You can check the status of your request [here](https://cds.climate.copernicus.eu/cdsapp#!/yourrequests).
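
For reference, a single cdsapi request for one variable and one year might look like the sketch below. The internals of `download_era5` are not shown in this notebook, so this is an assumption about the general shape of such a call rather than the actual implementation. The request keys follow the legacy CDS API (the `api/v2` endpoint that appears in the log output), and `build_era5_request` and `retrieve_era5` are hypothetical names.

```python
def build_era5_request(variable, year, months, north, west, south, east):
    """Build a request dictionary for the reanalysis-era5-single-levels dataset."""
    return {
        "product_type": "reanalysis",
        "variable": variable,
        "year": str(year),
        "month": [f"{m:02d}" for m in months],
        # All 31 days are listed; days that do not exist in a month return no data.
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
        # CDS expects the bounding box as [North, West, South, East].
        "area": [north, west, south, east],
        "format": "netcdf",
    }


def retrieve_era5(request, target_path):
    """Submit the request to CDS and save the result (needs cdsapi and credentials)."""
    import cdsapi  # imported here so the request builder works without cdsapi

    client = cdsapi.Client()
    client.retrieve("reanalysis-era5-single-levels", request, target_path)
    return target_path


if __name__ == "__main__":
    request = build_era5_request(
        "100m_u_component_of_wind", 2021, [10, 11, 12], 41.25, 26.0, 40.75, 26.5
    )
    print(request)
    # retrieve_era5(request, "example_2021.nc")  # hypothetical file name
```

Building the request separately from submitting it makes it possible to inspect the dictionary before a job enters the CDS queue.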

In [3]:
vars, vars_others = download_era5(site, lat, lon, start_date, end_date, outputDir, windData, TandPData, otherData)

--- Year: 2021

- Requesting var: 100m_u_component_of_wind

Current Time:13:11:41


INFO:cdsapi:Welcome to the CDS
INFO:cdsapi:Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
INFO:cdsapi:Request is queued


KeyboardInterrupt: 

# Ingest data
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

Since .nc data is somewhat complex to work with, this function converts the previously downloaded .nc files into dataframes for each variable and year. As inputs, it takes the start and end dates, as well as the variables that have been downloaded. It returns a dataframe that aggregates all the downloaded data, on which operations and modifications can be performed more easily.
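
As a rough illustration of this step, the sketch below flattens datasets into long-format dataframes and combines them in time order. It assumes the files use `time`, `latitude` and `longitude` as dimension names. The helpers `dataset_to_frame` and `combine_frames` are hypothetical and do not reproduce what `ingest_era5` does internally.

```python
import pandas as pd


def dataset_to_frame(ds):
    """Flatten an xarray Dataset into a long DataFrame with one row per time and grid point."""
    return ds.to_dataframe().reset_index()


def combine_frames(frames):
    """Concatenate per-file frames, drop repeated rows and sort chronologically."""
    keys = ["time", "latitude", "longitude"]
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.drop_duplicates(subset=keys)
    return combined.sort_values(keys).reset_index(drop=True)


# Usage with real files (requires xarray and a NetCDF backend):
#   import xarray as xr
#   frames = [dataset_to_frame(xr.open_dataset(p)) for p in nc_paths]  # nc_paths is hypothetical
#   df = combine_frames(frames)
```

Dropping duplicates on the time and coordinate columns guards against hours that appear in two overlapping files.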

In [None]:
df_data, vars_short = ingest_era5(start_date, end_date, vars, site, outputDir)

In [None]:
df_data

# Transform data
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

Now that the data is stored in a dataframe and ready for manipulation, this function takes the ERA5 dataframe, along with the downloaded variables, and performs two sequential operations:

- Step 1: transforms wind data from Cartesian to polar form, and converts temperature to °C and pressure to mbar.
- Step 2: transposes the data to match the DNVGL 9-coordinate format for export.

Returns the transformed dataframe, along with an array containing the 9 coordinates.
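
The conversions in the first step can be sketched as follows. This assumes the usual ERA5 units (wind components in m/s, temperature in K, pressure in Pa) and the meteorological convention in which direction is the bearing the wind blows from, measured clockwise from north. The exact conventions inside `transform_era5` are not shown here, and the function names below are illustrative only.

```python
import math


def wind_speed_and_direction(u, v):
    """Convert eastward (u) and northward (v) wind components to speed and direction.

    Direction is in degrees, clockwise from north, and indicates where the
    wind comes from (meteorological convention, assumed here).
    """
    speed = math.hypot(u, v)
    direction = (180.0 + math.degrees(math.atan2(u, v))) % 360.0
    return speed, direction


def kelvin_to_celsius(t_kelvin):
    """Convert temperature from kelvin to degrees Celsius."""
    return t_kelvin - 273.15


def pascal_to_mbar(p_pascal):
    """Convert pressure from pascal to millibar (1 mbar = 100 Pa)."""
    return p_pascal / 100.0


if __name__ == "__main__":
    # A wind blowing towards the east (u > 0, v = 0) comes from the west.
    print(wind_speed_and_direction(5.0, 0.0))
    print(kelvin_to_celsius(288.15), pascal_to_mbar(101325.0))
```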

In [None]:
df_data, cpoints = transform_era5(df_data, windData, TandPData, otherData, vars_others, vars_short)

In [None]:
df_data

# Export data
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

Finally, the data must be exported so that it can be used in other applications. This function takes the ERA5 dataframe, the coordinates array and the initial inputs. For each of the datasets requested, it prepares and exports a .txt file with the hourly values for each variable: wind, temperature and pressure, and other data.
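
As a generic illustration of writing an hourly series to a text file, the sketch below produces a tab-separated layout with a header row. The column layout, number formatting and the `write_series` helper are all assumptions for illustration. The actual file format produced by `export_era5` is not documented in this notebook and may differ.

```python
import csv
import io


def write_series(rows, column_names, stream):
    """Write a header line and one tab-separated line per timestamp to a text stream.

    `rows` is an iterable of (timestamp_string, values) pairs.
    """
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(["time"] + list(column_names))
    for timestamp, values in rows:
        writer.writerow([timestamp] + [f"{value:.2f}" for value in values])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_series(
        [("2021-10-01 00:00", [5.0, 270.0]), ("2021-10-01 01:00", [4.5, 265.0])],
        ["speed", "direction"],
        buffer,
    )
    print(buffer.getvalue())
```

Passing a stream rather than a path keeps the sketch usable with both an open file and an in-memory buffer.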

In [None]:
export_era5(df_data, cpoints, site, windData, TandPData, otherData, vars, vars_others, vars_short, outputDir)

# Wrap-up
[Back to TOC](#Table-of-Contents) | [Submit feedback to the Huracan team](https://forms.office.com/r/3retMYiQPZ)

## Check that cells were run in order

Please run the cell below to verify that cells were run in order. This will show a warning if the number of cell executions is more than the number of cells in the original version of this notebook.

In [None]:
#cells_run = get_ipython().execution_count
#test_result = hn.utils.tests.compare_cell_count(cells_in_notebook=12, cells_executed=cells_run, print_positive=True)