# Reanalysis comparison with observations for energy sector applications 

Production date: 12-05-2024

Produced by: CNR-ISMAR

## 🌍 Use case: Investigating space-time variability of surface wind speed and its extremes in support of the wind energy sector 

## ❓ Quality assessment question

* **Do state-of-the-art reanalysis products correctly represent surface and near-surface wind speeds?**


Comparing model results with independent observations allows differences between the two to be identified and quantified. Here we summarise results from three recent papers that evaluate how well ERA5, the latest reanalysis by the European Centre for Medium-Range Weather Forecasts (ECMWF), represents wind speed both at the surface and at wind turbine hub heights, over regions where wind speed is relevant for wind energy production.

Typical modern turbine hub heights are between 80 and 120 m. Measurements at these heights can be preferable to surface-level measurements for wind field monitoring and power generation applications, because they are free from surrounding obstructions. At the same time, tall masts are much more expensive to maintain than surface stations, so these data are scarce, or form time series that are not always long enough to serve the purpose.


## 📢 Quality assessment statements

```{admonition} These are the key outcomes of this assessment
:class: note
* ERA5 is the best-performing reanalysis for near-surface wind estimates and provides valuable climate information in near-real time, beneficial for various wind energy applications
* Accuracy: ERA5 shows good agreement with observational data across Europe. It successfully captures the spatial and temporal variability of wind speeds
* Biases: Some regional biases are noted, but overall, ERA5 performs better than previous reanalyses, such as ERA-Interim
* Climatological Patterns: ERA5 effectively reproduces climatological patterns and extremes, providing a reliable dataset for wind energy and climate studies
* Extremes: Overestimation of light wind frequency and underestimation of strong wind frequency, and thus of extreme events, are inherent in the nature of models
* Trends: A combination of several reanalyses is preferable when analysing long-term trends
```

## 📋 Methodology

Different spatio-temporal characteristics of the wind field are of interest depending on the application. Here we focus on support for wind energy production, drawing on results from recent scientific literature [[1]](Ramon), [[2]](Fan). A focus on Europe is also reported [[3]](Molina).

Uncertainties are assessed through comparison with the ERA5 predecessor, ERA-Interim [[4]](Dee), and three other state-of-the-art reanalysis products: JRA-55 [[5]](Kobayashi), MERRA-2 [[6]](Gelaro), and CFSv2 [[7]](Saha).

Validation is carried out against two [observational datasets](key-resources), described below. Note that none of the reanalyses considered here assimilates surface winds from land stations, so the validation is performed against wind speed measurements that have not already been assimilated. This ensures a fair comparison, since verification against observations already used in the assimilation of a reanalysis could lead to biased scores.

The Tall Tower Dataset is a worldwide network that comprised, at the time of publication of the cited paper, 222 towers [[8]](Ramon-Lledo). The dataset contains quality-controlled wind observations plus other climate observations such as temperature or relative humidity. Among these towers, those with the longest wind data series were selected; given the average lifetime of such a mast, the achievable record length is around 3 years. The validation here is therefore done against the resulting 77 towers.

```{figure} attachment:3b96de49-6a75-4ac5-aa9b-b1c04f42079a.png
---
height: 400px
---
Global distribution of the 77 tall towers. Colours indicate, for each tall tower, the height of the measuring level employed in this study. Reproduced from [[1]](Ramon). 
```
HadISD is also a worldwide network, comprising more than 8000 weather stations. Its data are drawn from the global Integrated Surface Dataset (ISD) compiled by NOAA's NCDC, and it is distributed by the UK Met Office Hadley Centre. The validation uses HadISD version 3.1.0.2019f, a global sub-daily dataset based on the ISD. The stations selected in the cited paper [[3]](Molina) are those in Europe that have valid values in at least 90% of the hourly time steps for the period 1979–2018. This results in 245 hourly averaged wind speed series at 10 m height.
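
The 90% completeness criterion described above can be illustrated with a minimal sketch. It assumes each station is held as a plain list of hourly values with `None` marking a missing time step; the names `valid_fraction`, `select_stations` and the example station identifiers are hypothetical and are not taken from the cited paper.

```python
def valid_fraction(series):
    """Fraction of time steps that hold a valid (non-missing) value."""
    if not series:
        return 0.0
    return sum(value is not None for value in series) / len(series)


def select_stations(stations, min_valid=0.9):
    """Return the sorted ids of stations meeting the completeness threshold."""
    return sorted(
        station_id
        for station_id, series in stations.items()
        if valid_fraction(series) >= min_valid
    )


# Hypothetical toy records: 10 hourly steps each, None = missing observation.
stations = {
    "station_a": [4.2] * 9 + [None],      # 90% valid -> kept
    "station_b": [3.1] * 8 + [None] * 2,  # 80% valid -> rejected
}
selected = select_stations(stations)
print(selected)
```

In practice the same test would be applied to the full 1979–2018 hourly record of each station rather than to a ten-step toy series.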

```{figure} attachment:d3e18c0c-a911-4ec5-b871-06ab61a35345.png
---
height: 450px
---
Location on the map of each meteorological station. The colour scale represents the score (see [Regional frequency distribution of hourly data](regional-frequency)) of each location. The frequency distributions of the stations marked with a red symbol are shown in Figure 5: station 88 (× symbol), station 132 (square), station 240 (triangle) and station 242 (upside-down triangle). Reproduced from [[3]](Molina).
```

The ERA5 [hourly single levels dataset](key-resources) is used, with different time aggregations. For consistency with the height at which the measurements are taken, both the 10 m and the 100 m ERA5 wind variables are considered in the comparison. When needed, to bring the surface and near-surface reanalysis winds to the height of the tower measurement, a vertical extrapolation of the reanalysis wind speed is performed using a power-law relation:

$$
\frac{U_2}{U_1} = \left(\frac{z_2}{z_1}\right)^{\alpha}
$$

where $U_2$ and $U_1$ are the wind speeds at heights $z_2$ (e.g. 110 m) and $z_1$ (e.g. 10 m), respectively, and $\alpha$ is a nondimensional wind shear exponent, which is typically set to 0.143 [[9]](Touma) [[10]](Wang) [[11]](Tian).
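
As a worked illustration of the power-law relation, the following minimal sketch extrapolates a 10 m wind speed to 110 m with the typical exponent of 0.143. The function name `extrapolate_wind_speed` and the 5 m/s input are assumptions made for the example only.

```python
def extrapolate_wind_speed(u1, z1, z2, alpha=0.143):
    """Power-law vertical extrapolation: U2 = U1 * (z2 / z1) ** alpha."""
    if z1 <= 0 or z2 <= 0:
        raise ValueError("heights must be positive")
    return u1 * (z2 / z1) ** alpha


# Hypothetical example: a 5 m/s wind at 10 m brought to a 110 m tower level.
u10 = 5.0
u110 = extrapolate_wind_speed(u10, z1=10.0, z2=110.0)
print(f"{u110:.2f} m/s")
```

With these inputs the extrapolated speed is about 7 m/s, that is, roughly 40% higher than at 10 m. A fixed exponent ignores atmospheric stability and surface roughness, so the result should be read as a first-order adjustment.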

Results are shown for:

**[](reanalysis-era5-monthly-single-levels_validation+uncertainties_q02:section-1)**
 * Validation over Europe with monthly average
 * Regional frequency distribution of hourly data

**[](reanalysis-era5-monthly-single-levels_validation+uncertainties_q02:section-2)**


## 📈 Analysis and results

(reanalysis-era5-monthly-single-levels_validation+uncertainties_q02:section-1)=
### 1. Uncertainties and validation in the tower regions with seasonal average

```{figure} attachment:87dcddde-2174-4c75-befe-0d8547ce8af9.png
---
height: 450px
---
Distribution plots summarizing the differences between observed and modelled seasonal climatologies for 77 tall towers in (a) December–January–February and (b) June–July–August. Reproduced from [[1]](Ramon). 
```

In this figure, seasonal climatologies have been computed from both the tall tower observations and the reanalysis datasets, and their differences are plotted as distributions. The ERA5 and MERRA-2 near-surface seasonal mean winds are assessed together with the surface winds from the five reanalysis products. The multi-reanalysis mean (MR), computed using only surface wind fields, is also included.

In general terms, the reanalysis datasets tend to show weaker seasonal mean winds than observed in both December–January–February (DJF) and June–July–August (JJA). Of the five reanalyses plus the MR, JRA-55 provides the widest range of differences as well as the largest underestimation. For ERA5, the near-surface winds show a smaller spread of differences in JJA than the ERA5 surface winds.
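
The seasonal differences discussed above can be sketched for a single tower as follows. This is a minimal illustration, assuming monthly mean wind speeds stored in dictionaries keyed by `(year, month)`; the helper names and all numerical values are hypothetical.

```python
from statistics import mean

SEASONS = {"DJF": (12, 1, 2), "JJA": (6, 7, 8)}


def seasonal_climatology(monthly, season):
    """Mean of all monthly values whose calendar month belongs to the season."""
    months = SEASONS[season]
    return mean(value for (_, month), value in monthly.items() if month in months)


def seasonal_difference(reanalysis, observed, season):
    """Reanalysis minus observed seasonal climatology (negative = weaker winds)."""
    return seasonal_climatology(reanalysis, season) - seasonal_climatology(observed, season)


# Hypothetical monthly means (m/s) for one tower and its co-located grid cell.
observed = {(2015, 1): 8.0, (2015, 2): 7.0, (2015, 7): 5.0, (2015, 12): 9.0}
reanalysis = {(2015, 1): 7.5, (2015, 2): 6.5, (2015, 7): 5.5, (2015, 12): 8.5}

djf_diff = seasonal_difference(reanalysis, observed, "DJF")
jja_diff = seasonal_difference(reanalysis, observed, "JJA")
print(djf_diff, jja_diff)
```

Repeating this for each of the 77 towers would give the set of differences whose distribution is summarised in the figure.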

#### Validation over Europe with monthly average

```{figure} attachment:05b239b1-1e39-4b0e-a44d-dd985b02ab37.png
---
height: 400px
---
Monthly wind speed (1979–2018) of HadISD meteorological stations (green colour) and ERA5 cells (transparent colour) box plots. Limits of the boxes represent the locations in the 25th and 75th percentile, and the black line in the middle represents the 50th percentile. The upper whisker is located at the maximum value, whereas the lower whisker is located at the minimum value. Reproduced from [[3]](Molina). 
```

The figure above shows the average annual cycle of 245 locations (weather stations and corresponding reanalysis grid cells) across Europe. Each box represents the average monthly distribution of hourly data at the stations over the period 1979–2018; green boxes are the observations and transparent boxes the reanalysis.

In substantial agreement with what was found in section 1, ERA5 presents median values that are slightly larger in winter and slightly smaller in summer, and upper extreme wind values (whisker ends) are generally larger for ERA5 than for observations, which means that the reanalysis gives a wider range of monthly wind values for the analysed period. This is particularly noticeable in the autumn-winter months. Cold months also exhibit a clear asymmetry in the ERA5 boxes, with the lower half of the box (percentiles 25–50) wider than the upper half (percentiles 50–75), which is not seen in the observation boxes, although the interquartile range (percentile 75 minus percentile 25) is quite similar for observations and reanalysis. Summer ERA5 boxes are, by contrast, symmetric, although the lower box edge (percentile 25) tends to be smaller than the observed one. Differences among locations seem to be larger in autumn-winter months, as the interquartile range and whiskers are larger than in spring-summer months in both series.

(regional-frequency)=
#### Regional frequency distribution of hourly data

```{figure} attachment:0f9113d8-a4f5-4e26-8177-f4b58557552b.png
---
height: 400px
---
Frequency distribution of the hourly ERA5 reanalysis vs HadISD observations illustrating the total score in (a: station 88) the best score test (0.97) and (b: station 132, c: station 240 and d: station 242) the poorest score (0.45, 0.45 and 0.5, respectively). The location of each station can be seen marked with a red symbol in Figure 2: × symbol for station 88, square for 132, triangle for 240 and upside-down triangle for 242. Reproduced from [[3]](Molina). 
```

The score is an evaluation method based on the amount of overlap between the frequency distributions of the wind speed for the reanalysis and for the observations:

$$
\mathrm{score} = \sum_{i=1}^{n} \min\left(Z_{m,i}, Z_{0,i}\right)
$$

where $n$ is the number of bins used to calculate the frequency distribution for a given location (here, 0.5 m/s has been used as bin size), $Z_{m,i}$ is the fraction (or frequency) of values in bin $i$ from the reanalysis and $Z_{0,i}$ is the fraction (or frequency) of values in bin $i$ from the observed data. The $Z_{m,i}$ sum to 1 over all bins, and so do the $Z_{0,i}$, so the score ranges from 0 (no overlap) to 1 (identical distributions).
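
A minimal sketch of the overlap score is given below, assuming both series are plain lists of non-negative hourly wind speeds in m/s and that bins start at 0 with a width of 0.5 m/s. The function name `overlap_score` and the sample values are hypothetical; the cited paper's own implementation may differ in binning details.

```python
def overlap_score(model, observed, bin_width=0.5):
    """Sum over bins of the smaller of the two relative frequencies."""
    if min(min(model), min(observed)) < 0:
        raise ValueError("wind speeds must be non-negative")
    # Shared bins so that both distributions are compared bin by bin.
    n_bins = int(max(max(model), max(observed)) // bin_width) + 1

    def fractions(values):
        counts = [0] * n_bins
        for value in values:
            counts[int(value // bin_width)] += 1
        return [count / len(values) for count in counts]

    z_m = fractions(model)
    z_o = fractions(observed)
    return sum(min(m, o) for m, o in zip(z_m, z_o))


# Hypothetical toy series: the reanalysis has too many light-wind values.
model = [0.2, 0.2, 1.2, 1.2]
observed = [0.2, 1.2, 1.2, 1.2]
score = overlap_score(model, observed)
print(score)
```

In this toy case the two distributions share half of their mass in the 1.0–1.5 m/s bin and a quarter in the 0–0.5 m/s bin, giving a score of 0.75.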

To illustrate specific cases with small and, more interestingly, large differences between wind distributions, Figure 5 presents the hourly frequency distributions for the best station (a: station 88) and some of the worst (b: station 132, c: station 240 and d: station 242). For the best one, the reanalysis and observation distributions fit almost perfectly. For the poorer-scoring stations, the reanalysis tends to largely overestimate the frequency of lower wind speeds (0–3 m/s range) and underestimate that of higher ones (4–8 m/s range).

Both the overestimation of light wind frequency and the underestimation of strong wind frequency have also been seen in previous model-based works. They can be related to the fact that models solve the dynamical equations at every point of a predefined grid, so that the output is representative of a grid cell rather than of a single site. In addition, model parameterizations, which limit in several ways how fully atmospheric mechanisms are described, can lead to an underestimation of the most extreme events (see e.g. [[12]](Larsen); [[13]](Cannon)).

(reanalysis-era5-monthly-single-levels_validation+uncertainties_q02:section-2)=
### 2. Trends

```{figure} attachment:5ee722b0-03c4-4a4f-b2e4-ab7b8843e7c4.png
---
height: 700px
---
Normalized linear trend (% per decade), calculated as the linear trend of surface wind speeds divided by the seasonal mean surface wind speeds in DJF over the 1980–2017 period for (a) ERA-Interim, (b) ERA5, (c) JRA55, (d) MERRA2, and (e) R1. Hatched regions in (a)–(e) indicate where the trends are significant at the 95% confidence level. In (f) we represent an agreement map between the five reanalyses. Blues (reds) indicate agreement between the five reanalyses about the negative (positive) trends in the surface wind speed in DJF in the 1980–2017 period. An asterisk indicates that the trends are significant at the 95% confidence level: no asterisk indicates that the trends are not significant. One asterisk (∗) means that only one of the reanalyses has significant trends, two asterisks (∗∗) inform us that two reanalyses have significant trends, and so on. Reproduced from [[1]](Ramon). 
```

```{figure} attachment:8c612a1e-8ecb-4ed3-85ae-f1eefdd75ffb.jpg
---
height: 400px
---
Comparison on the overall trends during 1989–2018 in the observations against that in the five products. The gray, red, blue, pink, green, and orange bars represent observations, ERA5, ERA-Interim, JRA-55, CFSv2, and MERRA-2, respectively. The error bars show the upper and lower confidence limits of the Sen slope [[14]](Gocic) values with 95% confidence interval. Significance levels are expressed with stars, triangles, and circles, representing p < 0.01, p < 0.05, and p > 0.05, respectively. Reproduced from [[2]](Fan). 
```

Linear trends are presented in Figures 6 and 7 as the rate of change of wind speed over the considered period for the five chosen reanalysis products. At the global scale, no product shows significant agreement with the observations, and large uncertainties and disagreements are found among the reanalysis products.
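
The two trend measures used in the figures above, a linear trend normalized by the mean wind speed (% per decade) and Sen's slope, can be sketched as follows. This is a minimal illustration under stated assumptions: the linear trend is taken to be an ordinary least-squares fit, the significance test is omitted, and the function names and the synthetic series are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def linear_trend(times, values):
    """Ordinary least-squares slope of values against times."""
    t_mean, v_mean = mean(times), mean(values)
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


def sen_slope(times, values):
    """Sen's slope: median of the slopes between all pairs of points."""
    pairs = combinations(zip(times, values), 2)
    return median((v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in pairs if t2 != t1)


def normalized_trend(times, values, slope_fn=linear_trend):
    """Trend in % per decade: slope per year * 10 years / mean value * 100."""
    return 100.0 * 10.0 * slope_fn(times, values) / mean(values)


# Synthetic DJF mean wind speeds (m/s) rising by 0.01 m/s per year, 1980-2017.
years = list(range(1980, 2018))
speeds = [6.0 + 0.01 * (year - 1980) for year in years]

trend_pct = normalized_trend(years, speeds)
robust_slope = sen_slope(years, speeds)
print(round(trend_pct, 2), round(robust_slope, 3))
```

For a perfectly linear series the two slope estimators coincide; on real data Sen's slope is less sensitive to outliers, which is one reason it is often paired with the Mann–Kendall test.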

Also, observed land surface wind speeds have varied significantly over the past few decades, the decreasing trends having recently reversed (see e.g. Fig. 4, [[2]](Fan)). 

This is clearly connected to the assessment of wind resources. Regionally, recent increasing wind speed trends have led to rapid growth in potential wind energy production after the turning points (see Table 5, [[2]](Fan)) in Asia, with increases of 45% in Southeast Asia, 34% in East Asia, 30% in central Asia, and 23% in South Asia. In addition, potential wind energy production in South America, North America, Africa, Australia, and Europe has increased by 20%, 10%, 12%, 23%, and 2%, respectively. Despite the relatively slow growth in Europe, the trends of potential wind energy production are positive in all regions in recent decades.

After the turning point, however, none of the reanalysis products shows the same positive and significant wind speed trend as the observations. The figure below shows that the products are highly uncertain in reproducing the observed decadal variations of wind speed at both the global and regional scale.



```{figure} attachment:47e78df7-b388-4e1f-b5ad-dff3a20ffda0.jpg
---
height: 1000px
---
Comparisons of the piecewise trends of the five reanalysis products with the observations in the following regions: (a) global, (b) North America, (c) central Asia, (d) Southeast Asia, (e) Africa, (f) Europe, (g) South America, (h) East Asia, (i) South Asia, and (j) Australia. The gray, red, blue, pink, green, and orange bars represent observations, ERA5, ERA-Interim, JRA-55, CFSv2, and MERRA-2, respectively. The trends and their 95% confidence intervals of Sen slope are calculated in each segment period using an M-K test. Significance levels are expressed with three stars, two stars, and n.s., representing p < 0.01, p < 0.05, and p > 0.05, respectively. Reproduced from [[2]](Fan). 
```

## ℹ️ If you want to know more	

[C3S: data to know which way the wind blows](https://climate.copernicus.eu/c3s-data-know-which-way-wind-blows)

[EU wind energy](https://energy.ec.europa.eu/topics/renewable-energy/eu-wind-energy_en)

[Renewables - Energy System - IEA](https://www.iea.org/energy-system/renewables) 

(key-resources)=
### Key resources

Some key resources and further readings were linked throughout this assessment. 

<a id='The_CDS_catalogue_entries_for_the_data_used_were:'></a>
The CDS catalogue entries for the data used were:

* ERA5 hourly data on single levels from 1940 to present: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview
  
* ERA5 monthly averaged data on single levels from 1940 to present: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means?tab=overview

<a id='Observational_datasets_can_be_publicly_accessed_at:'></a>
Observational datasets can be publicly accessed at: 

* The Tall Tower Dataset: in the data repository EUDAT at https://doi.org/10.23728/b2share.0d3a99db75df4238820ee548f35ee36b

* HadISD: https://www.metoffice.gov.uk/hadobs/hadisd/


### References

(Ramon)=
[[1]](https://rmets.onlinelibrary.wiley.com/doi/full/10.1002/qj.3616) Ramon, J. et al., 2019: What global reanalysis best represents near-surface winds? Quart. J. Roy. Meteor. Soc., 145, 3236–3251

(Fan)=
[[2]](https://doi.org/10.1175/JAMC-D-20-0037.1) Fan, W., Y. Liu, A. Chappell, L. Dong, R. Xu, M. Ekström, T. Fu, and Z. Zeng, 2021: Evaluation of Global Reanalysis Land Surface Wind Speed Trends to Support Wind Energy Development Using In Situ Observations. J. Appl. Meteor. Climatol., 60, 33–50

(Molina)=
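[[3]](http://dx.doi.org/10.1002/joc.7103) Molina, M., C. Gutiérrez, and E. Sánchez, 2021: Comparison of ERA5 surface wind speed climatologies over Europe with observations from the HadISD dataset. Int. J. Climatol., 41, 4864–4878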

(Dee)=
[[4]](https://doi.org/10.1002/qj.828) Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597

(Kobayashi)=
[[5]](https://doi.org/10.2151/jmsj.2015-001) Kobayashi, S., and Coauthors, 2015: The JRA-55 reanalysis: General specifications and basic characteristics. J. Meteor. Soc. Japan, 93, 5–48

(Gelaro)=
[[6]](https://doi.org/10.1175/JCLI-D-16-0758.1) Gelaro, R., and Coauthors, 2017: The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). J. Climate, 30, 5419–5454

(Saha)=
[[7]](https://doi.org/10.1175/JCLI-D-12-00823.1) Saha, S., and Coauthors, 2014: The NCEP Climate Forecast System Version 2. J. Climate, 27, 2185–2208

(Ramon-Lledo)=
[[8]](https://doi.org/10.5194/essd-12-429-2020) Ramon, J. and Lledó, L. (2019) The Tall Tower Dataset. Technical Note. Barcelona: Barcelona Supercomputing Center–Centro Nacional de Supercomputación.

(Touma)=
[[9]](https://doi.org/10.1080/00022470.1977.10470503) Touma, J.S. (1977) Dependence of the wind profile power law on stability for various locations. Journal of the Air Pollution Control Association, 27, 863–866. 

(Wang)=
[[10]](https://doi.org/10.1016/j.rser.2016.01.057) Wang, J., J. Hu, and K. Ma, 2016: Wind speed probability distribution estimation and wind energy assessment. Renewable Sustainable Energy Rev., 60, 881–899.

(Tian)=
[[11]](https://doi.org/10.1016/j.energy.2018.11.027) Tian, Q., G. Huang, K. Hu, and D. Niyogi, 2019: Observed and global climate model based changes in wind power potential over the Northern Hemisphere during 1979–2016. Energy, 167, 1224–1235.

(Larsen)=
[[12]](https://doi.org/10.1002/we.318) Larsén, X. and Mann, J. (2009) Extreme winds from the NCEP/NCAR reanalysis data. Wind Energy, 12, 556–573.

(Cannon)=
[[13]](https://doi.org/10.1016/j.renene.2014.10.024) Cannon, D., Brayshaw, D., Methven, J., Coker, P. and Lenaghan, D. (2015) Using reanalysis data to quantify extreme wind power generation statistics: a 33 year case study in Great Britain. Renewable Energy, 75, 767–778.

(Gocic)=
[[14]](https://doi.org/10.1016/j.gloplacha.2012.10.014) Gocic, M., and S. Trajkovic, 2013: Analysis of changes in meteorological variables using Mann–Kendall and Sen’s slope estimator statistical tests in Serbia. Global Planet. Change, 100, 172–182.