# Weighted RMSE for spatial fields

We are often interested in measuring the difference between two spatial fields. These could be a model run and observations, or perhaps a model experiment versus a reference solution. This notebook shows how to implement an area-weighted root mean square error (RMSE). We will assume that the fields are 2-d (e.g., lat, lon) and that the weights are 1-d (e.g., cos(lat)). 

The RMSE is the square root of the mean squared error (MSE). The weighted MSE can be expressed as

$$
\mathrm{MSE} = \sum_i w_i \, (\hat{f}_i - f_i)^2
$$

where $w_i$ are the _normalized_ weights (so that $\sum_i w_i = 1$), $\hat{f}_i$ is the field estimate, and $f_i$ is the reference or expected value. Because the difference is squared, every term is non-negative, so the order of the difference does not matter. The subscript $i$ runs over all grid points: when we apply the formula the fields are 2-dimensional, but we flatten them to a single index. One important note is that the weight multiplies the squared difference of the fields, and _not_ each field individually.
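
As a minimal numpy sketch of this formula (the tiny 2 × 3 fields and the weight values are made up for illustration), we spread 1-d per-latitude weights across longitude, flatten everything to a single index $i$, normalize the weights, and sum the weighted squared differences:

```python
import numpy as np

# Hypothetical 2 x 3 fields (2 latitudes, 3 longitudes), made up for illustration.
f_hat = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
f_ref = np.array([[1.0, 1.0, 1.0],
                  [2.0, 2.0, 2.0]])

# 1-d weights, one per latitude row (playing the role of cos(lat)).
w_lat = np.array([0.5, 1.0])

# Spread the row weights across longitude, then flatten to a single index i.
w = np.broadcast_to(w_lat[:, np.newaxis], f_hat.shape).ravel()
w = w / w.sum()  # normalized so that sum_i w_i == 1

d2 = (f_hat.ravel() - f_ref.ravel())**2  # squared difference at each point i
mse = np.sum(w * d2)
rmse = np.sqrt(mse)
# By hand: (1/9)*(0 + 1 + 4) + (2/9)*(4 + 9 + 16) = 63/9 = 7, so rmse = sqrt(7).

# Squaring makes the order of the difference irrelevant.
assert np.isclose(mse, np.sum(w * (f_ref.ravel() - f_hat.ravel())**2))
```

The weights are applied once to the squared difference at each point, never to `f_hat` or `f_ref` on their own.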

Then the RMSE is just
$$ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} $$

If the weights are not normalized (i.e., $\sum_i w_i \ne 1$), the MSE expression should be divided by the sum of the weights.
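
Written out, the unnormalized form is below. With a 1-d latitude weight such as $\cos\phi_j$ on a grid of $N_\phi$ latitudes and $N_\lambda$ longitudes (symbols introduced here only for illustration), every point in latitude row $j$ receives the weight $\cos\phi_j$. The denominator therefore counts each latitude $N_\lambda$ times, and is not just the sum of the 1-d weight vector:

$$
\mathrm{MSE} = \frac{\sum_i w_i \, (\hat{f}_i - f_i)^2}{\sum_i w_i},
\qquad
\sum_i w_i = \sum_{j=1}^{N_\phi} \sum_{k=1}^{N_\lambda} \cos\phi_j = N_\lambda \sum_{j=1}^{N_\phi} \cos\phi_j
$$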

In [1]:
import numpy as np
import xarray as xr

In [12]:
## example data
lat = np.linspace(-90, 90, 91)
lat = xr.DataArray(lat, dims=["lat"], coords={"lat":lat}, attrs={"name":"lat", "long_name":"latitude", "units":"degrees_north"})
lon = np.linspace(0,358,180)
lon = xr.DataArray(lon, dims=["lon"], coords={"lon":lon}, attrs={"name":"lon", "long_name":"longitude", "units":"degrees_east"})

In [14]:
fld1 = np.random.random((len(lat),len(lon)))
fld1 = xr.DataArray(fld1, dims=["lat","lon"], coords={"lat":lat, "lon":lon})

In [16]:
fld2 = np.random.random((len(lat),len(lon)))
fld2 = xr.DataArray(fld2, dims=["lat","lon"], coords={"lat":lat, "lon":lon})

In [17]:
wgt = np.cos(np.radians(lat))

In [18]:
# RMSE = np.sqrt(MSE)
# MSE = MEAN( (A_i - E_i(A))**2 ), where E_i(A) is the expectation (the truth or reference value) at point i, and i indexes all points

# For a weighted version, just make MEAN the weighted mean:
# wMSE = wMEAN( (A_i - E_i(A))**2 ) = SUM( w_i * (A_i - E_i(A))**2 ) / SUM(w_i)
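
A quick way to convince ourselves that xarray's weighted mean matches this comment's formula is to compute both on a small made-up grid. The key detail is that `SUM(w_i)` runs over all 2-d points, so the 1-d latitude weights must be broadcast across longitude before summing:

```python
import numpy as np
import xarray as xr

# Small hypothetical grid and random fields, for illustration only.
lat = np.linspace(-60, 60, 5)
lon = np.arange(0, 360, 90.0)
rng = np.random.default_rng(0)
coords = {"lat": lat, "lon": lon}
a = xr.DataArray(rng.random((lat.size, lon.size)), dims=["lat", "lon"], coords=coords)
e = xr.DataArray(rng.random((lat.size, lon.size)), dims=["lat", "lon"], coords=coords)
w = xr.DataArray(np.cos(np.radians(lat)), dims=["lat"], coords={"lat": lat})

sq = (a - e)**2

# wMSE = SUM(w_i * sq_i) / SUM(w_i), with both sums over ALL 2-d points.
w2d = w.broadcast_like(sq)  # repeat the lat weights along lon
wmse_manual = (w2d * sq).sum() / w2d.sum()

# xarray's weighted mean applies the same normalization internally.
wmse_xr = sq.weighted(w).mean()
assert np.isclose(wmse_manual.item(), wmse_xr.item())

# Dividing by the sum of the 1-d weights alone overcounts by a factor of len(lon).
wmse_wrong = (w * sq).sum() / w.sum()
assert np.isclose(wmse_wrong.item(), lon.size * wmse_xr.item())
```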


In [70]:
def wgt_rmse(fld1, fld2, wgt):
    """Calculate the area-weighted RMSE.
    
    Inputs are 2-d spatial fields, fld1 and fld2, with the same shape.
    They can be xarray DataArrays or numpy arrays.
    
    Input wgt is the weight vector, expected to be 1-d and to match the length of one dimension of the data.
    For DataArray inputs, wgt must also be a DataArray whose dimension name matches one of the field's dimensions.
    
    Returns a single float value.
    """
    assert len(fld1.shape) == 2
    assert fld1.shape == fld2.shape
    if isinstance(fld1, xr.DataArray) and isinstance(fld2, xr.DataArray):
        # xarray aligns wgt with the matching dimension by name and divides by the sum of the broadcast weights.
        return (np.sqrt(((fld1 - fld2)**2).weighted(wgt).mean())).values.item()
    else:
        fld1, fld2, wgt = np.asarray(fld1), np.asarray(fld2), np.asarray(wgt)
        # Broadcast the 1-d weights along the matching axis (axis 0 wins on a square grid).
        if len(wgt) == fld1.shape[0]:
            warray = np.broadcast_to(wgt[:, np.newaxis], fld1.shape)
        elif len(wgt) == fld1.shape[1]:
            warray = np.broadcast_to(wgt[np.newaxis, :], fld1.shape)
        else:
            raise ValueError(f"Sorry, weight array has shape {wgt.shape} which is not compatible with data of shape {fld1.shape}")
        warray = warray / np.sum(warray)  # normalize over all 2-d points
        wmse = np.sum(warray * (fld1 - fld2)**2)
        return np.sqrt(wmse).item()
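
Matching the weights to a dimension by length is ambiguous when both dimensions have the same size (a square grid). One way around that, sketched here under the assumption that the caller knows which axis the weights belong to, is a numpy-only variant that takes the axis explicitly. `wgt_rmse_axis` is a hypothetical helper for illustration, not part of the function above:

```python
import numpy as np

def wgt_rmse_axis(fld1, fld2, wgt, axis=0):
    """Hypothetical numpy-only weighted RMSE with an explicit weight axis.

    fld1, fld2: 2-d arrays of the same shape.
    wgt: 1-d weights whose length matches fld1.shape[axis].
    """
    fld1 = np.asarray(fld1, dtype=float)
    fld2 = np.asarray(fld2, dtype=float)
    wgt = np.asarray(wgt, dtype=float)
    if fld1.ndim != 2 or fld1.shape != fld2.shape:
        raise ValueError("fld1 and fld2 must be 2-d arrays with the same shape")
    if wgt.ndim != 1 or wgt.size != fld1.shape[axis]:
        raise ValueError(f"wgt has shape {wgt.shape}; expected ({fld1.shape[axis]},)")
    # Reshape the 1-d weights to (n, 1) or (1, n) so they broadcast along the chosen axis.
    shape = [1, 1]
    shape[axis] = wgt.size
    w2d = np.broadcast_to(wgt.reshape(shape), fld1.shape)
    # Weighted mean of the squared difference, normalized over all 2-d points.
    wmse = np.sum(w2d * (fld1 - fld2)**2) / np.sum(w2d)
    return float(np.sqrt(wmse))

# Usage on a square 4 x 4 grid: a constant difference of 2 gives an RMSE of 2
# for any positive weights, and transposing the data just moves the weights to axis 1.
lat = np.linspace(-45, 45, 4)
w = np.cos(np.radians(lat))
a, b = np.zeros((4, 4)), np.full((4, 4), 2.0)
print(wgt_rmse_axis(a, b, w, axis=0), wgt_rmse_axis(a.T, b.T, w, axis=1))
```

Making the axis explicit trades a little convenience for removing the guesswork; with DataArrays the same problem is avoided because xarray matches weights by dimension name.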

In [75]:
## XARRAY VERSION
print(f"Xarray DataArray input: {wgt_rmse(fld1, fld2, wgt)}")
## NUMPY
print(f"Numpy array input: {wgt_rmse(fld1.values, fld2.values, wgt)}")

Xarray DataArray input: 0.4109429010580385
Numpy array input: 0.4109429010580347
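
The two printed values agree to about 14 significant digits; the tiny difference is presumably floating-point rounding from the different summation paths, so `np.isclose` rather than `==` is the appropriate way to compare them. With uniformly random fields the weighting barely changes the answer, so to see why area weighting matters, here is a sketch with a made-up error field that is nonzero only poleward of 60° on the same grid resolution as the example above:

```python
import numpy as np
import xarray as xr

# Same grid resolution as the example data above.
lat = np.linspace(-90, 90, 91)
lon = np.linspace(0, 358, 180)
coords = {"lat": lat, "lon": lon}
wgt = xr.DataArray(np.cos(np.radians(lat)), dims=["lat"], coords={"lat": lat})

# Made-up error field: 1 poleward of 60 degrees, 0 elsewhere.
err = xr.DataArray(np.zeros((lat.size, lon.size)), dims=["lat", "lon"], coords=coords)
err = err.where(np.abs(err["lat"]) < 60, 1.0)

rmse_unweighted = float(np.sqrt((err**2).mean()))
rmse_weighted = float(np.sqrt((err**2).weighted(wgt).mean()))

# Area weighting shrinks the contribution of the small polar cells.
assert rmse_weighted < rmse_unweighted
print(f"unweighted: {rmse_unweighted:.3f}, weighted: {rmse_weighted:.3f}")
```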
