## Vertical interpolation

Applied to horizontally coarse-grained data

**Why is linear interpolation insufficient?** <br>
Assume z_ifc_highres = \[0,1,2,3\] and z_ifc_lowres = \[0, 3\]. <br>
Assume clc_highres = \[0, 100, 0\] (defined on full levels). Linear interpolation to the low-res full level at a height of 1.5 then gives clc_lowres = \[100\].

What we actually want is clc_lowres = \[33.33\], the average (normalized integral) over the high-res grid cells. <br>
Note that, for instance, with z_ifc_highres = \[0,1,2\], z_ifc_lowres = \[0, 2\] and clc_highres = \[0, 100\], both methods give clc_lowres = \[50\].
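The toy example above can be reproduced in a few lines. This is a minimal sketch, assuming that full levels sit halfway between neighbouring half levels (which matches the height of 1.5 used above); the `z_full_*`, `clc_linear` and `clc_average` names are introduced here for illustration only.

```python
import numpy as np

# Toy grids from the example above (heights of the half levels).
z_ifc_highres = np.array([0.0, 1.0, 2.0, 3.0])
z_ifc_lowres = np.array([0.0, 3.0])
clc_highres = np.array([0.0, 100.0, 0.0])  # defined on full levels

# Assumption: full levels are the midpoints of neighbouring half levels.
z_full_highres = 0.5 * (z_ifc_highres[:-1] + z_ifc_highres[1:])
z_full_lowres = 0.5 * (z_ifc_lowres[:-1] + z_ifc_lowres[1:])

# Method 1: linear interpolation to the low-res full level.
clc_linear = np.interp(z_full_lowres, z_full_highres, clc_highres)

# Method 2: thickness-weighted average over the high-res cells.
thickness = np.diff(z_ifc_highres)
clc_average = np.sum(clc_highres * thickness) / (z_ifc_lowres[1] - z_ifc_lowres[0])

print(clc_linear)   # [100.]
print(clc_average)  # approximately 33.33
```

Linear interpolation only samples the profile at one height, so it picks up the single cloudy layer in full, while the weighted average accounts for the two clear layers around it.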

Let $\mathcal{G}$ be an arbitrary low-res grid cell with z(upper half level) $ = z_u$ and z(lower half level) $ =  z_l$.

Our goal is to compute $x(\mathcal{G}) = \frac{1}{z_u-z_l}\int_{z_l}^{z_u} \hat{x}$

as the coarse-grained variable $x(\mathcal{G})$. We integrate over the high-res variables $\hat{x}$. <br>
Let $\{\mathcal{H}_i\}_{i=1}^n$ be the high-res grid cells that overlap $\mathcal{G}$, i.e. those with $\hat{z}^i_l \leq z_u$ and $\hat{z}^i_u \geq z_l$. Here $\hat{z}^i_l$ and $\hat{z}^i_u$ are the lower and upper half levels of $\mathcal{H}_i$.
Then $\int_{z_l}^{z_u} \hat{x} = \sum_{i=1}^n \hat{x}(\mathcal{H}_i) (\min(z_u, \hat{z}^i_u) - \max(z_l, \hat{z}^i_l))$.
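For a single column, the sum above could be written out as follows. This is a sketch rather than the implementation used further down: `coarse_grain_column` is a hypothetical helper, and it assumes that half-level heights decrease with the index (top of the column first), as in the `z_u = zh_lr_n[j]`, `z_l = zh_lr_n[j+1]` convention of this notebook.

```python
import numpy as np

def coarse_grain_column(x_hr, zh_hr, zh_lr):
    """Overlap-weighted vertical average of one column (hypothetical helper).

    x_hr  : (n_hr,)     high-res values on full levels
    zh_hr : (n_hr + 1,) high-res half-level heights, decreasing with index
    zh_lr : (n_lr + 1,) low-res half-level heights, decreasing with index
    """
    x_lr = np.empty(len(zh_lr) - 1)
    for j in range(len(zh_lr) - 1):
        z_u, z_l = zh_lr[j], zh_lr[j + 1]
        # min(z_u, upper) - max(z_l, lower) for every high-res cell;
        # a negative value means the cell does not overlap [z_l, z_u].
        overlap = np.minimum(z_u, zh_hr[:-1]) - np.maximum(z_l, zh_hr[1:])
        overlap = np.maximum(overlap, 0.0)
        x_lr[j] = np.sum(x_hr * overlap) / (z_u - z_l)
    return x_lr

# Toy example from the top of the section, written top-down.
toy = coarse_grain_column(np.array([0.0, 100.0, 0.0]),
                          np.array([3.0, 2.0, 1.0, 0.0]),
                          np.array([3.0, 0.0]))
print(toy)
```

Clipping the overlap at zero means the sum can run over all high-res cells, so the overlapping cells $\mathcal{H}_i$ never have to be collected explicitly.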


In [1]:
# 1) Load the data we want to interpolate vertically (Define the path, load all high-res and low-res half levels)
# 2) Compute the coarse-grained versions.
# 3) Save the output as a netcdf-file.

# Note: Be careful with NANs

In [2]:
import os
import xarray as xr
import numpy as np

In [3]:
# Define all paths
narval_path = '/pf/b/b309170/my_work/NARVAL'
clc_path = os.path.join(narval_path, 'data/clc')
output_path = os.path.join(narval_path, 'data_var_vertinterp/clc')
zg_lowres_path = os.path.join(narval_path, 'data_var_vertinterp/zg')
zg_highres_path = os.path.join(narval_path, 'data/z_ifc')

# Which file to load
input_file = os.listdir(clc_path)[0]
print(input_file)

clc_R02B04_NARVALI_2013120800_cloud_DOM01_0029.nc


In [4]:
len(os.listdir(clc_path))

1699

In [5]:
# Load files (ds_zh_lr = ds_zhalf_lowres)
ds = xr.open_dataset(os.path.join(clc_path, input_file))
ds_zh_lr = xr.open_dataset(os.path.join(zg_lowres_path, 'zghalf_icon-a_capped.nc'))
ds_zh_hr = xr.open_dataset(os.path.join(zg_highres_path, 'z_ifc_R02B04_NARVALI_fg_DOM01.nc'))

In [7]:
# Extract values
clc = ds.clc.values
zh_lr = ds_zh_lr.zghalf.values
zh_hr = ds_zh_hr.z_ifc.values
clc.shape

(1, 75, 20480)

In [7]:
# Extract not-nan entries (clc_n = clc_notnan)
not_nan = ~np.isnan(clc[0,74,:])
clc_n = clc[:,:,not_nan]
zh_lr_n = zh_lr[:,not_nan]
zh_hr_n = zh_hr[:,not_nan]
# print(zh_lr_n.shape)
# print(zh_hr_n.shape)
# clc_n.shape

In [11]:
# Modify the ndarray. Desired output shape: (1, 31, 1306). (clc_out = clc, vertically interpolated)
clc_out = np.full((1, 31, 1306), np.nan)

# Pseudocode:
# For every horizontal field i: <-- Maybe we can slice over the horizontal fields
#   For every layer j:
#     Set z_u=zh_lr_n[j, i] and z_l=zh_lr_n[j+1, i] <-- Define z_u and z_l, as encompassing layer j
#     Collect all k with z_l <= zh_hr_n[k,i] <= z_u <-- Get all high-res half-levels in between z_l and z_u
#     sum += (np.minimum(z_u, zh_hr_n[k,i]) - np.maximum(zh_hr_n[k+1,i], z_l))*clc_n[0, k, i] over all k
#     clc_out[0, j, i] = sum/(z_u - z_l)

# Pretty fast implementation (vectorized over all horizontal fields):
for j in range(clc_out.shape[1]):
    z_u = zh_lr_n[j, :]
    z_l = zh_lr_n[j+1, :]
    # Overlap of every high-res layer with the low-res layer [z_l, z_u], clipped at 0
    weights = np.maximum(np.minimum(z_u, zh_hr_n[:-1]) - np.maximum(zh_hr_n[1:], z_l), 0)
    
    # Equivalent to clc_out[0,j,:] = np.diagonal(weights.T @ clc_n[0])/(z_u - z_l), only much faster:
    clc_out[0,j,:] = np.einsum('ij,ji->i', weights.T, clc_n[0])/(z_u - z_l)
    
    # If the low-res grid extends farther than the high-res grid, we reinsert nans:
    should_be_nan = np.abs((z_u - z_l) - np.sum(weights, axis=0)) >= 0.5
    clc_out[0,j,should_be_nan] = np.nan

# Number of horizontal fields that were processed
print(len(z_u - z_l))

1306
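The equivalence claimed in the comment on the `einsum` call can be checked on small random arrays. This is a sketch with made-up shapes and data; `weights` and `values` only stand in for the overlap weights and `clc_n[0]`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_levels, n_cells = 5, 4
weights = rng.random((n_levels, n_cells))  # stand-in for the overlap weights
values = rng.random((n_levels, n_cells))   # stand-in for clc_n[0]

# Builds the full (n_cells, n_cells) product and keeps only its diagonal.
via_diagonal = np.diagonal(weights.T @ values)
# Computes only the diagonal entries.
via_einsum = np.einsum('ij,ji->i', weights.T, values)
# The same reduction written as an elementwise product and a sum over levels.
via_sum = np.sum(weights * values, axis=0)

print(np.allclose(via_diagonal, via_einsum), np.allclose(via_einsum, via_sum))
```

The matrix product grows quadratically with the number of horizontal fields, whereas the other two variants only touch the entries that are actually needed.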


In [122]:
# Put it back in
clc_new = np.full((1, 31, 20480), np.nan)
clc_new[:,:,not_nan] = clc_out
clc_new_da = xr.DataArray(clc_new, coords={'time':ds.time, 'lon':ds.clon, 'lat':ds.clat, 'height':ds.height[:31]}, 
                          dims=['time', 'height', 'cell'], name='clc') 

# Save it in a new file
output_file = 'int_var_' + input_file
clc_new_da.to_netcdf(os.path.join(output_path, output_file))
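The compress-and-restore pattern used across the cells above (drop the NaN columns, process, scatter back onto the full grid) can be illustrated on a tiny array. This is a sketch with made-up data; the level mean is only a placeholder for the vertical averaging.

```python
import numpy as np

# Hypothetical (time, level, cell) field; cells 1 and 3 are missing everywhere.
field = np.arange(12, dtype=float).reshape(1, 3, 4)
field[:, :, [1, 3]] = np.nan

# Keep only the cells whose lowest level is defined, as in the cells above.
not_nan = ~np.isnan(field[0, -1, :])
compact = field[:, :, not_nan]                     # shape (1, 3, 2)

# Placeholder for the vertical averaging: reduce the 3 levels to 1.
processed = compact.mean(axis=1, keepdims=True)    # shape (1, 1, 2)

# Scatter the result back onto the full horizontal grid.
restored = np.full((1, 1, field.shape[2]), np.nan)
restored[:, :, not_nan] = processed

print(restored)
```

Note that the mask is built from the lowest level only, so it relies on a column being either fully defined or fully missing.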

### TESTING
As a first test, I compared summary statistics (means, max/means) with those of the linearly interpolated data to check that they are close. <br>
As a second test, I compared the vertically interpolated cloud cover with a hand computation for an arbitrary data point:

In [109]:
# Arbitrary data point
k = 24
l = 700

z_u = zh_lr_n[k, l] #1674.72
z_l = zh_lr_n[k+1, l] #1302.34
np.where(zh_hr_n[:, l] <= z_u) #60 and above
np.where(zh_hr_n[:, l] >= z_l) #61 and below
zh_hr_n[59, l] #1784.95
zh_hr_n[60, l] #1615.85
zh_hr_n[61, l] #1454.5
zh_hr_n[62, l] #1300.94
zh_hr_n[60, l] - zh_hr_n[61, l] #161.35
zh_hr_n[61, l] - z_l #152.15
z_u - zh_hr_n[60, l] #58.87
clc_n[0, 59, l] #5.35
clc_n[0, 60, l] #4.72
clc_n[0, 61, l] #4.85

z_u-z_l #372.38

372.3834956168473

In [111]:
np.abs(clc_out[0,k,l]-(58.87*5.35+161.35*4.715+152.15*4.85)/372.3835) < 1e-2

True
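A further sanity check that could complement the spot check above: wherever the high-res grid fully covers a low-res layer, the weights must sum to the layer thickness, and a vertically constant profile must stay constant. The sketch below uses synthetic half levels (all heights and the value 42 are made up) with the same top-down ordering as in the notebook.

```python
import numpy as np

# Synthetic half levels for 3 columns, decreasing with index (top first).
scale = np.array([1.0, 0.9, 1.1])
zh_hr = np.array([3000.0, 2500.0, 1900.0, 1400.0, 800.0, 300.0, 0.0])[:, None] * scale
zh_lr = np.array([3000.0, 1700.0, 0.0])[:, None] * scale

x_hr = np.full((zh_hr.shape[0] - 1, 3), 42.0)   # constant profile
x_lr = np.empty((zh_lr.shape[0] - 1, 3))

for j in range(x_lr.shape[0]):
    z_u, z_l = zh_lr[j], zh_lr[j + 1]
    weights = np.maximum(np.minimum(z_u, zh_hr[:-1]) - np.maximum(zh_hr[1:], z_l), 0)
    # Full coverage: the overlaps add up to the low-res layer thickness.
    assert np.allclose(weights.sum(axis=0), z_u - z_l)
    x_lr[j] = np.sum(weights * x_hr, axis=0) / (z_u - z_l)

print(np.allclose(x_lr, 42.0))
```

If the weight sum falls short of the thickness, the low-res layer is only partially covered, which is the case the NaN reinsertion in the main loop guards against.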