# Particle interpolation with PanGeo-PyInterp and an ICON mesh

This notebook looks at how we might be able to 

* use UXArray to load and store ICON mesh data and
* interpolate gridded data from the ICON vertices (or face centroids) to particle positions using [PanGeo-PyInterp](https://pangeo-pyinterp.readthedocs.io/en/latest/auto_examples/ex_unstructured.html)

Before getting started, [download the input ICON mesh file](https://owncloud.gwdg.de/index.php/s/M5rjQDWel5OZcHH/download?path=%2F&files=Ocean_Channel_18000x6000_80km.nc) and move it into a `./data` subdirectory of your current working directory.

In [4]:
# ! mkdir ./data
# ! mv ~/Downloads/Ocean_Channel_18000x6000_80km.nc ./data
! ls ./data

Ocean_Channel_18000x6000_80km.nc


In [8]:
import uxarray as ux

print(ux.__version__)
grid_path="./data/Ocean_Channel_18000x6000_80km.nc"

uxds = ux.open_grid(grid_path)
uxds.validate()
uxds

2024.8.2
Validating the mesh...
-No duplicate nodes found in the mesh.
-All nodes are referenced by at least one element.
-No face area is close to zero.
Mesh validation successful.


In [26]:
uxds.node_lon.max()

## Building the interpolation mesh for PanGeo-PyInterp
Following [this example](https://pangeo-pyinterp.readthedocs.io/en/latest/auto_examples/ex_unstructured.html), we start by creating a `pyinterp.RTree` mesh. From the ICON mesh, we are free to use the face corner nodes (`uxds.node_lon`, `uxds.node_lat`), the edge centers (`uxds.edge_lon`, `uxds.edge_lat`), or the face centroids (`uxds.face_lon`, `uxds.face_lat`). See the [UXArray docs on Spherical Coordinates attributes](https://uxarray.readthedocs.io/en/latest/api.html#spherical-coordinates).

In this example we (arbitrarily) use the face centroids. The data to be interpolated are sampled from an analytic function that is periodic in both longitude and latitude:

$ f = \cos(4x) \sin(4y) $

where $x$ is the longitude (in radians) and $y$ is the latitude (in radians).

## Setting up the R-Tree mesh structure

In [35]:
import numpy as np
import pyinterp

# Initialize the RTree mesh
mesh = pyinterp.RTree()

# Face-centroid coordinates in degrees
lon = uxds.face_lon.to_numpy()
lat = uxds.face_lat.to_numpy()
x = lon*np.pi/180.0 # longitude in radians
y = lat*np.pi/180.0 # latitude in radians

# Sample the analytic test function at the face centroids
data = np.cos(4.0*x)*np.sin(4.0*y)

# Populate the search tree (coordinates are passed in degrees)
mesh.packing(np.vstack((lon,lat)).T, data)

### Setting up the particles
Here, we generate 10,000 particles whose longitudes and latitudes are drawn from a uniform random distribution within the mesh extents.

In [None]:
# Get extents of model domain
lon_min = uxds.node_lon.min().to_numpy()
lon_max = uxds.node_lon.max().to_numpy()
lat_min = uxds.node_lat.min().to_numpy()
lat_max = uxds.node_lat.max().to_numpy()

# Generate random particle positions
n_particles = 10000 # Set the number of particles
lon_p = np.random.uniform(low=lon_min,high=lon_max,size=(n_particles,))
lat_p = np.random.uniform(low=lat_min,high=lat_max,size=(n_particles,))
x_p = lon_p*np.pi/180.0
y_p = lat_p*np.pi/180.0

# Calculate the exact values of the underlying function at the particle points (for error estimation)
exact_p = np.cos(4.0*x_p)*np.sin(4.0*y_p)
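
Because `np.random.uniform` is called without a seed, the particle positions (and therefore the reported errors) change from run to run. Below is a minimal sketch of a reproducible alternative, assuming NumPy's `default_rng` generator. The helper name `random_particles`, the seed, and the bounds are hypothetical and not taken from this notebook.

```python
import numpy as np

def random_particles(lon_bounds, lat_bounds, n_particles, seed=0):
    """Draw uniformly distributed (lon, lat) pairs in degrees from a seeded generator."""
    rng = np.random.default_rng(seed)
    lon = rng.uniform(lon_bounds[0], lon_bounds[1], size=n_particles)
    lat = rng.uniform(lat_bounds[0], lat_bounds[1], size=n_particles)
    # Shape (n_particles, 2), the layout used for the interpolation calls
    return np.column_stack((lon, lat))

# Hypothetical bounds in degrees; in the notebook they come from
# uxds.node_lon and uxds.node_lat
particles = random_particles((0.0, 10.0), (-5.0, 5.0), n_particles=1000, seed=42)
print(particles.shape)
```

Calling the helper twice with the same seed returns identical positions, which makes the error figures comparable between runs.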

### IDW Method
[From the PyInterp docs](https://pangeo-pyinterp.readthedocs.io/en/latest/auto_examples/ex_unstructured.html)

"IDW uses a weighted average of the surrounding sample points, where the weight assigned to each point is inversely proportional to its distance from the target location. The further away a sample point is from the target location, the less influence it has on the estimated value. This method is relatively simple to implement and computationally efficient, but it can produce over-smoothed results in areas with a lot of sample points and under-smoothed results in areas with few sample points."

Here, we interpolate the face-centroid data to the random particle positions generated above, using the `k=11` nearest neighbors of each particle.

In [41]:
import time

t0 = time.time()
data_p_idw, idw_neighbors = mesh.inverse_distance_weighting(np.vstack((lon_p,lat_p)).T,
                                                            within=False,
                                                            k=11,
                                                            num_threads=0)
t1 = time.time()

error_idw = np.max(np.abs(data_p_idw-exact_p))

print(f"IDW runtime : {t1-t0}")
print(f"IDW AbsMax Error : {error_idw}")
print(f"IDW min/max data range : {np.min(data_p_idw),np.max(data_p_idw)}")

IDW runtime : 0.013716459274291992
IDW AbsMax Error : 0.06275874492992772
IDW min/max data range : (-0.9966241385459502, 0.997536068137114)
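
To make the weighting described in the quote above concrete, here is a minimal NumPy sketch of IDW for a single target point. It is not PyInterp's implementation: it assumes plain Euclidean distance in the plane and a weight exponent of 2, and the function name `idw_estimate` is hypothetical.

```python
import numpy as np

def idw_estimate(sample_xy, sample_values, target_xy, power=2.0):
    """Inverse-distance-weighted estimate at one target point (Euclidean distance)."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    dist = np.linalg.norm(sample_xy - np.asarray(target_xy, dtype=float), axis=1)
    # A target that coincides with a sample takes that sample's value
    hit = dist == 0.0
    if hit.any():
        return float(sample_values[hit][0])
    # Closer samples receive larger weights
    weights = 1.0 / dist**power
    return float(np.sum(weights * sample_values) / np.sum(weights))

samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.0, 1.0, 1.0, 2.0])
center_value = idw_estimate(samples, values, [0.5, 0.5])
print(center_value)
```

At the center of the square all four samples are equidistant, so the estimate reduces to their plain average.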


### Radial Basis Function (RBF) Method
[From the PyInterp docs](https://pangeo-pyinterp.readthedocs.io/en/latest/auto_examples/ex_unstructured.html)

"RBF, on the other hand, models the spatial relationship between sample points and the target location by using a mathematical function (radial basis function) that is based on the distance between the points. The radial basis function is usually Gaussian, multiquadric, or inverse multiquadric. The estimated value at the target location is obtained by summing up the weighted contributions of all sample points. This method is more flexible than IDW as it can produce a wide range of interpolation results, but it can also be computationally expensive and susceptible to overfitting if not implemented carefully."

In [42]:
t0 = time.time()
data_p_rbf, rbf_neighbors = mesh.radial_basis_function(np.vstack((lon_p,lat_p)).T,
                                                            within=False,
                                                            k=11,
                                                            rbf='linear',
                                                            smooth=1e-4,
                                                            num_threads=0)
t1 = time.time()

error_rbf = np.max(np.abs(data_p_rbf-exact_p))

print(f"RBF runtime : {t1-t0}")
print(f"RBF AbsMax Error : {error_rbf}")
print(f"RBF min/max data range : {np.min(data_p_rbf),np.max(data_p_rbf)}")

RBF runtime : 0.023630142211914062
RBF AbsMax Error : 0.08380923023230913
RBF min/max data range : (-0.9918297385473974, 0.9835526185527179)
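
The idea behind the RBF method can be sketched in a few lines of NumPy for the `linear` kernel, $\phi(r) = r$. This is an illustration under simplifying assumptions (Euclidean distance in the plane, all samples used, no smoothing term), not PyInterp's implementation, and the function name `rbf_linear_estimate` is hypothetical.

```python
import numpy as np

def rbf_linear_estimate(sample_xy, sample_values, target_xy):
    """RBF interpolation at one target point with the linear kernel phi(r) = r."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    # Kernel matrix: pairwise distances between all sample points
    phi = np.linalg.norm(sample_xy[:, None, :] - sample_xy[None, :, :], axis=-1)
    # Weights chosen so that the interpolant reproduces the sample values
    weights = np.linalg.solve(phi, sample_values)
    # Kernel evaluated between the target and each sample
    phi_target = np.linalg.norm(sample_xy - target_xy, axis=1)
    return float(phi_target @ weights)

samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.0, 1.0, 1.0, 2.0])
print(rbf_linear_estimate(samples, values, [0.5, 0.5]))
```

Without smoothing, the interpolant passes exactly through the sample values; a nonzero smoothing parameter such as the `smooth=1e-4` used above relaxes that constraint.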


### Kriging Method
[From the PyInterp docs](https://pangeo-pyinterp.readthedocs.io/en/latest/auto_examples/ex_unstructured.html)

"Kriging, also known as Gaussian process regression, is a geostatistical method that models the spatial structure of the underlying data by using a covariance matrix. The estimated value at the target location is obtained by solving a set of linear equations that balance the fit to the sample points and the smoothness of the estimated surface. Kriging can produce more accurate results than IDW and RBF in many cases, but it requires a good understanding of the spatial structure of the data and can be computationally demanding."


In [43]:
t0 = time.time()
data_p_kriging, kriging_neighbors = mesh.universal_kriging(np.vstack((lon_p,lat_p)).T,
                                                            within=False,
                                                            k=11,
                                                            covariance='matern_12',
                                                            alpha=100_000,
                                                            num_threads=0)
t1 = time.time()

error_kriging = np.max(np.abs(data_p_kriging-exact_p))

print(f"Kriging runtime : {t1-t0}")
print(f"Kriging AbsMax Error : {error_kriging}")
print(f"Kriging min/max data range : {np.min(data_p_kriging),np.max(data_p_kriging)}")

Kriging runtime : 0.021558523178100586
Kriging AbsMax Error : 0.5087485804269507
Kriging min/max data range : (-0.9810884969341782, 0.950652530254272)
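
For intuition, the linear system behind kriging can be sketched with NumPy. The example below is simple kriging, not the universal kriging used above: it assumes a constant mean taken as the sample average, Euclidean distance in the plane, and an exponential covariance $C(r) = \sigma^2 e^{-r/\ell}$. The function name and the parameters `sigma` and `length_scale` are hypothetical and do not correspond to PyInterp's arguments.

```python
import numpy as np

def simple_kriging_estimate(sample_xy, sample_values, target_xy,
                            sigma=1.0, length_scale=1.0):
    """Simple kriging at one target point with an exponential covariance."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)

    def covariance(r):
        return sigma**2 * np.exp(-r / length_scale)

    # Covariance between every pair of samples, and between samples and target
    pair_dist = np.linalg.norm(sample_xy[:, None, :] - sample_xy[None, :, :], axis=-1)
    target_dist = np.linalg.norm(sample_xy - target_xy, axis=1)
    # Kriging weights solve K w = k
    weights = np.linalg.solve(covariance(pair_dist), covariance(target_dist))
    # Assumption: the mean is constant and approximated by the sample average
    mean = sample_values.mean()
    return float(mean + weights @ (sample_values - mean))

samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.0, 1.0, 1.0, 2.0])
center_value = simple_kriging_estimate(samples, values, [0.5, 0.5])
print(center_value)
```

The estimate depends on how well the covariance parameters match the spatial structure of the data, which is why the quoted description stresses understanding that structure before choosing them.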
