# JB3 - Algorithm Tuning for the basic algorithms

This notebook shows how to generate output with the basic regridding algorithms and how to experiment with their tunable parameters, search_radius and max_neighbours. We first compare the 3 algorithms (NN, IDS, DIB) using the same tuning parameters, and then demonstrate the effect of changing the tuning parameters. We will use SMAP L1B brightness temperatures regridded to EASE2 36km and 9km grids.

You can edit your config file to do the same process for different radiometers and/or bands/variables. 

# Step 1 - Import relevant modules

In [1]:
# Import Relevant Modules
from cimr_rgb.config_file import ConfigFile
from cimr_rgb.data_ingestion import DataIngestion
from cimr_rgb.regridder import ReGridder
from cimr_rgb.grid_generator import GRIDS
from cimr_rgb.product_generator import ProductGenerator

import copy
from numpy import full, nan, nanmean, isnan, isinf, polyfit

# You will need to configure matplotlib for your machine; you can probably remove the two lines below.
import matplotlib # Remove this if you have plotting issues.
tkagg = matplotlib.use('TkAgg') # Remove this if you have plotting issues.
import matplotlib.pyplot as plt
plt.ion()

<contextlib.ExitStack at 0x7970469c3a00>

# Step 2 - Add your paths to the configuration file, load the configuration file and validate the configuration file

In this notebook, we show an alternate way to change configuration parameters: rather than loading 3 separate configuration files, we load a single initial configuration file, with the search radius left empty, and change the regridding algorithm (NN, IDS, DIB) on copies of the loaded configuration object.
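The copy-and-override pattern used in the cells below can be sketched in isolation. This is a minimal illustration, not part of cimr_rgb: `SimpleNamespace` stands in for the validated config object, the attribute values are placeholders, and `make_variant` is a hypothetical helper.

```python
import copy
from types import SimpleNamespace

# Hypothetical stand-in for a validated config object; the real one
# carries many more attributes.
base_config = SimpleNamespace(
    regridding_algorithm="IDS",
    search_radius=None,
    max_neighbours=20,
)

def make_variant(base, **overrides):
    """Return an independent copy of `base` with some attributes overridden."""
    variant = copy.deepcopy(base)
    for name, value in overrides.items():
        # Refuse unknown names so that a typo does not silently add an attribute.
        if not hasattr(variant, name):
            raise AttributeError(f"unknown config attribute: {name}")
        setattr(variant, name, value)
    return variant

# One variant per algorithm; the base object is left untouched.
variants = {
    algo: make_variant(base_config, regridding_algorithm=algo)
    for algo in ("DIB", "IDS", "NN")
}
variants["NN"].max_neighbours = 1
```

Because each variant is a deep copy, a change made for one regrid (such as forcing `max_neighbours = 1` for NN) does not leak into the next one.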

In [2]:
# Define path for configuration file of the first regrid.
config_1_path = '/home/beywood/Desktop/MS3/CIMR-RGB/notebooks/JB3 - Tuning parameters for the basic algorithms/JB3.xml'



# Validate your config file
config_object = ConfigFile(config_1_path)

# You can inspect the parameters in the config file. There are additional parameters relevant to your particular configuration
# that are added as part of the validation. 
for attr, value in config_object.__dict__.items():
    print(f"{attr}: {value}")

/home/beywood/Desktop/MS3/CIMR-RGB/dpr/L1B/SMAP/SMAP_L1B_TB_47185_D_20231201T212120_R18290_001.h5
output_path: /home/beywood/Desktop/MS3/CIMR-RGB/notebooks/JB3 - Tuning parameters for the basic algorithms
timestamp: 2024-12-11_21-11-14
timestamp_fmt: %Y-%m-%d_%H-%M-%S
logpar_config: {'version': 1, 'disable_existing_loggers': False, 'loggers': {'rgb-logger': {'level': 'INFO', 'handlers': ['stdout', 'file'], 'propagate': False}}, 'handlers': {'stdout': {'class': 'logging.StreamHandler', 'formatter': 'simple', 'stream': 'ext://sys.stdout', 'level': 'INFO'}, 'file': {'class': 'logging.FileHandler', 'level': 'INFO', 'formatter': 'balanced', 'filename': PosixPath('/home/beywood/Desktop/MS3/CIMR-RGB/notebooks/JB3 - Tuning parameters for the basic algorithms/logs/JB3.log'), 'mode': 'w'}}, 'formatters': {'simple': {'format': '[%(name)s - %(levelname)s]: %(message)s'}, 'simple2': {'format': '[%(name)s - %(levelname)s | %(module)s - L%(lineno)d]: %(message)s'}, 'advanced': {'format': '[%(levelnam

We begin the notebook by leaving the search radius empty. This enforces the SMAP-style neighbour search, in which only samples that fall within the output grid cell itself are used. This takes longer than using a circular search radius. We begin on a 36km EASE2 grid.
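The difference between the two search styles can be pictured with a toy example. The sketch below is not the cimr_rgb implementation: the grid, the sample positions and the helper names `samples_in_cell` and `samples_in_radius` are hypothetical, and it assumes that a search radius is measured from the cell centre.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative values): a 4 x 4 grid of 36 km cells and
# 40 sample positions scattered over it, in metres.
cell_size = 36_000.0
n_rows, n_cols = 4, 4
sample_x = rng.uniform(0, n_cols * cell_size, 40)
sample_y = rng.uniform(0, n_rows * cell_size, 40)

def samples_in_cell(row, col):
    """Indices of samples whose position falls inside the cell footprint."""
    x0, y0 = col * cell_size, row * cell_size
    inside = ((sample_x >= x0) & (sample_x < x0 + cell_size) &
              (sample_y >= y0) & (sample_y < y0 + cell_size))
    return np.flatnonzero(inside)

def samples_in_radius(row, col, radius):
    """Indices of samples within `radius` metres of the cell centre."""
    xc, yc = (col + 0.5) * cell_size, (row + 0.5) * cell_size
    dist = np.hypot(sample_x - xc, sample_y - yc)
    return np.flatnonzero(dist <= radius)

in_cell = samples_in_cell(1, 2)
in_small = samples_in_radius(1, 2, 18_000.0)  # circle inscribed in the cell
in_large = samples_in_radius(1, 2, 30_000.0)  # circle enclosing the cell
```

A radius of half the cell size selects a subset of the in-cell samples, while a radius larger than half the cell diagonal also pulls in samples from neighbouring cells.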

In [4]:
# Compare the basic regridding algorithms 

regridding_algos = ['DIB', 'IDS', 'NN']

# Perform 3 separate regrids for the 3 different basic regridding algorithms 

# Ingestion
l1b_data = DataIngestion(config_object).ingest_data()

grid_shape = GRIDS[config_object.grid_definition]['n_rows'], GRIDS[config_object.grid_definition]['n_cols']

output_data = {}
for algo in regridding_algos:

    # Create a new config object for each regrid 
    config = copy.deepcopy(config_object)
    # change the regridding algorithm 
    config.regridding_algorithm = str(algo)
    
    if algo == 'NN':
        # It is required to enforce this on the fly for NN
        config.max_neighbours = 1
        config.search_radius = None
    
    # Perform regrid 
    l1c_data = ReGridder(config).regrid_data(l1b_data)
    # Extract data needed for plot comparison
    bt_h = l1c_data['L']['bt_h_fore']
    cell_row = l1c_data['L']['cell_row_fore']
    cell_col = l1c_data['L']['cell_col_fore']
    # Put data on grid 
    grid = full(grid_shape, nan)
    
    for sample in range(len(cell_row)):
        grid[cell_row[sample], cell_col[sample]] = bt_h[sample]

    output_data[algo] = grid

[rgb-logger - INFO]: `read_hdf5`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `read_hdf5` -- Started Execution
[rgb-logger - INFO]: `apply_smap_qc`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `apply_smap_qc` -- Started Execution
[rgb-logger - INFO]: Applying quality control to: `scan_quality_flag`
[rgb-logger - INFO]: Applying quality control to: `data_quality_h`
[rgb-logger - INFO]: Removing bad quality samples from: `z_velocity`
[rgb-logger - INFO]: Removing bad quality samples from: `bt_h`
[rgb-logger - INFO]: Removing bad quality samples from: `y_position`
[rgb-logger - INFO]: Removing bad quality samples from: `sub_satellite_lon`
[rgb-logger - INFO]: Removing bad quality samples from: `longitude`
[rgb-logger - INFO]: Removing bad quality samples from: `z_position`
[rgb-logger - INFO]: Removing bad quality samples from: `latitude`
[rgb-logger - INFO]: Removing bad quality samples from: `y_velocity`
[rgb-logger - INFO]: Removing bad quality sam

  delta_sigma = arccos(


[rgb-logger - INFO]: `lonlat_to_xy`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `lonlat_to_xy` -- Started Execution
[rgb-logger - INFO]: `lonlat_to_xy_cea`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Started Execution
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Executed in: 0.00s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- CPU User Time (Change): 0.01s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- CPU System Time: 0.00s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- CPU Total Time: 0.01s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Process-Specific CPU Usage (Before): 0.00%
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Process-Specific CPU Usage (After): 1150.60%
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Memory Usage Change: 0.000000 MB
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `lonlat_to_xy` -- Executed in: 0.01s
[rgb-logger - INFO]: `lonlat_to_xy` -- CPU User Time (Change): 0.01s
[rgb-logger - INFO]: `lonlat_to_xy

  average_values = nanmean(extracted_values, axis=1)


[rgb-logger - INFO]: `lonlat_to_xy`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `lonlat_to_xy` -- Started Execution
[rgb-logger - INFO]: `lonlat_to_xy_cea`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Started Execution
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Executed in: 0.00s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- CPU User Time (Change): 0.00s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- CPU System Time: 0.00s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- CPU Total Time: 0.00s
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Process-Specific CPU Usage (Before): 0.00%
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Process-Specific CPU Usage (After): 0.00%
[rgb-logger - INFO]: `lonlat_to_xy_cea` -- Memory Usage Change: 0.000000 MB
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `lonlat_to_xy` -- Executed in: 0.01s
[rgb-logger - INFO]: `lonlat_to_xy` -- CPU User Time (Change): 0.00s
[rgb-logger - INFO]: `lonlat_to_xy` -

In [5]:
# Plot the data on a 36km grid

fig, axs = plt.subplots(1, 3)
im1 = axs[0].imshow(output_data['NN'][:,700:900], cmap = 'viridis')
axs[0].set_title('NN')
im2 = axs[1].imshow(output_data['DIB'][:,700:900], cmap = 'viridis')
axs[1].set_title('DIB')
im3 = axs[2].imshow(output_data['IDS'][:,700:900], cmap = 'viridis')
axs[2].set_title('IDS')
colorbar = fig.colorbar(im1, ax=axs[0])
colorbar = fig.colorbar(im2, ax=axs[1])
colorbar = fig.colorbar(im3, ax=axs[2])
colorbar.set_label('BT [K]', fontsize=12)
fig.suptitle('EASE2_36km Basic Algorithm Comparison')
plt.show()

{'DIB': array([[238.22451782, 238.46588135, 240.33514404, ..., 238.79963684,
        238.00788879, 240.15751648],
       [238.96035767, 238.33963013, 238.76460266, ..., 236.88739014,
        238.09211731, 239.3848877 ],
       [232.77294922, 234.99015808, 234.69506836, ..., 230.68786621,
        231.99543762, 232.25474548],
       ...,
       [         nan,          nan,          nan, ...,          nan,
                 nan,          nan],
       [         nan,          nan,          nan, ...,          nan,
                 nan,          nan],
       [         nan,          nan,          nan, ...,          nan,
                 nan,          nan]], shape=(406, 964)), 'IDS': array([[237.19053869, 237.97026355, 238.74528191, ..., 238.47260711,
        237.03085497, 239.90924429],
       [237.14267195, 239.4671583 , 239.1956744 , ..., 237.3146695 ,
        238.11039153, 237.72787137],
       [234.82498764, 234.8981886 , 235.30529335, ..., 231.15205756,
        231.50050635, 234.33318644],

# Moving to a 9km grid

We now repeat the same process, but change the grid to a 9km grid to demonstrate the effect of increasing the search radius. 

In [6]:
# Change the grid to a 9km grid and plot the results 

config_9km = copy.deepcopy(config_object)
config_9km.grid_definition = 'EASE2_G9km'

# Rerun the same code as above with the new grid 

# Ingest data
l1b_data = DataIngestion(config_9km).ingest_data()

# Define grid
grid_shape = GRIDS[config_9km.grid_definition]['n_rows'], GRIDS[config_9km.grid_definition]['n_cols']

output_data = {}
for algo in regridding_algos:
    # Create a new config object for each regrid 
    config = copy.deepcopy(config_9km)
    # change the regridding algorithm 
    config.regridding_algorithm = str(algo)
    if algo == 'NN':
        # It is required to enforce this on the fly for NN
        config.max_neighbours = 1
        config.search_radius = None
    
    # Perform regrid 
    l1c_data = ReGridder(config).regrid_data(l1b_data)
    # Extract data needed for plot comparison
    bt_h = l1c_data['L']['bt_h_fore']
    cell_row = l1c_data['L']['cell_row_fore']
    cell_col = l1c_data['L']['cell_col_fore']
    # Put data on grid 
    grid = full(grid_shape, nan)
    
    for sample in range(len(cell_row)):
        grid[cell_row[sample], cell_col[sample]] = bt_h[sample]

    output_data[algo] = grid

[rgb-logger - INFO]: `read_hdf5`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `read_hdf5` -- Started Execution
[rgb-logger - INFO]: `apply_smap_qc`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `apply_smap_qc` -- Started Execution
[rgb-logger - INFO]: Applying quality control to: `scan_quality_flag`
[rgb-logger - INFO]: Applying quality control to: `data_quality_h`
[rgb-logger - INFO]: Removing bad quality samples from: `z_velocity`
[rgb-logger - INFO]: Removing bad quality samples from: `bt_h`
[rgb-logger - INFO]: Removing bad quality samples from: `y_position`
[rgb-logger - INFO]: Removing bad quality samples from: `sub_satellite_lon`
[rgb-logger - INFO]: Removing bad quality samples from: `longitude`
[rgb-logger - INFO]: Removing bad quality samples from: `z_position`
[rgb-logger - INFO]: Removing bad quality samples from: `latitude`
[rgb-logger - INFO]: Removing bad quality samples from: `y_velocity`
[rgb-logger - INFO]: Removing bad quality sam

  output_var = nansum(output_var, axis =1)/ nansum(weights, axis=1)


[rgb-logger - INFO]: `xy_to_lonlat_cea` -- Executed in: 0.27s
[rgb-logger - INFO]: `xy_to_lonlat_cea` -- CPU User Time (Change): 0.20s
[rgb-logger - INFO]: `xy_to_lonlat_cea` -- CPU System Time: 0.07s
[rgb-logger - INFO]: `xy_to_lonlat_cea` -- CPU Total Time: 0.27s
[rgb-logger - INFO]: `xy_to_lonlat_cea` -- Process-Specific CPU Usage (Before): 0.00%
[rgb-logger - INFO]: `xy_to_lonlat_cea` -- Process-Specific CPU Usage (After): 98.80%
[rgb-logger - INFO]: `xy_to_lonlat_cea` -- Memory Usage Change: 95.554688 MB
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `xy_to_lonlat` -- Executed in: 0.28s
[rgb-logger - INFO]: `xy_to_lonlat` -- CPU User Time (Change): 0.20s
[rgb-logger - INFO]: `xy_to_lonlat` -- CPU System Time: 0.08s
[rgb-logger - INFO]: `xy_to_lonlat` -- CPU Total Time: 0.28s
[rgb-logger - INFO]: `xy_to_lonlat` -- Process-Specific CPU Usage (Before): 0.00%
[rgb-logger - INFO]: `xy_to_lonlat` -- Process-Specific CPU Usage (After): 100.40%
[rgb-logger - INFO]: `xy_to

In [7]:
# Plot the data on a 9km grid

fig, axs = plt.subplots(1, 3)
im1 = axs[0].imshow(output_data['NN'][1075:1125,3140:3220], cmap = 'viridis')
axs[0].set_title('NN')
im2 = axs[1].imshow(output_data['DIB'][1075:1125,3140:3220], cmap = 'viridis')
axs[1].set_title('DIB')
im3 = axs[2].imshow(output_data['IDS'][1075:1125,3140:3220], cmap = 'viridis')
axs[2].set_title('IDS')
colorbar = fig.colorbar(im1, ax=axs[0])
colorbar = fig.colorbar(im2, ax=axs[1])
colorbar = fig.colorbar(im3, ax=axs[2])
colorbar.set_label('BT [K]', fontsize=12)
plt.suptitle('EASE2_G9km')
plt.show()

Note: In the plot made in the code block above, we zoom into the results to demonstrate that there are many empty cells for L band on a 9km EASE2 grid. The plots appear empty if you display the full grid. To fill in the gaps, we can increase the search radius, which is demonstrated in the block below. 
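One way to put a number on "many empty cells" is to compute the fraction of cells that hold a finite value. The helper `filled_fraction` below is a hypothetical convenience function, shown here on a toy array rather than on the notebook's output.

```python
import numpy as np

def filled_fraction(grid):
    """Fraction of grid cells holding a finite value (not nan or inf)."""
    return float(np.isfinite(grid).mean())

# Toy grid: mostly empty, with a value landing on every 4th row and column.
toy = np.full((8, 12), np.nan)
toy[::4, ::4] = 240.0

print(f"filled: {filled_fraction(toy):.1%}")
```

Applied to the zoomed region plotted above, for example `filled_fraction(output_data['NN'][1075:1125, 3140:3220])`, this would give a single figure that can be compared between algorithms and search radii.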

We will now show the effect of changing the search radius on a 9km grid for IDS. 
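To see why a larger radius fills more cells, consider a minimal sketch of the weighting for a single output cell. It assumes that IDS denotes inverse-distance-squared weighting and that search_radius and max_neighbours act as a distance cut-off and a cap on the number of samples used; `ids_estimate` and the toy samples are hypothetical and do not reproduce the cimr_rgb code.

```python
import numpy as np

def ids_estimate(target_xy, sample_xy, sample_values, search_radius, max_neighbours):
    """Inverse-distance-squared weighted mean of the nearest samples.

    Returns nan when no sample lies within `search_radius` of the target.
    """
    dist = np.hypot(*(sample_xy - target_xy).T)
    order = np.argsort(dist)[:max_neighbours]    # nearest samples first
    order = order[dist[order] <= search_radius]  # drop samples out of range
    if order.size == 0:
        return np.nan
    weights = 1.0 / np.maximum(dist[order], 1e-6) ** 2  # guard against d = 0
    return float(np.sum(weights * sample_values[order]) / np.sum(weights))

# Toy samples (positions in metres, values in kelvin); illustrative only.
sample_xy = np.array([[8_000.0, 0.0], [0.0, 20_000.0], [25_000.0, 0.0]])
sample_values = np.array([240.0, 250.0, 230.0])
target_xy = np.array([0.0, 0.0])

for radius_km in (5, 10, 30):
    value = ids_estimate(target_xy, sample_xy, sample_values,
                         search_radius=radius_km * 1000.0, max_neighbours=2)
    print(radius_km, value)
```

In this toy case the 5 km radius finds no sample and leaves the cell empty, the 10 km radius uses only the nearest sample, and the 30 km radius blends the two nearest samples, with the third excluded by `max_neighbours=2`.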

In [8]:
# Rerun the same code as above with the new grid, using IDS on a 9km grid, for different search radii

search_radius = [5, 10, 30] #km

# Fix the regridding algo 
config_9km.regridding_algorithm = 'IDS'

grid_shape = GRIDS[config_9km.grid_definition]['n_rows'], GRIDS[config_9km.grid_definition]['n_cols']

output_data = {}
for radius in search_radius:
    # Create a new config object for each regrid 
    config = copy.deepcopy(config_9km)
    # change the search radius 
    config.search_radius = float(radius)*1000
    # Ingest Data
    l1b_data = DataIngestion(config_9km).ingest_data()
    
    # Perform regrid 
    l1c_data = ReGridder(config).regrid_data(l1b_data)
    # Extract data needed for plot comparison
    bt_h = l1c_data['L']['bt_h_fore']
    cell_row = l1c_data['L']['cell_row_fore']
    cell_col = l1c_data['L']['cell_col_fore']
    # Put data on grid 
    grid = full(grid_shape, nan)
    
    for sample in range(len(cell_row)):
        grid[cell_row[sample], cell_col[sample]] = bt_h[sample]

    output_data[str(radius)] = grid

[rgb-logger - INFO]: `read_hdf5`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `read_hdf5` -- Started Execution
[rgb-logger - INFO]: `apply_smap_qc`
[rgb-logger - INFO]: ---------------------
[rgb-logger - INFO]: `apply_smap_qc` -- Started Execution
[rgb-logger - INFO]: Applying quality control to: `scan_quality_flag`
[rgb-logger - INFO]: Applying quality control to: `data_quality_h`
[rgb-logger - INFO]: Removing bad quality samples from: `z_velocity`
[rgb-logger - INFO]: Removing bad quality samples from: `bt_h`
[rgb-logger - INFO]: Removing bad quality samples from: `y_position`
[rgb-logger - INFO]: Removing bad quality samples from: `sub_satellite_lon`
[rgb-logger - INFO]: Removing bad quality samples from: `longitude`
[rgb-logger - INFO]: Removing bad quality samples from: `z_position`
[rgb-logger - INFO]: Removing bad quality samples from: `latitude`
[rgb-logger - INFO]: Removing bad quality samples from: `y_velocity`
[rgb-logger - INFO]: Removing bad quality sam

In [10]:
# Plot the data on a 9km grid
fig, axs = plt.subplots(1, 3)
im1 = axs[0].imshow(output_data['5'][:,2800:3600], cmap = 'viridis')
axs[0].set_title('search_radius = 5km')
im2 = axs[1].imshow(output_data['10'][:,2800:3600], cmap = 'viridis')
axs[1].set_title('search_radius = 10km')
im3 = axs[2].imshow(output_data['30'][:,2800:3600], cmap = 'viridis')
axs[2].set_title('search_radius = 30km')
colorbar = fig.colorbar(im1, ax=axs[0])
colorbar = fig.colorbar(im2, ax=axs[1])
colorbar = fig.colorbar(im3, ax=axs[2])
colorbar.set_label('BT [K]', fontsize=12)
plt.suptitle('Search Radius Comparison on EASE2 G9km')
plt.show()

{'5': array([[         nan,          nan,          nan, ...,          nan,
                 nan,          nan],
       [         nan,          nan,          nan, ...,          nan,
                 nan,          nan],
       [         nan,          nan,          nan, ..., 238.33769226,
                 nan,          nan],
       ...,
       [         nan,          nan,          nan, ...,          nan,
                 nan,          nan],
       [         nan,          nan,          nan, ...,          nan,
                 nan,          nan],
       [         nan,          nan,          nan, ...,          nan,
                 nan,          nan]], shape=(1624, 3856)), '10': array([[238.42680359, 238.45454407, 238.49917603, ..., 238.06008911,
        238.06010437, 238.4140625 ],
       [238.29966736, 238.23847961, 238.20671082, ..., 238.57771301,
        238.48339844, 238.38569641],
       [238.33769226, 238.337677  ,          nan, ..., 238.33769226,
        238.33769226, 238.33770752],
