---
title: Estimating biophysical traits using RTM inversion
subject: Tutorial
subtitle: Notebook to retrieve biophysical traits from satellite imagery
short_title: Bio_RTM
authors:
  - name: Vicente Burchard-Levine
    affiliations:
      - SpecLab-CSIC
    orcid: 0000-0003-0222-8706
    email: vburchard@ica.csic.es
  - name: Héctor Nieto
    affiliations:
      - ICA-CSIC
    orcid: 0000-0003-4250-6424
    email: hector.nieto@ica.csic.es
license: CC-BY-SA-4.0
keywords: TSEB, 3SEB, Copernicus, Satellite
---

# Summary 

Interactive Jupyter notebook demonstrating the retrieval of biophysical variables by inverting a Radiative Transfer Model (RTM) using a hybrid approach, showing its applicability with Sentinel-2 imagery. 
This notebook will go through:

- Building a synthetic Look-Up-Table (LUT) with pypro4sail
- Training a Random Forest algorithm with the LUT
- Evaluating the Random Forest algorithm to estimate biophysical traits
- Inverting Sentinel-2 bands to retrieve biophysical traits

# Instructions
Read carefully all the text and follow the instructions.

:::{hint} 

Once you have read each section, run the Jupyter code cell underneath (marked as `[]`) by clicking the `Run` icon, or by pressing SHIFT+ENTER on your keyboard.


:::

To start, please run the following cell to import all the packages required for this notebook. Once you run the cell below, an acknowledgement message, stating all libraries were correctly imported, should be printed on screen.


# Import Libraries

In [None]:
%matplotlib widget
import os 
import openeo
import numpy as np
import xarray as xr
import rioxarray  # registers the `.rio` accessor used later on xarray objects
import rasterio
from osgeo import gdal
from pathlib import Path
import multiprocess as mp
import pandas as pd
import matplotlib.pyplot as plt
from pypro4sail import machine_learning_regression as inv
from functions.biophysical import get_diffuse_radiation_6S, build_soil_database, SRF_LIBRARY, S2_BANDS
from functions import gdal_utils as gu
from functions.eomaji.utils import date_selector
from sklearn.ensemble import RandomForestRegressor as rf_sklearn
import datetime as dt
from model_evaluation import double_collocation as dc
from dateutil.relativedelta import relativedelta
import datetime
import logging
logging.getLogger("sklearnex").setLevel(logging.ERROR)

print('libraries imported correctly!')

# General workflow

This biophysical retrieval method inverts a radiative transfer model (RTM) using a hybrid approach: a physically-based RTM (i.e. PRO4SAIL) is used to simulate a training dataset, a machine learning (ML) algorithm is trained on it to relate top-of-canopy reflectances to biophysical traits, and the trained model is then applied to the Sentinel-2 bands to retrieve the different biophysical variables. This is the method used in the official Sentinel-2 biophysical processor (documentation found [**here**](https://step.esa.int/docs/extra/ATBD_S2ToolBox_V2.1.pdf)), whose workflow is shown in the figure below:

:::{figure} ./input/figures/Biophysical_processor_figure.png
:alt: S2-BIOPAR
:name: S2-BIOPAR
Flow chart showing how the BIOPAR products are generated operationally (figure taken from [S2 Toolbox ATBD](https://step.esa.int/docs/extra/ATBD_S2ToolBox_V2.1.pdf))
:::


In this notebook, we will use a very similar methodology based on the Python implementation of [**pypro4sail**](https://github.com/hectornieto/pypro4sail). It is essentially the same method described in the above figure, but it uses a Random Forest regressor instead of an Artificial Neural Network (ANN) and a more computationally efficient PRO4SAIL model (using Jacobians).
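
To make the hybrid idea concrete, here is a minimal sketch of the two steps (forward simulation, then inversion) on a toy problem. Everything in it is an assumption for illustration: `toy_forward_model` is a hypothetical one-band saturating function standing in for PRO4SAIL, and a simple nearest-neighbour search in the LUT stands in for the Random Forest regressor used in this notebook.

```python
import numpy as np

def toy_forward_model(lai, rho_soil=0.15, rho_inf=0.45, k=0.6):
    """Hypothetical one-band canopy model: reflectance saturates with LAI."""
    return rho_inf - (rho_inf - rho_soil) * np.exp(-k * lai)

# 1. Forward mode: sample the trait and simulate its reflectance (the LUT)
rng = np.random.default_rng(0)
lut_lai = rng.uniform(0.0, 5.0, size=5000)
lut_rho = toy_forward_model(lut_lai)

def invert(rho_obs):
    """Return the LUT trait whose simulated reflectance is closest to each observation."""
    rho_obs = np.atleast_1d(rho_obs)
    nearest = np.abs(lut_rho[None, :] - rho_obs[:, None]).argmin(axis=1)
    return lut_lai[nearest]

# 2. Inverse mode: recover LAI from "observed" reflectances
true_lai = np.array([0.5, 2.0, 4.0])
estimated_lai = invert(toy_forward_model(true_lai))
print(estimated_lai)
```

With real multi-band data and eight target traits, a nearest-neighbour search becomes slow and sensitive to noise, which is one reason to replace it with a trained regressor.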


# Building synthetic dataset

First, we will build a synthetic dataset where a look-up-table (LUT) will be created using PRO4SAIL that relates top-of-canopy spectra and biophysical variables.

The target biophysical variables are: ["Cab", "Car", "Cm", "Cw", "Ant", "Cbrown","LAI", "leaf_angle"].

### Leaf biochemical traits (from PROSPECT):

- **Cab (Chlorophyll a + b content)**
The amount of chlorophyll pigments per unit leaf area, usually expressed in µg/cm².

- **Car (Carotenoid content)**
Concentration of carotenoids (xanthophylls + carotenes) per unit leaf area, expressed in µg/cm².

- **Cm (Leaf dry matter content)**
The mass of dry matter (structural compounds like cellulose, lignin, proteins, etc.) per unit leaf area, usually in g/cm².

- **Cw (Equivalent leaf water thickness)**
The water content of the leaf per unit area, in cm (equivalent water thickness).

- **Ant (Anthocyanin content)**
Concentration of anthocyanins per unit leaf area (µg/cm²).

- **Cbrown (Brown pigment content)**
A semi-empirical parameter representing the amount of “brown pigments” (products of leaf senescence, degradation of chlorophyll, accumulation of tannins, etc.).

### Canopy structural traits (from SAIL):

- **LAI (Leaf Area Index)**
Total one-sided leaf area per unit ground area (m²/m²).

- **leaf_angle (Leaf angle distribution parameter)**
A parameter describing the average orientation of leaves in the canopy, often represented by an ellipsoidal distribution.
  - Low values → leaves more horizontally oriented (planophile).
  - High values → leaves more vertically oriented (erectophile).


## Forward RTM simulations to build LUT

We will perform 40000 PRO4SAIL simulations in forward mode (i.e. using vegetation parameters to simulate surface reflectance based on the RTM's description of light-canopy interactions). The vegetation parameter bounds specified below are global and taken from global datasets (e.g. [**LOPEX**](https://ecosis.org/package/leaf-optical-properties-experiment-database--lopex93-)). This allows the model to be trained over a wide range of conditions and to be globally applicable.  
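
As a rough illustration of what "sampling within bounds" means, the sketch below draws uniform random values for three of the parameters. It is a simplification: the notebook itself relies on `inv.build_prosail_database` with `inv.SALTELLI_DIST`, whereas the helper `sample_uniform` here is hypothetical and uses plain uniform sampling.

```python
import numpy as np

# Subset of the bounds used in this notebook, as (min, max) per parameter
bounds = {"Cab": (0.0, 110.0), "LAI": (0.0, 5.0), "leaf_angle": (30.0, 80.0)}

def sample_uniform(bounds, n_samples, seed=0):
    """Draw n_samples independent uniform values for every parameter."""
    rng = np.random.default_rng(seed)
    return {name: rng.uniform(low, high, size=n_samples)
            for name, (low, high) in bounds.items()}

samples = sample_uniform(bounds, n_samples=1000)
for name, values in samples.items():
    print(f"{name}: min={values.min():.2f}, max={values.max():.2f}")
```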


In [None]:
n_simulations = 40000

# parameter names
OBJ_PARAM_NAMES = ["Cab", "Car", "Cm", "Cw", "Ant", "Cbrown",
                   "LAI", "leaf_angle"]
# parameter info
PARAM_PROPS = {"Cab": ["Chlorophyll a+b", r"$\mu g\,cm^{-2}$", 1],
               "Car": ["Carotenoids", r"$\mu g\,cm^{-2}$", 1],
               "Cm": ["Dry matter", r"$g\,cm^{-2}$", 3],
               "Cw": ["Water content", r"$g\,cm^{-2}$", 3],
               "Ant": ["Anthocyanins", r"$\mu g\,cm^{-2}$", 1],
               "Cbrown": ["Brown pigments", r"$-$", 1],
               "LAI": ["Leaf Area Index", r"$m^{2}\,m^{-2}$", 2],
               "leaf_angle": ["Mean leaf inclination angle", r"º", 1]}

# specify range of variable values
## minimum
MIN_N_LEAF = 1.0  # From LOPEX + ANGERS average
MIN_CAB = 0.0  # From LOPEX + ANGERS average
MIN_CAR = 0.0  # From LOPEX + ANGERS average
MIN_CBROWN = 0.0  # from S2 L2B ATBD
MIN_CM = 0.0017  # From LOPEX + ANGERS average
MIN_CW = 0.000  # From LOPEX + ANGERS average
MIN_ANT = 0.0
MIN_LAI = 0.0
MIN_LEAF_ANGLE = 30.0  # from S2 L2B ATBD
MIN_HOTSPOT = 0.1  # from S2 L2B ATBD
MIN_BS = 0.50  # from S2 L2B ATBD

## maximum
MAX_N_LEAF = 3.0  # From LOPEX + ANGERS average
MAX_CAB = 110.0  # From LOPEX + ANGERS average
MAX_CAR = 30.0  # From LOPEX + ANGERS average
MAX_CBROWN = 2.00  # from S2 L2B ATBD
MAX_CM = 0.0331  # From LOPEX + ANGERS average
MAX_CW = 0.0525  # From LOPEX + ANGERS average
MAX_ANT = 40.0
MAX_LAI = 5  # from S2 L2B ATBD
MAX_LEAF_ANGLE = 80.0  # from S2 L2B ATBD
MAX_HOTSPOT = 0.5  # from S2 L2B ATBD
MAX_BS = 3.5  # from S2 L2B ATBD

prosail_bounds = {'N_leaf': (MIN_N_LEAF, MAX_N_LEAF),
                  'Cab': (MIN_CAB, MAX_CAB),
                  'Car': (MIN_CAR, MAX_CAR),
                  'Cbrown': (MIN_CBROWN, MAX_CBROWN),
                  'Cw': (MIN_CW, MAX_CW),
                  'Cm': (MIN_CM, MAX_CM),
                  'Ant': (MIN_ANT, MAX_ANT),
                  'LAI': (MIN_LAI, MAX_LAI),
                  'leaf_angle': (MIN_LEAF_ANGLE, MAX_LEAF_ANGLE),
                  'hotspot': (MIN_HOTSPOT, MAX_HOTSPOT),
                  'bs': (MIN_BS, MAX_BS)}
df_bounds = pd.DataFrame(prosail_bounds, index=['min', 'max'])
print(f'Setting up {n_simulations} simulations with inputs bounds:\n\n {df_bounds[OBJ_PARAM_NAMES]}')
params_orig = inv.build_prosail_database(n_simulations,
                                         param_bounds=prosail_bounds,
                                         distribution=inv.SALTELLI_DIST)
print('\nDone!')
print('Table with simulation inputs:')
pd.DataFrame(params_orig)

## Estimate diffuse irradiance

In [None]:
print("Running 6S for estimation of diffuse/direct irradiance")
# specify geometric and atmospheric variables (these can normally be acquired from the Sentinel-2 metadata)
# As an example, we specify default values
aot = 1. # aerosol optical thickness
wvp = 25. # water vapour
sza = 37.5 # sun zenith angle
saa = 180 # sun azimuth angle
vza = 25 # sensor viewing zenith angle

# specify date
date_obj = dt.datetime(2023, 8, 5, 10, 30)

skyl = get_diffuse_radiation_6S(aot, wvp, sza, saa, date_obj,
                                altitude=0.1)
print('Done!')

## Build soil spectral database

In [None]:
print(f"Building soil spectra for {np.size(params_orig['bs'])} PROSPECT-D + 4SAIL simulations")
soil_spectrum = build_soil_database(params_orig["bs"])
print('Done!')

# Simulate Sentinel-2 spectra

We are interested in inverting Sentinel-2 top-of-canopy reflectance to retrieve the biophysical variables. As such, we need to take into account the characteristics of the Sentinel-2 sensor.


:::{figure} ./input/figures/S2_bands_info.jpg
:alt: S2
:name: S2-bands
Sentinel-2 MSI bands information (figure taken from [Pasqualotto et al. 2019](https://ieeexplore.ieee.org/document/8909218))
:::

In this case, we will use bands ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12'] as our 'features' to invert the RTM.

First, we need to convolve the simulated surface reflectance with the spectral response function (SRF) of the Sentinel-2 sensor so that the simulations match what the sensor would measure.
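
The convolution itself is an SRF-weighted average of the full-resolution spectrum within each band. The sketch below shows the arithmetic under stated assumptions: the Gaussian SRFs, their centres and widths, and the step-shaped spectrum are illustrative stand-ins, not the official Sentinel-2 response curves (those are read from `Sentinel2A.txt` in the next cell), and `gaussian_srf` and `convolve_to_band` are hypothetical helpers.

```python
import numpy as np

wavelengths = np.arange(400, 2501)  # nm, 1 nm step

def gaussian_srf(center, fwhm):
    """Illustrative bell-shaped response curve on the wavelength grid."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)

def convolve_to_band(reflectance, srf):
    """SRF-weighted mean of a full-resolution spectrum."""
    return np.sum(reflectance * srf) / np.sum(srf)

# Synthetic spectrum: low reflectance below 700 nm, high plateau above
spectrum = np.where(wavelengths < 700, 0.05, 0.45)
red = convolve_to_band(spectrum, gaussian_srf(center=665, fwhm=30))
nir = convolve_to_band(spectrum, gaussian_srf(center=842, fwhm=115))
print(f"red = {red:.3f}, nir = {nir:.3f}")
```

Each band value stays close to the spectrum level under its response curve, with a small contribution from the neighbouring region where the curve's tail overlaps the step.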

### Visualize Sentinel-2 Spectral Response Function (SRF)

In [None]:
# bands to use in generating LUT and inversion
S2_BANDS = ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12']
# Stack spectral bands
srf = []
srf_file = SRF_LIBRARY / 'Sentinel2A.txt'
srfs = np.genfromtxt(srf_file, dtype=None, names=True)
for band in S2_BANDS:
    srf.append(srfs[band])


# open as pandas dataframe
srf_df = pd.read_csv(srf_file, sep = '\t')

band_names = srf_df[S2_BANDS].columns

# plot spectral response function
colormap = plt.cm.rainbow # Choose a colormap
# get color within colormap range for each band (depends on number of bands)
colors = [colormap(x / (len(band_names) - 1)) for x in range(len(band_names))]

# Show spectral response function (SRF) curves
plt.figure(figsize=(9, 5))
plt.title('Spectral Response Function (SRF) - Sentinel-2', fontsize=14)
plt.xlabel('Wavelength (nm)', fontsize=12)
plt.xlim(400, 2500)
plt.ylabel('Relative Response (-)', fontsize=12)
plt.ylim(0, 1)
plt.grid(True)
for band, color in zip(band_names, colors):
    plt.plot(srf_df['SR_WL'], srf_df[band], color=color, label=str(band))

plt.legend(loc='lower right', ncol=5)
plt.show()



## Build simulated Sentinel-2-like Look-Up-Table (LUT)

Here we will mimic Sentinel-2 surface reflectance using the PRO4SAIL RTM. For this test, we will assume fixed geometric and atmospheric conditions, using the values specified above for sun zenith angle (sza), sun azimuth angle (saa), sensor viewing zenith angle (vza), aerosol optical thickness (aot) and water vapour (wvp).

In [None]:
# if you want to save the generated LUT you can specify a path for lut_outfile
lut_outfile = None

# spectral range
wls_sim = np.arange(400, 2501)

# number of CPUs to use to perform simulations 
# (can change depending on number of CPUs in your computer)
njobs = 4

# generate LUT
rho_canopy_vec, params = inv.simulate_prosail_lut_parallel(
        njobs,
        params_orig,
        wls_sim,
        soil_spectrum,
        skyl=skyl,
        sza=sza,
        vza=vza,
        psi=0,
        srf=srf,
        outfile=lut_outfile,
        calc_FAPAR=False,
        reduce_4sail=True)

print('Done!')


# Training model using synthetic LUT

Here we will train a Random Forest (RF) regressor with [**Scikit-learn**](https://scikit-learn.org/stable/) from the synthetic LUT generated with the PROSAIL simulations, which relate Sentinel-2 bands with biophysical variables.

We train an individual RF model for each target variable (i.e. LAI, Cab, Car, Cm, Cw, Ant, Cbrown, leaf_angle).


In [None]:
# Approach 1: Train individual random forest model for each variable
print(f"Training {len(OBJ_PARAM_NAMES)} Random forests for "
      f"{','.join(OBJ_PARAM_NAMES)}")

# RF parameters
scikit_regressor_opts = {"n_estimators": 100,
                         "min_samples_leaf": 1,
                         "n_jobs": -1}

start_time = dt.datetime.today()
input_scalers = {}
output_scalers = {}
regs = {}
for i, param in enumerate(OBJ_PARAM_NAMES):
    reg, input_gauss_scaler, output_gauss_scaler, _ = \
        inv.train_reg(rho_canopy_vec,
                      params[param].reshape(-1, 1),
                      scaling_input='normalize',
                      scaling_output='normalize',
                      regressor_opts=scikit_regressor_opts,
                      reg_method="random_forest")

    input_scalers[param] = input_gauss_scaler
    output_scalers[param] = output_gauss_scaler
    regs[param] = reg
    
end_time_standard = dt.datetime.today() - start_time

print("\nProcessing time (Training):")
print(f"\t{len(OBJ_PARAM_NAMES)} Random forests: {end_time_standard}")
    
print('Done!')

# Testing model

We will now test the RF regression model using synthetic Sentinel-2-like reflectance to see how well the trained RF model can invert the RTM. In this case, we will add some random noise to the testing surface reflectance to better mimic the real conditions that occur with optical sensors. 
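
The two noise models mentioned here differ in how the noise standard deviation is set: relative noise scales with each reflectance value, while absolute noise is the same for all values. A minimal sketch, assuming made-up reflectances and using NumPy's `default_rng` rather than the notebook's own random calls:

```python
import numpy as np

rng = np.random.default_rng(42)
# Made-up reflectances (samples x bands)
rho = np.array([[0.03, 0.05, 0.04, 0.40],
                [0.06, 0.08, 0.10, 0.25]])

rel_unc = 0.10    # relative noise: standard deviation is 10 % of each value
abs_unc = 0.015   # absolute noise: same standard deviation for every value

# Multiplicative form: rho * (1 + e), with e ~ N(0, rel_unc)
rho_rel = rho * (1.0 + rng.normal(scale=rel_unc, size=rho.shape))
# Additive form: rho + e, with e ~ N(0, abs_unc)
rho_abs = rho + rng.normal(scale=abs_unc, size=rho.shape)

print(np.round(rho_rel, 3))
print(np.round(rho_abs, 3))
```

Relative noise leaves dark bands (e.g. visible bands over dense vegetation) almost untouched, whereas absolute noise can be large compared with their signal.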


In [None]:
# create synthetic Sentinel-2-like spectra
rho_canopy_test, params_test = inv.simulate_prosail_lut_parallel(
        njobs,
        params_orig,
        wls_sim,
        soil_spectrum,
        skyl=skyl,
        sza=sza,
        vza=vza,
        psi=0,
        srf=srf,
        outfile=None,
        calc_FAPAR=False,
        reduce_4sail=True)

# set how much noise to add (either relative or absolute)
rel_unc = 0.1
abs_unc = 0.015
RELATIVE_UNC = 1
ABSOLUTE_UNC = 0
noise_method = RELATIVE_UNC
# add noise to mimic real conditions
if noise_method == RELATIVE_UNC:
    # noise standard deviation proportional to each reflectance value
    stdev = rho_canopy_test * rel_unc
    white_noise = np.random.normal(scale=stdev, size=rho_canopy_test.shape)
else:
    # same noise standard deviation for all reflectance values
    white_noise = np.random.normal(scale=abs_unc, size=rho_canopy_test.shape)
rho_canopy_test = rho_canopy_test + white_noise
    
print('Done!')

# Evaluation of the retrievals of biophysical variables

Now we will apply the trained model on the testing dataset to see how well we can estimate the biophysical traits using this hybrid approach.
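
The evaluation below reports bias, RMSE and the correlation coefficient r for each trait. As a reference for how such metrics are typically computed, here is a small sketch. The function `simple_error_metrics` is hypothetical and the sign convention (estimated minus observed) is an assumption; the `dc.error_metrics` and `dc.agreement_metrics` helpers used in the notebook may follow different conventions.

```python
import numpy as np

def simple_error_metrics(observed, estimated):
    """Return (bias, RMSE, Pearson r) between two 1-D arrays."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    diff = estimated - observed          # assumed convention: estimated - observed
    bias = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    r = np.corrcoef(observed, estimated)[0, 1]
    return bias, rmse, r

observed = np.array([1.0, 2.0, 3.0, 4.0])
estimated = np.array([1.5, 2.5, 3.5, 4.5])
bias, rmse, r = simple_error_metrics(observed, estimated)
print(f"bias={bias:.2f}, RMSE={rmse:.2f}, r={r:.2f}")
```

In this made-up case every estimate is offset by the same amount, so r is 1 even though the bias and RMSE are both 0.5: correlation alone does not reveal a systematic error.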

In [None]:
figsize = 16 / 2.45, 22 / 2.45

start_time = dt.datetime.today()
# Apply individual RF model to test data
print("Applying individual RF regression model to Sentinel-like spectra")
output_regs = {}
for i, param in enumerate(OBJ_PARAM_NAMES):
    output = output_scalers[param].inverse_transform(
        regs[param].predict(
            input_scalers[param].transform(
                rho_canopy_test)).reshape(-1, 1)).reshape(-1)
    output_regs[param] = output

end_time_standard = dt.datetime.today() - start_time

print("\nProcessing time (Testing):")
print(f"\t{len(OBJ_PARAM_NAMES)} Random forests: {end_time_standard}")
print('\nPlotting scatter plots...')
#outfile = out_dir / f"evaluation_singleRF.eps"
fig, axs = plt.subplots(ncols=2, nrows=4,
                        figsize=figsize)

fig.supxlabel("Estimated")
fig.supylabel("Observed")

axs = axs.reshape(-1)
error_table = pd.DataFrame({"Trait" : [], "N": [], "bias": [],
                            "RMSE": [], "r": []})
for i, param in enumerate(OBJ_PARAM_NAMES):
    name, unit, decs = PARAM_PROPS[param]
    txt_template =  ("   N: {:>6d}\n"
                     "bias: {:>6.%sf}\n"
                     "RMSE: {:>6.%sf}\n"
                     "   r: {:>6.2f}")%(decs, decs)
    
    test = output_regs[param]
    cor, *_ = dc.agreement_metrics(params_test[param], test)
    bias, mae, rmse = dc.error_metrics(params_test[param], test)
    dc.density_plot(test, params_test[param], axs[i], s=1, rasterized=True)

    absline = np.asarray([[np.amin(params_test[param]), np.amax(params_test[param])],
                          [np.amin(params_test[param]), np.amax(params_test[param])]])

    axs[i].plot(absline[0], absline[1], "k:")
    axs[i].set_title(f"{name} ({unit})")
    axs[i].text(0.05,
                0.95,
                txt_template.format(len(test), bias, rmse, cor),
                va="top",
                fontfamily="monospace",
                transform=axs[i].transAxes)

    error_dict = {"Trait" : [param], "N": [len(test)], "bias": [bias],
                  "RMSE": [rmse], "r": [cor]}
    error_table = pd.concat([error_table, pd.DataFrame(error_dict)],
                            ignore_index=True)

plt.tight_layout()
plt.show()
print('Error metrics table:')
error_table

:::{note}
As shown, this hybrid approach makes it possible to train an empirical RF regressor on synthetic datasets of surface reflectance and biophysical variables simulated by a physically-based RTM. The trained model can then be easily applied to Sentinel-2 imagery and is much more computationally efficient than inverting an RTM using traditional approaches.
:::

# Apply RF model to real Sentinel-2 image

Let us now apply this trained model to a Sentinel-2 image to retrieve the different biophysical traits. 

We will again acquire the data from the Copernicus Data Space Ecosystem (CDSE).

:::{important}

In order to execute this notebook, you will need to register in the [**Copernicus Data Space Ecosystem (CDSE)**](https://dataspace.copernicus.eu/) to acquire and process Sentinel-2 imagery.
:::

## Connect to OpenEO Backend

In [None]:
connection = openeo.connect("https://openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()

### Visualize information of Sentinel-2 collection

We can get all the band and metadata information related to the Sentinel-2 L2A product with the *connection.describe_collection("SENTINEL2_L2A")* function.

In [None]:
connection.describe_collection("SENTINEL2_L2A")

# Load Sentinel-2 image using OpenEO

Choose the date and area of interest of the Sentinel-2 imagery.

:::{note}

By default, we will use the image acquired over the WES Almond orchard near the UAV overpass (April 16th 2024)
:::
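
The search parameters in the next cell boil down to a time window of a few days around the target date and a bounding box expressed as a west/south/east/north dictionary. A minimal standard-library sketch of that construction (the helper name `build_query` and its `half_window_days` argument are hypothetical):

```python
import datetime

def build_query(date, bbox, half_window_days=3):
    """Return ([start, end] as ISO date strings, bbox as a west/south/east/north dict)."""
    delta = datetime.timedelta(days=half_window_days)
    time_window = [str(date - delta), str(date + delta)]
    aoi = dict(zip(["west", "south", "east", "north"], bbox))
    return time_window, aoi

time_window, aoi = build_query(datetime.date(2024, 4, 16),
                               [-121.35, 37.45, -121.10, 37.65])
print(time_window)  # ['2024-04-13', '2024-04-19']
print(aoi)
```

A wider window increases the chance of finding a cloud-free observation, at the cost of the image being further in time from the date of interest.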

In [None]:

# Define search parameters
date = datetime.date(2024, 4, 16)
bbox = [-121.35, 37.45, -121.10, 37.65] # please insert a bbox here in the form of [minx, miny, maxx, maxy]
time_window = [
        str(date + relativedelta(days=-3)),
        str(date + relativedelta(days=+3)),
    ]
aoi = dict(zip(["west", "south", "east", "north"], bbox))

s2_ref_bands = [
        "B02",
        "B03",
        "B04",
        "B05",
        "B06",
        "B07",
        "B08",
        "B8A",
        "B11",
        "B12"
    ]

s2_meta_bands = ["SCL",
                 'WVP',
                 'AOT',
                 'sunAzimuthAngles',
                 'sunZenithAngles',
                 'viewZenithMean'
                ]


### Pre-process Sentinel-2 image
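
The cell below masks the data cube with the scene classification layer (SCL) and then keeps, for each pixel, the first valid observation in the time window. The same logic can be sketched locally with NumPy on a made-up stack of three dates. This is only an illustration of the idea, not how the openEO backend implements it.

```python
import numpy as np

# Made-up stack: 3 dates x 2 x 2 pixels of one reflectance band and its SCL class
refl = np.array([[[0.10, 0.20], [0.30, 0.40]],
                 [[0.11, 0.21], [0.31, 0.41]],
                 [[0.12, 0.22], [0.32, 0.42]]])
scl = np.array([[[9, 4], [8, 3]],
                [[4, 4], [5, 9]],
                [[5, 9], [4, 8]]])

# Keep only vegetation (4) and not-vegetated (5) pixels; everything else becomes NaN
keep = (scl == 4) | (scl == 5)
masked = np.where(keep, refl, np.nan)

# Index of the first date with a valid observation for each pixel
first_idx = np.argmax(keep, axis=0)
composite = np.take_along_axis(masked, first_idx[None, ...], axis=0)[0]
print(composite)
```

The bottom-right pixel is never classified as 4 or 5 in this example, so it stays NaN in the composite.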

In [None]:
# set up outfile
s2_dir = Path("./dataset/sentinel_imagery")
# make sure the output folder exists before downloading
s2_dir.mkdir(parents=True, exist_ok=True)
s2_outfile = s2_dir / "s2_cube_bio.nc"

overwrite = False

if s2_outfile.exists() and not overwrite:
    print(f'{s2_outfile} already exists..')
else:
    print('Loading S2 data cube from Copernicus Data Space Ecosystem..')
    # Load Sentinel-2 reflectance and metadata bands as a single cube
    s2_cube = connection.load_collection(
        "SENTINEL2_L2A", spatial_extent=aoi, temporal_extent=time_window, bands=s2_ref_bands+s2_meta_bands)
    
    print('Mask out pixels that are not vegetation or bare soil...')
    # Mask using the scene classification layer (SCL): keep only class 4 (vegetation) and class 5 (not vegetated/bare soil)
    mask = ~((s2_cube.band("SCL") == 4) | (s2_cube.band("SCL") == 5))
    s2_masked = s2_cube.mask(mask)
    
    print('Select best available pixel from time window')
    # Reduce time dimension by selecting the first valid observation
    s2_best_pixel = s2_masked.reduce_dimension(dimension="t", reducer="first")
    
    print(f'Saving s2 cube as {s2_outfile} .. ')
    s2_best_pixel.download(str(s2_outfile))

print(f'Loading {str(s2_outfile)} as xarray object')
s2_cube =  xr.open_dataset(str(s2_outfile))

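# get geographic metadata 
x_utm = s2_cube['B02']['x']
y_utm = s2_cube['B02']['y']
# Pixel size (dy is negative when y decreases from north to south)
dx = float(x_utm[1] - x_utm[0])
dy = float(y_utm[1] - y_utm[0])

# Top-left corner: x/y are assumed to be pixel-centre coordinates,
# so shift by half a pixel to get the outer corner expected by GDAL
x_min = float(s2_cube['x'].min()) - abs(dx) / 2
y_max = float(s2_cube['y'].max()) + abs(dy) / 2
# geotransform
gt = (x_min, dx, 0.0, y_max, 0.0, dy)
# projection
prj =  s2_cube.crs.spatial_ref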


print(f'Extracting {S2_BANDS} as 3D array cube')
# These are the Sentinel-2 bands to use for the RTM inversion
s2_xarray = s2_cube[S2_BANDS].to_array(dim="band").rio.write_crs(rasterio.crs.CRS.from_string(prj).to_string())
s2_ar = s2_xarray.values/10000
# get metadata and store in dictionary
meta_dict = {}
for var in s2_meta_bands:
    print(f'Extracting {var} as array')
    var_ar = s2_cube[[var]].to_array(dim="band").values[0]
    meta_dict[var] = var_ar
    



# Build database based on geometric conditions
Now, we will train the model based on the actual geometric and atmospheric conditions during the Sentinel-2 overpass. For this, we will again build the LUT using PROSAIL, this time with the metadata provided in the Sentinel-2 image.  

In [None]:
# get sun/viewing angles
sza = np.nanmean(meta_dict['sunZenithAngles'])
saa = np.nanmean(meta_dict['sunAzimuthAngles'])
vza = np.nanmean(meta_dict['viewZenithMean'])
# get aerosol optical thickness and water vapour
aot = np.nanmean(meta_dict['AOT'])/1000
wvp = np.nanmean(meta_dict['WVP'])/1000

date_obj = dt.datetime(2024, 4, 16, 10, 30)

print("Running 6S for estimation of diffuse/direct irradiance")
skyl = get_diffuse_radiation_6S(aot, wvp, sza, saa, date_obj,
                                altitude=0.1)

print(f"Building soil spectra for {np.size(params_orig['bs'])} PROSPECT-D + 4SAIL simulations")
soil_spectrum = build_soil_database(params_orig["bs"])

# spectral range
wls_sim = np.arange(400, 2501)

# number of CPUs to use to perform simulations
njobs = 4
# generate LUT
rho_canopy_vec, params = inv.simulate_prosail_lut_parallel(
        njobs,
        params_orig,
        wls_sim,
        soil_spectrum,
        skyl=skyl,
        sza=sza,
        vza=vza,
        psi=0,
        srf=srf,
        outfile=lut_outfile,
        calc_FAPAR=False,
        reduce_4sail=True)
print('Done!')

# Train RF model
As before, we will now train the RF algorithm based on the simulated LUT. 

In [None]:
print(f"Training Random forest for {','.join(OBJ_PARAM_NAMES)}")
input_scalers = {}
output_scalers = {}
regs = {}
for i, param in enumerate(OBJ_PARAM_NAMES):
    reg, input_gauss_scaler, output_gauss_scaler, _ = \
        inv.train_reg(rho_canopy_vec, params[param].reshape(-1, 1),
                      scaling_input="normalize", scaling_output="normalize",
                      regressor_opts=scikit_regressor_opts,
                      reg_method="random_forest")

    input_scalers[param] = input_gauss_scaler
    output_scalers[param] = output_gauss_scaler
    regs[param] = reg

print('Done!')

# Apply model on S2 

Now, let us apply the trained RF model on the Sentinel-2 imagery
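
Applying a regressor to an image follows a fixed pattern: flatten the cube to a (pixels, bands) table, predict only on valid pixels, fill the rest with NaN, clip to the physical bounds and reshape back to the image grid. The sketch below shows that pattern under stated assumptions: `apply_pixelwise` is a hypothetical helper, and `toy_predict` is a made-up linear function standing in for the trained Random Forest and its scalers.

```python
import numpy as np

def apply_pixelwise(cube, valid, predict, bounds):
    """cube: (bands, rows, cols); valid: (rows, cols) bool; predict: (n, bands) -> (n,)."""
    n_bands, rows, cols = cube.shape
    pixels = cube.reshape(n_bands, -1).T      # one row per pixel, one column per band
    flat_valid = valid.ravel()
    out = np.full(flat_valid.size, np.nan)    # invalid pixels stay NaN
    if flat_valid.any():
        out[flat_valid] = predict(pixels[flat_valid])
    return np.clip(out, *bounds).reshape(rows, cols)

def toy_predict(x):
    """Stand-in model: a made-up linear function of the band mean."""
    return 10.0 * x.mean(axis=1)

cube = np.arange(12, dtype=float).reshape(3, 2, 2) / 10.0
valid = np.array([[True, False], [True, True]])
lai_map = apply_pixelwise(cube, valid, toy_predict, bounds=(0.0, 5.0))
print(lai_map)
```

Predictions above the upper bound are clipped to it, and the masked pixel remains NaN because `np.clip` propagates NaN values.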

In [None]:
# get 2D dimensions of array
dims = s2_ar[0,:,:].shape

# only select vegetation/soil pixels
valid = np.logical_or(meta_dict['SCL'] == 4, meta_dict['SCL'] == 5)
valid = np.ravel(valid)
image_array = s2_ar.reshape((s2_ar.shape[0], -1)).T
image_array = image_array[valid]
bio_dict = {}
for i, param in enumerate(OBJ_PARAM_NAMES):
    output = np.full(valid.size, np.nan)
    print(f"Applying {param} model to S2 image reflectance array")
    if np.any(valid):
        output[valid] = output_scalers[param].inverse_transform(
            regs[param].predict(input_scalers[param].transform(
                image_array)).reshape(-1, 1)).reshape(-1)
    
    output = output.reshape(dims)
    
    if param == 'fAPAR' or param == 'fIPAR':
        min_value = 0
        max_value = 1
    else:
        # clip to the same bounds that were used to build the LUT
        min_value = prosail_bounds[param][0]
        max_value = prosail_bounds[param][1]
    
    output = np.clip(output, min_value, max_value)
    # save to dictionary
    bio_dict[param] = output
    output_name = f"S2_{param}_{date.strftime('%Y%m%d')}.tif"
    output_file = s2_dir / output_name
    print(f"Saving {param} in {output_file}\n")
    gu.save_image(output, gt, prj, output_file)
    
    del output

print('Done!')

# Visualize retrieved biophysical outputs 
:::{note}
You can also visualize the retrieved biophysical products in QGIS. The rasters should be saved in *"./dataset/sentinel_imagery"*
:::
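
For the saved rasters to line up with other layers in a GIS, their georeferencing must be right. A GDAL-style geotransform refers to the outer corner of the top-left pixel, whereas coordinate arrays read from a netCDF file usually label pixel centres. The sketch below shows the half-pixel conversion, assuming regularly spaced coordinates; `geotransform_from_centres` is a hypothetical helper and the coordinates are made up.

```python
import numpy as np

def geotransform_from_centres(x, y):
    """Build a GDAL-style geotransform from 1-D pixel-centre coordinates."""
    dx = float(x[1] - x[0])
    dy = float(y[1] - y[0])   # negative when y decreases from north to south
    x_origin = float(x[0]) - dx / 2.0
    y_origin = float(y[0]) - dy / 2.0
    return (x_origin, dx, 0.0, y_origin, 0.0, dy)

# Made-up 10 m grid: x increases eastwards, y decreases southwards
x = np.array([500005.0, 500015.0, 500025.0])
y = np.array([4100095.0, 4100085.0, 4100075.0])
gt = geotransform_from_centres(x, y)
print(gt)  # (500000.0, 10.0, 0.0, 4100100.0, 0.0, -10.0)
```

Skipping the half-pixel shift displaces the raster by half a pixel, which is easy to miss visually but matters when comparing against field plots or higher-resolution imagery.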

In [None]:
# get extent [minx, maxx, miny, maxy] of scene
te = [float(s2_cube['x'].min()), float(s2_cube['x'].max()), float(s2_cube['y'].min()), float(s2_cube['y'].max())]

# visualizing outputs 
variables = ['LAI', 'Cab', 'Cw']

fig, axes = plt.subplots(1,3, figsize=(12, 6))
for i,var in enumerate(variables):
    name, unit, _ = PARAM_PROPS[var]
    range_lim = prosail_bounds[var]
    
    if var == 'LAI':
        cmap = 'YlGn'
    elif var == 'Cab':
        cmap = 'PiYG'
    else:
        cmap = 'BrBG'
    
    # entire ROI
    ax = axes[i]
    ar = bio_dict[var]
    
    im = ax.imshow(ar, vmin=range_lim[0], vmax=range_lim[1], cmap=cmap, extent = te)
    ax.set_title(f'{name}', fontsize=14)
    # Add colorbar 
    cbar = fig.colorbar(im, ax=ax, shrink=0.95, orientation='horizontal')
    cbar.set_label(f'{var} ({unit})', fontsize=12)  # Add title to colorbar

plt.tight_layout()
plt.show()

# Conclusions
- The hybrid RTM approach is an effective method that combines physically-based modeling with machine learning algorithms
- Since the calibration is performed with a synthetic dataset, no in-situ data is required, making it globally applicable

:::{warning}

The effectiveness of these methods also depends on the assumptions made in the PRO4SAIL model, which assumes a horizontally and vertically homogeneous turbid vegetation layer. These methods tend to work relatively well in structurally homogeneous vegetation such as herbaceous crops/vegetation, but uncertainties may be greater in complex agro-forestry systems, which have more heterogeneous characteristics (e.g. clumping, multiple vegetation layers, senescent vegetation). 
:::

:::{note}
Please feel free to comment with any thoughts. This is a work in progress!
:::