# INTEGRATED USE OF MULTISOURCE REMOTE SENSING DATA FOR NATIONAL SCALE AGRICULTURAL DROUGHT MONITORING IN KENYA
# ADM-Kenya Workshop, Kenya, May 2024

# Land Surface Temperature (LST) downscaling

This Python script downscales 1-km Sentinel-3 LST images to 20-m products. The procedure uses Sentinel-2 spectral indices as predictors in a random forest model to estimate 20-m LST values.
The input 1-km LST images were preprocessed beforehand using the approach explained in the Drought Model section.


# Tools and Libraries

We will use the following libraries:

    NumPy: A fundamental package for scientific computing in Python.
    scikit-learn: A library for machine learning, including models for classification and regression.
    Rasterio: A library for reading and writing geospatial raster datasets.
    GDAL: A library for working with geospatial data.
    datetime: A standard-library module that supplies classes for manipulating dates and times.
    GeoPandas: A library that extends the data types used by pandas to allow spatial operations on geometric types.
    OpenCV (cv2): A library for working with images in Python. Here, it is used for resampling rasters.
    os: A standard-library module for working with files and folders. Here, it is used to list the files in a folder.

In [1]:
import numpy as np
# import matplotlib.pyplot as plt
import rasterio
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error, mean_absolute_error, accuracy_score, r2_score
from osgeo import gdal  # GDAL's Python bindings live in the osgeo package
from rasterio.transform import from_origin
from rasterio.mask import mask
import geopandas as gpd
from datetime import datetime
import os
import cv2

In this project, the downscaling of Sentinel-3 SLSTR LST data was performed as follows:

<img src="LST_Fig_1.PNG">

Before moving on to the next steps, set the paths to the LST images (lst_path), the Sentinel-2 vegetation/spectral indices (vi_path), and the folder where the downscaled LST will be saved (output_path).

In [2]:
# Your directory drive:

drive = "D"

In [5]:
lst_path = fr"{drive}:\ADM_Workshop_CCM_ET\LST\LST_1km"  # Change this to the path of your folder
#H:\ADM-Kenya\Results\LST\LST_Busia_S2_1km
vi_path = fr"{drive}:\ADM_Workshop_CCM_ET\LST\VI_20m"  # Change this to the path of your folder

output_path = fr"{drive}:\ADM_Workshop_CCM_ET\LST\LST_20m"  # Change this to the path of your folder

LST_files = [file for file in os.listdir(lst_path) if file.endswith(".tif")]
VI_files = [file for file in os.listdir(vi_path) if file.endswith(".tif")]
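
The same file listing can also be written with `pathlib`, which fails early when a folder is missing and returns the names in a fixed order. This is only a sketch: the helper name `list_tifs` is hypothetical and not part of the workshop script.

```python
from pathlib import Path


def list_tifs(folder):
    """Return the sorted names of the .tif files in `folder`.

    `list_tifs` is a hypothetical helper, not part of the original script.
    """
    folder = Path(folder)
    if not folder.is_dir():
        # Fail early with a clear message instead of a later rasterio error
        raise FileNotFoundError(f"Folder not found: {folder}")
    # Sorting makes the processing order reproducible across machines
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".tif"
    )
```

For example, `LST_files = list_tifs(lst_path)` could stand in for the list comprehension above. Unlike the `endswith(".tif")` test, this version also accepts an upper-case `.TIF` extension.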

In [6]:
# Open the first file of each folder to check that the paths are valid
LST_1km = rasterio.open(os.path.join(lst_path, LST_files[0]))
VI_20m = rasterio.open(os.path.join(vi_path, VI_files[0]))

LST_1km, VI_20m

(<open DatasetReader name='D:/ADM_Workshop_CCM_ET/LST/LST_1km/clipped_20210201.tif' mode='r'>,
 <open DatasetReader name='D:/ADM_Workshop_CCM_ET/LST/VI_20m/mVI_Image_2021-02-05_clipped.tif' mode='r'>)
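
The file names shown above carry the acquisition date (`clipped_YYYYMMDD.tif` for LST, `mVI_Image_YYYY-MM-DD_clipped.tif` for the indices). The main loop uses these dates to pair each LST image with the nearest Sentinel-2 mosaic that is at most 5 days away. A minimal sketch of that pairing, assuming exactly these naming patterns; the function names `parse_lst_date`, `parse_vi_date`, and `closest_vi` are hypothetical:

```python
from datetime import datetime


def parse_lst_date(name):
    # "clipped_20210201.tif" -> datetime(2021, 2, 1)
    return datetime.strptime(name.split("_")[1].split(".")[0], "%Y%m%d")


def parse_vi_date(name):
    # "mVI_Image_2021-02-05_clipped.tif" -> datetime(2021, 2, 5)
    return datetime.strptime(name.split("_")[2], "%Y-%m-%d")


def closest_vi(lst_name, vi_names, max_days=5):
    """Return (vi_name, gap_in_days), or (None, None) if no file is close enough."""
    lst_date = parse_lst_date(lst_name)
    # Pick the VI file with the smallest absolute date difference
    best = min(vi_names, key=lambda n: abs(parse_vi_date(n) - lst_date), default=None)
    if best is None:
        return None, None
    gap = abs(parse_vi_date(best) - lst_date).days
    return (best, gap) if gap <= max_days else (None, None)


vi_names = ["mVI_Image_2021-02-05_clipped.tif", "mVI_Image_2021-02-10_clipped.tif"]
print(closest_vi("clipped_20210201.tif", vi_names))
```

When two VI files are equally close, `min` keeps the first candidate, whereas the loop in the main cell (which compares with `<=`) keeps the last one.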

<img src="LST_Fig_2.PNG">

After masking cloud-covered pixels, the 1-km daily LST images were prepared as the target of a machine learning model (Equation 1). The predictors are five spectral indices (NDVI, NDMI, GNDVI, EVI, and NMDI) generated from Sentinel-2 images at 20-m spatial resolution and aggregated to 1 km. The model input therefore consists of 1-km LST and 1-km Sentinel-2 indices. A random forest regressor is first trained on the 1-km data to learn the relationship between LST and the indices. The trained model is then applied to the 20-m cropland pixels of the Sentinel-2 index mosaic to predict 20-m LST values. Finally, the residuals of the 1-km model are resampled to 20 m and added to the predictions as a correction. The following equations indicate the procedure of LST downscaling, where $f$ is the random forest model, and $LST^{p}_{1km}$ and $LST^{p}_{20m}$ are the predicted LST values at 1 km and 20 m.
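
Written out from the code in the next cell, the procedure can be summarised as below. This is a reconstruction and not necessarily the original numbered equations: $VI_{1km}$ and $VI_{20m}$ (the five indices at each resolution), $\Delta_{1km}$ and $\Delta_{20m}$ (the residuals), and $LST^{c}_{20m}$ (the corrected output, `lst_c_20m` in the code) are notation assumed here.

```latex
\begin{aligned}
LST^{p}_{1km} &= f\left(VI_{1km}\right) \\
\Delta_{1km} &= LST_{1km} - LST^{p}_{1km} \\
LST^{p}_{20m} &= f\left(VI_{20m}\right) \\
LST^{c}_{20m} &= LST^{p}_{20m} + \Delta_{20m}
\end{aligned}
```

Here $\Delta_{20m}$ is $\Delta_{1km}$ resampled to the 20-m grid, which the code does with nearest-neighbour interpolation.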

In [7]:
for v in range(0, len(LST_files)):
    # Extract the date from the current LST file name (clipped_YYYYMMDD.tif)
    first_lst_date_string = LST_files[v].split("_")[1].split(".")[0]  # Remove the file extension
    first_lst_date = datetime.strptime(first_lst_date_string, "%Y%m%d")

    closest_vi_file = None
    min_date_difference = float('inf')  # Initialize with a large value

    # Iterate through VI files
    for vi_file in VI_files:
        # Extract the date from the VI file
        date_string_with_extension = vi_file.split("_")[2].split(".")[0]  # Remove the file extension
        vi_date = datetime.strptime(date_string_with_extension, "%Y-%m-%d")

        # Find the difference between the first LST date and the current VI date
        date_difference = abs(first_lst_date - vi_date).days

        # Check if the difference is smaller than the current minimum
        if date_difference <= min_date_difference:
            closest_vi_file = vi_file
            min_date_difference = date_difference

    # Check if the closest VI file has a difference less than or equal to 5 days
    if min_date_difference <= 5:
        print(f"First LST file {LST_files[v]} - OK. Closest VI file: {closest_vi_file}. Difference: {min_date_difference} days.")
        
        LST_1km = rasterio.open(fr'{lst_path}\\{LST_files[v]}')
        VI_20m = rasterio.open(fr'{vi_path}\\{closest_vi_file}')
        
        dataset = gdal.Open(fr'{lst_path}\\{LST_files[v]}')
        bands_lst = dataset.GetRasterBand(1)

        print('Number of Bands = ', dataset.RasterCount, '\nNumber of Horizontal Pixels = ', dataset.RasterXSize,  
              '\nNumber of Vertical Pixels = ', dataset.RasterYSize, '\nBands: ', bands_lst)
        
        lst_1km = LST_1km.read(1)
        lst_meta = LST_1km.meta  # raster metadata (CRS, transform, dtype)
        # ndvi_1km = NDVI_1km.read(1)
        vi_20m = VI_20m.read()  # all index bands, shape (bands, rows, cols)

        lst_1km_shape = lst_1km.shape
        num_bands_VI = vi_20m.shape[0]
        print(num_bands_VI)


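        new_width = lst_1km_shape[1]
        new_height = lst_1km_shape[0]
        # Number of 20-m pixels along one side of a 1-km pixel (nominally 50)
        filter_size = int(vi_20m.shape[2] / new_width)

        # Aggregate the 20-m indices to the 1-km grid by block averaging, ignoring NaN values
        VI_20m_upscaled_full_2 = np.full((num_bands_VI, new_height, new_width), np.nan)

        for band in range(num_bands_VI):
            # "+ 1" keeps the last full block when the raster size is an exact multiple of filter_size
            for i in range(0, vi_20m.shape[1] - filter_size + 1, filter_size):
                for j in range(0, vi_20m.shape[2] - filter_size + 1, filter_size):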
                    # Calculate the indices for the target array
                    target_i = i // filter_size
                    target_j = j // filter_size

                    if target_i < new_height and target_j < new_width:
                        block = vi_20m[band, i:i+filter_size, j:j+filter_size]
                        VI_20m_upscaled_full_2[band, target_i, target_j] = np.nanmean(block)



        VI_20m_upscaled_full_2.shape


        # Check for NaN values in the slice
        has_nan = np.isnan(VI_20m_upscaled_full_2[0,:,:])

        # Print the result
#         print(has_nan)    

        lst_1km = lst_1km.astype(float)
        # Treat the nodata value (-9999) and temperatures below 273 K (0 °C) as missing
        lst_1km[lst_1km == -9999] = np.nan
        lst_1km[lst_1km < 273] = np.nan
        lst_1km.shape, np.nanmax(lst_1km), np.nanmean(lst_1km)


        nan_mask_lst = np.isnan(lst_1km)

        # Mask 1-km pixels where any of the aggregated indices is NaN
        nan_mask_x_all = np.isnan(VI_20m_upscaled_full_2).any(axis=0)

        # Combine the masks: a pixel is excluded if either the LST or the indices are NaN
        final_nan_mask = np.logical_or(nan_mask_lst, nan_mask_x_all)

        # Apply the final mask to both arrays
        lst_1km[final_nan_mask] = np.nan
        VI_20m_upscaled_full_2[:, final_nan_mask] = np.nan

        lst_1km.shape, VI_20m_upscaled_full_2.shape


        count_non_nan = np.count_nonzero(~np.isnan(VI_20m_upscaled_full_2[1,:,:]))
#         print(count_non_nan)


        # Flatten the 2D arrays to 1D while ignoring NaN values
        y = lst_1km[~final_nan_mask].flatten()
#         print(y.shape)
        # Flatten the 2D arrays to 1D while ignoring NaN values for X
        X_all_bands_flat = VI_20m_upscaled_full_2[:, ~final_nan_mask].T  # Transpose for proper shape

#         print(X_all_bands_flat.shape)


        # Create the regression model
        regression_model_rf = RandomForestRegressor(n_estimators=300, n_jobs=-1)


        # Fit the model
        regression_model_rf.fit(X_all_bands_flat, y)


        # Predict at 1 km and keep the residuals for the correction applied at 20 m
        predictions_1km_rf = regression_model_rf.predict(X_all_bands_flat)
        residuals_rf = y - predictions_1km_rf

        residuals_rf.shape


        VI_20m_upscaled_full_2.shape, lst_1km.shape, np.nanmax(lst_1km)


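        vi_1 = VI_20m.read(1)
        # Resample the 1-km LST to the 20-m grid (nearest neighbour); it is used only to build the 20-m NaN mask
        lst_dwscaled = cv2.resize(lst_1km, (vi_1.shape[1], vi_1.shape[0]), interpolation=cv2.INTER_NEAREST)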


        nan_mask_lst_dws = np.isnan(lst_dwscaled)

        # Mask 20-m pixels where any of the indices is NaN
        nan_mask_x_all_20m = np.isnan(vi_20m).any(axis=0)

        # Combine the masks: a pixel is excluded if either the resampled LST or the indices are NaN
        final_nan_mask_20m = np.logical_or(nan_mask_lst_dws, nan_mask_x_all_20m)

        # Apply the final mask to both arrays
        lst_dwscaled[final_nan_mask_20m] = np.nan
        vi_20m[:, final_nan_mask_20m] = np.nan

#         lst_dwscaled.shape, vi_20m.shape


        vi_20m_flat = vi_20m[:, ~final_nan_mask_20m].T  # Transpose for proper shape
#         vi_20m_flat.shape

        downscaled_lst_20m = regression_model_rf.predict(vi_20m_flat) #X20m_valid_all_bands


#         downscaled_lst_20m.shape

        # Output array on the 20-m grid, initialised to NaN
        downscaled_lst_nan = np.full_like(vi_1, np.nan)

        # Apply the mask to downscaled_lst_20m
        downscaled_lst_nan[~final_nan_mask_20m.reshape(vi_1.shape)] = downscaled_lst_20m

#         downscaled_lst_nan.shape, downscaled_lst_20m.shape


#         final_nan_mask.shape


        # Put the residuals back on the 1-km grid (NaN where pixels were masked)
        res_lst_nan = np.empty_like(lst_1km)
        res_lst_nan[:] = np.nan
        res_lst_nan[~final_nan_mask.reshape(lst_1km.shape)] = residuals_rf


        # Resample the residuals to the 20-m grid (nearest neighbour)
        res_dwscaled = cv2.resize(res_lst_nan, (vi_1.shape[1], vi_1.shape[0]), interpolation=cv2.INTER_NEAREST)


        # Residual-corrected 20-m LST
        lst_c_20m = downscaled_lst_nan + res_dwscaled


        # Define the output file path

        output_file = fr'{output_path}\\{LST_files[v]}'
        #output_file = r'H:\ADM-Kenya\Results\LST\Arid_aoi\March2022\LST_20m_310322_corrected.tif'
        # Get the metadata from the 20-m index raster
        vi_metadata = VI_20m.meta

        # Metadata for the output GeoTIFF; it must describe the array that is written (lst_c_20m)
        dst_metadata = {
            'driver': 'GTiff',
            'dtype': lst_c_20m.dtype.name,
            'nodata': np.nan,
            'width': lst_c_20m.shape[1],
            'height': lst_c_20m.shape[0],
            'count': 1,  # single-band GeoTIFF
            'crs': vi_metadata['crs'],
            'transform': vi_metadata['transform']
        }

        # Save the corrected LST as a GeoTIFF
        with rasterio.open(output_file, 'w', **dst_metadata) as dst:
            dst.write(lst_c_20m, 1)
    else:
        print(f"First LST file {LST_files[v]} - No VI file found with a difference less than or equal to 5 days.")
        # Continue to the next LST file
        continue


First LST file clipped_20210201.tif - OK. Closest VI file: mVI_Image_2021-02-05_clipped.tif. Difference: 4 days.
Number of Bands =  1 
Number of Horizontal Pixels =  50 
Number of Vertical Pixels =  71 
Bands:  <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x000002088EBEE450> >
5




First LST file clipped_20210203.tif - OK. Closest VI file: mVI_Image_2021-02-05_clipped.tif. Difference: 2 days.
Number of Bands =  1 
Number of Horizontal Pixels =  51 
Number of Vertical Pixels =  72 
Bands:  <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x00000208FB88E5A0> >
5




First LST file clipped_20210204.tif - OK. Closest VI file: mVI_Image_2021-02-05_clipped.tif. Difference: 1 days.
Number of Bands =  1 
Number of Horizontal Pixels =  50 
Number of Vertical Pixels =  72 
Bands:  <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x000002088EBEE780> >
5




First LST file clipped_20210207.tif - OK. Closest VI file: mVI_Image_2021-02-05_clipped.tif. Difference: 2 days.
Number of Bands =  1 
Number of Horizontal Pixels =  51 
Number of Vertical Pixels =  71 
Bands:  <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x000002088FBE4BA0> >
5




First LST file clipped_20210208.tif - OK. Closest VI file: mVI_Image_2021-02-10_clipped.tif. Difference: 2 days.
Number of Bands =  1 
Number of Horizontal Pixels =  50 
Number of Vertical Pixels =  72 
Bands:  <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x000002088FBE15D0> >
5




First LST file clipped_20210212.tif - OK. Closest VI file: mVI_Image_2021-02-10_clipped.tif. Difference: 2 days.
Number of Bands =  1 
Number of Horizontal Pixels =  50 
Number of Vertical Pixels =  71 
Bands:  <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x00000208B0426D50> >
5




First LST file clipped_20210215.tif - OK. Closest VI file: mVI_Image_2021-02-15_clipped.tif. Difference: 0 days.
Number of Bands =  1 
Number of Horizontal Pixels =  51 
Number of Vertical Pixels =  71 
Bands:  <osgeo.gdal.Band; proxy of <Swig Object of type 'GDALRasterBandShadow *' at 0x00000208B04260F0> >
5




<img src="LST_Fig_3.PNG">
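
To try the residual-corrected downscaling without any raster files, the steps of the loop above can be run on synthetic arrays. The sketch below is an illustration under stated assumptions, not the workshop script: the arrays are random, `block_nanmean` is a hypothetical helper that replaces the triple loop with a reshape, `np.kron` stands in for the nearest-neighbour `cv2.resize`, and the forest uses fewer trees than the 300 in the script.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def block_nanmean(fine, factor):
    """Average (bands, H, W) over non-overlapping factor x factor blocks, ignoring NaN.

    Hypothetical helper; H and W must be exact multiples of `factor`.
    """
    bands, h, w = fine.shape
    blocks = fine.reshape(bands, h // factor, factor, w // factor, factor)
    return np.nanmean(blocks, axis=(2, 4))


rng = np.random.default_rng(0)
factor = 5  # fine pixels per coarse pixel side (1 km / 20 m = 50 for the real data)
vi_fine = rng.uniform(0.0, 1.0, size=(5, 40, 40))  # 5 synthetic indices on the fine grid
# Synthetic "true" fine LST in kelvin, cooler where the first index is high
lst_fine_true = 310.0 - 15.0 * vi_fine[0] + 3.0 * vi_fine[1]

# Coarse inputs: block averages of the fine data
vi_coarse = block_nanmean(vi_fine, factor)                  # shape (5, 8, 8)
lst_coarse = block_nanmean(lst_fine_true[None], factor)[0]  # shape (8, 8)

# 1) Train at coarse resolution: one row per coarse pixel, one column per index
X_coarse = vi_coarse.reshape(5, -1).T
y_coarse = lst_coarse.ravel()
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_coarse, y_coarse)

# 2) Residuals of the coarse prediction, back on the coarse grid
residual_coarse = (y_coarse - model.predict(X_coarse)).reshape(lst_coarse.shape)

# 3) Predict on the fine grid with the same model
lst_fine_pred = model.predict(vi_fine.reshape(5, -1).T).reshape(40, 40)

# 4) Repeat each coarse residual over its block (nearest neighbour) and add it back
residual_fine = np.kron(residual_coarse, np.ones((factor, factor)))
lst_fine_corrected = lst_fine_pred + residual_fine

print(lst_fine_corrected.shape)
```

A random forest regressor averages training targets, so its predictions stay within the range of the 1-km LST values it was trained on. Fine-scale extremes are therefore damped in the 20-m prediction, which is worth keeping in mind when interpreting the output.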