## Sharpening ECOSTRESS LST using Sentinel-2 VSWIR products at High Resolution (<20m) including the downloading of ECOSTRESS granules

Notebook : Quentin Dehaene, mentored by Glynn Hulley    
Original Python Implementation : [Radoslaw Guzinski](https://github.com/radosuav/pyDMS)  
Original Implementation : [Gao et al.](https://doi.org/10.3390/rs4113287) 

Points of contact : quentindehaene@gmail.com and glynn.hulley@jpl.nasa.gov

### How does the sharpening algorithm, pyDMS, work?

We'll call the ECOSTRESS LST at native 70m **LST-LR** and we'll refer to the Sentinel-2 reflectance multiband raster at 20m as **S2-HR**.

The first step is to **reproject and subset LST-LR to the S2-HR Coordinate Reference System (CRS) and extent**. The result, and every intermediate image, is therefore in the S2-HR CRS and shares its extent, even if that means padding a smaller LST-LR with no-data values. Conversely, any image larger than S2-HR is clipped to its extent.

The second step consists of **resampling S2-HR to the coarse resolution of LST-LR** (producing a temporary file). In the process, we compute the **homogeneity inside each coarse resolution pixel**.

$$
c_v = \left( \frac{1}{n} \right) \sum_{i=1}^{n} \left( \frac{\sigma_{i}}{\mu_{i}} \right)
$$

where *i* represents the spectral band, *n* is the total number of bands, and $\mu$ and $\sigma$ are the mean and standard deviation of the fine resolution reflectances within a coarse resolution pixel.

The $c_v$ is the **coefficient of variation** of the pixel; the closer it is to 0, the purer the pixel. It is used as **weights in our training**: the most homogeneous pixels are weighted more and the least homogeneous are discarded (with a threshold set at $c_v = 0.2$), which usually leaves around 80% of the pixels available for training.
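As an illustration of the formula above, here is a minimal NumPy sketch of the coefficient of variation. It is not the pyDMS code: the function name and the integer `factor` (fine pixels per coarse pixel along each axis) are assumptions made for the example, whereas the real 70 m / 20 m grids do not nest this neatly.

```python
import numpy as np

def coefficient_of_variation(hr, factor):
    """Mean over bands of sigma/mu inside each coarse pixel.

    hr     : array (bands, rows, cols) of fine-resolution reflectances
    factor : fine pixels per coarse pixel along each axis (assumed integer)
    """
    bands, rows, cols = hr.shape
    # Group the fine pixels into (factor x factor) blocks, one per coarse pixel
    blocks = hr.reshape(bands, rows // factor, factor, cols // factor, factor)
    mu = blocks.mean(axis=(2, 4))
    sigma = blocks.std(axis=(2, 4))
    # Average the per-band ratio over all bands
    return (sigma / mu).mean(axis=0)

# Toy 2-band image: 4 x 4 fine pixels aggregated into 2 x 2 coarse pixels
hr = np.ones((2, 4, 4))
hr[:, :2, :2] = [[0.1, 0.3], [0.3, 0.1]]  # heterogeneous top-left coarse pixel
cv = coefficient_of_variation(hr, factor=2)
keep_for_training = cv < 0.2              # threshold quoted in the text
```

In this toy case only the top-left coarse pixel is heterogeneous, so it is the only one excluded from training.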

Using this **resampled Sentinel-2 reflectance (S2-LR)** and **LST-LR**, we can train the regression model.

**Some more details about the model:**

We are training a **bagged ensemble of regression trees**. Each tree is slightly more complicated than the usual random forest regression tree. The principle of the tree is classic: each node is built to minimize the **mean squared error (MSE)** of the subsamples on each side of the split, until we reach a satisfying tree depth or the minimum number of samples per leaf. The difference lies in how the target value of each leaf is determined: instead of assigning the average of the *y* values (i.e., the LST) to every sample in the leaf, we fit a **Bayesian linear regression**. This means that, for all samples in the same leaf, the target values obey the same linear function of the features (the reflectances). So each sample's predicted value depends on the actual values of its features.
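To make the "linear model per leaf" idea concrete, here is a deliberately simplified sketch. It is not the pyDMS implementation: it uses a single feature, a single depth-1 tree (no bagging), and ordinary least squares in place of the Bayesian linear regression. All function names are hypothetical.

```python
import numpy as np

def fit_leaf(x, y):
    """Least-squares line y = a*x + b for the samples of one leaf."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def fit_stump(x, y):
    """Depth-1 tree: choose the threshold minimising the squared error of the
    two children (scored with their means, as in a classic tree), then fit
    one linear model per leaf."""
    best_t, best_err = None, np.inf
    for t in np.unique(x)[1:]:
        left, right = y[x < t], y[x >= t]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_t, best_err = t, err
    left_model = fit_leaf(x[x < best_t], y[x < best_t])
    right_model = fit_leaf(x[x >= best_t], y[x >= best_t])
    return best_t, left_model, right_model

def predict(model, x):
    t, (a_l, b_l), (a_r, b_r) = model
    return np.where(x < t, a_l * x + b_l, a_r * x + b_r)

# Toy data: one reflectance-like feature, LST piecewise linear with a jump
x = np.linspace(0.0, 1.0, 20)
y = np.where(x < 0.5, 290.0 + 5.0 * x, 310.0 + 5.0 * x)
model = fit_stump(x, y)
y_hat = predict(model, x)
```

A leaf that only stored the mean of *y* would predict a constant on each side of the split; the per-leaf line recovers the slope inside each leaf as well.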

The model is trained using the resampled **S2 imagery (S2-LR)** as $X_{train}$ and the reprojected **LST-LR** as $y_{train}$ (ground truth). Our forest is trained at low resolution because it is at this resolution that we have what we know to be true and can establish a link between reflectance and LST.

To predict a high-resolution LST map, we'll use the newly trained model with the **S2-HR** as $X$. We thus obtain a first prediction of the LST ($y_{pred}$) at high resolution.

Then comes the **residual analysis**. We resample this prediction to the coarse resolution and compute the residual ($y - y_{pred}$) (the only truth we have is at the low resolution). This compares our predicted LST, resampled at the coarse resolution, to the original **LST-LR**. For that, we don’t actually compare the LST directly but the $LST^4$, which is proportional to the exitance. It makes more sense since, physically, temperature doesn’t have to be conserved in the sharpening, but energy does. Sensors don’t measure temperature but radiance.

The final step is to **smooth the residual**, resample it to the high resolution, and sum it with $y_{pred}$, our predicted LST. We now have a final LST prediction that verifies that the average radiance of all the high-resolution pixels composing a coarse pixel is equal to the radiance in the original LST.
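The energy argument can be checked on a toy example. The sketch below is a simplification and not pyDMS's code: it assumes the fine grid nests exactly in the coarse grid with an integer `factor`, skips the smoothing of the residual, upsamples it by plain replication, and applies the correction in $LST^4$ space so that conservation holds exactly. All names are hypothetical.

```python
import numpy as np

def block_mean(a, factor):
    """Average a 2-D array over non-overlapping (factor x factor) blocks."""
    rows, cols = a.shape
    return a.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))

def energy_conserving_correction(pred_hr, lst_lr, factor):
    # Aggregate the high-resolution prediction to the coarse grid in T^4 space
    pred_lr4 = block_mean(pred_hr ** 4, factor)
    # Residual between the observation and the re-aggregated prediction (T^4 units)
    residual4 = lst_lr ** 4 - pred_lr4
    # Replicate each coarse residual over its fine pixels (no smoothing here)
    residual4_hr = np.kron(residual4, np.ones((factor, factor)))
    corrected = (pred_hr ** 4 + residual4_hr) ** 0.25
    return corrected, residual4

lst_lr = np.array([[300.0, 310.0]])                 # 1 x 2 coarse pixels (K)
pred_hr = np.array([[298.0, 301.0, 309.0, 313.0],
                    [299.0, 300.0, 308.0, 312.0]])  # 2 x 4 fine pixels (K)
corrected, residual4 = energy_conserving_correction(pred_hr, lst_lr, factor=2)

# The mean of T^4 over each coarse pixel now matches the observation
conserved = np.allclose(block_mean(corrected ** 4, 2), lst_lr ** 4)
```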

This is the downscaled image that we were looking for.

---

I would like to note that there are two other sharpening techniques implemented in the pyDMS code, developed (and still being developed as of October 2024) by R. Guzinski. All the steps are the same; the only difference is the regression model itself. On one hand, there is the **Neural Network regressor**: instead of the trees presented here, it uses multilayer perceptrons (MLP) from [scikit-neuralnetwork](https://github.com/aigamedev/scikit-neuralnetwork). On the other hand, the model closest to Gao's original proposition is based on a [Cubist](https://www.rulequest.com/cubist-unix.html) regressor, recently made available in Python.

As of October 2024, I haven't used the other models enough to express confidence in the results. This will require testing.


In [1]:
# Import cell
from osgeo import gdal
import rasterio
import numpy as np
import os
import matplotlib.pyplot as plt
import random
import getpass
import requests as req
from datetime import datetime 
import pandas as pd
from shapely.geometry import Polygon
import json
import time
import geopandas as gpd
from sentinelhub import (SHConfig, DataCollection, SentinelHubCatalog, SentinelHubRequest, BBox, bbox_to_dimensions, CRS, MimeType, Geometry,MosaickingOrder)
import rioxarray as rxr
from mpl_toolkits.axes_grid1 import make_axes_locatable
import matplotlib.animation as animation
from matplotlib import rc
from pyDMS_main.run_pyDMS import *
# If you receive a "No module named '...'" error, you probably haven't installed all the necessary packages (see the tutorial)

## Setting up and preprocessing
This whole notebook is a wrapper around the original run_pyDMS script. It is intended to make sharpening easy and automatic by processing an entire folder at a time. Indeed, when interested in an area, we often look at a series of images: a heatwave week, a summer month or more. This version is easier to use than the other one available, since you don't need to go through the [AppEEARS](https://appeears.earthdatacloud.nasa.gov/) web interface: the code uses the AppEEARS API to create a request directly from Python and then downloads the requested files automatically, so that all the files from your request end up in the same folders. This only requires your [Earthdata](http://urs.earthdata.nasa.gov) login credentials and the inputs of your choosing. All the downloaded files will be downscaled using a single Sentinel-2 VSWIR image, which is also downloaded here, after being preprocessed earlier in the code. To avoid errors due to seasonal effects (most notably changes in vegetation), I would advise against sharpening different seasons with the same S2 image.
The output files will all be written to one folder in GeoTIFF format, which you can use in any GIS software. For each image a residual GeoTIFF will also be produced, but in most cases you can ignore these files.


For convenience, all of the inputs requested from the user are grouped in the next 4 cells.

## Setting up the downloads and parameters

All Sentinel data is free to access, but you need to create an account to download it.
First, if you don't have an account on [Copernicus Data Space](https://dataspace.copernicus.eu/), create one and log in.  
Now access your User Settings : My Account > DashBoard > User Settings (bottom left)  
Create a new OAuth client and save the client ID and password (called a secret here) you are given.  



OAuth Copernicus Data Space

In [8]:
config = SHConfig()
config.sh_base_url = 'https://sh.dataspace.copernicus.eu'
config.sh_token_url = 'https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token'
config.sh_client_id = os.getenv('OAUTH_CLIENT_ID') # Alternatively you can type your client id
config.sh_client_secret = os.getenv('OAUTH_CLIENT_SECRET') # Alternatively you can type your client secret

Type your NASA Earthdata login and password. If you don't have an account, you can create one on [Earthdata](http://urs.earthdata.nasa.gov). If you have any trouble, refer to the relevant tutorial available [here](https://github.com/ECOSTRESS-Tutorials/).

In [3]:
user = getpass.getpass(prompt = 'Enter NASA Earthdata Login Username: ')      # Input NASA Earthdata Login Username
password = getpass.getpass(prompt = 'Enter NASA Earthdata Login Password: ')  # Input NASA Earthdata Login Password

This cell is where you type the directories that the downloaded products and results will be written to. A directory path shouldn't include the final / or \ (otherwise you will get a syntax error).

In [4]:
# Choose your output folder for the downloaded Sentinel-2 products
s2_output_folder = r''
# The task name you enter here is the name of the AppEEARS request. Your S2 request will also be named accordingly for harmonization. 
task_name = input('Enter a Task Name: ')
# Folder where all the ECOSTRESS products will be downloaded. This folder will then contain a subdirectory named after the task name above. 
eco_output_folder = r'' # In the newly created subdirectory, you'll see a QC file folder, an LST folder, an LST_scaled folder and a cloud_mask folder if you're using Collection 2.
# Folder where all the sharpened ECOSTRESS LST files will be written for the scene of interest
dst_dir = r''

Set the parameters for the products to be downloaded

In [5]:
# The coordinates of the bounding box of your choosing, in WGS 84 decimal degrees, longitude first : (xmin,ymin,xmax,ymax) = (lon_min,lat_min,lon_max,lat_max)
# Use http://bboxfinder.com to find your box easily (already in the right order)
aoi_coords_wgs_84 = (23.630219,37.909534,23.992081,38.076204) # example

# Choose the resolution of the Sentinel data in meters, 10 or 20. This will be the final resolution of the downscaled image. I would advise 20m at all times until further progress.
s2_res = 20
# Set the desired collection for the ECOSTRESS product to be downloaded (1 or 2). Note that Collection 2 hasn't been reprocessed for the full years of service of ECOSTRESS
eco_collec = 1 

# Choose your time interval (beginning, end) in the format (YYYY-MM-DD). You'll get an S2 image from the tile with the lowest cloud coverage available in the interval and all the ECOSTRESS scenes overpassing the AOI during this interval : 
interval = ("2023-07-01", "2023-08-31") # example

startYear,startMonth,startDay = interval[0][:4],interval[0][5:7],interval[0][8:]
endYear,endMonth,endDay = interval[1][:4],interval[1][5:7],interval[1][8:]
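The string slicing above silently accepts malformed dates. As a hedged alternative, the standard `datetime` module can both validate the input and produce the MM-DD-YYYY form used for the AppEEARS request. The helper name `to_appeears_date` is hypothetical.

```python
from datetime import datetime

def to_appeears_date(iso_date):
    """Convert 'YYYY-MM-DD' to 'MM-DD-YYYY'; raises ValueError on a malformed date."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%m-%d-%Y")

interval = ("2023-07-01", "2023-08-31")  # same example interval as in the notebook
start_date, end_date = (to_appeears_date(d) for d in interval)
print(start_date, end_date)  # 07-01-2023 08-31-2023
```

A typo such as `"2023-13-01"` then fails immediately with a `ValueError` instead of producing an invalid request later on.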


### Downloading Sentinel 2 product


If you encounter problems with the S2 imagery download, please refer to this [Documentation](https://sentinelhub-py.readthedocs.io/en/latest/index.html) or to this [Copernicus Web page](https://dataspace.copernicus.eu/news/2023-9-28-accessing-sentinel-mission-data-new-copernicus-data-space-ecosystem-apis).

Change the coordinates into a box for the Copernicus API.  
The bounding box is limited to 2500 pixels in each direction; you will receive an error if the request is larger.
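To get a feel for how large a box fits within the limit before sending anything, here is a rough, standard-library-only estimate of the pixel dimensions. It relies on a spherical approximation (about 111.32 km per degree of latitude), so it can differ slightly from what `bbox_to_dimensions` returns. The function name is hypothetical.

```python
import math

def approx_bbox_pixels(bbox, resolution_m):
    """Rough pixel size of a WGS 84 (lon_min, lat_min, lon_max, lat_max) box."""
    lon_min, lat_min, lon_max, lat_max = bbox
    metres_per_degree = 111_320.0  # approximate length of one degree of latitude
    mid_lat = math.radians((lat_min + lat_max) / 2)
    # A degree of longitude shrinks with the cosine of the latitude
    width_m = (lon_max - lon_min) * metres_per_degree * math.cos(mid_lat)
    height_m = (lat_max - lat_min) * metres_per_degree
    return round(width_m / resolution_m), round(height_m / resolution_m)

# Example bounding box from the notebook, at 20 m
width_px, height_px = approx_bbox_pixels((23.630219, 37.909534, 23.992081, 38.076204), 20)
fits = width_px <= 2500 and height_px <= 2500
print(width_px, height_px, fits)
```

At 10 m the same box needs twice as many pixels in each direction, which is why a box that fits at 20 m can be rejected at 10 m.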

In [None]:
aoi_bbox = BBox(bbox = aoi_coords_wgs_84,crs=CRS.WGS84)
aoi_size = bbox_to_dimensions(aoi_bbox,resolution = s2_res)
print(f"Image shape at {s2_res} m resolution: {aoi_size} pixels")
# The request is limited to 2500 pixels in each direction
if aoi_size[0] > 2500 or aoi_size[1] > 2500 :
    raise ValueError("The box is limited to 2500 pixels in each direction, try again with a smaller bounding box.")

Download the S2 image with the previously defined parameters.

In [None]:
# Request scripts, based on the sentinelhub documentation
# This script is used to download all the S2 bands whose native resolution is 20m or finer
evalscript_all_bands_u20 = """
    //VERSION=3
    function setup() {
        return {
            input: [{
                bands: ["B02","B03","B04","B05","B06","B07","B08","B8A","B11","B12"],
                units: "REFLECTANCE"

            }],
            output: {
                bands: 10,
                sampleType: "FLOAT32"
            }
        };
    }

    function evaluatePixel(sample) {
        return [sample.B02,
                sample.B03,
                sample.B04,
                sample.B05,
                sample.B06,
                sample.B07,
                sample.B08,
                sample.B8A,
                sample.B11,
                sample.B12];
    }
"""
# This script is used to download the S2 bands whose native resolution is 10m
evalscript_all_bands_u10 = """
    //VERSION=3
    function setup() {
        return {
            input: [{
                bands: ["B02","B03","B04","B08"],
                units: 'REFLECTANCE'
            }],
            output: {
                bands: 4,
                sampleType: "FLOAT32"
            }
        };
    }

    function evaluatePixel(sample) {
        return [sample.B02,
                sample.B03,
                sample.B04,
                sample.B08,];
    }
"""
# Request the data at 20m
if s2_res == 20 :
    request_all_bands_u20 = SentinelHubRequest(
        data_folder=s2_output_folder,
        evalscript=evalscript_all_bands_u20,
        input_data=[
            SentinelHubRequest.input_data(
                data_collection = DataCollection.SENTINEL2_L2A.define_from("s2l2a", service_url=config.sh_base_url),
                time_interval = interval,
                mosaicking_order=MosaickingOrder.LEAST_CC, # selecting the tile in the interval with the least cloud coverage

            )
        ],
        responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
        bbox=aoi_bbox,
        size=aoi_size,
        config=config,
    )
    resp = request_all_bands_u20.save_data(show_progress = True)
# Request the data at 10m
elif s2_res == 10 :
    request_all_bands_u10 = SentinelHubRequest(
        data_folder=s2_output_folder,
        evalscript=evalscript_all_bands_u10,
        input_data=[
            SentinelHubRequest.input_data(
                data_collection = DataCollection.SENTINEL2_L2A.define_from("s2l2a", service_url=config.sh_base_url),
                time_interval = interval,
                mosaicking_order = MosaickingOrder.LEAST_CC
            )
        ],
        responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
        bbox=aoi_bbox,
        size=aoi_size,
        config=config,
    )

    resp = request_all_bands_u10.save_data(show_progress= True)
else :
    raise(ValueError('Unexpected resolution please change it to 10 or 20'))

# Rename the downloaded files for clarity
# Only consider subdirectories: sentinelhub saves each request in its own folder
dirs = [d for d in os.listdir(s2_output_folder) if os.path.isdir(os.path.join(s2_output_folder, d))]

most_recent_dir = max(dirs, key=lambda d: os.path.getmtime(os.path.join(s2_output_folder, d)))
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
new_name = f"S2_request_{timestamp}"
old_path = os.path.join(s2_output_folder, most_recent_dir)
new_path = os.path.join(s2_output_folder, new_name)
os.rename(old_path, new_path)
beg, end = interval
for filename in os.listdir(new_path):
    if 'response' in filename :
        # This file will be used to train our model and downscale the ECOSTRESS image
        hr_img_path = os.path.join(new_path, f"S2_{s2_res}m_{beg}_{end}.tif")
        os.rename(os.path.join(new_path, filename), hr_img_path)


### Downloading ECOSTRESS scenes

This part is based off the ECOSTRESS API tutorials available [here](https://github.com/nasa/AppEEARS-Data-Resources/tree/main/Python/tutorials).

Authentication and Token Retrieval for NASA AppEEARS API

In [None]:
api = 'https://appeears.earthdatacloud.nasa.gov/api/'  # Set the AρρEEARS API to a variable
token_response = req.post('{}login'.format(api), auth=(user, password)).json() # Insert API URL, call login service, provide credentials & return json                                                        
token = token_response['token']# Save login token to a variable
head = {'Authorization': 'Bearer {}'.format(token)}  # Create a header to store token information, needed to submit a request
print(token_response)

Above, you should see a Bearer token. Note that this token will expire approximately 48 hours after being acquired. 

Locate the ECOSTRESS products and search for the layers of interest, in our case LST and QC

In [None]:
prods = ['ECO2LSTE.001','ECO_L2_LSTE.002']

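if eco_collec == 1 :
    layers = [(prods[0],'SDS_LST'),(prods[0],'SDS_QC')]
elif eco_collec == 2 :
    layers = [(prods[1],'LST'),(prods[1],'QC'),(prods[1],'cloud_mask')]
else :
    raise ValueError('Unexpected collection, please change eco_collec to 1 or 2')

# List the layers available for the product of the selected collection
lst_response = req.get('{}product/{}'.format(api, prods[eco_collec - 1])).json()
print(list(lst_response.keys()))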

# List of products and layers    
prodLayer = []
for l in layers: 
    prodLayer.append({
            "layer": l[1],
            "product": l[0]
          })

print(prodLayer)

# Get the available projections list
projections = req.get('{}spatial/proj'.format(api)).json()
projs = {}                                 
for p in projections: projs[p['Name']] = p  # Fill dictionary with `Name` of the projections available as keys


Formulate the AppEEARS request

In [None]:
# Takes the bounding box entered above and creates a polygon for the ECOSTRESS request
def polygon_from_bbox(bbox):
    long0, lat0, long1, lat1 = bbox
    return Polygon([[long0, lat0],
                    [long1,lat0],
                    [long1,lat1],
                    [long0, lat1]])

polyg = polygon_from_bbox(aoi_coords_wgs_84)

aoi_df = gpd.GeoDataFrame(pd.DataFrame(columns = ['bbox']),
        crs = 'epsg:4326',
        geometry = [polyg])

# Change the df to a json
aoi_json = aoi_df.to_json()
aoi_json = json.loads(aoi_json)

# Create the actual task
task_type = ['area']        # Type of task, area in our case
proj = projs['geographic']['Name']  # Set output projection 
outFormat = ['geotiff', 'netcdf4']  # Set output file format type, we'll select geotiff here
startDate = startMonth +'-'+startDay+'-'+startYear           # Start of the date range for which to extract data: MM-DD-YYYY
endDate = endMonth +'-'+endDay+'-'+endYear               # End of the date range for which to extract data: MM-DD-YYYY


task = {
    'task_type': task_type[0],
    'task_name': task_name,
    'params': {
         'dates': [
         {
             'startDate': startDate,
             'endDate': endDate
         }],
         'layers': prodLayer,
         'output': {
             'format': {'type': outFormat[0]},
             'projection': proj
         },
         'geo': aoi_json,
    }
}

# Order the task
task_response = req.post('{}task'.format(api), json=task, headers=head).json()                                                                  
task_id = task_response['task_id']                                               
status_response = req.get('{}status/{}'.format(api, task_id), headers=head).json() # Call status service with specific task ID & user credentials
print(status_response)    

### Ping the API
Ping the API every 30 seconds until the request is complete and display the status. AppEEARS can be slow to process data; if so, I suggest simply waiting and coming back later.  
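If you would rather not leave a loop running forever, the polling can be wrapped in a small helper with a timeout. This is a generic sketch, not part of the AppEEARS tutorials: `wait_until_done`, its parameters and the placeholder status strings are all assumptions, and the status request is replaced by a stand-in so the example runs offline.

```python
import time

def wait_until_done(get_status, interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Call get_status() until it returns 'done'; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        print(status)
        if status == "done":
            return status
        # Give up if the next wait would go past the deadline
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"task still '{status}' after {timeout_s} s")
        sleep(interval_s)

# Stand-in for the real status request (placeholder status values)
fake_statuses = iter(["queued", "processing", "done"])
final = wait_until_done(lambda: next(fake_statuses), interval_s=0.0, sleep=lambda s: None)
```

In the notebook, `get_status` would be a function that performs the status request and returns the `'status'` field of the JSON response.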

In [None]:
starttime = time.time()
# Query the status once per iteration and reuse it for both the test and the display
status = req.get('{}task/{}'.format(api, task_id), headers=head).json()['status']
while status != 'done':
    print(status)
    time.sleep(30.0 - ((time.time() - starttime) % 30.0))
    status = req.get('{}task/{}'.format(api, task_id), headers=head).json()['status']
print(status)

Download the ordered files

In [None]:
# Set up output directory using input directory and task name
eco_dest_dir = os.path.join(eco_output_folder,task_name)              
if not os.path.exists(eco_dest_dir):
    os.makedirs(eco_dest_dir) 
    
bundle = req.get('{}bundle/{}'.format(api,task_id), headers=head).json()  # Call API and return bundle contents for the task_id as json

files = {}                                                     
for f in bundle['files']: files[f['file_id']] = f['file_name'] 

for file_id in files:
    dl = req.get('{}bundle/{}/{}'.format(api, task_id, file_id), headers=head, stream=True, allow_redirects=True) # Get a stream to the bundle file
    if files[file_id].endswith('.tif'):
        filename = files[file_id].split('/')[1] # Drop the leading folder from the file name
    else:
        filename = files[file_id]
    filepath = os.path.join(eco_dest_dir, filename)
    with open(filepath, 'wb') as out: # Separate name so the loop variable isn't shadowed
        for data in dl.iter_content(chunk_size=8192): out.write(data)
print('Downloaded files can be found at: {}'.format(eco_dest_dir))

### Preprocessing ECOSTRESS Data

Sort the downloaded files into the appropriate subdirectories

In [None]:
# For each directory, if it doesn't exist already, create it and place the downloaded files in it

QC_dir = os.path.join(eco_dest_dir,'QC')
if not os.path.exists(QC_dir) :
    os.mkdir(QC_dir)
lst_dir = os.path.join(eco_dest_dir,'LST')
if not os.path.exists(lst_dir) :
    os.mkdir(lst_dir)
if eco_collec == 2 : 
    cld_dir = os.path.join(eco_dest_dir,'cloud_mask')
    if not os.path.exists(cld_dir) :
        os.mkdir(cld_dir)
for file in os.listdir(eco_dest_dir) : 
    if 'QC' in file and file.endswith('.tif') :
        source_path = os.path.join(eco_dest_dir,file)
        dst_path = os.path.join(QC_dir,file)
        os.rename(source_path,dst_path)
        
    elif 'LST_' in file and file.endswith('.tif') :
        source_path = os.path.join(eco_dest_dir,file)
        dst_path = os.path.join(lst_dir,file)
        os.rename(source_path,dst_path)
        
    elif 'cloud_mask' in file and file.endswith('.tif') :
        source_path = os.path.join(eco_dest_dir,file)
        dst_path = os.path.join(cld_dir,file)
        os.rename(source_path,dst_path)

Preprocessing the QC files  

The QC files are encoded on 16 bits and thus can't easily be viewed as a mask. For convenience, we write new Quality Flag (QF) files that represent only the last two (least significant) bits of the QC files. There are then only four possible values:  
0 when the pixel is of best quality, 1 for nominal quality, 2 if a cloud is detected and 3 if the pixel is not produced. In the downscaling process, pixels with the last two values (2 and 3) will be disregarded.   
For more information on the QC files : https://lpdaac.usgs.gov/documents/423/ECO2_User_Guide_V1.pdf (section 2.4)
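The bit masking described above can be tried on a few plain integers. The QC values below are made up for illustration; only the meaning of the two least significant bits comes from the text.

```python
# Meaning of the two least significant bits, as listed above
QUALITY_LABELS = {
    0: "best quality",
    1: "nominal quality",
    2: "cloud detected",
    3: "pixel not produced",
}

def quality_flag(qc_value):
    """Return the two least significant bits of a QC value (0 to 3)."""
    return qc_value & 0b11

# Made-up QC words; -1 is all ones in two's complement, so its flag is 3
examples = (0b0101_1100, 0b0000_0001, 0b0101_1110, -1)
flags = [quality_flag(v) for v in examples]
for value, flag in zip(examples, flags):
    print(value, "->", flag, QUALITY_LABELS[flag])
```

This is also why replacing the no-data value with -1 before masking works: its last two bits read 11, so the pixel is treated as "not produced".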

In [None]:
for file in os.listdir(QC_dir) :
    if not file.endswith('QF.tif') and not file.endswith('.xml') : 
        file_qc = os.path.join(QC_dir,file)
        with rasterio.open(file_qc,'r') as f_qc :
            # Read the QC file, coded in 16 bits
            qc_img = f_qc.read(1)
            qc_img[qc_img==-99999] = -1  # Nodata values are read as -99999; we change them to -1 so that the last two bits read 11 (pixel not produced) and the pixel is masked out in the end
            # Select only the last two bits
            qc_img_2 = qc_img & 0b11 
            out_meta = f_qc.meta.copy()
        # Write the last two bits in a new file
        file_qf = file_qc.replace('.tif','_QF.tif')
        with rasterio.open(file_qf,'w',**out_meta) as dst : 
            dst.write(qc_img_2,1)


Scaling the ECOSTRESS LST to the usual Kelvin scale.  
  
The LST product is stored with a scale factor of 0.02 (the cell below also adds an offset of 0.49). GIS software takes this scaling into account before display, so you might not notice it if you open the files directly in QGIS or ArcGIS. In Python, however, it is easier to apply the scale ourselves than to read it from the metadata. 
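As a quick sanity check of the conversion, here it is applied to a single stored value. The scale and offset are the ones used in this notebook's scaling cell; the helper name and the sample digital number are hypothetical.

```python
SCALE = 0.02   # scale factor used in this notebook
OFFSET = 0.49  # offset added in this notebook's scaling cell

def dn_to_kelvin(dn):
    """Convert a stored digital number to Kelvin; 0 is treated as no-data."""
    if dn == 0:
        return float("nan")
    return dn * SCALE + OFFSET

lst_kelvin = dn_to_kelvin(15000)  # hypothetical stored value
print(round(lst_kelvin, 2))       # 300.49
```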

In [None]:
# If the scaled subdirectory doesn't already exist, create it
lst_dir_sc = os.path.join(eco_dest_dir,'LST_scaled')
if not os.path.exists(lst_dir_sc) :
        os.mkdir(lst_dir_sc)

# Scale each file
for file in os.listdir(lst_dir) : 
        if file.endswith('.tif') :
                with rasterio.open(os.path.join(lst_dir,file),'r') as lr_img: 
                        out_image=lr_img.read().astype('float32')
                        out_image[out_image==0]=np.nan
                        out_meta = lr_img.meta

                out_meta.update({"driver": "GTiff",
                        "height": out_image.shape[1],
                        "width": out_image.shape[2],
                        'dtype' :'float32'})

                dst_path = os.path.join(lst_dir_sc,file)
                with rasterio.open(dst_path,'w',**out_meta) as dst :
                        # Apply the scale 
                        dst.write(out_image*0.02 +0.49) 

## Upsampling using pyDMS

The preprocessing is now over. Let's sharpen using pyDMS. Use one of the following cells depending on the desired extent.

If you use this cell, then the output extent will be the extent of the original images downloaded.

In [None]:
useDecisionTree = True # You could change this to False if you want to use the Neural Network instead of the Decision tree, not recommended
files_sharpened = [] # list of the files sharpened
dst_dir_f= os.path.join(dst_dir, f"{task_name}-Results")
# Loop through the directory of LST images scaled
for file in os.listdir(lst_dir_sc) :
    if file.endswith('.tif') :
        if not os.path.exists(dst_dir_f) : # create the output directory if it doesn't exist already
            os.mkdir(dst_dir_f)
        outputFilename = os.path.join(dst_dir_f,file.replace('.tif','_sharp_S2.tif')) # destination path for the sharpened images
        
        highResFilename = hr_img_path # the high resolution file is the dowloaded s2 image
        lowResFilename = os.path.join(lst_dir_sc,file) # the low resolution file is the scaled LST image
        
        valid = False # Bool that states if a scene is "valid", not too cloudy or not presenting too many unproduced pixels (limit at 25% by default)
        thresh= 0.75 # you can modify this value between 0 (if you accept any file, the unusable pixels will be masked) and 1 (if you only want to sharpen files where every pixel is usable)
        
        # If the scene to be downscaled is a scene from Collection 2
        if eco_collec == 2 : 
            # The QC cloud bit being unreliable in Collection 2, we use the cloud mask directly as a validity mask
            file_cl = 'cloud_mask'.join(file.rsplit('LST', 1))
            lowResMaskFilename = os.path.join(cld_dir,file_cl)
            lowresflags = [0]
            with rasterio.open(lowResMaskFilename,'r') as cld_msk : 
                cld_msk_arr = cld_msk.read(1)
                mask_sz = cld_msk_arr.size
                # Ensure that the scene to be sharpened isn't more than 25% cloudy
            if np.count_nonzero(cld_msk_arr)<(1-thresh)*mask_sz :
                valid = True
        # If the scene to be downscaled belongs to Collection 1        
        else : 
            # For Collection 1, we can rely on the QC (preprocessed earlier) file, no need to use the cloud mask
            file_qc = 'QC'.join(file.rsplit('LST', 1))
            file_qf = file_qc.replace('.tif','_QF.tif')
            lowResMaskFilename = os.path.join(QC_dir,file_qf)        
            with rasterio.open(lowResMaskFilename,'r') as mask :
                mask_array = mask.read(1)
                mask_sz = mask_array.size 
            lowresflags = [0,1]
            # Ensure that the scene to be sharpened isn't more than 25% cloudy or invalid
            if np.count_nonzero(mask_array==0) + np.count_nonzero(mask_array==1)>thresh*mask_sz :
                valid = True
        # Only downscale the files that are "valid" 
        if valid :    
            commonOpts = {"highResFiles":               [highResFilename],
                            "lowResFiles":                [lowResFilename],
                            "lowResQualityFiles":         [lowResMaskFilename],
                            "lowResGoodQualityFlags":     lowresflags, # flags for acceptable pixels
                            "cvHomogeneityThreshold":     0, # the homogeneity threshold will be automatically computed
                            "movingWindowSize":           0,
                            "disaggregatingTemperature":  True} # we are dealing with LST
            dtOpts =     {"perLeafLinearRegression":    True,
                            "linearRegressionExtrapolationRatio": 0.25} # how much extrapolation we accept from the tree
            sknnOpts =   {'hidden_layer_sizes':         (10,),
                            'activation':                 'tanh'}
            nnOpts =     {"regressionType":             REG_sklearn_ann,
                            "regressorOpt":               sknnOpts}
            
            start_time = time.time()

            if useDecisionTree: #  Regression tree
                opts = commonOpts.copy()
                opts.update(dtOpts)
                disaggregator = DecisionTreeSharpener(**opts)
            else: # Neural network
                opts = commonOpts.copy()
                opts.update(nnOpts)
                disaggregator = NeuralNetworkSharpener(**opts)

            # Training the tree
            print("Training regressor...")
            disaggregator.trainSharpener()
            # Applying the newly trained regressor to the HR image to predict the LST at high resolution
            print("Sharpening...")
            downscaledFile = disaggregator.applySharpener(highResFilename, lowResFilename)
            # Performing residual analysis, to ensure that the unsharpened image's exitance is conserved
            print("Residual analysis...")
            residualImage, correctedImage = disaggregator.residualAnalysis(downscaledFile, lowResFilename,
                                                                            lowResMaskFilename,
                                                                            doCorrection=True)
            # Saving the image and its residual 
            print("Saving output...")
            highResFile = gdal.Open(highResFilename)
            if correctedImage is not None:
                outImage = correctedImage
            else:
                outImage = downscaledFile

            outFile = utils.saveImg(outImage.GetRasterBand(1).ReadAsArray(),
                                    outImage.GetGeoTransform(),
                                    outImage.GetProjection(),
                                    outputFilename)
            residualFile = utils.saveImg(residualImage.GetRasterBand(1).ReadAsArray(),
                                            residualImage.GetGeoTransform(),
                                            residualImage.GetProjection(),
                                            os.path.splitext(outputFilename)[0] + "_residual" +
                                            os.path.splitext(outputFilename)[1])
            files_sharpened.append(file)
            outFile = None
            residualFile = None
            downscaledFile = None
            highResFile = None

            print(time.time() - start_time, "seconds")
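The validity test used in the loop above can be isolated in one small function, which makes the role of `thresh` easier to see. This is a sketch under assumptions: `is_scene_valid` is a hypothetical helper, and the toy masks stand in for the QF or cloud-mask rasters.

```python
import numpy as np

def is_scene_valid(mask, good_flags, thresh=0.75):
    """True when more than `thresh` of the mask pixels carry a good flag."""
    good = np.isin(mask, good_flags)
    return bool(good.sum() > thresh * mask.size)

# Collection 1 style: QF values, where 0 and 1 are usable
qf = np.array([[0, 1, 2, 3],
               [0, 0, 1, 1]])
print(is_scene_valid(qf, good_flags=[0, 1]))     # 6 of 8 usable, not above 75%

# Collection 2 style: cloud mask, where 0 means clear
cloud = np.array([[0, 0, 0, 1],
                  [0, 0, 0, 0]])
print(is_scene_valid(cloud, good_flags=[0]))     # 7 of 8 clear
```

With `good_flags=[0]` this reproduces the Collection 2 condition (fewer than 25% cloudy pixels), and with `good_flags=[0, 1]` the Collection 1 condition.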

If you wish to sharpen only part of the image, for instance just the center of a city and not the suburbs, then use this cell.

In [None]:
useDecisionTree = True # You could change this to False if you want to use the Neural Network instead of the Decision tree, not recommended

files_sharpened = [] #list of the files sharpened
dst_dir_f = os.path.join(dst_dir, f"{task_name}-Results")
# Loop through the directory of LST images
for file in os.listdir(lst_dir_sc) :
    if file.endswith('.tif') :
        if not os.path.exists(dst_dir_f) :  # create the output directory if it doesn't exist already
            os.mkdir(dst_dir_f)
        outputFilename = os.path.join(dst_dir_f,file.replace('.tif','_sharp_S2_clipped.tif')) # destination path for the sharpened images
        lowResFilename = os.path.join(lst_dir_sc,file) 
        lr_ds = rasterio.open(lowResFilename)
        
        # Make sure the coordinates here are in the correct order, otherwise you'll receive an error. This extent has to be fully included in the bounding box.
        # projwin =  [xmin,ymax,xmax,ymin]
        projwin = [1.336555,43.650982,1.503754,43.549543] # example of a custom cutout  

        hr_img_path_clipped = hr_img_path.replace('.tif','_clipped.tif') # also produces a clipped version of the HR image
        ds = gdal.Open(hr_img_path)
        ds = gdal.Translate(hr_img_path_clipped, ds, projWin = projwin)
        ds = None
        lr_ds = None
        
        highResFilename = hr_img_path_clipped 
        
        valid = False # Bool that states whether a scene is "valid", i.e. not too cloudy and without too many unproduced pixels (limit at 25% by default)
        thresh = 0.75 # minimum fraction of usable pixels: 0 accepts any file (unusable pixels will be masked), 1 only accepts files where every pixel is usable
        
        # If the scene to be downscaled is a scene from Collection 2
        if eco_collec == 2 : 
            # The QC cloud bit being unreliable in Collection 2, we use the cloud mask directly as a validity mask
            file_cl = 'cloud_mask'.join(file.rsplit('LST', 1))
            lowResMaskFilename = os.path.join(cld_dir,file_cl)
            lowresflags = [0]
            with rasterio.open(lowResMaskFilename,'r') as cld_msk : 
                cld_msk_arr = cld_msk.read(1)
                mask_sz = cld_msk_arr.size
            # Ensure that the scene to be sharpened isn't more than 25% cloudy
            if np.count_nonzero(cld_msk_arr)<(1-thresh)*mask_sz :
                valid = True
        # If the scene to be downscaled belongs to Collection 1        
        else : 
            # For Collection 1, we can rely on the QC (preprocessed earlier) file, no need to use the cloud mask
            file_qc = 'QC'.join(file.rsplit('LST', 1))
            file_qf = file_qc.replace('.tif','_QF.tif')
            lowResMaskFilename = os.path.join(QC_dir,file_qf)        
            with rasterio.open(lowResMaskFilename,'r') as mask :
                mask_array = mask.read(1)
                mask_sz = mask_array.size 
            lowresflags = [0,1]
            # Ensure that the scene to be sharpened isn't more than 25% cloudy or invalid
            if np.count_nonzero(mask_array==0) + np.count_nonzero(mask_array==1)>thresh*mask_sz :
                valid = True
        # Only downscale the files that are "valid" for our use case
        if valid :
            commonOpts = {"highResFiles":               [highResFilename],
                          "lowResFiles":                [lowResFilename],
                          "lowResQualityFiles":         [lowResMaskFilename],
                          "lowResGoodQualityFlags":     lowresflags, # flags for acceptable pixels (set above for each collection)
                          "cvHomogeneityThreshold":     0, # the homogeneity threshold will be automatically computed
                          "movingWindowSize":           0,
                          "disaggregatingTemperature":  True} # we are dealing with LST
            dtOpts =     {"perLeafLinearRegression":    True,
                          "linearRegressionExtrapolationRatio": 0.25} # how much extrapolation we accept from the tree
            sknnOpts =   {'hidden_layer_sizes':         (10,),
                          'activation':                 'tanh'}
            nnOpts =     {"regressionType":             REG_sklearn_ann,
                          "regressorOpt":               sknnOpts}
            
            start_time = time.time()

            if useDecisionTree: #  Regression tree
                opts = commonOpts.copy()
                opts.update(dtOpts)
                disaggregator = DecisionTreeSharpener(**opts)
            else: # Neural network
                opts = commonOpts.copy()
                opts.update(nnOpts)
                disaggregator = NeuralNetworkSharpener(**opts)

            # Training the tree
            print("Training regressor...")
            disaggregator.trainSharpener()
            # Applying the newly trained tree to the LR image
            print("Sharpening...")
            downscaledFile = disaggregator.applySharpener(highResFilename, lowResFilename)
            # Performing residual analysis, to ensure that the unsharpened image's exitance is conserved
            print("Residual analysis...")
            residualImage, correctedImage = disaggregator.residualAnalysis(downscaledFile, lowResFilename,
                                                                            lowResMaskFilename,
                                                                            doCorrection=True)
            # Saving the image and its residual 
            print("Saving output...")
            highResFile = gdal.Open(highResFilename)
            if correctedImage is not None:
                outImage = correctedImage
            else:
                outImage = downscaledFile

            outFile = utils.saveImg(outImage.GetRasterBand(1).ReadAsArray(),
                                    outImage.GetGeoTransform(),
                                    outImage.GetProjection(),
                                    outputFilename)
            residualFile = utils.saveImg(residualImage.GetRasterBand(1).ReadAsArray(),
                                            residualImage.GetGeoTransform(),
                                            residualImage.GetProjection(),
                                            os.path.splitext(outputFilename)[0] + "_residual" +
                                            os.path.splitext(outputFilename)[1])
            files_sharpened.append(file)
            outFile = None
            residualFile = None
            downscaledFile = None
            highResFile = None

            print(time.time() - start_time, "seconds")
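
The custom cutout in the cell above has to follow the `[xmin, ymax, xmax, ymin]` order and be fully contained in the raster extent. A minimal sketch of a check that could be run before clipping is shown below. `check_projwin` is a hypothetical helper, not part of pyDMS or GDAL, and the example bounds are made up for illustration. It assumes that `bounds` is given as `(left, bottom, right, top)`, the order used by rasterio's `dataset.bounds`, and that both inputs are expressed in the same CRS.

```python
def check_projwin(projwin, bounds):
    """Validate a window [xmin, ymax, xmax, ymin] against raster bounds.

    bounds is (left, bottom, right, top), assumed to be in the same CRS
    as projwin. Raises ValueError if the window is misordered or not
    fully inside the bounds; returns True otherwise.
    """
    xmin, ymax, xmax, ymin = projwin
    left, bottom, right, top = bounds

    # The upper-left corner comes first, then the lower-right corner
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("projwin must be ordered [xmin, ymax, xmax, ymin]")

    # The window has to be fully included in the raster extent
    if not (left <= xmin and xmax <= right and bottom <= ymin and ymax <= top):
        raise ValueError("projwin is not fully inside the raster bounds")

    return True


# Hypothetical raster bounds, for illustration only
example_bounds = (1.2, 43.5, 1.6, 43.7)
projwin = [1.336555, 43.650982, 1.503754, 43.549543]
projwin_ok = check_projwin(projwin, example_bounds)
```

In the notebook, the bounds could be taken from the high-resolution image itself, for instance with `tuple(rasterio.open(hr_img_path).bounds)`.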

I advise checking the residual bias and RMSD to analyze the performance of the downscaling. It performs better on images that are not noisy, acquired under clear sky and, ideally, over a fairly homogeneous environment. Any RMSD above 5 is really suspicious and usually means that the training went badly. From my observations, unmasked clouds often explain poor performance.
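
As a rough illustration of this check, the sketch below computes a bias (mean residual) and an RMSD (root mean square of the residual) from a residual array. `residual_stats` is a hypothetical helper, not a pyDMS function. It assumes that the array holds the per-pixel residual and that unusable pixels are NaN or equal to a nodata value, so the numbers may differ slightly from the ones computed during the residual analysis itself.

```python
import numpy as np


def residual_stats(residual, nodata=None):
    """Return (bias, rmsd) of a residual array, ignoring NaN/inf and nodata pixels."""
    data = np.asarray(residual, dtype=float)

    # Keep only the finite pixels that are not flagged as nodata
    valid = np.isfinite(data)
    if nodata is not None:
        valid &= data != nodata
    values = data[valid]

    if values.size == 0:
        return float("nan"), float("nan")

    bias = float(values.mean())                  # mean residual
    rmsd = float(np.sqrt(np.mean(values ** 2)))  # root mean square of the residual
    return bias, rmsd


# Small synthetic example; in the notebook the array could instead be read
# from a "_residual" GeoTIFF with rasterio:
#     with rasterio.open(residual_path) as src:
#         bias, rmsd = residual_stats(src.read(1), nodata=src.nodata)
example = np.array([[1.0, -1.0], [np.nan, 2.0]])
bias, rmsd = residual_stats(example)
```

A file whose RMSD exceeds the value mentioned above could then be flagged for visual inspection, for example to look for unmasked clouds.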

### Display

Plot one random sharpened image

In [None]:
# Pick one random sharpened file
file = random.choice(files_sharpened)
# If the last cell you ran was the clipped one, the suffix to use here is '_sharp_S2_clipped.tif'
sharpened_file = os.path.join(dst_dir_f,file.replace('.tif','_sharp_S2.tif'))
raw_file = os.path.join(lst_dir_sc,file)
with rasterio.open(raw_file,'r') as lr_img : 
    raw_lst = lr_img.read(1)
with rasterio.open(sharpened_file,'r') as shrp_img :
    shrp_lst = shrp_img.read(1)
    
# Plot the ECOSTRESS original product
plt.figure(1)
plt.imshow(raw_lst,cmap='viridis')
plt.axis('off')
plt.colorbar(label ='LST(K)')
vmin, vmax = plt.gci().get_clim() # save the limits to share them in the second figure
plt.title("ECOSTRESS LST 70m")
plt.show()

# Plot the ECOSTRESS sharpened product
plt.figure(2)
plt.imshow(shrp_lst,cmap='viridis',clim =(vmin,vmax))
plt.axis('off')
plt.colorbar(label ='LST(K)')
plt.title("ECOSTRESS LST sharpened to 20m")
plt.show()