# ML on ESDC using PyTorch including Transfer Learning
A DeepESDL example notebook

## Linear Regression for prediction of missing land surface temperature values from air temperature values
This notebook demonstrates how to apply Machine Learning to the Earth System Data Cube using the ML library PyTorch, how to save the trained model, and how to reload it for a second task (Transfer Learning). The workflow is self-contained and based on a generic use case to showcase data loading, sampling strategies, model training, model evaluation and visualisation.

Please also refer to the DeepESDL documentation and visit the platform's website for further information!

ScaDS.AI, 2023

**This notebook runs with the Python environment deepesdl-ml-transfer-learning; please check out the documentation for help on [changing the environment](https://deepesdl.readthedocs.io/en/v2022.12.1/guide/jupyterlab/).**

### Import necessary libraries


In [1]:
import torch
import xarray as xr
from torch import nn
from torch.utils.data import TensorDataset
from xcube.core.store import new_data_store

### Load Data (Earth System Data Cube)
We load the ESDC (`*.zarr`) lazily from the S3 data store, i.e. only metadata is read until values are actually needed. The ESDC has three dimensions (time, latitude, longitude). Of the many available cube variables, which are backed by Dask arrays, we select two: `land_surface_temperature` and `air_temperature_2m`.

In [2]:
data_store = new_data_store("s3", root="esdl-esdc-v2.1.1", storage_options=dict(anon=True))
dataset    = data_store.open_data('esdc-8d-0.083deg-184x270x270-2.1.1.zarr')

# Smaller cube for demo case
start_time = "2002-05-21"
end_time   = "2002-08-01"
ds         = dataset[["land_surface_temperature", "air_temperature_2m"]].sel(time=slice(start_time, end_time))
ds

Both data variables (`land_surface_temperature`, `air_temperature_2m`) have the same layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 355.96 MiB | 2.78 MiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 3 graph layers | 128 chunks in 3 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
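The sizes in the dataset summary follow directly from the shapes: each variable holds 10 × 2160 × 4320 float32 values (4 bytes each), and the 270 × 270 spatial chunking tiles the grid into 8 × 16 = 128 chunks. A small pure-Python sketch that reproduces these numbers without touching the data:

```python
import math

# Array and chunk shapes as reported for each variable
shape = (10, 2160, 4320)
chunk = (10, 270, 270)
itemsize = 4  # bytes per float32 value

MiB = 1024 ** 2
array_mib = math.prod(shape) * itemsize / MiB
chunk_mib = math.prod(chunk) * itemsize / MiB
# Number of chunks along each dimension (the shapes divide evenly here)
n_chunks = math.prod(s // c for s, c in zip(shape, chunk))

print(f"array: {array_mib:.2f} MiB, chunk: {chunk_mib:.2f} MiB, chunks: {n_chunks}")
# array: 355.96 MiB, chunk: 2.78 MiB, chunks: 128
```

With `itemsize = 8` the same calculation gives the 711.91 MiB (5.56 MiB per chunk) reported later for the float64 variable added by the train/test split assignment.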


### Train/Test Split assignment
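Here you can choose between random sampling and block sampling. Which one is appropriate depends on the use case: ML analyses of remotely sensed data are prone to spatial and temporal autocorrelation, and a random split places neighbouring, strongly correlated samples in both the training and the test set, which tends to make test scores overly optimistic. Block sampling, which assigns whole contiguous blocks to either set, is one strategy to reduce this effect. This example uses random sampling.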
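To illustrate the difference between the two strategies, here is a minimal NumPy sketch on a 2-D grid. It is not the ml4xcube implementation; the functions `random_split_mask` and `block_split_mask` and the 10 × 10 block size are hypothetical. Random sampling labels every cell independently, while block sampling labels whole contiguous blocks, so neighbouring cells usually end up in the same subset.

```python
import numpy as np

def random_split_mask(shape, train_frac=0.8, seed=0):
    """Hypothetical helper: assign each cell independently to train (1) or test (0)."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < train_frac).astype(np.int8)

def block_split_mask(shape, block=(10, 10), train_frac=0.8, seed=0):
    """Hypothetical helper: assign whole spatial blocks to train (1) or test (0)."""
    rng = np.random.default_rng(seed)
    n_rows = -(-shape[0] // block[0])  # ceiling division
    n_cols = -(-shape[1] // block[1])
    block_labels = (rng.random((n_rows, n_cols)) < train_frac).astype(np.int8)
    # Expand each block label over its cells, then crop to the grid size
    mask = np.repeat(np.repeat(block_labels, block[0], axis=0), block[1], axis=1)
    return mask[: shape[0], : shape[1]]

rand_mask = random_split_mask((40, 60))
blk_mask = block_split_mask((40, 60))
print("train fraction (random):", rand_mask.mean())
print("train fraction (block): ", blk_mask.mean())
```

With block sampling, the realised train fraction fluctuates more, because only a few dozen blocks are drawn instead of thousands of cells; larger blocks reduce leakage between neighbouring samples at the cost of that variability.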

In [3]:
from ml4xcube.splits import assign_rand_split

# random sampling
xds = assign_rand_split(
    ds    = ds,
    split = 0.8
)
xds

The dataset now contains a third data variable, `split`, holding the train/test assignment. The two input variables keep their layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 355.96 MiB | 2.78 MiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 3 graph layers | 128 chunks in 3 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |

The `split` variable is stored as float64:

| | Array | Chunk |
|---|---|---|
| Bytes | 711.91 MiB | 5.56 MiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 3 graph layers | 128 chunks in 3 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


### Select CUDA device if available to use GPU resources

In [4]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


### Training and Test Set Creation and Preprocessing

We use the `XrDataset` class from `ml4xcube.datasets.xr_dataset`, instantiated below as `sampler`, to process the Earth System Data Cube (ESDC) in manageable chunks. Preprocessing includes flattening the data, handling missing values (NaNs), and optionally applying normalization or standardization. The sampler also divides the data into training and testing subsets, using the split assigned in the previous step.

In this example, `sampler` extracts `num_chunks = 2` chunks from the dataset, each identified by an index, and targets the prediction of the `"land_surface_temperature"` variable. The resulting datasets are stored in `X_train`, `y_train` (training set) and `X_test`, `y_test` (test set).

This way, the data is prepared for model training, regression analysis and evaluation in a consistent and automated manner.
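The preprocessing steps described above can be sketched in plain NumPy for a single chunk. This is a hypothetical illustration, not the code inside `XrDataset`: the function `prepare_chunk`, the assumed `split` encoding (1 = training, 0 = test) and the choice to standardize with training statistics only are assumptions made for the example.

```python
import numpy as np

def prepare_chunk(chunk, target, split):
    """Hypothetical preprocessing of one chunk: flatten, drop NaNs,
    standardize, and separate training from test samples.

    chunk:  dict mapping variable name -> array (all of the same shape)
    target: name of the variable to predict
    split:  array of the same shape; 1 = training, 0 = test (assumed encoding)
    """
    names = sorted(chunk)
    # Flatten every variable into one column: shape [num_samples, num_vars]
    data = np.stack([np.asarray(chunk[n], dtype=float).ravel() for n in names], axis=1)
    split = np.asarray(split).ravel()
    # Keep only samples without any missing value
    valid = ~np.isnan(data).any(axis=1)
    data, split = data[valid], split[valid]
    train = split == 1
    # Standardize (z-score) with statistics from the training samples only
    mean = data[train].mean(axis=0)
    std = data[train].std(axis=0)
    data = (data - mean) / std
    t = names.index(target)
    X = np.delete(data, t, axis=1)  # predictors
    y = data[:, t]                  # target
    return (X[train], y[train]), (X[~train], y[~train])

# Synthetic chunk with one missing value, for illustration only
rng = np.random.default_rng(0)
lst = rng.normal(300.0, 10.0, size=(2, 4, 4))
air = lst - 5.0 + rng.normal(0.0, 1.0, size=(2, 4, 4))
lst[0, 0, 0] = np.nan
split = (rng.random((2, 4, 4)) < 0.8).astype(float)

(X_tr, y_tr), (X_te, y_te) = prepare_chunk(
    {"land_surface_temperature": lst, "air_temperature_2m": air},
    target="land_surface_temperature",
    split=split,
)
print(X_tr.shape, X_te.shape)
```

Computing the standardization statistics on the training samples only keeps information from the test set out of the preprocessing.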

In [5]:
from ml4xcube.datasets.xr_dataset import XrDataset

sampler = XrDataset(
    ds         = xds, 
    num_chunks = 2, 
    rand_chunk = False, 
    to_pred    = 'land_surface_temperature', 
)

train_data, test_data = sampler.get_datasets()
X_train, y_train      = train_data 
X_test, y_test        = test_data

{'land_surface_temperature': array([-1.5660757, -1.5810803, -1.6459857, ...,  2.8975375,  2.3301997,
        1.9558117], dtype=float32), 'air_temperature_2m': array([-2.1369941, -2.1369941, -2.1369941, ...,  2.7178104,  2.7178104,
        2.7178104], dtype=float32), 'split': array([1., 1., 0., ..., 1., 1., 1.])}
set train and test data


#### Prepare Datasets before Training

In [6]:
from ml4xcube.datasets.pytorch import prep_dataloader

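# Reshape the flat arrays into column vectors of shape [num_samples, 1],
# as expected by nn.Linear(in_features=1, ...)
X_train = X_train.reshape(-1, 1)
y_train = y_train.reshape(-1, 1)
X_test  = X_test.reshape(-1, 1)
y_test  = y_test.reshape(-1, 1)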

train_ds = TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
test_ds  = TensorDataset(torch.tensor(X_test), torch.tensor(y_test))

train_loader, test_loader = prep_dataloader(
    train_ds    = train_ds, 
    test_ds     = test_ds, 
    batch_size  = 32, 
    num_workers = 5, 
    parallel    = False
)
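`nn.Linear` expects inputs whose last dimension equals `in_features` (here 1), which is why the flat arrays are reshaped with `reshape(-1, 1)`; the `-1` lets NumPy infer the number of samples. With `batch_size = 32`, one epoch then consists of ceil(N / 32) batches, assuming `prep_dataloader` keeps PyTorch's default `drop_last=False`. A small sketch (the helper `num_batches` is hypothetical):

```python
import math
import numpy as np

x = np.arange(5, dtype=np.float32)  # flat vector, shape (5,)
col = x.reshape(-1, 1)              # column vector, shape (5, 1)

def num_batches(num_samples, batch_size, drop_last=False):
    """Hypothetical helper: number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size  # incomplete last batch is discarded
    return math.ceil(num_samples / batch_size)

print(col.shape, num_batches(100, 32), num_batches(100, 32, drop_last=True))
# (5, 1) 4 3
```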

### Define model, loss and optimizer

In [7]:
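# Model, loss and optimizer
# Note: there are no activation functions between the layers, so the stack
# of nn.Linear layers is itself an affine map, i.e. a linear regression.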
class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x

lr     = 0.0001
epochs = 10

reg_model = Model(input_size=1, hidden_size=1, output_size=1)
mse_loss  = nn.MSELoss()
optimizer = torch.optim.SGD(reg_model.parameters(), lr=lr)
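Because `Model` has no activation functions and `hidden_size = 1`, its four layers compose to a single affine function y = a·x + c, so the network is effectively a linear regression of land surface temperature on air temperature, as the notebook title suggests. A NumPy sketch with random weights (for illustration only) shows the collapse: applying w·x + b to an affine map a·x + c yields (w·a)·x + (w·c + b), which is again affine.

```python
import numpy as np

rng = np.random.default_rng(0)
# Weights and biases of four 1 -> 1 linear layers (hidden_size = 1),
# mirroring fc1 ... fc4 of `Model`; random values for illustration
weights = rng.normal(size=4)
biases = rng.normal(size=4)

def forward(x):
    """Apply the four layers in sequence, without activations."""
    for w, b in zip(weights, biases):
        x = w * x + b
    return x

# Collapse the stack into a single affine map y = a * x + c
a, c = 1.0, 0.0
for w, b in zip(weights, biases):
    a, c = w * a, w * c + b

x = np.linspace(-2.0, 2.0, 5)
print(np.allclose(forward(x), a * x + c))  # True
```

For the additional layers to add expressive power, nonlinearities such as `nn.ReLU()` between them and a larger `hidden_size` would be needed.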

### Train model

In [8]:
from ml4xcube.training.pytorch import Trainer

# Define the path for saving the best model
best_model_path = './best_model.pth'

# Trainer instance
trainer = Trainer(
    model          = reg_model,
    train_data     = train_loader,
    test_data      = test_loader,
    optimizer      = optimizer,
    model_path     = best_model_path,
    early_stopping = True,
    patience       = 3,
    epochs         = epochs,
    loss           = mse_loss
)

# Start training
reg_model = trainer.train()

Using cpu device
Epoch 1: Average Loss: 7.7443e-01
Epoch 1: Validation Loss: 7.8205e-01
New best model saved with validation loss: 0.7820509825018757
Epoch 2: Average Loss: 7.7442e-01
Epoch 2: Validation Loss: 7.8193e-01
New best model saved with validation loss: 0.7819343874871566
Epoch 3: Average Loss: 7.7443e-01
Epoch 3: Validation Loss: 7.8203e-01
Epoch 4: Average Loss: 7.7442e-01
Epoch 4: Validation Loss: 7.8207e-01
Epoch 5: Average Loss: 7.7443e-01
Epoch 5: Validation Loss: 7.8202e-01
Stopping early due to no improvement.
Loaded best model weights.
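The log shows patience-based early stopping: the best validation loss was reached in epoch 2, epochs 3 to 5 brought no further improvement, and with `patience = 3` training stopped after epoch 5 and the best weights were restored. The sketch below replays that logic on the logged losses; it is a minimal illustration consistent with the log, not the code of ml4xcube's `Trainer`, and the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Epochs are 1-based; training stops once `patience` consecutive epochs
    fail to improve on the best validation loss seen so far.
    """
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# Validation losses from the training log above (rounded)
val_losses = [7.8205e-01, 7.8193e-01, 7.8203e-01, 7.8207e-01, 7.8202e-01]
print(early_stopping_epoch(val_losses, patience=3))  # (2, 5)
```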


### Load pre-trained model and set up
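We load the pre-trained weights into a modified model in which the last layer (`fc4`) is replaced by a new, randomly initialized layer (`fc5`). Because the state dict is loaded with `strict=False`, the weights of `fc1` to `fc3` are transferred, while the keys of `fc4` (absent from the new model) and `fc5` (absent from the checkpoint) are ignored.
The modified model is then trained on a second task.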

In [9]:
# Define the modified model
class ModifiedModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        # no layer 4

        # Add a new layer
        self.fc5 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc5(x) # This is the new layer
        return x

# Create an instance of the modified model
reg_model = ModifiedModel(input_size=1, hidden_size=1, output_size=1)

# Load the pre-trained model weights
# strict=False: ignore non-matching keys (fc4.* in the checkpoint, fc5.* in the new model)
reg_model.load_state_dict(torch.load(best_model_path), strict=False)
reg_model.eval()

mse_loss  = nn.MSELoss()
optimizer = torch.optim.SGD(reg_model.parameters(), lr=0.01)

# use gpu if available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device
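The transfer works because the shared layers keep their names (`fc1` to `fc3`), so `load_state_dict(..., strict=False)` can match them by key. The call returns the keys it could not match, which is a handy sanity check; note that `strict=False` only tolerates missing or unexpected keys, while a shape mismatch for a matching key still raises an error. The sketch below uses freshly initialized models instead of the saved checkpoint (an assumption to keep it self-contained):

```python
import torch
from torch import nn

class Model(nn.Module):
    """Original architecture: fc1 -> fc2 -> fc3 -> fc4."""
    def __init__(self, n=1):
        super().__init__()
        self.fc1 = nn.Linear(n, n)
        self.fc2 = nn.Linear(n, n)
        self.fc3 = nn.Linear(n, n)
        self.fc4 = nn.Linear(n, n)

class ModifiedModel(nn.Module):
    """Same first three layers, new output layer fc5."""
    def __init__(self, n=1):
        super().__init__()
        self.fc1 = nn.Linear(n, n)
        self.fc2 = nn.Linear(n, n)
        self.fc3 = nn.Linear(n, n)
        self.fc5 = nn.Linear(n, n)

torch.manual_seed(0)
pretrained = Model()
modified = ModifiedModel()

result = modified.load_state_dict(pretrained.state_dict(), strict=False)
print("missing:", sorted(result.missing_keys))        # ['fc5.bias', 'fc5.weight']
print("unexpected:", sorted(result.unexpected_keys))  # ['fc4.bias', 'fc4.weight']

# fc1-fc3 now hold the pre-trained weights; fc5 keeps its random initialization
for name in ("fc1", "fc2", "fc3"):
    assert torch.equal(getattr(modified, name).weight, getattr(pretrained, name).weight)
```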


#### Load Data
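For simplicity, we reuse the same ESDC subset as before. In a real transfer-learning application, the second task would normally use different data, e.g. another region, time period or target variable.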

In [13]:
data_store = new_data_store("s3", root="esdl-esdc-v2.1.1", storage_options=dict(anon=True))
dataset    = data_store.open_data("esdc-8d-0.083deg-184x270x270-2.1.1.zarr")

# Smaller cube for demo case
start_time = "2002-05-21"
end_time   = "2002-08-01"
ds         = dataset[["land_surface_temperature", "air_temperature_2m"]].sel(time=slice(start_time, end_time))
ds

Both data variables (`land_surface_temperature`, `air_temperature_2m`) have the same layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 355.96 MiB | 2.78 MiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 3 graph layers | 128 chunks in 3 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


### Train/Test Split assignment

In [15]:
from ml4xcube.splits import assign_rand_split

# random sampling
xds = assign_rand_split(
    ds    = ds,
    split = 0.8
)
xds

The dataset now contains a third data variable, `split`, holding the train/test assignment. The two input variables keep their layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 355.96 MiB | 2.78 MiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 3 graph layers | 128 chunks in 3 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |

The `split` variable is stored as float64:

| | Array | Chunk |
|---|---|---|
| Bytes | 711.91 MiB | 5.56 MiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 3 graph layers | 128 chunks in 3 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


### Training and Test Set Creation and Preprocessing

In [16]:
from ml4xcube.datasets.xr_dataset import XrDataset

sampler = XrDataset(
    ds         = xds, 
    num_chunks = 2, 
    rand_chunk = True,
    to_pred    = "land_surface_temperature"
)

train_data, test_data = sampler.get_datasets()
X_train, y_train      = train_data 
X_test, y_test        = test_data

{'land_surface_temperature': array([-2.218797 , -2.1988616, -2.194245 , ...,  1.0552449,  1.3985783,
        1.5523953], dtype=float32), 'air_temperature_2m': array([-2.3387394, -2.3387394, -2.3387394, ...,  1.4443846,  1.4443846,
        1.4443846], dtype=float32), 'split': array([1., 1., 1., ..., 0., 1., 0.])}
set train and test data
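Compared with the first run, this cell sets `rand_chunk = True`. Judging by the parameter names, `num_chunks` limits how many chunks are extracted and `rand_chunk` chooses whether they are drawn at random rather than taken in order; this reading is an assumption, so check the ml4xcube documentation. The hypothetical helpers `chunk_origins` and `select_chunks` below sketch that behaviour on the 270 × 270 spatial chunk grid of the cube:

```python
import random

def chunk_origins(shape, chunk):
    """Hypothetical helper: top-left (lat, lon) index of every spatial chunk."""
    return [
        (i, j)
        for i in range(0, shape[0], chunk[0])
        for j in range(0, shape[1], chunk[1])
    ]

def select_chunks(origins, num_chunks, rand_chunk, seed=None):
    """Hypothetical selection: first `num_chunks` in order, or a random sample."""
    if rand_chunk:
        return random.Random(seed).sample(origins, num_chunks)
    return origins[:num_chunks]

origins = chunk_origins((2160, 4320), (270, 270))
print(len(origins))                                 # 128
print(select_chunks(origins, 2, rand_chunk=False))  # [(0, 0), (0, 270)]
print(select_chunks(origins, 2, rand_chunk=True, seed=42))
```

Taking chunks in order always samples the same corner of the globe, whereas random selection spreads the sampled chunks over the grid.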


#### Prepare Datasets before Training

In [17]:
from ml4xcube.datasets.pytorch import prep_dataloader

X_train = X_train.reshape(-1, 1)  # Making it [num_samples, 1]
y_train = y_train.reshape(-1, 1)  # Making it [num_samples, 1]
X_test  = X_test.reshape(-1, 1)
y_test  = y_test.reshape(-1, 1)

train_ds = TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
test_ds  = TensorDataset(torch.tensor(X_test), torch.tensor(y_test))

train_loader, test_loader = prep_dataloader(
    train_ds    = train_ds, 
    test_ds     = test_ds, 
    batch_size  = 64, 
    num_workers = 5, 
    parallel    = False
)

### Train the pre-trained model on the second task

In [18]:
from ml4xcube.training.pytorch import Trainer

# Define the path for saving the best model
best_model_path = './best_model_new.pth'

# Trainer instance
trainer = Trainer(
    model          = reg_model,
    train_data     = train_loader,
    test_data      = test_loader,
    optimizer      = optimizer,
    model_path     = best_model_path,
    early_stopping = True,
    patience       = 3,
    epochs         = epochs,
    loss           = mse_loss
)

# Start training
reg_model = trainer.train()

Using cpu device
Epoch 1: Average Loss: 1.3517e+00
Epoch 1: Validation Loss: 1.3501e+00
New best model saved with validation loss: 1.350057159414751
Epoch 2: Average Loss: 1.3517e+00
Epoch 2: Validation Loss: 1.3500e+00
New best model saved with validation loss: 1.3499878493183641
Epoch 3: Average Loss: 1.3517e+00
Epoch 3: Validation Loss: 1.3503e+00
Epoch 4: Average Loss: 1.3517e+00
Epoch 4: Validation Loss: 1.3503e+00
Epoch 5: Average Loss: 1.3517e+00
Epoch 5: Validation Loss: 1.3501e+00
Stopping early due to no improvement.
Loaded best model weights.
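The reported losses are easier to interpret next to a trivial baseline. One option, not part of the original workflow, is to compare the model's test MSE with the MSE of always predicting the training-set mean, and to report the relative improvement 1 - MSE_model / MSE_baseline. If the target is standardized, as the values printed by the sampler suggest, this baseline MSE is close to 1. A NumPy sketch on synthetic data (the helpers `mse` and `mean_baseline_mse` are hypothetical):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, as computed by nn.MSELoss with default settings."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mean_baseline_mse(y_train, y_test):
    """MSE of always predicting the training-set mean."""
    return mse(y_test, np.full_like(y_test, np.mean(y_train)))

# Synthetic regression data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.5 * x + rng.normal(scale=0.8, size=5000)
y_tr, y_te, x_te = y[:4000], y[4000:], x[4000:]

model_mse = mse(y_te, 0.5 * x_te)       # predictions of a fitted linear model
base_mse = mean_baseline_mse(y_tr, y_te)
skill = 1 - model_mse / base_mse        # > 0 means better than the baseline
print(f"model: {model_mse:.3f}, baseline: {base_mse:.3f}, skill: {skill:.2f}")
```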
