# ML on ESDC using PyTorch including Transfer Learning
A DeepESDL example notebook

## Linear Regression for prediction of missing land surface temperature values from air temperature values
This notebook demonstrates how to implement machine learning on the Earth System Data Cube (ESDC) using the ML library PyTorch, how to save the trained model, and how to reload it for a second task (transfer learning). The workflow is self-contained and based on a generic use case that showcases data loading, sampling strategies, model training, model evaluation and visualisation.

Please also refer to the DeepESDL documentation and visit the platform's website for further information!

ScaDS.AI, 2023

**This notebook runs with the Python environment `deepesdl-ml-transfer-learning`; please check out the documentation for help on [changing the environment](https://deepesdl.readthedocs.io/en/v2022.12.1/guide/jupyterlab/).**

### Import necessary libraries


In [1]:
import torch
import xarray as xr
from torch import nn
from xcube.core.store import new_data_store
from torch.utils.data import TensorDataset

### Load Data (Earth System Data Cube)
We lazily load the ESDC (`*.zarr`) from the S3 data store. The ESDC has three dimensions (longitude, latitude, time). Of the many available cube variables, which are backed by dask arrays, we load two: `land_surface_temperature` and `air_temperature_2m`. To keep the demo small, we also restrict the cube to the period from 2002-05-21 to 2002-08-01.

In [2]:
data_store = new_data_store("s3", root="esdl-esdc-v2.1.1", storage_options=dict(anon=True))
dataset    = data_store.open_data('esdc-8d-0.083deg-184x270x270-2.1.1.zarr')

# Smaller cube for demo case
start_time = "2002-05-21"
end_time   = "2002-08-01"
ds         = dataset[["land_surface_temperature", "air_temperature_2m"]].sel(time=slice(start_time, end_time))
ds

Both data variables (`land_surface_temperature`, `air_temperature_2m`) have the same layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 355.96 MiB | 2.78 MiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 3 graph layers | |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
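The sizes reported for the cube follow directly from the shape, the chunk shape and the 4-byte `float32` data type. A small sketch of the arithmetic (the helper `cube_sizes` is ours, for illustration only):

```python
import math

def cube_sizes(shape, chunks, itemsize):
    """Return (total MiB, MiB per chunk, number of chunks) for a chunked array."""
    total_bytes = math.prod(shape) * itemsize
    chunk_bytes = math.prod(chunks) * itemsize
    # Number of chunks along each dimension, rounded up for partial edge chunks
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    return total_bytes / 2**20, chunk_bytes / 2**20, n_chunks

total_mib, chunk_mib, n_chunks = cube_sizes((10, 2160, 4320), (10, 270, 270), itemsize=4)
print(f"{total_mib:.2f} MiB, {chunk_mib:.2f} MiB per chunk, {n_chunks} chunks")
# prints: 355.96 MiB, 2.78 MiB per chunk, 128 chunks

# A boolean mask uses 1 byte per element
bool_mib, bool_chunk_mib, _ = cube_sizes((10, 2160, 4320), (10, 270, 270), itemsize=1)
print(f"{bool_mib:.2f} MiB, {bool_chunk_mib * 1024:.2f} kiB per chunk")
# prints: 88.99 MiB, 711.91 kiB per chunk
```

The second call reproduces the sizes of the boolean `split` variable added by the train/test split.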


### Assign a random train/test split

In [3]:
from ml4xcube.data_split import assign_rand_split

# random sampling
xds = assign_rand_split(
    ds    = ds,
    split = 0.8
)
xds

The two temperature variables keep their layout (float32, 355.96 MiB, shape (10, 2160, 4320) in chunks of (10, 270, 270), 128 chunks in 3 graph layers). The new boolean `split` variable has the same shape and chunking:

| | Array | Chunk |
|---|---|---|
| Bytes | 88.99 MiB | 711.91 kiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 2 graph layers | |
| Data type | bool numpy.ndarray | bool numpy.ndarray |
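`assign_rand_split` adds a boolean `split` variable in which `True` marks training samples and `False` test samples. Its internals are not shown in this notebook; the NumPy sketch below shows one way to build such a mask, assuming an independent random draw per element with probability `split`. This is an assumption: ml4xcube may assign samples differently (for example per block), and `random_split_mask` is a hypothetical helper.

```python
import numpy as np

def random_split_mask(shape, split=0.8, seed=None):
    """Boolean mask: True = training sample (probability `split`), False = test sample."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < split

mask = random_split_mask((10, 270, 270), split=0.8, seed=42)
print(mask.dtype, mask.shape, round(float(mask.mean()), 3))  # about 80 % True
```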


### Model set up

Select the CUDA device, if available, to use GPU resources.

In [4]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device
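The cell above only selects and prints the device; whether the `Trainer` moves the model and the batches onto it is not shown here. In plain PyTorch, the model parameters and every input batch must live on the same device. A minimal sketch with a toy `nn.Linear` model:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(1, 1).to(device)            # move the parameters to the device
batch = torch.randn(32, 1, device=device)     # create (or .to(device)) the inputs there too
with torch.no_grad():
    out = model(batch)
print(out.device, out.shape)
```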


In [5]:
from ml4xcube.datasets.xr_dataset import XrDataset

dataset = XrDataset(ds=xds, num_chunks=2, rand_chunk=False).get_dataset()

{'land_surface_temperature': array([262.01   , 261.918  , 261.52   , ..., 289.3816 , 285.9026 ,
       283.60678], dtype=float32), 'air_temperature_2m': array([267.48087, 267.48087, 267.48087, ..., 282.1732 , 282.1732 ,
       282.1732 ], dtype=float32), 'split': array([False,  True, False, ...,  True,  True,  True])}
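The dictionary printed above holds flat 1-D arrays, one per variable. According to the training section, the chunk values are flattened and NaNs are removed. A minimal NumPy sketch of that step on toy values, assuming a sample is dropped whenever any floating-point variable is NaN at that point (the helper `flatten_drop_nan` is hypothetical, not part of ml4xcube):

```python
import numpy as np

def flatten_drop_nan(chunk):
    """Flatten each variable of a chunk and keep only points valid in all float variables."""
    flat = {name: np.asarray(values).ravel() for name, values in chunk.items()}
    valid = np.ones(next(iter(flat.values())).shape, dtype=bool)
    for values in flat.values():
        if np.issubdtype(values.dtype, np.floating):  # boolean masks cannot be NaN
            valid &= ~np.isnan(values)
    return {name: values[valid] for name, values in flat.items()}

# Toy 2x2 chunk (made-up values) with one NaN in each temperature variable
chunk = {
    "land_surface_temperature": np.array([[262.0, np.nan], [289.4, 285.9]], dtype=np.float32),
    "air_temperature_2m": np.array([[267.5, 267.5], [np.nan, 282.2]], dtype=np.float32),
    "split": np.array([[False, True], [True, True]]),
}
out = flatten_drop_nan(chunk)
print(out)  # two valid points remain: the first and the last
```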


Compute the range (min, max) or the statistics (mean, std) of the data variables, used for normalization or standardization respectively.

In [6]:
from ml4xcube.preprocessing import get_range, get_statistics

#at_range  = get_range(dataset, 'air_temperature_2m')
#lst_range = get_range(dataset, 'land_surface_temperature')

at_stat  = get_statistics(dataset, 'air_temperature_2m')
lst_stat = get_statistics(dataset, 'land_surface_temperature')

In [7]:
from ml4xcube.preprocessing import standardize

X = standardize(dataset['air_temperature_2m'], *at_stat)
y = standardize(dataset['land_surface_temperature'], *lst_stat)
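Standardization maps each variable to zero mean and unit variance, z = (x - mean) / std, whereas min-max normalization with the range maps it to [0, 1]. Because the model is trained on standardized values, its predictions have to be transformed back to Kelvin with the land surface temperature statistics. A NumPy sketch of the three transforms, assuming `get_statistics` returns `(mean, std)` in that order, as the unpacking `*lst_stat` and the text above suggest; the function names below are illustrative, not ml4xcube's:

```python
import numpy as np

def stats(x):
    """Mean and standard deviation, ignoring NaNs."""
    return float(np.nanmean(x)), float(np.nanstd(x))

def standardize_np(x, mean, std):
    return (x - mean) / std

def destandardize_np(z, mean, std):
    # Invert the standardization, e.g. to turn predictions back into Kelvin
    return z * std + mean

def normalize_np(x, xmin, xmax):
    # Min-max normalization to [0, 1] using the (min, max) range
    return (x - xmin) / (xmax - xmin)

lst = np.array([262.0, 270.0, 285.9, 289.4], dtype=np.float32)  # toy values in Kelvin
mean, std = stats(lst)
z = standardize_np(lst, mean, std)
print(z.mean(), z.std())                        # approximately 0 and 1
print(destandardize_np(z, mean, std))           # recovers the original values
print(normalize_np(lst, lst.min(), lst.max()))  # 0 for the minimum, 1 for the maximum
```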

In [8]:
X_train, X_test = X[dataset['split'] == True], X[dataset['split'] == False]
y_train, y_test = y[dataset['split'] == True], y[dataset['split'] == False]

X_train = X_train.reshape(-1, 1)  # Making it [num_samples, 1]
y_train = y_train.reshape(-1, 1)  # Making it [num_samples, 1]
X_test  = X_test.reshape(-1, 1)
y_test  = y_test.reshape(-1, 1)
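Indexing with the boolean `split` array keeps the samples marked `True` for training and the rest for testing; `reshape(-1, 1)` then adds the feature axis that `nn.Linear(1, ...)` expects. A toy illustration with made-up values, using `~split` as an equivalent of `split == False`:

```python
import numpy as np

X = np.arange(6, dtype=np.float32)                       # toy standardized inputs
split = np.array([True, False, True, True, False, True])  # True = training sample

X_train = X[split].reshape(-1, 1)    # shape (4, 1): [num_samples, 1]
X_test = X[~split].reshape(-1, 1)    # shape (2, 1)
print(X_train.shape, X_test.shape)
```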

In [9]:
from ml4xcube.datasets.pytorch import prepare_dataloader

# Wrap the NumPy arrays as tensors and build the training and test data loaders
train_ds     = TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
train_loader = prepare_dataloader(train_ds, batch_size=32, num_workers=5, parallel=False)

test_ds      = TensorDataset(torch.tensor(X_test), torch.tensor(y_test))
test_loader  = prepare_dataloader(test_ds, batch_size=32, num_workers=5, parallel=False)
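`prepare_dataloader` belongs to ml4xcube and its internals are not shown here. As a rough plain-PyTorch equivalent for a single process, a `DataLoader` over a `TensorDataset` yields mini-batches of 32 samples; shuffling the training batches is our assumption, not something the notebook states:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

X_train = np.random.default_rng(0).standard_normal((100, 1)).astype(np.float32)
y_train = 0.5 * X_train + 0.1  # toy targets

train_ds = TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=0)

xb, yb = next(iter(train_loader))
print(len(train_loader), xb.shape, yb.shape)  # 4 batches: 32 + 32 + 32 + 4 samples
```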

#### Define model, loss and optimizer

In [10]:
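# Model, loss and optimizer.
# The layers are stacked without activation functions, so the whole network
# is a single affine map of the input, i.e. a linear regression.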
class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x

lr     = 0.0001
epochs = 10

reg_model = Model(input_size=1, hidden_size=1, output_size=1)
mse_loss  = nn.MSELoss()
optimizer = torch.optim.SGD(reg_model.parameters(), lr=lr)
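Since `Model` has no activation functions between its layers, the four `nn.Linear(1, 1)` layers compose into one affine map y = a * x + b: a linear regression with a single slope and intercept. A small check that collapses the layer weights into a and b and compares the result with the network output (the helper `effective_line` is illustrative):

```python
import torch
from torch import nn

class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        return self.fc4(self.fc3(self.fc2(self.fc1(x))))

def effective_line(model):
    """Collapse the stacked linear layers into slope a and intercept b."""
    a, b = torch.ones(1, 1), torch.zeros(1)
    for layer in (model.fc1, model.fc2, model.fc3, model.fc4):
        # layer(a*x + b) = W(a*x + b) + c = (W a) x + (W b + c)
        a, b = layer.weight @ a, layer.weight @ b + layer.bias
    return a, b

torch.manual_seed(0)
model = Model(1, 1, 1)
x = torch.linspace(-2, 2, 5).reshape(-1, 1)
with torch.no_grad():
    a, b = effective_line(model)
    same = torch.allclose(model(x), x @ a.T + b)
print(same)  # True
```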

### Train model

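The preceding cells prepared the data: the ESDC chunks were flattened, NaNs were removed, the variables were standardized, and the samples were split into a training and a test fraction and wrapped in a train data loader and a test data loader. Here, the `Trainer` fits the linear regression model on the training data and evaluates it on the test data after every epoch. The training and validation losses are printed per epoch, the best weights are saved to `best_model_path`, and training stops early once the validation loss has not improved for a number of epochs controlled by `patience`.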

In [11]:
from ml4xcube.training.pytorch import Trainer

# Define the path for saving the best model
best_model_path = './best_model.pth'

# Trainer instance
trainer = Trainer(
    model           = reg_model,
    train_data      = train_loader,
    test_data       = test_loader,
    optimizer       = optimizer,
    best_model_path = best_model_path,
    early_stopping  = True,
    patience        = 3,
    epochs          = epochs
)

# Start training
reg_model = trainer.train()

Epoch 0: Average Loss: 2.0318e+00
Epoch 0: Validation Loss: 2.0327e+00
New best model saved with validation loss: 2.032738710013568
Epoch 1: Average Loss: 2.0318e+00
Epoch 1: Validation Loss: 2.0327e+00
New best model saved with validation loss: 2.03273724238783
Epoch 2: Average Loss: 2.0317e+00
Epoch 2: Validation Loss: 2.0326e+00
New best model saved with validation loss: 2.0325701706455463
Epoch 3: Average Loss: 2.0318e+00
Epoch 3: Validation Loss: 2.0326e+00
Epoch 4: Average Loss: 2.0318e+00
Epoch 4: Validation Loss: 2.0327e+00
Epoch 5: Average Loss: 2.0318e+00
Epoch 5: Validation Loss: 2.0327e+00
Epoch 6: Average Loss: 2.0318e+00
Epoch 6: Validation Loss: 2.0327e+00
Stopping early due to no improvement.
Loaded best model weights.
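The `Trainer` hides the training loop, and its exact behaviour (default loss, how `patience` is counted, how the best weights are stored) is defined by ml4xcube. The following is only a minimal plain-PyTorch sketch of a loop with the same ingredients: SGD on an MSE loss, a per-epoch validation loss, keeping the best weights (in memory here, rather than in `best_model_path`) and early stopping. It runs on toy data; `train_with_early_stopping` is a hypothetical function.

```python
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_with_early_stopping(model, train_loader, test_loader, optimizer,
                              loss_fn=None, epochs=10, patience=3):
    loss_fn = loss_fn or nn.MSELoss()
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()

        # Validation: mean loss over all test samples
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for xb, yb in test_loader:
                total += loss_fn(model(xb), yb).item() * len(xb)
                n += len(xb)
        val_loss = total / n
        print(f"Epoch {epoch}: Validation Loss: {val_loss:.4e}")

        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # keep the best weights
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                print("Stopping early due to no improvement.")
                break

    model.load_state_dict(best_state)  # restore the best weights
    return model, best_loss

# Toy data: y = 2x + 1 plus a little noise
torch.manual_seed(0)
x = torch.randn(512, 1)
y = 2 * x + 1 + 0.05 * torch.randn(512, 1)
train_loader = DataLoader(TensorDataset(x[:400], y[:400]), batch_size=32, shuffle=True)
test_loader = DataLoader(TensorDataset(x[400:], y[400:]), batch_size=32)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
model, best_loss = train_with_early_stopping(model, train_loader, test_loader, optimizer, epochs=20)
```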


### Load pre-trained model and set up
We load the pre-trained model weights into a modified model in which the last layer of the pre-trained model (`fc4`) is replaced by a new one (`fc5`).
The modified model is then trained on a second task.

In [12]:
# Define the modified model
class ModifiedModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        # no layer 4

        # Add a new layer
        self.fc5 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc5(x) # This is the new layer
        return x

# use gpu if available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Create an instance of the modified model
reg_model = ModifiedModel(input_size=1, hidden_size=1, output_size=1)

# Load the pre-trained model weights.
# strict=False skips non-matching keys: fc4.* from the checkpoint is ignored,
# and the new layer fc5 keeps its random initialization.
# map_location allows weights saved on a GPU to be loaded on a CPU-only machine.
reg_model.load_state_dict(torch.load(best_model_path, map_location=device), strict=False)
reg_model.eval()

mse_loss  = nn.MSELoss()
optimizer = torch.optim.SGD(reg_model.parameters(), lr=0.01)

Using cpu device
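With `strict=False`, `load_state_dict` does not raise on mismatched keys but returns them as `missing_keys` and `unexpected_keys`, which is a convenient way to check what was actually transferred. A self-contained sketch with compact versions of the two model classes (without `forward`, since only the parameter names matter) and an in-memory state dict instead of `best_model.pth`:

```python
import torch
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1, self.fc2, self.fc3, self.fc4 = (nn.Linear(1, 1) for _ in range(4))

class ModifiedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1, self.fc2, self.fc3 = (nn.Linear(1, 1) for _ in range(3))
        self.fc5 = nn.Linear(1, 1)  # new output layer

pretrained = Model()
new_model = ModifiedModel()
result = new_model.load_state_dict(pretrained.state_dict(), strict=False)
print("missing:", result.missing_keys)        # parameters of the new layer fc5
print("unexpected:", result.unexpected_keys)  # parameters of the dropped layer fc4
```

The shared layers `fc1` to `fc3` receive the pre-trained weights, while `fc5` starts from its random initialization.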


#### Load Data
For simplicity, we reuse the same ESDC data as in the first task. In a real transfer-learning application, the second task would use different data.

In [13]:
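data_store = new_data_store("s3", root="esdl-esdc-v2.1.1", storage_options=dict(anon=True))
dataset    = data_store.open_data('esdc-8d-0.083deg-184x270x270-2.1.1.zarr')

# Smaller cube for demo case: same variables and time range as for the first task
ds         = dataset[['land_surface_temperature', 'air_temperature_2m']].sel(time=slice(start_time, end_time))

# Alternatively, open a local Zarr copy of the subset ('lst_small.zarr' in the original run):
# ds = xr.open_zarr('lst_small.zarr')[['land_surface_temperature', 'air_temperature_2m']]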

### Assign random train/test split

In [14]:
from ml4xcube.data_split import assign_rand_split

# random sampling
xds = assign_rand_split(
    ds    = ds,
    split = 0.8
)
xds

The two temperature variables have the same layout as before (float32, 355.96 MiB, shape (10, 2160, 4320) in chunks of (10, 270, 270)). The boolean `split` variable is laid out as follows:

| | Array | Chunk |
|---|---|---|
| Bytes | 88.99 MiB | 711.91 kiB |
| Shape | (10, 2160, 4320) | (10, 270, 270) |
| Dask graph | 128 chunks in 2 graph layers | |
| Data type | bool numpy.ndarray | bool numpy.ndarray |


In [15]:
from ml4xcube.datasets.xr_dataset import XrDataset

dataset = XrDataset(ds=xds, num_chunks=2, rand_chunk=True).get_dataset()

{'land_surface_temperature': array([261.392  , 261.66678, 261.7304 , ..., 306.52   , 311.25235,
       313.3725 ], dtype=float32), 'air_temperature_2m': array([268.7839 , 268.7839 , 268.7839 , ..., 296.35587, 296.35587,
       296.35587], dtype=float32), 'split': array([ True,  True,  True, ...,  True, False,  True])}


In [16]:
from ml4xcube.preprocessing import get_statistics, standardize

at_stat  = get_statistics(dataset, 'air_temperature_2m')
lst_stat = get_statistics(dataset, 'land_surface_temperature')

X = standardize(dataset['air_temperature_2m'], *at_stat)
y = standardize(dataset['land_surface_temperature'], *lst_stat)

In [17]:
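X_train, X_test = X[dataset['split'] == True], X[dataset['split'] == False]
y_train, y_test = y[dataset['split'] == True], y[dataset['split'] == False]

X_train = X_train.reshape(-1, 1)  # Making it [num_samples, 1]
y_train = y_train.reshape(-1, 1)  # Making it [num_samples, 1]
X_test  = X_test.reshape(-1, 1)
y_test  = y_test.reshape(-1, 1)

# Build new data loaders from the second-task data; without this step the
# Trainer below would reuse the data loaders of the first task.
train_ds     = TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
train_loader = prepare_dataloader(train_ds, batch_size=32, num_workers=5, parallel=False)
test_ds      = TensorDataset(torch.tensor(X_test), torch.tensor(y_test))
test_loader  = prepare_dataloader(test_ds, batch_size=32, num_workers=5, parallel=False)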

### Train pre-trained model

In [18]:
from ml4xcube.training.pytorch import Trainer

# Define the path for saving the best model
best_model_path = './best_model_new.pth'

# Trainer instance
trainer = Trainer(
    model           = reg_model,
    train_data      = train_loader,
    test_data       = test_loader,
    optimizer       = optimizer,
    best_model_path = best_model_path,
    early_stopping  = True,
    patience        = 3,
    epochs          = epochs
)

# Start training
reg_model = trainer.train()

Epoch 0: Average Loss: 2.0219e+00
Epoch 0: Validation Loss: 2.0217e+00
New best model saved with validation loss: 2.021725369128254
Epoch 1: Average Loss: 2.0218e+00
Epoch 1: Validation Loss: 2.0217e+00
Epoch 2: Average Loss: 2.0219e+00
Epoch 2: Validation Loss: 2.0217e+00
Epoch 3: Average Loss: 2.0219e+00
Epoch 3: Validation Loss: 2.0217e+00
New best model saved with validation loss: 2.0217235575742674
Epoch 4: Average Loss: 2.0218e+00
Epoch 4: Validation Loss: 2.0217e+00
Epoch 5: Average Loss: 2.0219e+00
Epoch 5: Validation Loss: 2.0217e+00
New best model saved with validation loss: 2.021701912893089
Epoch 6: Average Loss: 2.0218e+00
Epoch 6: Validation Loss: 2.0217e+00
New best model saved with validation loss: 2.0216948229390783
Epoch 7: Average Loss: 2.0219e+00
Epoch 7: Validation Loss: 2.0217e+00
New best model saved with validation loss: 2.021690413489332
Epoch 8: Average Loss: 2.0219e+00
Epoch 8: Validation Loss: 2.0217e+00
Epoch 9: Average Loss: 2.0218e+00
Epoch 9: Validation 