# ISPRS Part 4

This notebook is a glimpse into machine learning through the ESDL platform and `xcube`.

The idea is to build a very simple model that predicts land surface temperature from the 2 m air temperature over a region in Australia. Since no GPU is available in this tutorial, we limit ourselves to such a simple model and focus on showing the general workflow.

**Please don't expect great results from a very simple approach like this**


First, we load all required packages. The package `ml4xcube` plays a central role here: it is the interface between `xcube` and PyTorch.

In [None]:
import os
import math
import torch
import mlflow
import numpy as np
import xarray as xr
import pandas as pd
import dask.array as da
from torch import nn
from global_land_mask import globe
from xcube.core.store import new_data_store
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from torch.utils.data import TensorDataset, DataLoader

from ml4xcube.cube_utilities import get_chunk_sizes
from ml4xcube.datasets.xr_dataset import XrDataset
from ml4xcube.statistics import get_range, get_statistics, standardize, undo_standardizing
from ml4xcube.datasets.pytorch_xr import prepare_dataloader
from ml4xcube.training.pytorch import Trainer
from ml4xcube.data_assignment import assign_block_split, assign_rand_split

## Data processing

We use a different data cube here, version 2.1.1 of the Earth System Data Cube, to get access to land surface temperature. This variable does not cover the whole globe and has gaps. It is also only available until the end of 2011, so we concentrate on the period from 2009 to 2011.

In [None]:
data_store = new_data_store("s3", root="esdl-esdc-v2.1.1", storage_options=dict(anon=True))
dataset    = data_store.open_data('esdc-8d-0.083deg-1x2160x4320-2.1.1.zarr')

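# Select the two variables and restrict them to a spatial region and time range
# (latitude is ordered from north to south, hence slice(-17, -37))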
aus_data = dataset[['land_surface_temperature', 'air_temperature_2m']].sel(
    lat = slice(-17, -37),
    lon = slice(110, 130),
    time = slice('2009-01-01', '2011-12-31')
)

Land surface temperature only exists on land, so we add a land mask to our dataset. The mask does not depend on time, so repeating it for every time step is not strictly necessary, but it makes accessing the data much easier.

In [None]:
lon_grid, lat_grid = np.meshgrid(aus_data.lon,aus_data.lat)
lm0                = da.from_array(globe.is_land(lat_grid, lon_grid))
lm = da.stack([lm0 for i in range(aus_data.sizes['time'])], axis = 0)
extended_data = aus_data.assign(
    land_mask = (['time','lat','lon'], lm)
)
extended_data
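The `da.stack` call above repeats the time-independent mask once per time step. The same (time, lat, lon) array can also be obtained as a broadcast view that copies no data. The following minimal NumPy sketch uses a hypothetical 2×3 mask and 4 time steps in place of the `globe.is_land` output. `dask.array.broadcast_to` offers the same operation for dask arrays, so it may serve as a replacement for the stack, although we have not tested that inside this notebook.

```python
import numpy as np

# Hypothetical 2D land mask (lat x lon); in the notebook it comes from globe.is_land
land_2d = np.array([[True, False, True],
                    [False, True, True]])
n_time = 4  # hypothetical number of time steps

# Read-only view with shape (time, lat, lon): every time step shows the same 2D mask
land_3d = np.broadcast_to(land_2d, (n_time,) + land_2d.shape)
print(land_3d.shape)  # (4, 2, 3)
```

Because the result is a view, it shares memory with `land_2d` and is read-only, which is fine for a static mask.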

Let's have a look at the data. The gaps in the land surface temperature are clearly visible.

In [None]:
dataslice = extended_data.sel(time='2009-08-25')
fig = plt.figure(figsize=(6,10))
ax1 = fig.add_subplot(211)
dataslice['air_temperature_2m'].plot()
ax2 = fig.add_subplot(212)
dataslice['land_surface_temperature'].plot()

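For machine learning, we need to split the data into a training and a test dataset, which can be done with `ml4xcube`. There are two options: random sampling and block sampling. Random sampling is the standard for data with little **autocorrelation**. In spatio-temporal data, however, neighboring pixels and time steps are strongly correlated, so randomly drawn test samples closely resemble training samples and the test error hides **overfitting**. Block sampling assigns whole contiguous blocks to either the training or the test set and is therefore recommended for spatio-temporal data.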

In [None]:
#random sampling
#final_data = assign_rand_split(
#    ds    = extended_data,
#    split = 0.8
#)

# block sampling
final_data = assign_block_split(
    ds    = extended_data,
    block_size = [("time", 20), ("lat", 20), ("lon", 20)],
    split = 0.8
)
final_data.split.sel(time = '2009-08-25').plot()
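The idea behind block sampling can be sketched in a few lines of NumPy: divide every dimension into blocks, draw one train/test decision per block, and copy that decision to all grid cells inside the block. This is a conceptual illustration under our own assumptions, not the implementation of `assign_block_split`; the helper `block_split_mask` and the grid shape are hypothetical.

```python
import numpy as np

def block_split_mask(shape, block_size, split=0.8, seed=0):
    """Hypothetical helper: boolean mask (True = train) that is constant within each block."""
    rng = np.random.default_rng(seed)
    # Block number of every index along each dimension, e.g. 0,...,0,1,...,1 for blocks of 20
    block_idx = [np.arange(n) // b for n, b in zip(shape, block_size)]
    n_blocks = tuple(int(idx[-1]) + 1 for idx in block_idx)
    # One random train/test decision per block
    block_is_train = rng.random(n_blocks) < split
    # Broadcast each block's decision to all grid cells inside it
    return block_is_train[np.ix_(*block_idx)]

# Hypothetical (time, lat, lon) grid of (46, 60, 60), blocks of 20 as in block_size above
mask = block_split_mask((46, 60, 60), (20, 20, 20), split=0.8)
print(mask.shape, round(float(mask.mean()), 2))
```

Since whole blocks are assigned at once, the realized training fraction only approximates `split` when there are few blocks.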

We now have a dataset that is ready to be used with different machine learning frameworks; all the code so far is framework-agnostic. We decided to use **PyTorch**, so from here on the code is specific to PyTorch.

## Prepare the data for PyTorch

PyTorch is a well-known and widely used framework for machine learning. Because computational power is limited and no GPU is available in this JupyterLab, we will not go into deep learning. In general, you can apply for GPU resources when using DeepESDL.

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

For machine learning, the data needs some further processing. This includes standardization: each variable is rescaled to zero mean and unit standard deviation, $z = (x - \mu)/\sigma$, which is standard practice in machine learning. We also create DataLoaders, which PyTorch uses to feed the data to the model in batches. By convention, `X` denotes the input and `Y` the target of the regression model $y = f(x)$.

Hyperparameter set in this step: **batch_size**

In [None]:
# make XR dataset
dataset = XrDataset(ds=final_data, num_chunks=3, rand_chunk=False).get_dataset()
# standardize the datasets
at_stat  = get_statistics(dataset, 'air_temperature_2m')
lst_stat = get_statistics(dataset, 'land_surface_temperature')
X = standardize(dataset['air_temperature_2m'], *at_stat)
Y = standardize(dataset['land_surface_temperature'], *lst_stat)
# keep only samples where both input and target are defined (drops gaps, if any remain)
valid = ~np.isnan(X) & ~np.isnan(Y)
train = valid & (dataset['split'] == True)
test  = valid & (dataset['split'] == False)
# split the arrays into Train and Test
X_train, X_test = X[train], X[test]
Y_train, Y_test = Y[train], Y[test]
# shaping as inputs
X_train = X_train.reshape(-1, 1)  # Making it [num_samples, 1]
Y_train = Y_train.reshape(-1, 1)  # Making it [num_samples, 1]
X_test  = X_test.reshape(-1, 1)
Y_test  = Y_test.reshape(-1, 1)
# prepare according PyTorch dataloaders
# Hyperparameter: batch_size
# float32 matches the default dtype of the model weights
train_ds     = TensorDataset(torch.tensor(X_train, dtype=torch.float32), torch.tensor(Y_train, dtype=torch.float32))
train_loader = prepare_dataloader(train_ds, batch_size=64, parallel=False)
test_ds      = TensorDataset(torch.tensor(X_test, dtype=torch.float32), torch.tensor(Y_test, dtype=torch.float32))
test_loader  = prepare_dataloader(test_ds, batch_size=64, parallel=False)
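Standardization and its inverse are simple array operations. The following NumPy sketch shows them with NaN-aware statistics on hypothetical temperature values. We assume that ml4xcube's `standardize` and `undo_standardizing` apply the same formulas with the `(mean, std)` pair from `get_statistics`; the helpers `standardize_np` and `undo_standardize_np` are hypothetical stand-ins.

```python
import numpy as np

def standardize_np(x, mean, std):
    # z = (x - mean) / std
    return (x - mean) / std

def undo_standardize_np(z, mean, std):
    # inverse transform: x = z * std + mean
    return z * std + mean

# Hypothetical temperatures (K) with one gap encoded as NaN
x = np.array([290.0, 295.0, np.nan, 300.0, 305.0])
mean, std = np.nanmean(x), np.nanstd(x)  # statistics ignore the gap

z = standardize_np(x, mean, std)
valid = ~np.isnan(z)  # drop gaps before building tensors
print(z[valid])
print(np.allclose(undo_standardize_np(z[valid], mean, std), x[valid]))
```

The same statistics must be reused when standardizing new inputs and undoing the standardization of predictions, which is why `at_stat` and `lst_stat` are kept for the results section.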

## Model development

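The structure of a machine learning model is also called its architecture. We choose a very shallow network with only 4 linear layers: 1 input layer, 2 hidden layers and 1 output layer. Because there is no nonlinear activation function between the layers and the hidden size is 1, this network can only represent a linear function $y = a x + b$ of the input, so it is equivalent to a linear regression. More layers, nonlinear activations or different architectures might improve the result, but would also increase the training time significantly.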

Hyperparameter set in this step: **learning rate**

In [None]:
# model, loss and optimizer
class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, output_size)

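    def forward(self, x):
        # No activation function between the layers: the composition of
        # linear layers is itself a single linear (affine) map
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x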

# Hyperparameters: learning rate and number of epochs
lr     = 1e-6
epochs = 10

reg_model = Model(input_size=1, hidden_size=1, output_size=1)
mse_loss  = nn.MSELoss()
optimizer = torch.optim.SGD(reg_model.parameters(), lr=lr)
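Why is this network equivalent to a linear regression? `nn.Linear` computes $y = xA^T + b$, and a composition of affine maps is again an affine map. The following NumPy sketch composes four 1 → 1 layers with hypothetical random weights (no PyTorch involved) and checks that the result equals a single layer with combined weight and bias.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical weights and biases of four 1 -> 1 linear layers (hidden_size = 1)
layers = [(rng.normal(size=(1, 1)), rng.normal(size=1)) for _ in range(4)]

def forward(x, layers):
    # x has shape (num_samples, 1); each layer computes x @ W.T + b, no activation
    for W, b in layers:
        x = x @ W.T + b
    return x

def collapse(layers):
    # Combine the affine maps: W_tot = W4 W3 W2 W1, bias propagated through each layer
    W_tot, b_tot = np.eye(1), np.zeros(1)
    for W, b in layers:
        W_tot = W @ W_tot
        b_tot = W @ b_tot + b
    return W_tot, b_tot

x = np.linspace(-2, 2, 5).reshape(-1, 1)
W_tot, b_tot = collapse(layers)
print(np.allclose(forward(x, layers), x @ W_tot.T + b_tot))  # True
```

Inserting a nonlinearity such as `nn.ReLU` between the layers would break this equivalence and let the network represent nonlinear functions.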

Besides the model, we need to configure the Trainer. This is where all the building blocks (model, data loaders, optimizer and training settings) come together. We also enable early stopping: training stops if the model has not improved for 3 consecutive epochs (`patience = 3`).

In [None]:
# Define the path for saving the best model
best_model_path = './best_model.pth'

# Trainer instance
trainer = Trainer(
    model           = reg_model,
    train_data      = train_loader,
    test_data       = test_loader,
    optimizer       = optimizer,
    best_model_path = best_model_path,
    early_stopping  = True,
    patience        = 3,
    epochs          = epochs
)
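Patience-based early stopping can be sketched in pure Python: remember the best loss so far, count the epochs without improvement, and stop once that count reaches `patience`. Which loss ml4xcube's `Trainer` monitors and exactly when it writes `best_model_path` are assumptions here; the helper `early_stopping_epochs` is hypothetical.

```python
def early_stopping_epochs(losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            # Improvement: a trainer would typically save the model at this point
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(losses) - 1, best_epoch

print(early_stopping_epochs([1.0, 0.8, 0.85, 0.9, 0.95, 0.7], patience=3))  # (4, 1)
```

In this example training stops after epoch 4, so the later improvement at epoch 5 is never seen; a larger patience trades training time for robustness against such temporary plateaus.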

## Train the model

The following cell trains the model, which might take a few minutes. Training is a stochastic process (for example, the weights are initialized randomly), so the results can differ every time you run it. Also keep in mind that training continues from the current weights: to start from scratch, either re-run all cells from the model definition onwards or reset the model with the next cell.

In [None]:
# In case you want to reset all training progress of the model, run this cell

for layer in reg_model.children():
    if hasattr(layer, 'reset_parameters'):
        layer.reset_parameters()

In [None]:
# Start training
reg_model = trainer.train()
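Conceptually, each training epoch loops over mini-batches of size `batch_size` and takes one gradient step per batch, scaled by the learning rate. The following NumPy sketch does this for the linear model $y = ax + b$ with an MSE loss on synthetic data. It only illustrates the role of the two hyperparameters and is not the code of ml4xcube's `Trainer`; `sgd_linear` is a hypothetical helper.

```python
import numpy as np

def sgd_linear(x, y, lr=0.1, epochs=20, batch_size=64, seed=0):
    """Mini-batch SGD on the MSE loss for y ≈ a * x + b, starting from a = b = 0."""
    rng = np.random.default_rng(seed)
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)  # shuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = a * x[idx] + b - y[idx]
            # Gradients of mean((a*x + b - y)**2) with respect to a and b
            grad_a = 2.0 * np.mean(err * x[idx])
            grad_b = 2.0 * np.mean(err)
            a -= lr * grad_a
            b -= lr * grad_b
    return a, b

# Synthetic, already standardized input with a known linear relation
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.9 * x + 0.1 + 0.05 * rng.normal(size=1000)

a, b = sgd_linear(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With `lr=0.1` the estimates end up close to the true values 0.9 and 0.1. Calling the same function with `lr=1e-6` leaves `a` and `b` close to their starting value of 0 after the same number of epochs, which shows how strongly the learning rate controls training progress.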

## Results

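Now it's time to look at the results of the model. We evaluate it for a single point in time: we select all land pixels, keep those with observed land surface temperature, and predict the missing ones. For this, the air temperature of the gap pixels has to be standardized with the same statistics as before.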

In [None]:
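# All pixels of one time step as a DataFrame indexed by (lat, lon)
result_df    = final_data.sel(time = '2009-08-25').to_dataframe()
# Keep land pixels only
result_df_lm = result_df[result_df['land_mask'] == True]
# Pixels where all variables are observed
orig         = result_df_lm.dropna()
# Gap pixels: land surface temperature is missing and will be predicted
to_pred      = result_df_lm[result_df_lm['land_surface_temperature'].isna()]
output       = to_pred.drop('land_surface_temperature', axis = 1)
# Standardize the input with the statistics computed earlier
X = standardize(to_pred['air_temperature_2m'], *at_stat)
X = X.values
# Ensure X is a float32 tensor of shape [num_samples, 1] on the selected device
X_tensor = torch.tensor(X.reshape(-1, 1), dtype=torch.float32).to(device)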

In this cell, the model is applied to the gap pixels, and the predictions are transformed back to the original units of land surface temperature.

In [None]:
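# Switch to evaluation mode and predict without tracking gradients
reg_model.eval()
with torch.no_grad():
    lstp = reg_model(X_tensor)
# Undo the standardization; ravel() turns the [num_samples, 1] output into a 1D column
output['land_surface_temperature'] = undo_standardizing(lstp.cpu().numpy().ravel(), *lst_stat)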

To plot the final result, we need it as an xarray `DataArray` again. This is a cumbersome process, which will be better addressed in a future release of `ml4xcube`.

In [None]:
# get data back to xarray
df  = pd.concat([orig['land_surface_temperature'], output['land_surface_temperature']])
series_aggregated = df.groupby(['lat', 'lon']).mean()
df_reshaped = series_aggregated.unstack()
data_array = xr.DataArray(
    df_reshaped.values,  # 2D array of the land surface temperature values
    coords={'lat': df_reshaped.index.values, 'lon': df_reshaped.columns.values},  # lat and lon as coordinates
    dims=['lat', 'lon']  # Specify dimensions
)
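Since `orig` and `output` contain disjoint sets of pixels, the `groupby(...).mean()` above only reshapes the data. A possibly simpler alternative is pandas' `Series.combine_first`, which keeps the observed values and takes the predictions only where observations are missing. The following sketch uses a tiny synthetic (lat, lon) grid and hypothetical prediction values in place of the notebook's DataFrames.

```python
import numpy as np
import pandas as pd

# Tiny synthetic grid indexed by (lat, lon), standing in for the land pixels of one time step
index = pd.MultiIndex.from_product([[-20.0, -19.9], [115.0, 115.1]], names=["lat", "lon"])
observed = pd.Series([300.0, np.nan, 305.0, np.nan], index=index, name="land_surface_temperature")

# Hypothetical model predictions, defined only for the gap pixels
gaps = observed.index[observed.isna().to_numpy()]
predicted = pd.Series([301.5, 306.5], index=gaps)

# Keep observations, fill the gaps with predictions, then pivot lon into columns
filled = observed.combine_first(predicted)
grid = filled.unstack("lon")
print(grid)
```

The resulting `grid` has latitudes as rows and longitudes as columns, so it can be passed to `xr.DataArray` in the same way as `df_reshaped` above.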

## Result plot

This is the last cell of the workshop. You have seen how easy it is to use machine learning models with this system. The overall pipeline stays the same for complex deep learning models.

In [None]:
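df = final_data.sel(time='2009-08-25')
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(6, 10))
df['air_temperature_2m'].plot(ax=ax1)
ax1.set_title('Air temperature 2 m')
df['land_surface_temperature'].plot(ax=ax2)
ax2.set_title('Land surface temperature (observed, with gaps)')
data_array.plot(ax=ax3)
ax3.set_title('Land surface temperature (gaps filled by the model)')
fig.tight_layout()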

At first glance the predictions might seem to make no difference, but they do. The model is limited by design. Other models with longer computation times can achieve much better results here, especially when they make use of the time series or of the spatial context.

Promising models for spatio-temporal analysis include CNNs, LSTMs, Transformers and combinations of these. Autoencoders can also give very good results, especially for gap filling.