# 1.2.0: Predict final trait maps


Initial model training used spatial K-fold cross-validation (SKCV), and ensemble performance was evaluated on a held-out test set. With those results in hand, we can identify the best model for each trait and train a final model on all available data to produce global trait maps.

This will entail:

1. Selecting the best model based on the SKCV results
2. Loading all available predictor and trait data
3. Fitting the model on the full data
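As a rough illustration of step 1, the sketch below picks the best run per trait from a table of SKCV scores. The table layout (`trait`, `run_id`, `rmse`), the trait and run names, and the scores are all hypothetical placeholders and are not taken from this project's results; the helper `select_best_runs` is likewise an assumption, not part of `src`.

```python
import pandas as pd

# Hypothetical SKCV summary: one row per (trait, model run).
# All names and scores are toy placeholders, not real results.
cv_results = pd.DataFrame(
    {
        "trait": ["trait_a", "trait_a", "trait_b", "trait_b"],
        "run_id": ["run_1", "run_2", "run_1", "run_2"],
        "rmse": [0.42, 0.35, 1.10, 1.25],
    }
)


def select_best_runs(results, metric="rmse", lower_is_better=True):
    """Return the best-scoring row per trait, indexed by trait."""
    grouped = results.groupby("trait")[metric]
    # idxmin/idxmax give the row label of the best score within each trait
    best_idx = grouped.idxmin() if lower_is_better else grouped.idxmax()
    return results.loc[best_idx].set_index("trait")


best = select_best_runs(cv_results)
print(best)
```

The `run_id` selected for a trait would then determine which saved model directory to load.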


## Imports and config


In [1]:
from pathlib import Path

import dask.dataframe as dd
import numpy as np
import pandas as pd
from autogluon.tabular import TabularDataset, TabularPredictor
from dask.distributed import Client

from src.conf.conf import get_config
from src.conf.environment import log
from src.utils.dataset_utils import (
    compute_partitions,
    eo_ds_to_ddf,
    get_eo_fns_list,
    load_rasters_parallel,
    map_da_dtypes,
)

cfg = get_config()

## Load the model

In [9]:
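# Base directory for models matching the current config
models_d = (
    Path(cfg.models.dir)
    / cfg.PFT
    / cfg.model_res
    / cfg.datasets.Y.use
    / cfg.train.arch
)
# Hard-coded run directory for the X11_mean target
model_dir = models_d / "X11_mean/high_20240524_234626"
predictor = TabularPredictor.load(str(model_dir))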

## Load the predictor data

In [5]:
N_CHUNKS: int = 5

with Client(dashboard_address=cfg.dask_dashboard, memory_limit="80GB"):
    eo_fns = get_eo_fns_list(stage="interim")
    dtypes = map_da_dtypes(eo_fns, dask=True, nchunks=N_CHUNKS)
    ds = load_rasters_parallel(eo_fns, nchunks=N_CHUNKS)
    ddf = eo_ds_to_ddf(ds, dtypes, sample=0.01)
    df = compute_partitions(ddf)

Computing partitions:  12%|█▏        | 10/81 [07:12<51:10, 43.25s/it]
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array.reshape(shape)

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    >>> array.reshape(shape, limit='128 MiB')
  exec(code_obj, self.user_global_ns, self.user_ns)
Computing partitions: 100%|██████████| 25/25 [16:00<00:00, 38.43s/it]


In [10]:
# Drop the coordinate columns before predicting; the result keeps df's index
pred = predictor.predict(df.drop(columns=["x", "y"]))
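The cell above predicts on a 1% sample in a single call. For the full predictor table, one option is to predict in fixed-size row chunks so that peak memory stays bounded. The following is a minimal sketch under that assumption: `predict_in_chunks` is a hypothetical helper, and the toy `predict_fn` stands in for `predictor.predict`.

```python
import numpy as np
import pandas as pd


def predict_in_chunks(predict_fn, features, chunk_size):
    """Apply predict_fn to consecutive row chunks and concatenate the results."""
    parts = []
    for start in range(0, len(features), chunk_size):
        chunk = features.iloc[start : start + chunk_size]
        # Re-wrap the output so each part carries the chunk's original index
        parts.append(pd.Series(predict_fn(chunk), index=chunk.index))
    if not parts:
        return pd.Series(dtype="float64")
    return pd.concat(parts)


# Toy stand-ins for the feature table and the model
toy_features = pd.DataFrame(
    {"a": np.arange(10.0), "b": np.ones(10)}, index=np.arange(100, 110)
)
toy_pred = predict_in_chunks(lambda c: c["a"] + c["b"], toy_features, chunk_size=3)
```

In the notebook this might be called as `predict_in_chunks(predictor.predict, df.drop(columns=["x", "y"]), chunk_size=...)`, with the chunk size chosen to fit available memory.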

In [12]:
pred.head()

6302212     16.507893
4322335     17.485106
11540528    17.740084
3782221     17.224155
4916370     16.720385
Name: X11_mean, dtype: float32
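Because `pred` keeps the index of `df`, the predictions can be joined back to the `x` and `y` columns that were dropped before prediction and then arranged on a grid. The sketch below shows one way to do this with pandas alone; the toy coordinates and values are placeholders, and `predictions_to_grid` is a hypothetical helper rather than a project function.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for `df` (with x/y coordinates) and `pred`
xs, ys = np.meshgrid([0.0, 0.5, 1.0], [10.0, 10.5])
df = pd.DataFrame({"x": xs.ravel(), "y": ys.ravel(), "feat": np.arange(6.0)})
pred = (df["feat"] * 2.0).rename("X11_mean")


def predictions_to_grid(coords, pred):
    """Join predictions to their coordinates and pivot to a y-by-x grid."""
    # assign() aligns on the shared index, so rows keep their coordinates
    table = coords[["x", "y"]].assign(**{pred.name: pred})
    grid = table.pivot(index="y", columns="x", values=pred.name)
    # Sort y descending so the first row is the northernmost one
    return grid.sort_index(ascending=False)


grid = predictions_to_grid(df, pred)
print(grid)
```

With a sampled table such as the 1% sample used here, most grid cells would be `NaN`; a complete map needs predictions for every pixel. `pivot` also raises an error if any `(x, y)` pair occurs more than once.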