![](./resources/System_v1_training_header.png)

This notebook demonstrates how to train a custom crop type model on your own reference data and how to apply the resulting model to generate a custom crop type map.

# Content

- [Before you start](#before-you-start)
- [1. Define a region of interest](#1.-Define-a-region-of-interest)
- [2. Check public in situ reference data](#2.-Check-public-in-situ-reference-data)
- [3. Prepare own reference data](#3.-Prepare-own-reference-data)
- [4. Extract required model inputs](#4.-Extract-required-model-inputs)
- [5. Train custom classification model](#5.-Train-custom-classification-model)
- [6. Generate a map](#6.-Generate-a-map)

# Before you start

To run this notebook, you need an account on both of the following platforms:

- The Copernicus Data Space Ecosystem (CDSE), by completing [this registration form](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/login-actions/registration?client_id=cdse-public&tab_id=eRKGqDvoYI0)

- VITO's Terrascope platform, by completing [this registration form](https://sso.terrascope.be/auth/realms/terrascope/login-actions/registration?client_id=drupal-terrascope&tab_id=irBzckp2aDo)

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# First we import the necessary modules to run this notebook.

try:
    import worldcereal  # noqa: F401
except ImportError:
    # Fall back to a local checkout of the repository when the package is not
    # installed; adjust this path to your own environment.
    import sys

    sys.path.append("/home/jovyan/worldcereal-classification/src")


from worldcereal.utils.map import get_ui_map


RDM_API = "https://ewoc-rdm-api.iiasa.ac.at"

# 1. Define a region of interest

Running the cell below displays an interactive map.
Click the rectangle button on the left-hand side of the map and draw your region of interest.
When finished, run the next cell to store the coordinates of your region of interest.

In [3]:
m, dc = get_ui_map()
m


In [4]:
# retrieve bounding box from drawn rectangle
from utils import get_bbox_from_draw

bbox, poly = get_bbox_from_draw(dc)

Your area of interest: (4.195404, 51.064702, 5.339355, 51.298852)
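
The implementation of `get_bbox_from_draw` is not shown in this notebook. As a minimal sketch of the idea, assuming the drawn rectangle is available as a GeoJSON-style polygon dictionary (the `bbox_from_polygon` helper and the example geometry are hypothetical), the bounding box is simply the minimum and maximum of the ring coordinates:

```python
def bbox_from_polygon(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON-style polygon."""
    # The first ring of a GeoJSON polygon is its exterior boundary.
    exterior = geometry["coordinates"][0]
    lons = [lon for lon, _ in exterior]
    lats = [lat for _, lat in exterior]
    return (min(lons), min(lats), max(lons), max(lats))


# Hypothetical rectangle matching the extent printed above.
rectangle = {
    "type": "Polygon",
    "coordinates": [[
        [4.195404, 51.064702],
        [5.339355, 51.064702],
        [5.339355, 51.298852],
        [4.195404, 51.298852],
        [4.195404, 51.064702],
    ]],
}

example_bbox = bbox_from_polygon(rectangle)
print(f"Your area of interest: {example_bbox}")
```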


# 2. Check public in situ reference data

This step is intended to query the RDM API for the collections and samples overlapping our bounding box.

‼ Note: the snippet below does not query the RDM API yet. Instead, it reads a parquet file with Phase I extractions stored in a CloudFerro bucket.

In [5]:
from utils import query_worldcereal_samples

public_df = query_worldcereal_samples(poly)

Querying WorldCereal global database ...



Processing selected samples ...
Extracted and processed 8533 samples from global database.


# 3. Prepare own reference data

TODO: add guidelines on how to upload a user dataset to the RDM (using the UI) and how to request those user samples through the API. For now, only the public samples are used.

In [6]:
merged_df = public_df.copy()

### 3a. Select desired crops for prediction

Crops with ticked checkboxes will be included in the prediction. All the crops that are not selected will be grouped under the "other_crop" category.

In [13]:
from utils import pick_croptypes

custom_df = pick_croptypes(merged_df)
custom_df["custom_class"].value_counts()
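
The internals of `pick_croptypes` are not shown here. The grouping rule described above can be sketched with plain pandas, assuming a hypothetical `label` column holding the original crop names and a hypothetical list of ticked crops:

```python
import pandas as pd

# Hypothetical reference samples; the real column names may differ.
samples = pd.DataFrame(
    {"label": ["maize", "potatoes", "wheat", "maize", "barley", "potatoes"]}
)

# Crops that would have been ticked in the checkbox widget (assumption).
selected_crops = ["maize", "potatoes"]

# Keep selected crops as-is and group everything else under "other_crop".
samples["custom_class"] = samples["label"].where(
    samples["label"].isin(selected_crops), "other_crop"
)

class_counts = samples["custom_class"].value_counts().to_dict()
print(class_counts)
```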

# 4. Extract required model inputs

Here we launch point extractions for all samples intersecting our bounding box, which results in a set of parquet files.

We then collect these inputs, split the samples into a training and a validation set, and compute Presto encodings for each sample.

In [37]:
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.metrics import classification_report
from sklearn.utils.class_weight import compute_class_weight
from sklearn.model_selection import train_test_split
from presto.utils import DEFAULT_SEED

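# Split the samples carrying the "custom_class" label (see step 3a) into a
# 70/30 train/validation split, stratified on that label.
trn_df, val_df = train_test_split(
    custom_df,
    stratify=custom_df["custom_class"],
    test_size=0.3,
    random_state=DEFAULT_SEED,
)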
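
To illustrate what `stratify` does in the split above, the following sketch uses made-up labels (not the WorldCereal samples) and checks that the class proportions are the same in both subsets:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Made-up, imbalanced labels: 70 "maize" samples and 30 "potatoes" samples.
labels = ["maize"] * 70 + ["potatoes"] * 30

train_labels, val_labels = train_test_split(
    labels, stratify=labels, test_size=0.3, random_state=42
)

# Both subsets should keep the 70/30 ratio between the two classes.
train_counts = Counter(train_labels)
val_counts = Counter(val_labels)
print(train_counts, val_counts)
```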

In [39]:
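from presto.presto import Presto  # assumed import path; adjust to your Presto version
from utils import get_encodings_targets

# NOTE: `data_dir` is not defined in this notebook; it must point to the
# directory holding the pretrained Presto weights.
presto_model_path = f"{data_dir}/presto-ss-wc-ft-ct-30D_test.pt"
presto_model = Presto.load_pretrained(model_path=presto_model_path, strict=False)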

train_encodings, train_targets = get_encodings_targets(
    trn_df, presto_model, batch_size=256
)
val_encodings, val_targets = get_encodings_targets(val_df, presto_model, batch_size=256)

# 5. Train custom classification model
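We train a CatBoost classifier on the Presto encodings and evaluate it on the validation set. Uploading the trained model to Artifactory is not yet part of this notebook.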

In [40]:
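# "TotalF1" is a multiclass metric; with only two classes CatBoost trains a
# binary model, for which "F1" is the matching metric.
if np.unique(train_targets).shape[0] > 2:
    eval_metric = "TotalF1"
else:
    eval_metric = "F1"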

# Balanced class weights; computed for reference but currently not passed to
# the model (see the commented-out argument below).
class_weights = compute_class_weight(
    class_weight="balanced", classes=np.unique(train_targets), y=train_targets
)

custom_downstream_model = CatBoostClassifier(
    iterations=8000,
    depth=8,
    learning_rate=0.05,
    early_stopping_rounds=50,
    colsample_bylevel=0.9,
    l2_leaf_reg=6,
    eval_metric=eval_metric,
    random_state=DEFAULT_SEED,
    # class_weights=class_weights,
    verbose=25,
    class_names=np.unique(train_targets),
)

custom_downstream_model.fit(
    train_encodings, train_targets, eval_set=Pool(val_encodings, val_targets)
)

pred = custom_downstream_model.predict(val_encodings).flatten()

0:	learn: 0.6550097	test: 0.6250140	best: 0.6250140 (0)	total: 160ms	remaining: 21m 21s
25:	learn: 0.6855338	test: 0.6656578	best: 0.6656578 (25)	total: 2.78s	remaining: 14m 14s
50:	learn: 0.7196221	test: 0.6831521	best: 0.6831521 (50)	total: 5.45s	remaining: 14m 9s
75:	learn: 0.7344926	test: 0.7032237	best: 0.7032237 (75)	total: 8.13s	remaining: 14m 8s
100:	learn: 0.7496629	test: 0.7054853	best: 0.7054853 (100)	total: 10.7s	remaining: 13m 59s
125:	learn: 0.7716846	test: 0.7173224	best: 0.7173224 (124)	total: 13.4s	remaining: 13m 59s
150:	learn: 0.7924791	test: 0.7287346	best: 0.7293823 (145)	total: 16.1s	remaining: 13m 56s
175:	learn: 0.8021751	test: 0.7394657	best: 0.7394662 (172)	total: 18.8s	remaining: 13m 56s
200:	learn: 0.8170485	test: 0.7414905	best: 0.7414905 (196)	total: 21.4s	remaining: 13m 51s
225:	learn: 0.8305947	test: 0.7475895	best: 0.7482350 (221)	total: 24.1s	remaining: 13m 48s
250:	learn: 0.8409549	test: 0.7539567	best: 0.7539567 (250)	total: 26.7s	remaining: 13m 45s
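
For reference, the `"balanced"` option of `compute_class_weight` assigns each class the weight `n_samples / (n_classes * class_count)`, so rare classes weigh more. A small sketch of that formula, using made-up label counts rather than the actual training targets:

```python
from collections import Counter

# Made-up targets for illustration only.
targets = ["maize"] * 6 + ["pasture_meadows"] * 3 + ["potatoes"] * 1

counts = Counter(targets)
n_samples = len(targets)
n_classes = len(counts)

# weight = n_samples / (n_classes * class_count)
balanced_weights = {
    name: n_samples / (n_classes * count) for name, count in counts.items()
}
print(balanced_weights)
```

In this toy example the single `potatoes` sample receives a weight six times larger than each `maize` sample, which is the effect the commented-out `class_weights` argument would have on the model.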


In [41]:
print(classification_report(val_targets, pred))

                 precision    recall  f1-score   support

          maize       0.79      0.85      0.82       559
          other       0.62      0.11      0.19        89
pasture_meadows       0.81      0.91      0.85       698
permanent_crops       0.79      0.27      0.41        55
       potatoes       0.80      0.53      0.64        62

       accuracy                           0.80      1463
      macro avg       0.76      0.53      0.58      1463
   weighted avg       0.79      0.80      0.77      1463
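
The two summary rows of the report differ in how they combine the per-class scores: the macro average is a plain mean over classes, while the weighted average weights each class by its support. The sketch below recomputes both F1 averages from the rounded per-class values printed above, so the results only match the report up to rounding:

```python
# Per-class (f1-score, support) copied from the report above.
per_class = {
    "maize": (0.82, 559),
    "other": (0.19, 89),
    "pasture_meadows": (0.85, 698),
    "permanent_crops": (0.41, 55),
    "potatoes": (0.64, 62),
}

total_support = sum(support for _, support in per_class.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}")
```

The macro average is much lower than the weighted one because the small classes (`other`, `permanent_crops`) score poorly but contribute little to the support-weighted mean.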



# 6. Generate a map

Using our custom model, we generate a map for our region of interest...