![](./resources/System_v1_training_header.png)

This notebook demonstrates how to train a custom crop type model on your own reference data and how to apply the resulting model to generate a custom crop type map.

# Content

- [Before you start](#Before-you-start)
- [1. Define a region of interest](#1.-Define-a-region-of-interest)
- [2. Extract public in situ reference data](#2.-Extract-public-in-situ-reference-data)
- [4. Extract required model inputs](#4.-Extract-required-model-inputs)
- [5. Train custom classification model](#5.-Train-custom-classification-model)
- [6. Deploy custom model](#6.-Deploy-custom-model)
- [7. Generate a map](#7.-Generate-a-map)

# Before you start

In order to run this notebook, you need to create an account on:

- The Copernicus Data Space Ecosystem (CDSE)
--> by completing the form [HERE](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/login-actions/registration?client_id=cdse-public&tab_id=eRKGqDvoYI0)

- VITO's Terrascope platform
--> by completing the form [HERE](https://sso.terrascope.be/auth/realms/terrascope/login-actions/registration?client_id=drupal-terrascope&tab_id=irBzckp2aDo)

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from worldcereal.utils.map import get_ui_map

RDM_API = "https://ewoc-rdm-api.iiasa.ac.at"

# 1. Define a region of interest

Running the code cell below displays an interactive map.
Click the rectangle button on the left-hand side of the map and draw your region of interest.
When finished, run the second cell to store the coordinates of your region of interest.

In [3]:
m, dc = get_ui_map()
m

Map(center=[51.1872, 5.1154], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoo…

In [8]:
# retrieve bounding box from drawn rectangle
from worldcereal.utils.map import get_bbox_from_draw

spatial_extent, bbox, poly = get_bbox_from_draw(dc)

Your area of interest: (5.029678, 50.883976, 5.094566, 50.923813)
Area of processing extent: 21.35 km²
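
As a rough cross-check of the printed extent, the area of a longitude/latitude bounding box can be approximated on a sphere. The sketch below is an illustration only: `bbox_area_km2` is a hypothetical helper, not part of `worldcereal`, and its result may differ somewhat from the value printed by `get_bbox_from_draw`, whose exact method is not shown in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def bbox_area_km2(west, south, east, north):
    """Approximate area of a lon/lat bounding box on a sphere, in km²."""
    d_lon = math.radians(east - west)
    # Difference of sin(latitude) gives the height of the spherical band
    band = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


# Bounding box printed by the cell above: (west, south, east, north)
area = bbox_area_km2(5.029678, 50.883976, 5.094566, 50.923813)
print(f"Approximate area: {area:.2f} km²")
```

A check like this is mainly useful for spotting an accidentally large rectangle before launching a processing job.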


# 2. Extract public in situ reference data

Here we query existing reference data that have already been processed by WorldCereal and are ready to use.
By default, the query filters for crop type labels and returns the samples that intersect the bounding box plus a buffer around it (50 km in the cell below, set with `buffer=50000`).

In [9]:
from utils import query_worldcereal_samples

public_df = query_worldcereal_samples(poly, buffer=50000, filter_cropland=False)

Applying a buffer of 50.0 km to the selected area ...
Querying WorldCereal global database ...
Processing selected samples ...
Extracted and processed 40184 samples from global database.
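
To give a feeling for what a 50 km buffer means in degrees, here is a simplified sketch that expands a bounding box by a distance in metres. It is an assumption-laden illustration: `buffer_bbox` is a hypothetical helper, it works on the box corners rather than on the drawn polygon, and `query_worldcereal_samples` may well buffer the geometry differently (for example in a projected coordinate system).

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def buffer_bbox(west, south, east, north, buffer_m):
    """Expand a lon/lat bounding box by roughly `buffer_m` metres on each side."""
    buffer_km = buffer_m / 1000.0
    d_lat = buffer_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude); use the box centre
    mid_lat = math.radians((south + north) / 2.0)
    d_lon = buffer_km / (KM_PER_DEG_LAT * math.cos(mid_lat))
    return (west - d_lon, south - d_lat, east + d_lon, north + d_lat)


buffered = buffer_bbox(5.029678, 50.883976, 5.094566, 50.923813, buffer_m=50000)
print(buffered)
```

At this latitude the buffer adds noticeably more degrees in longitude than in latitude, which is why a fixed offset in degrees would not give a uniform 50 km margin.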


# 4. Extract required model inputs

Here we compute Presto embeddings for each sample, using a Presto model pretrained on WorldCereal data. These embeddings are the input features for the classifier trained in the next step.

In [10]:
from utils import get_inputs_outputs

encodings, targets = get_inputs_outputs(public_df, task_type="cropland")

Computing Presto embeddings ...
Done.


# 5. Train custom classification model
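We train a CatBoost classifier on the Presto embeddings. In this run the task is `cropland`, so the targets are binary (classes 0 and 1 in the output below).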

In [11]:
from utils import train_classifier

custom_model, report = train_classifier(encodings, targets)

Split train/test ...
Computing class weights ...
Class weights: {0: 0.6474117413724713, 1: 2.1959300369996635}
Training CatBoost classifier ...
0:	learn: 0.8031932	test: 0.7963296	best: 0.7963296 (0)	total: 89.5ms	remaining: 11m 55s
25:	learn: 0.8530088	test: 0.8450927	best: 0.8450927 (25)	total: 994ms	remaining: 5m 4s
50:	learn: 0.8685301	test: 0.8556230	best: 0.8556230 (50)	total: 2.01s	remaining: 5m 13s
75:	learn: 0.8784106	test: 0.8624545	best: 0.8624545 (75)	total: 3.01s	remaining: 5m 13s
100:	learn: 0.8879236	test: 0.8623379	best: 0.8627743 (95)	total: 4.06s	remaining: 5m 17s
125:	learn: 0.8967444	test: 0.8673692	best: 0.8675468 (124)	total: 4.99s	remaining: 5m 11s
150:	learn: 0.9049707	test: 0.8697298	best: 0.8697298 (150)	total: 5.84s	remaining: 5m 3s
175:	learn: 0.9118574	test: 0.8711755	best: 0.8714247 (166)	total: 6.73s	remaining: 4m 59s
200:	learn: 0.9180944	test: 0.8725589	best: 0.8726110 (193)	total: 7.57s	remaining: 4m 53s
225:	learn: 0.9248454	test: 0.8738893	best: 0.87
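
The `Class weights` line in the log above looks consistent with the common "balanced" heuristic, where each class is weighted by `n_samples / (n_classes * class_count)`. Whether `train_classifier` computes the weights exactly this way is an assumption. The sketch below uses the class supports from the test-set classification report (8643 and 2549) as illustrative counts, so the values are close to, but not identical with, the logged ones.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}


# Illustrative labels with the same class proportions as the test split
labels = [0] * 8643 + [1] * 2549
weights = balanced_class_weights(labels)
print(weights)
```

With this weighting the minority class receives the larger weight, so errors on it count more during training.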

In [12]:
# Print the classification report
print(report)

              precision    recall  f1-score   support

           0       0.95      0.92      0.93      8643
           1       0.75      0.85      0.79      2549

    accuracy                           0.90     11192
   macro avg       0.85      0.88      0.86     11192
weighted avg       0.91      0.90      0.90     11192
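
The two summary rows of the report differ only in how the per-class scores are combined: the macro average treats both classes equally, while the weighted average weights each class by its support. The sketch below recomputes both from the rounded per-class values printed above, so the last digit can differ from the report, which works with unrounded scores.

```python
# Per-class rows copied from the report above: (precision, recall, f1, support)
rows = {
    0: (0.95, 0.92, 0.93, 8643),
    1: (0.75, 0.85, 0.79, 2549),
}

total_support = sum(r[3] for r in rows.values())


def macro_avg(idx):
    """Unweighted mean of a metric over the classes."""
    return sum(r[idx] for r in rows.values()) / len(rows)


def weighted_avg(idx):
    """Mean of a metric over the classes, weighted by class support."""
    return sum(r[idx] * r[3] for r in rows.values()) / total_support


for name, idx in [("precision", 0), ("recall", 1), ("f1-score", 2)]:
    print(f"{name:>9}  macro={macro_avg(idx):.3f}  weighted={weighted_avg(idx):.3f}")
```

Because class 0 has more than three times the support of class 1, the weighted averages sit closer to the class 0 scores than the macro averages do.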



# 6. Deploy custom model

Once the model is trained, we upload it to the cloud so that it can be used for inference. The cell below uploads the model as an ONNX file and returns its URL.


In [13]:
from utils import deploy_model

model_url = deploy_model(custom_model, pattern="demo_cropland")

Uploading model to `demo_cropland_20240702200953_custommodel.onnx`


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

Deployed to: https://artifactory.vgt.vito.be/artifactory/worldcereal_models/demo_cropland_20240702200953_custommodel.onnx


100 6715k    0   820  100 6714k   1102  9025k --:--:-- --:--:-- --:--:-- 9014k
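
Judging from the log above, the uploaded file name combines the `pattern` argument, a timestamp, and a fixed suffix. The sketch below reproduces that naming scheme as inferred from the output; `build_model_filename` is a hypothetical helper, and the actual logic inside `deploy_model` may differ.

```python
from datetime import datetime


def build_model_filename(pattern, when=None):
    """Compose '<pattern>_<YYYYmmddHHMMSS>_custommodel.onnx'."""
    when = when or datetime.now()
    return f"{pattern}_{when.strftime('%Y%m%d%H%M%S')}_custommodel.onnx"


# Timestamp taken from the file name in the log above
name = build_model_filename("demo_cropland", datetime(2024, 7, 2, 20, 9, 53))
print(name)
```

Including a timestamp means that repeated deployments with the same pattern do not overwrite each other.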


# 7. Generate a map

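Using our custom model, we generate a cropland map for the region of interest and the selected time range. The result is written to `./cropland_map.tif` in GeoTIFF format.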

In [14]:
from worldcereal.job import WorldCerealProduct, generate_map, CropLandParameters
from openeo_gfmap import TemporalContext

# Set temporal range to generate product
temporal_extent = TemporalContext(
    start_date="2021-11-01",
    end_date="2022-10-31",
)

# Initializes default parameters
parameters = CropLandParameters()

# Change the URL to the classification model
parameters.classifier_parameters.classifier_url = model_url

# Launch the job
job_results = generate_map(
    spatial_extent,
    temporal_extent,
    output_path="./cropland_map.tif",
    product_type=WorldCerealProduct.CROPLAND,
    cropland_parameters=parameters,  # CropLandParameters carrying the custom model URL
    out_format="GTiff",
)

Authenticated using refresh token.
Selected orbit direction: ASCENDING from max accumulated area overlap between bounds and products.
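
Before launching a job it can help to sanity-check the temporal range. The sketch below counts the days covered by the start and end dates used above, treating both as inclusive. That inclusiveness is an assumption about how `TemporalContext` is interpreted, and `extent_length_days` is a hypothetical helper, not part of `openeo_gfmap`.

```python
from datetime import date


def extent_length_days(start_date, end_date):
    """Number of days covered by an inclusive [start_date, end_date] range."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not precede start_date")
    return (end - start).days + 1


n_days = extent_length_days("2021-11-01", "2022-10-31")
print(n_days)
```

For the dates in this notebook the range covers one full year.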


