![](./resources/System_v1_training_header.png)

**Table of contents**<a id='toc0_'></a>    
- [Before you start](#toc1_)    
- [Define a region of interest](#toc2_)    
- [Extract public in situ reference data](#toc3_)    
- [Select desired crops for prediction](#toc4_)    
- [Extract required model inputs](#toc5_)    
- [Train custom classification model](#toc6_)    
- [Deploy custom model](#toc7_)    
- [Generate a map](#toc8_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Before you start](#toc0_)

To run this notebook, you need an account on the Copernicus Data Space Ecosystem (CDSE). You can create one by completing [this registration form](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/login-actions/registration?client_id=cdse-public&tab_id=eRKGqDvoYI0).

# <a id='toc2_'></a>[Define a region of interest](#toc0_)

Running the code snippet below displays an interactive map.
Click the Rectangle button on the left-hand side of the map to start drawing your region of interest. The area is currently limited to 100 km²; its size is shown while you draw.

When finished, execute the second cell to store the coordinates of your region of interest. 

In [2]:
from worldcereal.utils.map import get_ui_map

m, dc = get_ui_map()
m

Map(center=[51.1872, 5.1154], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoo…

In [15]:
# retrieve bounding box from drawn rectangle
from worldcereal.utils.map import get_bbox_from_draw

spatial_extent, bbox, poly = get_bbox_from_draw(dc)

Your area of interest: (4.668091, 51.073159, 4.837006, 51.142138)
Area of processing extent: 95.47 km²
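
As a rough cross-check of the 100 km² limit, the area of a longitude/latitude bounding box can be approximated on a sphere. This is only a sketch: the helper name `bbox_area_km2` is hypothetical, and the spherical approximation will not necessarily match the value reported by `get_bbox_from_draw`, whose method is not shown here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def bbox_area_km2(west, south, east, north):
    """Approximate area (km²) of a lon/lat bounding box on a sphere."""
    lon_span = math.radians(east - west)
    # Area of a latitude band is proportional to the difference in sin(latitude)
    band = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return EARTH_RADIUS_KM ** 2 * lon_span * band


# Coordinates copied from the cell output above
area = bbox_area_km2(4.668091, 51.073159, 4.837006, 51.142138)
print(f"Approximate area: {area:.1f} km² (limit: 100 km²)")
```

A projected coordinate system can give a somewhat different figure, so treat the result as an order-of-magnitude check rather than the exact processing extent.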


# <a id='toc3_'></a>[Extract public in situ reference data](#toc0_)

Here we query existing reference data that have already been processed by WorldCereal and are ready to use.
By default, we keep only samples that carry a crop type label and that intersect a buffer (250 km by default) around the bounding box.

In [5]:
from utils import query_worldcereal_samples

public_df = query_worldcereal_samples(poly)

Applying a buffer of 250.0 km to the selected area ...
Querying WorldCereal global database ...
Processing selected samples ...
Extracted and processed 39416 samples from global database.
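
To get a feel for how far a 250 km buffer reaches, the sketch below grows a longitude/latitude bounding box by a distance in kilometres and tests whether points fall inside it. It is a simplified rectangular stand-in for a true geometric buffer, not the logic of `query_worldcereal_samples`; the helper names and the sample points are hypothetical, and the antimeridian is not handled.

```python
import math

KM_PER_DEGREE = 111.195  # approximate length of one degree of latitude


def expand_bbox(west, south, east, north, buffer_km=250.0):
    """Grow a lon/lat bounding box by roughly `buffer_km` on every side."""
    dlat = buffer_km / KM_PER_DEGREE
    # Degrees of longitude shrink towards the poles, so scale by cos(latitude)
    max_abs_lat = max(abs(south), abs(north))
    dlon = buffer_km / (KM_PER_DEGREE * math.cos(math.radians(max_abs_lat)))
    return (west - dlon, max(south - dlat, -90.0),
            east + dlon, min(north + dlat, 90.0))


def in_bbox(lon, lat, bbox):
    """Return True if the point lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


search_box = expand_bbox(4.668091, 51.073159, 4.837006, 51.142138)
samples = [("near", 5.1154, 51.1872), ("far", 20.0, 40.0)]  # hypothetical points
kept = [name for name, lon, lat in samples if in_bbox(lon, lat, search_box)]
print(search_box, kept)
```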


# <a id='toc4_'></a>[Select desired crops for prediction](#toc0_)

Crops with ticked checkboxes will be included in the prediction. All crops that are not selected will be grouped under the `other` category. The model will be trained in a multi-class setting, not a hierarchical one. Keep this in mind when choosing your crop types.

In [8]:
from utils import pick_croptypes
from IPython.display import display

checkbox, checkbox_widgets = pick_croptypes(public_df, samples_threshold=100)
display(checkbox)

VBox(children=(Checkbox(value=False, description='maize (20406 samples)'), Checkbox(value=False, description='…

Based on your selection, a custom target label is now generated for each sample. Verify that only the crops of your choice appear in `custom_class`; all other crops fall under `other`.

In [9]:
from utils import get_custom_labels

public_df = get_custom_labels(public_df, checkbox_widgets)
public_df["custom_class"].value_counts()

custom_class
maize                 20406
other                  6209
unspecified_wheat      5699
potatoes               4964
unspecified_barley     1682
rapeseed_rape           456
Name: count, dtype: int64
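
The relabelling idea can be illustrated in a few lines of plain Python: every label outside the selection collapses to a single fallback class. This is a minimal sketch and not the implementation of `get_custom_labels`; the function name `to_custom_class` and the sample labels are hypothetical.

```python
from collections import Counter


def to_custom_class(label, selected, fallback="other"):
    """Keep selected crop labels and map everything else to `fallback`."""
    return label if label in selected else fallback


selected_crops = {"maize", "potatoes"}
raw_labels = ["maize", "potatoes", "sugar_beet", "maize", "grass"]  # hypothetical

custom = [to_custom_class(label, selected_crops) for label in raw_labels]
counts = Counter(custom)
print(counts.most_common())
```

Because the model is trained as a flat multi-class classifier, a large and heterogeneous `other` class can be hard to separate from the selected crops, which is worth keeping in mind when ticking checkboxes.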

# <a id='toc5_'></a>[Extract required model inputs](#toc0_)

Here we compute Presto embeddings for each sample, using a Presto model pretrained on WorldCereal data. The resulting `encodings` and `targets` will be used for model training.

In [10]:
from utils import get_inputs_outputs

encodings, targets = get_inputs_outputs(public_df)

Computing Presto embeddings ...
Done.


# <a id='toc6_'></a>[Train custom classification model](#toc0_)
We train a CatBoost model for the selected crop types. Class weights are determined automatically to balance the individual classes.

In [11]:
from utils import train_classifier

custom_model, report = train_classifier(encodings, targets)

Split train/test ...
Computing class weights ...
Class weights: {'maize': 0.3219336320358443, 'other': 1.0580994017487344, 'potatoes': 1.3233093525179855, 'rapeseed_rape': 14.415360501567399, 'unspecified_barley': 3.9036502546689302, 'unspecified_wheat': 1.1527951867636}
Training CatBoost classifier ...
0:	learn: 1.7364508	test: 1.7378444	best: 1.7378444 (0)	total: 207ms	remaining: 27m 35s
25:	learn: 1.1280108	test: 1.1903473	best: 1.1903473 (25)	total: 3.79s	remaining: 19m 23s
50:	learn: 0.9091442	test: 1.0085152	best: 1.0085152 (50)	total: 6.61s	remaining: 17m 10s
75:	learn: 0.7907767	test: 0.9217450	best: 0.9217450 (75)	total: 9.08s	remaining: 15m 47s
100:	learn: 0.7099343	test: 0.8670240	best: 0.8670240 (100)	total: 11.4s	remaining: 14m 54s
125:	learn: 0.6559995	test: 0.8305589	best: 0.8305589 (125)	total: 13.7s	remaining: 14m 16s
150:	learn: 0.6105830	test: 0.8044927	best: 0.8044927 (150)	total: 16s	remaining: 13m 50s
175:	learn: 0.5739024	test: 0.7853897	best: 0.7853897 (175)	tot
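
The logged class weights are consistent with the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, applied to the training split. The sketch below checks this using training counts inferred as the total per class (from `value_counts()` above) minus the test support shown in the classification report printed by the next cell. Whether `train_classifier` computes the weights exactly this way is an assumption; the helper name is hypothetical.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        name: n_samples / (n_classes * count)
        for name, count in class_counts.items()
    }


# Inferred training counts: total samples per class minus test support
train_counts = {
    "maize": 20406 - 6122,
    "other": 6209 - 1863,
    "potatoes": 4964 - 1489,
    "rapeseed_rape": 456 - 137,
    "unspecified_barley": 1682 - 504,
    "unspecified_wheat": 5699 - 1710,
}

weights = balanced_class_weights(train_counts)
for name, weight in weights.items():
    print(f"{name}: {weight:.4f}")
```

Rare classes such as `rapeseed_rape` receive a large weight, so each of their samples contributes more to the loss than a sample from the dominant `maize` class.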

In [12]:
# Print the classification report
print(report)

                    precision    recall  f1-score   support

             maize       0.93      0.90      0.91      6122
             other       0.67      0.71      0.69      1863
          potatoes       0.73      0.76      0.74      1489
     rapeseed_rape       0.69      0.84      0.76       137
unspecified_barley       0.54      0.57      0.55       504
 unspecified_wheat       0.81      0.81      0.81      1710

          accuracy                           0.82     11825
         macro avg       0.73      0.76      0.74     11825
      weighted avg       0.83      0.82      0.82     11825
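
The two average rows differ in how they treat class size: the macro average gives every class the same weight, while the weighted average scales each class by its support. The sketch below recomputes both F1 averages from the rounded per-class values in the report above; small deviations from the unrounded values are possible.

```python
# (f1-score, support) per class, copied from the report above
report_rows = {
    "maize": (0.91, 6122),
    "other": (0.69, 1863),
    "potatoes": (0.74, 1489),
    "rapeseed_rape": (0.76, 137),
    "unspecified_barley": (0.55, 504),
    "unspecified_wheat": (0.81, 1710),
}

total_support = sum(support for _, support in report_rows.values())

# Macro average: plain mean over classes
macro_f1 = sum(f1 for f1, _ in report_rows.values()) / len(report_rows)

# Weighted average: mean over classes, weighted by support
weighted_f1 = sum(f1 * n for f1, n in report_rows.values()) / total_support

print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}")
```

The macro average is lower here because weaker, smaller classes such as `unspecified_barley` count as much as `maize`, which dominates the weighted average.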



# <a id='toc7_'></a>[Deploy custom model](#toc0_)

Once trained, the model must be uploaded to the cloud so it can be used for inference. Executing the cell below prompts you for a `token`, which has to be provided by a WorldCereal admin.


In [13]:
from utils import deploy_model

model_url = deploy_model(custom_model, pattern="demo_large")

Uploading model to `demo_large_20240708175000_custommodel.onnx`


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 68 38.6M    0     0   68 26.6M      0  58.3M --:--:-- --:--:-- --:--:-- 58.3M

Deployed to: https://artifactory.vgt.vito.be/artifactory/worldcereal_models/demo_large_20240708175000_custommodel.onnx


100 38.6M    0   812  100 38.6M    606  28.8M  0:00:01  0:00:01 --:--:-- 28.8M
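
The uploaded file name in the output above appears to combine the `pattern` argument, a timestamp, and a fixed suffix. The sketch below reproduces that naming scheme; it is inferred from the printed name only, and the helper `model_filename` is hypothetical rather than part of `deploy_model`.

```python
from datetime import datetime


def model_filename(pattern, when):
    """Build a name like <pattern>_<YYYYmmddHHMMSS>_custommodel.onnx."""
    return f"{pattern}_{when:%Y%m%d%H%M%S}_custommodel.onnx"


name = model_filename("demo_large", datetime(2024, 7, 8, 17, 50, 0))
print(name)
```

Including a timestamp keeps repeated uploads with the same `pattern` from overwriting each other.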


# <a id='toc8_'></a>[Generate a map](#toc0_)

Using our custom model, we generate a map for our region of interest and download the result.

You can also manually download the resulting GeoTIFF by clicking on the link that will be displayed.

In [14]:
from worldcereal.job import WorldCerealProduct, generate_map, CropTypeParameters
from openeo_gfmap import TemporalContext

# Set temporal range to generate product
temporal_extent = TemporalContext(
    start_date="2021-11-01",
    end_date="2022-10-31",
)

# Initializes default parameters
parameters = CropTypeParameters()

# Change the URL to the classification model
parameters.classifier_parameters.classifier_url = model_url

# Launch the job
job_results = generate_map(
    spatial_extent,
    temporal_extent,
    output_path="./cropmap.tif",
    product_type=WorldCerealProduct.CROPTYPE,
    croptype_parameters=parameters,
    out_format="GTiff",
    tile_size=128
)

Authenticated using refresh token.
TILE SIZE: 128
Selected orbit direction: ASCENDING from max accumulated area overlap between bounds and products.




0:00:00 Job 'j-24070820be4540ee8007c7622c26b189': send 'start'
0:00:27 Job 'j-24070820be4540ee8007c7622c26b189': created (progress 0%)
0:00:32 Job 'j-24070820be4540ee8007c7622c26b189': created (progress 0%)
0:00:39 Job 'j-24070820be4540ee8007c7622c26b189': created (progress 0%)
0:00:47 Job 'j-24070820be4540ee8007c7622c26b189': created (progress 0%)
0:00:57 Job 'j-24070820be4540ee8007c7622c26b189': created (progress 0%)
0:01:10 Job 'j-24070820be4540ee8007c7622c26b189': running (progress N/A)
0:01:26 Job 'j-24070820be4540ee8007c7622c26b189': running (progress N/A)
0:01:45 Job 'j-24070820be4540ee8007c7622c26b189': running (progress N/A)
0:02:09 Job 'j-24070820be4540ee8007c7622c26b189': running (progress N/A)
0:02:40 Job 'j-24070820be4540ee8007c7622c26b189': running (progress N/A)
0:03:18 Job 'j-24070820be4540ee8007c7622c26b189': running (progress N/A)
0:04:15 Job 'j-24070820be4540ee8007c7622c26b189': running (progress N/A)
0:05:14 Job 'j-24070820be4540ee8007c7622c26b189': running (progres

INFO:openeo.rest.job:Downloading Job result asset 'openEO_2020-01-01Z.tif' from https://openeo.creo.vito.be/openeo/jobs/j-24070820be4540ee8007c7622c26b189/results/assets/NGZkOWRiOTYtZDYyMC00NDU0LTliZTYtMTRhN2Q4ZTkyMzU3/5168688197e7051a9fe094e773c7f5dc/openEO_2020-01-01Z.tif?expires=1721061161 to cropmap.tif
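
Before submitting a job, it can help to sanity-check the processing window. The sketch below computes the inclusive length of the temporal extent used above and confirms that the start precedes the end. It uses only the standard library; whether the workflow requires a particular window length is not stated here, so no such rule is enforced.

```python
from datetime import date

# Same dates as passed to TemporalContext above
start = date.fromisoformat("2021-11-01")
end = date.fromisoformat("2022-10-31")

if start >= end:
    raise ValueError("start_date must be earlier than end_date")

# Inclusive number of days covered by the window
n_days = (end - start).days + 1
print(f"Processing window: {start} to {end} ({n_days} days)")
```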
