![](./resources/System_v1_training_header.png)

**Table of contents**<a id='toc0_'></a>    
- [Before you start](#toc1_)    
- [Define a region of interest](#toc2_)    
- [Extract public in situ reference data](#toc3_)    
- [Select desired crops for prediction](#toc4_)    
- [Extract required model inputs](#toc5_)    
- [Train custom classification model](#toc6_)    
- [Deploy custom model](#toc7_)    
- [Generate a map](#toc8_)    


# <a id='toc1_'></a>[Before you start](#toc0_)

To run WorldCereal crop mapping jobs from this notebook, you need an account on the Copernicus Data Space Ecosystem (CDSE), which you can create by registering [here](https://dataspace.copernicus.eu/). Registration is free of charge and grants you a number of free openEO processing credits, which you can use to complete this demo.

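```python
# TEMPORARY CELL (development only): make the local notebook `utils` module importable
# and automatically reload modules that are edited while the notebook is running

import sys
sys.path.append('/home/jeroendegerickx/git/worldcereal/worldcereal-classification/notebooks')
%load_ext autoreload
%autoreload 2
```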

# <a id='toc2_'></a>[Define a region of interest](#toc0_)

Running the cell below displays an interactive map.
Click the Rectangle button on the left-hand side of the map and draw your region of interest. In this demo, the area is currently limited to a maximum of 250 km²; the area of your rectangle is shown while you draw it.

When finished, run the next cell to retrieve the coordinates of your region of interest.
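The 250 km² limit can be checked before launching any job. The sketch below uses a hypothetical helper (`bbox_area_km2`, not part of WorldCereal) that estimates the area of a longitude/latitude box on a spherical Earth, assuming the `(west, south, east, north)` order used in the processing-extent log messages printed by the map. Its estimate can differ somewhat from the area reported by the map widget, whose own computation method is not described here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)
MAX_AREA_KM2 = 250.0      # demo limit quoted in the text


def bbox_area_km2(bbox: tuple[float, float, float, float]) -> float:
    """Approximate area in km² of a (west, south, east, north) lon/lat box.

    Uses the exact area of a latitude/longitude rectangle on a sphere:
    A = R² * Δλ * |sin(φ_north) - sin(φ_south)|, with Δλ in radians.
    """
    west, south, east, north = bbox
    d_lon = math.radians(east - west)
    d_sin_lat = abs(math.sin(math.radians(north)) - math.sin(math.radians(south)))
    return EARTH_RADIUS_KM**2 * d_lon * d_sin_lat


def within_demo_limit(bbox, max_area_km2: float = MAX_AREA_KM2) -> bool:
    """Return True if the box is no larger than the demo's area limit."""
    return bbox_area_km2(bbox) <= max_area_km2


if __name__ == "__main__":
    # Processing extent as logged by the map widget (lon_min, lat_min, lon_max, lat_max)
    extent = (5.197906, 49.631172, 5.239105, 49.658739)
    print(f"{bbox_area_km2(extent):.2f} km², within limit: {within_demo_limit(extent)}")
```

The spherical formula is used instead of a flat "width × height" estimate because degrees of longitude shrink with latitude; for boxes of a few kilometres both give nearly the same result.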

```python
from worldcereal.utils.map import ui_map

map = ui_map()
map.show_map()
```


```
2024-10-10 14:48:17.852 | INFO     | worldcereal.utils.map:handle_draw:141 - Your processing extent: (5.442352, 49.565306, 5.500031, 49.63473)
2024-10-10 14:48:17.968 | INFO     | worldcereal.utils.map:handle_draw:148 - Area of processing extent: 34.72 km²
2024-10-10 14:48:22.251 | INFO     | worldcereal.utils.map:handle_draw:141 - Your processing extent: (5.123749, 49.61071, 5.184174, 49.627614)
2024-10-10 14:48:22.355 | INFO     | worldcereal.utils.map:handle_draw:148 - Area of processing extent: 8.85 km²
2024-10-10 14:48:24.259 | INFO     | worldcereal.utils.map:handle_draw:141 - Your processing extent: (5.197906, 49.631172, 5.239105, 49.658739)
2024-10-10 14:48:24.323 | INFO     | worldcereal.utils.map
```

# <a id='toc3_'></a>[Extract public in situ reference data](#toc0_)

Here we query reference data that have already been processed by WorldCereal and are ready to use.
By default, only samples with crop type labels are retained, and only samples intersecting a buffer (250 km by default) around the bounding box of your region of interest are returned.
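To get a feel for how large the search area becomes, the sketch below expands a longitude/latitude box by a given distance on each side. It is a hypothetical helper (`buffer_bbox_km`) that assumes a spherical Earth and the `(west, south, east, north)` order from the processing-extent log; the exact buffering method used by `query_public_extractions` is not shown in this notebook and may differ (for example, by buffering in a projected coordinate system).

```python
import math

KM_PER_DEG_LAT = 111.195  # ≈ 2πR / 360 for a spherical Earth with R = 6371 km


def buffer_bbox_km(bbox, buffer_km: float):
    """Expand a (west, south, east, north) lon/lat box by roughly buffer_km per side.

    Illustrative only: the result is a larger box, not a true circular buffer.
    """
    west, south, east, north = bbox
    d_lat = buffer_km / KM_PER_DEG_LAT
    new_south = max(south - d_lat, -90.0)
    new_north = min(north + d_lat, 90.0)
    # A degree of longitude is shortest at the poleward edge, so computing the
    # margin there guarantees at least buffer_km everywhere in the box.
    cos_lat = math.cos(math.radians(max(abs(new_south), abs(new_north))))
    d_lon = buffer_km / (KM_PER_DEG_LAT * cos_lat) if cos_lat > 1e-9 else 180.0
    return (max(west - d_lon, -180.0), new_south, min(east + d_lon, 180.0), new_north)


if __name__ == "__main__":
    extent = (5.197906, 49.631172, 5.239105, 49.658739)
    print(buffer_bbox_km(extent, 250.0))
```

At the latitude of the example region (about 49.6° N), a 250 km margin spans roughly 2.2° of latitude but noticeably more degrees of longitude, which is why the two margins are computed separately.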

```python
from worldcereal.utils.refdata import query_public_extractions

# Retrieve the polygon you just drew
polygon = map.get_polygon_latlon()

# Query our public database of training data
public_df = query_public_extractions(polygon)
```

```
2024-10-10 14:48:29.052 | INFO     | worldcereal.utils.map:get_processing_extent:236 - Your processing extent: (5.197906, 49.631172, 5.239105, 49.658739)
2024-10-10 14:48:29.054 | INFO     | worldcereal.utils.refdata:query_public_extractions:51 - Applying a buffer of 250.0 km to the selected area ...
2024-10-10 14:48:29.201 | INFO     | worldcereal.utils.refdata:query_public_extractions:79 - Querying WorldCereal global extractions database (this can take a while) ...
2024-10-10 14:48:50.809 | INFO     | worldcereal.utils.refdata:process_parquet:125 - Processing selected samples ...
2024-10-10 14:48:53.558 | INFO     | worldcereal.utils.refdata:process_parquet:128 - Extracted and processed 27134 samples from global database.
```


# <a id='toc4_'></a>[Select desired crops for prediction](#toc0_)

Crops with a ticked checkbox will be included in the prediction. All crops that are not selected will be grouped under the `other` category. The model is trained as a flat multi-class classifier, not a hierarchical one: every sample receives exactly one label, either one of the selected crops or `other`. Keep this in mind when choosing your crop types.

```python
from utils import pick_croptypes
from IPython.display import display

checkbox, checkbox_widgets = pick_croptypes(public_df, samples_threshold=100)
display(checkbox)
```


Based on your selection, a custom target label is now generated for each sample. Verify that only the crops you selected appear in the `downstream_class` column; all other crops should fall under `other`.

```python
from utils import get_custom_labels

public_df = get_custom_labels(public_df, checkbox_widgets)
public_df["downstream_class"].value_counts()
```

```
downstream_class
other            11652
maize            11624
potatoes          3157
rapeseed_rape      701
Name: count, dtype: int64
```
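The relabelling rule described above (keep the selected crops, map everything else to `other`) can be sketched in plain Python. `to_downstream_class` is a hypothetical helper written for illustration; it is not the implementation of `get_custom_labels`, and the toy labels are made up rather than taken from the reference database.

```python
from collections import Counter


def to_downstream_class(labels, selected, other_label="other"):
    """Keep labels that were selected; map every other label to `other_label`."""
    keep = set(selected)
    return [label if label in keep else other_label for label in labels]


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the reference database
    labels = ["maize", "winter_wheat", "potatoes", "maize", "sunflower"]
    downstream = to_downstream_class(labels, selected={"maize", "potatoes"})
    print(downstream)
    print(Counter(downstream))
```

Because every unselected crop collapses into a single `other` class, that class can end up large and heterogeneous, as in the counts above where `other` is the largest class.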

# <a id='toc5_'></a>[Extract required model inputs](#toc0_)

Here we compute Presto embeddings as input features for each sample, using a Presto model pretrained on WorldCereal data. The resulting `encodings` and `targets` are used for model training.

```python
from utils import get_inputs_outputs

encodings, targets = get_inputs_outputs(public_df)
```

```
2024-10-10 15:02:45.672 | INFO     | utils:get_inputs_outputs:266 - Presto URL: https://artifactory.vgt.vito.be/artifactory/auxdata-public/worldcereal/models/PhaseII/presto-ss-wc-ft-ct-30D_test.pt
2024-10-10 15:02:45.851 | INFO     | utils:get_inputs_outputs:275 - Computing Presto embeddings ...
2024-10-10 15:03:47.721 | INFO     | utils:get_inputs_outputs:298 - Done.
```


# <a id='toc6_'></a>[Train custom classification model](#toc0_)

We train a CatBoost model for the selected crop types. Class weights are determined automatically to balance the individual classes.
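One common way to derive balancing weights is inverse class frequency, w_c = N / (K · n_c), where N is the number of samples, K the number of classes and n_c the count of class c. The sketch below applies this rule to the class counts shown above; whether `train_classifier` uses exactly this formula (and whether it computes the weights on the training split only) is an assumption, and `balanced_class_weights` is a hypothetical helper.

```python
# Class counts from the value_counts() output above
CLASS_COUNTS = {"other": 11652, "maize": 11624, "potatoes": 3157, "rapeseed_rape": 701}


def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights: w_c = N / (K * n_c).

    Rare classes get weights > 1 and frequent classes < 1, so that every
    class contributes the same total weight (N / K) to the training loss.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, weight in balanced_class_weights(CLASS_COUNTS).items():
        print(f"{label:>15}: {weight:.3f}")
```

CatBoost's `CatBoostClassifier` accepts per-class weights through its `class_weights` parameter, which is one place such a mapping could be passed.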

```python
from utils import train_classifier

custom_model, report, confusion_matrix = train_classifier(encodings, targets)
```

```
2024-10-09 16:00:24.804 | INFO     | utils:train_classifier:324 - Split train/test ...
2024-10-09 16:00:24.811 | INFO     | utils:train_classifier:340 - Computing class weights ...
2024-10-09 16:00:24.816 | INFO     | utils:train_classifier:345 - Class weights:
2024-10-09 16:00:24.819 | INFO     | utils:train_classifier:368 - Training CatBoost classifier ...

Learning rate set to 0.050282
0:	learn: 1.3126846	test: 1.3137395	best: 1.3137395 (0)	total: 100ms	remaining: 13m 21s
25:	learn: 0.6462074	test: 0.6922385	best: 0.6922385 (25)	total: 1.02s	remaining: 5m 13s
50:	learn: 0.4721582	test: 0.5584018	best: 0.5584018 (50)	total: 1.97s	remaining: 5m 6s
75:	learn: 0.3889237	test: 0.5106106	best: 0.5106106 (75)	total: 2.91s	remaining: 5m 3s
100:	learn: 0.3364593	test: 0.4900291	best: 0.4900291 (100)	total: 3.79s	remaining: 4m 56s
125:	learn: 0.2988447	test: 0.4765670	best: 0.4765670 (125)	total: 4.7s	remaining: 4m 53s
150:	learn: 0.2704937	test: 0.4695893	best: 0.4695893 (150)	total: 5.72s	remaining: 4m 57s
175:	learn: 0.2492131	test: 0.4646734	best: 0.4646734 (175)	total: 6.66s	remaining: 4m 55s
200:	learn: 0.2302107	test: 0.4608366	best: 0.4608294 (198)	total: 7.61s	remaining: 4m 55s
225:	learn: 0.2152972	test: 0.4593936	best: 0.4592960 (224)	total: 8.54s	remaining: 4m 53s
250:	learn: 0.2021578	test: 0.4577594	best: 0.4577594 (250)	total: 9.52s
```

```python
# Print the classification report
print(report)
```

```
                    precision    recall  f1-score   support

             other       0.91      0.88      0.89       845
     rapeseed_rape       0.76      0.91      0.83       132
unspecified_barley       0.56      0.67      0.61       198
 unspecified_wheat       0.87      0.83      0.85       755

          accuracy                           0.84      1930
         macro avg       0.78      0.82      0.80      1930
      weighted avg       0.85      0.84      0.84      1930
```
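The two summary rows at the bottom of the report follow the usual definitions (the layout matches scikit-learn's `classification_report`): the macro average is the plain mean over classes, while the weighted average weights each class by its support. The sketch below recomputes them from the per-class rows; since those rows are already rounded to two decimals, the recomputed values can differ from the printed ones in the last digit.

```python
# Per-class rows copied from the report above: (precision, recall, f1-score, support)
REPORT_ROWS = {
    "other": (0.91, 0.88, 0.89, 845),
    "rapeseed_rape": (0.76, 0.91, 0.83, 132),
    "unspecified_barley": (0.56, 0.67, 0.61, 198),
    "unspecified_wheat": (0.87, 0.83, 0.85, 755),
}
METRICS = ("precision", "recall", "f1-score")


def macro_and_weighted(rows, metric_index: int) -> tuple[float, float]:
    """Macro = plain mean over classes; weighted = mean weighted by class support."""
    values = [row[metric_index] for row in rows.values()]
    supports = [row[3] for row in rows.values()]
    macro = sum(values) / len(values)
    weighted = sum(v * s for v, s in zip(values, supports)) / sum(supports)
    return macro, weighted


if __name__ == "__main__":
    for i, name in enumerate(METRICS):
        macro, weighted = macro_and_weighted(REPORT_ROWS, i)
        print(f"{name:>9}: macro avg = {macro:.3f}, weighted avg = {weighted:.3f}")
```

The macro average is pulled down by the weaker minority class `unspecified_barley`, which is why it is lower than the weighted average. In a single-label multi-class setting, the weighted recall always equals the overall accuracy.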



# <a id='toc7_'></a>[Deploy custom model](#toc0_)

Once trained, the model has to be uploaded to the cloud so that it can be used for inference. Note that these models are only kept in cloud storage for a limited amount of time.


```python
from worldcereal.utils.upload import deploy_model
from openeo_gfmap.backend import cdse_connection

model_url = deploy_model(cdse_connection(), custom_model, pattern="demo_croptype_multiclass_BE")
```

```
2024-10-09 16:00:48.514 | INFO     | worldcereal.utils.upload:deploy_model:205 - Deploying model ...
Authenticated using refresh token.
2024-10-09 16:00:50.712 | INFO     | worldcereal.utils.upload:deploy_model:211 - Deployed to: s3://OpenEO-artifacts/fd307620ba8a0a07c44a2dc28541b181d5c03cb4/2024/10/09/demo_croptype_multiclass_BE_custommodel.onnx
```


```python
model_url
```

'https://s3.prod.warsaw.openeo.dataspace.copernicus.eu/OpenEO-artifacts/fd307620ba8a0a07c44a2dc28541b181d5c03cb4/2024/10/09/demo_croptype_multiclass_BE_custommodel.onnx?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=82145f8d201e4a1797900de16792c70f%2F20241009%2Feu-central-1%2Fs3%2Faws4_request&X-Amz-Date=20241009T140050Z&X-Amz-Expires=518400&X-Amz-SignedHeaders=host&X-Amz-Security-Token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlX2FybiI6ImFybjphd3M6aWFtOjowMDAwMDAwMDAwMDA6cm9sZS9TM0FjY2VzcyIsImluaXRpYWxfaXNzdWVyIjoiaHR0cHM6Ly9pZGVudGl0eS5kYXRhc3BhY2UuY29wZXJuaWN1cy5ldS9hdXRoL3JlYWxtcy9DRFNFIiwiaXNzIjoic3RzLnByb2Qud2Fyc2F3Lm9wZW5lby5kYXRhc3BhY2UuY29wZXJuaWN1cy5ldSIsInN1YiI6ImRjY2FiNmQ5LTg0NmMtNDhhOS05ZDk0LTQ5NzE0NmNiMjI4NSIsImV4cCI6MTcyODUyNTY0OCwibmJmIjoxNzI4NDgyNDQ4LCJpYXQiOjE3Mjg0ODI0NDgsImp0aSI6ImMxZmU1NzdhLTk5ZTQtNDA5Yy1iYjYxLWUyZGI4ZGZlMWI4OCJ9.kcNawuVkOnzmB1cpPBVBIoA7wRbLZNw-zt39xVJ5Ncq9a5Q0NwdgxPU_2tXFFyXs09ZfWMLGocQyeS_Spu3w3ECh0_KZyQXMjo82L_TwOiIm3tBfv0rs6IZNubnfocs0bSwW6
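The `model_url` printed above is a presigned S3 link. Its query string records when it was signed (`X-Amz-Date`) and how long the signature stays valid (`X-Amz-Expires`, in seconds; 518400 s is 6 days in the URL above). The hypothetical helper below reads these two standard AWS Signature Version 4 parameters to compute when the link stops working. This is the lifetime of the link only; how long the model file itself is retained in storage is not stated here.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which an AWS SigV4 presigned URL stops being valid."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


if __name__ == "__main__":
    # Shortened example: only the two relevant parameters of the URL above are kept
    url = (
        "https://example.invalid/demo_croptype_multiclass_BE_custommodel.onnx"
        "?X-Amz-Date=20241009T140050Z&X-Amz-Expires=518400"
    )
    print(presigned_url_expiry(url))  # 2024-10-15 14:00:50+00:00
```

If a map is generated after this moment, the model link is expected to be rejected, so the model would need to be deployed again.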

# <a id='toc8_'></a>[Generate a map](#toc0_)

Using our custom model, we generate a map for our region of interest and download the result.

You can also download the resulting GeoTIFF manually by clicking the link that will be displayed.

```python
from worldcereal.job import WorldCerealProduct, generate_map, CropTypeParameters, PostprocessParameters
from openeo_gfmap import TemporalContext

# Retrieve the processing extent of the region you drew on the map
spatial_extent = map.get_processing_extent()

# Set the temporal range for which to generate the product
temporal_extent = TemporalContext(
    start_date="2020-12-01",
    end_date="2021-11-30",
)

# Initialize the default crop type parameters
parameters = CropTypeParameters()

# Point the classifier to the URL of the custom model deployed above
parameters.classifier_parameters.classifier_url = model_url

# Launch the job
job_results = generate_map(
    spatial_extent,
    temporal_extent,
    output_path="./cropmap_newpresto.tif",
    product_type=WorldCerealProduct.CROPTYPE,
    croptype_parameters=parameters,
    postprocess_parameters=PostprocessParameters(enable=True),
    job_options={"python-memory": "4g"},
    out_format="GTiff",
)
```

```
Authenticated using refresh token.
2024-10-08 12:50:26,563 - openeo_gfmap.utils - INFO - Selected orbit state: DESCENDING. Reason: Orbit has more cumulative intersected area. 15.678082454846425 > 13.936101536993151
InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Failed to load model because protobuf parsing failed.
```

To interpret your raster, note that:
- Band 1 contains the class integers; run the cell below to see which integer corresponds to which crop type.
- Band 2 contains the probability associated with the prediction.

```python
from worldcereal.utils.models import load_model_lut

lookup_table = load_model_lut(model_url)
print('Class name -> Raster value')
for key, value in lookup_table.items():
    print(f"{value} -> {key}")
```

```
other -> 0
rapeseed_rape -> 1
unspecified_barley -> 2
unspecified_wheat -> 3
```
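Once both bands are loaded as arrays, the lookup table makes it easy to summarize the map per crop type. The sketch below uses a hypothetical helper (`summarize_prediction`) and small toy arrays instead of the real raster; it counts pixels per class and averages band 2 within each class, in whatever units that band uses. Handling of nodata pixels is not covered and would need to be added for real outputs.

```python
import numpy as np

# Class integers as printed by the cell above
LUT = {0: "other", 1: "rapeseed_rape", 2: "unspecified_barley", 3: "unspecified_wheat"}


def summarize_prediction(classes: np.ndarray, probability: np.ndarray,
                         lut: dict[int, str]) -> dict[str, tuple[int, float]]:
    """Per class: (pixel count, mean of the probability band over those pixels)."""
    summary = {}
    for value, name in lut.items():
        mask = classes == value
        n_pixels = int(mask.sum())
        mean_prob = float(probability[mask].mean()) if n_pixels else float("nan")
        summary[name] = (n_pixels, mean_prob)
    return summary


if __name__ == "__main__":
    # Toy 2x3 rasters (hypothetical values), standing in for band 1 and band 2
    classes = np.array([[0, 1, 1], [3, 3, 2]])
    probability = np.array([[80, 90, 70], [60, 100, 50]])
    for name, (count, mean_prob) in summarize_prediction(classes, probability, LUT).items():
        print(f"{name:>18}: {count} px, mean probability {mean_prob:.1f}")
```

With a raster library such as rasterio, the two arrays could for example be obtained by opening `./cropmap_newpresto.tif` and calling `read(1)` and `read(2)` on the dataset.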
