![](./resources/System_v1_training_header.png)

**Table of contents**<a id='toc0_'></a>    
- [Before you start](#toc1_)    
- [Define a region of interest](#toc2_)    
- [Extract public in situ reference data](#toc3_)    
- [Select desired crops for prediction](#toc4_)    
- [Extract required model inputs](#toc5_)    
- [Train custom classification model](#toc6_)    
- [Deploy custom model](#toc7_)    
- [Generate a map](#toc8_)    


# <a id='toc1_'></a>[Before you start](#toc0_)

To run WorldCereal crop mapping jobs from this notebook, you need an account on the Copernicus Data Space Ecosystem (CDSE). You can register [here](https://dataspace.copernicus.eu/). Registration is free of charge and grants you a number of free openEO processing credits that you can use for this demo.

# <a id='toc2_'></a>[Define a region of interest](#toc0_)

Running the first code cell below displays an interactive map.
Click the Rectangle button on the left-hand side of the map to start drawing your region of interest. This demo currently limits the area to 250 km²; the size of your rectangle is shown while you draw.

When finished, execute the second cell to store the coordinates of your region of interest.
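
The size limit mentioned above can also be checked by hand. The snippet below is a minimal sketch that approximates the area of a longitude/latitude bounding box on a sphere. It is not the calculation `worldcereal` performs, so the result can differ somewhat from the area reported in the log output. The function name `bbox_area_km2` is hypothetical, and the coordinates are those from the example run of this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
MAX_AREA_KM2 = 250.0         # demo limit quoted in the text


def bbox_area_km2(west, south, east, north):
    """Approximate area of a lon/lat bounding box on a sphere, in km²."""
    d_lon = math.radians(east - west)
    # Difference of sines gives the exact area of a latitude band on a sphere
    band = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


# Bounding box from the example run of this notebook
area = bbox_area_km2(4.826202, 51.068729, 4.91272, 51.108408)
print(f"Approximate area: {area:.1f} km²")
print("Within demo limit:", area <= MAX_AREA_KM2)
```

The spherical approximation is enough for a quick sanity check before drawing; the value logged by `get_bbox_from_draw` remains the authoritative one.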

In [1]:
from worldcereal.utils.map import get_ui_map

m, dc = get_ui_map()
m

Map(center=[51.1872, 5.1154], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoo…

In [3]:
# retrieve bounding box from drawn rectangle
from worldcereal.utils.map import get_bbox_from_draw

spatial_extent, bbox, poly = get_bbox_from_draw(dc)

2024-10-01 20:46:08.882 | INFO     | worldcereal.utils.map:get_bbox_from_draw:464 - Your area of interest: (4.826202, 51.068729, 4.91272, 51.108408)
2024-10-01 20:46:08.930 | INFO     | worldcereal.utils.map:get_bbox_from_draw:470 - Area of processing extent: 28.18 km²


# <a id='toc3_'></a>[Extract public in situ reference data](#toc0_)

Here we query existing reference data that have already been processed by WorldCereal and are ready to use.
By default, we keep only samples with crop type labels that intersect a buffer (250 km by default) around the bounding box.
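
To make the buffering step concrete, here is a minimal sketch of expanding a bounding box by a distance in kilometres and testing whether a sample location falls inside it. It uses a rough degrees-per-kilometre conversion and a rectangular search box, which is an assumption made for illustration and not necessarily how `query_public_extractions` builds its buffer. The helper names `buffer_bbox` and `intersects` are hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.32  # rough length of one degree of latitude


def buffer_bbox(west, south, east, north, buffer_km=250.0):
    """Expand a lon/lat bounding box by roughly buffer_km on every side."""
    d_lat = buffer_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with latitude; use the centre of the box
    mid_lat = math.radians((south + north) / 2)
    d_lon = buffer_km / (KM_PER_DEG_LAT * math.cos(mid_lat))
    return (
        west - d_lon,
        max(south - d_lat, -90.0),
        east + d_lon,
        min(north + d_lat, 90.0),
    )


def intersects(bbox, lon, lat):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


# Bounding box from the example run of this notebook
search_box = buffer_bbox(4.826202, 51.068729, 4.91272, 51.108408)
print(intersects(search_box, 4.35, 50.85))   # a nearby location (Brussels)
print(intersects(search_box, 13.40, 52.52))  # a distant location (Berlin)
```

A larger buffer returns more samples but also pulls in reference data from regions whose cropping systems may differ from your area of interest.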

In [4]:
from worldcereal.utils.refdata import query_public_extractions

public_df = query_public_extractions(poly)

2024-10-01 20:46:12.570 | INFO     | worldcereal.utils.refdata:query_public_extractions:53 - Applying a buffer of 250.0 km to the selected area ...
2024-10-01 20:46:12.809 | INFO     | worldcereal.utils.refdata:query_public_extractions:81 - Querying WorldCereal global extractions database (this can take a while) ...
2024-10-01 20:49:08.531 | INFO     | worldcereal.utils.refdata:process_parquet:127 - Processing selected samples ...
2024-10-01 20:49:10.398 | INFO     | worldcereal.utils.refdata:process_parquet:130 - Extracted and processed 39126 samples from global database.


# <a id='toc4_'></a>[Select desired crops for prediction](#toc0_)

Crops with ticked checkboxes will be included in the prediction. All crops that are not selected will be grouped under the `other` category. The model is trained in a multi-class setting, not a hierarchical one, so keep this in mind when choosing your crop types.

In [11]:
from utils import pick_croptypes
from IPython.display import display

checkbox, checkbox_widgets = pick_croptypes(public_df, samples_threshold=100)
display(checkbox)

VBox(children=(Checkbox(value=False, description='maize (20379 samples)'), Checkbox(value=False, description='…

Based on your selection, a custom target label is now generated for each sample. Verify that only the crops of your choice appear in the `downstream_class` column; all others fall under `other`.

In [18]:
from utils import get_custom_labels

public_df = get_custom_labels(public_df, checkbox_widgets)
public_df["downstream_class"].value_counts()

downstream_class
maize                 20474
unspecified_wheat      5876
potatoes               4948
other                  4380
beet                   1963
unspecified_barley     1746
rapeseed_rape           529
Name: count, dtype: int64
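
The relabelling idea can be sketched in a few lines of plain Python. This is an illustration only, not the implementation of `get_custom_labels`: the function `relabel` is hypothetical and the labels are toy values rather than samples from the WorldCereal database.

```python
from collections import Counter


def relabel(labels, selected, fallback="other"):
    """Keep the selected crop labels and map everything else to the fallback."""
    selected = set(selected)
    return [label if label in selected else fallback for label in labels]


# Toy labels for illustration (not taken from the reference database)
raw = ["maize", "maize", "potatoes", "grass_fodder_crops", "beet", "sunflower"]
custom = relabel(raw, selected=["maize", "potatoes", "beet"])

print(custom)
print(Counter(custom))
```

Because unselected crops all end up in one class, a very heterogeneous `other` class can be harder for the classifier to separate than the crops you selected.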

# <a id='toc5_'></a>[Extract required model inputs](#toc0_)

Here we compute Presto input features for each sample using a Presto model pretrained on WorldCereal data. The resulting `encodings` and `targets` are used for model training.

In [3]:
from utils import get_inputs_outputs

encodings, targets = get_inputs_outputs(public_df)

2024-10-01 10:13:32.445 | INFO     | utils:get_inputs_outputs:79 - Presto URL: https://artifactory.vgt.vito.be/artifactory/auxdata-public/worldcereal/models/PhaseII/presto-ss-wc-ft-ct_long-parquet_30D_CROPTYPE0_split%3Drandom_time-token%3Dmonth_balance%3DTrue_augment%3DTrue.pt
2024-10-01 10:13:33.395 | INFO     | utils:get_inputs_outputs:87 - Computing Presto embeddings ...
2024-10-01 10:14:48.203 | INFO     | utils:get_inputs_outputs:110 - Done.


# <a id='toc6_'></a>[Train custom classification model](#toc0_)
We train a CatBoost model for the selected crop types. Class weights are determined automatically to balance the individual classes.
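
The notebook does not state which weighting scheme `train_classifier` applies. As an illustration of what balancing means, the sketch below uses the common "balanced" heuristic, where each class weight is `n_samples / (n_classes * class_count)`. That choice is an assumption, and `balanced_class_weights` is a hypothetical helper. The class counts are those from the `value_counts()` output shown earlier.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Class counts from the value_counts() output of the example run
counts = {
    "maize": 20474,
    "unspecified_wheat": 5876,
    "potatoes": 4948,
    "other": 4380,
    "beet": 1963,
    "unspecified_barley": 1746,
    "rapeseed_rape": 529,
}

weights = balanced_class_weights(counts)
for name, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{name:<20} {weight:.2f}")
```

With this heuristic every class contributes the same total weight to the loss, so rare classes such as `rapeseed_rape` are not drowned out by `maize`.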

In [4]:
from utils import train_classifier

custom_model, report, confusion_matrix = train_classifier(encodings, targets)

2024-10-01 10:15:12.425 | INFO     | utils:train_classifier:137 - Split train/test ...
2024-10-01 10:15:12.447 | INFO     | utils:train_classifier:153 - Computing class weights ...
2024-10-01 10:15:12.464 | INFO     | utils:train_classifier:158 - Class weights:
2024-10-01 10:15:12.471 | INFO     | utils:train_classifier:181 - Training CatBoost classifier ...


Learning rate set to 0.052152
0:	learn: 1.6755078	test: 1.6794653	best: 1.6794653 (0)	total: 168ms	remaining: 22m 23s
25:	learn: 0.7611667	test: 0.8103459	best: 0.8103459 (25)	total: 3.15s	remaining: 16m 6s
50:	learn: 0.5645676	test: 0.6439129	best: 0.6439129 (50)	total: 5.99s	remaining: 15m 34s
75:	learn: 0.4814657	test: 0.5844641	best: 0.5844641 (75)	total: 8.74s	remaining: 15m 11s
100:	learn: 0.4324872	test: 0.5553868	best: 0.5553868 (100)	total: 11.5s	remaining: 14m 56s
125:	learn: 0.3987216	test: 0.5393044	best: 0.5393044 (125)	total: 14.3s	remaining: 14m 52s
150:	learn: 0.3722066	test: 0.5301795	best: 0.5301795 (150)	total: 17.3s	remaining: 14m 58s
175:	learn: 0.3509112	test: 0.5247165	best: 0.5247165 (175)	total: 20.1s	remaining: 14m 54s
200:	learn: 0.3335955	test: 0.5202964	best: 0.5202964 (200)	total: 22.9s	remaining: 14m 47s
225:	learn: 0.3209134	test: 0.5177398	best: 0.5177398 (225)	total: 25.5s	remaining: 14m 37s
250:	learn: 0.3088239	test: 0.5154368	best: 0.5154368 (250)	t

In [5]:
# Print the classification report
print(report)

                    precision    recall  f1-score   support

             maize       0.94      0.89      0.92      6142
             other       0.71      0.76      0.74      1903
          potatoes       0.77      0.81      0.79      1484
     rapeseed_rape       0.83      0.89      0.86       159
unspecified_barley       0.65      0.77      0.71       524
 unspecified_wheat       0.85      0.87      0.86      1763

          accuracy                           0.85     11975
         macro avg       0.79      0.83      0.81     11975
      weighted avg       0.86      0.85      0.85     11975
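
To read the summary rows of the report, it helps to see how they follow from the per-class rows. The sketch below recomputes the macro and weighted recall from the rounded values printed above. Because it starts from rounded figures, a recomputed average can occasionally differ from the report in the last digit.

```python
# (precision, recall, f1-score, support) per class, copied from the report
rows = {
    "maize": (0.94, 0.89, 0.92, 6142),
    "other": (0.71, 0.76, 0.74, 1903),
    "potatoes": (0.77, 0.81, 0.79, 1484),
    "rapeseed_rape": (0.83, 0.89, 0.86, 159),
    "unspecified_barley": (0.65, 0.77, 0.71, 524),
    "unspecified_wheat": (0.85, 0.87, 0.86, 1763),
}

total_support = sum(row[3] for row in rows.values())

# Macro average: every class counts equally, regardless of its size
macro_recall = sum(row[1] for row in rows.values()) / len(rows)

# Weighted average: every class counts in proportion to its support
weighted_recall = sum(row[1] * row[3] for row in rows.values()) / total_support

print(total_support, round(macro_recall, 2), round(weighted_recall, 2))
```

In a single-label multi-class setting the support-weighted recall equals the overall accuracy, which is why both appear as 0.85 in the report. The macro average is the more informative number when the small classes matter to you.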



# <a id='toc7_'></a>[Deploy custom model](#toc0_)

Once trained, we have to upload our model to the cloud so it can be used for inference. Note that these models are only kept in cloud storage for a limited amount of time.
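
Judging from the log output of the example run, the uploaded file name appears to combine the `pattern` argument with an upload timestamp and a fixed suffix. The sketch below reproduces that naming scheme under this assumption; `build_model_name` is a hypothetical helper and not part of `worldcereal`.

```python
from datetime import datetime


def build_model_name(pattern, when):
    """Compose a file name as <pattern>_<YYYYmmddHHMMSS>_custommodel.onnx."""
    return f"{pattern}_{when:%Y%m%d%H%M%S}_custommodel.onnx"


# Pattern and timestamp from the example run of this notebook
name = build_model_name("demo_newpresto", datetime(2024, 10, 1, 10, 16, 27))
print(name)
```

A timestamp in the name means that each upload gets its own URL, so keep the returned `model_url` if you want to reuse a model while it is still available.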


In [6]:
from worldcereal.utils.upload import deploy_model
from openeo_gfmap.backend import cdse_connection

model_url = deploy_model(cdse_connection(), custom_model, pattern="demo_newpresto")

2024-10-01 10:16:27.555 | INFO     | utils:deploy_model:257 - Uploading model to `demo_newpresto_20241001101627_custommodel.onnx`
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.0M    0   824  100 17.0M   1612  33.4M --:--:-- --:--:-- --:--:-- 33.4M
2024-10-01 10:18:39.192 | INFO     | utils:deploy_model:273 - Deployed to: https://artifactory.vgt.vito.be/artifactory/worldcereal_models/demo_newpresto_20241001101627_custommodel.onnx


# <a id='toc8_'></a>[Generate a map](#toc0_)

Using our custom model, we generate a map for our region of interest and download the result.

You can also manually download the resulting GeoTIFF by clicking on the link that will be displayed.

In [14]:
from worldcereal.job import (
    CropTypeParameters,
    PostprocessParameters,
    WorldCerealProductType,
    generate_map,
)
from openeo_gfmap import TemporalContext

# Set temporal range to generate product
temporal_extent = TemporalContext(
    start_date="2020-12-01",
    end_date="2021-11-30",
)

# Initializes default parameters
parameters = CropTypeParameters()

# Change the URL to the classification model
parameters.classifier_parameters.classifier_url = model_url

# Launch the job
job_results = generate_map(
    spatial_extent,
    temporal_extent,
    output_path="./cropmap_newpresto.tif",
    product_type=WorldCerealProductType.CROPTYPE,
    croptype_parameters=parameters,
    postprocess_parameters=PostprocessParameters(enable=True),
    job_options={"python-memory": "4g"},
    out_format="GTiff",
)

INFO:openeo.rest.connection:Found OIDC providers: ['CDSE']
INFO:openeo.rest.connection:No OIDC provider given, but only one available: 'CDSE'. Using that one.
INFO:openeo.rest.connection:Using default client_id 'sh-b1c3a958-52d4-40fe-a333-153595d1c71e' from OIDC provider 'CDSE' info.
INFO:openeo.rest.connection:Found refresh token: trying refresh token based authentication.
INFO:openeo.rest.auth.oidc:Doing 'refresh_token' token request 'https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token' with post data fields ['grant_type', 'client_id', 'refresh_token'] (client_id 'sh-b1c3a958-52d4-40fe-a333-153595d1c71e')
INFO:openeo.rest.connection:Obtained tokens: ['access_token', 'id_token', 'refresh_token']
INFO:openeo.rest.auth.config:Storing refresh token for issuer 'https://identity.dataspace.copernicus.eu/auth/realms/CDSE' (client 'sh-b1c3a958-52d4-40fe-a333-153595d1c71e')


Authenticated using refresh token.


2024-10-01 11:52:24,363 - openeo_gfmap.utils - INFO - Selected orbit state: DESCENDING. Reason: Orbit has more cumulative intersected area. 8.633429223099554 > 8.407504384493565
INFO:openeo_gfmap.utils:Selected orbit state: DESCENDING. Reason: Orbit has more cumulative intersected area. 8.633429223099554 > 8.407504384493565


0:00:00 Job 'j-241001332d7d4e61a08974174878fbbc': send 'start'
0:00:39 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:00:54 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:01:01 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:01:09 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:01:19 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:01:33 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:01:49 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:02:08 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:02:32 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:03:09 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:03:57 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:04:43 Job 'j-241001332d7d4e61a08974174878fbbc': running (progress N/A)
0:05:52 Job 'j-241001332d7d4e61a08974174878fbbc': running (pr

INFO:openeo.rest.job:Downloading Job result asset 'openEO_2020-01-01Z.tif' from https://openeo.creo.vito.be/openeo/jobs/j-241001332d7d4e61a08974174878fbbc/results/assets/NGZkOWRiOTYtZDYyMC00NDU0LTliZTYtMTRhN2Q4ZTkyMzU3/0d7faef7f7b036c6c86a9740cc542ee9/openEO_2020-01-01Z.tif?expires=1728382748 to cropmap_newpresto.tif


To interpret your raster, the following information is useful:
- Band 1 contains the class integers. Execute the cell below to check which integer belongs to which crop type.
- Band 2 contains the probability associated with the prediction.

In [15]:
# Build a lookup table from raster value (class index) to class name
LUT = dict(enumerate(custom_model.get_params()['class_names']))
print('Raster value - Class name')
for key, value in LUT.items():
    print(f"{key} -> {value}")

Raster value - Class name
0 -> maize
1 -> other
2 -> potatoes
3 -> rapeseed_rape
4 -> unspecified_barley
5 -> unspecified_wheat
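
With this lookup table, the two bands can be summarised per crop type. The sketch below counts pixels per class and optionally ignores low-confidence predictions. It works on small toy arrays instead of the downloaded GeoTIFF, the function `summarise` is hypothetical, and the 0 to 100 probability scale is an assumption that you should check against your own raster.

```python
from collections import Counter

# Lookup table as printed by the cell above
LUT = {
    0: "maize",
    1: "other",
    2: "potatoes",
    3: "rapeseed_rape",
    4: "unspecified_barley",
    5: "unspecified_wheat",
}


def summarise(classes, probabilities, min_probability=0):
    """Count pixels per class name, skipping pixels below min_probability."""
    counts = Counter()
    for class_row, prob_row in zip(classes, probabilities):
        for value, prob in zip(class_row, prob_row):
            if prob >= min_probability:
                counts[LUT.get(value, f"unknown ({value})")] += 1
    return counts


# Toy 2 x 3 rasters: band 1 (class integers) and band 2 (probabilities)
band1 = [[0, 0, 2], [5, 1, 0]]
band2 = [[91, 55, 78], [83, 40, 97]]

print(summarise(band1, band2))
print(summarise(band1, band2, min_probability=60))
```

For a real map you would read both bands from `cropmap_newpresto.tif` with a raster library of your choice and pass them in the same way.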
