![](./resources/System_v1_training_header.png)

This notebook demonstrates how to train a custom crop type model on reference data and how to apply the resulting model to generate a custom crop type map.

# Content

- [Before you start](#before-you-start)
- [1. Define a region of interest](#1.-Define-a-region-of-interest)
- [2. Extract public in situ reference data](#2.-Extract-public-in-situ-reference-data)
- [3. Select desired crops for prediction](#3.-Select-desired-crops-for-prediction)
- [4. Extract required model inputs](#4.-Extract-required-model-inputs)
- [5. Train custom classification model](#5.-Train-custom-classification-model)
- [6. Deploy custom model](#6.-Deploy-custom-model)
- [7. Generate a map](#7.-Generate-a-map)

# Before you start

To run this notebook, you need an account on both of the following platforms:

- The Copernicus Data Space Ecosystem (CDSE): register by completing the form [here](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/login-actions/registration?client_id=cdse-public&tab_id=eRKGqDvoYI0)
- VITO's Terrascope platform: register by completing the form [here](https://sso.terrascope.be/auth/realms/terrascope/login-actions/registration?client_id=drupal-terrascope&tab_id=irBzckp2aDo)

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from worldcereal.utils.map import get_ui_map
RDM_API = "https://ewoc-rdm-api.iiasa.ac.at"

# 1. Define a region of interest

Running the code cell below displays an interactive map.
Click the Rectangle button on the left-hand side of the map and draw your region of interest.
When you are finished, execute the second cell to store the coordinates of your region of interest.

In [3]:
m, dc = get_ui_map()
m

Map(center=[51.1872, 5.1154], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoo…

In [21]:
# retrieve bounding box from drawn rectangle
from worldcereal.utils.map import get_bbox_from_draw

spatial_extent, bbox, poly = get_bbox_from_draw(dc)

Your area of interest: (2.05719, 48.114767, 2.121735, 48.154177)
Area of processing extent: 21.54 km²
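
As a rough cross-check of the reported extent, the area of a longitude/latitude bounding box can be approximated on a sphere. This is a minimal sketch and not the method used by `get_bbox_from_draw`, whose implementation is not shown here; the helper name `bbox_area_km2` is hypothetical. Because it ignores the projection the library may use, the result should be close to, but not necessarily identical to, the value printed above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def bbox_area_km2(west, south, east, north):
    """Approximate the area of a lon/lat bounding box on a sphere, in km²."""
    d_lon = math.radians(east - west)
    # Area of a latitude band is proportional to the difference of sines
    band = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


# Bounding box printed by the cell above: (west, south, east, north)
bbox_example = (2.05719, 48.114767, 2.121735, 48.154177)
area = bbox_area_km2(*bbox_example)
print(f"Approximate area: {area:.2f} km²")
```

A small area keeps the processing job in section 7 manageable, so a check like this can be useful before launching it.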


# 2. Extract public in situ reference data

Here we query existing reference data that WorldCereal has already processed and made ready to use.
By default, we keep only crop type labels and select the samples that intersect a buffer around the bounding box.

In [6]:
from utils import query_worldcereal_samples

public_df = query_worldcereal_samples(poly)

Applying a buffer of 250.0 km to the selected area ...
Querying WorldCereal global database ...
Processing selected samples ...
Extracted and processed 6110 samples from global database.
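
To illustrate what buffering the selected area means, the sketch below expands a longitude/latitude bounding box by a distance in kilometres. It is an assumption-laden approximation: `buffer_bbox` is a hypothetical helper, the degree-to-kilometre conversion is a rough constant, and the actual logic inside `query_worldcereal_samples` is not shown in this notebook and may well work on the polygon in a projected coordinate system instead.

```python
import math

KM_PER_DEG_LAT = 111.32  # rough length of one degree of latitude (assumption)


def buffer_bbox(west, south, east, north, buffer_km):
    """Expand a lon/lat bounding box by roughly `buffer_km` on every side."""
    mid_lat = (south + north) / 2
    d_lat = buffer_km / KM_PER_DEG_LAT
    # One degree of longitude shrinks with the cosine of the latitude
    d_lon = buffer_km / (KM_PER_DEG_LAT * math.cos(math.radians(mid_lat)))
    return (
        west - d_lon,
        max(south - d_lat, -90.0),
        east + d_lon,
        min(north + d_lat, 90.0),
    )


buffered = buffer_bbox(2.05719, 48.114767, 2.121735, 48.154177, buffer_km=250.0)
print(buffered)
```

With a 250 km buffer, the query area is far larger than the drawn rectangle, which explains why thousands of samples are returned for a region of about 20 km².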


# 3. Select desired crops for prediction

Crops with ticked checkboxes are included in the prediction. All crops that are not selected are grouped under the "other" category.

In [7]:
from utils import pick_croptypes
from IPython.display import display

checkbox, checkbox_widgets = pick_croptypes(public_df, samples_threshold=100)
display(checkbox)

VBox(children=(Checkbox(value=False, description='unspecified_wheat (2610 samples)'), Checkbox(value=False, de…

In [9]:
from utils import get_custom_labels

public_df = get_custom_labels(public_df, checkbox_widgets)
public_df['custom_class'].value_counts()

custom_class
other            4638
rapeseed_rape     700
maize             592
sunflower         180
Name: count, dtype: int64
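
The grouping step can be pictured as a simple relabelling: every label that was not ticked is replaced by a catch-all class. The sketch below uses plain Python rather than pandas to stay dependency-free; `to_custom_labels` is a hypothetical helper, the label list is toy data, and the real `get_custom_labels` implementation is not shown here.

```python
from collections import Counter


def to_custom_labels(labels, selected, other_label="other"):
    """Keep selected crop labels and map everything else to `other_label`."""
    selected = set(selected)
    return [label if label in selected else other_label for label in labels]


# Toy labels for illustration only, not taken from the WorldCereal database
labels = ["maize", "unspecified_wheat", "sunflower", "maize", "barley", "rapeseed_rape"]
custom = to_custom_labels(labels, selected={"maize", "sunflower", "rapeseed_rape"})

# Equivalent in spirit to `value_counts()` on the `custom_class` column
counts = Counter(custom)
print(counts)
```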

# 4. Extract required model inputs

Here we compute Presto features (embeddings) for each sample, using a model pretrained on WorldCereal data.

In [10]:
from utils import get_inputs_outputs

encodings, targets = get_inputs_outputs(public_df)

Loading Presto model ...
Computing Presto embeddings ...
Done.


# 5. Train custom classification model

We train a CatBoost model for the selected crop types.

In [11]:
from utils import train_classifier

custom_model, report = train_classifier(encodings, targets)

Split train/test ...
Computing class weights ...
Class weights: {'maize': 2.582729468599034, 'other': 0.32930397289805974, 'rapeseed_rape': 2.182142857142857, 'sunflower': 8.48611111111111}
Training CatBoost classifier ...
0:	learn: 1.3259160	test: 1.3342568	best: 1.3342568 (0)	total: 123ms	remaining: 16m 20s
25:	learn: 0.6451892	test: 0.7801960	best: 0.7801960 (25)	total: 1.79s	remaining: 9m 7s
50:	learn: 0.4330030	test: 0.6280405	best: 0.6280405 (50)	total: 3.45s	remaining: 8m 58s
75:	learn: 0.3249919	test: 0.5593450	best: 0.5593450 (75)	total: 4.69s	remaining: 8m 9s
100:	learn: 0.2617507	test: 0.5265702	best: 0.5265702 (100)	total: 5.89s	remaining: 7m 40s
125:	learn: 0.2194401	test: 0.5108148	best: 0.5108148 (125)	total: 7.13s	remaining: 7m 25s
150:	learn: 0.1925367	test: 0.5042110	best: 0.5042110 (150)	total: 8.37s	remaining: 7m 15s
175:	learn: 0.1700891	test: 0.5011448	best: 0.5009937 (174)	total: 9.56s	remaining: 7m 5s
200:	learn: 0.1544099	test: 0.5005817	best: 0.4995323 (188)	t
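
The class weights printed above are consistent with the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, applied to the training split. The sketch below checks this under one assumption: that the training count per class equals the total count from section 3 minus the test-set support listed in the classification report. The internals of `train_classifier` are not shown, so this is an inference from the numbers rather than a description of its code.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: n_samples / (n_classes * n) for name, n in class_counts.items()}


# Assumed training counts: total per class minus test-set support
train_counts = {
    "maize": 592 - 178,
    "other": 4638 - 1391,
    "rapeseed_rape": 700 - 210,
    "sunflower": 180 - 54,
}

weights = balanced_class_weights(train_counts)
for name, weight in weights.items():
    print(f"{name}: {weight:.4f}")
```

Rare classes such as sunflower receive a large weight, so their misclassifications count more during training than those of the dominant "other" class.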

In [12]:
# Print the classification report
print(report)

               precision    recall  f1-score   support

        maize       0.68      0.87      0.76       178
        other       0.97      0.89      0.93      1391
rapeseed_rape       0.75      0.91      0.83       210
    sunflower       0.46      0.61      0.52        54

     accuracy                           0.88      1833
    macro avg       0.71      0.82      0.76      1833
 weighted avg       0.90      0.88      0.89      1833
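
The two summary rows of the report differ only in how per-class scores are averaged: the macro average treats every class equally, while the weighted average scales each class by its support. The sketch below recomputes both from the rounded per-class values above, so small rounding differences against the report are expected.

```python
# (precision, recall, f1, support) per class, copied from the report above
report_rows = {
    "maize": (0.68, 0.87, 0.76, 178),
    "other": (0.97, 0.89, 0.93, 1391),
    "rapeseed_rape": (0.75, 0.91, 0.83, 210),
    "sunflower": (0.46, 0.61, 0.52, 54),
}

metric_names = ("precision", "recall", "f1")
total_support = sum(row[3] for row in report_rows.values())

macro = {}
weighted = {}
for i, metric in enumerate(metric_names):
    values = [row[i] for row in report_rows.values()]
    supports = [row[3] for row in report_rows.values()]
    # Macro: plain mean over classes
    macro[metric] = sum(values) / len(values)
    # Weighted: mean over classes, weighted by number of test samples
    weighted[metric] = sum(v * s for v, s in zip(values, supports)) / total_support

print("macro:", {k: round(v, 2) for k, v in macro.items()})
print("weighted:", {k: round(v, 2) for k, v in weighted.items()})
```

The gap between the macro and weighted F1 scores reflects the class imbalance: the large "other" class scores well and dominates the weighted average, while the small sunflower class pulls the macro average down.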



# 6. Deploy custom model

Once trained, the model must be uploaded to the cloud so that it can be used for inference.


In [13]:
from utils import deploy_model

model_url = deploy_model(custom_model, pattern='demo')

Uploading model to `demo_20240702155132_custommodel.onnx`


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

Deployed to: https://artifactory.vgt.vito.be/artifactory/worldcereal_models/demo_20240702155132_custommodel.onnx


100 6228k    0   793  100 6227k   1025  8056k --:--:-- --:--:-- --:--:-- 8057k
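
The uploaded file name appears to combine the `pattern` argument, a timestamp, and a fixed suffix. The sketch below reproduces that naming scheme as inferred from the output above; `model_filename` is a hypothetical helper, and the way `deploy_model` actually builds the name is an assumption.

```python
from datetime import datetime


def model_filename(pattern, when=None):
    """Build a name of the form <pattern>_<YYYYmmddHHMMSS>_custommodel.onnx."""
    when = when or datetime.now()
    return f"{pattern}_{when.strftime('%Y%m%d%H%M%S')}_custommodel.onnx"


# Timestamp taken from the upload message above
name = model_filename("demo", datetime(2024, 7, 2, 15, 51, 32))
print(name)
```

Including a timestamp means that repeated deployments with the same pattern do not overwrite each other.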


# 7. Generate a map

Using our custom model, we generate a crop type map for our region of interest.

In [22]:
from worldcereal.job import WorldCerealProduct, generate_map, CropTypeParameters
from openeo_gfmap import TemporalContext

# Set temporal range to generate product
temporal_extent = TemporalContext(
    start_date="2021-11-01",
    end_date="2022-10-31",
)

# Initialize default parameters
parameters = CropTypeParameters()

# Change the URL to the classification model
parameters.classifier_parameters.classifier_url = model_url

# Launch the job
job_results = generate_map(
    spatial_extent,
    temporal_extent,
    output_path='./cropmap.tif',
    product_type=WorldCerealProduct.CROPTYPE,
    croptype_parameters=parameters,
    out_format="GTiff",
)

Authenticated using refresh token.
Selected orbit direction: DESCENDING from max accumulated area overlap between bounds and products.




0:00:00 Job 'j-240702e26f9443d4aa8ac210d87fc865': send 'start'
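
Before launching a job, it can help to sanity-check the temporal range. The sketch below parses the two dates used above and counts the days they span, inclusive of both ends. `season_length_days` is a hypothetical helper; whether `generate_map` enforces any particular range length is not stated in this notebook.

```python
from datetime import date


def season_length_days(start_date, end_date):
    """Return the number of days in [start_date, end_date], both inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    return (end - start).days + 1


n_days = season_length_days("2021-11-01", "2022-10-31")
print(f"Temporal extent covers {n_days} days")
```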
