![](./resources/System_v1_training_header.png)

This notebook demonstrates how to train a custom crop type model on your own reference data and how to apply the resulting model to generate a custom crop type map.

# Content

- [Before you start](#before-you-start)
- [1. Define a region of interest](#1.-Define-a-region-of-interest)
- [2. Check public in situ reference data](#2.-Check-public-in-situ-reference-data)
- [3. Prepare own reference data](#3.-Prepare-own-reference-data)
- [4. Extract required model inputs](#4.-Extract-required-model-inputs)
- [5. Train custom classification model](#5.-Train-custom-classification-model)
- [6. Deploy custom model](#6.-Deploy-custom-model)
- [7. Generate a map](#7.-Generate-a-map)

# Before you start

To run this notebook, you need an account on:

- The Copernicus Data Space Ecosystem (CDSE): register by completing the form [HERE](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/login-actions/registration?client_id=cdse-public&tab_id=eRKGqDvoYI0)
- VITO's Terrascope platform: register by completing the form [HERE](https://sso.terrascope.be/auth/realms/terrascope/login-actions/registration?client_id=drupal-terrascope&tab_id=irBzckp2aDo)

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from worldcereal.utils.map import get_ui_map
RDM_API = "https://ewoc-rdm-api.iiasa.ac.at"

# 1. Define a region of interest

Running the code snippet below displays an interactive map.
Click the Rectangle button on the left-hand side of the map and draw your region of interest.
When finished, execute the second cell to store the coordinates of your region of interest.

In [4]:
m, dc = get_ui_map()
m

Map(center=[51.1872, 5.1154], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoo…

In [5]:
# retrieve bounding box from drawn rectangle
from utils import get_bbox_from_draw

bbox, poly = get_bbox_from_draw(dc)

Your area of interest: (3.995798, 50.679638, 6.029137, 51.487085)
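
To make the printed tuple easier to interpret, here is a minimal sketch of how a bounding box can be derived from the vertices of a drawn rectangle. The helper `bbox_from_ring` is hypothetical and is not the notebook's `get_bbox_from_draw`; the vertex list is made up for illustration, and the `(min_lon, min_lat, max_lon, max_lat)` order is an assumption inferred from the output above.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_from_ring(ring: List[Tuple[float, float]]) -> BBox:
    """Return the bounding box of a list of (lon, lat) vertices."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return (min(lons), min(lats), max(lons), max(lats))


# Hypothetical closed ring with vertices in (lon, lat) order, as in GeoJSON.
ring = [
    (3.995798, 50.679638),
    (3.995798, 51.487085),
    (6.029137, 51.487085),
    (6.029137, 50.679638),
    (3.995798, 50.679638),
]

bbox = bbox_from_ring(ring)
print(f"Your area of interest: {bbox}")
```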


# 2. Check public in situ reference data

Here we run a series of requests to the RDM API to retrieve the collections and samples overlapping our bounding box...

‼ The following snippet does not query the RDM API. Instead, it reads a parquet file stored on a CloudFerro bucket that contains the Phase I extractions.

In [6]:
from utils import query_worldcereal_samples

public_df = query_worldcereal_samples(poly)

Querying WorldCereal global database ...
Processing selected samples ...
Extracted and processed 18551 samples from global database.
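
As a rough illustration of what "samples overlapping our bounding box" means, the sketch below keeps only the point samples that fall inside a bounding box. It is not the implementation of `query_worldcereal_samples`; the function `samples_in_bbox`, the record fields (`sample_id`, `lon`, `lat`, `crop`) and the three sample records are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def samples_in_bbox(samples: List[Dict], bbox: BBox) -> List[Dict]:
    """Keep the samples whose (lon, lat) lies inside or on the edge of bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        s
        for s in samples
        if min_lon <= s["lon"] <= max_lon and min_lat <= s["lat"] <= max_lat
    ]


# Hypothetical sample records; field names are illustrative only.
samples = [
    {"sample_id": "a", "lon": 5.1154, "lat": 51.1872, "crop": "maize"},
    {"sample_id": "b", "lon": 2.3500, "lat": 48.8600, "crop": "wheat"},
    {"sample_id": "c", "lon": 4.4000, "lat": 50.8500, "crop": "potatoes"},
]

bbox = (3.995798, 50.679638, 6.029137, 51.487085)
selected = samples_in_bbox(samples, bbox)
selected_ids = [s["sample_id"] for s in selected]
print(selected_ids)
```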


# 3. Prepare own reference data

Include some guidelines on how to upload a user dataset to the RDM (using the UI) and how to request those user samples through the API. The cell below uses only the public samples.

In [7]:
merged_df = public_df.copy()

### 3a. Select desired crops for prediction

Crops with ticked checkboxes are included in the prediction. All crops that are not selected are grouped under the "other" category.

In [8]:
from utils import pick_croptypes
from IPython.display import display

checkbox, checkbox_widgets = pick_croptypes(merged_df, samples_threshold=100)
display(checkbox)

VBox(children=(Checkbox(value=False, description='maize (11284 samples)'), Checkbox(value=False, description='…

In [11]:
from utils import get_custom_labels

merged_df = get_custom_labels(merged_df, checkbox_widgets)
merged_df['custom_class'].value_counts()

custom_class
maize       11284
other        5823
potatoes     1444
Name: count, dtype: int64
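
The grouping described above can be sketched as follows. This is a simplified stand-in and not the code behind `get_custom_labels`; the function `to_custom_labels`, the `fallback` parameter and the short crop list are assumptions for illustration. Selected crops keep their name, and everything else is mapped to a single fallback class.

```python
from collections import Counter
from typing import Iterable, List, Set


def to_custom_labels(
    crops: Iterable[str], selected: Set[str], fallback: str = "other"
) -> List[str]:
    """Keep selected crop names and map every other crop to the fallback class."""
    return [crop if crop in selected else fallback for crop in crops]


# Hypothetical crop labels for six samples.
crops = ["maize", "wheat", "potatoes", "maize", "barley", "maize"]

custom = to_custom_labels(crops, selected={"maize", "potatoes"})
counts = Counter(custom)
print(counts.most_common())
```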

# 4. Extract required model inputs

Here we launch point extractions for all samples intersecting our bounding box, resulting in a set of parquet files.

We collect all these inputs and compute Presto features for each sample.

In [18]:
from utils import get_inputs_outputs

encodings, targets = get_inputs_outputs(merged_df)

# 5. Train custom classification model

We train a CatBoost model and upload it to Artifactory.

In [25]:
from utils import train_classifier

custom_model, report = train_classifier(encodings, targets)

0:	learn: 1.0683194	test: 1.0696515	best: 1.0696515 (0)	total: 83.1ms	remaining: 11m 5s
25:	learn: 0.7202533	test: 0.7443568	best: 0.7443568 (25)	total: 1.95s	remaining: 9m 58s
50:	learn: 0.5995132	test: 0.6449592	best: 0.6449592 (50)	total: 3.55s	remaining: 9m 13s
75:	learn: 0.5282046	test: 0.5954327	best: 0.5954327 (75)	total: 4.76s	remaining: 8m 16s
100:	learn: 0.4821101	test: 0.5717230	best: 0.5717230 (100)	total: 5.97s	remaining: 7m 46s
125:	learn: 0.4500074	test: 0.5565383	best: 0.5565383 (125)	total: 7.07s	remaining: 7m 22s
150:	learn: 0.4218480	test: 0.5436805	best: 0.5436805 (150)	total: 8.14s	remaining: 7m 3s
175:	learn: 0.3956958	test: 0.5340225	best: 0.5340225 (175)	total: 9.19s	remaining: 6m 48s
200:	learn: 0.3749655	test: 0.5264903	best: 0.5264903 (200)	total: 10.3s	remaining: 6m 38s
225:	learn: 0.3560403	test: 0.5207208	best: 0.5207208 (225)	total: 11.3s	remaining: 6m 26s
250:	learn: 0.3388021	test: 0.5164177	best: 0.5164177 (250)	total: 12.2s	remaining: 6m 17s
275:	lear
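
To read a training log like the one above, it can help to track the gap between the `learn` and `test` losses. The sketch below parses two of the complete log lines shown above and skips the truncated one. The regular expression and the helper `parse_log_line` are assumptions written for this example and are not part of CatBoost or of `train_classifier`.

```python
import re
from typing import Optional

# Matches the leading fields of a log line:
# "<iteration>: learn: <loss> test: <loss> best: <loss> (<iteration>) ..."
LOG_LINE = re.compile(
    r"^(?P<iteration>\d+):\s+learn:\s+(?P<learn>[\d.]+)"
    r"\s+test:\s+(?P<test>[\d.]+)"
    r"\s+best:\s+(?P<best>[\d.]+)\s+\((?P<best_iteration>\d+)\)"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Parse one log line; return None if the line is incomplete."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return {
        "iteration": int(match["iteration"]),
        "learn": float(match["learn"]),
        "test": float(match["test"]),
        "best": float(match["best"]),
        "best_iteration": int(match["best_iteration"]),
    }


log_lines = [
    "0:\tlearn: 1.0683194\ttest: 1.0696515\tbest: 1.0696515 (0)\ttotal: 83.1ms\tremaining: 11m 5s",
    "250:\tlearn: 0.3388021\ttest: 0.5164177\tbest: 0.5164177 (250)\ttotal: 12.2s\tremaining: 6m 17s",
    "275:\tlear",  # truncated line, skipped by the parser
]

parsed = [p for p in map(parse_log_line, log_lines) if p is not None]
for p in parsed:
    gap = p["test"] - p["learn"]
    print(f'iteration {p["iteration"]:>3}: test - learn = {gap:.4f}')
```

In the lines shown, the test loss is still decreasing at iteration 250 while the gap to the training loss widens, which is the kind of trend worth watching in the remainder of the log.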

In [26]:
print(report)

              precision    recall  f1-score   support

       maize       0.92      0.89      0.90      3386
       other       0.80      0.80      0.80      1747
    potatoes       0.58      0.71      0.64       433

    accuracy                           0.85      5566
   macro avg       0.77      0.80      0.78      5566
weighted avg       0.85      0.85      0.85      5566
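
As a worked check of how the columns relate, the sketch below recomputes the f1-scores and macro averages from the precision and recall values in the report. The inputs are the rounded two-decimal values printed above, so in general a recomputed figure can differ from the report in the last digit. The note on the split size is an inference from the support column, not something the notebook states.

```python
# Precision, recall and support copied from the report above (rounded values).
report_rows = {
    "maize": {"precision": 0.92, "recall": 0.89, "support": 3386},
    "other": {"precision": 0.80, "recall": 0.80, "support": 1747},
    "potatoes": {"precision": 0.58, "recall": 0.71, "support": 433},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


f1 = {
    name: f1_score(row["precision"], row["recall"])
    for name, row in report_rows.items()
}

n_classes = len(report_rows)
macro_precision = sum(r["precision"] for r in report_rows.values()) / n_classes
macro_recall = sum(r["recall"] for r in report_rows.values()) / n_classes
macro_f1 = sum(f1.values()) / n_classes

# The supports add up to the size of the evaluation set. Compared with the
# 18551 samples extracted earlier, this suggests (but does not confirm)
# that roughly 30% of the samples were held out for testing.
total_support = sum(r["support"] for r in report_rows.values())
test_fraction = total_support / 18551

for name, value in f1.items():
    print(f"{name:>10}: f1 = {value:.2f}")
print(f" macro avg: precision = {macro_precision:.2f}, "
      f"recall = {macro_recall:.2f}, f1 = {macro_f1:.2f}")
print(f"test fraction = {test_fraction:.2f}")
```

The low precision for potatoes (0.58) pulls the macro average below the weighted average, since the macro average gives each class equal weight regardless of its support.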



# 6. Deploy custom model

Once the model is trained, we have to upload it to the cloud so it can be used for inference.


# 7. Generate a map

Using our custom model, we generate a map for our region of interest...