![](./resources/System_v1_training_header.png)

This notebook demonstrates how to train a custom crop type model on your own reference data and how to apply the resulting model to generate a custom crop type map.

# Content

- [Before you start](#before-you-start)
- [1. Define a region of interest](#1.-Define-a-region-of-interest)
- [2. Check public in-situ reference data](#2.-Check-public-in-situ-reference-data)
- [3. Prepare own reference data](#3.-Prepare-own-reference-data)
- [4. Extract required model inputs](#4.-Extract-required-model-inputs)
- [5. Train custom classification model](#5.-Train-custom-classification-model)
- [6. Deploy custom model](#6.-Deploy-custom-model)
- [7. Generate a map](#7.-Generate-a-map)

# Before you start

To run this notebook, you need an account on:

- The Copernicus Data Space Ecosystem (CDSE): register using the form [here](https://identity.dataspace.copernicus.eu/auth/realms/CDSE/login-actions/registration?client_id=cdse-public&tab_id=eRKGqDvoYI0)

- VITO's Terrascope platform: register using the form [here](https://sso.terrascope.be/auth/realms/terrascope/login-actions/registration?client_id=drupal-terrascope&tab_id=irBzckp2aDo)

In [1]:
%load_ext autoreload
%autoreload 2

In [3]:
from worldcereal.utils.map import get_ui_map
RDM_API = "https://ewoc-rdm-api.iiasa.ac.at"

# 1. Define a region of interest

Running the cell below displays an interactive map.
Click the rectangle button on the left-hand side of the map and draw your region of interest.
When you are finished, run the next cell to store the coordinates of your region of interest.

In [4]:
m, dc = get_ui_map()
m


In [6]:
# retrieve bounding box from drawn rectangle
from utils import get_bbox_from_draw

bbox, poly = get_bbox_from_draw(dc)

Your area of interest: (4.535852, 51.173725, 4.561776, 51.186639) (7 km2)
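`get_bbox_from_draw` comes from the notebook's local `utils` module, whose implementation is not shown here. As a minimal sketch, assuming the drawn rectangle is available as a GeoJSON Polygon feature (ipyleaflet's `DrawControl` exposes drawn shapes as GeoJSON), the bounding box is just the minimum and maximum of the outer ring's coordinates. The helper name `bbox_from_geojson_rectangle` is hypothetical.

```python
from typing import Any, Dict, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_from_geojson_rectangle(feature: Dict[str, Any]) -> BBox:
    """Return the bounding box of a drawn GeoJSON Polygon feature (hypothetical helper)."""
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError(f"Expected a Polygon, got {geometry['type']}")
    # GeoJSON stores [lon, lat] pairs; the first ring is the outer boundary.
    outer_ring = geometry["coordinates"][0]
    lons = [lon for lon, _ in outer_ring]
    lats = [lat for _, lat in outer_ring]
    return (min(lons), min(lats), max(lons), max(lats))


# Example: the rectangle reported above, written as a closed GeoJSON ring.
drawn = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [4.535852, 51.173725],
            [4.535852, 51.186639],
            [4.561776, 51.186639],
            [4.561776, 51.173725],
            [4.535852, 51.173725],
        ]],
    },
}
example_bbox = bbox_from_geojson_rectangle(drawn)
print(example_bbox)  # (4.535852, 51.173725, 4.561776, 51.186639)
```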


# 2. Check public in-situ reference data

Here we query the RDM (Reference Data Module) API to retrieve the reference data collections and samples that overlap our bounding box.

‼ Note: the snippet below does not query the RDM API. Instead, it reads a parquet file with the Phase I extractions stored in a CloudFerro bucket.

In [7]:
from utils import query_worldcereal_samples

public_df = query_worldcereal_samples(poly, buffer=50000, filter_cropland=False)

Applying a buffer of 50.0 km to the selected area ...
Querying WorldCereal global database ...
Processing selected samples ...
Extracted and processed 38542 samples from global database.
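The log shows that `query_worldcereal_samples` first applies a 50 km buffer around the selected area (`buffer=50000`, in meters) before selecting samples. Its implementation is not shown; as a rough sketch under a spherical-Earth approximation, buffering a lon/lat box and keeping the point samples inside it could look like this. The helper names and the `lon`/`lat` sample keys are hypothetical.

```python
import math
from typing import Any, Dict, Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)
METERS_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360  # about 111.2 km per degree of latitude


def buffer_bbox(bbox: BBox, buffer_m: float) -> BBox:
    """Grow a lon/lat box by roughly buffer_m meters on each side.

    Approximation only: does not handle the poles or the antimeridian.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    dlat = buffer_m / METERS_PER_DEGREE
    # A degree of longitude shrinks with cos(latitude); use the latitude farthest
    # from the equator so the buffer is at least buffer_m across the whole box.
    ref_lat = max(abs(min_lat - dlat), abs(max_lat + dlat))
    dlon = buffer_m / (METERS_PER_DEGREE * math.cos(math.radians(ref_lat)))
    return (min_lon - dlon, min_lat - dlat, max_lon + dlon, max_lat + dlat)


def samples_in_bbox(samples: Iterable[Dict[str, Any]], bbox: BBox) -> List[Dict[str, Any]]:
    """Keep point samples whose hypothetical 'lon'/'lat' keys fall inside bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        s for s in samples
        if min_lon <= s["lon"] <= max_lon and min_lat <= s["lat"] <= max_lat
    ]


aoi = (4.535852, 51.173725, 4.561776, 51.186639)
search_area = buffer_bbox(aoi, buffer_m=50_000)  # same 50 km as buffer=50000 above
candidates = [
    {"id": "a", "lon": 4.0, "lat": 51.0},  # roughly 40 km away: kept
    {"id": "b", "lon": 5.5, "lat": 51.2},  # roughly 65 km east: dropped
]
print([s["id"] for s in samples_in_bbox(candidates, search_area)])  # ['a']
```

A box-shaped buffer is a cheap pre-filter; an exact 50 km distance test would need a geodesic or projected-CRS computation on top of it.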


# 3. Prepare own reference data

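This section should include guidelines on how to upload your own dataset to the RDM (using the web UI) and how to request those user samples through the API. For now, no own data is added: the merged dataset below is simply a copy of the public samples.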

In [8]:
merged_df = public_df.copy()
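Once your own samples are available as a DataFrame, a minimal way to combine them with the public ones is sketched below. It assumes, hypothetically, that both tables share the same columns, including a unique `sample_id` column, and that your own labels should take precedence over public ones for the same sample.

```python
import pandas as pd


def merge_reference_data(
    public_df: pd.DataFrame, own_df: pd.DataFrame, id_col: str = "sample_id"
) -> pd.DataFrame:
    """Stack public and own samples; own samples win when the same id appears twice."""
    # Tag each row with its origin so both sources can still be told apart later.
    combined = pd.concat(
        [public_df.assign(source="public"), own_df.assign(source="own")],
        ignore_index=True,
    )
    # Own rows come last in `combined`, so keep="last" retains them on duplicate ids.
    merged = combined.drop_duplicates(subset=id_col, keep="last")
    return merged.reset_index(drop=True)


public_example = pd.DataFrame({"sample_id": ["a", "b"], "label": [0, 1]})
own_example = pd.DataFrame({"sample_id": ["b", "c"], "label": [0, 1]})
print(merge_reference_data(public_example, own_example))
```

Keeping a `source` column makes it easy to check later how much of the training set comes from your own data.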

# 4. Extract required model inputs

Here we launch point extractions for all samples intersecting our bounding box, which results in a set of parquet files.

We then collect these inputs and compute Presto features (embeddings) for each sample.

In [9]:
from utils import get_inputs_outputs

encodings, targets = get_inputs_outputs(merged_df, task_type="cropland")

Loading Presto model ...
Computing Presto embeddings ...
Done.
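`get_inputs_outputs` returns one Presto embedding per sample (`encodings`) together with the matching labels (`targets`). The sketch below illustrates only that data contract, not Presto itself: a fixed random projection stands in for the Presto encoder, and the input layout (a list of `(timeseries, label)` pairs with `timeseries` shaped `(n_timesteps, n_bands)`) is a hypothetical assumption.

```python
from typing import List, Tuple

import numpy as np


def stand_in_encoder(timeseries: np.ndarray, dim: int = 8, seed: int = 0) -> np.ndarray:
    """Placeholder for Presto: fixed random projection of a flattened (n_timesteps, n_bands) series."""
    rng = np.random.default_rng(seed)  # same seed, so every sample gets the same projection
    projection = rng.standard_normal((timeseries.size, dim))
    return timeseries.reshape(-1) @ projection


def build_inputs_outputs(
    samples: List[Tuple[np.ndarray, int]], dim: int = 8
) -> Tuple[np.ndarray, np.ndarray]:
    """Stack one embedding per sample into X (n_samples, dim) and labels into y (n_samples,)."""
    encodings = np.stack([stand_in_encoder(ts, dim) for ts, _ in samples])
    targets = np.array([label for _, label in samples], dtype=int)
    return encodings, targets


# Three toy samples: 12 time steps x 4 bands each, with binary labels.
toy = [(np.ones((12, 4)), 1), (np.zeros((12, 4)), 0), (np.full((12, 4), 2.0), 1)]
X, y = build_inputs_outputs(toy)
print(X.shape, y.tolist())  # (3, 8) [1, 0, 1]
```

Whatever the encoder, the classifier in the next step only sees a 2-D feature matrix whose rows line up with the label vector.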


# 5. Train custom classification model
We train a CatBoost classifier on the Presto embeddings; the resulting model is then uploaded to Artifactory.

In [10]:
from utils import train_classifier

custom_model, report = train_classifier(encodings, targets)

Split train/test ...
Computing class weights ...
Class weights: {0: 0.6461708017223703, 1: 2.2103278975977556}
Training CatBoost classifier ...
0:	learn: 0.8181683	test: 0.8073083	best: 0.8073083 (0)	total: 169ms	remaining: 22m 32s
25:	learn: 0.8607315	test: 0.8497237	best: 0.8497237 (25)	total: 2.01s	remaining: 10m 15s
50:	learn: 0.8709879	test: 0.8579291	best: 0.8581425 (48)	total: 3.88s	remaining: 10m 4s
75:	learn: 0.8792820	test: 0.8605337	best: 0.8609643 (73)	total: 5.76s	remaining: 10m
100:	learn: 0.8863502	test: 0.8644228	best: 0.8645838 (99)	total: 7.59s	remaining: 9m 53s
125:	learn: 0.8938005	test: 0.8671287	best: 0.8679282 (118)	total: 9.4s	remaining: 9m 47s
150:	learn: 0.9001281	test: 0.8672146	best: 0.8684957 (145)	total: 11.3s	remaining: 9m 47s
175:	learn: 0.9073189	test: 0.8663867	best: 0.8684957 (145)	total: 13.1s	remaining: 9m 42s
Stopped by overfitting detector  (50 iterations wait)

bestTest = 0.8684957122
bestIteration = 145

Shrink model to first 146 iterations.
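The logged class weights are consistent with the common "balanced" heuristic, where class c gets weight n_samples / (n_classes * n_c), so the minority class (1) is up-weighted roughly by the inverse of its frequency. Whether `train_classifier` computes them exactly this way is an assumption. Applying the formula to the test-set supports from the classification report below (8362 and 2444) gives values very close to the logged ones:

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight class c by n_samples / (n_classes * n_c), so rare classes count more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}


# Class supports of the test split, taken from the classification report.
labels = [0] * 8362 + [1] * 2444
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})  # {0: 0.646, 1: 2.211}
```

With these weights each class contributes the same total weight (n_samples / 2). In CatBoost, such weights can be passed through the `class_weights` parameter of `CatBoostClassifier`; the "Stopped by overfitting detector (50 iterations wait)" line reflects CatBoost's overfitting detector, typically enabled with `early_stopping_rounds` and an `eval_set`.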


In [11]:
print(report)

              precision    recall  f1-score   support

           0       0.95      0.91      0.93      8362
           1       0.73      0.84      0.78      2444

    accuracy                           0.89     10806
   macro avg       0.84      0.87      0.85     10806
weighted avg       0.90      0.89      0.90     10806
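To read the report: F1 is the harmonic mean of precision and recall, "macro avg" is the unweighted mean over classes, and "weighted avg" weights each class by its support. Recomputing a few entries from the rounded per-class values illustrates this; the last digit can occasionally differ because the report rounds values computed at full precision.

```python
from typing import Sequence, Tuple


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_and_weighted(values: Sequence[float], support: Sequence[int]) -> Tuple[float, float]:
    """Unweighted mean over classes, and mean weighted by class support."""
    macro = sum(values) / len(values)
    weighted = sum(v * n for v, n in zip(values, support)) / sum(support)
    return macro, weighted


precision, recall, support = [0.95, 0.73], [0.91, 0.84], [8362, 2444]

print([round(f1(p, r), 2) for p, r in zip(precision, recall)])  # [0.93, 0.78]
print(round(macro_and_weighted(precision, support)[1], 2))       # 0.9 (weighted avg precision)
# Support-weighted recall equals accuracy: recall_c * n_c is the number of
# correctly classified samples of class c.
print(round(macro_and_weighted(recall, support)[1], 2))          # 0.89
```

Because the classes are imbalanced (8362 vs 2444), the macro average is the more informative summary of how well the minority class is handled.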



# 6. Deploy custom model

Once trained, the model has to be uploaded to the cloud so that it can be used for inference.
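The upload step itself is not shown in this notebook. Section 5 names Artifactory as the target, and JFrog Artifactory accepts deploying a file with an HTTP PUT to `<repository URL>/<path>`. The sketch below only prepares such a request with the standard library; the model would first have to be serialized to bytes, e.g. with CatBoost's `save_model` method. The repository URL, file name, and bearer-token authentication are hypothetical placeholders.

```python
import urllib.request


def build_artifact_upload(
    payload: bytes, filename: str, repo_url: str, token: str
) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP PUT that deploys `payload` as repo_url/filename."""
    target_url = f"{repo_url.rstrip('/')}/{filename}"
    return urllib.request.Request(
        target_url,
        data=payload,
        method="PUT",
        headers={
            # Hypothetical auth scheme: replace with what your repository expects.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )


# Hypothetical repository URL and token; the payload would be the serialized model file.
request = build_artifact_upload(
    payload=b"<model bytes>",
    filename="custom_cropland_model.onnx",
    repo_url="https://artifactory.example.com/artifactory/models/",
    token="YOUR_TOKEN",
)
print(request.get_method(), request.full_url)
# To actually upload: urllib.request.urlopen(request)
```

Separating request construction from sending keeps the step easy to inspect before anything is pushed to a shared repository.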


# 7. Generate a map

Using our custom model, we generate a map for our region of interest...
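Conceptually, generating the map means computing an embedding for every pixel of the region of interest and applying the trained classifier to each of them. The actual inference pipeline is not shown in this notebook; the sketch below only illustrates that final step on an in-memory array, assuming a hypothetical `(height, width, n_features)` cube of per-pixel embeddings and a `predict` callable such as a fitted model's `predict` method. A simple threshold function stands in for the trained model.

```python
from typing import Callable

import numpy as np


def classify_raster(
    embeddings: np.ndarray, predict: Callable[[np.ndarray], np.ndarray]
) -> np.ndarray:
    """Apply a per-sample classifier to every pixel of a (height, width, n_features) cube."""
    height, width, n_features = embeddings.shape
    flat = embeddings.reshape(height * width, n_features)  # one row per pixel
    labels = np.asarray(predict(flat))
    # reshape works whether predict returns shape (N,) or (N, 1)
    return labels.reshape(height, width)


def threshold_model(X: np.ndarray) -> np.ndarray:
    """Stand-in classifier: label 1 when the mean feature value exceeds 0.5."""
    return (X.mean(axis=1) > 0.5).astype(np.uint8)


# Toy cube: 2 x 3 pixels with 4 features; the first column of pixels is all ones.
cube = np.zeros((2, 3, 4))
cube[:, 0, :] = 1.0
print(classify_raster(cube, threshold_model))
# [[1 0 0]
#  [1 0 0]]
```

Flattening to one row per pixel lets the same classifier that was trained on point samples be reused unchanged on a full raster.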