![](./resources/System_v1_custom_croptype.png)

### Introduction

This notebook guides you through the process of training a custom crop type classification model using publicly available and harmonized in-situ reference data for your area and crop types of interest. Afterwards, the model can be applied to your season of interest to generate a crop type map.

### Content
  
- [Before you start](#Before-you-start)
- [1. Define your region of interest](#1.-Define-your-region-of-interest)
- [2. Extract public reference data](#2.-Extract-public-reference-data)
- [3. Select your desired crop types](#3.-Select-your-desired-crop-types)
- [4. Prepare training features](#4.-Prepare-training-features)
- [5. Train custom classification model](#5.-Train-custom-classification-model)
- [6. Deploy your custom model](#6.-Deploy-your-custom-model)
- [7. Generate a map](#7.-Generate-a-map)


### Before you start

In order to run WorldCereal crop mapping jobs from this notebook, you need to create an account on the Copernicus Data Space Ecosystem (CDSE) by registering [here](https://dataspace.copernicus.eu/). Registration is free of charge and grants you a number of free openEO processing credits that you can use for this demo.

### 1. Define your region of interest

When you run the cell below, an interactive map is displayed.
Click the Rectangle button on the left-hand side of the map to start drawing your region of interest.
The widget automatically stores the coordinates of the last rectangle you drew on the map.

<div class="alert alert-block alert-warning">
<b>Processing area limitation:</b><br> 
Processing areas larger than 750 km² are currently not supported, to avoid excessive credit usage and long processing times.<br>
If you exceed this limit, an error is shown and you need to draw a new rectangle.

For testing purposes, we recommend selecting a small area (< 250 km²) to limit processing time and credit usage.

A run of 250 km² typically consumes 35 credits and takes 15 to 20 minutes.<br>
A run of 750 km² typically consumes 120 credits and takes up to 30 minutes.
</div>
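
To check a rectangle against these limits before drawing it, you can estimate its area from its corner coordinates. The sketch below is a minimal illustration, not the check performed by `ui_map`: it assumes a spherical Earth (mean radius of about 6371 km), and the helper names `bbox_area_km2` and `check_processing_area` are hypothetical.

```python
import math

# Mean Earth radius in km (spherical approximation, an assumption of this sketch)
EARTH_RADIUS_KM = 6371.0088


def bbox_area_km2(west: float, south: float, east: float, north: float) -> float:
    """Approximate area in km² of a lat/lon rectangle given in degrees."""
    lon_span = math.radians(east - west)
    # Area of a spherical lat/lon box: R² * Δλ * (sin φ_north - sin φ_south)
    lat_term = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return EARTH_RADIUS_KM ** 2 * lon_span * lat_term


def check_processing_area(west, south, east, north, max_km2=750.0) -> float:
    """Return the area, or raise ValueError if it exceeds max_km2 (hypothetical helper)."""
    area = bbox_area_km2(west, south, east, north)
    if area > max_km2:
        raise ValueError(f"Area of {area:.0f} km² exceeds the {max_km2:.0f} km² limit")
    return area


# A 0.1° x 0.1° box around 50°N, well below the recommended 250 km² test size
area = check_processing_area(4.0, 50.0, 4.1, 50.1)
print(f"{area:.1f} km²")
```

Because a degree of longitude shrinks towards the poles, the same rectangle in degrees covers a smaller area at higher latitudes.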

In [None]:
from worldcereal.utils.map import ui_map

map = ui_map()
map.show_map()

### 2. Extract public reference data

Here we query existing reference data that have already been processed by WorldCereal and are ready to use.
To increase the number of hits, we expand the search area by 250 km in all directions.

We print the number of training samples retrieved per year.

In [None]:
from worldcereal.utils.refdata import query_public_extractions

# retrieve the polygon you just drew
polygon = map.get_polygon_latlon()

# Query our public database of training data
public_df = query_public_extractions(polygon)
public_df.year.value_counts()
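
The 250 km expansion of the search area can be sketched as follows. This is an illustration only: `expand_bbox_km` is a hypothetical helper, not part of `query_public_extractions`; it uses the approximation of 111.32 km per degree of latitude and does not handle boxes that cross the antimeridian.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def expand_bbox_km(bbox, buffer_km=250.0):
    """Expand a (west, south, east, north) bbox in degrees by buffer_km on every side."""
    west, south, east, north = bbox
    dlat = buffer_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude); use the expanded latitude
    # closest to a pole so the buffer is at least buffer_km along the whole box.
    max_abs_lat = min(max(abs(south), abs(north)) + dlat, 89.0)
    dlon = buffer_km / (KM_PER_DEG_LAT * math.cos(math.radians(max_abs_lat)))
    return (
        max(west - dlon, -180.0),
        max(south - dlat, -90.0),
        min(east + dlon, 180.0),
        min(north + dlat, 90.0),
    )


print(expand_bbox_km((4.0, 50.0, 4.5, 50.5)))
```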

### 3. Select your desired crop types

Run the next cell and select all crop types you wish to include in your model. All the crops that are not selected will be grouped under the "other" category.

In [None]:
from utils import pick_croptypes
from IPython.display import display

checkbox, checkbox_widgets = pick_croptypes(public_df, samples_threshold=100)
display(checkbox)

Based on your selection, a custom target label is now generated for each sample. Verify that only the crops you selected appear in the `downstream_class` column; all others fall under `other`.

In [None]:
from utils import get_custom_croptype_labels

public_df = get_custom_croptype_labels(public_df, checkbox_widgets)
public_df["downstream_class"].value_counts()
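
Conceptually, the relabeling step keeps the selected crop types and maps everything else to `other`. A minimal pandas sketch of that idea, assuming a hypothetical label column named `croptype_name` (the actual column and logic inside `get_custom_croptype_labels` may differ):

```python
import pandas as pd


def group_unselected_as_other(df: pd.DataFrame, selected: set, label_col: str = "croptype_name") -> pd.DataFrame:
    """Hypothetical helper: add a downstream_class column with unselected crops set to 'other'."""
    out = df.copy()
    # Keep the original label where it was selected, otherwise replace it with "other"
    out["downstream_class"] = out[label_col].where(out[label_col].isin(selected), "other")
    return out


samples = pd.DataFrame({"croptype_name": ["maize", "wheat", "potato", "maize", "sugarbeet"]})
labeled = group_unselected_as_other(samples, {"maize", "wheat"})
print(labeled["downstream_class"].value_counts())
```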

### 4. Prepare training features

Using a pretrained deep learning model (Presto), we derive classification features for each sample. The resulting `encodings` and `targets` are used for model training.

In [None]:
from utils import prepare_training_dataframe

training_dataframe = prepare_training_dataframe(public_df, task_type="croptype")
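
A classifier ultimately needs the encodings as a numeric feature matrix and the targets as a label vector. The sketch below assumes, hypothetically, that each encoding dimension is stored in a column named `presto_ft_<i>` and that the target sits in `downstream_class`; check `training_dataframe.columns` for the real names before adapting it.

```python
import numpy as np
import pandas as pd


def split_features_targets(df: pd.DataFrame, feature_prefix: str = "presto_ft_", target_col: str = "downstream_class"):
    """Hypothetical helper: return (X, y, feature_cols) from a training dataframe."""
    # Collect encoding columns and order them by numeric suffix (0, 1, 2, ..., 10, ...)
    feature_cols = sorted(
        (c for c in df.columns if c.startswith(feature_prefix)),
        key=lambda c: int(c[len(feature_prefix):]),
    )
    X = df[feature_cols].to_numpy(dtype=np.float32)
    y = df[target_col].to_numpy()
    return X, y, feature_cols


# Synthetic stand-in for training_dataframe: 4 samples with 3-dimensional encodings
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(4, 3)), columns=["presto_ft_0", "presto_ft_1", "presto_ft_2"])
demo["downstream_class"] = ["maize", "other", "wheat", "maize"]
X, y, cols = split_features_targets(demo)
print(X.shape, y.shape)
```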

### 5. Train custom classification model
We train a CatBoost model for the selected crop types. Class weights are determined automatically to balance the individual classes.

In [None]:
from utils import train_classifier

custom_model, report, confusion_matrix = train_classifier(training_dataframe)
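
One common way to derive balancing weights is the "balanced" convention used by scikit-learn, where each class receives `n_samples / (n_classes * n_samples_in_class)`. The sketch below (hypothetical helper `balanced_class_weights`) only illustrates the idea; `train_classifier` may compute its weights differently, and CatBoost's own `auto_class_weights` option uses its own normalization.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: map each class to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes weights below 1
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


weights = balanced_class_weights(["maize"] * 6 + ["other"] * 2)
print(weights)
```

With these weights, every class contributes the same total weight to the training loss, regardless of how many samples it has.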

Before training, the available data were automatically split into a calibration set and a validation set. Run the next cell to see how well the model performs on the independent validation set.

In [None]:
# Print the classification report
print(report)
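
A stratified random split keeps class proportions similar in the calibration and validation sets. The sketch below is a hypothetical illustration with an assumed validation fraction of 30%; the strategy actually used by `train_classifier` is not shown in this notebook. For geospatial samples, a purely random split can give optimistic scores, because nearby samples tend to resemble each other.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.3, seed=42):
    """Hypothetical helper: return (calibration_idx, validation_idx) preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    cal_idx, val_idx = [], []
    for idxs in by_class.values():
        # Shuffle within each class, then send a fixed fraction to validation
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        cal_idx.extend(idxs[n_val:])
    return sorted(cal_idx), sorted(val_idx)


labels = ["maize"] * 10 + ["other"] * 20
cal, val = stratified_split(labels)
print(len(cal), len(val))
```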

### 6. Deploy your custom model

Once trained, the model has to be uploaded to the cloud so that openEO can use it for inference. Note that uploaded models are only kept in cloud storage for a limited amount of time.


In [None]:
from worldcereal.utils.upload import deploy_model
from openeo_gfmap.backend import cdse_connection
from utils import get_input

modelname = get_input("model")
model_url = deploy_model(cdse_connection(), custom_model, pattern=modelname)

### 7. Generate a map

Using our custom model, we generate a map for our region and season of interest.
To determine your season of interest, you can consult the WorldCereal crop calendars (by executing the next cell), or check out the [USDA crop calendars](https://ipad.fas.usda.gov/ogamaps/cropcalendar.aspx).

In [None]:
from utils import retrieve_worldcereal_seasons

spatial_extent = map.get_processing_extent()
seasons = retrieve_worldcereal_seasons(spatial_extent)

Now use the slider to select your processing period. Note that the length of the period is always fixed at one year.
Just make sure your season of interest is fully captured within the period you select.

In [None]:
from utils import date_slider

slider = date_slider()
slider.show_slider()
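
Because the period always spans one year, it is worth checking that your season falls entirely inside it, especially for seasons that cross the calendar year, such as winter cereals. A minimal sketch with hypothetical helpers `one_year_period` and `season_fits` (the slider's own date handling may differ, for example in how it rounds to months):

```python
from datetime import date, timedelta


def one_year_period(start: date) -> tuple[date, date]:
    """Hypothetical helper: return (start, end) spanning exactly one year."""
    try:
        next_year = start.replace(year=start.year + 1)
    except ValueError:  # start is 29 February and the next year is not a leap year
        next_year = date(start.year + 1, 3, 1)
    return start, next_year - timedelta(days=1)


def season_fits(period: tuple[date, date], season_start: date, season_end: date) -> bool:
    """True if the whole season lies within the processing period."""
    period_start, period_end = period
    return period_start <= season_start and season_end <= period_end


# A winter cereal season from November 2020 to July 2021
season = (date(2020, 11, 1), date(2021, 7, 31))
print(season_fits(one_year_period(date(2021, 1, 1)), *season))
print(season_fits(one_year_period(date(2020, 9, 1)), *season))
```

For this season, a period starting on 1 January 2021 misses the sowing months, while one starting on 1 September 2020 captures the full season.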

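Set the output directory for this run. You will be asked for a run name, which is combined with the model name to build the directory name: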

In [None]:
from pathlib import Path

from utils import get_input

# Specify the local directory where the resulting maps should be downloaded to.
# The run name is combined with the model name chosen when deploying the model.
run = get_input("model run")
output_dir = Path.cwd() / f"CROPTYPE_{modelname}_{run}"
print(f"Output directory: {output_dir}")

We now have all the information needed to generate our map!<br>
The next cell submits a map inference job on CDSE through openEO.<br>
The first time you run this, you will be asked to authenticate with your CDSE account by clicking the link provided below the cell.<br>
Then sit back and wait until your map is ready...

In [None]:
from worldcereal.job import PostprocessParameters, WorldCerealProductType, generate_map, CropTypeParameters

# Initialize the default crop type parameters
parameters = CropTypeParameters()

# Point the classifier to your custom model uploaded in step 6
parameters.classifier_parameters.classifier_url = model_url
# Also save the cropland mask as a separate output
parameters.save_mask = True

# Get processing period and area
processing_period = slider.get_processing_period()
processing_extent = map.get_processing_extent()

# Launch the job
job_results = generate_map(
    processing_extent,
    processing_period,
    output_dir=output_dir,
    product_type=WorldCerealProductType.CROPTYPE,
    croptype_parameters=parameters,
    postprocess_parameters=PostprocessParameters(),
)

The classification results are automatically downloaded to your *output_dir* in .tif format.<br>
You will get two outputs: one containing the cropland mask and one containing the crop type results.<br>

The result is a raster file containing two bands:
1. The label of the winning class
2. The probability of the winning class [0 - 100]

The function below splits this information into separate .tif files and adds metadata and a color map to ease interpretation and visualization:
- "croptype_classification_start-date_end-date.tif" --> contains the classification labels. A class look-up table is included in the .tif metadata.
- "croptype_confidence_start-date_end-date.tif" --> contains the probability associated with the prediction [0 - 100]

In [None]:
from utils import prepare_visualization

rasters = prepare_visualization(job_results)
print(rasters)
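
If you prefer to work with the raw two-band output directly, the band layout described above can be handled with NumPy. The sketch below assumes the raster has already been read into an array of shape `(bands, rows, cols)`, which is the layout returned by rasterio's `DatasetReader.read()`; the helpers `split_bands` and `pixels_per_class` and the 50% confidence threshold are hypothetical.

```python
import numpy as np


def split_bands(arr: np.ndarray, min_confidence: int = 50):
    """Split a (2, rows, cols) array into labels, probabilities and a low-confidence mask."""
    if arr.ndim != 3 or arr.shape[0] != 2:
        raise ValueError("expected an array of shape (2, rows, cols)")
    labels = arr[0]       # band 1: label of the winning class
    probability = arr[1]  # band 2: probability of the winning class [0 - 100]
    # True where the winning class has a probability below the threshold
    low_confidence = probability < min_confidence
    return labels, probability, low_confidence


def pixels_per_class(labels: np.ndarray) -> dict:
    """Count how many pixels were assigned to each class label."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


# Tiny synthetic 2 x 2 raster standing in for a downloaded result
demo = np.array([[[1, 2], [2, 3]], [[90, 40], [75, 10]]])
labels, probability, low_confidence = split_bands(demo)
print(pixels_per_class(labels))
print(low_confidence)
```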

The resulting raster files can be visualized in QGIS.