![](./resources/System_v1_custom_cropland.png)

### Introduction

This notebook guides you through the process of training a custom cropland classification model using publicly available and harmonized in-situ reference data for your area of interest. Afterwards, the model can be applied to your area and season of interest to generate a cropland extent map.

Please note that for the purpose of this demo, the processing area is currently limited to 250 km² per model run. On average, one such run consumes 35 credits on the Copernicus Data Space Ecosystem.

### Content
  
- [Before you start](#Before-you-start)
- [1. Define your region of interest](#1.-Define-your-region-of-interest)
- [2. Extract public reference data](#2.-Extract-public-reference-data)
- [3. Create your custom cropland class](#3.-Create-your-custom-cropland-class)
- [4. Prepare training features](#4.-Prepare-training-features)
- [5. Train custom classification model](#5.-Train-custom-classification-model)
- [6. Deploy your custom model](#6.-Deploy-your-custom-model)
- [7. Generate a map](#7.-Generate-a-map)


### Before you start

To run WorldCereal crop mapping jobs from this notebook, you need an account on the Copernicus Data Space Ecosystem (CDSE), which you can create [here](https://dataspace.copernicus.eu/). Registration is free of charge and grants you a number of free openEO processing credits that you can use to run this demo.

### 1. Define your region of interest

Running the code cell below displays an interactive map.
Click the Rectangle button on the left-hand side of the map to start drawing your region of interest.

Within this demo, your area is currently limited to 250 km² by default. Drawing a larger rectangle results in an error.
You can raise this limit by altering the code below to:<br>
`map = ui_map(area_limit=750)`<br>

Processing areas beyond 750 km² are currently not supported to avoid excessive credit usage (roughly 120 credits will be consumed for this size of a processing extent).

The widget will automatically store the coordinates of the last rectangle you drew on the map.

In [None]:
from worldcereal.utils.map import ui_map

map = ui_map()
map.show_map()
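
To get a feeling for how large 250 km² is before you start drawing, the sketch below approximates the area of a longitude/latitude rectangle on a spherical Earth and compares it to a limit. The helpers `bbox_area_km2` and `check_area` are hypothetical and written for illustration only; how `ui_map` measures the area internally is not shown in this notebook and may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def bbox_area_km2(west, south, east, north):
    """Approximate area (km²) of a lon/lat rectangle on a sphere."""
    lat1, lat2 = math.radians(south), math.radians(north)
    dlon = math.radians(east - west)
    return EARTH_RADIUS_KM**2 * dlon * (math.sin(lat2) - math.sin(lat1))


def check_area(west, south, east, north, area_limit=250):
    """Return the area, or raise if it exceeds area_limit (km²).

    Hypothetical helper: it mimics the limit described above,
    not the actual check performed by the widget.
    """
    area = bbox_area_km2(west, south, east, north)
    if area > area_limit:
        raise ValueError(
            f"Area of {area:.0f} km² exceeds the limit of {area_limit} km²"
        )
    return area


# A 0.1° x 0.1° rectangle at the equator stays well below the default limit
print(round(check_area(0.0, 0.0, 0.1, 0.1), 1))
```

A rectangle of 0.2° x 0.2° at the equator is already roughly four times larger and would only pass with `area_limit=750`.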

### 2. Extract public reference data

Here we query existing reference data that have already been processed by WorldCereal and are ready to use.
To increase the number of hits, we expand the search area by 250 km in all directions.

We print the number of training samples retrieved per year.

In [None]:
from worldcereal.utils.refdata import query_public_extractions

# retrieve the polygon you just drew
polygon = map.get_polygon_latlon()

# Query our public database of training data
public_df = query_public_extractions(polygon, filter_cropland=False)
public_df.year.value_counts()
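
The 250 km expansion of the search area mentioned above can be pictured with a simple bounding-box buffer. This is a rough sketch under stated assumptions: `expand_bbox` is a hypothetical helper, it uses a flat conversion of about 111.32 km per degree of latitude, and it is not necessarily how `query_public_extractions` applies the buffer internally.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def expand_bbox(west, south, east, north, buffer_km=250):
    """Grow a lon/lat bounding box by roughly buffer_km on every side."""
    dlat = buffer_km / KM_PER_DEG_LAT
    # One degree of longitude gets shorter with the cosine of the latitude
    mid_lat = math.radians((south + north) / 2)
    dlon = buffer_km / (KM_PER_DEG_LAT * math.cos(mid_lat))
    # Clamp to valid coordinate ranges
    return (
        max(west - dlon, -180.0),
        max(south - dlat, -90.0),
        min(east + dlon, 180.0),
        min(north + dlat, 90.0),
    )


# Example rectangle (coordinates chosen for illustration only)
print(expand_bbox(4.0, 50.0, 4.2, 50.2))
```

The approximation breaks down close to the poles and across the antimeridian, which is acceptable for a back-of-the-envelope check.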

### 3. Create your custom cropland class

Run the next cell and select all land cover classes you would like to include in your "cropland" class. All classes that are not selected will be grouped under the "other" category. 

In [None]:
from utils import select_landcover
from IPython.display import display

checkbox, checkbox_widgets = select_landcover(public_df)
display(checkbox)

Based on your selection, a custom target label is now generated for each sample. Verify that only the land cover classes of your choice appear in `downstream_class`; all other classes fall under `other`.

In [None]:
from utils import get_custom_cropland_labels

public_df = get_custom_cropland_labels(public_df, checkbox_widgets)
public_df["downstream_class"].value_counts()
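
Conceptually, the relabeling step boils down to a membership test per sample. The sketch below assumes that every selected class is merged into a single `cropland` label; the function `to_custom_labels` and the example class names are hypothetical, and the actual `get_custom_cropland_labels` helper may name or structure its labels differently.

```python
def to_custom_labels(landcover_labels, selected, positive="cropland", negative="other"):
    """Map each land cover label to the custom class or to the rest category."""
    selected = set(selected)
    return [positive if label in selected else negative for label in landcover_labels]


# Illustrative class names, not taken from the WorldCereal legend
samples = ["temporary_crops", "grassland", "permanent_crops", "built_up"]
labels = to_custom_labels(samples, selected={"temporary_crops", "permanent_crops"})
print(labels)  # ['cropland', 'other', 'cropland', 'other']
```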

### 4. Prepare training features

Using a deep learning framework (Presto), we derive classification features for each sample. The resulting `encodings` and `targets` will be used for model training.

In [None]:
from utils import prepare_training_dataframe

training_dataframe = prepare_training_dataframe(public_df, task_type="cropland")

### 5. Train custom classification model

We train a CatBoost model for the selected land cover classes. Class weights are determined automatically to balance the individual classes.

In [None]:
from utils import train_cropland_classifier

custom_model, report, confusion_matrix = train_cropland_classifier(training_dataframe)
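
To illustrate what balancing classes through weights means, the sketch below computes inverse-frequency weights, a common scheme in which each class receives the weight `n_samples / (n_classes * class_count)`. This is an assumption for illustration: `balanced_class_weights` is a hypothetical helper, and the exact weighting used inside `train_cropland_classifier` is not shown in this notebook.

```python
from collections import Counter


def balanced_class_weights(targets):
    """Inverse-frequency weights: rare classes get proportionally larger weights."""
    counts = Counter(targets)
    n_samples, n_classes = len(targets), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up example: 20 cropland samples versus 80 other samples
targets = ["cropland"] * 20 + ["other"] * 80
weights = balanced_class_weights(targets)
print(weights)  # {'cropland': 2.5, 'other': 0.625}
```

With these weights, both classes contribute equally to the training loss in total (20 × 2.5 = 80 × 0.625 = 50).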

Before training, the available data were automatically split into a calibration set and a validation set. Executing the next cell shows how well the model performs on the independent validation set.

In [None]:
# Print the classification report
print(report)
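
If the numbers in such a report are unfamiliar, the sketch below shows how precision, recall, F1 score and accuracy follow from the four cells of a binary confusion matrix. The counts are made up for illustration and `binary_metrics` is a hypothetical helper; the layout of the report printed above may differ.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive standard validation metrics for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up validation counts with "cropland" as the positive class
metrics = binary_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Low precision means the model labels too many non-cropland samples as cropland, while low recall means it misses actual cropland.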

### 6. Deploy your custom model

Once the model is trained, we upload it to the cloud so that openEO can use it for inference. Note that these models are kept in cloud storage only for a limited amount of time.

In [None]:
from worldcereal.utils.upload import deploy_model
from openeo_gfmap.backend import cdse_connection
from utils import get_input

modelname = get_input("model")
model_url = deploy_model(cdse_connection(), custom_model, pattern=modelname)

### 7. Generate a map

Using our custom model, we generate a map for our region and season of interest.
To determine your season of interest, you can consult the WorldCereal crop calendars (by executing the next cell), or check out the [USDA crop calendars](https://ipad.fas.usda.gov/ogamaps/cropcalendar.aspx).

In [None]:
from utils import retrieve_worldcereal_seasons

spatial_extent = map.get_processing_extent()
seasons = retrieve_worldcereal_seasons(spatial_extent)

Now use the slider to select your processing period. Note that the length of the period is always fixed to a year.
Just make sure your season of interest is fully captured within the period you select.

In [None]:
from utils import date_slider

slider = date_slider()
slider.show_slider()
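
Whether your season of interest is fully captured by the selected period can be checked with a simple date comparison. The sketch below is illustrative only: `season_within_period` is a hypothetical helper and the dates are made up, not taken from the WorldCereal crop calendars.

```python
from datetime import date


def season_within_period(season_start, season_end, period_start, period_end):
    """True if the whole season lies inside the processing period."""
    return period_start <= season_start and season_end <= period_end


# A one-year processing period (example values)
period = (date(2021, 1, 1), date(2021, 12, 31))

# Season from mid-March to end of September: fully covered
print(season_within_period(date(2021, 3, 15), date(2021, 9, 30), *period))  # True

# Season starting in the previous autumn: the period needs to be shifted
print(season_within_period(date(2020, 10, 1), date(2021, 7, 15), *period))  # False
```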

Set some other customization options:

In [None]:
from worldcereal.job import PostprocessParameters
import os
from pathlib import Path

# Choose whether or not you want to spatially clean the classification results
postprocess_result = True
# Choose the postprocessing method you want to use ["smooth_probabilities", "majority_vote"]
# ("smooth_probabilities" will do limited spatial cleaning,
# while "majority_vote" will do more aggressive spatial cleaning, depending on the value of kernel_size)
postprocess_method = "majority_vote"
# Additional parameter for the majority vote method 
# (the higher the value, the more aggressive the spatial cleaning,
# should be an odd number, not larger than 25, default = 5)
kernel_size = 5
# Do you want to save the intermediate results (before applying the postprocessing)?
save_intermediate = True
# Do you want to save all class probabilities in the final product?
keep_class_probs = True

postprocess_parameters = PostprocessParameters(enable=postprocess_result,
                                               method=postprocess_method,
                                               kernel_size=kernel_size,
                                               save_intermediate=save_intermediate,
                                               keep_class_probs=keep_class_probs)

# Specify the local directory where the resulting maps should be downloaded to.
run = get_input("model run")
output_dir = Path(os.getcwd()) / f'CROPLAND_{modelname}_{run}'
print(f"Output directory: {output_dir}")
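
To make the effect of `kernel_size` more tangible, the sketch below applies a plain majority filter to a tiny label grid: each pixel takes the most frequent label within a square window around it. This is a simplified illustration in pure Python. `majority_filter` is a hypothetical function, and the actual WorldCereal postprocessing may differ, for example in how it treats edges, ties or class probabilities.

```python
from collections import Counter


def majority_filter(grid, kernel_size=5):
    """Replace each pixel by the most frequent label in its neighbourhood."""
    if kernel_size % 2 == 0 or not 1 <= kernel_size <= 25:
        raise ValueError("kernel_size should be an odd number between 1 and 25")
    half = kernel_size // 2
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            # Collect the labels in the window, clipped at the grid edges
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            new_row.append(Counter(window).most_common(1)[0][0])
        result.append(new_row)
    return result


# 1 = cropland, 0 = other (illustrative encoding); one isolated "other" pixel
labels = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
cleaned = majority_filter(labels, kernel_size=3)
print(cleaned)  # the isolated pixel is absorbed by its neighbours
```

A larger window lets more neighbours outvote a pixel, which is why higher `kernel_size` values clean more aggressively and can also erase small but real features.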

We now have all the information we need to generate our map!<br>
The next cell will submit a map inference job on CDSE through openEO.<br>
The first time you run this, you will be asked to authenticate with your CDSE account by clicking the link provided below the cell.<br>
Then sit back and wait until your map is ready...

In [None]:
from worldcereal.job import generate_map, CropLandParameters

# Initializes default parameters
parameters = CropLandParameters()

# Change the URL to your custom classification model
parameters.classifier_parameters.classifier_url = model_url

# Get processing period and area
processing_period = slider.get_processing_period()
processing_extent = map.get_processing_extent()

# Launch the job
job_results = generate_map(
    processing_extent,
    processing_period,
    output_dir=output_dir,
    cropland_parameters=parameters,
    postprocess_parameters=postprocess_parameters,
)

The classification results are automatically downloaded to your *output_dir* in .tif format.
By default, openEO stores the class labels, confidence score and class probabilities in one file.

The function below splits this information into separate .tif files and adds metadata and a color map to ease interpretation and visualization:
- "xxx_classification_start-date_end-date.tif" --> contains the classification labels. A class look-up table is included in the .tif metadata.
- "xxx_confidence_start-date_end-date.tif" --> contains the probability associated with the prediction [0 - 100]

If you chose to store the original per-class probabilities, these are NOT written to a separate file and must be consulted in the original result downloaded from openEO.

In [None]:
from utils import prepare_visualization

rasters = prepare_visualization(job_results)
print(rasters)

The resulting raster files can be visualized in QGIS.

If you are running this notebook in a local environment, you can alternatively use the following cells to visualize the outputs directly in this notebook.

In [None]:
from utils import visualize_products

visualize_products(rasters, port=8887)

In [None]:
from utils import show_color_legend

show_color_legend(rasters, "cropland")