![](./resources/Custom_croptype_map.png)

### Content

- [Introduction](###-Introduction)
- [How to run this notebook?](###-How-to-run-this-notebook?)
- [Before you start](###-Before-you-start)
- [1. Gather and prepare your training data](###-1.-Gather-and-prepare-your-training-data)
- [2. Train a seasonal torch head](###-2.-Train-a-seasonal-torch-head)
- [3. Deploy your custom torch head](###-3.-Deploy-your-custom-torch-head)
- [4. Generate your custom crop type map](###-4.-Generate-your-custom-crop-type-map)

### Introduction

This notebook guides you through the process of training a custom crop type classification model for your area, season and crop types of interest.

For training the model, you can use a combination of:
- publicly available reference data harmonized by the WorldCereal consortium;
- your own private reference data.

<div class="alert alert-block alert-warning">
In case you would like to use private reference data to train your model, make sure to first complete the workflow as outlined in our separate notebook <b>worldcereal_private_extractions.ipynb</b>.
</div>

After model training, we deploy your custom model to the cloud, where it can be accessed by openEO. This allows you to apply the model to your area and season of interest and generate your custom crop type map.

### How to run this notebook?

#### Option 1: Run on Terrascope

You can use a preconfigured environment on [**Terrascope**](https://terrascope.be/en) to run the workflows in a Jupyter notebook environment. Just register as a new user on Terrascope or use one of the supported EGI eduGAIN login methods to get started.

Once you have a Terrascope account, you can run this notebook by clicking the button shown below.

<div class="alert alert-block alert-warning">When you click the button, you will be prompted with "Server Options".<br>
Make sure to select the "Worldcereal" image here.<br>
If you selected "Terrascope" by accident, go to File > Hub Control Panel > Stop my server, and click the link below once again.</div>


<a href="https://notebooks.terrascope.be/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FWorldCereal%2Fworldcereal-classification&urlpath=lab%2Ftree%2Fworldcereal-classification%2Fnotebooks%2Fworldcereal_custom_croptype.ipynb&branch=main"><img src="https://img.shields.io/badge/Generate%20custom%20crop%20type%20map-Terrascope-brightgreen" alt="Generate custom crop type map" valign="middle"></a>


<div class="alert alert-block alert-warning">
<b>WARNING:</b> <br>
Every time you click the above link, the latest version of the notebook will be fetched, potentially leading to conflicts with changes you have made yourself.<br>
To avoid such code conflicts, we recommend making a copy of the notebook and making changes only in your copied version.
</div>


#### Option 2: Install Locally

If you prefer to install the package locally, you can create the WorldCereal environment using **Conda** or **pip**.

First clone the repository:
```bash
git clone https://github.com/WorldCereal/worldcereal-classification.git
cd worldcereal-classification
```
Next, install the package locally:
- for Conda: `conda env create -f environment.yml`
- for pip: `pip install ".[train,notebooks]"`

### Before you start

In order to run WorldCereal crop mapping jobs from this notebook, you need to create an account on the [Copernicus Data Space Ecosystem](https://dataspace.copernicus.eu/).<br>
This is free of charge and will grant you a number of free openEO processing credits to continue this demo.

##### Optional fix in case of issues with "proj"

Run the following cell in case you experience issues with version conflicts in proj.db further down this notebook:

In [None]:
import os
import sys

# Set PROJ environment variables to avoid PROJ database version conflicts
# This ensures PROJ uses the database from the current conda environment
proj_path = os.path.join(sys.prefix, 'share', 'proj')
os.environ['PROJ_LIB'] = proj_path
os.environ['PROJ_DATA'] = proj_path

print(f"PROJ paths set to: {proj_path}")
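
If the conflict persists, it can help to confirm that a `proj.db` file actually exists at the location these variables point to. The snippet below is a minimal sketch using only the standard library. The helper name `find_proj_db` is hypothetical (it is not part of WorldCereal), and the `share/proj` layout is an assumption that holds for typical conda environments on Linux and macOS.

```python
import sys
from pathlib import Path
from typing import Optional


def find_proj_db(prefix: str = sys.prefix) -> Optional[Path]:
    """Return the path to proj.db under <prefix>/share/proj, or None if absent."""
    candidate = Path(prefix) / "share" / "proj" / "proj.db"
    return candidate if candidate.is_file() else None


db_path = find_proj_db()
if db_path is None:
    # The database may live elsewhere (e.g. bundled inside a pip-installed wheel)
    print("No proj.db found under the active environment prefix.")
else:
    print(f"proj.db found at: {db_path}")
```

If no file is found, pointing `PROJ_DATA` at this directory will not resolve the conflict, and the PROJ installation itself is worth checking.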

### 1. Gather and prepare your training data

For training a crop type model, you can use a combination of:
- publicly available reference data harmonized by the WorldCereal consortium;
- your own private reference data.

The cell below provides you with a quick overview of the publicly exposed reference datasets for which WorldCereal has already done satellite extractions. These datasets are therefore ready to be used for model training.

<div class="alert alert-block alert-info">
<b>Note on reference data availability:</b><br>

For a detailed exploration of available reference data for your region of interest, you can:

- use the WorldCereal Reference Data Module user interface, available [here](https://rdm.esa-worldcereal.org/). More explanation can be found [here](https://worldcereal.github.io/worldcereal-documentation/rdm/explore.html#explore-data-through-our-user-interface).
- use our dedicated notebook [worldcereal_RDM_demo.ipynb](https://github.com/WorldCereal/worldcereal-classification/blob/main/notebooks/worldcereal_RDM_demo.ipynb).
</div>

In [None]:
from notebook_utils.extractions import retrieve_extractions_extent

extents, extent_map = retrieve_extractions_extent()
print(f"Found {len(extents)} datasets with extractions.")
display(extent_map)

**Step 1: Select your area of interest (AOI)**

Provide a bounding box specifying the region in which you would like to look for available reference data.<br>

When running the code snippet below, an interactive map will be visualized.<br>
Click the Rectangle button on the left-hand side of the map to start drawing your region of interest.<br>
The widget will automatically store the coordinates of the last rectangle you drew on the map.<br>

Alternatively, you can also upload a vector file (either zipped shapefile or GeoPackage) delineating<br>
your area of interest. In case your vector file contains multiple polygons or points, the total bounds<br>
will be automatically computed and serve as your AOI. Files containing only a single point are not allowed.
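
To illustrate what "total bounds" means for a file with several geometries, here is a minimal sketch in plain Python. It is not the widget's actual implementation: the function name `total_bounds` and the `(west, south, east, north)` tuple layout are assumptions chosen for illustration.

```python
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def total_bounds(boxes: Iterable[BBox]) -> BBox:
    """Return the smallest bounding box enclosing all input bounding boxes."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("At least one geometry is required.")
    west = min(b[0] for b in boxes)
    south = min(b[1] for b in boxes)
    east = max(b[2] for b in boxes)
    north = max(b[3] for b in boxes)
    # A single point has zero width and height, so it cannot define an AOI
    if west == east and south == north:
        raise ValueError("A single point has no extent and cannot serve as AOI.")
    return (west, south, east, north)


# Two example field outlines, given by their individual bounds
fields = [(4.10, 50.80, 4.15, 50.84), (4.30, 50.90, 4.36, 50.95)]
print(total_bounds(fields))
```

Note that the resulting AOI can be much larger than the sum of the individual geometries when they lie far apart.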

In [None]:
from worldcereal.utils.map import ui_map

map = ui_map()

**Step 2: Get all available reference data**

Now we query both public and private extractions and retrieve the relevant samples based on your defined area of interest.

<div class="alert alert-block alert-info">
<b>Note on the use of private data</b><br>
In case you would like to include your private data, you will need to:<br>

1. Add your reference data to the [Reference Data Module](https://rdm.esa-worldcereal.org/). Detailed instructions can be found [HERE](https://worldcereal.github.io/worldcereal-documentation/rdm/upload.html).<br>
2. Extract satellite data for your reference data by following the steps in [THIS NOTEBOOK](https://github.com/WorldCereal/worldcereal-classification/blob/main/notebooks/worldcereal_private_extractions.ipynb).<br>
3. In the code cell below, make sure to specify the path where your private extractions reside.

</div>

By default, a spatial buffer of 250 km is applied to your area of interest to ensure sufficient training data is found.<br>
You can freely expand this search perimeter by changing the value of the `buffer` parameter.
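
To get a feeling for how large the search perimeter becomes, the sketch below expands a latitude/longitude bounding box by a buffer expressed in meters. This is an approximation for illustration only (spherical Earth, longitude scaled at the box's central latitude); the function `buffer_bbox` is hypothetical and the actual buffering performed by `query_extractions` may work differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def buffer_bbox(west, south, east, north, buffer_m):
    """Expand a lat/lon bounding box by roughly buffer_m meters on every side."""
    # One radian of latitude spans EARTH_RADIUS_M meters
    dlat = math.degrees(buffer_m / EARTH_RADIUS_M)
    # Degrees of longitude shrink with the cosine of the latitude
    mid_lat = (south + north) / 2
    cos_lat = max(math.cos(math.radians(mid_lat)), 1e-6)
    dlon = dlat / cos_lat
    return (
        max(west - dlon, -180.0),
        max(south - dlat, -90.0),
        min(east + dlon, 180.0),
        min(north + dlat, 90.0),
    )


# A 1 x 1 degree box in Belgium with the default 250 km buffer
print(buffer_bbox(4.0, 50.0, 5.0, 51.0, 250_000))
```

At this latitude, 250 km corresponds to about 2.25 degrees of latitude and about 3.5 degrees of longitude on each side.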

<div class="alert alert-block alert-info">
<b>Important consideration on model scope!</b><br>

By default, we filter the reference data explicitly to only retain temporary crops, by setting the `filter_temporary_crops` parameter to `True`.<br>
This effectively means that you will (by default) only be able to train a model distinguishing different types of temporary crops.<br>
In case you would like to expand your scope towards other land cover and/or permanent crops, please set `filter_temporary_crops` to `False`.
</div>

In [None]:
from pathlib import Path
from notebook_utils.extractions import query_extractions

# Retrieve the polygon you drew on the map
polygon = map.get_polygon_latlon()

# Specify a buffer distance to expand your search perimeter
buffer = 250000  # meters

# Specify the path to the private extractions data; 
# if you followed the private extractions notebook, your extractions path should be the one commented below;
# if you leave this None, only public data will be queried
private_extractions_path = None
# private_extractions_path = Path('./extractions/worldcereal_merged_extractions.parquet')

# Specify whether you are interested in temporary crops only (True) or in all available classes (False)
filter_temporary_crops = True

# Query our public database of training data
extractions = query_extractions(bbox_poly=polygon, buffer=buffer, private_parquet_path=private_extractions_path, filter_cropland=filter_temporary_crops)
extractions.head()

<div class="alert alert-block alert-warning">
<b>What to do in case no samples were found? Or in case you only have observations for a single crop type?</b><br> 

1. **Increase the buffer size**: Try increasing the buffer size by adjusting the `buffer` parameter.<br>  *Current setting is: 250 km.*
2. **Pick another area**: Consult our [Reference Data Module](https://rdm.esa-worldcereal.org) to find areas with higher data density.
3. **Contribute data**: Collect some data and contribute to our global database! <br>
üåçüåæ [Learn how to contribute here.](https://worldcereal.github.io/worldcereal-documentation/rdm/upload.html)

</div>

<div class="alert alert-block alert-info">
<b>Need more control on reference data selection?</b><br>

Right now we extracted ALL available reference data for your region of interest. However, we also offer more customizable ways of sampling reference data across different regions, datasets, years, crop types etc., but this is beyond the scope of this demo.<br>

If you are interested in these options, make sure to check out the `sample_extractions` function, demonstrated in [this notebook](https://github.com/WorldCereal/worldcereal-classification/blob/main/notebooks/UN_handbook/0_data_preparation.ipynb).

</div>

**Step 3: Perform a quick quality check**

In this optional step, we provide you with some tools to quickly assess the quality of the datasets.

Upon executing this cell, you will be prompted to enter a dataset name (ref_id) for inspection.

The time series visualization in particular can help you better define your season of interest.

In [None]:
from notebook_utils.extractions import get_band_statistics, visualize_timeseries

dataset_name = input('Enter the dataset name: ')
subset_data = extractions.loc[extractions['ref_id'] == dataset_name]

# Check band statistics
band_stats = get_band_statistics(subset_data)

# Visualize timeseries for a few samples (5 by default)
visualize_timeseries(subset_data, nsamples=5)

Based on the reported contents or quality check of the datasets, you might want to drop some of the selected data before proceeding.<br>

Here is an example on how to drop a complete dataset from the extracted data:

In [None]:
## Drop a specific dataset
# dataset_name = '2021_AUT_LPIS_POLY_110'
# extractions = extractions.loc[extractions['ref_id'] != dataset_name]

**Step 4: Select your season of interest**

Keep in mind that in WorldCereal, we train **season-specific** crop classifiers.<br>
In this step, you are asked to specify your cropping season of interest.<br>
Based on this information, we get rid of irrelevant training data and prepare the classification features in the next step.<br>

To gain a better understanding of crop seasonality in your area of interest, you can consult the WorldCereal crop calendars (by executing the next cell), or check out the [USDA crop calendars](https://ipad.fas.usda.gov/ogamaps/cropcalendar.aspx).

In [None]:
from notebook_utils.seasons import retrieve_worldcereal_seasons

spatial_extent = map.get_extent()
seasons = retrieve_worldcereal_seasons(spatial_extent)

Now let's also check the distribution of `valid_time` in your reference data.<br>
This attribute indicates the date for which the crop label is actually valid.<br>
This is important to consider when selecting your season of interest: there is little point in training a classifier for a season in which you have barely any valid reference data to work with!

In [None]:
from notebook_utils.seasons import valid_time_distribution

valid_time_distribution(extractions)
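
If you prefer numbers over a plot, the same information can be summarized as sample counts per month. The sketch below is a minimal stand-alone illustration, not the implementation of `valid_time_distribution`. It assumes the `valid_time` values are available as Python `date` objects; the helper name `samples_per_month` is hypothetical.

```python
from collections import Counter
from datetime import date


def samples_per_month(valid_times):
    """Count samples per (year, month), sorted chronologically."""
    counts = Counter((d.year, d.month) for d in valid_times)
    return dict(sorted(counts.items()))


# Toy example with three samples
example = [date(2021, 6, 1), date(2021, 6, 15), date(2021, 9, 1)]
for (year, month), n in samples_per_month(example).items():
    print(f"{year}-{month:02d}: {n} sample(s)")
```

Months with very few samples are poor candidates for the center of your season window.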

Now use the controls below to pin down the exact **growing season window** you plan to target (maximum 12 consecutive months).

1. Pick a representative year from the dropdown (used only to help visualize the season).
2. Drag the slider handles to the desired start/end months.
3. The summary reports both the growing-season window and a derived full-year processing period.

The growing-season window will be used to filter samples by their `valid_time`. The same selection will be reused later during inference.

In [None]:
from notebook_utils.dateslider import date_slider

slider = date_slider()

**Step 5: Align extractions with your season**

This step filters your training samples to match your selected growing season and ensures all samples have exactly 12 monthly timesteps for consistent embedding computation.

The system validates that each sample has sufficient satellite data coverage for the selected processing period. Samples are dropped if:
- Their satellite extractions don't span the full 12-month processing period
- Their `valid_time` falls outside the selected season window (e.g., Aug 1 - Dec 31)

<div class="alert alert-block alert-info">
<b>Note on data coverage:</b><br>
If you see many samples being dropped with warnings about "temporal extent", it means those samples don't have satellite observations covering the complete processing period (e.g., they have Feb-Oct data but the processing period requires Jan-Dec).<br><br>
This is expected behavior - the system requires consistent 12-month coverage to ensure reliable model training and inference.
</div>
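
The two drop rules described above can be sketched as a single predicate. This is a simplified illustration under stated assumptions, not the logic of `align_extractions_to_season` itself: the function `keep_sample` and its argument names are hypothetical, and all windows are treated as inclusive date ranges.

```python
from datetime import date


def keep_sample(valid_time, obs_start, obs_end,
                season_start, season_end, proc_start, proc_end):
    """Return True if a sample passes both the coverage and the season check."""
    # Rule 1: satellite observations must span the full processing period
    covers_period = obs_start <= proc_start and obs_end >= proc_end
    # Rule 2: the label's valid_time must fall inside the season window
    in_season = season_start <= valid_time <= season_end
    return covers_period and in_season


windows = dict(
    season_start=date(2021, 8, 1), season_end=date(2021, 12, 31),
    proc_start=date(2021, 1, 1), proc_end=date(2021, 12, 31),
)

# Full coverage, label valid in September: kept
print(keep_sample(date(2021, 9, 15), date(2020, 11, 1), date(2022, 1, 31), **windows))
# Only Feb-Oct observations: dropped for insufficient temporal extent
print(keep_sample(date(2021, 9, 15), date(2021, 2, 1), date(2021, 10, 31), **windows))
# Label valid in May, outside the Aug-Dec season: dropped
print(keep_sample(date(2021, 5, 1), date(2020, 11, 1), date(2022, 1, 31), **windows))
```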

In [None]:
from notebook_utils.classifier import align_extractions_to_season
import pandas as pd

# Retrieve the derived season + processing windows from the slider selection
selection = slider.get_selection()
season_selection = selection  # persist for downstream inference
season_window = selection.season_window
processing_period = selection.processing_period

# OPTION A: Align extractions to the processing period
# season_window will filter by valid_time
training_df = align_extractions_to_season(
    extractions,
    season_window=season_window,
)

# OPTION B: Strict filtering - require complete temporal coverage (commented out)
# Uncomment this if you want to enforce that all samples have data for the full processing period
# training_df = align_extractions_to_season(
#     extractions,
#     processing_period,
#     season_window=season_window,
# )

# training_df.head()


**Step 6: Select your crops of interest**

The following widget will display all available land cover classes and crop types in your training dataframe.

Tick the checkbox for each crop type you wish to explicitly include in your model.<br>
In case you wish to group multiple crops together, just tick the parent node in the hierarchy.

Non-selected crops will be merged together in an `other` class.

After selecting all your crop types of interest, hit the "Apply" button.

<div class="alert alert-block alert-info">
<b>Minimum number of samples:</b><br>
In order to train a model, we recommend a minimum of 30-50 samples to be available for each unique crop type.<br>
</div>
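
A quick way to apply this rule of thumb is to count the samples per class and flag those below the threshold. The sketch below is a stand-alone illustration using plain Python lists. The helper `underrepresented_classes` is hypothetical, and the threshold of 30 is simply the lower end of the recommendation above.

```python
from collections import Counter


def underrepresented_classes(labels, min_samples=30):
    """Return {class: count} for every class with fewer than min_samples samples."""
    counts = Counter(labels)
    return {cls: n for cls, n in counts.items() if n < min_samples}


# Toy label list: maize is well covered, wheat is not
labels = ["maize"] * 40 + ["wheat"] * 12
print(underrepresented_classes(labels))
```

In the notebook, the list of labels could be taken from the `downstream_class` column once it has been created. Classes that are flagged are candidates for merging with related crops or for dropping.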


In [None]:
from notebook_utils.croptypepicker import CropTypePicker

croptypepicker = CropTypePicker(sample_df=training_df, expand=False)

In the next cell, we apply your selection to your training dataframe.<br>
The new dataframe will contain a `downstream_class` attribute, denoting the final label that will be used during model training.<br>

Let's first check which classes ended up in the "other" class:

In [None]:
from notebook_utils.croptypepicker import apply_croptypepicker_to_df
from worldcereal.utils.legend import translate_ewoc_codes

training_df = apply_croptypepicker_to_df(training_df, croptypepicker)
other_count = training_df.loc[training_df['downstream_class'] == 'other']['ewoc_code'].value_counts()
other_labels = translate_ewoc_codes(other_count.index.tolist())
other_count.to_frame().merge(other_labels, left_index=True, right_index=True)

Based on this list, you might consider dropping some classes.<br>
This can be done by providing the "ewoc_codes" in the following cell:

In [None]:
# CASE 1: drop specific ewoc codes (fill in the list below)
# to_drop = [1114060010]

# CASE 2: drop all ewoc codes that were labeled as 'other'
# to_drop = other_labels.index.tolist()

# CASE 3: do not drop anything
to_drop = []

# Then drop them from the training dataframe
if len(to_drop) > 0:
    training_df = training_df.loc[~training_df['ewoc_code'].isin(to_drop)]
training_df['downstream_class'].value_counts()

Finally, you could opt to combine some classes using the code snippet below as an example:

In [None]:
# Example for combining classes:
# combine_classes = {
#     'cereals': ['winter_barley', 'oats', 'millet', 'winter_rye', 'wheat']}

# In case you do not want to combine any classes, leave the dictionary empty:
combine_classes = {}

# Apply the class combinations
for new_class, old_classes in combine_classes.items():
    training_df.loc[training_df['downstream_class'].isin(old_classes), 'downstream_class'] = new_class

# Report on the contents of the data
training_df['downstream_class'].value_counts()

**Step 7: Save your final training dataframe for future reference**

Upon executing the next cell, you will be prompted to provide a unique name for your dataframe.

In [None]:
from pathlib import Path
from notebook_utils.classifier import get_input

df_name = get_input("Name dataframe")

training_dir = Path('./training_data')
training_dir.mkdir(exist_ok=True)

outfile = training_dir / f'{df_name}.csv'

if outfile.exists():
    raise ValueError(f"File {outfile} already exists. Please delete it or choose a different name.")

training_df.to_csv(outfile)

print(f"Dataframe saved to {outfile}")

**Step 8: Compute geospatial embeddings using a finetuned foundation model**

Using a geospatial foundation model (Presto), we derive training features for each sample in the dataframe resulting from your query. Presto was pre-trained on millions of unlabeled samples around the world and finetuned on global labelled land cover and crop type data from the WorldCereal reference database. The resulting 128 *embeddings* (`presto_ft_0` -> `presto_ft_127`) nicely condense the Sentinel-1, Sentinel-2, meteo timeseries and ancillary data for your season of interest into a limited number of meaningful features which we will use for downstream model training.

We provide several options aimed at increasing temporal robustness and handling real-world data gaps in your final crop model. This is controlled by the following arguments:
- `augment` parameter: when set to `True`, introduces slight temporal jittering of the processing window, making the model more robust to variations in seasonality across different years. By default, this option is set to `True`, but especially when training a model for a specific region and year with good local data, disabling this option could be considered.
- `mask_on_training` parameter: when `True`, applies sensor masking augmentations (e.g., simulating S1/S2 dropouts, additional clouds, ancillary feature removals) only to the training split to improve robustness to real-world data gaps. The validation/test split is kept untouched for fair evaluation. This is enabled by default (`True`).
- `repeats` parameter: number of times each training sample is (re)drawn with its augmentations. Higher values (>1) create more variants (with jitter/masking) and enlarge the effective training set, potentially improving generalization at the cost of longer embedding computation time. Default is 3.

The embeddings are computed by pooling Presto's time-explicit representations over your specified growing season window, ensuring features capture the seasonal crop dynamics relevant to your classification task.
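
Conceptually, pooling over the season window reduces one embedding per timestep to a single embedding per sample. The sketch below illustrates this with a simple mean over the in-season timesteps. Mean pooling is an assumption made here for illustration; the pooling actually applied inside `compute_seasonal_presto_embeddings` may differ, and the function `pool_over_season` is hypothetical.

```python
def pool_over_season(timestep_embeddings, in_season):
    """Average per-timestep embedding vectors over the timesteps flagged as in season."""
    selected = [vec for vec, keep in zip(timestep_embeddings, in_season) if keep]
    if not selected:
        raise ValueError("The season mask selects no timesteps.")
    dim = len(selected[0])
    return [sum(vec[i] for vec in selected) / len(selected) for i in range(dim)]


# Toy input: 12 monthly timesteps (index 0 = January), 2-dimensional embeddings
monthly = [[float(month), 1.0] for month in range(12)]
# Season window August to December (indices 7..11)
aug_to_dec = [month >= 7 for month in range(12)]
print(pool_over_season(monthly, aug_to_dec))  # [9.0, 1.0]
```

Timesteps outside the window do not contribute to the pooled vector, which is why the choice of season window directly shapes the features used for training.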

In [None]:
from notebook_utils.classifier import compute_seasonal_presto_embeddings

season_id = "ShortRains"

embeddings_df = compute_seasonal_presto_embeddings(
    training_df,
    season_id=season_id,
    mask_on_training=True,  # apply sensor masking to training split
    repeats=3,  # number of times to augment each training sample
    val_size=0.15,
    test_size=0.2,
    season_window=season_window,
)


### 2. Train a seasonal torch head

Instead of fitting a CatBoost tree ensemble, we now fine-tune a lightweight PyTorch head that plugs directly into the seasonal Presto backbone. The helper `train_seasonal_torch_head()` handles class balancing, train/val/test splits, and a hyperparameter search over learning rate and weight decay, producing a ready-to-use seasonal head with comprehensive metrics, confusion matrices, and training logs written to disk.

The resulting artifacts (model weights + config + packaged `.zip`) are stored under `./downstream_heads/<season_id>_<timestamp>/`. Keep this directory around: the zip bundle will be uploaded to CDSE in the next step and the config is reused whenever you redeploy or troubleshoot the head.

Key parameters you can adjust:
- `head_type`: Model architecture - `"linear"` for a simple linear classifier (default) or `"mlp"` for a multi-layer perceptron with more capacity
- `epochs`: Number of training epochs (default: 40)
- `lr_`: Learning rate (default: 1e-2)
- `weight_decay`: L2 regularization strength (default: 0.0)
- `use_balancing`: Whether to apply class balancing to handle imbalanced datasets (default: True)
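
To give an idea of what class balancing does, the sketch below computes inverse-frequency class weights, a common balancing scheme in which rare classes receive a larger weight in the loss. This is only one possible approach and an assumption on our side: the exact strategy used by `train_seasonal_torch_head` when `use_balancing=True` may differ. The helper `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Imbalanced toy dataset: 80 maize samples versus 20 wheat samples
labels = ["maize"] * 80 + ["wheat"] * 20
print(inverse_frequency_weights(labels))  # {'maize': 0.625, 'wheat': 2.5}
```

With these weights, each class contributes equally to the total loss (80 x 0.625 = 20 x 2.5 = 50), which keeps the majority class from dominating training.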

In [None]:
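from datetime import datetime, timezone
import json
from pathlib import Path

from notebook_utils.classifier import train_seasonal_torch_head

# datetime.utcnow() is deprecated since Python 3.12; use a timezone-aware UTC timestamp instead
head_run_name = f"{season_id}_{datetime.now(timezone.utc).strftime('%Y%m%d-%H%M%S')}"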
head_output_dir = Path("./downstream_heads") / head_run_name
head_output_dir.mkdir(parents=True, exist_ok=True)

torch_head = train_seasonal_torch_head(
    embeddings_df,
    season_id=season_id,
    head_task="croptype",
    output_dir=head_output_dir,
    head_type="mlp",  # "linear" (the default) is also supported
    use_balancing=True,
)

head_config_path = head_output_dir / "config.json"
if not head_config_path.exists():
    raise FileNotFoundError(
        f"Torch head config not found at {head_config_path}. Check the training logs above."
    )

with head_config_path.open() as fp:
    head_config = json.load(fp)

head_package_name = head_config["artifacts"]["packages"]["head"]
head_package_path = head_output_dir / head_package_name
print(f"Torch head saved to: {head_output_dir}")
print(f"Packaged archive ready at: {head_package_path}")

### 3. Deploy your custom torch head

The training step produced a zipped bundle containing the PyTorch weights plus the accompanying configuration metadata. Upload that `.zip` file to the secure CDSE artifact bucket so the openEO workflow can download it during map generation. The next cell reads `head_package_path` from the previous step, prompts you to choose a short identifier for storage, and returns a presigned download URL to use in the production workflow.

Keep a local copy of the entire `downstream_heads/<season_id>_<timestamp>/` directory: the archive and `config.json` are required if you ever need to redeploy the same head, inspect its metadata, or troubleshoot issues. The cloud copy has a limited retention period.
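
Before uploading, you may want to verify what the archive contains. The sketch below lists the members of a zip file using the standard library. It runs on a throwaway demo archive, so the member names shown are placeholders and do not reflect the actual layout of a WorldCereal head package; the helper `list_archive_members` is hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the sorted file names stored in a zip archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(zf.namelist())


# Demo on a temporary archive with placeholder member names
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo_head.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("config.json", "{}")
        zf.writestr("weights.bin", "")
    print(list_archive_members(demo_zip))
```

In the notebook, the same helper could be called as `list_archive_members(head_package_path)` to inspect the bundle produced by the training step.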

<div class="alert alert-block alert-info">
<b>CDSE authentication:</b><br>
After your first login, your CDSE credentials are stored on your machine, so you do not need to authenticate again in the future. If you want to switch to a different CDSE account, execute the following lines of code:<br>
<br>
from notebook_utils.openeo import clear_openeo_token_cache<br>
clear_openeo_token_cache()
</div>

In [None]:
from notebook_utils.classifier import get_input
from openeo_gfmap.backend import cdse_connection
from worldcereal.utils.upload import OpenEOArtifactHelper

if "head_package_path" not in globals():
    raise ValueError(
        "Run the torch head training cell first so `head_package_path` is defined."
    )
if not head_package_path.exists():
    raise FileNotFoundError(
        f"Torch head archive not found at {head_package_path}. Re-run the training step."
    )

modelname = get_input("model identifier (for storage)")
artifact_helper = OpenEOArtifactHelper.from_openeo_connection(cdse_connection())
target_object_name = f"{modelname}_{head_package_path.name}"
print(f"Uploading torch head archive as {target_object_name} ...")

model_s3_uri = artifact_helper.upload_file(target_object_name, str(head_package_path))
model_url = artifact_helper.get_presigned_url(model_s3_uri)

print(f"S3 URI: {model_s3_uri}")
print(f"Your torch head can be downloaded from: {model_url}")

### 4. Generate your custom crop type map

Using your custom model, we generate a map for your region and season of interest.

**Step 1: Select your area of interest (AOI)**

Provide a bounding box specifying the region for which you would like to create your map.<br>
The WorldCereal system is currently optimized to process <b>20 x 20 km</b> tiles.<br>
In case your AOI exceeds this area, it will be automatically split, creating multiple map generation jobs.<br>

Every processing job will consume a number of CDSE credits (depending on size). <br>
Every CDSE user has 10,000 processing credits available each month for free.<br>
Additional credits can be purchased to support large-scale processing.

<div class="alert alert-block alert-warning">
<b>IMPORTANT NOTES ON UPSCALING:</b><br> 

- We ALWAYS recommend first running the model on a <b>representative set of small test areas</b> (up to 100 km²) to visually check model performance BEFORE upscaling to large areas!

- By default, CDSE users are limited to running 2 processing jobs in parallel, which results in long processing times for large areas. For country-scale mapping, we therefore recommend <b>contacting the WorldCereal team</b> through [our contact form](https://esa-worldcereal.org/en/contact) for dedicated support to speed up processing.

</div>

We refer to [Section 1 of this notebook](###-1.-Gather-and-prepare-your-training-data) for instructions on how to use our interactive application.

In [None]:
from worldcereal.utils.map import ui_map

map = ui_map(area_limit=1200)  # area_limit in km²

Optionally save your drawn bounding box to a file for future reference:

In [None]:
from notebook_utils.production import bbox_extent_to_gdf
from pathlib import Path

bbox_name = input('Enter the name for the output bbox file (without extension): ')
outfile = Path(f'./bbox/{bbox_name}.gpkg')
processing_extent = map.get_extent(projection='latlon')
bbox_extent_to_gdf(processing_extent, outfile)

**Step 2: Confirm your year and season of interest**

We automatically reuse the growing-season window (and derived processing period) that you selected earlier in this notebook during Step 4. The same season window used for training will be applied during inference to ensure consistency.

If you need to target a different year or shift the window, scroll back to *Step 4: Select your season of interest*, adjust the slider, rerun the alignment and embedding computation cells, and then return here.

**Step 3: Set processing parameters**

Configure the inference and post-processing parameters:

- `mask_cropland`: Apply cropland masking during croptype predictions (recommended: True)
- `enable_cropland_head`: Also export cropland probability rasters for quality assurance (default: True)
- `export_class_probs`: Emit per-class probability layers for each crop type (default: True)
- `croptype_postprocess_enabled`: Apply spatial post-processing to croptype classifications (default: True)
- `croptype_postprocess_method`: Post-processing algorithm (e.g., "majority_vote")
- `croptype_postprocess_kernel`: Kernel size for post-processing filter (default: 5)
- `cropland_postprocess_enabled`: Apply spatial post-processing to cropland mask (default: True)
- `cropland_postprocess_method`: Post-processing algorithm for cropland
- `cropland_postprocess_kernel`: Kernel size for cropland post-processing (default: 3)
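
To clarify what a majority-vote filter with a given kernel size does, here is a minimal pure-Python sketch: every pixel is replaced by the most frequent label in the square window centered on it. This is an illustration only. The WorldCereal implementation may treat edges, ties, and no-data values differently, and the function `majority_filter` is hypothetical.

```python
from collections import Counter


def majority_filter(grid, kernel_size=5):
    """Replace each cell by the most common label in its kernel_size x kernel_size window."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd.")
    half = kernel_size // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Collect labels in the window, clipped at the grid edges
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out


# A single isolated pixel of class 2 inside a field of class 1 is smoothed away
noisy = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]
print(majority_filter(noisy, kernel_size=3))
```

Larger kernels remove more speckle but also erase small fields and thin structures, which is worth keeping in mind when changing the kernel size.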

In [None]:
# Seasonal inference knobs for the Phase II workflow.
mask_cropland = True  # keep cropland gating when running croptype predictions
enable_cropland_head = True  # also export cropland rasters for QA
export_class_probs = True  # emit per-class probabilities for each selected season

croptype_postprocess_enabled = True
croptype_postprocess_method = "majority_vote"
croptype_postprocess_kernel = 5

cropland_postprocess_enabled = True
cropland_postprocess_method = "majority_vote"
cropland_postprocess_kernel = 3

**Step 4: Start map production**

The next cell configures and launches the production workflow:

1. **Configuration**: Builds a workflow configuration using your trained model URL, season selection, and processing parameters
2. **Tiling**: Splits your area of interest into tiles (default: 50x50 km) for parallel processing
3. **Execution**: Launches openEO jobs to generate crop type maps for each tile
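
To estimate how many jobs the tiling step will create, you can divide the AOI dimensions by the tile size and round up. The sketch below is a rough estimate under the assumption that the tile grid starts at the corner of your AOI; the actual tiling may be aligned differently and produce a slightly higher count. The helper `count_tiles` is hypothetical.

```python
import math


def count_tiles(width_km, height_km, tile_km=50):
    """Estimate the number of square tiles needed to cover a rectangular AOI."""
    return math.ceil(width_km / tile_km) * math.ceil(height_km / tile_km)


# A 120 x 60 km AOI with 50 km tiles needs 3 columns x 2 rows
print(count_tiles(120, 60))  # 6
```

Since each tile becomes a separate processing job that consumes credits, this estimate is a useful sanity check before launching production on a large area.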

You will be able to track progress through automated reporting. Progress is displayed in real-time and saved to the output directory.

As a free tier CDSE user, individual processing jobs can take several hours depending on system load. Results are automatically saved to: `runs/CROPTYPE_custom_{your_modelname}_{timestamp}`

The first time you run this, you will be prompted to authenticate with your CDSE account.

<div class="alert alert-block alert-warning">
<b>How to stop processing?</b><br> 
Simply interrupt the Python kernel to stop processing.<br>
Make sure to manually cancel any running jobs in the backend to avoid unnecessary costs!<br>
For this, visit the job tracking page in the
<a href='https://openeo.dataspace.copernicus.eu/' target='_blank' rel='noopener'>CDSE backend dashboard</a> <br><br>

<b>What to do in case of interruption?</b><br> 
In case processing got interrupted, just make sure to manually set `output_dir` to the directory you previously used. When running the below cell again, processing will just continue where it stopped.
</div>

In [None]:
import json
import pandas as pd
from pathlib import Path
from worldcereal.parameters import WorldCerealProductType
from worldcereal.openeo.workflow_config import WorldCerealWorkflowConfig
from notebook_utils.production import run_map_production

# The output directory is named after the model
timestamp = pd.Timestamp.now().strftime("%Y%m%d-%H%M%S")
output_dir = Path('./runs') / f'CROPTYPE_custom_{modelname}_{timestamp}'
print(f"Output directory: {output_dir}")

# Retrieve the derived season + processing windows from the slider selection
selection = slider.get_selection()
processing_period = selection.processing_period
season_window = selection.season_window
selected_season_id = season_id
season_windows = {
    selected_season_id: (
        str(season_window.start_date),
        str(season_window.end_date),
    )
}

# Parameterize the workflow
workflow_builder = (
    WorldCerealWorkflowConfig.builder()
    .season_ids([selected_season_id])
    .season_windows(season_windows)
    .croptype_head_zip(model_url)
    .enable_croptype_head(True)
    .enable_cropland_head(enable_cropland_head)
    .enforce_cropland_gate(mask_cropland)
    .export_class_probabilities(export_class_probs)
)
workflow_builder = workflow_builder.cropland_postprocess(
    enabled=cropland_postprocess_enabled,
    method=cropland_postprocess_method,
    kernel_size=cropland_postprocess_kernel,
)
workflow_builder = workflow_builder.croptype_postprocess(
    enabled=croptype_postprocess_enabled,
    method=croptype_postprocess_method,
    kernel_size=croptype_postprocess_kernel,
)
workflow_config = workflow_builder.build()

# Get processing area
processing_extent = map.get_extent(projection='latlon')
tile_resolution = 50   # in km

# Run the production workflow with the specified parameters and retrieve the status dataframe
status_df = run_map_production(
    spatial_extent= processing_extent,
    temporal_extent= processing_period,
    output_dir= output_dir,
    tile_resolution= tile_resolution,
    product_type= WorldCerealProductType.CROPTYPE,
    workflow_config=workflow_config,
)

**Step 5: Create merged product**

Once production across your tiles is finalized, you can use the cell below to merge the different tiles together into one map.<br>

In [None]:
from notebook_utils.production import merge_maps

merged_paths = merge_maps(output_dir)
merged_list = "\n".join(
    f"{name} -> {path}"
    for name, path in merged_paths.items()
)
print("Results merged:\n" + merged_list)

**Step 6: Inspect your map**

Up to four products are generated depending on your configuration settings:
- `croptype-raw` → Your custom crop type product (raw model output)
- `croptype` → Your custom crop type product after spatial post-processing (if enabled)
- `cropland-raw` → Cropland mask produced using the global WorldCereal cropland model (if enabled)
- `cropland` → Cropland mask after spatial post-processing (if enabled)

For each of these products, you will get a raster file containing multiple bands:
1. **Classification band**: The label of the winning class
2. **Confidence band**: The probability of the winning class [50-100]
3. **Class probability bands** (optional, if `export_class_probabilities=True`): Individual probability layers for each crop type class in your model
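
The relation between the three band types can be illustrated for a single pixel: the classification is the class with the highest probability, and the confidence is that highest probability. The sketch below assumes probabilities in the range 0 to 1 that are rescaled to integer percentages; the exact scaling and data types used in the WorldCereal products are not confirmed here, and the helper `winning_class` is hypothetical.

```python
def winning_class(probabilities, class_names):
    """Return (label, confidence in percent) for one pixel's class probabilities."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = round(probabilities[best] * 100)
    return class_names[best], confidence


# One pixel with three candidate classes
print(winning_class([0.1, 0.7, 0.2], ["maize", "wheat", "other"]))  # ('wheat', 70)
```

Pixels where the winning class only narrowly beats the runner-up show up with low confidence, which makes the confidence band a useful first filter during visual inspection.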

You can use the next cell to quickly visualize your products in this notebook.

The visualization loads the class metadata from your trained torch head artifact to properly display crop type labels and creates an interactive map showing your classification and probability layers.

<div class="alert alert-block alert-info">
<b>Supported visualization modes:</b><br>
By default (`interactive_mode=False`), your product is shown using matplotlib for quick visual inspection.<br>
By setting `interactive_mode=True`, both your classification and probability layers will be visualized in an interactive ipyleaflet window. You can toggle individual layers on/off using the layer control in the upper-right corner.

<b>NOTE:</b> For the interactive mode to work in a VSCode environment, you need to enable port forwarding for port 8889.
</div>

In [None]:
from notebook_utils.visualization import visualize_products
from worldcereal.openeo.inference import load_model_artifact

if "model_url" not in globals():
    raise ValueError(
        "`model_url` is undefined. Upload your torch head archive before visualizing the map.",
    )

artifact = load_model_artifact(model_url)
heads = artifact.manifest.get("heads", [])
luts = {}
for task in ("cropland", "croptype"):
    head = next(
        (head for head in heads if head.get("task") == task), None
    )
    if head and head.get("class_names"):
        luts[task] = {
            name: idx for idx, name in enumerate(head["class_names"])
        }
if "croptype" not in luts:
    raise ValueError("Torch head manifest is missing croptype class metadata.")

visualize_products(merged_paths, luts=luts, interactive_mode=True)


Congratulations, you have reached the end of this demo!