# WorldCereal Embeddings Demo

This notebook demonstrates how to generate spatio-temporal WorldCereal embeddings through the openEO backend and how to visualize the resulting multiband GeoTIFF. Under the hood, the embeddings are produced by NASA Harvest's pretrained Presto geospatial model, fine-tuned on WorldCereal reference data.

**Workflow overview**:
1. Interactively select one or more areas of interest (AOIs).
2. Define your processing time range.
3. Optionally customize embeddings parameters (by default we use the pre-trained global Presto model).
4. Launch your processing jobs through an openEO job manager.
5. Inspect and visualize the embeddings (band statistics, PCA projection, pseudo-color rendering).

Use this as a starting point to explore model outputs or integrate embeddings into downstream ML tasks.

<div class="alert alert-block alert-warning">
<b>PREREQUISITE:</b> <br>
The processing jobs in this notebook run through openEO and are billed on your CDSE account. This means you need a <a href="https://dataspace.copernicus.eu/" target="_blank">CDSE account</a> in order to proceed!
</div>

<div class="alert alert-block alert-warning">
<b>NOTE ON AOI SIZE:</b> <br>
Since this notebook is meant for exploration, we HIGHLY recommend keeping each individual patch smaller than 20x20 km².<br>
Larger patches are possible, however; they will be automatically split into 20x20 km² sub-tiles.<br>
Keep in mind that these extraction jobs will be billed on your CDSE account, so start small and evaluate the cost!<br>

Upscaling of this workflow outside a notebook environment can be done through a dedicated script located in worldcereal-classification > scripts > inference > compute_embeddings.py.
</div>

### Step 1: Specify your area(s) of interest (AOI)

Execute the next cell to show an interactive widget, allowing you to specify the location of your patches.

**Option 1: Draw on the map**<br>
Use the rectangle or polygon buttons on the left-hand side to start drawing.<br><br>
Each time you finish drawing, you are asked to provide a short, meaningful identifier for your area. <br>
The drawn area is added to your list of AOIs only after you click the "Submit" button.<br><br>
Use the bin button to delete any of the drawn objects.

**Option 2: Upload an existing vector file containing your patches**<br>
Use the Upload button in the upper right corner to upload an existing file containing your areas of interest.<br><br>
Supported file formats: zipped shapefile (.zip), GeoPackage (.gpkg), and Parquet (.parquet).<br>
The geometries will be automatically reprojected to EPSG:4326.<br><br>
After selecting your file, you will need to specify which attribute contains a unique identifier for each AOI.<br>
All other attributes in your file will be dropped.

In [None]:
from worldcereal.utils.map import ui_map

map = ui_map(mode="multi")

Now save your area(s) of interest for future reference. You will be asked to provide a short descriptive name.

In [None]:
from pathlib import Path

name = input("Name for the bbox file (without extension): ")
bbox_dir = Path("./bbox")
bbox_dir.mkdir(exist_ok=True)
aoi_path = map.save_gdf(bbox_dir, outputname=name)
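
As noted at the top of this notebook, AOIs larger than 20x20 km² are split into sub-tiles automatically. The sketch below illustrates the idea for a bounding box in a projected, metre-based CRS. It is a simplified illustration, not the WorldCereal implementation: the function name `split_bbox_into_tiles` is hypothetical, and the real tiling (grid alignment, CRS handling) may differ.

```python
import math


def split_bbox_into_tiles(minx, miny, maxx, maxy, tile_size=20_000.0):
    """Split a bounding box (projected CRS, metres) into tiles of at most tile_size.

    Hypothetical helper for illustration only.
    """
    if maxx <= minx or maxy <= miny:
        raise ValueError("bbox must have positive width and height")

    n_cols = math.ceil((maxx - minx) / tile_size)
    n_rows = math.ceil((maxy - miny) / tile_size)

    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = minx + col * tile_size
            y0 = miny + row * tile_size
            # Clip the last row/column so tiles never extend beyond the AOI
            tiles.append((x0, y0, min(x0 + tile_size, maxx), min(y0 + tile_size, maxy)))
    return tiles


# A 45 km x 30 km AOI needs 3 columns x 2 rows of tiles
tiles = split_bbox_into_tiles(500_000, 5_600_000, 545_000, 5_630_000)
print(len(tiles), "tiles")
```

Counting tiles this way gives a rough idea of how many sub-jobs a large AOI could generate before you launch anything.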

### Step 2: Select your processing period

**Option 1: Define one period valid for all your AOIs**

Execute the cell below to specify your year and season of interest using the interactive slider.<br>
Based on your selection, we will extract exactly one year of data, counting back from your season's end date.<br>

**Option 2: Use the WorldCereal crop calendars**

WorldCereal has identified the two dominant growing seasons for each location on the planet. By simply specifying a year of interest, we automatically select the 12 months needed to cover the two seasons ending in your specified year.<br>

> Note that in this option, the time period is computed for each AOI separately, allowing you to have different time periods for your different AOIs.

In [None]:
# RUN THIS CELL IF YOU CHOOSE OPTION 1
# and drag the slider to select your season of interest for all AOIs

from notebook_utils.dateslider import date_slider

processing_slider = date_slider()

In [None]:
# RUN THIS CELL IF YOU CHOOSE OPTION 1
# here we extract the selected season from the slider
processing_period = processing_slider.get_selected_dates()
year = None
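
Option 1 extracts exactly one year of data, counting back from the season's end date. A minimal sketch of that date arithmetic is shown below. The helper name `one_year_window` and the exact convention (start on the day after the same calendar date one year earlier) are assumptions for illustration; the slider may round to month boundaries or use a different rule.

```python
from datetime import date, timedelta


def one_year_window(season_end: date) -> tuple[date, date]:
    """Return (start, end) of a one-year window ending on season_end.

    Hypothetical helper: assumes the window starts the day after the
    same calendar date in the previous year.
    """
    try:
        same_day_last_year = season_end.replace(year=season_end.year - 1)
    except ValueError:
        # season_end is 29 February; use 28 February of the previous year
        same_day_last_year = season_end.replace(year=season_end.year - 1, day=28)
    return same_day_last_year + timedelta(days=1), season_end


start, end = one_year_window(date(2021, 10, 31))
print(start, "->", end)
```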

In [None]:
# RUN THIS CELL IF YOU CHOOSE OPTION 2
# manually change the year below to your year of interest for all AOIs
year = 2024
processing_period = None

### Step 3: Optionally customize embeddings parameters

Adjust any parameters below to control the embeddings generation. You can override (for example) the model URL or other processing options if exposed by the API. After instantiation we print the effective configuration for transparency.

> Note: by default we use the global Presto model pre-trained by the WorldCereal consortium. Do not change anything in the below cell if you want to use this default model.

In [None]:
from worldcereal.parameters import EmbeddingsParameters

# Instantiate embedding parameters (override defaults by passing arguments)
# Example: EmbeddingsParameters(presto_model_url="s3://bucket/custom_model.onnx")
embedding_params = EmbeddingsParameters()

# Display the resolved parameters (assumes dataclass / attrs-like repr)
print("Embedding Parameters:\n", embedding_params)

### Step 4: Launch the processing job(s)

All set!

Execute the next cell to start embeddings computation for your AOIs!

When you execute this for the first time, you will be asked to connect to CDSE by clicking the link displayed below the cell.

You will be able to monitor the processing through a map and table indicating the status of your processing jobs.<br>
(If your AOIs are located far apart, they will probably not show on the map.)<br>
Detailed logs are hidden by default but can be switched on by setting `simplify_logging` to `False`.

Results will be automatically saved in a folder named after the name you provided for your AOIs earlier: `./embeddings/<name>`.<br>

If desired, you can also specify a preferred orbit state for the Sentinel-1 data. If not provided, the orbit state will be automatically determined based on the availability of data.

Computing embeddings for a small area takes around 10 minutes. Hang in there!

In [None]:
from pathlib import Path
from notebook_utils.embeddings import compute_worldcereal_embeddings_notebook

aoi_gdf = map.get_gdf()

outdir = Path(f'./embeddings/{name}')

s1_orbit_state = None  # or 'ASCENDING' / 'DESCENDING'

# Whether to scale float inputs to uint16 (saves storage and speeds up processing, but may reduce precision)
scale_uint16 = True  

# Collect WorldCereal embeddings for the AOIs and processing period
job_manager = compute_worldcereal_embeddings_notebook(
    aoi_gdf,
    outdir,
    temporal_extent=processing_period,
    year=year,
    s1_orbit_state=s1_orbit_state,
    scale_uint16=scale_uint16,
    simplify_logging=True,
)

### Step 5: Inspect & Visualize Embeddings

Let's first have a look at the status of your processing job(s).

In [None]:
from notebook_utils.job_manager import check_job_status

job_status_counts = check_job_status(outdir)

We now load one of the downloaded multiband embeddings GeoTIFFs and produce:
1. Per-band value distribution (histogram)
2. PCA projection (first 3 components) rendered as an RGB image
3. Optional pseudo-color rendering from selected bands

> Note: If data were scaled to UInt16, we scale back first to the original Float32 values.

In [None]:
from notebook_utils.embeddings import fetch_embeddings_results_from_outdir, read_embeddings_raster

outfiles = fetch_embeddings_results_from_outdir(outdir)
# We select the first one for demonstration
outfile = outfiles[0]

# Read the data
data, profile = read_embeddings_raster(outfile, scale_uint16)

print("Raster profile:\n", profile)
print("Data shape (bands, height, width):", data.shape)
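
To give some intuition for the UInt16 scaling mentioned in the note above, here is a generic linear quantization round trip. This is a sketch only: the value range `VMIN`/`VMAX` and the helper names are placeholders, and the actual offset and scale applied by `read_embeddings_raster` are not shown in this notebook and may differ.

```python
import numpy as np

# Placeholder value range (assumption, not the WorldCereal setting)
VMIN, VMAX = -6.0, 6.0
UINT16_MAX = np.iinfo(np.uint16).max  # 65535


def quantize(x):
    """Map floats in [VMIN, VMAX] linearly onto the uint16 range."""
    x = np.clip(x, VMIN, VMAX)
    return np.round((x - VMIN) / (VMAX - VMIN) * UINT16_MAX).astype(np.uint16)


def dequantize(q):
    """Invert quantize(), returning float32 values."""
    return (q.astype(np.float32) / UINT16_MAX * (VMAX - VMIN) + VMIN).astype(np.float32)


rng = np.random.default_rng(0)
original = rng.normal(size=(4, 8, 8)).astype(np.float32)
restored = dequantize(quantize(original))

# The round trip loses at most about half a quantization step per value
max_error = float(np.abs(restored - original).max())
print("max round-trip error:", max_error)
```

With a range of 12 units spread over 65535 levels, one step is roughly 0.00018, which is the order of precision traded for the smaller file size.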

Let's plot a histogram for the first few embeddings (by default we show 6):

In [None]:
from notebook_utils.embeddings import show_embeddings_histogram

bands_to_plot = 6

show_embeddings_histogram(data, bands_to_plot)
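
If you want the underlying numbers rather than a plot, per-band histograms can be computed directly with NumPy. The sketch below uses a hypothetical helper `band_histograms` on synthetic data shaped `(bands, height, width)`; it is not the implementation behind `show_embeddings_histogram`.

```python
import numpy as np


def band_histograms(data, n_bands=6, bins=50):
    """Return a list of (counts, bin_edges) for the first n_bands bands.

    Non-finite pixels (NaN, inf) are ignored. Hypothetical helper.
    """
    results = []
    for band in data[:n_bands]:
        values = band[np.isfinite(band)]
        counts, edges = np.histogram(values, bins=bins)
        results.append((counts, edges))
    return results


# Synthetic stand-in for an embeddings raster, with one missing pixel
rng = np.random.default_rng(42)
demo = rng.normal(size=(8, 32, 32)).astype(np.float32)
demo[0, 0, 0] = np.nan

hists = band_histograms(demo)
print("bands summarized:", len(hists))
```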

Now we run PCA on the embeddings and show the first 3 principal components as an RGB image:

In [None]:
from notebook_utils.embeddings import run_pca_on_embeddings

pca_img, pca_min, pca_max = run_pca_on_embeddings(data)
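
For readers curious about what such a projection involves, the sketch below runs PCA with a plain NumPy SVD and rescales the first three components to [0, 1] for display. The helper `pca_to_rgb` is hypothetical and assumes the raster contains no NaN values; `run_pca_on_embeddings` may handle scaling and missing data differently.

```python
import numpy as np


def pca_to_rgb(data, n_components=3):
    """Project a (bands, height, width) array onto its first principal
    components and min-max scale each component to [0, 1].

    Hypothetical helper; assumes all values are finite.
    """
    bands, height, width = data.shape
    pixels = data.reshape(bands, -1).T.astype(np.float64)  # (n_pixels, bands)
    centered = pixels - pixels.mean(axis=0)

    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T  # (n_pixels, n_components)

    lo, hi = scores.min(axis=0), scores.max(axis=0)
    scaled = (scores - lo) / np.where(hi > lo, hi - lo, 1.0)
    return scaled.reshape(height, width, n_components)


rng = np.random.default_rng(0)
demo = rng.normal(size=(16, 20, 30))
rgb = pca_to_rgb(demo)
print("RGB image shape:", rgb.shape)
```

The resulting array can be passed to `matplotlib.pyplot.imshow` as an RGB image.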

Finally, we plot a pseudo-RGB image from three arbitrarily chosen embedding bands:

In [None]:
from notebook_utils.embeddings import plot_embeddings_as_rgb

plot_embeddings_as_rgb(data)
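
A pseudo-RGB rendering of this kind usually boils down to picking three bands and stretching each to the displayable range. The sketch below uses a 2-98 percentile stretch; both the helper name `bands_to_rgb` and the choice of stretch are assumptions and not necessarily what `plot_embeddings_as_rgb` does.

```python
import numpy as np


def bands_to_rgb(data, band_indices=(0, 1, 2), percentiles=(2, 98)):
    """Stack three bands of a (bands, height, width) array into an
    (height, width, 3) image, stretching each channel to [0, 1].

    Hypothetical helper for illustration.
    """
    rgb = np.stack([data[i] for i in band_indices], axis=-1).astype(np.float32)
    for c in range(3):
        # Percentile stretch is robust to a few extreme pixel values
        lo, hi = np.nanpercentile(rgb[..., c], percentiles)
        rgb[..., c] = np.clip((rgb[..., c] - lo) / max(hi - lo, 1e-12), 0.0, 1.0)
    return rgb


rng = np.random.default_rng(1)
demo = rng.normal(size=(8, 16, 16)).astype(np.float32)
rgb = bands_to_rgb(demo, band_indices=(0, 3, 5))
print("pseudo-RGB shape:", rgb.shape)
```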

## Next Steps
You can now:
- Feed embeddings into clustering or dimensionality reduction for pattern discovery.
- Sample embeddings at reference points for model training.
- Compare embeddings across seasons or AOIs.
- ...
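
As an illustration of sampling embeddings at reference points, the sketch below converts map coordinates to row/column indices for a north-up raster and pulls out the embedding vector at each point. The helper `sample_at_points` and the toy origin and pixel size are hypothetical; in practice these values would come from the raster profile's transform, and a library such as rasterio offers its own utilities for this.

```python
import numpy as np


def sample_at_points(data, xs, ys, x_origin, y_origin, pixel_size):
    """Sample a (bands, height, width) north-up raster at map coordinates.

    (x_origin, y_origin) is the top-left corner of the raster.
    Points outside the raster get NaN. Hypothetical helper.
    """
    cols = np.floor((np.asarray(xs) - x_origin) / pixel_size).astype(int)
    rows = np.floor((y_origin - np.asarray(ys)) / pixel_size).astype(int)

    _, height, width = data.shape
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)

    samples = np.full((len(cols), data.shape[0]), np.nan, dtype=np.float32)
    samples[inside] = data[:, rows[inside], cols[inside]].T
    return samples


# Toy raster: 2 bands, 3 rows, 4 columns, 10 m pixels
demo = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
samples = sample_at_points(
    demo,
    xs=[1005, 1035, 900],
    ys=[1995, 1975, 1995],
    x_origin=1000,
    y_origin=2000,
    pixel_size=10,
)
print(samples)
```

Each row of `samples` is one reference point and each column one embedding band, which is the layout most ML libraries expect for training features.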