## Extracting pre-processed inputs

This notebook demonstrates how you can extract, for one or multiple patches, pre-processed satellite time series required for running a cropland or crop type model locally on your machine.

### Why?

Having a set of patches available comes in handy during the development of a custom crop type model. It allows you to quickly test different model set-ups: each time you train a new model, you can immediately apply it to the same set of patches and check for improvements. Because you do not have to deploy and run the model on CDSE, this drastically reduces the time required to get to your ideal crop model!

### How does it work?

All you need to specify is:
- the geometry of one or multiple patches (large patches will be split automatically)
- the start and end date of the time series to be extracted (can be specified per patch)

For each of the specified geometries, the notebook will then launch an openEO processing job on the Copernicus Data Space Ecosystem (CDSE). Each job extracts all relevant Sentinel-1, Sentinel-2, meteorological and digital elevation data used by the WorldCereal classification algorithms to predict cropland and crop types.

<div class="alert alert-block alert-warning">
<b>PREREQUISITE:</b> <br>
This means you need a <a href="https://dataspace.copernicus.eu/" target="_blank">CDSE account</a> in order to proceed!
</div>

<div class="alert alert-block alert-warning">
<b>NOTE ON AOI SIZE:</b> <br>
Given the purpose of these patches, we HIGHLY recommend keeping individual patches smaller than 20 x 20 km.<br>
Larger patches are possible, but they will automatically be split into 20 x 20 km sub-tiles.<br>
Keep in mind that these extraction jobs will be billed to your CDSE account, so start small and evaluate the cost!<br>

Upscaling of this workflow outside a notebook environment can be done through a dedicated script located in worldcereal-classification > scripts > inference > collect_inputs.py.
</div>
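
To get a feel for how many sub-tiles (and therefore processing jobs) a larger AOI might produce, the sketch below estimates the extent of a longitude/latitude bounding box in kilometres and divides it into 20 km tiles. This is only a rough approximation: the function name and the example coordinates are hypothetical, and the actual splitting logic used by WorldCereal may place tiles differently.

```python
import math

TILE_SIZE_KM = 20.0         # sub-tile edge length mentioned in the note above
KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def estimate_tile_count(west, south, east, north, tile_km=TILE_SIZE_KM):
    """Roughly estimate how many tile_km x tile_km tiles cover a lon/lat bbox."""
    # One degree of longitude shrinks with the cosine of the latitude.
    mid_lat = math.radians((south + north) / 2.0)
    width_km = (east - west) * KM_PER_DEGREE_LAT * math.cos(mid_lat)
    height_km = (north - south) * KM_PER_DEGREE_LAT
    n_cols = max(1, math.ceil(width_km / tile_km))
    n_rows = max(1, math.ceil(height_km / tile_km))
    return width_km, height_km, n_cols * n_rows


# Hypothetical bounding box (degrees): west, south, east, north
width_km, height_km, n_tiles = estimate_tile_count(4.0, 51.0, 4.5, 51.3)
print(f"~{width_km:.1f} km x {height_km:.1f} km, roughly {n_tiles} tile(s)")
```

If the estimate is well above one tile, consider drawing a smaller patch first to evaluate the cost.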

### Step 1: Specify your area(s) of interest (AOI)

Execute the next cell to show an interactive widget, allowing you to specify the location of your patches.

**Option 1: Draw on the map**<br>
Use the rectangle or polygon buttons on the left-hand side to start drawing.<br><br>
Each time you finish drawing, you are asked to provide a short, meaningful identifier for your area. <br>
The drawn area is only added to your list of AOIs after you click the "Submit" button.<br><br>
Use the bin button to delete any of the drawn objects.

**Option 2: Upload an existing vector file containing your patches**<br>
Use the Upload button in the upper right corner to upload an existing file containing your areas of interest.<br><br>
Supported file formats: zipped shapefile (.zip), GeoPackage (.gpkg) and Parquet (.parquet).<br>
The geometries will be automatically reprojected to EPSG:4326.<br><br>
After selecting your file, you will need to specify which attribute contains a unique identifier for each AOI.<br>
All other attributes in your file will be dropped.
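
If you want to prepare or check such a file before uploading it, the following sketch builds a small GeoPackage with GeoPandas. The widget handles reprojection and attribute dropping for you, so this step is optional. The column name `patch_id`, the file name and the coordinates are hypothetical and chosen for illustration only.

```python
from pathlib import Path
import tempfile

import geopandas as gpd
from shapely.geometry import box

# Two hypothetical 10 km x 10 km patches in a projected CRS (UTM zone 31N).
patches = gpd.GeoDataFrame(
    {
        "patch_id": ["field_a", "field_b"],  # hypothetical identifier column
        "comment": ["test", "test"],         # extra attribute, not needed
    },
    geometry=[
        box(600_000, 5_650_000, 610_000, 5_660_000),
        box(620_000, 5_650_000, 630_000, 5_660_000),
    ],
    crs="EPSG:32631",
)

# Each AOI needs a unique identifier.
if not patches["patch_id"].is_unique:
    raise ValueError("AOI identifiers must be unique")

# Keep only the identifier and geometry, and reproject to EPSG:4326.
patches = patches[["patch_id", "geometry"]].to_crs("EPSG:4326")

# Write to a temporary folder; replace with your own output location.
out_path = Path(tempfile.mkdtemp()) / "my_patches.gpkg"
patches.to_file(out_path, driver="GPKG")
print(f"Saved {len(patches)} patches to {out_path}")
```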

In [None]:
from worldcereal.utils.map import ui_map

map = ui_map(mode="multi")

Now save your area(s) of interest for future reference. You will be asked to provide a short descriptive name.

In [None]:
from pathlib import Path

name = input("Name for the bbox file (without extension): ")
bbox_dir = Path("./bbox")
bbox_dir.mkdir(exist_ok=True)
aoi_path = map.save_gdf(bbox_dir, outputname=name)

### Step 2: Select your processing period

**Option 1: Define one period valid for all your AOIs**

Execute the cell below to specify your year and season of interest using the interactive slider.<br>
Based on your selection, we will extract exactly one year of data, counting back from your season's end date.<br>
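
As an illustration of what "one year, counting back from the end date" means, here is a minimal sketch using only the standard library. The function name is hypothetical, and the slider may align dates differently (for example to month boundaries), so treat the result as indicative.

```python
from datetime import date, timedelta


def one_year_window(season_end):
    """Return (start, end) of the 12-month period that ends on season_end."""
    try:
        same_day_last_year = season_end.replace(year=season_end.year - 1)
    except ValueError:
        # season_end is 29 February and the previous year is not a leap year
        same_day_last_year = season_end.replace(year=season_end.year - 1, day=28)
    # Start the day after, so the window spans one year and not one year plus a day.
    return same_day_last_year + timedelta(days=1), season_end


start, end = one_year_window(date(2024, 8, 31))
print(start.isoformat(), "to", end.isoformat())
```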

**Option 2: Use the WorldCereal crop calendars**

WorldCereal has identified the two dominant growing seasons for each location on the planet. By simply specifying a year of interest, we automatically select the 12 months needed to cover the two seasons ending in your specified year.<br>

> Note that with this option, the time period is computed for each AOI separately, so different AOIs can have different time periods.

In [None]:
# RUN THIS CELL IF YOU CHOOSE OPTION 1
# and drag the slider to select your season of interest for all AOIs

from notebook_utils.dateslider import date_slider

processing_slider = date_slider()

In [None]:
# RUN THIS CELL IF YOU CHOOSE OPTION 1
# here we extract the selected season from the slider
processing_period = processing_slider.get_selected_dates()
year = None

In [None]:
# RUN THIS CELL IF YOU CHOOSE OPTION 2
# manually change the year below to your year of interest for all AOIs
year = 2024
processing_period = None
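
The cells above each set the variable of the unused option to `None`. Assuming that exactly one of `processing_period` and `year` should be set before launching the jobs, a small guard like the one below can catch a mix-up early. The function name and the example values are hypothetical, and the real type of `processing_period` may differ from the tuple used here.

```python
def resolve_period_option(processing_period, year):
    """Report which Step 2 option is active; raise if none or both are set."""
    if (processing_period is None) == (year is None):
        raise ValueError(
            "Set exactly one of `processing_period` (Option 1) or `year` (Option 2)."
        )
    if processing_period is not None:
        return "Option 1: one period for all AOIs"
    return "Option 2: WorldCereal crop calendars"


# Hypothetical values for illustration only
print(resolve_period_option(("2023-09-01", "2024-08-31"), None))
print(resolve_period_option(None, 2024))
```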

### Step 3: Launch the processing job(s)

All set!

Execute the next cell to start extracting the EO time series for your AOIs!

When you execute this for the first time, you will be asked to connect to CDSE by clicking the link displayed below the cell.

You will be able to monitor the processing through a map and a table indicating the status of your processing jobs.<br>
(If your AOIs are located far apart, they will probably not all show on the map.)<br>
Detailed logs will also be shown.

Results will be automatically saved in a folder named after the name you provided for your AOIs earlier: `./preprocessed_inputs/<name>`.<br>

If desired, you can also specify a preferred orbit state for the Sentinel-1 data. If not provided, the orbit state will be automatically determined based on the availability of data.

Extracting inputs for a small area takes around 10 minutes. Hang in there!

In [None]:
from notebook_utils.preprocessed_inputs import collect_worldcereal_inputs_notebook

aoi_gdf = map.get_gdf()

outdir = Path(f'./preprocessed_inputs/{name}')

s1_orbit_state = None  # or 'ASCENDING' / 'DESCENDING'

job_manager = collect_worldcereal_inputs_notebook(
    aoi_gdf,
    outdir,
    temporal_extent=processing_period,
    year=year,
    s1_orbit_state=s1_orbit_state,
)

### Step 4: Check results!

Let's first have a look at the status of your processing job(s).

In [None]:
from notebook_utils.job_manager import check_job_status

job_status_counts = check_job_status(outdir)

Each processing job should have produced one NetCDF file.<br>

Let's list the output files we got and check the content of the first one:

In [None]:
from notebook_utils.preprocessed_inputs import fetch_inputs_results_from_outdir
import xarray as xr

outfiles = fetch_inputs_results_from_outdir(outdir)
outfile = outfiles[0]
ds = xr.open_dataset(outfile)
ds

We can do a quick quality check on the extracted data:

In [None]:
from notebook_utils.preprocessed_inputs import get_band_statistics_netcdf, visualize_timeseries_netcdf

stats = get_band_statistics_netcdf(ds)
visualize_timeseries_netcdf(ds, band="NDVI", npixels=6, random_seed=42)
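
For reference, NDVI is the normalized difference between near-infrared (NIR) and red reflectance: (NIR - red) / (NIR + red). The sketch below computes it with NumPy on synthetic reflectance values. Whether the helper above derives NDVI in exactly this way, and which band names it reads from the dataset, is not shown here, so treat this as a standalone illustration.

```python
import numpy as np


def ndvi(nir, red):
    """Compute NDVI = (NIR - red) / (NIR + red), returning NaN where NIR + red is 0."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Synthetic reflectances: dense vegetation, sparse vegetation, no signal
values = ndvi([0.45, 0.30, 0.0], [0.05, 0.10, 0.0])
print(values)
```

Values close to 1 indicate dense green vegetation, while values near 0 are typical of bare soil.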

All done!<br>