In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os

os.environ['EOTDL_API_URL'] = 'https://api.eotdl.com/'
# os.environ['EOTDL_API_URL'] = 'http://localhost:8000/'

In this use case we show how to perform feature engineering with openEO within EOTDL.

https://github.com/earthpulse/eotdl/issues/190


1. Stage the EuroCrops dataset with EOTDL.
2. Filter the EuroCrops dataset to create a subset of parcels for one country, e.g., 8 crop classes with 1000 examples each.
3. Run feature engineering with openEO, creating temporal metrics from S1 and S2 time series (temporally optimised for the crop classes of interest). Store the feature engineering process graph with the training datasets in EOTDL.
4. Use EOTDL functionality to train a model (for this, the features need to be retrieved). Store the model along with the openEO process graph in EOTDL.
5. Use the model to run inference (possibly from within EOTDL) on an openEO back-end such as CDSE or openEO Platform. Make use of the feature engineering process graph stored along with the EOTDL model.

## 1. Stage EuroCrops from EOTDL

The dataset can be found at https://www.eotdl.com/datasets/EuroCrops/. It contains a single zip file, which in turn contains one zip file per country with the shapefiles (16 in total).

> Uncomment the following cells to stage the dataset.

In [3]:
# !eotdl datasets get EuroCrops -v 1 -f -a
# !unzip -o ~/.cache/eotdl/datasets/EuroCrops/EuroCrops.zip -d data/

In [4]:
# from glob import glob

# zips = glob('data/*.zip')

# zips

In [5]:
# # unzip shapefiles

# import zipfile

# for zip_file in zips:
# 	with zipfile.ZipFile(zip_file, 'r') as zip_ref:
# 		zip_ref.extractall('data/')


In [6]:
# cleanup

# !rm -rf data/*.zip
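Because the outer archive contains one zip per country, the two unzip steps above can also be done in a single pass with the standard library, reading each inner archive into memory instead of writing it to `data/` first. This is a sketch, not part of the EOTDL API: `extract_nested_zip` is a hypothetical helper, and the usage path below assumes the default cache location used in the `eotdl datasets get` cell.

```python
import io
import zipfile
from pathlib import Path


def extract_nested_zip(outer_zip, dst):
    """Extract the contents of every inner .zip stored in `outer_zip` into `dst`.

    Inner archives are read into memory, so only their contents are written to disk.
    Returns the list of extracted member names.
    """
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(outer_zip) as outer:
        for name in outer.namelist():
            if not name.lower().endswith(".zip"):
                continue  # skip anything that is not a per-country archive
            with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                inner.extractall(dst)
                extracted.extend(inner.namelist())
    return extracted
```

For example, `extract_nested_zip(os.path.expanduser('~/.cache/eotdl/datasets/EuroCrops/EuroCrops.zip'), 'data/')`. The trade-off is memory: each country archive is held in RAM while it is extracted, which may be too much for the largest countries.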

List of all the shapefiles in the dataset.

In [63]:
from glob import glob

shapefiles = glob('data/**/*.shp', recursive=True)

shapefiles

['data/EE_2021_EC21.shp',
 'data/NL_2020_EC21.shp',
 'data/DK_2019_EC21.shp',
 'data/SI_2021_EC21.shp',
 'data/DE_NRW_2021_EC21.shp',
 'data/AT_2021_EC21.shp',
 'data/BE_VLG_2021_EC21.shp',
 'data/DE_LS_2021_EC21.shp',
 'data/LT_2021_EC.shp',
 'data/filtered_gdf.shp',
 'data/SK_2021_EC21.shp',
 'data/LV_2021_EC21.shp',
 'data/SE/SE_2021_EC21.shp',
 'data/NA/ES_NA_2020_EC21.shp',
 'data/RO/RO_ny_EC21.shp',
 'data/HR/HR_2020_EC21.shp',
 'data/FR/FR_2018_EC21.shp']
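The glob also picks up `data/filtered_gdf.shp`, which is written later in step 2, so selecting a file by position (`shapefiles[0]` or a random index) can land on a derived file. A sketch that keys the original EuroCrops files by their region prefix instead, assuming the naming pattern seen in the listing above (`EE_2021_EC21`, `DE_NRW_2021_EC21`, `RO_ny_EC21`, `LT_2021_EC`); `index_eurocrops_shapefiles` is a hypothetical helper:

```python
import re
from pathlib import Path

# Stems seen in this dataset: "EE_2021_EC21", "DE_NRW_2021_EC21", "RO_ny_EC21", "LT_2021_EC"
EUROCROPS_STEM = re.compile(r"^(?P<region>[A-Z]{2}(?:_[A-Z]+)?)_(?P<year>\d{4}|ny)_EC(?:21)?$")


def index_eurocrops_shapefiles(paths):
    """Map each region code (e.g. 'EE', 'DE_NRW', 'ES_NA') to its shapefile path.

    Files that do not follow the EuroCrops naming pattern, such as derived outputs, are skipped.
    """
    index = {}
    for path in paths:
        match = EUROCROPS_STEM.match(Path(path).stem)
        if match:
            index[match.group("region")] = path
    return index
```

With `eurocrops = index_eurocrops_shapefiles(shapefiles)`, `eurocrops['EE']` selects the Estonian file explicitly, and `np.random.choice(list(eurocrops.values()))` draws a random country without ever picking `filtered_gdf.shp`.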

In [8]:
import geopandas as gpd

path = shapefiles[0]

gdf = gpd.read_file(path)

gdf.head()


Unnamed: 0,taotlusaas,pollu_id,pindala_ha,taotletud_,taotletu_1,niitmise_t,niitmise_1,viimase_mu,taotletu_2,taotleja_n,taotleja_r,EC_trans_n,EC_hcat_n,EC_hcat_c,geometry
0,2021,19994165,0.25,Karjatamine väljaspool põllumaj. maad,Karjatamine väljaspool põllumaj. maad,,,2021/05/02 14:37:52.000,,FIE,,Rough grazings,pasture_meadow_grassland_grass,3302000000,"POLYGON ((26.50243 59.31839, 26.50244 59.31843..."
1,2021,19990783,1.7,rohttaimed,Püsirohumaa,Niidetud,28.06.2021-04.07.2021,2021/05/02 06:59:17.000,Kliimat ja keskkonda säästvate põllumajandusta...,ERAISIK,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((24.54648 58.86884, 24.54674 58.86879..."
2,2021,19990784,0.49,rohttaimed,Püsirohumaa,Ei kuulu jälgimisele,,2021/05/02 06:59:17.000,Kliimat ja keskkonda säästvate põllumajandusta...,ERAISIK,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((24.54597 58.86827, 24.54668 58.86816..."
3,2021,19996106,0.54,talinisu allakülvita,Põllukultuurid,,,2021/05/02 20:58:12.000,Kliimat ja keskkonda säästvate põllumajandusta...,ERAISIK,,Winter wheat,winter_common_soft_wheat,3301010101,"POLYGON ((27.42837 58.11975, 27.42839 58.11972..."
4,2021,19990620,2.48,"punane ristik (vähemalt 80% ristikut, kuni 20%...",Põllukultuurid,Niidetud,06.07.2021-11.07.2021,2021/07/05 07:26:35.000,Kliimat ja keskkonda säästvate põllumajandusta...,TAMSAMÄE OÜ,11350602.0,Red clover (at least 80% clover up to 20% gras...,clover,3301090303,"POLYGON ((26.66816 57.82049, 26.66815 57.8205,..."


In [9]:
# columns
gdf.columns

Index(['taotlusaas', 'pollu_id', 'pindala_ha', 'taotletud_', 'taotletu_1',
       'niitmise_t', 'niitmise_1', 'viimase_mu', 'taotletu_2', 'taotleja_n',
       'taotleja_r', 'EC_trans_n', 'EC_hcat_n', 'EC_hcat_c', 'geometry'],
      dtype='object')

## 2. Filter the EuroCrops dataset

Filter the EuroCrops dataset to create a subset of parcels for one country, e.g., 8 crop classes with 1000 examples each.
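The filtering in this step can be wrapped into one reusable function. The sketch below counts parcels per class with `value_counts` (equivalent to the dictionary comprehension used below, but computed in one pass) and uses a seeded random generator so the class selection is reproducible. `balanced_subset` is a hypothetical helper, not part of EOTDL.

```python
import numpy as np
import pandas as pd


def balanced_subset(df, label_col, num_classes, samples, seed=42):
    """Draw `samples` rows from each of `num_classes` randomly chosen classes.

    Only classes with at least `samples` rows are eligible, so every chosen class can be fully sampled.
    """
    counts = df[label_col].value_counts()  # rows per class, most frequent first
    eligible = counts[counts >= samples].index.to_numpy()
    if len(eligible) < num_classes:
        raise ValueError(f"only {len(eligible)} classes have at least {samples} rows")
    rng = np.random.default_rng(seed)
    chosen = rng.choice(eligible, size=num_classes, replace=False)
    subset = df[df[label_col].isin(chosen)]
    return subset.groupby(label_col).sample(n=samples, random_state=seed)
```

For example, `filtered_gdf = balanced_subset(gdf, 'EC_hcat_n', num_classes=2, samples=10)` reproduces the small subset used below.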

In [10]:
# random country

import numpy as np

ix = np.random.randint(0, len(shapefiles))
country = shapefiles[ix]

country

'data/LV_2021_EC21.shp'

In [11]:
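# note: this reads `path` (the first shapefile, Estonia), not the randomly selected `country`;
# the outputs below come from the Estonian file. Use gpd.read_file(country) to work on the sampled country.
gdf = gpd.read_file(path)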

In [12]:
crop_classes = gdf['EC_hcat_n'].unique()

crop_classes

array(['pasture_meadow_grassland_grass', 'winter_common_soft_wheat',
       'clover', 'peas', 'winter_barley', 'winter_rapeseed_rape',
       'spring_barley', 'fresh_vegetables', 'fallow_land_not_crop',
       'orchards_fruits', 'potatoes', 'oats', 'spring_common_soft_wheat',
       'not_known_and_other', 'buckwheat',
       'legumes_dried_pulses_protein_crops', 'raspberry_raspberries',
       'legumes_harvested_green', 'mangelwurzel_fodder_beet', 'melilot',
       'mustard', 'lolium_ryegrass', 'alfalfa_lucerne', 'strawberries',
       'apples', 'blueberry', 'unspecified_cereals', 'beans', 'rye',
       'rhubarb', 'spring_rapeseed_rape', 'winter_triticale',
       'spring_triticale', 'nurseries_nursery', 'coriander',
       'hippophae_sea_buckthorns_seaberry', 'blackcurrant_cassis',
       'willows_osiers', 'beetroot_beets', 'grain_maize_corn_popcorn',
       'pumpkin_squash_gourd', 'cucumber_pickle', 'aronia_chokeberries',
       'aromatic_medicinal_culinary_plants_spices_herbs', 'red

In [13]:
# number of samples per class

num_samples_per_class = {class_: len(gdf[gdf['EC_hcat_n'] == class_]) for class_ in crop_classes}

num_samples_per_class = dict(sorted(num_samples_per_class.items(), key=lambda x: x[1], reverse=True))

num_samples_per_class

{'pasture_meadow_grassland_grass': 84107,
 'winter_common_soft_wheat': 13726,
 'legumes_harvested_green': 13152,
 'spring_barley': 10737,
 'oats': 6911,
 'clover': 6877,
 'winter_rapeseed_rape': 6012,
 'spring_common_soft_wheat': 5525,
 'peas': 4210,
 'potatoes': 3438,
 'winter_barley': 2245,
 'fallow_land_not_crop': 1922,
 'beans': 1719,
 'fresh_vegetables': 1600,
 'rye': 1427,
 'alfalfa_lucerne': 1241,
 'spring_rapeseed_rape': 1236,
 'buckwheat': 1140,
 'grain_maize_corn_popcorn': 852,
 'strawberries': 846,
 'orchards_fruits': 832,
 'legumes_dried_pulses_protein_crops': 831,
 'winter_triticale': 533,
 'finola': 522,
 'melilot': 446,
 'apples': 383,
 'hippophae_sea_buckthorns_seaberry': 322,
 'raspberry_raspberries': 299,
 'blackcurrant_cassis': 275,
 'not_known_and_other': 234,
 'mustard': 209,
 'spring_triticale': 195,
 'unspecified_cereals': 127,
 'aromatic_medicinal_culinary_plants_spices_herbs': 118,
 'blueberry': 94,
 'garlic': 93,
 'carrots_daucus': 84,
 'lolium_ryegrass': 83,


In [14]:
# import matplotlib.pyplot as plt

# plt.figure(figsize=(5, 25))
# plt.barh(list(num_samples_per_class.keys()), list(num_samples_per_class.values()))
# plt.tight_layout()
# plt.show()

In [15]:
# Each openEO job runs separately (one per parcel), so keep the number of classes and samples per class small.
# The full use case targets 1000 samples for each of 8 classes:
# samples = 1000
# num_classes = 8

samples = 10
num_classes = 2

# keep only classes with at least `samples` examples
classes = [class_ for class_, count in num_samples_per_class.items() if count >= samples]

# pick `num_classes` random classes
classes = np.random.choice(classes, num_classes, replace=False)

classes


array(['oats', 'calendula_marigold'], dtype='<U47')

In [16]:
filtered_gdf = gdf[gdf['EC_hcat_n'].isin(classes)]

filtered_gdf = filtered_gdf.groupby('EC_hcat_n').sample(n=samples, random_state=42)

filtered_gdf.head()

Unnamed: 0,taotlusaas,pollu_id,pindala_ha,taotletud_,taotletu_1,niitmise_t,niitmise_1,viimase_mu,taotletu_2,taotleja_n,taotleja_r,EC_trans_n,EC_hcat_n,EC_hcat_c,geometry
101026,2021,21383056,4.0,"saialill, harilik",Põllukultuurid,,,2021/05/21 00:33:53.000,Kliimat ja keskkonda säästvate põllumajandusta...,ABENORA OÜ,11177993,English marigold,calendula_marigold,3301061210,"POLYGON ((25.66828 59.14383, 25.66851 59.14395..."
9231,2021,20138897,3.65,"saialill, harilik",Põllukultuurid,,,2021/05/06 19:57:28.000,Kliimat ja keskkonda säästvate põllumajandusta...,KEKKAR GRUPP OÜ,14414254,English marigold,calendula_marigold,3301061210,"POLYGON ((25.70297 58.96856, 25.70305 58.96863..."
140918,2021,21844594,0.33,"saialill, harilik",Põllukultuurid,,,2021/05/31 09:07:27.000,Kliimat ja keskkonda säästvate põllumajandusta...,OLEMARI TALU OÜ,14397171,English marigold,calendula_marigold,3301061210,"POLYGON ((26.5513 58.67763, 26.55129 58.67763,..."
161371,2021,22137173,4.0,"saialill, harilik",Põllukultuurid,,,2021/06/14 21:20:53.000,Kliimat ja keskkonda säästvate põllumajandusta...,SP AGRO OÜ,12805447,English marigold,calendula_marigold,3301061210,"POLYGON ((26.59734 59.18037, 26.59741 59.18038..."
32271,2021,20443741,2.59,"saialill, harilik",Põllukultuurid,,,2021/05/14 16:00:36.000,Kliimat ja keskkonda säästvate põllumajandusta...,MAHE KATI OÜ,11630537,English marigold,calendula_marigold,3301061210,"POLYGON ((25.99934 58.4477, 25.99927 58.44769,..."


In [17]:
assert len(filtered_gdf) == num_classes * samples

# save to disk
filtered_gdf.to_file('data/filtered_gdf.shp')


## 3. Feature Engineering with openEO

## 3.1 Feature engineering pipeline


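The first thing we need is a feature engineering pipeline. Here it consists of two openEO process graphs that compute weekly statistics from the Sentinel-1 and Sentinel-2 time series.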

In [18]:
!cat s1_weekly_statistics.json

{"process_graph":{"loadcollection1":{"process_id":"load_collection","arguments":{"bands":["VH","VV"],"id":"SENTINEL1_GRD","spatial_extent":{"from_parameter":"spatial_extent"},"temporal_extent":{"from_parameter":"temporal_extent"}}},"sarbackscatter1":{"process_id":"sar_backscatter","arguments":{"coefficient":"sigma0-ellipsoid","contributing_area":false,"data":{"from_node":"loadcollection1"},"elevation_model":"COPERNICUS_30","ellipsoid_incidence_angle":false,"local_incidence_angle":false,"mask":false,"noise_removal":true}},"aggregatetemporalperiod1":{"process_id":"aggregate_temporal_period","arguments":{"data":{"from_node":"sarbackscatter1"},"period":"week","reducer":{"process_graph":{"mean1":{"process_id":"mean","arguments":{"data":{"from_parameter":"data"}},"result":true}}}}},"applydimension1":{"process_id":"apply_dimension","arguments":{"data":{"from_node":"aggregatetemporalperiod1"},"dimension":"t","process":{"process_graph":{"quantiles1":{"process_id":"quantiles","arguments":{"data"
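Each file is an openEO process graph: a dictionary of nodes, each with a `process_id` and `arguments`, where `{"from_node": ...}` wires nodes together and `{"from_parameter": ...}` marks inputs supplied at run time. In the S1 graph above, `load_collection` takes `spatial_extent` and `temporal_extent` as parameters, which is what allows the same graph to be reused for every parcel. The helper below (a hypothetical name, standard library only) lists the processes, the top-level parameters and the result node of such a graph. It deliberately skips nested `process_graph` callbacks such as the `mean` reducer, whose `data` parameter is local to the callback; openEO expects exactly one top-level node flagged with `"result": true`.

```python
def describe_process_graph(graph):
    """Summarise a top-level openEO process graph.

    Returns the process_id of every node, the names of the parameters the graph
    expects via {"from_parameter": ...}, and the node(s) flagged with "result": true.
    """
    nodes = graph["process_graph"]
    parameters = set()

    def collect(value):
        if isinstance(value, dict):
            if "from_parameter" in value:
                parameters.add(value["from_parameter"])
            for key, child in value.items():
                # child callbacks (e.g. the `mean` reducer) have their own local parameters such as "data"
                if key != "process_graph":
                    collect(child)
        elif isinstance(value, list):
            for child in value:
                collect(child)

    for node in nodes.values():
        collect(node.get("arguments", {}))

    return {
        "processes": {node_id: node["process_id"] for node_id, node in nodes.items()},
        "parameters": parameters,
        "result_nodes": [node_id for node_id, node in nodes.items() if node.get("result")],
    }
```

Applied to the S2 file, e.g. `describe_process_graph(json.load(open('s2_weekly_statistics.json')))`, it shows whether that graph is parameterised the same way (its `load_collection` arguments are cut off in the output below).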

In [19]:
!cat s2_weekly_statistics.json

{
    "process_graph": {
        "loadcollection1": {
            "process_id": "load_collection",
            "arguments": {
                "bands": [
                    "B02",
                    "B03",
                    "B04",
                    "B05",
                    "B06",
                    "B07",
                    "B08",
                    "B8A",
                    "B11",
                    "B12"
                ],
                "id": "SENTINEL2_L2A",
                "properties": {
                    "eo:cloud_cover": {
                        "process_graph": {
                            "lte1": {
                                "process_id": "lte",
                                "arguments": {
                                    "x": {
                                        "from_parameter": "value"
                                    },
                                    "y": 75.0
                                },
                                "resul

We can ingest the pipelines into EOTDL. First, create a folder containing the metadata (`README.md`) and the pipeline files.

In [20]:
text = """---
name: EuroCropsPipeline
authors: 
  - eotdl
license: free
source: https://github.com/earthpulse/eotdl/tree/main/tutorials/usecases/openEO
---

# EuroCropsPipeline

This pipeline will extract features from a S1 and S2 time series for a given set of parcels in the EuroCrops dataset.
"""

os.makedirs('pipeline', exist_ok=True)
with open(f"pipeline/README.md", "w") as outfile:
    outfile.write(text)
    
!cp s1_weekly_statistics.json pipeline/.
!cp s2_weekly_statistics.json pipeline/.
!cat pipeline/README.md

---
name: EuroCropsPipeline
authors: 
  - eotdl
license: free
source: https://github.com/earthpulse/eotdl/tree/main/tutorials/usecases/openEO
---

# EuroCropsPipeline

This pipeline will extract features from a S1 and S2 time series for a given set of parcels in the EuroCrops dataset.
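The same front matter (`name`, `authors`, `license`, `source`) is written again for the model in step 4, so a small helper can generate both READMEs. This sketch only reproduces the fields used in this notebook; whether EOTDL accepts or requires other fields is not covered here, and `write_eotdl_readme` is a hypothetical name.

```python
from pathlib import Path


def write_eotdl_readme(folder, name, description, authors=("eotdl",), license_name="free", source=""):
    """Write folder/README.md with the YAML front matter used in this notebook and return its path."""
    authors_yaml = "\n".join(f"  - {author}" for author in authors)
    text = (
        "---\n"
        f"name: {name}\n"
        f"authors:\n{authors_yaml}\n"
        f"license: {license_name}\n"
        f"source: {source}\n"
        "---\n"
        "\n"
        f"# {name}\n"
        "\n"
        f"{description}\n"
    )
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    readme = folder / "README.md"
    readme.write_text(text, encoding="utf-8")
    return readme
```

For example: `write_eotdl_readme('pipeline', 'EuroCropsPipeline', 'This pipeline will extract features ...', source='https://github.com/earthpulse/eotdl/tree/main/tutorials/usecases/openEO')`.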


Then, ingest the folder into EOTDL, either with the Python library or with the CLI.

In [21]:
from eotdl.fe import ingest_openeo 

ingest_openeo('pipeline')

Ingesting directory: pipeline


Ingesting files: 100%|██████████| 3/3 [00:01<00:00,  2.51it/s]


A new version was created, your dataset has changed.
Num changes: 1


PosixPath('pipeline/catalog.parquet')

In [22]:
!eotdl pipelines ingest -p pipeline

Ingesting directory: pipeline
Ingesting files: 100%|████████████████████████████| 3/3 [00:01<00:00,  2.43it/s]
A new version was created, your dataset has changed.
Num changes: 1


In [23]:
!eotdl pipelines list

['EuroCropsPipeline']


## 3.2 Running the pipeline

The pipeline can be retrieved from EOTDL in the same two ways, with the Python library or with the CLI.

In [27]:
from eotdl.fe import stage_pipeline 

stage_pipeline('EuroCropsPipeline', path="pipeline2")

'pipeline2/EuroCropsPipeline'

In [28]:
!eotdl pipelines get EuroCropsPipeline -p pipeline2 -a -f

Staging assets: 100%|█████████████████████████████| 3/3 [00:02<00:00,  1.09it/s]
Data available at pipeline2/EuroCropsPipeline


In [30]:
!ls pipeline2/EuroCropsPipeline

README.md                 s1_weekly_statistics.json
catalog.v3.parquet        s2_weekly_statistics.json


However, openEO needs to access the pipelines through public URLs.

> The URL must return the JSON content itself, not serve it as a file download.


In [3]:
from eotdl.files import get_file_content_url

s1_weekly_statistics_url = get_file_content_url('s1_weekly_statistics.json', 'EuroCropsPipeline', 'pipelines')
s2_weekly_statistics_url = get_file_content_url('s2_weekly_statistics.json', 'EuroCropsPipeline', 'pipelines')

s1_weekly_statistics_url, s2_weekly_statistics_url

('https://api.eotdl.com/pipelines/68306c8ee2cef594e0c0ef07/raw/s1_weekly_statistics.json',
 'https://api.eotdl.com/pipelines/68306c8ee2cef594e0c0ef07/raw/s2_weekly_statistics.json')

In [4]:
import requests

response = requests.get(s1_weekly_statistics_url)
print(response.text)


{"process_graph":{"loadcollection1":{"process_id":"load_collection","arguments":{"bands":["VH","VV"],"id":"SENTINEL1_GRD","spatial_extent":{"from_parameter":"spatial_extent"},"temporal_extent":{"from_parameter":"temporal_extent"}}},"sarbackscatter1":{"process_id":"sar_backscatter","arguments":{"coefficient":"sigma0-ellipsoid","contributing_area":false,"data":{"from_node":"loadcollection1"},"elevation_model":"COPERNICUS_30","ellipsoid_incidence_angle":false,"local_incidence_angle":false,"mask":false,"noise_removal":true}},"aggregatetemporalperiod1":{"process_id":"aggregate_temporal_period","arguments":{"data":{"from_node":"sarbackscatter1"},"period":"week","reducer":{"process_graph":{"mean1":{"process_id":"mean","arguments":{"data":{"from_parameter":"data"}},"result":true}}}}},"applydimension1":{"process_id":"apply_dimension","arguments":{"data":{"from_node":"aggregatetemporalperiod1"},"dimension":"t","process":{"process_graph":{"quantiles1":{"process_id":"quantiles","arguments":{"data"
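Printing `response.text` confirms the content by eye. A slightly stricter check, sketched with the standard library only, also verifies that the server does not mark the response as a download (`Content-Disposition: attachment`) and that the body parses as a JSON object with a `process_graph` key. This is an assumption about what makes a URL usable by openEO, derived from the note above; it does not call any openEO or EOTDL API, and both function names are hypothetical.

```python
import json
import urllib.request


def is_download(headers):
    """True if the response headers ask the client to save the body as a file."""
    disposition = headers.get("Content-Disposition") or ""
    return disposition.strip().lower().startswith("attachment")


def check_process_graph_url(url, timeout=30):
    """Fetch `url` and return the parsed process graph, raising ValueError if it looks unusable."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        if is_download(response.headers):
            raise ValueError(f"{url} is served as a file download")
        body = json.loads(response.read().decode("utf-8"))
    if not isinstance(body, dict) or "process_graph" not in body:
        raise ValueError(f"{url} does not contain an openEO process graph")
    return body
```

For example, `check_process_graph_url(s1_weekly_statistics_url)` returns the parsed graph or raises with the reason.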

Run feature engineering with openEO, creating temporal metrics from S1 and S2 time series (temporally optimised for the crop classes of interest).

In [5]:
import geopandas as gpd

gdf = gpd.read_file('data/filtered_gdf.shp')

gdf.shape

(20, 15)

Add the pipeline URLs to the GeoDataFrame.

In [6]:
gdf['s1_weekly_statistics_url'] = s1_weekly_statistics_url
gdf['s2_weekly_statistics_url'] = s2_weekly_statistics_url

gdf.head()

Unnamed: 0,taotlusaas,pollu_id,pindala_ha,taotletud_,taotletu_1,niitmise_t,niitmise_1,viimase_mu,taotletu_2,taotleja_n,taotleja_r,EC_trans_n,EC_hcat_n,EC_hcat_c,geometry,s1_weekly_statistics_url,s2_weekly_statistics_url
0,2021,21383056,4.0,"saialill, harilik",Põllukultuurid,,,2021/05/21 00:33:53.000,Kliimat ja keskkonda säästvate põllumajandusta...,ABENORA OÜ,11177993,English marigold,calendula_marigold,3301061210,"POLYGON ((25.66828 59.14383, 25.66851 59.14395...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...
1,2021,20138897,3.65,"saialill, harilik",Põllukultuurid,,,2021/05/06 19:57:28.000,Kliimat ja keskkonda säästvate põllumajandusta...,KEKKAR GRUPP OÜ,14414254,English marigold,calendula_marigold,3301061210,"POLYGON ((25.70297 58.96856, 25.70305 58.96863...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...
2,2021,21844594,0.33,"saialill, harilik",Põllukultuurid,,,2021/05/31 09:07:27.000,Kliimat ja keskkonda säästvate põllumajandusta...,OLEMARI TALU OÜ,14397171,English marigold,calendula_marigold,3301061210,"POLYGON ((26.5513 58.67763, 26.55129 58.67763,...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...
3,2021,22137173,4.0,"saialill, harilik",Põllukultuurid,,,2021/06/14 21:20:53.000,Kliimat ja keskkonda säästvate põllumajandusta...,SP AGRO OÜ,12805447,English marigold,calendula_marigold,3301061210,"POLYGON ((26.59734 59.18037, 26.59741 59.18038...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...
4,2021,20443741,2.59,"saialill, harilik",Põllukultuurid,,,2021/05/14 16:00:36.000,Kliimat ja keskkonda säästvate põllumajandusta...,MAHE KATI OÜ,11630537,English marigold,calendula_marigold,3301061210,"POLYGON ((25.99934 58.4477, 25.99927 58.44769,...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...


> This runs one openEO job per parcel, which is very slow and not cost-effective (~5 min per parcel). It can be sped up by running several jobs at once with `parallel_jobs`.
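As a rough order of magnitude, assume jobs run in waves of `parallel_jobs` and each takes about 5 minutes, ignoring queueing on the back-end and any concurrency limit it may impose. Under those assumptions the wall-clock time grows with the number of waves, ceil(n / parallel_jobs). The 20 parcels used here need 2 waves (about 10 minutes), while the full target of 8 × 1000 parcels would need 800 waves, roughly 67 hours. `estimate_wall_clock_hours` is a hypothetical helper for this arithmetic:

```python
import math


def estimate_wall_clock_hours(n_parcels, minutes_per_job=5.0, parallel_jobs=10):
    """Rough wall-clock time in hours for one openEO job per parcel, run `parallel_jobs` at a time.

    Assumes every job takes `minutes_per_job` and ignores queueing and failures.
    """
    waves = math.ceil(n_parcels / parallel_jobs)
    return waves * minutes_per_job / 60
```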

In [9]:
from eotdl.fe.openeo import eurocrops_point_extraction 

# start from a fresh job tracker file
!rm -rf jobs.csv

eurocrops_point_extraction(
    gdf, 
    start_date = "2024-01-01", 
    nb_months = 2, 
    job_tracker = 'jobs.csv', 
    parallel_jobs=10, 
    extra_cols=['EC_hcat_n']
)


Authenticated using refresh token.


In [3]:
import pandas as pd 

job = pd.read_csv('jobs.csv')
job

Unnamed: 0,geometry,crs,temporal_extent,s1_weekly_statistics_url,s2_weekly_statistics_url,EC_hcat_n,id,backend_name,status,start_time,running_start_time,cpu,memory,duration
0,"POLYGON ((25.66827701 59.1438296, 25.66850873 ...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-250610114337402e8783cc790f989e16,cdse,finished,2025-06-10T11:43:37Z,,181.968132084 cpu-seconds,604386.09375 mb-seconds,115 seconds
1,"POLYGON ((25.70297087 58.96855719, 25.70304916...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-2506101143544ce3851bd6b8a2b0cd73,cdse,finished,2025-06-10T11:43:54Z,,183.37440868800002 cpu-seconds,579515.6302083334 mb-seconds,188 seconds
2,"POLYGON ((26.55129624 58.6776346, 26.5512948 5...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-2506101144114e87adfabefc5820d9bb,cdse,finished,2025-06-10T11:44:11Z,2025-06-10T11:47:31Z,211.697147428 cpu-seconds,942888.9375 mb-seconds,216 seconds
3,"POLYGON ((26.59733773 59.18037039, 26.59740858...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-25061011442847d38aae326d5d7ef0f5,cdse,finished,2025-06-10T11:44:28Z,2025-06-10T11:47:31Z,169.839017443 cpu-seconds,977978.470703125 mb-seconds,241 seconds
4,"POLYGON ((25.99934068 58.44770006, 25.99926861...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-250610114445434bb5bdce03c6070314,cdse,finished,2025-06-10T11:44:45Z,2025-06-10T11:47:31Z,212.48763890799998 cpu-seconds,922621.4166666666 mb-seconds,218 seconds
5,"POLYGON ((25.12190965 58.18811529, 25.12179914...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-250610114502444a923b572692e6f436,cdse,finished,2025-06-10T11:45:02Z,2025-06-10T11:50:29Z,178.073198549 cpu-seconds,1153232.4166666665 mb-seconds,334 seconds
6,"POLYGON ((25.63674171 59.19314093, 25.63685957...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-2506101145184afaa17a894ade959753,cdse,finished,2025-06-10T11:45:18Z,2025-06-10T11:47:32Z,214.838795331 cpu-seconds,768602.427734375 mb-seconds,169 seconds
7,"POLYGON ((26.76718428 58.04747825, 26.76717541...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-2506101145354f8ab69845744a64d351,cdse,finished,2025-06-10T11:45:35Z,2025-06-10T11:50:31Z,197.638628325 cpu-seconds,1409532.099609375 mb-seconds,315 seconds
8,"POLYGON ((25.95496171 58.58241344, 25.95491367...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-25061011455349d49e6d8db88f8451e5,cdse,finished,2025-06-10T11:45:53Z,2025-06-10T11:50:31Z,185.678531479 cpu-seconds,1410277.4609375 mb-seconds,295 seconds
9,"POLYGON ((26.76707399 58.04743726, 26.76702201...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,calendula_marigold,j-250610114613465d8b6fb5b9627460dd,cdse,finished,2025-06-10T11:46:13Z,2025-06-10T11:50:31Z,234.71645667400003 cpu-seconds,2196871.806640625 mb-seconds,313 seconds
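The tracker stores resource usage as text (`'181.968132084 cpu-seconds'`, `'115 seconds'`). A small sketch, assuming every value starts with the number followed by its unit as in the table above, converts these columns to numbers so they can be summed or averaged; `job_costs` is a hypothetical name.

```python
import pandas as pd


def job_costs(jobs):
    """Return numeric cpu/memory/duration columns parsed from the job tracker's text columns."""
    columns = {"cpu": "cpu_seconds", "memory": "mb_seconds", "duration": "duration_seconds"}
    out = pd.DataFrame(index=jobs.index)
    for src, dst in columns.items():
        # keep the leading number, e.g. '181.968132084 cpu-seconds' -> 181.968132084; missing values become NaN
        out[dst] = pd.to_numeric(jobs[src].astype(str).str.split().str[0], errors="coerce")
    return out
```

`job_costs(job).sum()` then gives the total CPU-seconds, MB-seconds and job seconds of the run.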


In [4]:
# Initialize an empty list to store all dataframes
all_data = []

# Loop through each job and read its parquet file
for idx, _job in job.iterrows():
    try:
        job_data = pd.read_parquet(f'job_{_job["id"]}/timeseries.parquet')
        # Add job_id as a column to identify the source
        job_data['job_id'] = _job["id"]
        job_data['EC_hcat_n'] = _job["EC_hcat_n"]
        all_data.append(job_data)
    except Exception as e:
        print(f"Error reading job {_job['id']}: {e}")

# Concatenate all dataframes into one
if all_data:
    data = pd.concat(all_data, ignore_index=True)
    print(f"Successfully merged {len(all_data)} time series datasets")
else:
    data = pd.DataFrame()
    print("No time series data was loaded")

data

Successfully merged 20 time series datasets




Unnamed: 0,geometry,feature_index,B02_P10,B02_P25,B02_P50,B02_P75,B02_P90,B03_P10,B03_P25,B03_P50,...,VH_P50,VH_P75,VH_P90,VV_P10,VV_P25,VV_P50,VV_P75,VV_P90,job_id,EC_hcat_n
0,b'\x01\x03\x00\x00\x00\x01\x00\x00\x000\x00\x0...,0,,,,,,,,,...,0.003624,0.00547,0.010616,0.026428,0.03187,0.039562,0.052361,0.072581,j-250610114337402e8783cc790f989e16,calendula_marigold
1,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x11\x00...,0,,,,,,,,,...,0.00425,0.006289,0.012866,0.025365,0.030869,0.039567,0.052357,0.076722,j-2506101143544ce3851bd6b8a2b0cd73,calendula_marigold
2,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x13\x00...,0,,,,,,,,,...,0.004955,0.006882,0.010076,0.025439,0.030766,0.040386,0.051318,0.062309,j-2506101144114e87adfabefc5820d9bb,calendula_marigold
3,b'\x01\x03\x00\x00\x00\x01\x00\x00\x008\x00\x0...,0,,,,,,,,,...,0.006075,0.008646,0.01451,0.029551,0.035773,0.045899,0.061793,0.091778,j-25061011442847d38aae326d5d7ef0f5,calendula_marigold
4,"b""\x01\x03\x00\x00\x00\x01\x00\x00\x00\x15\x00...",0,,,,,,,,,...,0.004288,0.006369,0.011817,0.02342,0.027779,0.034683,0.045601,0.07481,j-250610114445434bb5bdce03c6070314,calendula_marigold
5,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\n\x00\x...,0,,,,,,,,,...,0.007189,0.008921,0.015544,0.055672,0.058913,0.062077,0.06514,0.097181,j-250610114502444a923b572692e6f436,calendula_marigold
6,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00-\x00\x0...,0,,,,,,,,,...,0.003256,0.004635,0.007868,0.018415,0.02211,0.027997,0.03693,0.055871,j-2506101145184afaa17a894ade959753,calendula_marigold
7,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\t\x00\x...,0,,,,,,,,,...,,,,,,,,,j-2506101145354f8ab69845744a64d351,calendula_marigold
8,"b'\x01\x03\x00\x00\x00\x01\x00\x00\x00""\x00\x0...",0,,,,,,,,,...,0.007069,0.010717,0.015627,0.033581,0.039622,0.050462,0.06171,0.094921,j-25061011455349d49e6d8db88f8451e5,calendula_marigold
9,b'\x01\x03\x00\x00\x00\x01\x00\x00\x009\x00\x0...,0,,,,,,,,,...,0.007424,0.010451,0.012721,0.032003,0.047651,0.060252,0.07042,0.096813,j-250610114613465d8b6fb5b9627460dd,calendula_marigold
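The same merge is needed again for the inference parcels in step 5, so the loop above can be turned into a function. This sketch additionally skips jobs whose `status` is not `finished` and only catches file errors (`OSError`, which includes a missing file) instead of every exception. The `job_{id}/timeseries.parquet` layout is taken from the cell above; `collect_job_features` is a hypothetical name.

```python
import pandas as pd


def collect_job_features(jobs, extra_cols=("EC_hcat_n",), reader=pd.read_parquet,
                         pattern="job_{id}/timeseries.parquet"):
    """Concatenate the per-job feature tables of a job tracker into one DataFrame.

    Each row is tagged with its `job_id` and the values of `extra_cols` copied from the tracker.
    """
    if "status" in jobs.columns:
        jobs = jobs[jobs["status"] == "finished"]  # unfinished jobs have no output yet
    frames = []
    for _, job_row in jobs.iterrows():
        path = pattern.format(id=job_row["id"])
        try:
            features = reader(path)
        except OSError as exc:  # e.g. FileNotFoundError for a missing output
            print(f"Skipping job {job_row['id']}: {exc}")
            continue
        tags = {col: job_row[col] for col in extra_cols}
        frames.append(features.assign(job_id=job_row["id"], **tags))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

For example, `data = collect_job_features(pd.read_csv('jobs.csv'))`.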


In [5]:
data.columns

Index(['geometry', 'feature_index', 'B02_P10', 'B02_P25', 'B02_P50', 'B02_P75',
       'B02_P90', 'B03_P10', 'B03_P25', 'B03_P50', 'B03_P75', 'B03_P90',
       'B04_P10', 'B04_P25', 'B04_P50', 'B04_P75', 'B04_P90', 'B05_P10',
       'B05_P25', 'B05_P50', 'B05_P75', 'B05_P90', 'B06_P10', 'B06_P25',
       'B06_P50', 'B06_P75', 'B06_P90', 'B07_P10', 'B07_P25', 'B07_P50',
       'B07_P75', 'B07_P90', 'B08_P10', 'B08_P25', 'B08_P50', 'B08_P75',
       'B08_P90', 'B8A_P10', 'B8A_P25', 'B8A_P50', 'B8A_P75', 'B8A_P90',
       'B11_P10', 'B11_P25', 'B11_P50', 'B11_P75', 'B11_P90', 'B12_P10',
       'B12_P25', 'B12_P50', 'B12_P75', 'B12_P90', 'VH_P10', 'VH_P25',
       'VH_P50', 'VH_P75', 'VH_P90', 'VV_P10', 'VV_P25', 'VV_P50', 'VV_P75',
       'VV_P90', 'job_id', 'EC_hcat_n'],
      dtype='object')
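In the rows shown above, every Sentinel-2 percentile (`B02_P10` to `B12_P90`) is NaN and one parcel has no Sentinel-1 values either; these are later imputed with 0. A possible cause, not verified here, is that the January to February 2024 window combined with the `eo:cloud_cover` ≤ 75 filter in the S2 graph leaves few or no usable scenes. Before training, the share of missing values per band can be checked with a short pandas sketch (hypothetical helper name) that pools each band's percentile columns:

```python
import pandas as pd


def missing_fraction_by_band(df):
    """Fraction of NaN values per band, pooling its percentile columns (e.g. B02_P10 ... B02_P90)."""
    stat_cols = [col for col in df.columns if "_P" in col]
    per_column = df[stat_cols].isna().mean()
    # group column names like 'B8A_P25' by their band prefix 'B8A'
    return per_column.groupby(lambda col: col.split("_P")[0]).mean()
```

`missing_fraction_by_band(data)` returns one value per band between 0 (complete) and 1 (entirely missing).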

In [6]:
data.to_csv('data/features.csv')


## 4. Train a model with EOTDL

We will train a simple random forest model on the features.


In [96]:
data = pd.read_csv('data/features.csv')

data

Unnamed: 0.1,Unnamed: 0,geometry,feature_index,B02_P10,B02_P25,B02_P50,B02_P75,B02_P90,B03_P10,B03_P25,...,VH_P50,VH_P75,VH_P90,VV_P10,VV_P25,VV_P50,VV_P75,VV_P90,job_id,EC_hcat_n
0,0,b'\x01\x03\x00\x00\x00\x01\x00\x00\x000\x00\x0...,0,,,,,,,,...,0.003624,0.00547,0.010616,0.026428,0.03187,0.039562,0.052361,0.072581,j-250610114337402e8783cc790f989e16,calendula_marigold
1,1,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x11\x00...,0,,,,,,,,...,0.00425,0.006289,0.012866,0.025365,0.030869,0.039567,0.052357,0.076722,j-2506101143544ce3851bd6b8a2b0cd73,calendula_marigold
2,2,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x13\x00...,0,,,,,,,,...,0.004955,0.006882,0.010076,0.025439,0.030766,0.040386,0.051318,0.062309,j-2506101144114e87adfabefc5820d9bb,calendula_marigold
3,3,b'\x01\x03\x00\x00\x00\x01\x00\x00\x008\x00\x0...,0,,,,,,,,...,0.006075,0.008646,0.01451,0.029551,0.035773,0.045899,0.061793,0.091778,j-25061011442847d38aae326d5d7ef0f5,calendula_marigold
4,4,"b""\x01\x03\x00\x00\x00\x01\x00\x00\x00\x15\x00...",0,,,,,,,,...,0.004288,0.006369,0.011817,0.02342,0.027779,0.034683,0.045601,0.07481,j-250610114445434bb5bdce03c6070314,calendula_marigold
5,5,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\n\x00\x...,0,,,,,,,,...,0.007189,0.008921,0.015544,0.055672,0.058913,0.062077,0.06514,0.097181,j-250610114502444a923b572692e6f436,calendula_marigold
6,6,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00-\x00\x0...,0,,,,,,,,...,0.003256,0.004635,0.007868,0.018415,0.02211,0.027997,0.03693,0.055871,j-2506101145184afaa17a894ade959753,calendula_marigold
7,7,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\t\x00\x...,0,,,,,,,,...,,,,,,,,,j-2506101145354f8ab69845744a64d351,calendula_marigold
8,8,"b'\x01\x03\x00\x00\x00\x01\x00\x00\x00""\x00\x0...",0,,,,,,,,...,0.007069,0.010717,0.015627,0.033581,0.039622,0.050462,0.06171,0.094921,j-25061011455349d49e6d8db88f8451e5,calendula_marigold
9,9,b'\x01\x03\x00\x00\x00\x01\x00\x00\x009\x00\x0...,0,,,,,,,,...,0.007424,0.010451,0.012721,0.032003,0.047651,0.060252,0.07042,0.096813,j-250610114613465d8b6fb5b9627460dd,calendula_marigold


In [97]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# split train/test

X_train, X_test, y_train, y_test = train_test_split(data.drop(columns=['EC_hcat_n']), data['EC_hcat_n'], test_size=0.2, random_state=42)

In [98]:
# Create a pipeline for data cleaning and model training
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

features = ['B02_P10', 'B02_P25', 'B02_P50', 'B02_P75',
       'B02_P90', 'B03_P10', 'B03_P25', 'B03_P50', 'B03_P75', 'B03_P90',
       'B04_P10', 'B04_P25', 'B04_P50', 'B04_P75', 'B04_P90', 'B05_P10',
       'B05_P25', 'B05_P50', 'B05_P75', 'B05_P90', 'B06_P10', 'B06_P25',
       'B06_P50', 'B06_P75', 'B06_P90', 'B07_P10', 'B07_P25', 'B07_P50',
       'B07_P75', 'B07_P90', 'B08_P10', 'B08_P25', 'B08_P50', 'B08_P75',
       'B08_P90', 'B8A_P10', 'B8A_P25', 'B8A_P50', 'B8A_P75', 'B8A_P90',
       'B11_P10', 'B11_P25', 'B11_P50', 'B11_P75', 'B11_P90', 'B12_P10',
       'B12_P25', 'B12_P50', 'B12_P75', 'B12_P90', 'VH_P10', 'VH_P25',
       'VH_P50', 'VH_P75', 'VH_P90', 'VV_P10', 'VV_P25', 'VV_P50', 'VV_P75',
       'VV_P90']

# impute missing statistics (e.g. the empty Sentinel-2 percentiles) with 0, then standardise
data_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='constant', fill_value=0)),
    ('scaler', StandardScaler()),
])

preprocessor = ColumnTransformer([
    ('num', data_pipeline, features)
])

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=100, random_state=42))
])

pipeline.fit(X_train, y_train)

In [99]:
pipeline.score(X_test, y_test)

0.25

In [100]:
pipeline.predict(X_test)

array(['oats', 'calendula_marigold', 'calendula_marigold',
       'calendula_marigold'], dtype=object)

In [101]:
y_test

0     calendula_marigold
17                  oats
15                  oats
1     calendula_marigold
Name: EC_hcat_n, dtype: object
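With 20 parcels and `test_size=0.2`, the test set holds only 4 parcels, so the accuracy can only be 0, 0.25, 0.5, 0.75 or 1; the 0.25 above is one correct prediction out of four (only the last parcel). Such a small split says little about the model. Passing `stratify=data['EC_hcat_n']` to `train_test_split` guarantees both classes appear in the test set in equal proportion, and stratified k-fold cross-validation, sketched below with scikit-learn (the helper name is hypothetical), evaluates every parcel exactly once:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_accuracy(model, X, y, n_splits=5, seed=42):
    """Mean and standard deviation of accuracy over stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # cross_val_score fits a fresh clone of `model` on each training fold
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return float(np.mean(scores)), float(np.std(scores))
```

For example, `cv_accuracy(pipeline, data.drop(columns=['EC_hcat_n']), data['EC_hcat_n'])`. With 10 parcels per class, 5 folds put 2 parcels of each class in every test fold.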

Ingest the model into EOTDL.

In [102]:
import os
import joblib

dst_path = 'outputs/model'

os.makedirs(dst_path, exist_ok=True)

joblib.dump(pipeline, f'{dst_path}/pipeline.pkl')

text = """---
name: EuroCropsModel
authors: 
  - eotdl
license: free
source: https://github.com/earthpulse/eotdl/tree/main/tutorials/usecases/openEO
---

# EuroCropsModel

This model will predict the crop type of a given parcel in the EuroCrops dataset.

Learn how to use it at https://github.com/earthpulse/eotdl/tree/main/tutorials/usecases/openEO
"""

with open(f"{dst_path}/README.md", "w") as outfile:
    outfile.write(text)
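Step 4 of the plan also asks to store the model along with the openEO process graph, but the model folder above only contains `pipeline.pkl` and `README.md`. One way to keep that link explicit is a small JSON file next to the model recording the process graph URLs, the feature column order and the temporal window used for training. The file name `feature_pipeline.json`, its keys and the helper `write_feature_manifest` are assumptions of this sketch, not an EOTDL convention.

```python
import json
from pathlib import Path


def write_feature_manifest(model_dir, process_graph_urls, feature_columns, start_date, nb_months):
    """Write a hypothetical 'feature_pipeline.json' recording how the model's input features were produced."""
    manifest = {
        "process_graphs": dict(process_graph_urls),  # e.g. {"s1_weekly_statistics": "<url>"}
        "feature_columns": list(feature_columns),    # column order expected by the model
        "start_date": start_date,
        "nb_months": nb_months,
    }
    path = Path(model_dir) / "feature_pipeline.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path
```

For example, run `write_feature_manifest('outputs/model', {'s1_weekly_statistics': s1_weekly_statistics_url, 's2_weekly_statistics': s2_weekly_statistics_url}, features, '2024-01-01', 2)` before `eotdl models ingest`, so that inference can reuse exactly the same pipeline and temporal window.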

In [103]:
!eotdl models ingest -p outputs/model

Ingesting directory: outputs/model
Ingesting files: 100%|████████████████████████████| 2/2 [00:00<00:00,  2.38it/s]


In [104]:
!eotdl models list -n EuroCrops

['EuroCropsModel']


## 5. Run inference with EOTDL

Let's perform inference on some new parcels.

In [64]:
import numpy as np

ix = np.random.randint(0, len(shapefiles))
country = shapefiles[ix]

country

'data/NA/ES_NA_2020_EC21.shp'

In [66]:
import geopandas as gpd

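# note: as in step 2, this reads the first shapefile (Estonia), not the randomly selected `country`
path = shapefiles[0]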

gdf = gpd.read_file(path)

gdf.head()

Unnamed: 0,taotlusaas,pollu_id,pindala_ha,taotletud_,taotletu_1,niitmise_t,niitmise_1,viimase_mu,taotletu_2,taotleja_n,taotleja_r,EC_trans_n,EC_hcat_n,EC_hcat_c,geometry
0,2021,19994165,0.25,Karjatamine väljaspool põllumaj. maad,Karjatamine väljaspool põllumaj. maad,,,2021/05/02 14:37:52.000,,FIE,,Rough grazings,pasture_meadow_grassland_grass,3302000000,"POLYGON ((26.50243 59.31839, 26.50244 59.31843..."
1,2021,19990783,1.7,rohttaimed,Püsirohumaa,Niidetud,28.06.2021-04.07.2021,2021/05/02 06:59:17.000,Kliimat ja keskkonda säästvate põllumajandusta...,ERAISIK,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((24.54648 58.86884, 24.54674 58.86879..."
2,2021,19990784,0.49,rohttaimed,Püsirohumaa,Ei kuulu jälgimisele,,2021/05/02 06:59:17.000,Kliimat ja keskkonda säästvate põllumajandusta...,ERAISIK,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((24.54597 58.86827, 24.54668 58.86816..."
3,2021,19996106,0.54,talinisu allakülvita,Põllukultuurid,,,2021/05/02 20:58:12.000,Kliimat ja keskkonda säästvate põllumajandusta...,ERAISIK,,Winter wheat,winter_common_soft_wheat,3301010101,"POLYGON ((27.42837 58.11975, 27.42839 58.11972..."
4,2021,19990620,2.48,"punane ristik (vähemalt 80% ristikut, kuni 20%...",Põllukultuurid,Niidetud,06.07.2021-11.07.2021,2021/07/05 07:26:35.000,Kliimat ja keskkonda säästvate põllumajandusta...,TAMSAMÄE OÜ,11350602.0,Red clover (at least 80% clover up to 20% gras...,clover,3301090303,"POLYGON ((26.66816 57.82049, 26.66815 57.8205,..."


In [67]:
gdf = gdf.sample(n=3)
gdf

Unnamed: 0,taotlusaas,pollu_id,pindala_ha,taotletud_,taotletu_1,niitmise_t,niitmise_1,viimase_mu,taotletu_2,taotleja_n,taotleja_r,EC_trans_n,EC_hcat_n,EC_hcat_c,geometry
127702,2021,21695382,0.06,rohttaimed,Püsirohumaa,Ei kuulu jälgimisele,,2021/05/23 22:58:47.000,Kliimat ja keskkonda säästvate põllumajandusta...,FIE,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((26.52608 57.53664, 26.52619 57.53663..."
158687,2021,22070607,6.33,rohttaimed,Püsirohumaa,Niidetud,01.08.2021-02.08.2021,2021/06/14 09:01:35.000,Kliimat ja keskkonda säästvate põllumajandusta...,FIE,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((22.17822 58.0714, 22.1785 58.07148, ..."
57880,2021,20735465,2.2,rohttaimed,Püsirohumaa,Niidetud,28.06.2021-04.07.2021,2021/05/18 08:56:17.000,Kliimat ja keskkonda säästvate põllumajandusta...,FIE,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((23.34593 58.5394, 23.34592 58.53941,..."


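First, we compute the features for these parcels with the feature engineering pipeline stored in EOTDL (`EuroCropsPipeline`), which holds the Sentinel-1 and Sentinel-2 weekly statistics process graphs.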

In [68]:
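from eotdl.files import get_file_content_url

# URLs of the S1 and S2 feature engineering process graphs stored with EuroCropsPipeline in EOTDL
s1_weekly_statistics_url = get_file_content_url('s1_weekly_statistics.json', 'EuroCropsPipeline', 'pipelines')
s2_weekly_statistics_url = get_file_content_url('s2_weekly_statistics.json', 'EuroCropsPipeline', 'pipelines')

# Attach the process graph URLs to every parcel so that each extraction job can reference them
gdf['s1_weekly_statistics_url'] = s1_weekly_statistics_url
gdf['s2_weekly_statistics_url'] = s2_weekly_statistics_url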

gdf.head()

Unnamed: 0,taotlusaas,pollu_id,pindala_ha,taotletud_,taotletu_1,niitmise_t,niitmise_1,viimase_mu,taotletu_2,taotleja_n,taotleja_r,EC_trans_n,EC_hcat_n,EC_hcat_c,geometry,s1_weekly_statistics_url,s2_weekly_statistics_url
127702,2021,21695382,0.06,rohttaimed,Püsirohumaa,Ei kuulu jälgimisele,,2021/05/23 22:58:47.000,Kliimat ja keskkonda säästvate põllumajandusta...,FIE,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((26.52608 57.53664, 26.52619 57.53663...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...
158687,2021,22070607,6.33,rohttaimed,Püsirohumaa,Niidetud,01.08.2021-02.08.2021,2021/06/14 09:01:35.000,Kliimat ja keskkonda säästvate põllumajandusta...,FIE,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((22.17822 58.0714, 22.1785 58.07148, ...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...
57880,2021,20735465,2.2,rohttaimed,Püsirohumaa,Niidetud,28.06.2021-04.07.2021,2021/05/18 08:56:17.000,Kliimat ja keskkonda säästvate põllumajandusta...,FIE,,grasses,pasture_meadow_grassland_grass,3302000000,"POLYGON ((23.34593 58.5394, 23.34592 58.53941,...",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...


In [69]:
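from eotdl.fe.openeo import eurocrops_point_extraction

# Remove any previous job tracker so that the extraction jobs start from scratch
!rm -rf jobs-inference.csv

eurocrops_point_extraction(
    gdf,
    start_date="2024-01-01",
    nb_months=2,
    job_tracker='jobs-inference.csv',
    parallel_jobs=10,
    extra_cols=['EC_hcat_n'],  # keep the label column in the job tracker
)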

Authenticated using refresh token.
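The job tracker read in the next cell records `temporal_extent` as `['2024-01-01', '2024-03-01']` for `start_date="2024-01-01"` and `nb_months=2`. The window therefore appears to run from the start date to the same day `nb_months` calendar months later. In openEO temporal extents, the end date is excluded. The sketch below reproduces that arithmetic. It is inferred from the tracker output, not taken from the `eotdl` source, and `temporal_extent` is a hypothetical helper.

```python
from datetime import date


def temporal_extent(start_date, nb_months):
    """Return [start, end] as ISO strings, with end = start + nb_months calendar months.

    Sketch only: the day of month is kept, which is safe for start dates on the
    1st as in the call above (days 29-31 may raise ValueError).
    """
    start = date.fromisoformat(start_date)
    months = start.month - 1 + nb_months  # zero-based month offset from January
    end = date(start.year + months // 12, months % 12 + 1, start.day)
    return [start.isoformat(), end.isoformat()]
```

`temporal_extent("2024-01-01", 2)` gives `['2024-01-01', '2024-03-01']`, matching the tracker. This is also a reminder that the imagery window (early 2024) differs from the 2021 labelling season of these parcels.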


In [105]:
job = pd.read_csv("jobs-inference.csv")
job

Unnamed: 0,geometry,crs,temporal_extent,s1_weekly_statistics_url,s2_weekly_statistics_url,EC_hcat_n,id,backend_name,status,start_time,running_start_time,cpu,memory,duration
0,"POLYGON ((26.52607801 57.53663606, 26.52618805...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,pasture_meadow_grassland_grass,j-2506110835094e95a2ce4d97e056a4eb,cdse,finished,2025-06-11T08:35:09Z,2025-06-11T08:38:03Z,304.39155771199995 cpu-seconds,4293946.9921875 mb-seconds,306 seconds
1,"POLYGON ((22.17822217 58.07140315, 22.17849956...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,pasture_meadow_grassland_grass,j-2506110835274052be3339b74155dd13,cdse,finished,2025-06-11T08:35:27Z,2025-06-11T08:38:04Z,234.61724288800002 cpu-seconds,2066425.40625 mb-seconds,218 seconds
2,"POLYGON ((23.34593042 58.53940151, 23.34591833...",EPSG:4326,"['2024-01-01', '2024-03-01']",https://api.eotdl.com/pipelines/68306c8ee2cef5...,https://api.eotdl.com/pipelines/68306c8ee2cef5...,pasture_meadow_grassland_grass,j-2506110835454fef9214c1c1122c4ba2,cdse,finished,2025-06-11T08:35:46Z,2025-06-11T08:38:04Z,239.56393808400003 cpu-seconds,2181271.1608072915 mb-seconds,223 seconds
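The tracker has one row per openEO job, with its `status` and resource usage. Before reading the results, check that every job reached `finished`. The stdlib sketch below counts jobs per status and sums the `duration` column. It assumes durations are written as `<number> seconds`, as in the table above. `summarize_jobs` is a hypothetical helper, not part of `eotdl`.

```python
import csv
import io


def summarize_jobs(csv_text):
    """Count jobs per status and total their durations in seconds.

    Assumes the 'status' and 'duration' columns of the job tracker CSV, with
    durations formatted like '306 seconds'; other duration formats are ignored.
    """
    counts, total_seconds = {}, 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = row.get("status") or "unknown"
        counts[status] = counts.get(status, 0) + 1
        parts = (row.get("duration") or "").split()
        if len(parts) == 2 and parts[1] == "seconds":
            total_seconds += float(parts[0])
    return counts, total_seconds
```

Calling it on the contents of `jobs-inference.csv` for the three jobs above should give `({'finished': 3}, 747.0)`.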


In [106]:
# Initialize an empty list to store all dataframes
all_data = []

# Loop through each job and read its parquet file
for idx, _job in job.iterrows():
    try:
        job_data = pd.read_parquet(f'job_{_job["id"]}/timeseries.parquet')
        # Add job_id as a column to identify the source
        job_data['job_id'] = _job["id"]
        # Ground-truth label: available here, so we use it to evaluate the model.
        # Under normal inference conditions it is unknown (it is what we want to predict).
        job_data['EC_hcat_n'] = _job['EC_hcat_n']
        all_data.append(job_data)
    except Exception as e:
        print(f"Error reading job {_job['id']}: {e}")

# Concatenate all dataframes into one
if all_data:
    data = pd.concat(all_data, ignore_index=True)
    print(f"Successfully merged {len(all_data)} time series datasets")
else:
    data = pd.DataFrame()
    print("No time series data was loaded")

data

Successfully merged 3 time series datasets


Unnamed: 0,geometry,feature_index,B02_P10,B02_P25,B02_P50,B02_P75,B02_P90,B03_P10,B03_P25,B03_P50,...,VH_P50,VH_P75,VH_P90,VV_P10,VV_P25,VV_P50,VV_P75,VV_P90,job_id,EC_hcat_n
0,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x88\x00...,0,,,,,,,,,...,0.006949,0.007984,0.010382,0.026207,0.030987,0.035713,0.041551,0.052017,j-2506110835094e95a2ce4d97e056a4eb,pasture_meadow_grassland_grass
1,b'\x01\x03\x00\x00\x00\x03\x00\x00\x00\xf3\x00...,0,521.29477,521.29477,521.29477,521.29477,521.29477,704.324881,704.324881,704.324881,...,0.010036,0.014124,0.017429,0.034041,0.04211,0.056198,0.07185,0.087548,j-2506110835274052be3339b74155dd13,pasture_meadow_grassland_grass
2,b'\x01\x03\x00\x00\x00\x05\x00\x00\x00s\x00\x0...,0,,,,,,,,,...,0.00528,0.008364,0.01098,0.02242,0.027602,0.035104,0.044734,0.054007,j-2506110835454fef9214c1c1122c4ba2,pasture_meadow_grassland_grass
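The merged table shows two patterns. For the first and third parcels, the Sentinel-2 features (`B02_*`, `B03_*`, ...) are NaN. For the second parcel, `B02_P10` through `B02_P90` are all equal. Both would be expected if each percentile is computed over the valid observations of a band and only zero or one cloud-free observation was available in the January to February window. The local sketch below illustrates such percentile features, assuming linear interpolation (numpy's default method). The actual aggregation is defined by the openEO process graphs (`s1_weekly_statistics.json`, `s2_weekly_statistics.json`) and is not reproduced here. The helper names are hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Linear-interpolation percentile (numpy's default method) of a sorted list."""
    k = (len(sorted_values) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def percentile_features(band, values, ps=(10, 25, 50, 75, 90)):
    """Return {'<band>_P<p>': value}, using NaN when the band has no valid observation."""
    valid = sorted(v for v in values if v is not None and not math.isnan(v))
    if not valid:
        # e.g. no cloud-free Sentinel-2 acquisition in the window
        return {f"{band}_P{p}": math.nan for p in ps}
    # With a single observation, every percentile equals that value
    return {f"{band}_P{p}": percentile(valid, p) for p in ps}
```

For example, `percentile_features("B02", [521.29477])` returns `521.29477` for all five percentiles, the same pattern as the second row above.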


Next, we stage the trained model from EOTDL and apply it to the extracted features.

In [107]:
!eotdl models get EuroCropsModel -p outputs -a -f

Staging assets: 100%|█████████████████████████████| 2/2 [00:00<00:00,  2.14it/s]
Data available at outputs/EuroCropsModel
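Besides the percentile features, `data` also carries non-feature columns: `geometry`, `feature_index`, `job_id` and the ground-truth `EC_hcat_n`. Whether the stored pipeline drops these itself depends on how it was built during training. If it does not, the sketch below keeps only columns named like `<BAND>_P<percentile>`. The pattern is inferred from column names such as `B02_P10` and `VV_P90`, and `feature_columns` is a hypothetical helper. The features passed to `predict` must be exactly the ones the pipeline was trained on.

```python
import re

# Hypothetical pattern for feature columns such as B02_P10 or VV_P90,
# inferred from the table above.
FEATURE_RE = re.compile(r"[A-Z0-9]+_P\d{2}")


def feature_columns(columns):
    """Keep only '<BAND>_P<percentile>' column names, preserving their order."""
    return [c for c in columns if FEATURE_RE.fullmatch(c)]
```

With pandas, this would be used as `full_pipeline.predict(data[feature_columns(data.columns)])`.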


In [108]:
import joblib

full_pipeline = joblib.load('outputs/EuroCropsModel/pipeline.pkl')

preds = full_pipeline.predict(data)

preds

array(['calendula_marigold', 'oats', 'calendula_marigold'], dtype=object)

In [109]:
gdf.EC_hcat_n

127702    pasture_meadow_grassland_grass
158687    pasture_meadow_grassland_grass
57880     pasture_meadow_grassland_grass
Name: EC_hcat_n, dtype: object

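As expected, the model performs poorly: all three parcels are grassland (`pasture_meadow_grassland_grass`), but none of them is predicted as such. The model needs to be trained with more parcels and classes, which you can do with this notebook.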
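To put a number on this, compare the predictions with the labels that were attached to `data` in the merge step. This keeps both in the same job order. The plain-Python sketch below computes overall accuracy and counts `(true, predicted)` pairs. `evaluate` is a hypothetical helper, not an EOTDL API. `sklearn.metrics.accuracy_score` gives the same overall figure.

```python
from collections import Counter


def evaluate(predictions, labels):
    """Return overall accuracy and a Counter of (true, predicted) label pairs."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("nothing to evaluate")
    pairs = Counter(zip(labels, predictions))
    correct = sum(n for (true, pred), n in pairs.items() if true == pred)
    return correct / len(labels), pairs
```

For the three parcels above, `evaluate(list(preds), list(data["EC_hcat_n"]))` gives an accuracy of `0.0`. Two of the grassland parcels are predicted as `calendula_marigold` and one as `oats`.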