## WorldCereal RDM interaction

This notebook demonstrates the different ways to interact with the WorldCereal Reference Data Module (RDM).

In [1]:
from shapely.geometry import Polygon

from worldcereal.rdm_api import RdmInteraction

Data in the WorldCereal RDM are organized in datasets (collections).
Each dataset contains observations derived from a single source for a single year.

### 1. Query collections

In [2]:
# Access all public collections
rdm_interaction = RdmInteraction()
collections = rdm_interaction.get_collections()
ids = [col.id for col in collections]
print(len(ids))
print(ids)

97
['2017ascawaprojectpoly111', '2017canaafccropinventorypoint110', '2017cmrcgiargardianpoint110', '2017lbnfaowapor1poly111', '2017lbnfaowapor2poly111', '2017mdgjecamciradpoly111', '2017ngacgiargardianpoint110', '2017zafjecamciradpoly111', '2018asremelgadopoly111', '2018bfjecamciradpoly111', '2017brajecamciradpoly111', '2018afoafpoint110', '2017ugaradiantearth01poly110', '2017bfajecamciradpoly111', '2018ethwapor1poly111', '2018ethwapor2poly111', '2018mlinhicropharvestpoly110', '2018ingardian29point110', '2018mgjecamciradpoly111', '2018nerwapor1poly111', '2018senjecamciradpoly111', '2019afdewatrain1poly100', '2019afdewatrain2poly100', '2019afdewaval2point100', '2019egwapor1poly111', '2019egwapor2poly111', '2019afdewaval1point100', '2018tzradiantearth01poly110', '2019afnhicropharvestpoly100', '2019mgjecamciradpoly111', '2019snjecamciradpoly111', '2020brlemmarpoly110', '2020rwwapor2point111', '2020rwawaporakagerapoint111', '2020sdnwapor1poly110', '2020sdnwapor2poly111', '2021lkawapor1poly
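The collection IDs printed above appear to start with the reference year of the dataset. The sketch below groups IDs by that leading year using only the standard library. The four-digit-year prefix is an assumption inferred from the IDs shown here, not a documented format, and `group_ids_by_year` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumption: collection IDs start with a four-digit year (e.g. "2017...").
# This is inferred from the IDs listed above, not from an official spec.
YEAR_PREFIX = re.compile(r"^(\d{4})")


def group_ids_by_year(collection_ids):
    """Group collection IDs by their leading year; IDs without one go under None."""
    groups = defaultdict(list)
    for col_id in collection_ids:
        match = YEAR_PREFIX.match(col_id)
        year = int(match.group(1)) if match else None
        groups[year].append(col_id)
    return dict(groups)


# A few IDs copied from the output above
sample_ids = [
    "2017ascawaprojectpoly111",
    "2017canaafccropinventorypoint110",
    "2018ethwapor1poly111",
    "2019afdewaval1point100",
]
for year, year_ids in group_ids_by_year(sample_ids).items():
    print(year, len(year_ids))
```

In the notebook, the same helper could be applied to the full `ids` list built in the cell above.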

In [3]:
# Each collection contains some metadata. Here we show which information is available:
col = collections[0]
print(f'ID: {col.id}')
print(f'Title: {col.title}')
print(f'Number of samples: {col.feature_count}')
print(f'Data type: {col.data_type}')
print(f'Access type: {col.access_type}')
print(f'Observation method: {col.observation_method}')
print(f'Confidence score for land cover: {col.confidence_lc}')
print(f'Confidence score for crop type: {col.confidence_ct}')
print(f'Confidence score for irrigation label: {col.confidence_irr}')
print(f'List of available crop types: {col.ewoc_codes}')
print(f'List of available irrigation labels: {col.irr_codes}')
print(f'Spatial extent: {col.spatial_extent}')
print(f'Coordinate reference system (CRS): {col.crs}')
print(f'Temporal extent: {col.temporal_extent}')
print(f'Additional data: {col.additional_data}')
print(f'Last modified: {col.last_modified}')
print(f'Last modified by: {col.last_modified_by}')
print(f'Creation time: {col.creation_time}')
print(f'Created by: {col.created_by}')
print(f'fid: {col.fid}')


ID: 2017ascawaprojectpoly111
Title: A crop type dataset on Central Asia, 2017 (Remelgado et al, 2020)
Number of samples: 498
Data type: Polygon
Access type: Public
Observation method: Unknown
Confidence score for land cover: 98
Confidence score for crop type: 98
Confidence score for irrigation label: 0
List of available crop types: [1101060000, 1101080000, 1103000000, 1101070030, 1101070010, 1100000000, 1106000020, 1101020002, 1108000010, 1101010001, 1201000000, 1204000000, 1201000010]
List of available irrigation labels: [0]
Spatial extent: {'bbox': [[70.81434037291103, 40.32024248653031, 71.66640200857353, 40.623060414705684]], 'crs': 'http://www.opengis.net/def/crs/OGC/1.3/CRS84'}
Coordinate reference system (CRS): ['http://www.opengis.net/def/crs/EPSG/0/4326']
Temporal extent: ['2017-04-01T00:00:00', '2017-10-01T00:00:00']
Additional data: 
Last modified: None
Last modified by: None
Creation time: 2024-06-26T10:13:51.959962
Created by: None
fid: 3a136636-6ad7-f292-ca8c-6274e89696a2
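The metadata fields shown above also make it possible to filter collections on the client side, for example by crop-type confidence or number of samples. This sketch is illustrative only: `CollectionStub` is a hypothetical stand-in that mirrors a few attribute names printed above (so the example runs without an RDM connection), and `select_collections` is a hypothetical helper, not part of `worldcereal.rdm_api`.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionStub:
    """Hypothetical stand-in mirroring a few attributes of an RDM collection."""

    id: str
    data_type: str
    feature_count: int
    confidence_ct: int
    ewoc_codes: list = field(default_factory=list)


def select_collections(collections, min_confidence_ct=0, min_samples=0,
                       data_type=None, ewoc_code=None):
    """Keep collections whose metadata satisfies every given criterion."""
    selected = []
    for col in collections:
        if col.confidence_ct < min_confidence_ct:
            continue
        if col.feature_count < min_samples:
            continue
        if data_type is not None and col.data_type != data_type:
            continue
        if ewoc_code is not None and ewoc_code not in col.ewoc_codes:
            continue
        selected.append(col)
    return selected


# The first stub uses values from the metadata output above (a subset of its
# crop types); the second is a made-up collection for illustration.
stubs = [
    CollectionStub("2017ascawaprojectpoly111", "Polygon", 498, 98,
                   [1101060000, 1106000020, 1201000000]),
    CollectionStub("hypothetical_point_collection", "Point", 50, 40, [1101010002]),
]

print([c.id for c in select_collections(stubs, min_confidence_ct=90)])
```

Because the helper only reads attributes, it should also accept the objects returned by `get_collections()`, provided those attributes are populated as in the output above (for instance, it assumes `confidence_ct` is always an integer).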

You can filter collections to retrieve only the subset matching your search criteria. Currently, the following filters are supported:
- crop type
- spatial extent
- temporal extent
- access type (public vs private)

Here are some example queries demonstrating these filtering options:

In [5]:
# Access public collections containing a certain crop type
ewoc_codes = [1106000020] # soybean
collections = rdm_interaction.get_collections(ewoc_codes=ewoc_codes)

# Access public collections for a geometry (an area in western Kenya)
coords = [
    (34.45619011, -0.91010781),
    (34.79638823, -0.91010781),
    (34.79638823, -0.34539808),
    (34.45619011, -0.34539808),
    (34.45619011, -0.91010781),
]
polygon = Polygon(coords)
collections = rdm_interaction.get_collections(geometry=polygon)

# Access public collections for a geometry and a year
temporal_extent = ["2020-01-01", "2020-12-31"]
collections = rdm_interaction.get_collections(geometry=polygon,
                                              temporal_extent=temporal_extent)

# Access private collections
# NOTE: this requires authentication using a valid Terrascope account
rdm_interaction = RdmInteraction().authenticate()
collections = rdm_interaction.get_collections(include_public=False,
                                              include_private=True)

# Access private collections for crop type
collections = rdm_interaction.get_collections(include_public=False,
                                              include_private=True,
                                              ewoc_codes=ewoc_codes)

# Access both public and private collections for crop type
collections = rdm_interaction.get_collections(include_private=True,
                                              ewoc_codes=ewoc_codes)
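The polygon used in these queries is an axis-aligned rectangle written as a closed ring, whose last corner repeats the first. The hypothetical helper `bbox_to_ring` below builds such a ring from bounding-box limits and reproduces the `coords` list above corner for corner. A shoelace-formula check confirms the ring is counter-clockwise (positive signed area).

```python
def bbox_to_ring(min_x, min_y, max_x, max_y):
    """Return the corners of a bounding box as a closed, counter-clockwise ring."""
    return [
        (min_x, min_y),  # bottom-left
        (max_x, min_y),  # bottom-right
        (max_x, max_y),  # top-right
        (min_x, max_y),  # top-left
        (min_x, min_y),  # repeat the first corner to close the ring
    ]


def signed_area(ring):
    """Shoelace formula over a closed ring: positive for counter-clockwise rings."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


ring = bbox_to_ring(34.45619011, -0.91010781, 34.79638823, -0.34539808)
print(ring)
print(signed_area(ring) > 0)
```

The result can be passed to `Polygon(...)` as above. Shapely also provides `shapely.geometry.box(minx, miny, maxx, maxy)`, which builds an equivalent rectangle directly (its corner order may differ).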

### 2. Get crop counts
This functionality counts the number of samples belonging to one or more crop types across one or more collections.

If you do not specify which collection IDs to inspect, the tool automatically considers all collections in the RDM that contain your crop type(s) of interest and match your search criteria, as defined by a combination of:
- geometry
- temporal_extent
- include_public
- include_private

In [None]:
col_ids = ['2018canaafcacigtdpoint110', '2017canaafccropinventorypoint110']
crop_codes = [1106000020, 1101010002] # soybean + spring wheat

geometry = None
temporal_extent = None
include_public = True
include_private = False

rdm_interaction = RdmInteraction()
counts = rdm_interaction.get_crop_counts(ewoc_codes=crop_codes,
                                         collection_ids=col_ids,
                                         geometry=geometry,
                                         temporal_extent=temporal_extent,
                                         include_public=include_public,
                                         include_private=include_private
                                         )
print(counts)

EwocCode                            1101010002  1106000020
collectionId                                              
2017ascawaprojectpoly111                     0           2
2017bfajecamciradpoly111                     0          34
2017brajecamciradpoly111                     0          79
2017canaafcacigtdpoint110                  799       16907
2017canaafccropinventorypoint110           799       16907
2017fralpispoly110                          95       31302
2017mdgjecamciradpoly111                     0          18
2018bfjecamciradpoly111                      0          59
2018canaafcacigtdpoint110                  799       15180
2018eulucaspoint110                          0         155
2018fralpispoly110                          82       31882
2018mgjecamciradpoly111                      0          52
2018nldlpispoly110                        4819         193
2019canaafcacigtdpoint110                 1597       11984
2019dnkeurocropspoly110                      0          
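Crop counts help decide which collections to download from. The sketch below assumes the counts are available as nested dictionaries (`{collection_id: {ewoc_code: n_samples}}`); it sums samples per crop type and selects collections holding at least a given number of samples for one crop. Both helpers are hypothetical. If `counts` is a pandas DataFrame laid out as printed above, `counts.to_dict(orient="index")` should produce this nested layout, although the exact key types depend on how the DataFrame was built.

```python
def total_per_crop(counts):
    """Sum sample counts per EWoC code over all collections."""
    totals = {}
    for per_crop in counts.values():
        for code, n_samples in per_crop.items():
            totals[code] = totals.get(code, 0) + n_samples
    return totals


def collections_with_enough_samples(counts, ewoc_code, min_samples):
    """Return sorted IDs of collections holding at least `min_samples` of `ewoc_code`."""
    return sorted(
        col_id
        for col_id, per_crop in counts.items()
        if per_crop.get(ewoc_code, 0) >= min_samples
    )


# A few rows copied from the crop-count output above
counts_subset = {
    "2017canaafcacigtdpoint110": {1101010002: 799, 1106000020: 16907},
    "2018eulucaspoint110": {1101010002: 0, 1106000020: 155},
    "2018nldlpispoly110": {1101010002: 4819, 1106000020: 193},
    "2019canaafcacigtdpoint110": {1101010002: 1597, 1106000020: 11984},
}

print(total_per_crop(counts_subset))
# Collections with at least 1000 spring wheat (1101010002) samples
print(collections_with_enough_samples(counts_subset, 1101010002, 1000))
```

The selected IDs could then be passed as `collection_ids` to `download_samples`.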

### 3. Download individual samples
Now that you have an overview of data availability, let's download individual samples!

Again, all filtering options are available:
- collection IDs (collection_ids)
- crop types (ewoc_codes)
- geometry
- temporal_extent
- whether or not to include public collections
- whether or not to include private collections

In [None]:
rdm = RdmInteraction()

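# Optionally restrict the search to specific collections;
# None searches all collections matching the other filters
# collection_ids = ["2017ascawaprojectpoly111"]
collection_ids = None

# Area of interest in western Kenya; set geometry = None to drop the spatial filter
coords = [
    (34.45619011, -0.91010781),
    (34.79638823, -0.91010781),
    (34.79638823, -0.34539808),
    (34.45619011, -0.34539808),
    (34.45619011, -0.91010781),
]
geometry = Polygon(coords)
temporal_extent = None
# Crop type(s) of interest; set ewoc_codes = None to drop the crop-type filter
ewoc_codes = [1101060000]
include_public = True
include_private = False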

gdf = rdm.download_samples(
    collection_ids=collection_ids,
    geometry=geometry,
    temporal_extent=temporal_extent,
    ewoc_codes=ewoc_codes,
    include_public=include_public,
    include_private=include_private,
)

gdf.head()
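Because `download_samples` accepts the same filter options as the other queries, it can help to collect and sanity-check them in one place before calling the RDM. `build_sample_query` below is a hypothetical helper: its checks (at least one of public/private included, start date not after end date, integer EWoC codes) reflect how the options are used in this notebook, not documented RDM validation rules. It leaves `geometry` out so the example runs with the standard library alone.

```python
from datetime import date


def build_sample_query(collection_ids=None, ewoc_codes=None, temporal_extent=None,
                       include_public=True, include_private=False):
    """Hypothetical helper: sanity-check filter options and return them as kwargs."""
    if not (include_public or include_private):
        # With both set to False, no collection could match.
        raise ValueError("include_public and include_private cannot both be False")
    if temporal_extent is not None:
        start, end = (date.fromisoformat(d) for d in temporal_extent)
        if start > end:
            raise ValueError("temporal_extent start must not be after its end")
    if ewoc_codes is not None and not all(isinstance(c, int) for c in ewoc_codes):
        raise TypeError("ewoc_codes must be integers, e.g. 1101060000")
    return {
        "collection_ids": collection_ids,
        "ewoc_codes": ewoc_codes,
        "temporal_extent": temporal_extent,
        "include_public": include_public,
        "include_private": include_private,
    }


query = build_sample_query(ewoc_codes=[1101060000],
                           temporal_extent=["2020-01-01", "2020-12-31"])
print(query)
```

In the notebook, the result could be combined with the geometry as `rdm.download_samples(geometry=geometry, **query)`.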