## Query Global Land product catalogue via OpenSearch API and download the products

This notebook shows how to query the Global Land product catalogue service, which implements a standardized [OpenSearch interface](http://docs.opengeospatial.org/is/17-047r1/17-047r1.html).

OpenSearch is self-describing: the service itself documents how it can be used and which search filters (keys and allowed values) are available.
For example:
* [overall API description](https://globalland.vito.be/catalogue/description)
* [description for searching Burnt Area collection](https://globalland.vito.be/catalogue/description?collection=clms_global_ba_300m_v3_daily_netcdf)
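
To illustrate what "self-describing" means in practice, here is a minimal sketch that reads an OpenSearch description document and lists the search parameters advertised in its URL templates. The embedded XML is a hypothetical, heavily shortened example (the `example.org` URL, the media type and the parameter set are assumptions); the real documents linked above are longer and may differ. By OpenSearch convention, a trailing `?` inside the braces marks an optional parameter.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, shortened description document for illustration only;
# the document served by the real catalogue is longer and may differ.
SAMPLE_DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example catalogue</ShortName>
  <Url type="application/geo+json"
       template="https://example.org/products?collection={eo:parentIdentifier}&amp;start={time:start?}&amp;end={time:end?}"/>
</OpenSearchDescription>
"""

OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}


def list_search_parameters(description_xml):
    """Return {url_type: [(query_key, template_parameter, required), ...]}."""
    root = ET.fromstring(description_xml)
    result = {}
    for url in root.findall("os:Url", OS_NS):
        template = url.get("template", "")
        # Matches e.g. "&start={time:start?}"; a trailing "?" marks an optional parameter.
        matches = re.findall(r"[?&]([^=&]+)=\{([^}?]+)(\??)\}", template)
        result[url.get("type")] = [
            (key, name, optional == "") for key, name, optional in matches
        ]
    return result


params = list_search_parameters(SAMPLE_DESCRIPTION)
for url_type, entries in params.items():
    for key, name, required in entries:
        print(url_type, key, name, "required" if required else "optional")
```

The same function could be applied to the XML text downloaded from one of the description URLs above, assuming the service returns a standard OpenSearch 1.1 description document.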

The catalogue service can be queried with a [Python client](https://github.com/VITObelgium/terracatalogueclient), available from VITO's [Terrascope](https://terrascope.be/en) platform, which allows for easy integration in Python notebooks and Python-based processing chains.

In this demo, we'll show you how to search for Global Land products through this OpenSearch API and then download them.

**Important!**
We'll use the new collection of daily Burnt Area 300m v3.1.1 products as an example. This API is a recent development; more product collections will become available in the course of 2023.

### Table of contents
* [Install & import packages](#install-import)
* [Discover collections](#discover-collections)
* [Search products](#search-products)
* [Download products](#download-products)

#### Install & import packages <a class="anchor" id="install-import"></a>

Let's start by installing the Python catalogue client from the Python package repository:

In [None]:
!pip3 install --user --quiet --index-url=https://artifactory.vgt.vito.be/api/pypi/python-packages/simple terracatalogueclient==0.1.14

On the Terrascope virtual machines and Jupyter notebook environment, the catalogue client package is pre-installed for your convenience.

Next, we import some required packages and initialize the catalogue client.

In [1]:
from terracatalogueclient import Catalogue
from terracatalogueclient.config import CatalogueConfig

Copy the configuration file from the GitHub repository to the same folder as this notebook.
Then use it to configure the catalogue client to query Global Land's catalogue.

In [2]:
config = CatalogueConfig.from_file("terracatalog_config_GlobalLand.txt")
catalogue = Catalogue(config)

#### Discover collections <a class="anchor" id="discover-collections"></a>

The get_collections() call can be used to discover which collections of Global Land products are available.
When this notebook was first written, only the daily Burnt Area v3.1.1 collection was available. The other Global Land collections are being added to this API over time, so the listing below may show more.

In [3]:
import pandas as pd
collections = catalogue.get_collections()

rows = []
for c in collections:
    rows.append([c.id, c.properties['title']])

df = pd.DataFrame(data = rows, columns = ['Identifier', 'Description'])
df.style.set_properties(**{'text-align': 'left'})

| | Identifier | Description |
|---|---|---|
| 0 | clms_global_ba_300m_v3_daily_netcdf | Burnt Area: global daily (raster 333m) - version 3, July 2023 |
| 1 | clms_global_ba_300m_v3_monthly_netcdf | Burnt Area: global monthly (raster 333m) - version 3, Sept 2023 |
| 2 | clms_global_ndvi_300m_v2_10daily_netcdf | Normalized Difference Vegetation Index: global 10-daily (raster 333m) - version 2, Dec 2020 |
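
As the number of collections grows, it can help to narrow the listing down by keyword. The sketch below filters on the `id` and `properties['title']` attributes used in the cell above. The helper name `find_collections` is hypothetical, and the stand-in objects only mimic those two attributes so that the example runs without a network connection.

```python
from types import SimpleNamespace


def find_collections(collections, keyword):
    """Return ids of collections whose id or title contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        c.id
        for c in collections
        if keyword in c.id.lower() or keyword in c.properties["title"].lower()
    ]


# Stand-in objects mimicking the attributes used above (not real client objects).
demo_collections = [
    SimpleNamespace(
        id="clms_global_ba_300m_v3_daily_netcdf",
        properties={"title": "Burnt Area: global daily (raster 333m) - version 3, July 2023"},
    ),
    SimpleNamespace(
        id="clms_global_ba_300m_v3_monthly_netcdf",
        properties={"title": "Burnt Area: global monthly (raster 333m) - version 3, Sept 2023"},
    ),
    SimpleNamespace(
        id="clms_global_ndvi_300m_v2_10daily_netcdf",
        properties={"title": "Normalized Difference Vegetation Index: global 10-daily (raster 333m) - version 2, Dec 2020"},
    ),
]

burnt_area_ids = find_collections(demo_collections, "burnt area")
print(burnt_area_ids)
```

With the real client, the same helper could be called as `find_collections(catalogue.get_collections(), "burnt area")`.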


#### Search products <a class="anchor" id="search-products"></a>

Using the above collection identifier, let's search for the available burnt area products.

The get_products() call supports filtering on the time period that the product covers (start/end parameters) or on the date when the file was last updated (modificationDate).

In [4]:
import pandas as pd
import datetime as dt

rows = []
products = catalogue.get_products(
    "clms_global_ba_300m_v3_daily_netcdf",
    start=dt.date(2023, 6, 26),
    end=dt.date(2023, 6, 30),
)
for product in products:
    # length is given in bytes; dividing by 1024*1024 yields the size in MiB
    rows.append([product.id, product.data[0].href, product.data[0].length / (1024 * 1024)])

df = pd.DataFrame(data = rows, columns = ['Identifier', 'URL', 'Size (MB)'])
df.style.set_properties(**{'text-align': 'left'})

| | Identifier | URL | Size (MB) |
|---|---|---|---|
| 0 | c_gls_BA300-NRT_202306260000_GLOBE_S3_V3.1.1 | https://globalland.vito.be/download/netcdf/burnt_area/ba_300m_v3_daily/2023/20230626/c_gls_BA300-NRT_202306260000_GLOBE_S3_V3.1.1.nc | 58.9173 |
| 1 | c_gls_BA300-NRT_202306270000_GLOBE_S3_V3.1.1 | https://globalland.vito.be/download/netcdf/burnt_area/ba_300m_v3_daily/2023/20230627/c_gls_BA300-NRT_202306270000_GLOBE_S3_V3.1.1.nc | 58.956 |
| 2 | c_gls_BA300-NRT_202306280000_GLOBE_S3_V3.1.1 | https://globalland.vito.be/download/netcdf/burnt_area/ba_300m_v3_daily/2023/20230628/c_gls_BA300-NRT_202306280000_GLOBE_S3_V3.1.1.nc | 58.3691 |
| 3 | c_gls_BA300-NRT_202306290000_GLOBE_S3_V3.1.1 | https://globalland.vito.be/download/netcdf/burnt_area/ba_300m_v3_daily/2023/20230629/c_gls_BA300-NRT_202306290000_GLOBE_S3_V3.1.1.nc | 58.9934 |
| 4 | c_gls_BA300-NRT_202306300000_GLOBE_S3_V3.1.1 | https://globalland.vito.be/download/netcdf/burnt_area/ba_300m_v3_daily/2023/20230630/c_gls_BA300-NRT_202306300000_GLOBE_S3_V3.1.1.nc | 58.3554 |


For a list of available search options, see the OpenSearch description document or the built-in help:

*help(catalogue.get_products)*
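
As an illustration of the modificationDate filter mentioned earlier, the sketch below wraps the call in a small helper. It assumes that modificationDate is passed as a keyword argument and that a `datetime.date` is an accepted value; how a single date is interpreted (exact match, lower bound, ...) is not covered here, so check the help output before relying on it. The helper name and the `_FakeCatalogue` class are hypothetical; the fake only records its arguments so that the example runs offline.

```python
import datetime as dt


def products_by_modification_date(catalogue, collection_id, modification_date):
    """Query a collection, filtering on the date the files were last updated."""
    # Assumption: modificationDate is a keyword argument that accepts a date.
    return catalogue.get_products(collection_id, modificationDate=modification_date)


class _FakeCatalogue:
    """Offline stand-in that records the arguments it receives (not the real client)."""

    def __init__(self):
        self.calls = []

    def get_products(self, collection, **kwargs):
        self.calls.append((collection, kwargs))
        return iter(())  # no results; the real client yields product objects


fake = _FakeCatalogue()
results = list(
    products_by_modification_date(
        fake, "clms_global_ba_300m_v3_daily_netcdf", dt.date(2023, 7, 1)
    )
)
print(fake.calls)
```

With the real client, pass the `catalogue` object created earlier instead of the fake.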

#### Download products <a class="anchor" id="download-products"></a>

The product download is free and fully open - it does not require any username or password.

Note that the above get_products() call returns a Python generator! 
If you want to be able to iterate over the results more than once, you can convert it to a list.
Mind that this loads all results into memory, which can be a lot depending on the number of results returned!

In [5]:
product_list = list(catalogue.get_products(
    "clms_global_ba_300m_v3_daily_netcdf",
    start=dt.date(2023, 6, 26),
    end=dt.date(2023, 6, 30),
))
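
If only the first few results are needed, converting the whole generator to a list is unnecessary. A minimal sketch using `itertools.islice` from the standard library is shown below; the generator here is a stand-in that yields identifier strings, not the real get_products() result. It also shows that a generator is empty once it has been consumed.

```python
from itertools import islice


def first_n(results, n):
    """Return at most n items from an iterable without consuming the rest."""
    return list(islice(results, n))


def fake_results():
    # Stand-in for the generator returned by get_products().
    for day in range(26, 31):
        yield f"c_gls_BA300-NRT_202306{day}0000_GLOBE_S3_V3.1.1"


gen = fake_results()
head = first_n(gen, 2)   # takes the first two items only
rest = list(gen)         # the remaining items are still available
again = list(gen)        # the generator is now exhausted
print(head, len(rest), again)
```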

Let's actually download the files to the directory where this notebook resides.

Depending on the connection speed and data volume, this can take a few minutes.

In [6]:
catalogue.download_products(product_list, './')

You are about to download 307.85 MB, do you want to continue? [Y/n]  Y
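
The volume in this prompt can be estimated beforehand from the `length` attribute used in the search cell. Note that the sizes in the search results table add up to about 293.59 (bytes divided by 1024*1024, i.e. MiB), while the prompt reports 307.85 MB; the two agree if the prompt uses decimal megabytes (293.59 × 1.048576 ≈ 307.85). The sketch below computes both figures. The helper name is hypothetical, only the first data file of each product is counted (an assumption that matches the listing above), and the stand-in products carry made-up sizes.

```python
from types import SimpleNamespace


def download_volume(products):
    """Return the total size of the first data file of each product as (MiB, MB)."""
    total_bytes = sum(p.data[0].length for p in products)
    return total_bytes / 1024**2, total_bytes / 1000**2


# Stand-in products with made-up sizes of 50 MiB each (not real catalogue entries).
demo_products = [
    SimpleNamespace(data=[SimpleNamespace(length=50 * 1024**2)]),
    SimpleNamespace(data=[SimpleNamespace(length=50 * 1024**2)]),
]

size_mib, size_mb = download_volume(demo_products)
print(f"{size_mib:.2f} MiB = {size_mb:.2f} MB")
```

With the real client, `download_volume(product_list)` could be called before download_products() to decide whether the workspace has enough room.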


Finally, let's check that the data files have been downloaded.

The download_products() call stores each product's files in a folder named after the product identifier.

In [7]:
import os
os.listdir('./c_gls_BA300-NRT_202306260000_GLOBE_S3_V3.1.1/')

['c_gls_BA300-NRT_202306260000_GLOBE_S3_V3.1.1.nc']
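
To check all products at once rather than one folder at a time, the following sketch reports which product identifiers have no NetCDF file in their folder. It relies only on the folder layout described above; the helper name `missing_products` is hypothetical, and the demonstration builds a temporary directory instead of touching real downloads.

```python
import tempfile
from pathlib import Path


def missing_products(download_dir, product_ids):
    """Return the ids that have no .nc file in <download_dir>/<id>/."""
    root = Path(download_dir)
    return [
        pid
        for pid in product_ids
        if not ((root / pid).is_dir() and any((root / pid).glob("*.nc")))
    ]


# Offline demonstration: one product folder is present, one is absent.
with tempfile.TemporaryDirectory() as tmp:
    present_id = "c_gls_BA300-NRT_202306260000_GLOBE_S3_V3.1.1"
    absent_id = "c_gls_BA300-NRT_202306270000_GLOBE_S3_V3.1.1"
    (Path(tmp) / present_id).mkdir()
    (Path(tmp) / present_id / f"{present_id}.nc").touch()
    missing = missing_products(tmp, [present_id, absent_id])

print(missing)
```

After a real download, `missing_products('./', [p.id for p in product_list])` should return an empty list.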

**Note**

This demo is designed to download a small set of files, directly to the (limited) workspace of this notebook.

To download larger sets of files:
* download this code as a stand-alone Python script (CGLS_catalogue_and_download.py), modify it and run it e.g. on your computer or your [Terrascope Virtual Machine](https://terrascope.be/en/services)
* or save the list of URLs as a text file and provide that as input to download tools like [WinWget](https://winwget.sourceforge.net/), command-line [wget](https://www.gnu.org/software/wget/) or [curl](https://curl.se/)
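
The second option can be sketched as follows: write one download URL per line to a text file, using the `data[0].href` attribute from the search cell. The helper name `write_url_list` and the file name `urls.txt` are arbitrary choices, and the stand-in products reuse two URLs from the search results above so that the example runs offline.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

BASE = "https://globalland.vito.be/download/netcdf/burnt_area/ba_300m_v3_daily/2023"


def write_url_list(products, path):
    """Write the URL of each product's first data file to path, one per line."""
    urls = [p.data[0].href for p in products]
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)


# Stand-in products carrying URLs copied from the search results above.
demo_products = [
    SimpleNamespace(data=[SimpleNamespace(
        href=f"{BASE}/20230626/c_gls_BA300-NRT_202306260000_GLOBE_S3_V3.1.1.nc")]),
    SimpleNamespace(data=[SimpleNamespace(
        href=f"{BASE}/20230627/c_gls_BA300-NRT_202306270000_GLOBE_S3_V3.1.1.nc")]),
]

with tempfile.TemporaryDirectory() as tmp:
    url_file = Path(tmp) / "urls.txt"
    count = write_url_list(demo_products, url_file)
    written_lines = url_file.read_text(encoding="utf-8").splitlines()

print(count, written_lines[0])
```

A file written this way can be passed to command-line wget with `wget -i urls.txt`.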