# General

Geospatial APIs contain petabytes of queryable data. It is therefore important to understand the storage data model and how to retrieve its metadata.

## The storage model

The storage model in Geospatial APIs is illustrated by the following diagram:

![storage_model](images/storage_model.png)

The [Data Set](data_set.ipynb) is the highest tier in the hierarchy; it serves as a collection of Data Layers.

A [Data Layer](data_layer.ipynb) is the entry point for a Geospatial APIs query; it represents a single type of data and is what you specify in the [Layers](../query/layers.ipynb) section of a query.

A Data Layer [Dimension](dimension.ipynb) gives context to the value recorded for a data layer and is also specified in the Layers section of a query. Dimensions are only available for certain layers (e.g. Soil Grids, CMIP6).
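To make the three tiers concrete, here is a minimal sketch of the hierarchy as plain Python dataclasses. These classes and the helper `find_parent_data_set` are hypothetical and purely illustrative; they are not the classes exposed by `ibmpairs`. The IDs and names are taken from the search results shown later in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Illustrative model of the catalog hierarchy; NOT the ibmpairs classes.
@dataclass
class Dimension:
    name: str
    values: List[str] = field(default_factory=list)


@dataclass
class DataLayer:
    id: str
    name: str
    # Only some layers (e.g. Soil Grids, CMIP6) carry dimensions.
    dimensions: List[Dimension] = field(default_factory=list)


@dataclass
class DataSet:
    id: str
    name: str
    data_layers: List[DataLayer] = field(default_factory=list)


def find_parent_data_set(data_sets: List[DataSet], layer_id: str) -> Optional[DataSet]:
    """Return the data set that contains the given data layer ID, or None."""
    for ds in data_sets:
        if any(layer.id == layer_id for layer in ds.data_layers):
            return ds
    return None


# A data set with two of its layers, as listed in the catalog search output.
sentinel2 = DataSet(
    id="177",
    name="High res imagery (ESA Sentinel 2)",
    data_layers=[
        DataLayer(id="49464", name="Normalized difference vegetation index"),
        DataLayer(id="49689", name="Water vapor"),
    ],
)

print(find_parent_data_set([sentinel2], "49464").name)  # High res imagery (ESA Sentinel 2)
```

The point of the sketch is the containment relationship: a query names a Data Layer (and, where applicable, its Dimensions), while the Data Set only groups related layers.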

## Searching the catalog
### Search by word

To search the whole catalog for a term, use the helper function `ibmpairs.catalog.search()` from the `catalog` sub-module:

```python
import ibmpairs.catalog as catalog

catalog.search(search_term)  # search_term: the string to look for, e.g. "Sentinel"
```

In the example below, the catalog is searched for all data sets and data layers that reference `Sentinel`. The information returned is a reduced version of the catalog metadata, which is enough to judge whether the returned data sets and data layers are immediately useful.

```python
%pip install configparser
%pip install ibmpairs
```

```python
import configparser

import ibmpairs.catalog as catalog
import ibmpairs.client as client

# Best practice is not to include secrets in source code, so we read
# an API key, tenant ID and org ID from a secrets.ini file.
# You could set the credentials in-line here, but we don't
# recommend it for security reasons.
config = configparser.RawConfigParser()
config.read('../../../auth/secrets.ini')

EI_API_KEY   = config.get('EI', 'api.api_key')
EI_TENANT_ID = config.get('EI', 'api.tenant_id')
EI_ORG_ID    = config.get('EI', 'api.org_id')

# Authenticate and get a client object.
ei_client = client.get_client(api_key   = EI_API_KEY,
                              tenant_id = EI_TENANT_ID,
                              org_id    = EI_ORG_ID)

# Search the whole catalog for the term "Sentinel".
search_by_word = catalog.search("Sentinel")
search_by_word
```

```text
2024-08-06 14:02:32 - paw - INFO - The client authentication method is assumed to be OAuth2.
2024-08-06 14:02:32 - paw - INFO - Legacy Environment is False
2024-08-06 14:02:32 - paw - INFO - The authentication api key type is assumed to be IBM Cloud IAM, because the api key prefix 'PHX' is not present.
2024-08-06 14:02:34 - paw - INFO - Authentication success.
2024-08-06 14:02:34 - paw - INFO - HOST: https://api.ibm.com/geospatial/run/na/core/v3
```


| | dataset_id | data_layer_id | data_layer_name | data_layer_description_short | data_layer_description_long | data_layer_level | data_layer_type | data_layer_unit | data_set_id | data_set_name | data_set_description_short | data_set_description_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 176 | 51648 | Band 11 (SWIR 1612 nm) | Central wavelength 1613.7/1610.4 nm, bandwidth... | | 22 | Raster | | 176 | High res imagery (ESA Sentinel 2) (TOA) | This dataset contains layers from the Level-1C... | Images from the European Space Agency Sentinel... |
| 1 | 176 | 50364 | hollstein | A cloud mask as defined in a paper by Hollstei... | A cloud mask as defined in a paper by Hollstei... | 23 | Raster | | 176 | High res imagery (ESA Sentinel 2) (TOA) | This dataset contains layers from the Level-1C... | Images from the European Space Agency Sentinel... |
| 2 | 176 | 49358 | Band 4 (red) | Central wavelength 664.5/665.0 nm, bandwidth 3... | | 23 | Raster | | 176 | High res imagery (ESA Sentinel 2) (TOA) | This dataset contains layers from the Level-1C... | Images from the European Space Agency Sentinel... |
| 3 | 176 | 49359 | Band 8 (NIR) | Central wavelength 835.1/833.0 nm, bandwidth 1... | | 23 | Raster | | 176 | High res imagery (ESA Sentinel 2) (TOA) | This dataset contains layers from the Level-1C... | Images from the European Space Agency Sentinel... |
| 4 | 176 | 50096 | Band 10 (SWIR 1370 nm) | Central wavelength 1373.5/1376.9 nm, bandwidth... | | 20 | Raster | | 176 | High res imagery (ESA Sentinel 2) (TOA) | This dataset contains layers from the Level-1C... | Images from the European Space Agency Sentinel... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15 | 177 | 49681 | Band 3 (green) | Central wavelength 560.0/559.0 nm, bandwidth 4... | | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 16 | 177 | 49682 | Band 5 (vegetation red edge) | Central wavelength 703.9/703.8 nm, bandwidth 1... | | 22 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 17 | 177 | 49464 | Normalized difference vegetation index | A measure of the amount of vegetation at the p... | NDVI is generally calculated as (NIR - VIR) / ... | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 18 | 177 | 49689 | Water vapor | Atmospheric water vapor content derived from b... | "Water vapour retrieval over land is performed... | 22 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
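A single search term can match layers from several data sets (here, data sets 176 and 177). As a minimal sketch, assuming the search result has been turned into a list of row dictionaries (for example with `DataFrame.to_dict("records")`, if `search` returns a pandas DataFrame), the rows can be grouped by data set with the standard library. The rows below are a reduced excerpt of the output above, and `layers_by_data_set` is a hypothetical helper, not part of `ibmpairs`.

```python
from collections import defaultdict

# Reduced excerpt of the "Sentinel" search output above (three columns only).
results = [
    {"data_set_id": "176", "data_layer_id": "51648", "data_layer_name": "Band 11 (SWIR 1612 nm)"},
    {"data_set_id": "176", "data_layer_id": "49358", "data_layer_name": "Band 4 (red)"},
    {"data_set_id": "177", "data_layer_id": "49464", "data_layer_name": "Normalized difference vegetation index"},
    {"data_set_id": "177", "data_layer_id": "49689", "data_layer_name": "Water vapor"},
]


def layers_by_data_set(rows):
    """Group data layer IDs under the data set ID they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["data_set_id"]].append(row["data_layer_id"])
    return dict(grouped)


print(layers_by_data_set(results))
# {'176': ['51648', '49358'], '177': ['49464', '49689']}
```

Grouping this way makes it easier to see, for instance, that the TOA data set (176) and data set 177 expose overlapping band layers under different IDs.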


The full metadata for a data set or data layer can then be retrieved with the corresponding catalog method, e.g. `catalog.get_data_layer()`:

```python
dl = catalog.get_data_layer(id = "49464")
print(dl)
```

```text
{
    "color_table": {
        "colors": "153A91,84F588,FFF787,FF7C3B,FF1921",
        "id": "4",
        "name": "Spectral"
    },
    "created_at": "1593733829000",
    "crs": "",
    "data_layer_response": {},
    "data_source_links": [],
    "dataset_id": "177",
    "datatype": "sh",
    "description_links": [],
    "description_long": "NDVI is generally calculated as (NIR - VIR) / (NIR + VIR). For Sentinel 2, this translates to (band 8 - band 4) / (band 8 + band 4).",
    "description_short": "A measure of the amount of vegetation at the pixel.",
    "id": "49464",
    "interpolation": "bilinear",
    "latitude_max": 90.0,
    "latitude_min": -90.0,
    "level": 23,
    "longitude_max": 180.0,
    "longitude_min": -180.0,
    "name": "Normalized difference vegetation index",
    "name_alternate": "Normalized difference vegetation index",
    "permanence": true,
    "properties": {},
    "rating": 1.0,
    "spatial_coverage": {
        "country": [
            "Belgium",
            ...
```
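The printed metadata is JSON-formatted, so individual fields can be inspected programmatically. The sketch below parses a short excerpt of the output above with the standard `json` module and checks whether a point falls inside the layer's declared latitude/longitude bounds. The helper `in_bounding_box` is hypothetical, and the excerpt is hand-copied; this does not assume any particular `ibmpairs` serialization method.

```python
import json

# Hand-copied excerpt of the layer metadata printed above.
metadata_json = """
{
    "id": "49464",
    "level": 23,
    "latitude_min": -90.0,
    "latitude_max": 90.0,
    "longitude_min": -180.0,
    "longitude_max": 180.0
}
"""


def in_bounding_box(meta, lat, lon):
    """True if (lat, lon) lies within the layer's declared lat/lon bounds."""
    return (meta["latitude_min"] <= lat <= meta["latitude_max"]
            and meta["longitude_min"] <= lon <= meta["longitude_max"])


meta = json.loads(metadata_json)
print(meta["level"], in_bounding_box(meta, 50.85, 4.35))  # 23 True
```

For this layer the bounds span the whole globe, so the `spatial_coverage` field is likely the more informative indicator of where data is actually available.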

### Search by identifier

If you already know the Geospatial APIs identifier (ID) of a data set or data layer, you can also pass it to the `search` method to retrieve the same reduced metadata:

```python
search_by_id = catalog.search("49464")
search_by_id
```

| | dataset_id | data_layer_id | data_layer_name | data_layer_description_short | data_layer_description_long | data_layer_level | data_layer_type | data_layer_unit | data_set_id | data_set_name | data_set_description_short | data_set_description_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 177 | 49464 | Normalized difference vegetation index | A measure of the amount of vegetation at the p... | NDVI is generally calculated as (NIR - VIR) / ... | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
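Because `search` takes a free-text term, it may be worth confirming that each returned row's ID fields actually equal the identifier you passed, rather than merely containing it somewhere. A minimal sketch, assuming the results are available as a list of row dictionaries (the rows are a reduced excerpt of the outputs above, and `exact_id_matches` is a hypothetical helper):

```python
# Reduced excerpt of rows from the search outputs above.
rows = [
    {"data_set_id": "177", "data_layer_id": "49464",
     "data_layer_name": "Normalized difference vegetation index"},
    {"data_set_id": "177", "data_layer_id": "49689",
     "data_layer_name": "Water vapor"},
]


def exact_id_matches(rows, identifier):
    """Keep only rows whose data set ID or data layer ID equals `identifier`."""
    identifier = str(identifier)  # accept either int or str IDs
    return [
        row for row in rows
        if str(row.get("data_set_id")) == identifier
        or str(row.get("data_layer_id")) == identifier
    ]


print([r["data_layer_id"] for r in exact_id_matches(rows, "49464")])  # ['49464']
print(len(exact_id_matches(rows, 177)))  # 2
```

Matching on either ID column means a data set ID (such as 177) returns all of its layers, while a data layer ID returns just that layer.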


## Retrieve metadata
A series of helper methods can be used to retrieve the metadata for Data Sets, Data Layers and Data Layer Dimensions.
The results of these catalog methods can be displayed, which returns them in truncated form as a dataframe:

```python
ds_list = catalog.get_data_sets()
ds_list.display()
```

or printed, which outputs a string representation of the object:

```python
ds = catalog.get_data_set(id = "177")  # e.g. data set 177 from the search results above
print(ds)
```

Where applicable, the methods can also return lists of embedded objects (e.g. the Data Layers in a Data Set):

```python
catalog.get_data_layers(data_set_id = "177")
```

These functions will be discussed further in the [Data Set](data_set.ipynb), [Data Layer](data_layer.ipynb) and [Dimension](dimension.ipynb) sections.