# Quick Start - Catalog

IBM Environmental Intelligence Service: Geospatial Analytics (Geospatial Analytics) contains several petabytes of queryable data. It is therefore necessary to understand, briefly, the storage data model and how the metadata can be retrieved.

## The Storage Model

The storage model in Geospatial Analytics is illustrated by the following diagram:

![Catalog storage model](catalogstoragemodel.png)

### Data Sets
The Data Set is the highest tier in the hierarchy. It functions as a collection of Data Layers to which defaults (e.g. attributes, security controls) can be applied. In most cases, a Data Set groups interrelated data, usually acquired contemporaneously (e.g. the 'bands' in the data provided by the ESA Sentinel 2 satellite); however, nothing prevents a Data Set from being used on more informal terms.

### Data Layers
The Data Layer is the tier of the hierarchy that is directly connected to the storage sub-system. The metadata defined here overrides the defaults defined on the Data Set and applies directly to the manner in which the data is stored.
When querying data from Geospatial Analytics, this tier is the entry point.
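
The override rule described above can be pictured as a simple dictionary merge. This is only a conceptual sketch: the field names and values are hypothetical and it does not call the Geospatial Analytics API.

```python
# Hypothetical field names and values, used only to illustrate the override rule.
data_set_defaults = {"level": 21, "interpolation": "near", "unit": None}
data_layer_metadata = {"level": 23, "interpolation": "bilinear"}


def effective_metadata(set_defaults: dict, layer_metadata: dict) -> dict:
    """Return the Data Set defaults with Data Layer values layered on top."""
    merged = dict(set_defaults)    # start from the Data Set defaults
    merged.update(layer_metadata)  # values defined on the Data Layer take precedence
    return merged


effective = effective_metadata(data_set_defaults, data_layer_metadata)
print(effective)
```

Values missing from the Data Layer (here `unit`) fall back to the Data Set default.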

<div class="alert alert-info">
Vector Data Layers exist within the context of a Data Layer `Group`. Data Layers within the same Group are intertwined from a storage point of view.
Any operation applied to a Vector Data Layer (e.g. deletion) applies to every Data Layer in its Group.
</div>

When configuring a Data Layer, two attributes are particularly important, because they determine storage size and speed of retrieval at query time:
- the level sets the spatial granularity (resolution) at which the data is stored,
- the datatype is the data type used to store the values.

The most efficient level and type that can contain the data to be uploaded to a Data Layer should always be used.

Level:
- 29 (11.125 cm at equator)
- 28 (22.25 cm at equator)
- 27 (44.5 cm at equator)
- 26 (0.89 m at equator)
- 25 (1.78 m at equator)
- 24 (3.56 m at equator)
- 23 (7.12 m at equator)
- 22 (14.24 m at equator)
- 21 (28.48 m at equator)
- 20 (56.96 m at equator)
- 19 (113.92 m at equator)
- 18 (227.84 m at equator)
- 17 (455.68 m at equator)
- 16 (911.36 m at equator)
- 15 (1.82272 km at equator)
- 14 (3.64544 km at equator)
- 13 (7.29088 km at equator)
- 12 (14.58176 km at equator)
- 11 (29.16352 km at equator)
- 10 (58.32704 km at equator)
- 9 (116.65408 km at equator)
- 8 (233.30816 km at equator)
- 7 (466.61632 km at equator)
- 6 (933.23264 km at equator)
- 5 (1866.46528 km at equator)
- 4 (3732.93056 km at equator)
- 3 (7465.86112 km at equator)
- 2 (14931.72224 km at equator)
- 1 (29863.44448 km at equator)
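
The list above follows a simple pattern: each level is twice as coarse as the next finer one, starting from 11.125 cm at level 29. The sketch below reproduces the listed values and picks a level for a given native resolution. The selection rule (the coarsest level whose resolution is still at least as fine as the data) is an assumption about what "most efficient" means here, and both function names are hypothetical helpers, not part of the SDK.

```python
FINEST_LEVEL = 29
FINEST_RESOLUTION_M = 0.11125  # level 29: 11.125 cm at the equator


def level_resolution_m(level: int) -> float:
    """Approximate resolution in metres at the equator for a level."""
    if not 1 <= level <= FINEST_LEVEL:
        raise ValueError("level must be between 1 and 29")
    # Each step towards a coarser level doubles the resolution value.
    return FINEST_RESOLUTION_M * 2 ** (FINEST_LEVEL - level)


def coarsest_level_for(native_resolution_m: float) -> int:
    """Coarsest level whose resolution does not exceed the native resolution.

    Assumed selection rule, not an official recommendation.
    """
    for level in range(1, FINEST_LEVEL + 1):  # iterate from coarse to fine
        if level_resolution_m(level) <= native_resolution_m:
            return level
    return FINEST_LEVEL  # data finer than level 29: use the finest level


print(level_resolution_m(23))    # about 7.12 (metres)
print(coarsest_level_for(10.0))  # 23
print(coarsest_level_for(20.0))  # 22
```

Under this rule, 10 m data maps to level 23 and 20 m data to level 22, which matches the levels reported for the Sentinel 2 layers later in this quick start.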

Data Type:
- Raster & Vector:
  - bt (Byte)
  - sh (Short Integer)
  - in (Integer) 
  - db (Double) 
  - fl (Float)
- Vector Only:
  - lo (Long) 
  - st (String)
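
The datatype codes above can be captured in a small lookup that also records which codes are restricted to Vector layers. This is a minimal sketch built only from the list above; the helper names are hypothetical and not part of the SDK.

```python
# Datatype codes as listed above.
RASTER_AND_VECTOR = {
    "bt": "Byte",
    "sh": "Short Integer",
    "in": "Integer",
    "db": "Double",
    "fl": "Float",
}
VECTOR_ONLY = {"lo": "Long", "st": "String"}


def allowed_datatypes(layer_type: str) -> dict:
    """Return the datatype codes usable for a 'Raster' or 'Vector' layer."""
    kind = layer_type.lower()
    if kind == "raster":
        return dict(RASTER_AND_VECTOR)
    if kind == "vector":
        return {**RASTER_AND_VECTOR, **VECTOR_ONLY}
    raise ValueError(f"unknown layer type: {layer_type!r}")


def is_valid_datatype(code: str, layer_type: str) -> bool:
    """Check whether a datatype code may be used for the given layer type."""
    return code in allowed_datatypes(layer_type)


print(is_valid_datatype("sh", "Raster"))  # True
print(is_valid_datatype("st", "Raster"))  # False
print(is_valid_datatype("st", "Vector"))  # True
```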

### Data Layer Dimensions
A Data Layer Dimension gives context to the value recorded for the data layer. For example, weather forecast models typically create a prediction for each day in the future for a number of days. When the weather model is run on a Monday it will generate predictions for temperature on Tuesday, Wednesday, Thursday, Friday and Saturday. When the forecast is run on the Tuesday the predictions are made for Wednesday, Thursday, Friday etc. The days for which predictions are generated are the data layer dimensions.

Imagine a temperature prediction data layer which records the temperature against a spatial and temporal key:

```
lat/long/timestamp = temperature
```

This only stores one temperature value per key. Each time we run the weather model we generate 5 temperature predictions, one for each of the 5 future days. How can we store them? The answer is that the key is extended with a dimension:

```
lat/long/timestamp/dimension = temperature
```

Now each place and time can also have 5 predictions associated with it. 

In Geospatial Analytics we typically arrange the data so that:

```
timestamp = the date/time that the value is valid
dimension = the predicted day number (horizon) from the model run
```

Using this scheme it is straightforward to compare different model predictions for the same date and time.
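
The extended key can be mimicked with a plain dictionary. The sketch below is a toy in-memory model, not how Geospatial Analytics stores data; the coordinates, dates and temperatures are made up for illustration.

```python
from datetime import datetime, timezone

# (lat, long, timestamp, dimension) -> temperature; all values are illustrative.
store = {}


def put(lat, long, valid_time, horizon, temperature):
    """Store one prediction under the extended key."""
    store[(lat, long, valid_time, horizon)] = temperature


def predictions_for(lat, long, valid_time):
    """Return every prediction for one place and valid time, keyed by horizon."""
    return {
        horizon: temperature
        for (la, lo, ts, horizon), temperature in store.items()
        if (la, lo, ts) == (lat, long, valid_time)
    }


friday = datetime(2024, 1, 5, 12, tzinfo=timezone.utc)
put(51.5, -0.1, friday, 4, 6.0)  # Monday run, 4 days ahead
put(51.5, -0.1, friday, 3, 6.8)  # Tuesday run, 3 days ahead
put(51.5, -0.1, friday, 2, 7.1)  # Wednesday run, 2 days ahead

print(predictions_for(51.5, -0.1, friday))  # {4: 6.0, 3: 6.8, 2: 7.1}
```

Because the timestamp is the valid time, all three model runs land on the same place and time and differ only in the dimension value, which makes them easy to compare.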

### Data Layer Properties
A Data Layer Property is an additional value stored alongside the data; this is a Vector-only part of the model and is outside the scope of this quick start.

## Searching the Catalog
### Search by Word

In order to search the whole catalog for a term, you can use the catalog sub-module and the method `search`:

```python
catalog.search(<search_term>)
```

In the example below, the catalog is searched for all datasets and datalayers that contain references to `Sentinel`. The information returned is a reduced version of the metadata held by the catalog that can be used to determine the prima facie usefulness of the returned datasets and datalayers to the user.

```python
import os
import ibmpairs.authentication as authentication
import ibmpairs.client as client
import ibmpairs.catalog as catalog

# Read the credentials from environment variables.
EIS_USERNAME = os.environ.get('EIS_USERNAME')
EIS_APIKEY = os.environ.get('EIS_APIKEY')

credentials = authentication.OAuth2(username = EIS_USERNAME,
                                    api_key  = EIS_APIKEY)

eis_client = client.Client(authentication = credentials)

# Search the whole catalog for the term "Sentinel".
search_by_word = catalog.search("Sentinel")
search_by_word
```

| | dataset_id | data_layer_id | data_layer_name | data_layer_description_short | data_layer_description_long | data_layer_level | data_layer_type | data_layer_unit | data_set_id | data_set_name | data_set_description_short | data_set_description_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 335 | 50253 | VV polarization | Synthetic Aperture Radar with VV Partial polar... | The data is preprocessed following correction ... | 23 | Raster | | 335 | Satellite based radar (ESA Sentinel 1) | Sentinel-1 is an imaging radar mission providi... | Sentinel-1 provides dual polarization capabili... |
| 1 | 177 | 49362 | Scene classification | Pixel-by-pixel classification in image of 4 ty... | The different labels and classifications are: ... | 22 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 2 | 335 | 50254 | VH polarization | Synthetic Aperture Radar with VH Partial polar... | The data is preprocessed following correction ... | 23 | Raster | | 335 | Satellite based radar (ESA Sentinel 1) | Sentinel-1 is an imaging radar mission providi... | Sentinel-1 provides dual polarization capabili... |
| 3 | 177 | 49360 | Band 4 (red) | Central wavelength 664.5/665.0 nm, bandwidth 3... | | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 4 | 177 | 49464 | Normalized difference vegetation index | A measure of the amount of vegetation at the p... | NDVI is generally calculated as (NIR - VIR) / ... | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 5 | 177 | 49681 | Band 3 (green) | Central wavelength 560.0/559.0 nm, bandwidth 4... | | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 6 | 177 | 49361 | Band 8 (NIR) | Central wavelength 835.1/833.0 nm, bandwidth 1... | | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 7 | 177 | 49682 | Band 5 (vegetation red edge) | Central wavelength 703.9/703.8 nm, bandwidth 1... | | 22 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 8 | 177 | 49683 | Band 6 (vegetation red edge) | Central wavelength 740.2/739.1 nm, bandwidth 1... | | 22 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
| 9 | 177 | 49688 | Aerosol optical thickness | "AOT describes attenuation of sunlight by a co... | For Sentinel 2 level 2A products, the "aerosol... | 22 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |
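
Search results like these are often easier to scan once grouped by Data Set. The sketch below works on a few rows copied from the output above, held as a list of dictionaries. It assumes the result has been converted to records first (for a pandas DataFrame, `to_dict("records")` would do this); whether `search` returns a DataFrame is an assumption here, and `layers_by_data_set` is a hypothetical helper.

```python
from collections import defaultdict

# A few rows copied from the search output above, reduced to four columns.
records = [
    {"data_layer_id": "50253", "data_layer_name": "VV polarization",
     "data_layer_level": 23, "data_set_name": "Satellite based radar (ESA Sentinel 1)"},
    {"data_layer_id": "49362", "data_layer_name": "Scene classification",
     "data_layer_level": 22, "data_set_name": "High res imagery (ESA Sentinel 2)"},
    {"data_layer_id": "49464", "data_layer_name": "Normalized difference vegetation index",
     "data_layer_level": 23, "data_set_name": "High res imagery (ESA Sentinel 2)"},
]


def layers_by_data_set(rows):
    """Group Data Layer IDs under the name of the Data Set they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["data_set_name"]].append(row["data_layer_id"])
    return dict(grouped)


grouped = layers_by_data_set(records)
level_23_names = [r["data_layer_name"] for r in records if r["data_layer_level"] == 23]

print(grouped)
print(level_23_names)
```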


The full metadata for the dataset or datalayer can then be retrieved by subsequent methods, e.g.:

```python
dl = catalog.get_data_layer(id = "49464")
print(dl)
```

```
{
    "color_table": {
        "colors": "153A91,84F588,FFF787,FF7C3B,FF1921",
        "id": "4",
        "name": "Spectral"
    },
    "created_at": "1593726629784",
    "crs": "",
    "data_layer_response": {},
    "data_source_links": [],
    "dataset_id": "177",
    "datatype": "sh",
    "description_links": [],
    "description_long": "NDVI is generally calculated as (NIR - VIR) / (NIR + VIR). For Sentinel 2, this translates to (band 8 - band 4) / (band 8 + band 4).",
    "description_short": "A measure of the amount of vegetation at the pixel.",
    "id": "49464",
    "interpolation": "bilinear",
    "latitude_max": 90.0,
    "latitude_min": -90.0,
    "level": 23,
    "longitude_max": 180.0,
    "longitude_min": -180.0,
    "name": "Normalized difference vegetation index",
    "name_alternate": "Normalized difference vegetation index",
    "permanence": true,
    "properties": {},
    "rating": 1.0,
    "spatial_coverage": {
        "country": [
            "Belgium",
            ...
```

### Search by ID

If you already know the Geospatial Analytics ID of a dataset or datalayer, you can pass that ID to the `search` method instead of a word; it returns the same reduced metadata as a search by word:

```python
search_by_id = catalog.search("49464")
search_by_id
```

| | dataset_id | data_layer_id | data_layer_name | data_layer_description_short | data_layer_description_long | data_layer_level | data_layer_type | data_layer_unit | data_set_id | data_set_name | data_set_description_short | data_set_description_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 177 | 49464 | Normalized difference vegetation index | A measure of the amount of vegetation at the p... | NDVI is generally calculated as (NIR - VIR) / ... | 23 | Raster | | 177 | High res imagery (ESA Sentinel 2) | Images from the European Space Agency Sentinel... | Sentinel-2 is a set of two satellites in polar... |


## Retrieve Metadata
A series of helper methods can be used to retrieve the metadata concerning Data Sets, Data Layers and Data Layer Dimensions. 
The results for these catalog methods can be displayed (returned in a truncated form as a dataframe):

```python
ds_list = catalog.get_data_sets()
ds_list.display()
```

or printed (which returns a string representation of an object):

```python
ds = catalog.get_data_set(id = <data_set_id>)
print(ds)
```

Where applicable, the methods also allow for return of lists of embedded objects (e.g. Data Layers per Data Set):

```python
catalog.get_data_layers(data_set_id = <data_set_id>)
```

### Get a List of Data Sets
In order to return all data sets available to a user, you can execute the `get_data_sets` method:

```python
ds_list = catalog.get_data_sets()
ds_list.display()
```

| | id | name | description_short | description_long |
|---|---|---|---|---|
| 0 | 14 | 10 m res elevation (US NED) | USGS National Elevation Dataset (NED). Raster-... | |
| 1 | 5 | 16 day 250 m res imagery (NASA MODIS Aqua) | Images from the Moderate Resolution Imaging Sp... | Contains global images from Aqua MODIS spectra... |
| 2 | 7 | 16 day 250 m res imagery (NASA MODIS Terra) | Images from the Moderate Resolution Imaging Sp... | Contains global images from Terra MODIS spectr... |
| 3 | 330 | 16 day weather forecast (GFS) | Medium range (up to 16 days ahead) weather for... | The Global Forecast System (GFS) is a global n... |
| 4 | 145 | 16 day weather forecast (GFS) (daily) | Medium range (up to 16 days ahead) weather for... | |
| ... | ... | ... | ... | ... |
| 75 | 510 | VeriDaaS Geiger Mode LiDAR Commercial Dupage I... | Geiger Mode LiDAR data at 30 points per square... | Each rasterized LiDAR layer corresponds to one... |
| 76 | 509 | VeriDaaS Geiger Mode LiDAR Rural New York (Test) | Geiger Mode LiDAR data at 30 points per square... | Each rasterized LiDAR layer corresponds to one... |
| 77 | 507 | VeriDaaS TTT New | Geiger Mode LiDAR data at 30 points per square... | Each rasterized LiDAR layer corresponds to one... |
| 78 | 284 | Wildfire risk potential | Wildfire Hazard Potential can help to inform e... | Wildfire Hazard Potential\* for the conterminou... |


### Get a Data Set
In order to return all metadata about a Data Set, the `get_data_set` method can be used with a provided Data Set ID:

```python
ds = catalog.get_data_set(id = "177")
print(ds)
```

```
{
    "category": {
        "id": 1,
        "name": "Satellite"
    },
    "created_at": "1593726629770",
    "crs": "",
    "data_set_response": {},
    "data_source_attribution": "'Copernicus Sentinel data [Year]' for Sentinel data; see https://lta.cr.usgs.gov/sites/default/files/Sentinel_Data_Terms_and_Conditions.pdf",
    "data_source_description": "Level-2A is generated by the Payload Data Ground Segment using the Sen2Cor processor. Level-2A products are made available to users via the Copernicus Open Access Hub: https://scihub.copernicus.eu/dhus/#/home",
    "data_source_links": [
        "https://sentinel.esa.int/web/sentinel/sentinel-data-access"
    ],
    "data_source_name": "European Space Agency Sentinel-2",
    "description_links": [],
    "description_long": "Sentinel-2 is a set of two satellites in polar orbit 180 degrees apart. It monitors land surface and coastal waters every 5 days at the equator and more frequently at mid-latitudes. The coverage is between latitudes
    ...
```

### Get a List of Data Layers per Data Set
As discussed above in The Storage Model, Data Layers belong to Data Sets. The `get_data_layers` method can be used to return a list of all Data Layers in a specific Data Set by providing the Data Set ID:

```python
dl_list_by_set = catalog.get_data_layers(data_set_id = "177")
dl_list_by_set.display()
```

| | dataset_id | id | name | description_short | description_long | level | type | unit |
|---|---|---|---|---|---|---|---|---|
| 0 | 177 | 49360 | Band 4 (red) | Central wavelength 664.5/665.0 nm, bandwidth 3... | | | | |
| 1 | 177 | 49361 | Band 8 (NIR) | Central wavelength 835.1/833.0 nm, bandwidth 1... | | | | |
| 2 | 177 | 49685 | Band 8a (narrow IR) | Central wavelength 864.8/864.0 nm, bandwidth 3... | | | | |
| 3 | 177 | 49686 | Band 11 (SWIR 1610 nm) | Central wavelength 1613.7/1610.4 nm, bandwidth... | | | | |
| 4 | 177 | 49687 | Band 12 (SWIR 2200 nm) | Central wavelength 2202.4/2185.7 nm, bandwidth... | | | | |
| 5 | 177 | 49688 | Aerosol optical thickness | "AOT describes attenuation of sunlight by a co... | For Sentinel 2 level 2A products, the "aerosol... | | | |
| 6 | 177 | 49689 | Water vapor | Atmospheric water vapor content derived from b... | "Water vapour retrieval over land is performed... | | | |
| 7 | 177 | 49690 | Band 1 (coastal aerosol) | Central wavelength 443.9/442.3 nm, bandwidth 2... | | | | |
| 8 | 177 | 49691 | Band 9 (water vapor) | Central wavelength 945.0/943.2 nm, bandwidth 2... | | | | |
| 9 | 177 | 50250 | Cloud probability map | A 20m mask indicating the calculated probabili... | | | | |


### Get a List of Data Layers
In the same way as the `get_data_sets` method can be used to return all Data Sets a user has access to, the `get_data_layers` method can be used to return all Data Layers a user has access to:

```python
dl_list = catalog.get_data_layers()
dl_list.display()
```

| | dataset_id | id | name | description_short | description_long | level | type | unit |
|---|---|---|---|---|---|---|---|---|
| 0 | 464 | 50663 | Data density | Data density indicator for the algorithm's inp... | | 20 | Raster | |
| 1 | 464 | 50665 | Bare cover | Percent Vegetation Cover (PVC) for bare-sparse... | | 20 | Raster | |
| 2 | 464 | 50666 | Cropland cover (std) | Quality indicator (std. dev.) of the cropland ... | | 20 | Raster | |
| 3 | 464 | 50667 | Cropland cover | Percent Vegetation Cover (PVC) for cropland la... | | 20 | Raster | |
| 4 | 464 | 50668 | Classification | Land cover classification. | Classes are as follows: 0: No input data avail... | 20 | Raster | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 985 | 398 | P622C6368 | statestatus.days_to_double_cases | Days to double (cases) (rolling 7 days) | | 23 | Vector | count |
| 986 | 398 | P622C6363 | statestatus.daily_fatalities_per_100000_capita | Daily fatalities per 100K capita (rolling 7 days) | | 23 | Vector | count |
| 987 | 398 | P622C6367 | statestatus.daily_percentage_growth_cases | Daily percentage growth in cases (rolling 7 days) | | 23 | Vector | count |
| 988 | 398 | P623C6373 | statestatusanalytics.current_impact | Current impact derived from fatalities per 100... | | 23 | Vector | count |


### Get a Data Layer
The metadata about a specific Data Layer can be returned by passing a Data Layer ID to the `get_data_layer` method:

```python
dl = catalog.get_data_layer(id = "49464")
print(dl)
```

```
{
    "color_table": {
        "colors": "153A91,84F588,FFF787,FF7C3B,FF1921",
        "id": "4",
        "name": "Spectral"
    },
    "created_at": "1593726629784",
    "crs": "",
    "data_layer_response": {},
    "data_source_links": [],
    "dataset_id": "177",
    "datatype": "sh",
    "description_links": [],
    "description_long": "NDVI is generally calculated as (NIR - VIR) / (NIR + VIR). For Sentinel 2, this translates to (band 8 - band 4) / (band 8 + band 4).",
    "description_short": "A measure of the amount of vegetation at the pixel.",
    "id": "49464",
    "interpolation": "bilinear",
    "latitude_max": 90.0,
    "latitude_min": -90.0,
    "level": 23,
    "longitude_max": 180.0,
    "longitude_min": -180.0,
    "name": "Normalized difference vegetation index",
    "name_alternate": "Normalized difference vegetation index",
    "permanence": true,
    "properties": {},
    "rating": 1.0,
    "spatial_coverage": {
        "country": [
            "Belgium",
            ...
```
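
The `description_long` field of this layer states the NDVI formula as (band 8 - band 4) / (band 8 + band 4). As a worked illustration of that formula only, here is a minimal sketch; the function name and the input values are hypothetical and do not come from the catalog.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - red) / (NIR + red).

    For Sentinel 2, `nir` corresponds to band 8 and `red` to band 4.
    """
    denominator = nir + red
    if denominator == 0:
        raise ValueError("NDVI is undefined when NIR + red is zero")
    return (nir - red) / denominator


# Illustrative reflectance values, not real measurements.
print(ndvi(0.5, 0.1))  # roughly 0.667
print(ndvi(0.2, 0.2))  # 0.0
```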

### Get a List of Data Layer Dimensions per Data Layer
To list all Data Layer Dimensions belonging to a Data Layer, pass a Data Layer ID to the `get_data_layer_dimensions` method:

```python
dlds = catalog.get_data_layer_dimensions(data_layer_id = "49166")
dlds.display()
```

| | id | short_name | identifier | order | full_name | type | unit |
|---|---|---|---|---|---|---|---|
| 0 | 243 | issuetime | A | 1 | | | |
| 1 | 244 | horizon | B | 2 | | | |


### Get a Data Layer Dimension
To find out more about a Data Layer Dimension once its ID is known, pass the Data Layer Dimension ID to the `get_data_layer_dimension` method:

```python
dld = catalog.get_data_layer_dimension(id = "243")
print(dld)
```

```
{
    "data_layer_dimension_response": {},
    "full_name": "Issuetime",
    "id": "243",
    "identifier": "A",
    "order": 1,
    "short_name": "issuetime",
    "type": "integer",
    "unit": "hour"
}
```
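
The printed representation above looks like JSON, so it can be read back into a dictionary when individual fields are needed. This sketch assumes the printed text is always valid JSON, which the output above suggests but which is not confirmed here; the text is pasted in as a literal so the example runs without a catalog connection.

```python
import json

# The printed output from above, pasted as a string literal.
printed = """
{
    "data_layer_dimension_response": {},
    "full_name": "Issuetime",
    "id": "243",
    "identifier": "A",
    "order": 1,
    "short_name": "issuetime",
    "type": "integer",
    "unit": "hour"
}
"""

dimension = json.loads(printed)

# Build a short human-readable label from three of the fields.
label = f'{dimension["short_name"]} ({dimension["type"]}, unit: {dimension["unit"]})'
print(label)  # issuetime (integer, unit: hour)
```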
