# High priority dataset assessment

![dataset-integration-paths.png](dataset-integration-paths.png)

|                    | Integration path | Portal       | Data services team work | ODD team work | UI work |
|--------------------|------------------|--------------|-------------------------|---------------|---------|
| NLDAS-3            | 2                | WaterInsight | Move NetCDFs to ODR; publish STAC collection + items | Upgrade titiler-multidim to use the `sel` and `sel_method` parameters for both /tiles and /statistics. NOTE: Check that this is true for this dataset. Regardless, it is true for BlueFlux. | Integrate titiler-multidim /tiles and /statistics for STAC items |
| MiCASA             | 5                | GHG Center   | Publish STAC collection | Fix /statistics endpoint (in progress) | (In progress) Integrate titiler-cmr timeseries into the UI |
| BlueFlux           | 2                | GHG Center   | Move NetCDFs to VEDA bucket; publish STAC collection + items | Same as NLDAS-3 | Same as NLDAS-3, plus pass the `sel` and `sel_method` parameters |
| MUR SST (NetCDFs)  | 5                | Coastal      | Publish STAC collection | None | Same as MiCASA |
| MUR SST (icechunk) | 4                | Coastal      | Publish STAC collection(s). NOTE: Currently this is 2 icechunk stores; publishing 2 collections is the most straightforward option. | Enable titiler/titiler-multidim to read icechunk stores and produce time series from them. NOTE: We may be blocked on this until PO.DAAC enables requester-pays and icechunk supports it, OR we use EDL credential rotation. | Once titiler-multidim can read icechunk stores, tiles should work via zarr-timeseries. The UI will need to be able to generate time series from titiler-multidim via a STAC collection. |

In [1]:
import requests
import json

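# Staging endpoints for the two tilers being assessed
multidim_base_url = "https://staging.openveda.cloud/api/titiler-multidim"
cmr_base_url = "https://staging.openveda.cloud/api/titiler-cmr"

# Closed polygon ring (lon, lat) roughly covering Washington State, used as the statistics AOI
wa_state_coords = [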
    [-116, 45],
    [-116, 48],
    [-124, 48],
    [-124, 45],
    [-116, 45]
]

timeseries_headers = {
    "accept": "application/geo+json",
    "Content-Type": "application/json"
}

tile_headers = {
    "accept": "image/png"
}

geojson_data = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "coordinates": [ wa_state_coords ],
                "type": "Polygon"
            }
        }
    ]
}

## Portal: WaterInsight

### NLDAS-3 (titiler-multidim via ODR)

### Where is the data?

* Planned to be NetCDFs on the AWS Open Data Registry (ODR)
* Currently in the protected bucket `s3://nasa-waterinsight`; Sid provided me (Aimee) with access credentials

### Can we visualize this data using titiler-multidim?

Yes:

In [6]:
tiles_params = {
    "scale": "1",
    "format": "png",
    "url": "s3://nasa-eodc-public/NLDAS3/forcing/monthly/2023/NLDAS_FOR0010_M.A202301.030.beta.nc",
    "variable": "Tair",
    "decode_times": "false",
    "colormap_name": "balance",
    "rescale": "230,303"  # Temperature range in Kelvin (230K = -43°C, 303K = 30°C)
}

# Make the GET request for tile
tiles_response = requests.get(
    url=f"{multidim_base_url}/tiles/WebMercatorQuad/0/0/0",
    params=tiles_params
)

if tiles_response.status_code == 200:
    print("Success! Received PNG tile image")
    print(f"Content-Type: {tiles_response.headers.get('content-type')}")
    print(f"Content-Length: {len(tiles_response.content)} bytes")
    
    # Optionally save the tile image
    # with open("tile_0_0_0.png", "wb") as f:
    #     f.write(tiles_response.content)
    #     print("Tile saved as tile_0_0_0.png")
else:
    print(f"Error: {tiles_response.status_code}")
    print(tiles_response.text)


=== TILES API RESPONSE ===
Success! Received PNG tile image
Content-Type: image/png
Content-Length: 9047 bytes


### Can we visualize it in VEDA UI?

👷 Not yet: VEDA UI does not have an integration for item-based titiler-multidim visualization, only for a single Zarr endpoint.

In theory, if we had an icechunk store and titiler/titiler-multidim supported reading icechunk, we could visualize it.


### Can we produce time series using titiler-multidim?

👷 No: titiler-multidim produces statistics for a single URL (one file), not a time series across files. Integrating time series into the UI means we need:

1. All the items indexed into STAC (data services team).
2. The UI to query STAC for the items and then make a /statistics request for each individual item.
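The two steps above amount to a client-side loop: one titiler-multidim `/statistics` request per STAC item, ordered by the item's datetime. A minimal sketch, assuming STAC items are plain dicts (as returned by a STAC API search) and that each item's NetCDF sits under a hypothetical asset key such as `"data"`; the helper names are illustrative and are not part of VEDA UI or titiler-multidim.

```python
from typing import Any, Callable, Optional

MULTIDIM_BASE_URL = "https://staging.openveda.cloud/api/titiler-multidim"


def item_statistics_params(item: dict, asset_key: str, variable: str) -> dict:
    """Build /statistics query parameters for a single STAC item (hypothetical helper)."""
    return {
        "url": item["assets"][asset_key]["href"],  # NetCDF file referenced by the item
        "variable": variable,
        "decode_times": "false",  # mirrors the NLDAS-3 request in this notebook
    }


def timeseries_from_items(
    items: list,
    asset_key: str,
    variable: str,
    geojson: dict,
    post: Optional[Callable[..., Any]] = None,
) -> list:
    """Request statistics for every item and return the results ordered by datetime.

    `post` defaults to requests.post; a stub can be passed instead for offline testing.
    """
    if post is None:
        import requests  # imported lazily so the helpers can be used without network access

        post = requests.post

    series = []
    for item in sorted(items, key=lambda i: i["properties"]["datetime"]):
        response = post(
            f"{MULTIDIM_BASE_URL}/statistics",
            params=item_statistics_params(item, asset_key, variable),
            json=geojson,
            timeout=60,
        )
        response.raise_for_status()
        # Keep the raw response; parsing the statistics is left to the caller
        series.append({"datetime": item["properties"]["datetime"], "response": response.json()})
    return series
```

This issues one HTTP request per item, so the number of requests grows linearly with the length of the requested time range.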

In [5]:
# Query parameters
params = {
    "url": "s3://nasa-eodc-public/NLDAS3/forcing/monthly/2023/NLDAS_FOR0010_M.A202301.030.beta.nc",
    "variable": "Tair",
    "decode_times": "false",
    "histogram_bins": "8"
}

# Make the POST request
response = requests.post(
    url=f"{multidim_base_url}/statistics",
    params=params,
    headers=timeseries_headers,
    json=geojson_data  # Using json parameter automatically handles JSON serialization
)

# Check response
if response.status_code == 200:
    result = response.json()
    print("Success!")
    # print(json.dumps(result, indent=2))
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Success!


### Can we produce time series in VEDA UI?

No: VEDA UI does not have an integration with titiler-multidim for time series generation.


## GHG Center

### MiCASA (titiler-cmr via GES DISC)

### Can we visualize the dataset?

In [9]:
# Query parameters
cmr_params = {
    "scale": "1",
    "concept_id": "C3273639213-GES_DISC",
    "datetime": "2018-02-12T09:00:00Z",
    "variable": "NPP",
    "backend": "xarray",
    "colormap_name": "purd",
    "rescale": "0,0.00000008"
}

# Make the GET request
response = requests.get(
    url=f"{cmr_base_url}/tiles/WebMercatorQuad/0/0/0",
    params=cmr_params,
    headers=tile_headers
)

# Check response
if response.status_code == 200:
    print("Success! Received PNG tile image")
    print(f"Content-Type: {response.headers.get('content-type')}")
    print(f"Content-Length: {len(response.content)} bytes")
    
    # Optionally save the tile image
    # with open("cmr_tile_0_0_0.png", "wb") as f:
    #     f.write(response.content)
    #     print("Tile saved as cmr_tile_0_0_0.png")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Success! Received PNG tile image
Content-Type: image/jpeg
Content-Length: 4124 bytes


### Can we visualize it in the UI?

This will almost certainly work, given the existing GPM IMERG implementation.

### Can we produce time series?

🐛 Maybe? There may be a bug: the request below fails with a 500 error.

In [15]:
request = requests.post(
    f"{cmr_base_url}/timeseries/statistics",
    params={
        "concept_id": "C3273639213-GES_DISC",
        "datetime": "2022-03-01T00:00:01Z/2022-03-10T23:59:59Z",
        "step": "P1D",
        "temporal_mode": "point",
        "variable": "NPP",
        "backend": "xarray",
    },
    json=geojson_data,
    timeout=None,
)

request.raise_for_status()
response = request.json()

HTTPError: 500 Server Error: Internal Server Error for url: https://staging.openveda.cloud/api/titiler-cmr/timeseries/statistics?concept_id=C3273639213-GES_DISC&datetime=2022-03-01T00%3A00%3A01Z%2F2022-03-10T23%3A59%3A59Z&step=P1D&temporal_mode=point&variable=NPP&backend=xarray

In [16]:
# Inspect the error message returned by the server
request.text

'{"detail":"\'FeatureCollection\' object has no attribute \'properties\'"}'
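The error message suggests the endpoint reads `.properties` from the request body, which a GeoJSON `Feature` has but a `FeatureCollection` does not. Assuming that is the cause (not confirmed against the titiler-cmr source), one possible workaround is to send the single polygon as a `Feature`. A minimal sketch; `as_single_feature` is a hypothetical helper:

```python
def as_single_feature(geojson: dict) -> dict:
    """Return a GeoJSON Feature, unwrapping a one-feature FeatureCollection.

    Hypothetical workaround for the 500 error above; assumes the endpoint
    expects a Feature (which has `properties`) rather than a FeatureCollection.
    """
    if geojson.get("type") == "Feature":
        return geojson
    if geojson.get("type") == "FeatureCollection":
        features = geojson.get("features", [])
        if len(features) != 1:
            raise ValueError(f"expected exactly one feature, got {len(features)}")
        feature = dict(features[0])  # shallow copy so the input is not mutated
        feature.setdefault("properties", {})  # GeoJSON Features carry a properties member
        return feature
    raise ValueError(f"unsupported GeoJSON type: {geojson.get('type')!r}")
```

In this notebook, that would mean passing `json=as_single_feature(geojson_data)` in the `/timeseries/statistics` request above.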

### Can we produce time series in the UI?

Hanbyul is currently working on this; see [veda-ui issue #1727](https://github.com/NASA-IMPACT/veda-ui/issues/1727) and [WIP PR #1747](https://github.com/NASA-IMPACT/veda-ui/pull/1747).

### BlueFlux (titiler-multidim via VEDA bucket)

Since the data is maintained by ORNL DAAC, which we don't currently have access to, it was suggested to copy the data into the VEDA SMCE bucket and tile it from there using titiler-multidim.

It looks like there are only 4 files.
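The copy suggested above could be scripted by streaming each granule from its ORNL DAAC S3 location into the VEDA bucket. A minimal sketch, assuming `src_fs` is the filesystem returned by `earthaccess.get_s3_filesystem(daac="ORNLDAAC")` and `dest_fs` is an fsspec/s3fs filesystem with write access to the destination; the destination prefix in the usage note comes from the `aws s3 cp` command in this notebook, and `plan_copies`/`copy_all` are hypothetical helpers.

```python
import posixpath
import shutil


def plan_copies(src_links: list, dest_prefix: str) -> list:
    """Pair each source S3 URL with a destination URL that keeps the file name."""
    dest_prefix = dest_prefix.rstrip("/")
    return [(src, f"{dest_prefix}/{posixpath.basename(src)}") for src in src_links]


def copy_all(plan: list, src_fs, dest_fs, chunk_size: int = 16 * 1024 * 1024) -> None:
    """Stream each file from src_fs to dest_fs without writing it to local disk."""
    for src, dest in plan:
        with src_fs.open(src, "rb") as fsrc, dest_fs.open(dest, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst, chunk_size)
```

For example, `plan_copies([g.data_links(access="direct")[0] for g in granule_results], "s3://nasa-eodc-public/BlueFlux")` would produce one `(source, destination)` pair per granule found by the search below.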

In [35]:
import earthaccess

earthaccess.login()

granule_results = earthaccess.search_data(
    collection_concept_id="C3498325287-ORNL_CLOUD"
)
print(f"{len(granule_results)} granules found")

s3_link = granule_results[0].data_links(access="direct")[0]
# Direct (in-region) S3 access to NASA Earthdata only works from AWS us-west-2
fs = earthaccess.get_s3_filesystem(daac="ORNLDAAC")

# fs.download(s3_link, s3_link.split("/")[-1])

4 granules found


[None]

In [37]:
#!aws s3 cp blueflux_fco2_micromol_500m_std_v1.nc s3://nasa-eodc-public/BlueFlux/blueflux_fco2_micromol_500m_std_v1.nc

### Can we visualize it?

👷 Probably, but titiler-multidim needs to be upgraded to support the `sel` parameter, since each file contains many dates.
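Conceptually, `sel` picks one slice along a non-spatial dimension before tiling, much like xarray's `DataArray.sel`, and `sel_method` presumably plays the role of xarray's `method` argument (e.g. `"nearest"`). The sketch below illustrates those semantics on a small synthetic cube; `apply_sel` is a hypothetical helper, not titiler-multidim's implementation.

```python
import numpy as np
import pandas as pd
import xarray as xr


def apply_sel(da: xr.DataArray, sel: str, method=None) -> xr.DataArray:
    """Apply a "dim=value" selection string, e.g. "time=2000-04-10".

    Hypothetical helper: values for datetime coordinates are parsed as
    np.datetime64; other coordinates receive the raw string.
    """
    dim, value = sel.split("=", 1)
    label = np.datetime64(value) if np.issubdtype(da[dim].dtype, np.datetime64) else value
    return da.sel({dim: label}, method=method)


# Synthetic stand-in for a multi-date file such as the BlueFlux NetCDF
times = pd.date_range("2000-04-08", periods=5, freq="D")
cube = xr.DataArray(
    np.arange(5 * 2 * 2, dtype="float32").reshape(5, 2, 2),
    coords={"time": times, "lat": [25.0, 26.0], "lon": [-81.0, -80.0]},
    dims=("time", "lat", "lon"),
    name="fco2_std",
)

exact = apply_sel(cube, "time=2000-04-10")  # 2-D (lat, lon) slice for one date
nearest = apply_sel(cube, "time=2000-04-10T20:00", method="nearest")  # snaps to 2000-04-11
```

Either way the result is a single 2-D slice, which is what a tiler needs; without a selection, a multi-date variable has an extra dimension to reduce.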


In [2]:
import xarray as xr
xds = xr.open_dataset('blueflux_fco2_micromol_500m_std_v1.nc')

In [25]:
xds

In [14]:
xds.fco2_std[100].min().values, xds.fco2_std[100].max().values

(array(0.04123636, dtype=float32), array(1.5548011, dtype=float32))

In [20]:
import morecantile

tms = morecantile.tms.get("WebMercatorQuad")

x, y, z = tms.tile(-81, 26, 7)

In [24]:
x, y, z

(35, 54, 7)
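For reference, the `(35, 54, 7)` result follows from the standard Web Mercator ("slippy map") tile formulas that `WebMercatorQuad` is based on. A minimal sketch using only the standard library; `lonlat_to_tile` is a hypothetical helper that does no clamping at the poles or antimeridian, and morecantile remains the authoritative implementation.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Return the (x, y, z) WebMercatorQuad tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Mercator y, normalized so 0 is the top (north) edge and 1 is the bottom edge
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y, zoom


print(lonlat_to_tile(-81, 26, 7))  # (35, 54, 7), matching morecantile above
```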

In [23]:
tiles_params = {
    "scale": "1",
    "format": "png",
    "url": "s3://nasa-eodc-public/BlueFlux/blueflux_fco2_micromol_500m_std_v1.nc",
    "variable": "fco2_std",
    "sel": "time=2000-04-10",
    "colormap_name": "pink",
    "rescale": "0.04,1.55"
}

# Make the GET request for tile
tiles_response = requests.get(
    url=f"{multidim_base_url}/tiles/WebMercatorQuad/{z}/{x}/{y}",
    params=tiles_params
)

if tiles_response.status_code == 200:
    print("Success! Received PNG tile image")
    print(f"Content-Type: {tiles_response.headers.get('content-type')}")
    print(f"Content-Length: {len(tiles_response.content)} bytes")
    
    # Save the tile image locally for inspection
    with open("tile.png", "wb") as f:
        f.write(tiles_response.content)
        print("Tile saved as tile.png")
else:
    print(f"Error: {tiles_response.status_code}")
    print(tiles_response.text)

Success! Received PNG tile image
Content-Type: image/png
Content-Length: 854 bytes
Tile saved as tile.png


### Can we produce time series?

👷 Similarly, the titiler-multidim /statistics endpoint needs to support the `sel` parameter.
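Once `/statistics` accepts `sel`, a time series for a single multi-date file could be assembled by issuing one request per date. A minimal sketch of the request parameters, assuming `/statistics` will accept the same `sel` and `sel_method` query parameters as the tile request above (not yet the case); `statistics_params_for_dates` is a hypothetical helper and the dates are illustrative.

```python
BLUEFLUX_URL = "s3://nasa-eodc-public/BlueFlux/blueflux_fco2_micromol_500m_std_v1.nc"


def statistics_params_for_dates(url: str, variable: str, dates: list, sel_method: str = "nearest") -> list:
    """Build one /statistics query-parameter dict per requested date.

    Assumes the endpoint will accept `sel` and `sel_method` the way /tiles does.
    """
    return [
        {
            "url": url,
            "variable": variable,
            "sel": f"time={date}",
            "sel_method": sel_method,
        }
        for date in dates
    ]


# Illustrative dates; each dict would be sent as
# requests.post(f"{multidim_base_url}/statistics", params=p, headers=timeseries_headers, json=geojson_data)
params_list = statistics_params_for_dates(BLUEFLUX_URL, "fco2_std", ["2000-04-10", "2000-04-11"])
```

Defaulting to `sel_method="nearest"` would presumably let requested dates that fall between the file's time steps snap to the closest available slice instead of failing.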

## Coastal Portal

### MUR SST

We can integrate this dataset for visualization; this has been demonstrated previously.

Time series should work once titiler-cmr timeseries integration is complete.

👷 However, both visualization and time series will be slow without a virtual layer.

## Global Mangrove Aboveground Biomass, Carbon Stocks and Canopy Height

I think we decided this was a no-op, since it is a GeoTIFF and must be converted to a COG to work with any tiler.