# Adding a new dataset to the dashboard
This notebook walks through gathering the details needed to add a new cloud-optimized dataset to the dashboard. After gathering the configuration information, create a new dataset json (see examples in [datasets](../datasets)) and add that json file to the dataset list in [config.yml](config.yml).

## Configure dataset
1. Validate COG
2. Summarize dataset statistics
3. Configure dataset display preferences and tile source for the dashboard
4. Optional: Prepare MosaicJSON for COG data stored as granules

## Create dataset with a new pull request
When you have added your dataset json to [datasets](../datasets) and referenced it in [config.yml](config.yml), create a pull request to add the dataset to the dashboard.
   
## Prerequisites
- COG datasets stored in S3
- Jupyter notebook, boto3, rasterio, rio-cogeo, supermercado, cogeo-mosaic
- [geo-environment.yml](geo-environment.yml) is provided as a starting point for creating a conda environment to satisfy requirements

## Resources
Content in this notebook is sourced directly from:
- [MAAP Project MosaicJSON Tutorials](https://docs.maap-project.org/en/develop/visualization/srtm-stac-mosaic.html#MosaicJSON)
- [MAAP Project Add User-Created Datasets Docs](https://github.com/MAAP-Project/maap-documentation/blob/ab/create-dashboard-datasets-guidance/docs/source/user_data/create-datasets-for-dashboard.ipynb)

Other:
- [Rio-Cogeo How To](https://cogeotiff.github.io/rio-cogeo/Is_it_a_COG/)
- [GDAL vsis3](https://gdal.org/user/virtual_file_systems.html#vsis3-aws-s3-files)

In [None]:
import boto3

# for colormaps and legends
import json
import matplotlib.cm
import matplotlib.colors  # used explicitly in the colormap examples below


# if preparing a dataset composed of granules
import os
from cogeo_mosaic.mosaic import MosaicJSON
from cogeo_mosaic.backends import MosaicBackend

s3 = boto3.client('s3')

## Validate COG

In [None]:
bucket = 'my-bucket'
key = 'object-name.tif'
s3_path = f'{bucket}/{key}'

In [None]:
%%bash -s "$s3_path"
rio cogeo validate /vsis3/$1

## Summarize dataset to identify rescale and color parameters

In [None]:
%%bash -s "$s3_path"
gdalinfo /vsis3/$1 -stats

In [None]:
%%bash -s "$s3_path"
rio cogeo info /vsis3/$1

## Tiler
The steps below demonstrate how to check that the dataset can be rendered and how to configure the dataset colors.

### Dynamic tiler URL

[TiTiler](https://github.com/NASA-IMPACT/titiler) is used as the dynamic tiler in this example. The current value for this variable can be found in `.github/workflows/deploy.yml`.

In [None]:
titiler_url = '<tiler base url>'

### Get valid x, y parameters for a given zoom
When generating tiles, the titiler API requires a valid x, y set for the given zoom.
- Note: for datasets with sparse data, the test image url may not show much. A tool like [QGIS](https://qgis.org/en/site/index.html) can help to identify where data are available.
- For COGs with limited spatial extent, the cell below can help identify valid z/x/y values to use when testing the tiler. For datasets with global extent, zxy = 1/0/1 will work.


In [None]:
# %%bash -s "$s3_path"
# rio bounds /vsis3/$1 | supermercado burn 1 # this last value is the "zoom"
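
If a point inside the dataset's extent is already known, the tile that contains it can also be computed directly with the standard Web Mercator (XYZ) tile formula. The sketch below is pure Python and independent of supermercado. `lonlat_to_tile` is a hypothetical helper name, and the sketch assumes the tiler uses the usual Web Mercator tiling scheme.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) XYZ tile index containing a lon/lat point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    # Web Mercator is only defined up to about +/-85.0511 degrees latitude
    lat = max(min(lat, 85.0511), -85.0511)
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # clamp so points on the eastern or southern edge stay inside the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# A point in the south-western quadrant falls in tile 1/0/1
x, y = lonlat_to_tile(-90.0, -45.0, 1)
zxy = f"1/{x}/{y}"
print(zxy)
```

Pick a longitude and latitude where you know the COG has data, then reuse the resulting `zxy` string when testing the tiler.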

In [None]:
s3_uri = f's3://{s3_path}'
band_min = 0
band_max = 1
zxy = '1/0/1'
rescale = f"{band_min},{band_max}"
band_index = 1

test_img_url = f"{titiler_url}/cog/tiles/{zxy}.png?url={s3_uri}&rescale={rescale}&bidx={band_index}"

# Jupyter auto adds &amp; to links so copy / paste everything after "x" into a browser
print(f"x{test_img_url}")
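
S3 URIs and rescale values contain characters such as `:`, `/` and `,` that some clients handle inconsistently in query strings. As a hedged alternative to the plain f-string above, the sketch below percent-encodes the query with the standard library's `urllib.parse.urlencode`. `build_test_tile_url` is a hypothetical helper, the base URL is a placeholder, and the sketch assumes the tiler decodes percent-encoded query values in the usual way.

```python
from urllib.parse import urlencode


def build_test_tile_url(titiler_url, zxy, s3_uri, band_min, band_max, band_index=1):
    """Build a single-tile PNG request with percent-encoded query parameters."""
    params = {
        "url": s3_uri,
        "rescale": f"{band_min},{band_max}",
        "bidx": band_index,
    }
    return f"{titiler_url.rstrip('/')}/cog/tiles/{zxy}.png?{urlencode(params)}"


# Placeholder values; replace with your tiler base URL and object
test_img_url = build_test_tile_url(
    "https://titiler.example.com", "1/0/1", "s3://my-bucket/object-name.tif", 0, 1
)
print(test_img_url)
```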


## Define a color map

By default, the image will be displayed in greyscale if no colormap_name parameter is passed to the titiler API. Guidance below is provided to help determine what a valid colormap_name might be and how to create a legend for the dashboard.
### Dashboard ColorRamps & Legends

In the dashboard, there are two components to implementing a color scheme for your map: the map render and the legend.

TiTiler, which is used for Cloud Optimized GeoTIFF (COG) rendering, accepts any color scheme from the Python matplotlib library as well as custom color formulas.

- [Rio Tiler Colors](https://cogeotiff.github.io/rio-tiler/colormap/)
- [Matplotlib Colors](https://matplotlib.org/stable/tutorials/colors/colormaps.html)


### Example 1: Class based known colors

In this example, the raster represents classes of forest with 11 possible values. There are specific colors selected to correspond to each class. We combine the list of colors and the list of classes and format them for the legend parameter the dashboard needs.

https://github.com/MAAP-Project/dashboard-datasets-maap/blob/main/datasets/taiga-forest-classification.json

In [None]:
colors = [
    '#5255A3','#1796A3','#FDBF6F','#FF7F00', '#FFFFBF','#D9EF8B','#91CF60','#1A9850', '#C4C4C4','#FF0000','#0000FF'
]

labels = [
    'Sparse & Uniform',
    'Sparse & Diffuse-gradual',
    'Sparse & Diffuse-rapid',
    'Sparse & Abrupt',
    'Open & Uniform',
    'Open & Diffuse-gradual',
    'Open & Diffuse-rapid',
    'Open & Abrupt',
    'Intermediate & Closed',
    'Non-forest edge (dry)',
    'Non-forest edge (wet)'
]

legend = [dict(color=colors[i],label=labels[i]) for i in range(0, len(colors))]
print(json.dumps(legend, indent=2))

# Copy and Paste the output below to your dashboard config.

### Example 2: Discrete ColorRamp

In this example, the range of values is known, but the color scale has many non-sequential colors. Starting with the premade color list, we create a continuous color ramp that uses the known colors as stop points. Twelve breaks, chosen arbitrarily, looked decent in the dashboard legend, so we split the ramp into 12 discrete colors. We then combine the list of values and colors into the correct json syntax.

https://github.com/MAAP-Project/dashboard-datasets-maap/blob/main/datasets/ATL08.json

In [None]:
forest_ht = matplotlib.colors.LinearSegmentedColormap.from_list('forest_ht', ['#636363','#FC8D59','#FEE08B','#FFFFBF','#D9EF8B','#91CF60','#1A9850','#005A32'], 12)
cols = [matplotlib.colors.to_hex(forest_ht(i)) for i in range(forest_ht.N)]

cats = range(0,25, (25//len(cols)))
legend = [[cats[i],cols[i]] for i in range(0, len(cols))]
text = (json.dumps(legend, separators=(',', ': ') ))

print(text.replace('],[','],\n['))
 
# Copy and Paste the output below to your dashboard config.
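
The break values paired with each color can also be derived from the data range instead of a hard-coded step. The sketch below returns the lower edge of each equal-width class. `legend_breaks` is a hypothetical helper, and the short color list is a stand-in for the `cols` list built above.

```python
def legend_breaks(vmin, vmax, n_colors):
    """Return the lower edge of each of n_colors equal-width classes in [vmin, vmax)."""
    if n_colors < 1:
        raise ValueError("n_colors must be at least 1")
    width = (vmax - vmin) / n_colors
    return [vmin + i * width for i in range(n_colors)]


# Hypothetical hex colors standing in for the `cols` list built above
example_cols = ["#636363", "#fee08b", "#91cf60", "#005a32"]
breaks = legend_breaks(0, 24, len(example_cols))
example_legend = [[b, c] for b, c in zip(breaks, example_cols)]
print(example_legend)
```

With a range of 0 to 24 and 12 colors, this yields the same break values (0, 2, ..., 22) as the `range` call in the cell above.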

### Example 3: Continuous ColorRamp

In this example, we use a built-in ColorRamp from matplotlib, so we only need to extract enough colors to fill the legend adequately and convert them to hex codes.

https://github.com/MAAP-Project/dashboard-datasets-maap/blob/main/datasets/topo.json

In [None]:
cmap_name = 'viridis'
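# matplotlib.cm.get_cmap(cmap_name, 12) was removed in Matplotlib 3.9;
# the colormap registry below requires Matplotlib 3.6 or newer
cmap = matplotlib.colormaps[cmap_name].resampled(12)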
cols = [matplotlib.colors.to_hex(cmap(i)) for i in range(cmap.N)]
print(cols)

# Copy and Paste the output below to your dashboard config.

## Create and submit your dashboard dataset json

>**Note:** See the "Optional: Mosaicing datasets" section to configure tiles links for layers composed of granules.

In [None]:
# This example is for a continuous color ramp

dataset_id = "my_dataset"
dataset_name = "My Dataset Name"
dataset_type = "raster"
legend_type = "gradient"
info = "Description and units"

band_index = 1
band_min = 0
band_max = 1800
nodata = -9

s3_uri = f's3://{s3_path}'

# define an array of cols as demonstrated in the continuous colormap example above
stops = cols
tiles_link = f"{{titiler_server_url}}/cog/tiles/{{z}}/{{x}}/{{y}}@1x?url={s3_uri}&colormap_name={cmap_name}&rescale={band_min},{band_max}&bidx={band_index}"

In [None]:
dataset_dict = {
    "id": dataset_id,
    "name": dataset_name,
    "type": dataset_type,
    "swatch": {
        "color": "#6976d7",
        "name": "Dark Green"
    },
    "source": {
        "type": dataset_type,
        "tiles": [tiles_link]
    },
    "legend": {
        "type": legend_type,
        "min": band_min,
        "max": band_max,
        "stops": stops
    },
    "info": info
}
print(json.dumps(dataset_dict, indent=4))
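
Rather than copying the printed output by hand, the dictionary can be written straight to a file. The sketch below is one possible approach: `write_dataset_json` is a hypothetical helper, naming the file `<id>.json` is an assumption, and the key check only covers the keys used in this notebook's example, not a confirmed dashboard schema.

```python
import json
import tempfile
from pathlib import Path

# Keys used by the example dataset in this notebook; the dashboard's
# actual required set is an assumption here.
REQUIRED_KEYS = ("id", "name", "type", "source", "legend")


def write_dataset_json(dataset, directory):
    """Check for the expected top-level keys, then write <id>.json into directory."""
    missing = [key for key in REQUIRED_KEYS if key not in dataset]
    if missing:
        raise ValueError(f"dataset is missing keys: {missing}")
    out_path = Path(directory) / f"{dataset['id']}.json"
    out_path.write_text(json.dumps(dataset, indent=4) + "\n")
    return out_path


# Usage with a throwaway directory and a trimmed-down example dataset
with tempfile.TemporaryDirectory() as tmp:
    example = {
        "id": "my_dataset",
        "name": "My Dataset Name",
        "type": "raster",
        "source": {"type": "raster", "tiles": []},
        "legend": {"type": "gradient"},
    }
    written = write_dataset_json(example, tmp)
    round_trip = json.loads(written.read_text())
print(round_trip["id"])
```

In the notebook, a call such as `write_dataset_json(dataset_dict, "../datasets")` would place the file next to the existing examples.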

## Optional: Mosaicing datasets

Many datasets are composed of many tiles distributed spatially over the globe. To visualize them all together, we can use [MosaicJSON](https://github.com/developmentseed/mosaicjson-spec) to create a mosaic for the dynamic tiler API. The dynamic tiler API knows how to read this MosaicJSON and select which tiles to render based on the current zoom, x and y coordinates across spatially distinct COGs.

### Identify tiles in S3
Select the tiles you want to mosaic in S3. 

In [None]:
bucket = 'my-bucket'
key_prefix = 'key-prefix'

response = s3.list_objects_v2(
    Bucket=bucket,
    Prefix=key_prefix
)
response['Contents'][0]

### Generate a list of URIs from selected object keys

In [None]:
s3_uris = []
for obj in response['Contents']:
    s3_uris.append(f"s3://{bucket}/{obj['Key']}")
print(f'{len(s3_uris)} object URIs identified with Prefix={key_prefix}')
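
A single `list_objects_v2` call returns at most 1,000 keys, so prefixes with more granules need pagination. The sketch below uses the boto3 paginator for that operation. `list_s3_uris` and its `suffix` filter are hypothetical additions, and the function takes the client as an argument so it can reuse the `s3` client created at the top of the notebook.

```python
def list_s3_uris(s3_client, bucket, prefix, suffix=".tif"):
    """Collect s3:// URIs for every matching object under a prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    uris = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no keys match
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                uris.append(f"s3://{bucket}/{obj['Key']}")
    return uris
```

For example, `s3_uris = list_s3_uris(s3, bucket, key_prefix)` would replace the loop above while skipping non-TIFF objects such as sidecar files.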

In [None]:
mosaicdata = MosaicJSON.from_urls(s3_uris, minzoom=1, maxzoom=10)

# Optional: save the mosaic json to a local file
mosaicjson_path = 'my_local_mosaic.json'

# To use with TiTiler, upload the mosaic json to an S3 bucket accessible to the dashboard tiler (use an S3 URL instead of a local filename)
# If your mosaic is part of a time series, use a pattern that can be parsed by TiTiler
# mosaicjson_path = 's3://<bucket>/mosaics/<dataset prefix>/<dataset YYYY.mm.dd or YYYY.mm>.json'

with MosaicBackend(mosaicjson_path, mosaic_def=mosaicdata) as mosaic:
    mosaic.write(overwrite=True)
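
For a time series, the path pattern from the comment above can be generated from a date rather than typed by hand. The sketch below is illustrative only: `mosaic_path_for_date` is a hypothetical helper, and the bucket and dataset prefix are placeholders.

```python
from datetime import date


def mosaic_path_for_date(bucket, dataset_prefix, day, monthly=False):
    """Build s3://<bucket>/mosaics/<dataset prefix>/<date>.json for one time step."""
    stamp = day.strftime("%Y.%m" if monthly else "%Y.%m.%d")
    return f"s3://{bucket}/mosaics/{dataset_prefix}/{stamp}.json"


daily_path = mosaic_path_for_date("my-bucket", "my-dataset", date(2021, 3, 5))
monthly_path = mosaic_path_for_date("my-bucket", "my-dataset", date(2021, 3, 5), monthly=True)
print(daily_path)
print(monthly_path)
```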

### Configure dashboard dataset tiles link
The tiles link for the mosaic json just created uses the same general format as the link generated above for the single-file COG example. Add this link to the dataset json.

**Single COG tiles link**

`tiles_link = f"{{titiler_server_url}}/cog/tiles/{{z}}/{{x}}/{{y}}@1x?url={s3_uri}&colormap_name={cmap_name}&rescale={band_min},{band_max}&bidx={band_index}"`

**MosaicJSON tiles link**

`tiles_link = f"{{titiler_server_url}}/cog/mosaicjson/{{z}}/{{x}}/{{y}}@1x?url={s3_mosaicjson_uri}&colormap_name={cmap_name}&rescale={band_min},{band_max}&bidx={band_index}"`
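
In both f-strings, doubled braces such as `{{z}}` are escapes that produce literal `{z}` placeholders in the output, while single-braced names such as `{cmap_name}` are substituted immediately. The sketch below wraps both link formats in one function to make that visible. `build_tiles_link` is a hypothetical helper, the endpoint paths are copied from the two links above, and the URIs are placeholders.

```python
def build_tiles_link(source_uri, cmap_name, band_min, band_max, band_index=1, mosaic=False):
    """Return a tiles template; doubled braces survive as literal placeholders."""
    endpoint = "cog/mosaicjson" if mosaic else "cog/tiles"
    return (
        f"{{titiler_server_url}}/{endpoint}/{{z}}/{{x}}/{{y}}@1x"
        f"?url={source_uri}&colormap_name={cmap_name}"
        f"&rescale={band_min},{band_max}&bidx={band_index}"
    )


single_link = build_tiles_link("s3://my-bucket/object-name.tif", "viridis", 0, 1800)
mosaic_link = build_tiles_link(
    "s3://my-bucket/mosaics/my-dataset.json", "viridis", 0, 1800, mosaic=True
)
print(single_link)
print(mosaic_link)
```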