## Approach

This notebook is intended as a reference for data providers who want to add new datasets to the VEDA Dashboard. Before working through this example, data providers should read the [Data Ingestion](https://nasa-impact.github.io/veda-docs/contributing/dataset-ingestion/) documentation.

As an example, this notebook walks through adding the GEOGLAM June 2023 dataset directly to the VEDA Dashboard.

## Authenticate with VEDA backend

In [1]:
!pip install cognito-client --quiet


Running the following cell prompts you for your `CognitoClient` `username` and `password`. If you do not already have these credentials, please contact the VEDA Data Services team to have an account set up for you. The first time you log in with the new credentials using the `CognitoClient` in this notebook, you will be prompted to set a new password.

In [2]:
from cognito_client import CognitoClient

client = CognitoClient(
    client_id="o8c93cebc17upumgstlbqm44f",
    user_pool_id="us-west-2_9mMSsMcxw",
    identity_pool_id="us-west-2:40f39c19-ab88-4d0b-85a3-3bad4eacbfc0",
)
_ = client.login()

TOKEN = client.access_token

In [3]:
import os

import rio_cogeo
import rasterio
import boto3
import requests

## Define item metadata

Below we define the variables used in the rest of the notebook, including the `API` address and the `TARGET_FILENAME` for the data file you want to upload. This example demonstrates the ingestion of GEOGLAM's June 2023 data. The file you want to upload (e.g., `CropMonitor_2023_06_28.tif`) must be located in the same directory as this notebook.

In the cell below, `TARGET_FILENAME` gives the file at `LOCAL_FILE_PATH` a name that follows the naming convention described in the `File preparation` [documentation](https://nasa-impact.github.io/veda-docs/contributing/dataset-ingestion/file-preparation.html). The linked page shows example file names.

If the file name in `LOCAL_FILE_PATH` already follows the convention, `LOCAL_FILE_PATH` and `TARGET_FILENAME` will be identical.
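The cell below hard-codes `YEAR` and `MONTH`. If several daily-stamped files need renaming, the date could instead be parsed from the file name. The following is a minimal sketch, assuming every source file follows the `<prefix>_YYYY_MM_DD.tif` pattern of the example; the function `monthly_target_filename` and its regex are hypothetical illustrations, not part of any VEDA tooling.

```python
import re

# Assumed source pattern: <prefix>_YYYY_MM_DD.tif, e.g. CropMonitor_2023_06_28.tif
SOURCE_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<year>\d{4})_(?P<month>\d{2})_\d{2}\.tif$"
)


def monthly_target_filename(local_file_name: str) -> str:
    """Rename a daily-stamped file to the monthly <prefix>_YYYYMM.tif form (hypothetical helper)."""
    match = SOURCE_PATTERN.match(local_file_name)
    if match is None:
        raise ValueError(f"Unexpected file name: {local_file_name!r}")
    return f"{match['prefix']}_{match['year']}{match['month']}.tif"


print(monthly_target_filename("CropMonitor_2023_06_28.tif"))  # CropMonitor_202306.tif
```

For the example file this yields the same name as `f"CropMonitor_{YEAR}{MONTH:02}.tif"` with `YEAR, MONTH = 2023, 6`.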

In [1]:
API = "https://ig9v64uky8.execute-api.us-west-2.amazonaws.com/staging/"

LOCAL_FILE_PATH = "CropMonitor_2023_06_28.tif"
YEAR, MONTH = 2023, 6

TARGET_FILENAME = f"CropMonitor_{YEAR}{MONTH:02}.tif"
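The later cells build endpoint URLs by string concatenation (`API + "dataset/validate"`), which works only because `API` ends with a slash. A slightly more defensive sketch uses `urllib.parse.urljoin` from the standard library; the helper name `endpoint` is hypothetical and not part of the VEDA API.

```python
from urllib.parse import urljoin

API = "https://ig9v64uky8.execute-api.us-west-2.amazonaws.com/staging/"


def endpoint(base: str, path: str) -> str:
    """Join a base API URL and a relative path (hypothetical helper).

    urljoin replaces the last path segment of `base` unless `base` ends
    with "/", so a trailing slash is enforced before joining. A leading
    "/" on `path` is stripped so it does not reset the path to the host root.
    """
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, path.lstrip("/"))


print(endpoint(API, "dataset/validate"))
```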

## Validate data format

The following cell checks whether the file you plan to upload is a valid Cloud Optimized GeoTIFF (COG), a format that enables more efficient access in cloud environments. If it is not a COG, the cell converts it into one.

In [7]:
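from rio_cogeo.cogeo import cog_translate, cog_validate
from rio_cogeo.profiles import cog_profiles

# cog_validate returns (is_valid, errors, warnings), not a bare bool
file_is_a_cog, errors, warnings = cog_validate(LOCAL_FILE_PATH)
if not file_is_a_cog:
    print("File is not a COG - converting:", errors)
    cog_path = f"cog_{LOCAL_FILE_PATH}"  # write a new file; don't overwrite the open source
    cog_translate(LOCAL_FILE_PATH, cog_path, cog_profiles.get("deflate"), in_memory=True)
    LOCAL_FILE_PATH = cog_path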

## Upload file to S3

The following cell uploads your COG to the `veda-data-store-staging` S3 bucket under the `geoglam/` prefix, using `TARGET_FILENAME` (which encodes the year and month defined earlier in this notebook) as the object name.
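boto3's `upload_file` takes the bucket and the object key as separate arguments (`upload_file(Filename, Bucket, Key)`), so the key is only the path inside the bucket and should not repeat the bucket name. A small sketch of keeping the two apart and deriving the `s3://` URI from them; `s3_location` is a hypothetical helper.

```python
from typing import Tuple


def s3_location(bucket: str, prefix: str, filename: str) -> Tuple[str, str]:
    """Return (object key, s3:// URI) for `filename` stored under `prefix` in `bucket`."""
    key = f"{prefix.strip('/')}/{filename}"
    return key, f"s3://{bucket}/{key}"


key, uri = s3_location("veda-data-store-staging", "geoglam/", "CropMonitor_202306.tif")
print(key)  # geoglam/CropMonitor_202306.tif
print(uri)  # s3://veda-data-store-staging/geoglam/CropMonitor_202306.tif
```

The key would then be passed as `s3.upload_file(LOCAL_FILE_PATH, "veda-data-store-staging", key)`, and the URI is the form used for `sample_files` in the dataset definition.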

In [8]:
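s3 = boto3.client("s3")
BUCKET = "veda-data-store-staging"
# The object key is the path inside the bucket; it must not repeat the bucket name
KEY = f"geoglam/{TARGET_FILENAME}"
S3_FILE_LOCATION = f"s3://{BUCKET}/{KEY}"

# upload_file takes (Filename, Bucket, Key) as separate arguments
s3.upload_file(LOCAL_FILE_PATH, BUCKET, KEY)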

## Construct dataset definition

Here the data provider constructs the dataset definition (and supporting metadata) used for dataset ingestion. These values must be correct and match the data the provider plans to upload to the VEDA Platform. For example, make sure that `startdate` and `enddate` are valid dates: `"enddate": "2023-06-31T23:59:59Z"` would be an incorrect value, because June has only 30 days.
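One way to avoid impossible dates such as June 31 is to compute the last day of the month rather than typing it. A minimal sketch using the standard-library `calendar.monthrange`, which returns the weekday of the first day and the number of days in the month; `month_bounds` is a hypothetical helper, and the timestamp format simply mirrors the one used in `temporal_extent`.

```python
import calendar
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return (startdate, enddate) strings covering the whole month."""
    _, days_in_month = calendar.monthrange(year, month)  # handles leap years
    start = f"{year:04d}-{month:02d}-01T00:00:00Z"
    end = f"{year:04d}-{month:02d}-{days_in_month:02d}T23:59:59Z"
    return start, end


print(month_bounds(2023, 6))  # ('2023-06-01T00:00:00Z', '2023-06-30T23:59:59Z')
```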

For further detail on the metadata required for entries in the VEDA STAC to work with the VEDA Dashboard, see the [STAC collection conventions](https://nasa-impact.github.io/veda-docs/contributing/dataset-ingestion/stac-collection-conventions.html) documentation. In particular, note the recommendations for the fields `is_periodic` and `time_density`. In the code block below, `is_periodic` is set to `False` because we are ingesting only one month of data. Although GEOGLAM publishes monthly observations routinely, this collection will contain only a single file, so there is no temporal range of monthly items from which the Dashboard could generate a time picker.
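The `discovery_items` entry in the next cell locates files with `filename_regex`. A quick local check can confirm that the pattern matches the uploaded object key. Note that `.` in `TARGET_FILENAME` is a regex wildcard, so the pattern as written would also match a name like `CropMonitor_202306xtif`; wrapping the name in `re.escape` keeps it literal. This is only a sketch: whether the ingestion workflow applies the regex to the full key or only to the part after `prefix` is an assumption here.

```python
import re

TARGET_FILENAME = "CropMonitor_202306.tif"
uploaded_key = f"geoglam/{TARGET_FILENAME}"

loose = f"(.*){TARGET_FILENAME}$"  # as written in the dataset definition
strict = f"(.*){re.escape(TARGET_FILENAME)}$"  # '.' matched literally

print(bool(re.search(loose, uploaded_key)), bool(re.search(strict, uploaded_key)))  # True True
print(bool(re.search(loose, "geoglam/CropMonitor_202306xtif")))   # True
print(bool(re.search(strict, "geoglam/CropMonitor_202306xtif")))  # False
```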

In [9]:
dataset = {
    "collection": "geoglam",
    "title": "GEOGLAM Crop Monitor",
    "data_type": "cog",
    "spatial_extent": {"xmin": -180, "ymin": -90, "xmax": 180, "ymax": 90},
    "temporal_extent": {
        "startdate": "2020-01-01T00:00:00Z",
        "enddate": "2023-06-30T23:59:59Z",
    },
    "license": "MIT",
    "description": "The Crop Monitors were designed to provide a public good of open, timely, science-driven information on crop conditions in support of market transparency for the G20 Agricultural Market Information System (AMIS). Reflecting an international, multi-source, consensus assessment of crop growing conditions, status, and agro-climatic factors likely to impact global production, focusing on the major producing and trading countries for the four primary crops monitored by AMIS (wheat, maize, rice, and soybeans). The Crop Monitor for AMIS brings together over 40 partners from national, regional (i.e. sub-continental), and global monitoring systems, space agencies, agriculture organizations and universities. Read more: https://cropmonitor.org/index.php/about/aboutus/",
    "is_periodic": False,
    "time_density": "month",
    "sample_files": [S3_FILE_LOCATION],
    "discovery_items": [
        {
            "discovery": "s3",
            "prefix": "geoglam/",
            "bucket": "veda-data-store-staging",
            "filename_regex": f"(.*){TARGET_FILENAME}$",
        }
    ],
}
import json

print(json.dumps(dataset, indent=2))

{
  "collection": "geoglam",
  "title": "GEOGLAM Crop Monitor",
  "data_type": "cog",
  "spatial_extent": {
    "xmin": -180,
    "ymin": -90,
    "xmax": 180,
    "ymax": 90
  },
  "temporal_extent": {
    "startdate": "2020-01-01T00:00:00Z",
    "enddate": "2023-06-30T23:59:59Z"
  },
  "license": "MIT",
  "description": "The Crop Monitors were designed to provide a public good of open, timely, science-driven information on crop conditions in support of market transparency for the G20 Agricultural Market Information System (AMIS). Reflecting an international, multi-source, consensus assessment of crop growing conditions, status, and agro-climatic factors likely to impact global production, focusing on the major producing and trading countries for the four primary crops monitored by AMIS (wheat, maize, rice, and soybeans). The Crop Monitor for AMIS brings together over 40 partners from national, regional (i.e. sub-continental), and global monitoring systems, space agencies, agriculture

## Validate dataset definition

The following cell submits the dataset definition above to the API's validation endpoint. If the definition is valid, the response confirms that it is ready to be published on the VEDA Platform.
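The API performs the authoritative validation, but a few inexpensive local checks can catch obvious mistakes before a request is sent. A minimal sketch, assuming the `temporal_extent` and `spatial_extent` layout used in the dataset definition; `local_checks` is hypothetical and does not replicate the server's rules.

```python
from datetime import datetime


def local_checks(dataset: dict) -> list:
    """Return a list of human-readable problems found in the dataset definition."""
    problems = []
    extent = dataset["temporal_extent"]
    try:
        # "Z" is swapped for "+00:00" because fromisoformat only accepts "Z"
        # from Python 3.11 on; impossible dates (e.g. June 31) raise ValueError.
        start = datetime.fromisoformat(extent["startdate"].replace("Z", "+00:00"))
        end = datetime.fromisoformat(extent["enddate"].replace("Z", "+00:00"))
        if start > end:
            problems.append("startdate is after enddate")
    except ValueError as err:
        problems.append(f"invalid date: {err}")
    box = dataset["spatial_extent"]
    # Assumes the box does not cross the antimeridian (xmin < xmax)
    if not (-180 <= box["xmin"] < box["xmax"] <= 180):
        problems.append("xmin/xmax out of order or outside [-180, 180]")
    if not (-90 <= box["ymin"] < box["ymax"] <= 90):
        problems.append("ymin/ymax out of order or outside [-90, 90]")
    return problems


example = {
    "temporal_extent": {"startdate": "2020-01-01T00:00:00Z", "enddate": "2023-06-31T23:59:59Z"},
    "spatial_extent": {"xmin": -180, "ymin": -90, "xmax": 180, "ymax": 90},
}
print(local_checks(example))  # reports the invalid June 31 end date
```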

In [10]:
auth_header = f"Bearer {TOKEN}"
headers = {
    "Authorization": auth_header,
    "content-type": "application/json",
    "accept": "application/json",
}
response = requests.post((API + "dataset/validate"), json=dataset, headers=headers)
response.raise_for_status()
print(response.text)

["Dataset metadata is valid and ready to be published - geoglam"]


## Publish to STAC

The final code block below will initiate a workflow and publish the dataset to the VEDA Platform. 

In [11]:
response = requests.post((API + "dataset/publish"), json=dataset, headers=headers)
response.raise_for_status()
print(response.text)

{"message":"Successfully published collection: geoglam. 1  workflows initiated.","workflows_ids":["db6a2097-3e4c-45a3-a772-0c11e6da8b44"]}
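The publish response above is JSON, so the workflow IDs can be kept for later reference (for example, when asking the VEDA Data Services team about a particular run). This sketch parses the response text recorded in this notebook; in the live notebook, `response.json()` returns the same dictionary. No endpoint for polling workflow status is assumed here.

```python
import json

# Response body recorded from the publish cell in this notebook
response_text = (
    '{"message":"Successfully published collection: geoglam. 1  workflows initiated.",'
    '"workflows_ids":["db6a2097-3e4c-45a3-a772-0c11e6da8b44"]}'
)

payload = json.loads(response_text)
workflow_ids = payload.get("workflows_ids", [])  # empty list if the key is absent
print(payload["message"])
for workflow_id in workflow_ids:
    print("workflow:", workflow_id)
```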


Congratulations! You have now published a COG dataset to the [VEDA Dashboard](https://www.earthdata.nasa.gov/dashboard/). Once the ingestion workflow finishes, the uploaded data should be ready for viewing and exploration, and you can browse the data catalog to verify that the ingestion worked.