## A demo notebook to publish datacubes and workflows to the EarthCODE catalog
### A DeepESDL example notebook

Please also refer to the [DeepESDL documentation](https://deepesdl.readthedocs.io/en/latest/guide/jupyterlab/) and visit the platform's [website](https://www.earthsystemdatalab.net/) for further information!

Brockmann Consult, 2025

-----------------

**This notebook runs with the Python environment `users-deep-code-test`. Please check out the documentation for [help on changing the environment](https://deepesdl.readthedocs.io/en/latest/guide/jupyterlab/#python-environment-selection-of-the-jupyter-kerne).**

### 📘 Prerequisite:
Before using the deep-code CLI or API to publish metadata, users must configure GitHub access by creating a .gitaccess file in the working directory from which deep-code is executed.

1. Generate a Personal Access Token (PAT) from your GitHub account:
    1. Navigate to GitHub → Settings → Developer settings → Personal access tokens.
    2. Click “Generate new token”.
    3. Choose the following scopes to ensure full access:
        - repo (Full control of repositories — includes fork, pull, push, and read)
    4. Generate the token and copy it immediately — GitHub won’t show it again.

2. Create a .gitaccess file

In the same directory where you run the deep-code commands, create a file named .gitaccess with the following content:
```
github-username: your-git-user
github-token: your-personal-access-token
```
Replace your-git-user and your-personal-access-token with your actual GitHub username and token.

This file is required to allow deep-code to fork the Open Science Metadata repository, commit metadata changes, and open a pull request to the EarthCODE Catalog.
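Before publishing, it can be useful to confirm that the file exists and contains both entries. The following is a minimal sketch, assuming the file consists of plain `key: value` lines as in the example above; how deep-code itself parses `.gitaccess` is not described here, and the helper names `read_gitaccess` and `missing_gitaccess_keys` are hypothetical.

```python
from pathlib import Path

# Keys taken from the example .gitaccess content above.
REQUIRED_KEYS = ("github-username", "github-token")


def read_gitaccess(path=".gitaccess"):
    """Parse simple 'key: value' lines into a dict (assumed file layout)."""
    entries = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def missing_gitaccess_keys(path=".gitaccess"):
    """Return the required keys that are absent or empty in the file."""
    if not Path(path).is_file():
        return list(REQUIRED_KEYS)
    entries = read_gitaccess(path)
    return [key for key in REQUIRED_KEYS if not entries.get(key)]
```

Calling `missing_gitaccess_keys()` from the working directory returns an empty list when both entries are present. The sketch deliberately never prints the token value.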

In [None]:
import os
import xcube
import warnings
import deep_code

from xcube.webapi.viewer import Viewer
from xcube.core.store import new_data_store
from deep_code.tools.lint import LintDataset
from deep_code.tools.publish import Publisher

In [None]:
warnings.filterwarnings('ignore')

## Generate starter configuration templates for publishing to the EarthCODE Open Science Catalog

In [None]:
!deep-code generate-config

## Here we create a small dataset from the xcube-cmems store

In [None]:
store = new_data_store("cmems")
store

In [None]:
ds = store.open_data(
    "DMI-BALTIC-SST-L3S-NRT-OBS_FULL_TIME_SERIE",
    variable_names=["sea_surface_temperature"],
    bbox=[9, 53, 20, 62],
    time_range=("2022-01-01", "2022-01-05"),
)
ds

## Lint your in-memory dataset for metadata correctness and completeness before publishing to the EarthCODE Open Science Catalog

In [None]:
linter = LintDataset(dataset=ds)
linter.lint_dataset()

## Fix the errors from the linter

Adding gcmd_keyword_url connects your data to a semantic network of Earth science concepts, enabling:

- Better automated discovery

- Stronger metadata interoperability

- Alignment with international FAIR standards

To find the GCMD URL for your variable, please use https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/all?gtm_scheme=all

In [None]:
ds.attrs["description"] = (
    "This is an extracted dataset from the Copernicus Marine Data Store"
)

ds["sea_surface_temperature"].attrs["gcmd_keyword_url"] = "https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/all/e4d58a7f-7eaa-4f75-996a-18238c698063?gtm_keyword=SEA%20SURFACE%20FOUNDATION%20TEMPERATURE&gtm_scheme=Earth%20Science"
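For datasets with several variables, a quick check can list those that still lack the attribute. This is a minimal sketch and not a replacement for `LintDataset`, whose exact rules are not reproduced here; the helper name `variables_missing_attr` is hypothetical, and it only assumes a mapping of variable names to objects with an `attrs` dictionary, such as `ds.data_vars`.

```python
def variables_missing_attr(data_vars, attr_name="gcmd_keyword_url"):
    """Return names of variables whose attrs lack a non-empty string `attr_name`.

    `data_vars` is any mapping of name -> object with an `attrs` dict,
    for example the `data_vars` of an xarray dataset.
    """
    missing = []
    for name, var in data_vars.items():
        value = var.attrs.get(attr_name)
        # Treat absent, non-string, and blank values as missing.
        if not isinstance(value, str) or not value.strip():
            missing.append(str(name))
    return missing


# Usage in this notebook (assumes `ds` from the cells above):
# variables_missing_attr(ds.data_vars)
```

An empty list means every data variable carries the attribute; it does not verify that the URL itself is valid.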

## Write the dataset to the team S3 bucket

In [None]:
S3_USER_STORAGE_KEY = os.environ["S3_USER_STORAGE_KEY"]
S3_USER_STORAGE_SECRET = os.environ["S3_USER_STORAGE_SECRET"]
S3_USER_STORAGE_BUCKET = os.environ["S3_USER_STORAGE_BUCKET"]

In [None]:
team_store = new_data_store(
    "s3", 
    root=S3_USER_STORAGE_BUCKET, 
    storage_options=dict(
        anon=False, 
        key=S3_USER_STORAGE_KEY, 
        secret=S3_USER_STORAGE_SECRET
    )
)

In [None]:
team_store.write_data(ds, "cmems_sst_v2.zarr", replace=True)

The user workflow, which is the Jupyter notebook (JNB), has to be pushed to a Git repository: https://github.com/deepesdl/cube-gen/blob/main/Permafrost/Create-CCI-Permafrost-cube-EarthCODE.ipynb

# 📘 Publishing Metadata to the EarthCODE Catalogue

Once the dataset and workflow metadata are prepared and validated, users can initiate the publishing process with deep-code, either through the Python API or through the CLI. A single publish call automates the entire workflow.

## 🔹 The publish command shown below performs the following steps:

1. Generates valid STAC and OGC API Records based on the provided configuration files

2. Forks the open-science-catalog-metadata repository on GitHub

3. Inserts the generated records into the correct directory structure

4. Creates a Pull Request (PR) for review by the Open Science Catalog steward
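Since the first step depends on the configuration files, a small pre-flight check can catch a missing file before any GitHub interaction starts. This is a sketch under the assumption that the two files carry the names used in the publishing cells of this notebook and sit in the current working directory; `find_missing_files` is a hypothetical helper, not part of deep-code.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the given paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# File names as used in the publishing cells of this notebook.
CONFIG_FILES = ["dataset-config.yaml", "workflow-config.yaml"]

missing = find_missing_files(CONFIG_FILES)
if missing:
    print("Missing configuration files:", ", ".join(missing))
else:
    print("Both configuration files are present.")
```

The check only confirms that the files exist; it does not validate their contents.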

## Publish using the Python function

In [None]:
# publish using the python function
publisher = Publisher(
    dataset_config_path="dataset-config.yaml",
    workflow_config_path="workflow-config.yaml",
    environment="staging",
)
publisher.publish_all()

## Publish using the CLI

In [None]:
!deep-code publish dataset-config.yaml workflow-config.yaml -e staging