Merged
48 changes: 27 additions & 21 deletions .github/workflows/build.yml
Expand Up @@ -4,9 +4,10 @@ on:
push:
branches:
- main
tags:
- '*'
pull_request:
branches:
- main
workflow_dispatch:

permissions:
contents: read
Expand All @@ -15,10 +16,23 @@ permissions:
jobs:
build:
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/') || github.event_name == 'pull_request'

steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ secrets.OVH_HARBOR_REGISTRY }}/eopf-sentinel-zarr-explorer/data-pipeline
tags: |
type=sha
type=ref,event=branch
type=ref,event=pr
type=ref,event=tag
type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
Expand All @@ -30,23 +44,6 @@ jobs:
username: ${{ secrets.OVH_HARBOR_USERNAME }}
password: ${{ secrets.OVH_HARBOR_PASSWORD }}

- name: Generate Docker tags
id: meta
run: |
IMAGE="${{ secrets.OVH_HARBOR_REGISTRY }}/eopf-sentinel-zarr-explorer/data-pipeline"
SHA_SHORT=$(echo ${{ github.sha }} | cut -c1-7)

if [ "${{ github.event_name }}" = "pull_request" ]; then
# PR builds: tag as pr-<number>
TAGS="${IMAGE}:pr-${{ github.event.pull_request.number }},${IMAGE}:sha-${SHA_SHORT}"
else
# Push to main: tag as both 'main' and 'latest'
TAGS="${IMAGE}:main,${IMAGE}:latest,${IMAGE}:sha-${SHA_SHORT}"
fi

echo "tags=${TAGS}" >> $GITHUB_OUTPUT
echo "primary_tag=$(echo ${TAGS} | cut -d',' -f1 | cut -d':' -f2)" >> $GITHUB_OUTPUT

- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
Expand All @@ -55,11 +52,20 @@ jobs:
platforms: linux/amd64
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max

- name: Image summary
run: |
echo "### Docker Image Built 🐳" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "**Tags:** ${{ steps.meta.outputs.tags }}" >> $GITHUB_STEP_SUMMARY
echo "**Tags:**" >> $GITHUB_STEP_SUMMARY
echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
echo "${{ steps.meta.outputs.tags }}" >> $GITHUB_STEP_SUMMARY
echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "**Labels:**" >> $GITHUB_STEP_SUMMARY
echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
echo "${{ steps.meta.outputs.labels }}" >> $GITHUB_STEP_SUMMARY
echo "\`\`\`" >> $GITHUB_STEP_SUMMARY
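The new `docker/metadata-action` step replaces the hand-rolled tag script removed further down in this diff. As a rough shell sketch of what those `tags:` rules expand to — this mimics the action's documented behavior; the registry host and the `meta_tags` helper are placeholders for illustration, not the repo's actual values:

```shell
#!/bin/sh
# Approximation of the metadata-action tag rules configured above.
# Args: event_name ref short_sha pr_number
meta_tags() {
  event="$1"; ref="$2"; sha="$3"; pr="$4"
  image="registry.example/eopf-sentinel-zarr-explorer/data-pipeline"  # placeholder registry
  tags="${image}:sha-${sha}"                                # type=sha
  case "$event" in
    pull_request)
      tags="${tags},${image}:pr-${pr}"                      # type=ref,event=pr
      ;;
    push)
      case "$ref" in
        refs/heads/*)                                       # type=ref,event=branch
          tags="${tags},${image}:${ref#refs/heads/}" ;;
        refs/tags/*)                                        # type=ref,event=tag
          tags="${tags},${image}:${ref#refs/tags/}" ;;
      esac
      if [ "$ref" = "refs/heads/main" ]; then               # type=raw,value=latest,enable=<main only>
        tags="${tags},${image}:latest"
      fi
      ;;
  esac
  echo "$tags"
}
```

So a push to `main` yields `sha-<short>`, `main`, and `latest` tags, while a pull request yields `sha-<short>` and `pr-<number>` — the same matrix the removed bash step produced by hand.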
153 changes: 57 additions & 96 deletions README.md
@@ -1,4 +1,4 @@
# EOPF GeoZarr Data Pipeline
# EOPF Explorer Samples Data Pipeline

**Kubernetes pipeline: Sentinel Zarr → Cloud-Optimized GeoZarr + STAC Registration**

Expand All @@ -22,8 +22,18 @@ Transforms Sentinel-1/2 satellite data into web-ready visualizations:

## Setup

### Prerequisites:
- Kubernetes cluster with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) (Argo Workflows, RabbitMQ, STAC API, TiTiler)
### Environments

The data pipeline is deployed in two Kubernetes namespaces:

- **`devseed-staging`** - Testing and validation environment
- **`devseed`** - Production data pipeline

This documentation uses `devseed-staging` in examples. For production, replace with `devseed`.

### Prerequisites

- Kubernetes cluster with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) (Argo Workflows, STAC API, TiTiler)
- Python 3.13+ with `uv`
- `GDAL` installed (on macOS: `brew install gdal`)
- `kubectl` installed
Expand All @@ -40,30 +50,16 @@ kubectl get nodes # Verify: should list several nodes

#### Quick verification:
```bash
kubectl get wf,sensor,eventsource -n devseed-staging
```

### Retrieve RABBITMQ_PASSWORD and store in .env file

```bash
# Check if RABBITMQ_PASSWORD already exists in .env
if [ -f .env ] && grep -q "^RABBITMQ_PASSWORD=" .env; then
echo "RABBITMQ_PASSWORD already exists in .env"
else
echo "RABBITMQ_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d)" >> .env
echo "✅ RABBITMQ_PASSWORD added to .env"
fi
kubectl get wf -n devseed-staging
```

### Add Harbor Registry credentials to .env file

Make sure you have a `HARBOR_USERNAME` and `HARBOR_PASSWORD` for the OVH container registry added to the `.env` file.

### Setup port forwarding for webhook access

### Setup port forwarding from local machine to RabbitMQ service
```bash
kubectl port-forward -n core svc/rabbitmq 5672:5672 &
```
See [operator-tools/README.md](operator-tools/README.md#port-forwarding) for webhook port forwarding setup.

> **Contributor comment:** Same info is in the operator readme, could we just replace l61-l69 by
>
> See [operator-tools/README.md](operator-tools/README.md#port-forwarding) for webhook port forwarding setup.

### For development

Expand Down Expand Up @@ -101,27 +97,21 @@ docker build -f docker/Dockerfile --network host -t w9mllyot.c1.de1.container-re
docker push w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:v0
```

- Once the new image is pushed, run the example [Notebook](submit_stac_items_notebook.ipynb) and verify that workflows are running in [Argo Workflow server](https://workspace.devseed.hub-eopf-explorer.eox.at/argo-workflows-server)


- Once the new image is pushed, run the example [Notebook](operator-tools/submit_stac_items_notebook.ipynb) and verify that workflows are running in [Argo Workflows](https://argo-workflows.hub-eopf-explorer.eox.at/workflows/devseed-staging)

---

## Submit Workflow

### Method 1: RabbitMQ (Production - Event-Driven)

Triggers via EventSource → Sensor:

**Submit workflow from python script**
```bash
python submit_test_workflow.py
```

or using the example [Notebook](submit_stac_items_notebook.ipynb)
### Method 1: HTTP Webhook (Recommended)

Use the operator tools to submit STAC items via HTTP webhook. See [operator-tools/README.md](operator-tools/README.md) for:
- Interactive notebook for batch submissions
- Python script for single item testing
- Port forwarding setup
- Common actions and target collections

### Method 2: kubectl (Testing - Bypasses Event System)
### Method 2: kubectl (Testing - Direct Workflow Submission)

Direct workflow submission:

Expand All @@ -145,23 +135,14 @@ EOF
kubectl get wf -n devseed-staging --watch
```
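The heredoc elided by the diff above typically takes the shape below. This is an illustrative sketch only: the WorkflowTemplate name and parameter are hypothetical placeholders, not the pipeline's actual interface (which is not visible in this diff).

```shell
# Illustrative only: submit an Argo Workflow referencing a WorkflowTemplate.
# "geozarr-pipeline" and "source_url" are hypothetical placeholders.
kubectl create -n devseed-staging -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: geozarr-test-
spec:
  workflowTemplateRef:
    name: geozarr-pipeline          # hypothetical template name
  arguments:
    parameters:
      - name: source_url            # hypothetical parameter
        value: https://example.com/stac/items/example-item
EOF
```

`generateName` lets Argo assign a unique suffix per submission, which is why the watch command above filters by namespace rather than by a fixed workflow name.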

**Monitor:** [Argo UI](https://argo.core.eopf.eodc.eu/workflows/devseed-staging)
**Monitor:** [Argo Workflows UI](https://argo-workflows.hub-eopf-explorer.eox.at/workflows/devseed-staging)

---

## Web Interfaces

Access via **EOxHub workspace** (single sign-on): [workspace.devseed.hub-eopf-explorer.eox.at](https://workspace.devseed.hub-eopf-explorer.eox.at/)

| Service | Purpose | URL |
|---------|---------|-----|
| **Argo Workflows** | Monitor pipelines | [argo.core.eopf.eodc.eu](https://argo.core.eopf.eodc.eu/workflows/devseed-staging) |
| **STAC Browser** | Browse catalog | [api.explorer.eopf.copernicus.eu/stac](https://api.explorer.eopf.copernicus.eu/stac) |
| **TiTiler Viewer** | View maps | [api.explorer.eopf.copernicus.eu/raster](https://api.explorer.eopf.copernicus.eu/raster) |

💡 **Tip:** Login to EOxHub first for seamless authentication across all services.
**View Results:**

- [STAC Browser](https://api.explorer.eopf.copernicus.eu/stac) - Browse catalog
- [TiTiler Viewer](https://api.explorer.eopf.copernicus.eu/raster) - View maps

💡 **Tip:** Login to [EOxHub workspace](https://workspace.devseed.hub-eopf-explorer.eox.at/) for seamless authentication.

---

Expand All @@ -176,11 +157,12 @@ Access via **EOxHub workspace** (single sign-on): [workspace.devseed.hub-eopf-ex
**Runtime:** ~15-20 minutes per item

**Stack:**
- Orchestration: Argo Workflows, Kustomize

- Processing: eopf-geozarr, Dask, Python 3.13
- Storage: S3 (OVH)
- Catalog: pgSTAC, TiTiler
- Events: RabbitMQ

**Infrastructure:** Deployment configuration and infrastructure details are maintained in [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed-staging/data-pipeline)

---

Expand Down Expand Up @@ -215,30 +197,15 @@ kubectl get wf -n devseed-staging --sort-by=.metadata.creationTimestamp \

```
scripts/
├── convert_v0.py # Zarr → GeoZarr conversion for V0 and S3 upload
└── register.py # STAC item creation and catalog registration
├── convert_v0.py # Zarr → GeoZarr conversion and S3 upload
└── register.py # STAC item creation and catalog registration

workflows/ # Kubernetes manifests
├── base/ # WorkflowTemplate, EventSource, Sensor, RBAC
└── overlays/staging/ # Environment configuration
/production/

docker/Dockerfile # Container image
tests/unit/ # Unit tests
/integration/ # Integration tests
operator-tools/ # Tools for submitting workflows
docker/Dockerfile # Container image
tests/ # Unit and integration tests
```

---

## Configuration

**📖 Full configuration:** See [workflows/README.md](workflows/README.md) for secrets setup and parameters.

**Quick reference:**
- S3: `s3.de.io.cloud.ovh.net` / `esa-zarr-sentinel-explorer-fra`
- Staging collection: `sentinel-2-l2a-dp-test`
- Production collection: `sentinel-2-l2a`
- **Enable debug logs:** `export LOG_LEVEL=DEBUG` (or add to workflow env)
**Deployment Configuration:** Kubernetes manifests and infrastructure are maintained in [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed-staging/data-pipeline)

---

Expand All @@ -248,19 +215,15 @@ tests/unit/ # Unit tests
# Watch workflows
kubectl get wf -n devseed-staging --watch

# View logs
# View workflow logs
kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name> --tail=100

# Running workflows
# Running workflows only
kubectl get wf -n devseed-staging --field-selector status.phase=Running

# Sensor logs (RabbitMQ message processing)
kubectl logs -n devseed-staging -l sensor-name=geozarr-sensor --tail=50

# EventSource logs (RabbitMQ connection)
kubectl logs -n devseed-staging -l eventsource-name=rabbitmq-geozarr --tail=50
```

**Web UI:** [Argo Workflows](https://argo-workflows.hub-eopf-explorer.eox.at/workflows/devseed-staging)


---

Expand All @@ -269,30 +232,28 @@ kubectl logs -n devseed-staging -l eventsource-name=rabbitmq-geozarr --tail=50
| Problem | Solution |
|---------|----------|
| **"No group found in store"** | Using direct zarr URL instead of STAC item URL |
| **"Connection refused"** | RabbitMQ port-forward not active: `kubectl port-forward -n devseed-staging svc/rabbitmq 5672:5672` |
| **Workflow not starting** | Check sensor/eventsource logs for connection errors |
| **S3 access denied** | Verify secret `geozarr-s3-credentials` exists in `devseed-staging` namespace |
| **Workflow stuck** | Check logs: `kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name>` |
| **"Webhook not responding"** | See [operator-tools troubleshooting](operator-tools/README.md#troubleshooting) |
| **Workflow not starting** | Check webhook submission returned success, verify port-forward |
| **S3 access denied** | Contact infrastructure team to verify S3 credentials |
| **Workflow stuck/failed** | Check workflow logs: `kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name>` |

For infrastructure issues, see platform-deploy troubleshooting: [staging](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed-staging/data-pipeline) | [production](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed/data-pipeline)



---

## Resources
## Related Projects

**Container Image:** `w9mllyot.c1.de1.container-registry.ovh.net/eopf-sentinel-zarr-explorer/data-pipeline:latest`
- [data-model](https://github.com/EOPF-Explorer/data-model) - `eopf-geozarr` conversion library
- [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) - Infrastructure deployment and configuration

**Resource Limits:**
- CPU: 2 cores (convert), 500m (register)
- Memory: 8Gi (convert), 2Gi (register)
- Timeout: 3600s (convert), 600s (register)
## Documentation

**Related Projects:**
- [data-model](https://github.com/EOPF-Explorer/data-model) - `eopf-geozarr` conversion library
- [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) - Infrastructure (Argo, RabbitMQ, STAC, TiTiler)
- **Operator Tools:** [operator-tools/README.md](operator-tools/README.md)
- **Tests:** `tests/` - pytest unit and integration tests
- **Deployment:** [platform-deploy/workspaces/devseed-staging/data-pipeline](https://github.com/EOPF-Explorer/platform-deploy/tree/main/workspaces/devseed-staging/data-pipeline)

**Documentation:**
- Workflow manifests: `workflows/README.md`
- Tests: `tests/` (pytest unit and integration tests)
## License

**License:** MIT
MIT