Merge pull request #47 from QGreenland-Net/2024-05-22-design-meeting
2024 05 22 design meeting notes
mfisher87 committed May 22, 2024
2 parents 7966718 + ec11727 commit f4b48c1
Showing 2 changed files with 138 additions and 16 deletions.
106 changes: 106 additions & 0 deletions notes/2024-05-22_design-meeting.md
@@ -0,0 +1,106 @@
---
title: "OGDC Design Meeting"
date: "2024-05-22"
---

## Background

We identified a need for a longer (2+ hours) collaborative design session to flesh out a
design vision and technical roadmap. We also identified a need to avoid a Big Design Up
Front (BDUF) approach.

[Previous discussion notes about dev milestones](https://docs.google.com/document/d/1IOWOMqCkb7HLzh2Gq0I0LjreoPm5xymBO67Utly-pks/edit)


## Discussion

The sub-headers added here are not intended to be exhaustive. Please add more.


### Design goals and requirements

Documentation: https://qgreenland-net.github.io/requirements.html

* Matt J: Be able to do everything in:
  * https://github.com/PermafrostDiscoveryGateway/viz-staging
  * https://github.com/PermafrostDiscoveryGateway/viz-raster
  * https://github.com/PermafrostDiscoveryGateway/viz-3d-tiles
  * https://github.com/PermafrostDiscoveryGateway/viz-points
* Be able to generate a processing pipeline based on examining incoming data
* Matt J: What is it we’re going to produce?
  * Envisioning a set of services running and waiting for requests.
  * Or a workflow platform where you submit steps to be executed.
  * **Workflow platform seems like where we’re headed.**
* Matt J: Cluster configuration can drastically affect the design of workflows
* Matt J: Dynamic workflow generation
  * Based on input data, generate a workflow DAG
  * Currently static from config file. Real world example:
    <https://github.com/PermafrostDiscoveryGateway/viz-workflow/blob/main/workflow_configs/ice-wedge-polygons.json>
* Matt J summary: Deal with transformations for existing PDG visualization
  challenges. Think we’re on the right track.
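
To make the "dynamic workflow generation" idea concrete, here is a minimal Python sketch of choosing a DAG by examining the incoming files. It is an illustration only: the step names (`stage`, `rasterize`, `3d-tiles`, `point-tiles`), the suffix sets, and the functions `build_dag` and `execution_order` are hypothetical, loosely named after the PDG viz-* repositories, and do not reflect how those packages are actually wired together.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import PurePosixPath

# Assumption: file suffix is enough to decide which steps are needed.
VECTOR_SUFFIXES = {".shp", ".gpkg", ".geojson"}
POINT_CLOUD_SUFFIXES = {".las", ".laz"}


def build_dag(input_files):
    """Return {step: set of prerequisite steps} based on the input file types."""
    suffixes = {PurePosixPath(f).suffix.lower() for f in input_files}
    dag = {}
    if suffixes & VECTOR_SUFFIXES:
        # Hypothetical vector branch: stage first, then two independent products.
        dag["stage"] = set()
        dag["rasterize"] = {"stage"}
        dag["3d-tiles"] = {"stage"}
    if suffixes & POINT_CLOUD_SUFFIXES:
        # Hypothetical point-cloud branch with no prerequisites.
        dag["point-tiles"] = set()
    return dag


def execution_order(dag):
    """Flatten the DAG into one valid serial order, prerequisites first."""
    return list(TopologicalSorter(dag).static_order())


dag = build_dag(["polygons/a.gpkg", "polygons/b.gpkg", "scan.laz"])
order = execution_order(dag)
```

A static config file corresponds to hard-coding `dag`; the dynamic version derives it from the inputs, and an orchestrator would then run independent steps in parallel rather than in the serial `order` shown here.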


## Tool selection

https://qgreenland-net.github.io/evaluations/orchestrator/

* Considering:
  * Argo
    * How is storage managed? Dynamic PVCs?
    * Continued evaluation: What does the parallel version of one of our example workflows
      look like? Run on a tileset of N files
    * Experiment with drone imagery? <TODO: link>
    * What UX tools are there? Can we generate an SVG graph from a YAML?
  * Parsl
  * Ray
    * Promising for ML-specific stuff
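
On the "SVG graph from a YAML" question, one possible approach (a sketch, not an evaluated tool) is to convert the DAG tasks to Graphviz DOT text and let Graphviz render it. The example assumes the workflow YAML has already been parsed into a list of task dicts with `name` and optional `dependencies` keys, as in Argo DAG templates; the task names and the `dag_to_dot` function are hypothetical.

```python
def dag_to_dot(tasks, name="workflow"):
    """Render a list of DAG tasks as Graphviz DOT text.

    Each task is a dict with a "name" and an optional "dependencies" list.
    """
    lines = [f'digraph "{name}" {{']
    for task in tasks:
        node = task["name"]
        lines.append(f'  "{node}";')
        for dep in task.get("dependencies", []):
            # Edge points from the prerequisite to the dependent task.
            lines.append(f'  "{dep}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical tasks, shaped like the entries of an Argo `dag.tasks` list.
tasks = [
    {"name": "stage", "template": "stage"},
    {"name": "rasterize", "template": "rasterize", "dependencies": ["stage"]},
    {"name": "tiles-3d", "template": "tiles-3d", "dependencies": ["stage"]},
]
dot_text = dag_to_dot(tasks)
```

Saving `dot_text` to `workflow.dot` and running `dot -Tsvg workflow.dot -o workflow.svg` (Graphviz CLI) would produce the SVG.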






## Implementation roadmap

NOTE: Pre-populated by Trey & Matt, but we didn't get to discussing it today.

Let’s break this into milestones! Brainstorm:

* End-to-end data test (simple case): take some data, apply transformations, and publish
  results as DataONE dataset
  * Implemented some workflows in Argo & Parsl; remaining tasks are publishing to DataONE
    and triggering automatically from GitHub events.
* Migrate QGreenland workflows to selected orchestrator
  * One existing workflow (arctic circle) successfully migrated programmatically to Argo
    YAML, ~20 others still need testing, ~200 more still need implementing.
* Implement big and complex processing case for Cesium 3D tiles using e.g., drone imagery
  data as input
* Implement community accessibility functionality, e.g. bots, checks, and other
  automations on GitHub
* ...?
* Build QGreenland using data transformed and published to DataONE using OGDC
* Extract QGreenland’s framework code to a “QAnywhere” library for compiling regional QGIS
  data projects



## What other decisions do we need to finalize in this meeting?


## What are the next decision points? Do we need a follow-up meeting?

* Decide on a workflow tech! Depends on action items.


## Action items

- [ ] Rushiraj & Matt: Pick 3 datasets (small serial, medium, large parallel), build workflows
  for them, and _publish_ to the same PVC that the visualization app reads from (see
  Rushiraj’s PDG branch). Evaluate at the end of each step (small, medium, large).
  Medium: Hydrology Ice Basins; Large: drone imagery dataset (see notes from last
  architecture meeting)?
- [ ] Rushiraj: Work with ADC k8s admins to install Argo Workflows to “argo” namespace
- [x] Matt: Set up a new daily standup meeting (10 minutes) without Trey. Reach out if we need
him!


48 changes: 32 additions & 16 deletions requirements.md
@@ -2,7 +2,13 @@
title: "Requirements"
---

* [See related GitHub issue](https://github.com/QGreenland-Net/.github/issues/31)
* PDG transformation workflows
* [Staging/Tiling](https://github.com/PermafrostDiscoveryGateway/viz-staging)
* [Rasterization](https://github.com/PermafrostDiscoveryGateway/viz-raster)
* [3d-tiles](https://github.com/PermafrostDiscoveryGateway/viz-3d-tiles)
* [point clouds](https://github.com/PermafrostDiscoveryGateway/viz-points)
* [overview](https://github.com/PermafrostDiscoveryGateway/viz-info)


## Data transformations
@@ -15,34 +21,44 @@ title: "Requirements"
* Subset
* Resample (down/upsample or re-grid)
* File-level metadata changes, e.g.:
* Assignment or correction of projection
* `gdal_edit` operations
* Raster math, e.g.:
* `gdal_calc.py`
* Compression, e.g.:
* Apply `DEFLATE` compression to geotiff
* Generate overviews / tile pyramids, e.g.:
* `gdaladdo`
* Vector geometry operations
* Feature deduplication (_expensive_)
* Make valid
* Simplify (fewer points)
* Segmentize (more points)
* Filtering (e.g. SQL `WHERE`)
* Changing / adding attributes (e.g. calculating a `label` attribute from a
`value` and `unit` attribute)
* Generating / combining data
* Vector <-> raster transforms
* Contourize (raster data -> vector contours)
* Climatological mean or other data-reductions
* Enriching datasets / data fusion / data integration (e.g. combining
attributes from at least 2 vector data sources)
* Tiling (large dataset -> many chunks)
* Mosaicing (many chunks -> unified dataset)
* Tiling/Mosaicing specific challenges:
* Managing "edge effects": When a feature spans a tile boundary, how is it managed?
Options include keeping it in the tile that contains its centroid, splitting it at the
boundary, or keeping the whole polygon in every tile it intersects. Other algorithms
exist; all involve trade-offs.
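
To illustrate the edge-effect trade-off, the following Python sketch compares two of the strategies on a regular square grid. It is a simplification under stated assumptions: features are represented only by their bounding box `(minx, miny, maxx, maxy)`, the bounding-box center stands in for the true centroid, and bounding-box overlap stands in for true polygon intersection. The function names and the `tile_size` parameter are hypothetical.

```python
import math


def tile_of_centroid(bbox, tile_size):
    """Strategy 1: assign the feature to the single tile holding its center."""
    minx, miny, maxx, maxy = bbox
    cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
    return (math.floor(cx / tile_size), math.floor(cy / tile_size))


def tiles_intersecting(bbox, tile_size):
    """Strategy 2: keep the whole feature in every (col, row) tile it touches."""
    minx, miny, maxx, maxy = bbox
    cols = range(math.floor(minx / tile_size), math.floor(maxx / tile_size) + 1)
    rows = range(math.floor(miny / tile_size), math.floor(maxy / tile_size) + 1)
    return [(col, row) for col in cols for row in rows]


# A feature straddling the boundary between tile column 0 and column 1.
feature_bbox = (8.0, 2.0, 13.0, 4.0)
single_tile = tile_of_centroid(feature_bbox, tile_size=10)
all_tiles = tiles_intersecting(feature_bbox, tile_size=10)
```

The centroid rule stores each feature exactly once but leaves tile `(0, 0)` without a feature that visibly overlaps it; the intersect-all rule keeps every tile visually complete at the cost of duplicates that a later mosaicing step would need to deduplicate.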


## Service-y stuff

* Workflow service for running arbitrary geospatial workflows
* libraries of transformation functions
* workflow libraries for composition
* gdal and ogr as base building blocks
* User submitted recipes that trigger a workflow that results in downloadable
data file(s) archived as a new DataONE dataset

:::{.callout-important}
We've been making the assumption that we'd be archiving our outputs, even if the
@@ -55,4 +71,4 @@ title: "Requirements"
:::

* Creation of [3D Tiles](https://www.ogc.org/standard/3dtiles/) for geospatial
datasets to enable fast viz in portal cesium app
