
Weeks 1‐2 Report

Vishmayraj Zala edited this page Jun 13, 2026 · 1 revision

Weekly Report - Weeks 1 and 2 (May 25 - June 7, 2026)

Note: Weeks 1 and 2 are reported together because both overlapped with my university examination period (May 19 - June 2). Claudio was informed of this in advance by email on May 2, and the workload for these two weeks was planned to be documentation- and design-heavy rather than implementation-heavy to account for it.


I performed the following tasks during weeks 1 and 2:

Project scaffold and configuration

  • Set up the project repository with the directory structure agreed upon during the bonding period, including the connector package, docs, and test directories.
  • Implemented config.py using pydantic-settings as the single source of truth for all connector configuration, with field-level validation, a get_settings() singleton via lru_cache, and full docstrings matching the variable spec in the reference docs.

Harvesting layer design and reference documentation

  • Authored docs/Harvesting-Layer-Reference.md, a complete design specification for harvester.py and cache.py covering the STA query strategy, internal data model (HarvestedCatalog, HarvestedThing, and all nested dicts), the transformer contract (12 guarantees downstream code can rely on), and the public interface.
  • The reference doc was written before any harvester implementation to lock down the transformer contract first, since both stac_transformer.py and dcat_transformer.py depend on it.
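To make the shape of the data model concrete, here is a minimal sketch using dataclasses. Only the class names `HarvestedCatalog` and `HarvestedThing` come from the reference doc; every field, the `datastream_count()` helper, and the `check_unique_thing_ids()` function (an example of what a contract-style guarantee might look like, not one of the documented 12) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HarvestedThing:
    """One STA Thing plus its related entities (field names are hypothetical)."""

    thing_id: str  # normalised to str; STA servers may return int or string ids
    name: str
    description: str
    properties: dict[str, Any] = field(default_factory=dict)
    # Nested plain dicts, e.g. raw Datastream and Location JSON objects
    datastreams: list[dict[str, Any]] = field(default_factory=list)
    locations: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class HarvestedCatalog:
    """Everything harvested from one STA endpoint in a single run."""

    base_url: str
    things: list[HarvestedThing] = field(default_factory=list)

    def datastream_count(self) -> int:
        return sum(len(t.datastreams) for t in self.things)


def check_unique_thing_ids(catalog: HarvestedCatalog) -> None:
    """Hypothetical guarantee: no two Things in a catalog share an id."""
    seen: set[str] = set()
    for thing in catalog.things:
        if thing.thing_id in seen:
            raise ValueError(f"duplicate Thing id {thing.thing_id!r}")
        seen.add(thing.thing_id)
```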

STAC transformation layer design and reference documentation

  • Authored docs/STA-STAC-Mapping-Reference.md, a field-by-field mapping specification covering the full STA to STAC 1.0 transformation:
    • Thing to Collection and Datastream to Item.
    • Spatial fallback chain: observedArea, then Location.geometry, then null with a warning.
    • Temporal fallback chain: phenomenonTime parsing, open-ended interval handling, and skipping when the time is absent.
    • bbox derivation and collection extent computation.
    • Asset construction and all STAC link relations.
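The two fallback chains and the bbox step can be sketched as follows. This is a minimal illustration, assuming the raw STA JSON shape (`observedArea` on the Datastream, GeoJSON under `location` on each entry of the Thing's expanded `Locations`) rather than the connector's internal model, and treating `..` or an empty side of a `phenomenonTime` interval as an open end. All function names are hypothetical.

```python
import logging
from datetime import datetime
from typing import Any, Iterator, Optional

logger = logging.getLogger(__name__)


def resolve_geometry(datastream: dict[str, Any], thing: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Spatial fallback chain: observedArea, then a Location geometry, then None."""
    if datastream.get("observedArea"):
        return datastream["observedArea"]
    # Assumed raw STA shape: each Location holds GeoJSON under "location"
    for location in thing.get("Locations", []):
        if location.get("location"):
            return location["location"]
    logger.warning("Datastream %s has no geometry; emitting null", datastream.get("@iot.id"))
    return None


def _positions(coords: Any) -> Iterator[list[float]]:
    # GeoJSON nests positions at different depths per geometry type;
    # a position is the innermost list of numbers, e.g. [lon, lat]
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from _positions(part)


def derive_bbox(geometry: dict[str, Any]) -> list[float]:
    """2D bbox [minx, miny, maxx, maxy]; GeometryCollection is not handled here."""
    points = list(_positions(geometry["coordinates"]))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]


def _parse_instant(text: str) -> Optional[datetime]:
    # ".." or an empty side is treated as an open end (assumption)
    if text in ("", ".."):
        return None
    # "Z" is replaced so fromisoformat also works before Python 3.11
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def parse_phenomenon_time(value: Optional[str]) -> Optional[tuple[Optional[datetime], Optional[datetime]]]:
    """Return (start, end); None means no temporal information, so skip it."""
    if not value:
        return None
    if "/" in value:
        start_text, end_text = value.split("/", 1)
        return _parse_instant(start_text), _parse_instant(end_text)
    instant = _parse_instant(value)
    return instant, instant
```

A `None` from `resolve_geometry` corresponds to a null Item geometry, and a `None` from `parse_phenomenon_time` corresponds to skipping the temporal fields; how an open end is then expressed in STAC is left to the reference doc.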

Benchmark testing

  • Ran the benchmark script against two STA deployments to evaluate the HTTP harvesting architecture before committing to it:
    • Local dev instance (5 Things, 20 Datastreams): fetch 63ms, transformation 7.34ms averaged over 100 iterations, 0.37ms per item.
    • Fraunhofer FROST public production instance (5,610 Things, 22,941 Datastreams): fetch 42.4 seconds across 57 sequential paginated requests, transformation 1,203ms, 0.053ms per item.
  • The results indicate that transformation is not the bottleneck: even for 22,941 Datastreams it took about 1.2 seconds, compared with 42.4 seconds spent fetching. The bottleneck is sequential HTTP pagination. On the Fraunhofer instance, 57 round trips over the public internet averaged roughly 750ms per page. Even on a local or LAN deployment this structural problem does not disappear; it only shrinks.
  • Based on these results and Daniele's earlier advice during the bonding period, I have drafted a revised architecture proposal (direct Postgres access via asyncpg, plus a Redis cache invalidated through Postgres LISTEN/NOTIFY) to discuss with the mentors at the next meeting. The full proposal, including the harvest query, trigger design, and integration point inside api/app/, is documented and ready for review.
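The sequential cost noted in the benchmark follows from how STA server-driven paging works: each collection response carries an `@iot.nextLink` URL for the following page, so a harvester that follows those links cannot request the next page until the current one has arrived. A minimal sketch, assuming a caller-supplied `fetch(url)` function that returns decoded JSON (hypothetical, to keep the example independent of any particular HTTP client):

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical signature: takes a URL, returns the decoded JSON response body
FetchFn = Callable[[str], dict[str, Any]]


def iter_entities(first_url: str, fetch: FetchFn) -> Iterator[dict[str, Any]]:
    """Yield every entity of a paginated STA collection, one page at a time."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)  # one HTTP round trip per page
        yield from page.get("value", [])
        # The next URL is only known once this response has arrived,
        # so the loop cannot issue the following request any earlier
        url = page.get("@iot.nextLink")
```

Total fetch time with this loop is therefore roughly the number of pages times the round-trip latency, which is why a faster network shrinks the problem without removing it.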

Details can be found in:

  • Harvesting Layer Reference: docs/Harvesting-Layer-Reference.md
  • STAC Mapping Reference: docs/STA-STAC-Mapping-Reference.md
  • Benchmark script and results: benchmark_stac.py, benchmark.log

What do I plan to do next week?

  • Complete the STA to DCAT-AP 3.0 mapping reference document, covering Datastream to dcat:Dataset, Thing to dcat:DatasetSeries, mandatory field gap handling via Datastream.properties, and the JSON-LD and Turtle serialization strategy.
  • Present the benchmark results and the revised Postgres integration proposal to mentors and get a decision on the architecture direction before week 4 implementation begins.
  • Begin implementing harvester.py in whichever form the mentor discussion confirms.
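Since the DCAT mapping is still being written, the following is only a minimal sketch of the intended direction, assuming the raw STA JSON shape with an expanded `Datastreams` list and `@iot.selfLink` values reused as RDF identifiers. It fills only `dct:title` and `dct:description` from the STA `name` and `description` fields and links each `dcat:Dataset` to its `dcat:DatasetSeries` with `dcat:inSeries`. Gap handling via `Datastream.properties` is not shown, and all function names are hypothetical.

```python
import json
from typing import Any

# JSON-LD prefixes for DCAT (version 3) and Dublin Core Terms
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def thing_to_series(thing: dict[str, Any]) -> dict[str, Any]:
    """Map one STA Thing to a dcat:DatasetSeries node."""
    return {
        "@id": thing["@iot.selfLink"],  # reuse the STA URL as the RDF identifier
        "@type": "dcat:DatasetSeries",
        "dct:title": thing["name"],
        "dct:description": thing["description"],
    }


def datastream_to_dataset(datastream: dict[str, Any], series_id: str) -> dict[str, Any]:
    """Map one STA Datastream to a dcat:Dataset that belongs to a series."""
    return {
        "@id": datastream["@iot.selfLink"],
        "@type": "dcat:Dataset",
        "dct:title": datastream["name"],
        "dct:description": datastream["description"],
        "dcat:inSeries": {"@id": series_id},
    }


def thing_to_jsonld(thing: dict[str, Any]) -> str:
    """Serialise a Thing and its expanded Datastreams as one JSON-LD document."""
    series = thing_to_series(thing)
    datasets = [datastream_to_dataset(ds, series["@id"]) for ds in thing.get("Datastreams", [])]
    return json.dumps({"@context": CONTEXT, "@graph": [series, *datasets]}, indent=2)
```

For the Turtle side, one option would be to load this JSON-LD into rdflib (version 6 and later bundle a JSON-LD parser) with `Graph().parse(data=..., format="json-ld")` and re-serialise it with `serialize(format="turtle")`.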

Am I blocked on anything?

No. The architecture direction still needs to be revisited at the next mentor meeting, and I have further mapping decisions to make, but neither is blocking progress.
